[jira] [Commented] (HIVE-21376) Incompatible change in Hive bucket computation

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785380#comment-16785380
 ] 

Hive QA commented on HIVE-21376:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961205/HIVE-21376.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15818 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16355/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16355/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16355/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961205 - PreCommit-HIVE-Build

> Incompatible change in Hive bucket computation
> --
>
> Key: HIVE-21376
> URL: https://issues.apache.org/jira/browse/HIVE-21376
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: David Phillips
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21376.01.patch, HIVE-21376.patch
>
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code 
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the 
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses 
> {{daysSinceEpoch}} as the hash code. It is now computed using 
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} 
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses 
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation, 
> but there are two important differences:
>  # {{TimestampWritable}} rounds the number of milliseconds into the seconds 
> portion of the computation, but {{TimestampWritableV2}} does not.
>  # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, 
> which returns it relative to the JVM time zone, not UTC. 
> {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify if this actually 
> causes data to be read or written incorrectly (there may be code above this 
> library method which makes things work correctly). However, if my 
> understanding is correct, this means Hive 3.1 is both forwards and backwards 
> incompatible with bucketed tables using either of these data types. It also 
> indicates that Hive needs tests to verify that the hash code does not change 
> between releases.
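
For illustration, a minimal sketch of the {{DATE}} divergence using plain java.time (an assumption-level demo, not the Hive writable classes themselves; the class name and values are illustrative only):

{code:java}
import java.time.LocalDate;

public class DateBucketHashSketch {
    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2019, 3, 5);

        // Old DateWritable behaviour: the hash code is daysSinceEpoch.
        int oldStyleHash = (int) d.toEpochDay();

        // New DateWritableV2 behaviour: the hash code of java.time.LocalDate,
        // which mixes year/month/day bits and is not the epoch-day count.
        int newStyleHash = d.hashCode();

        // The two values differ, so the same row would be assigned to
        // different buckets before and after the change.
        System.out.printf("daysSinceEpoch=%d, LocalDate.hashCode()=%d%n",
                oldStyleHash, newStyleHash);
    }
}
{code}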



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=208479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208479
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 07:13
Start Date: 06/Mar/19 07:13
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #551: HIVE-21286: 
Hive should support clean-up of previously bootstrapped tables when retry from 
different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262814921
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -298,25 +298,26 @@ a database ( directory )
* @throws IOException File operations failure.
* @throws InvalidInputException Invalid input dump directory.
*/
-  private void bootstrapRollbackTask() throws HiveException, IOException, 
InvalidInputException {
-Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+  private void cleanTablesFromBootstrap() throws HiveException, IOException, 
InvalidInputException {
+Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToCleanTables)
 .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
 FileSystem fs = bootstrapDirectory.getFileSystem(conf);
 
 if (!fs.exists(bootstrapDirectory)) {
-  throw new InvalidInputException("Input bootstrap dump directory to 
rollback doesn't exist: "
+  throw new InvalidInputException("Input bootstrap dump directory to clean 
tables is invalid: "
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208479)
Time Spent: 5.5h  (was: 5h 20m)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, the suggested way to retry with a 
> different dump is to drop the DB, but in this case the user would need to 
> manually drop all the external tables, which is not user friendly. So, this 
> needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous dump.
> hive.repl.clean.tables.from.bootstrap=<previous_bootstrap_dump_dir>
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> Caution must be taken by the user not to pass this config if the previous 
> REPL LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".
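
As an illustration only (the dump paths are placeholders, not from the issue), a retry with the proposed config might look like:

{code:sql}
-- Hypothetical dump locations; the config value names the previous bootstrap dump.
REPL LOAD replicated_db FROM '/repl/dumps/new_bootstrap_dump'
WITH ('hive.repl.clean.tables.from.bootstrap'='/repl/dumps/previous_bootstrap_dump');
{code}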



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=208480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208480
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 07:15
Start Date: 06/Mar/19 07:15
Worklog Time Spent: 10m 
  Work Description: sankarh commented on pull request #551: HIVE-21286: 
Hive should support clean-up of previously bootstrapped tables when retry from 
different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262815016
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -298,25 +298,26 @@ a database ( directory )
* @throws IOException File operations failure.
* @throws InvalidInputException Invalid input dump directory.
*/
-  private void bootstrapRollbackTask() throws HiveException, IOException, 
InvalidInputException {
-Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+  private void cleanTablesFromBootstrap() throws HiveException, IOException, 
InvalidInputException {
+Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToCleanTables)
 .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
 FileSystem fs = bootstrapDirectory.getFileSystem(conf);
 
 if (!fs.exists(bootstrapDirectory)) {
-  throw new InvalidInputException("Input bootstrap dump directory to 
rollback doesn't exist: "
+  throw new InvalidInputException("Input bootstrap dump directory to clean 
tables is invalid: "
   + bootstrapDirectory);
 }
 
 FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, 
EximUtil.getDirectoryFilter(fs));
 if ((fileStatuses == null) || (fileStatuses.length == 0)) {
-  throw new InvalidInputException("Input bootstrap dump directory to 
rollback is empty: "
+  throw new InvalidInputException("Input bootstrap dump directory to clean 
tables is empty: "
   + bootstrapDirectory);
 }
 
 if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 
1)) {
-  throw new InvalidInputException("Multiple DB dirs in the dump: " + 
bootstrapDirectory
-  + " is not allowed to load to single target DB: " + 
work.dbNameToLoadIn);
+  throw new InvalidInputException("Input bootstrap dump directory to clean 
tables has multiple"
+  + " DB dirs in the dump: " + bootstrapDirectory
+  + " which is not allowed on single target DB: " + 
work.dbNameToLoadIn);
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208480)
Time Spent: 5h 40m  (was: 5.5h)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, the suggested way to retry with a 
> different dump is to drop the DB, but in this case the user would need to 
> manually drop all the external tables, which is not user friendly. So, this 
> needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous dump.
> hive.repl.clean.tables.from.bootstrap=<previous_bootstrap_dump_dir>
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> Caution must be taken by the user not to pass this config if the previous 
> REPL LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21376) Incompatible change in Hive bucket computation

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785287#comment-16785287
 ] 

Hive QA commented on HIVE-21376:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
53s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
50s{color} | {color:blue} serde in master has 197 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
38s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
27s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
18s{color} | {color:red} serde: The patch generated 2 new + 264 unchanged - 0 
fixed = 266 total (was 264) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16355/dev-support/hive-personality.sh
 |
| git revision | master / 9dc28db |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16355/yetus/diff-checkstyle-serde.txt
 |
| modules | C: serde ql U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16355/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Incompatible change in Hive bucket computation
> --
>
> Key: HIVE-21376
> URL: https://issues.apache.org/jira/browse/HIVE-21376
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: David Phillips
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21376.01.patch, HIVE-21376.patch
>
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code 
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the 
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses 
> {{daysSinceEpoch}} as the hash code. It is now computed using 
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} 
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses 
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation, 
> but there are two important differences:

[jira] [Commented] (HIVE-21182) Skip setting up hive scratch dir during planning

2019-03-05 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785283#comment-16785283
 ] 

Ashutosh Chauhan commented on HIVE-21182:
-

+1

> Skip setting up hive scratch dir during planning
> 
>
> Key: HIVE-21182
> URL: https://issues.apache.org/jira/browse/HIVE-21182
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21182.1.patch, HIVE-21182.2.patch, 
> HIVE-21182.3.patch
>
>
> During the metadata-gathering phase, Hive creates a staging/scratch dir which 
> is further used by the FS op (the FS op sets up a staging dir within this dir 
> for tasks to write to).
> Since the FS op does mkdirs to set up its staging dir, we can skip creating 
> the scratch dir during the metadata-gathering phase. The FS op will take care 
> of setting up all the dirs.
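
As a hedged aside (an assumption about why the skip is safe, not the actual Hive code path): Hadoop's FileSystem.mkdirs() behaves like {{mkdir -p}}, creating all missing parents and succeeding if the directory already exists, so the FS op's own mkdirs makes pre-creating the scratch dir redundant. A minimal sketch with a placeholder path:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StagingDirSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Placeholder path; a real staging dir lives under the query scratch dir.
        Path staging = new Path("/tmp/hive-scratch/_tmp.-ext-10000");
        // Creates the missing parent scratch dir as well, so nothing needs to
        // be created ahead of time during the planning phase.
        fs.mkdirs(staging);
    }
}
{code}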



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20546) Upgrade to Apache Druid 0.13.0-incubating

2019-03-05 Thread Nishant Bangarwa (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Bangarwa updated HIVE-20546:

Attachment: HIVE-20546.7.patch

> Upgrade to Apache Druid 0.13.0-incubating
> -
>
> Key: HIVE-20546
> URL: https://issues.apache.org/jira/browse/HIVE-20546
> Project: Hive
>  Issue Type: Task
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
>Priority: Major
> Attachments: HIVE-20546.1.patch, HIVE-20546.2.patch, 
> HIVE-20546.3.patch, HIVE-20546.4.patch, HIVE-20546.5.patch, 
> HIVE-20546.6.patch, HIVE-20546.7.patch, HIVE-20546.patch
>
>
> This task is to upgrade to Druid 0.13.0 when it is released. Note that it 
> will hopefully be the first Apache release for Druid. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20546) Upgrade to Apache Druid 0.13.0-incubating

2019-03-05 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785266#comment-16785266
 ] 

Ashutosh Chauhan commented on HIVE-20546:
-

+1

> Upgrade to Apache Druid 0.13.0-incubating
> -
>
> Key: HIVE-20546
> URL: https://issues.apache.org/jira/browse/HIVE-20546
> Project: Hive
>  Issue Type: Task
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
>Priority: Major
> Attachments: HIVE-20546.1.patch, HIVE-20546.2.patch, 
> HIVE-20546.3.patch, HIVE-20546.4.patch, HIVE-20546.5.patch, 
> HIVE-20546.6.patch, HIVE-20546.patch
>
>
> This task is to upgrade to Druid 0.13.0 when it is released. Note that it 
> will hopefully be the first Apache release for Druid. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=208449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208449
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 06:13
Start Date: 06/Mar/19 06:13
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262800301
 
 

 ##
 File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosExternalTables.java
 ##
 @@ -573,23 +577,32 @@ public void 
retryBootstrapExternalTablesFromDifferentDump() throws Throwable {
 .run("create table t5 as select * from t4")
 .dump(primaryDbName, 
tupleBootstrapWithoutExternal.lastReplicationId, dumpWithClause);
 
-// Verify if bootstrapping with same dump is idempotent and return same 
result
-for (int i = 0; i < 2; i++) {
-  replica.load(replicatedDbName, 
tupleIncWithExternalBootstrap.dumpLocation, loadWithClause)
-  .status(replicatedDbName)
-  .verifyResult(tupleIncWithExternalBootstrap.lastReplicationId)
-  .run("use " + replicatedDbName)
-  .run("show tables like 't1'")
-  .verifyFailure(new String[]{"t1"})
-  .run("select place from t2 where country = 'us'")
-  .verifyResult("austin")
-  .run("select id from t4")
-  .verifyResult("10")
-  .run("select id from t5")
-  .verifyResult("10");
+// Fail setting ckpt property for table t4 but success for t2.
+BehaviourInjection<CallerArguments, Boolean> callerVerifier
+= new BehaviourInjection<CallerArguments, Boolean>() {
+  @Nullable
+  @Override
+  public Boolean apply(@Nullable CallerArguments args) {
+if (args.tblName.equalsIgnoreCase("t4") && 
args.dbName.equalsIgnoreCase(replicatedDbName)) {
+  injectionPathCalled = true;
+  LOG.warn("Verifier - DB : " + args.dbName + " TABLE : " + 
args.tblName);
+  return false;
+}
+return true;
+  }
+};
+
+// Fail repl load before the ckpt property is set for t4 and after it is 
set for t2.
+// In the retry, these half baked tables should be dropped and bootstrap 
should ve successful.
 
 Review comment:
   "should ve" - typo, should be "should be"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208449)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, the suggested way to retry with a 
> different dump is to drop the DB, but in this case the user would need to 
> manually drop all the external tables, which is not user friendly. So, this 
> needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous dump.
> hive.repl.clean.tables.from.bootstrap=<previous_bootstrap_dump_dir>
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> Caution must be taken by the user not to pass this config if the previous 
> REPL LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785268#comment-16785268
 ] 

Hive QA commented on HIVE-21286:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961194/HIVE-21286.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15818 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16354/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16354/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16354/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961194 - PreCommit-HIVE-Build

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, the suggested way to retry with a 
> different dump is to drop the DB, but in this case the user would need to 
> manually drop all the external tables, which is not user friendly. So, this 
> needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous dump.
> hive.repl.clean.tables.from.bootstrap=<previous_bootstrap_dump_dir>
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> Caution must be taken by the user not to pass this config if the previous 
> REPL LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21385) Allow disabling pushdown of non-splittable computation to JDBC sources

2019-03-05 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785264#comment-16785264
 ] 

Daniel Dai commented on HIVE-21385:
---

I think we do want to disable join pushdown, as a join can blow up the data 
volume. However, aggregation/sort/union might be OK. What do you think?

> Allow disabling pushdown of non-splittable computation to JDBC sources
> --
>
> Key: HIVE-21385
> URL: https://issues.apache.org/jira/browse/HIVE-21385
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, StorageHandler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21385.01.patch, HIVE-21385.patch
>
>
> Until pushdown is a cost-based decision, we should be able to enable/disable 
> pushdown of operators that prevent reading results from the JDBC connection 
> in parallel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20616) Dynamic Partition Insert failed if PART_VALUE exceeds 4000 chars

2019-03-05 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785262#comment-16785262
 ] 

Daniel Dai commented on HIVE-20616:
---

PARAM_VALUE is already expanded in HIVE-20221. Do you want to add this change 
to a particular upgrade path?
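
For context, a hedged sketch of the kind of MySQL column widening being discussed; the exact DDL (and target type/charset) in Hive's metastore upgrade scripts may differ:

{code:sql}
-- Assumption: widening PARAM_VALUE beyond varchar(4000); illustrative only.
ALTER TABLE PARTITION_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;
{code}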

> Dynamic Partition Insert failed if PART_VALUE exceeds 4000 chars
> 
>
> Key: HIVE-20616
> URL: https://issues.apache.org/jira/browse/HIVE-20616
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-20616.patch
>
>
> With MySQL as the metastore DB, PARTITION_PARAMS.PARAM_VALUE is defined as 
> varchar(4000):
> {code}
> describe PARTITION_PARAMS;
> +-------------+---------------+------+-----+---------+-------+
> | Field       | Type          | Null | Key | Default | Extra |
> +-------------+---------------+------+-----+---------+-------+
> | PART_ID     | bigint(20)    | NO   | PRI | NULL    |       |
> | PARAM_KEY   | varchar(256)  | NO   | PRI | NULL    |       |
> | PARAM_VALUE | varchar(4000) | YES  |     | NULL    |       |
> +-------------+---------------+------+-----+---------+-------+
> {code}
> which leads to MoveTask failure if PARAM_VALUE exceeds 4000 chars.
> {code}
> org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO 
> `PARTITION_PARAMS` (`PARAM_VALUE`,`PART_ID`,`PARAM_KEY`) VALUES (?,?,?)
>  at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.internalPut(JoinMapStore.java:1074)
>  at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.putAll(JoinMapStore.java:224)
>  at 
> org.datanucleus.store.rdbms.mapping.java.MapMapping.postInsert(MapMapping.java:158)
>  at 
> org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:522)
>  at 
> org.datanucleus.store.rdbms.RDBMSPersistenceHandler.insertObjectInTable(RDBMSPersistenceHandler.java:162)
>  at 
> org.datanucleus.store.rdbms.RDBMSPersistenceHandler.insertObject(RDBMSPersistenceHandler.java:138)
>  at 
> org.datanucleus.state.StateManagerImpl.internalMakePersistent(StateManagerImpl.java:3363)
>  at 
> org.datanucleus.state.StateManagerImpl.makePersistent(StateManagerImpl.java:3339)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2080)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778)
>  at 
> org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
>  at 
> org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:724)
>  at 
> org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:749)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.addPartition(ObjectStore.java:2442)
>  at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>  at com.sun.proxy.$Proxy32.addPartition(Unknown Source)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partition_core(HiveMetaStore.java:3976)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partition_with_environment_context(HiveMetaStore.java:4032)
>  at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>  at com.sun.proxy.$Proxy34.add_partition_with_environment_context(Unknown 
> Source)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$add_partition_with_environment_context.getResult(ThriftHiveMetastore.java:15528)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$add_partition_with_environment_context.getResult(ThriftHiveMetastore.java:15512)
>  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>  at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
>  at 
> 

[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=208450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208450
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 06:13
Start Date: 06/Mar/19 06:13
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262801475
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -298,25 +298,26 @@ a database ( directory )
* @throws IOException File operations failure.
* @throws InvalidInputException Invalid input dump directory.
*/
-  private void bootstrapRollbackTask() throws HiveException, IOException, 
InvalidInputException {
-Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+  private void cleanTablesFromBootstrap() throws HiveException, IOException, 
InvalidInputException {
+Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToCleanTables)
 .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
 FileSystem fs = bootstrapDirectory.getFileSystem(conf);
 
 if (!fs.exists(bootstrapDirectory)) {
-  throw new InvalidInputException("Input bootstrap dump directory to 
rollback doesn't exist: "
+  throw new InvalidInputException("Input bootstrap dump directory to clean 
tables is invalid: "
   + bootstrapDirectory);
 }
 
 FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, 
EximUtil.getDirectoryFilter(fs));
 if ((fileStatuses == null) || (fileStatuses.length == 0)) {
-  throw new InvalidInputException("Input bootstrap dump directory to 
rollback is empty: "
+  throw new InvalidInputException("Input bootstrap dump directory to clean 
tables is empty: "
   + bootstrapDirectory);
 }
 
 if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 
1)) {
-  throw new InvalidInputException("Multiple DB dirs in the dump: " + 
bootstrapDirectory
-  + " is not allowed to load to single target DB: " + 
work.dbNameToLoadIn);
+  throw new InvalidInputException("Input bootstrap dump directory to clean 
tables has multiple"
+  + " DB dirs in the dump: " + bootstrapDirectory
+  + " which is not allowed on single target DB: " + 
work.dbNameToLoadIn);
 
 Review comment:
   Similar grammar suggestion.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208450)
Time Spent: 5h  (was: 4h 50m)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, the suggested way to retry with a 
> different dump is to drop the DB, but in this case the user would need to 
> manually drop all the external tables, which is not user friendly. So, this 
> needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous dump.
> hive.repl.clean.tables.from.bootstrap=<previous_bootstrap_dump_dir>
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> Caution must be taken by the user not to pass this config if the previous 
> REPL LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=208451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208451
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 06:13
Start Date: 06/Mar/19 06:13
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262801366
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -298,25 +298,26 @@ a database ( directory )
* @throws IOException File operations failure.
* @throws InvalidInputException Invalid input dump directory.
*/
-  private void bootstrapRollbackTask() throws HiveException, IOException, 
InvalidInputException {
-Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+  private void cleanTablesFromBootstrap() throws HiveException, IOException, 
InvalidInputException {
+Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToCleanTables)
 .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
 FileSystem fs = bootstrapDirectory.getFileSystem(conf);
 
 if (!fs.exists(bootstrapDirectory)) {
-  throw new InvalidInputException("Input bootstrap dump directory to 
rollback doesn't exist: "
+  throw new InvalidInputException("Input bootstrap dump directory to clean 
tables is invalid: "
   + bootstrapDirectory);
 }
 
 FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, 
EximUtil.getDirectoryFilter(fs));
 if ((fileStatuses == null) || (fileStatuses.length == 0)) {
-  throw new InvalidInputException("Input bootstrap dump directory to 
rollback is empty: "
+  throw new InvalidInputException("Input bootstrap dump directory to clean 
tables is empty: "
 
 Review comment:
   Grammar suggestion similar to above.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208451)
Time Spent: 5h 10m  (was: 5h)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, the suggested way to retry with a 
> different dump is to drop the DB, but in this case the user would need to 
> manually drop all the external tables, which is not user friendly. So, this 
> needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous dump.
> hive.repl.clean.tables.from.bootstrap=<previous_bootstrap_dump_dir>
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> Caution must be taken by the user not to pass this config if the previous 
> REPL LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=208452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208452
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 06:13
Start Date: 06/Mar/19 06:13
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262803005
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -279,6 +292,72 @@ a database ( directory )
 return 0;
   }
 
+  /**
+   * Cleanup/drop tables from the given database which are bootstrapped by 
input dump dir.
+   * @throws HiveException Failed to drop the tables.
+   * @throws IOException File operations failure.
+   * @throws InvalidInputException Invalid input dump directory.
+   */
+  private void bootstrapRollbackTask() throws HiveException, IOException, 
InvalidInputException {
+Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+.addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
+FileSystem fs = bootstrapDirectory.getFileSystem(conf);
+
+if (!fs.exists(bootstrapDirectory)) {
+  throw new InvalidInputException("Input bootstrap dump directory to 
rollback doesn't exist: "
+  + bootstrapDirectory);
+}
+
+FileStatus[] fileStatuses = fs.listStatus(bootstrapDirectory, 
EximUtil.getDirectoryFilter(fs));
+if ((fileStatuses == null) || (fileStatuses.length == 0)) {
+  throw new InvalidInputException("Input bootstrap dump directory to 
rollback is empty: "
+  + bootstrapDirectory);
+}
+
+if (StringUtils.isNotBlank(work.dbNameToLoadIn) && (fileStatuses.length > 
1)) {
+  throw new InvalidInputException("Multiple DB dirs in the dump: " + 
bootstrapDirectory
+  + " is not allowed to load to single target DB: " + 
work.dbNameToLoadIn);
+}
+
+for (FileStatus dbDir : fileStatuses) {
+  Path dbLevelPath = dbDir.getPath();
+  String dbNameInDump = dbLevelPath.getName();
+
+  List<String> tableNames = new ArrayList<>();
+  RemoteIterator<LocatedFileStatus> filesIterator = 
fs.listFiles(dbLevelPath, true);
+  while (filesIterator.hasNext()) {
+Path nextFile = filesIterator.next().getPath();
+String filePath = nextFile.toString();
+if (filePath.endsWith(EximUtil.METADATA_NAME)) {
+  // Remove dbLevelPath from the current path to check if this 
_metadata file is under DB or
+  // table level directory.
+  String replacedString = filePath.replace(dbLevelPath.toString(), "");
+  if (!replacedString.equalsIgnoreCase(EximUtil.METADATA_NAME)) {
+tableNames.add(nextFile.getParent().getName());
 
 Review comment:
   Well, if this code is the only duplicate copy, it makes sense to do this 
refactoring while we are adding the duplicate code. That avoids duplicate code 
in the first place.
   
   We might not change the format of the dump directory, but we might add a 
_metadata file to other object-specific directories. That would mean the code 
above mistakes objects other than tables for tables and tries to drop them.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208452)
Time Spent: 5h 20m  (was: 5h 10m)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, to retry with a different dump, we 

[jira] [Work logged] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?focusedWorklogId=208448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208448
 ]

ASF GitHub Bot logged work on HIVE-21286:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 06:13
Start Date: 06/Mar/19 06:13
Worklog Time Spent: 10m 
  Work Description: ashutosh-bapat commented on pull request #551: 
HIVE-21286: Hive should support clean-up of previously bootstrapped tables when 
retry from different dump.
URL: https://github.com/apache/hive/pull/551#discussion_r262801299
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java
 ##
 @@ -298,25 +298,26 @@ a database ( directory )
* @throws IOException File operations failure.
* @throws InvalidInputException Invalid input dump directory.
*/
-  private void bootstrapRollbackTask() throws HiveException, IOException, 
InvalidInputException {
-Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToRollback)
+  private void cleanTablesFromBootstrap() throws HiveException, IOException, 
InvalidInputException {
+Path bootstrapDirectory = new PathBuilder(work.bootstrapDumpToCleanTables)
 .addDescendant(ReplUtils.INC_BOOTSTRAP_ROOT_DIR_NAME).build();
 FileSystem fs = bootstrapDirectory.getFileSystem(conf);
 
 if (!fs.exists(bootstrapDirectory)) {
-  throw new InvalidInputException("Input bootstrap dump directory to 
rollback doesn't exist: "
+  throw new InvalidInputException("Input bootstrap dump directory to clean 
tables is invalid: "
 
 Review comment:
   Grammar suggestion "Bootstrap dump directory specified to clean tables from 
is invalid"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208448)
Time Spent: 4h 50m  (was: 4h 40m)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> has to manually drop all the external tables before retrying with another 
> bootstrap dump. For a full bootstrap, the suggested way to retry with a 
> different dump is to drop the DB, but in this case the user would need to 
> manually drop all the external tables, which is not user friendly. So, this 
> needs to be handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous dump.
> hive.repl.clean.tables.from.bootstrap=<previous_bootstrap_dump_dir>
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> Caution must be taken by the user not to pass this config if the previous 
> REPL LOAD (with bootstrap) was successful, or if any successful incremental 
> dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?focusedWorklogId=208437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208437
 ]

ASF GitHub Bot logged work on HIVE-21338:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 06:04
Start Date: 06/Mar/19 06:04
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #557: HIVE-21338 
Remove order by and limit for aggregates
URL: https://github.com/apache/hive/pull/557#discussion_r262800104
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
 ##
 @@ -1925,6 +1926,11 @@ public RelNode apply(RelOptCluster cluster, 
RelOptSchema relOptSchema, SchemaPlu
 perfLogger.PerfLogEnd(this.getClass().getName(), PerfLogger.OPTIMIZER, 
"Calcite: Window fixing rule");
   }
 
+  perfLogger.PerfLogBegin(this.getClass().getName(), PerfLogger.OPTIMIZER);
 
 Review comment:
   Yes, that is what I meant. Maybe someone will pick it up :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208437)
Time Spent: 1h 20m  (was: 1h 10m)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch, HIVE-21338.4.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>  

[jira] [Work logged] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?focusedWorklogId=208435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208435
 ]

ASF GitHub Bot logged work on HIVE-21338:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 06:00
Start Date: 06/Mar/19 06:00
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #557: HIVE-21338 
Remove order by and limit for aggregates
URL: https://github.com/apache/hive/pull/557#discussion_r262800893
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortLimitRemoveRule.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortLimit;
+
+/**
+ * Planner rule that removes
+ * a {@link 
org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortLimit}.
+ * Note that this is different from HiveSortRemoveRule because this is not 
based on statistics
+ */
+public class HiveSortLimitRemoveRule extends RelOptRule {
+
+  //~ Constructors ---
+
+  public HiveSortLimitRemoveRule() {
+this(operand(HiveSortLimit.class, any()));
+  }
+
+  private HiveSortLimitRemoveRule(RelOptRuleOperand operand) {
+super(operand);
+  }
+
+  //~ Methods 
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+final HiveSortLimit sortLimit = call.rel(0);
+
+return HiveRelOptUtil.produceAtmostOneRow(sortLimit.getInput());
 
 Review comment:
   The rules should be portable between planners, i.e., a rule should not be 
unwrapping a HepRelVertex. If you want to remain generic, a workaround is 
matching RelNode. However, I would try to make the rule matchers as restrictive 
as possible, since the more information we expose to the planner, the smarter 
decisions it can make, e.g., avoiding unnecessary triggering of rules.
   You do not need recursion. When the rule is executed, we should not have 
multiple consecutive Project operators in the plan. If there were, we would get 
rid of them using ProjectMerge and ProjectRemove rules.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208435)
Time Spent: 1h 10m  (was: 1h)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch, HIVE-21338.4.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> 

[jira] [Commented] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785253#comment-16785253
 ] 

Hive QA commented on HIVE-21286:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
56s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 1s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
31s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
51s{color} | {color:blue} itests/hive-unit in master has 2 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
38s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
20s{color} | {color:red} itests/hive-unit: The patch generated 10 new + 18 
unchanged - 0 fixed = 28 total (was 18) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 23s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16354/dev-support/hive-personality.sh
 |
| git revision | master / 9dc28db |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16354/yetus/diff-checkstyle-itests_hive-unit.txt
 |
| modules | C: ql itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16354/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> will have to manually drop all the external tables before trying with another 
> bootstrap dump. For full bootstrap, 

[jira] [Work logged] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?focusedWorklogId=208434=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208434
 ]

ASF GitHub Bot logged work on HIVE-21338:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 05:55
Start Date: 06/Mar/19 05:55
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #557: HIVE-21338 
Remove order by and limit for aggregates
URL: https://github.com/apache/hive/pull/557#discussion_r262800104
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
 ##
 @@ -1925,6 +1926,11 @@ public RelNode apply(RelOptCluster cluster, 
RelOptSchema relOptSchema, SchemaPlu
 perfLogger.PerfLogEnd(this.getClass().getName(), PerfLogger.OPTIMIZER, 
"Calcite: Window fixing rule");
   }
 
+  perfLogger.PerfLogBegin(this.getClass().getName(), PerfLogger.OPTIMIZER);
 
 Review comment:
   Yes, that is what I meant.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208434)
Time Spent: 1h  (was: 50m)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch, HIVE-21338.4.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 100
>  

[jira] [Commented] (HIVE-21272) information_schema.tables should also contain views

2019-03-05 Thread Daniel Dai (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785247#comment-16785247
 ] 

Daniel Dai commented on HIVE-21272:
---

Checked the code; INFORMATION_SCHEMA.TABLES does in fact contain views if the 
authorizer is Ranger. The problem here is with StorageBasedAuthorizer, where 
the permission is decided by the permissions of the HDFS files backing the 
tables/views. However, a view is a special table with no corresponding HDFS 
location, so Hive cannot decide the permission of the table. To be safe, we 
just make views unavailable. Since the major use case for the information 
schema is the Ranger authorizer, INFORMATION_SCHEMA.TABLES works as expected 
in that case.
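
To make the expected behavior concrete, here is a hedged Hive-side sketch mirroring the Postgres example quoted below (the exact output depends on the configured authorizer):

{code:sql}
CREATE TABLE t (i INT);
CREATE VIEW v AS SELECT i FROM t;

SELECT table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_name IN ('t', 'v');

-- With the Ranger authorizer both rows should appear:
--   default | t | BASE TABLE
--   default | v | VIEW
-- With StorageBasedAuthorizer the row for v is filtered out, since the view
-- has no HDFS location whose permissions could be checked.
{code}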

> information_schema.tables should also contain views
> ---
>
> Key: HIVE-21272
> URL: https://issues.apache.org/jira/browse/HIVE-21272
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 3.1.0
>Reporter: Greg Rahn
>Assignee: Daniel Dai
>Priority: Critical
>
> Currently it appears that INFORMATION_SCHEMA.TABLES does not contain views.
> Per the ISO SQL standard, 
> {quote}
> The INFORMATION_SCHEMA.TABLES table contains one row for each table 
> {color:red}including views.{color}
> {quote}
> Example from Postgres:
> {noformat}
> create table t (i int);
> create view v as select i from t;
> select
>   table_catalog,
>   table_schema,
>   table_name,
>   table_type
> from information_schema.tables
> where table_name in ('t','v');
>  table_catalog | table_schema | table_name | table_type
> ---+--++
>  grahn | public   | t  | BASE TABLE
>  grahn | public   | v  | VIEW
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21283) Create Synonym mid for substr, position for locate

2019-03-05 Thread Mani M (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mani M updated HIVE-21283:
--
Status: In Progress  (was: Patch Available)

> Create Synonym mid for  substr, position for  locate
> 
>
> Key: HIVE-21283
> URL: https://issues.apache.org/jira/browse/HIVE-21283
> Project: Hive
>  Issue Type: New Feature
>Reporter: Mani M
>Assignee: Mani M
>Priority: Minor
>  Labels: UDF, pull-request-available, todoc4.0
> Fix For: 4.0.0
>
> Attachments: HIVE.21283.03.PATCH, HIVE.21283.2.PATCH, HIVE.21283.PATCH
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Create new synonym for the existing function
>  
> Mid for substr
> postiion for locate 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21338:
---
Status: Patch Available  (was: Open)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch, HIVE-21338.4.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 100
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: 100
>   Processor Tree:
> ListSink
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21338:
---
Status: Open  (was: Patch Available)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch, HIVE-21338.4.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 100
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: 100
>   Processor Tree:
> ListSink
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21338:
---
Attachment: HIVE-21338.4.patch

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch, HIVE-21338.4.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 100
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: 100
>   Processor Tree:
> ListSink
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?focusedWorklogId=208312=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208312
 ]

ASF GitHub Bot logged work on HIVE-21338:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 05:01
Start Date: 06/Mar/19 05:01
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on pull request #557: HIVE-21338 
Remove order by and limit for aggregates
URL: https://github.com/apache/hive/pull/557#discussion_r262792425
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
 ##
 @@ -1925,6 +1926,11 @@ public RelNode apply(RelOptCluster cluster, 
RelOptSchema relOptSchema, SchemaPlu
 perfLogger.PerfLogEnd(this.getClass().getName(), PerfLogger.OPTIMIZER, 
"Calcite: Window fixing rule");
   }
 
+  perfLogger.PerfLogBegin(this.getClass().getName(), PerfLogger.OPTIMIZER);
 
 Review comment:
@jcamachor I am not sure I fully understand the point. In your example ORDER 
BY is already being removed. Are you suggesting opening a JIRA to remove the 
extra aggregate? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208312)
Time Spent: 50m  (was: 40m)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 

[jira] [Work logged] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?focusedWorklogId=208266=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208266
 ]

ASF GitHub Bot logged work on HIVE-21338:
-

Author: ASF GitHub Bot
Created on: 06/Mar/19 04:42
Start Date: 06/Mar/19 04:42
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on pull request #557: HIVE-21338 
Remove order by and limit for aggregates
URL: https://github.com/apache/hive/pull/557#discussion_r262789953
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortLimitRemoveRule.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortLimit;
+
+/**
+ * Planner rule that removes
+ * a {@link 
org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortLimit}.
+ * Note that this is different from HiveSortRemoveRule because this is not 
based on statistics
+ */
+public class HiveSortLimitRemoveRule extends RelOptRule {
+
+  //~ Constructors ---
+
+  public HiveSortLimitRemoveRule() {
+    this(operand(HiveSortLimit.class, any()));
+  }
+
+  private HiveSortLimitRemoveRule(RelOptRuleOperand operand) {
+    super(operand);
+  }
+
+  //~ Methods 
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+    final HiveSortLimit sortLimit = call.rel(0);
+
+    return HiveRelOptUtil.produceAtmostOneRow(sortLimit.getInput());
 
 Review comment:
@jcamachor The rule uses a recursive method which can traverse the whole tree, 
so restricting this rule to a specific pattern (beyond matching HiveSortLimit) 
does not really make sense. 
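
For context, the kind of recursive check being discussed looks roughly like this (a hedged sketch, not the actual HiveRelOptUtil code; the HepRelVertex unwrapping shown here is exactly what the review above argues a portable rule should avoid):

{code:java}
import org.apache.calcite.plan.hep.HepRelVertex;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Aggregate;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.core.Project;

public final class AtMostOneRowCheck {
  /** Returns true when the subtree is guaranteed to emit at most one row. */
  public static boolean produceAtMostOneRow(RelNode rel) {
    if (rel instanceof HepRelVertex) {
      // Unwrap the HepPlanner vertex to reach the real operator.
      rel = ((HepRelVertex) rel).getCurrentRel();
    }
    if (rel instanceof Aggregate) {
      // A global aggregate (no GROUP BY keys) emits exactly one row.
      return ((Aggregate) rel).getGroupCount() == 0;
    }
    if (rel instanceof Project || rel instanceof Filter) {
      // A project keeps the row count; a filter can only reduce it.
      return produceAtMostOneRow(rel.getInput(0));
    }
    return false;
  }
}
{code}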
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208266)
Time Spent: 40m  (was: 0.5h)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> 

[jira] [Resolved] (HIVE-14568) Hive Decimal Returns NULL

2019-03-05 Thread gurmukh singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-14568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gurmukh singh resolved HIVE-14568.
--
  Resolution: Won't Fix
Release Note: Marking it resolved as per comment from @Xuefu Zhang

> Hive Decimal Returns NULL
> -
>
> Key: HIVE-14568
> URL: https://issues.apache.org/jira/browse/HIVE-14568
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0
> Environment: Centos 6.7, Hadoop 2.7.2,hive 1.0.0,2.0
>Reporter: gurmukh singh
>Assignee: Xuefu Zhang
>Priority: Major
>
> Hi
> I was under the impression that the bug: 
> https://issues.apache.org/jira/browse/HIVE-5022 got fixed. But, I see the 
> same issue in Hive 1.0 and hive 1.2 as well.
> hive> desc mul_table;
> OK
> prc   decimal(38,28)
> vol   decimal(38,10)
> Time taken: 0.068 seconds, Fetched: 2 row(s)
> hive> select prc, vol, prc*vol as cost from mul_table;
> OK
> 1.2   200 NULL
> 1.44  200 NULL
> 2.14  100 NULL
> 3.004 50  NULL
> 1.2   200 NULL
> Time taken: 0.048 seconds, Fetched: 5 row(s)
> Rather than returning NULL, it should raise an error or round off.
> I understand that I can use double instead of decimal, or can cast it, but 
> silently returning NULL will make many problems go unnoticed.
> hive> desc mul_table2;
> OK
> prc   double
> vol   decimal(14,10)
> Time taken: 0.049 seconds, Fetched: 2 row(s)
> hive> select * from mul_table2;
> OK
> 1.4   200
> 1.34  200
> 7.34  100
> 7454533.354544  100
> Time taken: 0.028 seconds, Fetched: 4 row(s)
> hive> select prc, vol, prc*vol  as cost from mul_table3;
> OK
> 7.34  100 734.0
> 7.34  1000  7340.0
> 1.0004  1000  1000.4
> 7454533.354544  100  7.454533354544E8   <- Wrong result
> 7454533.354544  1000  7.454533354544E9   <- Wrong result
> Time taken: 0.025 seconds, Fetched: 5 row(s)
> Casting:
> hive> select prc, vol, cast(prc*vol as decimal(38,38)) as cost from 
> mul_table3;
> OK
> 7.34  100 NULL
> 7.34  1000  NULL
> 1.0004  1000  NULL
> 7454533.354544  100  NULL
> 7454533.354544  1000  NULL
> Time taken: 0.033 seconds, Fetched: 5 row(s)
> hive> select prc, vol, cast(prc*vol as decimal(38,10)) as cost from 
> mul_table3;
> OK
> 7.34  100 734
> 7.34  1000  7340
> 1.0004  1000  1000.4
> 7454533.354544  100  745453335.4544
> 7454533.354544  1000  7454533354.544
> Time taken: 0.026 seconds, Fetched: 5 row(s) 
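
One way to read the behavior above, assuming the common typing rule for decimal multiplication (result precision p1+p2+1, result scale s1+s2, capped at Hive's maximum precision of 38): decimal(38,28) * decimal(38,10) types to precision 77 and scale 38, and once the precision is capped at 38 the scale still claims all 38 digits, leaving no room for the integer part, so any product with a non-zero integer part becomes NULL. Declaring only the precision/scale actually needed keeps the result in range (mul_table_fixed below is purely illustrative):

{code:sql}
CREATE TABLE mul_table_fixed (prc DECIMAL(18,10), vol DECIMAL(14,4));

-- Result type is DECIMAL(33,14): 18+14+1 = 33 precision, 10+4 = 14 scale,
-- which fits within the 38-digit maximum, so no NULLs from overflow.
SELECT prc, vol, prc * vol AS cost FROM mul_table_fixed;
{code}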



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21314) Hive Replication not retaining the owner in the replicated table

2019-03-05 Thread Sankar Hariappan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785225#comment-16785225
 ] 

Sankar Hariappan commented on HIVE-21314:
-

+1
03.patch looks good to me.


> Hive Replication not retaining the owner in the replicated table
> 
>
> Key: HIVE-21314
> URL: https://issues.apache.org/jira/browse/HIVE-21314
> Project: Hive
>  Issue Type: Bug
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21314.01.patch, HIVE-21314.02.patch, 
> HIVE-21314.03.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Hive replication is not retaining the owner of the replicated table. The 
> owner of the target table is set to the user executing the load command. The 
> user information should instead be read from the dump metadata and used 
> while creating the table at the target cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18572) The record readers; InputFormat needs to be fixed for Tez as it generates 1 split

2019-03-05 Thread gurmukh singh (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-18572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785221#comment-16785221
 ] 

gurmukh singh commented on HIVE-18572:
--

Opened an internal ticket to fix it.

> The record readers; InputFormat needs to be fixed for Tez as it generates 1 
> split
> -
>
> Key: HIVE-18572
> URL: https://issues.apache.org/jira/browse/HIVE-18572
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 2.1.0
>Reporter: gurmukh singh
>Priority: Major
>
> The record reader needs to be fixed for Tez, as it generates only 1 split 
> because the MRv2 CombineInputFormat broke that rule.
> This has been fixed in MR but not in Tez.
> Closing it as invalid; will create a new ticket with tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21385) Allow disabling pushdown of non-splittable computation to JDBC sources

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785199#comment-16785199
 ] 

Hive QA commented on HIVE-21385:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961190/HIVE-21385.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15818 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.exec.tez.TestDynamicPartitionPruner.testSingleSourceMultipleFiltersOrdering1
 (batchId=319)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16353/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16353/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16353/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961190 - PreCommit-HIVE-Build

> Allow disabling pushdown of non-splittable computation to JDBC sources
> --
>
> Key: HIVE-21385
> URL: https://issues.apache.org/jira/browse/HIVE-21385
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, StorageHandler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21385.01.patch, HIVE-21385.patch
>
>
> Until pushdown is a cost-based decision, we should be able to enable/disable 
> pushdown of operators that prevent reading results from the JDBC connection 
> in parallel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21385) Allow disabling pushdown of non-splittable computation to JDBC sources

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785184#comment-16785184
 ] 

Hive QA commented on HIVE-21385:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
44s{color} | {color:blue} common in master has 63 extant Findbugs warnings. 
{color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
35s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} The patch common passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} ql: The patch generated 0 new + 168 unchanged - 3 
fixed = 168 total (was 171) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16353/dev-support/hive-personality.sh
 |
| git revision | master / 9dc28db |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: common ql itests U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16353/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Allow disabling pushdown of non-splittable computation to JDBC sources
> --
>
> Key: HIVE-21385
> URL: https://issues.apache.org/jira/browse/HIVE-21385
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, StorageHandler
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21385.01.patch, HIVE-21385.patch
>
>
> Until pushdown is a cost-based decision, we should be able to enable/disable 
> pushdown of operators that prevent reading results from the JDBC connection 
> in parallel.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21325) Hive external table replication failed with Permission denied issue.

2019-03-05 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21325:
---
Status: Patch Available  (was: Open)

> Hive external table replication failed with Permission denied issue.
> 
>
> Key: HIVE-21325
> URL: https://issues.apache.org/jira/browse/HIVE-21325
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21325.01.patch
>
>
> During external table replication, the file copy is done in parallel with the 
> metadata replication. If the file copy task creates the directory with doAs 
> set to true, it will create the directory with permissions set for the user 
> running the repl command. In that case the metadata task may fail while 
> creating the table, as the hive user might not have access to the created 
> directory.
> The fix should be:
>  # While creating the directory, if SQL-based authentication is enabled, then 
> disable storage-based authentication for the hive user.
>  # Currently the created directory carries the login user's access; it should 
> retain the source cluster's owner, group and permission (see the sketch below).
>  # For external table replication, don't create the directory during create 
> table and add partition.
>  
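
A minimal sketch of point 2, assuming only the standard Hadoop FileSystem API (an illustrative helper, not the actual patch):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ReplDirUtils {
  /** Create tgtDir carrying over the source directory's owner, group and permission. */
  public static void mkdirWithSourceAttrs(FileSystem srcFs, Path srcDir,
      FileSystem tgtFs, Path tgtDir) throws IOException {
    FileStatus src = srcFs.getFileStatus(srcDir);
    tgtFs.mkdirs(tgtDir, src.getPermission());               // source permission
    tgtFs.setOwner(tgtDir, src.getOwner(), src.getGroup());  // source owner/group
  }
}
{code}

Note that setOwner typically requires superuser privileges on the target cluster, which ties into point 1 about relaxing storage-based checks for the hive user.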



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21325) Hive external table replication failed with Permission denied issue.

2019-03-05 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21325:
---
Attachment: HIVE-21325.01.patch

> Hive external table replication failed with Permission denied issue.
> 
>
> Key: HIVE-21325
> URL: https://issues.apache.org/jira/browse/HIVE-21325
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21325.01.patch
>
>
> During external table replication, the file copy is done in parallel with the 
> metadata replication. If the file copy task creates the directory with doAs 
> set to true, it will create the directory with permissions set for the user 
> running the repl command. In that case the metadata task may fail while 
> creating the table, as the hive user might not have access to the created 
> directory.
> The fix should be:
>  # While creating the directory, if SQL-based authentication is enabled, then 
> disable storage-based authentication for the hive user.
>  # Currently the created directory carries the login user's access; it should 
> retain the source cluster's owner, group and permission.
>  # For external table replication, don't create the directory during create 
> table and add partition.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21388) Constant UDF is not pushed to JDBCStorage Handler

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785174#comment-16785174
 ] 

Hive QA commented on HIVE-21388:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961187/HIVE-21388.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15817 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=61)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16352/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16352/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16352/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961187 - PreCommit-HIVE-Build

> Constant UDF is not pushed to JDBCStorage Handler
> -
>
> Key: HIVE-21388
> URL: https://issues.apache.org/jira/browse/HIVE-21388
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, StorageHandler
>Affects Versions: 4.0.0
>Reporter: Daniel Dai
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21388.01.patch, HIVE-21388.patch
>
>
> A query involving a Hive UDF which produces a constant value is not pushed 
> to the JDBC table. Replacing the UDF with a constant makes the pushdown work. 
> Ideally, Hive should first do constant folding and then push the computation.
> Here is the example:
> {code}
> explain select PRINCIPAL_NAME from sys.TBL_PRIVS where 
> PRINCIPAL_NAME=current_user();
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Select Operator [SEL_3]|
> |   Output:["_col0"] |
> |   Filter Operator [FIL_2]  |
> | predicate:(_col5 = 'hrt_qa')   |
> | Select Operator [SEL_1]|
> |   Output:["_col5"] |
> |   TableScan [TS_0] |
> | Output:["principal_name"],properties:{"hive.sql.query":"SELECT 
> `tbl_grant_id`, `create_time`, `grant_option`, `grantor`, `grantor_type`, 
> `principal_name`, `principal_type`, `tbl_priv`, `tbl_id`, `authorizer`\nFROM 
> `TBL_PRIVS`","hive.sql.query.fieldNames":"tbl_grant_id,create_time,grant_option,grantor,grantor_type,principal_name,principal_type,tbl_priv,tbl_id,authorizer","hive.sql.query.fieldTypes":"bigint,int,int,string,string,string,string,string,bigint,string","hive.sql.query.split":"true"}
>  |
> ||
> ++
> {code}
> If I replace current_user() with a constant, the predicate is pushed to table 
> scan.
> Also, setting the annotation deterministic=true and making initialize() 
> return a ConstantObjectInspector for GenericUDFCurrentUser does not make a 
> difference.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20848) After setting UpdateInputAccessTimeHook query fail with Table Not Found.

2019-03-05 Thread Rajkumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajkumar Singh updated HIVE-20848:
--
Attachment: HIVE-20848.01.patch
Status: Patch Available  (was: Open)

> After setting UpdateInputAccessTimeHook query fail with Table Not Found.
> 
>
> Key: HIVE-20848
> URL: https://issues.apache.org/jira/browse/HIVE-20848
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-20848.01.patch, HIVE-20848.patch
>
>
> {code}
>  select from_unixtime(1540495168); 
>  set 
> hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook,org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec;
>  select from_unixtime(1540495168); 
> {code}
> the second select fails with the following exception
> {code}
> ERROR ql.Driver: FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException(Table not found 
> _dummy_table)
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found 
> _dummy_table
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1217)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1168)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1155)
> at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:67)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1444)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1294)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:197)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> {code}
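
One plausible shape for the fix (a hedged sketch, not the committed patch) is to have the hook skip Hive's synthetic _dummy_table inputs before resolving tables:

{code:java}
// Inside UpdateInputAccessTimeHook.PreExec#run(HookContext) -- illustrative only.
for (ReadEntity re : hookContext.getInputs()) {
  Table t = re.getTable();
  if (t == null || "_dummy_table".equalsIgnoreCase(t.getTableName())) {
    // Synthetic input used for metadata-only selects; there is no real
    // metastore table whose access time could be updated.
    continue;
  }
  // ... resolve via Hive.getTable(t.getDbName(), t.getTableName()) and
  // update the last-access time as before ...
}
{code}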



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20656) Sensible defaults: Map aggregation memory configs are too aggressive

2019-03-05 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-20656:
-
Attachment: HIVE-20656.2.patch

> Sensible defaults: Map aggregation memory configs are too aggressive
> 
>
> Key: HIVE-20656
> URL: https://issues.apache.org/jira/browse/HIVE-20656
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-20656.1.patch, HIVE-20656.2.patch
>
>
> The defaults for the following configs seem to be too aggressive. In Java 
> this can easily lead to several full GC pauses in which memory cannot be 
> reclaimed.
> {code:java}
> HIVEMAPAGGRHASHMEMORY("hive.map.aggr.hash.percentmemory", (float) 0.99,
> "Portion of total memory to be used by map-side group aggregation hash 
> table"),
> HIVEMAPAGGRMEMORYTHRESHOLD("hive.map.aggr.hash.force.flush.memory.threshold", 
> (float) 0.9,
> "The max memory to be used by map-side group aggregation hash table.\n" +
> "If the memory usage is higher than this number, force to flush 
> data"),{code}
>  
> We can be a little more conservative with these configs to avoid getting 
> into GC pauses. 
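
Until the defaults change, more conservative values can be set per session; the numbers below are illustrative only, not the patch's final defaults:

{code}
set hive.map.aggr.hash.percentmemory=0.5;
set hive.map.aggr.hash.force.flush.memory.threshold=0.9;
{code}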



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20848) After setting UpdateInputAccessTimeHook query fail with Table Not Found.

2019-03-05 Thread Rajkumar Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajkumar Singh updated HIVE-20848:
--
Status: Open  (was: Patch Available)

> After setting UpdateInputAccessTimeHook query fail with Table Not Found.
> 
>
> Key: HIVE-20848
> URL: https://issues.apache.org/jira/browse/HIVE-20848
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-20848.01.patch, HIVE-20848.patch
>
>
> {code}
>  select from_unixtime(1540495168); 
>  set 
> hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook,org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec;
>  select from_unixtime(1540495168); 
> {code}
> the second select fails with the following exception
> {code}
> ERROR ql.Driver: FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException(Table not found 
> _dummy_table)
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found 
> _dummy_table
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1217)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1168)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1155)
> at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:67)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1444)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1294)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:197)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20656) Sensible defaults: Map aggregation memory configs are too aggressive

2019-03-05 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785157#comment-16785157
 ] 

Prasanth Jayachandran commented on HIVE-20656:
--

Golden files updated. 

> Sensible defaults: Map aggregation memory configs are too aggressive
> 
>
> Key: HIVE-20656
> URL: https://issues.apache.org/jira/browse/HIVE-20656
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-20656.1.patch, HIVE-20656.2.patch
>
>
> The defaults for the following configs seem to be too aggressive. In Java 
> this can easily lead to several full GC pauses in which memory cannot be 
> reclaimed.
> {code:java}
> HIVEMAPAGGRHASHMEMORY("hive.map.aggr.hash.percentmemory", (float) 0.99,
> "Portion of total memory to be used by map-side group aggregation hash 
> table"),
> HIVEMAPAGGRMEMORYTHRESHOLD("hive.map.aggr.hash.force.flush.memory.threshold", 
> (float) 0.9,
> "The max memory to be used by map-side group aggregation hash table.\n" +
> "If the memory usage is higher than this number, force to flush 
> data"),{code}
>  
> We can be a little more conservative with these configs to avoid getting into 
> GC pauses. 
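
For illustration, a minimal sketch of overriding these defaults with more conservative values through HiveConf; the exact fractions below are assumptions, not the values chosen by the patch:

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class ConservativeMapAggrDefaults {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Illustrative, more conservative fractions (assumed values, not the patch):
    conf.setFloatVar(HiveConf.ConfVars.HIVEMAPAGGRHASHMEMORY, 0.5f);
    conf.setFloatVar(HiveConf.ConfVars.HIVEMAPAGGRMEMORYTHRESHOLD, 0.75f);
    System.out.println(conf.getFloatVar(HiveConf.ConfVars.HIVEMAPAGGRHASHMEMORY));
  }
}
{code}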



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-05 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785156#comment-16785156
 ] 

Vineet Garg commented on HIVE-21382:


This patch also fixes HIVE-21387

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> The right side of the HiveSemiJoin has an aggregate which could be reduced to 
> have only {{i_item_sk}} as the group by key, since {{i_item_sk}} is a primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-05 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785155#comment-16785155
 ] 

Vineet Garg commented on HIVE-21382:


The challenge with query23 is that the group by contains expressions (e.g. 
{{substr}}) which cannot be traced back to a source table, making it difficult 
for the optimizer to determine whether it is safe to remove such keys. I am 
attaching a patch which should otherwise work. It extends the existing logic to 
consider group by key reduction even if the columns originate from different 
source tables.
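
As a rough illustration of the safety check involved, a hedged sketch using Calcite's metadata API (the Calcite classes and methods exist; the helper itself and the key ordinal are assumptions, not code from the patch):

{code:java}
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Aggregate;
import org.apache.calcite.rel.metadata.RelMetadataQuery;
import org.apache.calcite.util.ImmutableBitSet;

public class GroupKeyReductionSketch {
  // If a subset of the group-by keys (e.g. the ordinal of i_item_sk) is already
  // unique on the aggregate's input, the remaining keys are functionally
  // determined by it and can be dropped from the grouping.
  static boolean canReduceTo(Aggregate aggregate, int keyOrdinal) {
    RelNode input = aggregate.getInput();
    RelMetadataQuery mq = aggregate.getCluster().getMetadataQuery();
    Boolean unique = mq.areColumnsUnique(input, ImmutableBitSet.of(keyOrdinal));
    return Boolean.TRUE.equals(unique); // reduce only when uniqueness is proven
  }
}
{code}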

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> The right side of the HiveSemiJoin has an aggregate which could be reduced to 
> have only {{i_item_sk}} as the group by key, since {{i_item_sk}} is a primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21382:
---
Status: Patch Available  (was: Open)

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> The right side of the HiveSemiJoin has an aggregate which could be reduced to 
> have only {{i_item_sk}} as the group by key, since {{i_item_sk}} is a primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21382) Group by keys reduction optimization - keys are not reduced in query23

2019-03-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21382:
---
Attachment: HIVE-21382.1.patch

> Group by keys reduction optimization - keys are not reduced in query23
> --
>
> Key: HIVE-21382
> URL: https://issues.apache.org/jira/browse/HIVE-21382
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21382.1.patch
>
>
> {code:sql}
> explain cbo with frequent_ss_items as 
>  (select substr(i_item_desc,1,30) itemdesc,i_item_sk item_sk,d_date 
> solddate,count(*) cnt
>   from store_sales
>   ,date_dim 
>   ,item
>   where ss_sold_date_sk = d_date_sk
> and ss_item_sk = i_item_sk 
> and d_year in (1999,1999+1,1999+2,1999+3)
>   group by substr(i_item_desc,1,30),i_item_sk,d_date
>   having count(*) >4)
> select  sum(sales)
>  from ((select cs_quantity*cs_list_price sales
>from catalog_sales
>,date_dim 
>where d_year = 1999 
>  and d_moy = 1 
>  and cs_sold_date_sk = d_date_sk 
>  and cs_item_sk in (select item_sk from frequent_ss_items))) subq 
> limit 100;
> {code}
> {code:sql}
> HiveSortLimit(fetch=[100])
>   HiveProject($f0=[$0])
> HiveAggregate(group=[{}], agg#0=[sum($0)])
>   HiveProject(sales=[*(CAST($2):DECIMAL(10, 0), $3)])
> HiveSemiJoin(condition=[=($1, $5)], joinType=[inner])
>   HiveJoin(condition=[=($0, $4)], joinType=[inner], algorithm=[none], 
> cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(cs_sold_date_sk=[$0], cs_item_sk=[$15], 
> cs_quantity=[$18], cs_list_price=[$20])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[perf_constraints, catalog_sales]], 
> table:alias=[catalog_sales])
> HiveProject(d_date_sk=[$0])
>   HiveFilter(condition=[AND(=($6, 1999), =($8, 1))])
> HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
>   HiveProject(i_item_sk=[$1])
> HiveFilter(condition=[>($3, 4)])
>   HiveProject(substr=[$2], i_item_sk=[$1], d_date=[$0], $f3=[$3])
> HiveAggregate(group=[{3, 4, 5}], agg#0=[count()])
>   HiveJoin(condition=[=($1, $4)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
> HiveJoin(condition=[=($0, $2)], joinType=[inner], 
> algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}])
>   HiveProject(ss_sold_date_sk=[$0], ss_item_sk=[$2])
> HiveFilter(condition=[IS NOT NULL($0)])
>   HiveTableScan(table=[[perf_constraints, 
> store_sales]], table:alias=[store_sales])
>   HiveProject(d_date_sk=[$0], d_date=[$2])
> HiveFilter(condition=[IN($6, 1999, 2000, 2001, 2002)])
>   HiveTableScan(table=[[perf_constraints, date_dim]], 
> table:alias=[date_dim])
> HiveProject(i_item_sk=[$0], substr=[substr($4, 1, 30)])
>   HiveTableScan(table=[[perf_constraints, item]], 
> table:alias=[item])
> {code}
> The right side of the HiveSemiJoin has an aggregate which could be reduced to 
> have only {{i_item_sk}} as the group by key, since {{i_item_sk}} is a primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21339) LLAP: Cache hit also initializes an FS object

2019-03-05 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21339:
-
Attachment: HIVE-21339.3.patch

> LLAP: Cache hit also initializes an FS object 
> --
>
> Key: HIVE-21339
> URL: https://issues.apache.org/jira/browse/HIVE-21339
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21339.1.patch, HIVE-21339.2.patch, 
> HIVE-21339.3.patch, llap-cache-fs-get.png, llap-query7-cached.svg
>
>
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L214
> {code}
> // 1. Get file metadata from cache, or create the reader and read it.
> // Don't cache the filesystem object for now; Tez closes it and FS cache 
> will fix all that
> fs = split.getPath().getFileSystem(jobConf);
> fileKey = determineFileId(fs, split,
> HiveConf.getBoolVar(daemonConf, 
> ConfVars.LLAP_CACHE_ALLOW_SYNTHETIC_FILEID),
> HiveConf.getBoolVar(daemonConf, 
> ConfVars.LLAP_CACHE_DEFAULT_FS_FILE_ID),
> !HiveConf.getBoolVar(daemonConf, ConfVars.LLAP_IO_USE_FILEID_PATH)
> );
> {code}
>  !llap-cache-fs-get.png! 
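
One possible direction, as a hedged sketch: defer FileSystem creation behind a memoizing supplier so that a metadata-cache hit never initializes an FS object (names and structure are illustrative, not the attached patch):

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;

public class LazyFsSketch {
  // The supplier is only invoked on the cache-miss path; a cache hit
  // returns before any FileSystem object is created.
  static Supplier<FileSystem> lazyFs(FileSplit split, JobConf jobConf) {
    return Suppliers.memoize(() -> {
      try {
        return split.getPath().getFileSystem(jobConf);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });
  }
}
{code}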



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21339) LLAP: Cache hit also initializes an FS object

2019-03-05 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785147#comment-16785147
 ] 

Prasanth Jayachandran commented on HIVE-21339:
--

Looks like precommit tried to apply the svg file. Reuploading the same patch.

> LLAP: Cache hit also initializes an FS object 
> --
>
> Key: HIVE-21339
> URL: https://issues.apache.org/jira/browse/HIVE-21339
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21339.1.patch, HIVE-21339.2.patch, 
> HIVE-21339.3.patch, llap-cache-fs-get.png, llap-query7-cached.svg
>
>
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java#L214
> {code}
> // 1. Get file metadata from cache, or create the reader and read it.
> // Don't cache the filesystem object for now; Tez closes it and FS cache 
> will fix all that
> fs = split.getPath().getFileSystem(jobConf);
> fileKey = determineFileId(fs, split,
> HiveConf.getBoolVar(daemonConf, 
> ConfVars.LLAP_CACHE_ALLOW_SYNTHETIC_FILEID),
> HiveConf.getBoolVar(daemonConf, 
> ConfVars.LLAP_CACHE_DEFAULT_FS_FILE_ID),
> !HiveConf.getBoolVar(daemonConf, ConfVars.LLAP_IO_USE_FILEID_PATH)
> );
> {code}
>  !llap-cache-fs-get.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19968) UDF exception is not throw out

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-19968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785140#comment-16785140
 ] 

Hive QA commented on HIVE-19968:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961172/HIVE-19968.06.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15817 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[test_teradatabinaryfile] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=61)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16351/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16351/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16351/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961172 - PreCommit-HIVE-Build

> UDF exception is not throw out
> --
>
> Key: HIVE-19968
> URL: https://issues.apache.org/jira/browse/HIVE-19968
> Project: Hive
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-19968.01.patch, HIVE-19968.02.patch, 
> HIVE-19968.03.patch, HIVE-19968.04.patch, HIVE-19968.05.patch, 
> HIVE-19968.06.patch, hive-udf.png
>
>
> UDF init failed and threw an exception, but Hive caught it and did nothing, 
> leading to the application succeeding even though no data was generated.
> {code}
> // GenericUDFReflect.java#evaluate()
> try {
>   o = null;
>   o = ReflectionUtils.newInstance(c, null);
> } catch (Exception e) {
>   // ignored
> }
> {code}
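
A hedged sketch of the fix direction, surfacing the failure instead of swallowing it (evaluate() already declares {{throws HiveException}}; the message text is an assumption, not the committed patch):

{code:java}
// Inside GenericUDFReflect.evaluate():
try {
  o = ReflectionUtils.newInstance(c, null);
} catch (Exception e) {
  // Propagate instead of silently ignoring, so the query fails visibly.
  throw new HiveException("Unable to instantiate " + c.getName(), e);
}
{code}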



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20848) After setting UpdateInputAccessTimeHook query fail with Table Not Found.

2019-03-05 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785128#comment-16785128
 ] 

Ashutosh Chauhan commented on HIVE-20848:
-

+1
Can you reattach the patch for a QA run?

> After setting UpdateInputAccessTimeHook query fail with Table Not Found.
> 
>
> Key: HIVE-20848
> URL: https://issues.apache.org/jira/browse/HIVE-20848
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-20848.patch
>
>
> {code}
>  select from_unixtime(1540495168); 
>  set 
> hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook,org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec;
>  select from_unixtime(1540495168); 
> {code}
> the second select fails with the following exception
> {code}
> ERROR ql.Driver: FAILED: Hive Internal Error: 
> org.apache.hadoop.hive.ql.metadata.InvalidTableException(Table not found 
> _dummy_table)
> org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found 
> _dummy_table
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1217)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1168)
> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1155)
> at 
> org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec.run(UpdateInputAccessTimeHook.java:67)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1444)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1294)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:197)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> {code}
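
For reference, a hedged sketch of one possible guard in the hook: skip Hive's internal placeholder inputs before calling getTable (the helper class is made up, and the {{_dummy_table}} name check is inferred from the error above, not taken from the patch):

{code:java}
import java.util.Set;
import org.apache.hadoop.hive.ql.hooks.Entity;
import org.apache.hadoop.hive.ql.hooks.ReadEntity;

public class DummyInputGuard {
  // Hive's placeholder input for constant-only queries has no metastore
  // entry, so Hive.getTable() throws InvalidTableException for it.
  static boolean isDummy(ReadEntity re) {
    return re.getType() == Entity.Type.TABLE
        && "_dummy_table".equalsIgnoreCase(re.getTable().getTableName());
  }

  static void dropDummyInputs(Set<ReadEntity> inputs) {
    inputs.removeIf(DummyInputGuard::isDummy); // filter before updating access times
  }
}
{code}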



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20616) Dynamic Partition Insert failed if PART_VALUE exceeds 4000 chars

2019-03-05 Thread Ashutosh Chauhan (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785129#comment-16785129
 ] 

Ashutosh Chauhan commented on HIVE-20616:
-

[~daijy] Can you please review this?

> Dynamic Partition Insert failed if PART_VALUE exceeds 4000 chars
> 
>
> Key: HIVE-20616
> URL: https://issues.apache.org/jira/browse/HIVE-20616
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-20616.patch
>
>
> With MySQL as the metastore DB, PARTITION_PARAMS.PARAM_VALUE is defined as 
> varchar(4000):
> {code}
> describe PARTITION_PARAMS; 
> +-------------+---------------+------+-----+---------+-------+ 
> | Field       | Type          | Null | Key | Default | Extra | 
> +-------------+---------------+------+-----+---------+-------+ 
> | PART_ID     | bigint(20)    | NO   | PRI | NULL    |       | 
> | PARAM_KEY   | varchar(256)  | NO   | PRI | NULL    |       | 
> | PARAM_VALUE | varchar(4000) | YES  |     | NULL    |       | 
> +-------------+---------------+------+-----+---------+-------+ 
> {code}
> which leads to a MoveTask failure if PART_VALUE exceeds 4000 chars.
> {code}
> org.datanucleus.store.rdbms.exceptions.MappedDatastoreException: INSERT INTO 
> `PARTITION_PARAMS` (`PARAM_VALUE`,`PART_ID`,`PARAM_KEY`) VALUES (?,?,?)
>  at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.internalPut(JoinMapStore.java:1074)
>  at 
> org.datanucleus.store.rdbms.scostore.JoinMapStore.putAll(JoinMapStore.java:224)
>  at 
> org.datanucleus.store.rdbms.mapping.java.MapMapping.postInsert(MapMapping.java:158)
>  at 
> org.datanucleus.store.rdbms.request.InsertRequest.execute(InsertRequest.java:522)
>  at 
> org.datanucleus.store.rdbms.RDBMSPersistenceHandler.insertObjectInTable(RDBMSPersistenceHandler.java:162)
>  at 
> org.datanucleus.store.rdbms.RDBMSPersistenceHandler.insertObject(RDBMSPersistenceHandler.java:138)
>  at 
> org.datanucleus.state.StateManagerImpl.internalMakePersistent(StateManagerImpl.java:3363)
>  at 
> org.datanucleus.state.StateManagerImpl.makePersistent(StateManagerImpl.java:3339)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2080)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1923)
>  at 
> org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1778)
>  at 
> org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
>  at 
> org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:724)
>  at 
> org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:749)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.addPartition(ObjectStore.java:2442)
>  at sun.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
>  at com.sun.proxy.$Proxy32.addPartition(Unknown Source)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partition_core(HiveMetaStore.java:3976)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partition_with_environment_context(HiveMetaStore.java:4032)
>  at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>  at com.sun.proxy.$Proxy34.add_partition_with_environment_context(Unknown 
> Source)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$add_partition_with_environment_context.getResult(ThriftHiveMetastore.java:15528)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$add_partition_with_environment_context.getResult(ThriftHiveMetastore.java:15512)
>  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>  at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
>  at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
>  at 

[jira] [Commented] (HIVE-21048) Remove needless org.mortbay.jetty from hadoop exclusions

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785087#comment-16785087
 ] 

Hive QA commented on HIVE-21048:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961171/HIVE-21048.09.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 15802 tests 
executed
*Failed tests:*
{noformat}
TestMiniLlapCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=155)

[intersect_all.q,unionDistinct_1.q,table_nonprintable.q,orc_llap_counters1.q,mm_cttas.q,whroot_external1.q,global_limit.q,cte_2.q,rcfile_createas1.q,dynamic_partition_pruning_2.q,intersect_merge.q,results_cache_diff_fs.q,cttl.q,parallel_colstats.q,load_hdfs_file_with_space_in_the_name.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[test_teradatabinaryfile] 
(batchId=2)
org.apache.hadoop.hive.cli.TestMiniDruidKafkaCliDriver.testCliDriver[druidkafkamini_avro]
 (batchId=275)
org.apache.hadoop.hive.cli.TestMiniDruidKafkaCliDriver.testCliDriver[druidkafkamini_basic]
 (batchId=275)
org.apache.hadoop.hive.cli.TestMiniDruidKafkaCliDriver.testCliDriver[druidkafkamini_csv]
 (batchId=275)
org.apache.hadoop.hive.cli.TestMiniDruidKafkaCliDriver.testCliDriver[druidkafkamini_delimited]
 (batchId=275)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16350/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16350/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16350/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961171 - PreCommit-HIVE-Build

> Remove needless org.mortbay.jetty from hadoop exclusions
> 
>
> Key: HIVE-21048
> URL: https://issues.apache.org/jira/browse/HIVE-21048
> Project: Hive
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-21048.01.patch, HIVE-21048.02.patch, 
> HIVE-21048.03.patch, HIVE-21048.04.patch, HIVE-21048.05.patch, 
> HIVE-21048.06.patch, HIVE-21048.07.patch, HIVE-21048.08.patch, 
> HIVE-21048.08.patch, HIVE-21048.09.patch, dep.out
>
>
> During HIVE-20638 I found that org.mortbay.jetty exclusions from e.g. hadoop 
> don't take effect, as the actual groupId of jetty is org.eclipse.jetty for 
> most current projects; please see the attachment (an example for the hive 
> commons project).
> https://en.wikipedia.org/wiki/Jetty_(web_server)#History



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21390) BI split strategy does not work for blob stores

2019-03-05 Thread Gopal V (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785085#comment-16785085
 ] 

Gopal V commented on HIVE-21390:


LGTM - +1 tests pending

> BI split strategy does not work for blob stores
> ---
>
> Key: HIVE-21390
> URL: https://issues.apache.org/jira/browse/HIVE-21390
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21390.1.patch, HIVE-21390.2.patch
>
>
> The BI split strategy cuts splits at block boundaries; however, there are no 
> block boundaries in blob storage, so we end up with a single split under the 
> BI split strategy. 
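
A hedged sketch of the idea behind the fix: when the store exposes no real block boundaries, slice the file into fixed-size splits instead (the helper name, locality handling, and target size are assumptions, not the patch):

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileSplit;

public class BlobStoreSplits {
  // Slice [0, fileLen) into target-sized splits, ignoring block boundaries.
  static List<FileSplit> slice(Path path, long fileLen, long targetSplitSize) {
    List<FileSplit> splits = new ArrayList<>();
    for (long offset = 0; offset < fileLen; offset += targetSplitSize) {
      long len = Math.min(targetSplitSize, fileLen - offset);
      // Blob stores give no meaningful locality, hence the empty host list.
      splits.add(new FileSplit(path, offset, len, new String[0]));
    }
    return splits;
  }
}
{code}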



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21293) Fix ambiguity in grammar warnings at compilation time (II)

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785015#comment-16785015
 ] 

Hive QA commented on HIVE-21293:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961167/HIVE-21293.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15817 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby_reduce] 
(batchId=61)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16349/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16349/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16349/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961167 - PreCommit-HIVE-Build

> Fix ambiguity in grammar warnings at compilation time (II)
> --
>
> Key: HIVE-21293
> URL: https://issues.apache.org/jira/browse/HIVE-21293
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 4.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Laszlo Bodor
>Priority: Major
> Attachments: HIVE-21293.01.patch, HIVE-21293.02.patch
>
>
> These are the warnings at compilation time:
> {code}
> warning(200): IdentifiersParser.g:424:5:
> Decision can match input such as "KW_UNKNOWN" using multiple alternatives: 1, 
> 10
> As a result, alternative(s) 10 were disabled for that input
> {code}
> This means that multiple parser rules can match certain query text, possibly 
> leading to unexpected errors at parsing time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?focusedWorklogId=208130=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208130
 ]

ASF GitHub Bot logged work on HIVE-21338:
-

Author: ASF GitHub Bot
Created on: 05/Mar/19 22:34
Start Date: 05/Mar/19 22:34
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #557: HIVE-21338 
Remove order by and limit for aggregates
URL: https://github.com/apache/hive/pull/557#discussion_r262699879
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortLimitRemoveRule.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortLimit;
+
+/**
+ * Planner rule that removes
+ * a {@link 
org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortLimit}.
+ * Note that this is different from HiveSortRemoveRule because this is not 
based on statistics
+ */
+public class HiveSortLimitRemoveRule extends RelOptRule {
+
 
 Review comment:
   Create static final INSTANCE(s) for the rule to avoid multiple 
instantiations.
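
A hedged illustration of the suggested pattern (constructor visibility and the stub body are assumptions):

{code:java}
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortLimit;

public class HiveSortLimitRemoveRule extends RelOptRule {
  // Single shared instance; callers reference INSTANCE instead of `new`.
  public static final HiveSortLimitRemoveRule INSTANCE = new HiveSortLimitRemoveRule();

  private HiveSortLimitRemoveRule() {
    super(operand(HiveSortLimit.class, any()));
  }

  @Override
  public void onMatch(RelOptRuleCall call) {
    // rule logic elided in this sketch
  }
}
{code}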
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208130)
Time Spent: 0.5h  (was: 20m)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row LIMIT and ORDER BY could 
> be removed. This saves unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num 

[jira] [Work logged] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?focusedWorklogId=208127=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208127
 ]

ASF GitHub Bot logged work on HIVE-21338:
-

Author: ASF GitHub Bot
Created on: 05/Mar/19 22:34
Start Date: 05/Mar/19 22:34
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #557: HIVE-21338 
Remove order by and limit for aggregates
URL: https://github.com/apache/hive/pull/557#discussion_r262705631
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
 ##
 @@ -1047,4 +1052,37 @@ public static String toJsonString(final RelNode rel) {
 return planWriter.asString();
   }
 
+
+  /**
+   * Utility method to answer if given a rel plan it will produce at most
+   *  one row.
+   */
+  public static boolean produceAtmostOneRow(RelNode rel) {
+if(rel instanceof HepRelVertex) {
+  rel = ((HepRelVertex)rel).getCurrentRel();
+}
+if(rel instanceof HiveProject) {
+  if(((HiveProject)rel).hasWindowingExpr()) {
 
 Review comment:
   This check does not seem to be needed since the window expression will not 
alter the number of rows output by the Project.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208127)
Time Spent: 20m  (was: 10m)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row LIMIT and ORDER BY could 
> be removed. This saves unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> 

[jira] [Work logged] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?focusedWorklogId=208129=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208129
 ]

ASF GitHub Bot logged work on HIVE-21338:
-

Author: ASF GitHub Bot
Created on: 05/Mar/19 22:34
Start Date: 05/Mar/19 22:34
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #557: HIVE-21338 
Remove order by and limit for aggregates
URL: https://github.com/apache/hive/pull/557#discussion_r262700363
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSortLimitRemoveRule.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelOptUtil;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortLimit;
+
+/**
+ * Planner rule that removes
+ * a {@link 
org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveSortLimit}.
+ * Note that this is different from HiveSortRemoveRule because this is not 
based on statistics
+ */
+public class HiveSortLimitRemoveRule extends RelOptRule {
+
+  //~ Constructors ---
+
+  public HiveSortLimitRemoveRule() {
+this(operand(HiveSortLimit.class, any()));
+  }
+
+  private HiveSortLimitRemoveRule(RelOptRuleOperand operand) {
+super(operand);
+  }
+
+  //~ Methods 
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+final HiveSortLimit sortLimit = call.rel(0);
+
+return HiveRelOptUtil.produceAtmostOneRow(sortLimit.getInput());
 
 Review comment:
   Instead of unwrapping HepRelVertex nodes using this method, the rule should 
have two variants, similar to other rules: one that matches 
SortLimit-Project-Aggregate and one that matches SortLimit-Aggregate.
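
A hedged sketch of what the two variants could look like, matching on operand shape instead of unwrapping HepRelVertex (a class-member fragment; field names are assumed, while HiveProject and HiveAggregate are the existing Hive operators):

{code:java}
// One instance per plan shape, so the planner only fires the rule when the
// operand tree matches, with no manual HepRelVertex unwrapping in matches().
public static final HiveSortLimitRemoveRule AGGREGATE =
    new HiveSortLimitRemoveRule(
        operand(HiveSortLimit.class,
            operand(HiveAggregate.class, any())));

public static final HiveSortLimitRemoveRule PROJECT_AGGREGATE =
    new HiveSortLimitRemoveRule(
        operand(HiveSortLimit.class,
            operand(HiveProject.class,
                operand(HiveAggregate.class, any()))));
{code}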
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208129)
Time Spent: 0.5h  (was: 20m)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row LIMIT and ORDER BY could 
> be removed. This saves unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> 

[jira] [Work logged] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?focusedWorklogId=208128=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-208128
 ]

ASF GitHub Bot logged work on HIVE-21338:
-

Author: ASF GitHub Bot
Created on: 05/Mar/19 22:34
Start Date: 05/Mar/19 22:34
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #557: HIVE-21338 
Remove order by and limit for aggregates
URL: https://github.com/apache/hive/pull/557#discussion_r262715669
 
 

 ##
 File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
 ##
 @@ -1925,6 +1926,11 @@ public RelNode apply(RelOptCluster cluster, 
RelOptSchema relOptSchema, SchemaPlu
 perfLogger.PerfLogEnd(this.getClass().getName(), PerfLogger.OPTIMIZER, 
"Calcite: Window fixing rule");
   }
 
+  perfLogger.PerfLogBegin(this.getClass().getName(), PerfLogger.OPTIMIZER);
 
 Review comment:
   I believe we may still compute aggregations that are not used in the plan 
output. We could include the following test to confirm:
   ```
   explain cbo
   SELECT COUNT(*) FROM t1 ORDER BY SUM(col), COUNT(*);
   ```
   I believe this is not critical, but we can add the test and a note to the q 
file, and create a placeholder JIRA.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 208128)
Time Spent: 20m  (was: 10m)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row LIMIT and ORDER BY could 
> be removed. This saves unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select 

[jira] [Commented] (HIVE-21377) Using Oracle as HMS DB with DirectSQL

2019-03-05 Thread Bo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784977#comment-16784977
 ] 

Bo  commented on HIVE-21377:


[~pvary] I changed some code for handling the Number from Decimal to BigDecimal.

> Using Oracle as HMS DB with DirectSQL
> -
>
> Key: HIVE-21377
> URL: https://issues.apache.org/jira/browse/HIVE-21377
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Bo 
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-21377.patch
>
>
> When we use Oracle as the HMS DB, we see this kind of content in the HMS log:
> {code:java}
> 2019-02-02 T08:23:57,102 WARN [Thread-12]: metastore.ObjectStore 
> (ObjectStore.java:handleDirectSqlError(3741)) - Falling back to ORM path due 
> to direct SQL failure (this is not an error): Cannot extract boolean from 
> column value 0 at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.extractSqlBoolean(MetaStoreDirectSql.java:1031)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsFromPartitionIds(MetaStoreDirectSql.java:728)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$300(MetaStoreDirectSql.java:109)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql$1.run(MetaStoreDirectSql.java:471)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:73) 
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:462)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$8.getSqlResult(ObjectStore.java:3392)
> {code}
> In Hive, extractSqlBoolean handles Postgres, MySQL and Derby, but Oracle 
> returns 0 or 1 for Boolean, so we need to modify 
> MetastoreDirectSqlUtils.java - [1]
> Could we add this snippet to the code?
> {code:java}
>   static Boolean extractSqlBoolean(Object value) throws MetaException {
>     if (value == null) {
>       return null;
>     }
>     if (value instanceof Boolean) {
>       return (Boolean) value;
>     }
>     if (value instanceof Number) { // added: Oracle returns NUMBER (0/1) as BigDecimal
>       try {
>         return BooleanUtils.toBooleanObject(((BigDecimal) value).intValue(), 1, 0, null);
>       } catch (IllegalArgumentException iae) {
>         // NOOP
>       }
>     }
>     if (value instanceof String) {
>       try {
>         return BooleanUtils.toBooleanObject((String) value, "Y", "N", null);
>       } catch (IllegalArgumentException iae) {
>         // NOOP
>       }
>     }
>     throw new MetaException("Cannot extract boolean from column value " + value);
>   }
> {code}
>  [1] -
> https://github.com/apache/hive/blob/f51f108b761f0c88647f48f30447dae12b308f31/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDirectSqlUtils.java#L501-L527
>  
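
A hedged sanity check for the proposed branch (relies on commons-lang {{BooleanUtils}} semantics; illustrative only, not a test from the patch):

{code:java}
import java.math.BigDecimal;

// With the Number branch above, Oracle's NUMBER(1) values map as:
//   BigDecimal 1 -> Boolean.TRUE, BigDecimal 0 -> Boolean.FALSE,
// and any other numeric value still falls through to the MetaException.
assert Boolean.TRUE.equals(extractSqlBoolean(BigDecimal.ONE));
assert Boolean.FALSE.equals(extractSqlBoolean(BigDecimal.ZERO));
{code}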



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21337) HMS Metadata migration from Postgres/Derby to other DBs fail

2019-03-05 Thread Naveen Gangam (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-21337:
-
Status: Open  (was: Patch Available)

> HMS Metadata migration from Postgres/Derby to other DBs fail
> 
>
> Key: HIVE-21337
> URL: https://issues.apache.org/jira/browse/HIVE-21337
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-21337.patch
>
>
> A customer was recently migrating the HMS metastore from Postgres to Oracle. 
> During import of the [exported] data from the Postgres HMS metastore, 
> failures were seen because COLUMNS_V2.COMMENT is 4000 bytes long in Postgres, 
> whereas Oracle and the other schemas define it to be 256 bytes.
> This inconsistency in the schemas makes the migration cumbersome and manual. 
> This jira makes the column's length consistent across all databases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21337) HMS Metadata migration from Postgres/Derby to other DBs fail

2019-03-05 Thread Naveen Gangam (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-21337:
-
Attachment: HIVE-21337.2.patch
Status: Patch Available  (was: Open)

Instead of downsizing the COMMENT column in Postgres and Derby, I am upsizing 
it in the MSSQL, MySQL and Oracle DBs.

> HMS Metadata migration from Postgres/Derby to other DBs fail
> 
>
> Key: HIVE-21337
> URL: https://issues.apache.org/jira/browse/HIVE-21337
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-21337.2.patch, HIVE-21337.patch
>
>
> A customer was recently migrating the HMS metastore from Postgres to Oracle. 
> During import of the [exported] data from the Postgres HMS metastore, 
> failures were seen because COLUMNS_V2.COMMENT is 4000 bytes long in Postgres, 
> whereas Oracle and the other schemas define it to be 256 bytes.
> This inconsistency in the schemas makes the migration cumbersome and manual. 
> This jira makes the column's length consistent across all databases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21390) BI split strategy does not work for blob stores

2019-03-05 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21390:
-
Attachment: HIVE-21390.2.patch

> BI split strategy does not work for blob stores
> ---
>
> Key: HIVE-21390
> URL: https://issues.apache.org/jira/browse/HIVE-21390
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21390.1.patch, HIVE-21390.2.patch
>
>
> The BI split strategy cuts splits at block boundaries; however, there are no 
> block boundaries in blob storage, so we end up with a single split under the 
> BI split strategy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21390) BI split strategy does not work for blob stores

2019-03-05 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784929#comment-16784929
 ] 

Prasanth Jayachandran commented on HIVE-21390:
--

[~gopalv] can you please take a look?

> BI split strategy does not work for blob stores
> ---
>
> Key: HIVE-21390
> URL: https://issues.apache.org/jira/browse/HIVE-21390
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21390.1.patch
>
>
> The BI split strategy cuts splits at block boundaries; however, there are no 
> block boundaries in blob storage, so we end up with a single split under the 
> BI split strategy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21390) BI split strategy does not work for blob stores

2019-03-05 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21390:
-
Status: Patch Available  (was: Open)

> BI split strategy does not work for blob stores
> ---
>
> Key: HIVE-21390
> URL: https://issues.apache.org/jira/browse/HIVE-21390
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21390.1.patch
>
>
> BI split strategy cuts splits at block boundaries; however, there are no 
> block boundaries in blob storage, so we end up with a single split under the 
> BI split strategy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21390) BI split strategy does not work for blob stores

2019-03-05 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-21390:
-
Attachment: HIVE-21390.1.patch

> BI split strategy does not work for blob stores
> ---
>
> Key: HIVE-21390
> URL: https://issues.apache.org/jira/browse/HIVE-21390
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0, 3.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Major
> Attachments: HIVE-21390.1.patch
>
>
> BI split strategy cuts splits at block boundaries; however, there are no 
> block boundaries in blob storage, so we end up with a single split under the 
> BI split strategy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21371) Make NonSyncByteArrayOutputStream Overflow Conscious

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784897#comment-16784897
 ] 

Hive QA commented on HIVE-21371:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961165/HIVE-21371.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15817 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16348/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16348/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16348/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961165 - PreCommit-HIVE-Build

> Make NonSyncByteArrayOutputStream Overflow Conscious 
> -
>
> Key: HIVE-21371
> URL: https://issues.apache.org/jira/browse/HIVE-21371
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0, 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HIVE-21371.1.patch, HIVE-21371.2.patch, 
> HIVE-21371.2.patch, HIVE-21371.2.patch
>
>
> {code:java|title=NonSyncByteArrayOutputStream}
>   private int enLargeBuffer(int increment) {
> int temp = count + increment;
> int newLen = temp;
> if (temp > buf.length) {
>   if ((buf.length << 1) > temp) {
> newLen = buf.length << 1;
>   }
>   byte newbuf[] = new byte[newLen];
>   System.arraycopy(buf, 0, newbuf, 0, count);
>   buf = newbuf;
> }
> return newLen;
>   }
> {code}
> This will fail once the buffer approaches 2 GB, because the size is doubled 
> every time without consideration for the Java array size limit (2^31 - 1 
> elements, i.e. roughly 2 GB for a byte array).
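
As an illustration only (a sketch, not the actual patch), an overflow-conscious 
version would grow toward a hard cap instead of doubling unconditionally. It 
assumes the same {{buf}}/{{count}} fields and the common JVM array-size cap of 
Integer.MAX_VALUE - 8:

{code:java}
import java.util.Arrays;

class OverflowConsciousBuffer {
  private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8; // common JVM cap
  private byte[] buf = new byte[32];
  private int count;

  private int enLargeBuffer(int increment) {
    int minCapacity = count + increment;
    if (minCapacity < 0 || minCapacity > MAX_ARRAY_SIZE) {
      // count + increment overflowed int, or exceeds what a JVM array can hold
      throw new OutOfMemoryError("Required array size too large");
    }
    if (minCapacity > buf.length) {
      int newLen = buf.length << 1;  // preferred growth: double
      if (newLen < minCapacity) {    // also true when the shift overflowed to negative
        newLen = minCapacity;
      } else if (newLen > MAX_ARRAY_SIZE) {
        newLen = MAX_ARRAY_SIZE;     // clamp instead of overflowing
      }
      buf = Arrays.copyOf(buf, newLen);
    }
    return buf.length;
  }
}
{code}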



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21375) Closing TransactionBatch closes FileSystem for other batches

2019-03-05 Thread Shawn Weeks (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784893#comment-16784893
 ] 

Shawn Weeks commented on HIVE-21375:


The same issue exists for ConnectionImpl. Closing the file system here will 
cause a problem if you have multiple connections or transaction batches open. 
This type of cleanup should probably be handled by the caller (see the sketch 
below).
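
For illustration, a minimal caller-side sketch (hypothetical usage; the 
batch-write loop is a placeholder for the real streaming API, while 
{{FileSystem.closeAllForUGI}} is the Hadoop call the description refers to):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

class CallerOwnedCleanup {
  // The caller owns the lifetime of the UGI-scoped FileSystem cache and
  // closes it exactly once, after ALL connections/batches are finished,
  // rather than inside each TransactionBatch/Connection close().
  void writeAll(UserGroupInformation ugi, Iterable<Runnable> batchWrites) throws IOException {
    try {
      for (Runnable write : batchWrites) {
        write.run(); // each batch reuses FileSystem instances cached for this UGI
      }
    } finally {
      FileSystem.closeAllForUGI(ugi);
    }
  }
}
{code}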

> Closing TransactionBatch closes FileSystem for other batches
> 
>
> Key: HIVE-21375
> URL: https://issues.apache.org/jira/browse/HIVE-21375
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Streaming
>Reporter: Shawn Weeks
>Priority: Minor
>
> The patch in HIVE-13151 added FileSystem.closeAllForUGI(ugi); to the close 
> method of HiveEndPoint for the legacy Streaming API. This seems to have a 
> side effect of closing the FileSystem for all open TransactionBatches as used 
> by NiFi and Storm when writing to multiple partitions. Setting 
> fs.hdfs.impl.disable.cache=true negates the issue but at a performance cost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21336) HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char

2019-03-05 Thread Naveen Gangam (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-21336:
-
Attachment: HIVE-21336.3.patch
Status: Patch Available  (was: Open)

#1 Addresses a fresh installation of the Hive schema.
#2 Addresses issues when upgrading from 1.3 to 4.0, where the index creation 
would have failed once the COLUMN_NAME size was increased from 128 to 767.
#3 Addresses the scenario where the customer may have installed the 3.x schema 
but then changed the DB settings prior to the upgrade to the 4.x schema.



> HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char
> --
>
> Key: HIVE-21336
> URL: https://issues.apache.org/jira/browse/HIVE-21336
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
> Attachments: HIVE-21336.2.patch, HIVE-21336.3.patch, HIVE-21336.patch
>
>
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) 
> Error: ORA-01450: maximum key length (6398) exceeded (state=72000,code=1450) 
> The customer tried the same DDL in SQL Developer and got the same error. This 
> could be the result of a combination of DB-level settings, such as 
> db_block_size, limiting the maximum key length, as per the doc below: 
> http://www.dba-oracle.com/t_ora_01450_maximum_key_length_exceeded.htm 
> Also, {{NLS_LENGTH_SEMANTICS}} is BYTE by default, but users can set it to 
> CHAR at the session level, thus reducing the maximum index key length. We 
> have increased the size of COLUMN_NAME from 128 to 767 (it used to be 1000) 
> and TABLE_NAME from 128 to 256, as in the following definitions: 
> {code} 
> CREATE TABLE PART_COL_STATS ( 
> CS_ID NUMBER NOT NULL, 
> DB_NAME VARCHAR2(128) NOT NULL, 
> TABLE_NAME VARCHAR2(256) NOT NULL, 
> PARTITION_NAME VARCHAR2(767) NOT NULL, 
> COLUMN_NAME VARCHAR2(767) NOT NULL,  
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> {code} 
> Reproducer: 
> {code} 
> SQL*Plus: Release 11.2.0.2.0 Production on Wed Feb 27 11:02:16 2019 Copyright 
> (c) 1982, 2011, Oracle. All rights reserved. 
> Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit 
> Production 
> SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; 
> PARAMETER 
>  
> VALUE 
>  
> NLS_LENGTH_SEMANTICS 
> BYTE 
> SQL> alter session set NLS_LENGTH_SEMANTICS=CHAR; Session altered. 
> SQL> commit; Commit complete. 
> SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; 
> PARAMETER 
>  
> VALUE 
>  
> NLS_LENGTH_SEMANTICS 
> CHAR 
> SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME 
> VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME 
> VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); 
> Table created. 
> SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) 
> * ERROR at line 1: ORA-01450: maximum key length (6398) exceeded 
> SQL> alter session set NLS_LENGTH_SEMANTICS=BYTE; 
> Session altered. 
> SQL> commit; 
> Commit complete. 
> SQL> drop table PART_COL_STATS; 
> Table dropped. 
> SQL> commit; 
> Commit complete. 
> SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME 
> VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME 
> VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); 
> Table created. 
> SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> Index created. 
> SQL> commit; 
> Commit complete. 
> SQL> 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21336) HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char

2019-03-05 Thread Naveen Gangam (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-21336:
-
Status: Open  (was: Patch Available)

Will upload a new patch with some changes from the review.

> HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char
> --
>
> Key: HIVE-21336
> URL: https://issues.apache.org/jira/browse/HIVE-21336
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
> Attachments: HIVE-21336.2.patch, HIVE-21336.patch
>
>
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) 
> Error: ORA-01450: maximum key length (6398) exceeded (state=72000,code=1450) 
> The customer tried the same DDL in SQL Developer and got the same error. This 
> could be the result of a combination of DB-level settings, such as 
> db_block_size, limiting the maximum key length, as per the doc below: 
> http://www.dba-oracle.com/t_ora_01450_maximum_key_length_exceeded.htm 
> Also, {{NLS_LENGTH_SEMANTICS}} is BYTE by default, but users can set it to 
> CHAR at the session level, thus reducing the maximum index key length. We 
> have increased the size of COLUMN_NAME from 128 to 767 (it used to be 1000) 
> and TABLE_NAME from 128 to 256, as in the following definitions: 
> {code} 
> CREATE TABLE PART_COL_STATS ( 
> CS_ID NUMBER NOT NULL, 
> DB_NAME VARCHAR2(128) NOT NULL, 
> TABLE_NAME VARCHAR2(256) NOT NULL, 
> PARTITION_NAME VARCHAR2(767) NOT NULL, 
> COLUMN_NAME VARCHAR2(767) NOT NULL,  
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> {code} 
> Reproducer: 
> {code} 
> SQL*Plus: Release 11.2.0.2.0 Production on Wed Feb 27 11:02:16 2019 Copyright 
> (c) 1982, 2011, Oracle. All rights reserved. 
> Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit 
> Production 
> SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; 
> PARAMETER 
>  
> VALUE 
>  
> NLS_LENGTH_SEMANTICS 
> BYTE 
> SQL> alter session set NLS_LENGTH_SEMANTICS=CHAR; Session altered. 
> SQL> commit; Commit complete. 
> SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; 
> PARAMETER 
>  
> VALUE 
>  
> NLS_LENGTH_SEMANTICS 
> CHAR 
> SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME 
> VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME 
> VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); 
> Table created. 
> SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) 
> * ERROR at line 1: ORA-01450: maximum key length (6398) exceeded 
> SQL> alter session set NLS_LENGTH_SEMANTICS=BYTE; 
> Session altered. 
> SQL> commit; 
> Commit complete. 
> SQL> drop table PART_COL_STATS; 
> Table dropped. 
> SQL> commit; 
> Commit complete. 
> SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME 
> VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME 
> VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); 
> Table created. 
> SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> Index created. 
> SQL> commit; 
> Commit complete. 
> SQL> 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-16924) Support distinct in presence of Group By

2019-03-05 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-16924:
--
Status: Patch Available  (was: Open)

> Support distinct in presence of Group By 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-16924.01.patch, HIVE-16924.02.patch, 
> HIVE-16924.03.patch, HIVE-16924.04.patch, HIVE-16924.05.patch, 
> HIVE-16924.06.patch, HIVE-16924.07.patch, HIVE-16924.08.patch, 
> HIVE-16924.09.patch, HIVE-16924.10.patch, HIVE-16924.11.patch, 
> HIVE-16924.12.patch, HIVE-16924.13.patch, HIVE-16924.14.patch, 
> HIVE-16924.15.patch, HIVE-16924.16.patch, HIVE-16924.17.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> {code}
> These queries should work:
> {code:sql}
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> {code}
> Currently, you get: 
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-16924) Support distinct in presence of Group By

2019-03-05 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-16924:
--
Attachment: HIVE-16924.17.patch

> Support distinct in presence of Group By 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-16924.01.patch, HIVE-16924.02.patch, 
> HIVE-16924.03.patch, HIVE-16924.04.patch, HIVE-16924.05.patch, 
> HIVE-16924.06.patch, HIVE-16924.07.patch, HIVE-16924.08.patch, 
> HIVE-16924.09.patch, HIVE-16924.10.patch, HIVE-16924.11.patch, 
> HIVE-16924.12.patch, HIVE-16924.13.patch, HIVE-16924.14.patch, 
> HIVE-16924.15.patch, HIVE-16924.16.patch, HIVE-16924.17.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> {code}
> These queries should work:
> {code:sql}
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> {code}
> Currently, you get: 
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-16924) Support distinct in presence of Group By

2019-03-05 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-16924:
--
Status: Open  (was: Patch Available)

> Support distinct in presence of Group By 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-16924.01.patch, HIVE-16924.02.patch, 
> HIVE-16924.03.patch, HIVE-16924.04.patch, HIVE-16924.05.patch, 
> HIVE-16924.06.patch, HIVE-16924.07.patch, HIVE-16924.08.patch, 
> HIVE-16924.09.patch, HIVE-16924.10.patch, HIVE-16924.11.patch, 
> HIVE-16924.12.patch, HIVE-16924.13.patch, HIVE-16924.14.patch, 
> HIVE-16924.15.patch, HIVE-16924.16.patch, HIVE-16924.17.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> {code}
> These queries should work:
> {code:sql}
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> {code}
> Currently, you get: 
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16924) Support distinct in presence of Group By

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784837#comment-16784837
 ] 

Hive QA commented on HIVE-16924:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
36s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
14s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
35s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  9m 
16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
50s{color} | {color:red} ql: The patch generated 8 new + 639 unchanged - 13 
fixed = 647 total (was 652) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  2m 
18s{color} | {color:red} root: The patch generated 8 new + 647 unchanged - 13 
fixed = 655 total (was 660) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
50s{color} | {color:green} ql generated 0 new + 2249 unchanged - 2 fixed = 2249 
total (was 2251) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 16m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 78m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16346/dev-support/hive-personality.sh
 |
| git revision | master / 3113f89 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16346/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16346/yetus/diff-checkstyle-root.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16346/yetus/whitespace-eol.txt
 |
| modules | C: ql . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16346/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Support distinct in presence of Group By 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-16924.01.patch, HIVE-16924.02.patch, 
> HIVE-16924.03.patch, HIVE-16924.04.patch, HIVE-16924.05.patch, 
> HIVE-16924.06.patch, HIVE-16924.07.patch, HIVE-16924.08.patch, 
> HIVE-16924.09.patch, HIVE-16924.10.patch, HIVE-16924.11.patch, 
> HIVE-16924.12.patch, 

[jira] [Commented] (HIVE-21389) Hive distribution miss javax.ws.rs-api.jar after HIVE-21247

2019-03-05 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784833#comment-16784833
 ] 

Thejas M Nair commented on HIVE-21389:
--

+1 pending tests

> Hive distribution miss javax.ws.rs-api.jar after HIVE-21247
> ---
>
> Key: HIVE-21389
> URL: https://issues.apache.org/jira/browse/HIVE-21389
> Project: Hive
>  Issue Type: Bug
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-21389.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16924) Support distinct in presence of Group By

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784819#comment-16784819
 ] 

Hive QA commented on HIVE-16924:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961160/HIVE-16924.16.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 15817 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_1] (batchId=92)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16346/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16346/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16346/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961160 - PreCommit-HIVE-Build

> Support distinct in presence of Group By 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-16924.01.patch, HIVE-16924.02.patch, 
> HIVE-16924.03.patch, HIVE-16924.04.patch, HIVE-16924.05.patch, 
> HIVE-16924.06.patch, HIVE-16924.07.patch, HIVE-16924.08.patch, 
> HIVE-16924.09.patch, HIVE-16924.10.patch, HIVE-16924.11.patch, 
> HIVE-16924.12.patch, HIVE-16924.13.patch, HIVE-16924.14.patch, 
> HIVE-16924.15.patch, HIVE-16924.16.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> {code}
> These queries should work:
> {code:sql}
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> {code}
> Currently, you get: 
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16924) Support distinct in presence of Group By

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784820#comment-16784820
 ] 

Hive QA commented on HIVE-16924:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961160/HIVE-16924.16.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16347/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16347/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16347/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Tests exited with: Exception: Patch URL 
https://issues.apache.org/jira/secure/attachment/12961160/HIVE-16924.16.patch 
was found in seen patch url's cache and a test was probably run already on it. 
Aborting...
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961160 - PreCommit-HIVE-Build

> Support distinct in presence of Group By 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-16924.01.patch, HIVE-16924.02.patch, 
> HIVE-16924.03.patch, HIVE-16924.04.patch, HIVE-16924.05.patch, 
> HIVE-16924.06.patch, HIVE-16924.07.patch, HIVE-16924.08.patch, 
> HIVE-16924.09.patch, HIVE-16924.10.patch, HIVE-16924.11.patch, 
> HIVE-16924.12.patch, HIVE-16924.13.patch, HIVE-16924.14.patch, 
> HIVE-16924.15.patch, HIVE-16924.16.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> {code}
> These queries should work:
> {code:sql}
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> {code}
> Currently, you get: 
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784813#comment-16784813
 ] 

Vineet Garg commented on HIVE-21338:


[~jcamachorodriguez] Would you mind taking a look at 
https://github.com/apache/hive/pull/557/

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 100
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: 100
>   Processor Tree:
> ListSink
> {code}
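
The core of such a rule can be sketched with Calcite metadata (a sketch under 
the assumption that the rule matches the Sort carrying the ORDER BY/LIMIT; 
{{RelMetadataQuery.getMaxRowCount}} is standard Calcite, the surrounding rule 
plumbing is illustrative):

{code:java}
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Sort;
import org.apache.calcite.rel.metadata.RelMetadataQuery;

class PruneSortForSingleRowSketch {
  // If the Sort's input provably yields at most one row, ORDER BY and a
  // positive LIMIT are no-ops, so the Sort can be replaced by its input.
  void onMatch(RelOptRuleCall call) {
    Sort sort = call.rel(0);
    RelNode input = sort.getInput();
    RelMetadataQuery mq = call.getMetadataQuery();
    Double maxRowCount = mq.getMaxRowCount(input);
    if (maxRowCount != null && maxRowCount <= 1.0D) {
      call.transformTo(input);
    }
  }
}
{code}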



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21338:
---
Attachment: HIVE-21338.3.patch

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 100
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: 100
>   Processor Tree:
> ListSink
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21338:
---
Status: Patch Available  (was: Open)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 100
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: 100
>   Processor Tree:
> ListSink
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?focusedWorklogId=207998=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-207998
 ]

ASF GitHub Bot logged work on HIVE-21338:
-

Author: ASF GitHub Bot
Created on: 05/Mar/19 19:33
Start Date: 05/Mar/19 19:33
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on pull request #557: HIVE-21338 
Remove order by and limit for aggregates
URL: https://github.com/apache/hive/pull/557
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 207998)
Time Spent: 10m
Remaining Estimate: 0h

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 100
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: 
> 

[jira] [Updated] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-21338:
--
Labels: pull-request-available  (was: )

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 100
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: 100
>   Processor Tree:
> ListSink
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21338) Remove order by and limit for aggregates

2019-03-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21338:
---
Status: Open  (was: Patch Available)

> Remove order by and limit for aggregates
> 
>
> Key: HIVE-21338
> URL: https://issues.apache.org/jira/browse/HIVE-21338
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21338.1.patch, HIVE-21338.2.patch, 
> HIVE-21338.3.patch
>
>
> If a query is guaranteed to produce at most one row, LIMIT and ORDER BY can 
> be removed. This saves an unnecessary vertex for LIMIT/ORDER BY.
> {code:sql}
> explain select count(*) cs from store_sales where ss_ext_sales_price > 100.00 
> order by cs limit 100
> {code}
> {code}
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   DagId: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Edges:
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>   DagName: vgarg_20190227131959_2914830f-eab6-425d-b9f0-b8cb56f8a1e9:4
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: store_sales
>   filterExpr: (ss_ext_sales_price > 100) (type: boolean)
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (ss_ext_sales_price > 100) (type: boolean)
> Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   Statistics: Num rows: 1 Data size: 112 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   sort order:
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Reducer 2
> Execution mode: vectorized
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: bigint)
>   sort order: +
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   TopN Hash Memory Usage: 0.1
> Reducer 3
> Execution mode: vectorized
> Reduce Operator Tree:
>   Select Operator
> expressions: KEY.reducesinkkey0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 120 Basic stats: COMPLETE 
> Column stats: NONE
> Limit
>   Number of rows: 100
>   Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 120 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: 
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
> Fetch Operator
>   limit: 100
>   Processor Tree:
> ListSink
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-21340:
---
   Resolution: Fixed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master.

> CBO: Prune non-key columns feeding into a SemiJoin
> --
>
> Key: HIVE-21340
> URL: https://issues.apache.org/jira/browse/HIVE-21340
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-21340.1.patch, HIVE-21340.2.patch, 
> HIVE-21340.3.patch
>
>
> {code}
> explain cbo 
> with ss as 
> (select count(1), ss_item_sk, ss_ticket_number from 
> store_sales group by ss_item_sk, ss_ticket_number 
> having count(1) > 1) 
> select count(1) from item where i_item_sk IN (select ss_item_sk from ss);
> {code}
> Notice the {{HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])}}: 
> only ss_item_sk is relevant for the HiveSemiJoin.
> {code}
> CBO PLAN:
> HiveAggregate(group=[{}], agg#0=[count()])
>   HiveSemiJoin(condition=[=($0, $1)], joinType=[inner])
> HiveProject(i_item_sk=[$0])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[tpcds_copy_orc_partitioned_1, item]], 
> table:alias=[item])
> HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])
>   HiveFilter(condition=[>($2, 1)])
> HiveAggregate(group=[{1, 8}], agg#0=[count()])
>   HiveFilter(condition=[IS NOT NULL($1)])
> HiveTableScan(table=[[tpcds_copy_orc_partitioned_1, 
> store_sales]], table:alias=[store_sales])
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-05 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784796#comment-16784796
 ] 

Vineet Garg commented on HIVE-21340:


Thanks, [~jcamachorodriguez]. Follow-up JIRA: HIVE-21395.

> CBO: Prune non-key columns feeding into a SemiJoin
> --
>
> Key: HIVE-21340
> URL: https://issues.apache.org/jira/browse/HIVE-21340
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21340.1.patch, HIVE-21340.2.patch, 
> HIVE-21340.3.patch
>
>
> {code}
> explain cbo 
> with ss as 
> (select count(1), ss_item_sk, ss_ticket_number from 
> store_sales group by ss_item_sk, ss_ticket_number 
> having count(1) > 1) 
> select count(1) from item where i_item_sk IN (select ss_item_sk from ss);
> {code}
> Notice the {{HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])}}: 
> only ss_item_sk is relevant for the HiveSemiJoin.
> {code}
> CBO PLAN:
> HiveAggregate(group=[{}], agg#0=[count()])
>   HiveSemiJoin(condition=[=($0, $1)], joinType=[inner])
> HiveProject(i_item_sk=[$0])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[tpcds_copy_orc_partitioned_1, item]], 
> table:alias=[item])
> HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])
>   HiveFilter(condition=[>($2, 1)])
> HiveAggregate(group=[{1, 8}], agg#0=[count()])
>   HiveFilter(condition=[IS NOT NULL($1)])
> HiveTableScan(table=[[tpcds_copy_orc_partitioned_1, 
> store_sales]], table:alias=[store_sales])
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21395) Refactor HiveSemiJoinRule

2019-03-05 Thread Vineet Garg (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg reassigned HIVE-21395:
--


> Refactor HiveSemiJoinRule
> -
>
> Key: HIVE-21395
> URL: https://issues.apache.org/jira/browse/HIVE-21395
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Affects Versions: 4.0.0
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>
> The following refactoring needs to be done:
> * Update the rule matching pattern to avoid using HepVertex
> * HIVE-21338 adds logic to determine whether a rel plan will produce at most 
> one row; use this in HiveSemiJoinRule



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21240) JSON SerDe Re-Write

2019-03-05 Thread David Mollitor (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784774#comment-16784774
 ] 

David Mollitor commented on HIVE-21240:
---

Hello Team,

Do you have any additional questions regarding this JIRA?

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.11.patch, 
> HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.2.patch, 
> HIVE-21240.3.patch, HIVE-21240.4.patch, HIVE-21240.5.patch, 
> HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch, 
> HIVE-24240.8.patch, kafka_storage_handler.diff
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use the Jackson tree parser instead of manually parsing (see the sketch 
> below)
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * The current JSON parser accepts, but does not apply, custom timestamp 
> formats in most cases
> * Added some unit tests
> * Added a cache for column-name to column-index searches, which are currently 
> O\(n\) per column for each row processed
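
For the first bullet, a minimal sketch of the tree-parser approach with Jackson 
(illustrative only; the real SerDe maps fields through ObjectInspectors rather 
than plain strings):

{code:java}
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

class JsonTreeParseSketch {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Parse one JSON row into a tree and read columns by name, instead of
  // hand-rolling a token-level parser.
  static Object[] parseRow(String line, String[] columnNames) throws Exception {
    if (line == null || line.trim().isEmpty()) {
      return new Object[columnNames.length]; // blank line -> all-null row
    }
    JsonNode root = MAPPER.readTree(line);
    Object[] row = new Object[columnNames.length];
    for (int i = 0; i < columnNames.length; i++) {
      JsonNode field = root.get(columnNames[i]);
      row[i] = (field == null || field.isNull()) ? null : field.asText();
    }
    return row;
  }
}
{code}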



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16716) Clean up javadoc from errors in module ql

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784751#comment-16784751
 ] 

Hive QA commented on HIVE-16716:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961155/HIVE-16716.7.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15811 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.TestMarkPartitionRemote.testMarkingPartitionSet
 (batchId=230)
org.apache.hive.hcatalog.api.repl.commands.TestCommands.org.apache.hive.hcatalog.api.repl.commands.TestCommands
 (batchId=204)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16345/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16345/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16345/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12961155 - PreCommit-HIVE-Build

> Clean up javadoc from errors in module ql
> -
>
> Key: HIVE-16716
> URL: https://issues.apache.org/jira/browse/HIVE-16716
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Janos Gub
>Assignee: Robert Kucsora
>Priority: Major
> Attachments: HIVE-16716-v2.patch, HIVE-16716.2.patch, 
> HIVE-16716.3.patch, HIVE-16716.4.patch, HIVE-16716.5.patch, 
> HIVE-16716.6.patch, HIVE-16716.7.patch, HIVE-16716.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21388) Constant UDF is not pushed to JDBCStorage Handler

2019-03-05 Thread Jason Dere (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784750#comment-16784750
 ] 

Jason Dere commented on HIVE-21388:
---

Just remove the following comment from SqlFunctionConverter:
{code:java}
// isDynamicFunction used to indicate the function is not deterministic between 
queries.
{code}

Otherwise +1

> Constant UDF is not pushed to JDBCStorage Handler
> -
>
> Key: HIVE-21388
> URL: https://issues.apache.org/jira/browse/HIVE-21388
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, StorageHandler
>Affects Versions: 4.0.0
>Reporter: Daniel Dai
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21388.01.patch, HIVE-21388.patch
>
>
> A query involving a Hive UDF which produces a constant value is not pushed to 
> the JDBC table. Replacing the UDF with a constant makes the push-down work. 
> Ideally, Hive should first do constant folding and then push the computation.
> Here is the example:
> {code}
> explain select PRINCIPAL_NAME from sys.TBL_PRIVS where 
> PRINCIPAL_NAME=current_user();
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Select Operator [SEL_3]|
> |   Output:["_col0"] |
> |   Filter Operator [FIL_2]  |
> | predicate:(_col5 = 'hrt_qa')   |
> | Select Operator [SEL_1]|
> |   Output:["_col5"] |
> |   TableScan [TS_0] |
> | Output:["principal_name"],properties:{"hive.sql.query":"SELECT 
> `tbl_grant_id`, `create_time`, `grant_option`, `grantor`, `grantor_type`, 
> `principal_name`, `principal_type`, `tbl_priv`, `tbl_id`, `authorizer`\nFROM 
> `TBL_PRIVS`","hive.sql.query.fieldNames":"tbl_grant_id,create_time,grant_option,grantor,grantor_type,principal_name,principal_type,tbl_priv,tbl_id,authorizer","hive.sql.query.fieldTypes":"bigint,int,int,string,string,string,string,string,bigint,string","hive.sql.query.split":"true"}
>  |
> ||
> ++
> {code}
> If I replace current_user() with a constant, the predicate is pushed to the 
> table scan.
> Also, setting the annotation deterministic=true and making initialize() return 
> a ConstantObjectInspector in GenericUDFCurrentUser does not make a difference.
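> 
> For reference, a hedged sketch of the approach described above (simplified 
> and from memory of Hive's UDF framework, not the actual GenericUDFCurrentUser 
> source):
> {code:java}
> import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
> import org.apache.hadoop.hive.ql.metadata.HiveException;
> import org.apache.hadoop.hive.ql.udf.UDFType;
> import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
> import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
> import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
> import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
> import org.apache.hadoop.io.Text;
> 
> @UDFType(deterministic = true)
> public class GenericUDFConstantUserDemo extends GenericUDF {
>   @Override
>   public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
>     // Returning a constant OI should allow the planner to fold the call,
>     // but per the description above it currently makes no difference.
>     return PrimitiveObjectInspectorFactory.getPrimitiveWritableConstantObjectInspector(
>         TypeInfoFactory.stringTypeInfo, new Text(System.getProperty("user.name")));
>   }
> 
>   @Override
>   public Object evaluate(DeferredObject[] args) throws HiveException {
>     return new Text(System.getProperty("user.name"));
>   }
> 
>   @Override
>   public String getDisplayString(String[] children) {
>     return "constant_user_demo()";
>   }
> }
> {code}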



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21336) HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char

2019-03-05 Thread Yongzhi Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784725#comment-16784725
 ] 

Yongzhi Chen commented on HIVE-21336:
-

For patch 2, I think you should make the changes in the upgrade scripts 
related to upgrading to 4.0.

> HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char
> --
>
> Key: HIVE-21336
> URL: https://issues.apache.org/jira/browse/HIVE-21336
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
> Attachments: HIVE-21336.2.patch, HIVE-21336.patch
>
>
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) 
> Error: ORA-01450: maximum key length (6398) exceeded (state=72000,code=1450) 
> Customer tried the same DDL in SQL Developer and got the same error. This 
> could be the result of a combination of DB-level settings, like db_block_size, 
> that limit the maximum key length, as per the doc below: 
> http://www.dba-oracle.com/t_ora_01450_maximum_key_length_exceeded.htm 
> Also, {{NLS_LENGTH_SEMANTICS}} is BYTE by default, but users can set it at 
> the session level to CHAR, thus reducing the maximum index key length. We 
> have increased the size of COLUMN_NAME from 128 to 767 (it used to be at 
> 1000) and TABLE_NAME from 128 to 256. This was done by setting 
> {code} 
> CREATE TABLE PART_COL_STATS ( 
> CS_ID NUMBER NOT NULL, 
> DB_NAME VARCHAR2(128) NOT NULL, 
> TABLE_NAME VARCHAR2(256) NOT NULL, 
> PARTITION_NAME VARCHAR2(767) NOT NULL, 
> COLUMN_NAME VARCHAR2(767) NOT NULL,  
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> {code} 
> Reproducer: 
> {code} 
> SQL*Plus: Release 11.2.0.2.0 Production on Wed Feb 27 11:02:16 2019 Copyright 
> (c) 1982, 2011, Oracle. All rights reserved. 
> Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit 
> Production 
> SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; 
> PARAMETER 
>  
> VALUE 
>  
> NLS_LENGTH_SEMANTICS 
> BYTE 
> SQL> alter session set NLS_LENGTH_SEMANTICS=CHAR; Session altered. 
> SQL> commit; Commit complete. 
> SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; 
> PARAMETER 
>  
> VALUE 
>  
> NLS_LENGTH_SEMANTICS 
> CHAR 
> SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME 
> VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME 
> VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); 
> Table created. 
> SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) 
> * ERROR at line 1: ORA-01450: maximum key length (6398) exceeded 
> SQL> alter session set NLS_LENGTH_SEMANTICS=BYTE; 
> Session altered. 
> SQL> commit; 
> Commit complete. 
> SQL> drop table PART_COL_STATS; 
> Table dropped. 
> SQL> commit; 
> Commit complete. 
> SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME 
> VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME 
> VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); 
> Table created. 
> SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> Index created. 
> SQL> commit; 
> Commit complete. 
> SQL> 
> {code}
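> 
> The arithmetic behind the error, assuming an AL32UTF8 character set (up to 4 
> bytes per character) and the ~6398-byte key limit from the 8K block size:
> {code:java}
> public class IndexKeyLengthDemo {
>   public static void main(String[] args) {
>     // DB_NAME + TABLE_NAME + COLUMN_NAME + PARTITION_NAME
>     int keyChars = 128 + 256 + 767 + 767;       // 1918 characters
>     int byteSemantics = keyChars;               // 1918 bytes with BYTE semantics
>     int charSemantics = keyChars * 4;           // 7672 bytes with CHAR semantics
>     System.out.println(byteSemantics <= 6398);  // true  -> index fits
>     System.out.println(charSemantics <= 6398);  // false -> ORA-01450
>   }
> }
> {code}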



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16716) Clean up javadoc from errors in module ql

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784712#comment-16784712
 ] 

Hive QA commented on HIVE-16716:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
20s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 
26s{color} | {color:blue} ql in master has 2251 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
10s{color} | {color:red} ql: The patch generated 5 new + 2509 unchanged - 6 
fixed = 2514 total (was 2515) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  1m  
5s{color} | {color:red} ql generated 41 new + 59 unchanged - 41 fixed = 100 
total (was 100) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 18s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16345/dev-support/hive-personality.sh
 |
| git revision | master / 3113f89 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16345/yetus/diff-checkstyle-ql.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16345/yetus/whitespace-eol.txt
 |
| javadoc | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16345/yetus/diff-javadoc-javadoc-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16345/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Clean up javadoc from errors in module ql
> -
>
> Key: HIVE-16716
> URL: https://issues.apache.org/jira/browse/HIVE-16716
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Janos Gub
>Assignee: Robert Kucsora
>Priority: Major
> Attachments: HIVE-16716-v2.patch, HIVE-16716.2.patch, 
> HIVE-16716.3.patch, HIVE-16716.4.patch, HIVE-16716.5.patch, 
> HIVE-16716.6.patch, HIVE-16716.7.patch, HIVE-16716.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-17668) Push filter clauses through PTF(Windowing) does not work in some cases

2019-03-05 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-17668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17668:
---
Attachment: HIVE-17668.01.patch

> Push filter clauses through PTF(Windowing) does not work in some cases
> --
>
> Key: HIVE-17668
> URL: https://issues.apache.org/jira/browse/HIVE-17668
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 3.0.0, 2.4.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-17668.01.patch, HIVE-17668.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-12192) Hive should carry out timestamp computations in UTC

2019-03-05 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-12192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12192:
---
Docs Text: 
https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types

> Hive should carry out timestamp computations in UTC
> ---
>
> Key: HIVE-12192
> URL: https://issues.apache.org/jira/browse/HIVE-12192
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Ryan Blue
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
>  Labels: timestamp
> Fix For: 3.1.0
>
> Attachments: HIVE-12192.01.patch, HIVE-12192.02.patch, 
> HIVE-12192.03.patch, HIVE-12192.04.patch, HIVE-12192.05.patch, 
> HIVE-12192.06.patch, HIVE-12192.07.patch, HIVE-12192.08.patch, 
> HIVE-12192.09.patch, HIVE-12192.10.patch, HIVE-12192.11.patch, 
> HIVE-12192.12.patch, HIVE-12192.13.patch, HIVE-12192.14.patch, 
> HIVE-12192.15.patch, HIVE-12192.16.patch, HIVE-12192.17.patch, 
> HIVE-12192.18.patch, HIVE-12192.19.patch, HIVE-12192.20.patch, 
> HIVE-12192.21.patch, HIVE-12192.22.patch, HIVE-12192.23.patch, 
> HIVE-12192.24.patch, HIVE-12192.25.patch, HIVE-12192.26.patch, 
> HIVE-12192.27.patch, HIVE-12192.28.patch, HIVE-12192.patch
>
>
> Hive currently uses the "local" time of a java.sql.Timestamp to represent the 
> SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use 
> {{Timestamp#getYear()}} and similar methods to implement SQL functions like 
> {{year}}.
> When the SQL session's time zone is a DST zone, such as America/Los_Angeles 
> that alternates between PST and PDT, there are times that cannot be 
> represented because the effective zone skips them.
> {code}
> hive> select TIMESTAMP '2015-03-08 02:10:00.101';
> 2015-03-08 03:10:00.101
> {code}
> Using UTC instead of the SQL session time zone as the underlying zone for a 
> java.sql.Timestamp avoids this bug, while still returning correct values for 
> {{getYear}} etc. Using UTC as the convenience representation (timestamp 
> without time zone has no real zone) would make timestamp calculations more 
> consistent and avoid similar problems in the future.
> Notably, this would break the {{unix_timestamp}} UDF that specifies the 
> result is with respect to ["the default timezone and default 
> locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions].
>  That function would need to be updated to use the 
> {{System.getProperty("user.timezone")}} zone.
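> 
> A minimal sketch of the skipped-hour behavior (the JVM default zone is 
> pinned here only for the demo):
> {code:java}
> import java.sql.Timestamp;
> import java.util.TimeZone;
> 
> public class DstGapDemo {
>   public static void main(String[] args) {
>     TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
>     // 02:10 on 2015-03-08 falls in the spring-forward gap (02:00-03:00 does
>     // not exist locally), so the lenient calendar rolls it forward an hour.
>     Timestamp ts = Timestamp.valueOf("2015-03-08 02:10:00.101");
>     System.out.println(ts);  // prints 2015-03-08 03:10:00.101
>   }
> }
> {code}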



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21336) HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char

2019-03-05 Thread Naveen Gangam (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-21336:
-
Attachment: HIVE-21336.2.patch
Status: Patch Available  (was: Open)

Also including a fix for the upgrade script, since COLUMN_NAME went from 128 
to 767/1000.

> HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char
> --
>
> Key: HIVE-21336
> URL: https://issues.apache.org/jira/browse/HIVE-21336
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
> Attachments: HIVE-21336.2.patch, HIVE-21336.patch
>
>
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) 
> Error: ORA-01450: maximum key length (6398) exceeded (state=72000,code=1450) 
> Customer tried the same DDL in SQL Developer and got the same error. This 
> could be the result of a combination of DB-level settings, like db_block_size, 
> that limit the maximum key length, as per the doc below: 
> http://www.dba-oracle.com/t_ora_01450_maximum_key_length_exceeded.htm 
> Also, {{NLS_LENGTH_SEMANTICS}} is BYTE by default, but users can set it at 
> the session level to CHAR, thus reducing the maximum index key length. We 
> have increased the size of COLUMN_NAME from 128 to 767 (it used to be at 
> 1000) and TABLE_NAME from 128 to 256. This was done by setting 
> {code} 
> CREATE TABLE PART_COL_STATS ( 
> CS_ID NUMBER NOT NULL, 
> DB_NAME VARCHAR2(128) NOT NULL, 
> TABLE_NAME VARCHAR2(256) NOT NULL, 
> PARTITION_NAME VARCHAR2(767) NOT NULL, 
> COLUMN_NAME VARCHAR2(767) NOT NULL,  
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> {code} 
> Reproducer: 
> {code} 
> SQL*Plus: Release 11.2.0.2.0 Production on Wed Feb 27 11:02:16 2019 Copyright 
> (c) 1982, 2011, Oracle. All rights reserved. 
> Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit 
> Production 
> SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; 
> PARAMETER 
>  
> VALUE 
>  
> NLS_LENGTH_SEMANTICS 
> BYTE 
> SQL> alter session set NLS_LENGTH_SEMANTICS=CHAR; Session altered. 
> SQL> commit; Commit complete. 
> SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; 
> PARAMETER 
>  
> VALUE 
>  
> NLS_LENGTH_SEMANTICS 
> CHAR 
> SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME 
> VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME 
> VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); 
> Table created. 
> SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) 
> * ERROR at line 1: ORA-01450: maximum key length (6398) exceeded 
> SQL> alter session set NLS_LENGTH_SEMANTICS=BYTE; 
> Session altered. 
> SQL> commit; 
> Commit complete. 
> SQL> drop table PART_COL_STATS; 
> Table dropped. 
> SQL> commit; 
> Commit complete. 
> SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME 
> VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME 
> VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); 
> Table created. 
> SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> Index created. 
> SQL> commit; 
> Commit complete. 
> SQL> 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21336) HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char

2019-03-05 Thread Naveen Gangam (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-21336:
-
Status: Open  (was: Patch Available)

> HMS Index PCS_STATS_IDX too long for Oracle when NLS_LENGTH_SEMANTICS=char
> --
>
> Key: HIVE-21336
> URL: https://issues.apache.org/jira/browse/HIVE-21336
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
> Attachments: HIVE-21336.patch
>
>
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) 
> Error: ORA-01450: maximum key length (6398) exceeded (state=72000,code=1450) 
> Customer tried the same DDL in SQL Developer and got the same error. This 
> could be the result of a combination of DB-level settings, like db_block_size, 
> that limit the maximum key length, as per the doc below: 
> http://www.dba-oracle.com/t_ora_01450_maximum_key_length_exceeded.htm 
> Also, {{NLS_LENGTH_SEMANTICS}} is BYTE by default, but users can set it at 
> the session level to CHAR, thus reducing the maximum index key length. We 
> have increased the size of COLUMN_NAME from 128 to 767 (it used to be at 
> 1000) and TABLE_NAME from 128 to 256. This was done by setting 
> {code} 
> CREATE TABLE PART_COL_STATS ( 
> CS_ID NUMBER NOT NULL, 
> DB_NAME VARCHAR2(128) NOT NULL, 
> TABLE_NAME VARCHAR2(256) NOT NULL, 
> PARTITION_NAME VARCHAR2(767) NOT NULL, 
> COLUMN_NAME VARCHAR2(767) NOT NULL,  
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> {code} 
> Reproducer: 
> {code} 
> SQL*Plus: Release 11.2.0.2.0 Production on Wed Feb 27 11:02:16 2019 Copyright 
> (c) 1982, 2011, Oracle. All rights reserved. 
> Connected to: Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit 
> Production 
> SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; 
> PARAMETER 
>  
> VALUE 
>  
> NLS_LENGTH_SEMANTICS 
> BYTE 
> SQL> alter session set NLS_LENGTH_SEMANTICS=CHAR; Session altered. 
> SQL> commit; Commit complete. 
> SQL> select * from v$nls_parameters where parameter = 'NLS_LENGTH_SEMANTICS'; 
> PARAMETER 
>  
> VALUE 
>  
> NLS_LENGTH_SEMANTICS 
> CHAR 
> SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME 
> VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME 
> VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); 
> Table created. 
> SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) 
> * ERROR at line 1: ORA-01450: maximum key length (6398) exceeded 
> SQL> alter session set NLS_LENGTH_SEMANTICS=BYTE; 
> Session altered. 
> SQL> commit; 
> Commit complete. 
> SQL> drop table PART_COL_STATS; 
> Table dropped. 
> SQL> commit; 
> Commit complete. 
> SQL> CREATE TABLE PART_COL_STATS (CS_ID NUMBER NOT NULL, DB_NAME 
> VARCHAR2(128) NOT NULL, TABLE_NAME VARCHAR2(256) NOT NULL, PARTITION_NAME 
> VARCHAR2(767) NOT NULL, COLUMN_NAME VARCHAR2(767) NOT NULL); 
> Table created. 
> SQL> CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
> (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME); 
> Index created. 
> SQL> commit; 
> Commit complete. 
> SQL> 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20007) Hive should carry out timestamp computations in UTC

2019-03-05 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-20007:
---
Docs Text: 
https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types

> Hive should carry out timestamp computations in UTC
> ---
>
> Key: HIVE-20007
> URL: https://issues.apache.org/jira/browse/HIVE-20007
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Ryan Blue
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
>  Labels: timestamp
> Fix For: 4.0.0
>
> Attachments: HIVE-20007.patch
>
>
> Hive currently uses the "local" time of a java.sql.Timestamp to represent the 
> SQL data type TIMESTAMP WITHOUT TIME ZONE. The purpose is to be able to use 
> {{Timestamp#getYear()}} and similar methods to implement SQL functions like 
> {{year}}.
> When the SQL session's time zone is a DST zone, such as America/Los_Angeles 
> that alternates between PST and PDT, there are times that cannot be 
> represented because the effective zone skips them.
> {code}
> hive> select TIMESTAMP '2015-03-08 02:10:00.101';
> 2015-03-08 03:10:00.101
> {code}
> Using UTC instead of the SQL session time zone as the underlying zone for a 
> java.sql.Timestamp avoids this bug, while still returning correct values for 
> {{getYear}} etc. Using UTC as the convenience representation (timestamp 
> without time zone has no real zone) would make timestamp calculations more 
> consistent and avoid similar problems in the future.
> Notably, this would break the {{unix_timestamp}} UDF that specifies the 
> result is with respect to ["the default timezone and default 
> locale"|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions].
>  That function would need to be updated to use the 
> {{System.getProperty("user.timezone")}} zone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21376) Incompatible change in Hive bucket computation

2019-03-05 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-21376:
---
Target Version/s: 4.0.0, 3.2.0, 3.1.2  (was: 3.0.1, 4.0.0, 3.2.0, 3.1.2)

> Incompatible change in Hive bucket computation
> --
>
> Key: HIVE-21376
> URL: https://issues.apache.org/jira/browse/HIVE-21376
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: David Phillips
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21376.01.patch, HIVE-21376.patch
>
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code 
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the 
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses 
> {{daysSinceEpoch}} as the hash code. It is now computed using 
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} 
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses 
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation, 
> but there are two important differences:
>  # {{TimestampWritable}} rounds the number of milliseconds into the seconds 
> portion of the computation, but {{TimestampWritableV2}} does not.
>  # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, 
> which returns it relative to the JVM time zone, not UTC. 
> {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify if this actually 
> causes data to be read or written incorrectly (there may be code above this 
> library method which makes things work correctly). However, if my 
> understanding is correct, this means Hive 3.1 is both forwards and backwards 
> incompatible with bucketed tables using either of these data types. It also 
> indicates that Hive needs tests to verify that the hash code does not change 
> between releases.
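> 
> A minimal sketch of the {{DATE}} half of the incompatibility, assuming the 
> hash inputs described above:
> {code:java}
> import java.time.LocalDate;
> 
> public class DateBucketHashDemo {
>   public static void main(String[] args) {
>     LocalDate d = LocalDate.of(2019, 3, 5);
>     // Old behavior (DateWritable): hash code == days since epoch.
>     int oldHash = (int) d.toEpochDay();      // 17960
>     // New behavior (DateWritableV2): LocalDate#hashCode, which mixes
>     // year/month/day bits and is not days since epoch.
>     int newHash = d.hashCode();
>     System.out.println(oldHash == newHash);  // false -> different buckets
>   }
> }
> {code}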



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21376) Incompatible change in Hive bucket computation

2019-03-05 Thread Jesus Camacho Rodriguez (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-21376:
---
Attachment: HIVE-21376.01.patch

> Incompatible change in Hive bucket computation
> --
>
> Key: HIVE-21376
> URL: https://issues.apache.org/jira/browse/HIVE-21376
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: David Phillips
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-21376.01.patch, HIVE-21376.patch
>
>
> HIVE-20007 seems to have inadvertently changed the bucket hash code 
> computation via {{ObjectInspectorUtils.getBucketHashCodeOld()}} for the 
> {{DATE}} and {{TIMESTAMP}} data types.
> {{DATE}} was previously computed using {{DateWritable}}, which uses 
> {{daysSinceEpoch}} as the hash code. It is now computed using 
> {{DateWritableV2}}, which uses the hash code of {{java.time.LocalDate}} 
> (which is not days since epoch).
> {{TIMESTAMP}} was previously computed using {{TimestampWritable}} and now uses 
> {{TimestampWritableV2}}. They ostensibly use the same hash code computation, 
> but there are two important differences:
>  # {{TimestampWritable}} rounds the number of milliseconds into the seconds 
> portion of the computation, but {{TimestampWritableV2}} does not.
>  # {{TimestampWritable}} gets the epoch time from {{java.sql.Timestamp}}, 
> which returns it relative to the JVM time zone, not UTC. 
> {{TimestampWritableV2}} uses a {{LocalDateTime}} relative to UTC.
> I was unable to get Hive 3.1 running in order to verify if this actually 
> causes data to be read or written incorrectly (there may be code above this 
> library method which makes things work correctly). However, if my 
> understanding is correct, this means Hive 3.1 is both forwards and backwards 
> incompatible with bucketed tables using either of these data types. It also 
> indicates that Hive needs tests to verify that the hash code does not change 
> between releases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16924) Support distinct in presence of Group By

2019-03-05 Thread Miklos Gergely (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784647#comment-16784647
 ] 

Miklos Gergely commented on HIVE-16924:
---

Created HIVE-21394 for the issue that [~kgyrtkirk] mentioned above.

> Support distinct in presence of Group By 
> -
>
> Key: HIVE-16924
> URL: https://issues.apache.org/jira/browse/HIVE-16924
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Planning
>Reporter: Carter Shanklin
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-16924.01.patch, HIVE-16924.02.patch, 
> HIVE-16924.03.patch, HIVE-16924.04.patch, HIVE-16924.05.patch, 
> HIVE-16924.06.patch, HIVE-16924.07.patch, HIVE-16924.08.patch, 
> HIVE-16924.09.patch, HIVE-16924.10.patch, HIVE-16924.11.patch, 
> HIVE-16924.12.patch, HIVE-16924.13.patch, HIVE-16924.14.patch, 
> HIVE-16924.15.patch, HIVE-16924.16.patch
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> {code:sql}
> create table e011_01 (c1 int, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> {code}
> These queries should work:
> {code:sql}
> select distinct c1, count(*) from e011_01 group by c1;
> select distinct c1, avg(c2) from e011_01 group by c1;
> {code}
> Currently, you get: 
> FAILED: SemanticException 1:52 SELECT DISTINCT and GROUP BY can not be in the 
> same query. Error encountered near token 'c1'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21340) CBO: Prune non-key columns feeding into a SemiJoin

2019-03-05 Thread Jesus Camacho Rodriguez (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784633#comment-16784633
 ] 

Jesus Camacho Rodriguez commented on HIVE-21340:


RB link did not work (permission denied), but I went through the patch and LGTM.

+1

[~vgarg], please create the follow-up to avoid using HepVertex; that will also 
help move this change to Calcite eventually.

> CBO: Prune non-key columns feeding into a SemiJoin
> --
>
> Key: HIVE-21340
> URL: https://issues.apache.org/jira/browse/HIVE-21340
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Affects Versions: 4.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
>Priority: Major
> Attachments: HIVE-21340.1.patch, HIVE-21340.2.patch, 
> HIVE-21340.3.patch
>
>
> {code}
> explain cbo 
> with ss as 
> (select count(1), ss_item_sk, ss_ticket_number from 
> store_sales group by ss_item_sk, ss_ticket_number 
> having count(1) > 1) 
> select count(1) from item where i_item_sk IN (select ss_item_sk from ss);
> {code}
> Notice the {{HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])}}. 
> Only ss_item_sk is relevant for the HiveSemiJoin.
> {code}
> CBO PLAN:
> HiveAggregate(group=[{}], agg#0=[count()])
>   HiveSemiJoin(condition=[=($0, $1)], joinType=[inner])
> HiveProject(i_item_sk=[$0])
>   HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[tpcds_copy_orc_partitioned_1, item]], 
> table:alias=[item])
> HiveProject(ss_item_sk=[$0], ss_ticket_number=[$1], $f2=[$2])
>   HiveFilter(condition=[>($2, 1)])
> HiveAggregate(group=[{1, 8}], agg#0=[count()])
>   HiveFilter(condition=[IS NOT NULL($1)])
> HiveTableScan(table=[[tpcds_copy_orc_partitioned_1, 
> store_sales]], table:alias=[store_sales])
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20568) There is no need to convert the dbname to pattern while pulling tablemeta

2019-03-05 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HIVE-20568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784649#comment-16784649
 ] 

Clément Stenac commented on HIVE-20568:
---

Hi,

If I am not mistaken, this fix is not only a minor enhancement but a real bug 
fix that impacts the ability to enumerate tables across schemas.

We have discovered that in Hortonworks HDP 3.1 (which ships with Hive 3.1.0), 
when Ranger authorization is in effect, the "getTables" call of the CLIService 
(accessed through the JDBC driver) will fail to return any table matching the 
search pattern if the database name contains underscores.

For example, getTables(null, null, "%", null), which should return tables from 
the whole metastore, will not return the ones from databases with underscores.

The reason is that when the Ranger authorizer is called, the dbName in the 
HivePrivObject has been mangled from my_db_with_underscores to 
my.db.with.underscores, and the authorization rule will fail to match.
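
To illustrate (a hypothetical stand-in for the identifier-to-pattern 
conversion, not Hive's actual helper), SQL LIKE wildcards '%' and '_' are 
mapped to the regex equivalents '.*' and '.', so a literal underscore in a 
database name silently becomes a regex dot:

{code:java}
public class PatternMangleDemo {
  public static void main(String[] args) {
    String dbName = "my_db_with_underscores";
    // Hypothetical sketch of the wildcard conversion described above:
    String pattern = dbName.replaceAll("%", ".*").replaceAll("_", ".");
    System.out.println(pattern);  // my.db.with.underscores
  }
}
{code}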

My understanding is that this was introduced as part of HIVE-19432, when we 
started converting the dbName to a pattern *after* retrieving it from the 
metastore.

The next call to the metastoreClient will contain the mangled name. The call to 
the metastore itself will succeed, because the pattern matches, but the 
HiveMetastoreClient will also call FilterUtils.filterTableNamesIfEnabled with 
the wrong dbName, hence the authorizer will filter everything out.

Is our interpretation correct?

Thanks,

> There is no need to convert the dbname to pattern while pulling tablemeta
> -
>
> Key: HIVE-20568
> URL: https://issues.apache.org/jira/browse/HIVE-20568
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 0.4.0
> Environment: Hive-4,Java-8
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-20568.patch
>
>
> there is no need to convert the dbname to a pattern; dbNamePattern is just a 
> dbName which we are passing to getTableMeta
> https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java#L117



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21283) Create Synonym mid for substr, position for locate

2019-03-05 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784646#comment-16784646
 ] 

Hive QA commented on HIVE-21283:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12961153/HIVE.21283.03.PATCH

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16344/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16344/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16344/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-03-05 16:52:24.327
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-16344/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-03-05 16:52:24.331
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   fc3eefa..3113f89  master -> origin/master
+ git reset --hard HEAD
HEAD is now at fc3eefa HIVE-21312: FSStatsAggregator::connect is slow (Rajesh 
Balamohan, reviewed by Zoltan Haindrich)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 3113f89 HIVE-21384: Upgrade to dbcp2 in JDBC storage handler 
(Jesus Camacho Rodriguez, reviewed by Daniel Dai)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-03-05 16:52:25.922
+ rm -rf ../yetus_PreCommit-HIVE-Build-16344
+ mkdir ../yetus_PreCommit-HIVE-Build-16344
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-16344
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-16344/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: git apply -p0
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
protoc-jar: executing: [/tmp/protoc5967532920056350080.exe, --version]
libprotoc 2.5.0
protoc-jar: executing: [/tmp/protoc5967532920056350080.exe, 
-I/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore,
 
--java_out=/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/target/generated-sources,
 
/data/hiveptest/working/apache-github-source-source/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/metastore.proto]
ANTLR Parser Generator  Version 3.5.2
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process 
(process-resource-bundles) on project hive-pre-upgrade: Execution 
process-resource-bundles of goal 
org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process failed. 
ConcurrentModificationException -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hive-pre-upgrade
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-16344
+ exit 1
'
{noformat}

This message is automatically generated.

[jira] [Updated] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21286:

Attachment: (was: HIVE-21286.02.patch)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> will have to manually drop all the external tables before trying with another 
> bootstrap dump. For a full bootstrap, to retry with a different dump, we 
> suggested the user drop the DB, but in this case they need to manually drop 
> all the external tables, which is not so user friendly. So, this needs to be 
> handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous 
> dump. 
> hive.repl.clean.tables.from.bootstrap=
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> Caution to be taken by the user: this config should not be passed if the 
> previous REPL LOAD (with bootstrap) was successful or any successful 
> incremental dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21286:

Attachment: HIVE-21286.02.patch

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> will have to manually drop all the external tables before trying with another 
> bootstrap dump. For a full bootstrap, to retry with a different dump, we 
> suggested the user drop the DB, but in this case they need to manually drop 
> all the external tables, which is not so user friendly. So, this needs to be 
> handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous 
> dump. 
> hive.repl.clean.tables.from.bootstrap=
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> Caution to be taken by the user: this config should not be passed if the 
> previous REPL LOAD (with bootstrap) was successful or any successful 
> incremental dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21286) Hive should support clean-up of previously bootstrapped tables when retry from different dump.

2019-03-05 Thread Sankar Hariappan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-21286:

Status: Patch Available  (was: Open)

> Hive should support clean-up of previously bootstrapped tables when retry 
> from different dump.
> --
>
> Key: HIVE-21286
> URL: https://issues.apache.org/jira/browse/HIVE-21286
> Project: Hive
>  Issue Type: Bug
>  Components: repl
>Affects Versions: 4.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: DR, Replication, pull-request-available
> Attachments: HIVE-21286.01.patch, HIVE-21286.02.patch
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> If external tables are enabled for replication on an existing repl policy, 
> then bootstrapping of external tables is combined with the incremental dump.
> If the incremental bootstrap load fails with a non-retryable error, the user 
> will have to manually drop all the external tables before trying with another 
> bootstrap dump. For a full bootstrap, to retry with a different dump, we 
> suggested the user drop the DB, but in this case they need to manually drop 
> all the external tables, which is not so user friendly. So, this needs to be 
> handled on the Hive side as follows.
> REPL LOAD takes an additional config (passed by the user in the WITH clause) 
> that says: drop all the tables which were bootstrapped from the previous 
> dump. 
> hive.repl.clean.tables.from.bootstrap=
> Hive will use this config only if the current dump is a bootstrap dump or a 
> combined bootstrap in an incremental dump.
> Caution to be taken by the user: this config should not be passed if the 
> previous REPL LOAD (with bootstrap) was successful or any successful 
> incremental dump+load happened after "previous_bootstrap_dump_dir".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

