[jira] [Commented] (HIVE-16511) CBO looses inner casts on constants of complex type

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190843#comment-16190843
 ] 

Hive QA commented on HIVE-16511:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12890248/HIVE-16511.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11199 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=232)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=232)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=101)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] 
(batchId=240)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=203)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7107/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7107/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7107/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12890248 - PreCommit-HIVE-Build

> CBO looses inner casts on constants of complex type
> ---
>
> Key: HIVE-16511
> URL: https://issues.apache.org/jira/browse/HIVE-16511
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Reporter: Ashutosh Chauhan
>Assignee: Vineet Garg
> Attachments: HIVE-16511.1.patch, HIVE-16511.2.patch
>
>
> type for map <10, cast(null as int)> becomes map 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16674) Hive metastore JVM dumps core

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190791#comment-16190791
 ] 

Hive QA commented on HIVE-16674:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12869451/HIVE-16674.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7106/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7106/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7106/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-10-04 04:30:22.219
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-7106/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-10-04 04:30:22.222
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   073e847..d376157  master -> origin/master
   6401597..14c9482  hive-14535 -> origin/hive-14535
+ git reset --hard HEAD
HEAD is now at 073e847 HIVE-17432: Enable join and aggregate materialized view 
rewriting (Jesus Camacho Rodriguez, reviewed by Ashutosh Chauhan)
+ git clean -f -d
Removing standalone-metastore/src/gen/org/
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at d376157 HIVE-17682: Vectorization: IF stmt produces wrong 
results (Matt McCline, reviewed by Gopal Vijayaraghavan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-10-04 04:30:23.794
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java: 
No such file or directory
error: a/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java: 
No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}
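The failure above ends with "The patch does not appear to apply with p0, p1, or p2", i.e. the helper script probes several strip levels before giving up. A rough sketch of that probing follows; the function name and flags are illustrative and assume GNU patch, not the actual contents of smart-apply-patch.sh:

```shell
# Illustrative sketch of strip-level probing, not the real smart-apply-patch.sh.
# Assumes GNU patch (--dry-run, -s silent, -f no prompts).
try_apply() {
  local patch_file=$1
  local strip
  for strip in 0 1 2; do
    # --dry-run checks applicability without touching the tree
    if patch --dry-run -p"$strip" -s -f < "$patch_file" > /dev/null 2>&1; then
      patch -p"$strip" -s -f < "$patch_file"
      echo "applied with -p$strip"
      return 0
    fi
  done
  echo "The patch does not appear to apply with p0, p1, or p2" >&2
  return 1
}
```

In the log above the probe fails at every strip level because the patch references metastore paths that no longer exist on master, so the script exits 1 before any tests run.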

This message is automatically generated.

ATTACHMENT ID: 12869451 - PreCommit-HIVE-Build

> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
>Priority: Blocker
> Fix For: 1.2.1, 3.0.0
>
> Attachments: HIVE-16674.1.patch, HIVE-16674.patch
>
>
> While trying to run a Hive query on 24 partitions of an external table with 
> a large number of partitions (4K+), I get the following error:
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> 

[jira] [Commented] (HIVE-17098) Race condition in Hbase tables

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190790#comment-16190790
 ] 

Hive QA commented on HIVE-17098:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12877546/HIVE-17098.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11197 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=232)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=232)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query39] 
(batchId=242)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=240)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=203)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7105/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7105/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7105/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12877546 - PreCommit-HIVE-Build

> Race condition in Hbase tables
> --
>
> Key: HIVE-17098
> URL: https://issues.apache.org/jira/browse/HIVE-17098
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 2.1.1
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17098.1.patch
>
>
> These steps simulate our customer production env.
> *STEP 1. Create test tables*
> {code}
> CREATE TABLE for_loading(
>   key int, 
>   value string,
>   age int,
>   salary decimal (10,2)
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> {code}
> {code}
> CREATE TABLE test_1(
>   key int, 
>   value string,
>   age int,
>   salary decimal (10,2)
> )
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 
>   'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ( 
>   'hbase.columns.mapping'=':key, cf1:value, cf1:age, cf1:salary', 
>   'serialization.format'='1')
> TBLPROPERTIES (
>   'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
>   'hbase.table.name'='test_1', 
>   'numFiles'='0', 
>   'numRows'='0', 
>   'rawDataSize'='0', 
>   'totalSize'='0', 
>   'transient_lastDdlTime'='1495769316');
> {code}
> {code}
> CREATE TABLE test_2(
>   key int, 
>   value string,
>   age int,
>   salary decimal (10,2)
> )
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 
>   'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ( 
>   'hbase.columns.mapping'=':key, cf1:value, cf1:age, cf1:salary', 
>   'serialization.format'='1')
> TBLPROPERTIES (
>   'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
>   'hbase.table.name'='test_2', 
>   'numFiles'='0', 
>   'numRows'='0', 
>   'rawDataSize'='0', 
>   'totalSize'='0', 
>   'transient_lastDdlTime'='1495769316');
> {code}
> *STEP 2. Create test data*
> {code}
> import java.io.IOException;
> import java.math.BigDecimal;
> import java.nio.charset.Charset;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import java.nio.file.StandardOpenOption;
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
> import java.util.Random;
> import static java.lang.String.format;
> public class Generator {
> private static List<String> lines = new ArrayList<>();
> private static List<String> name = Arrays.asList("Brian", "John", 
> "Rodger", "Max", "Freddie", "Albert", "Fedor", "Lev", "Niccolo");
> private static List<BigDecimal> salary = new ArrayList<>();
> public static void main(String[] args) {
> generateData(Integer.parseInt(args[0]), args[1]);
> }
> public static void generateData(int rowNumber, String file) {
> double maxValue = 2.55;
> double minValue = 1000.03;
> Random random = new Random();
> for (int i = 1; i <= rowNumber; i++) {
> 

[jira] [Commented] (HIVE-17657) Does ExIm for MM tables work?

2017-10-03 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190780#comment-16190780
 ] 

Eugene Koifman commented on HIVE-17657:
---


Does this test have any aborted transactions?
Is the data that belongs to aborted txns added to the archive?

Does it try to import into a different cluster where the transaction id 
sequence is different, e.g. where in the original cluster txn 5 was committed 
at the time of export but in the new cluster txn 5 is aborted?



> Does ExIm for MM tables work?
> -
>
> Key: HIVE-17657
> URL: https://issues.apache.org/jira/browse/HIVE-17657
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> there is mm_exim.q but it's not clear from the tests what file structure it 
> creates 
> On import the txnids in the directory names would have to be remapped if 
> importing to a different cluster.  Perhaps export can be smart and export 
> highest base_x and accretive deltas (minus aborted ones).  Then import can 
> ...?  It would have to remap txn ids from the archive to new txn ids.  This 
> would then mean that import is made up of several transactions rather than 1 
> atomic op.  (all locks must belong to a transaction)
> One possibility is to open a new txn for each dir in the archive (where 
> start/end txn of file name is the same) and commit all of them at once (need 
> new TMgr API for that).  This assumes using a shared lock (if any!) and thus 
> allows other inserts (not related to import) to occur.
> What if you have delta_6_9, such as a result of concatenate?  If we stipulate 
> that this must mean that there is no delta_6_6 or any other "obsolete" delta 
> in the archive we can map it to a new single txn delta_x_x.
> Add read_only mode for tables (useful in general, may be needed for upgrade 
> etc) and use that to make the above atomic.
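The per-directory remapping idea above (open a new txn for each delta_x_x dir in the archive) could be sketched roughly as follows. This is a hypothetical illustration only: the directory layout is simplified, and next_txn stands in for a real txn-allocation API; compacted deltas like delta_6_9 are rejected, matching the stipulation above:

```shell
# Hypothetical sketch of remapping archived delta dirs to new txn ids on
# import. next_txn is a stand-in for a real txn allocation API; only
# single-txn dirs (start == end) are handled, per the discussion above.
next_txn=100                       # pretend the target cluster is at txn 100
remap_deltas() {
  local archive_dir=$1 target_dir=$2 d start end
  for d in "$archive_dir"/delta_*_*; do
    [ -d "$d" ] || continue
    start=${d##*/}; start=${start#delta_}
    end=${start#*_}; start=${start%_*}
    if [ "$start" != "$end" ]; then
      echo "cannot remap compacted delta ${d##*/}" >&2
      return 1
    fi
    next_txn=$((next_txn + 1))     # open a new txn for this directory
    cp -r "$d" "$target_dir/delta_${next_txn}_${next_txn}"
  done
}
```

Committing all of the allocated txns at once would still need the new TMgr API mentioned above.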





[jira] [Commented] (HIVE-17628) always use fully qualified path for tables/partitions/etc.

2017-10-03 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190772#comment-16190772
 ] 

Lefty Leverenz commented on HIVE-17628:
---

How about both CREATE and ALTER?

* [Create Table | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable]
 -- add to notes after the syntax
* [Alter Table/Partition/Column | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Partition/Column]
** [Add Partitions | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AddPartitions]
** [Alter Table/Partition Location | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionLocation]

What about CREATE and ALTER DATABASE?

* [Create/Drop/Alter/Use Database | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Alter/UseDatabase]

There may be more DDL sections where this should be mentioned.

Does this also apply to LOAD and INSERT?  IMPORT/EXPORT?  SELECT?

* [DML (Load and Insert) | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-HiveDataManipulationLanguage]
* [Import/Export | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport]
* [Select Syntax | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-SelectSyntax]

> always use fully qualified path for tables/partitions/etc.
> --
>
> Key: HIVE-17628
> URL: https://issues.apache.org/jira/browse/HIVE-17628
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-17628.01.patch, HIVE-17628.patch
>
>
> # Different services, or the same one at different times, may have different 
> default FS, so it doesn't make sense to persist a non-qualified path.
> # The logic to detect whether we are using the default FS or not is rather 
> questionable anyway, e.g. it will run as long as the setting is set, even if 
> it is set to the same value as the default FS.
> # In fact, it might be more expensive than just making the path qualified, as 
> it iterates through all the properties, including the ones added from 
> getConfVarInputStream.
> # It also hits HADOOP-13500.
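The intent above, always persisting a fully qualified path, amounts to something like the following sketch; the default-FS URI is made up for illustration:

```shell
# Illustrative sketch of the point above: prepend the default FS to any
# path that lacks a scheme, so the persisted value does not depend on
# which service later reads it. The default FS URI here is made up.
DEFAULT_FS='hdfs://namenode:8020'
qualify() {
  case $1 in
    *://*) printf '%s\n' "$1" ;;                 # already fully qualified
    /*)    printf '%s%s\n' "$DEFAULT_FS" "$1" ;; # absolute path, add scheme
    *)     echo "relative path not supported: $1" >&2; return 1 ;;
  esac
}
```

A path qualified once at write time stays correct even if a different service, or the same one restarted with another default FS, reads it later.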





[jira] [Updated] (HIVE-17608) REPL LOAD should overwrite the data files if exists instead of duplicating it

2017-10-03 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17608:

Status: Patch Available  (was: Open)

> REPL LOAD should overwrite the data files if exists instead of duplicating it
> -
>
> Key: HIVE-17608
> URL: https://issues.apache.org/jira/browse/HIVE-17608
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17608.01.patch, HIVE-17608.02.patch
>
>
> This is to make the insert event idempotent.
> Currently, MoveTask creates a new file if the destination folder already 
> contains a file of the same name. This is wrong: if the same file appears in 
> both the bootstrap dump and the incremental dump (by design, a duplicate file 
> in the incremental dump is ignored for idempotency reasons), we eventually 
> end up with duplicate files. It is also wrong to just retain the filename 
> from the staging folder. Suppose we get the same insert event twice: the 
> first time we get the file from the source table folder, the second time 
> from the cm, and we still end up with a duplicate copy. The right solution 
> is to keep the same file name as in the source table folder.
> To do that, we can put the original filename in MoveWork, and in MoveTask, 
> if the original filename is set, skip generating a new name and simply 
> overwrite. We need to do this in both bootstrap and incremental load.
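The overwrite-instead-of-rename behavior described above can be sketched as follows; the function and the copy-suffix naming are illustrative, not the actual MoveTask code:

```shell
# Sketch of the idempotent-move idea above: when the original filename is
# known, keep it and overwrite the destination instead of inventing a new
# name. Function and naming scheme here are illustrative only.
move_file() {
  local src=$1 dest_dir=$2 orig_name=$3
  if [ -n "$orig_name" ]; then
    # Replication case: keep the source name and overwrite if present,
    # so replaying the same insert event twice leaves a single copy.
    cp -f "$src" "$dest_dir/$orig_name"
  else
    # Legacy case: avoid clobbering by generating a fresh name.
    local name=$(basename "$src") n=1
    while [ -e "$dest_dir/$name" ]; do
      name="$(basename "$src")_copy_$n"; n=$((n + 1))
    done
    cp "$src" "$dest_dir/$name"
  fi
}
```

With the original name set, the operation is idempotent: re-running it any number of times produces the same single file at the destination.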





[jira] [Updated] (HIVE-17608) REPL LOAD should overwrite the data files if exists instead of duplicating it

2017-10-03 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17608:

Attachment: HIVE-17608.02.patch

Added 02.patch to fix test failures.

> REPL LOAD should overwrite the data files if exists instead of duplicating it
> -
>
> Key: HIVE-17608
> URL: https://issues.apache.org/jira/browse/HIVE-17608
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17608.01.patch, HIVE-17608.02.patch
>
>





[jira] [Updated] (HIVE-17608) REPL LOAD should overwrite the data files if exists instead of duplicating it

2017-10-03 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17608:

Status: Open  (was: Patch Available)

> REPL LOAD should overwrite the data files if exists instead of duplicating it
> -
>
> Key: HIVE-17608
> URL: https://issues.apache.org/jira/browse/HIVE-17608
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, pull-request-available, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17608.01.patch
>
>





[jira] [Comment Edited] (HIVE-17657) Does ExIm for MM tables work?

2017-10-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190760#comment-16190760
 ] 

Sergey Shelukhin edited comment on HIVE-17657 at 10/4/17 3:55 AM:
--

I don't understand what the JIRA is about :) If we have an exim test and it 
produces correct results and the table is MM, what would we want to verify in 
addition? Is the JIRA to extend the test?


was (Author: sershe):
I don't understand what the JIRA is about :) If we have an exim test and it 
produces correct results and the table is MM, what would we want to verify in 
addition?

> Does ExIm for MM tables work?
> -
>
> Key: HIVE-17657
> URL: https://issues.apache.org/jira/browse/HIVE-17657
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>





[jira] [Commented] (HIVE-17657) Does ExIm for MM tables work?

2017-10-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190760#comment-16190760
 ] 

Sergey Shelukhin commented on HIVE-17657:
-

I don't understand what the JIRA is about :) If we have an exim test and it 
produces correct results and the table is MM, what would we want to verify in 
addition?

> Does ExIm for MM tables work?
> -
>
> Key: HIVE-17657
> URL: https://issues.apache.org/jira/browse/HIVE-17657
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>





[jira] [Commented] (HIVE-17657) Does ExIm for MM tables work?

2017-10-03 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190759#comment-16190759
 ] 

Eugene Koifman commented on HIVE-17657:
---

I don't get your comment

> Does ExIm for MM tables work?
> -
>
> Key: HIVE-17657
> URL: https://issues.apache.org/jira/browse/HIVE-17657
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>





[jira] [Updated] (HIVE-17613) remove object pools for short, same-thread allocations

2017-10-03 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17613:

Attachment: HIVE-17613.01.patch

rebase

> remove object pools for short, same-thread allocations
> --
>
> Key: HIVE-17613
> URL: https://issues.apache.org/jira/browse/HIVE-17613
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17613.01.patch, HIVE-17613.patch, HIVE-17613.patch
>
>
> Objects pools probably don't have much effect in this case.
> They are only useful when allocations are shared between threads.





[jira] [Updated] (HIVE-15212) merge branch into master

2017-10-03 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15212:

Attachment: HIVE-15212.19.patch

> merge branch into master
> 
>
> Key: HIVE-15212
> URL: https://issues.apache.org/jira/browse/HIVE-15212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15212.01.patch, HIVE-15212.02.patch, 
> HIVE-15212.03.patch, HIVE-15212.04.patch, HIVE-15212.05.patch, 
> HIVE-15212.06.patch, HIVE-15212.07.patch, HIVE-15212.08.patch, 
> HIVE-15212.09.patch, HIVE-15212.10.patch, HIVE-15212.11.patch, 
> HIVE-15212.12.patch, HIVE-15212.12.patch, HIVE-15212.13.patch, 
> HIVE-15212.13.patch, HIVE-15212.14.patch, HIVE-15212.15.patch, 
> HIVE-15212.16.patch, HIVE-15212.17.patch, HIVE-15212.18.patch, 
> HIVE-15212.19.patch
>
>






[jira] [Updated] (HIVE-17674) grep TODO HIVE-15212.17.patch |wc - l = 49

2017-10-03 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17674:

Fix Version/s: hive-14535

> grep TODO HIVE-15212.17.patch |wc - l = 49
> --
>
> Key: HIVE-17674
> URL: https://issues.apache.org/jira/browse/HIVE-17674
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
> Fix For: hive-14535
>
>
> grep TODO HIVE-15212.17.patch 
> +  // TODO The following two utility methods can be moved to AcidUtils once 
> no class in metastore is relying on them,
> +  // TODO: CTAS for MM may currently be broken. It used to work. See the old 
> code and why isCtas isn't used?
> +// TODO: Setting these separately is a very hairy issue in certain 
> combinations, since we
> +// TODO: this doesn't handle list bucketing properly. Does the original 
> exim do that?
> +  // TODO: support this? we only bail because it's a PITA and hardly 
> anyone seems to care.
> +// TODO: support this?
> +// TODO: support this?
> +  // TODO: why is it stored in both?
> +  // TODO: this doesn't work... the path for writer and reader mismatch
> -// TODO: In a follow-up to HIVE-1361, we should refactor 
> loadDynamicPartitions
> +// TODO: In a follow-up to HIVE-1361, we should refactor 
> loadDynamicPartitions
> +  // TODO: this particular bit will not work for MM tables, as there can 
> be multiple
> +// TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: do we also need to do this when creating an empty partition 
> from select?
> +// TODO: see HIVE-14886 - removeTempOrDuplicateFiles is broken for list 
> bucketing,
> +  // TODO: for IOW, we also need to count in base dir, if any
> +  // TODO: HiveFileFormatUtils.getPartitionDescFromPathRecursively for 
> MM tables?
> +// TODO: not having aliases for path usually means some bug. Should 
> it give up?
>  updateMrWork(jobConf);  // TODO: refactor this in HIVE-6366
> +  // TODO: this assumes both paths are qualified; which they are, 
> currently.
> +// TODO: just like the move path, we only do one level of recursion.
> +  // TODO: may be broken? no LB bugs for now but if any are found.
> +  String lbDirSuffix = lbDirName.replace(partDirName, ""); // TODO: wtf?
> +// TODO: we should really probably throw. Keep the existing logic 
> for now.
> +// TODO: this assumes both paths are qualified; which they are, 
> currently.
> +// TODO: we assume lbLevels is 0 here. Same as old code for non-MM.
> +  // TODO: we could just find directories with any MM directories 
> inside?
>// TODO: add test case and clean it up
> +  // Duplicates logic in TextMetaDataFormatter TODO: wtf?!!
> +  // TODO: why is this in text formatter? grrr
> +  // TODO# noop MoveWork to avoid q file changes in HIVE-14990. Remove 
> (w/the flag) after merge.
> +// TODO: this should never happen for mm tables.
> +// TODO: if we are not dealing with concatenate DDL, we should not 
> create a merge+move path
> +// TODO: wtf? wtf?!! why is this in this method?
> +// TODO: this relies a lot on having one file per bucket. No support for 
> MM tables for now.
> +// TODO: should this use getPartitionDescFromPathRecursively?
> +  // TODO: we should probably block all ACID tables here.
> +  @SuppressWarnings("unused") // TODO: wtf?
> +// TODO: due to the master merge, tblDesc is no longer CreateTableDesc, 
> but ImportTableDesc
> +// TODO: ReplCopyTask is completely screwed. Need to support when 
> it's not as screwed.
> +  // TODO: ReplCopyTask is completely screwed. Need to support when 
> it's not as screwed.
>// TODO: the below seems like they should just be combined into 
> partitionDesc
> +// TODO: could we instead get FS from path here and add normal files for 
> every UGI?
> +-- TODO: doesn't work truncate table part_mm partition(key_mm=455);
> +-- TODO: need to include merge+union+DP, but it's broken for now
> +  // TODO: move these test parameters to more specific places... there's no 
> need to have them here



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190734#comment-16190734
 ] 

Hive QA commented on HIVE-17669:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12890238/HIVE-17699.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11197 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=232)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=232)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=101)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=240)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] 
(batchId=240)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=203)
org.apache.hadoop.hive.ql.parse.TestExportImport.dataImportAfterMetadataOnlyImport
 (batchId=220)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7104/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7104/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7104/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12890238 - PreCommit-HIVE-Build

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up having to 
> use essentially the same {{SearchArgument}} for several files. It would be 
> good not to have to deserialize it from its string form over and over again. 
> Caching the deserialized object against the string form should speed things up.
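
The caching idea described above can be sketched as a small LRU cache keyed by
the serialized string form. Names here are hypothetical (the real patch would
cache {{SearchArgument}} instances); this is a sketch of the approach, not the
actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: deserialize a predicate once per distinct string form
// and reuse the object across files, instead of re-deserializing per split.
public class SargCache<V> {
  private final int maxEntries;
  private final Map<String, V> cache;

  public SargCache(int maxEntries) {
    this.maxEntries = maxEntries;
    // An access-ordered LinkedHashMap gives a simple LRU eviction policy.
    this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > SargCache.this.maxEntries;
      }
    };
  }

  // Returns the cached object for this serialized form, deserializing on miss.
  public synchronized V get(String serialized, Function<String, V> deserialize) {
    return cache.computeIfAbsent(serialized, deserialize);
  }
}
```

A bounded, synchronized cache keeps memory predictable when many distinct
predicates flow through one long-lived process.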



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17682) Vectorization: IF stmt produces wrong results

2017-10-03 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190715#comment-16190715
 ] 

Matt McCline commented on HIVE-17682:
-

Committed to master.

> Vectorization: IF stmt produces wrong results
> -
>
> Key: HIVE-17682
> URL: https://issues.apache.org/jira/browse/HIVE-17682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17682.01.patch
>
>
> A query using a vectorized IF(condition, thenExpr, elseExpr) function can 
> produce wrong results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17682) Vectorization: IF stmt produces wrong results

2017-10-03 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17682:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Vectorization: IF stmt produces wrong results
> -
>
> Key: HIVE-17682
> URL: https://issues.apache.org/jira/browse/HIVE-17682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17682.01.patch
>
>
> A query using a vectorized IF(condition, thenExpr, elseExpr) function can 
> produce wrong results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17682) Vectorization: IF stmt produces wrong results

2017-10-03 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190711#comment-16190711
 ] 

Matt McCline commented on HIVE-17682:
-

No test related failures.

> Vectorization: IF stmt produces wrong results
> -
>
> Key: HIVE-17682
> URL: https://issues.apache.org/jira/browse/HIVE-17682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17682.01.patch
>
>
> A query using a vectorized IF(condition, thenExpr, elseExpr) function can 
> produce wrong results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17566) Create schema required for workload management.

2017-10-03 Thread Harish Jaiprakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Jaiprakash updated HIVE-17566:
-
Attachment: HIVE-17566.03.patch

Fixed syntax error in hive-schema.

> Create schema required for workload management.
> ---
>
> Key: HIVE-17566
> URL: https://issues.apache.org/jira/browse/HIVE-17566
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Harish Jaiprakash
>Assignee: Harish Jaiprakash
> Attachments: HIVE-17566.01.patch, HIVE-17566.02.patch, 
> HIVE-17566.03.patch
>
>
> Schema + model changes required for workload management.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17658) Bucketed/Sorted tables - SMB join

2017-10-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190702#comment-16190702
 ] 

Sergey Shelukhin commented on HIVE-17658:
-

Currently it will at the very least fail as badly as HIVE-17675 :)

> Bucketed/Sorted tables - SMB join
> -
>
> Key: HIVE-17658
> URL: https://issues.apache.org/jira/browse/HIVE-17658
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> How does this handle tables that are bucketed + sorted?
> insert into T values(1,2),(5,6); creates something like delta_2_2/bucket_1
> insert into T values(3,4),(7,8) creates delta_3_3/bucket_1
> the expectation for any reader would be to see some contiguous subset of 
> (1,2),(3,4),(5,6),(7,8)
> but this would require a special reader which I don't see
> In particular it's not clear how SMB join can work
> This looks like a general problem:
> For a plain Hive table, if you do 2 inserts, and the 1st one creates 0_0, 
> then the 2nd one will create 0_0_copy_1.
> There is nothing to merge these files at query time into a single sort 
> order (the way the ACID reader does for full ACID tables).
> It should at least throw in this case.
> Current "CONCATENATE" doesn't support bucketed or sorted tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17657) Does ExIm for MM tables work?

2017-10-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190700#comment-16190700
 ] 

Sergey Shelukhin commented on HIVE-17657:
-

selecting from the tables verifies that they contain the same data... wouldn't 
the structure be an implementation detail?

> Does ExIm for MM tables work?
> -
>
> Key: HIVE-17657
> URL: https://issues.apache.org/jira/browse/HIVE-17657
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>
> there is mm_exim.q but it's not clear from the tests what file structure it 
> creates 
> On import the txnids in the directory names would have to be remapped if 
> importing to a different cluster.  Perhaps export can be smart and export 
> highest base_x and accretive deltas (minus aborted ones).  Then import can 
> ...?  It would have to remap txn ids from the archive to new txn ids.  This 
> would then mean that import is made up of several transactions rather than 1 
> atomic op.  (all locks must belong to a transaction)
> One possibility is to open a new txn for each dir in the archive (where 
> start/end txn of file name is the same) and commit all of them at once (need 
> new TMgr API for that).  This assumes using a shared lock (if any!) and thus 
> allows other inserts (not related to import) to occur.
> What if you have delta_6_9, such as a result of concatenate?  If we stipulate 
> that this must mean that there is no delta_6_6 or any other "obsolete" delta 
> in the archive we can map it to a new single txn delta_x_x.
> Add read_only mode for tables (useful in general, may be needed for upgrade 
> etc) and use that to make the above atomic.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17685) AcidUtils.parseString(String propertiesStr) - missing "break"

2017-10-03 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17685:

Fix Version/s: hive-14535

> AcidUtils.parseString(String propertiesStr) - missing "break"
> -
>
> Key: HIVE-17685
> URL: https://issues.apache.org/jira/browse/HIVE-17685
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
> Fix For: hive-14535
>
>
> {noformat}
>   case INSERT_ONLY_STRING:
> obj.setInsertOnly(true);
>   default:
> throw new IllegalArgumentException(
> "Unexpected value " + option + " for ACID operational 
> properties!");
> {noformat}
> in fact _case INSERT_ONLY_STRING_ is not needed at all
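
The fallthrough can be sketched outside of Hive. Names below are hypothetical,
not the actual AcidUtils code: without a {{break}}, a matched case falls
through into {{default}} and throws even for valid input.

```java
public class FallthroughSketch {
  static final String INSERT_ONLY_STRING = "insert_only";

  // Buggy variant mirroring the quoted snippet: the missing `break` lets
  // execution continue from the matched case into `default`.
  static void parseBuggy(String option) {
    boolean insertOnly = false;
    switch (option) {
      case INSERT_ONLY_STRING:
        insertOnly = true;
        // missing break: falls through into default and throws
      default:
        throw new IllegalArgumentException("Unexpected value " + option);
    }
  }

  // Fixed variant: `break` ends the case, so valid input completes normally.
  static boolean parseFixed(String option) {
    boolean insertOnly;
    switch (option) {
      case INSERT_ONLY_STRING:
        insertOnly = true;
        break;
      default:
        throw new IllegalArgumentException("Unexpected value " + option);
    }
    return insertOnly;
  }

  public static void main(String[] args) {
    System.out.println(parseFixed(INSERT_ONLY_STRING));
    try {
      parseBuggy(INSERT_ONLY_STRING);
    } catch (IllegalArgumentException e) {
      System.out.println("buggy variant threw on valid input");
    }
  }
}
```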



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17674) grep TODO HIVE-15212.17.patch |wc - l = 49

2017-10-03 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-17674.
-
Resolution: Fixed

> grep TODO HIVE-15212.17.patch |wc - l = 49
> --
>
> Key: HIVE-17674
> URL: https://issues.apache.org/jira/browse/HIVE-17674
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>
> grep TODO HIVE-15212.17.patch 
> +  // TODO The following two utility methods can be moved to AcidUtils once 
> no class in metastore is relying on them,
> +  // TODO: CTAS for MM may currently be broken. It used to work. See the old 
> code and why isCtas isn't used?
> +// TODO: Setting these separately is a very hairy issue in certain 
> combinations, since we
> +// TODO: this doesn't handle list bucketing properly. Does the original 
> exim do that?
> +  // TODO: support this? we only bail because it's a PITA and hardly 
> anyone seems to care.
> +// TODO: support this?
> +// TODO: support this?
> +  // TODO: why is it stored in both?
> +  // TODO: this doesn't work... the path for writer and reader mismatch
> -// TODO: In a follow-up to HIVE-1361, we should refactor 
> loadDynamicPartitions
> +// TODO: In a follow-up to HIVE-1361, we should refactor 
> loadDynamicPartitions
> +  // TODO: this particular bit will not work for MM tables, as there can 
> be multiple
> +// TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: do we also need to do this when creating an empty partition 
> from select?
> +// TODO: see HIVE-14886 - removeTempOrDuplicateFiles is broken for list 
> bucketing,
> +  // TODO: for IOW, we also need to count in base dir, if any
> +  // TODO: HiveFileFormatUtils.getPartitionDescFromPathRecursively for 
> MM tables?
> +// TODO: not having aliases for path usually means some bug. Should 
> it give up?
>  updateMrWork(jobConf);  // TODO: refactor this in HIVE-6366
> +  // TODO: this assumes both paths are qualified; which they are, 
> currently.
> +// TODO: just like the move path, we only do one level of recursion.
> +  // TODO: may be broken? no LB bugs for now but if any are found.
> +  String lbDirSuffix = lbDirName.replace(partDirName, ""); // TODO: wtf?
> +// TODO: we should really probably throw. Keep the existing logic 
> for now.
> +// TODO: this assumes both paths are qualified; which they are, 
> currently.
> +// TODO: we assume lbLevels is 0 here. Same as old code for non-MM.
> +  // TODO: we could just find directories with any MM directories 
> inside?
>// TODO: add test case and clean it up
> +  // Duplicates logic in TextMetaDataFormatter TODO: wtf?!!
> +  // TODO: why is this in text formatter? grrr
> +  // TODO# noop MoveWork to avoid q file changes in HIVE-14990. Remove 
> (w/the flag) after merge.
> +// TODO: this should never happen for mm tables.
> +// TODO: if we are not dealing with concatenate DDL, we should not 
> create a merge+move path
> +// TODO: wtf? wtf?!! why is this in this method?
> +// TODO: this relies a lot on having one file per bucket. No support for 
> MM tables for now.
> +// TODO: should this use getPartitionDescFromPathRecursively?
> +  // TODO: we should probably block all ACID tables here.
> +  @SuppressWarnings("unused") // TODO: wtf?
> +// TODO: due to the master merge, tblDesc is no longer CreateTableDesc, 
> but ImportTableDesc
> +// TODO: ReplCopyTask is completely screwed. Need to support when 
> it's not as screwed.
> +  // TODO: ReplCopyTask is completely screwed. Need to support when 
> it's not as screwed.
>// TODO: the below seems like they should just be combined into 
> partitionDesc
> +// TODO: could we instead get FS from path here and add normal files for 
> every UGI?
> +-- TODO: doesn't work truncate table part_mm partition(key_mm=455);
> +-- TODO: need to include merge+union+DP, but it's broken for now
> +  // TODO: move these test parameters to more specific places... there's no 
> need to have them here



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17682) Vectorization: IF stmt produces wrong results

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190693#comment-16190693
 ] 

Hive QA commented on HIVE-17682:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12890231/HIVE-17682.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11198 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=232)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=232)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=101)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=240)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=203)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[0]
 (batchId=184)
org.apache.hive.hcatalog.pig.TestHCatLoaderComplexSchema.testTupleInBagInTupleInBag[1]
 (batchId=184)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDate (batchId=184)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7103/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7103/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7103/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12890231 - PreCommit-HIVE-Build

> Vectorization: IF stmt produces wrong results
> -
>
> Key: HIVE-17682
> URL: https://issues.apache.org/jira/browse/HIVE-17682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17682.01.patch
>
>
> A query using a vectorized IF(condition, thenExpr, elseExpr) function can 
> produce wrong results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17674) grep TODO HIVE-15212.17.patch |wc - l = 49

2017-10-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190690#comment-16190690
 ] 

Sergey Shelukhin commented on HIVE-17674:
-

Removed some TODOs, fixed a couple, and marked others that are MM gaps as such.
Remaining TODOs relevant to after the merge (although those with a "?" are very 
low pri, or just need to be verified):
{noformat}
$ git diff master | grep TODO | grep "^+" | grep -E "(--|MM gap)"
+// TODO [MM gap?]: by design; no-one seems to use LB tables. They will 
work, but not convert.
+  // TODO [MM gap?]: this doesn't work, however this is MR only.
+  // TODO [MM gap]: CTAS may currently be broken. It used to work. See the old 
code, and why isCtas isn't used?
+  // TODO [MM gap]: for IOW, we also need to count in base dir, if any
+// TODO [MM gap?]: bad merge; tblDesc is no longer CreateTableDesc, but 
ImportTableDesc.
+-- TODO: doesn't work truncate table part_mm partition(key_mm=455);
+-- TODO: need to include merge+union+DP, but it's broken for now
{noformat}
In the following TODOs, a couple are from moved lines (i.e. they exist in 
master), and the rest are for existing code. In the process of working on MM 
tables, I saw all of the legacy code in Hive and I was horrified, but it 
doesn't mean I'm going to fix it all...
{noformat}
$ git diff master | grep TODO | grep "^+" | grep -v -E "(--|MM gap)"
+  // TODO: why is it stored in both table and dpCtx?
+// TODO: In a follow-up to HIVE-1361, we should refactor 
loadDynamicPartitions
+  // TODO: not clear why two if conditions are different. Preserve the 
existing logic for now.
+  // TODO: not clear why two if conditions are different. Preserve the 
existing logic for now.
+// TODO: see HIVE-14886 - removeTempOrDuplicateFiles is broken for list 
bucketing,
+// TODO: not having aliases for path usually means some bug. Should it 
give up?
+  String lbDirSuffix = lbDirName.replace(partDirName, ""); // TODO: should it 
rather do a prefix?
+// TODO: we should really probably throw. Keep the existing logic for 
now.
+  // TODO: why is this in text formatter?!!
+// TODO: if we are not dealing with concatenate DDL, we should not create 
a merge+move path
+// TODO: wtf?!! why is this in this method? This has nothing to do with 
anything.
+// TODO: this code is buggy - it relies on having one file per bucket; no 
MM support (by design).
+// TODO: should this use getPartitionDescFromPathRecursively? That's 
what other code uses.
+  // TODO: we should probably block all ACID tables here.
+  // TODO: the below seem like they should just be combined into partitionDesc
+// TODO: could we instead get FS from path here and add normal files for 
every UGI?
+  // TODO: move these test parameters to more specific places... there's no 
need to have them here
{noformat}

> grep TODO HIVE-15212.17.patch |wc - l = 49
> --
>
> Key: HIVE-17674
> URL: https://issues.apache.org/jira/browse/HIVE-17674
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>
> grep TODO HIVE-15212.17.patch 
> +  // TODO The following two utility methods can be moved to AcidUtils once 
> no class in metastore is relying on them,
> +  // TODO: CTAS for MM may currently be broken. It used to work. See the old 
> code and why isCtas isn't used?
> +// TODO: Setting these separately is a very hairy issue in certain 
> combinations, since we
> +// TODO: this doesn't handle list bucketing properly. Does the original 
> exim do that?
> +  // TODO: support this? we only bail because it's a PITA and hardly 
> anyone seems to care.
> +// TODO: support this?
> +// TODO: support this?
> +  // TODO: why is it stored in both?
> +  // TODO: this doesn't work... the path for writer and reader mismatch
> -// TODO: In a follow-up to HIVE-1361, we should refactor 
> loadDynamicPartitions
> +// TODO: In a follow-up to HIVE-1361, we should refactor 
> loadDynamicPartitions
> +  // TODO: this particular bit will not work for MM tables, as there can 
> be multiple
> +// TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: do we also need to do this when creating an empty partition 
> from select?
> +// TODO: see HIVE-14886 - removeTempOrDuplicateFiles is broken for list 
> bucketing,
> +  // TODO: for 

[jira] [Commented] (HIVE-17629) CachedStore - wait for prewarm at use time, not init time

2017-10-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190675#comment-16190675
 ] 

Sergey Shelukhin commented on HIVE-17629:
-

[~thejas] [~daijy] can you please take a look?

> CachedStore - wait for prewarm at use time, not init time
> -
>
> Key: HIVE-17629
> URL: https://issues.apache.org/jira/browse/HIVE-17629
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17629.patch
>
>
> Most of the changes are trivial, due to static changing to non-static. The 
> patch basically adds lifecycle management for the shared cache, and uses it to 
> move prewarm to a background thread, waiting for prewarm at use time rather 
> than in setConf, and to make init optional (if not called, CachedStore proxies 
> the methods to rawStore, as it currently does for the methods that are not 
> implemented).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17685) AcidUtils.parseString(String propertiesStr) - missing "break"

2017-10-03 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-17685.
-
Resolution: Fixed

Will commit to the branch shortly

> AcidUtils.parseString(String propertiesStr) - missing "break"
> -
>
> Key: HIVE-17685
> URL: https://issues.apache.org/jira/browse/HIVE-17685
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>
> {noformat}
>   case INSERT_ONLY_STRING:
> obj.setInsertOnly(true);
>   default:
> throw new IllegalArgumentException(
> "Unexpected value " + option + " for ACID operational 
> properties!");
> {noformat}
> in fact _case INSERT_ONLY_STRING_ is not needed at all



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17685) AcidUtils.parseString(String propertiesStr) - missing "break"

2017-10-03 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-17685:
---

Assignee: Sergey Shelukhin  (was: Eugene Koifman)

> AcidUtils.parseString(String propertiesStr) - missing "break"
> -
>
> Key: HIVE-17685
> URL: https://issues.apache.org/jira/browse/HIVE-17685
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>
> {noformat}
>   case INSERT_ONLY_STRING:
> obj.setInsertOnly(true);
>   default:
> throw new IllegalArgumentException(
> "Unexpected value " + option + " for ACID operational 
> properties!");
> {noformat}
> in fact _case INSERT_ONLY_STRING_ is not needed at all



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17686) BucketingSortingOpProcFactory.FileSinkInferrer - throw/return?

2017-10-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190664#comment-16190664
 ] 

Sergey Shelukhin edited comment on HIVE-17686 at 10/4/17 1:47 AM:
--

The comment referred to actually covers this; the existing code tries to infer 
buckets and discards the information if it cannot be done; so that is what this 
does also. This message is only intended as a warning in the logs.


was (Author: sershe):
The commented referred to actually covers this; the existing code tries to 
infer buckets and discards the information if it cannot be done; so that is 
what this does also. This message is only intended as a warning in the logs.

> BucketingSortingOpProcFactory.FileSinkInferrer - throw/return?
> --
>
> Key: HIVE-17686
> URL: https://issues.apache.org/jira/browse/HIVE-17686
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>
> {noformat}
>   if (fop.getConf().isMmTable()) {
> // See the comment inside updatePartitionBucketSortColumns.
> LOG.warn("Currently, inferring buckets is not going to work for MM 
> tables (by design).");
>   }
> {noformat}
> should this throw/return rather than just log?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17686) BucketingSortingOpProcFactory.FileSinkInferrer - throw/return?

2017-10-03 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-17686.
-
Resolution: Not A Problem

The comment referred to actually covers this; the existing code tries to 
infer buckets and discards the information if it cannot be done; so that is 
what this does also. This message is only intended as a warning in the logs.

> BucketingSortingOpProcFactory.FileSinkInferrer - throw/return?
> --
>
> Key: HIVE-17686
> URL: https://issues.apache.org/jira/browse/HIVE-17686
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>
> {noformat}
>   if (fop.getConf().isMmTable()) {
> // See the comment inside updatePartitionBucketSortColumns.
> LOG.warn("Currently, inferring buckets is not going to work for MM 
> tables (by design).");
>   }
> {noformat}
> should this throw/return rather than just log?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17674) grep TODO HIVE-15212.17.patch |wc - l = 49

2017-10-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190646#comment-16190646
 ] 

Sergey Shelukhin commented on HIVE-17674:
-

Some of these actually are existing TODOs also

> grep TODO HIVE-15212.17.patch |wc - l = 49
> --
>
> Key: HIVE-17674
> URL: https://issues.apache.org/jira/browse/HIVE-17674
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>
> grep TODO HIVE-15212.17.patch 
> +  // TODO The following two utility methods can be moved to AcidUtils once 
> no class in metastore is relying on them,
> +  // TODO: CTAS for MM may currently be broken. It used to work. See the old 
> code and why isCtas isn't used?
> +// TODO: Setting these separately is a very hairy issue in certain 
> combinations, since we
> +// TODO: this doesn't handle list bucketing properly. Does the original 
> exim do that?
> +  // TODO: support this? we only bail because it's a PITA and hardly 
> anyone seems to care.
> +// TODO: support this?
> +// TODO: support this?
> +  // TODO: why is it stored in both?
> +  // TODO: this doesn't work... the path for writer and reader mismatch
> -// TODO: In a follow-up to HIVE-1361, we should refactor 
> loadDynamicPartitions
> +// TODO: In a follow-up to HIVE-1361, we should refactor 
> loadDynamicPartitions
> +  // TODO: this particular bit will not work for MM tables, as there can 
> be multiple
> +// TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: not clear why two if conditions are different. Preserve the 
> existing logic for now.
> +  // TODO: do we also need to do this when creating an empty partition 
> from select?
> +// TODO: see HIVE-14886 - removeTempOrDuplicateFiles is broken for list 
> bucketing,
> +  // TODO: for IOW, we also need to count in base dir, if any
> +  // TODO: HiveFileFormatUtils.getPartitionDescFromPathRecursively for 
> MM tables?
> +// TODO: not having aliases for path usually means some bug. Should 
> it give up?
>  updateMrWork(jobConf);  // TODO: refactor this in HIVE-6366
> +  // TODO: this assumes both paths are qualified; which they are, 
> currently.
> +// TODO: just like the move path, we only do one level of recursion.
> +  // TODO: may be broken? no LB bugs for now but if any are found.
> +  String lbDirSuffix = lbDirName.replace(partDirName, ""); // TODO: wtf?
> +// TODO: we should really probably throw. Keep the existing logic 
> for now.
> +// TODO: this assumes both paths are qualified; which they are, 
> currently.
> +// TODO: we assume lbLevels is 0 here. Same as old code for non-MM.
> +  // TODO: we could just find directories with any MM directories 
> inside?
>// TODO: add test case and clean it up
> +  // Duplicates logic in TextMetaDataFormatter TODO: wtf?!!
> +  // TODO: why is this in text formatter? grrr
> +  // TODO# noop MoveWork to avoid q file changes in HIVE-14990. Remove 
> (w/the flag) after merge.
> +// TODO: this should never happen for mm tables.
> +// TODO: if we are not dealing with concatenate DDL, we should not 
> create a merge+move path
> +// TODO: wtf? wtf?!! why is this in this method?
> +// TODO: this relies a lot on having one file per bucket. No support for 
> MM tables for now.
> +// TODO: should this use getPartitionDescFromPathRecursively?
> +  // TODO: we should probably block all ACID tables here.
> +  @SuppressWarnings("unused") // TODO: wtf?
> +// TODO: due to the master merge, tblDesc is no longer CreateTableDesc, 
> but ImportTableDesc
> +// TODO: ReplCopyTask is completely screwed. Need to support when 
> it's not as screwed.
> +  // TODO: ReplCopyTask is completely screwed. Need to support when 
> it's not as screwed.
>// TODO: the below seems like they should just be combined into 
> partitionDesc
> +// TODO: could we instead get FS from path here and add normal files for 
> every UGI?
> +-- TODO: doesn't work truncate table part_mm partition(key_mm=455);
> +-- TODO: need to include merge+union+DP, but it's broken for now
> +  // TODO: move these test parameters to more specific places... there's no 
> need to have them here





[jira] [Assigned] (HIVE-17687) CompactorMR.run() should update compaction_queue table for MM

2017-10-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17687:
-


> CompactorMR.run() should update compaction_queue table for MM
> -
>
> Key: HIVE-17687
> URL: https://issues.apache.org/jira/browse/HIVE-17687
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
>
> For MM tables it deletes aborted dirs and bails. It should probably update 
> compaction_queue so that it's clear why there is no HadoopJobId, etc.





[jira] [Commented] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-03 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190622#comment-16190622
 ] 

Mithun Radhakrishnan commented on HIVE-17669:
-

bq. weight based eviction could be a better approach (weight can be length of 
string).
Ah, that's an interesting suggestion. Shouldn't we also consider the cost of 
deserializing the sarg-string? On the one hand, perhaps the longer sarg-strings 
take longer to deserialize, and might benefit from caching. But on the other, 
they might dominate the cache. :/ I'll have to think this through.

Any recommendation on the value for {{CacheBuilder.maximumWeight()}}? :]

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up having to 
> use essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize from string, over and over again. Caching the 
> object against the string-form should speed things up.





[jira] [Assigned] (HIVE-17686) BucketingSortingOpProcFactory.FileSinkInferrer - throw/return?

2017-10-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17686:
-


> BucketingSortingOpProcFactory.FileSinkInferrer - throw/return?
> --
>
> Key: HIVE-17686
> URL: https://issues.apache.org/jira/browse/HIVE-17686
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Sergey Shelukhin
>
> {noformat}
>   if (fop.getConf().isMmTable()) {
> // See the comment inside updatePartitionBucketSortColumns.
> LOG.warn("Currently, inferring buckets is not going to work for MM 
> tables (by design).");
>   }
> {noformat}
> should this throw/return rather than just log?





[jira] [Commented] (HIVE-17659) get_token thrift call fails for DBTokenStore in remote HMS mode

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190608#comment-16190608
 ] 

Hive QA commented on HIVE-17659:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12890218/HIVE-17659.01.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7102/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7102/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7102/

Messages:
{noformat}
 This message was trimmed, see log for full details 
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/apache/hadoop/hadoop-auth/2.8.1/hadoop-auth-2.8.1.jar(org/apache/hadoop/security/authentication/server/PseudoAuthenticationHandler.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/apache/hadoop/hadoop-common/2.8.1/hadoop-common-2.8.1.jar(org/apache/hadoop/util/GenericOptionsParser.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-rewrite/9.3.8.v20160314/jetty-rewrite-9.3.8.v20160314.jar(org/eclipse/jetty/rewrite/handler/RedirectPatternRule.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-rewrite/9.3.8.v20160314/jetty-rewrite-9.3.8.v20160314.jar(org/eclipse/jetty/rewrite/handler/RewriteHandler.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-server/9.3.8.v20160314/jetty-server-9.3.8.v20160314.jar(org/eclipse/jetty/server/Handler.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-server/9.3.8.v20160314/jetty-server-9.3.8.v20160314.jar(org/eclipse/jetty/server/Server.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-server/9.3.8.v20160314/jetty-server-9.3.8.v20160314.jar(org/eclipse/jetty/server/ServerConnector.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-server/9.3.8.v20160314/jetty-server-9.3.8.v20160314.jar(org/eclipse/jetty/server/handler/HandlerList.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-servlet/9.3.8.v20160314/jetty-servlet-9.3.8.v20160314.jar(org/eclipse/jetty/servlet/FilterHolder.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-servlet/9.3.8.v20160314/jetty-servlet-9.3.8.v20160314.jar(org/eclipse/jetty/servlet/ServletContextHandler.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-servlet/9.3.8.v20160314/jetty-servlet-9.3.8.v20160314.jar(org/eclipse/jetty/servlet/ServletHolder.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/eclipse/jetty/jetty-xml/9.3.8.v20160314/jetty-xml-9.3.8.v20160314.jar(org/eclipse/jetty/xml/XmlConfiguration.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/slf4j/jul-to-slf4j/1.7.10/jul-to-slf4j-1.7.10.jar(org/slf4j/bridge/SLF4JBridgeHandler.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/javax/servlet/javax.servlet-api/3.1.0/javax.servlet-api-3.1.0.jar(javax/servlet/DispatcherType.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/javax/servlet/servlet-api/2.5/servlet-api-2.5.jar(javax/servlet/http/HttpServletRequest.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/common/target/hive-common-3.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/common/classification/InterfaceAudience$LimitedPrivate.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/apache-github-source-source/common/target/hive-common-3.0.0-SNAPSHOT.jar(org/apache/hadoop/hive/common/classification/InterfaceStability$Unstable.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/ByteArrayOutputStream.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/OutputStream.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/Closeable.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/lang/AutoCloseable.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/io/Flushable.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(javax/xml/bind/annotation/XmlRootElement.class)]]
[loading 
ZipFileIndexFileObject[/data/hiveptest/working/maven/org/apache/commons/commons-exec/1.1/commons-exec-1.1.jar(org/apache/commons/exec/ExecuteException.class)]]
[loading 
ZipFileIndexFileObject[/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar(java/security/PrivilegedExceptionAction.class)]]
[loading ...
{noformat}

[jira] [Assigned] (HIVE-17685) AcidUtils.parseString(String propertiesStr) - missing "break"

2017-10-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17685:
-


> AcidUtils.parseString(String propertiesStr) - missing "break"
> -
>
> Key: HIVE-17685
> URL: https://issues.apache.org/jira/browse/HIVE-17685
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> {noformat}
>   case INSERT_ONLY_STRING:
> obj.setInsertOnly(true);
>   default:
> throw new IllegalArgumentException(
> "Unexpected value " + option + " for ACID operational 
> properties!");
> {noformat}
> In fact, _case INSERT_ONLY_STRING_ is not needed at all.
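The fall-through can be sketched as follows. This is a minimal, hypothetical reconstruction, not the actual AcidUtils code (the method name `parseInsertOnly` and the surrounding class are illustrative only): with the {{break}} in place, a valid {{insert_only}} value no longer falls through into the {{default}} arm that throws.

```java
public class ParseSketch {
    static final String INSERT_ONLY_STRING = "insert_only";

    // Hypothetical reconstruction of the switch in AcidUtils.parseString.
    static boolean parseInsertOnly(String option) {
        boolean insertOnly;
        switch (option) {
            case INSERT_ONLY_STRING:
                insertOnly = true;
                break; // the missing break: without it, control falls into default
            default:
                throw new IllegalArgumentException(
                    "Unexpected value " + option + " for ACID operational properties!");
        }
        return insertOnly;
    }
}
```

With the break added, the valid value returns normally and only genuinely unexpected values reach the exception.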





[jira] [Updated] (HIVE-15016) Run tests with Hadoop 3.0.0-alpha1

2017-10-03 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15016:

Attachment: HIVE-15016.patch

With hadoop3-beta1

> Run tests with Hadoop 3.0.0-alpha1
> --
>
> Key: HIVE-15016
> URL: https://issues.apache.org/jira/browse/HIVE-15016
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Aihua Xu
> Attachments: Hadoop3Upstream.patch, HIVE-15016.patch
>
>
> Hadoop 3.0.0-alpha1 was released back on Sep/16 to allow other components to 
> run tests against this new version before GA.
> We should start running tests with Hive to validate compatibility against 
> Hadoop 3.0.
> NOTE: The patch used to test must not be committed to Hive until Hadoop 3.0 
> GA is released.





[jira] [Commented] (HIVE-13690) Shade guava in hive-exec fat jar

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190603#comment-16190603
 ] 

Hive QA commented on HIVE-13690:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12802303/HIVE-13690.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 911 failed/errored test(s), 6394 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=232)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=232)
org.apache.hadoop.hive.cli.TestBeeLineDriver.org.apache.hadoop.hive.cli.TestBeeLineDriver
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[create_like] 
(batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas_blobstore_to_blobstore]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas_blobstore_to_hdfs]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas_hdfs_to_blobstore]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[having] 
(batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_blobstore]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_local]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_warehouse]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_local_to_blobstore]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_blobstore]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_blobstore_nonpart]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_local]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_warehouse]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_warehouse_nonpart]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_local_to_blobstore]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_blobstore_to_blobstore]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_empty_into_blobstore]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_directory]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_table]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[join2] 
(batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[join] 
(batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[load_data] 
(batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[map_join] 
(batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[map_join_on_filter]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[nested_outer_join]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[orc_buckets] 
(batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[orc_format_nonpart]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[orc_format_part]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[orc_nonstd_partitions_loc]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ptf_general_queries]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ptf_matchpath] 
(batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ptf_orcfile] 
(batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ptf_persistence]
 (batchId=243)

[jira] [Updated] (HIVE-15016) Run tests with Hadoop 3.0.0-alpha1

2017-10-03 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15016:

Status: Patch Available  (was: Open)

> Run tests with Hadoop 3.0.0-alpha1
> --
>
> Key: HIVE-15016
> URL: https://issues.apache.org/jira/browse/HIVE-15016
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sergio Peña
>Assignee: Aihua Xu
> Attachments: Hadoop3Upstream.patch, HIVE-15016.patch
>
>
> Hadoop 3.0.0-alpha1 was released back on Sep/16 to allow other components to 
> run tests against this new version before GA.
> We should start running tests with Hive to validate compatibility against 
> Hadoop 3.0.
> NOTE: The patch used to test must not be committed to Hive until Hadoop 3.0 
> GA is released.





[jira] [Commented] (HIVE-17574) Avoid multiple copies of HDFS-based jars when localizing job-jars

2017-10-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190600#comment-16190600
 ] 

Sergey Shelukhin commented on HIVE-17574:
-

Yeah, those are all known failures. Sorry for the late response; it's customary 
to ignore those :(

> Avoid multiple copies of HDFS-based jars when localizing job-jars
> -
>
> Key: HIVE-17574
> URL: https://issues.apache.org/jira/browse/HIVE-17574
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17574.1-branch-2.2.patch, 
> HIVE-17574.1-branch-2.patch, HIVE-17574.1.patch, HIVE-17574.2.patch
>
>
> Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)
> This has to do with the classpaths of Hive actions run from Oozie, and 
> affects scripts that add jars/resources from HDFS locations.
> As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) 
> tend to be stored in HDFS paths, as are any custom user-libraries used in 
> workflows. An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the 
> following steps to occur:
> # Files are downloaded from HDFS to local temp dir.
> # UDFs are resolved/validated.
> # All jars/files, including those just downloaded from HDFS, are shipped 
> right back to HDFS-based scratch-directories, for job submission.
> For HDFS-based files, this is wasteful and time-consuming. #3 above should 
> skip shipping HDFS-based resources, and add those directly to the Tez session.
> We have a patch that's being used internally at Yahoo.
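The proposed change to step 3 above can be sketched as a simple URI-scheme check. This is purely illustrative and not the actual patch (`resourcesNeedingUpload` is a hypothetical helper): resources already on HDFS would be registered with the session directly, and only non-HDFS resources would be shipped to the scratch directory.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ResourceLocalizer {
    // Hypothetical illustration: partition ADD JAR/FILE/ARCHIVE resources by URI
    // scheme so HDFS-resident ones are not copied back to the scratch directory.
    public static List<URI> resourcesNeedingUpload(List<URI> resources) {
        List<URI> toShip = new ArrayList<>();
        for (URI r : resources) {
            if ("hdfs".equalsIgnoreCase(r.getScheme())) {
                // Already on HDFS: add directly to the Tez session, no re-upload.
                continue;
            }
            toShip.add(r); // local (or other-scheme) resources still get shipped
        }
        return toShip;
    }
}
```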





[jira] [Commented] (HIVE-17608) REPL LOAD should overwrite the data files if exists instead of duplicating it

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190584#comment-16190584
 ] 

Hive QA commented on HIVE-17608:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12890203/HIVE-17608.01.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 703 failed/errored test(s), 11195 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=232)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=232)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_11] 
(batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_12] 
(batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_table]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore]
 (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_5] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_6] 
(batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_7] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] 
(batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join14] (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join17] (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join19] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join19_inclause] 
(batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join1] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join26] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join2] (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join3] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join4] (batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join5] (batchId=70)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join6] (batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join7] (batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join8] (batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join9] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_13] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[binary_output_format] 
(batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket1] (batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket2] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket3] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark1] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark2] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark3] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin5] 
(batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative2] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative] 
(batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_1]
 (batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_3]
 (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_4]
 (batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_5]
 (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_8]
 (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[case_sensitivity] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast1] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join17] 
(batchId=24)

[jira] [Updated] (HIVE-17672) Upgrade Calcite version to 1.14

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17672:
---
Attachment: HIVE-17672.01.patch

> Upgrade Calcite version to 1.14
> ---
>
> Key: HIVE-17672
> URL: https://issues.apache.org/jira/browse/HIVE-17672
> Project: Hive
>  Issue Type: Task
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17672.01.patch
>
>
> Calcite 1.14.0 has been recently released.





[jira] [Updated] (HIVE-17672) Upgrade Calcite version to 1.14

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17672:
---
Attachment: (was: HIVE-17672.patch)

> Upgrade Calcite version to 1.14
> ---
>
> Key: HIVE-17672
> URL: https://issues.apache.org/jira/browse/HIVE-17672
> Project: Hive
>  Issue Type: Task
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17672.01.patch
>
>
> Calcite 1.14.0 has been recently released.





[jira] [Commented] (HIVE-17508) Implement pool rules and triggers based on counters

2017-10-03 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190552#comment-16190552
 ] 

Prasanth Jayachandran commented on HIVE-17508:
--

I think I should be able to integrate the tests back with a static rules 
fetcher for tests. Will post updates in the next patch. 

> Implement pool rules and triggers based on counters
> ---
>
> Key: HIVE-17508
> URL: https://issues.apache.org/jira/browse/HIVE-17508
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17508.1.patch, HIVE-17508.2.patch, 
> HIVE-17508.3.patch, HIVE-17508.3.patch, HIVE-17508.4.patch, 
> HIVE-17508.5.patch, HIVE-17508.WIP.2.patch, HIVE-17508.WIP.patch
>
>
> Workload management can define rules that are bound to a resource plan. Each 
> rule can have a trigger expression and an action associated with it. Trigger 
> expressions are evaluated at runtime after a configurable check interval, and 
> based on them, actions like killing a query or moving a query to a different 
> pool will get invoked. A simple rule could be something like
> {code}
> CREATE RULE slow_query IN resource_plan_name
> WHEN execution_time_ms > 1
> MOVE TO slow_queue
> {code}





[jira] [Commented] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-03 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190551#comment-16190551
 ] 

Prasanth Jayachandran commented on HIVE-17669:
--

I think replacing cache expiration with weight-based eviction could be a better 
approach (the weight can be the length of the string). Also, cutting down on 
the configs: it would be better to disable the cache for any value <= 0. 
With weight-based eviction, you can drop the MD5, as long strings will get 
evicted eventually.
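The suggestion maps directly onto Guava's {{CacheBuilder.maximumWeight()}} combined with a {{Weigher}} that returns the sarg-string length. As a dependency-free sketch of the same eviction policy (all names here are hypothetical, not Hive code), a weight-bounded LRU over the serialized sarg strings looks like:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a weight-bounded LRU keyed by the serialized sarg string,
// where each entry's weight is the string's length. Guava's
// CacheBuilder.maximumWeight()/weigher(...) provides equivalent behavior.
public class WeightedSargCache<V> {
    private final long maxWeight;
    private long currentWeight = 0;
    private final LinkedHashMap<String, V> map =
        new LinkedHashMap<>(16, 0.75f, true); // access-order => LRU iteration

    public WeightedSargCache(long maxWeight) { this.maxWeight = maxWeight; }

    public synchronized V get(String sargString) { return map.get(sargString); }

    public synchronized void put(String sargString, V sarg) {
        if (map.put(sargString, sarg) == null) {
            currentWeight += sargString.length(); // new key adds its weight
        }
        // Evict least-recently-used entries until back under the weight cap.
        Iterator<Map.Entry<String, V>> it = map.entrySet().iterator();
        while (currentWeight > maxWeight && it.hasNext()) {
            currentWeight -= it.next().getKey().length();
            it.remove();
        }
    }

    public synchronized int size() { return map.size(); }
}
```

Under this policy, one very long sarg-string can push out many short ones, which is the "might dominate the cache" concern raised above.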

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up having to 
> use essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize from string, over and over again. Caching the 
> object against the string-form should speed things up.





[jira] [Commented] (HIVE-17574) Avoid multiple copies of HDFS-based jars when localizing job-jars

2017-10-03 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190550#comment-16190550
 ] 

Mithun Radhakrishnan commented on HIVE-17574:
-

Ok, verified that these tests fail on {{master}}. [~sershe], if this still 
looks good to you, would you mind awfully if I committed?

> Avoid multiple copies of HDFS-based jars when localizing job-jars
> -
>
> Key: HIVE-17574
> URL: https://issues.apache.org/jira/browse/HIVE-17574
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0, 3.0.0, 2.4.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17574.1-branch-2.2.patch, 
> HIVE-17574.1-branch-2.patch, HIVE-17574.1.patch, HIVE-17574.2.patch
>
>
> Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)
> This has to do with the classpaths of Hive actions run from Oozie, and 
> affects scripts that add jars/resources from HDFS locations.
> As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) 
> tend to be stored in HDFS paths, as are any custom user-libraries used in 
> workflows. An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the 
> following steps to occur:
> # Files are downloaded from HDFS to local temp dir.
> # UDFs are resolved/validated.
> # All jars/files, including those just downloaded from HDFS, are shipped 
> right back to HDFS-based scratch-directories, for job submission.
> For HDFS-based files, this is wasteful and time-consuming. #3 above should 
> skip shipping HDFS-based resources, and add those directly to the Tez session.
> We have a patch that's being used internally at Yahoo.
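The skip-shipping idea in step 3 boils down to filtering job resources by URI scheme. A rough sketch follows; the class and method names are hypothetical, not Hive's actual localization code:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: resources already on HDFS can be referenced in place,
// so only local (or schemeless) resources need re-uploading to the scratch dir.
public class ResourceShippingFilter {
    static boolean needsShipping(String resource) {
        String scheme = URI.create(resource).getScheme();
        return scheme == null || !scheme.equalsIgnoreCase("hdfs");
    }

    static List<String> resourcesToShip(List<String> resources) {
        return resources.stream()
                .filter(ResourceShippingFilter::needsShipping)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> resources = Arrays.asList(
                "hdfs://nn:8020/sharelib/hive/hive-exec.jar", // already on HDFS
                "file:///tmp/my-udf.jar",                     // local: must ship
                "/tmp/another-udf.jar");                      // no scheme: must ship
        System.out.println(resourcesToShip(resources));
    }
}
```

HDFS-scheme resources would instead be added directly to the Tez session, as the description proposes.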





[jira] [Updated] (HIVE-17432) Enable join and aggregate materialized view rewriting

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17432:
---
Labels:   (was: pull-request-available)

> Enable join and aggregate materialized view rewriting
> -
>
> Key: HIVE-17432
> URL: https://issues.apache.org/jira/browse/HIVE-17432
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-17432.01.patch, HIVE-17432.02.patch, 
> HIVE-17432.patch
>
>
> Enable Calcite materialized view based rewriting for queries containing joins 
> and aggregates.





[jira] [Commented] (HIVE-17432) Enable join and aggregate materialized view rewriting

2017-10-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190546#comment-16190546
 ] 

ASF GitHub Bot commented on HIVE-17432:
---

Github user asfgit closed the pull request at:

https://github.com/apache/hive/pull/245


> Enable join and aggregate materialized view rewriting
> -
>
> Key: HIVE-17432
> URL: https://issues.apache.org/jira/browse/HIVE-17432
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-17432.01.patch, HIVE-17432.02.patch, 
> HIVE-17432.patch
>
>
> Enable Calcite materialized view based rewriting for queries containing joins 
> and aggregates.





[jira] [Updated] (HIVE-17432) Enable join and aggregate materialized view rewriting

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17432:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Regenerated remaining q files and pushed to master. Thanks for reviewing 
[~ashutoshc]!

> Enable join and aggregate materialized view rewriting
> -
>
> Key: HIVE-17432
> URL: https://issues.apache.org/jira/browse/HIVE-17432
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-17432.01.patch, HIVE-17432.02.patch, 
> HIVE-17432.patch
>
>
> Enable Calcite materialized view based rewriting for queries containing joins 
> and aggregates.





[jira] [Updated] (HIVE-17432) Enable join and aggregate materialized view rewriting

2017-10-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-17432:
--
Labels: pull-request-available  (was: )

> Enable join and aggregate materialized view rewriting
> -
>
> Key: HIVE-17432
> URL: https://issues.apache.org/jira/browse/HIVE-17432
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>  Labels: pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-17432.01.patch, HIVE-17432.02.patch, 
> HIVE-17432.patch
>
>
> Enable Calcite materialized view based rewriting for queries containing joins 
> and aggregates.





[jira] [Assigned] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-03 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17669:
---

Assignee: Mithun Radhakrishnan  (was: Chris Drome)

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up having to 
> use essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize it from string over and over again. Caching 
> the object against the string form should speed things up.





[jira] [Commented] (HIVE-17432) Enable join and aggregate materialized view rewriting

2017-10-03 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190533#comment-16190533
 ] 

Ashutosh Chauhan commented on HIVE-17432:
-

+1

> Enable join and aggregate materialized view rewriting
> -
>
> Key: HIVE-17432
> URL: https://issues.apache.org/jira/browse/HIVE-17432
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17432.01.patch, HIVE-17432.02.patch, 
> HIVE-17432.patch
>
>
> Enable Calcite materialized view based rewriting for queries containing joins 
> and aggregates.





[jira] [Comment Edited] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-03 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190342#comment-16190342
 ] 

Mithun Radhakrishnan edited comment on HIVE-17669 at 10/3/17 11:25 PM:
---

Is 
{{[HIVE-17669.2.patch|https://issues.apache.org/jira/secure/attachment/12890238/HIVE-17699.2.patch]}}
 more palatable? I've switched over to using a configurably bounded Guava 
{{Cache}}. I'm not sure what the default max-size should be. I've left it at 
100. Is that acceptable?

[~prasanth_j], do you suppose we should leave out doing the MD5/SHA-1 hashing? 
Would that be overkill, given the small cache-size?

I'm working on a {{branch-2}} port. (I'll have to rewrite the lambda bits.)


was (Author: mithun):
Is this more palatable? I've switched over to using a configurably bounded 
Guava {{Cache}}. I'm not sure what the default max-size should be. I've left it 
at 100. Is that acceptable?

[~prasanth_j], do you suppose we should leave out doing the MD5/SHA-1 hashing? 
Would that be overkill, given the small cache-size?

I'm working on a {{branch-2}} port. (I'll have to rewrite the lambda bits.)

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up having to 
> use essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize it from string over and over again. Caching 
> the object against the string form should speed things up.





[jira] [Commented] (HIVE-17609) Tool to manipulate delegation tokens

2017-10-03 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190530#comment-16190530
 ] 

Mithun Radhakrishnan commented on HIVE-17609:
-

bq. also the confLocation argument .. does it expect HCat's conf location?
The conf directory is required so that the tool can read the value of 
{{DELEGATION_TOKEN_STORE_CLS}}, and any other settings the 
{{DelegationTokenStore}} might require to start up. The updated version should 
be able to read delegation tokens from either the metastore or HS2.

> Tool to manipulate delegation tokens
> 
>
> Key: HIVE-17609
> URL: https://issues.apache.org/jira/browse/HIVE-17609
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Security
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17609.1-branch-2.2.patch, 
> HIVE-17609.1-branch-2.patch, HIVE-17609.1.patch, HIVE-17609.2.patch
>
>
> This was precipitated by OOZIE-2797. We had a case in production where the 
> number of active metastore delegation tokens outstripped the ZooKeeper 
> {{jute.maxBuffer}} size. Delegation tokens could neither be fetched nor 
> cancelled. 
> The root cause turned out to be a miscommunication, causing delegation tokens 
> fetched by Oozie *not* to be cancelled automatically from HCat. This was 
> sorted out as part of OOZIE-2797.
> The issue exposed how poor the log messages were in the code pertaining to 
> token fetch/cancellation. We also found the need for a tool to 
> query/list/purge delegation tokens that might already have expired. This 
> patch introduces such a tool and improves the log messages.





[jira] [Updated] (HIVE-17609) Tool to manipulate delegation tokens

2017-10-03 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17609:

Attachment: HIVE-17609.2.patch

Updated to support HS2.

> Tool to manipulate delegation tokens
> 
>
> Key: HIVE-17609
> URL: https://issues.apache.org/jira/browse/HIVE-17609
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Security
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17609.1-branch-2.2.patch, 
> HIVE-17609.1-branch-2.patch, HIVE-17609.1.patch, HIVE-17609.2.patch
>
>
> This was precipitated by OOZIE-2797. We had a case in production where the 
> number of active metastore delegation tokens outstripped the ZooKeeper 
> {{jute.maxBuffer}} size. Delegation tokens could neither be fetched nor 
> cancelled. 
> The root cause turned out to be a miscommunication, causing delegation tokens 
> fetched by Oozie *not* to be cancelled automatically from HCat. This was 
> sorted out as part of OOZIE-2797.
> The issue exposed how poor the log messages were in the code pertaining to 
> token fetch/cancellation. We also found the need for a tool to 
> query/list/purge delegation tokens that might already have expired. This 
> patch introduces such a tool and improves the log messages.





[jira] [Commented] (HIVE-17679) http-generic-click-jacking for WebHcat server

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190526#comment-16190526
 ] 

Hive QA commented on HIVE-17679:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12890202/HIVE-17679.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11194 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=232)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=232)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=240)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=203)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7099/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7099/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7099/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12890202 - PreCommit-HIVE-Build

> http-generic-click-jacking for WebHcat server
> -
>
> Key: HIVE-17679
> URL: https://issues.apache.org/jira/browse/HIVE-17679
> Project: Hive
>  Issue Type: Bug
>  Components: Security, WebHCat
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17679.1.patch, HIVE-17679.2.patch
>
>
> The web UIs do not include the "X-Frame-Options" header to prevent the pages 
> from being framed by another site.
> Reference:
> https://www.owasp.org/index.php/Clickjacking
> https://www.owasp.org/index.php/Clickjacking_Defense_Cheat_Sheet
> https://developer.mozilla.org/en-US/docs/Web/HTTP/X-Frame-Options
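The standard defense described in those references is to set the {{X-Frame-Options}} header on every HTTP response. A minimal illustration of the header itself follows; this is not WebHCat's actual code, and the helper name is made up:

```java
import com.sun.net.httpserver.Headers;

// Demonstration of the anti-clickjacking response header (not WebHCat's code).
// "SAMEORIGIN" allows framing only by pages from the same origin; "DENY"
// forbids framing entirely.
public class ClickjackingHeader {
    static void addFrameOptions(Headers responseHeaders, String policy) {
        responseHeaders.set("X-Frame-Options", policy);
    }

    public static void main(String[] args) {
        Headers headers = new Headers();
        addFrameOptions(headers, "SAMEORIGIN");
        System.out.println(headers.getFirst("X-Frame-Options")); // prints SAMEORIGIN
    }
}
```

In a servlet-based server such as WebHCat, the equivalent would typically be done in a filter applied to all responses.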





[jira] [Commented] (HIVE-16602) Implement shared scans with Tez

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190524#comment-16190524
 ] 

Jesus Camacho Rodriguez commented on HIVE-16602:


[~kellyzly], it will depend on the scale and your hardware, indeed. For 
instance, in our case we could confirm that it made a huge difference for a 
few TPC-DS queries when we were benchmarking Hive at 10TB scale (more info in 
https://hortonworks.com/blog/3x-faster-interactive-query-hive-llap/).

> Implement shared scans with Tez
> ---
>
> Key: HIVE-16602
> URL: https://issues.apache.org/jira/browse/HIVE-16602
> Project: Hive
>  Issue Type: New Feature
>  Components: Physical Optimizer
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-16602.01.patch, HIVE-16602.02.patch, 
> HIVE-16602.03.patch, HIVE-16602.04.patch, HIVE-16602.patch
>
>
> Given a query plan, the goal is to identify scans on input tables that can be 
> merged so the data is read only once. Optimization will be carried out at the 
> physical level.
> In the longer term, identification of equivalent expressions and reuse of 
> intermediate results should be done at the logical layer via a Spool 
> operator.





[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2017-10-03 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190521#comment-16190521
 ] 

Sahil Takiar commented on HIVE-17684:
-

Another solution would be to use {{BytesBytesMultiHashMap}} instead of 
{{HashMapWrapper}}; we have to serialize the objects into {{byte[]}} anyway, 
so doing it earlier shouldn't make a huge difference. It also makes estimating 
Java object sizes much easier.

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use 
> of the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking up too much space in memory, in 
> which case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs that control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler uses the {{MemoryMXBean}} and the following ratio to estimate 
> how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate: the value it returns includes all reachable and unreachable 
> memory on the heap, so it may count garbage that the JVM just hasn't taken 
> the time to reclaim yet. This can lead to intermittent failures of this 
> check even though a simple GC would have reclaimed enough space for the 
> process to continue working.
> We should rethink the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense, because every Hive task ran in a 
> dedicated container, so a Hive task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive 
> tasks running in a single executor, each doing different things.
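The ratio described above can be computed directly from the JDK's management API. A minimal sketch of the check follows; the threshold constant mirrors the default of {{hive.mapjoin.localtask.max.memory.usage}}, but this is not the handler's actual code:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch of the usage ratio the handler relies on. Note that getUsed()
// counts unreachable-but-uncollected garbage too, which is exactly the
// source of the false positives described above.
public class HeapUsageCheck {
    static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() can be -1 if the max is undefined; real code should guard for that.
        return (double) heap.getUsed() / heap.getMax();
    }

    public static void main(String[] args) {
        double maxMemoryUsage = 0.90; // default hive.mapjoin.localtask.max.memory.usage
        double used = heapUsedFraction();
        if (used > maxMemoryUsage) {
            // The real handler throws MapJoinMemoryExhaustionError at this point.
            System.out.println("Hash table is using too much memory: " + used);
        } else {
            System.out.printf("heap used fraction: %.3f%n", used);
        }
    }
}
```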





[jira] [Updated] (HIVE-17672) Upgrade Calcite version to 1.14

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17672:
---
Attachment: HIVE-17672.patch

Uploading initial patch to trigger ptests.

> Upgrade Calcite version to 1.14
> ---
>
> Key: HIVE-17672
> URL: https://issues.apache.org/jira/browse/HIVE-17672
> Project: Hive
>  Issue Type: Task
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17672.patch
>
>
> Calcite 1.14.0 has been recently released.





[jira] [Work started] (HIVE-17672) Upgrade Calcite version to 1.14

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17672 started by Jesus Camacho Rodriguez.
--
> Upgrade Calcite version to 1.14
> ---
>
> Key: HIVE-17672
> URL: https://issues.apache.org/jira/browse/HIVE-17672
> Project: Hive
>  Issue Type: Task
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Calcite 1.14.0 has been recently released.





[jira] [Updated] (HIVE-17672) Upgrade Calcite version to 1.14

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17672:
---
Status: Patch Available  (was: In Progress)

> Upgrade Calcite version to 1.14
> ---
>
> Key: HIVE-17672
> URL: https://issues.apache.org/jira/browse/HIVE-17672
> Project: Hive
>  Issue Type: Task
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Calcite 1.14.0 has been recently released.





[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2017-10-03 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190516#comment-16190516
 ] 

Sahil Takiar commented on HIVE-17684:
-

Long term, it would be good to have some type of global memory manager per 
Spark executor that can determine whether the heap is growing too big (it can 
do this by measuring the amount of time spent doing GC and comparing it to 
the amount of memory being reclaimed by GC; if lots of time is spent doing GC 
and not much memory is being released, the heap is probably full). It could 
then abort specific Hive tasks it thinks are using too much memory (the tasks 
can report their memory usage to this global manager). This pattern should be 
better than relying on threshold values to decide when to abort a task.
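The GC-time signal described here is available from the JDK's collector beans. A rough sketch of how such a manager could sample it follows; this is an illustration of the idea, not a worked design:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: sample cumulative GC time across all collectors. A global manager
// could compare the delta of this value against wall-clock time over an
// interval; a high ratio with little memory freed suggests the heap is
// nearly full.
public class GcTimeSampler {
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcTimeMillis();
        System.gc(); // suggest a collection so the counter may advance
        Thread.sleep(100);
        long delta = totalGcTimeMillis() - before;
        System.out.println("GC time in the last interval: " + delta + " ms");
    }
}
```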

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use 
> of the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking up too much space in memory, in 
> which case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs that control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler uses the {{MemoryMXBean}} and the following ratio to estimate 
> how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate: the value it returns includes all reachable and unreachable 
> memory on the heap, so it may count garbage that the JVM just hasn't taken 
> the time to reclaim yet. This can lead to intermittent failures of this 
> check even though a simple GC would have reclaimed enough space for the 
> process to continue working.
> We should rethink the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense, because every Hive task ran in a 
> dedicated container, so a Hive task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive 
> tasks running in a single executor, each doing different things.





[jira] [Commented] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2017-10-03 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190515#comment-16190515
 ] 

Sahil Takiar commented on HIVE-17684:
-

A solution to this issue would be to attempt to estimate the size of the 
{{HashMap}} that holds the small table. That seems to be what most other parts 
of Hive do: {{GroupByOperator}}, 
{{o.a.h.hive.ql.exec.tez.HashTableLoader}}.

The {{MapJoinTableContainer}} already implements a {{MemoryEstimate}} 
interface, but the implementation for {{HashMapWrapper}} doesn't seem complete 
(see {{HashMapWrapper#getEstimatedMemorySize}}). Estimating the size of Java 
objects is hard, but we could do our best ({{GroupByOperator}} attempts to do 
this, and {{JavaDataModel}} has some helper methods too).

> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use 
> of the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking up too much space in memory, in 
> which case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs that control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler uses the {{MemoryMXBean}} and the following ratio to estimate 
> how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate: the value it returns includes all reachable and unreachable 
> memory on the heap, so it may count garbage that the JVM just hasn't taken 
> the time to reclaim yet. This can lead to intermittent failures of this 
> check even though a simple GC would have reclaimed enough space for the 
> process to continue working.
> We should rethink the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense, because every Hive task ran in a 
> dedicated container, so a Hive task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive 
> tasks running in a single executor, each doing different things.





[jira] [Assigned] (HIVE-17684) HoS memory issues with MapJoinMemoryExhaustionHandler

2017-10-03 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-17684:
---


> HoS memory issues with MapJoinMemoryExhaustionHandler
> -
>
> Key: HIVE-17684
> URL: https://issues.apache.org/jira/browse/HIVE-17684
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> We have seen a number of memory issues due to the {{HashSinkOperator}}'s use 
> of the {{MapJoinMemoryExhaustionHandler}}. This handler is meant to detect 
> scenarios where the small table is taking up too much space in memory, in 
> which case a {{MapJoinMemoryExhaustionError}} is thrown.
> The configs that control this logic are:
> {{hive.mapjoin.localtask.max.memory.usage}} (default 0.90)
> {{hive.mapjoin.followby.gby.localtask.max.memory.usage}} (default 0.55)
> The handler uses the {{MemoryMXBean}} and the following ratio to estimate 
> how much memory the {{HashMap}} is consuming: 
> {{MemoryMXBean#getHeapMemoryUsage().getUsed() / 
> MemoryMXBean#getHeapMemoryUsage().getMax()}}
> The issue is that {{MemoryMXBean#getHeapMemoryUsage().getUsed()}} can be 
> inaccurate: the value it returns includes all reachable and unreachable 
> memory on the heap, so it may count garbage that the JVM just hasn't taken 
> the time to reclaim yet. This can lead to intermittent failures of this 
> check even though a simple GC would have reclaimed enough space for the 
> process to continue working.
> We should rethink the usage of {{MapJoinMemoryExhaustionHandler}} for HoS. 
> In Hive-on-MR this probably made sense, because every Hive task ran in a 
> dedicated container, so a Hive task could assume it created most of the 
> data on the heap. However, in Hive-on-Spark there can be multiple Hive 
> tasks running in a single executor, each doing different things.





[jira] [Updated] (HIVE-16143) Improve msck repair batching

2017-10-03 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16143:
---
Attachment: HIVE-16143.10-branch-2.patch

Attaching patch for branch-2

> Improve msck repair batching
> 
>
> Key: HIVE-16143
> URL: https://issues.apache.org/jira/browse/HIVE-16143
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 3.0.0
>
> Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch, 
> HIVE-16143.03.patch, HIVE-16143.04.patch, HIVE-16143.05.patch, 
> HIVE-16143.06.patch, HIVE-16143.07.patch, HIVE-16143.08.patch, 
> HIVE-16143.09.patch, HIVE-16143.10-branch-2.patch
>
>
> Currently, the {{msck repair table}} command batches the number of partitions 
> created in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}. 
> The following snippet shows the batching logic. A couple of improvements 
> could be made to it:
> {noformat} 
> int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
> if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>   int counter = 0;
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     counter++;
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>     if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>       db.createPartitions(apd);
>       apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>     }
>   }
> } else {
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>   }
>   db.createPartitions(apd);
> }
> } catch (Exception e) {
>   LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>   repairOutput.clear();
>   msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
> }
> {noformat}
> 1. If the batch size is too aggressive, the code falls back to adding 
> partitions one by one, which is almost always very slow. It is easily 
> possible that users increase the batch size to make the command run faster 
> but end up with worse performance, because the code falls back to adding 
> partitions one by one. Users are then expected to determine a tuned batch 
> size that works well for their environment. The code could handle this 
> situation better by exponentially decaying the batch size instead of falling 
> back to one by one.
> 2. The other issue with this implementation is that if, say, the first batch 
> succeeds and the second one fails, the code tries to add all the partitions 
> one by one, irrespective of whether some of them were already added 
> successfully. If we need to fall back to one by one, we should at least skip 
> the ones which we know for sure were already added.
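The exponential-decay suggestion in point 1, combined with the skip-already-added behavior from point 2, can be sketched as follows. The {{PartitionSink}} interface is a stand-in for the metastore call, not Hive's real API:

```java
import java.util.List;

public class DecayingBatchAdder {
    /** Stand-in for db.createPartitions(...); assumed to throw on failure. */
    interface PartitionSink {
        void createPartitions(List<String> batch) throws Exception;
    }

    /**
     * Adds partitions in batches, halving the batch size on failure instead of
     * immediately falling back to one-by-one. Batches that already succeeded
     * are never retried (addressing point 2 above).
     */
    static void addInDecayingBatches(List<String> parts, int batchSize, PartitionSink sink)
            throws Exception {
        int i = 0;
        while (i < parts.size()) {
            int end = Math.min(i + Math.max(batchSize, 1), parts.size());
            try {
                sink.createPartitions(parts.subList(i, end));
                i = end; // commit progress; these partitions won't be re-added
            } catch (Exception e) {
                if (end - i == 1) {
                    throw e; // already down to a single partition; give up
                }
                batchSize = Math.max(1, batchSize / 2); // decay and retry
            }
        }
    }
}
```

With this shape, a too-aggressive initial batch size degrades gracefully toward smaller batches rather than jumping straight to one-by-one inserts.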





[jira] [Commented] (HIVE-17561) Move TxnStore and implementations to standalone metastore

2017-10-03 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190476#comment-16190476
 ] 

Eugene Koifman commented on HIVE-17561:
---

Looks OK in general.  I left a few comments in 
https://github.com/apache/hive/pull/253/files

> Move TxnStore and implementations to standalone metastore
> -
>
> Key: HIVE-17561
> URL: https://issues.apache.org/jira/browse/HIVE-17561
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore, Transactions
>Reporter: Alan Gates
>Assignee: Alan Gates
>  Labels: pull-request-available
> Attachments: HIVE-17561.4.patch, HIVE-17561.patch
>
>
> We need to move the metastore handling of transactions into the standalone 
> metastore.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17617) Rollup of an empty resultset should contain the grouping of the empty grouping set

2017-10-03 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-17617:

Status: Patch Available  (was: Open)

> Rollup of an empty resultset should contain the grouping of the empty 
> grouping set
> --
>
> Key: HIVE-17617
> URL: https://issues.apache.org/jira/browse/HIVE-17617
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-17617.01.patch
>
>
> running
> {code}
> drop table if exists tx1;
> create table tx1 (a integer,b integer,c integer);
> select  sum(c),
> grouping(b)
> fromtx1
> group by rollup (b);
> {code}
> returns 0 rows; however, 
> according to the standard:
> The  is regarded as the shortest such initial sublist. 
> For example, “ROLLUP ( (A, B), (C, D) )”
> is equivalent to “GROUPING SETS ( (A, B, C, D), (A, B), () )”.
> so I think the totals row (the grouping for {{()}}) should be present - psql 
> returns it.
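The standard's ROLLUP-to-GROUPING SETS equivalence quoted above can be sketched mechanically (illustrative only, not Hive internals): ROLLUP of a list expands to the grouping sets formed by every initial sublist, including the empty grouping {{()}} that produces the totals row.

```python
# Hedged illustration of the SQL standard rule quoted above:
# ROLLUP ( e1, ..., en ) is equivalent to GROUPING SETS of every
# initial sublist of (e1, ..., en), including the empty grouping ().
def rollup_to_grouping_sets(elements):
    # each element may itself be a tuple of columns, e.g. ("A", "B")
    sets = []
    for i in range(len(elements), -1, -1):
        prefix = elements[:i]
        # flatten tuples so ( (A, B), (C, D) ) -> (A, B, C, D)
        flat = tuple(c for e in prefix for c in e)
        sets.append(flat)
    return sets

# ROLLUP ( (A, B), (C, D) )
print(rollup_to_grouping_sets([("A", "B"), ("C", "D")]))
# -> [('A', 'B', 'C', 'D'), ('A', 'B'), ()]
```

Since the empty grouping set () is always in the expansion, a conforming implementation produces the totals row even over an empty input, which is exactly what the issue reports is missing.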



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17617) Rollup of an empty resultset should contain the grouping of the empty grouping set

2017-10-03 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-17617:

Attachment: HIVE-17617.01.patch

#1) patch that possibly works :)

> Rollup of an empty resultset should contain the grouping of the empty 
> grouping set
> --
>
> Key: HIVE-17617
> URL: https://issues.apache.org/jira/browse/HIVE-17617
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
> Attachments: HIVE-17617.01.patch
>
>
> running
> {code}
> drop table if exists tx1;
> create table tx1 (a integer,b integer,c integer);
> select  sum(c),
> grouping(b)
> fromtx1
> group by rollup (b);
> {code}
> returns 0 rows; however, 
> according to the standard:
> The  is regarded as the shortest such initial sublist. 
> For example, “ROLLUP ( (A, B), (C, D) )”
> is equivalent to “GROUPING SETS ( (A, B, C, D), (A, B), () )”.
> so I think the totals row (the grouping for {{()}}) should be present - psql 
> returns it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17617) Rollup of an empty resultset should contain the grouping of the empty grouping set

2017-10-03 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-17617:
---

Assignee: Zoltan Haindrich

> Rollup of an empty resultset should contain the grouping of the empty 
> grouping set
> --
>
> Key: HIVE-17617
> URL: https://issues.apache.org/jira/browse/HIVE-17617
> Project: Hive
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-17617.01.patch
>
>
> running
> {code}
> drop table if exists tx1;
> create table tx1 (a integer,b integer,c integer);
> select  sum(c),
> grouping(b)
> fromtx1
> group by rollup (b);
> {code}
> returns 0 rows; however, 
> according to the standard:
> The  is regarded as the shortest such initial sublist. 
> For example, “ROLLUP ( (A, B), (C, D) )”
> is equivalent to “GROUPING SETS ( (A, B, C, D), (A, B), () )”.
> so I think the totals row (the grouping for {{()}}) should be present - psql 
> returns it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17508) Implement pool rules and triggers based on counters

2017-10-03 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-17508:
-
Attachment: HIVE-17508.5.patch

> Implement pool rules and triggers based on counters
> ---
>
> Key: HIVE-17508
> URL: https://issues.apache.org/jira/browse/HIVE-17508
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17508.1.patch, HIVE-17508.2.patch, 
> HIVE-17508.3.patch, HIVE-17508.3.patch, HIVE-17508.4.patch, 
> HIVE-17508.5.patch, HIVE-17508.WIP.2.patch, HIVE-17508.WIP.patch
>
>
> Workload management can define Rules that are bound to a resource plan. Each 
> rule can have a trigger expression and an action associated with it. Trigger 
> expressions are evaluated at runtime after a configurable check interval, based 
> on which actions like killing a query, moving a query to a different pool, etc. 
> will get invoked. A simple rule could be something like
> {code}
> CREATE RULE slow_query IN resource_plan_name
> WHEN execution_time_ms > 1
> MOVE TO slow_queue
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-13004) Remove encryption shims

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190444#comment-16190444
 ] 

Hive QA commented on HIVE-13004:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12890192/HIVE-13004.2.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 10930 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=232)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=232)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_auto_purge_tables]
 (batchId=166)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_ctas]
 (batchId=166)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_drop_partition]
 (batchId=166)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_drop_table]
 (batchId=167)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_drop_table_in_encrypted_db]
 (batchId=166)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_drop_view]
 (batchId=167)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_insert_partition_dynamic]
 (batchId=168)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_insert_partition_static]
 (batchId=165)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_insert_values]
 (batchId=167)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_unencrypted_tbl]
 (batchId=169)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=169)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_load_data_to_encrypted_tables]
 (batchId=166)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_move_tbl]
 (batchId=166)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_select_read_only_encrypted_tbl]
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=172)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=173)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
 (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
 (batchId=101)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
 (batchId=88)
org.apache.hadoop.hive.cli.TestMinimrCliDriver.org.apache.hadoop.hive.cli.TestMinimrCliDriver
 (batchId=89)
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver
 (batchId=93)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.org.apache.hadoop.hive.cli.TestTezPerfCliDriver
 (batchId=240)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=203)
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager2.checkExpectedLocks 
(batchId=284)
org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.testReadDataFromEncryptedHiveTableByHCatMR[0]
 (batchId=184)
org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.testReadDataFromEncryptedHiveTableByHCatMR[1]
 (batchId=184)
org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.testReadDataFromEncryptedHiveTableByHCatMR[2]
 (batchId=184)
org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.testReadDataFromEncryptedHiveTableByHCatMR[3]
 (batchId=184)

[jira] [Updated] (HIVE-16511) CBO looses inner casts on constants of complex type

2017-10-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16511:
---
Attachment: HIVE-16511.2.patch

> CBO looses inner casts on constants of complex type
> ---
>
> Key: HIVE-16511
> URL: https://issues.apache.org/jira/browse/HIVE-16511
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Reporter: Ashutosh Chauhan
>Assignee: Vineet Garg
> Attachments: HIVE-16511.1.patch, HIVE-16511.2.patch
>
>
> type for map <10, cast(null as int)> becomes map 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16511) CBO looses inner casts on constants of complex type

2017-10-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16511:
---
Status: Patch Available  (was: Open)

> CBO looses inner casts on constants of complex type
> ---
>
> Key: HIVE-16511
> URL: https://issues.apache.org/jira/browse/HIVE-16511
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Reporter: Ashutosh Chauhan
>Assignee: Vineet Garg
> Attachments: HIVE-16511.1.patch, HIVE-16511.2.patch
>
>
> type for map <10, cast(null as int)> becomes map 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16511) CBO looses inner casts on constants of complex type

2017-10-03 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16511:
---
Status: Open  (was: Patch Available)

> CBO looses inner casts on constants of complex type
> ---
>
> Key: HIVE-16511
> URL: https://issues.apache.org/jira/browse/HIVE-16511
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Query Planning
>Reporter: Ashutosh Chauhan
>Assignee: Vineet Garg
> Attachments: HIVE-16511.1.patch
>
>
> type for map <10, cast(null as int)> becomes map 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-1128) Let max/min handle complex types like struct

2017-10-03 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1128:
-
Description: 
A lot of users are interested in doing "arg_min" and "arg_max". Basically, 
return the value of some other columns when one column's value is the max value.

The following is an example usage when this is done:

{code}
SELECT department, max(struct(salary, employee_name))
FROM compensations
GROUP BY department;

SELECT department, max(struct(salary, employee_name)).col2 AS employee_name
FROM compensations
GROUP BY department;

SELECT department, sen.col1 as salary, sen.col2 as employee_name
FROM (
SELECT department, max(struct(salary, employee_name)).col2 AS sen
FROM compensations
GROUP BY department
) tmp

{code}


  was:
A lot of users are interested in doing "arg_min" and "arg_max". Basically, 
return the value of some other columns when one column's value is the max value.

The following is an example usage when this is done:

{code}
SELECT department, max(struct(salary, employee_name))
FROM compensations;
{code}



> Let max/min handle complex types like struct
> 
>
> Key: HIVE-1128
> URL: https://issues.apache.org/jira/browse/HIVE-1128
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.6.0
>
> Attachments: HIVE-1128.1.sh, HIVE-1128.2.patch
>
>
> A lot of users are interested in doing "arg_min" and "arg_max". Basically, 
> return the value of some other columns when one column's value is the max 
> value.
> The following is an example usage when this is done:
> {code}
> SELECT department, max(struct(salary, employee_name))
> FROM compensations
> GROUP BY department;
> SELECT department, max(struct(salary, employee_name)).col2 AS employee_name
> FROM compensations
> GROUP BY department;
> SELECT department, sen.col1 as salary, sen.col2 as employee_name
> FROM (
> SELECT department, max(struct(salary, employee_name)).col2 AS sen
> FROM compensations
> GROUP BY department
> ) tmp
> {code}
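The struct-based arg_max trick above works because struct comparison is field-by-field: taking max over (salary, employee_name) pairs compares salary first, so the name that rides along belongs to the top earner. In Python terms (illustrative only, with made-up sample data):

```python
# Hedged illustration of arg_max via max-over-pairs: tuples compare by
# first field, then second, mirroring max(struct(salary, employee_name)).
compensations = [
    ("eng", 120, "Brian"),
    ("eng", 150, "Freddie"),
    ("sales", 90, "John"),
]
by_dept = {}
for dept, salary, name in compensations:
    pair = (salary, name)                 # the "struct"
    if dept not in by_dept or pair > by_dept[dept]:
        by_dept[dept] = pair              # keep the max per department
print(by_dept)  # {'eng': (150, 'Freddie'), 'sales': (90, 'John')}
```

Projecting the second field of the kept pair then yields the employee name, just as {{.col2}} does in the SQL above.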



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-1128) Let max/min handle complex types like struct

2017-10-03 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1128:
-
Description: 
A lot of users are interested in doing "arg_min" and "arg_max". Basically, 
return the value of some other columns when one column's value is the max value.

The following is an example usage when this is done:

{code}
SELECT department, max(struct(salary, employee_name))
FROM compensations
GROUP BY department;

SELECT department, max(struct(salary, employee_name)).col2 AS employee_name
FROM compensations
GROUP BY department;

SELECT department, sen.col1 as salary, sen.col2 as employee_name
FROM (
SELECT department, max(struct(salary, employee_name)) AS sen
FROM compensations
GROUP BY department
) tmp

{code}


  was:
A lot of users are interested in doing "arg_min" and "arg_max". Basically, 
return the value of some other columns when one column's value is the max value.

The following is an example usage when this is done:

{code}
SELECT department, max(struct(salary, employee_name))
FROM compensations
GROUP BY department;

SELECT department, max(struct(salary, employee_name)).col2 AS employee_name
FROM compensations
GROUP BY department;

SELECT department, sen.col1 as salary, sen.col2 as employee_name
FROM (
SELECT department, max(struct(salary, employee_name)).col2 AS sen
FROM compensations
GROUP BY department
) tmp

{code}



> Let max/min handle complex types like struct
> 
>
> Key: HIVE-1128
> URL: https://issues.apache.org/jira/browse/HIVE-1128
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.6.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.6.0
>
> Attachments: HIVE-1128.1.sh, HIVE-1128.2.patch
>
>
> A lot of users are interested in doing "arg_min" and "arg_max". Basically, 
> return the value of some other columns when one column's value is the max 
> value.
> The following is an example usage when this is done:
> {code}
> SELECT department, max(struct(salary, employee_name))
> FROM compensations
> GROUP BY department;
> SELECT department, max(struct(salary, employee_name)).col2 AS employee_name
> FROM compensations
> GROUP BY department;
> SELECT department, sen.col1 as salary, sen.col2 as employee_name
> FROM (
> SELECT department, max(struct(salary, employee_name)) AS sen
> FROM compensations
> GROUP BY department
> ) tmp
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17664) Refactor and add new tests

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17664:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Refactor and add new tests
> --
>
> Key: HIVE-17664
> URL: https://issues.apache.org/jira/browse/HIVE-17664
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 2.1.2, 3.0.0, 2.4.0, 2.2.1, 2.3.1
>
> Attachments: HIVE-17664.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17664) Refactor and add new tests

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17664:
---
Fix Version/s: 2.1.2

> Refactor and add new tests
> --
>
> Key: HIVE-17664
> URL: https://issues.apache.org/jira/browse/HIVE-17664
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 2.1.2, 3.0.0, 2.4.0, 2.2.1, 2.3.1
>
> Attachments: HIVE-17664.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17664) Refactor and add new tests

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17664:
---
Target Version/s:   (was: 3.0.0, 2.2.1, 2.3.1)

> Refactor and add new tests
> --
>
> Key: HIVE-17664
> URL: https://issues.apache.org/jira/browse/HIVE-17664
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0, 2.4.0, 2.2.1, 2.3.1
>
> Attachments: HIVE-17664.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17664) Refactor and add new tests

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17664:
---
Fix Version/s: 2.2.1

> Refactor and add new tests
> --
>
> Key: HIVE-17664
> URL: https://issues.apache.org/jira/browse/HIVE-17664
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0, 2.4.0, 2.2.1, 2.3.1
>
> Attachments: HIVE-17664.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17682) Vectorization: IF stmt produces wrong results

2017-10-03 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190252#comment-16190252
 ] 

Matt McCline edited comment on HIVE-17682 at 10/3/17 9:35 PM:
--

Thanks to [~gopalv] for coming up with a very small test case and code fix:

{noformat}
drop table if exists foo;
create temporary table foo (x int, y int) stored as orc;
insert into foo values(1,1),(2,NULL),(3,1);
select x, IF(x > 0,y,0) from foo order by x;
{noformat}



was (Author: mmccline):
Thanks to [~gopalv] for coming up with a very small test case:

{noformat}
drop table if exists foo;
create temporary table foo (x int, y int) stored as orc;
insert into foo values(1,1),(2,NULL),(3,1);
select x, IF(x > 0,y,0) from foo order by x;
{noformat}


> Vectorization: IF stmt produces wrong results
> -
>
> Key: HIVE-17682
> URL: https://issues.apache.org/jira/browse/HIVE-17682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17682.01.patch
>
>
> A query using with a vectorized IF(condition, thenExpr, elseExpr) function 
> can produce wrong results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15121:
---
Fix Version/s: (was: 2.3.0)
   3.0.0

> Last MR job in Hive should be able to write to a different scratch directory
> 
>
> Key: HIVE-15121
> URL: https://issues.apache.org/jira/browse/HIVE-15121
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.patch, HIVE-15121.WIP.1.patch, 
> HIVE-15121.WIP.2.patch, HIVE-15121.WIP.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the 
> scratch directory to the final table directory can be done server-side, 
> within the blobstore. The MoveTask simply renames data from the scratch 
> directory to the final table location, which should translate to a 
> server-side COPY request. This way HiveServer2 doesn't have to actually copy 
> any data, it just tells the blobstore to do all the work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16674) Hive metastore JVM dumps core

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16674:
---
Fix Version/s: (was: 2.3.0)
   3.0.0

> Hive metastore JVM dumps core
> -
>
> Key: HIVE-16674
> URL: https://issues.apache.org/jira/browse/HIVE-16674
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: Hive-1.2.1
> Kerberos enabled cluster
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
>Priority: Blocker
> Fix For: 1.2.1, 3.0.0
>
> Attachments: HIVE-16674.1.patch, HIVE-16674.patch
>
>
> While trying to run a Hive query on 24 partitions of an external 
> table with a large number of partitions (4K+), I get an error
> {code}
>  - org.apache.thrift.transport.TSaslTransport$SaslParticipant.wrap(byte[], 
> int, int) @bci=27, line=568 (Compiled frame)
>  - org.apache.thrift.transport.TSaslTransport.flush() @bci=52, line=492 
> (Compiled frame)
>  - org.apache.thrift.transport.TSaslServerTransport.flush() @bci=1, line=41 
> (Compiled frame)
>  - org.apache.thrift.ProcessFunction.process(int, 
> org.apache.thrift.protocol.TProtocol, org.apache.thrift.protocol.TProtocol, 
> java.lang.Object) @bci=236, line=55 (Compiled frame)
>  - 
> org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=126, line=39 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=15, line=690 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run()
>  @bci=1, line=685 (Compiled frame)
>  - 
> java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
>  java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
> java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - 
> org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
>  @bci=14, line=1595 (Compiled frame)
>  - 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(org.apache.thrift.protocol.TProtocol,
>  org.apache.thrift.protocol.TProtocol) @bci=273, line=685 (Compiled frame)
>  - org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() @bci=151, 
> line=285 (Interpreted frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1142 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17098) Race condition in Hbase tables

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17098:
---
Fix Version/s: (was: 2.3.0)
   3.0.0

> Race condition in Hbase tables
> --
>
> Key: HIVE-17098
> URL: https://issues.apache.org/jira/browse/HIVE-17098
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 2.1.1
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17098.1.patch
>
>
> These steps simulate our customer production env.
> *STEP 1. Create test tables*
> {code}
> CREATE TABLE for_loading(
>   key int, 
>   value string,
>   age int,
>   salary decimal (10,2)
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> {code}
> {code}
> CREATE TABLE test_1(
>   key int, 
>   value string,
>   age int,
>   salary decimal (10,2)
> )
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 
>   'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ( 
>   'hbase.columns.mapping'=':key, cf1:value, cf1:age, cf1:salary', 
>   'serialization.format'='1')
> TBLPROPERTIES (
>   'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
>   'hbase.table.name'='test_1', 
>   'numFiles'='0', 
>   'numRows'='0', 
>   'rawDataSize'='0', 
>   'totalSize'='0', 
>   'transient_lastDdlTime'='1495769316');
> {code}
> {code}
> CREATE TABLE test_2(
>   key int, 
>   value string,
>   age int,
>   salary decimal (10,2)
> )
> ROW FORMAT SERDE 
>   'org.apache.hadoop.hive.hbase.HBaseSerDe' 
> STORED BY 
>   'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
> WITH SERDEPROPERTIES ( 
>   'hbase.columns.mapping'=':key, cf1:value, cf1:age, cf1:salary', 
>   'serialization.format'='1')
> TBLPROPERTIES (
>   'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
>   'hbase.table.name'='test_2', 
>   'numFiles'='0', 
>   'numRows'='0', 
>   'rawDataSize'='0', 
>   'totalSize'='0', 
>   'transient_lastDdlTime'='1495769316');
> {code}
> *STEP 2. Create test data*
> {code}
> import java.io.IOException;
> import java.math.BigDecimal;
> import java.nio.charset.Charset;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import java.nio.file.StandardOpenOption;
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
> import java.util.Random;
> import static java.lang.String.format;
> public class Generator {
> private static List<String> lines = new ArrayList<>();
> private static List<String> name = Arrays.asList("Brian", "John", 
> "Rodger", "Max", "Freddie", "Albert", "Fedor", "Lev", "Niccolo");
> private static List<String> salary = new ArrayList<>();
> public static void main(String[] args) {
> generateData(Integer.parseInt(args[0]), args[1]);
> }
> public static void generateData(int rowNumber, String file) {
> double minValue = 2.55;
> double maxValue = 1000.03;
> Random random = new Random();
> for (int i = 1; i <= rowNumber; i++) {
> lines.add(
> i + "," +
> name.get(random.nextInt(name.size())) + "," +
> (random.nextInt(62) + 18) + "," +
> format("%.2f", (minValue + (maxValue - minValue) * 
> random.nextDouble())));
> }
> Path path = Paths.get(file);
> try {
> Files.write(path, lines, Charset.forName("UTF-8"), 
> StandardOpenOption.APPEND);
> } catch (IOException e) {
> e.printStackTrace();
> }
> }
> }
> {code}
> {code}
> javac Generator.java
> java Generator 300 dataset.csv
> hadoop fs -put dataset.csv /
> {code}
> *STEP 3. Upload test data*
> {code}
> load data local inpath '/home/myuser/dataset.csv' into table for_loading;
> {code}
> {code}
> from for_loading
> insert into table test_1
> select key,value,age,salary;
> {code}
> {code}
> from for_loading
> insert into table test_2
> select key,value,age,salary;
> {code}
> *STEP 4. Run test queries*
> Run in 5 parallel terminals for table {{test_1}}
> {code}
> for i in {1..500}; do beeline -u "jdbc:hive2://localhost:1/default 
> testuser1" -e "select * from test_1 limit 10;" 1>/dev/null; done
> {code}
> Run in 5 parallel terminals for table {{test_2}}
> {code}
> for i in {1..500}; do beeline -u "jdbc:hive2://localhost:1/default 
> testuser2" -e "select * from test_2 limit 10;" 1>/dev/null; done
> {code}
> *EXPECTED RESULT:*
> All queries are OK.
> *ACTUAL RESULT*
> {code}
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> java.lang.IllegalStateException: The input format instance has not been 
> properly initialized. Ensure you call initializeTable either in your constructor 

[jira] [Updated] (HIVE-17664) Refactor and add new tests

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17664:
---
Fix Version/s: 2.3.1

> Refactor and add new tests
> --
>
> Key: HIVE-17664
> URL: https://issues.apache.org/jira/browse/HIVE-17664
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0, 2.4.0, 2.3.1
>
> Attachments: HIVE-17664.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17664) Refactor and add new tests

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17664:
---
Fix Version/s: 2.4.0

> Refactor and add new tests
> --
>
> Key: HIVE-17664
> URL: https://issues.apache.org/jira/browse/HIVE-17664
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17664.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17683) Annotate Query Plan with locking information

2017-10-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17683:
--
Description: 
Explore if it's possible to add info about what locks will be asked for to the 
query plan.
Lock acquisition (for Acid Lock Manager) is done in DbTxnManager.acquireLocks() 
which is called once the query starts running.  Would need to refactor that.

  was:Explore if it's possible to add info about what locks will be asked for 
to the query plan.


> Annotate Query Plan with locking information
> 
>
> Key: HIVE-17683
> URL: https://issues.apache.org/jira/browse/HIVE-17683
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>
> Explore if it's possible to add info about what locks will be asked for to 
> the query plan.
> Lock acquisition (for Acid Lock Manager) is done in 
> DbTxnManager.acquireLocks() which is called once the query starts running.  
> Would need to refactor that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17683) Annotate Query Plan with locking information

2017-10-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17683:
-

Assignee: Eugene Koifman

> Annotate Query Plan with locking information
> 
>
> Key: HIVE-17683
> URL: https://issues.apache.org/jira/browse/HIVE-17683
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> Explore if it's possible to add info about what locks will be asked for to 
> the query plan.
> Lock acquisition (for Acid Lock Manager) is done in 
> DbTxnManager.acquireLocks() which is called once the query starts running.  
> Would need to refactor that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17664) Refactor and add new tests

2017-10-03 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17664:
---
Fix Version/s: 3.0.0

> Refactor and add new tests
> --
>
> Key: HIVE-17664
> URL: https://issues.apache.org/jira/browse/HIVE-17664
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 3.0.0
>
> Attachments: HIVE-17664.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17508) Implement pool rules and triggers based on counters

2017-10-03 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190347#comment-16190347
 ] 

Prasanth Jayachandran commented on HIVE-17508:
--

Some changes from the previous refactor patch are causing NPEs. Will fix in the 
next patch.

> Implement pool rules and triggers based on counters
> ---
>
> Key: HIVE-17508
> URL: https://issues.apache.org/jira/browse/HIVE-17508
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-17508.1.patch, HIVE-17508.2.patch, 
> HIVE-17508.3.patch, HIVE-17508.3.patch, HIVE-17508.4.patch, 
> HIVE-17508.WIP.2.patch, HIVE-17508.WIP.patch
>
>
> Workload management can define Rules that are bound to a resource plan. Each 
> rule can have a trigger expression and an action associated with it. Trigger 
> expressions are evaluated at runtime after a configurable check interval; 
> based on the result, actions such as killing a query or moving a query to a 
> different pool will be invoked. A simple rule could be something like
> {code}
> CREATE RULE slow_query IN resource_plan_name
> WHEN execution_time_ms > 1
> MOVE TO slow_queue
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17679) http-generic-click-jacking for WebHcat server

2017-10-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190345#comment-16190345
 ] 

Hive QA commented on HIVE-17679:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12890202/HIVE-17679.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11193 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=232)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=232)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=101)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query14] 
(batchId=240)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] 
(batchId=240)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=203)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=227)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7097/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7097/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7097/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12890202 - PreCommit-HIVE-Build

> http-generic-click-jacking for WebHcat server
> -
>
> Key: HIVE-17679
> URL: https://issues.apache.org/jira/browse/HIVE-17679
> Project: Hive
>  Issue Type: Bug
>  Components: Security, WebHCat
>Affects Versions: 2.1.1
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-17679.1.patch, HIVE-17679.2.patch
>
>
> The web UIs do not include the "X-Frame-Options" header to prevent the pages 
> from being framed from another site.
> Reference:
> https://www.owasp.org/index.php/Clickjacking
> https://www.owasp.org/index.php/Clickjacking_Defense_Cheat_Sheet
> https://developer.mozilla.org/en-US/docs/Web/HTTP/X-Frame-Options



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17669) Cache to optimize SearchArgument deserialization

2017-10-03 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17669:

Attachment: HIVE-17699.2.patch

Is this more palatable? I've switched over to using a configurably bounded 
Guava {{Cache}}. I'm not sure what the default max-size should be. I've left it 
at 100. Is that acceptable?

[~prasanth_j], do you suppose we should leave out doing the MD5/SHA-1 hashing? 
Would that be overkill, given the small cache-size?

I'm working on a {{branch-2}} port. (I'll have to rewrite the lambda bits.)
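The bounded-cache idea above (deserialize a {{SearchArgument}} once per distinct serialized string, evict least-recently-used entries past a max size) can be sketched with just the JDK. The patch itself uses a Guava {{Cache}}; the class and method names below are illustrative stand-ins, not Hive's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal stand-in for a bounded deserialization cache: an access-ordered
// LinkedHashMap that evicts its least-recently-used entry once maxSize is
// exceeded. Keyed by the serialized string form, as described in the issue.
final class SargCache<V> {
    private final int maxSize;
    private final Map<String, V> cache;

    SargCache(int maxSize) {
        this.maxSize = maxSize;
        // accessOrder=true makes iteration order LRU; removeEldestEntry
        // bounds the size on each insertion.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > SargCache.this.maxSize;
            }
        };
    }

    // Deserialize only on a miss; hits reuse the cached object and
    // refresh its recency.
    synchronized V get(String serialized, Function<String, V> deserializer) {
        V v = cache.get(serialized);
        if (v == null) {
            v = deserializer.apply(serialized);
            cache.put(serialized, v);
        }
        return v;
    }

    synchronized int size() {
        return cache.size();
    }
}

public class SargCacheDemo {
    public static void main(String[] args) {
        SargCache<String> cache = new SargCache<>(2);
        cache.get("sarg-a", s -> "parsed:" + s);
        cache.get("sarg-b", s -> "parsed:" + s);
        cache.get("sarg-a", s -> "parsed:" + s); // hit: refreshes recency of a
        cache.get("sarg-c", s -> "parsed:" + s); // evicts sarg-b (LRU)
        System.out.println(cache.size());        // prints 2
    }
}
```

A Guava `CacheBuilder.newBuilder().maximumSize(n)` cache adds time-based expiry and weigher options on top of the same contract.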

> Cache to optimize SearchArgument deserialization
> 
>
> Key: HIVE-17669
> URL: https://issues.apache.org/jira/browse/HIVE-17669
> Project: Hive
>  Issue Type: Improvement
>  Components: ORC, Query Processor
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Chris Drome
> Attachments: HIVE-17699.1.patch, HIVE-17699.2.patch
>
>
> And another, from [~selinazh] and [~cdrome]. (YHIVE-927)
> When a mapper needs to process multiple ORC files, it might end up having to 
> use essentially the same {{SearchArgument}} over several files. It would be 
> good not to have to deserialize it from string over and over again. Caching 
> the object against its string form should speed things up.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17682) Vectorization: IF stmt produces wrong results

2017-10-03 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190316#comment-16190316
 ] 

Gopal V commented on HIVE-17682:


LGTM - +1 tests pending.

> Vectorization: IF stmt produces wrong results
> -
>
> Key: HIVE-17682
> URL: https://issues.apache.org/jira/browse/HIVE-17682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17682.01.patch
>
>
> A query using a vectorized IF(condition, thenExpr, elseExpr) function can 
> produce wrong results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17682) Vectorization: IF stmt produces wrong results

2017-10-03 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17682:

Attachment: HIVE-17682.01.patch

> Vectorization: IF stmt produces wrong results
> -
>
> Key: HIVE-17682
> URL: https://issues.apache.org/jira/browse/HIVE-17682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17682.01.patch
>
>
> A query using a vectorized IF(condition, thenExpr, elseExpr) function can 
> produce wrong results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17682) Vectorization: IF stmt produces wrong results

2017-10-03 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17682:

Status: Patch Available  (was: Open)

> Vectorization: IF stmt produces wrong results
> -
>
> Key: HIVE-17682
> URL: https://issues.apache.org/jira/browse/HIVE-17682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17682.01.patch
>
>
> A query using a vectorized IF(condition, thenExpr, elseExpr) function can 
> produce wrong results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17544) Provide classname info for function authorization

2017-10-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-17544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190296#comment-16190296
 ] 

Sergio Peña commented on HIVE-17544:


The patch looks good.
+1

> Provide classname info for function authorization
> -
>
> Key: HIVE-17544
> URL: https://issues.apache.org/jira/browse/HIVE-17544
> Project: Hive
>  Issue Type: Task
>  Components: Authorization
>Affects Versions: 2.1.1
>Reporter: Na Li
>Assignee: Aihua Xu
>Priority: Critical
> Attachments: HIVE-17544.1.patch, HIVE-17544.2.patch, 
> HIVE-17544.3.patch
>
>
> Right now, for authorization 2, the 
> HiveAuthorizationValidator.checkPrivileges(HiveOperationType var1, 
> List var2, List var3, 
> HiveAuthzContext var4) does not contain the parsed sql command string as 
> input. Therefore, Sentry has to parse the command again.
> The API should be changed to include all required information as input, so 
> Sentry does not need to parse the sql command string again.
> known situations:
> 1) when dropping a database which does not exist, hive should not call 
> sentry, or it should call sentry with the database name as input
> 2) when creating a function, hive should provide the UDF class name as input.
> 3) when dropping a function, hive should provide the UDF class name as input.
> 4) when dropping a table which does not exist, hive should not call sentry, 
> or it should call sentry with the database name and table name as input.
> 5) in any situation where the command should succeed and hive does not 
> provide the required info to sentry, hive should not call sentry at all, 
> because sentry will throw an exception when the required info is not 
> available from the input.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17682) Vectorization: IF stmt produces wrong results

2017-10-03 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190252#comment-16190252
 ] 

Matt McCline commented on HIVE-17682:
-

Thanks to [~gopalv] for coming up with a very small test case:

{noformat}
drop table if exists foo;
create temporary table foo (x int, y int) stored as orc;
insert into foo values(1,1),(2,NULL),(3,1);
select x, IF(x > 0,y,0) from foo order by x;
{noformat}
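The repro above exercises NULL handling in the then-branch: for row (2, NULL) the condition {{x > 0}} is true, so the result must be NULL, not the else-constant 0. The row-mode semantics the vectorized path must match can be sketched as follows (this helper is illustrative, not Hive code; SQL NULL is modeled as a Java {{null}} Integer):

```java
// Row-mode semantics of IF(x > 0, y, 0): when the condition is true the
// (possibly null) then-branch value is returned as-is; otherwise the
// else-branch constant is returned. A vectorized implementation that
// overwrites nulls in the selected branch produces wrong results here.
public class IfSemanticsDemo {
    static Integer sqlIf(boolean cond, Integer thenVal, Integer elseVal) {
        return cond ? thenVal : elseVal;
    }

    public static void main(String[] args) {
        // Rows from: insert into foo values (1,1),(2,NULL),(3,1);
        int[] xs = {1, 2, 3};
        Integer[] ys = {1, null, 1};
        for (int i = 0; i < xs.length; i++) {
            System.out.println(xs[i] + "\t" + sqlIf(xs[i] > 0, ys[i], 0));
        }
        // Prints: 1  1 / 2  null / 3  1 -- the NULL in row 2 must survive.
    }
}
```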


> Vectorization: IF stmt produces wrong results
> -
>
> Key: HIVE-17682
> URL: https://issues.apache.org/jira/browse/HIVE-17682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
>
> A query using a vectorized IF(condition, thenExpr, elseExpr) function can 
> produce wrong results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17682) Vectorization: IF stmt produces wrong results

2017-10-03 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-17682:
---


> Vectorization: IF stmt produces wrong results
> -
>
> Key: HIVE-17682
> URL: https://issues.apache.org/jira/browse/HIVE-17682
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.0.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 3.0.0
>
>
> A query using a vectorized IF(condition, thenExpr, elseExpr) function can 
> produce wrong results.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17659) get_token thrift call fails for DBTokenStore in remote HMS mode

2017-10-03 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190234#comment-16190234
 ] 

Vihang Karajgaonkar commented on HIVE-17659:


Patch merged to branch-2. Attaching the patch for the master branch as well, 
so tests run on master before it is merged there.

> get_token thrift call fails for DBTokenStore in remote HMS mode
> ---
>
> Key: HIVE-17659
> URL: https://issues.apache.org/jira/browse/HIVE-17659
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0, 2.1.1, 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17659.01-branch-2.patch, HIVE-17659.01.patch, 
> HIVE-17659.02-branch-2.patch
>
>
> The {{get_token(String tokenIdentifier)}} HMS thrift API fails when HMS is 
> deployed in remote mode and no token is found with that tokenIdentifier. 
> This can happen when an application calls renewDelegationToken on an 
> expired/cancelled delegation token. The issue is that get_token tries to 
> return a null result value, which cannot be done in Thrift. The API call 
> errors out with an {{org.apache.thrift.TApplicationException unknown 
> result}} exception, which is uncaught, and the HS2 thrift server closes the 
> client transport. No further calls from that connection can be accepted 
> unless the client reconnects to HS2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17659) get_token thrift call fails for DBTokenStore in remote HMS mode

2017-10-03 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17659:
---
Attachment: HIVE-17659.01.patch

> get_token thrift call fails for DBTokenStore in remote HMS mode
> ---
>
> Key: HIVE-17659
> URL: https://issues.apache.org/jira/browse/HIVE-17659
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.0, 2.1.1, 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17659.01-branch-2.patch, HIVE-17659.01.patch, 
> HIVE-17659.02-branch-2.patch
>
>
> The {{get_token(String tokenIdentifier)}} HMS thrift API fails when HMS is 
> deployed in remote mode and no token is found with that tokenIdentifier. 
> This can happen when an application calls renewDelegationToken on an 
> expired/cancelled delegation token. The issue is that get_token tries to 
> return a null result value, which cannot be done in Thrift. The API call 
> errors out with an {{org.apache.thrift.TApplicationException unknown 
> result}} exception, which is uncaught, and the HS2 thrift server closes the 
> client transport. No further calls from that connection can be accepted 
> unless the client reconnects to HS2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

