[jira] [Commented] (HIVE-16353) Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot

2017-04-24 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982369#comment-15982369
 ] 

Rajesh Balamohan commented on HIVE-16353:
-

Thanks [~gopalv]. Patch lgtm. +1. 
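For anyone hitting this locally, here is a minimal sketch of a classpath sanity check (not part of the patch; the class name is taken from the stack trace below):

{code}
// Hypothetical standalone check: fails the same way LlapDaemon does when the
// jetty-rewrite jar is missing from the daemon classpath.
public class JettyRewriteCheck {
  public static void main(String[] args) {
    try {
      Class.forName("org.eclipse.jetty.rewrite.handler.Rule");
      System.out.println("jetty-rewrite is on the classpath");
    } catch (ClassNotFoundException e) {
      System.err.println("jetty-rewrite jar is missing - LLAP web UI startup would fail");
    }
  }
}
{code}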

> Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot
> --
>
> Key: HIVE-16353
> URL: https://issues.apache.org/jira/browse/HIVE-16353
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16353.patch
>
>
> HIVE-16049 upgraded to Jetty 9. It is committed to Apache master, which is 
> still 2.3.0-SNAPSHOT. This breaks a couple of other components, such as LLAP, 
> and ends up throwing the following error at runtime.
> {noformat}
> 2017-04-02T20:17:45,435 WARN  [main ()] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: Failed to start LLAP 
> Daemon with exception
> java.lang.NoClassDefFoundError: org/eclipse/jetty/rewrite/handler/Rule
> at org.apache.hive.http.HttpServer$Builder.build(HttpServer.java:125) 
> ~[hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.services.impl.LlapWebServices.serviceInit(LlapWebServices.java:102)
>  ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) 
> ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>  ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceInit(LlapDaemon.java:385)
>  ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) 
> ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:528) 
> [hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> Caused by: java.lang.ClassNotFoundException: 
> org.eclipse.jetty.rewrite.handler.Rule
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_77]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_77]
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[?:1.8.0_77]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_77]
> ... 7 more
> 2017-04-02T20:17:45,441 INFO  [main ()] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: LlapDaemon shutdown 
> invoked
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15396) Basic Stats are not collected for managed tables with LOCATION specified

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982368#comment-15982368
 ] 

Sahil Takiar commented on HIVE-15396:
-

[~pxiong] created an RB: https://reviews.apache.org/r/58691/
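A minimal sketch of how the regression can be checked over JDBC (requires the Hive JDBC driver on the classpath; the table name, LOCATION path, and connection URL below are made up for illustration):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch only: create a managed table with an explicit LOCATION and dump the
// "Table Parameters" rows of DESCRIBE FORMATTED to see whether basic stats
// (numRows, COLUMN_STATS_ACCURATE, ...) were populated.
public class LocationStatsCheck {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection(
            "jdbc:hive2://localhost:10000/default", "anonymous", "");
         Statement stmt = conn.createStatement()) {
      stmt.execute("CREATE TABLE hdfs_1 (col INT) LOCATION '/tmp/hdfs_1'"); // hypothetical path
      try (ResultSet rs = stmt.executeQuery("DESCRIBE FORMATTED hdfs_1")) {
        while (rs.next()) {
          String dataType = rs.getString(2);
          if (dataType != null
              && (dataType.contains("numRows") || dataType.contains("COLUMN_STATS_ACCURATE"))) {
            System.out.println(dataType.trim() + " = " + rs.getString(3));
          }
        }
      }
    }
  }
}
{code}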

> Basic Stats are not collected for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, 
> HIVE-15396.6.patch, HIVE-15396.7.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:1> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:1> describe formatted hdfs_1;
> +----------+-----------+---------+
> | col_name | data_type | comment |
> +----------+-----------+---------+
> | # col_name | data_type | comment |
> | | NULL | NULL |
> | col | int | |
> | | NULL | NULL |
> | # Detailed Table Information | NULL | NULL |
> | Database: | default | NULL |
> | Owner: | anonymous | NULL |
> | CreateTime: | Wed Mar 22 18:09:19 PDT 2017 | NULL |
> | LastAccessTime: | UNKNOWN | NULL |
> | Retention: | 0 | NULL |
> | Location: | file:/warehouse/hdfs_1 | NULL |
> | Table Type: | MANAGED_TABLE | NULL |
> | Table Parameters: | NULL | NULL |
> | | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
> | | numFiles | 0 |
> | | numRows | 0 |
> | | rawDataSize | 0 |
> | | totalSize | 0 |
> | | transient_lastDdlTime | 1490231359 |
> | | NULL | NULL |
> | # Storage Information | NULL | NULL |
> | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
> | InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
> | OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
> | Compressed: | No | NULL |
> | Num Buckets: | -1 | NULL |
> | Bucket Columns: | [] | NULL |
> | Sort Columns: | [] | NULL |
> | Storage Desc Params: | NULL | NULL |

[jira] [Commented] (HIVE-15396) Basic Stats are not collected for managed tables with LOCATION specified

2017-04-24 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982333#comment-15982333
 ] 

Pengcheng Xiong commented on HIVE-15396:


[~stakiar], could u create a review request? Thanks.

> Basic Stats are not collected for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, 
> HIVE-15396.6.patch, HIVE-15396.7.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:1> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:1> describe formatted hdfs_1;
> +----------+-----------+---------+
> | col_name | data_type | comment |
> +----------+-----------+---------+
> | # col_name | data_type | comment |
> | | NULL | NULL |
> | col | int | |
> | | NULL | NULL |
> | # Detailed Table Information | NULL | NULL |
> | Database: | default | NULL |
> | Owner: | anonymous | NULL |
> | CreateTime: | Wed Mar 22 18:09:19 PDT 2017 | NULL |
> | LastAccessTime: | UNKNOWN | NULL |
> | Retention: | 0 | NULL |
> | Location: | file:/warehouse/hdfs_1 | NULL |
> | Table Type: | MANAGED_TABLE | NULL |
> | Table Parameters: | NULL | NULL |
> | | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
> | | numFiles | 0 |
> | | numRows | 0 |
> | | rawDataSize | 0 |
> | | totalSize | 0 |
> | | transient_lastDdlTime | 1490231359 |
> | | NULL | NULL |
> | # Storage Information | NULL | NULL |
> | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
> | InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
> | OutputFormat: | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL |
> | Compressed: | No | NULL |
> | Num Buckets: | -1 | NULL |
> | Bucket Columns: | [] | NULL |
> | Sort Columns: | [] | NULL |
> | Storage Desc Params: | NULL | NULL |

[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats

2017-04-24 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982331#comment-15982331
 ] 

Pengcheng Xiong commented on HIVE-16147:


LGTM. +1 pending tests.

> Rename a partitioned table should not drop its partition columns stats
> --
>
> Key: HIVE-16147
> URL: https://issues.apache.org/jira/browse/HIVE-16147
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to 
> sample_pt_rename), describing its partition shows that the partition column 
> stats are still accurate, but actually they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3):  COLUMN_STATS 
> for all columns are true
> {code}
> ...
> # Detailed Partition Information   
> Partition Value:  [3]  
> Database: default  
> Table:sample_pt
> CreateTime:   Fri Jan 20 15:42:30 EST 2017 
> LastAccessTime:   UNKNOWN  
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_byctang   
>   last_modified_time  1485217063  
>   numFiles1   
>   numRows 100 
>   rawDataSize 5143
>   totalSize   5243
>   transient_lastDdlTime   1488842358
> ... 
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column 
> stats exist
> {code}
> # col_name  data_type  min  max  num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
> 
> salary  int  1  151370  0  94  from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): 
> describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS 
> for all columns are still true.
> {code}
> # Detailed Partition Information   
> Partition Value:  [3]  
> Database: default  
> Table:sample_pt_rename 
> CreateTime:   Fri Jan 20 15:42:30 EST 2017 
> LastAccessTime:   UNKNOWN  
> Location: 
> file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_byctang   
>   last_modified_time  1485217063  
>   numFiles1   
>   numRows 100 
>   rawDataSize 5143
>   totalSize   5243
>   transient_lastDdlTime   1488842358  
> {code}
> describe formatted default.sample_pt_rename partition (dummy = 3) salary: the 
> column stats have been dropped.
> {code}
> # col_name  data_type  comment
> 
> salary  int  from deserializer
> 
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16445) enable Acid by default in the parent patch and run build bot

2017-04-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982323#comment-15982323
 ] 

Hive QA commented on HIVE-16445:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864866/HIVE-16445.01.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 599 failed/errored test(s), 9092 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lock1] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lock2] (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lock3] (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lock4] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=73)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
 (batchId=137)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
 (batchId=138)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
 (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver
 (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=165)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=166)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=167)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
 (batchId=96)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver
 (batchId=97)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_into1] 
(batchId=87)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_into2] 
(batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_into3] 
(batchId=87)

[jira] [Updated] (HIVE-16510) Vectorization: Add vectorized PTF tests in preparation for HIVE-16369

2017-04-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16510:

Attachment: HIVE-16510.02.patch

> Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
> -
>
> Key: HIVE-16510
> URL: https://issues.apache.org/jira/browse/HIVE-16510
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16510.01.patch, HIVE-16510.02.patch
>
>
> Had trouble with HIVE-16369 patch being blocked by Apache SPAM filters -- so 
> separating out adding vectorized versions of current windowing_*.q tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16510) Vectorization: Add vectorized PTF tests in preparation for HIVE-16369

2017-04-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16510:

Attachment: (was: HIVE-16510.02.patch)

> Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
> -
>
> Key: HIVE-16510
> URL: https://issues.apache.org/jira/browse/HIVE-16510
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16510.01.patch, HIVE-16510.02.patch
>
>
> Had trouble with HIVE-16369 patch being blocked by Apache SPAM filters -- so 
> separating out adding vectorized versions of current windowing_*.q tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16510) Vectorization: Add vectorized PTF tests in preparation for HIVE-16369

2017-04-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16510:

Status: Patch Available  (was: In Progress)

> Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
> -
>
> Key: HIVE-16510
> URL: https://issues.apache.org/jira/browse/HIVE-16510
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16510.01.patch, HIVE-16510.02.patch
>
>
> Had trouble with HIVE-16369 patch being blocked by Apache SPAM filters -- so 
> separating out adding vectorized versions of current windowing_*.q tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16510) Vectorization: Add vectorized PTF tests in preparation for HIVE-16369

2017-04-24 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16510:

Attachment: HIVE-16510.02.patch

> Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
> -
>
> Key: HIVE-16510
> URL: https://issues.apache.org/jira/browse/HIVE-16510
> Project: Hive
>  Issue Type: Bug
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16510.01.patch, HIVE-16510.02.patch
>
>
> Had trouble with HIVE-16369 patch being blocked by Apache SPAM filters -- so 
> separating out adding vectorized versions of current windowing_*.q tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size

2017-04-24 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982262#comment-15982262
 ] 

Prasanth Jayachandran commented on HIVE-16503:
--

bq. Is SESSIONS_PER_DEFAULT_QUEUE guaranteed to be >= 1?

The default value is 1, but it doesn't guard against values set by the user. The .3 
patch adds a guard against <= 0 values and a unit test for the same.

 bq. Does it make sense to add range validators for the new settings (0, 1.0)?

The .1 patch had a RangeValidator, but I removed it since values > 1.0 are also 
valid; something like 120% oversubscription is a valid scenario. However, 
undersubscription is not allowed and is guarded by the return value
{code}
return Math.max(maxSize, llapMaxSize);
{code}
where maxSize is the initial noconditional task size.
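As a rough illustration of the intent (not the actual patch; the factor parameter here is an assumption):

{code}
// Sketch of the oversubscription idea: bump the noconditional task size by some
// factor when running in LLAP, but never go below the user-configured value.
public final class MapJoinMemoryBudget {
  private MapJoinMemoryBudget() {}

  // maxSize: configured hive.auto.convert.join.noconditionaltask.size
  // oversubscribeFactor: e.g. 0.2 for 20% extra headroom (hypothetical knob)
  public static long adjustedNoConditionalTaskSize(long maxSize, double oversubscribeFactor) {
    long llapMaxSize = (long) (maxSize * (1.0 + oversubscribeFactor));
    // Guard against under-subscription: never shrink below the configured size.
    return Math.max(maxSize, llapMaxSize);
  }
}
{code}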

> LLAP: Oversubscribe memory for noconditional task size
> --
>
> Key: HIVE-16503
> URL: https://issues.apache.org/jira/browse/HIVE-16503
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch, 
> HIVE-16503.3.patch
>
>
> When running map joins in LLAP, it can potentially use more memory for hash 
> table loading (assuming other executors in the daemon have some memory to 
> spare). This map join conversion decision has to be made during compilation 
> so that it can provide some more room for LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size

2017-04-24 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-16503:
-
Attachment: HIVE-16503.3.patch

> LLAP: Oversubscribe memory for noconditional task size
> --
>
> Key: HIVE-16503
> URL: https://issues.apache.org/jira/browse/HIVE-16503
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch, 
> HIVE-16503.3.patch
>
>
> When running map joins in LLAP, it can potentially use more memory for hash 
> table loading (assuming other executors in the daemon have some memory to 
> spare). This map join conversion decision has to be made during compilation 
> so that it can provide some more room for LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon

2017-04-24 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16524:
-
Description: 
The id attribute is defined by W3C as follows:
1. The id attribute specifies a unique id for an HTML element.
2. An id must be unique within the HTML document.
3. The id attribute can be used as a link anchor, and by JavaScript (HTML DOM) or by 
CSS to change or add a style to the element with the specified id.
However, the "id='attributes_table'" in hiveserver2.jsp and QueryProfileTmpl.jamon:
1. Is not referenced by any CSS or JS.
2. Appears more than once on the same page.
So I suggest removing this id attribute definition. Please check it.

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> 
>
> Key: HIVE-16524
> URL: https://issues.apache.org/jira/browse/HIVE-16524
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16524.1.patch
>
>
> The id attribute is defined by W3C as follows:
> 1. The id attribute specifies a unique id for an HTML element.
> 2. An id must be unique within the HTML document.
> 3. The id attribute can be used as a link anchor, and by JavaScript (HTML DOM) 
> or by CSS to change or add a style to the element with the specified id.
> However, the "id='attributes_table'" in hiveserver2.jsp and 
> QueryProfileTmpl.jamon:
> 1. Is not referenced by any CSS or JS.
> 2. Appears more than once on the same page.
> So I suggest removing this id attribute definition. Please check it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16523) VectorHashKeyWrapper hash code for strings is not so good

2017-04-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982241#comment-15982241
 ] 

Sergey Shelukhin edited comment on HIVE-16523 at 4/25/17 1:25 AM:
--

[~gopalv] [~mmccline] do you mind taking a look?
With this patch, I see the collisions in Q58 reduced to 1-2 elements per hash 
code (from 400-1200 for the worst codes before).
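For context, a sketch of the kind of byte-range hashing that avoids the worst collisions (illustration only; this is not the code in the attached patch):

{code}
// Illustration: a simple multiplicative hash over the (bytes, start, length)
// slice that a vectorized string key refers to, instead of a weaker mix that
// collapses many distinct keys onto the same code.
public final class ByteSliceHash {
  private ByteSliceHash() {}

  public static int hash(byte[] bytes, int start, int length) {
    int h = 1;
    for (int i = start; i < start + length; i++) {
      h = 31 * h + bytes[i];   // same scheme as java.util.Arrays.hashCode
    }
    return h;
  }
}
{code}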


was (Author: sershe):
[~gopalv] [~mmccline] do you mind taking a look?

> VectorHashKeyWrapper hash code for strings is not so good
> -
>
> Key: HIVE-16523
> URL: https://issues.apache.org/jira/browse/HIVE-16523
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16523.patch
>
>
> Perf issues in vectorized gby on some string keys



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon

2017-04-24 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16524:
-
Status: Patch Available  (was: Open)

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> 
>
> Key: HIVE-16524
> URL: https://issues.apache.org/jira/browse/HIVE-16524
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16524.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size

2017-04-24 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982242#comment-15982242
 ] 

Gunther Hagleitner commented on HIVE-16503:
---

LGTM +1. Some smaller questions:

- Is SESSIONS_PER_DEFAULT_QUEUE guaranteed to be >= 1?

- Does it make sense to add range validators for the new settings (0, 1.0)?

> LLAP: Oversubscribe memory for noconditional task size
> --
>
> Key: HIVE-16503
> URL: https://issues.apache.org/jira/browse/HIVE-16503
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch
>
>
> When running map joins in LLAP, it can potentially use more memory for hash 
> table loading (assuming other executors in the daemon have some memory to 
> spare). This map join conversion decision has to be made during compilation 
> so that it can provide some more room for LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon

2017-04-24 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin updated HIVE-16524:
-
Attachment: HIVE-16524.1.patch

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> 
>
> Key: HIVE-16524
> URL: https://issues.apache.org/jira/browse/HIVE-16524
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
> Attachments: HIVE-16524.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16523) VectorHashKeyWrapper hash code for strings is not so good

2017-04-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16523:

Attachment: HIVE-16523.patch

> VectorHashKeyWrapper hash code for strings is not so good
> -
>
> Key: HIVE-16523
> URL: https://issues.apache.org/jira/browse/HIVE-16523
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16523.patch
>
>
> Perf issues in vectorized gby on some string keys



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16523) VectorHashKeyWrapper hash code for strings is not so good

2017-04-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16523:

Status: Patch Available  (was: Open)

[~gopalv] [~mmccline] do you mind taking a look?

> VectorHashKeyWrapper hash code for strings is not so good
> -
>
> Key: HIVE-16523
> URL: https://issues.apache.org/jira/browse/HIVE-16523
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16523.patch
>
>
> Perf issues in vectorized gby on some string keys



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon

2017-04-24 Thread ZhangBing Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangBing Lin reassigned HIVE-16524:



> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> 
>
> Key: HIVE-16524
> URL: https://issues.apache.org/jira/browse/HIVE-16524
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: ZhangBing Lin
>Assignee: ZhangBing Lin
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16523) VectorHashKeyWrapper hash code for strings is not so good

2017-04-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-16523:
---


> VectorHashKeyWrapper hash code for strings is not so good
> -
>
> Key: HIVE-16523
> URL: https://issues.apache.org/jira/browse/HIVE-16523
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> Perf issues in vectorized gby on some string keys



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11420) add support for "set autocommit"

2017-04-24 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982201#comment-15982201
 ] 

Eugene Koifman commented on HIVE-11420:
---

Currently (as of HIVE-12636), the system recognizes the "set autocommit" 
command but it has no effect.  This should probably be supported at the 
SessionState level to match JDBC semantics.
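For reference, the JDBC semantics being matched (plain java.sql API; the connection URL is just an example):

{code}
import java.sql.Connection;
import java.sql.DriverManager;

// JDBC semantics: setAutoCommit() changes the mode, getAutoCommit() returns the
// current value. "set autocommit" on the CLI side should behave the same way.
public class AutoCommitSemantics {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:hive2://localhost:10000/default")) {
      System.out.println("autocommit = " + conn.getAutoCommit()); // query current value
      conn.setAutoCommit(true);                                   // explicit set
    }
  }
}
{code}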

> add support for "set autocommit"
> 
>
> Key: HIVE-11420
> URL: https://issues.apache.org/jira/browse/HIVE-11420
> Project: Hive
>  Issue Type: Sub-task
>  Components: CLI, Transactions
>Affects Versions: 1.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> HIVE-11077 added support for "set autocommit true/false".
> We should add support for "set autocommit" to return the current value.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size

2017-04-24 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982193#comment-15982193
 ] 

Prasanth Jayachandran commented on HIVE-16503:
--

Test failures are unrelated to the patch. accumulo_index.q is already failing 
in master, and skewjoinopt1.q passes locally for me, so it is probably flaky. I 
will trigger another test run anyway to make sure.

> LLAP: Oversubscribe memory for noconditional task size
> --
>
> Key: HIVE-16503
> URL: https://issues.apache.org/jira/browse/HIVE-16503
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch
>
>
> When running map joins in LLAP, it can potentially use more memory for hash 
> table loading (assuming other executors in the daemon have some memory to 
> spare). This map join conversion decision has to be made during compilation 
> so that it can provide some more room for LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16445) enable Acid by default in the parent patch and run build bot

2017-04-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16445:
--
Attachment: HIVE-16445.01.patch

set
hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
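An equivalent programmatic form for test setup (a sketch; it assumes the standard HiveConf vars for these two settings):

{code}
import org.apache.hadoop.hive.conf.HiveConf;

// Sketch: the same two settings applied through HiveConf, e.g. in a test setup.
public class AcidTestConf {
  public static HiveConf acidEnabledConf() {
    HiveConf conf = new HiveConf();
    conf.setBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY, true);
    conf.setVar(HiveConf.ConfVars.HIVE_TXN_MANAGER,
        "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
    return conf;
  }
}
{code}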

> enable Acid by default in the parent patch and run build bot
> 
>
> Key: HIVE-16445
> URL: https://issues.apache.org/jira/browse/HIVE-16445
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16445.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16445) enable Acid by default in the parent patch and run build bot

2017-04-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16445:
--
Status: Patch Available  (was: Open)

> enable Acid by default in the parent patch and run build bot
> 
>
> Key: HIVE-16445
> URL: https://issues.apache.org/jira/browse/HIVE-16445
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-16445.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction

2017-04-24 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982186#comment-15982186
 ] 

Eugene Koifman commented on HIVE-12636:
---

no related failures for HIVE-12636.17.patch

> Ensure that all queries (with DbTxnManager) run in a transaction
> 
>
> Key: HIVE-12636
> URL: https://issues.apache.org/jira/browse/HIVE-12636
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, 
> HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, 
> HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, 
> HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, 
> HIVE-12636.17.patch
>
>
> Assuming Hive is using DbTxnManager:
> Currently (as of this writing, only auto-commit mode is supported), only 
> queries that write to an Acid table start a transaction.
> Read-only queries don't open a txn but still acquire locks.
> This makes internal structures confusing/odd.
> There are constantly 2 code paths to deal with, which is inconvenient and 
> error prone.
> Also, a txn id is a convenient "handle" for all locks/resources within a txn.
> Doing this would mean the client no longer needs to track the locks that it 
> acquired.  This enables further improvements to the metastore side of Acid.
> # Add a metastore call that does openTxn() and acquireLocks() in a single 
> call.  This is to make sure perf doesn't degrade for read-only queries.  
> (Would also be useful for auto-commit write queries.)
> # Should RO queries generate txn ids from the same sequence?  (They could, for 
> example, use negative values of a different sequence.)  The txn id is part of 
> the delta/base file name.  Currently it's 7 digits.  If we use the same 
> sequence, we'll exceed 7 digits faster (possible upgrade issue).  On the other 
> hand, there is value in being able to pick the txn id and commit timestamp out 
> of the same logical sequence.
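A rough sketch of what item 1 could look like (types and names here are hypothetical, not the actual metastore API):

{code}
// Hypothetical combined metastore call: open a transaction and acquire its locks
// in one round trip, so read-only queries don't pay for two RPCs.
public interface TxnLockClient {

  final class TxnAndLocks {
    public final long txnId;
    public final long lockId;
    public TxnAndLocks(long txnId, long lockId) {
      this.txnId = txnId;
      this.lockId = lockId;
    }
  }

  // user/queryId identify the caller; lockComponents describe the tables/partitions to lock.
  TxnAndLocks openTxnAndAcquireLocks(String user, String queryId,
      java.util.List<String> lockComponents);
}
{code}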



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16399) create an index for tc_txnid in TXN_COMPONENTS

2017-04-24 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982182#comment-15982182
 ] 

Eugene Koifman commented on HIVE-16399:
---

The master patch contains upgrade-2.1.0-to-2.2.0.sql files but there is no 
branch-2.2 patch.
Shouldn't it match?

otherwise LGTM

[~wzheng] yes, I think if it's still possible to add it to 2.3 it would be good



> create an index for tc_txnid in TXN_COMPONENTS
> --
>
> Key: HIVE-16399
> URL: https://issues.apache.org/jira/browse/HIVE-16399
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Wei Zheng
> Attachments: HIVE-16399.branch-2.3.patch, HIVE-16399.branch-2.patch, 
> HIVE-16399.master.patch
>
>
> w/o this TxnStore.cleanEmptyAbortedTxns() can be very slow



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction

2017-04-24 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982170#comment-15982170
 ] 

Hive QA commented on HIVE-12636:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864848/HIVE-12636.17.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10620 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_semijoin_user_level]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4859/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4859/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4859/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864848 - PreCommit-HIVE-Build

> Ensure that all queries (with DbTxnManager) run in a transaction
> 
>
> Key: HIVE-12636
> URL: https://issues.apache.org/jira/browse/HIVE-12636
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, 
> HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, 
> HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, 
> HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, 
> HIVE-12636.17.patch
>
>
> Assuming Hive is using DbTxnManager:
> Currently (as of this writing, only auto-commit mode is supported), only 
> queries that write to an Acid table start a transaction.
> Read-only queries don't open a txn but still acquire locks.
> This makes internal structures confusing/odd.
> There are constantly 2 code paths to deal with, which is inconvenient and 
> error prone.
> Also, a txn id is a convenient "handle" for all locks/resources within a txn.
> Doing this would mean the client no longer needs to track the locks that it 
> acquired.  This enables further improvements to the metastore side of Acid.
> # Add a metastore call that does openTxn() and acquireLocks() in a single 
> call.  This is to make sure perf doesn't degrade for read-only queries.  
> (Would also be useful for auto-commit write queries.)
> # Should RO queries generate txn ids from the same sequence?  (They could, for 
> example, use negative values of a different sequence.)  The txn id is part of 
> the delta/base file name.  Currently it's 7 digits.  If we use the same 
> sequence, we'll exceed 7 digits faster (possible upgrade issue).  On the other 
> hand, there is value in being able to pick the txn id and commit timestamp out 
> of the same logical sequence.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore

2017-04-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-16520:
--
Status: Patch Available  (was: Open)

> Cache hive metadata in metastore
> 
>
> Key: HIVE-16520
> URL: https://issues.apache.org/jira/browse/HIVE-16520
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16520-proto-2.patch, HIVE-16520-proto.patch
>
>
> During the Hive 2 benchmark, we found that Hive metastore operations take a 
> lot of time and thus slow down Hive compilation. In some extreme cases, they 
> take much longer than the actual query run time. In particular, we found that 
> the latency of the cloud DB is very high and 90% of the total query runtime is 
> spent waiting for metastore SQL database operations. Based on this 
> observation, metastore operation performance would be greatly enhanced if we 
> had an in-memory structure that caches the database query results.
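A minimal sketch of the read-through idea (illustration only; the real patch wires this into the metastore, and the class and method names below are made up):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Read-through cache sketch: serve repeated metadata lookups from memory and
// only hit the backing SQL database on a miss.
public class MetadataCache<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> loader;  // e.g. a lambda that queries the RDBMS

  public MetadataCache(Function<K, V> loader) {
    this.loader = loader;
  }

  public V get(K key) {
    return cache.computeIfAbsent(key, loader);
  }

  public void invalidate(K key) {  // needed when DDL changes the underlying object
    cache.remove(key);
  }
}
{code}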



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore

2017-04-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-16520:
--
Attachment: HIVE-16520-proto-2.patch

Triggering a UT test.

> Cache hive metadata in metastore
> 
>
> Key: HIVE-16520
> URL: https://issues.apache.org/jira/browse/HIVE-16520
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16520-proto-2.patch, HIVE-16520-proto.patch
>
>
> During the Hive 2 benchmark, we found that Hive metastore operations take a 
> lot of time and thus slow down Hive compilation. In some extreme cases, they 
> take much longer than the actual query run time. In particular, we found that 
> the latency of the cloud DB is very high and 90% of the total query runtime is 
> spent waiting for metastore SQL database operations. Based on this 
> observation, metastore operation performance would be greatly enhanced if we 
> had an in-memory structure that caches the database query results.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16213) ObjectStore can leak Queries when rollbackTransaction throws an exception

2017-04-24 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16213:
---
Attachment: HIVE-16213.04.patch

Attaching the patch again to trigger QA

> ObjectStore can leak Queries when rollbackTransaction throws an exception
> -
>
> Key: HIVE-16213
> URL: https://issues.apache.org/jira/browse/HIVE-16213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Alexander Kolbasov
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16213.01.patch, HIVE-16213.02.patch, 
> HIVE-16213.03.patch, HIVE-16213.04.patch
>
>
> In ObjectStore.java there are a few places with the code similar to:
> {code}
> Query query = null;
> try {
>   openTransaction();
>   query = pm.newQuery(Something.class);
>   ...
>   commited = commitTransaction();
> } finally {
>   if (!commited) {
> rollbackTransaction();
>   }
>   if (query != null) {
> query.closeAll();
>   }
> }
> {code}
> The problem is that rollbackTransaction() may throw an exception in which 
> case query.closeAll() wouldn't be executed. 
> The fix would be to wrap rollbackTransaction in its own try-catch block.
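A sketch of that shape (simplified; it mirrors the finally block in the snippet above, LOG stands for the class's existing logger, and the actual patch may differ):

{code}
} finally {
  if (!commited) {
    try {
      rollbackTransaction();
    } catch (Exception e) {
      LOG.error("Rollback failed", e);   // swallow so cleanup below still runs
    }
  }
  if (query != null) {
    query.closeAll();
  }
}
{code}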



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction

2017-04-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12636:
--
Attachment: (was: HIVE-12636.16.patch)

> Ensure that all queries (with DbTxnManager) run in a transaction
> 
>
> Key: HIVE-12636
> URL: https://issues.apache.org/jira/browse/HIVE-12636
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, 
> HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, 
> HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, 
> HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, 
> HIVE-12636.17.patch
>
>
> Assuming Hive is using DbTxnManager:
> Currently (as of this writing, only auto-commit mode is supported), only 
> queries that write to an Acid table start a transaction.
> Read-only queries don't open a txn but still acquire locks.
> This makes internal structures confusing/odd.
> There are constantly 2 code paths to deal with, which is inconvenient and 
> error prone.
> Also, a txn id is a convenient "handle" for all locks/resources within a txn.
> Doing this would mean the client no longer needs to track the locks that it 
> acquired.  This enables further improvements to the metastore side of Acid.
> # Add a metastore call that does openTxn() and acquireLocks() in a single 
> call.  This is to make sure perf doesn't degrade for read-only queries.  
> (Would also be useful for auto-commit write queries.)
> # Should RO queries generate txn ids from the same sequence?  (They could, for 
> example, use negative values of a different sequence.)  The txn id is part of 
> the delta/base file name.  Currently it's 7 digits.  If we use the same 
> sequence, we'll exceed 7 digits faster (possible upgrade issue).  On the other 
> hand, there is value in being able to pick the txn id and commit timestamp out 
> of the same logical sequence.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction

2017-04-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12636:
--
Attachment: HIVE-12636.17.patch

[~wzheng], could you review please

> Ensure that all queries (with DbTxnManager) run in a transaction
> 
>
> Key: HIVE-12636
> URL: https://issues.apache.org/jira/browse/HIVE-12636
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, 
> HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, 
> HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, 
> HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, 
> HIVE-12636.16.patch, HIVE-12636.17.patch
>
>
> Assuming Hive is using DbTxnManager:
> Currently (as of this writing, only auto-commit mode is supported), only 
> queries that write to an Acid table start a transaction.
> Read-only queries don't open a txn but still acquire locks.
> This makes internal structures confusing/odd.
> There are constantly 2 code paths to deal with, which is inconvenient and 
> error prone.
> Also, a txn id is a convenient "handle" for all locks/resources within a txn.
> Doing this would mean the client no longer needs to track the locks that it 
> acquired.  This enables further improvements to the metastore side of Acid.
> # Add a metastore call that does openTxn() and acquireLocks() in a single 
> call.  This is to make sure perf doesn't degrade for read-only queries.  
> (Would also be useful for auto-commit write queries.)
> # Should RO queries generate txn ids from the same sequence?  (They could, for 
> example, use negative values of a different sequence.)  The txn id is part of 
> the delta/base file name.  Currently it's 7 digits.  If we use the same 
> sequence, we'll exceed 7 digits faster (possible upgrade issue).  On the other 
> hand, there is value in being able to pick the txn id and commit timestamp out 
> of the same logical sequence.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16354) Modularization efforts - change some dependencies to smaller client/api modules

2017-04-24 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-16354:

Status: Patch Available  (was: Open)

It seems like there were some ptest problems... instead of splitting up all of 
my current changes and triggering those separately, I'm sending the whole 
package for test purposes.

> Modularization efforts - change some dependencies to smaller client/api 
> modules
> ---
>
> Key: HIVE-16354
> URL: https://issues.apache.org/jira/browse/HIVE-16354
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Server Infrastructure
>Reporter: Zoltan Haindrich
> Attachments: allinwonder.1.patch
>
>
> In HIVE-16214 I've identified some pieces which might be good to move to new 
> modules... since then I've looked a bit more into what could be done in this 
> area, and to avoid going backward on this path or getting stuck at some 
> point, I would like to be able to propose smaller changes prior to creating 
> any modules.
> The goal here is to remove unneeded dependencies from the modules which don't 
> necessarily need them: the biggest fish in this tank is the {{jdbc}} module, 
> which currently ships with the full HiveServer server side + all of the ql 
> code + the whole metastore (including the JPA persistence libs) - this makes 
> the JDBC driver a really fat jar.
> These changes will also reduce the Hive binary distribution size; introducing 
> service-client alone has reduced it by 20%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16354) Modularization efforts - change some dependencies to smaller client/api modules

2017-04-24 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-16354:

Attachment: allinwonder.1.patch

> Modularization efforts - change some dependencies to smaller client/api 
> modules
> ---
>
> Key: HIVE-16354
> URL: https://issues.apache.org/jira/browse/HIVE-16354
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Server Infrastructure
>Reporter: Zoltan Haindrich
> Attachments: allinwonder.1.patch
>
>
> In HIVE-16214 I've identified some pieces which might be good to move to new 
> modules... since then I've looked a bit more into what could be done in this 
> area, and to avoid going backward on this path or getting stuck at some 
> point, I would like to be able to propose smaller changes prior to creating 
> any modules.
> The goal here is to remove unneeded dependencies from the modules which don't 
> necessarily need them: the biggest fish in this tank is the {{jdbc}} module, 
> which currently ships with the full HiveServer server side + all of the ql 
> code + the whole metastore (including the JPA persistence libs) - this makes 
> the JDBC driver a really fat jar.
> These changes will also reduce the Hive binary distribution size; introducing 
> service-client alone has reduced it by 20%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats

2017-04-24 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982021#comment-15982021
 ] 

Chaoyu Tang commented on HIVE-16147:


Patch has been uploaded to RB. [~pxiong], could you help to review it. Thanks.

> Rename a partitioned table should not drop its partition columns stats
> --
>
> Key: HIVE-16147
> URL: https://issues.apache.org/jira/browse/HIVE-16147
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to 
> sample_pt_rename), describing its partition shows that the partition column 
> stats are still accurate, but actually they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3):  COLUMN_STATS 
> for all columns are true
> {code}
> ...
> # Detailed Partition Information   
> Partition Value:  [3]  
> Database: default  
> Table:sample_pt
> CreateTime:   Fri Jan 20 15:42:30 EST 2017 
> LastAccessTime:   UNKNOWN  
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_byctang   
>   last_modified_time  1485217063  
>   numFiles1   
>   numRows 100 
>   rawDataSize 5143
>   totalSize   5243
>   transient_lastDdlTime   1488842358
> ... 
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column 
> stats exist
> {code}
> # col_name  data_type  min  max  num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
> 
> salary  int  1  151370  0  94  from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): 
> describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS 
> for all columns are still true.
> {code}
> # Detailed Partition Information   
> Partition Value:  [3]  
> Database: default  
> Table:sample_pt_rename 
> CreateTime:   Fri Jan 20 15:42:30 EST 2017 
> LastAccessTime:   UNKNOWN  
> Location: 
> file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_byctang   
>   last_modified_time  1485217063  
>   numFiles1   
>   numRows 100 
>   rawDataSize 5143
>   totalSize   5243
>   transient_lastDdlTime   1488842358  
> {code}
> describe formatted default.sample_pt_rename partition (dummy = 3) salary: the 
> column stats have been dropped.
> {code}
> # col_name  data_type  comment
> 
> salary  int  from deserializer
> 
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12614) RESET command does not close spark session

2017-04-24 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982020#comment-15982020
 ] 

Xuefu Zhang commented on HIVE-12614:


+1

> RESET command does not close spark session
> --
>
> Key: HIVE-12614
> URL: https://issues.apache.org/jira/browse/HIVE-12614
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.3.0, 2.1.0
>Reporter: Nemon Lou
>Assignee: Sahil Takiar
>Priority: Minor
> Attachments: HIVE-12614.1.patch, HIVE-12614.2.patch, 
> HIVE-12614.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-24 Thread Mike Fagan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981915#comment-15981915
 ] 

Mike Fagan edited comment on HIVE-15795 at 4/24/17 10:23 PM:
-

Patch to fix Accumulo integration test failures attached to the ticket


was (Author: faganm):
Patch to fix Accumulo integration test failures.

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch, 
> HIVE-15795.3.patch
>
>
> Ability to specify an accumulo index table for an accumulo-hive table.
> This would greatly improve performance for non-rowid query predicates



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16080) Add parquet to possible values for hive.default.fileformat and hive.default.fileformat.managed

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981941#comment-15981941
 ] 

Sahil Takiar commented on HIVE-16080:
-

This patch is going into the 2.3 release, not 2.2

I've updated the wiki for both of these config keys and added in the version 
info.

> Add parquet to possible values for hive.default.fileformat and 
> hive.default.fileformat.managed
> --
>
> Key: HIVE-16080
> URL: https://issues.apache.org/jira/browse/HIVE-16080
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-16080.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2017-04-24 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16484:

Attachment: HIVE-16484.4.patch

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch, 
> HIVE-16484.3.patch, HIVE-16484.4.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners
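
For reference, a hedged sketch of what launching through {{SparkLauncher}} could look like (the jar path and driver class below are placeholders, not the ones {{SparkClientImpl}} actually uses):
{code}
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class SparkLauncherSketch {
  public static void main(String[] args) throws Exception {
    SparkAppHandle handle = new SparkLauncher()
        .setMaster("yarn")
        .setDeployMode("cluster")
        .setAppResource("/path/to/app.jar")                // placeholder jar
        .setMainClass("org.example.RemoteDriver")          // placeholder driver class
        .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
        .startApplication(new SparkAppHandle.Listener() {  // optional state listener
          @Override
          public void stateChanged(SparkAppHandle h) {
            System.out.println("Spark app state: " + h.getState());
          }
          @Override
          public void infoChanged(SparkAppHandle h) {
          }
        });

    // The handle exposes state and appId directly; there is no spark-submit output to parse.
    while (!handle.getState().isFinal()) {
      Thread.sleep(1000);
    }
  }
}
{code}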



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-24 Thread Mike Fagan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Fagan updated HIVE-15795:
--
Attachment: HIVE-15795.3.patch

Patch to fix Accumulo integration test failures.

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch, 
> HIVE-15795.3.patch
>
>
> Ability to specify an accumulo index table for an accumulo-hive table.
> This would greatly improve performance for non-rowid query predicates



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-24 Thread Mike Fagan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981911#comment-15981911
 ] 

Mike Fagan edited comment on HIVE-15795 at 4/24/17 8:59 PM:


To address failures in integration tests


was (Author: faganm):
To address failures in inegration tests

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch
>
>
> Ability to specify an accumulo index table for an accumulo-hive table.
> This would greatly improve performance for non-rowid query predicates



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Reopened] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-24 Thread Mike Fagan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Fagan reopened HIVE-15795:
---

To address failures in integration tests

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch
>
>
> Ability to specify an accumulo index table for an accumulo-hive table.
> This would greatly improve performance for non-rowid query predicates



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16346) inheritPerms should be conditional based on the target filesystem

2017-04-24 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16346:

   Resolution: Fixed
Fix Version/s: 2.4.0
   Status: Resolved  (was: Patch Available)

Pushed to branch-2. Thanks, Sahil.

> inheritPerms should be conditional based on the target filesystem
> -
>
> Key: HIVE-16346
> URL: https://issues.apache.org/jira/browse/HIVE-16346
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-16346.1-branch-2.patch, 
> HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch
>
>
> Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of 
> different files that have been moved / copied. This is only triggered if 
> {{hive.warehouse.subdir.inherit.perms}} is set to true.
> However, on blobstores such as S3, there is no concept of file permissions so 
> these calls are unnecessary, which can hurt performance.
> One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to 
> false, but this would be a global change that affects an entire HS2 instance. 
> So HDFS tables will no longer have permissions inheritance.
> A better solution would be to make the inheritance of permissions conditional 
> on the target filesystem.
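
As a minimal sketch of that conditional check, assuming a simple scheme-based test (the helper name and scheme list are illustrative, not Hive's actual configuration or API):
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.fs.FileSystem;

public class InheritPermsSketch {
  // Filesystems with no real permission model; chmod/chgrp calls add no value there.
  private static final Set<String> BLOBSTORE_SCHEMES =
      new HashSet<>(Arrays.asList("s3a", "s3n", "swift", "wasb"));

  public static boolean shouldInheritPerms(boolean inheritPermsEnabled, FileSystem targetFs) {
    // Inherit permissions only when the feature is on AND the target FS actually supports them.
    return inheritPermsEnabled && !BLOBSTORE_SCHEMES.contains(targetFs.getUri().getScheme());
  }
}
{code}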



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16497) FileUtils. isActionPermittedForFileHierarchy, isOwnerOfFileHierarchy file system operations should be impersonated

2017-04-24 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981848#comment-15981848
 ] 

Sushanth Sowmyan commented on HIVE-16497:
-

+1, LGTM.

> FileUtils. isActionPermittedForFileHierarchy, isOwnerOfFileHierarchy file 
> system operations should be impersonated
> --
>
> Key: HIVE-16497
> URL: https://issues.apache.org/jira/browse/HIVE-16497
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 3.0.0
>
> Attachments: HIVE-16497.1.patch, HIVE-16497.2.patch
>
>
> FileUtils.isActionPermittedForFileHierarchy checks if a user has permissions 
> for a given action. The checks are made by impersonating the user.
> However, the listing of child dirs is done as the HiveServer2 user. If the 
> hive user doesn't have permissions on the filesystem, it gives an incorrect 
> error that the end user doesn't have permissions to perform the action.
> Impersonating the end user for all file operations in that function is also 
> the logically correct thing to do.
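
A hedged sketch of doing the child-directory listing under the end user's identity via UserGroupInformation.doAs (illustrative only, not the exact FileUtils code path):
{code}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonatedListingSketch {
  // Lists the children of 'dir' as the end user, so permission errors reflect
  // that user's access rather than the HiveServer2 service user's.
  public static FileStatus[] listAsUser(String userName, Configuration conf, Path dir)
      throws Exception {
    UserGroupInformation proxyUser =
        UserGroupInformation.createProxyUser(userName, UserGroupInformation.getLoginUser());
    return proxyUser.doAs((PrivilegedExceptionAction<FileStatus[]>) () ->
        // Obtain the FileSystem inside doAs so it is bound to the proxy user.
        FileSystem.get(dir.toUri(), conf).listStatus(dir));
  }
}
{code}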



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-10865) Beeline needs to support DELIMITER command

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981843#comment-15981843
 ] 

Sahil Takiar commented on HIVE-10865:
-

[~ctang.ma], [~ngangam], [~ychena] could someone take a look?

> Beeline needs to support DELIMITER command
> --
>
> Key: HIVE-10865
> URL: https://issues.apache.org/jira/browse/HIVE-10865
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Sahil Takiar
> Attachments: HIVE-10865.1.patch, HIVE-10865.2.patch, 
> HIVE-10865.3.patch, HIVE-10865.4.patch, HIVE-10865.5.patch
>
>
> The MySQL client provides a DELIMITER command to set the statement delimiter.
> Beeline needs to support a similar command to allow commands that use a 
> semi-colon as something other than a statement delimiter (as with MySQL stored 
> procedures). This is a follow-up JIRA for HIVE-10659.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16277) Exchange Partition between filesystems throws "IllegalArgumentException Wrong FS"

2017-04-24 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16277:

Status: Open  (was: Patch Available)

> Exchange Partition between filesystems throws "IllegalArgumentException Wrong 
> FS"
> -
>
> Key: HIVE-16277
> URL: https://issues.apache.org/jira/browse/HIVE-16277
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16277.1.patch, HIVE-16277.2.patch, 
> HIVE-16277.3.patch, HIVE-16277.4.patch
>
>
> The following query: {{alter table s3_tbl exchange partition (country='USA') 
> with table hdfs_tbl}} fails with the following exception:
> {code}
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: 
> java.lang.IllegalArgumentException Wrong FS: 
> s3a://[bucket]/table/country=USA, expected: file:///)
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:379)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:347)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:361)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:Got exception: java.lang.IllegalArgumentException Wrong 
> FS: s3a://[bucket]/table/country=USA, expected: file:///)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.exchangeTablePartitions(Hive.java:3553)
>   at 
> org.apache.hadoop.hive.ql.exec.DDLTask.exchangeTablePartition(DDLTask.java:4691)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:570)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2182)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1838)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1525)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1231)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:254)
>   ... 11 more
> Caused by: MetaException(message:Got exception: 
> java.lang.IllegalArgumentException Wrong FS: 
> s3a://[bucket]/table/country=USA, expected: file:///)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1387)
>   at 
> org.apache.hadoop.hive.metastore.Warehouse.renameDir(Warehouse.java:208)
>   at 
> org.apache.hadoop.hive.metastore.Warehouse.renameDir(Warehouse.java:200)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.exchange_partitions(HiveMetaStore.java:2967)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy28.exchange_partitions(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.exchange_partitions(HiveMetaStoreClient.java:690)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  

[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981840#comment-15981840
 ] 

Sahil Takiar commented on HIVE-15396:
-

[~pxiong] wanted to see if we can still get this patch in. Let me know what you 
think of the most recent patch. To summarize:

* The patch added basic stats collection for tables with a {{LOCATION}} 
specified, but only if the specified location is empty and the table is not an 
external table
* This should be useful when running on blobstores such as S3, where users 
commonly specify an explicit {{LOCATION}} clause

Thanks for spending the time to look at this!

> Basic Stats are not collected when for managed tables with LOCATION specified
> -
>
> Key: HIVE-15396
> URL: https://issues.apache.org/jira/browse/HIVE-15396
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, 
> HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, 
> HIVE-15396.6.patch, HIVE-15396.7.patch
>
>
> Basic stats are not collected when a managed table is created with a 
> specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:1> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:1> describe formatted hdfs_1;
> +---++-+
> |   col_name| data_type   
>|   comment   |
> +---++-+
> | # col_name| data_type   
>| comment |
> |   | NULL
>| NULL|
> | col   | int 
>| |
> |   | NULL
>| NULL|
> | # Detailed Table Information  | NULL
>| NULL|
> | Database: | default 
>| NULL|
> | Owner:| anonymous   
>| NULL|
> | CreateTime:   | Wed Mar 22 18:09:19 PDT 2017
>| NULL|
> | LastAccessTime:   | UNKNOWN 
>| NULL|
> | Retention:| 0   
>| NULL|
> | Location: | file:/warehouse/hdfs_1 | NULL   
>  |
> | Table Type:   | MANAGED_TABLE   
>| NULL|
> | Table Parameters: | NULL
>| NULL|
> |   | COLUMN_STATS_ACCURATE   
>| {\"BASIC_STATS\":\"true\"}  |
> |   | numFiles
>| 0   |
> |   | numRows 
>| 0   |
> |   | rawDataSize 
>| 0   |
> |   | totalSize   
>| 0   |
> |   | transient_lastDdlTime   
>| 1490231359  |
> |   | NULL
>| NULL|
> | # Storage Information | NULL
>| NULL|
> | SerDe Library:| 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL 
>|
> | InputFormat:  | org.apache.hadoop.mapred.TextInputFormat
>| NULL|
> | OutputFormat: | 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL 
>|
> | Compressed:   | No  
>| NULL|
> | Num Buckets:  | -1   

[jira] [Commented] (HIVE-16346) inheritPerms should be conditional based on the target filesystem

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981835#comment-15981835
 ] 

Sahil Takiar commented on HIVE-16346:
-

[~aihuaxu] can this be merged?

> inheritPerms should be conditional based on the target filesystem
> -
>
> Key: HIVE-16346
> URL: https://issues.apache.org/jira/browse/HIVE-16346
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16346.1-branch-2.patch, 
> HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch
>
>
> Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of 
> different files that have been moved / copied. This is only triggered if 
> {{hive.warehouse.subdir.inherit.perms}} is set to true.
> However, on blobstores such as S3, there is no concept of file permissions so 
> these calls are unnecessary, which can hurt performance.
> One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to 
> false, but this would be a global change that affects an entire HS2 instance. 
> So HDFS tables will no longer have permissions inheritance.
> A better solution would be to make the inheritance of permissions conditional 
> on the target filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14864) Distcp is not called from MoveTask when src is a directory

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981833#comment-15981833
 ] 

Sahil Takiar commented on HIVE-14864:
-

I updated the documentation for this here: 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution

> Distcp is not called from MoveTask when src is a directory
> --
>
> Key: HIVE-14864
> URL: https://issues.apache.org/jira/browse/HIVE-14864
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14864.1.patch, HIVE-14864.2.patch, 
> HIVE-14864.3.patch, HIVE-14864.4.patch, HIVE-14864.patch
>
>
> In FileUtils.java the following code does not get executed even when src 
> directory size is greater than HIVE_EXEC_COPYFILE_MAXSIZE because 
> srcFS.getFileStatus(src).getLen() returns 0 when src is a directory. We 
> should use srcFS.getContentSummary(src).getLength() instead.
> {noformat}
> /* Run distcp if source file/dir is too big */
> if (srcFS.getUri().getScheme().equals("hdfs") &&
> srcFS.getFileStatus(src).getLen() > 
> conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE)) {
>   LOG.info("Source is " + srcFS.getFileStatus(src).getLen() + " bytes. 
> (MAX: " + conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE) + 
> ")");
>   LOG.info("Launch distributed copy (distcp) job.");
>   HiveConfUtil.updateJobCredentialProviders(conf);
>   copied = shims.runDistCp(src, dst, conf);
>   if (copied && deleteSource) {
> srcFS.delete(src, true);
>   }
> }
> {noformat}
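
A hedged sketch of the suggested fix, wrapped in an illustrative helper rather than the real FileUtils code: for directories the aggregate length from getContentSummary() is used, so the size check can actually fire.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopySizeCheckSketch {
  // True when distcp should be used: the source is on HDFS and its total size
  // (aggregate length for a directory, plain length for a file) exceeds maxSize.
  public static boolean needsDistCp(FileSystem srcFS, Path src, long maxSize)
      throws IOException {
    long srcSize = srcFS.getFileStatus(src).isDirectory()
        ? srcFS.getContentSummary(src).getLength()
        : srcFS.getFileStatus(src).getLen();
    return "hdfs".equals(srcFS.getUri().getScheme()) && srcSize > maxSize;
  }
}
{code}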



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-8750) Commit initial encryption work

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981832#comment-15981832
 ] 

Sahil Takiar commented on HIVE-8750:


I updated the documentation for this here: 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution

> Commit initial encryption work
> --
>
> Key: HIVE-8750
> URL: https://issues.apache.org/jira/browse/HIVE-8750
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Sergio Peña
>  Labels: TODOC15
> Fix For: encryption-branch, 1.1.0
>
> Attachments: HIVE-8750.1.patch
>
>
> I believe Sergio has some work done for encryption. In this item we'll commit 
> it to branch.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction

2017-04-24 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12636:
--
Attachment: HIVE-12636.16.patch

> Ensure that all queries (with DbTxnManager) run in a transaction
> 
>
> Key: HIVE-12636
> URL: https://issues.apache.org/jira/browse/HIVE-12636
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, 
> HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, 
> HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, 
> HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, 
> HIVE-12636.16.patch
>
>
> Assuming Hive is using DbTxnManager
> Currently (as of this writing only auto commit mode is supported), only 
> queries that write to an Acid table start a transaction.
> Read-only queries don't open a txn but still acquire locks.
> This makes internal structures confusing/odd.
> There are constantly 2 code paths to deal with, which is inconvenient and error 
> prone.
> Also, a txn id is a convenient "handle" for all locks/resources within a txn.
> Doing this would mean the client no longer needs to track locks that it 
> acquired.  This enables further improvements to the metastore side of Acid.
> # add a metastore call that does openTxn() and acquireLocks() in a single call.  This 
> is to make sure perf doesn't degrade for read-only queries.  (Would also be 
> useful for auto commit write queries)
> # Should RO queries generate txn ids from the same sequence?  (they could for 
> example use negative values of a different sequence).  Txnid is part of the 
> delta/base file name.  Currently it's 7 digits.  If we use the same sequence, 
> we'll exceed 7 digits faster. (possible upgrade issue).  On the other hand 
> there is value in being able to pick txn id and commit timestamp out of the 
> same logical sequence.
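
A hypothetical sketch of the combined call proposed in the first numbered item above; the interface and method signature are illustrative, not an existing metastore API:
{code}
import org.apache.hadoop.hive.metastore.api.LockRequest;
import org.apache.hadoop.hive.metastore.api.LockResponse;
import org.apache.thrift.TException;

public interface TxnAwareClientSketch {
  /**
   * Opens a transaction and acquires the requested locks in a single metastore
   * round trip, so a read-only query does not pay for two separate calls.
   * The new txn id would be returned alongside (or inside) the lock response.
   */
  LockResponse openTxnAndAcquireLocks(String user, LockRequest rqst) throws TException;
}
{code}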



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981784#comment-15981784
 ] 

Sahil Takiar commented on HIVE-14170:
-

This didn't go into the 2.2 release; it seems it's going into the 2.3 release. I've 
updated the wiki: 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, 
> HIVE-14170.3.patch, HIVE-14170.4.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width size for {{TableOutputFormat}} (it can't because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" can be configurable and by 
> default it can be 1000).
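
A minimal sketch of that idea (hypothetical names, not Beeline's actual IncrementalRows class): rows are printed as they arrive, and the column widths are recomputed from the rows seen so far every RECALC_INTERVAL rows. A real implementation would also bound the buffer it keeps for the width calculation.
{code}
import java.util.ArrayList;
import java.util.List;

public class IncrementalWidthSketch {
  private static final int RECALC_INTERVAL = 1000;   // the configurable "x"

  private final List<String[]> seen = new ArrayList<>();
  private int[] widths;

  // Print a row immediately, recomputing column widths on the first row and
  // every RECALC_INTERVAL rows thereafter.
  public void printIncrementally(String[] row) {
    seen.add(row);
    if (widths == null || widths.length != row.length || seen.size() % RECALC_INTERVAL == 0) {
      widths = recalcWidths(seen);
    }
    StringBuilder line = new StringBuilder();
    for (int i = 0; i < row.length; i++) {
      String cell = row[i] == null ? "NULL" : row[i];
      line.append(String.format("%-" + widths[i] + "s | ", cell));
    }
    System.out.println(line);
  }

  private static int[] recalcWidths(List<String[]> rows) {
    int[] w = new int[rows.get(rows.size() - 1).length];
    for (String[] r : rows) {
      for (int i = 0; i < Math.min(w.length, r.length); i++) {
        int len = r[i] == null ? 4 : r[i].length();
        w[i] = Math.max(w[i], Math.max(1, len));
      }
    }
    return w;
  }
}
{code}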



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-7224) Set incremental printing to true by default in Beeline

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981770#comment-15981770
 ] 

Sahil Takiar commented on HIVE-7224:


This didn't go into the 2.2 release; it seems it's going into the 2.3 release. I've 
updated the wiki to reflect this: 
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions

> Set incremental printing to true by default in Beeline
> --
>
> Key: HIVE-7224
> URL: https://issues.apache.org/jira/browse/HIVE-7224
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, Clients, JDBC
>Affects Versions: 0.13.0, 1.0.0, 1.2.0, 1.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Sahil Takiar
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-7224.1.patch, HIVE-7224.2.patch, HIVE-7224.2.patch, 
> HIVE-7224.3.patch, HIVE-7224.4.patch, HIVE-7224.5.patch
>
>
> See HIVE-7221.
> By default beeline tries to buffer the entire output relation before printing 
> it on stdout. This can cause OOM when the output relation is large. However, 
> beeline has the option of incremental prints. We should keep that as the 
> default.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats

2017-04-24 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-16147:
---
Attachment: HIVE-16147.patch

The patch is to:
1. preserve the column stats when a partitioned table is renamed
2. since the column stats are no longer invalidated during a table rename, I 
renamed alter_table_invalidate_column_stats.q to alter_table_column_stats.q

> Rename a partitioned table should not drop its partition columns stats
> --
>
> Key: HIVE-16147
> URL: https://issues.apache.org/jira/browse/HIVE-16147
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to 
> sample_pt_rename), describing its partitions shows that the partition column 
> stats are still accurate, but actually they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3):  COLUMN_STATS 
> for all columns are true
> {code}
> ...
> # Detailed Partition Information   
> Partition Value:  [3]  
> Database: default  
> Table:sample_pt
> CreateTime:   Fri Jan 20 15:42:30 EST 2017 
> LastAccessTime:   UNKNOWN  
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_byctang   
>   last_modified_time  1485217063  
>   numFiles1   
>   numRows 100 
>   rawDataSize 5143
>   totalSize   5243
>   transient_lastDdlTime   1488842358
> ... 
> {code}
> 3: describe formatted default.sample_pt partition (dummy = 3) salary: column 
> stats exists
> {code}
> # col_namedata_type   min 
> max num_nulls   distinct_count  
> avg_col_len max_col_len num_trues   
> num_falses  comment 
>   
>  
> salaryint 1   151370  
> 0   94
>   
> from deserializer 
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): 
> describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS for 
> columns are still true.
> {code}
> # Detailed Partition Information   
> Partition Value:  [3]  
> Database: default  
> Table:sample_pt_rename 
> CreateTime:   Fri Jan 20 15:42:30 EST 2017 
> LastAccessTime:   UNKNOWN  
> Location: 
> file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_byctang   
>   last_modified_time  1485217063  
>   numFiles1   
>   numRows 100 
>   rawDataSize 5143
>   totalSize   5243
>   transient_lastDdlTime   1488842358  
> {code}
> describe formatted default.sample_pt_rename partition (dummy = 3) salary: the 
> column stats have been dropped.
> {code}
> # col_namedata_type   comment 
>  
>   
>  
> salaryint from deserializer   
>  
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats

2017-04-24 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-16147:
---
Status: Patch Available  (was: Open)

> Rename a partitioned table should not drop its partition columns stats
> --
>
> Key: HIVE-16147
> URL: https://issues.apache.org/jira/browse/HIVE-16147
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Attachments: HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to 
> sample_pt_rename), describing its partitions shows that the partition column 
> stats are still accurate, but actually they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3):  COLUMN_STATS 
> for all columns are true
> {code}
> ...
> # Detailed Partition Information   
> Partition Value:  [3]  
> Database: default  
> Table:sample_pt
> CreateTime:   Fri Jan 20 15:42:30 EST 2017 
> LastAccessTime:   UNKNOWN  
> Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_byctang   
>   last_modified_time  1485217063  
>   numFiles1   
>   numRows 100 
>   rawDataSize 5143
>   totalSize   5243
>   transient_lastDdlTime   1488842358
> ... 
> {code}
> 3: describe formatted default.sample_pt partition (dummy = 3) salary: column 
> stats exists
> {code}
> # col_namedata_type   min 
> max num_nulls   distinct_count  
> avg_col_len max_col_len num_trues   
> num_falses  comment 
>   
>  
> salaryint 1   151370  
> 0   94
>   
> from deserializer 
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): 
> describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS for 
> columns are still true.
> {code}
> # Detailed Partition Information   
> Partition Value:  [3]  
> Database: default  
> Table:sample_pt_rename 
> CreateTime:   Fri Jan 20 15:42:30 EST 2017 
> LastAccessTime:   UNKNOWN  
> Location: 
> file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:  
>   COLUMN_STATS_ACCURATE   
> {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
>   last_modified_byctang   
>   last_modified_time  1485217063  
>   numFiles1   
>   numRows 100 
>   rawDataSize 5143
>   totalSize   5243
>   transient_lastDdlTime   1488842358  
> {code}
> describe formatted default.sample_pt_rename partition (dummy = 3) salary: the 
> column stats have been dropped.
> {code}
> # col_namedata_type   comment 
>  
>   
>  
> salaryint from deserializer   
>  
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16522) Hive's query timer is not keeping track of the fetch task execution

2017-04-24 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981720#comment-15981720
 ] 

slim bouguerra commented on HIVE-16522:
---

[~ashutoshc] can you please check this ?

> Hive's query timer is not keeping track of the fetch task execution
> 
>
> Key: HIVE-16522
> URL: https://issues.apache.org/jira/browse/HIVE-16522
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-16522.patch
>
>
> Currently the Hive CLI query execution time does not include the fetch task execution time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16522) Hive's query timer is not keeping track of the fetch task execution

2017-04-24 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-16522:
--
Attachment: HIVE-16522.patch

> Hive's query timer is not keeping track of the fetch task execution
> 
>
> Key: HIVE-16522
> URL: https://issues.apache.org/jira/browse/HIVE-16522
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-16522.patch
>
>
> Currently the Hive CLI query execution time does not include the fetch task execution time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring

2017-04-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981707#comment-15981707
 ] 

Siddharth Seth commented on HIVE-16343:
---

This lookup can be quite expensive, e.g. the SMAPS-based lookup can take 
multiple seconds. I don't think refreshing it every 10s is a good idea. We need 
some kind of guard around when it gets refreshed (independent of the 
metrics config).

> LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
> 
>
> Key: HIVE-16343
> URL: https://issues.apache.org/jira/browse/HIVE-16343
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16343.1.patch
>
>
> Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful 
> for monitoring and also for setting up triggers via JMC. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16522) Hive's query timer is not keeping track of the fetch task execution

2017-04-24 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-16522:
--
Status: Patch Available  (was: Open)

> Hive's query timer is not keeping track of the fetch task execution
> 
>
> Key: HIVE-16522
> URL: https://issues.apache.org/jira/browse/HIVE-16522
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>
> Currently the Hive CLI query execution time does not include the fetch task execution time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16522) Hive's query timer is not keeping track of the fetch task execution

2017-04-24 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-16522:
-


> Hive's query timer is not keeping track of the fetch task execution
> 
>
> Key: HIVE-16522
> URL: https://issues.apache.org/jira/browse/HIVE-16522
> Project: Hive
>  Issue Type: Bug
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>
> Currently the Hive CLI query execution time does not include the fetch task execution time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16521) HoS user level explain plan possibly incorrect for UNION clause

2017-04-24 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-16521:
---


> HoS user level explain plan possibly incorrect for UNION clause
> ---
>
> Key: HIVE-16521
> URL: https://issues.apache.org/jira/browse/HIVE-16521
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The user-level explain plan for queries with a UNION operator looks very 
> different for HoS vs. Hive-on-Tez. Furthermore, the HoS plan looks incomplete:
> Query: {{EXPLAIN select count(*) from srcpart where srcpart.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart)}}
> Hive-on-Tez:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 7 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)
> Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE), Union 6 (CONTAINS)
> Reducer 7 <- Union 6 (SIMPLE_EDGE)
> Reducer 9 <- Map 8 (CUSTOM_SIMPLE_EDGE), Union 6 (CONTAINS)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 3
>   File Output Operator [FS_34]
> Group By Operator [GBY_32] (rows=1 width=8)
>   Output:["_col0"],aggregations:["count(VALUE._col0)"]
> <-Reducer 2 [CUSTOM_SIMPLE_EDGE]
>   PARTITION_ONLY_SHUFFLE [RS_31]
> Group By Operator [GBY_30] (rows=1 width=8)
>   Output:["_col0"],aggregations:["count()"]
>   Merge Join Operator [MERGEJOIN_44] (rows=1000 width=8)
> Conds:RS_26._col0=RS_27._col0(Inner)
>   <-Map 1 [SIMPLE_EDGE]
> SHUFFLE [RS_26]
>   PartitionCols:_col0
>   Select Operator [SEL_2] (rows=2000 width=184)
> Output:["_col0"]
> TableScan [TS_0] (rows=2000 width=194)
>   default@srcpart,srcpart,Tbl:COMPLETE,Col:COMPLETE
>   <-Reducer 7 [SIMPLE_EDGE]
> SHUFFLE [RS_27]
>   PartitionCols:_col0
>   Group By Operator [GBY_24] (rows=1 width=184)
> Output:["_col0"],keys:KEY._col0
>   <-Union 6 [SIMPLE_EDGE]
> <-Reducer 5 [CONTAINS]
>   Reduce Output Operator [RS_23]
> PartitionCols:_col0
> Group By Operator [GBY_22] (rows=1 width=184)
>   Output:["_col0"],keys:_col0
>   Filter Operator [FIL_9] (rows=1 width=184)
> predicate:_col0 is not null
> Group By Operator [GBY_7] (rows=1 width=184)
>   
> Output:["_col0"],aggregations:["max(VALUE._col0)"]
> <-Map 4 [CUSTOM_SIMPLE_EDGE]
>   PARTITION_ONLY_SHUFFLE [RS_6]
> Group By Operator [GBY_5] (rows=1 width=184)
>   Output:["_col0"],aggregations:["max(ds)"]
>   Select Operator [SEL_4] (rows=2000 
> width=194)
> Output:["ds"]
> TableScan [TS_3] (rows=2000 width=194)
>   
> default@srcpart,srcpart,Tbl:COMPLETE,Col:COMPLETE
> <-Reducer 9 [CONTAINS]
>   Reduce Output Operator [RS_23]
> PartitionCols:_col0
> Group By Operator [GBY_22] (rows=1 width=184)
>   Output:["_col0"],keys:_col0
>   Filter Operator [FIL_17] (rows=1 width=184)
> predicate:_col0 is not null
> Group By Operator [GBY_15] (rows=1 width=184)
>   
> Output:["_col0"],aggregations:["min(VALUE._col0)"]
> <-Map 8 [CUSTOM_SIMPLE_EDGE]
>   PARTITION_ONLY_SHUFFLE [RS_14]
> Group By Operator [GBY_13] (rows=1 width=184)
>   Output:["_col0"],aggregations:["min(ds)"]
>   Select Operator [SEL_12] (rows=2000 
> width=194)
> Output:["ds"]
> TableScan [TS_11] (rows=2000 width=194)
>   
> default@srcpart,srcpart,Tbl:COMPLETE,Col:COMPLETE
> Dynamic Partitioning Event Operator [EVENT_43] (rows=1 
> width=184)
>   Group By Operator [GBY_42] (rows=1 width=184)
> Output:["_col0"],keys:_col0
>   

[jira] [Commented] (HIVE-11133) Support hive.explain.user for Spark

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981683#comment-15981683
 ] 

Sahil Takiar commented on HIVE-11133:
-

[~lirui] no, it isn't truncated. This could be another bug. I filed HIVE-16521 
as another follow-up item. I suspect it has something to do with the UNION 
operator; the user-level plans for HoS vs. Hive-on-Tez look very different for 
queries with a UNION.

> Support hive.explain.user for Spark
> ---
>
> Key: HIVE-11133
> URL: https://issues.apache.org/jira/browse/HIVE-11133
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Mohit Sabharwal
>Assignee: Sahil Takiar
> Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch, 
> HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch, 
> HIVE-11133.6.patch, HIVE-11133.7.patch, HIVE-11133.8.patch
>
>
> User friendly explain output ({{set hive.explain.user=true}}) should support 
> Spark as well. 
> Once supported, we should also enable related q-tests like {{explainuser_1.q}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14881) integrate MM tables into ACID: merge cleaner into ACID threads

2017-04-24 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981650#comment-15981650
 ] 

Eugene Koifman commented on HIVE-14881:
---

As we discussed, I think adding a new compaction type here is going to cause 
confusion longer term.

Also, if you make Worker delete the Directory.getAbortedDirectories(), then 
the Cleaner doesn't have to change at all and probably has better scalability.

getAbortedDirectories() should have a JavaDoc stating that it's a list of deltas 
that have nothing but aborted txns.

I'd change
{noformat}
if (MetaStoreUtils.isInsertOnlyTable(tblproperties) &&
    txnList.isTxnAborted(delta.minTransaction)) { // for MM table, minTxnId & maxTxnId is same
  aborted.add(child);
}
{noformat}
to check that all txns between min/max for the current delta are aborted and, if so, add the 
delta to the list - this way the code is not specific to MM tables and the 
collection being built may be useful somewhere else.
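
A sketch of that generalized check, assuming the txnList from the quoted snippet is a ValidTxnList exposing the same isTxnAborted call (the surrounding helper and parameters are illustrative):
{code}
import org.apache.hadoop.hive.common.ValidTxnList;

public class AbortedDeltaCheckSketch {
  // True only when every txn in the delta's [minTxn, maxTxn] range is aborted;
  // the same isTxnAborted test as the quoted snippet, but no longer MM-specific.
  public static boolean isWhollyAborted(ValidTxnList txnList, long minTxn, long maxTxn) {
    for (long txnId = minTxn; txnId <= maxTxn; txnId++) {
      if (!txnList.isTxnAborted(txnId)) {
        return false;
      }
    }
    return true;
  }
}
{code}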




> integrate MM tables into ACID: merge cleaner into ACID threads 
> ---
>
> Key: HIVE-14881
> URL: https://issues.apache.org/jira/browse/HIVE-14881
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Attachments: HIVE-14881.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector

2017-04-24 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981634#comment-15981634
 ] 

Josh Elser commented on HIVE-15795:
---

bq. I am curious why I didn't run into this before Sergey committed. Sorry 
about that, folks.

Ahh, I had applied this onto a 2.2 branch and not master. That's probably why I 
didn't catch the issue. Sorry again.

[~faganm], can you please attach your patch from reviewboard here for HadoopQA 
to run?

> Support Accumulo Index Tables in Hive Accumulo Connector
> 
>
> Key: HIVE-15795
> URL: https://issues.apache.org/jira/browse/HIVE-15795
> Project: Hive
>  Issue Type: Improvement
>  Components: Accumulo Storage Handler
>Reporter: Mike Fagan
>Assignee: Mike Fagan
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch
>
>
> Ability to specify an accumulo index table for an accumulo-hive table.
> This would greatly improve performance for non-rowid query predicates



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15571) Support Insert into for druid storage handler

2017-04-24 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-15571:

Status: Patch Available  (was: Open)

> Support Insert into for druid storage handler
> -
>
> Key: HIVE-15571
> URL: https://issues.apache.org/jira/browse/HIVE-15571
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
> Attachments: HIVE-15571.01.patch
>
>
> Add support for the INSERT INTO operator in the Druid storage handler.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring

2017-04-24 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981631#comment-15981631
 ] 

Prasanth Jayachandran commented on HIVE-16343:
--

bq. While launching the process. environment.put("JVM_PID", "$$") / export. 
Within the process - System.getenv().get("JVM_PID").
Does it also happen for processes launched by Slider?

bq. Is there an easier and more reliable way to do this, instead of relying on 
a pid file
I thought this is the most reliable when compared to the others :) The current default 
location for the pid file is not reliable as it defaults to a /tmp/user location.
If JVM_PID is guaranteed to be set, I can add that option as well. 

bq. May want to introduce a config for which process monitor to use, instead of 
relying on a YARN configuration.
Hmm, why do we need this? Unless LLAP adds its own class it is not that 
useful, is it?

bq. How often will the metrics be collected?
Configurable in hadoop-metrics2.properties file. The template and Ambari 
default is to collect every 10s and publish every 5 mins. 



> LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
> 
>
> Key: HIVE-16343
> URL: https://issues.apache.org/jira/browse/HIVE-16343
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16343.1.patch
>
>
> Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful 
> for monitoring and also for setting up triggers via JMC. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16519) Fix exception thrown by checkOutputSpecs

2017-04-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981614#comment-15981614
 ] 

Sergey Shelukhin commented on HIVE-16519:
-

+1

> Fix exception thrown by checkOutputSpecs
> 
>
> Key: HIVE-16519
> URL: https://issues.apache.org/jira/browse/HIVE-16519
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: druid
> Attachments: HIVE-16519.patch
>
>
> do not throw exception by checkOutputSpecs



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring

2017-04-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981608#comment-15981608
 ] 

Siddharth Seth commented on HIVE-16343:
---

Getting access to the PID: is there an easier and more reliable way to do this, 
instead of relying on a pid file? Tez/YARN use the following - while launching 
the process: environment.put("JVM_PID", "$$") / export. Within the process - 
System.getenv().get("JVM_PID").
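
For reference, a tiny sketch of that approach, assuming the launcher exported JVM_PID=$$ into the environment before exec'ing the JVM:
{code}
public class PidFromEnvSketch {
  // Returns the container JVM's pid if the launcher exported it as JVM_PID,
  // or null so the caller can fall back to a pid file (or skip proc-based monitoring).
  public static String jvmPid() {
    String pid = System.getenv("JVM_PID");
    return (pid == null || pid.isEmpty()) ? null : pid;
  }
}
{code}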

If retaining the current method of accessing the pid file, please move to a 
helper class. The daemon class is getting a little noisy.

May want to introduce a config for which process monitor to use, instead of 
relying on a YARN configuration.

How often will the metrics be collected?


> LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
> 
>
> Key: HIVE-16343
> URL: https://issues.apache.org/jira/browse/HIVE-16343
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-16343.1.patch
>
>
> Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful 
> for monitoring and also for setting up triggers via JMC. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore

2017-04-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-16520:
--
Attachment: HIVE-16520-proto.patch

> Cache hive metadata in metastore
> 
>
> Key: HIVE-16520
> URL: https://issues.apache.org/jira/browse/HIVE-16520
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-16520-proto.patch
>
>
> During Hive 2 benchmarks, we find that Hive metastore operations take a lot of time 
> and thus slow down Hive compilation. In some extreme cases, they take much 
> longer than the actual query run time. In particular, we find the latency of a 
> cloud db is very high and 90% of the total query runtime is spent waiting for 
> metastore SQL database operations. Based on this observation, metastore 
> operation performance will be greatly enhanced if we have an in-memory structure 
> which caches the database query results.
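
A deliberately simplified sketch of the caching idea (class and method names are hypothetical; the attached prototype is far more involved and must handle invalidation, consistency and memory limits):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.hadoop.hive.metastore.api.Table;

public class MetastoreTableCacheSketch {
  private final ConcurrentMap<String, Table> byDbAndName = new ConcurrentHashMap<>();

  public Table get(String dbName, String tableName) {
    return byDbAndName.get(dbName + "." + tableName);
  }

  public void put(Table t) {
    // A real cache would store a deep copy so callers cannot mutate the shared instance.
    byDbAndName.put(t.getDbName() + "." + t.getTableName(), t);
  }

  public void invalidate(String dbName, String tableName) {
    // Must be called on alter/drop so reads never see stale metadata.
    byDbAndName.remove(dbName + "." + tableName);
  }
}
{code}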



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16520) Cache hive metadata in metastore

2017-04-24 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned HIVE-16520:
-


> Cache hive metadata in metastore
> 
>
> Key: HIVE-16520
> URL: https://issues.apache.org/jira/browse/HIVE-16520
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>
> During Hive 2 benchmarks, we find that Hive metastore operations take a lot of time 
> and thus slow down Hive compilation. In some extreme cases, they take much 
> longer than the actual query run time. In particular, we find the latency of a 
> cloud db is very high and 90% of the total query runtime is spent waiting for 
> metastore SQL database operations. Based on this observation, metastore 
> operation performance will be greatly enhanced if we have an in-memory structure 
> which caches the database query results.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16079) HS2: high memory pressure due to duplicate Properties objects

2017-04-24 Thread Misha Dmitriev (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981588#comment-15981588
 ] 

Misha Dmitriev commented on HIVE-16079:
---

The vector_if_expr test fails in pretty much every Hive build. accumulo_index fails 
in every second or third build, so it looks flaky as well.

> HS2: high memory pressure due to duplicate Properties objects
> -
>
> Key: HIVE-16079
> URL: https://issues.apache.org/jira/browse/HIVE-16079
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: HIVE-16079.01.patch, HIVE-16079.02.patch, 
> HIVE-16079.03.patch, hs2-crash-2000p-500m-50q.txt
>
>
> I've created a Hive table with 2000 partitions, each backed by two files, 
> with one row in each file. When I execute some number of concurrent queries 
> against this table, e.g. as follows
> {code}
> for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:1 -n admin -p 
> admin -e "select count(i_f_1) from misha_table;" & done
> {code}
> it results in a big memory spike. With 20 queries I caused an OOM in a HS2 
> server with -Xmx200m and with 50 queries - in the one with -Xmx500m.
> I am attaching the results of jxray (www.jxray.com) analysis of a heap dump 
> that was generated in the 50queries/500m heap scenario. It suggests that 
> there are several opportunities to reduce memory pressure with not very 
> invasive changes to the code. One (duplicate strings) has been addressed in 
> https://issues.apache.org/jira/browse/HIVE-15882 In this ticket, I am going 
> to address the fact that almost 20% of memory is used by instances of 
> java.util.Properties. These objects are highly duplicated, since for each 
> partition each concurrently running query creates its own copy of Partition, 
> PartitionDesc and Properties. Thus we have nearly 100,000 (50 queries * 2,000 
> partitions) Properties in memory. By interning/deduplicating these objects we 
> may be able to save perhaps 15% of memory.
> Note, however, that if there are queries that mutate partitions, the 
> corresponding Properties would be mutated as well. Thus we cannot simply use 
> a single "canonicalized" Properties object at all times for all Partition 
> objects representing the same DB partition. Instead, I am going to introduce 
> a special CopyOnFirstWriteProperties class. Such an object initially 
> internally references a canonicalized Properties object, and keeps doing so 
> while only read methods are called. However, once any mutating method is 
> called, the given CopyOnFirstWriteProperties copies the data into its own 
> table from the canonicalized table, and uses it ever after.
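
A condensed sketch of the copy-on-first-write behaviour described above (only a couple of mutators and one read path are shown; a real implementation would override every mutating and reading method):
{code}
import java.util.Properties;

public class CopyOnFirstWritePropertiesSketch extends Properties {
  // Shared, canonicalized Properties; becomes null once this instance owns its own copy.
  private volatile Properties interned;

  public CopyOnFirstWritePropertiesSketch(Properties canonical) {
    this.interned = canonical;
  }

  @Override
  public String getProperty(String key) {
    Properties p = interned;
    return p != null ? p.getProperty(key) : super.getProperty(key);
  }

  @Override
  public synchronized Object setProperty(String key, String value) {
    copyOnWrite();
    return super.setProperty(key, value);
  }

  @Override
  public synchronized Object put(Object key, Object value) {
    copyOnWrite();
    return super.put(key, value);
  }

  private synchronized void copyOnWrite() {
    if (interned != null) {
      Properties canonical = interned;
      interned = null;            // flip first so the re-entrant put() calls below don't recurse
      super.putAll(canonical);    // first mutation: copy the shared data into our own table
    }
  }
}
{code}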



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16441) De-duplicate semijoin branches in n-way joins

2017-04-24 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16441:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> De-duplicate semijoin branches in n-way joins
> -
>
> Key: HIVE-16441
> URL: https://issues.apache.org/jira/browse/HIVE-16441
> Project: Hive
>  Issue Type: Improvement
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-16441.1.patch, HIVE-16441.2.patch, 
> HIVE-16441.3.patch, HIVE-16441.4.patch
>
>
> Currently in n-way joins, semi join optimization creates n branches on the same 
> key. Instead, it should reuse one branch for all the joins.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16519) Fix exception thrown by checkOutputSpecs

2017-04-24 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981553#comment-15981553
 ] 

slim bouguerra commented on HIVE-16519:
---

[~sershe] can you review this please

> Fix exception thrown by checkOutputSpecs
> 
>
> Key: HIVE-16519
> URL: https://issues.apache.org/jira/browse/HIVE-16519
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: druid
> Attachments: HIVE-16519.patch
>
>
> Do not throw an exception from checkOutputSpecs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16519) Fix exception thrown by checkOutputSpecs

2017-04-24 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-16519:
--
Attachment: HIVE-16519.patch

> Fix exception thrown by checkOutputSpecs
> 
>
> Key: HIVE-16519
> URL: https://issues.apache.org/jira/browse/HIVE-16519
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: druid
> Attachments: HIVE-16519.patch
>
>
> Do not throw an exception from checkOutputSpecs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16519) Fix exception thrown by checkOutputSpecs

2017-04-24 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-16519:
--
Status: Patch Available  (was: Open)

> Fix exception thrown by checkOutputSpecs
> 
>
> Key: HIVE-16519
> URL: https://issues.apache.org/jira/browse/HIVE-16519
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: druid
> Attachments: HIVE-16519.patch
>
>
> Do not throw an exception from checkOutputSpecs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16519) Fix exception thrown by checkOutputSpecs

2017-04-24 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-16519:
-


> Fix exception thrown by checkOutputSpecs
> 
>
> Key: HIVE-16519
> URL: https://issues.apache.org/jira/browse/HIVE-16519
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: druid
>
> Do not throw an exception from checkOutputSpecs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16513) width_bucket issues

2017-04-24 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981546#comment-15981546
 ] 

Sahil Takiar commented on HIVE-16513:
-

Yup, I'll try to take a look later today, if not tomorrow.

> width_bucket issues
> ---
>
> Key: HIVE-16513
> URL: https://issues.apache.org/jira/browse/HIVE-16513
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>
> width_bucket was recently added with HIVE-15982. This ticket notes a few 
> issues.
> Usability issue:
> Currently only accepts integral numeric types. Decimals, floats and doubles 
> are not supported.
> Runtime failures: This query will cause a runtime divide-by-zero in the 
> reduce stage.
> select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1;
> The divide-by-zero seems to trigger any time I use a group-by. Here's another 
> example (that actually requires the group-by):
> select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1;
> Advanced Usage Issues:
> Suppose you have a table e011_01 as follows:
> create table e011_01 (c1 integer, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> Compile-time problems:
> You cannot use simple case expressions, searched case expressions or grouping 
> sets. These queries fail:
> select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) 
> from e011_01;
> select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) 
> from e011_01;
> select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as 
> integer)) from e011_02 group by cube(c1, c2);
> I'll admit the grouping one is pretty contrived but the case ones seem 
> straightforward, valid, and it's strange that they don't work. Similar 
> queries work with other UDFs like sum. Why wouldn't they "just work"? Maybe 
> [~ashutoshc] can lend some perspective on that?
> Interestingly, you can use window functions in width_bucket, example:
> select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01;
> works just fine. Hopefully we can get to a place where people implementing 
> functions like this don't need to think about value expression support but we 
> don't seem to be there yet.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16513) width_bucket issues

2017-04-24 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981540#comment-15981540
 ] 

Ashutosh Chauhan commented on HIVE-16513:
-

[~stakiar] Would you like to take a look?

> width_bucket issues
> ---
>
> Key: HIVE-16513
> URL: https://issues.apache.org/jira/browse/HIVE-16513
> Project: Hive
>  Issue Type: Bug
>Reporter: Carter Shanklin
>
> width_bucket was recently added with HIVE-15982. This ticket notes a few 
> issues.
> Usability issue:
> Currently only accepts integral numeric types. Decimals, floats and doubles 
> are not supported.
> Runtime failures: This query will cause a runtime divide-by-zero in the 
> reduce stage.
> select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1;
> The divide-by-zero seems to trigger any time I use a group-by. Here's another 
> example (that actually requires the group-by):
> select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1;
> Advanced Usage Issues:
> Suppose you have a table e011_01 as follows:
> create table e011_01 (c1 integer, c2 smallint);
> insert into e011_01 values (1, 1), (2, 2);
> Compile-time problems:
> You cannot use simple case expressions, searched case expressions or grouping 
> sets. These queries fail:
> select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) 
> from e011_01;
> select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) 
> from e011_01;
> select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as 
> integer)) from e011_02 group by cube(c1, c2);
> I'll admit the grouping one is pretty contrived but the case ones seem 
> straightforward, valid, and it's strange that they don't work. Similar 
> queries work with other UDFs like sum. Why wouldn't they "just work"? Maybe 
> [~ashutoshc] can lend some perspective on that?
> Interestingly, you can use window functions in width_bucket, example:
> select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01;
> works just fine. Hopefully we can get to a place where people implementing 
> functions like this don't need to think about value expression support but we 
> don't seem to be there yet.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16516) Set storage-api.version to 3.0.0-SNAPSHOT

2017-04-24 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981539#comment-15981539
 ] 

Sergey Shelukhin commented on HIVE-16516:
-

+1. I think the idea is that the default should point to a snapshot; otherwise 
it would be impossible to change anything without a release. Then, the next 
Hive release would require a storage-api release, and would need to point to that.

> Set storage-api.version to 3.0.0-SNAPSHOT
> -
>
> Key: HIVE-16516
> URL: https://issues.apache.org/jira/browse/HIVE-16516
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-16516.1.patch
>
>
> I think the update of this property was missed during preparation to 3.0.0;
> I've bumped into this after cleaning the local .m2 repo caches.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16483) HoS should populate split related configurations to HiveConf

2017-04-24 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-16483:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

The test failure is not related. Committed to master. Thanks Xuefu for the review.

> HoS should populate split related configurations to HiveConf
> 
>
> Key: HIVE-16483
> URL: https://issues.apache.org/jira/browse/HIVE-16483
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Chao Sun
>Assignee: Chao Sun
> Fix For: 3.0.0
>
> Attachments: HIVE-16483.1.patch
>
>
> There are several split related configurations, such as 
> {{MAPREDMINSPLITSIZE}}, {{MAPREDMINSPLITSIZEPERNODE}}, 
> {{MAPREDMINSPLITSIZEPERRACK}}, etc., that should be populated to HiveConf. 
> Currently we only do this for {{MAPREDMINSPLITSIZE}}.
> All the others, if not set, will be using the default value, which is 1.
> Without these, Spark sometimes will not merge small files for file formats 
> such as text.
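For illustration, a hedged sketch of what "populating" these settings can look like: 
copying any explicitly-set split sizes from one Hadoop Configuration into another. 
The property keys and class name below are assumptions for this example, not 
necessarily the keys used by the actual patch:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative helper: copy split-size settings from a source configuration into
// the configuration used for split generation. The string keys are assumptions
// for this sketch; the real patch works with HiveConf constants.
public final class SplitConfPopulator {
  private static final String[] SPLIT_KEYS = {
      "mapred.min.split.size",
      "mapred.min.split.size.per.node",
      "mapred.min.split.size.per.rack",
      "mapred.max.split.size"
  };

  private SplitConfPopulator() {}

  public static void populate(Configuration source, Configuration target) {
    for (String key : SPLIT_KEYS) {
      String value = source.get(key);
      if (value != null) {       // only copy values that were explicitly set
        target.set(key, value);
      }
    }
  }
}
{code}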



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-24 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Attachment: HIVE-16488.01.patch

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.2.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, Replication
> Attachments: HIVE-16488.01.patch
>
>
> This is a potential use case where a user may want to manually create a db on 
> the destination to make sure it goes to a certain dir root, or they may have 
> cases where the db (default, for instance) was automatically created. We 
> should still allow replicating into this without failing if the db is empty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-24 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Status: Patch Available  (was: Open)

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.2.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, Replication
> Attachments: HIVE-16488.01.patch
>
>
> This is a potential use case where a user may want to manually create a db on 
> the destination to make sure it goes to a certain dir root, or they may have 
> cases where the db (default, for instance) was automatically created. We 
> should still allow replicating into this without failing if the db is empty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-24 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16488:

Attachment: (was: HIVE-16488.01.patch)

> Support replicating into existing db if the db is empty
> ---
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.2.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, Replication
>
> This is a potential use case where a user may want to manually create a db on 
> the destination to make sure it goes to a certain dir root, or they may have 
> cases where the db (default, for instance) was automatically created. We 
> should still allow replicating into this without failing if the db is empty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16353) Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot

2017-04-24 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16353:
---
Status: Patch Available  (was: Open)

> Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot
> --
>
> Key: HIVE-16353
> URL: https://issues.apache.org/jira/browse/HIVE-16353
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16353.patch
>
>
> HIVE-16049 upgraded to jetty 9. It is committed to apache master which is 
> still 2.3.0-snapshot. This breaks couple of other components like LLAP and 
> ends up throwing the following error during runtime.
> {noformat}
> 2017-04-02T20:17:45,435 WARN  [main ()] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: Failed to start LLAP 
> Daemon with exception
> java.lang.NoClassDefFoundError: org/eclipse/jetty/rewrite/handler/Rule
> at org.apache.hive.http.HttpServer$Builder.build(HttpServer.java:125) 
> ~[hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.services.impl.LlapWebServices.serviceInit(LlapWebServices.java:102)
>  ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) 
> ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>  ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceInit(LlapDaemon.java:385)
>  ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) 
> ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:528) 
> [hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> Caused by: java.lang.ClassNotFoundException: 
> org.eclipse.jetty.rewrite.handler.Rule
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_77]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_77]
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[?:1.8.0_77]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_77]
> ... 7 more
> 2017-04-02T20:17:45,441 INFO  [main ()] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: LlapDaemon shutdown 
> invoked
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16353) Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot

2017-04-24 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16353:
---
Attachment: HIVE-16353.patch

This is possibly due to the replacement of jetty-all.jar with multiple sub-jars 
in the pom.xml.

> Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot
> --
>
> Key: HIVE-16353
> URL: https://issues.apache.org/jira/browse/HIVE-16353
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-16353.patch
>
>
> HIVE-16049 upgraded to jetty 9. It is committed to apache master which is 
> still 2.3.0-snapshot. This breaks couple of other components like LLAP and 
> ends up throwing the following error during runtime.
> {noformat}
> 2017-04-02T20:17:45,435 WARN  [main ()] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: Failed to start LLAP 
> Daemon with exception
> java.lang.NoClassDefFoundError: org/eclipse/jetty/rewrite/handler/Rule
> at org.apache.hive.http.HttpServer$Builder.build(HttpServer.java:125) 
> ~[hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.services.impl.LlapWebServices.serviceInit(LlapWebServices.java:102)
>  ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) 
> ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>  ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceInit(LlapDaemon.java:385)
>  ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) 
> ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:528) 
> [hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> Caused by: java.lang.ClassNotFoundException: 
> org.eclipse.jetty.rewrite.handler.Rule
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_77]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_77]
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[?:1.8.0_77]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_77]
> ... 7 more
> 2017-04-02T20:17:45,441 INFO  [main ()] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: LlapDaemon shutdown 
> invoked
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats

2017-04-24 Thread Rentao Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981412#comment-15981412
 ] 

Rentao Wu commented on HIVE-16288:
--

I think these tests were added to the wrong directory (should be under 
itest/hive-blobstore instead of hive-blobstore).
https://github.com/apache/hive/commit/ea41d0a685fb346ce20075d0dcc3c736e375bb20

> Add blobstore tests for ORC and RCFILE file formats
> ---
>
> Key: HIVE-16288
> URL: https://issues.apache.org/jira/browse/HIVE-16288
> Project: Hive
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.1.1
>Reporter: Thomas Poepping
>Assignee: Thomas Poepping
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16288.patch
>
>
> This patch adds four tests each for ORC and RCFILE when running against 
> blobstore filesystems:
>   * Test for bucketed tables
>   * Test for nonpartitioned tables
>   * Test for partitioned tables
>   * Test for partitioned tables with nonstandard partition locations



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16353) Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot

2017-04-24 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-16353:
--

Assignee: Gopal V

> Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot
> --
>
> Key: HIVE-16353
> URL: https://issues.apache.org/jira/browse/HIVE-16353
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Gopal V
>Priority: Minor
>
> HIVE-16049 upgraded to jetty 9. It is committed to apache master which is 
> still 2.3.0-snapshot. This breaks couple of other components like LLAP and 
> ends up throwing the following error during runtime.
> {noformat}
> 2017-04-02T20:17:45,435 WARN  [main ()] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: Failed to start LLAP 
> Daemon with exception
> java.lang.NoClassDefFoundError: org/eclipse/jetty/rewrite/handler/Rule
> at org.apache.hive.http.HttpServer$Builder.build(HttpServer.java:125) 
> ~[hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.llap.daemon.services.impl.LlapWebServices.serviceInit(LlapWebServices.java:102)
>  ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) 
> ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>  ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceInit(LlapDaemon.java:385)
>  ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) 
> ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:528) 
> [hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> Caused by: java.lang.ClassNotFoundException: 
> org.eclipse.jetty.rewrite.handler.Rule
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_77]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_77]
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[?:1.8.0_77]
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_77]
> ... 7 more
> 2017-04-02T20:17:45,441 INFO  [main ()] 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: LlapDaemon shutdown 
> invoked
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16518) Insert override for druid does not replace all existing segments

2017-04-24 Thread Nishant Bangarwa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Bangarwa updated HIVE-16518:

Fix Version/s: 3.0.0

> Insert override for druid does not replace all existing segments
> 
>
> Key: HIVE-16518
> URL: https://issues.apache.org/jira/browse/HIVE-16518
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
> Fix For: 3.0.0
>
>
> Insert override for Druid does not replace segments for all intervals. 
> It just replaces segments for the intervals which are newly ingested. 
> INSERT OVERRIDE TABLE statement on DruidStorageHandler should override all 
> existing segments for the table. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16518) Insert override for druid does not replace all existing segments

2017-04-24 Thread Nishant Bangarwa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Bangarwa updated HIVE-16518:

Component/s: Druid integration

> Insert override for druid does not replace all existing segments
> 
>
> Key: HIVE-16518
> URL: https://issues.apache.org/jira/browse/HIVE-16518
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
> Fix For: 3.0.0
>
>
> Insert override for Druid does not replace segments for all intervals. 
> It just replaces segments for the intervals which are newly ingested. 
> INSERT OVERRIDE TABLE statement on DruidStorageHandler should override all 
> existing segments for the table. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16451) Race condition between HiveStatement.getQueryLog and HiveStatement.runAsyncOnServer

2017-04-24 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981334#comment-15981334
 ] 

Peter Vary commented on HIVE-16451:
---

Over the weekend I did some thinking, and I realized that HiveStatement cannot 
be made fully thread safe (for example, execute and getResultSet could be called 
from different threads and cause problems).

We cannot make HiveStatement thread safe, but we should at least make sure that 
calling getQueryLog will not cause problems if it is called in parallel with any 
of the following: cancel, close, execute, executeAsync, executeQuery, 
executeUpdate, getUpdateCount and, more interestingly, HiveQueryResultSet.next 
too. It is quite a complex problem at first glance, so I created a new jira for 
it: HIVE-16517 - HiveStatement thread safety issues.

In the meantime, this patch should solve the problems that can arise on the 
happy path.



> Race condition between HiveStatement.getQueryLog and 
> HiveStatement.runAsyncOnServer
> ---
>
> Key: HIVE-16451
> URL: https://issues.apache.org/jira/browse/HIVE-16451
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, JDBC
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16451.02.patch, HIVE-16451.03.patch, 
> HIVE-16451.patch
>
>
> During the BeeLineDriver testing I have met the following race condition:
> - Run the query asynchronously through BeeLine
> - Querying the logs in the BeeLine
> In the following code:
> {code:title=HiveStatement.runAsyncOnServer}
>   private void runAsyncOnServer(String sql) throws SQLException {
> checkConnection("execute");
> closeClientOperation();
> initFlags();
> [..]
>   }
> {code}
> {code:title=HiveStatement.getQueryLog}
>   public List getQueryLog(boolean incremental, int fetchSize)
>   throws SQLException, ClosedOrCancelledStatementException {
> [..]
> try {
>   if (stmtHandle != null) {
> [..]
>   } else {
> if (isQueryClosed) {
>   throw new ClosedOrCancelledStatementException("Method getQueryLog() 
> failed. The " +
>   "statement has been closed or cancelled.");
> } else {
>   return logs;
> }
>   }
> } catch (SQLException e) {
> [..]
> }
> [..]
>   }
> {code}
> The runAsyncOnServer {{closeClientOperation}} sets {{isQueryClosed}} flag to 
> true:
> {code:title=HiveStatement.closeClientOperation}
>   void closeClientOperation() throws SQLException {
> [..]
> isQueryClosed = true;
> isExecuteStatementFailed = false;
> stmtHandle = null;
>   }
> {code}
> The {{initFlags}} sets it to false:
> {code}
>   private void initFlags() {
> isCancelled = false;
> isQueryClosed = false;
> isLogBeingGenerated = true;
> isExecuteStatementFailed = false;
> isOperationComplete = false;
>   }
> {code}
> If the {{getQueryLog}} is called after the {{closeClientOperation}}, but 
> before the {{initFlags}}, then we will have a following warning if verbose 
> mode is set to true in BeeLine:
> {code}
> Warning: org.apache.hive.jdbc.ClosedOrCancelledStatementException: Method 
> getQueryLog() failed. The statement has been closed or cancelled. 
> (state=,code=0)
> {code}
> This caused this fail:
> https://builds.apache.org/job/PreCommit-HIVE-Build/4691/testReport/org.apache.hadoop.hive.cli/TestBeeLineDriver/testCliDriver_smb_mapjoin_11_/
> {code}
> Error Message
> Client result comparison failed with error code = 1 while executing 
> fname=smb_mapjoin_11
> 16a17
> > Warning: org.apache.hive.jdbc.ClosedOrCancelledStatementException: Method 
> > getQueryLog() failed. The statement has been closed or cancelled. 
> > (state=,code=0)
> {code}
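For illustration only, here is a self-contained toy program that reproduces the 
kind of transient-flag race described above (field and method names are made up; 
this is not HiveStatement code):

{code:java}
// Toy reproduction of the race: a writer flips a "closed" flag to true and back to
// false (close + re-init), while a reader may observe the transient true value.
// The race is timing-dependent, so a given run may or may not hit it.
public class FlagRaceDemo {
  private volatile boolean queryClosed = false;

  void runAsync() {
    queryClosed = true;    // analogous to closeClientOperation()
    queryClosed = false;   // analogous to initFlags()
  }

  void getQueryLog() {
    if (queryClosed) {
      throw new IllegalStateException("statement has been closed or cancelled");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    FlagRaceDemo demo = new FlagRaceDemo();
    Thread reader = new Thread(() -> {
      for (int i = 0; i < 1_000_000; i++) {
        try {
          demo.getQueryLog();
        } catch (IllegalStateException e) {
          System.out.println("observed the transient closed state at iteration " + i);
          return;
        }
      }
      System.out.println("race not observed in this run");
    });
    reader.start();
    for (int i = 0; i < 1_000_000; i++) {
      demo.runAsync();
    }
    reader.join();
  }
}
{code}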



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16518) Insert override for druid does not replace all existing segments

2017-04-24 Thread Nishant Bangarwa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Bangarwa reassigned HIVE-16518:
---


> Insert override for druid does not replace all existing segments
> 
>
> Key: HIVE-16518
> URL: https://issues.apache.org/jira/browse/HIVE-16518
> Project: Hive
>  Issue Type: Bug
>Reporter: Nishant Bangarwa
>Assignee: Nishant Bangarwa
>
> Insert override for Druid does not replace segments for all intervals. 
> It just replaces segments for the intervals which are newly ingested. 
> INSERT OVERRIDE TABLE statement on DruidStorageHandler should override all 
> existing segments for the table. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16426) Query cancel: improve the way to handle files

2017-04-24 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-16426:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~aihuaxu] and [~ctang.ma] for reviewing the patch.

> Query cancel: improve the way to handle files
> -
>
> Key: HIVE-16426
> URL: https://issues.apache.org/jira/browse/HIVE-16426
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Fix For: 3.0.0
>
> Attachments: HIVE-16426.1.patch
>
>
> 1. Add data structure support to make it easy to check the query cancel status.
> 2. Handle query cancel more gracefully. Remove possible file leaks caused by 
> query cancel, as shown in the following stack:
> {noformat}
> 2017-04-11 09:57:30,727 WARN  org.apache.hadoop.hive.ql.exec.Utilities: 
> [HiveServer2-Background-Pool: Thread-149]: Failed to clean-up tmp directories.
> java.io.InterruptedIOException: Call interrupted
> at org.apache.hadoop.ipc.Client.call(Client.java:1496)
> at org.apache.hadoop.ipc.Client.call(Client.java:1439)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy20.delete(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy21.delete(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.clearWork(Utilities.java:277)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:463)
> at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1978)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1691)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1423)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1202)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:303)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:316)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> 3. Add checkpoints to related file operations to improve response time for 
> query cancelling. 
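A hedged sketch of the "checkpoint" idea in point 3 (not the actual HIVE-16426 
patch; the class and method names are invented for this example): check a shared 
cancel flag between file operations so a cancelled query can back out promptly 
instead of being interrupted mid-RPC.

{code:java}
import java.io.IOException;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: a file-cleanup loop with a cancellation checkpoint before
// each filesystem call, so the operation stops cleanly when cancelled.
final class CancellableCleanup {
  private final AtomicBoolean cancelRequested;

  CancellableCleanup(AtomicBoolean cancelRequested) {
    this.cancelRequested = cancelRequested;
  }

  void deleteTmpDirs(FileSystem fs, List<Path> tmpDirs) throws IOException {
    for (Path dir : tmpDirs) {
      if (cancelRequested.get()) {   // checkpoint: stop before the next RPC
        return;
      }
      fs.delete(dir, true);          // recursive delete of one tmp directory
    }
  }
}
{code}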



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15571) Support Insert into for druid storage handler

2017-04-24 Thread Nishant Bangarwa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Bangarwa updated HIVE-15571:

Attachment: HIVE-15571.01.patch

> Support Insert into for druid storage handler
> -
>
> Key: HIVE-15571
> URL: https://issues.apache.org/jira/browse/HIVE-15571
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: Nishant Bangarwa
> Attachments: HIVE-15571.01.patch
>
>
> Add support for the INSERT INTO operator in the Druid storage handler.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16516) Set storage-api.version to 3.0.0-SNAPSHOT

2017-04-24 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-16516:

Status: Patch Available  (was: Open)

[~owen.omalley] could you take a look?
...am I right to change this to 3.0.0-SNAPSHOT, or should this property point 
to a 'released' version?

> Set storage-api.version to 3.0.0-SNAPSHOT
> -
>
> Key: HIVE-16516
> URL: https://issues.apache.org/jira/browse/HIVE-16516
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-16516.1.patch
>
>
> I think the update of this property was missed during preparation to 3.0.0;
> I've bumped into this after cleaning the local .m2 repo caches.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16516) Set storage-api.version to 3.0.0-SNAPSHOT

2017-04-24 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-16516:

Attachment: HIVE-16516.1.patch

> Set storage-api.version to 3.0.0-SNAPSHOT
> -
>
> Key: HIVE-16516
> URL: https://issues.apache.org/jira/browse/HIVE-16516
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-16516.1.patch
>
>
> I think the update of this property was missed during preparation to 3.0.0;
> I've bumped into this after cleaning the local .m2 repo caches.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16516) Set storage-api.version to 3.0.0-SNAPSHOT

2017-04-24 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-16516:
---


> Set storage-api.version to 3.0.0-SNAPSHOT
> -
>
> Key: HIVE-16516
> URL: https://issues.apache.org/jira/browse/HIVE-16516
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-16516.1.patch
>
>
> I think the update of this property was missed during preparation to 3.0.0;
> I've bumped into this after cleaning the local .m2 repo caches.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16449) BeeLineDriver should handle query result sorting

2017-04-24 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-16449:
--
Attachment: HIVE-16449.06.patch

Retrigger the tests with the same patch file

> BeeLineDriver should handle query result sorting
> 
>
> Key: HIVE-16449
> URL: https://issues.apache.org/jira/browse/HIVE-16449
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16449.02.patch, HIVE-16449.03.patch, 
> HIVE-16449.04.patch, HIVE-16449.05.patch, HIVE-16449.06.patch, 
> HIVE-16449.patch
>
>
> The CLI driver supports the following features:
> -- SORT_QUERY_RESULTS
> -- HASH_QUERY_RESULTS
> -- SORT_AND_HASH_QUERY_RESULTS
> BeeLineDriver should find a way to support these



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16450) Some metastore operations are not retried even with desired underlining exceptions

2017-04-24 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16450:

Attachment: HIVE-16450.2.patch

> Some metastore operations are not retried even with desired underlining 
> exceptions
> --
>
> Key: HIVE-16450
> URL: https://issues.apache.org/jira/browse/HIVE-16450
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16450.1.patch, HIVE-16450.2.patch
>
>
> In the RetryingHMSHandler class, we expect operations to be retried when the 
> cause of a MetaException is a JDOException or NucleusException.
> {noformat}
> if (e.getCause() instanceof MetaException && e.getCause().getCause() != null) {
>   if (e.getCause().getCause() instanceof javax.jdo.JDOException ||
>       e.getCause().getCause() instanceof NucleusException) {
>     // The JDOException or the Nucleus Exception may be wrapped further in a MetaException
>     caughtException = e.getCause().getCause();
>   }
> {noformat}
> However, in many places ObjectStore only throws new MetaException(msg) without 
> the cause, so we miss retries in some cases. For example, with the following 
> JDOException we should retry, but it is ignored.
> {noformat}
> 2017-04-04 17:28:21,602 ERROR metastore.ObjectStore 
> (ObjectStore.java:getMTableColumnStatistics(6555)) - Error retrieving 
> statistics via jdo
> javax.jdo.JDOException: Exception thrown when executing query
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6546)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.access$1200(ObjectStore.java:171)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6606)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6595)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2633)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatisticsInternal(ObjectStore.java:6594)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatistics(ObjectStore.java:6588)
> at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
> at com.sun.proxy.$Proxy0.getTableColumnStatistics(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTableUpdateTableColumnStats(HiveAlterHandler.java:787)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:247)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3809)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:3779)
> at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy3.alter_table_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:9617)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:9601)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
>
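A hedged sketch of the kind of change the description above implies (not the 
actual HIVE-16450 patch; the helper name is invented for this example): keep the 
underlying exception as the cause when re-throwing a MetaException, so that 
RetryingHMSHandler's cause-chain check can recognize it and retry.

{code:java}
import org.apache.hadoop.hive.metastore.api.MetaException;

// Illustrative helper: wrap a message and its underlying cause in a MetaException.
// Preserving the cause (e.g. a javax.jdo.JDOException) lets the retry handler's
// getCause().getCause() check fire; assumes the MetaException's cause is not set yet.
final class MetaExceptions {
  private MetaExceptions() {}

  static MetaException wrap(String msg, Exception cause) {
    MetaException me = new MetaException(msg);
    me.initCause(cause);
    return me;
  }
}
{code}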

[jira] [Updated] (HIVE-16450) Some metastore operations are not retried even with desired underlining exceptions

2017-04-24 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16450:

Attachment: (was: HIVE-16450.2.patch)

> Some metastore operations are not retried even with desired underlining 
> exceptions
> --
>
> Key: HIVE-16450
> URL: https://issues.apache.org/jira/browse/HIVE-16450
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16450.1.patch, HIVE-16450.2.patch
>
>
> In the RetryingHMSHandler class, we expect operations to be retried when the 
> cause of a MetaException is a JDOException or NucleusException.
> {noformat}
> if (e.getCause() instanceof MetaException && e.getCause().getCause() != null) {
>   if (e.getCause().getCause() instanceof javax.jdo.JDOException ||
>       e.getCause().getCause() instanceof NucleusException) {
>     // The JDOException or the Nucleus Exception may be wrapped further in a MetaException
>     caughtException = e.getCause().getCause();
>   }
> {noformat}
> However, in many places ObjectStore only throws new MetaException(msg) without 
> the cause, so we miss retries in some cases. For example, with the following 
> JDOException we should retry, but it is ignored.
> {noformat}
> 2017-04-04 17:28:21,602 ERROR metastore.ObjectStore 
> (ObjectStore.java:getMTableColumnStatistics(6555)) - Error retrieving 
> statistics via jdo
> javax.jdo.JDOException: Exception thrown when executing query
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6546)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.access$1200(ObjectStore.java:171)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6606)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6595)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2633)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatisticsInternal(ObjectStore.java:6594)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatistics(ObjectStore.java:6588)
> at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
> at com.sun.proxy.$Proxy0.getTableColumnStatistics(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTableUpdateTableColumnStats(HiveAlterHandler.java:787)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:247)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3809)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:3779)
> at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy3.alter_table_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:9617)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:9601)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> 

[jira] [Updated] (HIVE-16450) Some metastore operations are not retried even with desired underlining exceptions

2017-04-24 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-16450:

Status: In Progress  (was: Patch Available)

> Some metastore operations are not retried even with desired underlining 
> exceptions
> --
>
> Key: HIVE-16450
> URL: https://issues.apache.org/jira/browse/HIVE-16450
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-16450.1.patch, HIVE-16450.2.patch
>
>
> In the RetryingHMSHandler class, we expect operations to be retried when the 
> cause of a MetaException is a JDOException or NucleusException.
> {noformat}
> if (e.getCause() instanceof MetaException && e.getCause().getCause() != null) {
>   if (e.getCause().getCause() instanceof javax.jdo.JDOException ||
>       e.getCause().getCause() instanceof NucleusException) {
>     // The JDOException or the Nucleus Exception may be wrapped further in a MetaException
>     caughtException = e.getCause().getCause();
>   }
> {noformat}
> However, in many places ObjectStore only throws new MetaException(msg) without 
> the cause, so we miss retries in some cases. For example, with the following 
> JDOException we should retry, but it is ignored.
> {noformat}
> 2017-04-04 17:28:21,602 ERROR metastore.ObjectStore 
> (ObjectStore.java:getMTableColumnStatistics(6555)) - Error retrieving 
> statistics via jdo
> javax.jdo.JDOException: Exception thrown when executing query
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
> at 
> org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6546)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.access$1200(ObjectStore.java:171)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6606)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6595)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2633)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatisticsInternal(ObjectStore.java:6594)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatistics(ObjectStore.java:6588)
> at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103)
> at com.sun.proxy.$Proxy0.getTableColumnStatistics(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTableUpdateTableColumnStats(HiveAlterHandler.java:787)
> at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:247)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3809)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:3779)
> at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy3.alter_table_with_environment_context(Unknown 
> Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:9617)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:9601)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
> at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> 
