[VOTE] Shall we release Hive 2.2.0 rc1?

2017-07-21 Thread Owen O'Malley
Ok, I rolled a new RC with the Apache header problem fixed.

Artifacts: https://home.apache.org/~omalley/hive-2.2.0/
Tag: https://github.com/apache/hive/releases/tag/release-2.2.0rc1 da840b0

.. Owen


Re: Review Request 61010: HIVE-17128 Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-21 Thread Andrew Sherman


> On July 20, 2017, 11:59 p.m., Aihua Xu wrote:
> > service/src/java/org/apache/hive/service/cli/operation/Operation.java
> > Lines 269 (patched)
> > 
> >
> > We only register the Operation log appender once for all the operation 
> > logs at the beginning. Now if you stop the appender here, then the 
> > following queries will not be able to output to the appender any more, 
> > right?
> > 
> > Can you test your patch by: connect to HS2 from one beeline session, 
> > disconnect and reconnect. Then see if you still see output from the beeline 
> > console?
> > 
> > Will we be able to close OutputStream instead?  stopQueryAppender() 
> > should be called when the HS2 service gets shut down.
> 
> Peter Vary wrote:
> Nice catch Andrew!
> 
> One more thing. Could you please do this for the LogDivertAppenderForTest 
> too? It is only used for tests, but it would be good to clean it up too.
> 
> Thanks,
> Peter
> 
> Andrew Sherman wrote:
> @Aihua thanks for the stimulating question. I ran hand tests to prove 
> that logging works for multiple queries in the same session, and also in a 
> new session. The reason the code is OK is that it is not the RoutingAppender 
> that is closed, but the specific Appender for the query. In 
> https://logging.apache.org/log4j/2.x/manual/appenders.html#RoutingAppender 
> this Appender is referred to as a subordinate Appender. I'm updating the code 
> to make this clearer.
> 
> @Peter I will look at LogDivertAppenderForTest to see if I can do the 
> same thing there.
> 
> Aihua Xu wrote:
> I see. That makes sense. I will take a second look. Thanks for the 
> explanation.

Thanks Aihua for the review.
I have updated the review with changes requested by Peter to deal with 
LogDivertAppenderForTest.


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61010/#review181083
---


On July 22, 2017, 12:16 a.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61010/
> ---
> 
> (Updated July 22, 2017, 12:16 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Previously HIVE-16061 and HIVE-16400 changed Operation Logging to use the 
> Log4j2
> RoutingAppender to automatically output the log for each query into an
> individual operation log file.  As log4j does not know when a query is 
> finished
> it keeps the OutputStream in the subordinate Appender open even when the query
> completes.  The stream holds a file descriptor and so we leak file 
> descriptors.
> Note that we are already careful to close any streams reading from the 
> operation
> log file.  To fix this we use a technique described in the comments of
> LOG4J2-510 which uses reflection to close the subordinate Appender.  We use 
> this
> to close the per-query subordinate Appenders from both LogDivertAppender and
> LogDivertAppenderForTest.  The test in TestOperationLoggingLayout is extended 
> to
> check that the Appenders are closed when a query completes.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/LogUtils.java 
> 83f3af7440253bfbcedbc8b21d745fb71c0d7fb9 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingLayout.java
>  1a8337f574bb753e8c3c48a6b477b17700b05256 
>   ql/src/java/org/apache/hadoop/hive/ql/log/LogDivertAppender.java 
> e697b545984555414e27bafe92d7f22829a22687 
>   ql/src/java/org/apache/hadoop/hive/ql/log/LogDivertAppenderForTest.java 
> 465844d66b92b371f457fda0d885d75fbfce6805 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 8d453d5d9153c2ec86c4adc7a68bd3b5dd249743 
> 
> 
> Diff: https://reviews.apache.org/r/61010/diff/2/
> 
> 
> Testing
> ---
> 
> Hand testing to show leak has gone.
> The test in TestOperationLoggingLayout is extended to check that the Appender 
> is closed.
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



Re: Review Request 61010: HIVE-17128 Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-21 Thread Andrew Sherman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61010/
---

(Updated July 22, 2017, 12:16 a.m.)


Review request for hive.


Repository: hive-git


Description (updated)
---

Previously HIVE-16061 and HIVE-16400 changed Operation Logging to use the Log4j2
RoutingAppender to automatically output the log for each query into an
individual operation log file.  As log4j does not know when a query is finished
it keeps the OutputStream in the subordinate Appender open even when the query
completes.  The stream holds a file descriptor and so we leak file descriptors.
Note that we are already careful to close any streams reading from the operation
log file.  To fix this we use a technique described in the comments of
LOG4J2-510 which uses reflection to close the subordinate Appender.  We use this
to close the per-query subordinate Appenders from both LogDivertAppender and
LogDivertAppenderForTest.  The test in TestOperationLoggingLayout is extended to
check that the Appenders are closed when a query completes.
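
For reference, a minimal sketch of the LOG4J2-510 reflection technique. The
"appenders" field name and the AppenderControl wrapper are log4j 2.6.x
internals and are assumptions here, not a copy of the actual patch:

{code}
import java.lang.reflect.Field;
import java.util.concurrent.ConcurrentMap;

import org.apache.logging.log4j.core.Appender;
import org.apache.logging.log4j.core.appender.routing.RoutingAppender;
import org.apache.logging.log4j.core.config.AppenderControl;

public final class SubordinateAppenderCloser {

  // Close the subordinate Appender that the RoutingAppender created for one
  // routing key (here: a query id). RoutingAppender has no public API for
  // this, so we reach into its private "appenders" map via reflection.
  @SuppressWarnings("unchecked")
  public static void closeSubordinate(RoutingAppender routing, String queryId)
      throws ReflectiveOperationException {
    Field field = RoutingAppender.class.getDeclaredField("appenders");
    field.setAccessible(true);
    ConcurrentMap<String, AppenderControl> appenders =
        (ConcurrentMap<String, AppenderControl>) field.get(routing);
    AppenderControl control = appenders.remove(queryId);
    if (control != null) {
      Appender appender = control.getAppender();
      if (appender.isStarted()) {
        appender.stop(); // closes the OutputStream, releasing the file descriptor
      }
    }
  }
}
{code}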


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/LogUtils.java 
83f3af7440253bfbcedbc8b21d745fb71c0d7fb9 
  
itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingLayout.java
 1a8337f574bb753e8c3c48a6b477b17700b05256 
  ql/src/java/org/apache/hadoop/hive/ql/log/LogDivertAppender.java 
e697b545984555414e27bafe92d7f22829a22687 
  ql/src/java/org/apache/hadoop/hive/ql/log/LogDivertAppenderForTest.java 
465844d66b92b371f457fda0d885d75fbfce6805 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java 
8d453d5d9153c2ec86c4adc7a68bd3b5dd249743 


Diff: https://reviews.apache.org/r/61010/diff/2/

Changes: https://reviews.apache.org/r/61010/diff/1-2/


Testing
---

Hand testing to show leak has gone.
The test in TestOperationLoggingLayout is extended to check that the Appender 
is closed.


Thanks,

Andrew Sherman



Re: Review Request 61010: HIVE-17128 Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-21 Thread Aihua Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61010/#review181158
---


Ship it!




Ship It!

- Aihua Xu


On July 20, 2017, 10:45 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61010/
> ---
> 
> (Updated July 20, 2017, 10:45 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Previously HIVE-16061 and HIVE-16400 changed Operation Logging to use the 
> Log4j2
> RoutingAppender to automatically output the log for each query into an
> individual operation log file.  As log4j does not know when a query is 
> finished
> it keeps the OutputStream in the Appender open even when the query completes.
> The stream holds a file descriptor and so we leak file descriptors. Note that 
> we
> are already careful to close any streams reading from the operation log file.
> To fix this we use a technique described in the comments of LOG4J2-510 which
> uses reflection to close the appender. The test in TestOperationLoggingLayout 
> is
> extended to check that the Appender is closed.
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingLayout.java
>  1a8337f574bb753e8c3c48a6b477b17700b05256 
>   ql/src/java/org/apache/hadoop/hive/ql/log/LogDivertAppender.java 
> e697b545984555414e27bafe92d7f22829a22687 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 8d453d5d9153c2ec86c4adc7a68bd3b5dd249743 
> 
> 
> Diff: https://reviews.apache.org/r/61010/diff/1/
> 
> 
> Testing
> ---
> 
> Hand testing to show leak has gone.
> The test in TestOperationLoggingLayout is extended to check that the Appender 
> is closed.
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



[GitHub] hive pull request #209: HIVE-17154. Fix RAT problem on branch-2.2.

2017-07-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/hive/pull/209




Re: Review Request 61009: Extend object store to store bit vectors

2017-07-21 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61009/#review181146
---




common/src/java/org/apache/hadoop/hive/common/ndv/FMSketch.java
Lines 177 (patched)


Should close the bos stream to avoid a leak.



common/src/java/org/apache/hadoop/hive/common/ndv/FMSketch.java
Lines 184 (patched)


Need to close the is stream.
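
Both comments are about the same pattern; a minimal sketch with
try-with-resources (the byte-array streams stand in for whatever FMSketch
actually writes to):

{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class StreamCloseSketch {
  // try-with-resources closes the stream even if serialization throws.
  static byte[] serialize() throws IOException {
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
      bos.write(42);                 // stand-in for the sketch serialization
      return bos.toByteArray();
    }
  }

  static int deserialize(byte[] bytes) throws IOException {
    try (ByteArrayInputStream is = new ByteArrayInputStream(bytes)) {
      return is.read();              // stand-in for the sketch deserialization
    }
  }
}
{code}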



common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
Lines 27 (patched)


You may use java.util.Base64.RFC2045 encoder/decoder which is present in 
jdk8.
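
A sketch of that suggestion, as a small helper (the class name is a
hypothetical stand-in, not part of the patch):

{code}
import java.util.Base64;

class Base64Sketch {
  // JDK 8 built-in MIME (RFC 2045) codec; no external Base64 library needed.
  static String encode(byte[] bitVector) {
    return Base64.getMimeEncoder().encodeToString(bitVector);
  }

  static byte[] decode(String encoded) {
    return Base64.getMimeDecoder().decode(encoded);
  }
}
{code}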



common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
Lines 38 (patched)


What's the reason for the Base64 encoding? All databases we use support a binary 
type where we can store the bit vector as is. No need to use varchar.



common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
Lines 41 (patched)


Need to close the stream, otherwise file-descriptor leak.



common/src/java/org/apache/hadoop/hive/common/ndv/fm/FMSketchUtils.java
Lines 55 (patched)


Doesn't this mean that it will always be in the ASCII range? If so, is there any 
advantage to doing Base64 encoding on it subsequently?



common/src/java/org/apache/hadoop/hive/common/ndv/fm/FMSketchUtils.java
Lines 89 (patched)


The caller already made this check; no advantage in redoing it.



common/src/java/org/apache/hadoop/hive/common/ndv/fm/FMSketchUtils.java
Lines 90 (patched)


Whitespace.



metastore/scripts/upgrade/derby/044-HIVE-16997.derby.sql
Lines 1 (patched)


May use blob data type instead.



metastore/scripts/upgrade/mssql/029-HIVE-16997.mssql.sql
Lines 1 (patched)


May use varbinary instead.



metastore/scripts/upgrade/mysql/044-HIVE-16997.mysql.sql
Lines 1 (patched)


May use blob type instead.



metastore/scripts/upgrade/oracle/044-HIVE-16997.oracle.sql
Lines 1 (patched)


May use varbinary type instead.



metastore/scripts/upgrade/postgres/043-HIVE-16997.postgres.sql
Lines 1 (patched)


May use bytea type.



metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
Lines 1507-1508 (original)


Seems like we are going to lose this extrapolation logic.



metastore/src/model/package.jdo
Lines 883 (patched)


may use blob type.


- Ashutosh Chauhan


On July 20, 2017, 10:41 p.m., pengcheng xiong wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61009/
> ---
> 
> (Updated July 20, 2017, 10:41 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16997
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/ndv/FMSketch.java e20d29954a 
>   
> common/src/java/org/apache/hadoop/hive/common/ndv/NumDistinctValueEstimatorFactory.java
>  e810ac5487 
>   common/src/java/org/apache/hadoop/hive/common/ndv/fm/FMSketchUtils.java 
> PRE-CREATION 
>   common/src/java/org/apache/hadoop/hive/common/ndv/hll/HyperLogLog.java 
> d1955468a6 
>   
> common/src/test/org/apache/hadoop/hive/common/ndv/fm/TestFMSketchSerialization.java
>  PRE-CREATION 
>   metastore/scripts/upgrade/derby/044-HIVE-16997.derby.sql PRE-CREATION 
>   metastore/scripts/upgrade/derby/hive-schema-3.0.0.derby.sql a9a532906f 
>   metastore/scripts/upgrade/derby/upgrade-2.3.0-to-3.0.0.derby.sql 30513dc882 
>   metastore/scripts/upgrade/mssql/029-HIVE-16997.mssql.sql PRE-CREATION 
>   metastore/scripts/upgrade/mssql/hive-schema-3.0.0.mssql.sql 1cfe2d1b2d 
>   metastore/scripts/upgrade/mssql/upgrade-2.3.0-to-3.0.0.mssql.sql 5683254b04 
>   metastore/scripts/upgrade/mysql/044-HIVE-16997.mysql.sql PRE-CREATION 
>   metastore/scripts/upgrade/mysql/hive-schema-3.0.0.mysql.sql 97d881f263 
>   metastore/scripts/upgrade/mysql/upgrade-2.3.0-to-3.0.0.mysql.sql ba62939809 
>   metastore/scripts/upgrade/oracle/044-HIVE-16997.oracle.sql PRE-CREATION 
>   metastore/scripts/upgrade/oracle/hive-schema-3.0.0.oracle.sql 8fdb552367 

Re: [VOTE] Shall we release Hive 2.2.0 rc0?

2017-07-21 Thread Owen O'Malley
Ok, Alan discovered that there are RAT failures, so I'll roll a new RC.

Thanks,
   Owen

On Thu, Jul 20, 2017 at 1:40 PM, Owen O'Malley 
wrote:

> All,
> The Hive branch-2.2 is passing its tests and I'd like to release it.
>
> Artifacts: https://home.apache.org/~omalley/hive-2.2.0/
> Tag: https://github.com/apache/hive/releases/tag/release-2.2.0rc0 0068fcc
>
> I'm going through the list of issues on Jira today to make sure they match.
>
> Thanks,
> Owen
>


[GitHub] hive pull request #209: HIVE-17154. Fix RAT problem on branch-2.2.

2017-07-21 Thread omalley
GitHub user omalley opened a pull request:

https://github.com/apache/hive/pull/209

HIVE-17154. Fix RAT problem on branch-2.2.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/hive hive-17154

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #209


commit 27b22fe1c99dc0fe1fe30bc0cd1f1188b0b7d299
Author: Owen O'Malley 
Date:   2017-07-21T21:48:00Z

HIVE-17154. Fix RAT problem on branch-2.2.






[jira] [Created] (HIVE-17155) findConfFile() in HiveConf.java has some issues with the conf path

2017-07-21 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-17155:
---

 Summary: findConfFile() in HiveConf.java has some issues with the 
conf path
 Key: HIVE-17155
 URL: https://issues.apache.org/jira/browse/HIVE-17155
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 3.0.0
Reporter: Aihua Xu
Assignee: Aihua Xu
Priority: Minor


In the findConfFile() function of HiveConf.java there are two issues: 
File.pathSeparator, which is ":", is used as the separator rather than 
File.separator ("/"); and new File(jarUri).getParentFile() gets the 
"$hive_home/lib" folder, but we actually want "$hive_home".
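
A two-line illustration of the mix-up (output shown for Unix):

{code}
import java.io.File;

public class SeparatorDemo {
  public static void main(String[] args) {
    System.out.println(File.pathSeparator); // ":"  separates entries in a path list
    System.out.println(File.separator);     // "/"  separates components of one path
  }
}
{code}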





[jira] [Created] (HIVE-17154) fix rat problems in branch-2.2

2017-07-21 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-17154:


 Summary: fix rat problems in branch-2.2
 Key: HIVE-17154
 URL: https://issues.apache.org/jira/browse/HIVE-17154
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Fix rat problems in the branch-2.2.





[jira] [Created] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]

2017-07-21 Thread Sahil Takiar (JIRA)
Sahil Takiar created HIVE-17153:
---

 Summary: Flaky test: 
TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
 Key: HIVE-17153
 URL: https://issues.apache.org/jira/browse/HIVE-17153
 Project: Hive
  Issue Type: Sub-task
  Components: Spark, Test
Reporter: Sahil Takiar
Assignee: Sahil Takiar


{code}
Client Execution succeeded but contained differences (error code = 1) after 
executing spark_dynamic_partition_pruning.q 
3703c3703
<   target work: Map 4
---
>   target work: Map 1
3717c3717
<   target work: Map 1
---
>   target work: Map 4
3746c3746
<   target work: Map 4
---
>   target work: Map 1
3760c3760
<   target work: Map 1
---
>   target work: Map 4
{code}





[jira] [Created] (HIVE-17152) Improve security of random generator for HS2 cookies

2017-07-21 Thread Tao Li (JIRA)
Tao Li created HIVE-17152:
-

 Summary: Improve security of random generator for HS2 cookies
 Key: HIVE-17152
 URL: https://issues.apache.org/jira/browse/HIVE-17152
 Project: Hive
  Issue Type: Bug
Reporter: Tao Li
Assignee: Tao Li


The generated random number is used as a secret that is appended to a sequence and 
SHA-hashed to implement a CookieSigner. If this is attackable, then it's possible 
for an attacker to sign a cookie as if we had. We should fix this and use 
SecureRandom as a stronger random source.

HTTPAuthUtils has a similar issue. If that is attackable, an attacker might be 
able to create a similar cookie. Paired with the above CookieSigner issue, an 
attacker could plausibly spoof an HS2 cookie.
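
A minimal sketch of the intended change (class and method names here are
hypothetical; the point is only the generator choice):

{code}
import java.security.SecureRandom;

public class CookieSecretSketch {
  // SecureRandom is cryptographically strong, unlike java.util.Random,
  // so the signing secret is not predictable from observed outputs.
  public static byte[] newSecret() {
    SecureRandom rng = new SecureRandom();
    byte[] secret = new byte[32];  // 256-bit signing secret
    rng.nextBytes(secret);
    return secret;
  }
}
{code}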





Re: Review Request 61010: HIVE-17128 Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-21 Thread Aihua Xu


> On July 20, 2017, 11:59 p.m., Aihua Xu wrote:
> > service/src/java/org/apache/hive/service/cli/operation/Operation.java
> > Lines 269 (patched)
> > 
> >
> > We only register the Operation log appender once for all the operation 
> > logs at the beginning. Now if you stop the appender here, then the 
> > following queries will not be able to output to the appender any more, 
> > right?
> > 
> > Can you test your patch by: connect to HS2 from one beeline session, 
> > disconnect and reconnect. Then see if you still see output from the beeline 
> > console?
> > 
> > Will we be able to close OutputStream instead?  stopQueryAppender() 
> > should be called when the HS2 service gets shut down.
> 
> Peter Vary wrote:
> Nice catch Andrew!
> 
> One more thing. Could you please do this for the LogDivertAppenderForTest 
> too? It is only used for tests, but it would be good to clean it up too.
> 
> Thanks,
> Peter
> 
> Andrew Sherman wrote:
> @Aihua thanks for the stimulating question. I ran hand tests to prove 
> that logging works for multiple queries in the same session, and also in a 
> new session. The reason the code is OK is that it is not the RoutingAppender 
> that is closed, but the specific Appender for the query. In 
> https://logging.apache.org/log4j/2.x/manual/appenders.html#RoutingAppender 
> this Appender is referred to as a subordinate Appender. I'm updating the code 
> to make this clearer.
> 
> @Peter I will look at LogDivertAppenderForTest to see if I can do the 
> same thing there.

I see. That makes sense. I will take a second look. Thanks for the explanation.


- Aihua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61010/#review181083
---


On July 20, 2017, 10:45 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61010/
> ---
> 
> (Updated July 20, 2017, 10:45 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Previously HIVE-16061 and HIVE-16400 changed Operation Logging to use the 
> Log4j2
> RoutingAppender to automatically output the log for each query into an
> individual operation log file.  As log4j does not know when a query is 
> finished
> it keeps the OutputStream in the Appender open even when the query completes.
> The stream holds a file descriptor and so we leak file descriptors. Note that 
> we
> are already careful to close any streams reading from the operation log file.
> To fix this we use a technique described in the comments of LOG4J2-510 which
> uses reflection to close the appender. The test in TestOperationLoggingLayout 
> is
> extended to check that the Appender is closed.
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingLayout.java
>  1a8337f574bb753e8c3c48a6b477b17700b05256 
>   ql/src/java/org/apache/hadoop/hive/ql/log/LogDivertAppender.java 
> e697b545984555414e27bafe92d7f22829a22687 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 8d453d5d9153c2ec86c4adc7a68bd3b5dd249743 
> 
> 
> Diff: https://reviews.apache.org/r/61010/diff/1/
> 
> 
> Testing
> ---
> 
> Hand testing to show leak has gone.
> The test in TestOperationLoggingLayout is extended to check that the Appender 
> is closed.
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



[jira] [Created] (HIVE-17151) Hive tables are being created in public schema (Postgres DB)

2017-07-21 Thread amarnath reddy pappu (JIRA)
amarnath reddy pappu created HIVE-17151:
---

 Summary: Hive tables are being created in public schema (Postgres DB)
 Key: HIVE-17151
 URL: https://issues.apache.org/jira/browse/HIVE-17151
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: amarnath reddy pappu
Priority: Critical


1. Installing Hive with the "Existing PostgreSQL Database" option creates tables 
in the public schema even when a schema name is specified during the installation 
configuration.
2. Let's say a customer creates a Postgres user with search_path pointing only to 
that specific schema, without any visibility into the public schema; the 
installation would then fail during the insert-queries stage, as the created 
tables are not part of the selected schema. It fails with insert query errors.
3. In 
/var/lib/ambari-server/common-services/HIVE/0.12.0.2.0/package/etc/hive-schema-0.12.0.postgres.sql
 we hard-code the search_path as below, which is causing the issue:
SET search_path = public, pg_catalog;





Re: Review Request 61010: HIVE-17128 Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-21 Thread Andrew Sherman


> On July 20, 2017, 11:59 p.m., Aihua Xu wrote:
> > service/src/java/org/apache/hive/service/cli/operation/Operation.java
> > Lines 269 (patched)
> > 
> >
> > We only register the Operation log appender once for all the operation 
> > logs at the beginning. Now if you stop the appender here, then the 
> > following queries will not be able to output to the appender any more, 
> > right?
> > 
> > Can you test your patch by: connect to HS2 from one beeline session, 
> > disconnect and reconnect. Then see if you still see output from the beeline 
> > console?
> > 
> > Will we be able to close OutputStream instead?  stopQueryAppender() 
> > should be called when the HS2 service gets shut down.
> 
> Peter Vary wrote:
> Nice catch Andrew!
> 
> One more thing. Could you please do this for the LogDivertAppenderForTest 
> too? It is only used for tests, but it would be good to clean it up too.
> 
> Thanks,
> Peter

@Aihua thanks for the stimulating question. I ran hand tests to prove that 
logging works for multiple queries in the same session, and also in a new 
session. The reason the code is OK is that it is not the RoutingAppender that 
is closed, but the specific Appender for the query. In 
https://logging.apache.org/log4j/2.x/manual/appenders.html#RoutingAppender this 
Appender is referred to as a subordinate Appender. I'm updating the code to 
make this clearer.

@Peter I will look at LogDivertAppenderForTest to see if I can do the same 
thing there.


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61010/#review181083
---


On July 20, 2017, 10:45 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61010/
> ---
> 
> (Updated July 20, 2017, 10:45 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Previously HIVE-16061 and HIVE-16400 changed Operation Logging to use the 
> Log4j2
> RoutingAppender to automatically output the log for each query into an
> individual operation log file.  As log4j does not know when a query is 
> finished
> it keeps the OutputStream in the Appender open even when the query completes.
> The stream holds a file descriptor and so we leak file descriptors. Note that 
> we
> are already careful to close any streams reading from the operation log file.
> To fix this we use a technique described in the comments of LOG4J2-510 which
> uses reflection to close the appender. The test in TestOperationLoggingLayout 
> is
> extended to check that the Appender is closed.
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingLayout.java
>  1a8337f574bb753e8c3c48a6b477b17700b05256 
>   ql/src/java/org/apache/hadoop/hive/ql/log/LogDivertAppender.java 
> e697b545984555414e27bafe92d7f22829a22687 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 8d453d5d9153c2ec86c4adc7a68bd3b5dd249743 
> 
> 
> Diff: https://reviews.apache.org/r/61010/diff/1/
> 
> 
> Testing
> ---
> 
> Hand testing to show leak has gone.
> The test in TestOperationLoggingLayout is extended to check that the Appender 
> is closed.
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



Re: [DISCUSS] Separating out the metastore as its own TLP

2017-07-21 Thread Alan Gates
It seems we have settled into a consensus that this will be good for the
ecosystem, but there are concerns that this will be a burden on Hive.  The
original proposal included a time to work on the separation inside the Hive
project to help address any such issues.  This would culminate in a source
only release of the metastore inside Hive.  I propose we start working on
that internal separation now.  I’ll file an umbrella JIRA for it soon.

Alan.

On Tue, Jul 11, 2017 at 3:41 PM, Lefty Leverenz 
wrote:

> >> I'd like to suggest Riven.  (Owen O'Malley)
>
> > How about "Flora"?  (Andrew Sherman)
>
> Nice idea and thanks for introducing me to that book, Andrew.
>
> Along the same lines, how about "Honeycomb"?
>
> But since the idea is to make the metastore useful for many projects, a
> generic name that starts with "Meta" would be less confusing ... even
> though it breaks the tradition of Apache projects having quirky names.
> Unfortunately "Metalog" is already in use.  "Metamorph" has other
> connotations, but it's cool.
>
> Naming enthusiasm notwithstanding, I'm +/-0 on the idea of splitting off
> the metastore into a new project:  -0.5 for the sake of Hive and +0.5 for
> the greater good.  Wishy-washy, that's me.
>
> -- Lefty
>
>
> On Tue, Jul 11, 2017 at 1:04 PM, Andrew Sherman 
> wrote:
>
> > On Fri, Jun 30, 2017 at 5:05 PM, Owen O'Malley 
> > wrote:
> >
> > > On Fri, Jun 30, 2017 at 3:26 PM, Chao Sun  wrote:
> > >
> > > > and maybe a different project name?
> > > >
> > >
> > > Yes, it certainly needs a new name. I'd like to suggest Riven.
> > >
> > > .. Owen
> > >
> >
> > How about "Flora"?
> >
> > (Flora is the protagonist of The Bees by Laline Paull)
> >
> > -Andrew
> >
>


Re: Review Request 61041: HIVE-17150: CREATE INDEX executes HMS out-of-transaction listener calls inside a transaction

2017-07-21 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61041/
---

(Updated July 21, 2017, 5:51 p.m.)


Review request for hive, Alexander Kolbasov, Mohit Sabharwal, and Vihang 
Karajgaonkar.


Changes
---

Fixing HiveQA tests.


Bugs: HIVE-17150
https://issues.apache.org/jira/browse/HIVE-17150


Repository: hive-git


Description
---

The patch adds a new parameter, HIVE_METASTORE_TRANSACTION_ACTIVE, to the 
parameters passed to the notification listeners. This parameter has a 
true/false value depending on whether the HMS is running in a transaction or not.


Diffs (updated)
-

  
hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/DbNotificationListener.java
 6d7ee4cb824ba40537876bd0629831a19ac91d76 
  
hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/MetaStoreEventListenerConstants.java
 a4f2d592ced6427a05177e67be27286c408744a6 
  
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
 b016920fa57501b9f07f1810b2f8010e40575efd 
  
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/TestDbNotificationListener.java
 808c9c7c36fd0ba36adac7be942c4841ab0a08a8 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
58b9044930046758a83ee499692e5593cd82f9e0 
  
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreListenerNotifier.java
 20011ccec83e87a55de7668b86773bb817135cbd 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
8f6af9f346102359c3cd1a6c27000f46e1ddbae6 
  metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 
3ac4fe1604c7b0b455894b8e6293484e9226836e 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseStore.java 
5a45051f4417352f20334319c796dc0ca2d8ad9e 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 bd33c7101f5bc01a482153f810b65b620a061636 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 94cbd5235dce0b49b92203ff92d0d2e29e8b2ca9 


Diff: https://reviews.apache.org/r/61041/diff/2/

Changes: https://reviews.apache.org/r/61041/diff/1-2/


Testing
---


Thanks,

Sergio Pena



Review Request 61041: HIVE-17150: CREATE INDEX executes HMS out-of-transaction listener calls inside a transaction

2017-07-21 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61041/
---

Review request for hive, Alexander Kolbasov, Mohit Sabharwal, and Vihang 
Karajgaonkar.


Bugs: HIVE-17150
https://issues.apache.org/jira/browse/HIVE-17150


Repository: hive-git


Description
---

The patch adds a new parameter, HIVE_METASTORE_TRANSACTION_ACTIVE, to the 
parameters passed to the notification listeners. This parameter has a 
true/false value depending on whether the HMS is running in a transaction or not.


Diffs
-

  
hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/DbNotificationListener.java
 6d7ee4cb824ba40537876bd0629831a19ac91d76 
  
hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/MetaStoreEventListenerConstants.java
 a4f2d592ced6427a05177e67be27286c408744a6 
  
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
 b016920fa57501b9f07f1810b2f8010e40575efd 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
58b9044930046758a83ee499692e5593cd82f9e0 
  
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreListenerNotifier.java
 20011ccec83e87a55de7668b86773bb817135cbd 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 
8f6af9f346102359c3cd1a6c27000f46e1ddbae6 
  metastore/src/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java 
3ac4fe1604c7b0b455894b8e6293484e9226836e 
  metastore/src/java/org/apache/hadoop/hive/metastore/hbase/HBaseStore.java 
5a45051f4417352f20334319c796dc0ca2d8ad9e 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 bd33c7101f5bc01a482153f810b65b620a061636 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 94cbd5235dce0b49b92203ff92d0d2e29e8b2ca9 


Diff: https://reviews.apache.org/r/61041/diff/1/


Testing
---


Thanks,

Sergio Pena



[jira] [Created] (HIVE-17150) CREATE INDEX executes HMS out-of-transaction listener calls inside a transaction

2017-07-21 Thread JIRA
Sergio Peña created HIVE-17150:
--

 Summary: CREATE INDEX executes HMS out-of-transaction listener 
calls inside a transaction
 Key: HIVE-17150
 URL: https://issues.apache.org/jira/browse/HIVE-17150
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 2.3.0
Reporter: Sergio Peña


The problem with CREATE INDEX is that it calls a CREATE TABLE operation inside 
the same CREATE INDEX transaction. During listener calls, there are some 
listeners that should run in an out-of-transaction context; for instance, 
Sentry blocks the HMS operation until the DB log notification is processed, but 
if the transaction has not finished, then the out-of-transaction listener will 
block forever (or until a read timeout happens).

A fix would be to add a parameter to the out-of-transaction listener that 
alerts the listener if HMS is in an active transaction. If so, it is then up to 
the listener plugin to return immediately and avoid blocking the HMS operation.
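
A hypothetical listener-side sketch of that proposal (the parameter name
matches the review request above; the method shape is an assumption, not a
committed API):

{code}
import java.util.Map;

class TransactionAwareListenerSketch {
  // Skip the blocking wait when HMS reports an active transaction.
  void onNotification(Map<String, String> eventParameters) {
    boolean inTransaction = Boolean.parseBoolean(
        eventParameters.get("HIVE_METASTORE_TRANSACTION_ACTIVE"));
    if (inTransaction) {
      return; // HMS has not committed yet; blocking here would hang the operation
    }
    // ... otherwise it is safe to block until the DB log notification is processed ...
  }
}
{code}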





[jira] [Created] (HIVE-17149) Hdfs directory is not cleared if partition creation failed on HMS

2017-07-21 Thread Barna Zsombor Klara (JIRA)
Barna Zsombor Klara created HIVE-17149:
--

 Summary: Hdfs directory is not cleared if partition creation 
failed on HMS
 Key: HIVE-17149
 URL: https://issues.apache.org/jira/browse/HIVE-17149
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.0.0
Reporter: Barna Zsombor Klara
Assignee: Barna Zsombor Klara


Hive#loadPartition will load a directory into a Hive Table Partition. It will 
replace the existing content of the partition with the new contents, and create 
a new partition if one does not exist.
The file move is performed before the partition creation, and if the creation 
fails, the moved files are not cleared.
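
A sketch of the missing cleanup, assuming the move-then-create order described
above (the method names are hypothetical):

{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class LoadPartitionSketch {
  void loadPartition(FileSystem fs, Path src, Path dest) throws Exception {
    fs.rename(src, dest);                // the file move happens first
    try {
      createPartitionInMetastore(dest);  // hypothetical HMS call that may fail
    } catch (Exception e) {
      fs.delete(dest, true);             // clear the moved files on failure
      throw e;
    }
  }

  void createPartitionInMetastore(Path location) throws Exception {
    // ... metastore call elided ...
  }
}
{code}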





[jira] [Created] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-21 Thread Vlad Gudikov (JIRA)
Vlad Gudikov created HIVE-17148:
---

 Summary: Incorrect result for Hive join query with COALESCE in 
WHERE condition
 Key: HIVE-17148
 URL: https://issues.apache.org/jira/browse/HIVE-17148
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 2.1.1
Reporter: Vlad Gudikov


The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo enabled:

STEPS TO REPRODUCE:

{code}
Step 1: Create a table ct1
create table ct1 (a1 string,b1 string);

Step 2: Create a table ct2
create table ct2 (a2 string);

Step 3 : Insert following data into table ct1
insert into table ct1 (a1) values ('1');

Step 4 : Insert following data into table ct2
insert into table ct2 (a2) values ('1');

Step 5 : Execute the following query 
select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
{code}

ACTUAL RESULT:
{code}
The query returns nothing;
{code}

EXPECTED RESULT:
{code}
1   NULL1
{code}

The issue seems to be caused by an incorrect query plan. In the plan we can 
see:
predicate:(a1 is not null and b1 is not null)
which does not look correct. As a result, it filters out all rows where any 
column mentioned in the COALESCE has a null value.
Please find the query plan below:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1
  File Output Operator [FS_10]
Map Join Operator [MAPJOIN_15] (rows=1 width=4)
  
Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
<-Map 2 [BROADCAST_EDGE]
  BROADCAST [RS_7]
PartitionCols:_col0
Select Operator [SEL_5] (rows=1 width=1)
  Output:["_col0"]
  Filter Operator [FIL_14] (rows=1 width=1)
predicate:a2 is not null
TableScan [TS_3] (rows=1 width=1)
  default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
<-Select Operator [SEL_2] (rows=1 width=4)
Output:["_col0","_col1"]
Filter Operator [FIL_13] (rows=1 width=4)
  predicate:(a1 is not null and b1 is not null)
  TableScan [TS_0] (rows=1 width=4)
default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
{code}

This happens only when the join is an inner join; otherwise HiveJoinAddNotNullRule, 
which creates this problem, is skipped.





[GitHub] hive pull request #208: Cdh5 1.1.0 5.12.0

2017-07-21 Thread jason000zhang
GitHub user jason000zhang opened a pull request:

https://github.com/apache/hive/pull/208

Cdh5 1.1.0 5.12.0



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloudera/hive cdh5-1.1.0_5.12.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #208


commit c448cd7d0f1112b2823efd81bff99f1d04d8bff1
Author: Siddharth Seth 
Date:   2016-08-26T22:25:24Z

CDH-48084 : HIVE-14561. Minor ptest2 improvements. (Siddharth Seth, 
reviewed by Prasanth Jayachandran)

Change-Id: I42de32b07f63e0281bb635ffd4e91f1004b49b55

commit 6fad3444ef6fd827414ee8bd577376cf820ce53a
Author: Prasanth Jayachandran 
Date:   2016-09-13T07:50:07Z

CDH-48085 : HIVE-14663: Change ptest java language version to 1.7, other 
version changes and fixes (Siddharth Seth reviewed by Prasanth Jayachandran)

Change-Id: Ida5ec1d107adfd8591a66e199ff3d83b06b03f4a

commit 79b587f1e39b310e5af9856983152c063760b85d
Author: Szehon Ho 
Date:   2015-05-11T06:39:40Z

CDH-48073 : HIVE-10655 : [PTest2] Propagate additionalProfiles flag to the 
source-prep.vm (Szehon, reviewed by Brock)

Change-Id: Iab8d3223b3d821f10f0490c40cea993f043a2f26

commit 0ad5b38447f3264b6bb5844c63908ada73d97b03
Author: Szehon Ho 
Date:   2015-05-05T19:12:39Z

CDH-48072 : HIVE-7375 : Add option in test infra to compile in other 
profiles (like hadoop-1) (Szehon, reviewed by Xuefu and Brock)

Change-Id: Ic2da62b5cd8f94b27dc8c1403c5bb7c1cca47e46

commit 6674aaf1d90076fd2dd238b8d6e4ee1ac999d20a
Author: Siddharth Seth 
Date:   2016-09-19T18:19:05Z

CDH-48090 : HIVE-14781. ptest killall command does not work. (Siddharth 
Seth, reviewed by Prasanth Jayachandran)

Change-Id: I252181258f33a3e99a51b1d7d515e60700ef7ff4

commit 49ce26b13200515c560c17efc4460b27376f15be
Author: Prasanth Jayachandran 
Date:   2016-10-13T21:40:09Z

CDH-48092 : HIVE-14835: Improve ptest2 build time (Prasanth Jayachandran 
reviewed by Sergio Pena)

Change-Id: I9c4ea9524cf3e26f77ca0b5a35a1378df72ff9ba

commit 1b0fea5643d202fc43db45279841097a3e63fda6
Author: Siddharth Seth 
Date:   2016-10-19T20:51:02Z

CDH-48095 : HIVE-15009 : ptest - avoid unnecessary cleanup from previous 
test runs in batch-exec.vm. (Siddharth Seth, reviewed by Sergio Peña)

Change-Id: Iead424659d2ad67bc97d57817d24edb4b1b3a1bc

commit cffcb927a641685a3f82c6302e17238f90d565c5
Author: Siddharth Seth 
Date:   2016-10-27T20:22:33Z

CDH-48096 : HIVE-14887. Reduce the memory used by MiniMr, MiniTez, MiniLlap 
tests. (Siddharth Seth, reviewed by Sergio Peña)

Change-Id: I95664909b8a87c85396cae03cee1b6ffbaea11d9

commit 6a5c4d8fdc08670fbad67608590780d7409edb7a
Author: Vihang Karajgaonkar 
Date:   2016-12-13T20:44:54Z

CLOUDERA-BUILD : CDH-48338 : Increase memory of batch exec to fix OOM errors

Change-Id: Ifff73eb40ecb220c06ea58176a67f66aea94231a

commit e595626fa4fd67c58aae99cb56eb659035a838d8
Author: Vihang Karajgaonkar 
Date:   2017-01-18T17:56:09Z

CLOUDERA-BUILD : Fix build failures

Change-Id: If273f03cd3d73151a7b3920cdfeb710be73afaa2

commit dc4c398783b8cdad6266a6b8e5b6f97e0e074e1b
Author: Alan Gates 
Date:   2015-11-02T23:53:07Z

CDH-49194 : HIVE-11293 HiveConnection.setAutoCommit(true) throws exception 
(Michał Węgrzyn and Alan Gates, reviewed by Thejas Nair)

Change-Id: I0405de8d2d441ad21d17623f7373475d4e529a14

commit 326c7afcb15746e4614464a7f4031243a8a41036
Author: Vihang Karajgaonkar 
Date:   2017-01-18T23:40:56Z

Revert "CDH-48096 : HIVE-14887. Reduce the memory used by MiniMr, MiniTez, 
MiniLlap tests. (Siddharth Seth, reviewed by Sergio Peña)"

This reverts commit cffcb927a641685a3f82c6302e17238f90d565c5.

Change-Id: Ib1da9f0634fc0f05c63615d26d638b8c197f6c9f

commit 674dea7cce827e4a449d9a3bec89e05a5b822cbe
Author: Sergey Shelukhin 
Date:   2016-04-22T17:55:53Z

CDH-48508: HIVE-13240 : GroupByOperator: Drop the hash aggregates when 
closing operator (Gopal V, reviewed by Sergey Shelukhin)

(cherry picked from commit 145e253df9c05e4e725c6aeab172ac0885bf5384)

Change-Id: I8bf90bc6539edc9219097a626065dd1583ccf183

commit 45b0e47959f7603b414ae3d7c169a0118a96dfe2
Author: Aihua Xu 
Date:   2016-09-22T19:46:21Z

CDH-49149: HIVE-14820: RPC server for spark inside HS2 is not getting 
server address properly (Aihua Xu, reviewed by Yongzhi Chen)

(cherry picked from commit 421d97a8d75490ca8ec698ef67f7ed8739e394f8)
   

[jira] [Created] (HIVE-17147) Vectorization: Add code for testing MapJoin operator in isolation and measuring its performance with JMH

2017-07-21 Thread Matt McCline (JIRA)
Matt McCline created HIVE-17147:
---

 Summary: Vectorization: Add code for testing MapJoin operator in 
isolation and measuring its performance with JMH
 Key: HIVE-17147
 URL: https://issues.apache.org/jira/browse/HIVE-17147
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Priority: Critical



Current limitations:

Only one test exists currently, with a single long key.  Need more tests.

The hive-jmh test doesn't handle multiple iterations.  And the number of rows 
and keys being driven through is far too small to be meaningful.
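
For reference, the rough shape of a JMH benchmark that would address both
limitations (a sketch; the operator wiring is elided and all names are
hypothetical):

{code}
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Thread)
@Warmup(iterations = 3)
@Measurement(iterations = 5)  // multiple measured iterations, not just one
@Fork(1)
public class MapJoinLongKeyBench {

  @Param({"1000000"})         // enough rows for the numbers to be meaningful
  public int rowCount;

  @Benchmark
  public void mapJoinOneLongKey() {
    // drive rowCount rows with long keys through the MapJoin operator
  }
}
{code}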





Re: Review Request 61010: HIVE-17128 Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-21 Thread Peter Vary


> On July 20, 2017, 11:59 p.m., Aihua Xu wrote:
> > service/src/java/org/apache/hive/service/cli/operation/Operation.java
> > Lines 269 (patched)
> > 
> >
> > We only register the Operation log appender once for all the operation 
> > logs at the beginning. Now if you stop the appender here, then the 
> > following queries will not be able to output to the appender any more, 
> > right?
> > 
> > Can you test your patch by: connect to HS2 from one beeline session, 
> > disconnect and reconnect. Then see if you still see output from the beeline 
> > console?
> > 
> > Will we be able to close OutputStream instead?  stopQueryAppender() 
> > should be called when the HS2 service gets shut down.

Nice catch Andrew!

One more thing. Could you please do this for the LogDivertAppenderForTest too? 
It is only used for tests, but it would be good to clean it up too.

Thanks,
Peter


- Peter


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61010/#review181083
---


On July 20, 2017, 10:45 p.m., Andrew Sherman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61010/
> ---
> 
> (Updated July 20, 2017, 10:45 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Previously HIVE-16061 and HIVE-16400 changed Operation Logging to use the 
> Log4j2
> RoutingAppender to automatically output the log for each query into an
> individual operation log file.  As log4j does not know when a query is 
> finished
> it keeps the OutputStream in the Appender open even when the query completes.
> The stream holds a file descriptor and so we leak file descriptors. Note that 
> we
> are already careful to close any streams reading from the operation log file.
> To fix this we use a technique described in the comments of LOG4J2-510 which
> uses reflection to close the appender. The test in TestOperationLoggingLayout 
> is
> extended to check that the Appender is closed.
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hive/service/cli/operation/TestOperationLoggingLayout.java
>  1a8337f574bb753e8c3c48a6b477b17700b05256 
>   ql/src/java/org/apache/hadoop/hive/ql/log/LogDivertAppender.java 
> e697b545984555414e27bafe92d7f22829a22687 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 8d453d5d9153c2ec86c4adc7a68bd3b5dd249743 
> 
> 
> Diff: https://reviews.apache.org/r/61010/diff/1/
> 
> 
> Testing
> ---
> 
> Hand testing to show leak has gone.
> The test in TestOperationLoggingLayout is extended to check that the Appender 
> is closed.
> 
> 
> Thanks,
> 
> Andrew Sherman
> 
>



[jira] [Created] (HIVE-17146) Spark on Hive - Exception while joining tables - "Requested replication factor of 10 exceeds maximum of x"

2017-07-21 Thread George Smith (JIRA)
George Smith created HIVE-17146:
---

 Summary: Spark on Hive - Exception while joining tables - 
"Requested replication factor of 10 exceeds maximum of x" 
 Key: HIVE-17146
 URL: https://issues.apache.org/jira/browse/HIVE-17146
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.1.1, 3.0.0
Reporter: George Smith


We found a bug in the current implementation of 
[org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/SparkHashTableSinkOperator.java]

The *magic number 10* for the minReplication factor can cause an exception when 
the configuration parameter _dfs.replication_ is lower than 10.

Consider this [properties 
configuration|https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml]
 on a cluster:
{code}
dfs.namenode.replication.min=1
dfs.replication=2
dfs.replication.max=512 (that's the default value)
{code}
The current implementation computes the target file replication as follows 
(relevant snippets of the code):
{code}
private int minReplication = 10;
...
int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
// minReplication value should not cross the value of dfs.replication.max
minReplication = Math.min(minReplication, dfsMaxReplication);
...
FileSystem fs = path.getFileSystem(htsOperator.getConfiguration());
short replication = fs.getDefaultReplication(path);
...
int numOfPartitions = replication;
replication = (short) Math.max(minReplication, numOfPartitions);
//use replication value in fs.create(path, replication);
{code}


With the current code the replication value used is 10 and the config value 
_dfs.replication_ is not used at all.

There are probably several (easy) ways to fix it:
# Set the field {code}private int minReplication = 1;{code} (I don't see any 
obvious reason for the value 10). Or
# Initialize minReplication from the config value _dfs.namenode.replication.min_ 
with a default value of 1. Or
# Compute the replication this way: {code}replication = Math.min(numOfPartitions, 
dfsMaxReplication);{code} (sketched below). Or
# Use replication = numOfPartitions; directly.
The config value _dfs.replication_ has a default value of 3, which is supposed to 
always be lower than "dfs.replication.max", so no checking is probably needed.
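
A self-contained sketch of option 3 (the 512 default mirrors the
dfs.replication.max default mentioned above):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ReplicationFixSketch {
  // Respect dfs.replication (via getDefaultReplication) instead of the
  // hard-coded minimum of 10, capped by dfs.replication.max.
  static short chooseReplication(FileSystem fs, Path path, Configuration hconf) {
    short replication = fs.getDefaultReplication(path);
    int dfsMaxReplication = hconf.getInt("dfs.replication.max", 512);
    return (short) Math.min(replication, dfsMaxReplication);
  }
}
{code}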



As a *workaround* for this issue we had to set dfs.replication.max=2, but 
obviously the _dfs.replication_ value should NOT be ignored and the problem 
should be resolved.









[jira] [Created] (HIVE-17145) Planner assumes (ROW__ID IS NOT NULL) is true and plans it as a constant true

2017-07-21 Thread Matt McCline (JIRA)
Matt McCline created HIVE-17145:
---

 Summary: Planner assumes (ROW__ID IS NOT NULL) is true and plans 
it as a constant true
 Key: HIVE-17145
 URL: https://issues.apache.org/jira/browse/HIVE-17145
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Priority: Critical


select (ROW__ID IS NOT NULL), key, value from some_table;

produces a wrong result when the ROW__ID column value is null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17144) export of temporary tables not working and it seems to be using distcp rather than filesystem copy

2017-07-21 Thread anishek (JIRA)
anishek created HIVE-17144:
--

 Summary: export of temporary tables not working and it seems to be 
using distcp rather than filesystem copy
 Key: HIVE-17144
 URL: https://issues.apache.org/jira/browse/HIVE-17144
 Project: Hive
  Issue Type: Bug
  Components: Hive, HiveServer2
Affects Versions: 3.0.0
Reporter: anishek
Assignee: anishek
 Fix For: 3.0.0


create temporary table t1 (i int);
insert into t1 values (3);
export table t1 to 'hdfs://somelocation';

The above fails. Additionally, it should use a filesystem copy and not distcp to 
do the job.





HIVE command to get column count

2017-07-21 Thread Jayanthi Reddy
Hi Vineet,

This is regarding "https://issues.apache.org/jira/browse/HIVE-17141".

I have a table with a lot of columns. I want the count of the number of columns
my table has.


Thanks
Jayanthi