[jira] [Updated] (HIVE-5126) Make vector expressions serializable.

2013-08-26 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5126:
---

Attachment: HIVE-5126.1.patch

 Make vector expressions serializable.
 -

 Key: HIVE-5126
 URL: https://issues.apache.org/jira/browse/HIVE-5126
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5126.1.patch


 We should make all vectorized expressions serializable.
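A hedged sketch of what the requirement implies, using illustrative names rather than Hive's actual VectorExpression API: the expression tree must implement java.io.Serializable so a vectorized plan built at compile time can be round-tripped (and thus shipped to map tasks) without losing structure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative stand-in for a vector expression node; field and class
// names are assumptions, not Hive's real API.
class VectorExpr implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;            // e.g. "LongColAddLongScalar" (illustrative)
    final int outputColumn;       // column index the expression writes to
    final VectorExpr[] children;  // child expressions, serialized recursively

    VectorExpr(String name, int outputColumn, VectorExpr... children) {
        this.name = name;
        this.outputColumn = outputColumn;
        this.children = children;
    }
}

public class SerializableExprDemo {
    // Serialize and deserialize an expression tree, as shipping a plan
    // to a task would.
    static VectorExpr roundTrip(VectorExpr e) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(e);
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            return (VectorExpr) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        VectorExpr plan = new VectorExpr("FilterLongColGreaterLongScalar", 2,
                new VectorExpr("LongColAddLongScalar", 1));
        VectorExpr copy = roundTrip(plan);
        if (!copy.name.equals(plan.name) || copy.children.length != 1)
            throw new AssertionError("round trip lost structure");
        System.out.println("ok");
    }
}
```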

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5146) FilterExprOrExpr changes the order of the rows

2013-08-26 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5146:
---

Status: Patch Available  (was: Open)

 FilterExprOrExpr changes the order of the rows
 --

 Key: HIVE-5146
 URL: https://issues.apache.org/jira/browse/HIVE-5146
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5146.1.patch, HIVE-5146.2.patch


 FilterExprOrExpr changes the order of the rows, which might break some UDFs 
 that assume an ordering of the data.



[jira] [Updated] (HIVE-5126) Make vector expressions serializable.

2013-08-26 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5126:
---

Status: Patch Available  (was: Open)

 Make vector expressions serializable.
 -

 Key: HIVE-5126
 URL: https://issues.apache.org/jira/browse/HIVE-5126
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5126.1.patch


 We should make all vectorized expressions serializable.



[jira] [Updated] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.

2013-08-26 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-4959:
---

Attachment: HIVE-4959.1.patch

Patch uploaded.

 Vectorized plan generation should be added as an optimization transform.
 

 Key: HIVE-4959
 URL: https://issues.apache.org/jira/browse/HIVE-4959
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-4959.1.patch


 Currently the query plan is vectorized at the query run time in the map task. 
 It will be much cleaner to add vectorization as an optimization step.
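The proposal above (vectorize once at compile time, as one more pass in the optimizer's pipeline, instead of at run time in the map task) can be sketched as below. The interface and class names are illustrative, not Hive's actual Optimizer/Transform API.

```java
import java.util.ArrayList;
import java.util.List;

// A plan-rewriting pass; the String stands in for a real operator tree.
interface Transform {
    String apply(String plan);
}

public class OptimizerDemo {
    private final List<Transform> transforms = new ArrayList<>();

    void add(Transform t) { transforms.add(t); }

    // Run every registered transform over the plan, in order, at compile time.
    String optimize(String plan) {
        for (Transform t : transforms) plan = t.apply(plan);
        return plan;
    }

    public static void main(String[] args) {
        OptimizerDemo opt = new OptimizerDemo();
        opt.add(p -> p + "->predicate-pushdown");  // an existing pass
        opt.add(p -> p + "->vectorize");           // the proposed new pass
        String out = opt.optimize("plan");
        if (!out.equals("plan->predicate-pushdown->vectorize"))
            throw new AssertionError(out);
        System.out.println("ok");
    }
}
```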



[jira] [Updated] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.

2013-08-26 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-4959:
---

Status: Patch Available  (was: Open)

 Vectorized plan generation should be added as an optimization transform.
 

 Key: HIVE-4959
 URL: https://issues.apache.org/jira/browse/HIVE-4959
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-4959.1.patch


 Currently the query plan is vectorized at the query run time in the map task. 
 It will be much cleaner to add vectorization as an optimization step.



[jira] [Commented] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.

2013-08-26 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749867#comment-13749867
 ] 

Jitendra Nath Pandey commented on HIVE-4959:


This jira requires all vector expressions to be serializable (HIVE-5126). 
This jira also requires HIVE-5146 for some of the tests to work.

 Vectorized plan generation should be added as an optimization transform.
 

 Key: HIVE-4959
 URL: https://issues.apache.org/jira/browse/HIVE-4959
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-4959.1.patch


 Currently the query plan is vectorized at the query run time in the map task. 
 It will be much cleaner to add vectorization as an optimization step.



[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage

2013-08-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3562:
--

Attachment: HIVE-3562.D5967.7.patch

navis updated the revision HIVE-3562 [jira] Some limit can be pushed down to 
map stage.

  Addressed some comments

Reviewers: ashutoshc, JIRA, tarball

REVISION DETAIL
  https://reviews.facebook.net/D5967

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D5967?vs=38379&id=39009#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/build.xml
  ql/ivy.xml
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/limit_pushdown.q
  ql/src/test/queries/clientpositive/limit_pushdown_negative.q
  ql/src/test/results/clientpositive/limit_pushdown.q.out
  ql/src/test/results/clientpositive/limit_pushdown_negative.q.out

To: JIRA, tarball, ashutoshc, navis
Cc: njain


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch


 A query with a limit clause (with a reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 makes the operator tree
 TS-SEL-RS-EXT-LIMIT-FS
 But the LIMIT can be partially calculated in the RS, reducing the size of the shuffle:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
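The RS(TOP-N) idea can be sketched with a bounded heap at the map-side ReduceSink: only the N smallest keys seen so far are retained, so at most N rows per mapper are shuffled for an ORDER BY ... LIMIT N query. This is an illustration of the technique, not Hive's actual TopNHash implementation.

```java
import java.util.PriorityQueue;

public class TopNDemo {
    // Keep only the n smallest keys; everything else is dropped before shuffle.
    static int[] topN(int[] keys, int n) {
        // Max-heap of size <= n over the keys seen so far.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(b, a));
        for (int k : keys) {
            if (heap.size() < n) heap.add(k);
            else if (k < heap.peek()) { heap.poll(); heap.add(k); }
        }
        // Drain the heap largest-first to produce ascending output.
        int[] out = new int[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) out[i] = heap.poll();
        return out;
    }

    public static void main(String[] args) {
        int[] r = topN(new int[]{5, 1, 9, 3, 7, 2}, 3);
        if (r.length != 3 || r[0] != 1 || r[1] != 2 || r[2] != 3)
            throw new AssertionError();
        System.out.println("ok");
    }
}
```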



[jira] [Updated] (HIVE-5132) Can't access to hwi due to No Java compiler available

2013-08-26 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-5132:
--

Assignee: Bing Li

 Can't access to hwi due to No Java compiler available
 ---

 Key: HIVE-5132
 URL: https://issues.apache.org/jira/browse/HIVE-5132
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
 Environment: JDK1.6, hadoop 2.0.4-alpha
Reporter: Bing Li
Assignee: Bing Li
Priority: Critical

 I want to use hwi to submit hive queries, but after starting hwi successfully, I 
 can't open its web page.
 I noticed that someone also met the same issue in hive-0.10.
 Reproduce steps:
 --
 1. start hwi
 bin/hive --config $HIVE_CONF_DIR --service hwi
 2. access to http://hive_hwi_node:/hwi via browser
 got the following error message:
 HTTP ERROR 500
 Problem accessing /hwi/. Reason: 
 No Java compiler available
 Caused by:
 java.lang.IllegalStateException: No Java compiler available
   at 
 org.apache.jasper.JspCompilationContext.createCompiler(JspCompilationContext.java:225)
   at 
 org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:560)
   at 
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:299)
   at 
 org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315)
   at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
   at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
   at 
 org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:503)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at 
 org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



[jira] [Commented] (HIVE-5132) Can't access to hwi due to No Java compiler available

2013-08-26 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749963#comment-13749963
 ] 

Bing Li commented on HIVE-5132:
---

The root cause of this failure is that ANT_LIB is not set for the hwi server.

But I can resolve the failure by copying the following two ant jars into 
$HIVE_HOME/lib:
- ant-launcher.jar
- ant.jar

I think we can add ant as a runtime dependency of Hive.
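As a sketch of that workaround in shell form; the ANT_HOME and HIVE_HOME defaults below are assumptions, so adjust them for the actual install:

```shell
# Hypothetical workaround: copy the two ant jars the JSP compiler needs
# into Hive's lib directory. Paths are illustrative defaults.
ANT_HOME=${ANT_HOME:-/usr/share/ant}
HIVE_HOME=${HIVE_HOME:-/opt/hive}

for jar in ant.jar ant-launcher.jar; do
  if [ -f "$ANT_HOME/lib/$jar" ]; then
    cp "$ANT_HOME/lib/$jar" "$HIVE_HOME/lib/"
  else
    echo "not found: $ANT_HOME/lib/$jar" >&2
  fi
done
```

Restart hwi afterwards so Jetty/Jasper picks up the compiler classes.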

 Can't access to hwi due to No Java compiler available
 ---

 Key: HIVE-5132
 URL: https://issues.apache.org/jira/browse/HIVE-5132
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
 Environment: JDK1.6, hadoop 2.0.4-alpha
Reporter: Bing Li
Assignee: Bing Li
Priority: Critical

 I want to use hwi to submit hive queries, but after starting hwi successfully, I 
 can't open its web page.
 I noticed that someone also met the same issue in hive-0.10.
 Reproduce steps:
 --
 1. start hwi
 bin/hive --config $HIVE_CONF_DIR --service hwi
 2. access to http://hive_hwi_node:/hwi via browser
 got the following error message:
 HTTP ERROR 500
 Problem accessing /hwi/. Reason: 
 No Java compiler available
 Caused by:
 java.lang.IllegalStateException: No Java compiler available
   at 
 org.apache.jasper.JspCompilationContext.createCompiler(JspCompilationContext.java:225)
   at 
 org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:560)
   at 
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:299)
   at 
 org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315)
   at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
   at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
   at 
 org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:503)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at 
 org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage

2013-08-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3562:
--

Attachment: HIVE-3562.D5967.8.patch

navis updated the revision HIVE-3562 [jira] Some limit can be pushed down to 
map stage.

  Missed ASF header
  Removed unnecessary array creation in RS

Reviewers: ashutoshc, JIRA, tarball

REVISION DETAIL
  https://reviews.facebook.net/D5967

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D5967?vs=39009&id=39015#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/build.xml
  ql/ivy.xml
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/limit_pushdown.q
  ql/src/test/queries/clientpositive/limit_pushdown_negative.q
  ql/src/test/results/clientpositive/limit_pushdown.q.out
  ql/src/test/results/clientpositive/limit_pushdown_negative.q.out

To: JIRA, tarball, ashutoshc, navis
Cc: njain


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch


 A query with a limit clause (with a reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 makes the operator tree
 TS-SEL-RS-EXT-LIMIT-FS
 But the LIMIT can be partially calculated in the RS, reducing the size of the shuffle:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS



[jira] [Assigned] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk

2013-08-26 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis reassigned HIVE-5147:
---

Assignee: Navis

 Newly added test TestSessionHooks is failing on trunk
 -

 Key: HIVE-5147
 URL: https://issues.apache.org/jira/browse/HIVE-5147
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Navis

 This was recently added via HIVE-4588



[jira] [Commented] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk

2013-08-26 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750010#comment-13750010
 ] 

Navis commented on HIVE-5147:
-

Sorry, I've committed the wrong version of the patch, one which I had modified. I'll 
roll that back.

 Newly added test TestSessionHooks is failing on trunk
 -

 Key: HIVE-5147
 URL: https://issues.apache.org/jira/browse/HIVE-5147
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Navis

 This was recently added via HIVE-4588



[jira] [Updated] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk

2013-08-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5147:
--

Attachment: HIVE-5147.D12543.1.patch

navis requested code review of HIVE-5147 [jira] Newly added test 
TestSessionHooks is failing on trunk.

Reviewers: JIRA

HIVE-5147 Newly added test TestSessionHooks is failing on trunk

This was recently added via HIVE-4588

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12543

AFFECTED FILES
  
service/src/java/org/apache/hive/service/cli/session/HiveSessionHookContext.java
  
service/src/java/org/apache/hive/service/cli/session/HiveSessionHookContextImpl.java
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/30087/

To: JIRA, navis


 Newly added test TestSessionHooks is failing on trunk
 -

 Key: HIVE-5147
 URL: https://issues.apache.org/jira/browse/HIVE-5147
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Navis
 Attachments: HIVE-5147.D12543.1.patch


 This was recently added via HIVE-4588



Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

2013-08-26 Thread Yin Huai
Forgot to add in my last reply: to generate correct results, you can
set hive.optimize.reducededuplication to false to turn off
ReduceSinkDeDuplication.
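That workaround, written out as a session-level setting (the property name comes from the message above; the query that follows is whatever GROUP BY/DISTRIBUTE BY statement was misbehaving):

```sql
-- Disable ReduceSinkDeDuplication for the current session only,
-- then run the affected query.
set hive.optimize.reducededuplication=false;

-- ... affected GROUP BY / DISTRIBUTE BY / SORT BY query here ...
```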


On Sun, Aug 25, 2013 at 9:35 PM, Yin Huai huaiyin@gmail.com wrote:

 Created a jira https://issues.apache.org/jira/browse/HIVE-5149


 On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai huaiyin@gmail.com wrote:

 Seems ReduceSinkDeDuplication picked the wrong partitioning columns.


 On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP s...@rocketfuel.com wrote:

 I think the problem lies within the group-by operation. For this
 optimization to work, the group by's partitioning should be on column
 1 only.

 It won't affect the correctness of the group by; it can make it slower, but in
 this case it will speed up the overall query performance.


 On Fri, Aug 23, 2013 at 5:55 PM, Pala M Muthaia 
 mchett...@rocketfuelinc.com wrote:

 I have attached the hive 10 and 11 query plans, for the sample query
 below, for illustration.


 On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia 
 mchett...@rocketfuelinc.com wrote:

 Hi,

 We are using DISTRIBUTE BY with custom reducer scripts in our query
 workload.

 After upgrading to Hive 0.11, queries with GROUP BY/DISTRIBUTE BY/SORT
 BY and custom reducer scripts produced incorrect results. In particular,
 rows with the same value in the DISTRIBUTE BY column end up in multiple reducers
 and thus produce multiple rows in the final result, when we expect only one.

 I investigated a little bit and discovered the following behavior for
 Hive 0.11:

 - Hive 0.11 produces a different plan for these queries with incorrect
 results. The extra stage for the DISTRIBUTE BY + Transform is missing and
 the Transform operator for the custom reducer script is pushed into the
 reduce operator tree containing GROUP BY itself.

 - However, *if the SORT BY in the query has a DESC order in it*, the
 right plan is produced, and the results look correct too.

 Hive 0.10 produces the expected plan with right results in all cases.


 To illustrate, here is a simplified repro setup:

 Table:

 CREATE TABLE test_cluster (grp STRING, val1 STRING, val2 INT, val3
 STRING, val4 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES
 TERMINATED BY '\n' STORED AS TEXTFILE;

 Query:

 ADD FILE reducer.py;

 FROM(
   SELECT grp, val2
   FROM test_cluster
   GROUP BY grp, val2
   DISTRIBUTE BY grp
   SORT BY grp, val2  -- add DESC here to get correct results
 ) a

 REDUCE a.*
 USING 'reducer.py'
 AS grp, reducedValue


 If I understand correctly, this is a bug. Is this a known issue? Any
 other insights? We have reverted to Hive 0.10 to avoid the incorrect
 results while we investigate this.

 I have the repro sample, with test data and scripts, if anybody is
 interested.



 Thanks,
 pala








[jira] [Updated] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE

2013-08-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4375:
--

Attachment: HIVE-4375.D10329.3.patch

navis updated the revision HIVE-4375 [jira] Single sourced multi insert 
consists of native and non-native table mixed throws NPE.

  Missed to update this

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D10329

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10329?vs=32775&id=39027#toc

BRANCH
  HIVE-4375

ARCANIST PROJECT
  hive

AFFECTED FILES
  hbase-handler/src/test/queries/positive/hbase_single_sourced_multi_insert.q
  
hbase-handler/src/test/results/positive/hbase_single_sourced_multi_insert.q.out
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java

To: JIRA, ashutoshc, navis
Cc: njain


 Single sourced multi insert consists of native and non-native table mixed 
 throws NPE
 

 Key: HIVE-4375
 URL: https://issues.apache.org/jira/browse/HIVE-4375
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch, 
 HIVE-4375.D10329.3.patch


 CREATE TABLE src_x1(key string, value string);
 CREATE TABLE src_x2(key string, value string)
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string");
 explain
 from src a
 insert overwrite table src_x1
 select key,value where a.key > 0 AND a.key < 50
 insert overwrite table src_x2
 select key,value where a.key > 50 AND a.key < 100;
 throws,
 {noformat}
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236)
   at 
 org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 {noformat}



How to validate data type in Hive

2013-08-26 Thread Puneet Khatod
Hi,

I have a requirement to validate the data type of the values present in my flat 
file (which is the source for my hive table). I am unable to find any hive 
feature/function which would do that.
Is there any way to validate the data type of the values present in the underlying 
file? Something like BCP (bulk copy program), as used in SQL Server.

Please reply; my whole project is stuck due to this issue.

Thanks,
Puneet

From: Yin Huai [mailto:huaiyin@gmail.com]
Sent: Monday, August 26, 2013 5:10 PM
To: u...@hive.apache.org
Cc: dev; Eric Chu
Subject: Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

forgot to add in my last reply To generate correct results, you can set 
hive.optimize.reducededuplication to false to turn off ReduceSinkDeDuplication

On Sun, Aug 25, 2013 at 9:35 PM, Yin Huai 
huaiyin@gmail.commailto:huaiyin@gmail.com wrote:
Created a jira https://issues.apache.org/jira/browse/HIVE-5149

On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai 
huaiyin@gmail.commailto:huaiyin@gmail.com wrote:
Seems ReduceSinkDeDuplication picked the wrong partitioning columns.

On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP 
s...@rocketfuel.commailto:s...@rocketfuel.com wrote:
I think the problem lies with in the group by operation. For this optimization 
to work the group bys partitioning should be on the column 1 only.

It wont effect the correctness of group by, can make it slow but int this case 
will fasten the overall query performance.

On Fri, Aug 23, 2013 at 5:55 PM, Pala M Muthaia 
mchett...@rocketfuelinc.commailto:mchett...@rocketfuelinc.com wrote:
I have attached the hive 10 and 11 query plans, for the sample query below, for 
illustration.

On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia 
mchett...@rocketfuelinc.commailto:mchett...@rocketfuelinc.com wrote:
Hi,

We are using DISTRIBUTE BY with custom reducer scripts in our query workload.

After upgrade to Hive 0.11, queries with GROUP BY/DISTRIBUTE BY/SORT BY and 
custom reducer scripts produced incorrect results. Particularly, rows with same 
value on DISTRIBUTE BY column ends up in multiple reducers and thus produce 
multiple rows in final result, when we expect only one.

I investigated a little bit and discovered the following behavior for Hive 0.11:

- Hive 0.11 produces a different plan for these queries, with incorrect results. 
The extra stage for the DISTRIBUTE BY + Transform is missing, and the Transform 
operator for the custom reducer script is pushed into the reduce operator tree 
containing the GROUP BY itself.

- However, if the SORT BY in the query has a DESC order in it, the right plan 
is produced, and the results look correct too.

Hive 0.10 produces the expected plan with right results in all cases.


To illustrate, here is a simplified repro setup:

Table:

CREATE TABLE test_cluster (grp STRING, val1 STRING, val2 INT, val3 STRING, val4 
INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' 
STORED AS TEXTFILE;

Query:

ADD FILE reducer.py;

FROM(
  SELECT grp, val2
  FROM test_cluster
  GROUP BY grp, val2
  DISTRIBUTE BY grp
  SORT BY grp, val2  -- add DESC here to get correct results
) a

REDUCE a.*
USING 'reducer.py'
AS grp, reducedValue
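
The actual contents of reducer.py are not shown above, so the following is only a 
rough illustration (sketched in Java rather than the repro's Python) of the kind of 
reducer this query shape implies: it counts rows per grp and assumes that every row 
for a given grp reaches the same reducer, which is exactly the assumption the bad 
plan violates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GroupReducer {
    // Collapses consecutive rows with the same grp (first tab-separated
    // column) into one output row "grp<TAB>count". Hive streams each
    // reducer's rows sorted by the SORT BY columns, so equal grp values
    // arrive adjacently -- but only if they all land on one reducer.
    static List<String> reduce(List<String> rows) {
        List<String> out = new ArrayList<>();
        String currentGrp = null;
        long count = 0;
        for (String row : rows) {
            String grp = row.split("\t", -1)[0];
            if (!grp.equals(currentGrp)) {
                if (currentGrp != null) {
                    out.add(currentGrp + "\t" + count);
                }
                currentGrp = grp;
                count = 0;
            }
            count++;
        }
        if (currentGrp != null) {
            out.add(currentGrp + "\t" + count);
        }
        return out;
    }

    public static void main(String[] args) {
        // One reducer sees all rows for grp "a": exactly one output row.
        System.out.println(reduce(Arrays.asList("a\t1", "a\t2", "b\t3")));
    }
}
```

If the plan instead splits rows for one grp across two reducers, each reducer emits 
its own partial row for that grp, producing the duplicate rows described below.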


If I understand correctly, this is a bug. Is this a known issue? Any other 
insights? We have reverted to Hive 0.10 to avoid the incorrect results while we 
investigate this.

I have the repro sample, with test data and scripts, if anybody is interested.



Thanks,
pala








[jira] [Updated] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()

2013-08-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5100:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Gopal!

  RCFile::sync(long)  missing 1 byte in System.arraycopy()
 -

 Key: HIVE-5100
 URL: https://issues.apache.org/jira/browse/HIVE-5100
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: tagus wang
Assignee: Gopal V
  Labels: regression
 Fix For: 0.12.0

 Attachments: HIVE-5100-001.patch, HIVE-5100.01.patch


 There is a bug here:
 System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);
 it should be:
 System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);
 it is missing 1 byte at the end.
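
The off-by-one is easy to demonstrate in isolation. This standalone sketch (not 
RCFile's actual code) moves the last `prefix` bytes of a buffer to its front and 
compares the correct start offset with the buggy one:

```java
import java.util.Arrays;

public class SyncPrefixDemo {
    // Copies `prefix` bytes starting at `srcOffset` to the front of a
    // clone of `buffer`, the way RCFile's sync scan recycles the tail of
    // its read buffer. The offset parameter lets us compare both variants.
    static byte[] shiftTail(byte[] buffer, int srcOffset, int prefix) {
        byte[] copy = buffer.clone();
        System.arraycopy(copy, srcOffset, copy, 0, prefix);
        return copy;
    }

    public static void main(String[] args) {
        byte[] buffer = {10, 20, 30, 40, 50};
        int prefix = 2;

        // Correct: start at length - prefix, so the front becomes {40, 50}.
        byte[] ok = shiftTail(buffer, buffer.length - prefix, prefix);

        // Buggy: start one byte early, so the front becomes {30, 40} and
        // the final byte (50) is never carried over.
        byte[] bad = shiftTail(buffer, buffer.length - prefix - 1, prefix);

        System.out.println(Arrays.toString(ok));  // [40, 50, 30, 40, 50]
        System.out.println(Arrays.toString(bad)); // [30, 40, 30, 40, 50]
    }
}
```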

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()

2013-08-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750120#comment-13750120
 ] 

Hudson commented on HIVE-5100:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #382 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/382/])
HIVE-5100 :  RCFile::sync(long)  missing 1 byte in System.arraycopy() (Gopal V 
via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517547)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java


  RCFile::sync(long)  missing 1 byte in System.arraycopy()
 -

 Key: HIVE-5100
 URL: https://issues.apache.org/jira/browse/HIVE-5100
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: tagus wang
Assignee: Gopal V
  Labels: regression
 Fix For: 0.12.0

 Attachments: HIVE-5100-001.patch, HIVE-5100.01.patch


 There is a bug here:
 System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);
 it should be:
 System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);
 it is missing 1 byte at the end.



[jira] [Commented] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk

2013-08-26 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750115#comment-13750115
 ] 

Phabricator commented on HIVE-5147:
---

ashutoshc has accepted the revision HIVE-5147 [jira] Newly added test 
TestSessionHooks is failing on trunk.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D12543

BRANCH
  HIVE-5147

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, navis


 Newly added test TestSessionHooks is failing on trunk
 -

 Key: HIVE-5147
 URL: https://issues.apache.org/jira/browse/HIVE-5147
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Navis
 Attachments: HIVE-5147.D12543.1.patch


 This was recently added via HIVE-4588



[jira] [Updated] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

2013-08-26 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-5144:
--

Attachment: HIVE-5144.02.patch

Bad merge in patch.

{code}
-if((hasFilter(alias) && joinFilters[alias].size() > 0) || joinValues[alias]
+if((hasFilter(alias) && filterMaps[alias].length > 0) || joinValues[alias].
{code}

The check is supposed to be on filterMaps not joinFilters.

This fixes test-failures found in the last run.

 HashTableSink allocates empty new Object[] arrays & OOMs - use a static 
 emptyRow instead
 

 Key: HIVE-5144
 URL: https://issues.apache.org/jira/browse/HIVE-5144
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Ubuntu LXC + -Xmx512m client opts
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
  Labels: perfomance
 Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch


 The map-join hashtable sink in the local-task creates an in-memory hashtable 
 with the following code.
 {code}
  Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias],
 ...
  MapJoinRowContainer rowContainer = tableContainer.get(key);
 if (rowContainer == null) {
   rowContainer = new MapJoinRowContainer();
   rowContainer.add(value);
 {code}
 But for a query where joinValues[alias].size() == 0, this results in a 
 large number of unnecessary allocations, which would be better served by a 
 copy-on-write default value container & a pre-allocated zero-length object 
 array, which is immutable (the only immutable array there is in Java).
 The query tested is roughly the following to scan all of 
 customer_demographics in the hash-sink
 {code}
 select c_salutation, count(1)
  from customer
   JOIN customer_demographics ON customer.c_current_cdemo_sk = 
 customer_demographics.cd_demo_sk
  group by c_salutation
  limit 10
 ;
 {code}
 When running with current trunk, the code results in an OOM with 512Mb ram.
 {code}
 2013-08-23 05:11:26   Processing rows:140 Hashtable size: 139 
 Memory usage:   292418944   percentage: 0.579
 Execution failed with exit status: 3
 Obtaining error information
 {code}
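
The fix the summary suggests can be sketched as follows; the names here are 
illustrative stand-ins, not Hive's actual fields or the attached patch. A single 
shared zero-length Object[] replaces the per-row allocation whenever there are no 
value columns:

```java
import java.util.List;

class RowValues {
    // A zero-length array is the one array that is safe to share:
    // it has no elements, so nothing about it can be mutated.
    static final Object[] EMPTY_ROW = new Object[0];

    // Illustrative stand-in for JoinUtil.computeMapJoinValues: returns the
    // value columns for a row, reusing the shared empty array when there
    // are none instead of allocating a fresh new Object[0] per row.
    static Object[] computeValues(List<Object> row, int numValueCols) {
        if (numValueCols == 0) {
            return EMPTY_ROW; // no per-row allocation
        }
        Object[] out = new Object[numValueCols];
        for (int i = 0; i < numValueCols; i++) {
            out[i] = row.get(i);
        }
        return out;
    }
}
```

For a hash-sink loading millions of rows with no value columns, this turns millions 
of short-lived allocations into a single shared constant.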



[jira] [Updated] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

2013-08-26 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-5144:
--

Status: Patch Available  (was: Open)

 HashTableSink allocates empty new Object[] arrays & OOMs - use a static 
 emptyRow instead
 

 Key: HIVE-5144
 URL: https://issues.apache.org/jira/browse/HIVE-5144
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Ubuntu LXC + -Xmx512m client opts
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
  Labels: perfomance
 Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch


 The map-join hashtable sink in the local-task creates an in-memory hashtable 
 with the following code.
 {code}
  Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias],
 ...
  MapJoinRowContainer rowContainer = tableContainer.get(key);
 if (rowContainer == null) {
   rowContainer = new MapJoinRowContainer();
   rowContainer.add(value);
 {code}
 But for a query where joinValues[alias].size() == 0, this results in a 
 large number of unnecessary allocations, which would be better served by a 
 copy-on-write default value container & a pre-allocated zero-length object 
 array, which is immutable (the only immutable array there is in Java).
 The query tested is roughly the following to scan all of 
 customer_demographics in the hash-sink
 {code}
 select c_salutation, count(1)
  from customer
   JOIN customer_demographics ON customer.c_current_cdemo_sk = 
 customer_demographics.cd_demo_sk
  group by c_salutation
  limit 10
 ;
 {code}
 When running with current trunk, the code results in an OOM with 512Mb ram.
 {code}
 2013-08-23 05:11:26   Processing rows:140 Hashtable size: 139 
 Memory usage:   292418944   percentage: 0.579
 Execution failed with exit status: 3
 Obtaining error information
 {code}



[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead

2013-08-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750182#comment-13750182
 ] 

Ashutosh Chauhan commented on HIVE-5144:


+1

 HashTableSink allocates empty new Object[] arrays & OOMs - use a static 
 emptyRow instead
 

 Key: HIVE-5144
 URL: https://issues.apache.org/jira/browse/HIVE-5144
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
 Environment: Ubuntu LXC + -Xmx512m client opts
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
  Labels: perfomance
 Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch


 The map-join hashtable sink in the local-task creates an in-memory hashtable 
 with the following code.
 {code}
  Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias],
 ...
  MapJoinRowContainer rowContainer = tableContainer.get(key);
 if (rowContainer == null) {
   rowContainer = new MapJoinRowContainer();
   rowContainer.add(value);
 {code}
 But for a query where joinValues[alias].size() == 0, this results in a 
 large number of unnecessary allocations, which would be better served by a 
 copy-on-write default value container & a pre-allocated zero-length object 
 array, which is immutable (the only immutable array there is in Java).
 The query tested is roughly the following to scan all of 
 customer_demographics in the hash-sink
 {code}
 select c_salutation, count(1)
  from customer
   JOIN customer_demographics ON customer.c_current_cdemo_sk = 
 customer_demographics.cd_demo_sk
  group by c_salutation
  limit 10
 ;
 {code}
 When running with current trunk, the code results in an OOM with 512Mb ram.
 {code}
 2013-08-23 05:11:26   Processing rows:140 Hashtable size: 139 
 Memory usage:   292418944   percentage: 0.579
 Execution failed with exit status: 3
 Obtaining error information
 {code}



[jira] [Commented] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()

2013-08-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750194#comment-13750194
 ] 

Hudson commented on HIVE-5100:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #70 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/70/])
HIVE-5100 :  RCFile::sync(long)  missing 1 byte in System.arraycopy() (Gopal V 
via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517547)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java


  RCFile::sync(long)  missing 1 byte in System.arraycopy()
 -

 Key: HIVE-5100
 URL: https://issues.apache.org/jira/browse/HIVE-5100
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: tagus wang
Assignee: Gopal V
  Labels: regression
 Fix For: 0.12.0

 Attachments: HIVE-5100-001.patch, HIVE-5100.01.patch


 There is a bug here:
 System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);
 it should be:
 System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);
 it is missing 1 byte at the end.



[jira] [Updated] (HIVE-5132) Can't access to hwi due to No Java compiler available

2013-08-26 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-5132:
--

Attachment: HIVE-5132-01.patch

Add ant.jar and ant-launcher.jar as runtime dependencies of Hive.

 Can't access to hwi due to No Java compiler available
 ---

 Key: HIVE-5132
 URL: https://issues.apache.org/jira/browse/HIVE-5132
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
 Environment: JDK1.6, hadoop 2.0.4-alpha
Reporter: Bing Li
Assignee: Bing Li
Priority: Critical
 Attachments: HIVE-5132-01.patch


 I want to use HWI to submit Hive queries, but after starting HWI successfully, I 
 can't open its web page.
 I noticed that someone also met the same issue in Hive 0.10.
 Reproduce steps:
 --
 1. start hwi
 bin/hive --config $HIVE_CONF_DIR --service hwi
 2. access to http://hive_hwi_node:/hwi via browser
 got the following error message:
 HTTP ERROR 500
 Problem accessing /hwi/. Reason: 
 No Java compiler available
 Caused by:
 java.lang.IllegalStateException: No Java compiler available
   at 
 org.apache.jasper.JspCompilationContext.createCompiler(JspCompilationContext.java:225)
   at 
 org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:560)
   at 
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:299)
   at 
 org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315)
   at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
   at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
   at 
 org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:503)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at 
 org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



[jira] [Updated] (HIVE-5132) Can't access to hwi due to No Java compiler available

2013-08-26 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-5132:
--

Status: Patch Available  (was: Open)

The patch is generated against the latest trunk.

 Can't access to hwi due to No Java compiler available
 ---

 Key: HIVE-5132
 URL: https://issues.apache.org/jira/browse/HIVE-5132
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0, 0.10.0
 Environment: JDK1.6, hadoop 2.0.4-alpha
Reporter: Bing Li
Assignee: Bing Li
Priority: Critical
 Attachments: HIVE-5132-01.patch


 I want to use HWI to submit Hive queries, but after starting HWI successfully, I 
 can't open its web page.
 I noticed that someone also met the same issue in Hive 0.10.
 Reproduce steps:
 --
 1. start hwi
 bin/hive --config $HIVE_CONF_DIR --service hwi
 2. access to http://hive_hwi_node:/hwi via browser
 got the following error message:
 HTTP ERROR 500
 Problem accessing /hwi/. Reason: 
 No Java compiler available
 Caused by:
 java.lang.IllegalStateException: No Java compiler available
   at 
 org.apache.jasper.JspCompilationContext.createCompiler(JspCompilationContext.java:225)
   at 
 org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:560)
   at 
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:299)
   at 
 org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315)
   at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
   at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
   at 
 org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:503)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at 
 org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)



[jira] [Commented] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()

2013-08-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750235#comment-13750235
 ] 

Hudson commented on HIVE-5100:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #138 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/138/])
HIVE-5100 :  RCFile::sync(long)  missing 1 byte in System.arraycopy() (Gopal V 
via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517547)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java


  RCFile::sync(long)  missing 1 byte in System.arraycopy()
 -

 Key: HIVE-5100
 URL: https://issues.apache.org/jira/browse/HIVE-5100
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: tagus wang
Assignee: Gopal V
  Labels: regression
 Fix For: 0.12.0

 Attachments: HIVE-5100-001.patch, HIVE-5100.01.patch


 There is a bug here:
 System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);
 it should be:
 System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);
 it is missing 1 byte at the end.



[jira] [Commented] (HIVE-4824) make TestWebHCatE2e run w/o requiring installing external hadoop

2013-08-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750226#comment-13750226
 ] 

Eugene Koifman commented on HIVE-4824:
--

Another possibility is to just call HCatCli directly from WebHCat - that would 
simplify the architecture and dramatically improve the performance of DDL 
operations. One possible issue here is concurrency - Hive code is not completely 
thread-safe. We could use a new ClassLoader for each call to HCatCli - this would 
work around the concurrency issues and would still be a good step forward.

 make TestWebHCatE2e run w/o requiring installing external hadoop
 

 Key: HIVE-4824
 URL: https://issues.apache.org/jira/browse/HIVE-4824
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman

 Currently WebHCat will use hive/build/dist/hcatalog/bin/hcat to execute DDL 
 commands, which in turn uses Hadoop Jar command.
 This in turn requires that HADOOP_HOME env var be defined and point to an 
 existing Hadoop install.  
 Need to see if we can apply the hive/testutils/hadoop idea here to make WebHCat not 
 depend on external hadoop.
 This will make Unit tests better/easier to write and make dev/test cycle 
 simpler.



[jira] [Commented] (HIVE-5146) FilterExprOrExpr changes the order of the rows

2013-08-26 Thread Tony Murphy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750248#comment-13750248
 ] 

Tony Murphy commented on HIVE-5146:
---

+1 these changes look good. Thanks Jitendra.

 FilterExprOrExpr changes the order of the rows
 --

 Key: HIVE-5146
 URL: https://issues.apache.org/jira/browse/HIVE-5146
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5146.1.patch, HIVE-5146.2.patch


 FilterExprOrExpr changes the order of the rows which might break some UDFs 
 that assume an order in data.
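
One way an OR filter over a batch can preserve row order (a sketch of the general 
technique, not the actual HIVE-5146 patch): take the two children's selected-index 
arrays, which are each sorted, and merge them into one sorted, de-duplicated 
selection, so surviving rows come out in their original order.

```java
import java.util.Arrays;

class OrderPreservingOr {
    // Merges two sorted arrays of selected row indices (the outputs of two
    // child filter expressions) into one sorted, de-duplicated selection.
    // Emitting the union in ascending index order preserves the original
    // row order of the batch.
    static int[] union(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length || j < b.length) {
            int next;
            if (j >= b.length || (i < a.length && a[i] <= b[j])) {
                next = a[i++];
            } else {
                next = b[j++];
            }
            if (n == 0 || out[n - 1] != next) {
                out[n++] = next; // drop rows selected by both children
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

Appending the second child's survivors after the first child's (instead of merging) 
would produce the reordering this issue describes.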



[jira] [Commented] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23

2013-08-26 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750264#comment-13750264
 ] 

Prasanth J commented on HIVE-5145:
--

Removed the order by from the previous patch and regenerated the golden file.

 Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
 

 Key: HIVE-5145
 URL: https://issues.apache.org/jira/browse/HIVE-5145
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.12.0

 Attachments: HIVE-5145.2.patch, HIVE-5145.patch


 there is some non-determinism in the output of the 
 list_bucket_query_multiskew_2.q test case. 



[jira] [Updated] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23

2013-08-26 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5145:
-

Attachment: HIVE-5145.2.patch

 Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
 

 Key: HIVE-5145
 URL: https://issues.apache.org/jira/browse/HIVE-5145
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.12.0

 Attachments: HIVE-5145.2.patch, HIVE-5145.patch


 there is some non-determinism in the output of the 
 list_bucket_query_multiskew_2.q test case. 



Re: Proposing a 0.11.1

2013-08-26 Thread Mark Grover
Hi folks,
Any update on this? We are considering including Hive 0.11* in Bigtop 0.7
and it would be very useful and much appreciated to get a little more
context into what the Hive 0.11.1 release would look like.

Thanks in advance!
Mark


On Tue, Aug 13, 2013 at 9:24 PM, Edward Capriolo edlinuxg...@gmail.comwrote:

 I am feeling more like we should release a 0.12.0 rather than backport things
 into 0.11.x.




 On Wed, Aug 14, 2013 at 12:08 AM, Navis류승우 navis@nexr.com wrote:

  If this is only for addressing npath problem, we got three months for
 that.
 
  Would it be enough time for releasing 0.12.0?
 
  ps. IMHO, n-path seems too generic a name to be patented. I hate Teradata.
 
  2013/8/14 Edward Capriolo edlinuxg...@gmail.com:
   Should we get the npath rename in? Do we have a jira for this? If not I
   will take it.
  
  
   On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner wagner.mar...@gmail.com
  wrote:
  
   It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953 has
 been
   committed to trunk and it looks like 4789 is close.
  
   Thanks,
   Mark
  
   On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley omal...@apache.org
   wrote:
  
All,
   I'd like to create an 0.11.1 with some fixes in it. I plan to put
together a release candidate over the next week. I'm in the process
 of
putting together the list of bugs that I want to include, but I
  wanted to
solicit the jiras that others though would be important for an
 0.11.1.
   
Thanks,
   Owen
   
  
 



[jira] [Commented] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23

2013-08-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750302#comment-13750302
 ] 

Ashutosh Chauhan commented on HIVE-5145:


+1

 Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
 

 Key: HIVE-5145
 URL: https://issues.apache.org/jira/browse/HIVE-5145
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.12.0

 Attachments: HIVE-5145.2.patch, HIVE-5145.patch


 there is some non-determinism in the output of the 
 list_bucket_query_multiskew_2.q test case. 



[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750364#comment-13750364
 ] 

Francis Liu commented on HIVE-4331:
---

{quote}
Francis, you're welcome to go ahead and review it as well. I have been looking 
at it as well, along with testing, although I waited on commenting till I'd 
finished with the hive-side review. I'll comment on both of them before Monday.
{quote}
Cool. It'd be good to have a reviewer that looks at both pieces.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in the HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code, which point to 
 HCatStorageHandler to use the HiveStorageHandler including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750367#comment-13750367
 ] 

Olga Natkovich commented on HIVE-4331:
--

Hi Sushanth, Were you able to review this patch? Thanks!

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in the HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code, which point to 
 HCatStorageHandler to use the HiveStorageHandler including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3603) Enable client-side caching for scans on HBase

2013-08-26 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750370#comment-13750370
 ] 

Swarnim Kulkarni commented on HIVE-3603:


[~appodictic] Thanks! Also, how is setting this property different from directly 
setting the hbase.client.scanner.caching property in hive-site.xml, without 
this enhancement? Wouldn't they have the same effect?

 Enable client-side caching for scans on HBase
 -

 Key: HIVE-3603
 URL: https://issues.apache.org/jira/browse/HIVE-3603
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Karthik Ranganathan
Assignee: Navis
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-3603.D7761.1.patch


 HBaseHandler sets up a TableInputFormat MR job against HBase to read data in. 
 The underlying implementation (in HBaseHandler.java) makes an RPC call per 
 row-key, which makes it very inefficient. Need to specify a client side cache 
 size on the scan.
 Note that HBase currently only supports num-rows based caching (no way to 
 specify a memory limit). Created HBASE-6770 to address this.
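For comparison, the effect of a client-side cache size can also be approximated today by setting the HBase client property globally; a hedged hive-site.xml sketch (the value 500 is an arbitrary example, not a recommendation):

```xml
<!-- hive-site.xml sketch: client-side scanner caching for HBase scans.
     500 is an arbitrary example value; tune it against row size and
     client memory, since HBase caches this many rows per RPC. -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>500</value>
  <description>Number of rows fetched per RPC when scanning HBase tables.</description>
</property>
```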

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750371#comment-13750371
 ] 

Sushanth Sowmyan commented on HIVE-4331:


I left some more comments on the hive patch phabricator review. Apart from a 
couple of minor code-style comments, my overall feeling is that I'm okay with 
what HivePTOF is doing; it's a good first step, and I will +1 it. However, I 
think it does not go far enough to allow using any generic MR OutputFormat with 
hive. That said, I agree that that would not really be possible unless we plumb 
out how HiveOutputFormat is used itself, and I'm not completely certain people 
have a need for that.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in the HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code, which point to 
 HCatStorageHandler to use theHiveStorageHandler including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5150) UnsatisfiedLinkError when running hive unit tests on Windows

2013-08-26 Thread shanyu zhao (JIRA)
shanyu zhao created HIVE-5150:
-

 Summary: UnsatisfiedLinkError when running hive unit tests on 
Windows
 Key: HIVE-5150
 URL: https://issues.apache.org/jira/browse/HIVE-5150
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 0.11.0
 Environment: Windows
Reporter: shanyu zhao


When running any hive unit test against hadoop 2.0, it fails with an error 
like this:

[junit] Exception in thread "main" java.lang.UnsatisfiedLinkError: 
org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
[junit] at 
org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
[junit] at 
org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:423)
[junit] at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:933)
[junit] at 
org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:177)
[junit] at 
org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:164)


This is because the test process failed to find hadoop.dll. This is related to 
YARN-729.
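One typical workaround (an illustration only, not the attached patch) is to put the directory containing hadoop.dll on java.library.path for the forked test JVMs. With an Ant-based build, that could look roughly like this, where ${hadoop.bin.dir} is a hypothetical property pointing at the folder holding hadoop.dll:

```xml
<!-- Sketch: pass the native-library directory to forked JUnit JVMs.
     ${hadoop.bin.dir} is a hypothetical property name, standing in for
     wherever hadoop.dll lives on the Windows build machine. -->
<junit fork="yes">
  <jvmarg value="-Djava.library.path=${hadoop.bin.dir}"/>
</junit>
```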

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-08-26 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750398#comment-13750398
 ] 

Phabricator commented on HIVE-3562:
---

ashutoshc has commented on the revision HIVE-3562 [jira] Some limit can be 
pushed down to map stage.

  Looks pretty good. Just requesting a few more comments.

INLINE COMMENTS
  conf/hive-default.xml.template:1586-1590 We can remove this now.
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:1186 Just to 
add more clarity, say something like: we can push the limit above a GBY running 
in the Reducer, since that will generate a single row for each group. This doesn't 
necessarily hold for a GBY running in Mappers, so we don't push the limit above it.
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:182 It 
will be good to add comment about what this field is holding. Add a comment 
saying: This two dimensional array holds key data and a corresponding Union 
object which contains the tag identifying the aggregate expression for distinct 
columns.

  Ideally, instead of this 2-D array, we should have probably enhanced HiveKey 
class for this logic. We should do that in a follow-up jira.
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:267 I 
didn't follow this logic completely.
  Seems like this is an optimization to avoid evaluating the union object 
repeatedly and doing a system copy for it. Can you add a comment explaining this?
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:271 Seems 
like it will be null only for i = 0. If so, better to do an if (i == 0) check? 
Also add a comment on when this will be null and when it will be non-null.
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:260 You 
made changes to this section because you found a bug, or are you purely 
refactoring? If you hit a bug, can you explain what it was?
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:50-51 It will be 
good to add a comment about what these 2D arrays are holding.
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:52 Also, add a comment 
saying this array holds the hashcodes for keys.

  Also, add a note that the indices of all these arrays must line up.
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:82 
Nice Comments!
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:34 It will be good 
to add a javadoc for this class.
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:36 Also javadoc for 
this interface.

REVISION DETAIL
  https://reviews.facebook.net/D5967

To: JIRA, tarball, ashutoshc, navis
Cc: njain


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch


 Queries with limit clause (with reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 makes operator tree, 
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially calculated in RS, reducing size of shuffling.
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
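The map-side top-N retention can be sketched as follows. This is an illustrative standalone example, not Hive's actual TopNHash (which uses flat 2-D byte arrays rather than a PriorityQueue); it keeps only the N smallest keys so each mapper emits at most N rows into the shuffle:

```java
import java.util.Collections;
import java.util.PriorityQueue;
import java.util.TreeSet;

public class MapSideTopN {
    // Keep only the n smallest keys seen so far in a bounded max-heap,
    // so a mapper's ReduceSink emits at most n rows instead of all rows.
    static PriorityQueue<Integer> topN(int[] keys, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(n, Collections.reverseOrder());
        for (int k : keys) {
            if (heap.size() < n) {
                heap.add(k);
            } else if (k < heap.peek()) {
                heap.poll();   // evict the current largest of the kept keys
                heap.add(k);
            }
        }
        return heap;
    }

    public static void main(String[] args) {
        // Keys 1, 2, 3 survive; 5, 7, 9 are dropped before the shuffle.
        System.out.println(new TreeSet<>(topN(new int[]{5, 1, 9, 3, 7, 2}, 3))); // prints [1, 2, 3]
    }
}
```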

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5150) UnsatisfiedLinkError when running hive unit tests on Windows

2013-08-26 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HIVE-5150:
--

Attachment: HIVE-5150.patch

Patch attached.

 UnsatisfiedLinkError when running hive unit tests on Windows
 

 Key: HIVE-5150
 URL: https://issues.apache.org/jira/browse/HIVE-5150
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Affects Versions: 0.11.0
 Environment: Windows
Reporter: shanyu zhao
 Attachments: HIVE-5150.patch


 When running any hive unit test against hadoop 2.0, it fails with an error 
 like this:
 [junit] Exception in thread "main" java.lang.UnsatisfiedLinkError: 
 org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
 [junit]   at 
 org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
 [junit]   at 
 org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:423)
 [junit]   at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:933)
 [junit]   at 
 org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:177)
 [junit]   at 
 org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:164)
 This is because the test process failed to find hadoop.dll. This is related to 
 YARN-729.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750396#comment-13750396
 ] 

Sushanth Sowmyan commented on HIVE-4331:


I would love to see SH become a first class entry in hive, and HOF be a kind of 
SH, leading to its eventual removal. That's precisely what my long-term goal 
for this is.

 and I'm not completely certain people have a need for that

By this bit, I meant that I wasn't sure people had a need for doing away with 
HOF (other than for code cleanliness, which is why I would like to see it gone) 
in order to be able to use any generic OF with hive. Most of the use cases of 
traditional M/R OFs are already covered by hive, and for newer formats being 
developed, the OF writer winds up making changes so that it is hive-compatible, 
as with orc or the HBase SH. So unless there were a major push for a 
BlahOutputFormat that is widely used but not already usable from within Hive, 
I don't see a necessity case for the change in hive that I want. :)

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in the HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code, which point to 
 HCatStorageHandler to use theHiveStorageHandler including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

2013-08-26 Thread Pala M Muthaia
Thanks for following up, Yin.

We realized later that this was due to the reduce deduplication optimization,
and found that turning off the flag avoids the issue.

-pala


On Mon, Aug 26, 2013 at 4:40 AM, Yin Huai huaiyin@gmail.com wrote:

 Forgot to add in my last reply: to generate correct results, you can
 set hive.optimize.reducededuplication to false to turn off
 ReduceSinkDeDuplication.


 On Sun, Aug 25, 2013 at 9:35 PM, Yin Huai huaiyin@gmail.com wrote:

  Created a jira https://issues.apache.org/jira/browse/HIVE-5149
 
 
  On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai huaiyin@gmail.com wrote:
 
  Seems ReduceSinkDeDuplication picked the wrong partitioning columns.
 
 
  On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP s...@rocketfuel.com
 wrote:
 
   I think the problem lies within the group by operation. For this
   optimization to work, the group by's partitioning should be on column
   1 only.
  
   It won't affect the correctness of the group by; it can make it slower, but
   in this case it will speed up the overall query performance.
 
 
  On Fri, Aug 23, 2013 at 5:55 PM, Pala M Muthaia 
  mchett...@rocketfuelinc.com wrote:
 
  I have attached the hive 10 and 11 query plans, for the sample query
  below, for illustration.
 
 
  On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia 
  mchett...@rocketfuelinc.com wrote:
 
  Hi,
 
  We are using DISTRIBUTE BY with custom reducer scripts in our query
  workload.
 
   After upgrading to Hive 0.11, queries with GROUP BY/DISTRIBUTE BY/SORT BY
   and custom reducer scripts produced incorrect results. Particularly, rows
   with the same value in the DISTRIBUTE BY column end up in multiple reducers
   and thus produce multiple rows in the final result, when we expect only one.
 
  I investigated a little bit and discovered the following behavior for
  Hive 0.11:
 
   - Hive 0.11 produces a different plan for these queries, with incorrect
   results. The extra stage for the DISTRIBUTE BY + Transform is missing, and
   the Transform operator for the custom reducer script is pushed into the
   reduce operator tree containing the GROUP BY itself.
 
  - However, *if the SORT BY in the query has a DESC order in it*, the
  right plan is produced, and the results look correct too.
 
  Hive 0.10 produces the expected plan with right results in all cases.
 
 
  To illustrate, here is a simplified repro setup:
 
  Table:
 
   CREATE TABLE test_cluster (grp STRING, val1 STRING, val2 INT, val3
   STRING, val4 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES
   TERMINATED BY '\n' STORED AS TEXTFILE;
 
  Query:
 
   ADD FILE reducer.py;
 
   FROM (
     SELECT grp, val2
     FROM test_cluster
     GROUP BY grp, val2
     DISTRIBUTE BY grp
     SORT BY grp, val2  -- add DESC here to get correct results
   ) a

   REDUCE a.*
   USING 'reducer.py'
   AS grp, reducedValue
 
 
  If i understand correctly, this is a bug. Is this a known issue? Any
  other insights? We have reverted to Hive 0.10 to avoid the incorrect
  results while we investigate this.
 
  I have the repro sample, with test data and scripts, if anybody is
  interested.
 
 
 
  Thanks,
  pala
 
 
 
 
 
 



Re: Proposing a 0.11.1

2013-08-26 Thread Owen O'Malley
Hi Mark,
   I haven't made any progress on it yet. I hope to make progress on it
this week. I will certainly include the npath changes. On a separate
thread, I'll start a discussion about starting to lock down 0.12.0.

-- Owen


On Mon, Aug 26, 2013 at 10:20 AM, Mark Grover m...@apache.org wrote:

 Hi folks,
 Any update on this? We are considering including Hive 0.11* in Bigtop 0.7
 and it would be very useful and much appreciated to get a little more
 context into what the Hive 0.11.1 release would look like.

 Thanks in advance!
 Mark


 On Tue, Aug 13, 2013 at 9:24 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

   I am feeling more like we should release a 12.0 rather than backport
   things into 11.X.
 
 
 
 
  On Wed, Aug 14, 2013 at 12:08 AM, Navis류승우 navis@nexr.com wrote:
 
    If this is only for addressing the npath problem, we've got three months for
   that.
  
   Would it be enough time for releasing 0.12.0?
  
    ps. IMHO, n-path seemed too generic a name to be patented. I hate
  Teradata.
  
   2013/8/14 Edward Capriolo edlinuxg...@gmail.com:
Should we get the npath rename in? Do we have a jira for this? If
 not I
will take it.
   
   
On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner 
 wagner.mar...@gmail.com
   wrote:
   
It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953 has
  been
committed to trunk and it looks like 4789 is close.
   
Thanks,
Mark
   
On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley omal...@apache.org
 
wrote:
   
 All,
I'd like to create an 0.11.1 with some fixes in it. I plan to
 put
 together a release candidate over the next week. I'm in the
 process
  of
 putting together the list of bugs that I want to include, but I
   wanted to
 solicit the jiras that others though would be important for an
  0.11.1.

 Thanks,
Owen

   
  
 



[jira] [Commented] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()

2013-08-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750481#comment-13750481
 ] 

Hudson commented on HIVE-5100:
--

FAILURE: Integrated in Hive-trunk-h0.21 #2290 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2290/])
HIVE-5100 :  RCFile::sync(long)  missing 1 byte in System.arraycopy() (Gopal V 
via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517547)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java


  RCFile::sync(long)  missing 1 byte in System.arraycopy()
 -

 Key: HIVE-5100
 URL: https://issues.apache.org/jira/browse/HIVE-5100
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: tagus wang
Assignee: Gopal V
  Labels: regression
 Fix For: 0.12.0

 Attachments: HIVE-5100-001.patch, HIVE-5100.01.patch


 This code has a bug:
 System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);
 It should be:
 System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);
 The original is missing 1 byte at the end.
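The off-by-one is easy to see in a small standalone sketch (names here are illustrative; the real code shifts bytes within RCFile's internal buffer rather than into a fresh array):

```java
import java.util.Arrays;

public class ArrayCopyOffByOne {
    // Copy the last `prefix` bytes of buf to the front of a new array,
    // mirroring the corrected call in RCFile::sync(long).
    static byte[] lastBytes(byte[] buf, int prefix) {
        byte[] out = new byte[prefix];
        // Correct source offset is buf.length - prefix. Using
        // buf.length - prefix - 1 (the buggy version) would start one byte
        // too early and drop the final byte of the buffer.
        System.arraycopy(buf, buf.length - prefix, out, 0, prefix);
        return out;
    }

    public static void main(String[] args) {
        byte[] buf = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(Arrays.toString(lastBytes(buf, 3))); // prints [6, 7, 8]
    }
}
```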

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23

2013-08-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5145:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Prasanth!

 Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
 

 Key: HIVE-5145
 URL: https://issues.apache.org/jira/browse/HIVE-5145
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.12.0

 Attachments: HIVE-5145.2.patch, HIVE-5145.patch


 there is some non-determinism in the output of the
 list_bucket_query_multiskew_2.q test case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750501#comment-13750501
 ] 

Sushanth Sowmyan commented on HIVE-4331:


Oh, and one more thing:

{quote}
With this StorageHandlers can use generic OFs.
{quote}

This assertion is incorrect. A more precise assertion would be that with this 
patch, Hive can use generic OFs that do not do anything useful or necessary in 
their outputcommitters (i.e. do not need calls on them to be made). If an OF is 
designed with an outputcommitter in mind, chances are that it will need some 
retrofitting before it will work from within hive.


 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in the HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code, which point to 
 HCatStorageHandler to use the HiveStorageHandler including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5151) Going green: Container re-cycling in Tez

2013-08-26 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-5151:


 Summary: Going green: Container re-cycling in Tez
 Key: HIVE-5151
 URL: https://issues.apache.org/jira/browse/HIVE-5151
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch


Tez reuses containers to schedule tasks from same and different vertices in the 
same JVM. It also offers an API to reuse objects across vertices, dags and 
sessions.

For hive we should reuse the operator plan as well as any hash tables (map 
join).
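The kind of per-JVM object reuse described here can be sketched minimally as a keyed cache; all names below are hypothetical, not the actual Tez or Hive API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class ContainerObjectCache {
    // One cache per JVM (i.e. per reused container); keys might identify
    // an operator plan or a map-join hash table for a given vertex.
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    // Return the cached instance for this key, building it at most once
    // per JVM so later tasks in the same container can reuse it.
    @SuppressWarnings("unchecked")
    static <T> T retrieve(String key, Supplier<T> builder) {
        return (T) CACHE.computeIfAbsent(key, k -> builder.get());
    }

    public static void main(String[] args) {
        Object a = retrieve("map-join-table", Object::new);
        Object b = retrieve("map-join-table", Object::new);
        System.out.println(a == b); // prints true
    }
}
```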

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5151) Going green: Container re-cycling in Tez

2013-08-26 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5151:
-

Description: 
Tez reuses containers to schedule tasks from same and different vertices in the 
same JVM. It also offers an API to reuse objects across vertices, dags and 
sessions.

For hive we should reuse the operator plan as well as any hash tables (map 
join).

NO PRECOMMIT TESTS (this is wip for the tez branch)

  was:
Tez reuses containers to schedule tasks from same and different vertices in the 
same JVM. It also offers an API to reuse objects across vertices, dags and 
sessions.

For hive we should reuse the operator plan as well as any hash tables (map 
join).


 Going green: Container re-cycling in Tez
 

 Key: HIVE-5151
 URL: https://issues.apache.org/jira/browse/HIVE-5151
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch


 Tez reuses containers to schedule tasks from same and different vertices in 
 the same JVM. It also offers an API to reuse objects across vertices, dags 
 and sessions.
 For hive we should reuse the operator plan as well as any hash tables (map 
 join).
 NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750510#comment-13750510
 ] 

Sushanth Sowmyan commented on HIVE-4331:


As it currently stands, SHes *need*, as a prerequisite, to be able to work at 
a partition level and to have outputcommitter concepts before they can be used 
more widely.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in the HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code, which point to 
 HCatStorageHandler to use the HiveStorageHandler including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5148) Jam sessions w/ Tez

2013-08-26 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5148:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

committed to branch

 Jam sessions w/ Tez
 ---

 Key: HIVE-5148
 URL: https://issues.apache.org/jira/browse/HIVE-5148
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-5148.1.patch


 Tez introduced a session API that lets you reuse certain resources during a 
 session (AM, localized files, etc).
 Hive needs to tie these into hive sessions (for both CLI and HS2).
 NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750508#comment-13750508
 ] 

Francis Liu commented on HIVE-4331:
---

{quote}
most of the usecases of traditional M/R OFs are already covered by hive, or for 
newer formats being developed, the OF writer winds up making changes so that it 
is hive compatible, such as with orc, or with the HBase SH
{quote}
Yes, but ideally they don't really need HOF to do that. 

{quote}
So unless there were a major push to see a BlahOutputFormat that is widely 
used, but was not already usable from within Hive, I don't see there being a 
necessity case for a change in hive that I want.
{quote}
Yep, which is why we want to do it incrementally. Letting it leak into the SH and 
HCat code would make the idea of cleaning things up less appealing. I think if 
we just started using SH for new OFs and stopped using HOF, these pieces would 
naturally settle into this state. Having said that, it'd be nice if ORC could be 
moved to using storage handlers. It would also help SH mature.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in HCat's storagehandler so that systems such 
 as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without breaking backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances in the HCat source code which point to 
 HCatStorageHandler to use the HiveStorageHandler, including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750523#comment-13750523
 ] 

Ashutosh Chauhan commented on HIVE-4331:


[~toffer] One more thing popped up in my discussion with Sushanth. Another 
issue in this general area of Hive's usage of OFs is w.r.t. the 
OutputCommitter. Hive currently explicitly disables the OC and performs a lot of 
the logic (which folks usually write in an OC) on the client side. Architecturally, 
it's cleaner for Hive to migrate these things from the client into the OC and make 
the OC a first-class citizen. For folks who have an OF with an OC, it will be easier 
to integrate it in Hive, instead of understanding Hive's innards and its handling of 
the OC. Wondering if you have given this any thought? I just want to make sure that 
if and when we go that route these current changes won't get in the way.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in HCat's storagehandler so that systems such 
 as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without breaking backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances in the HCat source code which point to 
 HCatStorageHandler to use the HiveStorageHandler, including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750526#comment-13750526
 ] 

Francis Liu commented on HIVE-4331:
---

Sigh, there's the rub. Are there JIRAs for this? It would be great to keep track 
of this in case someone would like to do it.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in HCat's storagehandler so that systems such 
 as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without breaking backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances in the HCat source code which point to 
 HCatStorageHandler to use the HiveStorageHandler, including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5151) Going green: Container re-cycling in Tez

2013-08-26 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5151:
-

Attachment: HIVE-5151.1.patch

 Going green: Container re-cycling in Tez
 

 Key: HIVE-5151
 URL: https://issues.apache.org/jira/browse/HIVE-5151
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-5151.1.patch


 Tez reuses containers to schedule tasks from same and different vertices in 
 the same JVM. It also offers an API to reuse objects across vertices, dags 
 and sessions.
 For hive we should reuse the operator plan as well as any hash tables (map 
 join).
 NO PRECOMMIT TESTS (this is wip for the tez branch)
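A container-level cache along these lines could back that reuse. This is a hypothetical sketch only: the names ContainerCache and retrieve are illustrative, not Tez's actual object-registry API. It shows just the build-once, reuse-across-tasks pattern the issue describes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical per-JVM cache: the first task in a container builds the
// object (operator plan, map-join hash table, ...); later tasks reuse it.
public final class ContainerCache {
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public static <T> T retrieve(String key, Supplier<T> builder) {
        // computeIfAbsent builds at most once per key, even under concurrency.
        return (T) CACHE.computeIfAbsent(key, k -> builder.get());
    }
}
```

The second task scheduled into the same JVM then gets the already-built object back instead of deserializing the plan again.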

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.

2013-08-26 Thread Jitendra Nath Pandey (JIRA)
Jitendra Nath Pandey created HIVE-5152:
--

 Summary: Vector operators should inherit from non-vector operators 
for code re-use.
 Key: HIVE-5152
 URL: https://issues.apache.org/jira/browse/HIVE-5152
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey


In many cases vectorized operators could share code from non-vector operators 
by inheriting.
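A minimal sketch of the idea (class and method names here are illustrative, not Hive's actual operator hierarchy): the vectorized subclass inherits initialization and bookkeeping from the row-mode operator, and replaces only the per-row inner loop with a per-batch one.

```java
// Simplified sketch: shared setup/bookkeeping lives in the base operator.
class Operator {
    protected long rowsForwarded;

    void initialize() { rowsForwarded = 0; }      // shared setup
    void forward(long n) { rowsForwarded += n; }  // shared bookkeeping
    long getRowsForwarded() { return rowsForwarded; }

    // Row-at-a-time path.
    void process(Object row) { forward(1); }
}

// The vectorized operator inherits everything above and overrides only
// the inner loop, consuming a whole batch of rows per call.
class VectorOperator extends Operator {
    void processBatch(Object[] batch) { forward(batch.length); }
}
```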

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.

2013-08-26 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5152:
---

Attachment: HIVE-5152.1.patch

 Vector operators should inherit from non-vector operators for code re-use.
 --

 Key: HIVE-5152
 URL: https://issues.apache.org/jira/browse/HIVE-5152
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5152.1.patch


 In many cases vectorized operators could share code from non-vector operators 
 by inheriting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750545#comment-13750545
 ] 

Francis Liu commented on HIVE-4331:
---

{quote}
For folks who has OF which has OC, it will be easier to integrate that in Hive, 
instead of understanding Hive innards and handling of OC. Wondering if you have 
given a thought on this? I just want to make sure if and when we go that route 
these current changes won't get in the way.
{quote}
For HCat we already do it this way. It's not really just the OC, but the 
OF, OC, and RR in general. HOF essentially does the Hive-specific stuff that the 
plain OC, RR, etc. can do as well. So I don't think we changed the complexity of 
the work needed to support new formats? Is that what you meant by getting in the 
way?

In the long run it'd be better, since HCat and Hive would treat OFs the same way. 
Though it'd be great to document what that contract (beyond the typical OF) is. 


 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in HCat's storagehandler so that systems such 
 as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without breaking backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances in the HCat source code which point to 
 HCatStorageHandler to use the HiveStorageHandler, including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.

2013-08-26 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5152:
---

Status: Patch Available  (was: Open)

 Vector operators should inherit from non-vector operators for code re-use.
 --

 Key: HIVE-5152
 URL: https://issues.apache.org/jira/browse/HIVE-5152
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-5152.1.patch


 In many cases vectorized operators could share code from non-vector operators 
 by inheriting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns

2013-08-26 Thread Pala M Muthaia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750551#comment-13750551
 ] 

Pala M Muthaia commented on HIVE-5149:
--

Currently, there is a workaround to avoid this bug: turn off all reduce-sink 
deduplication optimization with hive.optimize.reducededuplication=false. However, 
that also disables valid deduplications, so either the user has to be 
educated enough to turn it off selectively, or we turn it off globally in 
hive-site.xml and give up performance in the other cases.

So, using this config is only a short-term workaround.

 ReduceSinkDeDuplication can pick the wrong partitioning columns
 ---

 Key: HIVE-5149
 URL: https://issues.apache.org/jira/browse/HIVE-5149
 Project: Hive
  Issue Type: Bug
Reporter: Yin Huai
Assignee: Yin Huai

 https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: LIKE filter pushdown for tables and partitions

2013-08-26 Thread Sergey Shelukhin
Since there's no response, I am assuming nobody cares about this code...
The JIRA is HIVE-5134; I will attach a patch with the removal this week.

On Wed, Aug 21, 2013 at 2:28 PM, Sergey Shelukhin ser...@hortonworks.com wrote:

 Hi.

 I think there are issues with the way hive can currently do LIKE
 operator JDO pushdown, and the code should be removed for partitions
 and tables.
 Are there objections to removing LIKE from Filter.g and related areas?
 If not, I will file a JIRA and do it.

 Details:
 There's code in the metastore that is capable of pushing down LIKE
 expressions into JDO for string partition keys, as well as for tables.
 The code for tables doesn't appear to be used, and the partition code definitely
 doesn't run in Hive proper because the metastore client doesn't send LIKE
 expressions to the server. It may be used in e.g. HCat and other places,
 but after asking some people here, I found out it probably isn't.
 I was trying to make it run and noticed some problems:
 1) For partitions, Hive sends SQL patterns in a filter for LIKE, e.g.
 %foo%, whereas the metastore passes them into the matches() JDOQL method,
 which expects a Java regex.
 2) Converting the pattern to a Java regex via the UDFLike method, I found
 that not all regexes appear to work in DN. .*foo seems to work,
 but anything complex (such as escaping the pattern using
 Pattern.quote, which UDFLike does) breaks and no longer matches
 properly.
 3) I tried to implement the common cases using the JDO methods
 startsWith/endsWith/indexOf (I will file a JIRA), but when I run tests
 on Derby, they also appear to have problems with some strings. For
 example, a partition with a backslash in the name cannot be matched by
 LIKE %\% (a single backslash in a string) after being converted to
 .indexOf(param) where param is \ (escaping the backslash once again
 doesn't work either, and anyway there's no documented reason why it
 shouldn't work properly), while other characters match correctly, even
 e.g. %.

 For tables, there's no SQL LIKE; it expects a Java regex, but I am not
 convinced all Java regexes are going to work.

 So, for the sake of future correctness, I think it's better to remove this
 code.
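For reference, point 1 (SQL pattern vs. Java regex) comes down to a conversion like the sketch below. This is a hypothetical, simplified converter in the spirit of what UDFLike does, not the actual Hive implementation: it escapes regex metacharacters and maps the SQL wildcards % to .* and _ to .

```java
// Hypothetical, simplified SQL-LIKE-pattern to Java-regex converter.
public final class LikePatterns {
    public static String toRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '%') {
                sb.append(".*");                 // SQL multi-char wildcard
            } else if (c == '_') {
                sb.append('.');                  // SQL single-char wildcard
            } else if ("\\.[]{}()*+?^$|".indexOf(c) >= 0) {
                sb.append('\\').append(c);       // escape regex metacharacters
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Patterns produced this way are plain Java regexes, so they sidestep the Pattern.quote incompatibility described above, though they would still be subject to the DataNucleus matches() limitations.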




[jira] [Commented] (HIVE-5107) Change hive's build to maven

2013-08-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750556#comment-13750556
 ] 

Sergey Shelukhin commented on HIVE-5107:


Would this be a good time to change the module structure? I can do a follow-up patch 
after this is done. It would be nice to separate the metastore client from the server, 
both for potential external usage and for internal features where the metastore 
server wants to involve QL bits.

 Change hive's build to maven
 

 Key: HIVE-5107
 URL: https://issues.apache.org/jira/browse/HIVE-5107
 Project: Hive
  Issue Type: Task
Reporter: Edward Capriolo
Assignee: Edward Capriolo

 I cannot cope with hive's build infrastructure any more. I have started 
 working on porting the project to maven. When I have some solid progress I 
 will put the entire thing on GitHub for review. Then we can talk about switching 
 the project somehow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23

2013-08-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750570#comment-13750570
 ] 

Ashutosh Chauhan commented on HIVE-4750:


The current patch passes on MacOS but fails on Ubuntu. I think it is caused by the 
order in which the sub-directories of a directory are returned by the OS, which 
can vary. I guess we need to add an ORDER BY on the partition column for this to 
work consistently on all OSes.
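The underlying issue is that Java gives no ordering guarantee for directory listings. A small illustrative sketch (the class and method names are hypothetical) of the code-side fix is to impose an explicit order before the listing can leak into results:

```java
import java.util.Arrays;

// File.list()/listFiles() return entries in no guaranteed order -- it
// varies by OS and filesystem -- so any code whose output depends on a
// directory listing must sort the entries itself.
public final class DirListing {
    public static String[] deterministicOrder(String[] names) {
        String[] sorted = names.clone();   // don't mutate the caller's array
        Arrays.sort(sorted);               // lexicographic, OS-independent
        return sorted;
    }
}
```

For the .q tests here, the equivalent query-side fix is the ORDER BY on the partition column mentioned above.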

 Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
 ---

 Key: HIVE-4750
 URL: https://issues.apache.org/jira/browse/HIVE-4750
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.12.0
Reporter: Brock Noland
Assignee: Prasanth J
 Fix For: 0.12.0

 Attachments: HIVE-4750.patch


 Removing 6,7,8 from the scope of HIVE-4746.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750579#comment-13750579
 ] 

Ashutosh Chauhan commented on HIVE-4331:


No, HCat and Hive don't treat OFs the same way. This difference in OF handling is 
one reason why HCatOF couldn't be used from Hive; another being that HCat uses the 
mapreduce API while Hive uses the mapred API. If we can make Hive use HCatOF that 
will be a win, but that's yet another topic.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in HCat's storagehandler so that systems such 
 as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without breaking backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances in the HCat source code which point to 
 HCatStorageHandler to use the HiveStorageHandler, including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5029) direct SQL perf optimization cannot be tested well

2013-08-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5029:
---

Status: Open  (was: Patch Available)

 direct SQL perf optimization cannot be tested well
 --

 Key: HIVE-5029
 URL: https://issues.apache.org/jira/browse/HIVE-5029
 Project: Hive
  Issue Type: Test
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.patch, 
 HIVE-5029.patch


 HIVE-4051 introduced perf optimization that involves getting partitions 
 directly via SQL in metastore. Given that SQL queries might not work on all 
 datastores (and will not work on non-SQL ones), JDO fallback is in place.
 Given that perf improvement is very large for short queries, it's on by 
 default.
 However, there's a problem with tests with regard to that. If SQL code is 
 broken, tests may fall back to JDO and pass. If JDO code is broken, SQL might 
 allow tests to pass.
 We are going to disable SQL by default until the testing problem is resolved.
 There are several possible solutions:
 1) Separate build for this setting. Seems like an overkill...
 2) Enable by default; disable by default in tests, create a clone of 
 TestCliDriver with a subset of queries that will exercise the SQL path.
 3) Have some sort of test hook inside metastore that will run both ORM and 
 SQL and compare.
 3') Or make a subclass of ObjectStore that will do that. ObjectStore is 
 already pluggable.
 4) Write unit tests for one of the modes (JDO, as non-default?) and declare 
 that they are sufficient; disable fallback in tests.
 3' seems like the easiest. For now we will disable SQL by default.
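Option 3' can be sketched roughly as a comparing wrapper. All names below are illustrative, not the real ObjectStore API: the wrapper runs both the direct-SQL path and the JDO/ORM path and fails loudly on any disagreement, so a single test run exercises both.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical interface standing in for the metastore's partition lookup.
interface PartitionStore {
    List<String> getPartitions(String filter);
}

// Hypothetical sketch of option 3': run both paths, compare, return one.
final class VerifyingStore implements PartitionStore {
    private final PartitionStore sql;  // direct-SQL path
    private final PartitionStore jdo;  // ORM/JDO fallback path

    VerifyingStore(PartitionStore sql, PartitionStore jdo) {
        this.sql = sql;
        this.jdo = jdo;
    }

    @Override
    public List<String> getPartitions(String filter) {
        List<String> a = sql.getPartitions(filter);
        List<String> b = jdo.getPartitions(filter);
        if (!Objects.equals(a, b)) {
            throw new IllegalStateException("SQL and JDO paths disagree for: " + filter);
        }
        return a;
    }
}
```

Since ObjectStore is already pluggable, a subclass following this shape could be dropped in for test runs only, leaving production on the fast path.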

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23

2013-08-26 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-4750:
-

Attachment: HIVE-4750.2.patch

 Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
 ---

 Key: HIVE-4750
 URL: https://issues.apache.org/jira/browse/HIVE-4750
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.12.0
Reporter: Brock Noland
Assignee: Prasanth J
 Fix For: 0.12.0

 Attachments: HIVE-4750.2.patch, HIVE-4750.patch


 Removing 6,7,8 from the scope of HIVE-4746.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5153) current database in hive prompt in cli remote mode is incorrect

2013-08-26 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-5153:
---

 Summary: current database in hive prompt in cli remote mode is 
incorrect
 Key: HIVE-5153
 URL: https://issues.apache.org/jira/browse/HIVE-5153
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0
Reporter: Thejas M Nair


HIVE-2233 added a feature to show the current database on the hive cli prompt. 
The current implementation will work only in local mode. It will not work if 
you try connecting to hive server (v1) from the hive cli. 

This is because it relies on the Hive object in the current thread to have the 
right current database information. But when remote mode is used, the client 
side Hive object does not get updated when the database is changed.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5153) current database in hive prompt in cli remote mode is incorrect

2013-08-26 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750596#comment-13750596
 ] 

Thejas M Nair commented on HIVE-5153:
-

Note that this is not an issue with beeline + hive-server2, which in my opinion are 
the recommended client and server to use.



 current database in hive prompt in cli remote mode is incorrect
 ---

 Key: HIVE-5153
 URL: https://issues.apache.org/jira/browse/HIVE-5153
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
Reporter: Thejas M Nair

 HIVE-2233 added a feature to show the current database on the hive cli prompt. 
 The current implementation will work only in local mode. It will not work 
 if you try connecting to hive server (v1) from the hive cli. 
 This is because it relies on the Hive object in the current thread to have the 
 right current database information. But when remote mode is used, the client 
 side Hive object does not get updated when the database is changed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23

2013-08-26 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750594#comment-13750594
 ] 

Prasanth J commented on HIVE-4750:
--

Added an order by to make the test cases work consistently across OSes. I tested 
it on Mac and CentOS 6. Can you please test it on Ubuntu before committing it 
(I don't have an Ubuntu installation)?

 Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
 ---

 Key: HIVE-4750
 URL: https://issues.apache.org/jira/browse/HIVE-4750
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 0.12.0
Reporter: Brock Noland
Assignee: Prasanth J
 Fix For: 0.12.0

 Attachments: HIVE-4750.2.patch, HIVE-4750.patch


 Removing 6,7,8 from the scope of HIVE-4746.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750598#comment-13750598
 ] 

Francis Liu commented on HIVE-4331:
---

{quote}
No HCat and Hive dont treat OFs in same way. This difference of OF handling is 
a reason why HCatOF couldn't be used from Hive, another being HCat uses 
mapreduce api while Hive uses mapred api. If we can make Hive use HCatOF that 
will be a win, but thats yet another topic.
{quote}
Currently they don't, mainly because of HOF, but they behave in almost the same 
way; otherwise this whole interoperability story would be broken. With this patch 
they'll at least be closer when it comes to dealing with OFs that don't use HOF, 
instead of having to mirror that behavior.

Actually, AFAIK only the HCatOF wrapper classes use mapreduce; the 
underlying stuff deals with mapred, which we did as part of the 
StorageDriver-to-SerDe migration. So it'd be relatively easy to support a mapred 
version of HCatOF.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in HCat's storagehandler so that systems such 
 as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without breaking backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances in the HCat source code which point to 
 HCatStorageHandler to use the HiveStorageHandler, including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23

2013-08-26 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750599#comment-13750599
 ] 

Prasanth J commented on HIVE-5145:
--

I think this patch will also fail on Ubuntu (similar to HIVE-4750), as the 
results depend on the order of subdirectories under a partition. Can you 
please revert the 2nd patch and apply the 1st patch, since the first patch has an 
order by clause which makes the result consistent across OSes?

 Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
 

 Key: HIVE-5145
 URL: https://issues.apache.org/jira/browse/HIVE-5145
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.12.0

 Attachments: HIVE-5145.2.patch, HIVE-5145.patch


 there is some non-determinism related to the output of the 
 list_bucket_query_multiskew_2.q test case. 



[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)

2013-08-26 Thread Benjamin Jakobus (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750601#comment-13750601
 ] 

Benjamin Jakobus commented on HIVE-5018:


Bump :)

 Avoiding object instantiation in loops (issue 6)
 

 Key: HIVE-5018
 URL: https://issues.apache.org/jira/browse/HIVE-5018
 Project: Hive
  Issue Type: Sub-task
Reporter: Benjamin Jakobus
Assignee: Benjamin Jakobus
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-5018.1.patch.txt


 Object instantiation inside loops is very expensive. Where possible, object 
 references should be created outside the loop so that they can be reused.
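The pattern this sub-task targets can be illustrated with a minimal Java sketch (the class and method names here are illustrative, not taken from the patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoopAllocation {
    // Reuses a single StringBuilder across iterations instead of
    // allocating a fresh one inside the loop body.
    public static List<String> suffixAll(List<String> rows) {
        List<String> out = new ArrayList<String>();
        StringBuilder sb = new StringBuilder(); // created once, outside the loop
        for (String row : rows) {
            sb.setLength(0); // reset the existing buffer instead of re-allocating
            sb.append(row).append('!');
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(suffixAll(Arrays.asList("a", "b", "c")));
    }
}
```

The semantics are identical to allocating the StringBuilder inside the loop; only the per-iteration allocation (and resulting GC pressure) is avoided.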



Re: LIKE filter pushdown for tables and partitions

2013-08-26 Thread Ashutosh Chauhan
Couple of questions:

1. What about the LIKE operator in Hive itself? Will that continue to work
(presumably yes, because there is an alternative path for it)?
2. This will nonetheless break other direct consumers of the metastore client
API (like HCatalog).

I see your point that we have a buggy implementation, so what's out there is
not safe to use. The question then really is: shall we remove this code, thereby
breaking people for whom the current buggy implementation is good enough (or,
you could say, saving them from breaking in the future)? Or shall we try to fix
it now?
My take is that if there are no users of this anyway, then there is no point
fixing it for non-existent users, but if there are, we probably have to. I
suggest you send an email to users@hive to ask if there are users
for this.

Thanks,
Ashutosh



On Mon, Aug 26, 2013 at 2:08 PM, Sergey Shelukhin ser...@hortonworks.com wrote:

 Since there's no response I am assuming nobody cares about this code...
 The jira is HIVE-5134; I will attach a patch with the removal this week.

 On Wed, Aug 21, 2013 at 2:28 PM, Sergey Shelukhin ser...@hortonworks.com
 wrote:

  Hi.
 
  I think there are issues with the way Hive can currently do LIKE
  operator JDO pushdown, and the code should be removed for partitions
  and tables.
  Are there objections to removing LIKE from Filter.g and related areas?
  If no I will file a JIRA and do it.
 
  Details:
  There's code in metastore that is capable of pushing down LIKE
  expression into JDO for string partition keys, as well as tables.
  The code for tables doesn't appear to be used, and the partition code
  definitely doesn't run in Hive proper because the metastore client
  doesn't send LIKE expressions to the server. It may be used in e.g.
  HCat and other places, but after asking some people here, I found out
  it probably isn't.
  I was trying to make it run and noticed some problems:
  1) For partitions, Hive sends SQL patterns in a filter for LIKE, e.g.
  %foo%, whereas the metastore passes them into the matches() JDOQL
  method, which expects a Java regex.
  2) Converting the pattern to a Java regex via the UDFLike method, I
  found out that not all regexes appear to work in DataNucleus. .*foo
  seems to work, but anything complex (such as escaping the pattern using
  Pattern.quote, which UDFLike does) breaks and no longer matches
  properly.
  3) I tried to implement the common cases using the JDO methods
  startsWith/endsWith/indexOf (I will file a JIRA), but when I ran tests
  on Derby, they also appeared to have problems with some strings. For
  example, a partition with a backslash in the name cannot be matched by
  LIKE %\% (a single backslash in a string) after being converted to
  .indexOf(param) where param is \ (escaping the backslash once again
  doesn't work either, and anyway there's no documented reason why it
  shouldn't work properly), while other characters match correctly, even
  e.g. %.
 
  For tables, there's no SQL LIKE; it expects a Java regex, but I am not
  convinced all Java regexes are going to work.
 
  So, I think that for future correctness' sake it's better to remove this
  code.
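The mismatch in point 1) and 2) can be sketched in Java. Hive's real conversion lives in UDFLike; the simplified converter below is illustrative only (the class and method names are not from Hive), but it shows how Pattern.quote produces the \Q...\E constructs that reportedly break in DataNucleus even though a plain ".*foo" works:

```java
import java.util.regex.Pattern;

public class LikeToRegex {
    // Simplified SQL-LIKE-to-Java-regex conversion:
    // '%' matches any sequence, '_' matches a single character, and every
    // other character is quoted so regex metacharacters match literally.
    public static String convert(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < likePattern.length(); i++) {
            char c = likePattern.charAt(i);
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                // Quoting yields \Q...\E blocks, valid in java.util.regex
                // but apparently not handled by DataNucleus' matches().
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        // "%foo" becomes ".*" followed by quoted 'f', 'o', 'o'.
        String regex = convert("%foo");
        System.out.println("myfoo".matches(regex));
    }
}
```

Under java.util.regex this works fine; the report above is that the same pattern handed to JDO's matches() does not behave the same way on the metastore side.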
 




Re: Proposing a 0.11.1

2013-08-26 Thread Eric Chu
Hi,

We recently tried to upgrade from Hive 0.10 to Hive 0.11 and noticed we needed
to add patches for the following JIRAs to make Hive 0.11 work:

   - HIVE-4619. None of our MR queries worked without it.
   - HIVE-4003. We ran into this bug in 0.10 also.
   - DataNucleus-related: At first we were only trying to get rid of the
   DataNucleus error messages, but one jira led to another, and we ended up
   upgrading DataNucleus with the following:
  - HIVE-4900, HIVE-3632, HIVE-4942 (and then run ant very-clean to
  clean up the old version of DataNucleus)
   - HIVE-4647

On a separate note:

   - HIVE-5149. This bug made us revert to Hive 0.10 because at the time we
   didn't know what was wrong. See the email titled DISTRIBUTE BY works
   incorrectly in Hive 0.11 in some cases, which led to the creation of
   HIVE-5149. The proposed workaround works, but as we noted there we still
   need the right fix. The property hive.optimize.reducededuplication already
   existed in Hive 0.10, so this is a regression.

Thanks,

Eric


On Mon, Aug 26, 2013 at 12:56 PM, Owen O'Malley omal...@apache.org wrote:

 Hi Mark,
I haven't made any progress on it yet. I hope to make progress on it
 this week. I will certainly include the npath changes. On a separate
 thread, I'll start a discussion about starting to lock down 0.12.0.

 -- Owen


 On Mon, Aug 26, 2013 at 10:20 AM, Mark Grover m...@apache.org wrote:

  Hi folks,
  Any update on this? We are considering including Hive 0.11* in Bigtop 0.7
  and it would be very useful and much appreciated to get a little more
  context into what the Hive 0.11.1 release would look like.
 
  Thanks in advance!
  Mark
 
 
  On Tue, Aug 13, 2013 at 9:24 PM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
   I am feeling more like we should release a 0.12.0 rather than backport
  things
   into 0.11.x.
  
  
  
  
   On Wed, Aug 14, 2013 at 12:08 AM, Navis류승우 navis@nexr.com wrote:
  
If this is only for addressing npath problem, we got three months for
   that.
   
Would it be enough time for releasing 0.12.0?
   
 ps. IMHO, n-path seems too generic a name to be patented. I hate
  Teradata.
   
2013/8/14 Edward Capriolo edlinuxg...@gmail.com:
 Should we get the npath rename in? Do we have a jira for this? If
  not I
 will take it.


 On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner 
  wagner.mar...@gmail.com
wrote:

 It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953
 has
   been
 committed to trunk and it looks like 4789 is close.

 Thanks,
 Mark

 On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley 
 omal...@apache.org
  
 wrote:

  All,
 I'd like to create an 0.11.1 with some fixes in it. I plan to
  put
  together a release candidate over the next week. I'm in the
  process
   of
  putting together the list of bugs that I want to include, but I
wanted to
  solicit the jiras that others thought would be important for an
   0.11.1.
 
  Thanks,
 Owen
 

   
  
 



[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-08-26 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750620#comment-13750620
 ] 

Sushanth Sowmyan commented on HIVE-4331:


[~ashutoshc] : The mapreduce-mapred transition is not a tough one by 
itself (apart from losing abort/commit semantics, which need to be emulated). 
The OutputCommitter concept is quite a bit more involved, though.

For now, I'm still +1 on this patch (the hive side) because it moves us one step 
closer to that goal. I don't think we should require the OC concept to be 
brought in as well for it to be committed; I brought that up because I wanted 
to highlight that we aren't done with this patch, that more work is required 
past it, and that this patch alone does not make generic OFs workable from 
within Hive.


 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will continue to function, but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in a future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests for HCat's storage handler so that systems such 
 as Pig and MapReduce can use Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without breaking backward 
 compatibility (except for known issues as described in the Design Document).
 5) Replace all instances in the HCat source code which point to 
 HCatStorageHandler to use the HiveStorageHandler, including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.



[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-26 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750638#comment-13750638
 ] 

Phabricator commented on HIVE-4002:
---

yhuai has commented on the revision HIVE-4002 [jira] Fetch task aggregation 
for simple group by query.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3631 Seems 
that this line is the same as line 3633.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6985 Why do 
we need to change getInternalName to field? If we want to use field instead of 
getInternalName, can you also make this change in other places of this class?
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582 Why do we need 
flushOp? I think it is not necessary to have flushOp. Also, can you change "an 
blocking operator" to "a blocking operator"? I am sorry about the typo I made...
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:493 I think we 
can just use operator.flush() to tell GBY to process its buffer.

REVISION DETAIL
  https://reviews.facebook.net/D8739

To: JIRA, navis
Cc: yhuai


 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
 HIVE-4002.D8739.3.patch


 Aggregation queries with no group-by clause (for example, select count(*) 
 from src) execute the final aggregation in a single reduce task. But that is 
 too small a workload even for a single reducer, because most UDAFs generate 
 just a single row from map-side aggregation. If the final fetch task can 
 aggregate the outputs from the map tasks, the shuffle time can be removed.
 This optimization transforms an operator tree like
 TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
 into 
 TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
 With the patch, the time taken for the auto_join_filters.q test was reduced 
 to 6 min (from 10 min before).



Re: No java compiler available exception for HWI

2013-08-26 Thread Eric Chu
Hi Bing,

Sorry for the late reply. I was in another Hive hell myself last week.

We did get HWI to work, but unfortunately it was so long ago that I don't
remember exactly what we did for this problem. I think we had that problem
only in our manual setup and testing, and the problem went away when we
switched to the production script that deploys Hive.

I do remember we needed to add core-3.1.1.jar ourselves to make HWI work
because Hive 0.10 didn't include it for some reason. We also needed to set
HADOOP_HOME in bin/hive-wrapper.sh.

We have since deprecated HWI and only use HUE now.

Eric


On Wed, Aug 21, 2013 at 8:02 AM, Bing Li sarah.lib...@gmail.com wrote:

 Hi, Edward
 I filed it as HIVE-5132, did you mean this one?


 2013/8/21 Edward Capriolo edlinuxg...@gmail.com

   We really should precompile the JSPs. There is a jira on this somewhere.
 
  On Tuesday, August 20, 2013, Bing Li sarah.lib...@gmail.com wrote:
   Hi, Eric et al
   Did you resolve this failure?
   I'm using Hive-0.11.0, and get the same error when access to HWI via
  browser.
  
   I already set the following properties in hive-site.xml
   - hive.hwi.listen.host
   - hive.hwi.listen.port
   - hive.hwi.war.file
  
   And copied two jasper jars into hive/lib:
   - jasper-compiler-5.5.23.jar
   - jasper-runtime-5.5.23.jar
  
   Thanks,
   - Bing
  
  
  
   2013/3/30 Eric Chu e...@rocketfuel.com
  
   Hi,
   I'm running Hive 0.10 and I want to support HWI (besides CLI and HUE).
  When I started HWI I didn't get any error. However, when I went to the
  Hive server address/hwi in my browser, I saw the error below complaining
  about No Java compiler available. My JAVA_HOME is set
  to /usr/lib/jvm/java-1.6.0-sun-1.6.0.16.
   Besides https://cwiki.apache.org/Hive/hivewebinterface.html, there's not
  much documentation on HWI. I'm wondering if anyone else has seen this or
  has any idea about what's wrong?
   Thanks.
   Eric
  
   Problem accessing /hwi/. Reason:
  
   No Java compiler available
  
   Caused by:
  
   java.lang.IllegalStateException: No Java compiler available
   at
 
 
 org.apache.jasper.JspCompilationContext.createCompiler(JspCompilationContext.java:225)
   at
 
 
 org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:560)
   at
 
 
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:299)
   at
  org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315)
   at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at
  org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
   at
 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at
  org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at
  org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
   at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
   at
  org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:503)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
   at
  org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
   at
 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at
  org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at
  org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at
  org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at
 
 
 org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49)
   at org.mortbay.jetty.handler.
 



[jira] [Updated] (HIVE-5128) Direct SQL for view is failing

2013-08-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5128:
--

Attachment: HIVE-5128.D12465.2.patch

sershe updated the revision HIVE-5128 [jira] Direct SQL for view is failing.

  Updated the patch. I tested the example; it seems to work now on my setup.

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12465

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12465?vs=38727&id=39039#toc

AFFECTED FILES
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java

To: JIRA, sershe


 Direct SQL for view is failing 
 ---

 Key: HIVE-5128
 URL: https://issues.apache.org/jira/browse/HIVE-5128
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch


 I cannot be sure of this, but it happens when dropping views (it falls back 
 to JPA and works fine):
 {noformat}
 metastore.ObjectStore: Direct SQL failed, falling back to ORM
 MetaException(message:Unexpected null for one of the IDs, SD null, column 
 null, serde null)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758)
 ...
 {noformat}
 Should it be disabled for views or can be fixed?



[jira] [Commented] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23

2013-08-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750734#comment-13750734
 ] 

Hudson commented on HIVE-5145:
--

FAILURE: Integrated in Hive-trunk-hadoop1-ptest #139 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/139/])
HIVE-5145 : Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 
(Prasanth J via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517682)
* 
/hive/trunk/ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out


 Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
 

 Key: HIVE-5145
 URL: https://issues.apache.org/jira/browse/HIVE-5145
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.12.0

 Attachments: HIVE-5145.2.patch, HIVE-5145.patch


 there is some non-determinism related to the output of the 
 list_bucket_query_multiskew_2.q test case. 



[jira] [Resolved] (HIVE-5151) Going green: Container re-cycling in Tez

2013-08-26 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-5151.
--

Resolution: Fixed

 Going green: Container re-cycling in Tez
 

 Key: HIVE-5151
 URL: https://issues.apache.org/jira/browse/HIVE-5151
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-5151.1.patch


 Tez reuses containers to schedule tasks from same and different vertices in 
 the same JVM. It also offers an API to reuse objects across vertices, dags 
 and sessions.
 For hive we should reuse the operator plan as well as any hash tables (map 
 join).
 NO PRECOMMIT TESTS (this is wip for the tez branch)



[jira] [Updated] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog

2013-08-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-4895:
-

Attachment: HIVE-4895.update.patch
HIVE-4895.move.patch
HIVE-4895.patch

HIVE-4895.patch - cumulative patch
HIVE-4895.move.patch - only the 'git mv' part
HIVE-4895.update.patch - code changes



 Move all HCatalog classes to org.apache.hive.hcatalog
 -

 Key: HIVE-4895
 URL: https://issues.apache.org/jira/browse/HIVE-4895
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, 
 HIVE-4895.update.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 make sure to preserve history in SCM



[jira] [Commented] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog

2013-08-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750809#comment-13750809
 ] 

Eugene Koifman commented on HIVE-4895:
--

HCat unit tests and webhcat e2e tests passed

 Move all HCatalog classes to org.apache.hive.hcatalog
 -

 Key: HIVE-4895
 URL: https://issues.apache.org/jira/browse/HIVE-4895
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, 
 HIVE-4895.update.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 make sure to preserve history in SCM



[jira] [Updated] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog

2013-08-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-4895:
-

Status: Patch Available  (was: Open)

 Move all HCatalog classes to org.apache.hive.hcatalog
 -

 Key: HIVE-4895
 URL: https://issues.apache.org/jira/browse/HIVE-4895
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, 
 HIVE-4895.update.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 make sure to preserve history in SCM



[jira] [Updated] (HIVE-4844) Add char/varchar data types

2013-08-26 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4844:
-

Attachment: char_progress.patch.7.txt

Apparently the patches I've been attaching (which I downloaded from 
Phabricator) are not applying correctly. Attaching char_progress.patch.7.txt, 
which should represent the same progress as char_progress.patch.6.txt, but with 
the patch generated from my git repository.

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: char_progress.patch.7.txt, HIVE-4844.1.patch.hack, 
 HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, 
 HIVE-4844.6.patch, screenshot.png


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.



[jira] [Commented] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog

2013-08-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750811#comment-13750811
 ] 

Eugene Koifman commented on HIVE-4895:
--

groupid in pom.xml was also changed from org.apache.hcatalog to 
org.apache.hive.hcatalog

 Move all HCatalog classes to org.apache.hive.hcatalog
 -

 Key: HIVE-4895
 URL: https://issues.apache.org/jira/browse/HIVE-4895
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, 
 HIVE-4895.update.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 make sure to preserve history in SCM



[jira] [Work logged] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog

2013-08-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4895?focusedWorklogId=14841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-14841
 ]

Eugene Koifman logged work on HIVE-4895:


Author: Eugene Koifman
Created on: 27/Aug/13 01:30
Start Date: 27/Aug/13 01:29
Worklog Time Spent: 12h 

Issue Time Tracking
---

Worklog Id: (was: 14841)
Time Spent: 12h
Remaining Estimate: 12h  (was: 24h)

 Move all HCatalog classes to org.apache.hive.hcatalog
 -

 Key: HIVE-4895
 URL: https://issues.apache.org/jira/browse/HIVE-4895
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Affects Versions: 0.12.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.12.0

 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, 
 HIVE-4895.update.patch

   Original Estimate: 24h
  Time Spent: 12h
  Remaining Estimate: 12h

 make sure to preserve history in SCM



[jira] [Updated] (HIVE-4844) Add char/varchar data types

2013-08-26 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4844:
-

Attachment: HIVE-4844.7.patch

Attached the wrong patch; replacing char_progress.patch.7.txt with HIVE-4844.7.patch.

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch, 
 HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, 
 HIVE-4844.7.patch, screenshot.png


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.



[jira] [Updated] (HIVE-4844) Add char/varchar data types

2013-08-26 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4844:
-

Attachment: (was: char_progress.patch.7.txt)

 Add char/varchar data types
 ---

 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch, 
 HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, 
 HIVE-4844.7.patch, screenshot.png


 Add new char/varchar data types which have support for more SQL-compliant 
 behavior, such as SQL string comparison semantics, max length, etc.



Re: Proposing a 0.11.1

2013-08-26 Thread Mark Grover
Hi Owen,
Sounds good. Thanks for the update!

Mark


On Mon, Aug 26, 2013 at 12:56 PM, Owen O'Malley omal...@apache.org wrote:

 Hi Mark,
I haven't made any progress on it yet. I hope to make progress on it
 this week. I will certainly include the npath changes. On a separate
 thread, I'll start a discussion about starting to lock down 0.12.0.

 -- Owen


 On Mon, Aug 26, 2013 at 10:20 AM, Mark Grover m...@apache.org wrote:

  Hi folks,
  Any update on this? We are considering including Hive 0.11* in Bigtop 0.7
  and it would be very useful and much appreciated to get a little more
  context into what the Hive 0.11.1 release would look like.
 
  Thanks in advance!
  Mark
 
 
  On Tue, Aug 13, 2013 at 9:24 PM, Edward Capriolo edlinuxg...@gmail.com
  wrote:
 
   I am feeling more like we should release a 0.12.0 rather than backport
  things
   into 0.11.x.
  
  
  
  
   On Wed, Aug 14, 2013 at 12:08 AM, Navis류승우 navis@nexr.com wrote:
  
If this is only for addressing the npath problem, we've got three months
   for that.
   
Would it be enough time for releasing 0.12.0?
   
ps. IMHO, n-path seems too generic a name to be patented. I hate
  Teradata.
   
2013/8/14 Edward Capriolo edlinuxg...@gmail.com:
 Should we get the npath rename in? Do we have a jira for this? If
  not I
 will take it.


 On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner 
  wagner.mar...@gmail.com
wrote:

 It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953
 has
   been
 committed to trunk and it looks like 4789 is close.

 Thanks,
 Mark

 On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley 
 omal...@apache.org
  
 wrote:

  All,
 I'd like to create an 0.11.1 with some fixes in it. I plan to
  put
  together a release candidate over the next week. I'm in the
  process
   of
  putting together the list of bugs that I want to include, but I
wanted to
  solicit the jiras that others thought would be important for an
   0.11.1.
 
  Thanks,
 Owen
 

   
  
 



[jira] [Commented] (HIVE-5107) Change hive's build to maven

2013-08-26 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750837#comment-13750837
 ] 

Edward Capriolo commented on HIVE-5107:
---

Right now we are not making any branches/patches yet. Our plan is to hack at 
GitHub and then, once we get everything working the way we like, open a hive 
branch and do it all again. Breaking up the metastore sounds ok.

 Change hive's build to maven
 

 Key: HIVE-5107
 URL: https://issues.apache.org/jira/browse/HIVE-5107
 Project: Hive
  Issue Type: Task
Reporter: Edward Capriolo
Assignee: Edward Capriolo

 I can not cope with hive's build infrastructure any more. I have started 
 working on porting the project to maven. When I have some solid progress I 
 will put the entire thing on GitHub for review. Then we can talk about 
 switching the project over.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage

2013-08-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3562:
--

Attachment: HIVE-3562.D5967.9.patch

navis updated the revision HIVE-3562 [jira] Some limit can be pushed down to 
map stage.

  Addressed comments

Reviewers: ashutoshc, JIRA, tarball

REVISION DETAIL
  https://reviews.facebook.net/D5967

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D5967?vs=39015&id=39051#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/build.xml
  ql/ivy.xml
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/limit_pushdown.q
  ql/src/test/queries/clientpositive/limit_pushdown_negative.q
  ql/src/test/results/clientpositive/limit_pushdown.q.out
  ql/src/test/results/clientpositive/limit_pushdown_negative.q.out

To: JIRA, tarball, ashutoshc, navis
Cc: njain


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, 
 HIVE-3562.D5967.9.patch


 Queries with a limit clause (with a reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 produce an operator tree 
 TS-SEL-RS-EXT-LIMIT-FS
 But the LIMIT can be partially calculated in the RS, reducing the size of the shuffle:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
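The RS(TOP-N) step only ever needs the n smallest keys a mapper has seen, so a bounded heap suffices. A minimal numeric sketch of the idea follows; this is not Hive's TopNHash (which works on serialized keys), and it assumes numeric rows so that negation yields a max-heap:

```python
import heapq

def top_n(rows, n):
    """Map-side top-N for 'order by key limit n': keep only the n
    smallest rows, so each mapper shuffles at most n rows."""
    heap = []                       # negated values => max-heap of size <= n
    for r in rows:
        if len(heap) < n:
            heapq.heappush(heap, -r)
        elif -heap[0] > r:          # current max is beaten: replace it
            heapq.heapreplace(heap, -r)
    return sorted(-x for x in heap)

# Only 3 of the 6 rows would be shuffled to the reducer.
assert top_n([5, 1, 4, 2, 3, 0], 3) == [0, 1, 2]
```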

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23

2013-08-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750890#comment-13750890
 ] 

Hudson commented on HIVE-5145:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #71 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/71/])
HIVE-5145 : Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 
(Prasanth J via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517682)
* 
/hive/trunk/ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out


 Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
 

 Key: HIVE-5145
 URL: https://issues.apache.org/jira/browse/HIVE-5145
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.12.0

 Attachments: HIVE-5145.2.patch, HIVE-5145.patch


 there is some non-determinism in the output of the 
 list_bucket_query_multiskew_2.q test case. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: LIKE filter pushdown for tables and partitions

2013-08-26 Thread Sergey Shelukhin
Adding user list. Any objections to removing LIKE support from
getPartitionsByFilter?

On Mon, Aug 26, 2013 at 2:54 PM, Ashutosh Chauhan hashut...@apache.orgwrote:

 Couple of questions:

 1. What about the LIKE operator for Hive itself? Will that continue to work
 (presumably yes, because there is an alternative path for it)?
 2. This will nonetheless break other direct consumers of metastore client
 api (like HCatalog).

 I see your point that we have a buggy implementation, so what's out there is
 not safe to use. The question then really is: shall we remove this code, thereby
 breaking people for whom the current buggy implementation is good enough (or,
 you could say, saving them from breaking in the future)? Or shall we try to fix
 it now?
 My take is that if there are no users of this anyway, then there is no point
 fixing it for non-existent users, but if there are, we probably have to. I
 suggest you send an email to users@hive to ask whether there are users of this.

 Thanks,
 Ashutosh



 On Mon, Aug 26, 2013 at 2:08 PM, Sergey Shelukhin ser...@hortonworks.com
 wrote:

  Since there's no response I am assuming nobody cares about this code...
  Jira is HIVE-5134, I will attach a patch with removal this week.
 
  On Wed, Aug 21, 2013 at 2:28 PM, Sergey Shelukhin 
 ser...@hortonworks.com
  wrote:
 
   Hi.
  
    I think there are issues with the way hive can currently do LIKE
    operator JDO pushdown, and I think the code should be removed for partitions
    and tables.
    Are there objections to removing LIKE from Filter.g and related areas?
    If not, I will file a JIRA and do it.
  
   Details:
   There's code in metastore that is capable of pushing down LIKE
   expression into JDO for string partition keys, as well as tables.
    The code for tables doesn't appear to be used, and the partition code definitely
    doesn't run in Hive proper because the metastore client doesn't send LIKE
    expressions to the server. It may be used in e.g. HCat and other places,
    but after asking some people here, I found out it probably isn't.
   I was trying to make it run and noticed some problems:
   1) For partitions, Hive sends SQL patterns in a filter for like, e.g.
   %foo%, whereas metastore passes them into matches() JDOQL method
   which expects Java regex.
   2) Converting the pattern to Java regex via UDFLike method, I found
   out that not all regexes appear to work in DN. .*foo seems to work
   but anything complex (such as escaping the pattern using
   Pattern.quote, which UDFLike does) breaks and no longer matches
   properly.
   3) I tried to implement common cases using JDO methods
   startsWith/endsWith/indexOf (I will file a JIRA), but when I run tests
   on Derby, they also appear to have problems with some strings (for
   example, partition with backslash in the name cannot be matched by
   LIKE %\% (single backslash in a string), after being converted to
   .indexOf(param) where param is \ (escaping the backslash once again
   doesn't work either, and anyway there's no documented reason why it
   shouldn't work properly), while other characters match correctly, even
   e.g. %.
  
    For tables, there's no SQL LIKE; it expects a Java regex, but I am not
    convinced all Java regexes are going to work.
  
   So, I think that for future correctness sake it's better to remove this
   code.
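Point 1 above is the core mismatch: the client sends SQL patterns like %foo%, while JDOQL's matches() expects a Java regex. A sketch of the conversion (a hypothetical helper, not the actual UDFLike code) shows both the translation and why escaping of literals matters:

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern to a regex: '%' -> '.*',
    '_' -> '.', everything else escaped as a literal."""
    out = []
    for ch in pattern:
        if ch == '%':
            out.append('.*')
        elif ch == '_':
            out.append('.')
        else:
            out.append(re.escape(ch))  # literals (incl. backslash) must be escaped
    return ''.join(out)

# The SQL pattern itself is not usable as a regex:
assert re.fullmatch(like_to_regex('%foo%'), 'xxfooyy')
assert not re.fullmatch('%foo%', 'xxfooyy')
# The backslash case from point 3, which tripped up the JDO path:
assert re.fullmatch(like_to_regex('%\\%'), 'a\\b')
```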
  
 
  --
  CONFIDENTIALITY NOTICE
  NOTICE: This message is intended for the use of the individual or entity
 to
  which it is addressed and may contain information that is confidential,
  privileged and exempt from disclosure under applicable law. If the reader
  of this message is not the intended recipient, you are hereby notified
 that
  any printing, copying, dissemination, distribution, disclosure or
  forwarding of this communication is strictly prohibited. If you have
  received this communication in error, please contact the sender
 immediately
  and delete it from your system. Thank You.
 




[jira] [Updated] (HIVE-5128) Direct SQL for view is failing

2013-08-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5128:
---

Status: Patch Available  (was: Open)

 Direct SQL for view is failing 
 ---

 Key: HIVE-5128
 URL: https://issues.apache.org/jira/browse/HIVE-5128
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Sergey Shelukhin
Priority: Trivial
 Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch


 I cannot be sure of this, but it happens when dropping views (it falls back 
 to JPA and works fine):
 {noformat}
 metastore.ObjectStore: Direct SQL failed, falling back to ORM
 MetaException(message:Unexpected null for one of the IDs, SD null, column 
 null, serde null)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195)
   at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98)
   at 
 org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758)
 ...
 {noformat}
 Should it be disabled for views, or can it be fixed?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5154) Remove unnecessary array creation in ReduceSinkOperator

2013-08-26 Thread Navis (JIRA)
Navis created HIVE-5154:
---

 Summary: Remove unnecessary array creation in ReduceSinkOperator
 Key: HIVE-5154
 URL: https://issues.apache.org/jira/browse/HIVE-5154
 Project: Hive
  Issue Type: Task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial


A key array is created for each row, which seems unnecessary.
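The fix amounts to hoisting the allocation out of the per-row path. A sketch of the pattern (hypothetical names, not the actual ReduceSinkOperator code):

```python
class KeyEvaluator:
    """Per-row key evaluation; num_keys stands in for the number of
    ReduceSink key expressions."""
    def __init__(self, num_keys: int):
        self.num_keys = num_keys
        self.cached_keys = [None] * num_keys   # allocated once, reused

    def evaluate_per_row(self, row):
        keys = [None] * self.num_keys          # fresh array every row: the waste
        for i in range(self.num_keys):
            keys[i] = row[i]
        return keys

    def evaluate_reusing(self, row):
        for i in range(self.num_keys):         # overwrite the cached array
            self.cached_keys[i] = row[i]
        return self.cached_keys
```

The reusing variant returns the same object on every call, so any consumer that buffers keys across rows must copy them first; the same caveat carries over to reusing arrays in the operator.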

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5154) Remove unnecessary array creation in ReduceSinkOperator

2013-08-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5154:
--

Attachment: HIVE-5154.D12549.1.patch

navis requested code review of HIVE-5154 [jira] Remove unnecessary array 
creation in ReduceSinkOperator.

Reviewers: JIRA

HIVE-5154 Remove unnecessary array creation in ReduceSinkOperator

A key array is created for each row, which seems unnecessary.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12549

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/30117/

To: JIRA, navis


 Remove unnecessary array creation in ReduceSinkOperator
 ---

 Key: HIVE-5154
 URL: https://issues.apache.org/jira/browse/HIVE-5154
 Project: Hive
  Issue Type: Task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-5154.D12549.1.patch


 A key array is created for each row, which seems unnecessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5154) Remove unnecessary array creation in ReduceSinkOperator

2013-08-26 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5154:


Status: Patch Available  (was: Open)

 Remove unnecessary array creation in ReduceSinkOperator
 ---

 Key: HIVE-5154
 URL: https://issues.apache.org/jira/browse/HIVE-5154
 Project: Hive
  Issue Type: Task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-5154.D12549.1.patch


 A key array is created for each row, which seems unnecessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-26 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750945#comment-13750945
 ] 

Phabricator commented on HIVE-4002:
---

navis has commented on the revision HIVE-4002 [jira] Fetch task aggregation 
for simple group by query.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3631 Right. 
I'll fix that.
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6985 It's 
the same thing. I just want to be more consistent.
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582 I need a recursive 
flush method to implement this, like what the init or close methods do. I 
think I've broken something rebasing the patch. Can I ask what query was not 
working with this patch? The test framework seems not to have been working recently.
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:493 Flush 
should be called on all operators in the execution tree for this patch.

REVISION DETAIL
  https://reviews.facebook.net/D8739

To: JIRA, navis
Cc: yhuai


 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
 HIVE-4002.D8739.3.patch


 Aggregation queries with no group-by clause (for example, select count(*) 
 from src) execute final aggregation in a single reduce task. But that is too 
 little work even for a single reducer, because most UDAFs generate just a 
 single row for map-side aggregation. If the final fetch task can aggregate the 
 outputs from the map tasks, the shuffle time can be removed.
 This optimization transforms the operator tree from something like
 TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
 into 
 TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
 With the patch, the time taken for the auto_join_filters.q test was reduced 
 to 6 min (from 10 min).
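The shape of the optimization: each map task emits one partial row (GBY1), and the fetch task merges those partials in place of shuffling them to a single reducer (GBY2). A minimal sketch with count/sum partials, illustrative only and not Hive's operator code:

```python
def map_side_partial(rows):
    """GBY1: a map task reduces its split to a single partial row."""
    cnt, total = 0, 0
    for v in rows:
        cnt += 1
        total += v
    return cnt, total

def fetch_task_final(partials):
    """GBY2, run in the fetch task: merge per-map partials directly,
    so no shuffle to a reduce task is needed."""
    cnt = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return cnt, total

splits = [[1, 2, 3], [4, 5], [6]]             # three map tasks' inputs
partials = [map_side_partial(s) for s in splits]
assert fetch_task_final(partials) == (6, 21)  # count(*) = 6, sum = 21
```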

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4002) Fetch task aggregation for simple group by query

2013-08-26 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4002:
--

Attachment: HIVE-4002.D8739.4.patch

navis updated the revision HIVE-4002 [jira] Fetch task aggregation for simple 
group by query.

  Addressed comments

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D8739

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D8739?vs=38829&id=39063#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchAggregation.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/test/queries/clientpositive/fetch_aggregation.q
  ql/src/test/results/clientpositive/fetch_aggregation.q.out
  ql/src/test/results/compiler/plan/groupby1.q.xml
  ql/src/test/results/compiler/plan/groupby2.q.xml
  ql/src/test/results/compiler/plan/groupby3.q.xml
  ql/src/test/results/compiler/plan/groupby5.q.xml
  serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java

To: JIRA, navis
Cc: yhuai


 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, 
 HIVE-4002.D8739.3.patch, HIVE-4002.D8739.4.patch


 Aggregation queries with no group-by clause (for example, select count(*) 
 from src) execute final aggregation in a single reduce task. But that is too 
 little work even for a single reducer, because most UDAFs generate just a 
 single row for map-side aggregation. If the final fetch task can aggregate the 
 outputs from the map tasks, the shuffle time can be removed.
 This optimization transforms the operator tree from something like
 TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
 into 
 TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
 With the patch, the time taken for the auto_join_filters.q test was reduced 
 to 6 min (from 10 min).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE

2013-08-26 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4375:


Status: Patch Available  (was: Open)

 Single sourced multi insert consists of native and non-native table mixed 
 throws NPE
 

 Key: HIVE-4375
 URL: https://issues.apache.org/jira/browse/HIVE-4375
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch, 
 HIVE-4375.D10329.3.patch


 CREATE TABLE src_x1(key string, value string);
 CREATE TABLE src_x2(key string, value string)
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string");
 explain
 from src a
 insert overwrite table src_x1
 select key,value where a.key > 0 AND a.key < 50
 insert overwrite table src_x2
 select key,value where a.key > 50 AND a.key < 100;
 throws,
 {noformat}
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236)
   at 
 org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2013-08-26 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750955#comment-13750955
 ] 

Phabricator commented on HIVE-3562:
---

ashutoshc has accepted the revision HIVE-3562 [jira] Some limit can be pushed 
down to map stage.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D5967

BRANCH
  HIVE-3562

ARCANIST PROJECT
  hive

To: JIRA, tarball, ashutoshc, navis
Cc: njain


 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, 
 HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, 
 HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, 
 HIVE-3562.D5967.9.patch


 Queries with a limit clause (with a reasonable number), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 produce an operator tree 
 TS-SEL-RS-EXT-LIMIT-FS
 But the LIMIT can be partially calculated in the RS, reducing the size of the shuffle:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE

2013-08-26 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4375:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

 Single sourced multi insert consists of native and non-native table mixed 
 throws NPE
 

 Key: HIVE-4375
 URL: https://issues.apache.org/jira/browse/HIVE-4375
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Navis
Assignee: Navis
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch, 
 HIVE-4375.D10329.3.patch


 CREATE TABLE src_x1(key string, value string);
 CREATE TABLE src_x2(key string, value string)
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string");
 explain
 from src a
 insert overwrite table src_x1
 select key,value where a.key > 0 AND a.key < 50
 insert overwrite table src_x2
 select key,value where a.key > 50 AND a.key < 100;
 throws,
 {noformat}
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236)
   at 
 org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
   at 
 org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
   at 
 org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354)
   at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759)
   at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
   at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

