[jira] [Updated] (HIVE-5126) Make vector expressions serializable.
[ https://issues.apache.org/jira/browse/HIVE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-5126:
---------------------------------------
    Attachment: HIVE-5126.1.patch

> Make vector expressions serializable.
> -------------------------------------
>                 Key: HIVE-5126
>                 URL: https://issues.apache.org/jira/browse/HIVE-5126
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: HIVE-5126.1.patch
>
> We should make all vectorized expressions serializable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
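The HIVE-5126 change above asks that vector expressions implement java.io.Serializable, so a plan vectorized at compile time can be shipped to map tasks. As a rough illustration of what that entails (the class name and field below are hypothetical, not Hive's actual API), a Serializable expression must survive an ObjectOutputStream/ObjectInputStream round trip:

```java
import java.io.*;

// Hypothetical minimal expression class; the real Hive vector expression
// classes differ. This only illustrates what "serializable" requires:
// implement Serializable and pin a serialVersionUID.
class ConstantExpression implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long value;
    ConstantExpression(long value) { this.value = value; }
    long getValue() { return value; }
}

public class RoundTrip {
    public static void main(String[] args) throws Exception {
        ConstantExpression expr = new ConstantExpression(42L);
        // Serialize to bytes, as a compiled plan would be before shipping.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(expr);
        oos.flush();
        // Deserialize, as the map task would on receipt.
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        ConstantExpression copy = (ConstantExpression) in.readObject();
        System.out.println(copy.getValue()); // prints 42
    }
}
```

Any expression holding a non-serializable field (a stream, a cached writable, etc.) would need that field marked transient and rebuilt after deserialization.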
[jira] [Updated] (HIVE-5146) FilterExprOrExpr changes the order of the rows
[ https://issues.apache.org/jira/browse/HIVE-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-5146:
---------------------------------------
    Status: Patch Available  (was: Open)

> FilterExprOrExpr changes the order of the rows
> ----------------------------------------------
>                 Key: HIVE-5146
>                 URL: https://issues.apache.org/jira/browse/HIVE-5146
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: HIVE-5146.1.patch, HIVE-5146.2.patch
>
> FilterExprOrExpr changes the order of the rows, which might break UDFs that assume an order in the data.
[jira] [Updated] (HIVE-5126) Make vector expressions serializable.
[ https://issues.apache.org/jira/browse/HIVE-5126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-5126:
---------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.
[ https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-4959:
---------------------------------------
    Attachment: HIVE-4959.1.patch

Patch uploaded.

> Vectorized plan generation should be added as an optimization transform.
> ------------------------------------------------------------------------
>                 Key: HIVE-4959
>                 URL: https://issues.apache.org/jira/browse/HIVE-4959
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: HIVE-4959.1.patch
>
> Currently the query plan is vectorized at query run time in the map task. It will be much cleaner to add vectorization as an optimization step.
[jira] [Updated] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.
[ https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HIVE-4959:
---------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Commented] (HIVE-4959) Vectorized plan generation should be added as an optimization transform.
[ https://issues.apache.org/jira/browse/HIVE-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749867#comment-13749867 ]

Jitendra Nath Pandey commented on HIVE-4959:
--------------------------------------------

This jira requires all vector expressions to be serializable (HIVE-5126). It also requires HIVE-5146 for some of the tests to work.
[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-3562:
------------------------------
    Attachment: HIVE-3562.D5967.7.patch

navis updated the revision "HIVE-3562 [jira] Some limit can be pushed down to map stage".

  Addressed some comments

Reviewers: ashutoshc, JIRA, tarball

REVISION DETAIL
  https://reviews.facebook.net/D5967

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D5967?vs=38379&id=39009#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/build.xml
  ql/ivy.xml
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java
  ql/src/test/queries/clientpositive/limit_pushdown.q
  ql/src/test/queries/clientpositive/limit_pushdown_negative.q
  ql/src/test/results/clientpositive/limit_pushdown.q.out
  ql/src/test/results/clientpositive/limit_pushdown_negative.q.out

To: JIRA, tarball, ashutoshc, navis
Cc: njain

> Some limit can be pushed down to map stage
> ------------------------------------------
>                 Key: HIVE-3562
>                 URL: https://issues.apache.org/jira/browse/HIVE-3562
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Trivial
>         Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch
>
> A query with a LIMIT clause (with a reasonable number), for example
> {noformat}
> select * from src order by key limit 10;
> {noformat}
> makes the operator tree TS-SEL-RS-EXT-LIMIT-FS. But the LIMIT can be partially calculated in RS, reducing the size of the shuffle: TS-SEL-RS(TOP-N)-EXT-LIMIT-FS
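The RS(TOP-N) idea above can be sketched with a bounded heap: the map side retains only the N smallest keys, so the shuffle never carries more than N rows per mapper, and the reducer still produces the final order. This is an illustrative sketch, not Hive's actual TopNHash implementation:

```java
import java.util.*;

// Illustrative map-side top-N (not Hive's TopNHash): keep at most `n`
// smallest keys so the shuffle carries at most n rows per mapper.
class TopN {
    static List<Integer> topN(Iterable<Integer> keys, int n) {
        // Max-heap bounded at size n: the head is the largest retained key
        // and is evicted whenever a smaller key arrives.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int k : keys) {
            heap.offer(k);
            if (heap.size() > n) heap.poll(); // drop the current largest
        }
        List<Integer> out = new ArrayList<>(heap);
        Collections.sort(out); // final ordering is still the reducer's job
        return out;
    }

    public static void main(String[] args) {
        System.out.println(topN(Arrays.asList(5, 1, 9, 3, 7), 3)); // prints [1, 3, 5]
    }
}
```

The correctness argument is the same as for the real optimization: dropping a key that is already larger than n retained keys cannot change the first n rows of the final sorted output.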
[jira] [Updated] (HIVE-5132) Can't access to hwi due to No Java compiler available
[ https://issues.apache.org/jira/browse/HIVE-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Li updated HIVE-5132:
--------------------------
    Assignee: Bing Li

> Can't access to hwi due to No Java compiler available
> -----------------------------------------------------
>                 Key: HIVE-5132
>                 URL: https://issues.apache.org/jira/browse/HIVE-5132
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.11.0
>         Environment: JDK1.6, hadoop 2.0.4-alpha
>            Reporter: Bing Li
>            Assignee: Bing Li
>            Priority: Critical
>
> I want to use hwi to submit hive queries, but after starting hwi successfully, I can't open its web page. I noticed that someone also met the same issue in hive-0.10.
>
> Reproduce steps:
> 1. Start hwi:
>      bin/hive --config $HIVE_CONF_DIR --service hwi
> 2. Access http://hive_hwi_node:/hwi via a browser; it returns the following error message:
>
> HTTP ERROR 500
> Problem accessing /hwi/. Reason: No Java compiler available
> Caused by: java.lang.IllegalStateException: No Java compiler available
>     at org.apache.jasper.JspCompilationContext.createCompiler(JspCompilationContext.java:225)
>     at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:560)
>     at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:299)
>     at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315)
>     at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>     at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
>     at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
>     at org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:503)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.Server.handle(Server.java:326)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
[jira] [Commented] (HIVE-5132) Can't access to hwi due to No Java compiler available
[ https://issues.apache.org/jira/browse/HIVE-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749963#comment-13749963 ]

Bing Li commented on HIVE-5132:
-------------------------------

The root cause of this failure is that ANT_LIB is NOT set for the hwi server. I can resolve the failure by copying the following two ant jars into $HIVE_HOME/lib:
- ant-launcher.jar
- ant.jar

I think we can add ant as a runtime dependency of hive.
[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-3562:
------------------------------
    Attachment: HIVE-3562.D5967.8.patch

navis updated the revision "HIVE-3562 [jira] Some limit can be pushed down to map stage".

  Missed ASF header
  Removed unnecessary array creation in RS

Reviewers: ashutoshc, JIRA, tarball

REVISION DETAIL
  https://reviews.facebook.net/D5967

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D5967?vs=39009&id=39015#toc

To: JIRA, tarball, ashutoshc, navis
Cc: njain
[jira] [Assigned] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis reassigned HIVE-5147:
---------------------------
    Assignee: Navis

> Newly added test TestSessionHooks is failing on trunk
> -----------------------------------------------------
>                 Key: HIVE-5147
>                 URL: https://issues.apache.org/jira/browse/HIVE-5147
>             Project: Hive
>          Issue Type: Test
>          Components: Tests
>    Affects Versions: 0.12.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Navis
>
> This was recently added via HIVE-4588
[jira] [Commented] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750010#comment-13750010 ]

Navis commented on HIVE-5147:
-----------------------------

Sorry, I committed the wrong version of the patch, one that I had modified. I'll roll that back.
[jira] [Updated] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-5147:
------------------------------
    Attachment: HIVE-5147.D12543.1.patch

navis requested code review of "HIVE-5147 [jira] Newly added test TestSessionHooks is failing on trunk".

Reviewers: JIRA

  HIVE-5147 Newly added test TestSessionHooks is failing on trunk

  This was recently added via HIVE-4588

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12543

AFFECTED FILES
  service/src/java/org/apache/hive/service/cli/session/HiveSessionHookContext.java
  service/src/java/org/apache/hive/service/cli/session/HiveSessionHookContextImpl.java
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/30087/

To: JIRA, navis
Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases
Forgot to add in my last reply: to generate correct results, you can set hive.optimize.reducededuplication to false to turn off ReduceSinkDeDuplication.

On Sun, Aug 25, 2013 at 9:35 PM, Yin Huai huaiyin@gmail.com wrote:

  Created a jira: https://issues.apache.org/jira/browse/HIVE-5149

On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai huaiyin@gmail.com wrote:

  Seems ReduceSinkDeDuplication picked the wrong partitioning columns.

On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP s...@rocketfuel.com wrote:

  I think the problem lies within the group by operation. For this optimization to work, the group by's partitioning should be on column 1 only. It won't affect the correctness of the group by; it can make it slower, but in this case it will speed up the overall query.

On Fri, Aug 23, 2013 at 5:55 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote:

  I have attached the hive 10 and 11 query plans, for the sample query below, for illustration.

On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote:

  Hi,

  We are using DISTRIBUTE BY with custom reducer scripts in our query workload. After upgrading to Hive 0.11, queries with GROUP BY/DISTRIBUTE BY/SORT BY and custom reducer scripts produced incorrect results. In particular, rows with the same value in the DISTRIBUTE BY column end up in multiple reducers and thus produce multiple rows in the final result, when we expect only one.

  I investigated a little and discovered the following behavior in Hive 0.11:

  - Hive 0.11 produces a different plan for these queries, with incorrect results. The extra stage for the DISTRIBUTE BY + Transform is missing, and the Transform operator for the custom reducer script is pushed into the reduce operator tree containing the GROUP BY itself.
  - However, *if the SORT BY in the query has a DESC order in it*, the right plan is produced, and the results look correct too.

  Hive 0.10 produces the expected plan with the right results in all cases.

  To illustrate, here is a simplified repro setup.

  Table:

    CREATE TABLE test_cluster (grp STRING, val1 STRING, val2 INT, val3 STRING, val4 INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

  Query:

    ADD FILE reducer.py;

    FROM (
      SELECT grp, val2
      FROM test_cluster
      GROUP BY grp, val2
      DISTRIBUTE BY grp
      SORT BY grp, val2  -- add DESC here to get correct results
    ) a
    REDUCE a.*
    USING 'reducer.py'
    AS grp, reducedValue

  If I understand correctly, this is a bug. Is this a known issue? Any other insights? We have reverted to Hive 0.10 to avoid the incorrect results while we investigate this. I have the repro sample, with test data and scripts, if anybody is interested.

  Thanks,
  pala
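The bug discussed in this thread comes down to which columns the ReduceSink partitions on. DISTRIBUTE BY relies on the contract that rows with equal keys hash to the same reducer; if the optimizer substitutes different partitioning columns, equal keys get split across reducers and the custom reducer script sees partial groups. A minimal sketch of that contract, mirroring the non-negative-hash-mod scheme of Hadoop's default HashPartitioner (simplified here, not Hive's actual code):

```java
// Sketch of the partitioning contract that DISTRIBUTE BY depends on:
// the reducer index is a pure function of the key, so every row with
// the same key lands on the same reducer.
class Partitioner {
    static int reducerFor(String distributeKey, int numReducers) {
        // Mask the sign bit so the result is non-negative, then mod by
        // the reducer count, as Hadoop's default HashPartitioner does.
        return (distributeKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```

With this contract intact, a reducer script can safely assume it sees every row for a given DISTRIBUTE BY key; the bug above breaks exactly that assumption.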
[jira] [Updated] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE
[ https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-4375:
------------------------------
    Attachment: HIVE-4375.D10329.3.patch

navis updated the revision "HIVE-4375 [jira] Single sourced multi insert consists of native and non-native table mixed throws NPE".

  Missed to update this

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D10329

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D10329?vs=32775&id=39027#toc

BRANCH
  HIVE-4375

ARCANIST PROJECT
  hive

AFFECTED FILES
  hbase-handler/src/test/queries/positive/hbase_single_sourced_multi_insert.q
  hbase-handler/src/test/results/positive/hbase_single_sourced_multi_insert.q.out
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java

To: JIRA, ashutoshc, navis
Cc: njain

> Single sourced multi insert consists of native and non-native table mixed throws NPE
> ------------------------------------------------------------------------------------
>                 Key: HIVE-4375
>                 URL: https://issues.apache.org/jira/browse/HIVE-4375
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.11.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch, HIVE-4375.D10329.3.patch
>
> CREATE TABLE src_x1(key string, value string);
> CREATE TABLE src_x2(key string, value string)
>   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>   WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string");
>
> explain
> from src a
> insert overwrite table src_x1 select key,value where a.key > 0 AND a.key < 50
> insert overwrite table src_x2 select key,value where a.key > 50 AND a.key < 100;
>
> throws,
> {noformat}
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236)
>     at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126)
>     at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
>     at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
>     at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55)
>     at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
>     at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
>     at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67)
>     at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> {noformat}
How to validate data type in Hive
Hi,

I have a requirement to validate the data types of the values present in my flat file (which is the source for my hive table). I am unable to find any hive feature/function that would do that. Is there any way to validate the data types of the values present in the underlying file? Something like BCP (Bulk Copy Program), used in SQL Server.

Please reply; my whole project is stuck due to this issue.

Thanks,
Puneet
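Hive does not reject ill-typed values when loading a flat file (a value that fails to parse as the declared column type is simply read back as NULL), so one approach to the question above is to pre-validate the file before loading. A minimal sketch, assuming a comma-delimited file whose second column is declared INT (the column layout here is hypothetical, not from the original question):

```java
import java.util.*;

// Pre-load validation sketch for a comma-delimited flat file: flag rows
// whose second field does not parse as an int, before LOADing into Hive.
class TypeCheck {
    // Returns the 0-based indices of rows whose second field is not an int.
    static List<Integer> badIntRows(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // -1 keeps trailing empty fields so short rows fail loudly too.
            String[] fields = lines.get(i).split(",", -1);
            try {
                Integer.parseInt(fields[1].trim());
            } catch (RuntimeException e) { // missing field or non-numeric value
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println(badIntRows(Arrays.asList("a,1", "b,x", "c,3"))); // prints [1]
    }
}
```

The same pattern extends to other declared types (dates, decimals) by swapping the parse call; an alternative is to load into an all-STRING staging table and count rows where CAST(col AS INT) IS NULL.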
[jira] [Updated] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()
[ https://issues.apache.org/jira/browse/HIVE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated HIVE-5100:
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 0.12.0
           Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Gopal!

> RCFile::sync(long) missing 1 byte in System.arraycopy()
> -------------------------------------------------------
>                 Key: HIVE-5100
>                 URL: https://issues.apache.org/jira/browse/HIVE-5100
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: tagus wang
>            Assignee: Gopal V
>              Labels: regression
>             Fix For: 0.12.0
>         Attachments: HIVE-5100-001.patch, HIVE-5100.01.patch
>
> There is a bug in this line:
>   System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix);
> It should be:
>   System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix);
> The copy is missing 1 byte at the end.
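The one-byte error fixed above is easy to see in isolation: to retain the last `prefix` bytes of a buffer, the copy must start at `buffer.length - prefix`; starting one byte earlier shifts the window and silently drops the final byte. A small demonstration (not the actual RCFile code):

```java
import java.util.Arrays;

// Demonstrates the off-by-one fixed in HIVE-5100. With buggy=true the
// source offset is buffer.length - prefix - 1, so the copied window is
// shifted one byte early and the last byte of the buffer is lost.
class SyncCopy {
    static byte[] lastBytes(byte[] buffer, int prefix, boolean buggy) {
        byte[] out = new byte[prefix];
        int src = buffer.length - prefix - (buggy ? 1 : 0);
        System.arraycopy(buffer, src, out, 0, prefix);
        return out;
    }

    public static void main(String[] args) {
        byte[] buf = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(lastBytes(buf, 3, false))); // prints [3, 4, 5]
        System.out.println(Arrays.toString(lastBytes(buf, 3, true)));  // prints [2, 3, 4] -- byte 5 lost
    }
}
```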
[jira] [Commented] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()
[ https://issues.apache.org/jira/browse/HIVE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750120#comment-13750120 ] Hudson commented on HIVE-5100: -- FAILURE: Integrated in Hive-trunk-hadoop2 #382 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/382/]) HIVE-5100 : RCFile::sync(long) missing 1 byte in System.arraycopy() (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517547) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java RCFile::sync(long) missing 1 byte in System.arraycopy() - Key: HIVE-5100 URL: https://issues.apache.org/jira/browse/HIVE-5100 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: tagus wang Assignee: Gopal V Labels: regression Fix For: 0.12.0 Attachments: HIVE-5100-001.patch, HIVE-5100.01.patch this has a bug in this: System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix); it should be System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix); it is missing 1 byte at the end. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5147) Newly added test TestSessionHooks is failing on trunk
[ https://issues.apache.org/jira/browse/HIVE-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750115#comment-13750115 ] Phabricator commented on HIVE-5147: --- ashutoshc has accepted the revision HIVE-5147 [jira] Newly added test TestSessionHooks is failing on trunk. +1 REVISION DETAIL https://reviews.facebook.net/D12543 BRANCH HIVE-5147 ARCANIST PROJECT hive To: JIRA, ashutoshc, navis Newly added test TestSessionHooks is failing on trunk - Key: HIVE-5147 URL: https://issues.apache.org/jira/browse/HIVE-5147 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.12.0 Reporter: Ashutosh Chauhan Assignee: Navis Attachments: HIVE-5147.D12543.1.patch This was recently added via HIVE-4588 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-5144: -- Attachment: HIVE-5144.02.patch Bad merge in patch. {code} -if((hasFilter(alias) && joinFilters[alias].size() > 0) || joinValues[alias] +if((hasFilter(alias) && filterMaps[alias].length > 0) || joinValues[alias]. {code} The check is supposed to be on filterMaps, not joinFilters. This fixes test failures found in the last run. HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead Key: HIVE-5144 URL: https://issues.apache.org/jira/browse/HIVE-5144 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC + -Xmx512m client opts Reporter: Gopal V Assignee: Gopal V Priority: Minor Labels: perfomance Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch The map-join hashtable sink in the local-task creates an in-memory hashtable with the following code. {code} Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias], ... MapJoinRowContainer rowContainer = tableContainer.get(key); if (rowContainer == null) { rowContainer = new MapJoinRowContainer(); rowContainer.add(value); {code} But for a query where joinValues[alias].size() == 0, this results in a large number of unnecessary allocations, which would be better served with a copy-on-write default value container: a pre-allocated zero-length object array, which is immutable (the only immutable array there is in Java). The query tested is roughly the following, to scan all of customer_demographics in the hash-sink: {code} select c_salutation, count(1) from customer JOIN customer_demographics ON customer.c_current_cdemo_sk = customer_demographics.cd_demo_sk group by c_salutation limit 10 ; {code} When running with current trunk, the code results in an OOM with 512MB of RAM. 
{code} 2013-08-23 05:11:26 Processing rows:140 Hashtable size: 139 Memory usage: 292418944 percentage: 0.579 Execution failed with exit status: 3 Obtaining error information {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
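The fix the issue describes (not Hive's actual patch; class and method names below are hypothetical) boils down to a classic allocation pattern: a zero-length array is effectively immutable, so a single static instance can be shared by every row instead of allocating a fresh empty `Object[]` each time.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "static emptyRow" idea from HIVE-5144 (hypothetical names,
// not the actual Hive patch): when a map-join emits no value columns,
// every row can share one static zero-length array.
public class EmptyRowSketch {
    // Zero-length arrays have no mutable elements, so one instance can be
    // shared safely across all rows (and threads).
    static final Object[] EMPTY_ROW = new Object[0];

    static Object[] computeValues(int valueColumnCount) {
        if (valueColumnCount == 0) {
            return EMPTY_ROW;                    // shared, no allocation
        }
        return new Object[valueColumnCount];     // allocate only when needed
    }

    public static void main(String[] args) {
        List<Object[]> container = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            container.add(computeValues(0));     // a million rows, one array object
        }
        System.out.println(container.get(0) == container.get(999_999)); // true
    }
}
```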
[jira] [Updated] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-5144: -- Status: Patch Available (was: Open) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead Key: HIVE-5144 URL: https://issues.apache.org/jira/browse/HIVE-5144 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC + -Xmx512m client opts Reporter: Gopal V Assignee: Gopal V Priority: Minor Labels: perfomance Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch The map-join hashtable sink in the local-task creates an in-memory hashtable with the following code. {code} Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias], ... MapJoinRowContainer rowContainer = tableContainer.get(key); if (rowContainer == null) { rowContainer = new MapJoinRowContainer(); rowContainer.add(value); {code} But for a query where the joinValues[alias].size() == 0, this results in a large number of unnecessary allocations which would be better served with a copy-on-write default value container a pre-allocated zero object array which is immutable (the only immutable array there is in java). The query tested is roughly the following to scan all of customer_demographics in the hash-sink {code} select c_salutation, count(1) from customer JOIN customer_demographics ON customer.c_current_cdemo_sk = customer_demographics.cd_demo_sk group by c_salutation limit 10 ; {code} When running with current trunk, the code results in an OOM with 512Mb ram. {code} 2013-08-23 05:11:26 Processing rows:140 Hashtable size: 139 Memory usage: 292418944 percentage: 0.579 Execution failed with exit status: 3 Obtaining error information {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5144) HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead
[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750182#comment-13750182 ] Ashutosh Chauhan commented on HIVE-5144: +1 HashTableSink allocates empty new Object[] arrays OOMs - use a static emptyRow instead Key: HIVE-5144 URL: https://issues.apache.org/jira/browse/HIVE-5144 Project: Hive Issue Type: Bug Components: Query Processor Environment: Ubuntu LXC + -Xmx512m client opts Reporter: Gopal V Assignee: Gopal V Priority: Minor Labels: perfomance Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch The map-join hashtable sink in the local-task creates an in-memory hashtable with the following code. {code} Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias], ... MapJoinRowContainer rowContainer = tableContainer.get(key); if (rowContainer == null) { rowContainer = new MapJoinRowContainer(); rowContainer.add(value); {code} But for a query where the joinValues[alias].size() == 0, this results in a large number of unnecessary allocations which would be better served with a copy-on-write default value container a pre-allocated zero object array which is immutable (the only immutable array there is in java). The query tested is roughly the following to scan all of customer_demographics in the hash-sink {code} select c_salutation, count(1) from customer JOIN customer_demographics ON customer.c_current_cdemo_sk = customer_demographics.cd_demo_sk group by c_salutation limit 10 ; {code} When running with current trunk, the code results in an OOM with 512Mb ram. {code} 2013-08-23 05:11:26 Processing rows:140 Hashtable size: 139 Memory usage: 292418944 percentage: 0.579 Execution failed with exit status: 3 Obtaining error information {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()
[ https://issues.apache.org/jira/browse/HIVE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750194#comment-13750194 ] Hudson commented on HIVE-5100: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #70 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/70/]) HIVE-5100 : RCFile::sync(long) missing 1 byte in System.arraycopy() (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517547) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java RCFile::sync(long) missing 1 byte in System.arraycopy() - Key: HIVE-5100 URL: https://issues.apache.org/jira/browse/HIVE-5100 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: tagus wang Assignee: Gopal V Labels: regression Fix For: 0.12.0 Attachments: HIVE-5100-001.patch, HIVE-5100.01.patch this has a bug in this: System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix); it should be System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix); it is missing 1 byte at the end. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5132) Can't access to hwi due to No Java compiler available
[ https://issues.apache.org/jira/browse/HIVE-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-5132: -- Attachment: HIVE-5132-01.patch Add ant.jar and ant-launcher.jar as runtime dependencies of Hive. Can't access to hwi due to No Java compiler available --- Key: HIVE-5132 URL: https://issues.apache.org/jira/browse/HIVE-5132 Project: Hive Issue Type: Bug Affects Versions: 0.10.0, 0.11.0 Environment: JDK1.6, hadoop 2.0.4-alpha Reporter: Bing Li Assignee: Bing Li Priority: Critical Attachments: HIVE-5132-01.patch I want to use hwi to submit hive queries, but after starting hwi successfully, I can't open its web page. I noticed that someone also met the same issue in hive-0.10. Reproduce steps: -- 1. start hwi bin/hive --config $HIVE_CONF_DIR --service hwi 2. access http://hive_hwi_node:/hwi via browser; got the following error message: HTTP ERROR 500 Problem accessing /hwi/. Reason: No Java compiler available Caused by: java.lang.IllegalStateException: No Java compiler available at org.apache.jasper.JspCompilationContext.createCompiler(JspCompilationContext.java:225) at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:560) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:299) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at 
org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126) at org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:503) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5132) Can't access to hwi due to No Java compiler available
[ https://issues.apache.org/jira/browse/HIVE-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-5132: -- Status: Patch Available (was: Open) The patch is generated against the latest trunk. Can't access to hwi due to No Java compiler available --- Key: HIVE-5132 URL: https://issues.apache.org/jira/browse/HIVE-5132 Project: Hive Issue Type: Bug Affects Versions: 0.11.0, 0.10.0 Environment: JDK1.6, hadoop 2.0.4-alpha Reporter: Bing Li Assignee: Bing Li Priority: Critical Attachments: HIVE-5132-01.patch I want to use hwi to submit hive queries, but after start hwi successfully, I can't open the web page of it. I noticed that someone also met the same issue in hive-0.10. Reproduce steps: -- 1. start hwi bin/hive --config $HIVE_CONF_DIR --service hwi 2. access to http://hive_hwi_node:/hwi via browser got the following error message: HTTP ERROR 500 Problem accessing /hwi/. Reason: No Java compiler available Caused by: java.lang.IllegalStateException: No Java compiler available at org.apache.jasper.JspCompilationContext.createCompiler(JspCompilationContext.java:225) at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:560) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:299) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at 
org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126) at org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:503) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()
[ https://issues.apache.org/jira/browse/HIVE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750235#comment-13750235 ] Hudson commented on HIVE-5100: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #138 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/138/]) HIVE-5100 : RCFile::sync(long) missing 1 byte in System.arraycopy() (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517547) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java RCFile::sync(long) missing 1 byte in System.arraycopy() - Key: HIVE-5100 URL: https://issues.apache.org/jira/browse/HIVE-5100 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: tagus wang Assignee: Gopal V Labels: regression Fix For: 0.12.0 Attachments: HIVE-5100-001.patch, HIVE-5100.01.patch this has a bug in this: System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix); it should be System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix); it is missing 1 byte at the end. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4824) make TestWebHCatE2e run w/o requiring installing external hadoop
[ https://issues.apache.org/jira/browse/HIVE-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750226#comment-13750226 ] Eugene Koifman commented on HIVE-4824: -- Another possibility is to just call HCatCli directly from WebHCat - that would simplify the architecture and improve the performance of DDL ops dramatically. One possible issue here is concurrency - Hive code is not completely thread-safe. We could use a new ClassLoader for each call to HCatCli - this would work around the concurrency issues and would still be a good step forward. make TestWebHCatE2e run w/o requiring installing external hadoop Key: HIVE-4824 URL: https://issues.apache.org/jira/browse/HIVE-4824 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Currently WebHCat will use hive/build/dist/hcatalog/bin/hcat to execute DDL commands, which in turn uses the Hadoop jar command. This in turn requires that the HADOOP_HOME env var be defined and point to an existing Hadoop install. Need to see if we can apply the hive/testutils/hadoop idea here to make WebHCat not depend on external hadoop. This will make unit tests better/easier to write and make the dev/test cycle simpler. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
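The ClassLoader-per-call workaround mentioned in the comment can be sketched roughly as follows. This is hypothetical code, not actual WebHCat or HCatalog source: the helper name, and the idea of passing the target jars and entry-point class in, are assumptions for illustration.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch (not WebHCat code) of running each in-process CLI
// invocation in its own ClassLoader, so static state in code that is not
// fully thread-safe is not shared between concurrent calls.
public class FreshLoaderCall {

    // `cp` would be the HCat/Hive jars and `className` the CLI entry point;
    // both are illustrative assumptions.
    static void callMainIsolated(URL[] cp, String className, String[] args)
            throws Exception {
        // Parent null: only bootstrap classes are shared, so the target
        // class and all of its static fields load fresh for this call.
        try (URLClassLoader loader = new URLClassLoader(cp, null)) {
            Class<?> cls = Class.forName(className, true, loader);
            Method main = cls.getMethod("main", String[].class);
            main.invoke(null, (Object) args);
        }
    }

    public static void main(String[] args) throws Exception {
        // Smoke test of the wiring: even with an empty classpath, bootstrap
        // classes still resolve through parent delegation.
        try (URLClassLoader loader = new URLClassLoader(new URL[0], null)) {
            System.out.println(
                Class.forName("java.lang.String", true, loader) == String.class);
        }
    }
}
```

The cost of this approach is re-loading (and re-JIT-compiling) the target classes on every call, which trades throughput for isolation.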
[jira] [Commented] (HIVE-5146) FilterExprOrExpr changes the order of the rows
[ https://issues.apache.org/jira/browse/HIVE-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750248#comment-13750248 ] Tony Murphy commented on HIVE-5146: --- +1 these changes look good. Thanks Jitendra. FilterExprOrExpr changes the order of the rows -- Key: HIVE-5146 URL: https://issues.apache.org/jira/browse/HIVE-5146 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-5146.1.patch, HIVE-5146.2.patch FilterExprOrExpr changes the order of the rows which might break some UDFs that assume an order in data. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
[ https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750264#comment-13750264 ] Prasanth J commented on HIVE-5145: -- Removed the order by from the previous patch and regenerated the golden file. Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 Key: HIVE-5145 URL: https://issues.apache.org/jira/browse/HIVE-5145 Project: Hive Issue Type: Bug Components: Tests Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-5145.2.patch, HIVE-5145.patch there is some non-determinism in the output of the list_bucket_query_multiskew_2.q test case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
[ https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-5145: - Attachment: HIVE-5145.2.patch Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 Key: HIVE-5145 URL: https://issues.apache.org/jira/browse/HIVE-5145 Project: Hive Issue Type: Bug Components: Tests Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-5145.2.patch, HIVE-5145.patch there is some non-determinism in the output of the list_bucket_query_multiskew_2.q test case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Proposing a 0.11.1
Hi folks, Any update on this? We are considering including Hive 0.11* in Bigtop 0.7, and it would be very useful and much appreciated to get a little more context into what the Hive 0.11.1 release would look like. Thanks in advance! Mark

On Tue, Aug 13, 2013 at 9:24 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I am feeling more like we should release a 12.0 rather than backport things into 11.X.

On Wed, Aug 14, 2013 at 12:08 AM, Navis류승우 navis@nexr.com wrote: If this is only for addressing the npath problem, we got three months for that. Would it be enough time for releasing 0.12.0? ps. IMHO, n-path seemed too generic a name to be patented. I hate Teradata.

2013/8/14 Edward Capriolo edlinuxg...@gmail.com: Should we get the npath rename in? Do we have a jira for this? If not I will take it.

On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner wagner.mar...@gmail.com wrote: It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953 has been committed to trunk and it looks like 4789 is close. Thanks, Mark

On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley omal...@apache.org wrote: All, I'd like to create an 0.11.1 with some fixes in it. I plan to put together a release candidate over the next week. I'm in the process of putting together the list of bugs that I want to include, but I wanted to solicit the jiras that others thought would be important for an 0.11.1. Thanks, Owen
[jira] [Commented] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
[ https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750302#comment-13750302 ] Ashutosh Chauhan commented on HIVE-5145: +1 Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 Key: HIVE-5145 URL: https://issues.apache.org/jira/browse/HIVE-5145 Project: Hive Issue Type: Bug Components: Tests Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-5145.2.patch, HIVE-5145.patch there is some non-determinism in the output of the list_bucket_query_multiskew_2.q test case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750364#comment-13750364 ] Francis Liu commented on HIVE-4331: --- {quote} Francis, you're welcome to go ahead and review it as well. I have been looking at it as well, along with testing, although I waited on commenting till I'd finished with the hive-side review. I'll comment on both of them before Monday. {quote} Cool. It'd be good to have a reviewer that looks at both pieces. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. These will now continue to function but internally they will use the DefaultStorageHandler from Hive. They will be removed in future release of Hive. 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's HBaseStorageHandler instead of the HiveHBaseTableOutputFormat. 3) Write new unit tests in the HCat's storagehandler so that systems such as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the HCatHBaseStorageHandler. 4) Make sure all the old and new unit tests pass without backward compatibility (except known issues as described in the Design Document). 5) Replace all instances of the HCat source code, which point to HCatStorageHandler to use theHiveStorageHandler including the FosterStorageHandler. I have attached the design document for the same and will attach a patch to this Jira. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750367#comment-13750367 ] Olga Natkovich commented on HIVE-4331: -- Hi Sushanth, Were you able to review this patch? Thanks! Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. These will now continue to function but internally they will use the DefaultStorageHandler from Hive. They will be removed in future release of Hive. 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's HBaseStorageHandler instead of the HiveHBaseTableOutputFormat. 3) Write new unit tests in the HCat's storagehandler so that systems such as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the HCatHBaseStorageHandler. 4) Make sure all the old and new unit tests pass without backward compatibility (except known issues as described in the Design Document). 5) Replace all instances of the HCat source code, which point to HCatStorageHandler to use theHiveStorageHandler including the FosterStorageHandler. I have attached the design document for the same and will attach a patch to this Jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3603) Enable client-side caching for scans on HBase
[ https://issues.apache.org/jira/browse/HIVE-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750370#comment-13750370 ] Swarnim Kulkarni commented on HIVE-3603: [~appodictic] Thanks! Also how is setting this property different than directly setting the hbase.client.scanner.caching property in hive-site.xml without this enhancement? Wouldn't they have the same effect? Enable client-side caching for scans on HBase - Key: HIVE-3603 URL: https://issues.apache.org/jira/browse/HIVE-3603 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Karthik Ranganathan Assignee: Navis Priority: Minor Fix For: 0.12.0 Attachments: HIVE-3603.D7761.1.patch HBaseHandler sets up a TableInputFormat MR job against HBase to read data in. The underlying implementation (in HBaseHandler.java) makes an RPC call per row-key, which makes it very inefficient. Need to specify a client side cache size on the scan. Note that HBase currently only supports num-rows based caching (no way to specify a memory limit). Created HBASE-6770 to address this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
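For context on why scanner caching matters at all, the arithmetic behind the issue ("an RPC call per row-key") can be shown with a self-contained sketch, with no HBase dependency: with a client-side cache of N rows, a scan of R rows needs ceil(R / N) round trips instead of R.

```java
// Self-contained sketch of what client-side scanner caching changes:
// the number of scanner RPCs needed to stream R rows with a cache of N.
public class ScanCachingMath {
    static long rpcCount(long totalRows, int caching) {
        return (totalRows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(rpcCount(1_000_000, 1));   // one RPC per row
        System.out.println(rpcCount(1_000_000, 500)); // 500 rows per RPC
    }
}
```

The trade-off is memory: the client must buffer all cached rows, which is why the issue notes that HBase (at the time) only supported row-count-based, not memory-based, caching limits.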
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750371#comment-13750371 ] Sushanth Sowmyan commented on HIVE-4331: I left some more comments on the hive patch phabricator review. Apart from a couple of minor code-style comments, the majority of my feeling about this is that I'm okay with what HivePTOF is doing, and it's a good first step, and I will +1 it, but I think that it does not go far enough along to address being able to use any generic MR OutputFormat with hive. That said, I agree that that would not really be possible unless we plumb out how HiveOutputFormat is used itself, and I'm not completely certain people have a need for that. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. These will now continue to function but internally they will use the DefaultStorageHandler from Hive. They will be removed in future release of Hive. 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's HBaseStorageHandler instead of the HiveHBaseTableOutputFormat. 3) Write new unit tests in the HCat's storagehandler so that systems such as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the HCatHBaseStorageHandler. 4) Make sure all the old and new unit tests pass without backward compatibility (except known issues as described in the Design Document). 5) Replace all instances of the HCat source code, which point to HCatStorageHandler to use theHiveStorageHandler including the FosterStorageHandler. 
I have attached the design document for the same and will attach a patch to this Jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5150) UnsatisfiedLinkError when running hive unit tests on Windows
shanyu zhao created HIVE-5150: - Summary: UnsatisfiedLinkError when running hive unit tests on Windows Key: HIVE-5150 URL: https://issues.apache.org/jira/browse/HIVE-5150 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 0.11.0 Environment: Windows Reporter: shanyu zhao When running any Hive unit tests against Hadoop 2.0, they fail with an error like this: [junit] Exception in thread main java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z [junit] at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method) [junit] at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:423) [junit] at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:933) [junit] at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:177) [junit] at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:164) This is because the test process failed to find hadoop.dll. This is related to YARN-729. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
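A quick way to see why the link fails is to inspect the JVM's native library search path: System.loadLibrary only looks on java.library.path, so if the directory containing hadoop.dll is absent, the NativeIO$Windows native methods never link. This is a minimal diagnostic sketch; the %HADOOP_HOME%\bin location mentioned in the comment is an assumption about a typical Hadoop-on-Windows layout, not something stated in the report.

```java
public class NativePathCheck {
    public static void main(String[] args) {
        // System.loadLibrary searches only java.library.path; on Windows the
        // directory holding hadoop.dll (commonly %HADOOP_HOME%\bin) must be
        // on it, or calls into NativeIO$Windows throw UnsatisfiedLinkError.
        String libPath = System.getProperty("java.library.path");
        System.out.println("java.library.path = " + libPath);
    }
}
```

A common workaround is to fork the test JVM with -Djava.library.path pointing at that directory, or to add the directory to PATH before the tests start.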
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750398#comment-13750398 ] Phabricator commented on HIVE-3562: --- ashutoshc has commented on the revision HIVE-3562 [jira] Some limit can be pushed down to map stage. Looks pretty good. Just requesting a few more comments. INLINE COMMENTS conf/hive-default.xml.template:1586-1590 We can remove this now. ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:1186 Just to add more clarity, say something like: we can push the limit above the GBY (running in the Reducer), since that will generate a single row for each group. This doesn't necessarily hold for a GBY running in Mappers, so we don't push the limit above it. ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:182 It would be good to add a comment about what this field holds. Add a comment saying: this two-dimensional array holds key data and a corresponding Union object which contains the tag identifying the aggregate expression for distinct columns. Ideally, instead of this 2-D array, we should probably have enhanced the HiveKey class for this logic. We should do that in a follow-up jira. ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:267 I didn't follow this logic completely. It seems like this is an optimization to avoid evaluating the union object repeatedly and doing a system copy for it. Can you add a comment explaining this? ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:271 It seems like it will be null only when i = 0. If so, better to do an if (i == 0) check? Also add a comment about when this will be null and when it will be non-null. ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java:260 Did you change this section because you found a bug, or is this purely refactoring? If you hit a bug, can you explain what it was? 
ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:50-51 It would be good to add a comment about what these 2D arrays hold. ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:52 Also, add a comment saying this array holds the hash codes for the keys, and a note that the indices of all these arrays must line up. ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:82 Nice comments! ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:34 It would be good to add a javadoc for this class. ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java:36 Also a javadoc for this interface. REVISION DETAIL https://reviews.facebook.net/D5967 To: JIRA, tarball, ashutoshc, navis Cc: njain Some limit can be pushed down to map stage -- Key: HIVE-3562 URL: https://issues.apache.org/jira/browse/HIVE-3562 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch Queries with a limit clause (with a reasonable number), for example {noformat} select * from src order by key limit 10; {noformat} produce the operator tree TS-SEL-RS-EXT-LIMIT-FS, but the LIMIT can be partially calculated in the RS, reducing the size of the shuffle: TS-SEL-RS(TOP-N)-EXT-LIMIT-FS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
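The map-side top-N pushdown discussed above can be sketched in miniature: the ReduceSink side keeps at most N keys, so any row whose key cannot enter the current top N is dropped before it is shuffled. This is an illustrative sketch under an ascending-sort assumption, not Hive's actual TopNHash class; the names here are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class TopNSketch {
    private final int n;
    // Max-heap over the retained keys: the head is the largest key still in
    // the top N and is evicted when a smaller key arrives.
    private final PriorityQueue<Integer> heap;

    public TopNSketch(int n) {
        this.n = n;
        this.heap = new PriorityQueue<>(n, Comparator.reverseOrder());
    }

    // Returns true if the row's key survives the top-N filter and should be
    // forwarded to the reducer; false means the row can be dropped map-side.
    public boolean offer(int key) {
        if (heap.size() < n) {
            heap.add(key);
            return true;
        }
        if (key < heap.peek()) {
            heap.poll();
            heap.add(key);
            return true;
        }
        return false;
    }
}
```

For a query like "order by key limit 10", only rows currently in the top 10 ever reach the shuffle, which is the TS-SEL-RS(TOP-N) shape described in the issue.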
[jira] [Updated] (HIVE-5150) UnsatisfiedLinkError when running hive unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated HIVE-5150: -- Attachment: HIVE-5150.patch Patch attached. UnsatisfiedLinkError when running hive unit tests on Windows Key: HIVE-5150 URL: https://issues.apache.org/jira/browse/HIVE-5150 Project: Hive Issue Type: Bug Components: Testing Infrastructure Affects Versions: 0.11.0 Environment: Windows Reporter: shanyu zhao Attachments: HIVE-5150.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750396#comment-13750396 ] Sushanth Sowmyan commented on HIVE-4331: I would love to see SH become a first class entry in hive, and HOF be a kind of SH, leading to its eventual removal. That's precisely what my long-term goal for this is. and I'm not completely certain people have a need for that By this bit, I meant that I wasn't sure people had a need for doing away with HOF (other than for code-cleanliness, which is why I would like to see it gone) being able to use any generic OF with hive - most of the usecases of traditional M/R OFs are already covered by hive, or for newer formats being developed, the OF writer winds up making changes so that it is hive compatible, such as with orc, or with the HBase SH. So unless there were a major push to see a BlahOutputFormat that is widely used, but was not already usable from within Hive, I don't see there being a necessity case for a change in hive that I want. :) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases
Thanks for following up Yin. We realized later this was due to the reduce deduplication optimization, and found turning off the flag avoids the issue. -pala On Mon, Aug 26, 2013 at 4:40 AM, Yin Huai huaiyin@gmail.com wrote: forgot to add in my last reply To generate correct results, you can set hive.optimize.reducededuplication to false to turn off ReduceSinkDeDuplication On Sun, Aug 25, 2013 at 9:35 PM, Yin Huai huaiyin@gmail.com wrote: Created a jira https://issues.apache.org/jira/browse/HIVE-5149 On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai huaiyin@gmail.com wrote: Seems ReduceSinkDeDuplication picked the wrong partitioning columns. On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP s...@rocketfuel.com wrote: I think the problem lies within the group by operation. For this optimization to work, the GROUP BY's partitioning should be on column 1 only. It won't affect the correctness of the GROUP BY; it can make it slower, but in this case it speeds up the overall query. On Fri, Aug 23, 2013 at 5:55 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote: I have attached the hive 10 and 11 query plans, for the sample query below, for illustration. On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote: Hi, We are using DISTRIBUTE BY with custom reducer scripts in our query workload. After upgrade to Hive 0.11, queries with GROUP BY/DISTRIBUTE BY/SORT BY and custom reducer scripts produced incorrect results. In particular, rows with the same value in the DISTRIBUTE BY column end up in multiple reducers and thus produce multiple rows in the final result, when we expect only one. I investigated a little bit and discovered the following behavior for Hive 0.11: - Hive 0.11 produces a different plan for these queries with incorrect results. The extra stage for the DISTRIBUTE BY + Transform is missing and the Transform operator for the custom reducer script is pushed into the reduce operator tree containing the GROUP BY itself. 
- However, *if the SORT BY in the query has a DESC order in it*, the right plan is produced, and the results look correct too. Hive 0.10 produces the expected plan with right results in all cases. To illustrate, here is a simplified repro setup: Table: CREATE TABLE test_cluster (grp STRING, val1 STRING, val2 INT, val3 STRING, val4 INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE; Query: ADD FILE reducer.py; FROM ( SELECT grp, val2 FROM test_cluster GROUP BY grp, val2 DISTRIBUTE BY grp SORT BY grp, val2 -- add DESC here to get correct results ) a REDUCE a.* USING 'reducer.py' AS grp, reducedValue If I understand correctly, this is a bug. Is this a known issue? Any other insights? We have reverted to Hive 0.10 to avoid the incorrect results while we investigate this. I have the repro sample, with test data and scripts, if anybody is interested. Thanks, pala
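The guarantee the query relies on can be stated concretely: DISTRIBUTE BY routes rows to reducers by hashing only the distribute key, so every row sharing a grp value reaches the same reducer and the custom reducer script sees the whole group. A minimal sketch of that contract follows; the partition function mirrors typical hash partitioning and is illustrative, not Hive's exact implementation.

```java
public class DistributePartitioner {
    // Route a row to a reducer using only the DISTRIBUTE BY key. Rows that
    // share a key always map to the same reducer, regardless of other columns.
    static int partition(String distributeKey, int numReducers) {
        return (distributeKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Rows ("grp1", val2=1) and ("grp1", val2=2) land on the same reducer.
        // This is what the buggy 0.11 plan fails to guarantee when the
        // deduplicated ReduceSink partitions on (grp, val2) instead of (grp).
        System.out.println(partition("grp1", 4) == partition("grp1", 4));
    }
}
```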
Re: Proposing a 0.11.1
Hi Mark, I haven't made any progress on it yet. I hope to make progress on it this week. I will certainly include the npath changes. On a separate thread, I'll start a discussion about starting to lock down 0.12.0. -- Owen On Mon, Aug 26, 2013 at 10:20 AM, Mark Grover m...@apache.org wrote: Hi folks, Any update on this? We are considering including Hive 0.11* in Bigtop 0.7 and it would be very useful and much appreciated to get a little more context into what the Hive 0.11.1 release would look like. Thanks in advance! Mark On Tue, Aug 13, 2013 at 9:24 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I am feeling more like we should release a 12.0 rather than backport things into 11.X. On Wed, Aug 14, 2013 at 12:08 AM, Navis류승우 navis@nexr.com wrote: If this is only for addressing the npath problem, we have three months for that. Would it be enough time for releasing 0.12.0? ps. IMHO, n-path seems too generic a name to be patented. I hate Teradata. 2013/8/14 Edward Capriolo edlinuxg...@gmail.com: Should we get the npath rename in? Do we have a jira for this? If not I will take it. On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner wagner.mar...@gmail.com wrote: It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953 has been committed to trunk and it looks like 4789 is close. Thanks, Mark On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley omal...@apache.org wrote: All, I'd like to create an 0.11.1 with some fixes in it. I plan to put together a release candidate over the next week. I'm in the process of putting together the list of bugs that I want to include, but I wanted to solicit the jiras that others thought would be important for an 0.11.1. Thanks, Owen
[jira] [Commented] (HIVE-5100) RCFile::sync(long) missing 1 byte in System.arraycopy()
[ https://issues.apache.org/jira/browse/HIVE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750481#comment-13750481 ] Hudson commented on HIVE-5100: -- FAILURE: Integrated in Hive-trunk-h0.21 #2290 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2290/]) HIVE-5100 : RCFile::sync(long) missing 1 byte in System.arraycopy() (Gopal V via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517547) * /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java * /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java RCFile::sync(long) missing 1 byte in System.arraycopy() - Key: HIVE-5100 URL: https://issues.apache.org/jira/browse/HIVE-5100 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: tagus wang Assignee: Gopal V Labels: regression Fix For: 0.12.0 Attachments: HIVE-5100-001.patch, HIVE-5100.01.patch There is a bug here: System.arraycopy(buffer, buffer.length - prefix - 1, buffer, 0, prefix); it should be System.arraycopy(buffer, buffer.length - prefix, buffer, 0, prefix); the original misses 1 byte at the end. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
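The off-by-one is easy to see in isolation: the intent is to move the last `prefix` bytes of the buffer to its front, and the buggy source offset starts one byte too early, silently dropping the final byte of the intended range. A minimal standalone demonstration (the helper name is illustrative, not RCFile's code):

```java
public class SyncCopyDemo {
    // Copy the last `prefix` bytes of buf to its front. `buggy` selects the
    // pre-fix source offset (buf.length - prefix - 1, one byte too early).
    static byte[] copyTail(byte[] buf, int prefix, boolean buggy) {
        byte[] out = buf.clone();
        int srcPos = buf.length - prefix - (buggy ? 1 : 0);
        System.arraycopy(out, srcPos, out, 0, prefix);
        return out;
    }

    public static void main(String[] args) {
        byte[] buffer = {1, 2, 3, 4, 5, 6, 7, 8};
        byte[] bad = copyTail(buffer, 3, true);   // front becomes {5, 6, 7}: byte 8 is lost
        byte[] good = copyTail(buffer, 3, false); // front becomes {6, 7, 8}: the true tail
    }
}
```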
[jira] [Updated] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
[ https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5145: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Prasanth! Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 Key: HIVE-5145 URL: https://issues.apache.org/jira/browse/HIVE-5145 Project: Hive Issue Type: Bug Components: Tests Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-5145.2.patch, HIVE-5145.patch There is some non-determinism in the output of the list_bucket_query_multiskew_2.q test case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750501#comment-13750501 ] Sushanth Sowmyan commented on HIVE-4331: Oh, and one more thing: {quote} With this StorageHandlers can use generic OFs. {quote} This assertion is incorrect. A more precise assertion would be that with this patch, Hive can use generic OFs that do not do anything useful or necessary in their outputcommitters (i.e. do not need calls on them to be made). If an OF is designed with an outputcommitter in mind, chances are that it will need some retrofitting before it will work from within hive. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5151) Going green: Container re-cycling in Tez
Gunther Hagleitner created HIVE-5151: Summary: Going green: Container re-cycling in Tez Key: HIVE-5151 URL: https://issues.apache.org/jira/browse/HIVE-5151 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Tez reuses containers to schedule tasks from same and different vertices in the same JVM. It also offers an API to reuse objects across vertices, dags and sessions. For hive we should reuse the operator plan as well as any hash tables (map join). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5151) Going green: Container re-cycling in Tez
[ https://issues.apache.org/jira/browse/HIVE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5151: - Description: Tez reuses containers to schedule tasks from same and different vertices in the same JVM. It also offers an API to reuse objects across vertices, dags and sessions. For hive we should reuse the operator plan as well as any hash tables (map join). NO PRECOMMIT TESTS (this is wip for the tez branch) was: Tez reuses containers to schedule tasks from same and different vertices in the same JVM. It also offers an API to reuse objects across vertices, dags and sessions. For hive we should reuse the operator plan as well as any hash tables (map join). Going green: Container re-cycling in Tez Key: HIVE-5151 URL: https://issues.apache.org/jira/browse/HIVE-5151 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Tez reuses containers to schedule tasks from same and different vertices in the same JVM. It also offers an API to reuse objects across vertices, dags and sessions. For hive we should reuse the operator plan as well as any hash tables (map join). NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750510#comment-13750510 ] Sushanth Sowmyan commented on HIVE-4331: As it currently stands, SHes *need*, as a prerequisite, to be able to work at a partition level and to have outputcommitter concepts before they can be used more widely. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5148) Jam sessions w/ Tez
[ https://issues.apache.org/jira/browse/HIVE-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5148: - Resolution: Fixed Status: Resolved (was: Patch Available) committed to branch Jam sessions w/ Tez --- Key: HIVE-5148 URL: https://issues.apache.org/jira/browse/HIVE-5148 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-5148.1.patch Tez introduced a session api that lets you reuse certain resources during a session (AM, localized files, etc). Hive needs to tie these into hive sessions (for both CLI and HS2) NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750508#comment-13750508 ] Francis Liu commented on HIVE-4331: --- {quote} most of the usecases of traditional M/R OFs are already covered by hive, or for newer formats being developed, the OF writer winds up making changes so that it is hive compatible, such as with orc, or with the HBase SH {quote} Yes but ideally they don't really need HOF to do that. {quote} So unless there were a major push to see a BlahOutputFormat that is widely used, but was not already usable from within Hive, I don't see there being a necessity case for a change in hive that I want. {quote} Yep, which is why we want to do it incrementally. Letting it leak into SH and hcat code would make the idea of cleaning things up less appealing. I think if we just started using SH for new OFs and not use HOF, these pieces would naturally go into this state. Having said that it'd be nice if Orc could be moved to using storage handlers. It would also help SH mature. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750523#comment-13750523 ] Ashutosh Chauhan commented on HIVE-4331: [~toffer] One more thing popped up in my discussion with Sushanth. Another issue in this general area of Hive's usage of OFs is w.r.t. the OutputCommitter. Hive currently explicitly disables the OC and performs a lot of logic (which folks usually write in an OC) on the client side. Architecturally, it's cleaner for Hive to migrate these things from the client to the OC and make the OC a first class citizen. For folks who have an OF with an OC, it will be easier to integrate it into Hive, instead of understanding Hive innards and Hive's handling of the OC. Wondering if you have given this any thought? I just want to make sure that if and when we go that route these current changes won't get in the way. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750526#comment-13750526 ] Francis Liu commented on HIVE-4331: --- Sigh, there's the rub. Are there JIRAs for this? It would be great to keep track of them in case someone would like to do it. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5151) Going green: Container re-cycling in Tez
[ https://issues.apache.org/jira/browse/HIVE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5151: - Attachment: HIVE-5151.1.patch Going green: Container re-cycling in Tez Key: HIVE-5151 URL: https://issues.apache.org/jira/browse/HIVE-5151 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-5151.1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.
Jitendra Nath Pandey created HIVE-5152: -- Summary: Vector operators should inherit from non-vector operators for code re-use. Key: HIVE-5152 URL: https://issues.apache.org/jira/browse/HIVE-5152 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey In many cases vectorized operators could share code from non-vector operators by inheriting.
[jira] [Updated] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.
[ https://issues.apache.org/jira/browse/HIVE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-5152: --- Attachment: HIVE-5152.1.patch
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750545#comment-13750545 ] Francis Liu commented on HIVE-4331: --- {quote} For folks who has OF which has OC, it will be easier to integrate that in Hive, instead of understanding Hive innards and handling of OC. Wondering if you have given a thought on this? I just want to make sure if and when we go that route these current changes won't get in the way. {quote} For HCat we already do it this way. It's not really just the OC but the OF, OC, RR in general. HOF essentially is doing the Hive-specific stuff that the plain OC, RR, etc. can do as well. So I don't think we changed the complexity of the work needed to support new formats? Is that what you meant by get in the way? In the long run it'd be better since HCat and Hive treat OFs the same way. Though it'd be great to document what that contract (beyond the typical OF) is.
[jira] [Updated] (HIVE-5152) Vector operators should inherit from non-vector operators for code re-use.
[ https://issues.apache.org/jira/browse/HIVE-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-5152: --- Status: Patch Available (was: Open)
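The inheritance proposed in HIVE-5152 can be pictured with a minimal sketch: the vectorized operator inherits the row-mode operator's setup and metadata code and overrides only the execution path with a batch-at-a-time version. All class and method names below are hypothetical illustrations, not Hive's actual operator API.

```java
// Hypothetical sketch of HIVE-5152's idea: a vectorized operator reusing
// initialization/description code from its row-mode parent and adding
// only a per-batch processing path. Names are illustrative, not Hive's.
public class VectorizationSketch {
    static class RowBatch { int size; }

    static class FilterOperator {
        protected String predicate;
        public void initialize(String predicate) { // shared setup logic
            this.predicate = predicate;
        }
        public String describe() { return "filter(" + predicate + ")"; }
    }

    // Inherits initialize()/describe(); only the execution path is new.
    static class VectorFilterOperator extends FilterOperator {
        public int processBatch(RowBatch batch) {
            // batch-oriented predicate evaluation would go here
            return batch.size;
        }
    }

    public static void main(String[] args) {
        VectorFilterOperator op = new VectorFilterOperator();
        op.initialize("a > 10");           // inherited, not duplicated
        System.out.println(op.describe()); // prints filter(a > 10)
    }
}
```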
[jira] [Commented] (HIVE-5149) ReduceSinkDeDuplication can pick the wrong partitioning columns
[ https://issues.apache.org/jira/browse/HIVE-5149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750551#comment-13750551 ] Pala M Muthaia commented on HIVE-5149: -- Currently there is a workaround for this bug: turn off reduce deduplication entirely with hive.optimize.reducededuplication = false. However, that also disables valid deduplications, so either the user has to know enough to turn it off selectively, or we turn it off globally in hive-site.xml and give up performance in the other cases. So this config is only a short-term workaround. ReduceSinkDeDuplication can pick the wrong partitioning columns --- Key: HIVE-5149 URL: https://issues.apache.org/jira/browse/HIVE-5149 Project: Hive Issue Type: Bug Reporter: Yin Huai Assignee: Yin Huai https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4qgc+cpar5yvr8sjt...@mail.gmail.com%3E
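The global form of the workaround described above would go in hive-site.xml (a sketch, assuming standard Hive config syntax; only the property name comes from the comment):

```xml
<!-- hive-site.xml: global workaround for HIVE-5149; note this disables
     ALL reduce-sink deduplication, including the valid cases -->
<property>
  <name>hive.optimize.reducededuplication</name>
  <value>false</value>
</property>
```

The selective, per-session alternative is to issue `set hive.optimize.reducededuplication=false;` in the CLI before running an affected query, leaving the optimization on for everything else.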
Re: LIKE filter pushdown for tables and partitions
Since there's no response I am assuming nobody cares about this code... The JIRA is HIVE-5134; I will attach a patch with the removal this week. On Wed, Aug 21, 2013 at 2:28 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Hi. I think there are issues with the way Hive can currently do LIKE operator JDO pushdown, and the code should be removed for partitions and tables. Are there objections to removing LIKE from Filter.g and related areas? If not, I will file a JIRA and do it. Details: There's code in the metastore that is capable of pushing down a LIKE expression into JDO for string partition keys, as well as tables. The code for tables doesn't appear to be used, and the partition code definitely doesn't run in Hive proper because the metastore client doesn't send LIKE expressions to the server. It may be used in e.g. HCat and other places, but after asking some people here, I found out it probably isn't. I was trying to make it run and noticed some problems: 1) For partitions, Hive sends SQL patterns in a filter for LIKE, e.g. %foo%, whereas the metastore passes them into the matches() JDOQL method, which expects a Java regex. 2) Converting the pattern to a Java regex via the UDFLike method, I found out that not all regexes appear to work in DN. .*foo seems to work, but anything complex (such as escaping the pattern using Pattern.quote, which UDFLike does) breaks and no longer matches properly. 3) I tried to implement common cases using the JDO methods startsWith/endsWith/indexOf (I will file a JIRA), but when I run tests on Derby, they also appear to have problems with some strings (for example, a partition with a backslash in the name cannot be matched by LIKE %\% (single backslash in a string) after being converted to .indexOf(param) where param is \ (escaping the backslash once again doesn't work either, and anyway there's no documented reason why it shouldn't work properly), while other characters match correctly, even e.g. %.
For tables there's no SQL LIKE handling; the code expects a Java regex, but I am not convinced all Java regexes are going to work. So, I think that for the sake of future correctness it's better to remove this code. -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
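The SQL-pattern-vs-Java-regex mismatch in point 1 above can be made concrete with a small converter in the spirit of what UDFLike does: % becomes .*, _ becomes ., and everything else is quoted literally. This is an illustrative sketch, not Hive's actual implementation; note that per point 2 above, the Pattern.quote-style escaping it emits is exactly the kind of pattern reported to break in DataNucleus's matches(), even though it works in plain Java.

```java
// Minimal sketch: convert a SQL LIKE pattern to a Java regex, roughly in
// the spirit of Hive's UDFLike. Illustrative only, not the real code.
import java.util.regex.Pattern;

public class LikeToRegex {
    public static String convert(String likePattern) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < likePattern.length(); i++) {
            char c = likePattern.charAt(i);
            if (c == '%') {
                sb.append(".*");        // SQL '%' = any sequence of chars
            } else if (c == '_') {
                sb.append('.');         // SQL '_' = any single char
            } else {
                // Quote so regex metacharacters match literally; this is
                // the escaping step reported to misbehave in DN.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In plain Java, the converted pattern matches as LIKE would:
        System.out.println("barfoobaz".matches(convert("%foo%"))); // true
        System.out.println("a.b".matches(convert("a_b")));         // true
    }
}
```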
[jira] [Commented] (HIVE-5107) Change hive's build to maven
[ https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750556#comment-13750556 ] Sergey Shelukhin commented on HIVE-5107: Would this be a good time to change the module structure? I can do a follow-up patch after this is done. It would be nice to separate the metastore client from the server, both for potential external usage and for internal features where the metastore server wants to involve QL bits. Change hive's build to maven Key: HIVE-5107 URL: https://issues.apache.org/jira/browse/HIVE-5107 Project: Hive Issue Type: Task Reporter: Edward Capriolo Assignee: Edward Capriolo I cannot cope with hive's build infrastructure any more. I have started working on porting the project to maven. When I have some solid progress I will github the entire thing for review. Then we can talk about switching the project somehow.
[jira] [Commented] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
[ https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750570#comment-13750570 ] Ashutosh Chauhan commented on HIVE-4750: The current patch passes on MacOS but fails on Ubuntu. I think it is caused by the order in which sub-dirs of a directory are returned by the OS, which can vary. I guess we need to add an order by on the partition column for this to work consistently on all OSes. Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23 --- Key: HIVE-4750 URL: https://issues.apache.org/jira/browse/HIVE-4750 Project: Hive Issue Type: Sub-task Affects Versions: 0.12.0 Reporter: Brock Noland Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-4750.patch Removing 6,7,8 from the scope of HIVE-4746.
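The nondeterminism Ashutosh describes is a general property: neither the OS nor the JDK guarantees an order for directory listings, so anything that depends on listing order must impose its own (an order by in a .q test, or an explicit sort in code). A minimal illustration, not Hive code:

```java
// Illustration: directory-listing order is unspecified (File.list() makes
// no ordering guarantee), so deterministic output requires an explicit
// sort. Illustrative sketch, not taken from the Hive patch.
import java.io.File;
import java.util.Arrays;

public class ListingOrder {
    // Pure helper: copy and sort so callers never depend on OS order.
    public static String[] deterministic(String[] osListing) {
        String[] copy = Arrays.copyOf(osListing, osListing.length);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Simulate two platforms enumerating the same sub-dirs differently:
        String[] onMac = {"part-a", "part-b", "part-c"};
        String[] onUbuntu = {"part-c", "part-a", "part-b"};
        // After sorting, both platforms agree:
        System.out.println(Arrays.equals(deterministic(onMac),
                                         deterministic(onUbuntu))); // true
        String[] names = new File(".").list(); // real listing: order unspecified
        if (names != null) {
            System.out.println(Arrays.toString(deterministic(names)));
        }
    }
}
```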
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750579#comment-13750579 ] Ashutosh Chauhan commented on HIVE-4331: No, HCat and Hive don't treat OFs in the same way. This difference in OF handling is one reason why HCatOF couldn't be used from Hive, another being that HCat uses the mapreduce api while Hive uses the mapred api. If we can make Hive use HCatOF that will be a win, but that's yet another topic.
[jira] [Updated] (HIVE-5029) direct SQL perf optimization cannot be tested well
[ https://issues.apache.org/jira/browse/HIVE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5029: --- Status: Open (was: Patch Available) direct SQL perf optimization cannot be tested well -- Key: HIVE-5029 URL: https://issues.apache.org/jira/browse/HIVE-5029 Project: Hive Issue Type: Test Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Critical Attachments: HIVE-5029.D12483.1.patch, HIVE-5029.patch, HIVE-5029.patch HIVE-4051 introduced a perf optimization that involves getting partitions directly via SQL in the metastore. Given that the SQL queries might not work on all datastores (and will not work on non-SQL ones), a JDO fallback is in place. Given that the perf improvement is very large for short queries, it's on by default. However, there's a problem with tests with regard to that. If the SQL code is broken, tests may fall back to JDO and pass. If the JDO code is broken, SQL might allow tests to pass. We are going to disable SQL by default until the testing problem is resolved. There are several possible solutions: 1) Separate build for this setting. Seems like overkill... 2) Enable by default; disable by default in tests, create a clone of TestCliDriver with a subset of queries that will exercise the SQL path. 3) Have some sort of test hook inside the metastore that will run both ORM and SQL and compare. 3') Or make a subclass of ObjectStore that will do that. ObjectStore is already pluggable. 4) Write unit tests for one of the modes (JDO, as non-default?) and declare that they are sufficient; disable fallback in tests. 3' seems like the easiest. For now we will disable SQL by default.
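Option 3' above (a pluggable store subclass that exercises both paths and compares) can be sketched in miniature. All names here are hypothetical; the real ObjectStore API is far larger, and this shows only the compare-and-fail shape of the idea:

```java
// Hypothetical sketch of option 3': a verifying store that runs both the
// direct-SQL and JDO/ORM paths and fails loudly if they disagree, so a
// bug in either path cannot hide behind the other. Names are illustrative.
import java.util.Arrays;
import java.util.List;

public class VerifyingStoreSketch {
    // Stand-in for the two retrieval paths a real ObjectStore would have.
    interface PartitionFetcher {
        List<String> viaDirectSql(String filter);
        List<String> viaJdo(String filter);
    }

    static List<String> getPartitionsVerified(PartitionFetcher store, String filter) {
        List<String> sql = store.viaDirectSql(filter);
        List<String> jdo = store.viaJdo(filter);
        if (!sql.equals(jdo)) {
            throw new IllegalStateException("direct SQL and JDO disagree for filter "
                + filter + ": " + sql + " vs " + jdo);
        }
        return sql; // both paths were exercised and agree
    }

    public static void main(String[] args) {
        PartitionFetcher agreeing = new PartitionFetcher() {
            public List<String> viaDirectSql(String f) { return Arrays.asList("p=1", "p=2"); }
            public List<String> viaJdo(String f) { return Arrays.asList("p=1", "p=2"); }
        };
        System.out.println(getPartitionsVerified(agreeing, "ds > '2013'"));
    }
}
```

Because ObjectStore is already pluggable (as the description notes), tests could swap in such a subclass via configuration without touching production code paths.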
[jira] [Updated] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
[ https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-4750: - Attachment: HIVE-4750.2.patch
[jira] [Created] (HIVE-5153) current database in hive prompt in cli remote mode is incorrect
Thejas M Nair created HIVE-5153: --- Summary: current database in hive prompt in cli remote mode is incorrect Key: HIVE-5153 URL: https://issues.apache.org/jira/browse/HIVE-5153 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0 Reporter: Thejas M Nair HIVE-2233 added a feature to show the current database on the hive cli prompt. The current implementation works only in local mode. It does not work if you connect to hive server (v1) from the hive cli, because it relies on the Hive object in the current thread having the right current-database information. But when remote mode is used, the client-side Hive object does not get updated when the database is changed.
[jira] [Commented] (HIVE-5153) current database in hive prompt in cli remote mode is incorrect
[ https://issues.apache.org/jira/browse/HIVE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750596#comment-13750596 ] Thejas M Nair commented on HIVE-5153: - Note that this is not an issue with beeline+hive-server2, which in my opinion is the recommended client and server to be used.
[jira] [Commented] (HIVE-4750) Fix TestCliDriver.list_bucket_dml_{6,7,8}.q on 0.23
[ https://issues.apache.org/jira/browse/HIVE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750594#comment-13750594 ] Prasanth J commented on HIVE-4750: -- Added order by to make the test cases work consistently across OSes. I tested it on Mac and CentOS 6. Can you please test it on Ubuntu before committing it (I don't have an Ubuntu installation)?
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750598#comment-13750598 ] Francis Liu commented on HIVE-4331: --- {quote} No HCat and Hive dont treat OFs in same way. This difference of OF handling is a reason why HCatOF couldn't be used from Hive, another being HCat uses mapreduce api while Hive uses mapred api. If we can make Hive use HCatOF that will be a win, but thats yet another topic. {quote} Currently they don't, mainly because of HOF, but they behave in almost the same way; otherwise this whole interoperability story is broken. With this patch they'll at least be closer when it comes to dealing with OFs that don't use HOF, instead of having to mirror that behavior. Actually, AFAIK only the HCatOF wrapper classes use mapreduce; the underlying stuff deals with mapred, which we did as part of the StorageDriver-SerDe migration. So it'd be relatively easy to support a mapred version of HCatOF.
[jira] [Commented] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
[ https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750599#comment-13750599 ] Prasanth J commented on HIVE-5145: -- I think this patch should also fail on Ubuntu (similar to HIVE-4750) as the results are dependent on the order of subdirectories under a partition. Can you please revert the 2nd patch and apply the 1st patch, since the first patch has an order by clause which makes the result consistent across OSes. Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 Key: HIVE-5145 URL: https://issues.apache.org/jira/browse/HIVE-5145 Project: Hive Issue Type: Bug Components: Tests Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-5145.2.patch, HIVE-5145.patch There is some non-determinism in the output of the list_bucket_query_multiskew_2.q test case.
[jira] [Commented] (HIVE-5018) Avoiding object instantiation in loops (issue 6)
[ https://issues.apache.org/jira/browse/HIVE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750601#comment-13750601 ] Benjamin Jakobus commented on HIVE-5018: Bump :) Avoiding object instantiation in loops (issue 6) Key: HIVE-5018 URL: https://issues.apache.org/jira/browse/HIVE-5018 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Priority: Minor Fix For: 0.12.0 Attachments: HIVE-5018.1.patch.txt Object instantiation inside loops is very expensive. Where possible, object references should be created outside the loop so that they can be reused.
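The pattern HIVE-5018's description targets looks like this in miniature (an illustrative before/after, not taken from the actual patch): hoist the allocation out of the loop and reset the object cheaply each iteration.

```java
// Illustrative example of the HIVE-5018 pattern: hoisting allocations out
// of hot loops. Not taken from the actual patch.
import java.util.ArrayList;
import java.util.List;

public class LoopAllocation {
    // Before: a new StringBuilder is allocated on every iteration.
    static List<String> formatAllNaive(int[] ids) {
        List<String> out = new ArrayList<>();
        for (int id : ids) {
            StringBuilder sb = new StringBuilder(); // allocated per iteration
            sb.append("row-").append(id);
            out.add(sb.toString());
        }
        return out;
    }

    // After: one StringBuilder is reused; setLength(0) resets it cheaply
    // without discarding the underlying buffer.
    static List<String> formatAllReuse(int[] ids) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder(); // hoisted out of the loop
        for (int id : ids) {
            sb.setLength(0);
            sb.append("row-").append(id);
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3};
        // Both versions produce identical output; only allocation differs.
        System.out.println(formatAllReuse(ids).equals(formatAllNaive(ids))); // true
    }
}
```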
Re: LIKE filter pushdown for tables and partitions
Couple of questions: 1. What about the LIKE operator for Hive itself? Will that continue to work (presumably, because there is an alternative path for it)? 2. This will nonetheless break other direct consumers of the metastore client api (like HCatalog). I see your point that we have a buggy implementation, so what's out there is not safe to use. The question then really is: shall we remove this code, thereby breaking people for whom the current buggy implementation is good enough (or, you could say, salvaging them from breaking in the future)? Or shall we try to fix it now? My take is that if there are no users of this anyway, then there is no point fixing it for non-existing users, but if there are, we probably have to. I suggest you send an email to users@hive to ask if there are users for this. Thanks, Ashutosh On Mon, Aug 26, 2013 at 2:08 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Since there's no response I am assuming nobody cares about this code... Jira is HIVE-5134, I will attach a patch with removal this week.
Re: Proposing a 0.11.1
Hi, We recently tried to upgrade to Hive 0.11 from 0.10 and noticed we needed to add patches for the following JIRAs to make Hive 0.11 work: - HIVE-4619. None of our MR queries worked without it - HIVE-4003. We ran into this bug in 0.10 also - DataNucleus-related: At first we were only trying to get rid of the DataNucleus error messages, but one JIRA led to another, and we ended up upgrading DataNucleus with the following: - HIVE-4900, HIVE-3632, HIVE-4942 (and then do ant very-clean to clean up the old version of DataNucleus) - HIVE-4647 On a separate note: - HIVE-5149. This bug made us revert to Hive 0.10 because at the time we didn't know what was wrong. See the email titled DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases, which led to the creation of HIVE-5149. The proposed workaround works, but as we noted there we still need the right fix. The property hive.optimize.reducededuplication already existed in Hive 0.10, so this is a regression. Thanks, Eric On Mon, Aug 26, 2013 at 12:56 PM, Owen O'Malley omal...@apache.org wrote: Hi Mark, I haven't made any progress on it yet. I hope to make progress on it this week. I will certainly include the npath changes. On a separate thread, I'll start a discussion about starting to lock down 0.12.0. -- Owen On Mon, Aug 26, 2013 at 10:20 AM, Mark Grover m...@apache.org wrote: Hi folks, Any update on this? We are considering including Hive 0.11* in Bigtop 0.7 and it would be very useful and much appreciated to get a little more context into what the Hive 0.11.1 release would look like. Thanks in advance! Mark On Tue, Aug 13, 2013 at 9:24 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I am feeling more like we should release a 0.12.0 rather than backport things into 0.11.X. On Wed, Aug 14, 2013 at 12:08 AM, Navis류승우 navis@nexr.com wrote: If this is only for addressing the npath problem, we have three months for that. Would it be enough time for releasing 0.12.0? ps. IMHO, n-path seemed too generic a name to be patented.
I hate Teradata. 2013/8/14 Edward Capriolo edlinuxg...@gmail.com: Should we get the npath rename in? Do we have a jira for this? If not I will take it. On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner wagner.mar...@gmail.com wrote: It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953 has been committed to trunk and it looks like 4789 is close. Thanks, Mark On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley omal...@apache.org wrote: All, I'd like to create an 0.11.1 with some fixes in it. I plan to put together a release candidate over the next week. I'm in the process of putting together the list of bugs that I want to include, but I wanted to solicit the jiras that others thought would be important for an 0.11.1. Thanks, Owen
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750620#comment-13750620 ] Sushanth Sowmyan commented on HIVE-4331: [~ashutoshc] : The mapreduce-mapred transition is not a tough one by itself(apart from losing abort/commit semantics, which need to be emulated). The Outputcommitter concept is quite a bit more involved though. For now, I'm still +1 on this patch(the hive side) because it moves us one step closer to that goal. I don't think we should require the OC concept being brought in as well for it to be committed, I brought that up because I wanted to highlight that we aren't done with this patch, and more work is required past it, and this patch alone does not make generic OFs workable from within Hive. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. These will now continue to function but internally they will use the DefaultStorageHandler from Hive. They will be removed in future release of Hive. 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's HBaseStorageHandler instead of the HiveHBaseTableOutputFormat. 3) Write new unit tests in the HCat's storagehandler so that systems such as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the HCatHBaseStorageHandler. 4) Make sure all the old and new unit tests pass without backward compatibility (except known issues as described in the Design Document). 
5) Replace all instances in the HCat source code which point to HCatStorageHandler to use the HiveStorageHandler, including the FosterStorageHandler. I have attached the design document for the same and will attach a patch to this Jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query
[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750638#comment-13750638 ] Phabricator commented on HIVE-4002: --- yhuai has commented on the revision HIVE-4002 [jira] Fetch task aggregation for simple group by query. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3631 Seems that this line is the same as line 3633 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6985 Why do we need to change getInternalName to field? If we want to use field instead of getInternalName, can you also make this change in other places of this class? ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582 why do we need flushOp? I think it is not necessary to have flushOp. Also, can you change an blocking operator to a blocking operator? I am sorry about the typo I made... ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:493 I think we can just use operator.flush() to tell GBY to process its buffer. REVISION DETAIL https://reviews.facebook.net/D8739 To: JIRA, navis Cc: yhuai Fetch task aggregation for simple group by query Key: HIVE-4002 URL: https://issues.apache.org/jira/browse/HIVE-4002 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, HIVE-4002.D8739.3.patch Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. But that input is too small even for a single reducer, because most UDAFs generate just a single row for map-side aggregation. If the final fetch task can aggregate the outputs from the map tasks, the shuffling time can be removed. 
This optimization transforms the operator tree from TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK into TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS). With the patch, the time taken for the auto_join_filters.q test dropped to 6 min (from 10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
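The no-shuffle idea behind HIVE-4002 can be sketched outside Hive: each map task's GBY1 emits a single partial row, and the final GBY2 merge runs in the fetch task on the client instead of in a reduce stage. This is an illustrative Python sketch, not Hive code; the function names and split sizes are hypothetical.

```python
# Sketch of fetch-task aggregation for "select count(*) from src":
# GBY1 runs inside each map task, GBY2 runs in the fetch task, and the
# reduce/shuffle stage disappears entirely.

def map_side_count(rows):
    """GBY1: partial aggregation inside one map task (one output row)."""
    return sum(1 for _ in rows)

def fetch_task_merge(partials):
    """GBY2 in the fetch task: merge the per-mapper partial counts."""
    return sum(partials)

# Three hypothetical map tasks over splits of 'src'.
splits = [range(100), range(250), range(50)]
partials = [map_side_count(s) for s in splits]   # one row per mapper
assert fetch_task_merge(partials) == 400
```

The shuffle is avoidable precisely because each mapper contributes one row, so the "reduce input" is tiny enough for the client to merge itself.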
Re: No java compiler available exception for HWI
Hi Bing, Sorry for the late reply. I was in another Hive hell myself last week. We did get HWI to work, but unfortunately it's so long ago that I don't remember exactly what we did for this problem. I think we had that problem only in manual setup and testing, and the problem went away when we switched to the production script that deploys Hive. I do remember we needed to add core-3.1.1.jar ourselves to make HWI work b/c Hive 10 didn't include that for some reason. We also needed to set HADOOP_HOME in bin/hive-wrapper.sh. We have since deprecated HWI and only use HUE now. Eric On Wed, Aug 21, 2013 at 8:02 AM, Bing Li sarah.lib...@gmail.com wrote: Hi, Edward I filed it as HIVE-5132, did you mean this one? 2013/8/21 Edward Capriolo edlinuxg...@gmail.com We really should pre-compile the JSPs. There is a jira on this somewhere. On Tuesday, August 20, 2013, Bing Li sarah.lib...@gmail.com wrote: Hi, Eric et al Did you resolve this failure? I'm using Hive-0.11.0, and get the same error when accessing HWI via a browser. I already set the following properties in hive-site.xml - hive.hwi.listen.host - hive.hwi.listen.port - hive.hwi.war.file And copied two jasper jars into hive/lib: - jasper-compiler-5.5.23.jar - jasper-runtime-5.5.23.jar Thanks, - Bing 2013/8/20 Bing Li sarah.lib...@gmail.com Hi, Eric et al Did you resolve this failure? I'm using Hive-0.11.0, and get the same error when accessing HWI via a browser. I already set the following properties in hive-site.xml - hive.hwi.listen.host - hive.hwi.listen.port - hive.hwi.war.file And copied two jasper jars into hive/lib: - jasper-compiler-5.5.23.jar - jasper-runtime-5.5.23.jar Thanks, - Bing 2013/3/30 Eric Chu e...@rocketfuel.com Hi, I'm running Hive 0.10 and I want to support HWI (besides CLI and HUE). When I started HWI I didn't get any error. However, when I went to Hive Server Address:/hwi in my browser I saw the error below complaining about No Java compiler available. 
My JAVA_HOME is set to /usr/lib/jvm/java-1.6.0-sun-1.6.0.16. Besides https://cwiki.apache.org/Hive/hivewebinterface.html, there's not much documentation on HWI. I'm wondering if anyone else has seen this or has any idea about what's wrong? Thanks. Eric Problem accessing /hwi/. Reason: No Java compiler available Caused by: java.lang.IllegalStateException: No Java compiler available at org.apache.jasper.JspCompilationContext.createCompiler(JspCompilationContext.java:225) at org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:560) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:299) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126) at org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:503) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.handler.RequestLogHandler.handle(RequestLogHandler.java:49) at org.mortbay.jetty.handler.
[jira] [Updated] (HIVE-5128) Direct SQL for view is failing
[ https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5128: -- Attachment: HIVE-5128.D12465.2.patch sershe updated the revision HIVE-5128 [jira] Direct SQL for view is failing. Updated the patch. I tested the example, seems to work now on my setup. Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D12465 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D12465?vs=38727id=39039#toc AFFECTED FILES metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java To: JIRA, sershe Direct SQL for view is failing --- Key: HIVE-5128 URL: https://issues.apache.org/jira/browse/HIVE-5128 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Sergey Shelukhin Priority: Trivial Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch I cannot be sure of this, but it happens when dropping views (it rolls back to JPA and works fine) {noformat} etastore.ObjectStore: Direct SQL failed, falling back to ORM MetaException(message:Unexpected null for one of the IDs, SD null, column null, serde null) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758) ... {noformat} Should it be disabled for views, or can it be fixed? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
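The "Direct SQL failed, falling back to ORM" log line quoted above reflects a general pattern: attempt a fast hand-written SQL path first, and on any failure retry the same lookup through the slower but safer ORM (JDO/JPA) path. Below is a minimal Python sketch of that pattern, illustrative only; all function names are hypothetical, and the real logic lives in ObjectStore.java / MetaStoreDirectSql.java.

```python
# Sketch of the direct-SQL-with-ORM-fallback pattern. The fast path may
# throw (e.g. "Unexpected null for one of the IDs" for views, which have
# no storage descriptor); the caller then falls back transparently.

def get_partitions_by_filter(direct_sql, orm, filter_expr):
    try:
        return direct_sql(filter_expr)
    except Exception as e:
        # The real code logs a warning here, as seen in the trace above.
        print(f"Direct SQL failed, falling back to ORM: {e}")
        return orm(filter_expr)

def failing_direct(filter_expr):
    # Stands in for the direct SQL path hitting a view.
    raise ValueError("Unexpected null for one of the IDs")

def orm_path(filter_expr):
    # Stands in for the JDO/JPA path, which handles views fine.
    return ["ds=2013-08-26"]

assert get_partitions_by_filter(failing_direct, orm_path, 'ds > "2013"') == ["ds=2013-08-26"]
```

The design trade-off discussed in the issue is whether to fix the fast path for views or simply skip it for them; the fallback makes either choice safe for callers.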
[jira] [Commented] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
[ https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750734#comment-13750734 ] Hudson commented on HIVE-5145: -- FAILURE: Integrated in Hive-trunk-hadoop1-ptest #139 (See [https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/139/]) HIVE-5145 : Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517682) * /hive/trunk/ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 Key: HIVE-5145 URL: https://issues.apache.org/jira/browse/HIVE-5145 Project: Hive Issue Type: Bug Components: Tests Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-5145.2.patch, HIVE-5145.patch there is some non-determinism in the output of the list_bucket_query_multiskew_2.q test case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HIVE-5151) Going green: Container re-cycling in Tez
[ https://issues.apache.org/jira/browse/HIVE-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-5151. -- Resolution: Fixed Going green: Container re-cycling in Tez Key: HIVE-5151 URL: https://issues.apache.org/jira/browse/HIVE-5151 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-5151.1.patch Tez reuses containers to schedule tasks from same and different vertices in the same JVM. It also offers an API to reuse objects across vertices, dags and sessions. For hive we should reuse the operator plan as well as any hash tables (map join). NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-4895: - Attachment: HIVE-4895.update.patch HIVE-4895.move.patch HIVE-4895.patch HIVE-4895.patch - cumulative patch HIVE-4895.move.patch - only the 'git mv' part HIVE-4895.update.patch - code changes Move all HCatalog classes to org.apache.hive.hcatalog - Key: HIVE-4895 URL: https://issues.apache.org/jira/browse/HIVE-4895 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, HIVE-4895.update.patch Original Estimate: 24h Remaining Estimate: 24h make sure to preserve history in SCM -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750809#comment-13750809 ] Eugene Koifman commented on HIVE-4895: -- HCat unit tests and webhcat e2e tests passed Move all HCatalog classes to org.apache.hive.hcatalog - Key: HIVE-4895 URL: https://issues.apache.org/jira/browse/HIVE-4895 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, HIVE-4895.update.patch Original Estimate: 24h Remaining Estimate: 24h make sure to preserve history in SCM -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-4895: - Status: Patch Available (was: Open) Move all HCatalog classes to org.apache.hive.hcatalog - Key: HIVE-4895 URL: https://issues.apache.org/jira/browse/HIVE-4895 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, HIVE-4895.update.patch Original Estimate: 24h Remaining Estimate: 24h make sure to preserve history in SCM -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4844: - Attachment: char_progress.patch.7.txt Apparently the patches I've been attaching (which I downloaded from Phabricator) are not applying correctly. Attaching char_progress.patch.7.txt which should be the same progress as char_progress.patch.6.txt, but with the patch generated from my git repository. Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: char_progress.patch.7.txt, HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, screenshot.png Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750811#comment-13750811 ] Eugene Koifman commented on HIVE-4895: -- groupid in pom.xml was also changed from org.apache.hcatalog to org.apache.hive.hcatalog Move all HCatalog classes to org.apache.hive.hcatalog - Key: HIVE-4895 URL: https://issues.apache.org/jira/browse/HIVE-4895 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, HIVE-4895.update.patch Original Estimate: 24h Remaining Estimate: 24h make sure to preserve history in SCM -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work logged] (HIVE-4895) Move all HCatalog classes to org.apache.hive.hcatalog
[ https://issues.apache.org/jira/browse/HIVE-4895?focusedWorklogId=14841page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-14841 ] Eugene Koifman logged work on HIVE-4895: Author: Eugene Koifman Created on: 27/Aug/13 01:30 Start Date: 27/Aug/13 01:29 Worklog Time Spent: 12h Issue Time Tracking --- Worklog Id: (was: 14841) Time Spent: 12h Remaining Estimate: 12h (was: 24h) Move all HCatalog classes to org.apache.hive.hcatalog - Key: HIVE-4895 URL: https://issues.apache.org/jira/browse/HIVE-4895 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: HIVE-4895.move.patch, HIVE-4895.patch, HIVE-4895.update.patch Original Estimate: 24h Time Spent: 12h Remaining Estimate: 12h make sure to preserve history in SCM -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4844: - Attachment: HIVE-4844.7.patch attached wrong patch .. replacing char_progress.7.patch with HIVE-4844.7.patch. Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, screenshot.png Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4844) Add char/varchar data types
[ https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4844: - Attachment: (was: char_progress.patch.7.txt) Add char/varchar data types --- Key: HIVE-4844 URL: https://issues.apache.org/jira/browse/HIVE-4844 Project: Hive Issue Type: New Feature Components: Types Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4844.1.patch.hack, HIVE-4844.2.patch, HIVE-4844.3.patch, HIVE-4844.4.patch, HIVE-4844.5.patch, HIVE-4844.6.patch, HIVE-4844.7.patch, screenshot.png Add new char/varchar data types which have support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Proposing a 0.11.1
Hi Owen, Sounds good. Thanks for the update! Mark On Mon, Aug 26, 2013 at 12:56 PM, Owen O'Malley omal...@apache.org wrote: Hi Mark, I haven't made any progress on it yet. I hope to make progress on it this week. I will certainly include the npath changes. On a separate thread, I'll start a discussion about starting to lock down 0.12.0. -- Owen On Mon, Aug 26, 2013 at 10:20 AM, Mark Grover m...@apache.org wrote: Hi folks, Any update on this? We are considering including Hive 0.11* in Bigtop 0.7 and it would be very useful and much appreciated to get a little more context into what the Hive 0.11.1 release would look like. Thanks in advance! Mark On Tue, Aug 13, 2013 at 9:24 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I am feeling more like we should release a 12.0 rather than backport things into 11.X. On Wed, Aug 14, 2013 at 12:08 AM, Navis류승우 navis@nexr.com wrote: If this is only for addressing the npath problem, we have three months for that. Would it be enough time for releasing 0.12.0? ps. IMHO, n-path seemed too generic a name to be patented. I hate Teradata. 2013/8/14 Edward Capriolo edlinuxg...@gmail.com: Should we get the npath rename in? Do we have a jira for this? If not I will take it. On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner wagner.mar...@gmail.com wrote: It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953 has been committed to trunk and it looks like 4789 is close. Thanks, Mark On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley omal...@apache.org wrote: All, I'd like to create a 0.11.1 with some fixes in it. I plan to put together a release candidate over the next week. I'm in the process of putting together the list of bugs that I want to include, but I wanted to solicit the jiras that others thought would be important for a 0.11.1. Thanks, Owen
[jira] [Commented] (HIVE-5107) Change hive's build to maven
[ https://issues.apache.org/jira/browse/HIVE-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750837#comment-13750837 ] Edward Capriolo commented on HIVE-5107: --- Right now we are not making any branches/patches yet. Our plan is to hack at github and then, once we get everything working the way we like, open a hive branch and do it all again. Breaking up the meta-store sounds ok. Change hive's build to maven Key: HIVE-5107 URL: https://issues.apache.org/jira/browse/HIVE-5107 Project: Hive Issue Type: Task Reporter: Edward Capriolo Assignee: Edward Capriolo I cannot cope with hive's build infrastructure any more. I have started working on porting the project to maven. When I have some solid progress I will put the entire thing on github for review. Then we can talk about switching the project somehow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-3562: -- Attachment: HIVE-3562.D5967.9.patch navis updated the revision HIVE-3562 [jira] Some limit can be pushed down to map stage. Addressed comments Reviewers: ashutoshc, JIRA, tarball REVISION DETAIL https://reviews.facebook.net/D5967 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D5967?vs=39015id=39051#toc AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java conf/hive-default.xml.template ql/build.xml ql/ivy.xml ql/src/java/org/apache/hadoop/hive/ql/exec/ExtractOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/ForwardOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/SelectOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/TopNHash.java ql/src/java/org/apache/hadoop/hive/ql/io/HiveKey.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java ql/src/test/queries/clientpositive/limit_pushdown.q ql/src/test/queries/clientpositive/limit_pushdown_negative.q ql/src/test/results/clientpositive/limit_pushdown.q.out ql/src/test/results/clientpositive/limit_pushdown_negative.q.out To: JIRA, tarball, ashutoshc, navis Cc: njain Some limit can be pushed down to map stage -- Key: HIVE-3562 URL: https://issues.apache.org/jira/browse/HIVE-3562 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, HIVE-3562.D5967.9.patch Queries 
with a limit clause (with a reasonable number), for example {noformat} select * from src order by key limit 10; {noformat} produce the operator tree TS-SEL-RS-EXT-LIMIT-FS. But LIMIT can be partially calculated in RS, reducing the size of the shuffle: TS-SEL-RS(TOP-N)-EXT-LIMIT-FS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
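The RS(TOP-N) step above can be illustrated with a bounded heap: each mapper retains only the n smallest keys, so at most n rows per mapper are shuffled instead of the full input. This is an illustrative Python sketch of the idea only, not the actual TopNHash (which works on serialized HiveKey bytes and memory budgets); `top_n_keys` is a hypothetical name.

```python
import heapq

def top_n_keys(keys, n):
    """Map-side top-N: keep only the n smallest keys seen so far,
    using a max-heap of size n (Python's heapq is a min-heap, so we
    store negated values). Returns the survivors in sorted order."""
    heap = []  # holds -key for the n smallest keys retained so far
    for key in keys:
        if len(heap) < n:
            heapq.heappush(heap, -key)
        elif -heap[0] > key:  # current worst retained key is larger
            heapq.heapreplace(heap, -key)
    return sorted(-k for k in heap)

# For 'order by key limit 3', only these 3 rows leave the mapper.
assert top_n_keys([5, 1, 9, 3, 7, 2], 3) == [1, 2, 3]
```

Each insertion is O(log n), so a mapper processing millions of rows still ships only n of them to the reducer, which is exactly the shuffle reduction the patch targets.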
[jira] [Commented] (HIVE-5145) Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23
[ https://issues.apache.org/jira/browse/HIVE-5145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750890#comment-13750890 ] Hudson commented on HIVE-5145: -- FAILURE: Integrated in Hive-trunk-hadoop2-ptest #71 (See [https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/71/]) HIVE-5145 : Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 (Prasanth J via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1517682) * /hive/trunk/ql/src/test/results/clientpositive/list_bucket_query_multiskew_2.q.out Fix TestCliDriver.list_bucket_query_multiskew_2.q on hadoop 0.23 Key: HIVE-5145 URL: https://issues.apache.org/jira/browse/HIVE-5145 Project: Hive Issue Type: Bug Components: Tests Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.12.0 Attachments: HIVE-5145.2.patch, HIVE-5145.patch there is some non-determinism in the output of the list_bucket_query_multiskew_2.q test case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: LIKE filter pushdown for tables and partitions
Adding user list. Any objections to removing LIKE support from getPartitionsByFilter? On Mon, Aug 26, 2013 at 2:54 PM, Ashutosh Chauhan hashut...@apache.org wrote: Couple of questions: 1. What about the LIKE operator for Hive itself? Will that continue to work (presumably, because there is an alternative path for that)? 2. This will nonetheless break other direct consumers of the metastore client api (like HCatalog). I see your point that we have a buggy implementation, so what's out there is not safe to use. Question then really is: shall we remove this code, thereby breaking people for whom the current buggy implementation is good enough (or, you could say, salvage them from breaking in future)? Or shall we try to fix it now? My take is if there are no users of this anyway, then there is no point fixing it for non-existing users, but if there are, we probably have to. I suggest you send an email to users@hive to ask if there are users for this. Thanks, Ashutosh On Mon, Aug 26, 2013 at 2:08 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Since there's no response I am assuming nobody cares about this code... Jira is HIVE-5134, I will attach a patch with the removal this week. On Wed, Aug 21, 2013 at 2:28 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Hi. I think there are issues with the way hive can currently do LIKE operator JDO pushdown, and the code should be removed for partitions and tables. Are there objections to removing LIKE from Filter.g and related areas? If not, I will file a JIRA and do it. Details: There's code in the metastore that is capable of pushing down a LIKE expression into JDO for string partition keys, as well as tables. The code for tables doesn't appear used, and the partition code definitely doesn't run in Hive proper because the metastore client doesn't send LIKE expressions to the server. It may be used in e.g. HCat and other places, but after asking some people here, I found out it probably isn't. 
I was trying to make it run and noticed some problems: 1) For partitions, Hive sends SQL patterns in a filter for like, e.g. %foo%, whereas metastore passes them into matches() JDOQL method which expects Java regex. 2) Converting the pattern to Java regex via UDFLike method, I found out that not all regexes appear to work in DN. .*foo seems to work but anything complex (such as escaping the pattern using Pattern.quote, which UDFLike does) breaks and no longer matches properly. 3) I tried to implement common cases using JDO methods startsWith/endsWith/indexOf (I will file a JIRA), but when I run tests on Derby, they also appear to have problems with some strings (for example, partition with backslash in the name cannot be matched by LIKE %\% (single backslash in a string), after being converted to .indexOf(param) where param is \ (escaping the backslash once again doesn't work either, and anyway there's no documented reason why it shouldn't work properly), while other characters match correctly, even e.g. %. For tables, there's no SQL-like, it expects Java regex, but I am not convinced all Java regexes are going to work. So, I think that for future correctness sake it's better to remove this code. -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. 
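The mismatch described in point 1 of the thread above — Hive sends SQL LIKE patterns such as %foo%, while JDOQL's matches() expects a Java-style regex — implies a translation step. Below is a minimal Python sketch of such a translation; the Java equivalent the thread mentions lives in UDFLike, `like_to_regex` is a hypothetical name, and Python's re module stands in for Java regex / DataNucleus semantics, which is exactly where the thread reports breakage (escaped and backslash-containing patterns).

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a regex: '%' matches any
    sequence, '_' matches any single character, and every other
    character is escaped literally (analogous to Pattern.quote)."""
    out = []
    for ch in pattern:
        if ch == '%':
            out.append('.*')
        elif ch == '_':
            out.append('.')
        else:
            out.append(re.escape(ch))
    return ''.join(out)

# '%foo%' becomes '.*foo.*', which full-matches any string containing 'foo'.
assert re.fullmatch(like_to_regex('%foo%'), 'xxfooyy')
assert re.fullmatch(like_to_regex('a_c'), 'abc')
assert not re.fullmatch(like_to_regex('a_c'), 'abbc')
```

This translation is straightforward in a full regex engine; the thread's point is that DataNucleus's matches() only supports a subset of regex, so escaped patterns that work here fail when pushed into JDO.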
[jira] [Updated] (HIVE-5128) Direct SQL for view is failing
[ https://issues.apache.org/jira/browse/HIVE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5128: --- Status: Patch Available (was: Open) Direct SQL for view is failing --- Key: HIVE-5128 URL: https://issues.apache.org/jira/browse/HIVE-5128 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Sergey Shelukhin Priority: Trivial Attachments: HIVE-5128.D12465.1.patch, HIVE-5128.D12465.2.patch I cannot be sure of this, but it happens when dropping views (it rolls back to JPA and works fine) {noformat} etastore.ObjectStore: Direct SQL failed, falling back to ORM MetaException(message:Unexpected null for one of the IDs, SD null, column null, serde null) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:195) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:98) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1758) ... {noformat} Should it be disabled for views, or can it be fixed? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-5154) Remove unnecessary array creation in ReduceSinkOperator
Navis created HIVE-5154: --- Summary: Remove unnecessary array creation in ReduceSinkOperator Key: HIVE-5154 URL: https://issues.apache.org/jira/browse/HIVE-5154 Project: Hive Issue Type: Task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial A key array is created for each row, which seems unnecessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5154) Remove unnecessary array creation in ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-5154: -- Attachment: HIVE-5154.D12549.1.patch navis requested code review of HIVE-5154 [jira] Remove unnecessary array creation in ReduceSinkOperator. Reviewers: JIRA HIVE-5154 Remove unnecessary array creation in ReduceSinkOperator A key array is created for each row, which seems unnecessary. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D12549 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/30117/ To: JIRA, navis Remove unnecessary array creation in ReduceSinkOperator --- Key: HIVE-5154 URL: https://issues.apache.org/jira/browse/HIVE-5154 Project: Hive Issue Type: Task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-5154.D12549.1.patch A key array is created for each row, which seems unnecessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-5154) Remove unnecessary array creation in ReduceSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5154: Status: Patch Available (was: Open) Remove unnecessary array creation in ReduceSinkOperator --- Key: HIVE-5154 URL: https://issues.apache.org/jira/browse/HIVE-5154 Project: Hive Issue Type: Task Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-5154.D12549.1.patch A key array is created for each row, which seems unnecessary. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
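The pattern HIVE-5154 targets — building a fresh key array for every incoming row — can be sketched in simplified, hypothetical form. The names below are illustrative, not Hive's actual ReduceSinkOperator fields: the buffer is allocated once in the constructor and refilled for each row, so no per-row garbage is produced.

```java
import java.util.Arrays;

// Illustrative sketch only: a key buffer allocated once and reused,
// instead of creating a new array for every row.
public class KeyBufferSketch {
    private final Object[] cachedKeys;   // reused across rows

    public KeyBufferSketch(int numKeyColumns) {
        this.cachedKeys = new Object[numKeyColumns];
    }

    // Fills the cached buffer instead of allocating; callers must consume
    // (or copy) the result before the next call overwrites it.
    public Object[] evaluateKeys(Object[] row, int[] keyColumnIndexes) {
        for (int i = 0; i < keyColumnIndexes.length; i++) {
            cachedKeys[i] = row[keyColumnIndexes[i]];
        }
        return cachedKeys;
    }

    public static void main(String[] args) {
        KeyBufferSketch op = new KeyBufferSketch(2);
        Object[] first = op.evaluateKeys(new Object[]{"a", 1, true}, new int[]{0, 1});
        Object[] second = op.evaluateKeys(new Object[]{"b", 2, false}, new int[]{0, 1});
        // Same backing array both times: no per-row allocation.
        System.out.println(first == second);          // true
        System.out.println(Arrays.toString(second));  // [b, 2]
    }
}
```

The trade-off is the usual one for buffer reuse: the returned array is only valid until the next call, which is acceptable inside an operator that serializes the key immediately.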
[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query
[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750945#comment-13750945 ] Phabricator commented on HIVE-4002: --- navis has commented on the revision HIVE-4002 [jira] Fetch task aggregation for simple group by query. INLINE COMMENTS ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:3631 Right. I'll fix that. ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:6985 It's the same thing. I just want to be more consistent. ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java:582 I need a recursive flush method to implement this, like what the init or close methods do. I think I've broken something rebasing the patch. Can I ask what query was not working with this patch? The test framework does not seem to have been working recently. ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java:493 For this patch, flush should be called on all operators in the execution tree. REVISION DETAIL https://reviews.facebook.net/D8739 To: JIRA, navis Cc: yhuai Fetch task aggregation for simple group by query Key: HIVE-4002 URL: https://issues.apache.org/jira/browse/HIVE-4002 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, HIVE-4002.D8739.3.patch Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. But the job is too small even for a single reducer, because most UDAFs generate just a single row from map aggregation. If the final fetch task can aggregate the outputs from the map tasks, the shuffling time can be removed. This optimization transforms an operator tree like TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK into TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS). With the patch, the time taken for the auto_join_filters.q test was reduced to 6 min (from 10 min).
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4002) Fetch task aggregation for simple group by query
[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4002: -- Attachment: HIVE-4002.D8739.4.patch navis updated the revision HIVE-4002 [jira] Fetch task aggregation for simple group by query. Addressed comments Reviewers: JIRA REVISION DETAIL https://reviews.facebook.net/D8739 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D8739?vs=38829id=39063#toc AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/MuxOperator.java ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchAggregation.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/SimpleFetchOptimizer.java ql/src/java/org/apache/hadoop/hive/ql/parse/MapReduceCompiler.java ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java ql/src/java/org/apache/hadoop/hive/ql/parse/RowResolver.java ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/test/queries/clientpositive/fetch_aggregation.q ql/src/test/results/clientpositive/fetch_aggregation.q.out ql/src/test/results/compiler/plan/groupby1.q.xml ql/src/test/results/compiler/plan/groupby2.q.xml ql/src/test/results/compiler/plan/groupby3.q.xml ql/src/test/results/compiler/plan/groupby5.q.xml serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java To: JIRA, navis Cc: yhuai Fetch task aggregation for simple group by query Key: HIVE-4002 URL: https://issues.apache.org/jira/browse/HIVE-4002 
Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch, HIVE-4002.D8739.3.patch, HIVE-4002.D8739.4.patch Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. But the job is too small even for a single reducer, because most UDAFs generate just a single row from map aggregation. If the final fetch task can aggregate the outputs from the map tasks, the shuffling time can be removed. This optimization transforms an operator tree like TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK into TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS). With the patch, the time taken for the auto_join_filters.q test was reduced to 6 min (from 10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
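The shape of the HIVE-4002 optimization can be sketched roughly as follows. This is an illustrative toy, not Hive code: partialCount stands in for GBY1's single-row map-side output, and mergeCounts stands in for the GBY2 aggregation that the patch moves into the fetch task, removing the shuffle and reduce stage in between.

```java
import java.util.List;

// Rough sketch: each "map task" produces one partial aggregate, and the
// fetch side merges the single-row partials directly — no reducer needed.
public class FetchAggregationSketch {

    // Map side: count the rows in one input split (stands in for GBY1).
    public static long partialCount(List<?> split) {
        return split.size();
    }

    // Fetch side: merge the partial counts (stands in for GBY2).
    public static long mergeCounts(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        long p1 = partialCount(List.of("r1", "r2", "r3"));
        long p2 = partialCount(List.of("r4", "r5"));
        System.out.println(mergeCounts(List.of(p1, p2))); // 5
    }
}
```

This only works because the map-side output is tiny — one row per task for count-style UDAFs — which is exactly the observation in the issue description.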
[jira] [Updated] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE
[ https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4375: Status: Patch Available (was: Open) Single sourced multi insert consists of native and non-native table mixed throws NPE Key: HIVE-4375 URL: https://issues.apache.org/jira/browse/HIVE-4375 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch, HIVE-4375.D10329.3.patch CREATE TABLE src_x1(key string, value string); CREATE TABLE src_x2(key string, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string"); explain from src a insert overwrite table src_x1 select key,value where a.key > 0 AND a.key < 50 insert overwrite table src_x2 select key,value where a.key > 50 AND a.key < 100; throws, {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236) at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage
[ https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750955#comment-13750955 ] Phabricator commented on HIVE-3562: --- ashutoshc has accepted the revision HIVE-3562 [jira] Some limit can be pushed down to map stage. +1 REVISION DETAIL https://reviews.facebook.net/D5967 BRANCH HIVE-3562 ARCANIST PROJECT hive To: JIRA, tarball, ashutoshc, navis Cc: njain Some limit can be pushed down to map stage -- Key: HIVE-3562 URL: https://issues.apache.org/jira/browse/HIVE-3562 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-3562.D5967.1.patch, HIVE-3562.D5967.2.patch, HIVE-3562.D5967.3.patch, HIVE-3562.D5967.4.patch, HIVE-3562.D5967.5.patch, HIVE-3562.D5967.6.patch, HIVE-3562.D5967.7.patch, HIVE-3562.D5967.8.patch, HIVE-3562.D5967.9.patch Queries with a limit clause (with a reasonable number), for example {noformat} select * from src order by key limit 10; {noformat} produce the operator tree TS-SEL-RS-EXT-LIMIT-FS. But LIMIT can be partially calculated in RS, reducing the size of the shuffle: TS-SEL-RS(TOP-N)-EXT-LIMIT-FS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
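The RS(TOP-N) idea in HIVE-3562 — keeping only the limit-sized smallest keys on the map side so that less data reaches the shuffle — can be sketched with a bounded max-heap. This is illustrative only, not Hive's actual implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of RS(TOP-N): keep at most n keys in a max-heap; any key larger
// than the current worst retained key is dropped before it would be shuffled.
public class TopNSketch {

    public static List<Integer> smallestN(List<Integer> keys, int n) {
        // Max-heap: the root is the largest (worst) of the retained keys.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int key : keys) {
            if (heap.size() < n) {
                heap.add(key);
            } else if (key < heap.peek()) {
                heap.poll();    // evict the current worst key
                heap.add(key);
            }
            // else: drop the key — it can never appear in the final LIMIT n.
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(smallestN(List.of(7, 3, 9, 1, 5), 2)); // [1, 3]
    }
}
```

The final LIMIT operator after the shuffle is still needed for correctness (each map task keeps its own top n), but the data volume shuffled drops from all rows to at most n rows per map task.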
[jira] [Updated] (HIVE-4375) Single sourced multi insert consists of native and non-native table mixed throws NPE
[ https://issues.apache.org/jira/browse/HIVE-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4375: --- Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Single sourced multi insert consists of native and non-native table mixed throws NPE Key: HIVE-4375 URL: https://issues.apache.org/jira/browse/HIVE-4375 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0 Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4375.D10329.1.patch, HIVE-4375.D10329.2.patch, HIVE-4375.D10329.3.patch CREATE TABLE src_x1(key string, value string); CREATE TABLE src_x2(key string, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:string"); explain from src a insert overwrite table src_x1 select key,value where a.key > 0 AND a.key < 50 insert overwrite table src_x2 select key,value where a.key > 50 AND a.key < 100; throws, {noformat} java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.addStatsTask(GenMRFileSink1.java:236) at org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1.process(GenMRFileSink1.java:126) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:55) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.parse.GenMapRedWalker.walk(GenMapRedWalker.java:67) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8354) at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8759) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:186) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira