Re: PlanUtils.java use correct classloader when calling Class.forName()
Booked on https://issues.apache.org/jira/browse/HIVE-9486. Thanks, Navis

2015-01-27 18:05 GMT+09:00 德明 施 deming@outlook.com: Hi All, I don't think I have a Hive classpath issue here; I think this is a bug. I wrote my own SerDe, com.stanley.MySerde, which is a simple JSON serializer; it is essentially the same as the built-in SerDe org.apache.hadoop.hive.serde2.DelimitedJSONSerDe. Then I issued these commands:

add jar /path/to/myjar.jar; (I am sure this command worked)
create table t1.json_1 row format serde 'com.stanley.MySerde' location '/user/stanley/test-data-1/' as select * from t1.plain_table;
create table t1.json_2 row format serde 'org.apache.hadoop.hive.serde2.DelimitedJSONSerDe' location '/user/stanley/test-data-2/' as select * from t1.plain_table;

The second command succeeds but the first one fails with ClassNotFoundException. However, if I put myjar.jar into $HIVE_HOME/lib, both commands succeed. I went through the code of org.apache.hadoop.hive.ql.plan.PlanUtils.java; it seems to be using Class.forName(clzname) to load the class. I think it should use the thread context classloader instead. Am I right? There's a similar issue here: https://issues.apache.org/jira/browse/HIVE-6495

Here's the exception trace:

java.lang.ClassNotFoundException: com.ebay.p13n.hive.bexbat.serde.JsonLazySimpleSerDe
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.hadoop.hive.ql.plan.PlanUtils.getTableDesc(PlanUtils.java:310)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:5874)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8278)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8169)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9001)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9267)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:427)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:323)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:980)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1045)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
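For context, the fix for this class of bug is typically to resolve the class through the thread's context classloader, which session-level "add jar" commands augment, instead of the classloader that loaded the calling class. A minimal JDK-only sketch of that pattern (the class and method names here are illustrative, not Hive's actual code):

```java
public final class ClassLoading {

    private ClassLoading() {}

    // Hypothetical helper: prefer the thread context classloader,
    // which "add jar" style mechanisms typically extend at runtime,
    // and fall back to the classloader that loaded this class.
    public static Class<?> loadClass(String className) throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoading.class.getClassLoader();
        }
        // The 'true' flag initializes the class, matching the behavior
        // of the one-argument Class.forName(name).
        return Class.forName(className, true, loader);
    }
}
```

A call site like the one in PlanUtils would then use loadClass(serdeName) rather than Class.forName(serdeName), so classes added to the session classloader can be found.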
Re: To be included as a Hive contributor
We've been failing to add new contributors to the Hive JIRA for months. To PMC members: shouldn't we fix it? Maybe Carl knows how to do that.

2015-01-13 15:59 GMT+09:00 赵海明 zhao...@asiainfo.com: Admins, I'm a Hive user from Beijing, China. I want to be included as a Hive contributor; my JIRA username is zhao...@asiainfo.com. I look forward to your reply, thanks! Best Regards, Spongcer ZHAO, Software Architect, Cell: 0086 151 1009 3390, AsiaInfo Information Systems (Beijing) Limited, AsiaInfo Plaza, Courtyard #10 East, Xibeiwang East Road, Haidian District, Beijing, 100193, P.R. China
Re: Is there any property for not deleting the scratch dir
I think there once was a configuration for it, but I cannot find it in recent releases. Thanks, Navis 2014-12-24 10:46 GMT+09:00 Jeff Zhang zjf...@gmail.com: Hi, Is there any property for not deleting the scratch dir? I'd like to check the intermediate data to understand the internals of Hive better. -- Best Regards Jeff Zhang
Re: What's the status of AccessServer?
It's the proposal suggested by Carl. It was once referenced in HIVE-4569 (https://issues.apache.org/jira/browse/HIVE-4569?focusedCommentId=13691935&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13691935), but there seems to be no JIRA issue for it. @Carl Steinbach, could you say anything on this?

2014-12-18 3:21 GMT+09:00 Nick Dimiduk ndimi...@gmail.com: Hi folks, I'm looking for ways to expose Apache Phoenix [0] to a wider audience. One potential way to do that is to follow in Hive's footsteps with an HS2 protocol-compatible service. I've done some prototyping along these lines and see that it's quite feasible. Along the way I came across this proposal for refactoring HS2 into the AccessServer [1]. What's the state of the AccessServer project? Is anyone working on it? Is there a relationship between this effort and Calcite's Avatica [2]? The system proposed in the AccessServer doc seems to fit nicely in line with Calcite's objectives. Thanks, Nick [0]: http://phoenix.apache.org [1]: https://cwiki.apache.org/confluence/display/Hive/AccessServer+Design+Proposal [2]: http://mail-archives.apache.org/mod_mbox/calcite-dev/201412.mbox/%3CCAMCtme%2BpVsVYP%2B-J1jDPk-fNCtAHj3f0eXif_hUG_Xy81Ufxsw%40mail.gmail.com%3E
Re: How to find Optimize query plan for given Query?
You can see the optimized logical operator tree by running "explain logical <your query>". And the partial operator trees in the stages are all optimized ones. Thanks, Navis 2014-12-11 22:09 GMT+09:00 Akash Mishra akash.mishr...@gmail.com: Hi All, I am trying to understand the details of how a query is optimized in the system. Is there anything other than the EXPLAIN mechanism? As per my understanding the abstract syntax tree is not optimized. Is the plan in the stages the optimized physical plan? -- With Sincere Regards, Yours Sincerely, Akash Mishra. "It's not our abilities that make us, but our decisions." --Albus Dumbledore
Re: Request to add to contributor list
Strange.. I can remove a name from the contributor list (I tried with mine) but cannot add a name to it. Does anyone know about this? Thanks, Navis 2014-12-11 17:33 GMT+09:00 Binglin Chang decst...@gmail.com: Hi, I filed some JIRAs related to HiveServer2 (HIVE-9005, HIVE-9006, HIVE-9013). Please help by adding me to the contributor list so I can assign JIRAs and create review requests. id: decster Thanks Binglin
Re: [ANNOUNCE] New Hive PMC Member - Prasad Mujumdar
Congratulations! 2014-12-10 8:35 GMT+09:00 Jason Dere jd...@hortonworks.com: Congrats! On Dec 9, 2014, at 3:02 PM, Venkat V venka...@gmail.com wrote: Congrats Prasad! On Tue, Dec 9, 2014 at 2:32 PM, Brock Noland br...@cloudera.com wrote: Congratulations Prasad!! On Tue, Dec 9, 2014 at 2:17 PM, Carl Steinbach c...@apache.org wrote: I am pleased to announce that Prasad Mujumdar has been elected to the Hive Project Management Committee. Please join me in congratulating Prasad! Thanks. - Carl -- Venkat V -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: 3 configs: hive.added.files.path, hive.added.jars.path, hive.added.archives.path
Looks like HIVEADDEDFILES/JARS/ARCHIVES are just internal variables used to carry values in HiveConf, which should not be configured by users. We can exclude them from the generated XML. HIVEAUXJARS is a public configuration for users to specify jars to be added to the classpath of all jobs. It seems it needs a description. 2014-10-13 16:39 GMT+09:00 Lefty Leverenz leftylever...@gmail.com: Asking again about hive.added.files.path, hive.added.jars.path, and hive.added.archives.path. -- Lefty On Sun, Oct 5, 2014 at 9:15 PM, Lefty Leverenz leftylever...@gmail.com wrote: Can someone provide descriptions for these three configuration parameters? - hive.added.files.path (added in Hive 0.4) - hive.added.jars.path (0.4) - hive.added.archives.path (0.5) I put them in AdminManual Configuration https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfigurationVariables with pathetic descriptions. What are they for? Are they still useful or obsolete? Does anyone know which JIRAs created them? How is hive.added.jars.path different from hive.aux.jars.path? Inquiring minds want to know. -- Lefty
Re: Restarting hadoop-1 builds
Hi, I've booked the current build failure problem as HIVE-8265. It seems this kind of thing will happen again. Thanks, Navis 2014-09-26 10:40 GMT+09:00 Szehon Ho sze...@cloudera.com: Hi all, There's been no build coverage of hadoop-1 on hive-trunk since the 0.13 release. I'm planning to restart periodic hadoop-1 builds at http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-TRUNK-HADOOP-1 , so we know the current state. I hope this helps for the 0.14 release and beyond. But as I'm setting it up, there's an issue compiling with mvn install -Phadoop-1 in the hive trunk root dir. I wonder if this is still the command to compile it, and whether this is a real error? [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project hive-exec: Compilation failure [ERROR] /data/hive-ptest/working/apache-svn-trunk-source/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/CustomPartitionVertex.java:[37,27] error: cannot find symbol Appreciate any help, thanks. Szehon
Re: Please include me as Hive contributor
Done! Thanks, Navis 2014-09-15 18:19 GMT+09:00 Rajat Ratewal rajatrate...@gmail.com: Hi All, Trust you are doing well. Can you please add me as a Hive contributor. My JIRA username is ratewar. Cheers, Rajat Ratewal
Re: Timeline for release of Hive 0.14
Hi, I'd really appreciate it if HIVE-5690 could be included; it is becoming harder and harder to rebase. The other 79 patches assigned to me can be held off. Thanks, Navis

2014-09-11 19:54 GMT+09:00 Vaibhav Gumashta vgumas...@hortonworks.com: Hi Vikram, Can we also add https://issues.apache.org/jira/browse/HIVE-6799 and https://issues.apache.org/jira/browse/HIVE-7935 to the list? Thanks, --Vaibhav

On Wed, Sep 10, 2014 at 12:18 AM, Satish Mittal satish.mit...@inmobi.com wrote: Hi, Can you please include HIVE-7892 (Thrift Set type not working with Hive) as well? It is under code review. Regards, Satish

On Tue, Sep 9, 2014 at 2:10 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote: Please include https://issues.apache.org/jira/browse/HIVE-7694 as well. It is currently under review by Amareshwari and should be done in the next couple of days. Thanks Suma

On Mon, Sep 8, 2014 at 5:44 PM, Alan Gates ga...@hortonworks.com wrote: I'll review that. I just need the time to test it against mysql, oracle, and hopefully sqlserver. But I think we can do this post branch if we need to, as it's a bug fix rather than a feature. Alan.

Damien Carol dca...@blitzbs.com September 8, 2014 at 3:19: Same request for https://issues.apache.org/jira/browse/HIVE-7689. I already provided a patch, rebased it many times, and I'm waiting for a review. Regards,

On 08/09/2014 12:08, amareshwarisr . amareshw...@gmail.com September 8, 2014 at 3:08 wrote: Would like to include https://issues.apache.org/jira/browse/HIVE-2390 and https://issues.apache.org/jira/browse/HIVE-7936. I can review and merge them. Thanks Amareshwari

Vikram Dixit vik...@hortonworks.com September 5, 2014 at 17:53: Hi Folks, I am going to start consolidating the items mentioned in this list and create a wiki page to track it. I will wait till the end of next week to create the branch, taking into account Ashutosh's request. Thanks Vikram.
On Fri, Sep 5, 2014 at 5:39 PM, Ashutosh Chauhan hashut...@apache.org wrote: Vikram, Some of us are working on stabilizing the cbo branch and trying to get it merged into trunk. We feel we are close. May I request deferring cutting the branch for a few more days? Folks interested in this can track our progress here: https://issues.apache.org/jira/browse/HIVE-7946 Thanks, Ashutosh

On Fri, Aug 22, 2014 at 4:09 PM, Lars Francke lars.fran...@gmail.com wrote: Thank you for volunteering to do the release. I think a 0.14 release is a good idea. I have a couple of issues I'd like to get in too:
* Either HIVE-7107[0] (Fix an issue in the HiveServer1 JDBC driver) or HIVE-6977[1] (Delete HiveServer1). The former needs a review, the latter a patch.
* HIVE-6123[2] Checkstyle in Maven needs a review.

HIVE-7622[3] and HIVE-7543[4] are waiting for any reviews or comments on my previous thread[5]. I'd still appreciate any helpers for reviews or even just comments. I'd feel very sad if I had done all that work for nothing. Hoping this thread gives me a wider audience. Both patches fix up issues that should have been caught in earlier reviews, as they are almost all Checkstyle or other style violations, but they make for huge patches.
I could also create hundreds of small issues or stop doing these things entirely.

[0] https://issues.apache.org/jira/browse/HIVE-7107
[1] https://issues.apache.org/jira/browse/HIVE-6977
[2] https://issues.apache.org/jira/browse/HIVE-6123
[3] https://issues.apache.org/jira/browse/HIVE-7622
[4] https://issues.apache.org/jira/browse/HIVE-7543

On Fri, Aug 22, 2014 at 11:01 PM, John Pullokkaran
Re: Hive contributor
Done! Thanks, Navis 2014-09-12 6:42 GMT+09:00 Sebastien Marti marti@gmail.com: Hi all, Please add me to Hive contributor list My Jira User name : smarti Thanks Sebastien
Re: request to become a contributor
Done! Thanks, Navis 2014-09-03 8:11 GMT+09:00 David Serafini d...@altiscale.com: JIRA user: dbsalti (david serafini) Working on HIVE-7100 thanks, dbs
Re: Running tests in IntelliJ
Make a jar dependency on the metastore module with hive-metastore.jar and give it higher priority than the module source. Pretty sure there is a better way than this. Thanks, Navis 2014-08-30 18:47 GMT+09:00 Lars Francke lars.fran...@gmail.com: Hi, I'm trying to set up my dev environment properly so that I can run tests in IntelliJ but a lot of them fail due to errors like this: Caused by: org.datanucleus.api.jdo.exceptions.ClassNotPersistenceCapableException: The class org.apache.hadoop.hive.metastore.model.MVersionTable is not persistable. This means that it either hasnt been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found. I understand that this comes from some stuff DataNucleus does behind the scenes, but I can't get that to work in IntelliJ even with the two available DataNucleus plugins. They both seem to be unmaintained. So: Does anyone use IntelliJ and have all the tests running in it? I appreciate any pointers, thank you! Cheers, Lars
Re: Why does SMB join generate hash table locally, even if input tables are large?
I don't think hash table generation is needed for SMB joins. Could you check the result of explain extended? Thanks, Navis 2014-07-31 4:08 GMT+09:00 Pala M Muthaia mchett...@rocketfuelinc.com: +hive-users On Tue, Jul 29, 2014 at 1:56 PM, Pala M Muthaia mchett...@rocketfuelinc.com wrote: Hi, I am testing SMB join for 2 large tables. The tables are bucketed and sorted on the join column. I notice that even though the table is large, Hive attempts to generate hash table for the 'small' table locally, similar to map join. Since the table is large in my case, the client runs out of memory and the query fails. I am using Hive 0.12 with the following settings: set hive.optimize.bucketmapjoin=true; set hive.optimize.bucketmapjoin.sortedmerge=true; set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; My test query does a simple join and a select, no subqueries/nested queries etc. I understand why a (bucket) map join requires hash table generation, but why is that included for an SMB join? Shouldn't a SMB join just spin up one mapper for each bucket and perform a sort merge join directly on the mapper? Thanks, pala
Re: hive udf cannot recognize generic method
I've booked this on https://issues.apache.org/jira/browse/HIVE-7588. With the patch, something like the examples below becomes possible.

// unknown input
public String evaluate(Object arg) { return arg == null ? null : String.valueOf(arg); }

// type variable
public <T> T evaluate(T arg) { return arg; }

// type variable, nested
public <T> T evaluate(Map<String, T> arg) { return arg.values().iterator().next(); }

Thanks, Navis

2014-07-31 3:37 GMT+09:00 Jason Dere jd...@hortonworks.com: Sounds like you are using the older-style UDF class. In that case, yes, you would have to override evaluate() for each type of input. You could also try extending the GenericUDF class; that would allow you to write a single method, though it may be a bit more complicated (you can look at the Hive code for some examples).

On Jul 30, 2014, at 7:43 AM, Dan Fan d...@appnexus.com wrote: Hi there, I am writing a Hive UDF. The input could be string, int, double, etc., and the return type is based on the input type. I was trying to use a generic method, but Hive does not seem to recognize it. Here is the piece of code I have as an example:

public <T> T evaluate(final T s, final String column_name, final int bitmap) throws Exception {
    if (s instanceof Double)
        return (T) new Double(-1.0);
    else if (s instanceof Integer)
        return (T) new Integer(-1);
    ...
}

Does anyone know if Hive supports generic methods? Or do I have to override the evaluate method for each type of input? Thanks, Dan
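As background on why reflection-based dispatch in the old-style UDF cannot see such methods: after Java type erasure, a generic evaluate(T) compiles down to a single evaluate(Object), so matching on the argument's runtime type (Double, Integer, ...) finds no type-specific overload. A standalone JDK-only demonstration of what reflection observes (this is plain Java, not Hive code):

```java
import java.lang.reflect.Method;

public class ErasureDemo {

    // A generic method shaped like the evaluate() in the question.
    public static <T> T evaluate(T s) {
        return s;
    }

    public static void main(String[] args) throws Exception {
        // After erasure, the only visible signature takes Object.
        Method erased = ErasureDemo.class.getMethod("evaluate", Object.class);
        System.out.println(erased.getParameterTypes()[0].getName()); // java.lang.Object

        // A dispatcher probing for a Double-specific overload finds none.
        try {
            ErasureDemo.class.getMethod("evaluate", Double.class);
            System.out.println("found evaluate(Double)");
        } catch (NoSuchMethodException e) {
            System.out.println("no evaluate(Double) overload after erasure");
        }
    }
}
```

This is presumably why the HIVE-7588 patch has to reason about generic signatures explicitly rather than rely on plain overload lookup.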
Re: want to be a contributor
Done. Feel free to assign issues to yourself. Thanks, Navis 2014-07-29 10:41 GMT+09:00 Xu, Cheng A cheng.a...@intel.com: Regards, Ferdinand Xu
Re: Hive exception while getting duplicate records with in the table
Try:

select a.* from tmp_source_fs_price1 a
join (
  select fund_id, nav_date, CURRENCY, nav, count(1)
  from tmp_source_fs_price1
  group by fund_id, nav_date, CURRENCY, nav
  having count(1) > 1
) b on a.fund_id = b.fund_id;

The grammar you've mentioned is not supported in hive-0.11.0 (it is supported from hive-0.13.0, by HIVE-6393). Thanks, Navis

2014-07-24 19:20 GMT+09:00 Adi Reddy tolla.adire...@gmail.com: Hi Team, I am looking to find duplicate records within a table (using Hive version 0.11.0). Below is my query:

select a.* from tmp_source_fs_price1 a, (
  select fund_id, nav_date, CURRENCY, nav, count(1)
  from tmp_source_fs_price1
  group by fund_id, nav_date, CURRENCY, nav
  having count(1) > 1
) b where a.fund_id = b.fund_id;

Exception:

NoViableAltException(10@[175:43: (alias= identifier )?])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:144)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableSource(HiveParser_FromClauseParser.java:3616)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromSource(HiveParser_FromClauseParser.java:2815)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.joinSource(HiveParser_FromClauseParser.java:1316)
at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1189)
at org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:30719)
at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:28858)
at org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:28766)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:28306)
at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:28100)
at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1213)
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:928)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:418)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

FAILED: ParseException line 1:39 cannot recognize input near 'a' ',' '(' in table source

Could anyone please help me with this? Thanks, AdiReddy
Re: AST to Query String
You need the TokenRewriteStream for the ASTNode, which is available in Context or ParseDriver.

String rewrite(TokenRewriteStream rewriter, ASTNode source) throws Exception {
  // some modification...
  return rewriter.toString(source.getTokenStartIndex(), source.getTokenStopIndex());
}

Thanks, Navis

2014-07-25 7:17 GMT+09:00 Lin Liu lliu.bigd...@gmail.com: Hi folks, Currently I am working on a project which needs to generate a query string based on the modified AST. Does Hive contain this mechanism already? If not, which tools would help to complete the task? Thanks in advance. Lin
Re: [GitHub] hive pull request: HIVE 2304 : for hive2
I've attached the patch to https://issues.apache.org/jira/browse/HIVE-6165. If you tell me your Apache account name, I can assign this to you. Thanks, Navis

2014-07-19 17:15 GMT+09:00 Nitin Pawar nitinpawar...@gmail.com: As per my understanding, Apache Hive development does not support git pull requests yet. You may want to create a patch and upload it to the appropriate JIRA ticket.

On Sat, Jul 19, 2014 at 3:15 AM, codingtony g...@git.apache.org wrote: GitHub user codingtony opened a pull request: https://github.com/apache/hive/pull/20 HIVE 2304 : for hive2 Fix for HivePreparedStatement for the hive2 driver. Applied the same setObject() code that fixed HIVE 2304 for the hive1 driver. You can merge this pull request into a Git repository by running: $ git pull https://github.com/codingtony/hive HIVE-2304-hive2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/20.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20

commit 05dea4aaa70f9fb1676c97fe57b3f6813eeef111 Author: Sushanth Sowmyan khorg...@apache.org Date: 2014-06-02T19:25:00Z Hive 0.13.1-rc3 release. git-svn-id: https://svn.apache.org/repos/asf/hive/tags/release-0.13.1-rc3@1599318 13f79535-47bb-0310-9956-ffa450edef68

commit 85a78a0d6b992df238bce96fd57afb385b5d8b06 Author: Sushanth Sowmyan khorg...@apache.org Date: 2014-06-05T21:03:32Z Hive 0.13.1 release. git-svn-id: https://svn.apache.org/repos/asf/hive/tags/release-0.13.1@1600763 13f79535-47bb-0310-9956-ffa450edef68

commit 5464913c3e4707eba29eb5c917453afed905411b Author: Tony Bussieres tony.bussie...@ticksmith.com Date: 2014-07-18T21:36:08Z HIVE-2304 : Apply same fix but for org.apache.hive.jdbc.HivePreparedStatement (hive2) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- -- Nitin Pawar
Re: Archive Replication Issue HIVE-7429
Committed to trunk. Thanks Daniel, for the contribution. Navis 2014-07-17 6:20 GMT+09:00 Daniel Weeks dwe...@netflix.com.invalid: I was wondering if someone could quickly review the patch below. It's trivial; it just fixes an order-of-execution problem when uploading the archive file to HDFS and setting the replication factor (replication is set before the archive is uploaded). https://issues.apache.org/jira/browse/HIVE-7429 Thanks, -Dan Weeks
Re: Case problem in complex type
Yes, it might be. But I think it was lower-cased by mistake, because at first the fields in a struct were all column names. There is plenty of complex data, including XML and JSON, that is case-sensitive; I'm afraid we are losing case information for them.

2014-07-13 2:26 GMT+09:00 Ashutosh Chauhan hashut...@apache.org: Following POLA[1], I would suggest that ORC should follow the same conventions as the rest of Hive. If all other struct OIs are case-insensitive, then ORC should be as well. 1: http://en.wikipedia.org/wiki/Principle_of_least_astonishment

On Thu, Jul 10, 2014 at 10:21 PM, Navis류승우 navis@nexr.com wrote: Any opinions? IMO, field names should be case-sensitive, but I have doubts about the backward-compatibility issue. Thanks, Navis

2014-07-10 13:31 GMT+09:00 Lefty Leverenz leftylever...@gmail.com: Struct doesn't have its own section in the Types doc https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types, but it could (see Complex Types https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes). However I don't think people will look there for information about case sensitivity -- it belongs in the DDL and DML docs. Case-insensitivity for column names is mentioned here:
- Create Table https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable (notes immediately after the syntax)
- Alter Column -- Rules for Column Names https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn
- Select Syntax https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-SelectSyntax (notes after the syntax)
The ORC doc could also mention this issue, preferably in the section Hive QL Syntax https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax. -- Lefty

On Wed, Jul 9, 2014 at 11:48 PM, Navis류승우 navis@nexr.com wrote: For column names, Hive restricts them to lower-case strings. But what about field names? Currently, every StructObjectInspector except ORC's ignores case (lower case only). This should not be implementation-dependent and should be documented somewhere. See https://issues.apache.org/jira/browse/HIVE-6198 Thanks, Navis
Re: Case problem in complex type
Any opinions? IMO, field names should be case-sensitive, but I have doubts about the backward-compatibility issue. Thanks, Navis

2014-07-10 13:31 GMT+09:00 Lefty Leverenz leftylever...@gmail.com: Struct doesn't have its own section in the Types doc https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types, but it could (see Complex Types https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes). However I don't think people will look there for information about case sensitivity -- it belongs in the DDL and DML docs. Case-insensitivity for column names is mentioned here:
- Create Table https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable (notes immediately after the syntax)
- Alter Column -- Rules for Column Names https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn
- Select Syntax https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-SelectSyntax (notes after the syntax)
The ORC doc could also mention this issue, preferably in the section Hive QL Syntax https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-HiveQLSyntax. -- Lefty

On Wed, Jul 9, 2014 at 11:48 PM, Navis류승우 navis@nexr.com wrote: For column names, Hive restricts them to lower-case strings. But what about field names? Currently, every StructObjectInspector except ORC's ignores case (lower case only). This should not be implementation-dependent and should be documented somewhere. See https://issues.apache.org/jira/browse/HIVE-6198 Thanks, Navis
Case problem in complex type
For column names, Hive restricts them to lower-case strings. But what about field names? Currently, every StructObjectInspector except ORC's ignores case (lower case only). This should not be implementation-dependent and should be documented somewhere. See https://issues.apache.org/jira/browse/HIVE-6198 Thanks, Navis
Re: [GitHub] hive pull request: Fix lock/unlock pairing
The Hive repository on GitHub is just a mirror of the Apache SVN repository, so pull requests cannot be handled there. Could you make a patch and attach it to https://issues.apache.org/jira/browse/HIVE-7303? Thanks, Navis

2014-06-26 22:29 GMT+09:00 pavel-sakun g...@git.apache.org: GitHub user pavel-sakun opened a pull request: https://github.com/apache/hive/pull/17 Fix lock/unlock pairing Prevent IllegalMonitorStateException in case stmtHandle is null You can merge this pull request into a Git repository by running: $ git pull https://github.com/pavel-sakun/hive hive-statement-illegalmonitorstateexception Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/17.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17

commit 9468a23bfe76cd5be5c747998ec0c055750db2d3 Author: Pavel Sakun pavel_sa...@epam.com Date: 2014-06-26T13:26:38Z Fix lock/unlock pairing Prevent IllegalMonitorStateException in case stmtHandle is null
Re: Hive 0.13/Hcatalog : Mapreduce Exception : java.lang.IncompatibleClassChangeError
I don't have environment to confirm this. But if the this happens, we should include HIVE-6432 into HIVE-0.13.1. 2014-06-05 12:44 GMT+09:00 Navis류승우 navis@nexr.com: It's fixed in HIVE-6432. I think you should rebuild your own hcatalog from source with profile -Phadoop-1. 2014-06-05 9:08 GMT+09:00 Sundaramoorthy, Malliyanathan malliyanathan.sundaramoor...@citi.com: Hi, I am using Hadoop 2.4.0 with Hive 0.13 + included package of HCatalog . Wrote a simple map-reduce job from the example and running the code below .. getting “*Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected“ * .. Not sure of the error I am making .. Not sure if there a compatibility issue .. please help.. *boolean* success = *true*; *try* { Configuration conf = getConf(); args = *new* GenericOptionsParser(conf, args).getRemainingArgs(); //Hive Table Details String dbName = args[0]; String inputTableName= args[1]; String outputTableName= args[2]; //Job Input Job job = *new* *Job**(conf,**Scenarios**)*; //Initialize Map/Reducer Input/Output HCatInputFormat.*setInput*(job,dbName,inputTableName); //HCatInputFormat.ssetInput(job,InputJobInfo.create(dbName, inputTableName, null)); job.setInputFormatClass(HCatInputFormat.*class*); job.setJarByClass(MainRunner.*class*); job.setMapperClass(ScenarioMapper.*class*); job.setReducerClass(ScenarioReducer.*class*); job.setMapOutputKeyClass(IntWritable.*class*); job.setMapOutputValueClass(IntWritable.*class*); job.setOutputKeyClass(WritableComparable.*class*); job.setOutputValueClass(DefaultHCatRecord.*class*); HCatOutputFormat.*setOutput*(job, OutputJobInfo.*create*(dbName, outputTableName, *null*)); HCatSchema outSchema = HCatOutputFormat.*getTableSchema*(conf); System.*err*.println(INFO: output schema explicitly set for writing:+ outSchema); HCatOutputFormat.*setSchema*(job, outSchema); job.setOutputFormatClass(HCatOutputFormat.*class*); 14/06/02 18:52:57 INFO 
client.RMProxy: Connecting to ResourceManager at localhost/00.04.07.174:8040 Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getJobInfo(HCatBaseOutputFormat.java:104) at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.getOutputFormat(HCatBaseOutputFormat.java:84) at org.apache.hive.hcatalog.mapreduce.HCatBaseOutputFormat.checkOutputSpecs(HCatBaseOutputFormat.java:73) at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) at com.citi.aqua.snu.hdp.clar.mra.service.MainRunner.run(MainRunner.java:79) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at com.citi.aqua.snu.hdp.clar.mra.service.MainRunner.main(MainRunner.java:89) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Regards, Malli
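[Editor's note] The "Found interface X, but class was expected" error is the classic symptom of code compiled against Hadoop 1 (where org.apache.hadoop.mapreduce.JobContext was a concrete class) running against Hadoop 2 (where it became an interface) — hence the advice to rebuild HCatalog with the matching profile. A minimal illustration of how such a check can distinguish the two generations at runtime, using a stand-in type rather than the real Hadoop classes:

```java
// Illustration only: JobContext here is a stand-in for
// org.apache.hadoop.mapreduce.JobContext. In Hadoop 1 it was a concrete
// class; in Hadoop 2 it became an interface, which is a binary-incompatible
// change for code compiled against Hadoop 1.
public class JobContextCheck {
    // Stand-in declared with the Hadoop 2 shape of the type.
    interface JobContext {}

    public static void main(String[] args) {
        // Class.isInterface() reports which shape is on the classpath.
        // Against real Hadoop this would be:
        //   Class.forName("org.apache.hadoop.mapreduce.JobContext").isInterface()
        boolean hadoop2Style = JobContext.class.isInterface();
        System.out.println(hadoop2Style ? "interface (Hadoop 2 style)"
                                        : "class (Hadoop 1 style)");
        // prints: interface (Hadoop 2 style)
    }
}
```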
Re: Why isn't itests/ listed as submodule of root pom.xml?
Should we write this on wiki? :) 2014-03-12 8:46 GMT+09:00 Brock Noland br...@cloudera.com: Hopefully this is the last time I have to say this :) The qfile tests in itests require the packaging phase. The maven test phase is after compile and before packaging. We could change the qfile tests to run during the integration-test phase using the failsafe plugin but the failsafe plugin is different than surefire and IMO is hard to use. If you'd like to give that a try, by all means, go ahead. On Tue, Mar 11, 2014 at 6:37 PM, Jason Dere jd...@hortonworks.com wrote: Noticed this since internally we set the version number to something different than simply 0.13.0, and mvn version:set doesn't really work correctly with itests because itests isn't listed as one of the root POM's submodules. Is there a particular reason for it not being listed as a submodule when the mavenization was done? Having it as a submodule also allows you to run the qfile tests from root directory, so we could simplify the instructions for testing. Jason -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Timeline for the Hive 0.13 release?
I have a really big wish list (65 pending), but it's time to focus on finalization.
- Small bugs
  HIVE-6403 uncorrelated subquery is failing with auto.convert.join=true
  HIVE-4790 MapredLocalTask task does not make virtual columns
  HIVE-4293 Predicates following UDTF operator are removed by PPD
- Trivials
  HIVE-6551 group by after join with skew join optimization references invalid task sometimes
  HIVE-6359 beeline -f fails on scripts with tabs in them
  HIVE-6314 The logging (progress reporting) is too verbose
  HIVE-6241 Remove direct reference of Hadoop23Shims in QTestUtil
  HIVE-5768 Beeline connection cannot be closed with !close command
  HIVE-2752 Index names are case sensitive
- Memory leakage
  HIVE-6312 doAs with plain sasl auth should be session aware
- Implementation is not in accord with documentation
  HIVE-6129 alter exchange is implemented in inverted manner
I'll update the wiki, too. 2014-03-05 12:18 GMT+09:00 Harish Butani hbut...@hortonworks.com: Tracking jiras to be applied to branch 0.13 here: https://cwiki.apache.org/confluence/display/Hive/Hive+0.13+release+status On Mar 4, 2014, at 5:45 PM, Harish Butani hbut...@hortonworks.com wrote: the branch is created. have changed the poms in both branches. Planning to set up a wiki page to track jiras that will get ported to 0.13 regards, Harish. On Mar 4, 2014, at 5:05 PM, Harish Butani hbut...@hortonworks.com wrote: branching now. Will be changing the pom files on trunk. Will send another email when the branch and trunk changes are in.
On Mar 4, 2014, at 4:03 PM, Sushanth Sowmyan khorg...@gmail.com wrote: I have two patches still marked patch-available that have had +1s as well, but are waiting on the pre-commit tests to pick them up before they can go into 0.13: https://issues.apache.org/jira/browse/HIVE-6507 (refactor of table property names from string constants to an enum in OrcFile) https://issues.apache.org/jira/browse/HIVE-6499 (fixes a bug where calls like create table and drop table can fail if metastore-side authorization is used in conjunction with custom inputformat/outputformat/serdes that are not loadable from the metastore side)
Re: [jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
JIRA seems to have entered maintenance status, so I'm replying by mail. I've forgotten why I made the SystemVariables class in the common package, which handled this problem in the first patch. Almost done. 2014-02-19 11:23 GMT+09:00 Brock Noland (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13905033#comment-13905033] Brock Noland commented on HIVE-6037: I think to fix the build issue we could upgrade to 2.3. To fix the template issue we could: 1) reverse-populate things like user.name and java.io.tmp 2) only run the generation in a maven profile so the minor items can be fixed before commit. Synchronize HiveConf with hive-default.xml.template and support show conf - Key: HIVE-6037 URL: https://issues.apache.org/jira/browse/HIVE-6037 Project: Hive Issue Type: Improvement Components: Configuration Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.13.0 Attachments: CHIVE-6037.3.patch.txt, HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, HIVE-6037.12.patch.txt, HIVE-6037.2.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, HIVE-6037.9.patch.txt, HIVE-6037.patch see HIVE-5879 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Timeline for the Hive 0.13 release?
HIVE-6037 is for generating the hive-default.xml.template file from HiveConf. Could it be included in this release? If not, I'll suspend further rebasing of it until the next release (it conflicts too frequently). 2014-02-16 20:38 GMT+09:00 Lefty Leverenz leftylever...@gmail.com: I'll try to catch up on the wikidocs backlog for 0.13.0 patches in time for the release. It's a long and growing list, though, so no promises. Feel free to do your own documentation, or hand it off to a friendly in-house writer. -- Lefty, self-appointed Hive docs maven On Sat, Feb 15, 2014 at 1:28 PM, Thejas Nair the...@hortonworks.com wrote: Sounds good to me. On Fri, Feb 14, 2014 at 7:29 PM, Harish Butani hbut...@hortonworks.com wrote: Hi, It's mid-February. Wanted to check if the community is ready to cut a branch. Could we cut the branch in a week, say 5pm PST 2/21/14? The goal is to keep the release cycle short: a couple of weeks; so after the branch we go into stabilizing mode for hive 0.13, checking in only blocker/critical bug fixes. regards, Harish. On Jan 20, 2014, at 9:25 AM, Brock Noland br...@cloudera.com wrote: Hi, I agree that picking a date to branch and then restricting commits to that branch would be a less time-intensive plan for the RM. Brock On Sat, Jan 18, 2014 at 4:21 PM, Harish Butani hbut...@hortonworks.com wrote: Yes, agree it is time to start planning for the next release. I would like to volunteer to do the release management duties for this release (it will be a great experience for me). Will be happy to do it, if the community is fine with this. regards, Harish. On Jan 17, 2014, at 7:05 PM, Thejas Nair the...@hortonworks.com wrote: Yes, I think it is time to start planning for the next release. For the 0.12 release I created a branch and then accepted patches that people asked to be included for some time, before moving to a phase of accepting only critical bug fixes. This turned out to be laborious.
I think we should instead give everyone a few weeks to get any patches they are working on ready, cut the branch, and take in only critical bug fixes to the branch after that. How about cutting the branch around mid-February and targeting a release a week or two after that. Thanks, Thejas On Fri, Jan 17, 2014 at 4:39 PM, Carl Steinbach c...@apache.org wrote: I was wondering what people think about setting a tentative date for the Hive 0.13 release? At an old Hive Contrib meeting we agreed that Hive should follow a time-based release model with new releases every four months. If we follow that schedule we're due for the next release in mid-February. Thoughts? Thanks. Carl
Re: hive precommit tests on bigtop jenkins
bq. even if a JIRA is in the queue twice it will only be tested once. Good to know! bq. removing order-by clauses just for conforming purpose (my comment) I've tested it in https://issues.apache.org/jira/browse/HIVE-6438, taking join_filters.q from 556 sec to 418 sec. Would it be worthwhile to rewrite and update so many tests/results? 2014-02-14 15:58 GMT+09:00 Brock Noland br...@cloudera.com: Hi, The pre-commit tests: 1) only test the latest attachment 2) post the attachment id to the JIRA 3) verify the attachment id has not been tested before running. This means that even if a JIRA is in the queue twice it will only be tested once. Below are the relevant portions of the script:

curl -s -S --location --retry 3 "${JIRA_ROOT_URL}/jira/browse/${JIRA_NAME}" > $JIRA_TEXT
...
PATCH_URL=$(grep -o '/jira/secure/attachment/[0-9]*/[^"]*' $JIRA_TEXT | \
  grep -v -e 'htm[l]*$' | sort | tail -1 | \
  grep -o '/jira/secure/attachment/[0-9]*/[^"]*')
...
# ensure attachment has not already been tested
ATTACHMENT_ID=$(basename $(dirname $PATCH_URL))
if grep -q "ATTACHMENT ID: $ATTACHMENT_ID" $JIRA_TEXT
then
  echo "Attachment $ATTACHMENT_ID is already tested for $JIRA_NAME"
  exit 1
fi

On Fri, Feb 14, 2014 at 12:51 AM, Navis류승우 navis@nexr.com wrote: Recently, the precommit test takes more than 1 day (including queue time). Deduping the work queue (currently, HIVE-6403 and HIVE-6418 are queued twice) can make this better. Rewriting some test queries to be simpler (I'm thinking of removing order-by clauses that exist just for conformance purposes). Any other ideas? 2014-02-14 6:46 GMT+09:00 Thejas Nair the...@hortonworks.com: I see a new job running there now. Maybe there is nothing wrong with the infra and the builds actually finished (except for the 3 aborted ones). Can't complain about a shorter queue! :) On Thu, Feb 13, 2014 at 1:30 PM, Thejas Nair the...@hortonworks.com wrote: Is the jenkins infra used for hive precommit tests under maintenance? I see that the long queue has suddenly disappeared.
The last few test builds have been aborted. The jenkins used for hive precommit tests - http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/ Thanks, Thejas
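[Editor's note] The attachment-deduplication logic in the script quoted above can be sketched in plain Java (the shell script is the authoritative version; the class and method names below are ours, for illustration only). Given the raw JIRA page text, it picks the newest attachment URL, extracts the attachment id, and reports whether an "ATTACHMENT ID: <id>" marker is already present:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the precommit dedup check: mirrors the grep | sort | tail
// pipeline in the shell script. Helper names are hypothetical.
public class PrecommitDedup {
    private static final Pattern ATTACHMENT_URL =
        Pattern.compile("/jira/secure/attachment/(\\d+)/[^\"\\s]+");

    /** Latest (highest-sorting) non-HTML attachment id on the JIRA page, or null. */
    public static String latestAttachmentId(String jiraPageText) {
        List<String> urls = new ArrayList<>();
        Matcher m = ATTACHMENT_URL.matcher(jiraPageText);
        while (m.find()) {
            if (!m.group().matches(".*htm[l]*$")) {   // grep -v -e 'htm[l]*$'
                urls.add(m.group());
            }
        }
        if (urls.isEmpty()) return null;
        Collections.sort(urls);                        // sort | tail -1
        String last = urls.get(urls.size() - 1);
        Matcher id = ATTACHMENT_URL.matcher(last);
        id.find();
        return id.group(1);
    }

    /** True if this attachment was already tested (marker already on the page). */
    public static boolean alreadyTested(String jiraPageText, String attachmentId) {
        return jiraPageText.contains("ATTACHMENT ID: " + attachmentId);
    }
}
```

Like the script, this sorts lexicographically rather than numerically, so it is faithful to the original behavior rather than an improvement on it.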
Re: hive precommit tests on bigtop jenkins
Recently, the precommit test takes more than 1 day (including queue time). Deduping the work queue (currently, HIVE-6403 and HIVE-6418 are queued twice) can make this better. Rewriting some test queries to be simpler (I'm thinking of removing order-by clauses that exist just for conformance purposes). Any other ideas? 2014-02-14 6:46 GMT+09:00 Thejas Nair the...@hortonworks.com: I see a new job running there now. Maybe there is nothing wrong with the infra and the builds actually finished (except for the 3 aborted ones). Can't complain about a shorter queue! :) On Thu, Feb 13, 2014 at 1:30 PM, Thejas Nair the...@hortonworks.com wrote: Is the jenkins infra used for hive precommit tests under maintenance? I see that the long queue has suddenly disappeared. The last few test builds have been aborted. The jenkins used for hive precommit tests - http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/ Thanks, Thejas
Re: [jira] [Commented] (HIVE-6329) Support column level encryption/decryption
Yes, I've removed ByteArrayRef from LazyObjectBase; it is just useless overhead. Can we just remove it? 2014-02-13 1:36 GMT+09:00 Brock Noland (JIRA) j...@apache.org: [ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899255#comment-13899255] Brock Noland commented on HIVE-6329: Hi, It looks like this makes some changes to the init() method? I think this will impact existing Hive SerDes. Is it possible to make this change without changing the init() method? Support column level encryption/decryption -- Key: HIVE-6329 URL: https://issues.apache.org/jira/browse/HIVE-6329 Project: Hive Issue Type: New Feature Components: Security, Serializers/Deserializers Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6329.1.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt Receiving some requirements on encryption recently, but Hive does not support it. Before the full implementation via HIVE-5207, this might be useful for some cases. {noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.indices'='2,3', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010--','Seoul, Seocho' from src tablesample (1 rows); .. OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
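[Editor's note] The "write only" encoding behavior shown in the example above can be sketched in plain Java: values are Base64-encoded when a row is written, and a read returns the stored (encoded) text rather than the plaintext. This is an illustrative stand-in, not the actual org.apache.hadoop.hive.serde2 implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hedged sketch of a "write only" column codec: encode() is applied when a
// row is serialized; decode() deliberately returns the stored (encoded)
// text unchanged, so plain SELECTs only ever see the Base64 form.
public class Base64WriteOnlyCodec {
    public static String encode(String value) {
        return Base64.getEncoder()
                     .encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    public static String decode(String stored) {
        return stored; // write-only: no round trip back to plaintext
    }

    public static void main(String[] args) {
        // Matches the address column in the SELECT output above.
        System.out.println(encode("Seoul, Seocho")); // U2VvdWwsIFNlb2Nobw==
    }
}
```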
Re: Timeline for the Hive 0.13 release?
Hi all, We have so much ongoing work, especially in tez/vectorization/authorization territory, and it is all driven by Hortonworks. Could anyone at the company tell us whether they have an internal milestone for the next release? Thanks, Navis 2014-01-21 Brock Noland br...@cloudera.com: Hi, I agree that picking a date to branch and then restricting commits to that branch would be a less time-intensive plan for the RM. Brock On Sat, Jan 18, 2014 at 4:21 PM, Harish Butani hbut...@hortonworks.com wrote: Yes, agree it is time to start planning for the next release. I would like to volunteer to do the release management duties for this release (it will be a great experience for me). Will be happy to do it, if the community is fine with this. regards, Harish. On Jan 17, 2014, at 7:05 PM, Thejas Nair the...@hortonworks.com wrote: Yes, I think it is time to start planning for the next release. For the 0.12 release I created a branch and then accepted patches that people asked to be included for some time, before moving to a phase of accepting only critical bug fixes. This turned out to be laborious. I think we should instead give everyone a few weeks to get any patches they are working on ready, cut the branch, and take in only critical bug fixes to the branch after that. How about cutting the branch around mid-February and targeting a release a week or two after that. Thanks, Thejas On Fri, Jan 17, 2014 at 4:39 PM, Carl Steinbach c...@apache.org wrote: I was wondering what people think about setting a tentative date for the Hive 0.13 release? At an old Hive Contrib meeting we agreed that Hive should follow a time-based release model with new releases every four months. If we follow that schedule we're due for the next release in mid-February. Thoughts? Thanks.
Carl
Recent test failures in pre-tests
Including HIVE-6259, HIVE-6310, etc., recent test failures of minimr tests seem to be related to a memory issue (not all of them, but mostly). {noformat} java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) {noformat} Currently we use a 4-datanode + 4-tasktracker mini cluster. If we cannot reserve more memory just for the hive tests (it would be really appreciated if we could), should we use a smaller mini cluster for pre-tests?
Re: Recent test failures in pre-tests
You're right, it might be some other problem; I'm a little biased. Granted that it's not caused by hive-exec (the failures are from quite simple queries), it might be a disk-full or network problem. But if it were that, it would probably have been detected by the operations team at Cloudera. Regard my suggestion as a cheap shot before doing some serious work. Thanks. 2014-01-28 Xuefu Zhang xzh...@cloudera.com Thanks, Navis. Just curious, how are we sure that the task failure is caused by a memory issue? In other words, did you see an OOM in any task log? I think you are right, but I'd just like to confirm. Thanks, Xuefu On Mon, Jan 27, 2014 at 6:44 PM, Navis류승우 navis@nexr.com wrote: Including HIVE-6259, HIVE-6310, etc., recent test failures of minimr tests seem to be related to a memory issue (not all of them, but mostly). {noformat} java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) {noformat} Currently we use a 4-datanode + 4-tasktracker mini cluster. If we cannot reserve more memory just for the hive tests (it would be really appreciated if we could), should we use a smaller mini cluster for pre-tests?
Re: how to use phabricator with maven
We don't use Phabricator anymore, except for some patches on it that were made long ago. Use the Apache Review Board instead. - Navis 2014/1/16 Satish Mittal satish.mit...@inmobi.com Hi All, The following phabricator link https://cwiki.apache.org/confluence/display/Hive/PhabricatorCodeReview describes the review process with ant. However, is there any way to raise a review request with an mvn-based setup? Thanks, Satish -- _ The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you have received this communication in error, please notify us immediately by responding to this email and then delete it from your system. The firm is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt.
Re: Requesting Hive contributor access.
I've added you to the contributor list (and assigned the issues mentioned above to you). Sorry for the delay. 2013/12/20 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com Hi guys, Could you please grant access for me? I have prepared a couple of issues: HIVE-6006 has already been done and a patch is available, and I just started working on HIVE-6046. So, I'd like to assign these issues to myself. Thank you, Konstantin Kudryavtsev
Re: map join in subqueries
What version are you using? Since 0.11.0, the mapjoin hint is ignored by default. Use set hive.ignore.mapjoin.hint=false; if you want the mapjoin hint applied. 2013/12/4 Sukhendu Chakraborty sukhendu.chakrabo...@gmail.com Hi, Is there any way mapjoin works on the subquery (not the underlying table)? I have the following query: select external_id,count(category_id) from catalog_products_in_categories_orc pc inner join (select * from catalog_products_orc where s_id=118) p on pc.product_id=p.id group by external_id; Now, even though catalog_products_orc is a big table, after filtering (s_id=118) it results in a very small number of rows, which can easily be optimized into a mapjoin (with catalog_products_in_categories_orc as the big table and the subquery result as the small table). However, when I try to specify /*+MAPJOIN(p)*/ to enforce this, it results in a mapjoin for the table catalog_products_orc (and not on the subquery after filtering). Any ideas to achieve a mapjoin on a subquery (and not the underlying table)? -Sukhendu
Re: Build seems broken
My bad. I should have removed the class when committing HIVE-4518. 2013/11/26 Xuefu Zhang xzh...@cloudera.com [INFO] BUILD FAILURE [INFO] [INFO] Total time: 5.604s [INFO] Finished at: Mon Nov 25 17:53:20 PST 2013 [INFO] Final Memory: 29M/283M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-it-util: Compilation failure: Compilation failure: [ERROR] /home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[45,73] cannot find symbol [ERROR] symbol : variable HIVEJOBPROGRESS [ERROR] location: class org.apache.hadoop.hive.conf.HiveConf.ConfVars [ERROR] /home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[57,38] cannot find symbol [ERROR] symbol : method getCounters() [ERROR] location: class org.apache.hadoop.hive.ql.exec.Operator<capture#671 of ? extends org.apache.hadoop.hive.ql.plan.OperatorDesc> [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn <goals> -rf :hive-it-util
Re: Build seems broken
I'd forgotten to mark deleted/added files in the patch. It seems to be working now. Sorry for the inconvenience to all. 2013/11/26 Jarek Jarcec Cecho jar...@apache.org I pushed something that I didn't want a couple of minutes ago and then force-pushed to remove it. I'm not sure whether it's caused by that, though. Jarcec On Mon, Nov 25, 2013 at 06:04:34PM -0800, Xuefu Zhang wrote: [INFO] BUILD FAILURE [INFO] [INFO] Total time: 5.604s [INFO] Finished at: Mon Nov 25 17:53:20 PST 2013 [INFO] Final Memory: 29M/283M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-it-util: Compilation failure: Compilation failure: [ERROR] /home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[45,73] cannot find symbol [ERROR] symbol : variable HIVEJOBPROGRESS [ERROR] location: class org.apache.hadoop.hive.conf.HiveConf.ConfVars [ERROR] /home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[57,38] cannot find symbol [ERROR] symbol : method getCounters() [ERROR] location: class org.apache.hadoop.hive.ql.exec.Operator<capture#671 of ? extends org.apache.hadoop.hive.ql.plan.OperatorDesc> [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn <goals> -rf :hive-it-util
Re: FYI Hive trunk has moved to maven
Another great work, Brock! I think this really deserves the patch-of-the-year award. 2013/11/1 Thejas Nair the...@hortonworks.com Thanks Brock and Ed! On Thu, Oct 31, 2013 at 2:36 PM, Brock Noland br...@cloudera.com wrote: Thanks guys. Another FYI, precommit tests are going to come back with a few unrelated failures until https://issues.apache.org/jira/browse/HIVE-5716 is committed. Thanks! On Thu, Oct 31, 2013 at 4:27 PM, Gunther Hagleitner ghagleit...@hortonworks.com wrote: Awesome! Great stuff, Brock! Thanks, Gunther. On Thu, Oct 31, 2013 at 1:38 PM, Vinod Kumar Vavilapalli vino...@apache.org wrote: Awesome, great effort! Thanks, +Vinod On Oct 31, 2013, at 12:11 PM, Brock Noland wrote: More details here https://issues.apache.org/jira/browse/HIVE-5610 How to configure your development environment is here: https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
Re: [ANNOUNCE] New Hive PMC Members - Thejas Nair and Brock Noland
Congrats! 2013/10/25 Gunther Hagleitner ghagleit...@hortonworks.com Congrats Thejas and Brock! Thanks, Gunther. On Thu, Oct 24, 2013 at 3:25 PM, Prasad Mujumdar pras...@cloudera.com wrote: Congratulations Thejas and Brock ! thanks Prasad On Thu, Oct 24, 2013 at 3:10 PM, Carl Steinbach c...@apache.org wrote: I am pleased to announce that Thejas Nair and Brock Noland have been elected to the Hive Project Management Committee. Please join me in congratulating Thejas and Brock! Thanks. Carl
Re: Bootstrap in Hive
I don't know anything about statistics, but in your case, duplicating splits (x100?) by using a custom InputFormat might be much simpler. 2013/9/6 Sameer Agarwal samee...@cs.berkeley.edu Hi All, In order to support approximate queries in Hive and BlinkDB ( http://blinkdb.org/), I am working towards implementing the bootstrap primitive (http://en.wikipedia.org/wiki/Bootstrapping_(statistics)) in Hive that can help us quantify the error incurred by a query Q when it operates on a small sample S of data. This method essentially requires launching the query Q simultaneously on a large number of samples of original data (typically =100). The downside to this is of course that we have to launch the same query 100 times, but the upside is that each of these queries would be so small that it can be executed on a single machine. So, in order to do this efficiently, we would ideally like to execute 100 instances of the query simultaneously on the master and all available worker nodes. Furthermore, in order to avoid generating the query plan 100 times on the master, we can do either of two things: 1. Generate the query plan once on the master, serialize it and ship it to the worker nodes. 2. Enable the worker nodes to access the Metastore so that they can generate the query plan on their own in parallel. Given that making the query plan serializable (1) would require a lot of refactoring of the current code, is (2) a viable option? Moreover, since (2) will increase the load on the existing Metastore by 100x, is there any other option? Thanks, Sameer -- Sameer Agarwal Computer Science | AMP Lab | UC Berkeley http://cs.berkeley.edu/~sameerag
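Navis's "duplicate each split" suggestion can be sketched in miniature. This is a Hadoop-free illustration with hypothetical names; a real implementation would override `getSplits()` in a wrapping Hadoop `InputFormat` so the same query body runs once per bootstrap resample:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "duplicate splits x100" idea: a wrapper
// InputFormat would return every underlying split k times, yielding one
// map task per bootstrap sample without re-planning the query.
public class SplitDuplicator {

    // Return a list containing each element of `splits` repeated `copies` times.
    public static <T> List<T> duplicate(List<T> splits, int copies) {
        List<T> out = new ArrayList<>(splits.size() * copies);
        for (T split : splits) {
            for (int i = 0; i < copies; i++) {
                out.add(split);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("split-0", "split-1");
        // 2 underlying splits x 100 copies -> 200 map tasks.
        System.out.println(duplicate(splits, 100).size());
    }
}
```

Each duplicated split would also need to carry a distinct sample seed in practice; that detail is elided here.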
Re: [ANNOUNCE] New Hive Committer - Yin Huai
Congratulations, Yin. I remember your hard work on ysmart ( https://issues.apache.org/jira/browse/HIVE-2206) and others( https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20assignee%20in%20(yhuai)%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC ) 2013/9/5 Clark Yang (杨卓荦) yangzhuo...@gmail.com Congratulations, Yin Cheers, Zhuoluo (Clark) Yang 2013/9/5 Yin Huai huaiyin@gmail.com Thanks everybody! This is awesome! On Wed, Sep 4, 2013 at 1:58 PM, Daniel Dai da...@hortonworks.com wrote: Congratulation! On Wed, Sep 4, 2013 at 10:39 AM, yongqiang he heyongqiang...@gmail.com wrote: Congrats! On Wed, Sep 4, 2013 at 10:23 AM, Jason Dere jd...@hortonworks.com wrote: Yin, congrats! Jason On Sep 4, 2013, at 7:54 AM, Eugene Koifman ekoif...@hortonworks.com wrote: Congrats! On Wed, Sep 4, 2013 at 5:23 AM, Brock Noland br...@cloudera.com wrote: Congrats Yin!! On Wed, Sep 4, 2013 at 4:14 AM, Lefty Leverenz leftylever...@gmail.com wrote: Bravo, Yin! -- Lefty On Wed, Sep 4, 2013 at 4:17 AM, Sushanth Sowmyan khorg...@gmail.com wrote: Congrats, Yin! :) On Sep 4, 2013 1:13 AM, Alexander Alten-Lorenz wget.n...@gmail.com wrote: Amazing news, congratz Yin! Well deserved! On Sep 4, 2013, at 6:49 AM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Yin Huai a committer on the Apache Hive project. Please join me in congratulating Yin! Thanks. Carl -- Alexander Alten-Lorenz http://mapredit.blogspot.com German Hadoop LinkedIn Group: http://goo.gl/N8pCF -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
Re: [ANNOUNCE] New Hive Committer - Thejas Nair
Congratulations! 2013/8/20 Clark Yang (杨卓荦) yangzhuo...@gmail.com: Congrats Thejas! 在 2013年8月20日星期二,Carl Steinbach 写道: The Apache Hive PMC has voted to make Thejas Nair a committer on the Apache Hive project. Please join me in congratulating Thejas!
Re: Proposing a 0.11.1
If this is only for addressing the npath problem, we have three months for that. Would that be enough time for releasing 0.12.0? ps. IMHO, n-path seems too generic a name to be patented. I hate Teradata. 2013/8/14 Edward Capriolo edlinuxg...@gmail.com: Should we get the npath rename in? Do we have a jira for this? If not I will take it. On Tue, Aug 13, 2013 at 1:58 PM, Mark Wagner wagner.mar...@gmail.comwrote: It'd be good to get both HIVE-3953 and HIVE-4789 in there. 3953 has been committed to trunk and it looks like 4789 is close. Thanks, Mark On Tue, Aug 13, 2013 at 10:02 AM, Owen O'Malley omal...@apache.org wrote: All, I'd like to create an 0.11.1 with some fixes in it. I plan to put together a release candidate over the next week. I'm in the process of putting together the list of bugs that I want to include, but I wanted to solicit the jiras that others thought would be important for an 0.11.1. Thanks, Owen
Re: Discuss: End of static, thread local
https://issues.apache.org/jira/browse/HIVE-4226 seemed addressing this. 2013/8/11 Brock Noland br...@cloudera.com: I would love to get rid of the static thread local stuff. It was required to make hive work in a server model but isn't the correct solution to this problem. I do think it will be a large amount of work so it'd be great to see whoever leads this effort to have a high level plan as opposed to an adhoc effort. On Sat, Aug 10, 2013 at 12:32 PM, Edward Capriolo edlinuxg...@gmail.comwrote: I just committed https://issues.apache.org/jira/browse/HIVE-3772. For hive-server2 Carl and others did a lot of work to clean up un thread safe things from hive. Hive was originally build as a fat client so it is not surprising that many such constructs exist. Now since we have retrofitted multi-threaded-ness onto the project we have a number of edge case bugs. My suggestions here would be for that the next release 0.13 we make a push to remove all possible non thread safe code and explicitly pass context objects or serialized structures everywhere thread safety is needed. I can see this would start with something like the Function Registry, this would be a per session object passed around rather then a global object with static hashmap instances in it. I know that this probably will not be as simple as removing all static members from our codebase, but does anyone know of specific challenges that will be intrinsically hard to solve? Please comment. -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
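The direction Edward describes, replacing global static registries with per-session context objects, can be sketched like this. Class and method names here are hypothetical and deliberately simplified; Hive's actual FunctionRegistry API is different:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: instead of one static Map shared by every thread in
// the server, each session owns its own registry, passed explicitly to
// whatever code needs to resolve functions. No shared mutable state.
public class SessionFunctionRegistry {
    private final Map<String, String> functions = new HashMap<>();

    public void register(String name, String className) {
        functions.put(name, className);
    }

    public String lookup(String name) {
        return functions.get(name);  // null when not registered in this session
    }

    public static void main(String[] args) {
        // Two sessions no longer see each other's temporary functions.
        SessionFunctionRegistry a = new SessionFunctionRegistry();
        SessionFunctionRegistry b = new SessionFunctionRegistry();
        a.register("my_udf", "com.example.MyUdf");
        System.out.println(a.lookup("my_udf"));
        System.out.println(b.lookup("my_udf"));
    }
}
```

The trade-off is that the session object must be threaded through many call sites that today reach for the static, which is exactly the "large amount of work" Brock anticipates.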
Re: [ANNOUNCE] New Hive Committer - Gunther Hagleitner
I'm a little late. Congratulations Gunther and Brock! 2013/7/21 Prasanth J j.prasant...@gmail.com: Congrats Gunther! Thanks -- Prasanth On Jul 21, 2013, at 1:00 AM, Carl Steinbach c...@apache.org wrote: The Apache Hive PMC has voted to make Gunther Hagleitner a committer on the Apache Hive project. Congratulations Gunther! Carl
Re: GROUP BY Issue
Your table has five rows with F1 = 9887, and joining it with itself will produce 25 rows with the same F1 value. I'm not sure what you intended to do. 2013/6/12 Gourav Sengupta gourav.had...@gmail.com: Hi, I had initially forwarded this request to the user group but am yet to receive any response. I will be grateful if someone can help me out in resolving the issue or pointing out any mistakes that I may be making. It took me around 5 to 6 hours to generate the test data of around 20 GB (or more) and there must be a better alternative. Regards, Gourav -- Forwarded message -- From: Gourav Sengupta gourav.had...@gmail.com Date: Mon, Jun 10, 2013 at 4:10 PM Subject: GROUP BY Issue To: u...@hive.apache.org Hi, On running the following query I am getting multiple records with the same value of F1 SELECT F1, COUNT(*) FROM ( SELECT F1, F2, COUNT(*) FROM TABLE1 GROUP BY F1, F2 ) a GROUP BY F1; As far as I understand, there are multiple records based on the number of reducers. Replicating the test scenario: STEP1: get the dataset as available in http://snap.stanford.edu/data/amazon0302.html STEP2: Open the file and delete the heading STEP3: hadoop fs -mkdir /test STEP4: hadoop fs -put amazon0302.txt /test STEP5: create external table test (f1 int, f2 int) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile location '/test'; STEP6: create table test1 location '/test1' as select left_table.* from (select * from test where f11) left_table join (select * from test where f1 1) right_table; STEP7: hadoop fs -mkdir /test2 STEP8: create table test2 location '/test2' as select f1, count(*) from (select f1, f2, count(*) from test1 group by f1, f2) a group by f1; STEP9: select * from test2 where f1 = 9887; ENVIRONMENT: HADOOP 2.0.4 HIVE 0.11 Please do let me know whether I am doing anything wrong. Thanks and Regards, Gourav Sengupta
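Navis's point, that five matching rows on each side of the self-join yield 5 x 5 = 25 output rows for that key, follows from how an equi-join works: it emits one output row per matching pair. A tiny sketch (plain Java, hypothetical helper name):

```java
// Hypothetical sketch: an equi-join emits one output row per matching pair
// of input rows, so a key with n rows on the left side and m rows on the
// right side of the join produces n * m output rows for that key.
public class JoinCardinality {

    public static int rowsForKey(int leftMatches, int rightMatches) {
        return leftMatches * rightMatches;
    }

    public static void main(String[] args) {
        // 5 rows with F1 = 9887 on each side of the self-join -> 25 rows.
        System.out.println(rowsForKey(5, 5));
    }
}
```

This is why the counts in test2 look inflated: the multiplication happens in STEP6, before the GROUP BY ever runs.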
Re: [VOTE] Apache Hive 0.11.0 Release Candidate 2
+1 - built from source, passed all tests (without assertion failures or conversion to backup task) - working well with queries from running-site - some complaints on missing HIVE-4172 (void type for JDBC2) Thanks 2013/5/12 Owen O'Malley omal...@apache.org: Based on feedback from everyone, I have respun the release candidate, RC2. Please take a look. We've fixed 7 problems with the previous RC: * Release notes were incorrect * HIVE-4018 - MapJoin failing with Distributed Cache error * HIVE-4421 - Improve memory usage by ORC dictionaries * HIVE-4500 - Ensure that HiveServer 2 closes log files. * HIVE-4494 - ORC map columns get class cast exception in some contexts * HIVE-4498 - Fix TestBeeLineWithArgs failure * HIVE-4505 - Hive can't load transforms with remote scripts * HIVE-4527 - Fix the eclipse template Source tag for RC2 is at: https://svn.apache.org/repos/asf/hive/tags/release-0.11.0rc2 Source tar ball and convenience binary artifacts can be found at: http://people.apache.org/~omalley/hive-0.11.0rc2/ This release has many goodies including HiveServer2, integrated hcatalog, windowing and analytical functions, decimal data type, better query planning, performance enhancements and various bug fixes. In total, we resolved more than 350 issues. Full list of fixed issues can be found at: http://s.apache.org/8Fr Voting will conclude in 72 hours. Hive PMC Members: Please test and vote. Thanks, Owen
Re: [ANNOUNCE] New Hive Committers: Harish Butani and Owen O'Malley
We are happy with newly added window functions and ORC format. Congratulations! 2013/4/16 Carl Steinbach c...@apache.org: The Hive PMC has voted to make Harish Butani and Owen O'Malley committers on the Apache Hive Project. A list of Harish's contributions to the project can be seen here: http://s.apache.org/n3h Owen's contributions can be seen here: http://s.apache.org/wQm Please join me in congratulating Harish and Owen! Thanks. Carl
Re: lots of tests failing
Yes, it's all my fault. I'd forgotten that the Driver affects test results. A revert of HIVE-1953 has been created (HIVE-4319) and is ready to be committed. Again, sorry for all the trouble to the community (especially to Vikram). I should think twice before doing things. 2013/4/10 Namit Jain nj...@fb.com: It seems that the comments have been removed from the output files, and so a lot of tests are failing. I have not debugged, but https://issues.apache.org/jira/browse/HIVE-1953 looks like the culprit. Navis, is that so ? Are you updating the log files ?
[discussion] Convention for internal magic words
Hi, All Doing HIVE-3431 (https://issues.apache.org/jira/browse/HIVE-3431), I needed to define a magic word meaning "this is not a path string; translate it into a temporary directory". I think HIVE-3779 (https://issues.apache.org/jira/browse/HIVE-3779) might also need something like that. What I'm asking is: is there a convention for these magic words? If not, should we define one? ($hive_KEYWORD, for example)
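For illustration, a magic-word translation along the lines described above might look like this. The sentinel value and helper are hypothetical, not an actual Hive convention:

```java
// Hypothetical sketch of a "magic word" sentinel: a reserved token that is
// recognized during plan translation and replaced with a real scratch
// directory, while ordinary path strings pass through untouched.
public class MagicWords {

    // Hypothetical sentinel following the $hive_KEYWORD naming idea.
    public static final String TMP_DIR = "$hive_tmp_dir";

    // Replace the magic word with an actual scratch directory path.
    public static String resolve(String path, String scratchDir) {
        return TMP_DIR.equals(path) ? scratchDir : path;
    }

    public static void main(String[] args) {
        System.out.println(resolve("$hive_tmp_dir", "/tmp/hive/scratch"));
        System.out.println(resolve("/user/data", "/tmp/hive/scratch"));
    }
}
```

A shared prefix like `$hive_` makes such sentinels easy to grep for and unlikely to collide with legitimate user paths, which is the main argument for agreeing on one convention.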
Re: stats19.q is failing on the current trunk?
I've seen this too; it's booked on https://issues.apache.org/jira/browse/HIVE-3783. 2012/12/11 Zhenxiao Luo zhenx...@fb.com: Always get this diff: [junit] diff -a /home/zhenxiao/Code/hive/build/ql/test/logs/clientpositive/stats19.q.out /home/zhenxiao/Code/hive/ql/src/test/results/clientpositive/stats19.q.out [junit] 21,22c21,22 [junit] Stats prefix is hashed: false [junit] Stats prefix is hashed: false [junit] --- [junit] Stats prefix is hashed: true [junit] Stats prefix is hashed: true [junit] 284,285c284,285 [junit] Stats prefix is hashed: false [junit] Stats prefix is hashed: false [junit] --- [junit] Stats prefix is hashed: true [junit] Stats prefix is hashed: true Will file a Jira if other people find it, too. Thanks, Zhenxiao
Re: writing your own custom map-reduce to scan a table
How about using HCatalog? I heard it's made for that kind of work. 2012/8/18 Mahsa Mofidpoor mofidp...@gmail.com Hi all, Does anybody know how you can write your own custom MapReduce job to scan a table? Thank you in advance for your response. Mahsa
Re: non map-reduce for simple queries
It supports table sampling also. select * from src TABLESAMPLE (BUCKET 1 OUT OF 40 ON key); select * from src TABLESAMPLE (0.25 PERCENT); But there is no sampling option specifying number of bytes. This can be done in another issue. 2012/7/31 Owen O'Malley omal...@apache.org On Sat, Jul 28, 2012 at 6:17 PM, Navis류승우 navis@nexr.com wrote: I was thinking of timeout for fetching, 2000msec for example. How about that? Instead of time, which requires launching the query and letting it timeout, how about determining the number of bytes that would need to be fetched to the local box? Limiting it to 100 or 200 mb seems reasonable. -- Owen
Re: non map-reduce for simple queries
I was thinking of timeout for fetching, 2000msec for example. How about that? 2012년 7월 29일 일요일에 Edward Caprioloedlinuxg...@gmail.com님이 작성: If where condition is too complex , selecting specific columns seems simple enough and useful. On Saturday, July 28, 2012, Namit Jain nj...@fb.com wrote: Currently, hive does not launch map-reduce jobs for the following queries: select * from T where condition on partition columns (limit n)? This behavior is not configurable, and cannot be altered. HIVE-2925 wants to extend this behavior. The goal is not to spawn map-reduce jobs for the following queries: Select expr from T where any condition (limit n)? It is currently controlled by one parameter: hive.aggressive.fetch.task.conversion, based on which it is decided, whether to spawn map-reduce jobs or not for the queries of the above type. Note that this can be beneficial for certain types of queries, since it is avoiding the expensive step of spawning map-reduce. However, it can be pretty expensive for certain types of queries: selecting a very large number of rows, the query having a very selective filter (which is satisfied by a very number of rows, and therefore involves scanning a very large table) etc. The user does not have any control on this. Note that it cannot be done by hooks, since the pre-semantic hooks does not have enough information: type of the query, inputs etc. and it is too late to do anything in the post-semantic hook (the query plan has already been altered). I would like to propose the following configuration parameters to control this behavior. hive.fetch.task.conversion: true, false, auto If the value is true, then all queries with only selects and filters will be converted If the value is false, then no query will be converted If the value is auto (which should be the default behavior), there should be additional parameters to control the semantics. 
hive.fetch.task.auto.limit.threshold --- integer value X1 hive.fetch.task.auto.inputsize.threshold --- integer value X2 If either the query has a limit lower than X1, or the input size is smaller than X2, the queries containing only filters and selects will be converted to not use map-reduce jobs. Comments… -namit
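In "auto" mode, the decision Namit proposes could be sketched as follows. Only the two thresholds come from the proposal; the method and parameter names here are hypothetical:

```java
// Hypothetical sketch of the proposed "auto" heuristic: convert a
// select/filter-only query into a plain fetch task (no map-reduce job)
// when either its LIMIT is below threshold X1 or its total input size
// is below threshold X2 bytes.
public class FetchTaskHeuristic {

    // `limit` is null when the query has no LIMIT clause.
    public static boolean convertToFetch(Long limit, long inputSizeBytes,
                                         long limitThreshold, long inputSizeThreshold) {
        boolean smallLimit = limit != null && limit < limitThreshold;
        boolean smallInput = inputSizeBytes < inputSizeThreshold;
        return smallLimit || smallInput;
    }

    public static void main(String[] args) {
        long x1 = 100;            // hypothetical limit threshold (rows)
        long x2 = 100L << 20;     // hypothetical input-size threshold (100 MB)
        // LIMIT 10 over a 5 GB input -> still convert, the limit is small.
        System.out.println(convertToFetch(10L, 5L << 30, x1, x2));
        // No LIMIT over a 5 GB input -> run map-reduce.
        System.out.println(convertToFetch(null, 5L << 30, x1, x2));
    }
}
```

The interesting case is exactly the one Namit flags as expensive: a very selective filter with no LIMIT over a large table would still avoid conversion here, because neither threshold fires.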
Re: Problems with Arc/Phabricator
Me too. arc diff --trace .. PHP Fatal error: Uncaught exception 'HTTPFutureResponseStatusHTTP' with message '[HTTP/500]' in /home/navis/bin/libphutil/src/future/http/base/BaseHTTPFuture.php:299 Stack trace: #0 /home/navis/bin/libphutil/src/future/http/https/HTTPSFuture.php(95): BaseHTTPFuture-parseRawHTTPResponse('HTTP/1.1 100 Co...') #1 /home/navis/bin/libphutil/src/future/proxy/FutureProxy.php(34): HTTPSFuture-isReady() #2 /home/navis/bin/libphutil/src/conduit/client/ConduitClient.php(131): FutureProxy-isReady() #3 /home/navis/bin/libphutil/src/conduit/client/ConduitClient.php(52): ConduitClient-callMethod('differential.cr...', Array) #4 /home/navis/bin/arcanist/src/workflow/diff/ArcanistDiffWorkflow.php(324): ConduitClient-callMethodSynchronous('differential.cr...', Array) #5 /home/navis/bin/arcanist/scripts/arcanist.php(266): ArcanistDiffWorkflow-run() #6 {main} thrown in /home/navis/bin/libphutil/src/future/http/base/BaseHTTPFuture.php on line 299 2012/5/9 Carl Steinbach c...@cloudera.com: Has the phabricator site stopped working for anyone else? As of today I'm no longer able to view review requests. For example: https://reviews.facebook.net/D3075 Produces the following output: Fatal error: Undefined class constant 'COMMITTED' in /var/www/reviews.facebook.net/phabricator/src/applications/differential/storage/revision/DifferentialRevision.php on line 209 UNRECOVERABLE FATAL ERROR Undefined class constant 'COMMITTED' /var/www/reviews.facebook.net/phabricator/src/applications/differential/storage/revision/DifferentialRevision.php:209 ┻━┻ ︵ ¯\_(ツ)_/¯ ︵ ┻━┻ Any help would be much appreciated. Thanks. Carl On Tue, May 8, 2012 at 1:47 PM, Ashutosh Chauhan hashut...@apache.orgwrote: Made some progress on using arc/phab on ubuntu. epriestley helped a ton over at #phabricator irc channel. Thanks, Evan! Now, able to make arc work on ubuntu, but seems like jira integration is broken. 
Hit the following problem: $arc diff --jira HIVE-3008 PHP Fatal error: Class 'ArcanistDifferentialRevisionRef' not found in /home/ashutosh/workspace/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on line 201 Fatal error: Class 'ArcanistDifferentialRevisionRef' not found in /home/ashutosh/workspace/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on line 201 Even with this error the diff did get generated, but it was not posted back on the jira. Evan is working on a patch to fix this. He is also discussing with Facebook folks how to tackle these issues in the long term. Discussion is going on at https://secure.phabricator.com/T1206 I will request people who are actively working on Hive to follow the discussion on this ticket. Thanks, Ashutosh On Thu, Apr 19, 2012 at 5:24 PM, Ashutosh Chauhan hashut...@apache.org wrote: Problem while using arc on ubuntu $ arc patch D2871 ARC: Cannot mix P and A UNIX: No such file or directory Any ideas what's up there. Thanks, Ashutosh On Thu, Apr 19, 2012 at 17:19, Edward Capriolo edlinuxg...@gmail.com wrote: Just throwing this out there. The phabricator IRC has more people and is usually more active than the Hive IRC. #JustSaying... 
On Thu, Apr 19, 2012 at 7:35 PM, Ashutosh Chauhan hashut...@apache.org wrote: Hit a new problem with arc today: Fatal error: Uncaught exception 'Exception' with message 'Host returned HTTP/200, but invalid JSON data in response to a Conduit method call: Warning: Unknown: POST Content-Length of 9079953 bytes exceeds the limit of 8388608 bytes in Unknown on line 0 for(;;);{result:null,error_code:ERR-INVALID-SESSION,error_info:Session key is not present.}' in /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitFuture.php:48 Stack trace: #0 /Users/ashutosh/work/hive/libphutil/src/future/proxy/FutureProxy.php(62): ConduitFuture-didReceiveResult(Array) #1 /Users/ashutosh/work/hive/libphutil/src/future/proxy/FutureProxy.php(39): FutureProxy-getResult() #2 /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitClient.php(52): FutureProxy-resolve() #3 /Users/ashutosh/work/hive/arcanist/src/workflow/diff/ArcanistDiffWorkflow.php(341): ConduitClient-callMethodSynchronous('differential.cr...', Array) #4 /Users/ashutosh/work/hive/arcanist/scripts/arcanist.php(266): ArcanistDiffWo in /Users/ashutosh/work/hive/libphutil/src/conduit/client/ConduitFuture.php on line 48 Any ideas how to solve this? Thanks, Ashutosh On Wed, Apr 11, 2012 at 18:37, Edward Capriolo edlinuxg...@gmail.com wrote: I think the most practical solution is try and use arc/phab and then if there is a problem fall back to Jira and do it the old way. Edward On Wed, Apr 11, 2012 at 7:17 PM, Carl Steinbach c...@cloudera.com wrote: +1 to switching over to Git. As for the rest of the