Size of a Hive Map column in characters!
The size(map) function is defined as follows: size(Map<K.V>) returns the number of elements in the map type. What if I want the total size of the map for that row? This doesn't work:

select length(MAP);

How can I get the total size of a map column in either bytes or characters?

--
Mark E. Sunderlin
Data Architect | AOL NETWORKS BDM
P: 703-265-6935 | C: 540-327-6222 | AIM: MESunderlin
22000 AOL Way, Dulles, VA 20166
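Hive's size() counts map entries, not characters, so a character total has to be computed from the keys and values themselves. A quick Python illustration of the arithmetic (made-up data, not Hive itself):

```python
# Hive's size(map) counts entries; a character total must instead be
# computed by summing the lengths of every key and value.
m = {"color": "red", "shape": "circle"}

entry_count = len(m)  # analogous to Hive's size(map_col): 2

# Total size in characters across all keys and values.
total_chars = sum(len(k) + len(v) for k, v in m.items())

print(entry_count)  # 2
print(total_chars)  # 5+3+5+6 = 19
```

In HiveQL, one hedged way to get the same per-row total (assuming string keys and values; the column and table names here are made up) is to explode the map and sum the lengths: `select id, sum(length(k) + length(v)) from t lateral view explode(map_col) x as k, v group by id`. Note this counts characters, not serialized bytes.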
Why from_utc_timestamp works for some bigints, but not others
Hi,

I am using Hive 0.9.0, and I am not sure why from_utc_timestamp gives an error for the following value but works for others. The example below shows two bigints, both epoch values at millisecond precision. They differ by only about 11 seconds. One works fine in Hive 0.9.0 with the from_utc_timestamp UDF; the other is just broken. Why?

yzhang@yzhang-linux:~$ hive
hive> use tracking2;
OK
Time taken: 1.569 seconds
hive> select from_utc_timestamp(1371746476159, 'EST') from track_sent;
OK
2013-06-20 07:41:16.318
2013-06-20 07:41:16.318
Time taken: 8.64 seconds
hive> select from_utc_timestamp(1371746487759, 'EST') from track_sent;
FAILED: Hive Internal Error: java.lang.IllegalArgumentException(Value out of range)
java.lang.IllegalArgumentException: Value out of range
    at java.sql.Timestamp.setNanos(Timestamp.java:287)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFFromUtcTimestamp.applyOffset(GenericUDFFromUtcTimestamp.java:101)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFFromUtcTimestamp.evaluate(GenericUDFFromUtcTimestamp.java:94)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:127)
    at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:214)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:767)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:888)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:165)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7755)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:2310)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:2112)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6165)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6136)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6762)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
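Two observations that may help narrow this down. First, the failing value is not unusual in itself: both arguments are epoch values in milliseconds, about 11.6 seconds apart, as the Python check below shows (an illustration only; it does not reproduce Hive's internal conversion):

```python
from datetime import datetime, timezone

a = 1371746476159  # the value that works in the report above
b = 1371746487759  # the value that fails

# Both are epoch milliseconds; they differ by about 11.6 seconds.
print((b - a) / 1000.0)  # 11.6

# Dividing by 1000 gives unambiguous epoch seconds, which convert cleanly
# to ordinary 2013 dates:
for ms in (a, b):
    print(datetime.fromtimestamp(ms / 1000.0, tz=timezone.utc).isoformat())
```

Second, the exception comes from java.sql.Timestamp.setNanos, which rejects any nanosecond value outside 0..999999999; exactly how Hive 0.9 derives that nanos value from a bigint is version-specific, so treat any root-cause guess as an assumption. A commonly suggested workaround (unverified here for 0.9) is to convert milliseconds to seconds before converting, e.g. `from_unixtime(cast(col / 1000 as bigint))`.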
Re: query through Hive JDBC causes Chinese characters to become unrecognizable
Looks like the issue is tracked in HIVE-3245. I think we need to support adding an encoding parameter as part of the JDBC URL, similar to the MySQL JDBC driver's useUnicode/characterEncoding flags. I can take a look at it if nobody else has. For now, I think you can manually re-encode the result values from JDBC.

Thanks,
Szehon

On Thu, Dec 5, 2013 at 7:43 PM, ch huang <justlo...@gmail.com> wrote:
> hi, maillist:
> We use Hive to store UTF-8 Chinese characters, but when queried through Hive JDBC they become unreadable characters; everything is normal when using the Hive shell. Why? Is it a bug in Hive JDBC? How can I solve this?
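The manual re-encoding workaround mentioned above amounts to undoing a wrong charset decode on the client side. A Python sketch of the failure mode and its reversal (which charset the driver actually assumes is an assumption here; ISO-8859-1 is just the classic culprit):

```python
# Mojibake: UTF-8 bytes decoded with a single-byte charset look unreadable.
original = "中文测试"

# Simulate a client that wrongly decodes the server's UTF-8 bytes as Latin-1:
garbled = original.encode("utf-8").decode("iso-8859-1")

# Repair: re-encode with the wrong charset, then decode as UTF-8.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired == original)  # True
```

The Java equivalent of the repair step would be along the lines of new String(s.getBytes("ISO-8859-1"), "UTF-8"), applied to each string pulled from the ResultSet.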
MIN/MAX issue with timestamps and RCFILE/ORC tables
Hi,

Because of the known, and believed fixed, issue with MIN/MAX (HIVE-4931), we're using a recent (2013-12-02), locally built version of Hive 0.13.0-SNAPSHOT. Unfortunately, we're still seeing issues using MIN/MAX on timestamp types when using RCFILE and ORC formatted tables. I could not find a reference to this problem in the Hive JIRA, but I'm posting here first before opening a new report in JIRA.

Here's an example of what we're seeing. The dnstext table is a very simplified variation of part of our application and has the following definition:

3: jdbc:hive2://host:1> describe dnstext;
+------------+------------+----------+
| col_name   | data_type  | comment  |
+------------+------------+----------+
| circuitid  | string     | None     |
| rrname     | string     | None     |
| srcaddr    | string     | None     |
| dstaddr    | string     | None     |
| occurs     | bigint     | None     |
| firstseen  | timestamp  | None     |
| lastseen   | timestamp  | None     |
| histo      | bigint     | None     |
+------------+------------+----------+

It is stored in TEXTFILE format. There are also dnsrc and dnsorc tables with the same column definitions and data, but stored in RCFILE and ORC formats, respectively. The following queries, one that aggregates the entire table and another that aggregates with a GROUP BY clause, work as expected on the dnstext table. The second query is wrapped in an outer query to reduce the final number of rows for this example.
3: jdbc:hive2://host:1> select count(1), sum(occurs), min(firstseen), max(lastseen) from dnstext;
+-----------+------------+--------------------------+--------------------------+
|    _c0    |    _c1     |           _c2            |           _c3            |
+-----------+------------+--------------------------+--------------------------+
| 67110469  | 171344130  | 2013-09-05 23:59:28.977  | 2013-09-07 00:13:12.765  |
+-----------+------------+--------------------------+--------------------------+

3: jdbc:hive2://host:1> select rrname, count, occurs, firstseen, lastseen
from ( select rrname, count(1) count, sum(occurs) occurs, min(firstseen) firstseen, max(lastseen) lastseen
       from dnstext group by rrname ) t
where rrname like 'spread%.google.com' order by rrname;
+----------------------------+--------+---------+--------------------------+--------------------------+
|           rrname           | count  | occurs  |        firstseen         |         lastseen         |
+----------------------------+--------+---------+--------------------------+--------------------------+
| spreadsheets.google.com    | 217    | 228     | 2013-09-06 01:00:50.653  | 2013-09-07 00:03:49.82   |
| spreadsheets.l.google.com  | 263    | 272     | 2013-09-06 00:08:20.442  | 2013-09-07 00:08:57.891  |
| spreadsheets0.google.com   | 8      | 8       | 2013-09-06 00:47:01.475  | 2013-09-06 21:19:50.516  |
| spreadsheets2.google.com   | 7      | 9       | 2013-09-06 02:23:41.724  | 2013-09-06 23:12:26.698  |
+----------------------------+--------+---------+--------------------------+--------------------------+

When the same queries are run on the dnsrc table, the first query returns the correct results, but the second returns incorrect results for either the min(firstseen) or max(lastseen) columns on every row.

3: jdbc:hive2://host:1> select count(1), sum(occurs), min(firstseen), max(lastseen) from dnsrc;
+-----------+------------+--------------------------+--------------------------+
|    _c0    |    _c1     |           _c2            |           _c3            |
+-----------+------------+--------------------------+--------------------------+
| 67110469  | 171344130  | 2013-09-05 23:59:28.977  | 2013-09-07 00:13:12.765  |
+-----------+------------+--------------------------+--------------------------+

3: jdbc:hive2://host:1> select rrname, count, occurs, firstseen, lastseen
from ( select rrname, count(1) count, sum(occurs) occurs, min(firstseen) firstseen, max(lastseen) lastseen
       from dnsrc group by rrname ) t
where rrname like 'spread%.google.com' order by rrname;
+----------------------------+--------+---------+--------------------------+--------------------------+
|           rrname           | count  | occurs  |        firstseen         |         lastseen         |
+----------------------------+--------+---------+--------------------------+--------------------------+
| spreadsheets.google.com    | 217    | 228     | 2013-09-06 01:00:50.653  | 2013-09-07 00:08:57.891  |
| spreadsheets.l.google.com  | 263    | 272
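Until the format-specific bug is resolved, one possible workaround (an assumption, not a confirmed fix) is to aggregate the timestamps as strings, e.g. `min(cast(firstseen as string))`, since Hive's default timestamp rendering orders lexicographically in the same way it orders chronologically. The Python check below illustrates that property on the sample values above:

```python
# ISO-style timestamp strings sort lexicographically in chronological order,
# so min/max over the strings matches min/max over the timestamps.
ts = [
    "2013-09-06 01:00:50.653",
    "2013-09-07 00:03:49.82",
    "2013-09-06 00:47:01.475",
]
print(min(ts))  # 2013-09-06 00:47:01.475
print(max(ts))  # 2013-09-07 00:03:49.82
```

Caveat: fractional seconds with differing digit counts can break ties oddly as strings (e.g. "…:01.5" vs "…:01.50"), so this is only a sketch, not a general replacement for timestamp comparison.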
Question about invocation of the terminate method in UDAF
Hi,

I just want to confirm my understanding of how the terminate method works in a UDAF.

[1] The terminate method of a UDAF is only invoked once (never twice) for the group of data that needs to be aggregated.
[2] When the terminate method is invoked, all the data of the group that needs to be aggregated has been merged together for the method to process.

Am I right? Thanks!

Yongcheng
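For reference, the commonly described UDAF lifecycle behind these two points can be sketched as follows. This is a Python illustration only (Hive UDAFs are written in Java); the method names mirror the evaluator phases: iterate, terminatePartial, merge, terminate.

```python
# Hedged sketch of the UDAF lifecycle for an "average" aggregate.
class AvgEvaluator:
    def __init__(self):
        self.total, self.count = 0.0, 0

    def iterate(self, value):      # called once per input row
        self.total += value
        self.count += 1

    def terminate_partial(self):   # partial state shipped between tasks
        return (self.total, self.count)

    def merge(self, partial):      # combines a partial state into this one
        self.total += partial[0]
        self.count += partial[1]

    def terminate(self):           # called once per group, after all merges
        return self.total / self.count if self.count else None

# Two "map-side" evaluators each see part of one group; their partials are
# merged, and terminate runs exactly once on the fully merged state.
a, b, final = AvgEvaluator(), AvgEvaluator(), AvgEvaluator()
for v in (1, 2, 3):
    a.iterate(v)
for v in (4, 5):
    b.iterate(v)
final.merge(a.terminate_partial())
final.merge(b.terminate_partial())
print(final.terminate())  # 3.0
```

In this model, iterate and merge may each run many times per group, but terminate runs exactly once per group, on state into which all partial aggregations have already been merged, which matches points [1] and [2] above.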