Size of a Hive Map column in characters!

2013-12-06 Thread Sunderlin, Mark
The size(map) function is defined as follows: size(Map<K.V>): returns the
number of elements in the map type.

What if I want the total size of the map for that row?  This doesn't work: 
select length(MAP);

How can I get the total size of a map column in either bytes or characters?
---
Mark E. Sunderlin
Data Architect | AOL NETWORKS BDM
P: 703-265-6935 | C: 540-327-6222 | AIM: MESunderlin
22000 AOL Way,  Dulles, VA  20166
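
A possible workaround, sketched as an untested example: when the map's keys
and values are strings (or can be treated as strings), flatten them with
map_keys()/map_values(), join each resulting array with concat_ws(), and take
the character length. The table and column names below are hypothetical:

    -- my_table and m are made-up names; assumes m is a map<string,string>
    select size(m)                                as num_entries,
           length(concat_ws('', map_keys(m)))
         + length(concat_ws('', map_values(m)))   as total_chars
    from my_table;

Note this counts characters only; the size in bytes on disk depends on the
SerDe and is not what length() measures.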




Why from_utc_timestamp works for some bigints, but not others

2013-12-06 Thread java8964
Hi, I am using Hive 0.9.0, and I am not sure why from_utc_timestamp gives an
error for the following value but works for others.
The following example shows two bigints, both epoch values at millisecond
precision, that are only about 11 seconds apart. One works fine in Hive 0.9.0
with the from_utc_timestamp UDF; the other just breaks. Why?

yzhang@yzhang-linux:~ hive
hive> use tracking2;
OK
Time taken: 1.569 seconds
hive> select from_utc_timestamp(1371746476159, 'EST') from track_sent;
OK
2013-06-20 07:41:16.318
2013-06-20 07:41:16.318
Time taken: 8.64 seconds
hive> select from_utc_timestamp(1371746487759, 'EST') from track_sent;
FAILED: Hive Internal Error: java.lang.IllegalArgumentException(Value out of range)
java.lang.IllegalArgumentException: Value out of range
    at java.sql.Timestamp.setNanos(Timestamp.java:287)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFFromUtcTimestamp.applyOffset(GenericUDFFromUtcTimestamp.java:101)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFFromUtcTimestamp.evaluate(GenericUDFFromUtcTimestamp.java:94)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:127)
    at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:214)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:767)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:888)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:165)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7755)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:2310)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:2112)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6165)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6136)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6762)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
   
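The failing frame is java.sql.Timestamp.setNanos, which accepts only
nanosecond values in 0..999999999, so applyOffset in the 0.9.0 UDF is
evidently handing it an out-of-range value for this particular millisecond
fraction. A hedged workaround sketch (untested; it drops the millisecond
fraction and assumes from_utc_timestamp accepts the string form that
from_unixtime returns): convert the epoch value to whole seconds first.

    -- floor(ms / 1000) yields whole epoch seconds; from_unixtime renders
    -- them as 'yyyy-MM-dd HH:mm:ss', bypassing the nanosecond arithmetic
    select from_utc_timestamp(from_unixtime(floor(1371746487759 / 1000)), 'EST')
    from track_sent;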

Re: query through hive jdbc causes chinese characters to become unrecognizable

2013-12-06 Thread Szehon Ho
Looks like the issue is tracked in HIVE-3245.  I think we need to support
an encoding parameter as part of the JDBC URL, similar to MySQL JDBC's
useUnicode/characterEncoding flags.

I can take a look at it if nobody else has.  For now, I think you can
manually re-encode the result value from JDBC.

Thanks,
Szehon

On Thu, Dec 5, 2013 at 7:43 PM, ch huang justlo...@gmail.com wrote:

 hi, maillist:
 we use Hive to store UTF-8 Chinese characters, but when we query
 through Hive JDBC they come back as unreadable characters; they display
 fine in the Hive shell. Why? Is this a bug in Hive JDBC? How can I
 solve it?



MIN/MAX issue with timestamps and RCFILE/ORC tables

2013-12-06 Thread David Engel
Hi,

Because of the known, and believed fixed, issue with MIN/MAX
(HIVE-4931), we're using a recent (2013-12-02), locally built version
of Hive 0.13.0-SNAPSHOT.  Unfortunately, we're still seeing issues
using MIN/MAX on timestamp types when using RCFILE and ORC formatted
tables.  I could not find a reference to this problem in the Hive
JIRA, but I'm posting here first before opening a new report in JIRA.

Here's an example of what we're seeing.

The dnstext table is a very simplified variation of part of our
application and has the following definition.

3: jdbc:hive2://host:1> describe dnstext ;
+------------+------------+----------+
|  col_name  | data_type  | comment  |
+------------+------------+----------+
| circuitid  | string     | None     |
| rrname     | string     | None     |
| srcaddr    | string     | None     |
| dstaddr    | string     | None     |
| occurs     | bigint     | None     |
| firstseen  | timestamp  | None     |
| lastseen   | timestamp  | None     |
| histo      | bigint     | None     |
+------------+------------+----------+

It is stored in TEXTFILE format.  There are also dnsrc and dnsorc
tables with the same column definitions and data, but they are stored
in RCFILE and ORC formats, respectively.

The following queries, one that aggregates the entire table and
another that aggregates with a GROUP BY clause, work as expected on
the dnstext table.  The second query is wrapped in an outer query to
reduce the final number of rows for this example.

3: jdbc:hive2://host:1> select count(1), sum(occurs), min(firstseen),
max(lastseen) from dnstext ;
+-----------+------------+--------------------------+--------------------------+
|    _c0    |    _c1     |           _c2            |           _c3            |
+-----------+------------+--------------------------+--------------------------+
| 67110469  | 171344130  | 2013-09-05 23:59:28.977  | 2013-09-07 00:13:12.765  |
+-----------+------------+--------------------------+--------------------------+

3: jdbc:hive2://host:1> select rrname, count, occurs, firstseen, lastseen
from ( select rrname, count(1) count, sum(occurs) occurs, min(firstseen)
firstseen, max(lastseen) lastseen from dnstext group by rrname ) t where rrname
like 'spread%.google.com' order by rrname ;
+----------------------------+--------+---------+--------------------------+--------------------------+
|           rrname           | count  | occurs  |        firstseen         |         lastseen         |
+----------------------------+--------+---------+--------------------------+--------------------------+
| spreadsheets.google.com    | 217    | 228     | 2013-09-06 01:00:50.653  | 2013-09-07 00:03:49.82   |
| spreadsheets.l.google.com  | 263    | 272     | 2013-09-06 00:08:20.442  | 2013-09-07 00:08:57.891  |
| spreadsheets0.google.com   | 8      | 8       | 2013-09-06 00:47:01.475  | 2013-09-06 21:19:50.516  |
| spreadsheets2.google.com   | 7      | 9       | 2013-09-06 02:23:41.724  | 2013-09-06 23:12:26.698  |
+----------------------------+--------+---------+--------------------------+--------------------------+

When the same queries are run on the dnsrc table, the first query
returns the correct results, but the second returns an incorrect result
in either the min(firstseen) or the max(lastseen) column on every row.

3: jdbc:hive2://host:1> select count(1), sum(occurs), min(firstseen),
max(lastseen) from dnsrc ;
+-----------+------------+--------------------------+--------------------------+
|    _c0    |    _c1     |           _c2            |           _c3            |
+-----------+------------+--------------------------+--------------------------+
| 67110469  | 171344130  | 2013-09-05 23:59:28.977  | 2013-09-07 00:13:12.765  |
+-----------+------------+--------------------------+--------------------------+

3: jdbc:hive2://host:1> select rrname, count, occurs, firstseen, lastseen
from ( select rrname, count(1) count, sum(occurs) occurs, min(firstseen)
firstseen, max(lastseen) lastseen from dnsrc group by rrname ) t where rrname
like 'spread%.google.com' order by rrname ;
+----------------------------+--------+---------+--------------------------+--------------------------+
|           rrname           | count  | occurs  |        firstseen         |         lastseen         |
+----------------------------+--------+---------+--------------------------+--------------------------+
| spreadsheets.google.com    | 217    | 228     | 2013-09-06 01:00:50.653  | 2013-09-07 00:08:57.891  |
| spreadsheets.l.google.com  | 263    | 272
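
One diagnostic sketch (an assumption to test, not a known fix): run the same
aggregation with the timestamps cast to strings, which takes the
timestamp-specific reader and comparison code out of the picture. If min/max
over the cast strings is correct on dnsrc while min/max over timestamps is
not, that points at the timestamp handling in the columnar formats:

    -- same query shape as above, but comparing strings instead of timestamps
    select rrname,
           min(cast(firstseen as string)) as firstseen,
           max(cast(lastseen  as string)) as lastseen
    from dnsrc
    group by rrname;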

Question about invocation of the terminate method in UDAF

2013-12-06 Thread Yongcheng Li
Hi,

Just want to confirm my understanding of how the terminate method works in a
UDAF: [1] the terminate method of a UDAF is invoked only once (never twice)
for each group of data that needs to be aggregated, and [2] when the
terminate method is invoked, all the data of the group that needs to be
aggregated has already been merged together for the method to process. Am I
right?

Thanks!

Yongcheng
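
As I understand the lifecycle (a sketch of the usual behavior, not an
authoritative statement): iterate() consumes the arguments for every row in a
group; with map-side aggregation enabled, terminatePartial() and merge() may
each run many times on partial aggregation buffers; terminate() then runs
exactly once per group, on the fully merged buffer, which matches [1] and [2]
above. In query terms (my_udaf, src, grp, and val are hypothetical names):

    -- one terminate() call per distinct value of grp, after all partial
    -- aggregations for that group have been merged
    select grp, my_udaf(val)
    from src
    group by grp;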