RE: Hive help on the update

2013-06-28 Thread Christopher Wirt
To emulate the RANK function you will need a custom UDF

Edward Capriolo has a nice one here
https://github.com/edwardcapriolo/hive-rank

Philip Tromans has another here https://github.com/philiptromans/hive-udf

 

If you're already on 0.11, there is a native version:

https://cwiki.apache.org/Hive/languagemanual-windowingandanalytics.html
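For example, something along these lines should work natively on 0.11 (a sketch, not from the original thread; the table and column names are placeholders):

```sql
-- Hive 0.11+ native windowing (hypothetical table/columns)
SELECT empno, deptno, sal,
       RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS sal_rank
FROM emp;
```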

 

 

From: Stephen Sprague [mailto:sprag...@gmail.com] 
Sent: 28 June 2013 17:30
To: user@hive.apache.org
Subject: Re: Hive help on the update

 

What version of Hive are you running? And show us your attempts - don't be
shy!

 

 

On Fri, Jun 28, 2013 at 5:52 AM, Kumar Chinnakali
kumar_chinnak...@infosys.com wrote:

Hi Hive Team,

 

I have a Teradata query, shown below, and I am trying to convert it into a
Hive query (HQL). I have tried, but in vain.

 

SELECT Col_1, Col_2, Col_3,
       MIN(Col_3) OVER (PARTITION BY Col_1 ORDER BY Col_3, Col_2
                        ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_Col_3
FROM work_table;

 

Example:

 

Actual work table:

Col_1 | Col_3 | Prev_Col_3
P1    | 9:00  | Null
P1    | 10:00 | Null
P2    | 11:00 | Null
P2    | 12:00 | Null

 

Expected result:

Col_1 | Col_3 | Prev_Col_3
P1    | 9:00  | Null
P1    | 10:00 | 9:00
P2    | 11:00 | Null
P2    | 12:00 | 11:00
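For what it's worth, on Hive 0.11 this expected result can be produced natively with LAG (a hedged sketch, not part of the original thread; it assumes the 0.11 windowing functions are available):

```sql
-- LAG(Col_3) returns the previous row's Col_3 within each Col_1 partition,
-- and NULL for the first row of each partition -- matching prev_Col_3 above.
SELECT Col_1, Col_2, Col_3,
       LAG(Col_3) OVER (PARTITION BY Col_1 ORDER BY Col_3, Col_2) AS prev_Col_3
FROM work_table;
```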

 

Any help would be much appreciated.

 

Regards, Kumar Chinnakali



 



RE: Nullpointer Exception when using UDF

2013-02-15 Thread Christopher Wirt
Hi Neelesh,

 

I've just upgraded our cluster from hadoop-0.20/hive-0.10 to
hadoop-1.03 (MapR 2.1)/hive-0.10 and started running into this issue
immediately. Strangely, it doesn't appear to happen all of the time, or to be
local to any node or set of nodes.

 

The only workaround I've found so far is to copy the UDF jar into the
hadoop/lib directory on each TaskTracker node. Obviously this is not ideal,
and I'm still looking for an actual fix.
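For context, the per-session registration that this workaround replaces is the usual ADD JAR / CREATE TEMPORARY FUNCTION pair (a sketch; the jar path is a placeholder, and the class name is taken from the UDF discussed later in this thread):

```sql
-- Standard per-session UDF registration (hypothetical jar path)
ADD JAR /path/to/udfs.jar;
CREATE TEMPORARY FUNCTION nexr_sum AS 'com.nexr.platform.analysis.udf.GenericUDFSum';
```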

 

Chris

 

From: neelesh gadhia [mailto:ngad...@yahoo.com] 
Sent: 14 February 2013 17:04
To: user@hive.apache.org; nitinpawar...@gmail.com
Subject: Re: Nullpointer Exception when using UDF

 

Nitin,

 

Below is the code I got/used from
https://issues.apache.org/jira/browse/HIVE-2361


 

I have attached the file that has all the UDFs (of which I am using
GenericUDFMax and GenericUDFSum); they fail with the same errors at the same
point.

I also realized when I tried another UDF (GenericUDFMax) that it fails at
exactly the same point.

 

Code for GenericUDFSum as shown below.

 

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.nexr.platform.analysis.udf;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde.Constants;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.ObjectInspectorCopyOption;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hive.pdk.HivePdkUnitTest;
import org.apache.hive.pdk.HivePdkUnitTests;


@Description(name = "sum",
    value = "_FUNC_(hash_key, order_by_col1, order_by_col2 ...) "
        + "- Returns the summed value of the group",
    extended = "Example:\n"
        + "  SELECT _FUNC_(HASH(p1, p2), order_by_col1, order_by_col2, ...) FROM (\n"
        + "    SELECT ~ FROM table DISTRIBUTE BY HASH(p1,p2) SORT BY p1,p2,order_by_col1, order_by_col2 DESC, ...\n"
        + "  );")
@HivePdkUnitTests(
    setup = "", cleanup = "",
    cases = {
        @HivePdkUnitTest(
            query = "SELECT t.empno, t.deptno, t.sal, nexr_sum(hash(t.deptno),t.sal) as sal_sum"
                + " FROM ("
                + "   select a.empno, a.deptno, a.sal from emp a"
                + "   distribute by hash(a.deptno)"
                + "   sort by a.deptno, a.empno"
                + " ) t;",
            result = "7782\t10\t2450\t2450\n"
                + "7839\t10\t5000\t7450\n"
                + "7934\t10\t1300\t8750\n"
                + "7369\t20\t800\t800\n"
                + "7566\t20\t2975\t3775\n"
                + "7788\t20\t3000\t6775\n"
                + "7876\t20\t1100\t7875\n"
                + "7902\t20\t3000\t10875\n"
                + "7499\t30\t1600\t1600\n"
                + "7521\t30\t1250\t2850\n"
                + "7654\t30\t1250\t4100\n"
                + "7698\t30\t2850\t6950\n"
                + "7844\t30\t1500\t8450\n"
                + "7900\t30\t950\t9400")
    })

@UDFType(deterministic = false, stateful = true)
public class GenericUDFSum extends GenericUDF {
private final LongWritable longResult = new LongWritable();
private final DoubleWritable doubleResult = new DoubleWritable();
private ObjectInspector hashOI, valueOI, prevHashStandardOI, resultOI;
private Object prevHash;
@Override
public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
        throw new UDFArgumentException("Exactly two arguments are expected.");
    }

    for (int