Question on converting an array of structs into a map

2014-04-08 Thread Sunderlin, Mark
I have data in two tables that look like this:

Table_1
Sequence_num int,
User_id string

Table_2
Sequence_num int
User_attributes array<struct<...>>

Sequence_num joins the two tables, and the relationship is strictly one-to-one.

Where I want to end up: a single result set that looks like this:

Sequence_num int
User_id string
User_attributes map<...> /* a map of keyid=valueid for each struct from the original array */

Can this be done in Hive? Is inline() the key to doing this?
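On the inline() question: inline() explodes an array of structs into one row per struct. On its own it therefore produces rows rather than a map, and turning those rows back into a map would take an aggregation step or a custom UDF.

The per-row logic such a UDF would need is sketched below in plain Java, not HiveQL. It joins on Sequence_num and then folds each row's structs into a map. The field names keyid and valueid come from the message. Everything else is assumed: the String types, the class names, empty-map output for a NULL array, and last-value-wins for repeated keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one struct in User_attributes; String fields are an assumption.
final class KeyValue {
    final String keyid;
    final String valueid;

    KeyValue(String keyid, String valueid) {
        this.keyid = keyid;
        this.valueid = valueid;
    }
}

// Hypothetical row of the desired result set: Sequence_num, User_id, User_attributes.
final class JoinedRow {
    final int sequenceNum;
    final String userId;
    final Map<String, String> userAttributes;

    JoinedRow(int sequenceNum, String userId, Map<String, String> userAttributes) {
        this.sequenceNum = sequenceNum;
        this.userId = userId;
        this.userAttributes = userAttributes;
    }
}

final class AttributesToMap {
    // Folds an array of keyid/valueid structs into a map.
    // A null array gives an empty map, null structs are skipped,
    // and if a keyid repeats, the last value wins.
    static Map<String, String> toMap(List<KeyValue> attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        if (attributes == null) {
            return result;
        }
        for (KeyValue kv : attributes) {
            if (kv != null) {
                result.put(kv.keyid, kv.valueid);
            }
        }
        return result;
    }

    // Inner join of Table_1 (Sequence_num -> User_id) with
    // Table_2 (Sequence_num -> User_attributes), relying on the one-to-one relationship.
    static List<JoinedRow> join(Map<Integer, String> table1,
                                Map<Integer, List<KeyValue>> table2) {
        List<JoinedRow> rows = new ArrayList<>();
        for (Map.Entry<Integer, String> e : table1.entrySet()) {
            if (table2.containsKey(e.getKey())) {
                rows.add(new JoinedRow(e.getKey(), e.getValue(), toMap(table2.get(e.getKey()))));
            }
        }
        return rows;
    }
}
```

Inside Hive, the toMap loop would sit in a GenericUDF's evaluate(), with the array and struct fields read through their ObjectInspectors. The join would remain an ordinary JOIN on Sequence_num.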

--
Mark E. Sunderlin
Data Architect // AOL Platforms
P: 703-265-6935 // C: 540-327-6222 // 22000 AOL Way,  Dulles, VA  20166
AIM: MESunderlin





Re: A GenericUDF Function to Extract a Field From an Array of Structs

2013-04-02 Thread Navis류승우
Try changing the code in the evaluate() method along these lines:

int numElements = listOI.getListLength(arguments[0].get());
for (int i = 0; i < numElements; i++) {
  // Go through the ObjectInspectors to read each struct and its productCategory field
  Object element = listOI.getListElement(arguments[0].get(), i);
  Object product = structOI.getStructFieldData(element,
      structOI.getStructFieldRef("productCategory"));
  ret.add(((PrimitiveObjectInspector) prodCatOI).getPrimitiveWritableObject(product));
}

2013/3/29 Peter Chu :

UDF for transforming a collection of arrays into an array of structs

2013-03-29 Thread Roberto Congiu
Hi,
I am working on ingesting some legacy data that is denormalized in Hive,
somewhat like the following:

CREATE TABLE mytable (
   order_id int,

   product_id array,
   product_name array,
   product_price array

)


As you can see, the product_* fields would be better represented as a
struct.
Is there a UDF that can take a group of arrays and, assuming they're all
the same size, return an array of structs instead?

I know it's not difficult to implement using generic UDFs, but I was
wondering if anybody had already done it, and if not, if anybody was
actually interested in something like that.
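What Roberto describes amounts to zipping parallel arrays position by position. The sketch below shows that logic in plain Java. It is not an existing Hive UDF.

The column element types did not survive in the message, so several things here are assumptions: Integer ids, String names, Double prices, the Product and ArrayZipper names, and the choice to reject arrays of unequal length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element of the desired array<struct<...>> column; field types are assumed.
final class Product {
    final Integer id;
    final String name;
    final Double price;

    Product(Integer id, String name, Double price) {
        this.id = id;
        this.name = name;
        this.price = price;
    }
}

final class ArrayZipper {
    // Zips product_id, product_name and product_price position by position into structs.
    // A null input array yields null; arrays of different lengths are rejected.
    static List<Product> zip(List<Integer> ids, List<String> names, List<Double> prices) {
        if (ids == null || names == null || prices == null) {
            return null;
        }
        int n = ids.size();
        if (names.size() != n || prices.size() != n) {
            throw new IllegalArgumentException(
                "product arrays differ in length: " + n + ", " + names.size() + ", " + prices.size());
        }
        List<Product> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // Elements are copied as-is, so null entries stay null inside the struct.
            out.add(new Product(ids.get(i), names.get(i), prices.get(i)));
        }
        return out;
    }
}
```

In a GenericUDF, the same loop would run in evaluate(). There, a length mismatch could be reported by throwing a HiveException, or handled by returning NULL instead; either is a defensible design choice.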

R.

-- 
--
Good judgement comes with experience.
Experience comes with bad judgement.
--
Roberto Congiu - Data Engineer - OpenX
tel: +1 626 466 1141


RE: A GenericUDF Function to Extract a Field From an Array of Structs

2013-03-28 Thread Peter Chu
Sorry, the test should be the following (I changed extract_shas to
extract_product_category):
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.testng.annotations.Test;

import java.util.ArrayList;
import java.util.List;

public class TestGenericUDFExtractProductCategory
{
    ArrayList fieldNames = new ArrayList();
    ArrayList fieldObjectInspectors = new ArrayList();

    @Test
    public void simpleTest()
        throws Exception
    {
        ListObjectInspector firstInspector = new MyListObjectInspector();

        ArrayList test = new ArrayList();
        test.add("test");

        ArrayList test2 = new ArrayList();
        test2.add(test);

        StructObjectInspector soi =
            ObjectInspectorFactory.getStandardStructObjectInspector(test, test2);

        fieldNames.add("productCategory");
        fieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);

        GenericUDF.DeferredObject firstDeferredObject = new MyDeferredObject(test2);
        GenericUDF extract_product_category = new GenericUDFExtractProductCategory();
        extract_product_category.initialize(new ObjectInspector[]{firstInspector});
        extract_product_category.evaluate(new DeferredObject[]{firstDeferredObject});
    }

    public class MyDeferredObject implements DeferredObject
    {
        private Object value;

        public MyDeferredObject(Object value) {
            this.value = value;
        }

        @Override
        public Object get() throws HiveException
        {
            return value;
        }
    }

    private class MyListObjectInspector implements ListObjectInspector
    {
        @Override
        public ObjectInspector getListElementObjectInspector()
        {
            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,
                fieldObjectInspectors);
        }

        @Override
        public Object getListElement(Object data, int index)
        {
            List myList = (List) data;
            if (myList == null || index > myList.size()) {
                return null;
            }
            return myList.get(index);
        }

        @Override
        public int getListLength(Object data)
        {
            if (data == null) {
                return -1;
            }
            return ((List) data).size();
        }

        @Override
        public List getList(Object data)
        {
            return (List) data;
        }

        @Override
        public String getTypeName()
        {
            return null;  //To change body of implemented methods use File | Settings | File Templates.
        }

        @Override
        public Category getCategory()
        {
            return Category.LIST;
        }
    }
}
From: pete@outlook.com
To: user@hive.apache.org
Subject: A GenericUDF Function to Extract a Field From an Array of Structs
Date: Thu, 28 Mar 2013 14:16:33 -0700





A GenericUDF Function to Extract a Field From an Array of Structs

2013-03-28 Thread Peter Chu
I am trying to write a GenericUDF function that, for each record, collects the
values of a specific struct field from every struct in an array and returns
them in an array as well.
I wrote the UDF (below), and it seems to work, but:
1) It does not work when I run it on an external table, although it works fine
on a managed table. Any idea why?
2) I am having a tough time writing a test for it. I have attached the test I
have so far, and it does not work: I always get 'java.util.ArrayList cannot be
cast to org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector' or
'cannot cast String to LazyString'. My question is: how do I supply a list of
structs to the evaluate method?
Any help will be greatly appreciated.
Thanks,
Peter
The table:
CREATE EXTERNAL TABLE FOO (
  TS string,
  customerId string,
  products array<struct<productCategory:string>>
)
PARTITIONED BY (ds string)
ROW FORMAT SERDE 'some.serde'
WITH SERDEPROPERTIES ('error.ignore'='true')
LOCATION 'some_locations';
A record holds: 1340321132000, 'some_company',
[{"productCategory":"footwear"},{"productCategory":"eyewear"}]
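For reference, the intended per-record behaviour can be modelled in plain Java and applied to the sample row above. Structs are modelled here as Map<String, Object> purely for illustration, and the class and method names are hypothetical.

Inside Hive, the element and field values have to be read through the ListObjectInspector, StructObjectInspector and PrimitiveObjectInspector instead. A SerDe may hand the UDF lazy objects (such as LazyString) rather than java.lang.String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ProductCategoryExtractor {
    // Collects the "productCategory" field of every struct in the array, in order.
    // A null array yields null; a null struct or missing field contributes a null entry.
    static List<String> extract(List<Map<String, Object>> structs) {
        if (structs == null) {
            return null;
        }
        List<String> out = new ArrayList<>(structs.size());
        for (Map<String, Object> struct : structs) {
            Object value = (struct == null) ? null : struct.get("productCategory");
            out.add(value == null ? null : value.toString());
        }
        return out;
    }
}
```

For the sample row, extract returns [footwear, eyewear].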
This is my code:
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.lazy.LazyString;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

import java.util.ArrayList;

@Description(name = "extract_product_category",
    value = "_FUNC_(array<struct<productCategory:string>>) - Collect all product category "
        + "field values inside an array of struct(s), and return the results in an array",
    extended = "Example:\n SELECT _FUNC_(array_of_structs_with_product_category_field)")
public class GenericUDFExtractProductCategory extends GenericUDF
{
    private ArrayList ret;
    private ListObjectInspector listOI;
    private StructObjectInspector structOI;
    private ObjectInspector prodCatOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] args)
        throws UDFArgumentException
    {
        if (args.length != 1) {
            throw new UDFArgumentLengthException(
                "The function extract_product_category() requires exactly one argument.");
        }
        if (args[0].getCategory() != Category.LIST) {
            throw new UDFArgumentTypeException(0, "Type array is expected to be the argument "
                + "for extract_product_category but " + args[0].getTypeName() + " is found instead");
        }
        listOI = ((ListObjectInspector) args[0]);
        structOI = ((StructObjectInspector) listOI.getListElementObjectInspector());
        if (structOI.getAllStructFieldRefs().size() != 1) {
            throw new UDFArgumentTypeException(0, "Incorrect number of fields in the struct, should be one");
        }
        StructField productCategoryField = structOI.getStructFieldRef("productCategory");
        // If not, throw exception
        if (productCategoryField == null) {
            throw new UDFArgumentTypeException(0, "NO \"productCategory\" field in input structure");
        }
        // Are they of the correct types?
        // We store these object inspectors for use in the evaluate() method
        prodCatOI = productCategoryField.getFieldObjectInspector();
        // First, are they primitives?
        if (prodCatOI.getCategory() != Category.PRIMITIVE) {
            throw new UDFArgumentTypeException(0, "productCategory field must be of string type");
        }
        // Are they of the correct primitives?
        if (((PrimitiveObjectInspector) prodCatOI).getPrimitiveCategory()
                != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentTypeException(0, "productCategory field must be of string type");
        }
        ret = new ArrayList();
        return ObjectInspectorFactory.getStandardListObjectInspector(
            PrimitiveObjectInspectorFactory.writableStringObjectInspector);
    }

    @Override
    public ArrayList evaluate(DeferredObject[] arguments)
        throws HiveException
    {
        ret.clear();
        if (arguments.length != 1) {
            return null;
        }
        if (arguments[0].get() == null

Array of Structs

2012-07-13 Thread Bhaskar, Snehalata
Hi all,

How do I create a table with an array-of-structs column in Hive, and how do I
populate that table with data?

 

Please help.
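One way to populate such a table is to write the data files yourself and load them with LOAD DATA. This assumes the table is declared with something like CREATE TABLE t (id int, items array<struct<name:string,qty:int>>) and stored as plain text with the default delimiters, with no ROW FORMAT clause.

With the default LazySimpleSerDe:

- columns are separated by \001 (Ctrl-A);
- elements of a top-level array are separated by \002;
- fields of a struct nested inside that array are separated by \003.

The Java sketch below builds one such line. The table, its columns and the class names are hypothetical.

```java
import java.util.List;

final class HiveTextRowWriter {
    // Default LazySimpleSerDe separators, assuming the table has no ROW FORMAT overrides.
    static final char FIELD_SEP = '\001';        // between top-level columns
    static final char ITEM_SEP = '\002';         // between elements of a top-level array
    static final char STRUCT_FIELD_SEP = '\003'; // between fields of a struct inside that array
    static final String NULL_TEXT = "\\N";       // default text representation of NULL

    // Hypothetical element type for items array<struct<name:string,qty:int>>.
    static final class Item {
        final String name;
        final int qty;

        Item(String name, int qty) {
            this.name = name;
            this.qty = qty;
        }
    }

    // Builds one line for a row (id int, items array<struct<name:string,qty:int>>).
    // Values are written as-is, so they must not contain any of the separator characters.
    static String toLine(int id, List<Item> items) {
        StringBuilder sb = new StringBuilder();
        sb.append(id).append(FIELD_SEP);
        if (items == null) {
            return sb.append(NULL_TEXT).toString();
        }
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                sb.append(ITEM_SEP);
            }
            Item item = items.get(i);
            sb.append(item.name).append(STRUCT_FIELD_SEP).append(item.qty);
        }
        return sb.toString();
    }
}
```

Lines written this way, one per row, could then be loaded with LOAD DATA LOCAL INPATH '...' INTO TABLE t. If the data already lives in another Hive table, an INSERT ... SELECT using the built-in array() and named_struct() functions is an alternative worth checking against your Hive version.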

 

Thanks and regards,

Snehalata Deorukhkar

Nortel No:0229-5814

 

