Question on converting an array of structs into a map

2014-04-08 Thread Sunderlin, Mark
I have data in two tables that looks like this:

Table_1
Sequence_num int,
User_id string

Table_2
Sequence_num int
User_attributes array<struct<keyid:bigint,valueid:bigint>>

The sequence number joins the two tables, and the relationship is strictly one-to-one.

Where I want to end up: a single result set that looks like:

Sequence_num int
User_id string
User_attributes map /* a map of keyid=valueid for each struct from the
original array */

Can this be done in Hive? Is inline() the key to doing this?

--
Mark E. Sunderlin
Data Architect // AOL Platforms
P: 703-265-6935 // C: 540-327-6222 // 22000 AOL Way,  Dulles, VA  20166
AIM: MESunderlin
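
Whichever Hive route is used (exploding the array with inline() and re-aggregating, or a custom UDF), the per-row logic is the same: fold each keyid/valueid struct into one map entry. The sketch below models only that logic in plain Java; it is not a Hive UDF. KeyValue is a hypothetical stand-in for the struct type, and letting a later duplicate keyid overwrite an earlier one is an assumption, since the question does not say how duplicates should behave.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of array<struct<keyid:bigint,valueid:bigint>> -> map.
// KeyValue is a hypothetical stand-in for the Hive struct; this is not a UDF.
class ArrayToMap {

    static final class KeyValue {
        final long keyid;
        final long valueid;

        KeyValue(long keyid, long valueid) {
            this.keyid = keyid;
            this.valueid = valueid;
        }
    }

    // Folds each struct into one entry keyid -> valueid, keeping array order.
    // Assumption: if a keyid repeats, the last struct in the array wins.
    static Map<Long, Long> toMap(List<KeyValue> attributes) {
        if (attributes == null) {
            return null; // a NULL array maps to a NULL map
        }
        Map<Long, Long> result = new LinkedHashMap<>();
        for (KeyValue kv : attributes) {
            if (kv != null) { // skip NULL structs
                result.put(kv.keyid, kv.valueid);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<KeyValue> attrs = List.of(new KeyValue(1, 100), new KeyValue(2, 200));
        System.out.println(toMap(attrs)); // {1=100, 2=200}
    }
}
```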





Re: A GenericUDF Function to Extract a Field From an Array of Structs

2013-04-02 Thread Navis류승우
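Try changing the code in the evaluate method to something like this:

int numElements = listOI.getListLength(arguments[0].get());
for (int i = 0; i < numElements; i++) {
  Object element = listOI.getListElement(arguments[0].get(), i);
  // Let the struct inspector read the field, whatever the struct's
  // underlying (possibly lazy) representation is.
  Object product = structOI.getStructFieldData(element,
      structOI.getStructFieldRef("productCategory"));
  // Ask the field's inspector for a writable (Text) value instead of
  // casting the raw object to LazyString.
  ret.add(((PrimitiveObjectInspector) prodCatOI).getPrimitiveWritableObject(product));
}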

2013/3/29 Peter Chu <pete@outlook.com>:

RE: A GenericUDF Function to Extract a Field From an Array of Structs

2013-03-29 Thread Peter Chu
Sorry, the test should be the following (I changed extract_shas to
extract_product_category):
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.testng.annotations.Test;

import java.util.ArrayList;
import java.util.List;

public class TestGenericUDFExtractProductCategory
{
  ArrayList<String> fieldNames = new ArrayList<String>();
  ArrayList<ObjectInspector> fieldObjectInspectors = new ArrayList<ObjectInspector>();

  @Test
  public void simpleTest()
      throws Exception
  {
    ListObjectInspector firstInspector = new MyListObjectInspector();

    ArrayList test = new ArrayList();
    test.add("test");

    ArrayList test2 = new ArrayList();
    test2.add(test);

    // Note: test2 holds a List rather than an ObjectInspector, which is the likely
    // source of the "ArrayList cannot be cast to ObjectInspector" error.
    StructObjectInspector soi =
        ObjectInspectorFactory.getStandardStructObjectInspector(test, test2);

    fieldNames.add("productCategory");
    fieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);

    GenericUDF.DeferredObject firstDeferredObject = new MyDeferredObject(test2);

    GenericUDF extract_product_category = new GenericUDFExtractProductCategory();

    extract_product_category.initialize(new ObjectInspector[]{firstInspector});

    extract_product_category.evaluate(new DeferredObject[]{firstDeferredObject});
  }

  public class MyDeferredObject implements DeferredObject
  {
    private Object value;

    public MyDeferredObject(Object value) {
      this.value = value;
    }

    @Override
    public Object get() throws HiveException
    {
      return value;
    }
  }

  private class MyListObjectInspector implements ListObjectInspector
  {
    @Override
    public ObjectInspector getListElementObjectInspector()
    {
      return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames,
          fieldObjectInspectors);
    }

    @Override
    public Object getListElement(Object data, int index)
    {
      List myList = (List) data;
      // Out-of-range indexes (including index == size) return null.
      if (myList == null || index < 0 || index >= myList.size()) {
        return null;
      }
      return myList.get(index);
    }

    @Override
    public int getListLength(Object data)
    {
      if (data == null) {
        return -1;
      }
      return ((List) data).size();
    }

    @Override
    public List<?> getList(Object data)
    {
      return (List) data;
    }

    @Override
    public String getTypeName()
    {
      return null;  // IDE-generated stub
    }

    @Override
    public Category getCategory()
    {
      return Category.LIST;
    }
  }
}

UDF for transforming a collection of arrays into an array of structs

2013-03-29 Thread Roberto Congiu
Hi,
I am working on ingesting some legacy data that is denormalized in Hive,
somewhat like the following:

CREATE TABLE mytable (
   order_id int,
   product_id array<int>,
   product_name array<string>,
   product_price array<bigint>
)


As you can see, the product_* fields would be better represented as a
struct<int,string,bigint>.
Is there a UDF that can take a group of arrays and, assuming they're all
the same size, return an array of structs instead?

I know it's not difficult to implement using generic UDFs, but I was
wondering if anybody had already done it, and if not, if anybody was
actually interested in something like that.

R.

-- 
--
Good judgement comes with experience.
Experience comes with bad judgement.
--
Roberto Congiu - Data Engineer - OpenX
tel: +1 626 466 1141
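
Roberto notes that this is not difficult with a generic UDF. The per-row logic such a UDF would apply can be modeled in plain Java (this is not the GenericUDF itself): element i of each array becomes the i-th struct. The Product class and its field names are hypothetical, since the thread gives only the types, and rejecting arrays of unequal length is an assumption; a real UDF might return NULL instead.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of zipping parallel arrays into an array of structs.
// Product is a hypothetical stand-in for struct<id:int,name:string,price:bigint>.
class ZipArrays {

    static final class Product {
        final int id;
        final String name;
        final long price;

        Product(int id, String name, long price) {
            this.id = id;
            this.name = name;
            this.price = price;
        }

        @Override
        public String toString() {
            return "{" + id + "," + name + "," + price + "}";
        }
    }

    // Combines element i of each input list into the i-th Product.
    // Assumption: mismatched lengths are an error, not padded or truncated.
    // Null id or price elements would throw on unboxing in this sketch.
    static List<Product> zip(List<Integer> ids, List<String> names, List<Long> prices) {
        if (ids.size() != names.size() || ids.size() != prices.size()) {
            throw new IllegalArgumentException("arrays must all have the same size");
        }
        List<Product> out = new ArrayList<>(ids.size());
        for (int i = 0; i < ids.size(); i++) {
            out.add(new Product(ids.get(i), names.get(i), prices.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Product> products = zip(List.of(1, 2), List.of("pen", "ink"), List.of(150L, 300L));
        System.out.println(products); // [{1,pen,150}, {2,ink,300}]
    }
}
```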


A GenericUDF Function to Extract a Field From an Array of Structs

2013-03-28 Thread Peter Chu
I am trying to write a GenericUDF function to collect all values of a specific
struct field within an array for each record, and return them in an array as well.
I wrote the UDF (below), and it seems to work, but:
1) It does not work when I run it on an external table, although it works fine
on a managed table. Any idea why?
2) I am having a tough time writing a test for it. I have attached the test I
have so far; it does not work and always fails with 'java.util.ArrayList cannot
be cast to org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector' or
'cannot cast String to LazyString'. My question is: how do I supply a list of
structs to the evaluate method?
Any help will be greatly appreciated.
Thanks,
Peter
The table:

CREATE EXTERNAL TABLE FOO (
  TS string,
  customerId string,
  products array<struct<productCategory:string>>
)
PARTITIONED BY (ds string)
ROW FORMAT SERDE 'some.serde'
WITH SERDEPROPERTIES ('error.ignore'='true')
LOCATION 'some_locations';

A sample row holds: 1340321132000, 'some_company',
[{productCategory:footwear},{productCategory:eyewear}]
This is my code:

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.lazy.LazyString;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

import java.util.ArrayList;

@Description(name = "extract_product_category",
    value = "_FUNC_( array<struct<productCategory:string>> ) - Collect all product category field values inside an array of struct(s), and return the results in an array<string>",
    extended = "Example:\n SELECT _FUNC_(array_of_structs_with_product_category_field)")
public class GenericUDFExtractProductCategory
    extends GenericUDF
{
  private ArrayList ret;

  private ListObjectInspector listOI;
  private StructObjectInspector structOI;
  private ObjectInspector prodCatOI;

  @Override
  public ObjectInspector initialize(ObjectInspector[] args)
      throws UDFArgumentException
  {
    if (args.length != 1) {
      throw new UDFArgumentLengthException("The function extract_product_category() requires exactly one argument.");
    }

    if (args[0].getCategory() != Category.LIST) {
      throw new UDFArgumentTypeException(0, "Type array<struct> is expected to be the argument for extract_product_category but "
          + args[0].getTypeName() + " is found instead");
    }

    listOI = ((ListObjectInspector) args[0]);
    structOI = ((StructObjectInspector) listOI.getListElementObjectInspector());

    if (structOI.getAllStructFieldRefs().size() != 1) {
      throw new UDFArgumentTypeException(0, "Incorrect number of fields in the struct, should be one");
    }

    StructField productCategoryField = structOI.getStructFieldRef("productCategory");
    // If the field does not exist, throw an exception
    if (productCategoryField == null) {
      throw new UDFArgumentTypeException(0, "NO \"productCategory\" field in input structure");
    }

    // Are they of the correct types?
    // We store these object inspectors for use in the evaluate() method
    prodCatOI = productCategoryField.getFieldObjectInspector();

    // First, are they primitives?
    if (prodCatOI.getCategory() != Category.PRIMITIVE) {
      throw new UDFArgumentTypeException(0, "productCategory field must be of string type");
    }

    // Are they of the correct primitives?
    if (((PrimitiveObjectInspector) prodCatOI).getPrimitiveCategory()
        != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
      throw new UDFArgumentTypeException(0, "productCategory field must be of string type");
    }

    ret = new ArrayList();
    return ObjectInspectorFactory.getStandardListObjectInspector(
        PrimitiveObjectInspectorFactory.writableStringObjectInspector);
  }

  @Override
  public ArrayList evaluate(DeferredObject[] arguments)
      throws HiveException
  {
    ret.clear();

    if (arguments.length != 1) {
      return null;
    }
if 
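
On question 2 (how to supply a list of structs): Hive's StandardStructObjectInspector reads a struct as a java.util.List (or Object[]) of field values in declared order, and StandardListObjectInspector reads an array as a java.util.List. So the products value of the sample row can be represented as a List of one-element Lists, and a test could pair such data with ObjectInspectorFactory.getStandardListObjectInspector(structOI) rather than a hand-written ListObjectInspector; that pairing is a suggestion, not something confirmed in this thread. The sketch below models only the data layout and the per-row extraction in plain Java, with no Hive classes; the class and method names are hypothetical, and Strings stand in for the Text writables a real test would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model (no Hive classes): a struct is a List of field values in
// declared order, and an array<struct<...>> is a List of such Lists.
class ExtractStructField {

    // Returns the value of fieldName from every struct in the array.
    static List<Object> extract(List<String> fieldNames, List<List<Object>> structs, String fieldName) {
        int pos = fieldNames.indexOf(fieldName);
        if (pos < 0) {
            throw new IllegalArgumentException("no field " + fieldName + " in struct");
        }
        if (structs == null) {
            return null; // NULL array in, NULL out
        }
        List<Object> out = new ArrayList<>(structs.size());
        for (List<Object> struct : structs) {
            out.add(struct == null ? null : struct.get(pos)); // NULL struct -> NULL value
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fieldNames = List.of("productCategory");
        // The sample row's products column: [{productCategory:footwear},{productCategory:eyewear}]
        List<List<Object>> products = List.of(
                Arrays.<Object>asList("footwear"),
                Arrays.<Object>asList("eyewear"));
        System.out.println(extract(fieldNames, products, "productCategory")); // [footwear, eyewear]
    }
}
```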

Array of Structs

2012-07-13 Thread Bhaskar, Snehalata
Hi all,

How do I create an array of structs in Hive, and how do I populate that
table with data?

Please help.

 

Thanks and regards,

Snehalata Deorukhkar

Nortel No:0229-5814
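
For the populating part, one option is loading text files. The sketch below assumes a table stored as TEXTFILE with no custom ROW FORMAT, so the default LazySimpleSerDe delimiters apply: \001 (Ctrl-A) between columns, \002 (Ctrl-B) between array elements, and \003 (Ctrl-C) between the fields of a struct nested inside an array. The table orders (id int, items array<struct<name:string,qty:int>>) and the Item class are hypothetical, and the delimiter levels are an assumption worth verifying against your Hive version before relying on them.

```java
import java.util.List;
import java.util.StringJoiner;

// Sketch: builds one text-file line for a hypothetical table
//   CREATE TABLE orders (id int, items array<struct<name:string,qty:int>>)
// Assumed default LazySimpleSerDe delimiters:
//   \001 between columns, \002 between array elements,
//   \003 between fields of a struct nested in the array.
class ArrayOfStructsRow {
    static final String FIELD_DELIM = "\u0001";
    static final String ITEM_DELIM = "\u0002";
    static final String STRUCT_FIELD_DELIM = "\u0003";

    // Hypothetical stand-in for struct<name:string,qty:int>.
    static final class Item {
        final String name;
        final int qty;

        Item(String name, int qty) {
            this.name = name;
            this.qty = qty;
        }
    }

    static String toLine(int id, List<Item> items) {
        StringJoiner array = new StringJoiner(ITEM_DELIM);
        for (Item item : items) {
            array.add(item.name + STRUCT_FIELD_DELIM + item.qty);
        }
        return id + FIELD_DELIM + array;
    }

    public static void main(String[] args) {
        String line = toLine(7, List.of(new Item("pen", 2), new Item("ink", 1)));
        // Make the control characters visible when printing.
        System.out.println(line.replace("\u0001", "^A").replace("\u0002", "^B").replace("\u0003", "^C"));
        // 7^Apen^C2^Bink^C1
    }
}
```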

 

