Question on converting an array of structs into a map
I have data in two tables that looks like this:

Table_1
    Sequence_num int,
    User_id string

Table_2
    Sequence_num int,
    User_attributes array<struct<...>>

Sequence number joins the two, and it is a one-to-one and only one relationship.

Where I want to end up: a single result set that looks like:

    Sequence_num int
    User_id string
    User_attributes map<...> /* A map of keyid=valueid for each struct from the original array */

Can this be done in Hive? Is inline() the key to doing this?

--
Mark E. Sunderlin
Data Architect // AOL Platforms
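For reference, the per-row transformation being asked for (one map entry per struct in the array) can be sketched in plain Java without any Hive dependency. This is only an illustration of the logic, not a Hive answer: the `Attribute` class, its string-typed `keyid`/`valueid` fields, and the `toMap` helper are assumptions made for the example, since the element type of User_attributes is not shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StructArrayToMap {
    // Hypothetical stand-in for one struct in User_attributes.
    // Field names follow the keyid/valueid wording in the post; string types are assumed.
    public static final class Attribute {
        public final String keyid;
        public final String valueid;

        public Attribute(String keyid, String valueid) {
            this.keyid = keyid;
            this.valueid = valueid;
        }
    }

    // Build a keyid -> valueid map from a list of structs.
    // A null list yields an empty map, null elements are skipped,
    // and on a duplicate keyid the later element wins.
    public static Map<String, String> toMap(List<Attribute> attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        if (attributes == null) {
            return result;
        }
        for (Attribute a : attributes) {
            if (a != null) {
                result.put(a.keyid, a.valueid);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("color", "blue"));
        attrs.add(new Attribute("size", "L"));
        System.out.println(toMap(attrs)); // prints {color=blue, size=L}
    }
}
```

How duplicate keys should be resolved is a design choice the question does not settle; "last one wins" is used here only because it is the default behaviour of `Map.put`.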
Re: A GenericUDF Function to Extract a Field From an Array of Structs
Try changing the code in the evaluate method to something like:

    for (int i = 0; i < numElements; i++) {
        Object element = listOI.getListElement(arguments[0].get(), i);
        Object product = structOI.getStructFieldData(element, structOI.getStructFieldRef("productCategory"));
        ret.add(((PrimitiveObjectInspector)prodCatOI).getPrimitiveWritableObject(product));
    }

2013/3/29 Peter Chu:
> Sorry, the test should be following (changed extract_shas to
> extract_product_category):
> [...]
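The loop suggested in this reply walks the array, pulls one named field out of each struct, and collects the values. A dependency-free Java sketch of the same shape is given here for readers who want to try the logic outside Hive. It is an analogy only: structs are modeled as `Map<String, Object>`, which is an assumption for illustration (Hive reads rows through ObjectInspectors, not maps), and `extract` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExtractField {
    // Collect the value of one named field from every struct in the list.
    // Structs are modeled as Map<String, Object> purely for illustration.
    public static List<Object> extract(List<Map<String, Object>> structs, String fieldName) {
        if (structs == null) {
            return null; // a null array stays null
        }
        List<Object> out = new ArrayList<>(structs.size());
        for (Map<String, Object> struct : structs) {
            // A null struct contributes a null element rather than being dropped,
            // so the output keeps the same length as the input.
            out.add(struct == null ? null : struct.get(fieldName));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new HashMap<>();
        first.put("productCategory", "footwear");
        Map<String, Object> second = new HashMap<>();
        second.put("productCategory", "eyewear");

        List<Map<String, Object>> products = new ArrayList<>();
        products.add(first);
        products.add(second);

        System.out.println(extract(products, "productCategory")); // prints [footwear, eyewear]
    }
}
```

The input in `main` mirrors the sample row quoted in the thread; whether null structs should be kept or skipped is a choice the thread does not specify.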
UDF for transforming a collection of arrays into an array of structs
Hi,

I am working on ingesting some legacy data that is denormalized in Hive, somewhat like the following:

    CREATE TABLE mytable (
      order_id int,
      product_id array<...>,
      product_name array<...>,
      product_price array<...>
    )

As you can see, the product_* fields would be better represented as a struct. Is there a UDF that can take a group of arrays and, assuming they are all the same size, return an array of structs instead?

I know it's not difficult to implement using generic UDFs, but I was wondering if anybody had already done it and, if not, whether anybody is actually interested in something like that.

R.

--
Roberto Congiu - Data Engineer - OpenX
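The core of such a UDF would be a positional zip of the parallel arrays. A minimal sketch of that step in plain Java, with no Hive dependency, might look like the following. The `Product` class, the `zip` helper, and the int/string/double element types are assumptions for the example, since the array element types are not shown in the table definition above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZipArraysToStructs {
    // Hypothetical struct holding one product; element types are assumed.
    public static final class Product {
        public final int id;
        public final String name;
        public final double price;

        public Product(int id, String name, double price) {
            this.id = id;
            this.name = name;
            this.price = price;
        }

        @Override
        public String toString() {
            return "{id=" + id + ", name=" + name + ", price=" + price + "}";
        }
    }

    // Combine three parallel lists into one list of structs, element by element.
    // Rejects inputs whose lengths differ instead of silently truncating.
    public static List<Product> zip(List<Integer> ids, List<String> names, List<Double> prices) {
        if (ids.size() != names.size() || ids.size() != prices.size()) {
            throw new IllegalArgumentException("all arrays must have the same length");
        }
        List<Product> out = new ArrayList<>(ids.size());
        for (int i = 0; i < ids.size(); i++) {
            out.add(new Product(ids.get(i), names.get(i), prices.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Product> products = zip(
                Arrays.asList(1, 2),
                Arrays.asList("shoe", "hat"),
                Arrays.asList(59.5, 12.0));
        System.out.println(products);
        // prints [{id=1, name=shoe, price=59.5}, {id=2, name=hat, price=12.0}]
    }
}
```

Throwing on mismatched lengths is one option; a real UDF could instead return NULL or pad the shorter arrays, depending on how dirty the legacy data is.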
RE: A GenericUDF Function to Extract a Field From an Array of Structs
Sorry, the test should be the following (changed extract_shas to extract_product_category):

    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
    import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.testng.annotations.Test;

    import java.util.ArrayList;
    import java.util.List;

    public class TestGenericUDFExtractProductCategory
    {
        ArrayList fieldNames = new ArrayList();
        ArrayList fieldObjectInspectors = new ArrayList();

        @Test
        public void simpleTest()
            throws Exception
        {
            ListObjectInspector firstInspector = new MyListObjectInspector();

            ArrayList test = new ArrayList();
            test.add("test");

            ArrayList test2 = new ArrayList();
            test2.add(test);

            StructObjectInspector soi = ObjectInspectorFactory.getStandardStructObjectInspector(test, test2);

            fieldNames.add("productCategory");
            fieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);

            GenericUDF.DeferredObject firstDeferredObject = new MyDeferredObject(test2);

            GenericUDF extract_product_category = new GenericUDFExtractProductCategory();

            extract_product_category.initialize(new ObjectInspector[]{firstInspector});

            extract_product_category.evaluate(new DeferredObject[]{firstDeferredObject});
        }

        public class MyDeferredObject implements DeferredObject
        {
            private Object value;

            public MyDeferredObject(Object value) {
                this.value = value;
            }

            @Override
            public Object get() throws HiveException
            {
                return value;
            }
        }

        private class MyListObjectInspector implements ListObjectInspector
        {
            @Override
            public ObjectInspector getListElementObjectInspector()
            {
                return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldObjectInspectors);
            }

            @Override
            public Object getListElement(Object data, int index)
            {
                List myList = (List) data;
                if (myList == null || index > myList.size()) {
                    return null;
                }
                return myList.get(index);
            }

            @Override
            public int getListLength(Object data)
            {
                if (data == null) {
                    return -1;
                }
                return ((List) data).size();
            }

            @Override
            public List getList(Object data)
            {
                return (List) data;
            }

            @Override
            public String getTypeName()
            {
                return null; //To change body of implemented methods use File | Settings | File Templates.
            }

            @Override
            public Category getCategory()
            {
                return Category.LIST;
            }
        }
    }

From: pete@outlook.com
To: user@hive.apache.org
Subject: A GenericUDF Function to Extract a Field From an Array of Structs
Date: Thu, 28 Mar 2013 14:16:33 -0700
A GenericUDF Function to Extract a Field From an Array of Structs
I am trying to write a GenericUDF function to collect all of a specific struct field(s) within an array for each record, and return them in an array as well.

I wrote the UDF (as below), and it seems to work, but:

1) It does not work when I run it on an external table; it works fine on a managed table. Any idea why?

2) I am having a tough time writing a test for this. I have attached the test I have so far, and it does not work: I always get 'java.util.ArrayList cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector' or 'cannot cast String to LazyString'. My question is: how do I supply a list of structs to the evaluate method?

Any help will be greatly appreciated.

Thanks,
Peter

The table:

    CREATE EXTERNAL TABLE FOO (
      TS string,
      customerId string,
      products array< struct<...> >
    )
    PARTITIONED BY (ds string)
    ROW FORMAT SERDE 'some.serde'
    WITH SERDEPROPERTIES ('error.ignore'='true')
    LOCATION 'some_locations'
    ;

A row of record holds:

    1340321132000, 'some_company', [{"productCategory":"footwear"},{"productCategory":"eyewear"}]

This is my code:

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.lazy.LazyString;
    import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.StructField;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
    import org.apache.hadoop.io.Text;

    import java.util.ArrayList;

    @Description(name = "extract_product_category",
        value = "_FUNC_( array< struct > ) - Collect all product category field values inside an array of struct(s), and return the results in an array",
        extended = "Example:\n SELECT _FUNC_(array_of_structs_with_product_category_field)")
    public class GenericUDFExtractProductCategory
        extends GenericUDF
    {
        private ArrayList ret;

        private ListObjectInspector listOI;
        private StructObjectInspector structOI;
        private ObjectInspector prodCatOI;

        @Override
        public ObjectInspector initialize(ObjectInspector[] args)
            throws UDFArgumentException
        {
            if (args.length != 1) {
                throw new UDFArgumentLengthException("The function extract_product_category() requires exactly one argument.");
            }

            if (args[0].getCategory() != Category.LIST) {
                throw new UDFArgumentTypeException(0, "Type array is expected to be the argument for extract_product_category but " + args[0].getTypeName() + " is found instead");
            }

            listOI = ((ListObjectInspector) args[0]);
            structOI = ((StructObjectInspector) listOI.getListElementObjectInspector());

            if (structOI.getAllStructFieldRefs().size() != 1) {
                throw new UDFArgumentTypeException(0, "Incorrect number of fields in the struct, should be one");
            }

            StructField productCategoryField = structOI.getStructFieldRef("productCategory");
            //If not, throw exception
            if (productCategoryField == null) {
                throw new UDFArgumentTypeException(0, "NO \"productCategory\" field in input structure");
            }

            //Are they of the correct types?
            //We store these object inspectors for use in the evaluate() method
            prodCatOI = productCategoryField.getFieldObjectInspector();

            //First are they primitives
            if (prodCatOI.getCategory() != Category.PRIMITIVE) {
                throw new UDFArgumentTypeException(0, "productCategory field must be of string type");
            }

            //Are they of the correct primitives?
            if (((PrimitiveObjectInspector)prodCatOI).getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
                throw new UDFArgumentTypeException(0, "productCategory field must be of string type");
            }

            ret = new ArrayList();

            return ObjectInspectorFactory.getStandardListObjectInspector(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        }

        @Override
        public ArrayList evaluate(DeferredObject[] arguments)
            throws HiveException
        {
            ret.clear();

            if (arguments.length != 1) {
                return null;
            }

            if (arguments[0].get() == null
Array of Structs
Hi all,

How do I create an array of structs in Hive? And how do I populate that table with data? Please help.

Thanks and regards,
Snehalata Deorukhkar