[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-10-07 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918795#action_12918795
 ] 

Amareshwari Sriramadasu commented on HIVE-537:
--

bq. Constants.java is a generated file? Can you change serde/if/serde.thrift
After adding the constant to serde/if/serde.thrift, do I need to regenerate the 
Java file? If yes, how should I do it?

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Fix For: 0.7.0

 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537-2.txt, 
 patch-537-3.txt, patch-537-4.txt, patch-537.txt


 There are already some cases in the code where we use heterogeneous data: 
 JoinOperator and UnionOperator (in the sense that different parents can pass 
 in records with different ObjectInspectors).
 We currently use the Operator's parentID to distinguish them. However, that 
 approach does not extend to more complex plans that might be needed in the 
 future.
 We will support the union type like this:
 {code}
 TypeDefinition:
   type: primitivetype | structtype | arraytype | maptype | uniontype
   uniontype: "union" "<" tag ":" type ("," tag ":" type)* ">"
 Example:
   union<0:int,1:double,2:array<string>,3:struct<a:int,b:string>>
 Example of serialized data format:
   We will first store the tag byte before we serialize the object. On 
 deserialization, we will first read out the tag byte; then we know the 
 type of the following object, so we can deserialize it successfully.
 Interface for ObjectInspector:
 interface UnionObjectInspector {
   /** Returns the array of OIs, one for each of the tags. */
   ObjectInspector[] getObjectInspectors();
   /** Returns the tag of the object. */
   byte getTag(Object o);
   /** Returns the field based on the tag value associated with the object. */
   Object getField(Object o);
 };
 An example serialization format (using a delimited format, with ' ' as the 
 first-level delimiter and '=' as the second-level delimiter):
 userid:int,log:union<0:struct<touserid:int,message:string>,1:string>
 123 1=login
 123 0=243=helloworld
 123 1=logout
 {code}
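
The tag-first layout in the delimited example above can be sketched as follows. This is only an illustration under stated assumptions: the class UnionLineSketch and its methods are hypothetical and not part of Hive, and the union field is assumed to be split at its first '=' into a tag and a payload, exactly as the three sample rows are written.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Hive: decodes the simplified delimited example. */
final class UnionLineSketch {

    /** Holds the tag and the still-serialized payload of one union value. */
    static final class TaggedValue {
        final byte tag;
        final String payload;

        TaggedValue(byte tag, String payload) {
            this.tag = tag;
            this.payload = payload;
        }
    }

    /** Reads the tag first, splitting at the first '=' only so the payload may contain '='. */
    static TaggedValue parseUnion(String field) {
        int sep = field.indexOf('=');
        if (sep < 0) {
            throw new IllegalArgumentException("missing tag: " + field);
        }
        byte tag = Byte.parseByte(field.substring(0, sep));
        return new TaggedValue(tag, field.substring(sep + 1));
    }

    /** Decodes the payload by tag: 0 is struct<touserid,message>, 1 is a plain string. */
    static List<String> decode(TaggedValue v) {
        switch (v.tag) {
            case 0:
                return Arrays.asList(v.payload.split("=", 2));
            case 1:
                return Arrays.asList(v.payload);
            default:
                throw new IllegalArgumentException("unknown tag " + v.tag);
        }
    }

    public static void main(String[] args) {
        String[] rows = {"123 1=login", "123 0=243=helloworld", "123 1=logout"};
        for (String row : rows) {
            String[] cols = row.split(" ", 2); // first-level delimiter is ' '
            TaggedValue v = parseUnion(cols[1]);
            System.out.println("userid=" + cols[0] + " tag=" + v.tag + " fields=" + decode(v));
        }
    }
}
```

Because the tag is read before anything else, the reader knows which of the declared types to apply to the rest of the field, which is the point of storing the tag byte first.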

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-10-06 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918740#action_12918740
 ] 

Namit Jain commented on HIVE-537:
-

Otherwise it looks good to me

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Fix For: 0.7.0

 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537-2.txt, 
 patch-537-3.txt, patch-537-4.txt, patch-537.txt





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-09-22 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913475#action_12913475
 ] 

Amareshwari Sriramadasu commented on HIVE-537:
--

Zheng, can you give an example usage of the union type as a UDF? I looked at the 
Struct, Map and Array UDFs, but Union is quite different from them because it 
holds only one object at any point in time.

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-09-22 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913670#action_12913670
 ] 

Zheng Shao commented on HIVE-537:
-

{code}
union<T0,T1,T2> create_union(byte tag, T0 o0, T1 o1, T2 o2, ...)
A real example:
union<School,Company> create_union(is_student ? 0 : 1, school, company)
{code}

Depending on the value of the tag, the returned union object will choose to 
store only the object corresponding to that tag.
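
The selection behaviour described above can be sketched in Java as follows. The names CreateUnionSketch, TaggedUnion and createUnion are hypothetical and chosen only for illustration; this is not the Hive UDF itself, just a sketch of "keep only the object that matches the tag".

```java
/** Hypothetical illustration of the create_union semantics; not Hive's implementation. */
final class CreateUnionSketch {

    /** A union value: the tag plus the single object selected by that tag. */
    static final class TaggedUnion {
        final byte tag;
        final Object value;

        TaggedUnion(byte tag, Object value) {
            this.tag = tag;
            this.value = value;
        }
    }

    /** Keeps only candidates[tag]; all other candidates are dropped. */
    static TaggedUnion createUnion(byte tag, Object... candidates) {
        if (tag < 0 || tag >= candidates.length) {
            throw new IllegalArgumentException("tag out of range: " + tag);
        }
        return new TaggedUnion(tag, candidates[tag]);
    }

    public static void main(String[] args) {
        boolean isStudent = false;
        TaggedUnion u = createUnion((byte) (isStudent ? 0 : 1), "some school", "some company");
        System.out.println("tag=" + u.tag + " value=" + u.value);
    }
}
```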


 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-09-20 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912420#action_12912420
 ] 

HBase Review Board commented on HIVE-537:
-

Message from: Amareshwari Sriramadasu amar...@yahoo-inc.com


bq.  On 2010-09-15 15:15:08, Zheng Shao wrote:
bq.   Overall looks like a good first step.  We need to change Hive.g, add UDF 
etc to allow users to use it in the Hive language.

Zheng, there is already a keyword (KW_UNION: 'UNION') used for union/union all 
operations. Do you think we should use a different keyword for specifying the 
Union type?


- Amareshwari


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/795/#review1231
---





 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-09-20 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912614#action_12912614
 ] 

Zheng Shao commented on HIVE-537:
-

I think so. Let's use a different name for the UDF.

Using 'UNION' as UDF name will not cause grammar ambiguity, but it may cause 
other issues in the future.

Zheng


 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-09-15 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909919#action_12909919
 ] 

HBase Review Board commented on HIVE-537:
-

Message from: Zheng Shao zsh...@gmail.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/795/#review1231
---


Overall looks like a good first step.  We need to change Hive.g, add UDF etc to 
allow users to use it in the Hive language.


trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java
http://review.cloudera.org/r/795/#comment4192

unioin -> union



trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java
http://review.cloudera.org/r/795/#comment4193

We cannot compare 2 union objects like this.  We need to first compare 
their TAG.  Only when the TAG is the same shall we compare the field.
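
A minimal sketch of the comparison order described in this review comment: tags first, fields only when the tags match. The class UnionCompareSketch and its TaggedUnion type are hypothetical. A real implementation would presumably compare the fields through the ObjectInspector registered for that tag; plain Comparable is used here only to keep the example self-contained.

```java
/** Hypothetical sketch of tag-first comparison; not the code in ObjectInspectorUtils. */
final class UnionCompareSketch {

    /** A union value whose payload is assumed to be Comparable for this illustration. */
    static final class TaggedUnion {
        final byte tag;
        final Comparable<?> value;

        TaggedUnion(byte tag, Comparable<?> value) {
            this.tag = tag;
            this.value = value;
        }
    }

    /** Orders by tag; only when the tags are equal are the fields themselves compared. */
    static int compare(TaggedUnion a, TaggedUnion b) {
        int byTag = Byte.compare(a.tag, b.tag);
        if (byTag != 0) {
            return byTag; // different tags mean different types, so fields are not comparable
        }
        // Same tag implies the same declared type, so this unchecked cast is assumed safe.
        @SuppressWarnings("unchecked")
        Comparable<Object> left = (Comparable<Object>) a.value;
        return left.compareTo(b.value);
    }

    public static void main(String[] args) {
        TaggedUnion x = new TaggedUnion((byte) 0, Integer.valueOf(3));
        TaggedUnion y = new TaggedUnion((byte) 0, Integer.valueOf(7));
        TaggedUnion z = new TaggedUnion((byte) 1, "login");
        System.out.println(compare(x, y)); // same tag: compares 3 with 7
        System.out.println(compare(x, z)); // different tags: decided by the tag alone
    }
}
```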


- Zheng





 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-09-07 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906759#action_12906759
 ] 

HBase Review Board commented on HIVE-537:
-

Message from: Amareshwari Sriramadasu amar...@yahoo-inc.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/795/
---

Review request for Hive Developers.


Summary
---

Adds the Union type to Standard ObjectInspectors, TypeInfo and Lazy 
ObjectInspectors.


This addresses bug HIVE-537.
http://issues.apache.org/jira/browse/HIVE-537


Diffs
-

  trunk/serde/src/gen-java/org/apache/hadoop/hive/serde/Constants.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/SerDeUtils.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUnion.java PRE-CREATION
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyObjectInspectorFactory.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/objectinspector/LazyUnionObjectInspector.java PRE-CREATION
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspector.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardUnionObjectInspector.java PRE-CREATION
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/UnionObject.java PRE-CREATION
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/UnionObjectInspector.java PRE-CREATION
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfo.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoFactory.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java 991812
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/UnionTypeInfo.java PRE-CREATION
  trunk/serde/src/test/org/apache/hadoop/hive/serde2/lazy/TestLazyArrayMapStruct.java 991812
  trunk/serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestStandardObjectInspectors.java 991812

Diff: http://review.cloudera.org/r/795/diff


Testing
---


Thanks,

Amareshwari




 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Amareshwari Sriramadasu
 Attachments: HIVE-537.1.patch, patch-537-1.txt, patch-537.txt





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2010-08-30 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904099#action_12904099
 ] 

Amareshwari Sriramadasu commented on HIVE-537:
--

Min, any update on the patch?

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Min Zhou
 Attachments: HIVE-537.1.patch





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-07-07 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12727999#action_12727999
 ] 

Min Zhou commented on HIVE-537:
---

Zheng, how would you get the field value from an object without an ordinal?


 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Min Zhou
 Attachments: HIVE-537.1.patch





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-07-06 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12727846#action_12727846
 ] 

Zheng Shao commented on HIVE-537:
-

@HIVE-537.1.patch:
1. Can you remove the property changes? These java files don't need to be 
executable:
Property changes on: 
src/java/org/apache/hadoop/hive/serde2/objectinspector/StandardUnionObjectInspector.java
___
Name: svn:executable
   + *
2. UnionObjectInspector.java: byte getTag(Object o, int ordinal);
We don't need ordinal here.
3. Can you add union to TypeInfoUtils.java: class TypeInfoParser as well?
4. We need some test cases. Please take a look at 
TestStandardObjectInspectors.java
5. We need to add the capability of serializing/deserializing Union types to 
LazySimpleSerDe.


 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Min Zhou
 Attachments: HIVE-537.1.patch





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-06-30 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725532#action_12725532
 ] 

Min Zhou commented on HIVE-537:
---

Even if UnionObjectInspector has been implemented, DynamicSerDe does not seem 
to support a schema with a union type, which Thrift cannot recognize.
We must find a way to solve this. Any suggestions?

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Zheng Shao





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-06-27 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724916#action_12724916
 ] 

Min Zhou commented on HIVE-537:
---

We ran a test on this issue with a dataset of 700m records.

With the first approach, each distinct count takes 119 seconds, which means 
that 10 distinct counts need at least 1190 seconds.
With the second approach, where distinct keys are distinguished by a tag, 10 
distinct counts need 148 seconds.

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Zheng Shao





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-06-11 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718373#action_12718373
 ] 

Min Zhou commented on HIVE-537:
---

First approach:
  O(mN/p) + O(m(N/p log (N/p))) + O(mN/r) + O(m)
I don't agree with you about this O(m); it would indeed be a very large cost. 
You should also add the cost of joining all the results into one at the end.

For the second approach, I think it should be
  O(N/p) + O(mN/p log (mN/p)) + O(mN/r)

 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Zheng Shao





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-06-03 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715878#action_12715878
 ] 

Zheng Shao commented on HIVE-537:
-

An example usage is multiple distincts. Min Zhou talked with me offline and 
has shown that doing multiple distincts in a single map-reduce job can be much 
faster than doing them separately and then joining the results.

{code}
Query:
  select a, count(distinct b), count(distinct c), sum(d)

Plan:
  Map side:
    Emit: distribution_key: a, sort_key: a, 0, b, value: d
    Emit: distribution_key: a, sort_key: a, 1, c, value: nothing
  Reduce side:
    Group By:
      a, 0, count(distinct b), sum(d)
      a, 1, count(distinct c)
    Flatten:
      a, count(distinct b), sum(d), count(distinct c)
{code}
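
The plan above can be simulated in memory to see how the tag keeps the two distinct expressions apart within one sort. This is only a sketch under stated assumptions: the class MultiDistinctDemo, the Emitted record type, and the String[] row layout {a, b, c, d} are hypothetical, and a plain in-memory sort stands in for the map-reduce shuffle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MultiDistinctDemo {
    /** Hypothetical shuffle record: sort key (a, tag, distinctValue) plus the value d. */
    static final class Emitted {
        final String a;
        final int tag;
        final String distinctValue;
        final long d;

        Emitted(String a, int tag, String distinctValue, long d) {
            this.a = a;
            this.tag = tag;
            this.distinctValue = distinctValue;
            this.d = d;
        }
    }

    /** Rows are {a, b, c, d}; returns a -> {count(distinct b), count(distinct c), sum(d)}. */
    static Map<String, long[]> run(String[][] rows) {
        // Map side: one record per distinct expression, so two records per input row.
        List<Emitted> shuffle = new ArrayList<>();
        for (String[] r : rows) {
            shuffle.add(new Emitted(r[0], 0, r[1], Long.parseLong(r[3]))); // tag 0 carries d
            shuffle.add(new Emitted(r[0], 1, r[2], 0L));                   // tag 1 carries nothing
        }
        // Shuffle: sort by (a, tag, distinct value) so that equal values become adjacent.
        shuffle.sort(Comparator.comparing((Emitted e) -> e.a)
                .thenComparingInt(e -> e.tag)
                .thenComparing(e -> e.distinctValue));
        // Reduce side: a new distinct value shows up as a change from the previous sort key.
        Map<String, long[]> result = new LinkedHashMap<>();
        Emitted prev = null;
        for (Emitted e : shuffle) {
            long[] agg = result.computeIfAbsent(e.a, k -> new long[3]);
            boolean sameKey = prev != null && prev.a.equals(e.a) && prev.tag == e.tag
                    && prev.distinctValue.equals(e.distinctValue);
            if (!sameKey) {
                agg[e.tag]++;    // index 0: distinct b, index 1: distinct c
            }
            if (e.tag == 0) {
                agg[2] += e.d;   // sum(d) is accumulated from the tag-0 records only
            }
            prev = e;
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"x", "1", "p", "10"}, {"x", "1", "q", "5"}, {"x", "2", "p", "1"}, {"y", "3", "r", "7"}
        };
        run(rows).forEach((a, agg) ->
                System.out.println(a + " " + agg[0] + " " + agg[1] + " " + agg[2]));
    }
}
```

Note that the shuffle list holds two records per input row, which is the row multiplication that the cost discussion in this thread is concerned with.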


 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao





[jira] Commented: (HIVE-537) Hive TypeInfo/ObjectInspector to support union (besides struct, array, and map)

2009-06-03 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716015#action_12716015
 ] 

Ashish Thusoo commented on HIVE-537:


One thing that you need to be careful about is that you will be increasing the 
number of rows crossing the map/reduce boundary, which, if there are a lot of 
distincts, can lead to data explosion and a subsequent slowdown in the sort.

By that I mean the following:

Suppose we have a query with m different distincts, a base table with N rows, 
p mappers, and r reducers.
By doing multiple map/reduce jobs, the predominant terms in our complexity are

O(mN/p) + O(m(N/p log (N/p))) + O(mN/r) + O(m)

i.e.
map side scan + map side sort + reduce side merge + fixed cost of starting the 
map/reduce jobs.

Now, with the single-job approach, the corresponding formula will be

O(mN/p) + O(mN/p log (mN/p)) + O(mN/r)
=
O(mN/p) + O(mN/p log (N/p)) + O(mN/p log m) + O(mN/r)

There may be situations where one is better than the other... Something to keep 
in mind.
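
To get a feel for when each approach wins, the two formulas can be evaluated side by side. The sketch below is only a toy model: it assumes every hidden constant is 1, uses base-2 logarithms, and adds a hypothetical jobStartCost parameter for the fixed per-job cost behind the O(m) term. The class name DistinctCostModel is likewise made up for illustration.

```java
final class DistinctCostModel {
    private static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** m separate jobs: mN/p + m(N/p log(N/p)) + mN/r + m * jobStartCost. */
    static double separateJobs(double m, double n, double p, double r, double jobStartCost) {
        double rowsPerMapper = n / p;
        return m * rowsPerMapper                          // map side scan, once per job
                + m * rowsPerMapper * log2(rowsPerMapper) // map side sort, once per job
                + m * n / r                               // reduce side merge
                + m * jobStartCost;                       // fixed cost of starting each job
    }

    /** One job with tagged rows: mN/p + mN/p log(mN/p) + mN/r. */
    static double singleJob(double m, double n, double p, double r) {
        double rowsPerMapper = m * n / p;                 // each input row is emitted m times
        return rowsPerMapper
                + rowsPerMapper * log2(rowsPerMapper)
                + m * n / r;
    }
}
```

In this model the difference singleJob - separateJobs reduces to (mN/p) log m - m * jobStartCost, so the single job comes out ahead only when the fixed start-up cost per job exceeds (N/p) log m in the same units. Real constants will differ, so this shows the shape of the trade-off rather than a prediction.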


 Hive TypeInfo/ObjectInspector to support union (besides struct, array, and 
 map)
 ---

 Key: HIVE-537
 URL: https://issues.apache.org/jira/browse/HIVE-537
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao

