[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde

2015-08-10 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680395#comment-14680395
 ] 

Anthony Hsu commented on HIVE-4734:
---

Any updates on this patch? I'd love to see this committed, too! :-)

> Use custom ObjectInspectors for AvroSerde
> -
>
> Key: HIVE-4734
> URL: https://issues.apache.org/jira/browse/HIVE-4734
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Mark Wagner
>Assignee: Mark Wagner
>  Labels: Avro, AvroSerde, Performance
> Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch, HIVE-4734.3.patch, 
> HIVE-4734.4.patch, HIVE-4734.5.patch
>
>
> Currently, the AvroSerde recursively copies all fields of a record from the 
> GenericRecord to a List row object and provides the standard 
> ObjectInspectors. Performance can be improved by providing ObjectInspectors 
> to the Avro record itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde

2015-07-08 Thread Ratandeep Ratti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618187#comment-14618187
 ] 

Ratandeep Ratti commented on HIVE-4734:
---

I found some minor issues with the patch. Adding notes below.

In AvroObjectInspectorGenerator.java, we use a {{JavaStringObjectInspector}} 
for the keys when constructing an {{AvroMapObjectInspector}}:

{code}
case MAP:
  MapTypeInfo mti = (MapTypeInfo) ti;
  result = new AvroMapObjectInspector(
      PrimitiveObjectInspectorFactory.getPrimitiveJavaObjectInspector(
          PrimitiveObjectInspector.PrimitiveCategory.STRING),
      createObjectInspectorWorker(mti.getMapValueTypeInfo(), subSchema.getValueType()));
  break;
{code}

However, an Avro string can be either {{Utf8}} or {{java.lang.String}}; 
{{Utf8}} is the default.

Also, {{AvroStringObjectInspector}} assumes that Avro strings are always 
{{Utf8}}, which may not be the case: they can be either {{Utf8}} or 
{{String}}.

{{AvroMapObjectInspector}} also assumes that map keys will always be 
{{String}}s. Further, the methods below in MapObjectInspector assume that the 
second argument ('key') passed to them is a {{String}}, whereas it could be 
either {{Utf8}} or {{String}}:

{code}
@Override
public Object getMapValueElement(Object data, Object key) {
  Utf8 utf8key = new Utf8((String) key);
  return super.getMapValueElement(data, utf8key);
}

@Override
public Object put(Object map, Object key, Object value) {
  Utf8 utf8key = new Utf8((String) key);
  return super.put(map, utf8key, value);
}

@Override
public Object remove(Object map, Object key) {
  Utf8 utf8key = new Utf8((String) key);
  return super.remove(map, utf8key);
}
{code}
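
One possible direction, as a rough sketch only and not taken from the patch: since both {{Utf8}} and {{String}} implement {{CharSequence}}, the key could be normalized through {{CharSequence}} instead of being cast to {{String}}. The class and method names below ({{CharSequenceKeySketch}}, {{normalizeKey}}) are hypothetical, and {{StringBuilder}} stands in for {{Utf8}} so that the example runs without Avro on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not part of the HIVE-4734 patch.
final class CharSequenceKeySketch {

    // Converts any CharSequence key (String, or Avro's Utf8 in the real code)
    // to a String. Assumption: every map key is a CharSequence.
    static String normalizeKey(Object key) {
        if (key == null) {
            return null;
        }
        if (key instanceof CharSequence) {
            return key.toString();
        }
        throw new IllegalArgumentException(
            "Unsupported map key type: " + key.getClass().getName());
    }

    // Looks up a value by comparing normalized keys, so the caller's key and
    // the stored keys may each be either representation.
    // Note: this is a linear scan, which trades speed for tolerance.
    static Object getMapValueElement(Map<?, ?> data, Object key) {
        if (data == null || key == null) {
            return null;
        }
        String wanted = normalizeKey(key);
        for (Map.Entry<?, ?> entry : data.entrySet()) {
            if (wanted.equals(normalizeKey(entry.getKey()))) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<Object, Object> map = new LinkedHashMap<>();
        map.put(new StringBuilder("a"), 1); // stand-in for a Utf8 key
        map.put("b", 2);                    // plain String key

        System.out.println(getMapValueElement(map, "a"));                    // prints 1
        System.out.println(getMapValueElement(map, new StringBuilder("b"))); // prints 2
    }
}
```

The linear scan is only there to keep the sketch independent of which key representation the underlying map holds; an actual fix would more likely decide the key type once, from the schema, and convert the lookup key to match it.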

