I don't know enough about the serdes to say whether that's a problem. Maybe 
someone else does? It seems like it might work, as long as the JSON form never 
contains the field delimiter unescaped.
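A minimal Java sketch of that condition, assuming tab is the field delimiter. DelimiterCheck and its methods are hypothetical illustrations, not Hive code. A map value that contains a raw tab splits the row into the wrong number of fields, while the JSON-escaped form keeps one field per column.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not Hive code: a row can be split on its field
// delimiter only if no embedded JSON value contains that delimiter raw.
final class DelimiterCheck {

    // Number of fields obtained by splitting on the delimiter; the -1 limit
    // keeps empty trailing fields so nothing is silently dropped.
    static int fieldCount(String row, char delimiter) {
        return row.split(Pattern.quote(String.valueOf(delimiter)), -1).length;
    }

    // JSON-style escaping for the characters that matter here: quotes,
    // backslashes and the tab delimiter itself.
    static String escapeJsonString(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "b\tc"; // a map value that happens to contain a tab
        String raw = "{\"a\":\"" + value + "\"}\t123\tabc";
        String escaped = "{\"a\":\"" + escapeJsonString(value) + "\"}\t123\tabc";
        System.out.println(fieldCount(raw, '\t'));     // 4: the row is misparsed
        System.out.println(fieldCount(escaped, '\t')); // 3: one field per column
    }
}
```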

JVS

On Aug 26, 2010, at 6:29 PM, Steven Wong wrote:

That sounds like it’ll work, at least conceptually. But if the row contains 
primitive and non-primitive columns, the row serialization will be a mix of 
non-JSON and JSON serializations, right? Is that a good thing?


From: John Sichi [mailto:jsi...@facebook.com]
Sent: Thursday, August 26, 2010 12:11 PM
To: Steven Wong
Cc: Zheng Shao; hive-dev@hadoop.apache.org; Jerome Boulon
Subject: Re: Deserializing map column via JDBC (HIVE-1378)

If you replace DynamicSerDe with LazySimpleSerDe on the JDBC client side, can't 
you then tell it to expect JSON serialization for the maps?  That way you can 
leave the FetchTask server side as is.
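A rough sketch of the client-side work this suggestion implies, assuming fetchOne returns tab-separated fields and the map column arrives as a flat JSON object with string keys and values. JsonRowReader and its methods are hypothetical; this is neither LazySimpleSerDe nor the Hive JDBC driver, and it only handles a few JSON escapes (no unicode escapes, no nested values).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side sketch: split a tab-delimited row, then decode a
// map column serialized as a flat JSON object of string keys and values.
final class JsonRowReader {

    // Splits on tab; the -1 limit keeps empty trailing fields.
    static List<String> splitRow(String row) {
        return Arrays.asList(row.split("\t", -1));
    }

    // Parses {"k":"v",...} into an insertion-ordered map.
    static Map<String, String> parseFlatJsonObject(String json) {
        Map<String, String> result = new LinkedHashMap<>();
        int[] pos = {skipWs(json, 0)};
        expect(json, pos, '{');
        pos[0] = skipWs(json, pos[0]);
        if (json.charAt(pos[0]) == '}') {
            return result; // empty map
        }
        while (true) {
            String key = readString(json, pos);
            pos[0] = skipWs(json, pos[0]);
            expect(json, pos, ':');
            pos[0] = skipWs(json, pos[0]);
            result.put(key, readString(json, pos));
            pos[0] = skipWs(json, pos[0]);
            char c = json.charAt(pos[0]++);
            if (c == '}') {
                return result;
            }
            if (c != ',') {
                throw new IllegalArgumentException("Expected ',' or '}' at " + (pos[0] - 1));
            }
            pos[0] = skipWs(json, pos[0]);
        }
    }

    private static int skipWs(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        return i;
    }

    private static void expect(String s, int[] pos, char c) {
        if (pos[0] >= s.length() || s.charAt(pos[0]) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at " + pos[0]);
        }
        pos[0]++;
    }

    // Reads a quoted JSON string starting at pos, decoding simple escapes.
    private static String readString(String s, int[] pos) {
        expect(s, pos, '"');
        StringBuilder sb = new StringBuilder();
        while (true) {
            char c = s.charAt(pos[0]++);
            if (c == '"') return sb.toString();
            if (c != '\\') { sb.append(c); continue; }
            char e = s.charAt(pos[0]++);
            switch (e) {
                case 't':  sb.append('\t'); break;
                case 'n':  sb.append('\n'); break;
                case '"':  sb.append('"');  break;
                case '\\': sb.append('\\'); break;
                default: throw new IllegalArgumentException("Unsupported escape: " + e);
            }
        }
    }
}
```

Hand-parsing keeps the sketch dependency-free; a real client would more likely reuse a serde or a JSON library.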

JVS

On Aug 24, 2010, at 2:50 PM, Steven Wong wrote:


I got sidetracked for a while....

Looking at client.fetchOne, it is a call to the Hive server, which shows the 
following call stack:

SerDeUtils.getJSONString(Object, ObjectInspector) line: 205
LazySimpleSerDe.serialize(Object, ObjectInspector) line: 420
FetchTask.fetch(ArrayList<String>) line: 130
Driver.getResults(ArrayList<String>) line: 660
HiveServer$HiveServerHandler.fetchOne() line: 238

In other words, FetchTask.mSerde (an instance of LazySimpleSerDe) serializes 
the map column into JSON strings. It’s because FetchTask.mSerde has been 
initialized by FetchTask.initialize to do it that way.
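A rough sketch of the row format this produces, assuming a tab field delimiter. FetchRowSketch is hypothetical and only mirrors the observed behavior; it is not FetchTask, LazySimpleSerDe, or SerDeUtils.getJSONString. It also shows the mix of plain-text and JSON fields within one row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the row format described above. Map columns become
// JSON, primitive columns become plain text, and fields are joined with a
// tab (assumed here to be the field delimiter).
final class FetchRowSketch {

    // Quotes a string for JSON, escaping backslashes first so later
    // replacements are not double-escaped.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\t", "\\t") + "\"";
    }

    static String mapToJson(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 1) sb.append(','); // comma before every entry but the first
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    @SuppressWarnings("unchecked")
    static String serializeRow(List<Object> row) {
        List<String> fields = new ArrayList<>();
        for (Object col : row) {
            fields.add(col instanceof Map
                    ? mapToJson((Map<String, String>) col) // non-primitive: JSON
                    : String.valueOf(col));               // primitive: plain text
        }
        return String.join("\t", fields);
    }

    public static void main(String[] args) {
        Map<String, String> mapcol = new LinkedHashMap<>();
        mapcol.put("a", "b");
        mapcol.put("x", "y");
        System.out.println(serializeRow(Arrays.<Object>asList(mapcol, 123L, "abc")));
    }
}
```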

It appears that the fix is to initialize FetchTask.mSerde differently to do 
ctrl-serialization instead – presumably for the JDBC use case only and not for 
other use cases of FetchTask. Further, it appears that FetchTask.mSerde will do 
ctrl-serialization if it is initialized (via the properties “columns” and 
“columns.types”) with the proper schema.
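A sketch of what that initialization and its ctrl-separated output could look like. Only the property names "columns" and "columns.types" come from the message above. The column names are taken from the example query later in the thread, and the colon-separated type list and the default separators ^A (fields), ^B (collection items) and ^C (map keys) are assumptions about LazySimpleSerDe's conventions. CtrlRowSketch is a hypothetical stand-in that does no escaping.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for a ctrl-separated serializer, not LazySimpleSerDe.
final class CtrlRowSketch {
    static final char FIELD_SEP = '\u0001'; // ^A between columns
    static final char ITEM_SEP = '\u0002';  // ^B between map entries
    static final char KEY_SEP = '\u0003';   // ^C between a key and its value

    // Schema properties of the kind a serde would be initialized with.
    static Properties schemaProperties() {
        Properties props = new Properties();
        props.setProperty("columns", "mapcol,bigintcol,stringcol");
        props.setProperty("columns.types", "map<string,string>:bigint:string");
        return props;
    }

    @SuppressWarnings("unchecked")
    static String serializeRow(List<Object> row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) sb.append(FIELD_SEP);
            Object col = row.get(i);
            if (col instanceof Map) {
                boolean first = true;
                for (Map.Entry<String, String> e : ((Map<String, String>) col).entrySet()) {
                    if (!first) sb.append(ITEM_SEP);
                    first = false;
                    sb.append(e.getKey()).append(KEY_SEP).append(e.getValue());
                }
            } else {
                sb.append(col); // primitives are written as plain text
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mapcol = new LinkedHashMap<>();
        mapcol.put("a", "b");
        mapcol.put("x", "y");
        String row = serializeRow(Arrays.<Object>asList(mapcol, 123L, "abc"));
        // Swap the control characters for visible ones before printing.
        System.out.println(row.replace(FIELD_SEP, '|').replace(ITEM_SEP, ',').replace(KEY_SEP, '='));
        // prints a=b,x=y|123|abc
    }
}
```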

Are these right? Pointers on how to get the proper schema? (From 
FetchTask.work?) And on how to restrict the change to JDBC only? (I have no 
idea.)

For symmetry, LazySimpleSerDe should be used to do ctrl-deserialization on the 
client side, per Zheng’s suggestion.
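The client-side counterpart can be sketched the same way, again assuming the default ^A/^B/^C separators, no escaping, and the schema map<string,string>, bigint, string. CtrlRowReader is hypothetical and is not the Hive JDBC driver or LazySimpleSerDe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: deserialize a ctrl-separated row for the schema
// (map<string,string>, bigint, string).
final class CtrlRowReader {
    private static final String FIELD_SEP = "\u0001"; // ^A
    private static final String ITEM_SEP = "\u0002";  // ^B
    private static final String KEY_SEP = "\u0003";   // ^C

    static Map<String, String> parseMap(String field) {
        Map<String, String> map = new LinkedHashMap<>();
        if (field.isEmpty()) return map; // an empty field is an empty map
        for (String item : field.split(Pattern.quote(ITEM_SEP), -1)) {
            String[] kv = item.split(Pattern.quote(KEY_SEP), 2);
            map.put(kv[0], kv.length > 1 ? kv[1] : null);
        }
        return map;
    }

    static List<Object> parseRow(String row) {
        String[] f = row.split(Pattern.quote(FIELD_SEP), -1);
        if (f.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields, got " + f.length);
        }
        List<Object> out = new ArrayList<>();
        out.add(parseMap(f[0]));         // mapcol
        out.add(Long.parseLong(f[1]));   // bigintcol
        out.add(f[2]);                   // stringcol
        return out;
    }
}
```

Because both sides agree on the same separators, the map survives the round trip without any JSON parsing on the client.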

Steven


From: Zheng Shao [mailto:zs...@facebook.com]
Sent: Monday, August 16, 2010 3:57 PM
To: Steven Wong; hive-dev@hadoop.apache.org
Cc: Jerome Boulon
Subject: RE: Deserializing map column via JDBC (HIVE-1378)

I think the call to client.fetchOne should use delimited format, so that 
DynamicSerDe can deserialize it.
This should be a good short-term fix.

Also, on a higher level, DynamicSerDe is deprecated. It would be great to use 
LazySimpleSerDe to handle all serialization and deserialization instead.

Zheng

From: Steven Wong [mailto:sw...@netflix.com]
Sent: Friday, August 13, 2010 7:02 PM
To: Zheng Shao; hive-dev@hadoop.apache.org
Cc: Jerome Boulon
Subject: Deserializing map column via JDBC (HIVE-1378)

I'm trying to work on HIVE-1378. My first step is to get the Hive JDBC driver to 
return actual values for mapcol in the result set of “select mapcol, bigintcol, 
stringcol from foo”, where mapcol is a map<string,string> column, instead of 
the current behavior of complaining that mapcol’s column type is not recognized.
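A hypothetical sketch of recognizing the map type when reporting column metadata. HiveTypeMapping is not HiveResultSetMetaData, and reporting map columns as java.sql.Types.JAVA_OBJECT is purely an assumption for illustration; the thread does not say which JDBC type was chosen.

```java
import java.sql.Types;

// Hypothetical helper: map a Hive column type name to a java.sql.Types code.
final class HiveTypeMapping {
    static int sqlType(String hiveType) {
        String t = hiveType.trim().toLowerCase();
        // Parameterized map types such as map<string,string> are matched by prefix.
        if (t.startsWith("map<")) return Types.JAVA_OBJECT; // assumed mapping
        switch (t) {
            case "string":  return Types.VARCHAR;
            case "bigint":  return Types.BIGINT;
            case "int":     return Types.INTEGER;
            case "boolean": return Types.BOOLEAN;
            case "double":  return Types.DOUBLE;
            default: throw new IllegalArgumentException("Unrecognized column type: " + hiveType);
        }
    }
}
```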

I changed HiveResultSetMetaData.{getColumnType,getColumnTypeName} to recognize 
the map type, but then the returned value for mapcol is always {}, even though 
mapcol does contain some key-value entries. Turns out this is happening in 
HiveQueryResultSet.next:

1. The call to client.fetchOne returns the string “{"a":"b","x":"y"}   123         abc”.
2. The serde (DynamicSerDe ds) deserializes the string to the list [{},123,"abc"].

The serde cannot deserialize the map correctly, apparently because the map is 
not in the serde’s expected serialization format. The serde has been 
initialized with TCTLSeparatedProtocol.

Should we make client.fetchOne return a ctrl-separated string? Or should we use 
a different serde/format in HiveQueryResultSet? It seems the first way is 
right; correct me if that’s wrong. And how do we do that?

Thanks.
Steven


