[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746496#comment-13746496 ]

Brock Noland commented on HIVE-1511:
------------------------------------

I am leaving for vacation this afternoon, so I won't be able to help with this
effort for over a week. I've been working with the test
bucketsortoptimize_insert_2.q, which fails with "KryoException: Encountered
unregistered class ID: 20" while cloning the query plan. All other
serialization seems to work fine. I added some debug code to the Utilities
class to help track down this issue. It appears that either the data written
out is corrupt or Kryo gets confused on read. The trace logs below show this.
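For readers less familiar with the trace format: as we understand Kryo's reference tracking, "Write initial object reference N" is logged the first time an object is written (it gets a new ID N and is written in full), while "Write object reference N" is a back-reference that writes only the ID. Cloning the plan is a write followed by a read, and the reader has to assign IDs in exactly the same order as the writer. The toy sketch below illustrates that lockstep for strings only. It is not Kryo's implementation, and `RefTrackingDemo`, `write`, and `read` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy reference-tracking codec for lists of strings. Illustrative only; NOT Kryo's implementation. */
public class RefTrackingDemo {
    static final int NEW = 0; // tag: a new object follows ("initial object reference")
    static final int REF = 1; // tag: back-reference by ID ("object reference N")

    static byte[] write(List<String> values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            // Track by identity: the same instance written twice becomes a back-reference.
            Map<String, Integer> ids = new IdentityHashMap<>();
            out.writeInt(values.size());
            for (String v : values) {
                Integer id = ids.get(v);
                if (id != null) {
                    out.writeByte(REF);
                    out.writeInt(id);
                } else {
                    ids.put(v, ids.size()); // next sequential ID, starting at 0
                    out.writeByte(NEW);
                    out.writeUTF(v);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String> read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            // Must register new objects in exactly the order the writer assigned IDs.
            List<String> seen = new ArrayList<>();
            int n = in.readInt();
            List<String> result = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                if (in.readByte() == REF) {
                    result.add(seen.get(in.readInt()));
                } else {
                    String s = in.readUTF();
                    seen.add(s);
                    result.add(s);
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String value = "value";
        List<String> plan = Arrays.asList("b", value, "_col6: string", "a", "key", value);
        List<String> copy = read(write(plan));
        System.out.println(copy);
        // The repeated "value" comes back as one shared instance, like a Kryo back-reference.
        System.out.println(copy.get(1) == copy.get(5));
    }
}
```

Running `main` prints `[b, value, _col6: string, a, key, value]` followed by `true`. If `read` ever skipped a `seen.add(...)` or consumed a byte the writer did not produce, later back-references would resolve to the wrong entry (or fail outright) and the rest of the stream would decode as junk.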

Here are the write logs; I have placed a comment where the write and read logs
start to differ:

{noformat}
00:42 TRACE: [kryo] Write field: rowSchema (org.apache.hadoop.hive.ql.parse.RowResolver) pos=873
00:42 TRACE: [kryo] Write class name reference 21: org.apache.hadoop.hive.ql.exec.RowSchema
00:42 TRACE: [kryo] setGenerics
00:42 TRACE: [kryo] Write initial object reference 1014: _col0: int_col1: string_col6: string)
00:42 DEBUG: [kryo] Write: _col0: int_col1: string_col6: string)
00:42 TRACE: [kryo] FieldSerializer.write fields of class org.apache.hadoop.hive.ql.exec.RowSchema
00:42 TRACE: [kryo] Write field: signature (org.apache.hadoop.hive.ql.exec.RowSchema) pos=876
00:42 TRACE: [kryo] Write class name reference 9: java.util.ArrayList
00:42 DEBUG: [kryo] Write object reference 625: [_col0: int, _col1: string, _col6: string]
00:42 TRACE: [kryo] Write field: rslvMap (org.apache.hadoop.hive.ql.parse.RowResolver) pos=880
00:42 TRACE: [kryo] Write class name reference 30: java.util.HashMap
00:42 TRACE: [kryo] Write initial object reference 1015: {b={value=_col6: string}, a={key=_col0: int, value=_col1: string}}
00:42 DEBUG: [kryo] Write: {b={value=_col6: string}, a={key=_col0: int, value=_col1: string}}
00:42 DEBUG: [kryo] Write object reference 436: b
00:42 TRACE: [kryo] Write class name reference 1: java.util.LinkedHashMap
00:42 TRACE: [kryo] Write initial object reference 1016: {value=_col6: string}
00:42 DEBUG: [kryo] Write: {value=_col6: string}
00:42 DEBUG: [kryo] Write object reference 479: value
############# Here is where it gets confused on the read side ###############
00:42 DEBUG: [kryo] Write object reference 628: _col6: string
00:42 DEBUG: [kryo] Write object reference 429: a
00:42 TRACE: [kryo] Write class name reference 1: java.util.LinkedHashMap
00:42 TRACE: [kryo] Write initial object reference 1017: {key=_col0: int, value=_col1: string}
00:42 DEBUG: [kryo] Write: {key=_col0: int, value=_col1: string}
00:42 TRACE: [kryo] Write class 1: String
00:42 DEBUG: [kryo] Write object reference 477: key
00:42 TRACE: [kryo] Write class name reference 22: org.apache.hadoop.hive.ql.exec.ColumnInfo
00:42 DEBUG: [kryo] Write object reference 626: _col0: int
00:42 TRACE: [kryo] Write class 1: String
00:42 DEBUG: [kryo] Write object reference 479: value
00:42 TRACE: [kryo] Write class name reference 22: org.apache.hadoop.hive.ql.exec.ColumnInfo
00:42 DEBUG: [kryo] Write object reference 627: _col1: string
00:42 TRACE: [kryo] Write class name reference 10: org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator
00:42 DEBUG: [kryo] Write object reference 372: MAPJOIN[12]
00:42 TRACE: [kryo] Write class name reference 43: org.apache.hadoop.hive.ql.parse.OpParseContext
{noformat}

As you can see above, right after "Write object reference 479: value", Kryo
writes "Write object reference 628: _col6: string". In the read log below, we
successfully read "Read object reference 479: value", but then Kryo starts
reading junk and fails shortly thereafter.
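Comparing the two logs by eye works for a short excerpt, but the divergence point can also be located mechanically. A minimal sketch, assuming the trace lines from each side have been collected as lists of strings: it extracts only the object-reference IDs ("initial object reference N" and "object reference N") in order and reports the first index where the write and read sequences disagree. Payload text is ignored because the two sides print different things (the write side prints values, the read side often prints class names). `TraceDiff`, `objectRefIds`, and `firstDivergence` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds where write-side and read-side Kryo object-reference IDs diverge. */
public class TraceDiff {
    // Matches "Write/Read [initial ]object reference N"; class-name references are ignored.
    private static final Pattern OBJECT_REF =
        Pattern.compile("\\b(?:Write|Read) (?:initial )?object reference (\\d+)");

    static List<Integer> objectRefIds(List<String> lines) {
        List<Integer> ids = new ArrayList<>();
        for (String line : lines) {
            Matcher m = OBJECT_REF.matcher(line);
            if (m.find()) {
                ids.add(Integer.parseInt(m.group(1)));
            }
        }
        return ids;
    }

    /** Index of the first differing ID, or -1 if the shorter sequence is a prefix of the longer one. */
    static int firstDivergence(List<String> writeLog, List<String> readLog) {
        List<Integer> w = objectRefIds(writeLog);
        List<Integer> r = objectRefIds(readLog);
        int n = Math.min(w.size(), r.size());
        for (int i = 0; i < n; i++) {
            if (!w.get(i).equals(r.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Three consecutive lines from each log in this comment.
        List<String> writeLog = Arrays.asList(
            "00:42 TRACE: [kryo] Write initial object reference 1016: {value=_col6: string}",
            "00:42 DEBUG: [kryo] Write object reference 479: value",
            "00:42 DEBUG: [kryo] Write object reference 628: _col6: string");
        List<String> readLog = Arrays.asList(
            "00:45 TRACE: [kryo] Read initial object reference 1016: java.util.LinkedHashMap",
            "00:45 DEBUG: [kryo] Read object reference 479: value",
            "00:45 TRACE: [kryo] Read initial object reference 1017: String");
        int i = firstDivergence(writeLog, readLog);
        if (i < 0) {
            System.out.println("no divergence in the common prefix");
        } else {
            System.out.println("first divergence at index " + i + ": wrote "
                + objectRefIds(writeLog).get(i) + ", read " + objectRefIds(readLog).get(i));
        }
    }
}
```

On these lines `main` prints `first divergence at index 2: wrote 628, read 1017`, which is the same spot marked by hand in the logs. A `-1` result only means the IDs agree as far as both logs go; it says nothing about whether the payloads match.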

{noformat}
00:45 TRACE: [kryo] Read field: rowSchema (org.apache.hadoop.hive.ql.parse.RowResolver) pos=873
00:45 TRACE: [kryo] Read class name reference 21: org.apache.hadoop.hive.ql.exec.RowSchema
00:45 TRACE: [kryo] setGenerics
00:45 TRACE: [kryo] Read initial object reference 1014: org.apache.hadoop.hive.ql.exec.RowSchema
00:45 TRACE: [kryo] Read field: signature (org.apache.hadoop.hive.ql.exec.RowSchema) pos=876
00:45 TRACE: [kryo] Read class name reference 9: java.util.ArrayList
00:45 DEBUG: [kryo] Read object reference 625: [_col0: int, _col1: string, _col6: string]
00:45 DEBUG: [kryo] Read: _col0: int_col1: string_col6: string)
00:45 TRACE: [kryo] Read field: rslvMap (org.apache.hadoop.hive.ql.parse.RowResolver) pos=880
00:45 TRACE: [kryo] Read class name reference 30: java.util.HashMap
00:45 TRACE: [kryo] Read initial object reference 1015: java.util.HashMap
00:45 DEBUG: [kryo] Read object reference 436: b
00:45 TRACE: [kryo] Read class name reference 1: java.util.LinkedHashMap
00:45 TRACE: [kryo] Read initial object reference 1016: java.util.LinkedHashMap
00:45 DEBUG: [kryo] Read object reference 479: value
########### Here it appears to get confused #############
00:45 TRACE: [kryo] Read: -216
00:45 DEBUG: [kryo] Read: {value=-216}
00:45 TRACE: [kryo] Read initial object reference 1017: String
00:45 TRACE: [kryo] Read: _
00:45 TRACE: [kryo] Read initial object reference 1018: String
00:45 TRACE: [kryo] Read: t
00:45 DEBUG: [kryo] Read: {_=t, b={value=-216}}
00:45 DEBUG: [kryo] Read: RowResolver
00:45 DEBUG: [kryo] Read: org.apache.hadoop.hive.ql.parse.OpParseContext
00:45 TRACE: [kryo] Read class 2: float
00:45 TRACE: [kryo] Read: 1.3225001E-36
00:45 TRACE: [kryo] Object graph complete.
Exception: Client Execution failed with error code = 40000
{noformat}

Additionally, this version of the patch saves the serialized byte arrays to
disk. You'll note that during bucketsortoptimize_insert_2.q two files are
written out; the second one is the corrupt one:

{noformat}
$ ls -l /tmp/kryo-mapredwork-*
-rw-r--r-- 1 brock brock 10365 Aug 21 10:25 /tmp/kryo-mapredwork-1
-rw-r--r-- 1 brock brock 19981 Aug 21 10:25 /tmp/kryo-mapredwork-2
{noformat}
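The patch's own dump code is not reproduced in this comment. As a hypothetical sketch of the same debugging aid in plain Java, `PlanDumper` below writes each serialized plan to a numbered file (mimicking the `kryo-mapredwork-N` names above) and offers `hexWindow` for eyeballing the bytes around an offset, for example a `pos=` value from the trace, assuming that value is a byte offset into the same buffer. None of these names are Hive APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical debugging helper: dumps serialized plans to numbered files. Not part of Hive. */
public class PlanDumper {
    private final Path dir;
    private final String prefix;
    // Incremented once per dump, so files are named prefix-1, prefix-2, ... even across threads.
    private final AtomicInteger counter = new AtomicInteger();

    public PlanDumper(Path dir, String prefix) {
        this.dir = dir;
        this.prefix = prefix;
    }

    /** Writes the bytes to dir/prefix-N (overwriting any existing file) and returns that path. */
    public Path dump(byte[] serializedPlan) {
        Path target = dir.resolve(prefix + "-" + counter.incrementAndGet());
        try {
            return Files.write(target, serializedPlan);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hex bytes in [pos - radius, pos + radius), clamped to the array bounds. */
    public static String hexWindow(byte[] data, int pos, int radius) {
        int from = Math.max(0, pos - radius);
        int to = Math.min(data.length, pos + radius);
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("plans");
        PlanDumper dumper = new PlanDumper(tmp, "kryo-mapredwork");
        Path first = dumper.dump(new byte[] {1, 2, 3});
        Path second = dumper.dump(new byte[] {4, 5});
        System.out.println(first.getFileName() + " " + Files.size(first));
        System.out.println(second.getFileName() + " " + Files.size(second));
        System.out.println(hexWindow(Files.readAllBytes(first), 1, 1));
    }
}
```

Running `main` prints `kryo-mapredwork-1 3`, `kryo-mapredwork-2 2`, and `01 02`. The `AtomicInteger` keeps the numbering unique if several plans are serialized concurrently.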
                
> Hive plan serialization is slow
> -------------------------------
>
>                 Key: HIVE-1511
>                 URL: https://issues.apache.org/jira/browse/HIVE-1511
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Ning Zhang
>            Assignee: Mohammad Kamrul Islam
>         Attachments: HIVE-1511.4.patch, HIVE-1511.5.patch, HIVE-1511.6.patch, 
> HIVE-1511.7.patch, HIVE-1511.8.patch, HIVE-1511.patch, HIVE-1511-wip2.patch, 
> HIVE-1511-wip3.patch, HIVE-1511-wip4.patch, HIVE-1511-wip.patch
>
>
> As reported by Edward Capriolo:
> For reference I did this as a test case....
> SELECT * FROM src where
> key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
> OR key=0 OR key=0 OR key=0 OR
> key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
> OR key=0 OR key=0 OR key=0 OR
> ...(100 more of these)
> No OOM but I gave up after the test case did not go anywhere for about
> 2 minutes.
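The reproduction query in the description is one predicate repeated many times, so it can be generated rather than typed. A minimal sketch; the predicate count is a parameter because the description only says "...(100 more of these)" lines, the default of 1000 is an arbitrary choice, and `ReproQuery`/`buildQuery` are illustrative names.

```java
import java.util.Collections;

/** Hypothetical helper: generates the many-OR query from the issue description. */
public class ReproQuery {
    static String buildQuery(int predicates) {
        if (predicates < 1) {
            throw new IllegalArgumentException("need at least one predicate");
        }
        // "key=0 OR key=0 OR ... key=0" with the requested number of predicates.
        return "SELECT * FROM src where " + String.join(" OR ", Collections.nCopies(predicates, "key=0"));
    }

    public static void main(String[] args) {
        // Default of 1000 is arbitrary; pass a count as the first argument to change it.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        System.out.println(buildQuery(n));
    }
}
```

For example, running it with the argument `3` prints `SELECT * FROM src where key=0 OR key=0 OR key=0`.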

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
