[ https://issues.apache.org/jira/browse/HIVE-19200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438723#comment-16438723 ]
Matt McCline edited comment on HIVE-19200 at 4/15/18 2:41 PM:
--------------------------------------------------------------

Patch #1 build cratered for some unknown reason after 2 days 1 hr waiting in the queue.


was (Author: mmccline):
    Patch #1 build cratered for some unknown reason.

> Vectorization: Disable vectorization for LLAP I/O when a
> non-VECTORIZED_INPUT_FILE_FORMAT mode is needed (i.e. rows) and data type
> conversion is needed
> -------------------------------------------------------------------------
>
>                 Key: HIVE-19200
>                 URL: https://issues.apache.org/jira/browse/HIVE-19200
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 3.0.0
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HIVE-19200.01.patch, HIVE-19200.02.patch
>
>
> Disable vectorization for the issue in HIVE-18763 until we can do the harder
> VRB conversion code.
> The main changes are:
> 1) In the Vectorizer, detect if data type conversion is needed between the
> partition and the desired table schema. If so, and LLAP I/O is enabled and
> does encoded caching, then do not vectorize. Why? When LLAP I/O is in
> encoded caching mode, it delivers VectorizedRowBatch (VRB) to the
> VectorMapOperator instead of (object) rows. We currently do not have logic
> for converting VRBs. So we get either wrong results or, more likely, a
> ClassCastException on the expected vs. actual ColumnVector columns.
> 2) Cleaned up the error message logic that was suppressing the new message
> from the EXPLAIN VECTORIZATION display.
> 3) NOTE: Some of the SELECT statements in the schema_evol_test*.q files are
> commented out because I bumped into another bug. I'll file that one soon
> and add comments to the Q files.
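
A minimal sketch of the check described in change 1, not the code from the patch. Every name here (VectorizeDecision, needsTypeConversion, notVectorizedReason) is hypothetical, types are compared as plain strings by position, and the reason text is invented for illustration:

```java
import java.util.List;

/** Hypothetical sketch; none of these names come from the Hive code base. */
final class VectorizeDecision {

    /**
     * True when a partition column type differs from the table column type
     * at the same position. Assumption: a partition may have fewer columns
     * than the table, so only the common prefix is compared.
     */
    static boolean needsTypeConversion(List<String> partitionTypes,
                                       List<String> tableTypes) {
        int n = Math.min(partitionTypes.size(), tableTypes.size());
        for (int i = 0; i < n; i++) {
            if (!partitionTypes.get(i).equalsIgnoreCase(tableTypes.get(i))) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns null when vectorization may proceed, otherwise a reason string
     * that could be surfaced in an EXPLAIN-style display.
     */
    static String notVectorizedReason(boolean llapIoEncodedCaching,
                                      List<String> partitionTypes,
                                      List<String> tableTypes) {
        // With encoded caching, the reader hands over batches rather than
        // rows, so there is no row-level hook to convert column types.
        if (llapIoEncodedCaching
                && needsTypeConversion(partitionTypes, tableTypes)) {
            return "LLAP I/O encoded caching is on and data type conversion is needed";
        }
        return null;
    }
}
```

In this sketch, returning a reason string instead of a bare boolean keeps the explanation available to whatever displays it, which is the concern behind change 2.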
> -------------------------------------------------------------------------
> The longer-term solution can be done later in steps:
> 1) Write new code that can take a VectorizedRowBatch (VRB) and convert
> columns to different data types. This is needed when LLAP is doing its
> encoding / caching and feeds VRBs to VectorMapOperator instead of rows.
> Similar to what MapOperator does today, VectorMapOperator would need to be
> enhanced to convert partition VRBs into the table schema VRBs that the
> vector operator tree expects.
> 2) Today, vectorization logic is strictly position-based. It insists that
> the partition columns have the same names as the table schema. The
> MapOperator (and ORC) does more general conversion that uses column names
> instead of column position. We'd need to enhance all 3 classes to handle
> column-name-based conversion. The 3 classes are: the new VRB-to-VRB
> conversion class, VectorDeserializeRow, and VectorAssignRow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)