[jira] [Commented] (HIVE-11981) ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)

Owen O'Malley (JIRA) Mon, 09 Nov 2015 17:04:20 -0800

    [ https://issues.apache.org/jira/browse/HIVE-11981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997767#comment-14997767 ]


Owen O'Malley commented on HIVE-11981:
--------------------------------------

Ok, we need to do some work to avoid tying ORC back into Hive. In particular, 
we should add to OrcFile.ReaderOptions a method to set the desired schema. 
It should look like:

{code}
  /**
   * Define the schema that the reader should read as.
   */
  public ReaderOptions schema(TypeDescription schema);

  /**
   * The accessor for the schema to read as.
   */
  TypeDescription getSchema();
{code}

The OrcInputFormat should use the SchemaEvolution code to figure out whether to 
call ReaderOptions.schema on the underlying Reader. (You could also put this 
into Reader.Options, which is the options object used to create RecordReaders.) 
The critical piece is that OrcFile and the parts under it can't depend on 
anything from serde or io.
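
To make the shape of that proposal concrete, here is a minimal self-contained sketch. It does not use the real ORC classes: the TypeDescription and ReaderOptions below are hypothetical stand-ins written only for illustration, and the string comparison in main is a placeholder for whatever decision the SchemaEvolution code would actually make. The assumption that a null schema means "read with the file's own schema" is mine, not something stated above.

```java
import java.util.ArrayList;
import java.util.List;

public class ReaderOptionsSketch {

  /** Hypothetical stand-in for ORC's TypeDescription: a struct of named, typed fields. */
  static final class TypeDescription {
    private final List<String> fieldNames = new ArrayList<>();
    private final List<String> fieldTypes = new ArrayList<>();

    TypeDescription addField(String name, String type) {
      fieldNames.add(name);
      fieldTypes.add(type);
      return this;
    }

    @Override
    public String toString() {
      StringBuilder sb = new StringBuilder("struct<");
      for (int i = 0; i < fieldNames.size(); i++) {
        if (i > 0) {
          sb.append(',');
        }
        sb.append(fieldNames.get(i)).append(':').append(fieldTypes.get(i));
      }
      return sb.append('>').toString();
    }
  }

  /** Hypothetical stand-in for OrcFile.ReaderOptions carrying the two proposed methods. */
  static final class ReaderOptions {
    // Assumption: null means "no schema requested, read with the file's own schema".
    private TypeDescription schema;

    /** Define the schema that the reader should read as. */
    public ReaderOptions schema(TypeDescription schema) {
      this.schema = schema;
      return this; // fluent style, so calls can be chained
    }

    /** The accessor for the schema to read as. */
    public TypeDescription getSchema() {
      return schema;
    }
  }

  public static void main(String[] args) {
    // Schema as written in an older file, and the table schema after a column was added.
    TypeDescription fileSchema = new TypeDescription().addField("id", "int");
    TypeDescription tableSchema = new TypeDescription()
        .addField("id", "int")
        .addField("amount", "bigint");

    ReaderOptions options = new ReaderOptions();
    // Placeholder for the SchemaEvolution decision: only request a schema when it differs.
    if (!fileSchema.toString().equals(tableSchema.toString())) {
      options.schema(tableSchema);
    }
    System.out.println(options.getSchema()); // prints struct<id:int,amount:bigint>
  }
}
```

The point of the sketch is the dependency direction: the caller (the input format layer) decides whether a schema is needed and hands it down through the options object, so the options class itself refers to nothing but TypeDescription.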



> ORC Schema Evolution Issues (Vectorized, ACID, and Non-Vectorized)
> ------------------------------------------------------------------
>
>                 Key: HIVE-11981
>                 URL: https://issues.apache.org/jira/browse/HIVE-11981
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Transactions
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-11981.01.patch, HIVE-11981.02.patch, 
> HIVE-11981.03.patch, HIVE-11981.05.patch, HIVE-11981.06.patch, 
> HIVE-11981.07.patch, HIVE-11981.08.patch, HIVE-11981.09.patch, 
> HIVE-11981.091.patch, HIVE-11981.092.patch, HIVE-11981.093.patch, 
> HIVE-11981.094.patch, HIVE-11981.095.patch, HIVE-11981.096.patch, 
> HIVE-11981.097.patch, ORC Schema Evolution Issues.docx
>
>
> High priority issues with schema evolution for the ORC file format.
> Schema evolution here is limited to adding new columns and a few cases of 
> column type-widening (e.g. int to bigint).
> Renaming columns, deleting columns, moving columns and other schema evolution 
> were not pursued due to lack of importance and lack of time.  Also, it 
> appears much more sophisticated metadata would be needed to support them.
> The biggest issues for users have been adding new columns for ACID tables 
> (HIVE-11421 Support Schema evolution for ACID tables) and vectorization 
> (HIVE-10598 Vectorization borks when column is added to table).



