This is in the Terminology section <https://cwiki.apache.org/confluence/display/Hive/StorageHandlers#StorageHandlers-Terminology> of the Storage Handlers doc:
> Storage handlers introduce a distinction between *native* and *non-native* tables.
> A native table is one which Hive knows how to manage and access without a
> storage handler; a non-native table is one which requires a storage handler.

It goes on to say that non-native tables are created with a STORED BY clause (as opposed to a STORED AS clause). Does that clarify or muddy the waters?

-- Lefty

On Thu, Feb 20, 2014 at 7:37 PM, Lefty Leverenz <leftylever...@gmail.com> wrote:

> Some of these issues can be addressed in the documentation. The "File
> Formats" section of the Language Manual needs an overview, and that might
> be a good place to explain the differences between Hive-owned formats and
> external formats. Or the SerDe doc could be beefed up: Built-In SerDes
> <https://cwiki.apache.org/confluence/display/Hive/SerDe#SerDe-Built-inSerDes>.
>
> In the meantime, I've added a link to the Avro doc in the "File Formats"
> list and mentioned Parquet in DDL's Row Format, Storage Format, and SerDe
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe> section:
>
>> Use STORED AS PARQUET (without ROW FORMAT SERDE) for the Parquet
>> <https://cwiki.apache.org/confluence/display/Hive/Parquet> columnar
>> storage format in Hive 0.13.0 and later
>> <https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-Hive0.13andlater>;
>> or use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
>> in Hive 0.10, 0.11, or 0.12
>> <https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-Hive0.10-0.12>.
>
> Does that work?
>
> -- Lefty
>
> On Tue, Feb 18, 2014 at 1:31 PM, Brock Noland <br...@cloudera.com> wrote:
>
>> Hi Alan,
>>
>> Response is inline, below:
>>
>> On Tue, Feb 18, 2014 at 11:49 AM, Alan Gates <ga...@hortonworks.com> wrote:
>>
>> > Gunther, is it the case that there is anything extra that needs to be
>> > done to ship Parquet code with Hive right now?
>> > If I read the patch correctly, the Parquet jars were added to the pom
>> > and thus will be shipped as part of Hive. As long as it works out of
>> > the box when a user says "create table ... stored as parquet", why do
>> > we care whether the parquet jar is owned by Hive or another project?
>> >
>> > The concern about feature mismatch in Parquet versus Hive is valid,
>> > but I'm not sure what to do about it other than ensure that there are
>> > good error messages. Users will often want to use non-Hive-based
>> > storage formats (Parquet, Avro, etc.). This means we need a good way
>> > to detect at SQL compile time that the underlying storage doesn't
>> > support the indicated data type and throw a good error.
>>
>> Agreed, the error messages should absolutely be good. I will ensure
>> this is the case via https://issues.apache.org/jira/browse/HIVE-6457
>>
>> > Also, it's important to be clear going forward about what Hive as a
>> > project is signing up for. If tomorrow someone decides to add a new
>> > datatype or feature, we need to be clear that we expect the
>> > contributor to make this work for Hive-owned formats (text, RC,
>> > sequence, ORC) but not necessarily for external formats.
>>
>> This makes sense to me.
>>
>> I'd just like to add that I have a patch available to improve the
>> hive-exec uber jar and general query speed:
>> https://issues.apache.org/jira/browse/HIVE-860. Additionally I have a
>> patch available to finish the generic STORED AS functionality:
>> https://issues.apache.org/jira/browse/HIVE-5976
>>
>> Brock
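[Editor's note: to make the STORED AS vs. STORED BY distinction from the Terminology quote concrete, here is a sketch in Hive DDL. The handler class and `hbase.columns.mapping` key follow the Hive/HBase integration docs; the table and column names are hypothetical.]

```sql
-- Native table: Hive manages the data itself; STORED AS names a file format.
CREATE TABLE page_views_native (
  user_id STRING,
  page    STRING
)
STORED AS ORC;

-- Non-native table: STORED BY delegates storage to a handler
-- (here the HBase storage handler, per the Hive/HBase integration docs).
-- Table and column names are hypothetical.
CREATE TABLE page_views_hbase (
  user_id STRING,
  page    STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:page")
TBLPROPERTIES ("hbase.table.name" = "page_views");
```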
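[Editor's note: the two Parquet syntaxes described in the quoted DDL-doc text can be sketched as follows. The legacy SerDe and input/output format class names are those published for the old parquet-hive bindings; verify them against the Parquet wiki page for your version. Table and column names are hypothetical.]

```sql
-- Hive 0.13.0 and later: STORED AS PARQUET, no ROW FORMAT SERDE needed.
CREATE TABLE events_parquet (
  event_id BIGINT,
  payload  STRING
)
STORED AS PARQUET;

-- Hive 0.10, 0.11, or 0.12: spell out the SerDe plus input/output formats.
-- Class names follow the old parquet-hive bindings; check the wiki for
-- the exact names matching your release.
CREATE TABLE events_parquet_legacy (
  event_id BIGINT,
  payload  STRING
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
```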