This is in the Terminology section <https://cwiki.apache.org/confluence/display/Hive/StorageHandlers#StorageHandlers-Terminology> of the Storage Handlers doc:
> Storage handlers introduce a distinction between *native* and *non-native* tables.
> A native table is one which Hive knows how to manage and access without a
> storage handler; a non-native table is one which requires a storage handler.

It goes on to say that non-native tables are created with a STORED BY clause (as opposed to a STORED AS clause). Does that clarify or muddy the waters?

-- Lefty

On Thu, Feb 20, 2014 at 7:37 PM, Lefty Leverenz <leftylever...@gmail.com> wrote:

> Some of these issues can be addressed in the documentation. The "File
> Formats" section of the Language Manual needs an overview, and that might
> be a good place to explain the differences between Hive-owned formats and
> external formats. Or the SerDe doc could be beefed up: Built-In SerDes
> <https://cwiki.apache.org/confluence/display/Hive/SerDe#SerDe-Built-inSerDes>.
>
> In the meantime, I've added a link to the Avro doc in the "File Formats"
> list and mentioned Parquet in DDL's Row Format, Storage Format, and SerDe
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe> section:
>
>> Use STORED AS PARQUET (without ROW FORMAT SERDE) for the Parquet
>> <https://cwiki.apache.org/confluence/display/Hive/Parquet> columnar
>> storage format in Hive 0.13.0 and later
>> <https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-Hive0.13andlater>;
>> or use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
>> in Hive 0.10, 0.11, or 0.12
>> <https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-Hive0.10-0.12>.
>
> Does that work?
>
> -- Lefty
>
> On Tue, Feb 18, 2014 at 1:31 PM, Brock Noland <br...@cloudera.com> wrote:
>
>> Hi Alan,
>>
>> Response is inline, below:
>>
>> On Tue, Feb 18, 2014 at 11:49 AM, Alan Gates <ga...@hortonworks.com> wrote:
>>
>> > Gunther, is it the case that there is anything extra that needs to be
>> > done to ship Parquet code with Hive right now?
>> > If I read the patch correctly, the Parquet jars were added to the pom
>> > and thus will be shipped as part of Hive. As long as it works out of
>> > the box when a user says "create table ... stored as parquet", why do
>> > we care whether the parquet jar is owned by Hive or another project?
>> >
>> > The concern about feature mismatch in Parquet versus Hive is valid,
>> > but I'm not sure what to do about it other than ensure that there are
>> > good error messages. Users will often want to use non-Hive-based
>> > storage formats (Parquet, Avro, etc.). This means we need a good way
>> > to detect at SQL compile time that the underlying storage doesn't
>> > support the indicated data type and throw a good error.
>>
>> Agreed, the error messages should absolutely be good. I will ensure
>> this is the case via https://issues.apache.org/jira/browse/HIVE-6457
>>
>> > Also, it's important to be clear going forward about what Hive as a
>> > project is signing up for. If tomorrow someone decides to add a new
>> > datatype or feature, we need to be clear that we expect the
>> > contributor to make this work for Hive-owned formats (text, RC,
>> > sequence, ORC) but not necessarily for external formats.
>>
>> This makes sense to me.
>>
>> I'd just like to add that I have a patch available to improve the
>> hive-exec uber jar and general query speed:
>> https://issues.apache.org/jira/browse/HIVE-860. Additionally I have a
>> patch available to finish the generic STORED AS functionality:
>> https://issues.apache.org/jira/browse/HIVE-5976
>>
>> Brock
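[Editor's note: to make the STORED AS vs. STORED BY distinction from the Terminology quote concrete, here is a sketch in Hive DDL. The handler class and `hbase.columns.mapping` key follow the Hive/HBase integration docs; the table and column names are hypothetical.]

```sql
-- Native table: Hive manages the data itself; STORED AS names a file format.
CREATE TABLE page_views_native (
  user_id STRING,
  page    STRING
)
STORED AS ORC;

-- Non-native table: STORED BY delegates storage to a handler
-- (here the HBase storage handler, per the Hive/HBase integration docs).
-- Table and column names are hypothetical.
CREATE TABLE page_views_hbase (
  user_id STRING,
  page    STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:page")
TBLPROPERTIES ("hbase.table.name" = "page_views");
```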
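[Editor's note: the two Parquet syntaxes described in the quoted DDL-doc text can be sketched as follows. The legacy SerDe and input/output format class names are those published for the old parquet-hive bindings; verify them against the Parquet wiki page for your version. Table and column names are hypothetical.]

```sql
-- Hive 0.13.0 and later: STORED AS PARQUET, no ROW FORMAT SERDE needed.
CREATE TABLE events_parquet (
  event_id BIGINT,
  payload  STRING
)
STORED AS PARQUET;

-- Hive 0.10, 0.11, or 0.12: spell out the SerDe plus input/output formats.
-- Class names follow the old parquet-hive bindings; check the wiki for
-- the exact names matching your release.
CREATE TABLE events_parquet_legacy (
  event_id BIGINT,
  payload  STRING
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
```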