Re: Optimizing hive queries
On Thu, Mar 28, 2013 at 11:08 PM, Jagat Singh wrote:

> Hello Owen,
>
> Thanks for your reply.
>
> I am seeing its providing the advantage which Avro provided, of adding
> and removing fields.

ORC files, like Avro files, are self-describing. They include the type structure of the records in the metadata of the file. It will take more integration work with Hive to make the schemas very flexible with ORC.

> Can you please write some sample code for hive table which is partitioned
> and each partitioned has different schema.

As with all tables:

    create table people (first_name string, last_name string)
      partitioned by (state string);
    load data local inpath 'part-0' overwrite into table people
      partition (state='ca');
    alter table people add columns (address string);
    load data local inpath 'part-1' overwrite into table people
      partition (state='nv');

You'll end up with the first partition with 2 columns (and thus implicitly the third one is null) and the second partition with 3 columns.

-- Owen

> I tried searching but could not find any example.
>
> Thanks in advance for your help.
>
> Regards,
>
> Jagat Singh
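The behavior Owen describes (an older partition with two columns reading back with a null third column) can be illustrated with a small toy model. This is only a sketch in Python, not Hive's implementation: the function name `read_partition` and the sample rows are made up for illustration, and it assumes columns are matched by name.

```python
# Toy model only: NOT Hive's code. Names and sample rows are illustrative.
from typing import Any, Dict, List, Sequence


def read_partition(
    table_columns: Sequence[str],
    partition_columns: Sequence[str],
    rows: Sequence[Sequence[Any]],
) -> List[Dict[str, Any]]:
    """Project rows stored under a partition's own schema onto the table's
    current schema. Columns the partition never stored come back as None."""
    result = []
    for row in rows:
        stored = dict(zip(partition_columns, row))
        result.append({col: stored.get(col) for col in table_columns})
    return result


# Table schema after `alter table people add columns (address string)`.
table_columns = ["first_name", "last_name", "address"]

# state='ca' was loaded before the alter, so it stores two columns.
ca_rows = read_partition(
    table_columns,
    ["first_name", "last_name"],
    [("Ada", "Lovelace")],  # made-up sample row
)

# state='nv' was loaded after the alter, so it stores three columns.
nv_rows = read_partition(
    table_columns,
    ["first_name", "last_name", "address"],
    [("Alan", "Turing", "1 Main St")],  # made-up sample row
)

print(ca_rows)
print(nv_rows)
```

In this model the `ca` row carries `None` for `address`, while the `nv` row carries the stored value, which mirrors the two-column and three-column partitions in the example.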
Re: Optimizing hive queries
Hello Owen,

Thanks for your reply.

I see that it provides the same advantage Avro provides: adding and removing fields.

Can you please write some sample code for a Hive table that is partitioned, where each partition has a different schema? I tried searching but could not find any example.

Thanks in advance for your help.

Regards,

Jagat Singh

On Fri, Mar 29, 2013 at 4:48 PM, Owen O'Malley wrote:

> Actually, Hive already has the ability to have different schemas for
> different partitions. (Although of course it would be nice to have the
> alter table be more flexible!)
>
> The "versioned metadata" means that the ORC file's metadata is stored in
> ProtoBufs so that we can add (or remove) fields to the metadata. That means
> that for some changes to ORC file format we can provide both forward and
> backward compatibility.
>
> -- Owen
Re: Optimizing hive queries
Actually, Hive already has the ability to have different schemas for different partitions. (Although of course it would be nice to have the alter table be more flexible!)

The "versioned metadata" means that the ORC file's metadata is stored in ProtoBufs so that we can add (or remove) fields to the metadata. That means that for some changes to ORC file format we can provide both forward and backward compatibility.

-- Owen

On Thu, Mar 28, 2013 at 10:25 PM, Jagat Singh wrote:

> Hello Nitin,
>
> Thanks for sharing.
>
> Do we have more details on
>
> Versioned metadata feature of ORC ? , is it like handling varying schemas
> in Hive?
>
> Regards,
>
> Jagat Singh
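The forward and backward compatibility Owen mentions can be sketched with a toy model of numbered metadata fields. This is an assumption-laden illustration in Python, not the real ORC metadata layout or the Protocol Buffers wire format: the field numbers, the names `row_count` and `compression`, and the helper `read_metadata` are all hypothetical.

```python
# Toy model only: metadata is a {field_number: value} mapping.
# This is NOT the real ORC metadata or the protobuf wire format;
# field numbers and names below are hypothetical.
from typing import Any, Dict, Tuple

FieldSpec = Dict[int, Tuple[str, Any]]  # field number -> (name, default)


def read_metadata(encoded: Dict[int, Any], known_fields: FieldSpec) -> Dict[str, Any]:
    """Decode metadata using only the fields this reader knows about."""
    # Start from defaults so fields missing from an older file still resolve.
    decoded = {name: default for name, default in known_fields.values()}
    for number, value in encoded.items():
        if number in known_fields:
            name, _default = known_fields[number]
            decoded[name] = value
        # Unknown field numbers (written by a newer writer) are skipped.
    return decoded


READER_V1: FieldSpec = {1: ("row_count", 0)}
READER_V2: FieldSpec = {1: ("row_count", 0), 2: ("compression", "none")}

file_v1 = {1: 100}             # written before field 2 existed
file_v2 = {1: 250, 2: "zlib"}  # written after field 2 was added

# Forward compatibility: an old reader ignores the field it does not know.
old_reads_new = read_metadata(file_v2, READER_V1)

# Backward compatibility: a new reader falls back to the default.
new_reads_old = read_metadata(file_v1, READER_V2)

print(old_reads_new)
print(new_reads_old)
```

The point of the sketch is that identifying fields by number, with defaults for absent ones, lets old and new readers both make sense of a file, which is the property being attributed to storing the metadata in ProtoBufs.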
Re: Optimizing hive queries
I could only find this link:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/orcfile.html

According to it, the metadata is handled by protobuf, which allows adding and removing fields.

On Fri, Mar 29, 2013 at 10:55 AM, Jagat Singh wrote:

> Hello Nitin,
>
> Thanks for sharing.
>
> Do we have more details on
>
> Versioned metadata feature of ORC ? , is it like handling varying schemas
> in Hive?
>
> Regards,
>
> Jagat Singh

--
Nitin Pawar
Re: Optimizing hive queries
Hello Nitin,

Thanks for sharing.

Do we have more details on the versioned metadata feature of ORC? Is it like handling varying schemas in Hive?

Regards,

Jagat Singh

On Fri, Mar 29, 2013 at 4:16 PM, Nitin Pawar wrote:

> Hi,
>
> Here is a nice presentation from Owen from Hortonworks on "Optimizing
> hive queries"
>
> http://www.slideshare.net/oom65/optimize-hivequeriespptx
>
> Thanks,
> Nitin Pawar
Optimizing hive queries
Hi,

Here is a nice presentation from Owen from Hortonworks on "Optimizing hive queries":

http://www.slideshare.net/oom65/optimize-hivequeriespptx

Thanks,
Nitin Pawar