Re: Case Studies for 'Programming Hive' book from O'Reilly
Hi Jason,

I'd be happy to contribute our case study on using Hive for regional climate modeling to your O'Reilly book. Let me know how to proceed.

Thank you!

Cheers,
Chris

On Apr 11, 2012, at 10:48 AM, Jason Rutherglen wrote:

> Dear Hive User,
>
> We want your interesting case study for our upcoming book titled
> 'Programming Hive' from O'Reilly.
>
> How you use Hive, either high level or low level code details are both
> encouraged!
>
> Feel free to reach out with a brief abstract.
>
> Regards,
>
> Jason Rutherglen

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
Re: Case Studies for 'Programming Hive' book from O'Reilly
Hi Jason,

I work for an international organization involved in the mobilization of biodiversity data (specifically, we deal a lot with observations of species), so think of it as a lot of point-based information with metadata tags. We have built an Oozie workflow that uses Sqoop to suck in a few databases and then runs a big transformation and a set of quality-control steps, which we implemented using Hive and some custom UDFs. There is a blog post introducing this at http://www.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/

All our work and data are open, so I can freely write about any of it and can link to real production code in Google svn. If it would be of interest to you, I am happy to discuss what would be most useful to write up for your book. Some possible angles you might consider:

- Real UDFs in action (e.g. parsing species scientific names)
- UDTFs to generate a Google map tile cache
- Hive in an ETL workflow to remove load from DBs
- The pros and cons of calling web services from a UDF (we do it because it keeps concerns clean, and we accept the risk of a DDoS, which we can control)
- Sqoop and Hive together
- We are getting into Hive on HBase and have found that UDFs can help with type safety, since we aren't running HIVE-1634 [with the advancements in Hive 0.9, I would think our workarounds are not worth documenting]
- Metrics illustrating the importance of join order, and of knowing data cardinality, to ensure decent performance

Hope this is of interest,
Tim

On Wed, Apr 11, 2012 at 7:48 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> Dear Hive User,
>
> We want your interesting case study for our upcoming book titled
> 'Programming Hive' from O'Reilly.
>
> How you use Hive, either high level or low level code details are both
> encouraged!
>
> Feel free to reach out with a brief abstract.
>
> Regards,
>
> Jason Rutherglen