Re: Case Studies for 'Programming Hive' book from O'Reilly

2012-04-17 Thread Mattmann, Chris A (388J)
Hi Jason,

I'd be happy to contribute our case study on using Hive for Regional
Climate Modeling to your O'Reilly book.

Let me know how to proceed.

Thank you!

Cheers,
Chris

On Apr 11, 2012, at 10:48 AM, Jason Rutherglen wrote:

 Dear Hive User,
 
 We want your interesting case study for our upcoming book titled
 'Programming Hive' from O'Reilly.
 
 We encourage both high-level descriptions of how you use Hive and
 low-level code details!
 
 Feel free to reach out with a brief abstract.
 
 Regards,
 
 Jason Rutherglen


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: Case Studies for 'Programming Hive' book from O'Reilly

2012-04-15 Thread Tim Robertson
Hi Jason,

I work for an international organization involved in the mobilization of
biodiversity data (specifically, we deal a lot with observations of
species), so think of it as a lot of point-based information with metadata
tags.  We have built an Oozie workflow that uses Sqoop to pull in a few
databases and then performs a large transformation and a set of
quality-control steps, which we implemented using Hive and some custom
UDFs.  There is a blog post introducing this at
http://www.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/

All our work and data are open, so I can freely write about any of it, and
can link to real production code in Google svn.

If this would be of interest to you, I am happy to discuss what would be
most useful to write up for your book.  Some possible angles you might
consider:
- real UDFs in action (e.g. parsing species scientific names)
- UDTFs to generate a Google map tile cache
- Hive in an ETL workflow to remove load from DBs
- The pros and cons of calling web services from a UDF (we do it because it
keeps concerns clean, and we accept the risk of a DDoS that we can control)
- Sqoop and Hive together
- We are getting into Hive on HBase and have found UDFs can help with type
safety since we aren't running HIVE-1634
  [with the advancements in Hive 0.9 I would think our workarounds are not
worth documenting]
- Metrics illustrating the importance of join order and of knowing data
cardinality to ensure decent performance.

Hope this is of interest,
Tim





On Wed, Apr 11, 2012 at 7:48 PM, Jason Rutherglen
<jason.rutherg...@gmail.com> wrote:

 Dear Hive User,

 We want your interesting case study for our upcoming book titled
 'Programming Hive' from O'Reilly.

 We encourage both high-level descriptions of how you use Hive and
 low-level code details!

 Feel free to reach out with a brief abstract.

 Regards,

 Jason Rutherglen



Case Studies for 'Programming Hive' book from O'Reilly

2012-04-11 Thread Jason Rutherglen
Dear Hive User,

We want your interesting case study for our upcoming book titled
'Programming Hive' from O'Reilly.

We encourage both high-level descriptions of how you use Hive and
low-level code details!

Feel free to reach out with a brief abstract.

Regards,

Jason Rutherglen