Yes, we can do this: SELECT ip, SUM(IF(action = 'action1', 1, 0)), SUM(IF(action = 'action2', 1, 0)), SUM(IF(action = 'action3', 1, 0)) FROM mytable GROUP BY ip;
For more details on IF, please refer to: http://dev.mysql.com/doc/refman/5.0/en/control-flow-functions.html#function_if Zheng On Sat, Oct 10, 2009 at 11:42 AM, Ryan LeCompte <[email protected]> wrote: > Hello all, > > Very newto Hive (haven't even installed it yet!), but I had a use case that > I didn't see demonstrated in any of the tutorial/documentation that I've > read thus far. > > Let's say that I have apache logs that I want to process with Hadoop/Hive. > Of course there may be different types of log records all tying back to the > same user or IP address or other log attribute. Is there a way to submit a > SINGLE Hive query to get back results that may look like: > > > IP Action1Count Action2Count Action3Count > > .. where the different actions correspond to different log events for that > IP address. > > Do I have to submit 3 different Hive queries here or can I submit a single > Hive query? In a regular Java-based map/reduce job, I would have written a > custom Writable that would record counts for each of the different actions, > and submit it to the reducer using output.collect(IP, customWritable). Here > I wouldn't have to submit multiple map/reduce jobs, just 1. > > Thanks > Ryan > > -- Yours, Zheng
