Again, this is now working.

Thanks,
Ryan


On Sat, Oct 10, 2009 at 9:30 PM, Ryan LeCompte <[email protected]> wrote:

> Ah, this time I'm running into a different issue.
>
> So I've created my Hive table and I'm now at the point where I want to load
> data into it from HDFS. However, I get the following error on the load data
> command:
>
> Loading data to table actions
> Failed with exception Wrong file format. Please check the file's format.
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
>
> Any ideas how to get more info on what's wrong? The file is a SequenceFile.
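> (A likely cause, not confirmed in this thread: LOAD DATA only moves the
> file into the table's directory, so the table must be declared with a
> storage format matching the file. For a SequenceFile the DDL would look
> roughly like this; the column definitions and HDFS path are hypothetical.)

```sql
-- Hypothetical sketch: declare the table's storage format to match the file.
CREATE TABLE actions (ip STRING, action STRING)
STORED AS SEQUENCEFILE;

-- The load should then accept the SequenceFile (path is hypothetical):
LOAD DATA INPATH '/user/ryan/actions.seq' INTO TABLE actions;
```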
>
>
>
> On Sat, Oct 10, 2009 at 9:10 PM, Ryan LeCompte <[email protected]> wrote:
>
>> I was able to get this working -- just needed to adjust classpaths.
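>> (A hypothetical sketch of the kind of adjustment involved; the install
>> path is made up. The idea is to get the Hive jars and conf directory
>> onto the classpath before launching bin/hive.)

```shell
# Hypothetical example; adjust HIVE_HOME to the real install location.
export HIVE_HOME=/opt/hive
export HADOOP_CLASSPATH="$HIVE_HOME/lib/*:$HIVE_HOME/conf:$HADOOP_CLASSPATH"
```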
>> Thanks!
>>
>> Ryan
>>
>>
>>
>> On Sat, Oct 10, 2009 at 8:50 PM, Ryan LeCompte <[email protected]> wrote:
>>
>>> I printed out the classpath environment variables that I saw in the file,
>>> and the paths were valid... hmmm... is there something else I could try?
>>>
>>>
>>> On Sat, Oct 10, 2009 at 8:41 PM, Zheng Shao <[email protected]> wrote:
>>>
>>>> Try modifying bin/hive to print out the last line of that file.
>>>> It should show some classpath entries; make sure those classpaths are
>>>> valid.
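>>>> As a generic sketch (not specific to Hive's script), once the classpath
>>>> string is printed, each colon-separated entry can be checked like this;
>>>> the CLASSPATH value below is a made-up example:

```shell
# Check each entry of a colon-separated classpath for existence.
# The CLASSPATH value here is a hypothetical example.
CLASSPATH="/opt/hive/lib/hive-exec.jar:/opt/hadoop/conf"
echo "$CLASSPATH" | tr ':' '\n' | while read -r entry; do
  if [ -e "$entry" ]; then
    echo "ok: $entry"
  else
    echo "MISSING: $entry"
  fi
done
```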
>>>>
>>>> Zheng
>>>>
>>>>
>>>> On Sat, Oct 10, 2009 at 5:14 PM, Ryan LeCompte <[email protected]> wrote:
>>>>
>>>>> Thank you!
>>>>>
>>>>> Very helpful.
>>>>>
>>>>> Another problem:
>>>>>
>>>>> I am trying to install Hive 0.4, and I'm coming across the following
>>>>> error when I try to start bin/hive after building:
>>>>>
>>>>>
>>>>> java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
>>>>>     at java.lang.Class.forName0(Native Method)
>>>>>     at java.lang.Class.forName(Class.java:247)
>>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
>>>>>     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>>>     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>> org.apache.hadoop.hive.conf.HiveConf
>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>>>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>>>>>     ... 7 more
>>>>>
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks,
>>>>> Ryan
>>>>>
>>>>>
>>>>> On Sat, Oct 10, 2009 at 2:47 PM, Zheng Shao <[email protected]> wrote:
>>>>>
>>>>>> Yes, we can do this:
>>>>>>
>>>>>> SELECT ip,
>>>>>>        SUM(IF(action = 'action1', 1, 0)),
>>>>>>        SUM(IF(action = 'action2', 1, 0)),
>>>>>>        SUM(IF(action = 'action3', 1, 0))
>>>>>> FROM mytable
>>>>>> GROUP BY ip;
>>>>>>
>>>>>> For more details on IF, please refer to:
>>>>>> http://dev.mysql.com/doc/refman/5.0/en/control-flow-functions.html#function_if
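>>>>>> (The same query with column aliases, so the output header matches the
>>>>>> IP/Action1Count/... shape; the aliases and the sample rows in the
>>>>>> comments below are hypothetical.)

```sql
SELECT ip,
       SUM(IF(action = 'action1', 1, 0)) AS action1_count,
       SUM(IF(action = 'action2', 1, 0)) AS action2_count,
       SUM(IF(action = 'action3', 1, 0)) AS action3_count
FROM mytable
GROUP BY ip;
-- For hypothetical rows ('1.2.3.4','action1'), ('1.2.3.4','action1'),
-- ('1.2.3.4','action3'), this returns one row:  1.2.3.4  2  0  1
```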
>>>>>>
>>>>>> Zheng
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 10, 2009 at 11:42 AM, Ryan LeCompte <[email protected]> wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> Very new to Hive (haven't even installed it yet!), but I had a use
>>>>>>> case that I didn't see demonstrated in any of the tutorials or
>>>>>>> documentation that I've read so far.
>>>>>>>
>>>>>>> Let's say that I have apache logs that I want to process with
>>>>>>> Hadoop/Hive. Of course there may be different types of log records all 
>>>>>>> tying
>>>>>>> back to the same user or IP address or other log attribute. Is there a 
>>>>>>> way
>>>>>>> to submit a SINGLE Hive query to get back results that may look like:
>>>>>>>
>>>>>>>
>>>>>>> IP Action1Count Action2Count Action3Count
>>>>>>>
>>>>>>> .. where the different actions correspond to different log events for
>>>>>>> that IP address.
>>>>>>>
>>>>>>> Do I have to submit 3 different Hive queries here or can I submit a
>>>>>>> single Hive query? In a regular Java-based map/reduce job, I would have
>>>>>>> written a custom Writable that would record counts for each of the 
>>>>>>> different
>>>>>>> actions, and submit it to the reducer using output.collect(IP,
>>>>>>> customWritable). Here I wouldn't have to submit multiple map/reduce 
>>>>>>> jobs,
>>>>>>> just 1.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ryan
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Yours,
>>>>>> Zheng
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Yours,
>>>> Zheng
>>>>
>>>
>>>
>>
>
