Re: tasktracker keeps receiving KillJobAction and then deleting unknown jobs while using hive
Hi,

+ hdfs-user (bcc'd)

Which JRE version do you use?

- Alex
--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:

Hi,

I'm using Hive to do some log analysis, and I have run into a problem. My cluster has 3 nodes: one NameNode/JobTracker and two DataNode/TaskTracker nodes. One of the TaskTrackers repeatedly receives KillJobAction and then deletes unknown jobs. The logs look like this:

2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.

This happens occasionally, and when it does, the TaskTracker does nothing but keep receiving KillJobAction and deleting unknown jobs, so performance drops. To recover I have to restart the cluster, which is obviously not a good solution. The affected jobs are eventually scheduled on the other TaskTracker, where they run well and succeed. Has anybody encountered this problem and can offer some advice?
And occasionally there is an error log like this:

2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
    at sun.nio.ch.IOUtil.read(IOUtil.java:175)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
    at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
    at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
    at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201201311041_0071_r_-386575334
2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
    (same stack trace as above)

Is there some connection between these two errors?

Thank you very much!
xiaobin
Re: tasktracker keeps receiving KillJobAction and then deleting unknown jobs while using hive
Hi Alex,

I'm using JRE 1.6.0_24 with Hadoop 0.20.0 and Hive 0.8.0.

Thanks

2012/2/1 alo alt wget.n...@googlemail.com:
> Hi,
>
> + hdfs-user (bcc'd)
>
> Which JRE version do you use?
>
> - Alex
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
Re: tasktracker keeps receiving KillJobAction and then deleting unknown jobs while using hive
How many NameNode handlers (dfs.namenode.handler.count) have you defined for your cluster?

- Alex
--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 12:25 PM, Xiaobin She wrote:
> Hi Alex,
>
> I'm using JRE 1.6.0_24 with Hadoop 0.20.0 and Hive 0.8.0.
>
> Thanks
Can hive 0.8.1 work with hadoop 0.23.0?
Hi,

I installed Hadoop 0.23.0 and it works. My Hive version is 0.8.1. A query like 'select * from tablename' works, but an exception is thrown when executing a query like 'select col1 from tablename':

2012-02-01 16:32:20,296 WARN mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(139)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2012-02-01 16:32:20,389 INFO mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(388)) - Cleaning up the staging area file:/tmp/hadoop-hadoop/mapred/staging/hadoop-469936305/.staging/job_local_0001
2012-02-01 16:32:20,392 ERROR exec.ExecDriver (SessionState.java:printError(380)) - Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /home/hadoop/hive-0.8.1/lib/hive-builtins-0.8.1.jar)'
java.io.FileNotFoundException: File does not exist: /home/hadoop/hive-0.8.1/lib/hive-builtins-0.8.1.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:764)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:246)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:284)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:355)
    at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1159)
    at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1156)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1156)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:571)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:710)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:189)

Thanks,
xiaofeng
hive with metastore limits maps per node
Hive list,

I am facing a unique situation here: using Hive (0.7.1) with a remote, external metastore (PostgreSQL) limits the number of map slots per node to 1 (out of 25 slots available). Other map jobs successfully utilize all available slots; only Hive jobs are limited, across the whole 10-node cluster (CDH2). This was not occurring before, and we are not setting the max maps per node in the job.xml to force this. The only significant change was upgrading Java to 1.6_30 (per other requirements).

Thank you all for your time on this,
Clint
Re: Hive query result in sequence file
Andrew,

This might come in handy: http://www.congiu.com/node/7

Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
e: mgro...@oanda.com

----- Original Message -----
From: jingjung Ng jingjun...@gmail.com
To: user@hive.apache.org
Sent: Wednesday, January 25, 2012 1:47:12 PM
Subject: Re: Hive query result in sequence file

Thanks Aniket. I am pretty new to Hive; is there any Java example (SerDe) for achieving this?

-Andrew

On Wed, Jan 25, 2012 at 12:12 AM, Aniket Mokashi aniket...@gmail.com wrote:

You will have to write your own SerDe. Hive can write a SequenceFile, but it will be Text with a null (BytesWritable) key.

Thanks,
Aniket

On Tue, Jan 24, 2012 at 11:41 PM, jingjung Ng jingjun...@gmail.com wrote:

Hi,

I have the following Hive query (pseudo Hive query code):

select name, address, phone from t1 join t2

Executing the above query ends up with a file stored in name, address, phone format on the file system (HDFS or local). However, I'd like to write it to a sequence file (key: name; value: address and phone). Is this possible, and if so, how could I do it?

Thank you.
JingJung
Re: Invoke a UDAF inside another UDAF
Thanks Mark, I ended up going the custom-reducer way. I will try out the query you have sent.

Regards,
--
Rohan Monga

On Wed, Feb 1, 2012 at 11:06 AM, Mark Grover mgro...@oanda.com wrote:

Rohan,

You could do it one of the following ways:
1) Write a UDAF that does the avg(f2 - avg_f2) computation.
2) Write a custom reducer that does the avg(f2 - avg_f2) computation.
3) Do it with multiple passes over the data. Something like this (untested):

select table.f1, avg_table.avg_f2, avg(table.f2 - avg_table.avg_f2)
from (select f1, avg(f2) as avg_f2 from table group by f1) avg_table
join table on (avg_table.f1 = table.f1)
group by table.f1, avg_table.avg_f2;

Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
e: mgro...@oanda.com

----- Original Message -----
From: rohan monga monga.ro...@gmail.com
To: user@hive.apache.org
Sent: Friday, January 20, 2012 6:00:54 PM
Subject: Re: Invoke a UDAF inside another UDAF

My bad, I hastily converted the query to a wrong example. It should be like this:

select f1, avg(f2) as avg_f2, avg(f2 - avg_f2) from table group by f1;

In essence, I just want to use the value generated by one UDAF (in this case avg(f2)) as a single number and then apply that value to the group inside a different UDAF. For example, if I were to use a streaming reducer, it would be something like this:

avg1 = computeSum(list) / len(list)
return computeSum(x - avg1 for x in list) / len(list)

As I write this I realize why this might not be possible [the group computation being done in one step and the information being lost] :) But why the NullPointerException?

Regards,
--
Rohan Monga

On Fri, Jan 20, 2012 at 2:32 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

IMHO you cannot possibly nest the percentile calculation, because the results would be meaningless.
percentile has to aggregate a set and pick the Nth element, but if you nest, then the inner percentile only returns one result to the outer percentile, and that is pretty meaningless. (I think someone talked about this on the list in the last month or so.) Without seeing your input data and your expected results, I cannot understand what your query wants to do or suggest an alternative.

On 1/20/12, rohan monga monga.ro...@gmail.com wrote:

Thanks Edward, that seems to work :) However, I have another query like this:

select a, avg(b) as avg_b, percentile_approx(avg_b - percentile_approx(b, .5), .5) from table1 group by a

Here I will lose the group info if I include the inner query in the FROM clause. Is there a way to get this to work?

Thanks,
--
Rohan Monga

On Fri, Jan 20, 2012 at 12:51 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

I think if you are grouping by b, b has to be in your select list. Try this:

FROM (select b, count(a) as theCount from table1 group by b) a
select mean(theCount);

I think that should work.
On 1/20/12, rohan monga monga.ro...@gmail.com wrote:

Hi,

I am trying to run a query like:

select mean(count(a)) from table1 group by b;

I am getting the following error:

FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:151)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:656)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:777)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
    at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:157)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7447)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7405)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2747)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:3365)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5858)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6480)
    at
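Rohan's streaming-reducer pseudocode above (compute the group average, then average each value's deviation from it) can be written out concretely. The following Python sketch uses hypothetical helper names, not the reducer actually deployed; it also makes Edward's objection visible, since the average of deviations from a group's own mean is algebraically always zero.

```python
def avg(values):
    # Plain arithmetic mean of a non-empty list.
    return sum(values) / float(len(values))

def avg_of_deviations(values):
    # Two passes over the group, as in the thread's pseudocode:
    # first compute the group mean, then average each value's
    # deviation from that mean.
    m = avg(values)
    return avg([x - m for x in values])

group = [3.0, 7.0, 10.0, 12.0]
print(avg(group))                # 8.0
print(avg_of_deviations(group))  # 0.0 -- the nested aggregate collapses
```

This is why a genuinely useful nested aggregate usually needs a different inner expression (such as the percentile_approx variant in the thread) or the multi-pass join that Mark suggested.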
RE: Hive ODBC - Microsoft's Involvement
The Hive driver that Microsoft will be releasing is ODBC, so you should be able to interact with Hive just like you would with any other relational database.

From: John Omernik [mailto:j...@omernik.com]
Sent: Wednesday, February 01, 2012 3:22 PM
To: user@hive.apache.org
Subject: Hive ODBC - Microsoft's Involvement

Does anyone know if the driver Microsoft is talking about with their Azure-based Hadoop/Hive setup would work for connecting Windows applications (Excel, .NET web apps, etc.) to Apache Hive running on Unix? I'm looking for a way to connect .NET web apps to Hive for some process-flow upgrades.

Thanks!
Re: Exception when hive submits M/R jobs
I have resolved this, so I'll share what the issue was. I had set HIVE_AUX_JARS_PATH in my hive-env.sh as:

HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar

The empty HIVE_AUX_JARS_PATH was causing the exception. The following fix made it work:

if [ -z "$HIVE_AUX_JARS_PATH" ]; then
  HIVE_AUX_JARS_PATH=$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar
else
  HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar
fi

Thanks,
Sam

On Jan 31, 2012, at 11:50 AM, Sam William wrote:

I have a new Hive installation. I'm able to create tables and run select * queries on them, but as soon as I try to execute a query that would involve a Hadoop M/R job, I get this exception:

java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
    at org.apache.hadoop.fs.Path.<init>(Path.java:90)
    at org.apache.hadoop.fs.Path.<init>(Path.java:50)
    at org.apache.hadoop.mapred.JobClient.copyRemoteFiles(JobClient.java:608)
    at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:713)
    at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:637)
    at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:848)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)

The table is pretty simple: it is an external table on HDFS and does not have any partitions. Any idea why this could be happening?

Sam William
sa...@stumbleupon.com
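The failure mode Sam describes can be illustrated outside Hive. This is a small sketch of my own (the split function is hypothetical, not Hive's actual code): appending jars after an unset variable leaves a leading comma, and the resulting empty element is what later becomes the empty Path.

```python
def split_jar_list(aux_jars):
    # Naive split of a comma-separated jar list, the way a consumer
    # of HIVE_AUX_JARS_PATH might process it.
    return aux_jars.split(",")

# Broken case: HIVE_AUX_JARS_PATH was empty, then jars were appended
# after a comma, as in the original hive-env.sh line.
existing = ""
jars = "lib/jar1.jar,lib/jar2.jar"
broken = existing + "," + jars
print(split_jar_list(broken))   # ['', 'lib/jar1.jar', 'lib/jar2.jar']
# The empty first element is what surfaces later as
# "Can not create a Path from an empty string".

# Fixed case: mirror the if/else added to hive-env.sh and only
# prepend the existing value when it is non-empty.
fixed = jars if not existing else existing + "," + jars
print(split_jar_list(fixed))    # ['lib/jar1.jar', 'lib/jar2.jar']
```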
get_json_object escape characters
Is it possible to escape '.' in get_json_object? For example, given:

{"a.b": "test"}

get_json_object(json, '$.a.b') will return NULL because it's looking for a nested object. Something like this would be nice:

get_json_object(json, '$.a\\.b')

Beyond changing how the JSON object is keyed, is there anything I can do that I'm missing?

Thanks
Sean
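Since the thread leaves the escaping question open, one workaround sketch (client-side, or as the core of a custom UDF; not something get_json_object itself supports) is to parse the JSON and look the dotted key up literally rather than letting '.' act as a path separator. The helper name below is hypothetical.

```python
import json

def get_literal_key(json_str, key):
    # Treat the whole key ("a.b") as one literal top-level field
    # instead of interpreting the dot as a nesting step.
    return json.loads(json_str).get(key)

doc = '{"a.b": "test"}'
print(get_literal_key(doc, "a.b"))  # test
```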
Re: Hive ODBC - Microsoft's Involvement
I see that, but will that Hive ODBC driver work with a standard Hive install, or will it be limited to Microsoft's cloud version of Hadoop/Hive? Has anyone tried the driver?

On Wed, Feb 1, 2012 at 4:23 PM, Tucker, Matt matt.tuc...@disney.com wrote:

The Hive driver that Microsoft will be releasing is ODBC, so you should be able to interact with Hive just like you would with any other relational database.
Re: Hive ODBC - Microsoft's Involvement
Any reason you want to use ODBC and not Thrift? Hive supports the Thrift protocol. There are Thrift libraries for C#, and you can easily integrate one into your project for direct access to Hive from your C# code.

On Wed, Feb 1, 2012 at 6:40 PM, John Omernik j...@omernik.com wrote:

I see that, but will that Hive ODBC driver work with a standard Hive install, or will it be limited to Microsoft's cloud version of Hadoop/Hive? Has anyone tried the driver?
Re: Hive ODBC - Microsoft's Involvement
I've tried it. It seems to work fine, but with ODBC you still need to send SQL commands to the server, and Hive SQL is incomplete and non-ANSI-compliant in many ways. This means that an application that uses ANSI SQL will not always generate Hive-friendly queries. They also have an Excel connector under development, which you can get if you are on the beta.

On Feb 1, 2012 9:41 PM, John Omernik j...@omernik.com wrote:

I see that, but will that Hive ODBC driver work with a standard Hive install, or will it be limited to Microsoft's cloud version of Hadoop/Hive? Has anyone tried the driver?
Re: Hive ODBC - Microsoft's Involvement
By the way, I tried it on CDH3 Hive.

On Feb 1, 2012 10:02 PM, Chris Shain ch...@tresata.com wrote:

I've tried it. It seems to work fine, but with ODBC you still need to send SQL commands to the server, and Hive SQL is incomplete and non-ANSI-compliant in many ways.
Problem in creating table in hive
Hello all,

I am trying a Sqoop import from SQL Server into Hive. When I execute the sqoop-import command, the import task completes and I can see the complete data on HDFS (under /user/hive/warehouse/table_name_dir), but when I execute the SHOW TABLES command in the Hive CLI, the table does not appear in the list.

(Once I tried the following: after importing the table, the same thing happened as above; I then ran a CREATE TABLE query in the Hive CLI with the same fields as the imported table, and after that I was able to see and access the table in the Hive CLI. But I don't think doing this by hand every time I import is an effective approach.)

Please suggest a solution. Is there any step I missed, or is there some other problem? I am not sure why this is happening.

Thanks to all
--
Regards,
Bhavesh Shah