Re: tasktracker keeps receiving KillJobAction and then deleting unknown jobs while using hive

2012-02-01 Thread alo alt
Hi,

+ hdfs-user (bcc'd)

Which JRE version do you use?

- Alex  

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:

 Hi,
 
 
 I'm using Hive to do some log analysis, and I have encountered a problem.
 
 My cluster has 3 nodes: one for the NameNode/JobTracker and the other two for 
 the DataNodes/TaskTrackers.
 
 One of the TaskTrackers will repeatedly receive KillJobAction and then delete 
 unknown jobs.
 
 The logs look like:
 
 2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 
 'KillJobAction' for job: job_201201301055_0381
 2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown 
 job job_201201301055_0381 being deleted.
 2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 
 'KillJobAction' for job: job_201201301055_0383
 2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown 
 job job_201201301055_0383 being deleted.
 2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 
 'KillJobAction' for job: job_201201301055_0384
 2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown 
 job job_201201301055_0384 being deleted.
 2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 
 'KillJobAction' for job: job_201201301055_0385
 2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown 
 job job_201201301055_0385 being deleted.  
 
 This happens occasionally, and when it does, the TaskTracker will do 
 nothing but keep receiving KillJobAction and deleting unknown jobs, and thus 
 performance drops.
 
 To recover from this, I have to restart the cluster,
 but obviously that is not a good solution.
 
 These jobs eventually run on the other TaskTracker, where they run 
 well and succeed.
 
 Has anybody encountered this problem? Could you give me some advice?
 
 Occasionally there are also error logs like:
 
 2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 55837: readAndProcess threw exception java.io.IOException: 
 Connection reset by peer. Count of bytes read: 0
 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcher.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
 at sun.nio.ch.IOUtil.read(IOUtil.java:175)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
 at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
 at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
 at 
 org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
 at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
 at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)
 2012-01-31 13:11:40,211 INFO org.apache.hadoop.mapred.JvmManager: JVM : 
 jvm_201201311041_0071_r_-1096994286 exited. Number of tasks it ran: 0
 2012-01-31 13:11:40,214 INFO org.apache.hadoop.mapred.TaskTracker: Killing 
 unknown JVM jvm_201201311041_0071_r_-386575334
 2012-01-31 13:11:40,221 INFO org.apache.hadoop.ipc.Server: IPC Server 
 listener on 55837: readAndProcess threw exception java.io.IOException: 
 Connection reset by peer. Count of bytes read: 0
 java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcher.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
 at sun.nio.ch.IOUtil.read(IOUtil.java:175)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
 at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)
 at org.apache.hadoop.ipc.Server.access$2300(Server.java:77)
 at 
 org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:799)
 at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:419)
 at org.apache.hadoop.ipc.Server$Listener.run(Server.java:328)  
 
 Is there some connection between these two errors?
 
 Thank you very much!
 
 xiaobin



Re: tasktracker keeps receiving KillJobAction and then deleting unknown jobs while using hive

2012-02-01 Thread Xiaobin She
hi Alex,

I'm using JRE 1.6.0_24

with Hadoop 0.20.0
and Hive 0.8.0.

thx


2012/2/1 alo alt wget.n...@googlemail.com

 Hi,

 + hdfs-user (bcc'd)

 Which JRE version do you use?

 - Alex

 --
 Alexander Lorenz
 http://mapredit.blogspot.com

 On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:

  [original message quoted in full; snipped]




Re: tasktracker keeps receiving KillJobAction and then deleting unknown jobs while using hive

2012-02-01 Thread alo alt
How many NameNode handlers (dfs.namenode.handler.count) have you defined for 
your cluster?
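
For reference, that property lives in hdfs-site.xml; a minimal sketch (the 
value 32 below is only an illustrative assumption, not a recommendation):

  <!-- hdfs-site.xml: number of RPC handler threads on the NameNode -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>32</value> <!-- illustrative; tune for your cluster size -->
  </property>

Changing it requires a NameNode restart.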

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 12:25 PM, Xiaobin She wrote:

 
 hi Alex,
 
 I'm using JRE 1.6.0_24
 
 with Hadoop 0.20.0
 and Hive 0.8.0.
 
 thx
 
 
 2012/2/1 alo alt wget.n...@googlemail.com
 Hi,
 
 + hdfs-user (bcc'd)
 
 Which JRE version do you use?
 
 - Alex
 
 --
 Alexander Lorenz
 http://mapredit.blogspot.com
 
 On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:
 
  [original message quoted in full; snipped]
 
 




Can hive 0.8.1 work with hadoop 0.23.0?

2012-02-01 Thread 张晓峰
Hi,

 

I installed Hadoop 0.23.0, which works.

The version of my Hive is 0.8.1. A query like ‘select * from tablename’
works, but an exception is thrown when executing a query like ‘select col1
from tablename’.

 

2012-02-01 16:32:20,296 WARN  mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(139)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

2012-02-01 16:32:20,389 INFO  mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(388)) - Cleaning up the staging area file:/tmp/hadoop-hadoop/mapred/staging/hadoop-469936305/.staging/job_local_0001

2012-02-01 16:32:20,392 ERROR exec.ExecDriver (SessionState.java:printError(380)) - Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /home/hadoop/hive-0.8.1/lib/hive-builtins-0.8.1.jar)'

java.io.FileNotFoundException: File does not exist: /home/hadoop/hive-0.8.1/lib/hive-builtins-0.8.1.jar
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:764)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
        at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:246)
        at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:284)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:355)
        at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1159)
        at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1156)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1156)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:571)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:710)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
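
From the trace it looks like the local jar path is being resolved against HDFS 
(DistributedFileSystem.getFileStatus). Would mirroring that path into HDFS be a 
reasonable workaround? A sketch of what I mean (an untested assumption on my part):

  # assumption: make the builtins jar visible at the same absolute path in HDFS
  hadoop fs -mkdir /home/hadoop/hive-0.8.1/lib
  hadoop fs -put /home/hadoop/hive-0.8.1/lib/hive-builtins-0.8.1.jar \
      /home/hadoop/hive-0.8.1/lib/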

 

Thanks,

xiaofeng

 



hive with metastore limits maps per node

2012-02-01 Thread Clint Green
Hive list,

 

I am facing a unique situation here where using Hive (0.7.1) with a remote,
external metastore (pgsql) is limiting the number of map slots per node to 1
(out of 25 slots available).  Other map jobs successfully utilize all
available slots; only Hive jobs are limited across the 10-node cluster
(CDH2).

 

This was not occurring before, and we are not setting the max maps per node
in the job.xml to force this.

 

The only significant change was upgrading Java to 1.6.0_30 (per other
requirements).

 

Thank you all for your time on this,

 

Clint

 




Re: Hive query result in sequence file

2012-02-01 Thread Mark Grover
Andrew,
This might come in handy:
http://www.congiu.com/node/7

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 
e: mgro...@oanda.com 

Best Trading Platform - World Finance's Forex Awards 2009. 
The One to Watch - Treasury Today's Adam Smith Awards 2009. 


- Original Message -
From: jingjung Ng jingjun...@gmail.com
To: user@hive.apache.org
Sent: Wednesday, January 25, 2012 1:47:12 PM
Subject: Re: Hive query result in sequence file

Thanks Aniket. 


I am pretty new to Hive. Any Java example (SerDe) for achieving this? 


-Andrew 



On Wed, Jan 25, 2012 at 12:12 AM, Aniket Mokashi  aniket...@gmail.com  wrote: 


You will have to write your own SerDe. 


Hive can write a SequenceFile, but it will be Text values with a NULL (BytesWritable) key. 
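
A minimal sketch of that default path (table, column, and join-key names here 
are placeholders):

  CREATE TABLE result_seq (name STRING, address STRING, phone STRING)
  STORED AS SEQUENCEFILE;

  INSERT OVERWRITE TABLE result_seq
  SELECT name, address, phone FROM t1 JOIN t2 ON (t1.id = t2.id);

To get name as the key and the other fields as the value, you would plug your 
own SerDe into the table definition instead.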


Thanks, 
Aniket 




On Tue, Jan 24, 2012 at 11:41 PM, jingjung Ng  jingjun...@gmail.com  wrote: 


Hi, 

I have the following Hive query (pseudo Hive query code): 

select name, address, phone from t1 join t2 

Executing the above query ends up with a file stored in name, address, phone 
format on the file system (HDFS or local). 

However, I'd like to write to a sequence file instead (key: name; value: 
address and phone). 

Is this possible, and if so, how could I do it? 


Thank you. 

JingJung 




-- 
...:::Aniket:::... Quetzalco@tl 



Re: Invoke a UDAF inside another UDAF

2012-02-01 Thread rohan monga
thanks Mark,
I ended up going the custom reducer way. I will try out the query you have
sent.

Regards,
--
Rohan Monga


On Wed, Feb 1, 2012 at 11:06 AM, Mark Grover mgro...@oanda.com wrote:

 Rohan,
 You could do it one of the following ways:
 1) Write a UDAF that does the avg(f2 - avg_f2) computation.
 2) Write a custom reducer that does the avg(f2 - avg_f2) computation.
 3) Do it with multiple passes over the data. Something like this
 (untested):

 select
   table.f1,
   avg_table.avg_f2,
   avg(table.f2-avg_table.avg_f2)
 from
 (
 select
   f1,
   avg(f2) as avg_f2
 from
   table
 group by
   f1)avg_table
 join
 table
 ON (avg_table.f1=table.f1)
 group by
   table.f1,
   avg_table.avg_f2;

 Mark

 Mark Grover, Business Intelligence Analyst
 OANDA Corporation

 www: oanda.com www: fxtrade.com
 e: mgro...@oanda.com

 Best Trading Platform - World Finance's Forex Awards 2009.
 The One to Watch - Treasury Today's Adam Smith Awards 2009.


 - Original Message -
 From: rohan monga monga.ro...@gmail.com
 To: user@hive.apache.org
 Sent: Friday, January 20, 2012 6:00:54 PM
 Subject: Re: Invoke a UDAF inside another UDAF

My bad, I hastily converted the query to a wrong example.

 it should be like this

 select f1, avg(f2) as avg_f2, avg(f2 - avg_f2) from table group by f1;

 In essence, I just want to use the value generated by one UDAF (in this
 case avg(f2)) as a single number and then apply that value to the group
 inside a different UDAF.
 For example, if I were to use a streaming reducer, it would be something like
 this:

 avg1 = computeSum(list) / len(list)
 return computeSum(x-avg1 for x in list) / len(list)

 As I write this I realize why this might not be possible [ the group
 computation being done in one step and the information being lost ] :)

 But why the NullPointerException?

 Regards,
 --
 Rohan Monga



 On Fri, Jan 20, 2012 at 2:32 PM, Edward Capriolo  edlinuxg...@gmail.com 
 wrote:


 IMHO you cannot possibly nest the percentile calculation, because the
 results would be meaningless. percentile has to aggregate a set and
 pick the Nth element, but if you nest, then the inner percentile only
 returns one result to the outer percentile, and that is pretty
 meaningless.

 (I think someone talked about this on the list in the last month or so.)
 Without seeing your input data and your expected results, I cannot
 understand what your query wants to do, or suggest an alternative.





 On 1/20/12, rohan monga  monga.ro...@gmail.com  wrote:
  Thanks Edward, that seems to work :)
 
  However, I have another query like this:
 
  select a, avg(b) as avg_b, percentile_approx( avg_b - percentile_approx(
 b,
  .5), .5 ) from table1 group by a
 
  Here I will lose the group info if I include the inner query in the FROM
  clause. Is there a way to get this to work?
 
  Thanks,
  --
  Rohan Monga
 
 
  On Fri, Jan 20, 2012 at 12:51 PM, Edward Capriolo
   edlinuxg...@gmail.com wrote:
 
  I think if you are grouping by b, b has to be in your select list. Try
  this:
  FROM (
    select b, count(a) as theCount from table1 group by b
  ) a select mean(theCount);
 
  I think that should work.
 
  On 1/20/12, rohan monga  monga.ro...@gmail.com  wrote:
   Hi,
   I am trying to run a query like
   select mean(count(a)) from table1 group by b;
  
   I am getting the following error
   snip
   FAILED: Hive Internal Error: java.lang.NullPointerException(null)
   java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:151)
        at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:656)
        at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:777)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
        at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:157)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7447)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7405)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2747)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:3365)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5858)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6480)
        at

RE: Hive ODBC - Microsoft's Involvement

2012-02-01 Thread Tucker, Matt
The Hive driver that Microsoft will be releasing is ODBC, so you should be able 
to interact with Hive just like you would with any other relational database.

From: John Omernik [mailto:j...@omernik.com]
Sent: Wednesday, February 01, 2012 3:22 PM
To: user@hive.apache.org
Subject: Hive ODBC - Microsoft's Involvement

Does anyone know if the driver Microsoft is talking about with their Azure-based 
Hadoop/Hive setup would work for connecting Windows applications 
(Excel, .NET web apps, etc.) to Apache Hive running on Unix?  Looking for a way to 
connect .NET web apps to Hive for some process-flow upgrades.

Thanks!



Re: Exception when hive submits M/R jobs

2012-02-01 Thread Sam William
I have resolved this, so I'll share what the issue was.


I had set HIVE_AUX_JARS_PATH in my hive-env.sh

as

HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar

The empty HIVE_AUX_JARS_PATH was causing the exception. 

The following fix made it work:

if [ -z "$HIVE_AUX_JARS_PATH" ]; then
  HIVE_AUX_JARS_PATH=$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar
else
  HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH,$HIVE_HOME/lib/jar1.jar,$HIVE_HOME/lib/jar2.jar,$HIVE_HOME/lib/jar3.jar
fi


Thanks,
Sam


On Jan 31, 2012, at 11:50 AM, Sam William wrote:

 
 I have a new Hive installation. I'm able to create tables and do select * 
 queries from them. But as soon as I try to execute a query that would 
 involve a Hadoop M/R job, I get this exception:
 
 
 
 java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
        at org.apache.hadoop.fs.Path.<init>(Path.java:90)
        at org.apache.hadoop.fs.Path.<init>(Path.java:50)
        at org.apache.hadoop.mapred.JobClient.copyRemoteFiles(JobClient.java:608)
        at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:713)
        at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:637)
        at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:170)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:848)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
 
 
 
 The table is pretty simple. It is an external table on HDFS and does 
 not have any partitions. Any idea why this could be happening?
 
 
 
 Thanks,
 Sam William
 sa...@stumbleupon.com
 
 
 

Sam William
sa...@stumbleupon.com





get_json_object escape characters

2012-02-01 Thread Sean McNamara
Is it possible to escape '.' in get_json_object?

For example:  {"a.b": "test"}

get_json_object(json, '$.a.b')  will return NULL because it's looking for a 
nested object.


Something like this would be nice:  get_json_object(json, '$.a\\.b')


Beyond changing how the JSON object is keyed, is there anything I can do that 
I'm missing?
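
One thing I'm considering as a workaround: json_tuple, which (as I understand 
it, untested) matches top-level keys literally rather than as a path, so the 
dot may not be special there. A sketch (table and column names are placeholders):

  SELECT t.v
  FROM logs
  LATERAL VIEW json_tuple(logs.json, 'a.b') t AS v;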

Thanks

Sean






Re: Hive ODBC - Microsoft's Involvement

2012-02-01 Thread John Omernik
I see that, but will that Hive ODBC driver work with a standard Hive
install, or will it be limited to Microsoft's cloud version of Hadoop/Hive?
 Has anyone tried the driver?

On Wed, Feb 1, 2012 at 4:23 PM, Tucker, Matt matt.tuc...@disney.com wrote:

 The Hive driver that Microsoft will be releasing is ODBC, so you should be
 able to interact with Hive just like you would with any other relational
 database.

  [quoted text snipped]



Re: Hive ODBC - Microsoft's Involvement

2012-02-01 Thread Viral Bajaria
Any reason you want to use ODBC and not Thrift?  Hive supports the
Thrift protocol. There are Thrift libraries for C#, and you can easily
integrate them into your project for direct access to Hive from your C# code.
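
For illustration, a rough sketch against the HiveServer Thrift interface, 
shown in Java for brevity; a C# client generated from the same 
hive_service.thrift follows the same call sequence (host, port, and query 
here are placeholders):

  import java.util.List;
  import org.apache.hadoop.hive.service.ThriftHive;
  import org.apache.thrift.protocol.TBinaryProtocol;
  import org.apache.thrift.transport.TSocket;
  import org.apache.thrift.transport.TTransport;

  public class HiveThriftSketch {
      public static void main(String[] args) throws Exception {
          // placeholder host/port: a server started with `hive --service hiveserver`
          TTransport transport = new TSocket("localhost", 10000);
          transport.open();
          ThriftHive.Client client = new ThriftHive.Client(new TBinaryProtocol(transport));
          client.execute("SELECT name, address FROM t1");  // placeholder query
          List<String> rows = client.fetchAll();  // rows come back as delimited strings
          for (String row : rows) {
              System.out.println(row);
          }
          transport.close();
      }
  }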

On Wed, Feb 1, 2012 at 6:40 PM, John Omernik j...@omernik.com wrote:

 I see that, but will that hive ODBC driver work with a standard hive
 install, or will it be limited to Microsoft's cloud version of Hadoop/Hive?
  Anyone tried the driver?


 On Wed, Feb 1, 2012 at 4:23 PM, Tucker, Matt matt.tuc...@disney.com wrote:

  [quoted text snipped]





Re: Hive ODBC - Microsoft's Involvement

2012-02-01 Thread Chris Shain
I've tried it. It seems to work fine, but with ODBC you still need to send
SQL commands to the server, and Hive SQL is incomplete and non-ANSI
compliant in many ways. This means that an application that uses ANSI SQL
will not always generate Hive-friendly queries.
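
As a concrete illustration (my example, not something from the driver docs): 
IN subqueries, which many ODBC front-ends emit, are not supported by Hive at 
this point and have to be rewritten as joins:

  -- typical tool-generated ANSI SQL that Hive rejects:
  SELECT name FROM t1 WHERE id IN (SELECT id FROM t2);

  -- a Hive-friendly rewrite (LEFT SEMI JOIN is the Hive idiom):
  SELECT t1.name FROM t1 LEFT SEMI JOIN t2 ON (t1.id = t2.id);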

They also have an Excel connector under development, which you can get
if you are on the beta.
On Feb 1, 2012 9:41 PM, John Omernik j...@omernik.com wrote:

 I see that, but will that hive ODBC driver work with a standard hive
 install, or will it be limited to Microsoft's cloud version of Hadoop/Hive?
  Anyone tried the driver?

 On Wed, Feb 1, 2012 at 4:23 PM, Tucker, Matt matt.tuc...@disney.com wrote:

  [quoted text snipped]





Re: Hive ODBC - Microsoft's Involvement

2012-02-01 Thread Chris Shain
By the way, I tried it on CDH3 Hive.
On Feb 1, 2012 10:02 PM, Chris Shain ch...@tresata.com wrote:

  [quoted text snipped]





Problem creating a table in Hive

2012-02-01 Thread Bhavesh Shah
Hello all,

I am trying a Sqoop import from SQL Server into Hive.
When I execute the sqoop-import command, the import task gets
completed and
I can see the complete data on HDFS (under
/user/hive/warehouse/table_name_dir),
but when I execute the SHOW TABLES command in the Hive CLI I am not able to see
the table in the list.

(Once I tried this: after importing the table, the same thing happened as
above, so I ran a CREATE TABLE query in the Hive CLI with the same fields
as the imported table, and then I was able to see and access the table in
the Hive CLI.  But I don't think doing this every time is an effective way.)

Please suggest a solution.
Is there any step I missed, or is there some other problem?
I don't understand why this is happening.
Thanks to all.



-- 
Regards,
Bhavesh Shah