fileSystem.create(...) is blocked when selector is closed
I got this selector exception, and all my threads are blocked at the FileSystem.create(...) level. Has anyone seen this issue before? I'm running 0.18.3.

java.nio.channels.ClosedSelectorException
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:66)
        at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
        at sun.nio.ch.Util.releaseTemporarySelector(Util.java:135)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:301)
        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:178)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:820)
        at org.apache.hadoop.ipc.Client.call(Client.java:705)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2302)
        at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:471)
        at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:178)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:503)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:391)

All other threads are blocked when calling fileSystem.create(...):

"Thread-20" prio=5 tid=0x000101a2f000 nid=0x11ff2a000 in Object.wait() [0x00011ff28000..0x00011ff29ad0]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.ipc.Client.call(Client.java:710)
        - locked <0x000107cc9430> (a org.apache.hadoop.ipc.Client$Call)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2302)
        at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:471)
        at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:178)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:503)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:484)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:391)
Re: anyone knows why setting mapred.tasktracker.map.tasks.maximum not working?
Not exactly. When I run a standalone setup, meaning one server acting as namenode, datanode, jobtracker, and tasktracker, with the map maximum configured to 10: I have 174 files of 62~75 MB, and my block size is 65 MB. I can see that 189 map tasks are generated for the job, but only 2 are running; the others are waiting. When I add another datanode with the same tasktracker settings, the same job (still 189 map tasks) runs 12 map tasks at once: 2 slots on my original node and 10 on the new datanode. I just can't figure out why the first node runs only 2 map tasks while 10 are available.

On Tue, Apr 21, 2009 at 7:47 PM, jason hadoop wrote:
> There must be only 2 input splits being produced for your job.
> Either you have 2 unsplitable files, or the input file(s) you have are not
> large enough compared to the block size to be split.
>
> Table 6-1 in chapter 06 gives a breakdown of all of the configuration
> parameters that affect split size in hadoop 0.19. Alphas are available :)
>
> This is detailed in my book in ch06
>
> On Tue, Apr 21, 2009 at 5:07 PM, javateck javateck wrote:
>
> > anyone knows why setting *mapred.tasktracker.map.tasks.maximum* not
> > working?
> > I set it to 10, but still see only 2 map tasks running when running one job
>
> --
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
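Jason's point about splits can be sanity-checked with some quick arithmetic. The sketch below is a standalone simplification of the 0.18-era FileInputFormat split loop (hypothetical code; the real logic also honors mapred.min.split.size, the requested split count, and unsplittable compressed formats), showing why 62~75 MB files against a 65 MB block size can yield more map tasks than files:

```java
public class SplitCount {
    // FileInputFormat only carves off a block-sized chunk while the
    // remaining bytes exceed SPLIT_SLOP times the split size.
    static final double SPLIT_SLOP = 1.1;

    static int numSplits(long fileSize, long blockSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / blockSize > SPLIT_SLOP) {
            splits++;
            remaining -= blockSize;
        }
        if (remaining > 0) splits++; // trailing chunk becomes its own split
        return splits;
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024L;
        System.out.println(numSplits(62 * MB, 65 * MB)); // 1 split
        System.out.println(numSplits(75 * MB, 65 * MB)); // 2 splits (75/65 > 1.1)
    }
}
```

On these numbers, 174 files in the 62~75 MB range plausibly produce 189 splits: every file yields at least one map task, and each file larger than ~71.5 MB (65 MB x 1.1) yields two.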
anyone knows why setting mapred.tasktracker.map.tasks.maximum not working?
Does anyone know why setting *mapred.tasktracker.map.tasks.maximum* is not working? I set it to 10, but I still see only 2 map tasks running when running one job.
Re: mapred.tasktracker.map.tasks.maximum
Something I want to clarify: for the max task slots, are these the places to check?

1. hadoop-site.xml
2. the specific job's job.conf, which can be retrieved through the job, for example logs/job_200904212336_0002_conf.xml

Is there any other place that limits the map task count? In my case it's strange: I set "mapred.tasktracker.map.tasks.maximum" to 10, and the job's conf also shows 10, but hadoop is only using 2 map slots.

On Tue, Apr 21, 2009 at 1:20 PM, javateck javateck wrote:
> I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a
> task, it's only using 2 out of 10, any way to know why it's only using 2?
> thanks
Re: mapred.tasktracker.map.tasks.maximum
no, it's a plain text file, \t delimited. And I'm expecting one mapper per file: I have 175 files, and I see 189 map tasks in the web UI. My issue is that since I have 189 map tasks waiting, why is hadoop using only 2 of my 10 map slots? I assume all map tasks are independent.

On Tue, Apr 21, 2009 at 2:23 PM, Miles Osborne wrote:
> is your input data compressed? if so then you will get one mapper per file
>
> Miles
>
> 2009/4/21 javateck javateck :
> > Hi Koji,
> >
> > Thanks for helping.
> >
> > I don't know why hadoop is just using 2 out of 10 map task slots.
> >
> > Sure, I just cut and paste the job tracker web UI, clearly I set the max
> > tasks to 10 (which I can verify from hadoop-site.xml and from the individual
> > job configuration also), and I did have the first mapreduce running at 10
> > map tasks when I checked from UI, but all subsequent queries are running
> > with 2 map tasks. And I have almost 176 files with each input file around
> > 62~75MB.
> >
> > mapred.tasktracker.map.tasks.maximum: 10
> >
> > Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed
> > map     28.04%      189        134      2        53        0       0 / 0
> > reduce  0.00%       1          1        0        0         0       0 / 0
> >
> > On Tue, Apr 21, 2009 at 1:56 PM, Koji Noguchi wrote:
> >
> >> It's probably a silly question, but you do have more than 2 mappers on
> >> your second job?
> >>
> >> If yes, I have no idea what's happening.
> >>
> >> Koji
> >>
> >> -Original Message-
> >> From: javateck javateck [mailto:javat...@gmail.com]
> >> Sent: Tuesday, April 21, 2009 1:38 PM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: mapred.tasktracker.map.tasks.maximum
> >>
> >> right, I set it in hadoop-site.xml before starting the whole hadoop
> >> processes, I have one job running fully utilizing the 10 map tasks, but
> >> subsequent queries are only using 2 of them, don't know why.
> >> I have enough RAM also, no paging out is happening, I'm running on
> >> 0.18.3.
> >> Right now I put all processes on one machine, namenode, datanode,
> >> jobtracker, tasktracker, I have a 2*4core CPU, and 20GB RAM.
> >>
> >> On Tue, Apr 21, 2009 at 1:25 PM, Koji Noguchi wrote:
> >>
> >> > This is a cluster config and not a per job config.
> >> >
> >> > So this has to be set when the mapreduce cluster first comes up.
> >> >
> >> > Koji
> >> >
> >> > -Original Message-
> >> > From: javateck javateck [mailto:javat...@gmail.com]
> >> > Sent: Tuesday, April 21, 2009 1:20 PM
> >> > To: core-user@hadoop.apache.org
> >> > Subject: mapred.tasktracker.map.tasks.maximum
> >> >
> >> > I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a
> >> > task, it's only using 2 out of 10, any way to know why it's only using
> >> > 2?
> >> > thanks
> >> >
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
Re: mapred.tasktracker.map.tasks.maximum
Hi Koji,

Thanks for helping.

I don't know why hadoop is just using 2 out of the 10 map task slots.

Sure, I just cut and pasted from the job tracker web UI. Clearly I set the max tasks to 10 (which I can verify from hadoop-site.xml and from the individual job configuration as well), and the first mapreduce job did run at 10 map tasks when I checked the UI, but all subsequent queries are running with 2 map tasks. And I have almost 176 files, each around 62~75MB.

mapred.tasktracker.map.tasks.maximum: 10

Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed
map     28.04%      189        134      2        53        0       0 / 0
reduce  0.00%       1          1        0        0         0       0 / 0

On Tue, Apr 21, 2009 at 1:56 PM, Koji Noguchi wrote:
> It's probably a silly question, but you do have more than 2 mappers on
> your second job?
>
> If yes, I have no idea what's happening.
>
> Koji
>
> -Original Message-
> From: javateck javateck [mailto:javat...@gmail.com]
> Sent: Tuesday, April 21, 2009 1:38 PM
> To: core-user@hadoop.apache.org
> Subject: Re: mapred.tasktracker.map.tasks.maximum
>
> right, I set it in hadoop-site.xml before starting the whole hadoop
> processes, I have one job running fully utilizing the 10 map tasks, but
> subsequent queries are only using 2 of them, don't know why.
> I have enough RAM also, no paging out is happening, I'm running on
> 0.18.3.
> Right now I put all processes on one machine, namenode, datanode,
> jobtracker, tasktracker, I have a 2*4core CPU, and 20GB RAM.
>
> On Tue, Apr 21, 2009 at 1:25 PM, Koji Noguchi wrote:
>
> > This is a cluster config and not a per job config.
> >
> > So this has to be set when the mapreduce cluster first comes up.
> >
> > Koji
> >
> > -Original Message-
> > From: javateck javateck [mailto:javat...@gmail.com]
> > Sent: Tuesday, April 21, 2009 1:20 PM
> > To: core-user@hadoop.apache.org
> > Subject: mapred.tasktracker.map.tasks.maximum
> >
> > I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a
> > task, it's only using 2 out of 10, any way to know why it's only using
> > 2?
> > thanks
> >
Re: mapred.tasktracker.map.tasks.maximum
right, I set it in hadoop-site.xml before starting the whole hadoop stack. I have one job that fully utilizes the 10 map tasks, but subsequent queries only use 2 of them; I don't know why.
I have enough RAM too, and no paging out is happening. I'm running on 0.18.3.
Right now I have all processes on one machine (namenode, datanode, jobtracker, tasktracker), with a 2*4-core CPU and 20GB RAM.

On Tue, Apr 21, 2009 at 1:25 PM, Koji Noguchi wrote:
> This is a cluster config and not a per job config.
>
> So this has to be set when the mapreduce cluster first comes up.
>
> Koji
>
> -Original Message-
> From: javateck javateck [mailto:javat...@gmail.com]
> Sent: Tuesday, April 21, 2009 1:20 PM
> To: core-user@hadoop.apache.org
> Subject: mapred.tasktracker.map.tasks.maximum
>
> I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a
> task, it's only using 2 out of 10, any way to know why it's only using
> 2?
> thanks
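For reference, a minimal sketch of the hadoop-site.xml entry under discussion (the description wording is illustrative, not copied from the shipped defaults). As Koji notes above, the TaskTracker reads this once at startup, so it must be in place before the cluster comes up:

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>10</value>
  <description>Maximum number of map tasks one TaskTracker runs
  simultaneously. Cluster-wide daemon setting, not per-job; read
  when the TaskTracker starts.</description>
</property>
```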
mapred.tasktracker.map.tasks.maximum
I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a task, it's only using 2 of the 10 slots. Is there any way to find out why it's only using 2? thanks
raw files become zero bytes when mapreduce job hit outofmemory error
I'm running some mapreduce jobs, and some of them hit outofmemory errors. I find that the raw input data itself also gets corrupted: the files become zero bytes. This is very strange to me. I haven't looked into it in detail, but I wanted to check quickly whether anyone has seen this. I'm running 0.18.3. thanks
Re: API: FSDataOutputStream create(Path f, boolean overwrite)
sorry, it's my fault, it's working as expected

On Sun, Apr 12, 2009 at 12:43 AM, javateck javateck wrote:
> Hi:
> I'm trying to use "FSDataOutputStream create(Path f, boolean
> overwrite)", I'm calling "create(new Path("somePath"), false)", but creation
> still fails with IOException even when the file does not exist, can someone
> explain the behavior?
>
> thanks,
API: FSDataOutputStream create(Path f, boolean overwrite)
Hi: I'm trying to use "FSDataOutputStream create(Path f, boolean overwrite)". I'm calling "create(new Path("somePath"), false)", but the creation still fails with an IOException even when the file does not exist. Can someone explain the behavior?

thanks,
does hadoop have any way to append to an existing file?
Hi, does hadoop have any way to append to an existing file? For example, I wrote some content to a file, and later on I want to append more content to it. thanks,
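For what it's worth, the 0.18 line had no working append; an append API on FileSystem arrived in later releases (0.19+), gated by a configuration flag. A hedged sketch of that later-version setting, not applicable to 0.18.3:

```xml
<property>
  <name>dfs.support.append</name>
  <value>true</value>
  <description>Enables the HDFS append code path in 0.19+ releases;
  early implementations were considered experimental.</description>
</property>
```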
safemode forever
Hi, I'm wondering if anyone has a solution for a namenode that stays in safe mode forever; is there any way to get around it? thanks,

error:
org.apache.hadoop.dfs.SafeModeException: Cannot delete /mapred/system. Name node is in safe mode. The ratio of reported blocks 0.4696 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
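That ratio means only ~47% of known blocks have been reported by datanodes, so check first that all datanodes actually came up. If the missing blocks are expected to stay missing (e.g. datanodes that are gone for good), safe mode can be left manually with the dfsadmin tool; illustrative commands, run against the running cluster:

```
# Inspect safe mode state, then force the namenode out of it.
# Only do this once you understand why blocks are missing.
hadoop dfsadmin -safemode get
hadoop dfsadmin -safemode leave
```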
hadoop 0.18.3 writing not flushing to hadoop server?
I have a strange issue: when I write to hadoop, I find that the content is not transferred to the hadoop server even after a long time. Is there any way to force the local temp files to be flushed to hadoop after writing? When I shut down the VM, the data does get flushed. thanks,
HDFS data block clarification
Can someone tell me whether a file will occupy one or more blocks? For example, the default block size is 64MB; if I save a 4K file to HDFS, will the 4K file occupy a whole 64MB block by itself? If so, do I need to configure the block size down to something like 10K if most of my files are smaller than 10K? thanks,
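On the arithmetic half of the question: a file spans ceil(fileSize / blockSize) blocks, but HDFS does not pad the last block, so a 4K file consumes roughly 4K of datanode disk rather than 64MB (the real cost of many small files is namenode memory for the per-block metadata). A small standalone sketch of the block count:

```java
public class HdfsBlockMath {
    // A file spans ceil(fileSize / blockSize) blocks; the last block only
    // holds the file's actual remaining bytes -- HDFS blocks are not
    // padded out to the configured block size on disk.
    static long blocksUsed(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize; // ceiling division
    }

    public static void main(String[] args) {
        long KB = 1024L, MB = 1024L * 1024L;
        System.out.println(blocksUsed(4 * KB, 64 * MB));   // 1 block, ~4 KB on disk
        System.out.println(blocksUsed(130 * MB, 64 * MB)); // 3 blocks
    }
}
```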
Re: Running MapReduce without setJar
you can run it from a java program:

    JobConf conf = new JobConf(MapReduceWork.class);
    // set your params
    JobClient.runJob(conf);

On Wed, Apr 1, 2009 at 11:42 AM, Farhan Husain wrote:
> Can I get rid of the whole jar thing? Is there any way to run map reduce
> programs without using a jar? I do not want to use "hadoop jar ..." either.
>
> On Wed, Apr 1, 2009 at 1:10 PM, javateck javateck wrote:
>
> > I think you need to set a property (mapred.jar) inside hadoop-site.xml,
> > then you don't need to hardcode in your java code, and it will be fine.
> > But I don't know if there is any way that we can set multiple jars, since
> > a lot of times our own mapreduce class needs to reference other jars.
> >
> > On Wed, Apr 1, 2009 at 10:57 AM, Farhan Husain wrote:
> >
> > > Hello,
> > >
> > > Can anyone tell me if there is any way running a map-reduce job from a
> > > java program without specifying the jar file by JobConf.setJar() method?
> > >
> > > Thanks,
> > >
> > > --
> > > Mohammad Farhan Husain
> > > Research Assistant
> > > Department of Computer Science
> > > Erik Jonsson School of Engineering and Computer Science
> > > University of Texas at Dallas
>
> --
> Mohammad Farhan Husain
> Research Assistant
> Department of Computer Science
> Erik Jonsson School of Engineering and Computer Science
> University of Texas at Dallas
Re: Running MapReduce without setJar
I think you need to set a property (mapred.jar) inside hadoop-site.xml; then you don't need to hardcode it in your java code, and it will be fine.
But I don't know if there is any way to set multiple jars, since a lot of the time our own mapreduce class needs to reference other jars.

On Wed, Apr 1, 2009 at 10:57 AM, Farhan Husain wrote:
> Hello,
>
> Can anyone tell me if there is any way running a map-reduce job from a java
> program without specifying the jar file by JobConf.setJar() method?
>
> Thanks,
>
> --
> Mohammad Farhan Husain
> Research Assistant
> Department of Computer Science
> Erik Jonsson School of Engineering and Computer Science
> University of Texas at Dallas