Re: risks of using Hadoop
Hi, when you say that 0.20.205 will support appends, do you mean for general-purpose writes on HDFS, or only for HBase? Thanks, George

On 9/17/2011 7:08 AM, Uma Maheswara Rao G 72686 wrote: 6. If you plan to use Hbase, it requires append support. 20Append has the support for append. 0.20.205 release also will have append support but not yet released. Choose your correct version to avoid sudden surprises. Regards, Uma - Original Message - From: Kobina Kwarko Date: Saturday, September 17, 2011 3:42 am Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org We are planning to use Hadoop in my organisation for quality of services analysis out of CDR records from mobile operators. We are thinking of having a small cluster of may be 10 - 15 nodes and I'm preparing the proposal. my office requires that i provide some risk analysis in the proposal. thank you. On 16 September 2011 20:34, Uma Maheswara Rao G 72686 wrote: Hello, First of all where you are planning to use Hadoop? Regards, Uma - Original Message - From: Kobina Kwarko Date: Saturday, September 17, 2011 0:41 am Subject: risks of using Hadoop To: common-user Hello, Please can someone point some of the risks we may incur if we decide to implement Hadoop? BR, Isaac.

-- --- George Kousiouris Electrical and Computer Engineer Division of Communications, Electronics and Information Engineering School of Electrical and Computer Engineering Tel: +30 210 772 2546 Mobile: +30 6939354121 Fax: +30 210 772 2569 Email: gkous...@mail.ntua.gr Site: http://users.ntua.gr/gkousiou/ National Technical University of Athens 9 Heroon Polytechniou str., 157 73 Zografou, Athens, Greece
Re: Job Scheduler, Task Scheduler and Fair Scheduler
On Sep 16, 2011, at 11:26 PM, kartheek muthyala wrote: > Any updates!!

A bit of patience will help. It also helps to do some homework and ask specific questions. I don't know if you have looked at any of the code, but there are three schedulers: JobQueueTaskScheduler (aka the default or FIFO scheduler), the Capacity Scheduler (CS), and the Fair Scheduler (FS). TaskScheduler is just an interface for all schedulers (default, CS, FS). Then there is JobInProgress, which handles scheduling for the map tasks of an individual job based on data locality (JobInProgress.obtainNew*MapTask). Other than that, each of the schedulers (default, CS, FS) uses different criteria for picking a certain job to offer a 'slot' on a given TT when one is available. All this has changed radically with MRv2, which is now in branch-0.23 and trunk, to allow MR and non-MR apps on the same Hadoop cluster: http://wiki.apache.org/hadoop/NextGenMapReduce

Arun

> > -- Forwarded message -- > From: kartheek muthyala > Date: Fri, Sep 16, 2011 at 8:38 PM > Subject: Job Scheduler, Task Scheduler and Fair Scheduler > To: common-user@hadoop.apache.org > > > Hi all, > Can any one explain me the responsibilities of each scheduler?. I am > interested in the flow of commands that goes between these scheduler. And if > any one have any info regarding how the job scheduler schedules a job based > on the data locality?. As of I know, there is some heartbeat mechanism that > goes from task scheduler to job scheduler and in response job scheduler does > something here to find out the node where the data is more closely located > and schedules the task in that node. Is there an elaborate way of > explanation around this area?. Any help will be greatly appreciated. > Thanks and Regards, > Kartheek.
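For anyone trying to picture the heartbeat-driven flow described above: the sketch below is a toy Python illustration, not Hadoop's actual code, of the core idea in JobInProgress.obtainNewMapTask — when a TaskTracker heartbeats in with a free slot, prefer a pending map task whose input block has a replica on that node, and fall back to a non-local task otherwise. All names here are invented for illustration.

```python
# Toy sketch of locality-aware task assignment, loosely modeled on how
# the JobTracker offers a map slot to a heartbeating TaskTracker.
# Illustrative only -- names and structure are not Hadoop's actual code.

def assign_task(pending_tasks, heartbeat_node):
    """pending_tasks: list of (task_id, set_of_block_locations) tuples.
    Prefer a task whose input block lives on the heartbeating node."""
    # First pass: data-local task (a block replica sits on this node).
    for i, (task_id, locations) in enumerate(pending_tasks):
        if heartbeat_node in locations:
            return pending_tasks.pop(i)[0]
    # Fallback: any remaining task (rack-local / off-rack in real Hadoop).
    if pending_tasks:
        return pending_tasks.pop(0)[0]
    return None

tasks = [("m0", {"nodeA", "nodeB"}), ("m1", {"nodeC"}), ("m2", {"nodeB"})]
print(assign_task(tasks, "nodeB"))  # data-local: m0
print(assign_task(tasks, "nodeB"))  # data-local: m2
print(assign_task(tasks, "nodeB"))  # no local task left: m1
```

In real Hadoop the per-job locality choice above is only half the story; which *job* gets the slot first is what the default, Capacity, and Fair schedulers decide differently.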
Fwd: Job Scheduler, Task Scheduler and Fair Scheduler
Any updates!! -- Forwarded message -- From: kartheek muthyala Date: Fri, Sep 16, 2011 at 8:38 PM Subject: Job Scheduler, Task Scheduler and Fair Scheduler To: common-user@hadoop.apache.org Hi all, Can anyone explain the responsibilities of each scheduler? I am interested in the flow of commands between these schedulers. Does anyone have any information on how the job scheduler schedules a job based on data locality? As far as I know, there is a heartbeat mechanism from the task scheduler to the job scheduler, and in response the job scheduler finds the node where the data is located and schedules the task on that node. Is there a more elaborate explanation of this area? Any help will be greatly appreciated. Thanks and Regards, Kartheek.
Re: risks of using Hadoop
Hi Uma, Response very much appreciated. Thanks. Isaac. On 17 September 2011 05:08, Uma Maheswara Rao G 72686 wrote: > Hi Kobina, > > Some experiences which may helpful for you with respective to DFS. > > 1. Selecting the correct version. >I will recommend to use 0.20X version. This is pretty stable version and > all other organizations prefers it. Well tested as well. > Dont go for 21 version.This version is not a stable version.This is risk. > > 2. You should perform thorough test with your customer operations. > (of-course you will do this :-)) > > 3. 0.20x version has the problem of SPOF. > If NameNode goes down you will loose the data.One way of recovering is by > using the secondaryNameNode.You can recover the data till last > checkpoint.But here manual intervention is required. > In latest trunk SPOF will be addressed bu HDFS-1623. > > 4. 0.20x NameNodes can not scale. Federation changes included in latest > versions. ( i think in 22). this may not be the problem for your cluster. > But please consider this aspect as well. > > 5. Please select the hadoop version depending on your security > requirements. There are versions available for security as well in 0.20X. > > 6. If you plan to use Hbase, it requires append support. 20Append has the > support for append. 0.20.205 release also will have append support but not > yet released. Choose your correct version to avoid sudden surprises. > > > > Regards, > Uma > - Original Message - > From: Kobina Kwarko > Date: Saturday, September 17, 2011 3:42 am > Subject: Re: risks of using Hadoop > To: common-user@hadoop.apache.org > > > We are planning to use Hadoop in my organisation for quality of > > servicesanalysis out of CDR records from mobile operators. We are > > thinking of having > > a small cluster of may be 10 - 15 nodes and I'm preparing the > > proposal. my > > office requires that i provide some risk analysis in the proposal. > > > > thank you. 
> > > > On 16 September 2011 20:34, Uma Maheswara Rao G 72686 > > wrote: > > > > > Hello, > > > > > > First of all where you are planning to use Hadoop? > > > > > > Regards, > > > Uma > > > - Original Message - > > > From: Kobina Kwarko > > > Date: Saturday, September 17, 2011 0:41 am > > > Subject: risks of using Hadoop > > > To: common-user > > > > > > > Hello, > > > > > > > > Please can someone point some of the risks we may incur if we > > > > decide to > > > > implement Hadoop? > > > > > > > > BR, > > > > > > > > Isaac. > > > > > > > > > >
Re: risks of using Hadoop
Hi Kobina,

Some experiences which may be helpful for you with respect to DFS:

1. Selecting the correct version. I would recommend using a 0.20.x version. It is a pretty stable version, well tested, and other organizations prefer it. Don't go for the 0.21 version; it is not a stable version, and that is a risk.

2. You should perform thorough tests with your customer operations (of course you will do this :-)).

3. The 0.20.x versions have the problem of a SPOF. If the NameNode goes down you will lose the data. One way of recovering is by using the SecondaryNameNode: you can recover the data up to the last checkpoint, but manual intervention is required. In the latest trunk, the SPOF will be addressed by HDFS-1623.

4. 0.20.x NameNodes cannot scale. Federation changes are included in later versions (I think in 0.22). This may not be a problem for your cluster, but please consider this aspect as well.

5. Please select the Hadoop version depending on your security requirements. There are versions of 0.20.x available with security as well.

6. If you plan to use HBase, it requires append support. 20Append has the support for append. The 0.20.205 release will also have append support, but it is not yet released. Choose the correct version to avoid sudden surprises.

Regards,
Uma

- Original Message - From: Kobina Kwarko Date: Saturday, September 17, 2011 3:42 am Subject: Re: risks of using Hadoop To: common-user@hadoop.apache.org > We are planning to use Hadoop in my organisation for quality of > services analysis out of CDR records from mobile operators. We are > thinking of having > a small cluster of may be 10 - 15 nodes and I'm preparing the > proposal. my > office requires that i provide some risk analysis in the proposal. > > thank you. > > On 16 September 2011 20:34, Uma Maheswara Rao G 72686 > wrote: > > > Hello, > > > > First of all where you are planning to use Hadoop? 
> > > > Regards, > > Uma > > - Original Message - > > From: Kobina Kwarko > > Date: Saturday, September 17, 2011 0:41 am > > Subject: risks of using Hadoop > > To: common-user > > > > > Hello, > > > > > > Please can someone point some of the risks we may incur if we > > > decide to > > > implement Hadoop? > > > > > > BR, > > > > > > Isaac. > > > > > >
Re: risks of using Hadoop
We are planning to use Hadoop in my organisation for quality of service analysis of CDR records from mobile operators. We are thinking of having a small cluster of maybe 10-15 nodes, and I'm preparing the proposal. My office requires that I provide some risk analysis in the proposal. Thank you. On 16 September 2011 20:34, Uma Maheswara Rao G 72686 wrote: > Hello, > > First of all where you are planning to use Hadoop? > > Regards, > Uma > - Original Message - > From: Kobina Kwarko > Date: Saturday, September 17, 2011 0:41 am > Subject: risks of using Hadoop > To: common-user > > > Hello, > > > > Please can someone point some of the risks we may incur if we > > decide to > > implement Hadoop? > > > > BR, > > > > Isaac. > > >
RE: risks of using Hadoop
Risks? Well, if you come to Hadoop World in Nov, we actually have a presentation that might help reduce some of your initial risks. There are always risks when starting a new project. Regardless of the underlying technology, you have costs associated with failure, and unless you can level-set expectations you'll increase your odds of failure. Best advice... don't listen to sales critters or marketing folks. ;-) [Right Tom?] They have an agenda. ;-) > Date: Fri, 16 Sep 2011 20:11:20 +0100 > Subject: risks of using Hadoop > From: kobina.kwa...@gmail.com > To: common-user@hadoop.apache.org > > Hello, > > Please can someone point some of the risks we may incur if we decide to > implement Hadoop? > > BR, > > Isaac.
Re: risks of using Hadoop
And that once your business folks see what they have been missing, you'll never be able to stop giving them the benefit of that insight. --Original Message-- From: Harsh J To: common-user ReplyTo: common-user Subject: Re: risks of using Hadoop Sent: Sep 16, 2011 12:38 PM Hey Kobina, You might find some interesting results with your data that may change the world. Big risk, I'd say :-) On Sat, Sep 17, 2011 at 12:41 AM, Kobina Kwarko wrote: > Hello, > > Please can someone point some of the risks we may incur if we decide to > implement Hadoop? J/K. As Uma says, we need more context. -- Harsh J --- Sent from my Blackberry so please excuse typing and spelling errors.
Re: risks of using Hadoop
Hey Kobina, You might find some interesting results with your data that may change the world. Big risk, I'd say :-) On Sat, Sep 17, 2011 at 12:41 AM, Kobina Kwarko wrote: > Hello, > > Please can someone point some of the risks we may incur if we decide to > implement Hadoop? J/K. As Uma says, we need more context. -- Harsh J
Re: Creating a hive table for a custom log
Any ideas? The most common approach would be writing your own SerDe and plugging it into Hive, like: http://code.google.com/p/hive-json-serde/ But I'm wondering if there is some work already done in this area. Raimon Bosch wrote: > > Hi, > > I'm trying to create a table similar to apache_log but I'm trying to avoid > to write my own map-reduce task because I don't want to have my HDFS files > twice. > > So if you're working with log lines like this: > > 186.92.134.151 [31/Aug/2011:00:10:41 +] "GET > /client/action1/?transaction_id=8002&user_id=87179311248&ts=1314749223525&item1=271&item2=6045&environment=2 > HTTP/1.1" > > 112.201.65.238 [31/Aug/2011:00:10:41 +] "GET > /client/action1/?transaction_id=9002&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2 > HTTP/1.1" > > 90.45.198.251 [31/Aug/2011:00:10:41 +] "GET > /client/action2/?transaction_id=9022&ts=1314749223525&user_id=9048871793100&item2=6045&item1=271&environment=2 > HTTP/1.1" > > And having in mind that the parameters could be in different orders. Which > will be the best strategy to create this table? Write my own > org.apache.hadoop.hive.contrib.serde2? Is there any resource already > implemented that I could use to perform this task? > > In the end the objective is convert all the parameters in fields and use > as type the "action". With this big table I will be able to perform my > queries, my joins or my views. > > Any ideas? > > Thanks in Advance, > Raimon Bosch. > -- View this message in context: http://old.nabble.com/Creating-a-hive-table-for-a-custom-log-tp32379849p32481457.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
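As a first sanity check before writing a SerDe: the core of the problem is extracting the action from the request path and the query parameters into named fields regardless of their order. A small Python sketch of that parsing step (illustrative only; the regex, function, and field names are my own, not part of any existing SerDe):

```python
# Sketch of the parsing a custom SerDe (or a preprocessing step) would do:
# pull the action from the request path and the query parameters into a
# dict, regardless of parameter order. Illustrative only.
import re
from urllib.parse import parse_qsl

LINE_RE = re.compile(r'"GET /client/(?P<action>\w+)/\?(?P<query>\S+) HTTP')

def parse_log_line(line):
    m = LINE_RE.search(line)
    if m is None:
        return None
    fields = dict(parse_qsl(m.group("query")))  # order-independent
    fields["action"] = m.group("action")
    return fields

line = ('186.92.134.151 [31/Aug/2011:00:10:41 +] "GET '
        '/client/action1/?transaction_id=8002&user_id=87179311248'
        '&ts=1314749223525&item1=271&item2=6045&environment=2 HTTP/1.1"')
rec = parse_log_line(line)
print(rec["action"], rec["transaction_id"], rec["user_id"])
# -> action1 8002 87179311248
```

A RegexSerDe alone cannot handle the reordered parameters, which is why the dict-style extraction above (in a custom SerDe or a one-off normalization pass) is the natural shape for this table.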
Re: risks of using Hadoop
Hello, First of all, where are you planning to use Hadoop? Regards, Uma - Original Message - From: Kobina Kwarko Date: Saturday, September 17, 2011 0:41 am Subject: risks of using Hadoop To: common-user > Hello, > > Please can someone point some of the risks we may incur if we > decide to > implement Hadoop? > > BR, > > Isaac. >
risks of using Hadoop
Hello, Please can someone point some of the risks we may incur if we decide to implement Hadoop? BR, Isaac.
Running dependent jobs in 0.20.2
I'm using Hadoop 0.20.2. I would like to use ChainMapper to chain [Map / Reduce / Map]. I noticed that ChainMapper expects a JobConf object, which is deprecated in 0.20.2. Do I need to switch to using the deprecated JobConf, or is there a way to use ChainMapper with the current mapreduce.Job? Or something similar to ChainMapper? I did see that mapreduce.lib.chain.ChainMapper is available in 0.21.0, but I'd like to stay with a stable release. Any suggestions would be appreciated.
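For what it's worth, on 0.20.2 the old-API ChainMapper (which takes a JobConf) is the practical option, since the new-API version only arrives with mapreduce.lib.chain in 0.21. As a language-neutral illustration of what the chain computes, here is a toy Python sketch of the [Map / Reduce / Map] shape — purely conceptual, not Hadoop code:

```python
# Toy illustration (not Hadoop code) of the [Map / Reduce / Map] shape
# that ChainMapper/ChainReducer express: mapper1 -> reduce -> mapper2,
# with the second mapper running on the reducer's output records.
from itertools import groupby

def map1(record):            # e.g. tokenize into (word, 1) pairs
    for word in record.split():
        yield (word.lower(), 1)

def reduce_fn(key, values):  # e.g. sum the counts per key
    yield (key, sum(values))

def map2(kv):                # e.g. post-process the reducer's output
    key, count = kv
    yield (key, count * 10)

records = ["a b a", "b c"]
mapped = sorted(kv for r in records for kv in map1(r))   # shuffle/sort
reduced = [out for k, g in groupby(mapped, key=lambda kv: kv[0])
           for out in reduce_fn(k, (v for _, v in g))]
final = [out for kv in reduced for out in map2(kv)]
print(final)  # [('a', 20), ('b', 20), ('c', 10)]
```

The point of chaining the second mapper after the reducer (rather than running a second job) is that it avoids a second round of HDFS writes and shuffling, which is exactly what ChainMapper/ChainReducer buy you in Hadoop.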
Re: Tutorial about Security in Hadoop
Hi, please find the links below, which should help you understand more: https://media.blackhat.com/bh-us-10/whitepapers/Becherer/BlackHat-USA-2010-Becherer-Andrew-Hadoop-Security-wp.pdf http://markmail.org/download.xqy?id=yjdqleg3zv5pr54t&number=1 Regards, Uma - Original Message - From: Xianqing Yu Date: Friday, September 16, 2011 10:43 pm Subject: Tutorial about Security in Hadoop To: common-user@hadoop.apache.org > Hi Community, > > I am trying to install security mechanism in the Hadoop, for > instance, using > kerberos. However, I didn't find much information about it. Anyone > knows > that if there any link talking about the tutorial about installing > kerberos > in Hadoop? > > Thanks, > > Xianqing Yu > > -- > Graduate Research Assistant, Cyber Defense Lab > Department of Computer Science > North Carolina State University, Raleigh, NC > E-mail: x...@ncsu.edu > >
RE: Datanodes going down frequently
By KVM I was referring to Keyboard-Video-Mouse console. Basically a cart with a monitor, mouse & keyboard that you plug into a server for console access. Ah, yes, it does sound like your OS was having problems with memory then. We're not generally having problems with MR Jobs per-se, but it _appears_ that there is something going on when doing HDFS accesses. Most of our Jobs use a custom grouping & sorting comparators, but they aren't joins so probably not too intensive. Our newer cluster we are going to be using from now on is CDH3u1, and from the mailing list they don't really have a clue why we're seeing this behavior. We're running on FreeBSD with the Diablo-JVM (Java 1.6), which a guy on their list feels is a pretty unusual configuration that people aren't really running. --Aaron -Original Message- From: john smith [mailto:js1987.sm...@gmail.com] Sent: Friday, September 16, 2011 10:04 AM To: common-user@hadoop.apache.org Subject: Re: Datanodes going down frequently Hi Aaron, I haven't really run any MR jobs on my cluster till now. I've just been pushing data into the hdfs . So network shouldn't be a problem. Initially my HADOOP_HEAPSIZE was set to 2000MB and my ram size was 2GB . This resulted in datanodes going down randomly. I actually realized that the OS kept crashing and system went unresponsive until I manually power it on again. So I reduced the HADOOP_HEAPSIZE to 800MB and the cluster seems to be stable again and the datanodes are stable from the past few hours.(I am not sure though,I need to run a few heavy tasks to check it thoroughly). Looks like my problem wasn't with ethernet interface going down and its actually a full OS crash. I am not used to KVM , so i'll have to google it and i'll attach it to the datanodes and watch them closely incase they fail again in the future. What abt your cluster? Are you running any "suffle intense" jobs like JOINs or CROSS PRODUCTs ? 
Thanks On Fri, Sep 16, 2011 at 10:16 PM, Aaron Baff wrote: > John, > > Are the machines simply unreachable? Or has the OS crashed? We've been > having quite a few problems with our network mbufs filling up and not > getting released, which causes a machine to eventually become unreachable > via the network, although they are otherwise up and running fine. Can you > attach a KVM to a machine when it becomes unreachable and take a look? Or > add some monitoring to keep an eye on the network mbufs? Don't know if this > is your problem as well or not. > > --Aaron > -Original Message- > From: john smith [mailto:js1987.sm...@gmail.com] > Sent: Thursday, September 15, 2011 9:46 PM > To: common-user@hadoop.apache.org > Subject: Re: Datanodes going down frequently > > Hi All, > > Thanks for your inputs, > > @Aaron : No, they aren't recovering. They are losing network connectivity > and they are not getting it back. I am unable to ssh to them and I need to > manually go and restart the networking. > > @harsh and Raj, > > One thing I noticed in my hadoop-env.sh that "export HADOOP_HEAPSIZE=2000" > . Isn't this strange? Allocating my whole ram to the JVM ? Should I > consider > this? Right now I am not running any MR jobs as such . > > I've started my cluster and I've put around 30 to 40GB of data with a > replication factor of 3 . This takes the machines down. Looks like swapping > issue .. But how to see if I am swapping or not? Any help? > > Thanks > jS > > On Fri, Sep 16, 2011 at 10:03 AM, Harsh J wrote: > > > I bet its swapping. You may just be oversubscribing those machines > > with your MR slots and heap per slot or otherwise. Could also be low > > heap given number of blocks its gotta report (which would equate to a > > small files issue given your cluster size possibly, but that's a > > different discussion). 
> > > > On Fri, Sep 16, 2011 at 3:36 AM, john smith > > wrote: > > > Hi all, > > > > > > I am running a 10 node cluster (1NN + 9DN, ubuntu server 10.04, 2GB RAM > > > each). I am facing a strange problem. My datanodes go down randomly and > > > nothing showup in the logs. They lose their network connectivity > suddenly > > > and NN declares them as dead. Any one faced this problem? Is it because > > of > > > hadoop or is it some problem with my infrastructure? > > > > > > The worst part of the problem is, I need to manually go to the remote > > > machine and restart networking. Can someone help me with this? Did any > > one > > > face a similar kind of a problem > > > > > > Btw: my had version : 0.20.2 > > > > > > Thanks, > > > jS > > > > > > > > > > > -- > > Harsh J > > >
Tutorial about Security in Hadoop
Hi Community, I am trying to set up a security mechanism in Hadoop, for instance using Kerberos. However, I didn't find much information about it. Does anyone know of a link to a tutorial about setting up Kerberos in Hadoop? Thanks, Xianqing Yu -- Graduate Research Assistant, Cyber Defense Lab Department of Computer Science North Carolina State University, Raleigh, NC E-mail: x...@ncsu.edu
Re: Datanodes going down frequently
Hi Aaron, I haven't really run any MR jobs on my cluster till now; I've just been pushing data into HDFS, so the network shouldn't be a problem. Initially my HADOOP_HEAPSIZE was set to 2000MB and my RAM size was 2GB. This resulted in datanodes going down randomly. I actually realized that the OS kept crashing and the system went unresponsive until I manually powered it on again. So I reduced the HADOOP_HEAPSIZE to 800MB, and the cluster seems to be stable again; the datanodes have been stable for the past few hours. (I am not sure though; I need to run a few heavy tasks to check it thoroughly.) Looks like my problem wasn't the ethernet interface going down; it was actually a full OS crash. I am not used to KVM, so I'll have to google it, and I'll attach one to the datanodes and watch them closely in case they fail again in the future. What about your cluster? Are you running any "shuffle-intense" jobs like JOINs or CROSS PRODUCTs? Thanks On Fri, Sep 16, 2011 at 10:16 PM, Aaron Baff wrote: > John, > > Are the machines simply unreachable? Or has the OS crashed? We've been > having quite a few problems with our network mbufs filling up and not > getting released, which causes a machine to eventually become unreachable > via the network, although they are otherwise up and running fine. Can you > attach a KVM to a machine when it becomes unreachable and take a look? Or > add some monitoring to keep an eye on the network mbufs? Don't know if this > is your problem as well or not. > > --Aaron > -Original Message- > From: john smith [mailto:js1987.sm...@gmail.com] > Sent: Thursday, September 15, 2011 9:46 PM > To: common-user@hadoop.apache.org > Subject: Re: Datanodes going down frequently > > Hi All, > > Thanks for your inputs, > > @Aaron : No, they aren't recovering. They are losing network connectivity > and they are not getting it back. I am unable to ssh to them and I need to > manually go and restart the networking. 
> > @harsh and Raj, > > One thing I noticed in my hadoop-env.sh that "export HADOOP_HEAPSIZE=2000" > . Isn't this strange? Allocating my whole ram to the JVM ? Should I > consider > this? Right now I am not running any MR jobs as such . > > I've started my cluster and I've put around 30 to 40GB of data with a > replication factor of 3 . This takes the machines down. Looks like swapping > issue .. But how to see if I am swapping or not? Any help? > > Thanks > jS > > On Fri, Sep 16, 2011 at 10:03 AM, Harsh J wrote: > > > I bet its swapping. You may just be oversubscribing those machines > > with your MR slots and heap per slot or otherwise. Could also be low > > heap given number of blocks its gotta report (which would equate to a > > small files issue given your cluster size possibly, but that's a > > different discussion). > > > > On Fri, Sep 16, 2011 at 3:36 AM, john smith > > wrote: > > > Hi all, > > > > > > I am running a 10 node cluster (1NN + 9DN, ubuntu server 10.04, 2GB RAM > > > each). I am facing a strange problem. My datanodes go down randomly and > > > nothing showup in the logs. They lose their network connectivity > suddenly > > > and NN declares them as dead. Any one faced this problem? Is it because > > of > > > hadoop or is it some problem with my infrastructure? > > > > > > The worst part of the problem is, I need to manually go to the remote > > > machine and restart networking. Can someone help me with this? Did any > > one > > > face a similar kind of a problem > > > > > > Btw: my had version : 0.20.2 > > > > > > Thanks, > > > jS > > > > > > > > > > > -- > > Harsh J > > >
RE: Datanodes going down frequently
John, Are the machines simply unreachable? Or has the OS crashed? We've been having quite a few problems with our network mbufs filling up and not getting released, which causes a machine to eventually become unreachable via the network, although they are otherwise up and running fine. Can you attach a KVM to a machine when it becomes unreachable and take a look? Or add some monitoring to keep an eye on the network mbufs? Don't know if this is your problem as well or not. --Aaron -Original Message- From: john smith [mailto:js1987.sm...@gmail.com] Sent: Thursday, September 15, 2011 9:46 PM To: common-user@hadoop.apache.org Subject: Re: Datanodes going down frequently Hi All, Thanks for your inputs, @Aaron : No, they aren't recovering. They are losing network connectivity and they are not getting it back. I am unable to ssh to them and I need to manually go and restart the networking. @harsh and Raj, One thing I noticed in my hadoop-env.sh that "export HADOOP_HEAPSIZE=2000" . Isn't this strange? Allocating my whole ram to the JVM ? Should I consider this? Right now I am not running any MR jobs as such . I've started my cluster and I've put around 30 to 40GB of data with a replication factor of 3 . This takes the machines down. Looks like swapping issue .. But how to see if I am swapping or not? Any help? Thanks jS On Fri, Sep 16, 2011 at 10:03 AM, Harsh J wrote: > I bet its swapping. You may just be oversubscribing those machines > with your MR slots and heap per slot or otherwise. Could also be low > heap given number of blocks its gotta report (which would equate to a > small files issue given your cluster size possibly, but that's a > different discussion). > > On Fri, Sep 16, 2011 at 3:36 AM, john smith > wrote: > > Hi all, > > > > I am running a 10 node cluster (1NN + 9DN, ubuntu server 10.04, 2GB RAM > > each). I am facing a strange problem. My datanodes go down randomly and > > nothing showup in the logs. 
They lose their network connectivity suddenly > > and NN declares them as dead. Any one faced this problem? Is it because > of > > hadoop or is it some problem with my infrastructure? > > > > The worst part of the problem is, I need to manually go to the remote > > machine and restart networking. Can someone help me with this? Did any > one > > face a similar kind of a problem > > > > Btw: my had version : 0.20.2 > > > > Thanks, > > jS > > > > > > -- > Harsh J >
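On the "how to see if I am swapping" question in this thread: on Linux you can watch `vmstat 1` (the si/so columns) or `free`, or read the pswpin/pswpout counters in /proc/vmstat directly. A small Python sketch of the latter — illustrative only, with the parsing kept in a pure function:

```python
# Quick check for swap activity on Linux: the pswpin/pswpout counters in
# /proc/vmstat count pages swapped in/out since boot; if they climb while
# a job runs, the node is swapping. Parsing is a pure function so it can
# be exercised on sample text.

def swap_counters(vmstat_text):
    """Extract (pages swapped in, pages swapped out) from /proc/vmstat text."""
    counts = {}
    for line in vmstat_text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("pswpin", "pswpout"):
            counts[key] = int(value)
    return counts.get("pswpin", 0), counts.get("pswpout", 0)

if __name__ == "__main__":
    try:
        with open("/proc/vmstat") as f:
            swapped_in, swapped_out = swap_counters(f.read())
        print(f"pages swapped in: {swapped_in}, out: {swapped_out}")
    except FileNotFoundError:  # not a Linux /proc filesystem
        print("(/proc/vmstat not available on this system)")
```

On a 2GB node, a 2000MB HADOOP_HEAPSIZE leaves nothing for the OS, so nonzero and growing counters here would be expected; sampling the pair twice a few seconds apart shows the current rate.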
Job Scheduler, Task Scheduler and Fair Scheduler
Hi all, Can anyone explain the responsibilities of each scheduler? I am interested in the flow of commands between these schedulers. Does anyone have any information on how the job scheduler schedules a job based on data locality? As far as I know, there is a heartbeat mechanism from the task scheduler to the job scheduler, and in response the job scheduler finds the node where the data is located and schedules the task on that node. Is there a more elaborate explanation of this area? Any help will be greatly appreciated. Thanks and Regards, Kartheek.
Re: Running example application with capacity scheduler ?
Hi all! Problem found! I had set the queue properties in mapred-site.xml instead of capacity-scheduler.xml. Arun -- View this message in context: http://lucene.472066.n3.nabble.com/Running-example-application-with-capacity-scheduler-tp3335471p3341934.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
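For anyone hitting the same issue: with the 0.20.x capacity scheduler, the queue list and scheduler class are declared in mapred-site.xml, while the per-queue settings belong in capacity-scheduler.xml. A sketch of the split — the queue name "research" and the capacity values are examples; check the property names against your release:

```xml
<!-- mapred-site.xml: pick the scheduler and declare the queues -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>default,research</value>
</property>

<!-- capacity-scheduler.xml: per-queue settings go here, not in mapred-site.xml -->
<property>
  <name>mapred.capacity-scheduler.queue.default.capacity</name>
  <value>70</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.research.capacity</name>
  <value>30</value>
</property>
```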
Search over the index created by hadoop contrib/index
I have used the source code in hadoop contrib/index to build a Lucene index, but I didn't use shards; I used an indexpath (as can be seen in the code of UpdateIndex.java). Now, how can I search over this index? And if I can search, can it search for one word from a file? Thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/Search-over-the-index-created-by-hadoop-contrib-index-tp3341458p3341458.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
HELP NEEDED: What to do after crash and fsck says that .2% Blocks missing. Namenode in safemode
Just had an HDFS/HBase instance where all the slave/regionserver processes crashed, but the namenode stayed up. I did a proper shutdown of the namenode. After bringing Hadoop back up, the namenode is stuck in safe mode. Fsck shows 235 corrupt/missing blocks out of 117280 blocks. All the slaves are doing DataBlockScanner: Verification succeeded. As far as I can tell there are no errors in the datanodes. Can I expect it to self-heal? Or do I need to do something to help it along? Any way to tell how long it will take to recover if I do have to just wait? Other than the verification messages on the datanodes, the namenode fsck numbers are not changing, and the namenode log continues to say: The ratio of reported blocks 0.9980 has not reached the threshold 0.9990. Safe mode will be turned off automatically. The ratio has not changed for over an hour now. If you happen to know the answer, please get back to me right away by email or on #hadoop IRC, as I'm trying to figure it out now... Thanks! __ Robert J Berger - CTO Runa Inc. +1 408-838-8896 http://blog.ibd.com
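A quick check shows the numbers in the message line up: 235 missing blocks out of 117280 gives exactly the 0.9980 ratio the namenode log reports, which sits below the default safe-mode threshold — so the namenode will wait indefinitely rather than heal on its own unless those blocks reappear.

```python
# The namenode leaves safe mode only when reported_blocks / total_blocks
# reaches dfs.safemode.threshold.pct (default 0.999). Checking the numbers
# from the log message above:
total_blocks = 117280
missing = 235
ratio = (total_blocks - missing) / total_blocks
threshold = 0.999
print(f"ratio = {ratio:.4f}")  # matches the 0.9980 in the namenode log
print("will leave safe mode automatically:", ratio >= threshold)
```

With the ratio stuck below the threshold, the usual way out (assuming the blocks are genuinely gone) is to leave safe mode manually with `hadoop dfsadmin -safemode leave` and then use fsck to locate the files with missing blocks and move or delete them.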