Re: Question about log files
I noticed that too. I think Hadoop keeps the file open all the time, so when you delete it, it is simply no longer able to write to it and doesn't try to recreate it. Not sure if it's a Log4j problem or a Hadoop one... yanghaogn, what is the *correct* way to delete the Hadoop logs? I haven't found anything better than deleting the file and restarting the service...

On Mon, Apr 6, 2015 at 9:27 AM, 杨浩 yangha...@gmail.com wrote:
I think the log information has been lost. Hadoop is not designed for that; you deleted these files incorrectly.

2015-04-02 11:45 GMT+08:00 煜 韦 yu20...@hotmail.com:
Hi there, if log files are deleted without restarting the service, the logs seem to be lost from then on, for example on the namenode or a datanode. Why can't log files be re-created when deleted, by mistake or on purpose, while the cluster is running?
Thanks, Jared
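A likely explanation: the daemon's Log4j appender holds an open file descriptor, so deleting the file only unlinks its name while the process keeps writing to the orphaned inode. A minimal sketch of the usual workaround, truncating the file in place instead of deleting it (the path below is hypothetical; actual Hadoop log locations vary by installation):

```shell
# Hypothetical log path; adjust for your installation.
LOGFILE=/tmp/hadoop-demo.log

echo "old noisy logs" > "$LOGFILE"

# Truncate in place: the file (and any open descriptor on it) survives,
# only its contents are discarded, so the daemon keeps logging to it.
: > "$LOGFILE"

echo "new log line" >> "$LOGFILE"
cat "$LOGFILE"
```

Because the inode is unchanged, the writer never notices; with a plain `rm`, the disk space is not even freed until the daemon closes or is restarted.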
Re: Prune out data to a specific reduce task
As far as I know, the code running in each reducer is the same one you specify in your reduce function, so if you know in advance the features of the data you want to ignore, you can just instruct the reducers to do so. If you can tell whether or not to keep an entry at the beginning, you can filter entries out within the map function. Think of a wordcount example where we tell the map phase to ignore all words starting with a specific letter... What kind of data are you processing, and what is the filtering condition? Anyway, I'm sorry I can't help with the actual code, but I'm not really into this right now.

On Wed, Mar 11, 2015 at 12:13 PM, xeonmailinglist-gmail xeonmailingl...@gmail.com wrote:
Maybe the correct question is: how can I filter data in MapReduce in Java?

On 11-03-2015 10:36, xeonmailinglist-gmail wrote:
To exclude data from a specific reducer, should I build a partitioner that does this? Should I have a map function that checks which reduce task the output goes to? Can anyone give me a suggestion? And by the way, I really do want to exclude data from one reduce task, so I will run more than one reducer even if one of them gets no input data.

On 11-03-2015 10:28, xeonmailinglist-gmail wrote:
Hi, I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude the data that would go to reduce task 2. This means that only reducer 1 will produce output; the other will be empty, or may not even execute. How can I do this in MapReduce? [image: Example Job Execution]
Thanks,
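To make the wordcount idea concrete, here is a minimal sketch of the map-side filtering predicate in plain Java (no Hadoop dependencies; the `keep` helper and class name are made up for illustration). Inside a real Mapper you would apply the same check in `map()` and simply skip `context.write()` for the words you want to drop, so they never reach any reducer:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapSideFilterSketch {

    // Hypothetical predicate: keep words that do NOT start with the given letter.
    // In a Hadoop Mapper you would call this in map() and only emit when it returns true.
    static boolean keep(String word, char skipInitial) {
        return word.isEmpty()
                || Character.toLowerCase(word.charAt(0)) != Character.toLowerCase(skipInitial);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "Avocado", "cherry");
        // Filtering at map time means dropped records are never shuffled to a reducer.
        List<String> kept = words.stream()
                .filter(w -> keep(w, 'a'))
                .collect(Collectors.toList());
        System.out.println(kept); // [banana, cherry]
    }
}
```

The alternative mentioned in the thread, a custom Partitioner that routes everything away from reducer 2, would still shuffle the unwanted data; filtering in the map function avoids that transfer entirely.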
Can RM ignore heartbeats?
Hi everyone, I have a question about the ResourceManager behavior: when the ResourceManager allocates a container, some time passes before the NMToken is sent and then received by the ApplicationMaster. During this window, the RM may receive another heartbeat from the AM identical to the last one (since the AM is not yet aware of the allocated resources). Is there any policy in YARN that makes the RM aware of this and ignore that last heartbeat? I ask because, without such a policy, I would expect many more superfluous containers to be allocated than the ones I can actually see in the logs.
Thanks in advance
Fabio
Re: hadoop learning
Hi Rishabh, I didn't know anything about Hadoop a few months ago, and I started from the very beginning. I don't suggest starting with the online documentation, which is often fragmented, incomplete, and sometimes not even up to date. Likewise, starting by directly using Hadoop is the fastest way to frustration and will just lead you to abandon this technology. I can suggest two books I used to start with; they were quite helpful for someone who didn't even know what MapReduce is, and they provide many examples and use cases (especially the first one):
- O'Reilly - Hadoop: The Definitive Guide, 3rd Edition. This is quite old but, aside from the coding part, it explains quite well what Hadoop is, what it does, and how it works. It is mainly about older versions of Hadoop, but I believe it's something you should know, especially because most articles online still use the pre-YARN terminology.
- Addison-Wesley Professional - Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2. This is what I used to really understand the new Hadoop architecture and terminology. Sometimes it gives too many details, but better more than less. It also has a couple of chapters about installing Hadoop.
Good luck
Fabio

On Sat, Feb 21, 2015 at 3:33 PM, Ted Yu yuzhih...@gmail.com wrote:
Rishabh: You can start with: http://wiki.apache.org/hadoop/HowToContribute
There are several components: common, hdfs, YARN, mapreduce, ... Which ones are you interested in?
Cheers

On Sat, Feb 21, 2015 at 12:18 AM, Bhupendra Gupta bhupendra1...@gmail.com wrote:
I have been learning and trying to implement a Hadoop ecosystem for a POC over the last month or so, and I think the best way to learn is by doing. Hadoop as a concept has many implementations, and I picked the Hortonworks sandbox for learning. This has helped me grasp some of the concepts along with some practical understanding as well.
Happy learning
Sent from my iPhone
Bhupendra Gupta

On 21-Feb-2015, at 1:39 pm, Rishabh Agrawal ss.rishab...@gmail.com wrote:
Hello, please tell me where I can learn the concepts of Big Data and Hadoop from scratch. Please provide some links online.
Rishabh Agrawal
Steps for container release
Hi everyone, I was trying to understand the process that makes the resources of a container available again to the ResourceManager. As far as I can guess from the logs, the AM:
- sends a stop request to the NodeManager for the specific container
- then immediately tells the RM about the release of the resources, which become available (queues are re-sorted).
Actually, I was expecting the RM to wait for an acknowledgment from the NM (through the NM-RM heartbeat) about the actual end of the container, but it looks like the resources are made available upon receiving this info from the AM (through the AM-RM heartbeat). Maybe the container decommission time is so small as to be irrelevant? The logs are at INFO level, and I can't change them to DEBUG since I'm not the only one using the cluster, so maybe I am missing something.
Thanks
Fabio