Re: Seeking examples beyond word count
See also the Mahout project (http://lucene.apache.org/mahout/). A mahout is a person who drives an elephant. :)

On Wed, Oct 15, 2008 at 1:00 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> On Oct 14, 2008, at 8:37 PM, Bert Schmidt wrote:
>
>> I'm trying to think of how I might use it, yet all the examples I find
>> are variations of word count.
>
> Look in the src/examples directory.
>
> PiEstimator - estimates the value of pi using distributed brute force
> Pentomino - solves Pentomino tile placement problems, including one-sided
> variants
> Terasort - tools to generate the required data, sort it into a total
> order, and verify the sort order
>
> There is also distcp in src/tools, which uses map/reduce to copy a lot of
> files between clusters.
>
>> Are there any interesting examples of how people are using it for real
>> tasks?
>
> A final pointer would be to Nutch, which uses Hadoop for distribution.
>
> -- Owen

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
Re: Need to reboot the whole system if adding new datanodes?
As long as the new node is in the slaves file on the master, just run start-all.sh and it will attempt to start everything. Nodes that are already running will keep running, and new nodes will be started. Consider doing a rebalance after adding a new node for better distribution.

-paul

On Oct 15, 2008, at 1:55 AM, "Amit k. Saha" <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 15, 2008 at 9:09 AM, David Wei <[EMAIL PROTECTED]> wrote:
>> It seems that we need to restart the whole hadoop system in order to
>> add new nodes inside the cluster. Any solution for us that doesn't
>> need a reboot?
>
> From what I know so far, you have to start the HDFS daemon (which reads
> the 'slaves' file) to 'let it know' which are the data nodes. So every
> time you add a new DataNode, I believe you will have to restart the
> daemon, which is like re-initiating the NameNode.
>
> Hope I am not very wrong :-)
>
> Best,
> Amit
>
> --
> Amit Kumar Saha
> http://blogs.sun.com/amitsaha/
> http://amitsaha.in.googlepages.com/
> Skype: amitkumarsaha
Re: Need to reboot the whole system if adding new datanodes?
You can use the hadoop-daemon.sh script provided in the bin folder. These are the steps. On the new machine to be added:

1.) Ensure the hadoop config is pointing to the right namenode.
2.) Run bin/hadoop-daemon.sh start datanode

This should add the datanode without needing a restart of the complete cluster.

- Prasad.

On Wednesday 15 October 2008 11:25:29 am Amit k. Saha wrote:
> On Wed, Oct 15, 2008 at 9:09 AM, David Wei <[EMAIL PROTECTED]> wrote:
> > It seems that we need to restart the whole hadoop system in order to
> > add new nodes inside the cluster. Any solution for us that doesn't
> > need a reboot?
>
> From what I know so far, you have to start the HDFS daemon (which reads
> the 'slaves' file) to 'let it know' which are the data nodes. So every
> time you add a new DataNode, I believe you will have to restart the
> daemon, which is like re-initiating the NameNode.
>
> Hope I am not very wrong :-)
>
> Best,
> Amit
Re: Need to reboot the whole system if adding new datanodes?
On Wed, Oct 15, 2008 at 9:09 AM, David Wei <[EMAIL PROTECTED]> wrote:
> It seems that we need to restart the whole hadoop system in order to
> add new nodes inside the cluster. Any solution for us that doesn't
> need a reboot?

From what I know so far, you have to start the HDFS daemon (which reads the 'slaves' file) to 'let it know' which are the data nodes. So every time you add a new DataNode, I believe you will have to restart the daemon, which is like re-initiating the NameNode.

Hope I am not very wrong :-)

Best,
Amit

--
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha
Re: Are There Books on Hadoop/Pig?
On Wed, Oct 15, 2008 at 4:10 AM, Steve Gao <[EMAIL PROTECTED]> wrote:
> Does anybody know if there are books about hadoop or pig? The wiki and
> manual are kind of ad-hoc and hard to comprehend; for example, "I want
> to know how to apply patches to my Hadoop, but can't find how to do
> it", that kind of thing.
>
> Would anybody help? Thanks.

http://oreilly.com/catalog/9780596521998/

HTH,
Amit

--
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha
Re: graphics in hadoop
Yes, I will write it up on the hadoop wiki. Is there a way other than copying from the local filesystem to HDFS, like writing directly to HDFS?

Thanks
S.Chandravadana

Steve Loughran wrote:
>
> chandravadana wrote:
>> hi
>> Thanks all.. your guidelines helped me a lot.
>> I'm using JFreeChart... when I set
>> System.setProperty("java.awt.headless", "true");
>> I'm able to run this properly...
>
> this is good; consider writing this up on the hadoop wiki
>
>> If I specify the path (where the chart is to be saved) as the local
>> filesystem, I'm able to save the chart, but if I set the path to be
>> HDFS, then I'm unable to. So what changes do I need to make?
>
> You'll need to copy the local file to HDFS after it is rendered.
>
>> Thanks
>> Chandravadana.S
>>
>> Steve Loughran wrote:
>>> Alex Loddengaard wrote:
>>>> Hadoop runs Java code, so you can do anything that Java could do.
>>>> This means that you can create and/or analyze images. However, as
>>>> Lukas has said, Hadoop runs on a cluster of computers and is used
>>>> for data storage and processing.
>>>
>>> - If you are trying to do 2D graphics (AWT operations included) on
>>> unix servers, you often need to have X11 up and running before the
>>> rendering works
>>> - You need to start whichever JVM runs your rendering code with the
>>> property java.awt.headless=true; you can actually set this in your
>>> code.
>>> - If the rendering code uses the OS/hardware, then different hardware
>>> can render differently. This may not be visible to the eye, but it
>>> makes testing more complex as the generated bitmaps can be slightly
>>> different from machine to machine
>>>
>>> -steve
>
> --
> Steve Loughran http://www.1060.org/blogxter/publish/5
> Author: Ant in Action http://antbook.org/
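[To write the chart straight to HDFS without the intermediate local file, one option is a sketch along these lines: JFreeChart's ChartUtilities.writeChartAsPNG accepts any OutputStream, so it can render directly into a stream opened with the Hadoop FileSystem API. The destination path below is hypothetical.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.jfree.chart.ChartUtilities;
    import org.jfree.chart.JFreeChart;

    public class ChartToHdfs {
        // Render a chart directly into HDFS; no local file is created.
        static void save(JFreeChart chart, String dest) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // e.g. dest = "/user/chandra/chart.png" (hypothetical path)
            FSDataOutputStream out = fs.create(new Path(dest));
            try {
                // writeChartAsPNG takes any OutputStream, including HDFS streams
                ChartUtilities.writeChartAsPNG(out, chart, 640, 480);
            } finally {
                out.close();
            }
        }
    }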
Re: Seeking examples beyond word count
On Oct 14, 2008, at 8:37 PM, Bert Schmidt wrote:

> I'm trying to think of how I might use it, yet all the examples I find
> are variations of word count.

Look in the src/examples directory.

PiEstimator - estimates the value of pi using distributed brute force
Pentomino - solves Pentomino tile placement problems, including one-sided variants
Terasort - tools to generate the required data, sort it into a total order, and verify the sort order

There is also distcp in src/tools, which uses map/reduce to copy a lot of files between clusters.

> Are there any interesting examples of how people are using it for real
> tasks?

A final pointer would be to Nutch, which uses Hadoop for distribution.

-- Owen
Need to reboot the whole system if adding new datanodes?
It seems that we need to restart the whole hadoop system in order to add new nodes inside the cluster. Is there any solution that doesn't require a reboot?

PS: We have just one namenode in the cluster.

Thx!

David
Seeking examples beyond word count
I think I now grasp the mechanics of MapReduce and Hadoop. I'm trying to think of how I might use it, yet all the examples I find are variations of word count. Are there any interesting examples of how people are using it for real tasks? I am not necessarily looking for code (though that would be quite welcome), just some brief descriptions of the types of problems it is good at solving.

Thanks in advance,

-- Bert
Re: getting HDFS to rack-aware mode
On the master, I can execute this command OK:

-bash-3.00$ ./bin/hadoop fsck /
.
/tmp/hadoop-hadoop/mapred/system/job_200810100944_0001/job.jar:  Under
replicated blk_6972591866335308074_1001. Target Replicas is 10 but found
2 replica(s).
Status: HEALTHY
 Total size:    2798816 B
 Total dirs:    10
 Total files:   5
 Total blocks (validated):      5 (avg. block size 559763 B)
 Minimally replicated blocks:   5 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1 (20.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              8 (80.0 %)
 Number of data-nodes:          2
 Number of racks:               1

imcaptor wrote:

> I get this error:
>
> -bash-3.00$ ./bin/hadoop fsck /
> Exception in thread "main" java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>         at java.net.Socket.connect(Socket.java:519)
>         at java.net.Socket.connect(Socket.java:469)
>         at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:382)
>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:509)
>         at sun.net.www.http.HttpClient.<init>(HttpClient.java:231)
>         at sun.net.www.http.HttpClient.New(HttpClient.java:304)
>         at sun.net.www.http.HttpClient.New(HttpClient.java:316)
>         at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:813)
>         at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:765)
>         at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:690)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:934)
>         at org.apache.hadoop.dfs.DFSck.run(DFSck.java:116)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.dfs.DFSck.main(DFSck.java:137)
>
> Yi-Kai Tsai wrote:
>
>> hi Sriram
>>
>> Running hadoop fsck / will give you a summary of the current HDFS
>> status, including some useful information:
>>
>> Minimally replicated blocks:   51224 (100.0 %)
>> Over-replicated blocks:        0 (0.0 %)
>> Under-replicated blocks:       0 (0.0 %)
>> Mis-replicated blocks:         7 (0.013665469 %)
>> Default replication factor:    3
>> Average block replication:     3.0
>> Missing replicas:              0 (0.0 %)
>> Number of data-nodes:          83
>> Number of racks:               6
>>
>>> Hi,
>>>
>>> We have a cluster where we are running HDFS in non-rack-aware mode.
>>> Now we want to switch HDFS to run in rack-aware mode. Apart from the
>>> config changes (and restarting HDFS), to rackify the existing data,
>>> we were thinking of increasing/decreasing the replication level a few
>>> times to get the data spread. Are there any tools that will enable us
>>> to know when we are "done"?
>>>
>>> Sriram
Re: getting HDFS to rack-aware mode
I get this error:

-bash-3.00$ ./bin/hadoop fsck /
Exception in thread "main" java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.Socket.connect(Socket.java:519)
        at java.net.Socket.connect(Socket.java:469)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:382)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:509)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:231)
        at sun.net.www.http.HttpClient.New(HttpClient.java:304)
        at sun.net.www.http.HttpClient.New(HttpClient.java:316)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:813)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:765)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:690)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:934)
        at org.apache.hadoop.dfs.DFSck.run(DFSck.java:116)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.dfs.DFSck.main(DFSck.java:137)

Yi-Kai Tsai wrote:

> hi Sriram
>
> Running hadoop fsck / will give you a summary of the current HDFS
> status, including some useful information:
>
> Minimally replicated blocks:   51224 (100.0 %)
> Over-replicated blocks:        0 (0.0 %)
> Under-replicated blocks:       0 (0.0 %)
> Mis-replicated blocks:         7 (0.013665469 %)
> Default replication factor:    3
> Average block replication:     3.0
> Missing replicas:              0 (0.0 %)
> Number of data-nodes:          83
> Number of racks:               6
>
>> Hi,
>>
>> We have a cluster where we are running HDFS in non-rack-aware mode.
>> Now we want to switch HDFS to run in rack-aware mode. Apart from the
>> config changes (and restarting HDFS), to rackify the existing data,
>> we were thinking of increasing/decreasing the replication level a few
>> times to get the data spread. Are there any tools that will enable us
>> to know when we are "done"?
>>
>> Sriram
Re: getting HDFS to rack-aware mode
hi Sriram

Running hadoop fsck / will give you a summary of the current HDFS status, including some useful information:

 Minimally replicated blocks:   51224 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:      0 (0.0 %)
 Mis-replicated blocks:         7 (0.013665469 %)
 Default replication factor:    3
 Average block replication:     3.0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          83
 Number of racks:               6

> Hi,
>
> We have a cluster where we are running HDFS in non-rack-aware mode. Now
> we want to switch HDFS to run in rack-aware mode. Apart from the config
> changes (and restarting HDFS), to rackify the existing data, we were
> thinking of increasing/decreasing the replication level a few times to
> get the data spread. Are there any tools that will enable us to know
> when we are "done"?
>
> Sriram

--
Yi-Kai Tsai (cuma) <[EMAIL PROTECTED]>, Asia Regional Search Engineering.
Are There Books on Hadoop/Pig?
Does anybody know if there are books about hadoop or pig? The wiki and manual are kind of ad-hoc and hard to comprehend; for example, "I want to know how to apply patches to my Hadoop, but can't find how to do it", that kind of thing.

Would anybody help? Thanks.
Re: Hadoop for real time
Hi.

Video storage, processing and streaming.

Regards.

2008/9/25 Edward J. Yoon <[EMAIL PROTECTED]>

> What kind of real-time app?
>
> On Wed, Sep 24, 2008 at 4:50 AM, Stas Oskin <[EMAIL PROTECTED]> wrote:
> > Hi.
> >
> > Is it possible to use Hadoop for a real-time app, in the video
> > processing field?
> >
> > Regards.
>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org
Re: getting HDFS to rack-aware mode
Using the -w option with the setrep command will make it wait until replication is done. Then run fsck to check whether all blocks are on at least two racks.

Hairong

On 10/14/08 12:06 PM, "Sriram Rao" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> We have a cluster where we are running HDFS in non-rack-aware mode. Now
> we want to switch HDFS to run in rack-aware mode. Apart from the config
> changes (and restarting HDFS), to rackify the existing data, we were
> thinking of increasing/decreasing the replication level a few times to
> get the data spread. Are there any tools that will enable us to know
> when we are "done"?
>
> Sriram
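[For reference, a minimal sketch of doing the same programmatically with FileSystem.setReplication; the shell form Hairong mentions is along the lines of bin/hadoop dfs -setrep -w <rep> <path>, with -R to recurse. Note that, unlike -w, the API call returns as soon as the namenode records the new target, and the actual re-replication happens in the background. The path below is hypothetical.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetRep {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/user/sriram/data/part-00000"); // hypothetical
            // Raise the target so extra copies get spread across racks.
            fs.setReplication(file, (short) 4);
            // ... once fsck reports the blocks where you want them,
            // drop back down and let the excess replicas be pruned:
            fs.setReplication(file, (short) 3);
        }
    }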
getting HDFS to rack-aware mode
Hi,

We have a cluster where we are running HDFS in non-rack-aware mode. Now we want to switch HDFS to run in rack-aware mode. Apart from the config changes (and restarting HDFS), to rackify the existing data, we were thinking of increasing/decreasing the replication level a few times to get the data spread. Are there any tools that will enable us to know when we are "done"?

Sriram
Re: Getting the number of reduce_output_records
On Oct 10, 2008, at 12:52 AM, Edward J. Yoon wrote:

> Hi,
>
> To get the number of reduce_output_records, I wrote code like this:
>
>    long rows = rJob.getCounters().findCounter(
>        "org.apache.hadoop.mapred.Task$Counter", 8, "REDUCE_OUTPUT_RECORDS")
>        .getCounter();

http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/mapred/Counters.html#findCounter(java.lang.Enum)

Arun

> I want to know another method to get it, since findCounter(String group,
> int id, String name) is deprecated.
>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org
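[A sketch of the non-deprecated lookup from that javadoc, assuming the Task.Counter enum is visible to your code in this Hadoop version; if it is not, the group/name form in the comment avoids it.]

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.RunningJob;
    import org.apache.hadoop.mapred.Task;

    public class CounterLookup {
        // Number of reduce output records for a completed job.
        static long reduceOutputRecords(RunningJob rJob) throws java.io.IOException {
            Counters counters = rJob.getCounters();
            // Enum-based lookup via the non-deprecated getCounter(Enum):
            return counters.getCounter(Task.Counter.REDUCE_OUTPUT_RECORDS);
            // Alternative, by group and counter name:
            // return counters.getGroup("org.apache.hadoop.mapred.Task$Counter")
            //                .getCounter("REDUCE_OUTPUT_RECORDS");
        }
    }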
Re: How to get absolute path
Path.getParent() returns the parent of a path.

On Tue, Oct 14, 2008 at 7:30 PM, Tarandeep Singh <[EMAIL PROTECTED]> wrote:
> Hi,
>
> How can I get the absolute path /user/taran/logfiles/log.txt
> from Path - new Path("logfiles/log.txt")?
>
> Thanks,
> Taran

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
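[If what's wanted is the fully qualified form, a sketch assuming Path.makeQualified(FileSystem), which resolves a relative path against the filesystem's URI and its working directory, /user/<username> by default:]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class QualifyPath {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path relative = new Path("logfiles/log.txt");
            // Resolved against the default filesystem and working directory,
            // e.g. hdfs://namenode:9000/user/taran/logfiles/log.txt
            Path absolute = relative.makeQualified(fs);
            System.out.println(absolute);
        }
    }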
How can I fix this LogConfigurationException?
Hi all,

I am trying to write a file directly into HDFS using Java. However, when I run the Java project, I get an error message like the following. Please help me.

Exception in thread "main" java.lang.ExceptionInInitializerError
        at WordWriter.main(WordWriter.java:17)
Caused by: org.apache.commons.logging.LogConfigurationException:
org.apache.commons.logging.LogConfigurationException:
java.lang.NullPointerException (Caused by java.lang.NullPointerException)
(Caused by org.apache.commons.logging.LogConfigurationException:
java.lang.NullPointerException (Caused by java.lang.NullPointerException))
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:543)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
        at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:128)
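[The failure happens inside commons-logging before any HDFS code runs, which usually points to a missing or conflicting commons-logging jar on the classpath; make sure the jars shipped in Hadoop's lib/ directory are the ones being used. For the writing itself, a minimal sketch against the FileSystem API; the output path is hypothetical, and the Configuration picks up hadoop-site.xml from the classpath.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WordWriter {
        public static void main(String[] args) throws Exception {
            // Reads hadoop-site.xml / hadoop-default.xml from the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Create a file directly in HDFS (hypothetical path).
            FSDataOutputStream out = fs.create(new Path("/user/hadoop/words.txt"));
            out.writeBytes("hello hdfs\n");
            out.close();
        }
    }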
Hadoop Training
Hey all,

Just wanted to make a quick announcement that Scale Unlimited has started delivering its 2-day Hadoop Boot Camp. More info here:
http://www.scaleunlimited.com/hadoop-bootcamp.html

We are currently offering the classes on-site within the US/UK/EU to those companies needing to get a team up to speed rapidly, but we are working to put together a public class in the Bay Area. Please email me if you are interested, so we can gauge demand.

Also, we may be in NY for the NY Hadoop User Group; if any org out there wants to throw together a class during the week of Nov. 10, again, give me a shout.

cheers,
chris

--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
How to get absolute path
Hi,

How can I get the absolute path /user/taran/logfiles/log.txt
from Path - new Path("logfiles/log.txt")?

Thanks,
Taran
Re: graphics in hadoop
chandravadana wrote:
> hi
> Thanks all.. your guidelines helped me a lot.
> I'm using JFreeChart... when I set
> System.setProperty("java.awt.headless", "true");
> I'm able to run this properly...

this is good; consider writing this up on the hadoop wiki

> If I specify the path (where the chart is to be saved) as the local
> filesystem, I'm able to save the chart, but if I set the path to be
> HDFS, then I'm unable to. So what changes do I need to make?

You'll need to copy the local file to HDFS after it is rendered.

> Thanks
> Chandravadana.S
>
> Steve Loughran wrote:
>> Alex Loddengaard wrote:
>>> Hadoop runs Java code, so you can do anything that Java could do.
>>> This means that you can create and/or analyze images. However, as
>>> Lukas has said, Hadoop runs on a cluster of computers and is used
>>> for data storage and processing.
>>
>> - If you are trying to do 2D graphics (AWT operations included) on
>> unix servers, you often need to have X11 up and running before the
>> rendering works
>> - You need to start whichever JVM runs your rendering code with the
>> property java.awt.headless=true; you can actually set this in your
>> code.
>> - If the rendering code uses the OS/hardware, then different hardware
>> can render differently. This may not be visible to the eye, but it
>> makes testing more complex as the generated bitmaps can be slightly
>> different from machine to machine
>>
>> -steve

--
Steve Loughran http://www.1060.org/blogxter/publish/5
Author: Ant in Action http://antbook.org/
Re: Getting the number of reduce_output_records
Anybody know?

/Edward

On Fri, Oct 10, 2008 at 4:52 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> Hi,
>
> To get the number of reduce_output_records, I wrote code like this:
>
>    long rows = rJob.getCounters().findCounter(
>        "org.apache.hadoop.mapred.Task$Counter", 8, "REDUCE_OUTPUT_RECORDS")
>        .getCounter();
>
> I want to know another method to get it, since findCounter(String group,
> int id, String name) is deprecated.
>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org