Re: Seeking examples beyond word count
See also the Mahout project (http://lucene.apache.org/mahout/). A mahout is a person who drives an elephant. :)

On Wed, Oct 15, 2008 at 1:00 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
>
> On Oct 14, 2008, at 8:37 PM, Bert Schmidt wrote:
>
>> I'm trying to think of how I might use it, yet all the examples I find
>> are variations of word count.
>
> Look in the src/examples directory.
>
> PiEstimator - estimates the value of pi using distributed brute force
> Pentomino - solves Pentomino tile placement problems, including one-sided
> variants
> Terasort - tools to generate the required data, sort it into a total
> order, and verify the sort order
>
> There is also distcp in src/tools, which uses map/reduce to copy a lot of
> files between clusters.
>
>> Are there any interesting examples of how people are using it for real
>> tasks?
>
> A final pointer would be to Nutch, which uses Hadoop for distribution.
>
> -- Owen

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
Re: Need to reboot the whole system if adding new datanodes?
As long as the new node is in the slaves file on the master, just run start-all.sh and it will attempt to start everything. Nodes that are already running will keep running, and new nodes will be started. Consider doing a rebalance after adding a new node for better distribution.

-paul

On Oct 15, 2008, at 1:55 AM, "Amit k. Saha" <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 15, 2008 at 9:09 AM, David Wei <[EMAIL PROTECTED]> wrote:
>> It seems that we need to restart the whole hadoop system in order to
>> add new nodes inside the cluster. Any solution for us that doesn't
>> need a reboot?
>
> From what I know so far, you have to start the HDFS daemon (which reads
> the 'slaves' file) to 'let it know' which are the data nodes. So every
> time you add a new DataNode, I believe you will have to restart the
> daemon, which is like re-initiating the NameNode.
>
> Hope I am not very wrong :-)
>
> Best,
> Amit
>
> --
> Amit Kumar Saha
> http://blogs.sun.com/amitsaha/
> http://amitsaha.in.googlepages.com/
> Skype: amitkumarsaha
Re: Need to reboot the whole system if adding new datanodes?
You can use the hadoop-daemon.sh script provided in the bin folder. These are the steps. On the new machine to be added:

1.) Ensure the hadoop config is pointing to the right namenode.
2.) Run bin/hadoop-daemon.sh start datanode

This should add the datanode without needing a restart of the complete cluster.

- Prasad.

On Wednesday 15 October 2008 11:25:29 am Amit k. Saha wrote:
> On Wed, Oct 15, 2008 at 9:09 AM, David Wei <[EMAIL PROTECTED]> wrote:
> > It seems that we need to restart the whole hadoop system in order to
> > add new nodes inside the cluster. Any solution for us that doesn't
> > need a reboot?
>
> From what I know so far, you have to start the HDFS daemon (which reads
> the 'slaves' file) to 'let it know' which are the data nodes. So every
> time you add a new DataNode, I believe you will have to restart the
> daemon, which is like re-initiating the NameNode.
>
> Hope I am not very wrong :-)
>
> Best,
> Amit
Re: Need to reboot the whole system if adding new datanodes?
On Wed, Oct 15, 2008 at 9:09 AM, David Wei <[EMAIL PROTECTED]> wrote:
> It seems that we need to restart the whole hadoop system in order to
> add new nodes inside the cluster. Any solution for us that doesn't
> need a reboot?

From what I know so far, you have to start the HDFS daemon (which reads the 'slaves' file) to 'let it know' which are the data nodes. So every time you add a new DataNode, I believe you will have to restart the daemon, which is like re-initiating the NameNode.

Hope I am not very wrong :-)

Best,
Amit

--
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha
Re: Are There Books on Hadoop/Pig?
On Wed, Oct 15, 2008 at 4:10 AM, Steve Gao <[EMAIL PROTECTED]> wrote:
> Does anybody know if there are books about hadoop or pig? The wiki and
> manual are kind of ad-hoc and hard to comprehend; for example, "I want
> to know how to apply patches to my Hadoop, but can't find how to do
> it", that kind of thing.
>
> Would anybody help? Thanks.

http://oreilly.com/catalog/9780596521998/

HTH,
Amit

--
Amit Kumar Saha
http://blogs.sun.com/amitsaha/
http://amitsaha.in.googlepages.com/
Skype: amitkumarsaha
Re: graphics in hadoop
Yes, I will write it up on the hadoop wiki. Is there a way other than copying from the local filesystem to HDFS, like writing directly to HDFS?

Thanks
S.Chandravadana

Steve Loughran wrote:
>
> chandravadana wrote:
>> hi
>> Thanks all.. your guidelines helped me a lot.
>> I'm using JFreeChart... when I set
>> System.setProperty("java.awt.headless", "true");
>> I'm able to run this properly...
>
> this is good; consider writing this up on the hadoop wiki
>
>> If I specify the path (where the chart is to be saved) as the local
>> filesystem, I'm able to save the chart, but if I set the path to be
>> HDFS, then I'm unable to. So what changes do I need to make?
>
> You'll need to copy the local file to HDFS after it is rendered.
>
>> Thanks
>> Chandravadana.S
>>
>> Steve Loughran wrote:
>>> Alex Loddengaard wrote:
>>>> Hadoop runs Java code, so you can do anything that Java could do.
>>>> This means that you can create and/or analyze images. However, as
>>>> Lukas has said, Hadoop runs on a cluster of computers and is used
>>>> for data storage and processing.
>>>
>>> - If you are trying to do 2D graphics (AWT operations included) on
>>> unix servers, you often need to have X11 up and running before the
>>> rendering works
>>> - You need to start whichever JVM runs your rendering code with the
>>> property java.awt.headless=true; you can actually set this in your
>>> code.
>>> - If the rendering code uses the OS/hardware, then different hardware
>>> can render differently. This may not be visible to the eye, but it
>>> makes testing more complex as the generated bitmaps can be slightly
>>> different from machine to machine
>>>
>>> -steve
>
> --
> Steve Loughran http://www.1060.org/blogxter/publish/5
> Author: Ant in Action http://antbook.org/
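[To write the chart straight to HDFS without the intermediate local file, one option is a sketch along these lines: JFreeChart's ChartUtilities.writeChartAsPNG accepts any OutputStream, so it can render directly into a stream opened with the Hadoop FileSystem API. The destination path below is hypothetical.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.jfree.chart.ChartUtilities;
    import org.jfree.chart.JFreeChart;

    public class ChartToHdfs {
        // Render a chart directly into HDFS; no local file is created.
        static void save(JFreeChart chart, String dest) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // e.g. dest = "/user/chandra/chart.png" (hypothetical path)
            FSDataOutputStream out = fs.create(new Path(dest));
            try {
                // writeChartAsPNG takes any OutputStream, including HDFS streams
                ChartUtilities.writeChartAsPNG(out, chart, 640, 480);
            } finally {
                out.close();
            }
        }
    }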
Re: Seeking examples beyond word count
On Oct 14, 2008, at 8:37 PM, Bert Schmidt wrote:

> I'm trying to think of how I might use it, yet all the examples I find
> are variations of word count.

Look in the src/examples directory.

PiEstimator - estimates the value of pi using distributed brute force
Pentomino - solves Pentomino tile placement problems, including one-sided variants
Terasort - tools to generate the required data, sort it into a total order, and verify the sort order

There is also distcp in src/tools, which uses map/reduce to copy a lot of files between clusters.

> Are there any interesting examples of how people are using it for real
> tasks?

A final pointer would be to Nutch, which uses Hadoop for distribution.

-- Owen
Need to reboot the whole system if adding new datanodes?
It seems that we need to restart the whole hadoop system in order to add new nodes inside the cluster. Is there any solution that doesn't require a reboot?

PS: We have just one namenode in the cluster.

Thx!

David
Seeking examples beyond word count
I think I now grasp the mechanics of MapReduce and Hadoop. I'm trying to think of how I might use it, yet all the examples I find are variations of word count. Are there any interesting examples of how people are using it for real tasks? I am not necessarily looking for code (though that would be quite welcome), just some brief descriptions of the types of problems it is good at solving.

Thanks in advance,

-- Bert
Re: getting HDFS to rack-aware mode
On the master, I can execute this command OK:

-bash-3.00$ ./bin/hadoop fsck /
.
/tmp/hadoop-hadoop/mapred/system/job_200810100944_0001/job.jar:  Under
replicated blk_6972591866335308074_1001. Target Replicas is 10 but found
2 replica(s).
Status: HEALTHY
 Total size:    2798816 B
 Total dirs:    10
 Total files:   5
 Total blocks (validated):      5 (avg. block size 559763 B)
 Minimally replicated blocks:   5 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1 (20.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              8 (80.0 %)
 Number of data-nodes:          2
 Number of racks:               1

imcaptor wrote:

> I get this error:
>
> -bash-3.00$ ./bin/hadoop fsck /
> Exception in thread "main" java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>         at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
>         at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>         at java.net.Socket.connect(Socket.java:519)
>         at java.net.Socket.connect(Socket.java:469)
>         at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:382)
>         at sun.net.www.http.HttpClient.openServer(HttpClient.java:509)
>         at sun.net.www.http.HttpClient.<init>(HttpClient.java:231)
>         at sun.net.www.http.HttpClient.New(HttpClient.java:304)
>         at sun.net.www.http.HttpClient.New(HttpClient.java:316)
>         at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:813)
>         at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:765)
>         at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:690)
>         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:934)
>         at org.apache.hadoop.dfs.DFSck.run(DFSck.java:116)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.hadoop.dfs.DFSck.main(DFSck.java:137)
>
> Yi-Kai Tsai wrote:
>
>> hi Sriram
>>
>> Running hadoop fsck / will give you a summary of the current HDFS
>> status, including some useful information:
>>
>> Minimally replicated blocks:   51224 (100.0 %)
>> Over-replicated blocks:        0 (0.0 %)
>> Under-replicated blocks:       0 (0.0 %)
>> Mis-replicated blocks:         7 (0.013665469 %)
>> Default replication factor:    3
>> Average block replication:     3.0
>> Missing replicas:              0 (0.0 %)
>> Number of data-nodes:          83
>> Number of racks:               6
>>
>>> Hi,
>>>
>>> We have a cluster where we are running HDFS in non-rack-aware mode.
>>> Now we want to switch HDFS to run in rack-aware mode. Apart from the
>>> config changes (and restarting HDFS), to rackify the existing data,
>>> we were thinking of increasing/decreasing the replication level a few
>>> times to get the data spread. Are there any tools that will enable us
>>> to know when we are "done"?
>>>
>>> Sriram
Re: getting HDFS to rack-aware mode
I get this error:

-bash-3.00$ ./bin/hadoop fsck /
Exception in thread "main" java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.Socket.connect(Socket.java:519)
        at java.net.Socket.connect(Socket.java:469)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:382)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:509)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:231)
        at sun.net.www.http.HttpClient.New(HttpClient.java:304)
        at sun.net.www.http.HttpClient.New(HttpClient.java:316)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:813)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:765)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:690)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:934)
        at org.apache.hadoop.dfs.DFSck.run(DFSck.java:116)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.dfs.DFSck.main(DFSck.java:137)

Yi-Kai Tsai wrote:

> hi Sriram
>
> Running hadoop fsck / will give you a summary of the current HDFS
> status, including some useful information:
>
> Minimally replicated blocks:   51224 (100.0 %)
> Over-replicated blocks:        0 (0.0 %)
> Under-replicated blocks:       0 (0.0 %)
> Mis-replicated blocks:         7 (0.013665469 %)
> Default replication factor:    3
> Average block replication:     3.0
> Missing replicas:              0 (0.0 %)
> Number of data-nodes:          83
> Number of racks:               6
>
>> Hi,
>>
>> We have a cluster where we are running HDFS in non-rack-aware mode.
>> Now we want to switch HDFS to run in rack-aware mode. Apart from the
>> config changes (and restarting HDFS), to rackify the existing data,
>> we were thinking of increasing/decreasing the replication level a few
>> times to get the data spread. Are there any tools that will enable us
>> to know when we are "done"?
>>
>> Sriram
Re: getting HDFS to rack-aware mode
hi Sriram

Running hadoop fsck / will give you a summary of the current HDFS status, including some useful information:

 Minimally replicated blocks:   51224 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:      0 (0.0 %)
 Mis-replicated blocks:         7 (0.013665469 %)
 Default replication factor:    3
 Average block replication:     3.0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          83
 Number of racks:               6

> Hi,
>
> We have a cluster where we are running HDFS in non-rack-aware mode. Now
> we want to switch HDFS to run in rack-aware mode. Apart from the config
> changes (and restarting HDFS), to rackify the existing data, we were
> thinking of increasing/decreasing the replication level a few times to
> get the data spread. Are there any tools that will enable us to know
> when we are "done"?
>
> Sriram

--
Yi-Kai Tsai (cuma) <[EMAIL PROTECTED]>, Asia Regional Search Engineering.
Are There Books on Hadoop/Pig?
Does anybody know if there are books about hadoop or pig? The wiki and manual are kind of ad-hoc and hard to comprehend; for example, "I want to know how to apply patches to my Hadoop, but can't find how to do it", that kind of thing.

Would anybody help? Thanks.
Re: Hadoop for real time
Hi.

Video storage, processing and streaming.

Regards.

2008/9/25 Edward J. Yoon <[EMAIL PROTECTED]>

> What kind of real-time app?
>
> On Wed, Sep 24, 2008 at 4:50 AM, Stas Oskin <[EMAIL PROTECTED]> wrote:
> > Hi.
> >
> > Is it possible to use Hadoop for a real-time app, in the video
> > processing field?
> >
> > Regards.
>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org
Re: getting HDFS to rack-aware mode
Using the -w option with the setrep command will make it wait until replication is done. Then run fsck to check whether all blocks are on at least two racks.

Hairong

On 10/14/08 12:06 PM, "Sriram Rao" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> We have a cluster where we are running HDFS in non-rack-aware mode. Now
> we want to switch HDFS to run in rack-aware mode. Apart from the config
> changes (and restarting HDFS), to rackify the existing data, we were
> thinking of increasing/decreasing the replication level a few times to
> get the data spread. Are there any tools that will enable us to know
> when we are "done"?
>
> Sriram
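[For reference, a minimal sketch of doing the same programmatically with FileSystem.setReplication; the shell form Hairong mentions is along the lines of bin/hadoop dfs -setrep -w <rep> <path>, with -R to recurse. Note that, unlike -w, the API call returns as soon as the namenode records the new target, and the actual re-replication happens in the background. The path below is hypothetical.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetRep {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/user/sriram/data/part-00000"); // hypothetical
            // Raise the target so extra copies get spread across racks.
            fs.setReplication(file, (short) 4);
            // ... once fsck reports the blocks where you want them,
            // drop back down and let the excess replicas be pruned:
            fs.setReplication(file, (short) 3);
        }
    }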
getting HDFS to rack-aware mode
Hi,

We have a cluster where we are running HDFS in non-rack-aware mode. Now we want to switch HDFS to run in rack-aware mode. Apart from the config changes (and restarting HDFS), to rackify the existing data, we were thinking of increasing/decreasing the replication level a few times to get the data spread. Are there any tools that will enable us to know when we are "done"?

Sriram
Re: Getting the number of reduce_output_records
On Oct 10, 2008, at 12:52 AM, Edward J. Yoon wrote:

> Hi,
>
> To get the number of reduce_output_records, I wrote code like this:
>
>    long rows = rJob.getCounters().findCounter(
>        "org.apache.hadoop.mapred.Task$Counter", 8, "REDUCE_OUTPUT_RECORDS")
>        .getCounter();

http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/mapred/Counters.html#findCounter(java.lang.Enum)

Arun

> I want to know another method to get it, since findCounter(String group,
> int id, String name) is deprecated.
>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org
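[A sketch of the non-deprecated lookup from that javadoc, assuming the Task.Counter enum is visible to your code in this Hadoop version; if it is not, the group/name form in the comment avoids it.]

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.RunningJob;
    import org.apache.hadoop.mapred.Task;

    public class CounterLookup {
        // Number of reduce output records for a completed job.
        static long reduceOutputRecords(RunningJob rJob) throws java.io.IOException {
            Counters counters = rJob.getCounters();
            // Enum-based lookup via the non-deprecated getCounter(Enum):
            return counters.getCounter(Task.Counter.REDUCE_OUTPUT_RECORDS);
            // Alternative, by group and counter name:
            // return counters.getGroup("org.apache.hadoop.mapred.Task$Counter")
            //                .getCounter("REDUCE_OUTPUT_RECORDS");
        }
    }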
Re: How to get absolute path
Path.getParent() returns the parent of a path.

On Tue, Oct 14, 2008 at 7:30 PM, Tarandeep Singh <[EMAIL PROTECTED]> wrote:
> Hi,
>
> How can I get the absolute path /user/taran/logfiles/log.txt
> from Path - new Path("logfiles/log.txt")?
>
> Thanks,
> Taran

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org
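[If what's wanted is the fully qualified form, a sketch assuming Path.makeQualified(FileSystem), which resolves a relative path against the filesystem's URI and its working directory, /user/<username> by default:]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class QualifyPath {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path relative = new Path("logfiles/log.txt");
            // Resolved against the default filesystem and working directory,
            // e.g. hdfs://namenode:9000/user/taran/logfiles/log.txt
            Path absolute = relative.makeQualified(fs);
            System.out.println(absolute);
        }
    }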
How can I fix this LogConfigurationException?
Hi all,

I am trying to write a file directly into HDFS using Java. However, when I run the Java project, I get an error message like the following. Please help me.

Exception in thread "main" java.lang.ExceptionInInitializerError
        at WordWriter.main(WordWriter.java:17)
Caused by: org.apache.commons.logging.LogConfigurationException:
org.apache.commons.logging.LogConfigurationException:
java.lang.NullPointerException (Caused by java.lang.NullPointerException)
(Caused by org.apache.commons.logging.LogConfigurationException:
java.lang.NullPointerException (Caused by java.lang.NullPointerException))
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:543)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
        at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:128)
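[The failure happens inside commons-logging before any HDFS code runs, which usually points to a missing or conflicting commons-logging jar on the classpath; make sure the jars shipped in Hadoop's lib/ directory are the ones being used. For the writing itself, a minimal sketch against the FileSystem API; the output path is hypothetical, and the Configuration picks up hadoop-site.xml from the classpath.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WordWriter {
        public static void main(String[] args) throws Exception {
            // Reads hadoop-site.xml / hadoop-default.xml from the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Create a file directly in HDFS (hypothetical path).
            FSDataOutputStream out = fs.create(new Path("/user/hadoop/words.txt"));
            out.writeBytes("hello hdfs\n");
            out.close();
        }
    }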
Hadoop Training
Hey all,

Just wanted to make a quick announcement that Scale Unlimited has started delivering its 2-day Hadoop Boot Camp. More info here:
http://www.scaleunlimited.com/hadoop-bootcamp.html

We are currently offering the classes on-site within the US/UK/EU to those companies needing to get a team up to speed rapidly, but we are working to put together a public class in the Bay Area. Please email me if you are interested, so we can gauge demand.

Also, we may be in NY for the NY Hadoop User Group; if any org out there wants to throw together a class during the week of Nov. 10, again, give me a shout.

cheers,
chris

--
Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
How to get absolute path
Hi,

How can I get the absolute path /user/taran/logfiles/log.txt
from Path - new Path("logfiles/log.txt")?

Thanks,
Taran
Re: graphics in hadoop
chandravadana wrote:
> hi
> Thanks all.. your guidelines helped me a lot.
> I'm using JFreeChart... when I set
> System.setProperty("java.awt.headless", "true");
> I'm able to run this properly...

this is good; consider writing this up on the hadoop wiki

> If I specify the path (where the chart is to be saved) as the local
> filesystem, I'm able to save the chart, but if I set the path to be
> HDFS, then I'm unable to. So what changes do I need to make?

You'll need to copy the local file to HDFS after it is rendered.

> Thanks
> Chandravadana.S
>
> Steve Loughran wrote:
>> Alex Loddengaard wrote:
>>> Hadoop runs Java code, so you can do anything that Java could do.
>>> This means that you can create and/or analyze images. However, as
>>> Lukas has said, Hadoop runs on a cluster of computers and is used
>>> for data storage and processing.
>>
>> - If you are trying to do 2D graphics (AWT operations included) on
>> unix servers, you often need to have X11 up and running before the
>> rendering works
>> - You need to start whichever JVM runs your rendering code with the
>> property java.awt.headless=true; you can actually set this in your
>> code.
>> - If the rendering code uses the OS/hardware, then different hardware
>> can render differently. This may not be visible to the eye, but it
>> makes testing more complex as the generated bitmaps can be slightly
>> different from machine to machine
>>
>> -steve

--
Steve Loughran http://www.1060.org/blogxter/publish/5
Author: Ant in Action http://antbook.org/
Re: Getting the number of reduce_output_records
Anybody know?

/Edward

On Fri, Oct 10, 2008 at 4:52 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> Hi,
>
> To get the number of reduce_output_records, I wrote code like this:
>
>    long rows = rJob.getCounters().findCounter(
>        "org.apache.hadoop.mapred.Task$Counter", 8, "REDUCE_OUTPUT_RECORDS")
>        .getCounter();
>
> I want to know another method to get it, since findCounter(String group,
> int id, String name) is deprecated.
>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org

--
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org