RE: Datanode not created on hadoop-0.20.203.0

2011-06-15 Thread Jeff.Schmitz
You have to format the datanode too (hadoop datanode -format); also make sure it
is in the slaves file -

Cheers - 

-Original Message-
From: Joey Echeverria [mailto:j...@cloudera.com] 
Sent: Wednesday, June 15, 2011 12:01 PM
To: common-user@hadoop.apache.org
Subject: Re: Datanode not created on hadoop-0.20.203.0

By any chance, are you running as root? If so, try running as a different user.

-Joey

On Wed, Jun 15, 2011 at 12:53 PM, rutesh  wrote:
> Hi,
>
>   I am new to hadoop (Just 1 month old). These are the steps I followed to
> install and run hadoop-0.20.203.0:
>
> 1) Downloaded tar file from
> http://mirrors.axint.net/apache/hadoop/common/hadoop-0.20.203.0/hadoop-0.20.203.0rc1.tar.gz.
> 2) Untarred it in /usr/local/ .
> 3) Set JAVA_HOME=/usr/lib/jvm/java-6-sun (which has already been installed)
> 4) Modified the config files viz. core-site.xml , hdfs-site.xml and
> mapred-site.xml as provided on the single node installation page [
> http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html#PseudoDistributed].
> 5) Formatted the new distributed-filesystem using bin/hadoop namenode
> -format
> 6) Started the hdfs daemon using bin/start-dfs.sh
>
> Now, here is the error...
>
> # start-dfs.sh
> starting namenode, logging to
> /usr/local/hadoop/bin/../logs/hadoop-root-namenode-ip-10-98-94-62.out
> localhost: starting datanode, logging to
> /usr/local/hadoop/bin/../logs/hadoop-root-datanode-ip-10-98-94-62.out
> localhost: starting secondarynamenode, logging to
> /usr/local/hadoop/bin/../logs/hadoop-root-secondarynamenode-ip-10-98-94-62.out
>
> The terminal says that the datanode has been started, but when I run the jps
> command, it shows otherwise.
>
> # jps
> 395 Jps
> 32612 SecondaryNameNode
> 32442 NameNode
>
> And in the /usr/local/hadoop/logs/hadoop-root-datanode-ip-10-98-94-62.out
> this is the log:
>
> Unrecognized option: -jvm
> Could not create the Java virtual machine.
>
> My question is, can anybody tell me what the error is or what I am doing
> wrong, or in general, how I can make my datanode run?
>
> Thanks.
>
> With regards,
> Rutesh Chavda
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
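
For the record, the "Unrecognized option: -jvm" line in the datanode .out file is what the
0.20.203 start scripts typically emit when the datanode is launched as root, which is why the
question about root matters. A minimal sketch of moving the daemons to a dedicated user (the
"hadoop" user name is an assumption, and re-formatting wipes anything already in HDFS):

$ useradd hadoop
$ chown -R hadoop:hadoop /usr/local/hadoop
$ su - hadoop
$ cd /usr/local/hadoop
$ bin/hadoop namenode -format   # the filesystem was created as root, so re-format as the new user
$ bin/start-dfs.sh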




Heap Size is 27.25 MB/888.94 MB

2011-06-16 Thread Jeff.Schmitz

So it's saying my heap size is (Heap Size is 27.25 MB/888.94 MB)


but my configured capacity is 971GB (4 nodes)

 

Is the heap size on the main page just for the namenode, or do I need to
increase it to include the datanodes?

 

Cheers - 

 

Jeffery Schmitz
Projects and Technology
3737 Bellaire Blvd Houston, Texas 77001
Tel: +1-713-245-7326 Fax: +1 713 245 7678
Email: jeff.schm...@shell.com  

"TK-421, why aren't you at your post?"

 

 



RE: Append to Existing File

2011-06-21 Thread Jeff.Schmitz
I was under the impression the append thing was worked out as of '09? I
guess I was way off.

-Original Message-
From: Eric Charles [mailto:eric.char...@u-mangate.com] 
Sent: Tuesday, June 21, 2011 6:18 AM
To: common-user@hadoop.apache.org
Subject: Re: Append to Existing File

Hi Madhu,

Tks for the pointer. Even after reading the section on 0.21/22/23 
written by Tsz-Wo, I still remain in the fog...

Will HDFS-265 (and its mentioned Jiras) provide a solution for append 
(whatever the release it will be)?

Another way of asking is: "Are there today other Jiras than the ones 
mentioned on HDFS-265 to take into consideration to have working hadoop 
append?".

Tks, Eric


On 21/06/11 12:58, madhu phatak wrote:
> Please refer to this discussion
>
> http://search-hadoop.com/m/rnG0h1zCZcL1/Re%253A+HDFS+File+Appending+URGENT&subj=Fw+HDFS+File+Appending+URGENT
>
> On Tue, Jun 21, 2011 at 4:23 PM, Eric Charles wrote:
>
>> When you say "bugs pending", are you referring to HDFS-265 (which links to
>> HDFS-1060, HADOOP-6239 and HDFS-744)?
>>
>> Are there other issues related to append than the one above?
>>
>> Tks, Eric
>>
>>
>> https://issues.apache.org/jira/browse/HDFS-265
>>
>>
>>
>> On 21/06/11 12:36, madhu phatak wrote:
>>
>>> It's not stable. There are some bugs pending. According to one of the
>>> discussions, to date the append is not ready for production.
>>>
>>> On Tue, Jun 14, 2011 at 12:19 AM, jagaran das wrote:
>>>
>>>   I am using hadoop-0.20.203.0 version.
 I have set dfs.support.append to true and am then using the append method.
 It is working, but I need to know how stable it is to deploy and use in
 production clusters?

 Regards,
 Jagaran



 ________________________________
 From: jagaran das
 To: common-user@hadoop.apache.org
 Sent: Mon, 13 June, 2011 11:07:57 AM
 Subject: Append to Existing File

 Hi All,

 Is append to an existing file now supported in Hadoop for production
 clusters?
 If yes, please let me know which version and how.

 Thanks
 Jagaran


>>>
>> --
>> Eric
>>
>

-- 
Eric
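
For reference, a minimal sketch of what using append looks like against 0.20.203, assuming
dfs.support.append has been set to true in hdfs-site.xml as described above; the path is
hypothetical, the file must already exist, and append() will refuse to run if the flag is off:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/jagaran/existing-file.txt"); // hypothetical path
    FSDataOutputStream out = fs.append(file); // throws if append is not enabled on the cluster
    out.write("one more line\n".getBytes());
    out.close();
  }
}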




RE: How to select random n records using mapreduce ?

2011-06-27 Thread Jeff.Schmitz
Wait - Habermaas, like in Critical Theory?

-Original Message-
From: Habermaas, William [mailto:william.haberm...@fatwire.com] 
Sent: Monday, June 27, 2011 2:55 PM
To: common-user@hadoop.apache.org
Subject: RE: How to select random n records using mapreduce ?

I did something similar.  Basically I had a random sampling algorithm
that I called from the mapper. If it returned true I would collect the
data, otherwise I would discard it. 

Bill 

-Original Message-
From: ni...@basj.es [mailto:ni...@basj.es] On Behalf Of Niels Basjes
Sent: Monday, June 27, 2011 3:29 PM
To: mapreduce-u...@hadoop.apache.org
Cc: core-u...@hadoop.apache.org
Subject: Re: How to select random n records using mapreduce ?

The only solution I can think of is by creating a counter in Hadoop
that is incremented each time a mapper lets a record through.
As soon as the value reaches a preselected value the mappers simply
discard the additional input they receive.

Note that this will not be random at all, yet it's the best I can
come up with right now.

HTH

On Mon, Jun 27, 2011 at 09:11, Jeff Zhang  wrote:
>
> Hi all,
> I'd like to select random N records from a large amount of data using
> hadoop; just wondering how I can achieve this? Currently my idea is to let
> each mapper task select N / mapper_number records. Does anyone have such
> experience?
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
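
A minimal sketch of the mapper-side sampling Bill describes, using the old 0.20 mapred API; the
sample fraction is an assumed parameter (roughly N divided by the total record count, which you
have to estimate), and sampling by probability yields approximately rather than exactly N records:

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Emits each input record with probability SAMPLE_FRACTION and discards the rest.
public class RandomSampleMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, LongWritable, Text> {

  private static final double SAMPLE_FRACTION = 0.01; // assumed value: keep roughly 1% of records
  private final Random random = new Random();

  public void map(LongWritable key, Text value,
                  OutputCollector<LongWritable, Text> output,
                  Reporter reporter) throws IOException {
    if (random.nextDouble() < SAMPLE_FRACTION) {
      output.collect(key, value); // sampled in; everything else is silently dropped
    }
  }
}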




RE: Why I cannot see live nodes in a LAN-based cluster setup?

2011-06-27 Thread Jeff.Schmitz
http://www.mentby.com/tim-robertson/error-register-getprotocolversion.html



-Original Message-
From: Jingwei Lu [mailto:j...@ucsd.edu] 
Sent: Monday, June 27, 2011 3:58 PM
To: common-user@hadoop.apache.org
Subject: Re: Why I cannot see live nodes in a LAN-based cluster setup?

Hi,

I just manually modified the masters & slaves files on both machines.

I found something wrong in the log files, as shown below:

-- Master :
namenode.log:


2011-06-27 13:44:47,055 INFO org.mortbay.log: jetty-6.1.14
2011-06-27 13:44:47,394 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50070
2011-06-27 13:44:47,395 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
0.0.0.0:50070
2011-06-27 13:44:47,395 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2011-06-27 13:44:47,395 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 54310: starting
2011-06-27 13:44:47,396 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 54310: starting
2011-06-27 13:44:47,397 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 54310: starting
2011-06-27 13:44:47,397 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 54310: starting
2011-06-27 13:44:47,397 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 54310: starting
2011-06-27 13:44:47,402 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 54310: starting
2011-06-27 13:44:47,404 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 54310: starting
2011-06-27 13:44:47,406 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 54310: starting
2011-06-27 13:44:47,406 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 54310: starting
2011-06-27 13:44:47,406 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 54310: starting
2011-06-27 13:44:47,408 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 54310: starting
2011-06-27 13:44:47,500 INFO org.apache.hadoop.ipc.Server: Error register
getProtocolVersion
java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion
at
org.apache.hadoop.metrics.util.MetricsRegistry.add(MetricsRegistry.java:53)
at
org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:89)
at
org.apache.hadoop.metrics.util.MetricsTimeVaryingRate.<init>(MetricsTimeVaryingRate.java:99)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2011-06-27 13:45:02,572 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.registerDatanode: node registration from 127.0.0.1:50010 storage
DS-87816363-127.0.0.1-50010-1309207502566



-- slave:
datanode.log:


  1 2011-06-27 13:45:00,335 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
  2 /
  3 STARTUP_MSG: Starting DataNode
  4 STARTUP_MSG:   host = hdl.ucsd.edu/127.0.0.1
  5 STARTUP_MSG:   args = []
  6 STARTUP_MSG:   version = 0.20.2
  7 STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
  8 /
  9 2011-06-27 13:45:02,476 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 0 time(s).
 10 2011-06-27 13:45:03,549 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 1 time(s).
 11 2011-06-27 13:45:04,552 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 2 time(s).
 12 2011-06-27 13:45:05,609 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 3 time(s).
 13 2011-06-27 13:45:06,640 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 4 time(s).
 14 2011-06-27 13:45:07,643 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 5 time(s).
 15 2011-06-27 13:45:08,646 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 6 time(s).
 16 2011-06-27 13:45:09,661 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 7 time(s).
 17 2011-06-27 13:45:10,664 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hdl.ucsd.edu/127.0.0.1:54310. Already tried 8 time(s).
 18 2011-06-27 13:45:11,678 INFO org.apache.hadoop.ipc.Clien

RE: Why I cannot see live nodes in a LAN-based cluster setup?

2011-06-28 Thread Jeff.Schmitz
You may also try removing the hadoop-"yourname" directory from /tmp - and 
reformatting HDFS - it may be corrupted
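
A sketch of that cleanup, assuming the data still lives under the default hadoop.tmp.dir
(/tmp/hadoop-<user>); note that re-formatting erases whatever is already in HDFS:

$ bin/stop-all.sh
$ rm -rf /tmp/hadoop-<user>        # default hadoop.tmp.dir location unless overridden
$ bin/hadoop namenode -format
$ bin/start-dfs.sh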

-Original Message-
From: GOEKE, MATTHEW (AG/1000) [mailto:matthew.go...@monsanto.com] 
Sent: Monday, June 27, 2011 10:57 PM
To: common-user@hadoop.apache.org
Subject: RE: Why I cannot see live nodes in a LAN-based cluster setup?

At this point if that is the correct ip then I would see if you can actually 
ssh from the DN to the NN to make sure it can actually connect to the other 
box. If you can successfully connect through ssh then it's just a matter of 
figuring out why that port is having issues (netstat is your friend in this 
case). If you see it listening on 54310 then just power cycle the box and try 
again.

Matt
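
A sketch of those checks, using the master hostname and port from the logs below (run the first
two from the datanode, the last on the master):

$ ssh clock.ucsd.edu                 # can the DN reach the master at all?
$ telnet clock.ucsd.edu 54310        # is the namenode RPC port reachable?
$ netstat -tlnp | grep 54310         # on the master: which address is the NN listening on?

If netstat shows the namenode bound only to 127.0.0.1:54310, fix /etc/hosts so the master's
hostname resolves to its LAN address rather than 127.0.0.1, and make sure fs.default.name on
both machines points at hdfs://clock.ucsd.edu:54310.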

-Original Message-
From: Jingwei Lu [mailto:j...@ucsd.edu] 
Sent: Monday, June 27, 2011 5:38 PM
To: common-user@hadoop.apache.org
Subject: Re: Why I cannot see live nodes in a LAN-based cluster setup?

Hi Matt and Jeff:

Thanks a lot for your instructions. I corrected the mistakes in conf files
of DN, and now the log on DN becomes:

2011-06-27 15:32:36,025 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 0 time(s).
2011-06-27 15:32:37,028 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 1 time(s).
2011-06-27 15:32:38,031 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 2 time(s).
2011-06-27 15:32:39,034 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 3 time(s).
2011-06-27 15:32:40,037 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 4 time(s).
2011-06-27 15:32:41,040 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 5 time(s).
2011-06-27 15:32:42,043 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 6 time(s).
2011-06-27 15:32:43,046 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 7 time(s).
2011-06-27 15:32:44,049 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 8 time(s).
2011-06-27 15:32:45,052 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: clock.ucsd.edu/132.239.95.91:54310. Already tried 9 time(s).
2011-06-27 15:32:45,053 INFO org.apache.hadoop.ipc.RPC: Server at
clock.ucsd.edu/132.239.95.91:54310 not available yet, Z...

Seems the DN is trying to connect to the NN but always fails...



Best Regards
Yours Sincerely

Jingwei Lu



On Mon, Jun 27, 2011 at 2:22 PM, GOEKE, MATTHEW (AG/1000) <
matthew.go...@monsanto.com> wrote:

> As a follow-up to what Jeff posted: go ahead and ignore the message you got
> on the NN for now.
>
> If you look at the address that the DN log shows it is 127.0.0.1 and the
> ip:port it is trying to connect to for the NN is 127.0.0.1:54310 ---> it
> is trying to bind to itself as if it was still in single machine mode. Make
> sure that you have correctly pushed the URI for the NN into the config files
> on both machines and then bounce DFS.
>
> Matt
>
> -Original Message-
> From: jeff.schm...@shell.com [mailto:jeff.schm...@shell.com]
> Sent: Monday, June 27, 2011 4:08 PM
> To: common-user@hadoop.apache.org
> Subject: RE: Why I cannot see live nodes in a LAN-based cluster setup?
>
> http://www.mentby.com/tim-robertson/error-register-getprotocolversion.html
>
>
>
> -Original Message-
> From: Jingwei Lu [mailto:j...@ucsd.edu]
> Sent: Monday, June 27, 2011 3:58 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Why I cannot see live nodes in a LAN-based cluster setup?
>
> Hi,
>
> I just manually modified the masters & slaves files on both machines.
>
> I found something wrong in the log files, as shown below:
>
> -- Master :
> namenode.log:
>
> 
> 2011-06-27 13:44:47,055 INFO org.mortbay.log: jetty-6.1.14
> 2011-06-27 13:44:47,394 INFO org.mortbay.log: Started
> SelectChannelConnector@0.0.0.0:50070
> 2011-06-27 13:44:47,395 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Web-server up at:
> 0.0.0.0:50070
> 2011-06-27 13:44:47,395 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2011-06-27 13:44:47,395 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 54310: starting
> 2011-06-27 13:44:47,396 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 54310: starting
> 2011-06-27 13:44:47,397 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 1 on 54310: starting
> 2011-06-27 13:44:47,397 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 2 on 54310: starting
> 2011-06-27 13:44:47,397 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 3 on 5431

RE: conferences

2011-06-29 Thread Jeff.Schmitz
http://developer.yahoo.com/events/hadoopsummit2011/

There will also be a lot about Hadoop at OSCON

http://www.oscon.com/oscon2011

I believe Hadoop World is in NYC in November.

-Original Message-
From: Keren Ouaknine [mailto:ker...@gmail.com] 
Sent: Wednesday, June 29, 2011 6:34 AM
To: common-user@hadoop.apache.org
Subject: conferences

Hello,

I would like to find the list of prestigious conferences related to
Hadoop.
Where can I find the list of these? Thanks!

Keren

-- 
Keren Ouaknine
Cell: +972 54 2565404
Web: www.kereno.com



RE: Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-05 Thread Jeff.Schmitz
Um, kill -9 "pid"?

-Original Message-
From: Juwei Shi [mailto:shiju...@gmail.com] 
Sent: Friday, July 01, 2011 10:53 AM
To: common-user@hadoop.apache.org; mapreduce-u...@hadoop.apache.org
Subject: Jobs are still in running state after executing "hadoop job
-kill jobId"

Hi,

I faced a problem where the jobs are still running after executing "hadoop
job -kill jobId". I rebooted the cluster but the job still cannot be
killed.

The hadoop version is 0.20.2.

Any idea?

Thanks in advance!

-- 
- Juwei
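
A sketch of the usual sequence before reaching for kill -9 (the job id below is hypothetical):

$ bin/hadoop job -list                           # is the job still known to the JobTracker?
$ bin/hadoop job -kill job_201107010000_0001     # hypothetical job id
$ jps                                            # if the daemons themselves are wedged, find their pids
$ kill -9 <pid>                                  # last resort, as suggested above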



RE: HTTP Error

2011-07-07 Thread Jeff.Schmitz
Adarsh,

You could also run from command line

[root@xxx bin]# ./hadoop dfsadmin -report
Configured Capacity: 1151948095488 (1.05 TB)
Present Capacity: 1059350446080 (986.6 GB)
DFS Remaining: 1056175992832 (983.64 GB)
DFS Used: 3174453248 (2.96 GB)
DFS Used%: 0.3%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-
Datanodes available: 5 (5 total, 0 dead)




-Original Message-
From: dhru...@gmail.com [mailto:dhru...@gmail.com] On Behalf Of Dhruv
Kumar
Sent: Thursday, July 07, 2011 10:01 AM
To: common-user@hadoop.apache.org
Subject: Re: HTTP Error

1) Check with jps to see if all services are functioning.

2) Have you tried appending dfshealth.jsp at the end of the URL as the 404
says?

Try using this:
http://localhost:50070/dfshealth.jsp



On Thu, Jul 7, 2011 at 7:13 AM, Adarsh Sharma wrote:

> Dear all,
>
> Today I am stuck with a strange problem in the running hadoop cluster.
>
> After starting hadoop by bin/start-all.sh, all nodes are started. But when
> I check through the web UI (Master-IP:50070), it shows:
>
>
>   HTTP ERROR: 404
>
> /dfshealth.jsp
>
> RequestURI=/dfshealth.jsp
>
> Powered by Jetty://
>
> I checked by command line that hadoop is not able to get out of safe mode.
>
> I know the manual command to leave safe mode:
>
> bin/hadoop dfsadmin -safemode leave
>
> But how can I make hadoop run properly, and what are the reasons for this
> error?
>
> Thanks
>
>
>
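
A sketch of the usual checks before forcing the namenode out of safe mode:

$ bin/hadoop dfsadmin -safemode get     # is the namenode still in safe mode?
$ bin/hadoop dfsadmin -report           # have the datanodes checked in?
$ bin/hadoop fsck /                     # any missing or under-replicated blocks?
$ bin/hadoop dfsadmin -safemode leave   # only once the cause is understood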



RE: Sanity check re: value of 10GbE NICs for Hadoop?

2011-07-11 Thread Jeff.Schmitz
Also there is info on this at Cloudera here

http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/



-Original Message-
From: Saqib Jang -- Margalla Communications
[mailto:saq...@margallacomm.com] 
Sent: Tuesday, June 28, 2011 5:06 PM
To: common-user@hadoop.apache.org
Subject: RE: Sanity check re: value of 10GbE NICs for Hadoop?

Matt,
Thanks, this is helpful, I was wondering if you may have some thoughts
on the list of other potential benefits of 10GbE NICs for Hadoop
(listed in my original e-mail to the list)?

regards,
Saqib

-Original Message-
From: Matthew Foley [mailto:ma...@yahoo-inc.com] 
Sent: Tuesday, June 28, 2011 12:04 PM
To: common-user@hadoop.apache.org
Cc: Matthew Foley
Subject: Re: Sanity check re: value of 10GbE NICs for Hadoop?

Hadoop common provides an abstract FileSystem class, and Hadoop
applications
should be designed to run on that.  HDFS is just one implementation of a
valid Hadoop filesystem, and ports to S3 and KFS as well as OS-supported
LocalFileSystem are provided in Hadoop common.  Use of NFS-mounted
storage
would fall under the LocalFileSystem model.

However, one of the core values of Hadoop is the model of "bring the
computation to the data".  This does not seem viable with an NFS-based
NAS-model storage subsystem.  Thus, while it will "work" for small
clusters
and small jobs, it is unlikely to scale with high performance to
thousands
of nodes and petabytes of data in the way Hadoop can scale with HDFS or
S3.

--Matt
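
A small sketch of the abstraction Matt describes: the same client code runs against whichever
FileSystem implementation the path's scheme (hdfs://, file://, s3://, ...) resolves to. The
path argument is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListingSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path dir = new Path(args[0]);             // e.g. hdfs://namenode:54310/user/data or file:///tmp
    FileSystem fs = dir.getFileSystem(conf);  // resolves the concrete implementation
    for (FileStatus status : fs.listStatus(dir)) {
      System.out.println(status.getPath() + "\t" + status.getLen());
    }
  }
}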


On Jun 28, 2011, at 10:41 AM, Darren Govoni wrote:

I see. However, Hadoop is designed to operate best with HDFS because of
its
inherent striping and blocking strategy - which is tracked by Hadoop.
Going outside of that mechanism will probably yield poor results and/or
confuse Hadoop.

Just my thoughts.

On 06/28/2011 01:27 PM, Saqib Jang -- Margalla Communications wrote:
> Darren,
> Thanks, the last pt was basically about 10GbE potentially allowing the
> use of a network file system, e.g. via NFS, as an alternative to HDFS;
> the question is whether there is any merit in this. Basically, I was
> exploring whether the commercial clustered NAS products offer any
> high-availability or data management benefits for use with Hadoop?
> 
> Saqib
> 
> -Original Message-
> From: Darren Govoni [mailto:dar...@ontrenet.com]
> Sent: Tuesday, June 28, 2011 10:21 AM
> To: common-user@hadoop.apache.org
> Subject: Re: Sanity check re: value of 10GbE NICs for Hadoop?
> 
> Hadoop, like other parallel networked computation architectures, is I/O
> bound, predominantly.
> This means any increase in network bandwidth is "A Good Thing" and can
> have drastic positive effects on performance. All your points stem
> from this simple realization.
> 
> Although I'm confused by your #6. Hadoop already uses a distributed 
> file system. HDFS.
> 
> On 06/28/2011 01:16 PM, Saqib Jang -- Margalla Communications wrote:
>> Folks,
>>
>> I've been digging into the potential benefits of using
>> 10 Gigabit Ethernet (10GbE) NIC server connections for
>> Hadoop and wanted to run what I've come up with
>> through initial research by the list for 'sanity check'
>> feedback. I'd very much appreciate your input on
>> the importance (or lack of it) of the following potential benefits of
>> 10GbE server connectivity, as well as other thoughts regarding
>> 10GbE and Hadoop (my interest is specifically in the value
>> of 10GbE server connections and 10GbE switching infrastructure,
>> over scenarios such as bonded 1GbE server connections with
>> 10GbE switching).
>>
>> 1.   HDFS Data Loading. The higher throughput enabled by 10GbE
>> server and switching infrastructure allows faster processing and
>> distribution of data.
>>
>> 2.   Hadoop Cluster Scalability. High performance for initial data
>> processing and distribution directly impacts the degree of parallelism
>> or scalability supported by the cluster.
>>
>> 3.   HDFS Replication. Higher speed server connections allow faster
>> file replication.
>>
>> 4.   Map/Reduce Shuffle Phase. Improved end-to-end throughput and
>> latency directly impact the shuffle phase of a data set reduction,
>> especially for tasks that are at the document level (including large
>> documents) and lots of metadata generated by those documents, as well
>> as video analytics and images.
>>
>> 5.   Data Reporting. 10GbE server network performance can
>> improve data reporting performance, especially if the Hadoop cluster
>> is running multiple data reductions.
>>
>> 6.   Support of Cluster File Systems. With 10 GbE NICs, Hadoop could
>> be reorganized to use a cluster or network file system. This would
>> allow Hadoop, even with its Java implementation, to have higher
>> performance I/O and not have to be so concerned with disk drive
>> density in the same server.
>>

RE: Which release to use?

2011-07-18 Thread Jeff.Schmitz
Steve,

I read your blog, nice post - I believe EMC is selling the Greenplum
solution as an appliance.

Cheers - 

Jeffery

-Original Message-
From: Steve Loughran [mailto:ste...@apache.org] 
Sent: Friday, July 15, 2011 4:07 PM
To: common-user@hadoop.apache.org
Subject: Re: Which release to use?

On 15/07/2011 18:06, Arun C Murthy wrote:
> Apache Hadoop is a volunteer driven, open-source project. The
contributors to Apache Hadoop, both individuals and folks across a
diverse set of organizations, are committed to driving the project
forward and making timely releases - see discussion on hadoop-0.23 with
a raft of newer features such as HDFS Federation, NextGen MapReduce and
plans for HA NameNode etc.
>
> As with most successful projects there are several options for
commercial support to Hadoop or its derivatives.
>
> However, Apache Hadoop has thrived before there was any commercial
support (I've personally been involved in over 20 releases of Apache
Hadoop and deployed them while at Yahoo) and I'm sure it will in this
new world order.
>
> We, the Apache Hadoop community, are committed to keeping Apache
Hadoop 'free', providing support to our users and to move it forward at
a rapid rate.
>

Arun makes a good point which is that the Apache project depends on 
contributions from the community to thrive. That includes

  -bug reports
  -patches to fix problems
  -more tests
  -documentation improvements: more examples, more on getting started, 
troubleshooting, etc.

If there's something lacking in the codebase, and you think you can fix 
it, please do so. Helping with the documentation is a good start, as it 
can be improved, and you aren't going to break anything.

Once you get into changing the code, you'll end up working with the head
of whichever branch you are targeting.

The other area everyone can contribute on is testing. Yes, Y! and FB can
test at scale, and yes, other people can test large clusters too - but nobody
has a network that looks like yours but you. And Hadoop does care about
network configurations. Testing beta and release candidate releases in
your infrastructure helps verify that the final release will work on
your site, and you don't end up getting all the phone calls about
something not working.




RE: Which release to use?

2011-07-18 Thread Jeff.Schmitz

 Most people are using CH3 - if you need some features from another
distro, use that -

http://www.cloudera.com/hadoop/

I wonder if the Cloudera people realize that CH3 was a pretty happening
punk band back in the day (if not they do now = )

http://en.wikipedia.org/wiki/Channel_3_%28band%29

cheers - 


Jeffery Schmitz
Projects and Technology
3737 Bellaire Blvd Houston, Texas 77001
Tel: +1-713-245-7326 Fax: +1 713 245 7678
Email: jeff.schm...@shell.com
Intergalactic Proton Powered Electrical Tentacled Advertising Droids!





-Original Message-
From: Michael Segel [mailto:michael_se...@hotmail.com] 
Sent: Monday, July 18, 2011 2:10 PM
To: common-user@hadoop.apache.org
Subject: RE: Which release to use?


Tom,

I'm not sure that you're really honoring the purpose and approach of
this list.

I mean on the one hand, you're not under any obligation to respond or
participate on the list. And I can respect that. You're not in an S&D
role so you're not 'customer facing' and not used to having to deal with
these types of questions.

On the other, you're not being free with your information. So when this
type of question comes up, it becomes very easy to discount IBM as a
release or source provider for commercial support.

Without information, I'm afraid that I may have to make recommendations
to my clients that may be out of date.

There is even some speculation from analysts that recent comments from
IBM are more of an indication that IBM is still not ready for prime
time. 

I'm sorry you're not in a position to detail your offering.

Maybe by September you might be ready and then talk to our CHUG?

-Mike



> To: common-user@hadoop.apache.org
> Subject: Re: Which release to use?
> From: tdeut...@us.ibm.com
> Date: Sat, 16 Jul 2011 10:29:55 -0700
> 
> Hi Rita - I want to make sure we are honoring the purpose/approach of
this 
> list. So you are welcome to ping me for information, but let's take
this 
> discussion off the list at this point.
> 
> 
> Tom Deutsch
> Program Director
> CTO Office: Information Management
> Hadoop Product Manager / Customer Exec
> IBM
> 3565 Harbor Blvd
> Costa Mesa, CA 92626-1420
> tdeut...@us.ibm.com
> 
> 
> 
> 
> Rita  
> 07/16/2011 08:53 AM
> Please respond to
> common-user@hadoop.apache.org
> 
> 
> To
> common-user@hadoop.apache.org
> cc
> 
> Subject
> Re: Which release to use?
> 
> 
> 
> 
> 
> 
> I am curious about the IBM product BigInsights. Where can we download it?
> It seems we have to register to download it?
> 
> 
> On Fri, Jul 15, 2011 at 12:38 PM, Tom Deutsch 
wrote:
> 
> > One quick clarification - IBM GA'd a product called BigInsights in
2Q. 
> It
> > faithfully uses the Hadoop stack and many related projects - but 
> provides
> > a number of extensions (that are compatible) based on customer
requests.
> > Not appropriate to say any more on this list, but the info on it is
all
> > publically available.
> >
> >
> > 
> > Tom Deutsch
> > Program Director
> > CTO Office: Information Management
> > Hadoop Product Manager / Customer Exec
> > IBM
> > 3565 Harbor Blvd
> > Costa Mesa, CA 92626-1420
> > tdeut...@us.ibm.com
> >
> >
> >
> >
> > Michael Segel 
> > 07/15/2011 07:58 AM
> > Please respond to
> > common-user@hadoop.apache.org
> >
> >
> > To
> > 
> > cc
> >
> > Subject
> > RE: Which release to use?
> >
> >
> >
> >
> >
> >
> >
> > Unfortunately the picture is a bit more confusing.
> >
> > Yahoo! is now HortonWorks. Their stated goal is to not have their
own
> > derivative release but to sell commercial support for the official 
> Apache
> > release.
> > So those selling commercial support are:
> > *Cloudera
> > *HortonWorks
> > *MapRTech
> > *EMC (reselling MapRTech, but had announced their own)
> > *IBM (not sure what they are selling exactly... still seems like
smoke 
> and
> > mirrors...)
> > *DataStax
> >
> > So while you can use the Apache release, it may not make sense for
your
> > organization to do so. (Said as I don the flame retardant suit...)
> >
> > The issue is that outside of HortonWorks which is stating that they
will
> > support the official Apache release, everything else is a derivative

> work
> > of Apache's Hadoop. From what I have seen, Cloudera's release is the
> > closest to the Apache release.
> >
> > Like I said, things are getting interesting.
> >
> > HTH
> >
> >
> >
> >
> 
> 
> -- 
> --- Get your facts first, then you can distort them as you please.--
> 
  



RE: Hadoop Discrete Event Simulator

2011-07-19 Thread Jeff.Schmitz
Maneesh, 

You may want to check this out

https://issues.apache.org/jira/browse/HADOOP-5005

-Original Message-
From: maneesh varshney [mailto:mvarsh...@gmail.com] 
Sent: Monday, July 18, 2011 8:09 PM
To: common-user@hadoop.apache.org
Subject: Hadoop Discrete Event Simulator

Hello,

Perhaps somebody can point out if there have been efforts to "simulate"
Hadoop clusters.

What I mean is a discrete event simulator that models the hosts and the
networks and runs hadoop algorithms for some synthetic workload. Something
similar to network simulators (for example, ns2).

If such a tool is available, I was hoping to use it for:
a. Getting a general sense of how the HDFS and MapReduce algorithms
work.
For example, if I were to store 1TB data over 100 nodes, how would the
blocks get distributed.
b. Use the simulation to optimize my configuration parameters. For example,
the relationship between performance and the number of cluster nodes, or the
number of replicas, and so on.

The need for point b. above is to be able to study/analyze the
performance
without (or before) actually running the algorithms on an actual
cluster.
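
As a rough back-of-the-envelope for point a, assuming the 0.20 defaults of 64 MB blocks and
replication factor 3: 1 TB / 64 MB is about 16,384 blocks, or about 49,152 block replicas,
which the namenode would spread at roughly 490 replicas per datanode across 100 nodes.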

Thanks in advance,
Maneesh

PS: I apologize if this question has been asked earlier. I could not
seem to
locate the search feature in the mailing list archive.



RE: Hadoop upgrade Java version

2011-07-19 Thread Jeff.Schmitz
I am using this

java version "1.6.0_15"
Java(TM) SE Runtime Environment (build 1.6.0_15-b03)
Java HotSpot(TM) 64-Bit Server VM (build 14.1-b02, mixed mode)

With the latest release and it works fine -

Cheers - 

JGS

-Original Message-
From: highpointe [mailto:highpoint...@gmail.com] 
Sent: Monday, July 18, 2011 5:45 PM
To: high pointe
Cc: common-user@hadoop.apache.org
Subject: Re: Hadoop upgrade Java version

So uhm yeah. Thanks for the Informica  commercial. 

Now back to my original question. 

Anyone have a suggestion on what version of Java I should be using with
the latest Hadoop release?

Sent from my iPhone

On Jul 18, 2011, at 11:26 AM, high pointe 
wrote:

> We are in the process of upgrading to the most current version of
Hadoop.
> 
> At the same time we are in need of upgrading Java.  We are currently
running u17.
> 
> I have read elsewhere that u21 or up is the best route to go.
Currently the version is u26.
> 
> Has anyone gone all the way to u26 with or without issues?
> 
> Thanks for the help.




RE: localhost permission denied

2011-07-19 Thread Jeff.Schmitz
Your SSH isn't set up properly.

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the
following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Execution

Format a new distributed-filesystem:
$ bin/hadoop namenode -format

Start The hadoop daemons:
$ bin/start-all.sh

http://hadoop.apache.org/common/docs/r0.17.0/quickstart.html#Setup+passphraseless

cheers - 

JGS
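
If the key is in place and ssh still prompts for a password, permissions are the usual culprit
(sshd ignores group- or world-writable key files); a short sketch:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys

Also note the prompt below says root@localhost: the start scripts ssh as whichever user runs
them, so set the key up for (and start Hadoop as) that same user rather than root.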




-Original Message-
From: Kobina Kwarko [mailto:kobina.kwa...@gmail.com] 
Sent: Tuesday, July 19, 2011 2:48 PM
To: common-user@hadoop.apache.org
Subject: localhost permission denied

Hello,

Please, any assistance? I am using Hadoop for a school project and managed
to install it on two computers, testing with the wordcount example. However,
after stopping Hadoop and restarting the computers (Ubuntu Server 10.10) I
am getting the following error:

root@localhost's password: localhost: Permission denied, please try
again.

If I enter the administrative password the same message comes again
preventing me from starting Hadoop.

What am I getting wrong? Has anyone encountered such an error before?

I'm using Hadoop 0.20.203.

thanks in advance.

Kobina.