org.apache.hadoop.ipc.StandbyException occurs at thirty minutes past every hour in standby NN

2014-01-24 Thread Francis . Hu
hello, All

 

Installed 2 NNs and 3 DNs in my hadoop-2.2.0 cluster and implemented HDFS HA
with QJM. Currently, looking at the log of the standby NN, it throws the
exception below at a regular interval of one hour:

 

2014-01-24 03:30:01,245 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:ubuntu (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException:
Operation category READ is not supported in state standby

 

Actually, the active NN is working, and no applications accessing HDFS
report errors.

 

Does anyone know what the problem is?

 

 

 

Thanks,

Francis.Hu



Re: org.apache.hadoop.ipc.StandbyException occurs at thirty minutes past every hour in standby NN

2014-01-24 Thread Harsh J
Hi Francis,

This is nothing to worry about; you're basically hitting
https://issues.apache.org/jira/browse/HDFS-3447. A temporary
workaround could be to silence the UserGroupInformation (UGI) logger at the
logging configuration level.
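
A minimal sketch of that workaround, assuming the stock log4j.properties setup
(raising the threshold to FATAL is just one choice, and it will also hide genuine
UGI errors):

# log4j.properties on the standby NN: quiet the hourly StandbyException noise
log4j.logger.org.apache.hadoop.security.UserGroupInformation=FATAL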

On Fri, Jan 24, 2014 at 2:46 PM, Francis.Hu
francis...@reachjunction.com wrote:
 hello, All



 Installed 2 NN and 3 DN in my hadoop-2.2.0 cluster,and implemented HDFS HA
 with QJM. Currently, looking at the log of standby NN ,it throws below
 exception at a regular interval, one hour:



 2014-01-24 03:30:01,245 ERROR
 org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
 as:ubuntu (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException:
 Operation category READ is not supported in state standby



 Actually, the active NN is working ,and no applications access HDFS with
 error.



 Does anyone know what the problem is ?







 Thanks,

 Francis.Hu



-- 
Harsh J


Re: hdfs fsck -locations

2014-01-24 Thread Mark Kerzner
Here is an example

 hdfs fsck /user/mark/data/word_count.csv
Connecting to namenode via http://mark-7:50070
FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
/user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014
.Status: HEALTHY
 Total size: 7217 B
 Total dirs: 0
 Total files: 1
 Total blocks (validated): 1 (avg. block size 7217 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds



On Fri, Jan 24, 2014 at 4:34 AM, Harsh J ha...@cloudera.com wrote:

 Hi Mark,

 Yes, the locations are shown as IP.

 On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner mark.kerz...@shmsoft.com
 wrote:
  Hi,
 
  hdfs fsck -locations
 
  is supposed to show every block with its location? Is location the ip of
 the
  datanode?
 
  Thank you,
  Mark



 --
 Harsh J



RE: HDFS buffer sizes

2014-01-24 Thread John Lilley
Ah, I see... it is a constant
CommonConfigurationKeysPublic.java:  public static final int 
IO_FILE_BUFFER_SIZE_DEFAULT = 4096;
Are there benefits to increasing this for large reads or writes?
john
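
If it helps, a minimal sketch of overriding it from the client side (the path and
the 128 KB figure are placeholders, not recommendations):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.setInt("io.file.buffer.size", 128 * 1024);   // the key behind IO_FILE_BUFFER_SIZE_DEFAULT
FileSystem fs = FileSystem.get(conf);

// ...or override the buffer for a single stream:
FSDataInputStream in = fs.open(new Path("/data/large.bin"), 128 * 1024);
in.close();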

From: Arpit Agarwal [mailto:aagar...@hortonworks.com]
Sent: Thursday, January 23, 2014 3:31 PM
To: user@hadoop.apache.org
Subject: Re: HDFS buffer sizes

HDFS does not appear to use dfs.stream-buffer-size.

On Thu, Jan 23, 2014 at 6:57 AM, John Lilley 
john.lil...@redpoint.netmailto:john.lil...@redpoint.net wrote:
What is the interaction between dfs.stream-buffer-size and 
dfs.client-write-packet-size?
I see that the default for dfs.stream-buffer-size is 4K.  Does anyone have 
experience using larger buffers to optimize large writes?
Thanks
John





Fw: Hadoop 2 Namenode HA not working properly

2014-01-24 Thread Bruno Andrade


Begin forwarded message:

Date: Tue, 21 Jan 2014 09:35:23 +
From: Bruno Andrade b...@eurotux.com
To: user@hadoop.apache.org
Subject: Re: Hadoop 2 Namenode HA not working properly


Hey,

this is my hdfs-site.xml - http://pastebin.com/qpELkwH8
this is my core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://blabla-hadoop</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/hadoop/tmp</value>
  </property>
</configuration>

I killed only the namenode process and nothing happened; then I killed the zkfc
process and the transition happened, and the second namenode became active.

Thanks.
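
For comparison, a minimal sketch of the properties automatic failover typically
relies on (property names from the Hadoop 2.2 HA docs; hosts and the fencing
method are placeholders, and the real values are in the pastebin above):

<!-- hdfs-site.xml -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<!-- core-site.xml -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>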



On 01/20/2014 06:44 PM, Jing Zhao wrote:
 Hi Bruno,

  Could you post your configuration? Also, when you killed one of
 the NN, you mean only killing the NN process or you shutdown the whole
 machine?

 Thanks,
 -Jing

 On Mon, Jan 20, 2014 at 4:11 AM, Bruno Andrade b...@eurotux.com
 wrote:
 Hey,

 I have configured a Hadoop v2.2.0 cluster with QJM and Zookeeper for
 HA and automatic failover.
 But I'm having a problem. If I test the automatic failover, by
 killing one of the namenodes, nothing happens. But if I kill the
 zkfc of that namenode, then zookeeper elects the other namenode as
 active.

 What can it be the problem?

 Thanks.

 --
 Bruno Andrade b...@eurotux.com
 Programador (ID)
 Eurotux Informática, S.A. | www.eurotux.com
 (t) +351 253 680 300 (m) +351 936 293 858

-- 
Bruno Andrade b...@eurotux.com
Programador (ID)
Eurotux Informática, S.A. | www.eurotux.com
(t) +351 253 680 300 (m) +351 936 293 858






No space left on device during merge.

2014-01-24 Thread Tim Potter

Hi,
  I'm getting the below error while trying to sort a lot of data with Hadoop.

I strongly suspect the node the merge is on is running out of local disk space. 
Assuming this is the case, is there any way
to get around this limitation considering I can't increase the local disk space 
available on the nodes?  Like specify sort/merge parameters or similar.

Thanks,
  Tim.

2014-01-24 10:02:36,267 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.lzo_deflate]
2014-01-24 10:02:36,280 INFO [main] org.apache.hadoop.mapred.Merger: Down to 
the last merge-pass, with 100 segments left of total size: 642610678884 bytes
2014-01-24 10:02:36,281 ERROR [main] 
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
as:XX (auth:XX) 
cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
2014-01-24 10:02:36,282 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : 
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle 
in OnDiskMerger - Thread to merge on-disk map-outputs
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:167)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1284)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on 
device
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:213)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at 
org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
at 
org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:150)
at 
org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:140)
at 
org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:99)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:249)
at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:200)
at 
org.apache.hadoop.mapreduce.task.reduce.MergeManager$OnDiskMerger.merge(MergeManager.java:572)
at 
org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:318)
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:211)
... 14 more

2014-01-24 10:02:36,284 INFO [main] org.apache.hadoop.mapred.Task: Runnning 
cleanup for the task



Fwd: HDFS data transfer is faster than SCP based transfer?

2014-01-24 Thread rab ra
Hi

Can anyone please answer my query?

-Rab
-- Forwarded message --
From: rab ra rab...@gmail.com
Date: 24 Jan 2014 10:55
Subject: HDFS data transfer is faster than SCP based transfer?
To: user@hadoop.apache.org

Hello

I have a use case that requires transferring input files from remote storage
using the SCP protocol (via the jSch jar). To optimize this use case, I have
pre-loaded all my input files into HDFS and modified the use case so that it
copies the required files from HDFS instead. So, when the tasktrackers run, each
copies the required input files from HDFS to its local directory. All my
tasktrackers are also datanodes. I could see that my use case ran faster.
The only modification in my application is that files are copied from HDFS
instead of transferred using SCP. My use case also involves parallel operations
(run in the tasktrackers) that do a lot of file transfer, and all of these
transfers are now replaced with HDFS copies.

Can anyone tell me why the HDFS transfer is faster, as I witnessed? Is it because
it uses TCP/IP? Can anyone give me reasonable reasons to explain the decrease
in time?


with thanks and regards
rab
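
For what it's worth, a minimal sketch of the HDFS-to-local copy described above
(both paths are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// copy one required input file from HDFS to the task's local working directory
fs.copyToLocalFile(new Path("/input/sample.dat"), new Path("file:///tmp/task-work/sample.dat"));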


Re: HDFS buffer sizes

2014-01-24 Thread Arpit Agarwal
I don't think that value is used either except in the legacy block reader
which is turned off by default.


On Fri, Jan 24, 2014 at 6:34 AM, John Lilley john.lil...@redpoint.netwrote:

  Ah, I see… it is a constant

 CommonConfigurationKeysPublic.java:  public static final int
 IO_FILE_BUFFER_SIZE_DEFAULT = 4096;

 Are there benefits to increasing this for large reads or writes?

 john



 *From:* Arpit Agarwal [mailto:aagar...@hortonworks.com]
 *Sent:* Thursday, January 23, 2014 3:31 PM
 *To:* user@hadoop.apache.org
 *Subject:* Re: HDFS buffer sizes



 HDFS does not appear to use dfs.stream-buffer-size.



 On Thu, Jan 23, 2014 at 6:57 AM, John Lilley john.lil...@redpoint.net
 wrote:

 What is the interaction between dfs.stream-buffer-size and
 dfs.client-write-packet-size?

 I see that the default for dfs.stream-buffer-size is 4K.  Does anyone have
 experience using larger buffers to optimize large writes?

 Thanks


 John








What is the fix for this error ?

2014-01-24 Thread Kokkula, Sada
-bash-4.1$ /usr/jdk64/jdk1.6.0_31/bin/javac -Xlint -classpath /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core-2.2.0.2.0.6.0-76.jar:/usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-76.jar:./hadoop-annotations-2.0.0-cdh4.0.1.jar WordCount.java
WordCount.java:62: warning: [deprecation] Job(org.apache.hadoop.conf.Configuration,java.lang.String) in org.apache.hadoop.mapreduce.Job has been deprecated
        Job job = new Job(conf, "WordCount");
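
For reference, a minimal sketch of the non-deprecated Hadoop 2.x form (conf and
the job name are taken from the snippet above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "WordCount");   // replaces the deprecated new Job(conf, "WordCount")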

Thanks,


Re: hdfs fsck -locations

2014-01-24 Thread Harsh J
Sorry, but what was the question? I also do not see a locations option flag.
On Jan 24, 2014 7:17 PM, Mark Kerzner mark.kerz...@shmsoft.com wrote:

 Here is an example

  hdfs fsck /user/mark/data/word_count.csv
 Connecting to namenode via http://mark-7:50070
 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014
 .Status: HEALTHY
  Total size: 7217 B
  Total dirs: 0
  Total files: 1
  Total blocks (validated): 1 (avg. block size 7217 B)
  Minimally replicated blocks: 1 (100.0 %)
  Over-replicated blocks: 0 (0.0 %)
  Under-replicated blocks: 0 (0.0 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor: 1
  Average block replication: 1.0
  Corrupt blocks: 0
  Missing replicas: 0 (0.0 %)
  Number of data-nodes: 1
  Number of racks: 1
 FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds



 On Fri, Jan 24, 2014 at 4:34 AM, Harsh J ha...@cloudera.com wrote:

 Hi Mark,

 Yes, the locations are shown as IP.

 On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner mark.kerz...@shmsoft.com
 wrote:
  Hi,
 
  hdfs fsck -locations
 
  is supposed to show every block with its location? Is location the ip
 of the
  datanode?
 
  Thank you,
  Mark



 --
 Harsh J





Re: hdfs fsck -locations

2014-01-24 Thread Mark Kerzner
Sorry, did not copy the full command

hdfs fsck /user/mark/data/word_count.csv -locations
Connecting to namenode via http://mark-7:50070
FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
/user/mark/data/word_count.csv at Fri Jan 24 11:15:17 CST 2014
.Status: HEALTHY
 Total size: 7217 B
 Total dirs: 0
 Total files: 1
 Total blocks (validated): 1 (avg. block size 7217 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Fri Jan 24 11:15:17 CST 2014 in 1 milliseconds


The filesystem under path '/user/mark/data/word_count.csv' is HEALTHY



On Fri, Jan 24, 2014 at 11:08 AM, Harsh J ha...@cloudera.com wrote:

 Sorry, but what was the question? I also do not see a locations option
 flag.
 On Jan 24, 2014 7:17 PM, Mark Kerzner mark.kerz...@shmsoft.com wrote:

 Here is an example

  hdfs fsck /user/mark/data/word_count.csv
 Connecting to namenode via http://mark-7:50070
 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014
 .Status: HEALTHY
  Total size: 7217 B
  Total dirs: 0
  Total files: 1
  Total blocks (validated): 1 (avg. block size 7217 B)
  Minimally replicated blocks: 1 (100.0 %)
  Over-replicated blocks: 0 (0.0 %)
  Under-replicated blocks: 0 (0.0 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor: 1
  Average block replication: 1.0
  Corrupt blocks: 0
  Missing replicas: 0 (0.0 %)
  Number of data-nodes: 1
  Number of racks: 1
 FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds



 On Fri, Jan 24, 2014 at 4:34 AM, Harsh J ha...@cloudera.com wrote:

 Hi Mark,

 Yes, the locations are shown as IP.

 On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner mark.kerz...@shmsoft.com
 wrote:
  Hi,
 
  hdfs fsck -locations
 
  is supposed to show every block with its location? Is location the ip
 of the
  datanode?
 
  Thank you,
  Mark



 --
 Harsh J





RE: hdfs fsck -locations

2014-01-24 Thread Nascimento, Rodrigo
I'm not seeing locations flag yet.

Rod Nascimento
Systems Engineer @ Brazil

People don't buy WHAT you do. They buy WHY you do it.

From: Mark Kerzner [mailto:mark.kerz...@shmsoft.com]
Sent: Friday, January 24, 2014 3:16 PM
To: Hadoop User
Subject: Re: hdfs fsck -locations

Sorry, did not copy the full command

hdfs fsck /user/mark/data/word_count.csv -locations
Connecting to namenode via http://mark-7:50070
FSCK started by mark (auth:SIMPLE) from /192.168.1.232http://192.168.1.232 
for path /user/mark/data/word_count.csv at Fri Jan 24 11:15:17 CST 2014
.Status: HEALTHY
 Total size:   7217 B
 Total dirs:   0
 Total files:  1
 Total blocks (validated):1 (avg. block size 7217 B)
 Minimally replicated blocks:  1 (100.0 %)
 Over-replicated blocks:  0 (0.0 %)
 Under-replicated blocks:0 (0.0 %)
 Mis-replicated blocks:0 (0.0 %)
 Default replication factor:  1
 Average block replication: 1.0
 Corrupt blocks:  0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes:  1
 Number of racks:   1
FSCK ended at Fri Jan 24 11:15:17 CST 2014 in 1 milliseconds


The filesystem under path '/user/mark/data/word_count.csv' is HEALTHY


On Fri, Jan 24, 2014 at 11:08 AM, Harsh J 
ha...@cloudera.commailto:ha...@cloudera.com wrote:

Sorry, but what was the question? I also do not see a locations option flag.
On Jan 24, 2014 7:17 PM, Mark Kerzner 
mark.kerz...@shmsoft.commailto:mark.kerz...@shmsoft.com wrote:
Here is an example

 hdfs fsck /user/mark/data/word_count.csv
Connecting to namenode via http://mark-7:50070
FSCK started by mark (auth:SIMPLE) from /192.168.1.232http://192.168.1.232 
for path /user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014
.Status: HEALTHY
 Total size: 7217 B
 Total dirs: 0
 Total files: 1
 Total blocks (validated): 1 (avg. block size 7217 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds


On Fri, Jan 24, 2014 at 4:34 AM, Harsh J 
ha...@cloudera.commailto:ha...@cloudera.com wrote:
Hi Mark,

Yes, the locations are shown as IP.

On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner 
mark.kerz...@shmsoft.commailto:mark.kerz...@shmsoft.com wrote:
 Hi,

 hdfs fsck -locations

 is supposed to show every block with its location? Is location the ip of the
 datanode?

 Thank you,
 Mark


--
Harsh J




RE: hdfs fsck -locations

2014-01-24 Thread Nascimento, Rodrigo
Hi Mark,

Here is a sample from my sandbox. Your question is about the part shown in red
in the output below (the "Under replicated" line), right?

[root@sandbox ~]# hdfs fsck /user/ambari-qa/passwd  -locations
Connecting to namenode via http://sandbox.hortonworks.com:50070
FSCK started by root (auth:SIMPLE) from /172.16.13.30 for path 
/user/ambari-qa/passwd at Fri Jan 24 09:53:43 PST 2014
.
/user/ambari-qa/passwd:  Under replicated 
BP-1578958328-10.0.2.15-1382306880516:blk_1073742464_1640. Target Replicas is 3 
but found 1 replica(s).
Status: HEALTHY
 Total size:1708 B
 Total dirs:0
 Total files:1
 Total symlinks:0
 Total blocks (validated):1 (avg. block size 1708 B)
 Minimally replicated blocks:1 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:1 (100.0 %)
 Mis-replicated blocks:0 (0.0 %)
 Default replication factor:3
 Average block replication:1.0
 Corrupt blocks:0
 Missing replicas:2 (66.64 %)
 Number of data-nodes:1
 Number of racks:1
FSCK ended at Fri Jan 24 09:53:43 PST 2014 in 1 milliseconds


The filesystem under path '/user/ambari-qa/passwd' is HEALTHY
[root@sandbox ~]#

Rod Nascimento


From: Nascimento, Rodrigo [rodrigo.nascime...@netapp.com]
Sent: Friday, January 24, 2014 3:34 PM
To: user@hadoop.apache.org
Subject: RE: hdfs fsck -locations

I’m not seeing locations flag yet.

Rod Nascimento
Systems Engineer @ Brazil

People don’t buy WHAT you do. They buy WHY you do it.

From: Mark Kerzner [mailto:mark.kerz...@shmsoft.com]
Sent: Friday, January 24, 2014 3:16 PM
To: Hadoop User
Subject: Re: hdfs fsck -locations

Sorry, did not copy the full command

hdfs fsck /user/mark/data/word_count.csv -locations
Connecting to namenode via http://mark-7:50070
FSCK started by mark (auth:SIMPLE) from /192.168.1.232http://192.168.1.232 
for path /user/mark/data/word_count.csv at Fri Jan 24 11:15:17 CST 2014
.Status: HEALTHY
 Total size:   7217 B
 Total dirs:   0
 Total files:  1
 Total blocks (validated):1 (avg. block size 7217 B)
 Minimally replicated blocks:  1 (100.0 %)
 Over-replicated blocks:  0 (0.0 %)
 Under-replicated blocks:0 (0.0 %)
 Mis-replicated blocks:0 (0.0 %)
 Default replication factor:  1
 Average block replication: 1.0
 Corrupt blocks:  0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes:  1
 Number of racks:   1
FSCK ended at Fri Jan 24 11:15:17 CST 2014 in 1 milliseconds


The filesystem under path '/user/mark/data/word_count.csv' is HEALTHY


On Fri, Jan 24, 2014 at 11:08 AM, Harsh J 
ha...@cloudera.commailto:ha...@cloudera.com wrote:

Sorry, but what was the question? I also do not see a locations option flag.
On Jan 24, 2014 7:17 PM, Mark Kerzner 
mark.kerz...@shmsoft.commailto:mark.kerz...@shmsoft.com wrote:
Here is an example

 hdfs fsck /user/mark/data/word_count.csv
Connecting to namenode via http://mark-7:50070
FSCK started by mark (auth:SIMPLE) from /192.168.1.232http://192.168.1.232 
for path /user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014
.Status: HEALTHY
 Total size: 7217 B
 Total dirs: 0
 Total files: 1
 Total blocks (validated): 1 (avg. block size 7217 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds


On Fri, Jan 24, 2014 at 4:34 AM, Harsh J 
ha...@cloudera.commailto:ha...@cloudera.com wrote:
Hi Mark,

Yes, the locations are shown as IP.

On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner 
mark.kerz...@shmsoft.commailto:mark.kerz...@shmsoft.com wrote:
 Hi,

 hdfs fsck -locations

 is supposed to show every block with its location? Is location the ip of the
 datanode?

 Thank you,
 Mark


--
Harsh J




Memory problems with BytesWritable and huge binary files

2014-01-24 Thread Adam Retter
Hi there,

We have several diverse large datasets to process (one set may be as
much as 27 TB), however all of the files in these datasets are binary
files. We need to be able to pass each binary file to several tools
running in the Map Reduce framework.
We already have a working pipeline of MapReduce tasks that receives
each binary file (as BytesWritable) and processes it, we have tested
it with very small test datasets so far.

For any particular data set, the size of the files involved varies
wildly, with each file being anywhere between about 2 KB and 4 GB. With
that in mind we have tried to follow the advice to read the files into
a Sequence File in HDFS. To create the Sequence File we have a Map
Reduce Job that uses a SequenceFileOutputFormat[Text, BytesWritable].

We cannot split these files into chunks, they must be processed by our
tools in our mappers and reducers as complete files. The problem we
have is that BytesWritable appears to load the entire content of a
file into memory, and now that we are trying to process our production
size datasets, once you get a couple of large files on the go, the JVM
throws the dreaded OutOfMemoryError.

What we need is someway to process these binary files, by reading and
writing their contents as Streams to and from the Sequence File. Or
really any other mechanism that does not involve loading the entire
file into RAM! Our own tools that we use in the mappers and reducers
in-fact expect to work with java.io.InputStream. We have tried quite a
few things now, including writing some custom Writable
implementations, but we then end up buffering data in temporary files
which is not exactly ideal when the data already exists in the
sequence files in HDFS.

Is there any hope?


Thanks Adam.

-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk


Re: Ambari upgrade 1.4.1 to 1.4.2

2014-01-24 Thread Vinod Kumar Vavilapalli
+user@ambari -user@hadoop

Please post ambari related questions to the ambari user mailing list.

Thanks
+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Fri, Jan 24, 2014 at 9:15 AM, Kokkula, Sada 
sadanandam.kokk...@bnymellon.com wrote:



 Ambari Server upgrade from 1.4.1 to 1.4.2 wipes out the Ambari database during
 the upgrade. After that, I am not able to open the Ambari Server GUI.

 I reviewed the Hortonworks web site for help, but the steps in the docs did not
 help fix the issue.



 Appreciated for any updates.



 Thanks,





Re: HDFS data transfer is faster than SCP based transfer?

2014-01-24 Thread Vinod Kumar Vavilapalli
Is it a single file? Lots of files? How big are the files? Is the copy on a
single node or are you running some kind of a MapReduce program?

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Fri, Jan 24, 2014 at 7:21 AM, rab ra rab...@gmail.com wrote:

 Hi

 Can anyone please answer my query?

 -Rab
 -- Forwarded message --
 From: rab ra rab...@gmail.com
 Date: 24 Jan 2014 10:55
 Subject: HDFS data transfer is faster than SCP based transfer?
 To: user@hadoop.apache.org

 Hello

 I have a use case that requires transfer of input files from remote
 storage using SCP protocol (using jSCH jar).  To optimize this use case, I
 have pre-loaded all my input files into HDFS and modified my use case so
 that it copies required files from HDFS. So, when tasktrackers works, it
 copies required number of input files to its local directory from HDFS. All
 my tasktrackers are also datanodes. I could see my use case has run faster.
 The only modification in my application is that file copy from HDFS instead
 of transfer using SCP. Also, my use case involves parallel operations (run
 in tasktrackers) and they do lot of file transfer. Now all these transfers
 are replaced with HDFS copy.

 Can anyone tell me HDFS transfer is faster as I witnessed? Is it because,
 it uses TCP/IP? Can anyone give me reasonable reasons to support the
 decrease of time?


 with thanks and regards
 rab




Re: HDFS federation configuration

2014-01-24 Thread AnilKumar B
Thanks Suresh.

I followed the link; it's clear now.

But client-side configuration is not covered in the doc.

Thanks & Regards,
B Anil Kumar.
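
A minimal sketch of the per-nameservice settings the federation doc describes
(nameservice IDs and hosts below are placeholders):

<!-- hdfs-site.xml: one namespace per namenode, keyed by nameservice ID -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>nn2.example.com:8020</value>
</property>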


On Thu, Jan 23, 2014 at 11:44 PM, Suresh Srinivas sur...@hortonworks.comwrote:

 Have you looked at -
 http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/Federation.html
 ?



 On Thu, Jan 23, 2014 at 9:35 AM, AnilKumar B akumarb2...@gmail.comwrote:

 Hi,

 We tried setting up HDFS name node federation set up with 2 name nodes. I
 am facing few issues.

 Can any one help me in understanding below points?

 1) how can we configure different namespaces to different name node?
 Where exactly we need to configure this?

 See the documentation. If it is not clear, please open a jira.



 2) After formatting each NN with one cluster id, Do we need to set this
 cluster id in hdfs-site.xml?

 There is no need to set the cluster id in hdfs-site.xml



 3) I am getting exception like, data dir already locked by one of the NN,
 But when don't specify data.dir, then it's not showing exception. So what
 could be the issue?


 Are you running the two namenode processes on the same machine?


  Thanks  Regards,
 B Anil Kumar.




 --
 http://hortonworks.com/download/



Re: Memory problems with BytesWritable and huge binary files

2014-01-24 Thread Vinod Kumar Vavilapalli
Is your data in any given file a bunch of key-value pairs? If that isn't
the case, I'm wondering how writing a single large key-value into a
sequence file helps. It won't. Maybe you can give an example of your input
data?

If indeed they are a bunch of smaller sized key-value pairs, you can write
your own custom InputFormat that reads the data from your input files one
k-v pair after another, and feed it to your MR job. There isn't any need
for converting them to sequence-files at that point.

Thanks
+Vinod
Hortonworks Inc.
http://hortonworks.com/



Re: No space left on device during merge.

2014-01-24 Thread Vinod Kumar Vavilapalli
That's a lot of data to process for a single reducer. You should try
increasing the number of reducers to achieve more parallelism and also try
modifying your logic to avoid significant skew in the reducers.

Unfortunately this means rethinking about your app, but that's the only way
about it. It will also help you scale smoothly into the future if you have
adjustable parallelism and more balanced data processing.
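
A minimal sketch of that change, assuming a standard MapReduce driver (the count
is illustrative and should be tuned to the data and cluster):

// in the job driver
job.setNumReduceTasks(200);

// or per run, for a Tool-based driver (jar and class names are placeholders):
//   hadoop jar myjob.jar MyDriver -D mapreduce.job.reduces=200 ...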

+Vinod
Hortonworks Inc.
http://hortonworks.com/


On Fri, Jan 24, 2014 at 6:47 AM, Tim Potter t...@yahoo-inc.com wrote:

  Hi,
   I'm getting the below error while trying to sort a lot of data with Hadoop.

 I strongly suspect the node the merge is on is running out of local disk 
 space. Assuming this is the case, is there any way
 to get around this limitation considering I can't increase the local disk 
 space available on the nodes?  Like specify sort/merge parameters or similar.

 Thanks,
   Tim.

 2014-01-24 10:02:36,267 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
 Got brand-new decompressor [.lzo_deflate]
 2014-01-24 10:02:36,280 INFO [main] org.apache.hadoop.mapred.Merger: Down to 
 the last merge-pass, with 100 segments left of total size: 642610678884 bytes
 2014-01-24 10:02:36,281 ERROR [main] 
 org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
 as:XX (auth:XX) 
 cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
 shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
 2014-01-24 10:02:36,282 WARN [main] org.apache.hadoop.mapred.YarnChild: 
 Exception running child : 
 org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
 shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:167)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1284)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
 Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left 
 on device
   at 
 org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:213)
   at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
   at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
   at java.io.DataOutputStream.write(DataOutputStream.java:107)
   at 
 org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
   at 
 org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:150)
   at 
 org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:140)
   at 
 org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:99)
   at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
   at java.io.DataOutputStream.write(DataOutputStream.java:107)
   at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:249)
   at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:200)
   at 
 org.apache.hadoop.mapreduce.task.reduce.MergeManager$OnDiskMerger.merge(MergeManager.java:572)
   at 
 org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
 Caused by: java.io.IOException: No space left on device
   at java.io.FileOutputStream.writeBytes(Native Method)
   at java.io.FileOutputStream.write(FileOutputStream.java:318)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:211)
   ... 14 more

 2014-01-24 10:02:36,284 INFO [main] org.apache.hadoop.mapred.Task: Runnning 
 cleanup for the task





Re: hdfs fsck -locations

2014-01-24 Thread Mark Kerzner
hdfs fsck /user/mark/data/word_count.csv *-locations*


On Fri, Jan 24, 2014 at 11:34 AM, Nascimento, Rodrigo 
rodrigo.nascime...@netapp.com wrote:

  I’m not seeing locations flag yet.



 *Rod Nascimento*

 *Systems Engineer @ Brazil*



 *People **don’t** buy **WHAT** you do. They buy **WHY** you do it.*



 *From:* Mark Kerzner [mailto:mark.kerz...@shmsoft.com]
 *Sent:* Friday, January 24, 2014 3:16 PM
 *To:* Hadoop User
 *Subject:* Re: hdfs fsck -locations



 Sorry, did not copy the full command



 hdfs fsck /user/mark/data/word_count.csv -locations

 Connecting to namenode via http://mark-7:50070

 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 11:15:17 CST 2014

 .Status: HEALTHY

  Total size:   7217 B

  Total dirs:   0

  Total files:  1

  Total blocks (validated):1 (avg. block size 7217 B)

  Minimally replicated blocks:  1 (100.0 %)

  Over-replicated blocks:  0 (0.0 %)

  Under-replicated blocks:0 (0.0 %)

  Mis-replicated blocks:0 (0.0 %)

  Default replication factor:  1

  Average block replication: 1.0

  Corrupt blocks:  0

  Missing replicas: 0 (0.0 %)

  Number of data-nodes:  1

  Number of racks:   1

 FSCK ended at Fri Jan 24 11:15:17 CST 2014 in 1 milliseconds





 The filesystem under path '/user/mark/data/word_count.csv' is HEALTHY





 On Fri, Jan 24, 2014 at 11:08 AM, Harsh J ha...@cloudera.com wrote:

 Sorry, but what was the question? I also do not see a locations option
 flag.

 On Jan 24, 2014 7:17 PM, Mark Kerzner mark.kerz...@shmsoft.com wrote:

 Here is an example



  hdfs fsck /user/mark/data/word_count.csv

 Connecting to namenode via http://mark-7:50070

 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014

 .Status: HEALTHY

  Total size: 7217 B

  Total dirs: 0

  Total files: 1

  Total blocks (validated): 1 (avg. block size 7217 B)

  Minimally replicated blocks: 1 (100.0 %)

  Over-replicated blocks: 0 (0.0 %)

  Under-replicated blocks: 0 (0.0 %)

  Mis-replicated blocks: 0 (0.0 %)

  Default replication factor: 1

  Average block replication: 1.0

  Corrupt blocks: 0

  Missing replicas: 0 (0.0 %)

  Number of data-nodes: 1

  Number of racks: 1

 FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds





 On Fri, Jan 24, 2014 at 4:34 AM, Harsh J ha...@cloudera.com wrote:

 Hi Mark,

 Yes, the locations are shown as IP.


 On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner mark.kerz...@shmsoft.com
 wrote:
  Hi,
 
  hdfs fsck -locations
 
  is supposed to show every block with its location? Is location the ip of
 the
  datanode?
 
  Thank you,
  Mark


   --
 Harsh J







Re: Memory problems with BytesWritable and huge binary files

2014-01-24 Thread Adam Retter
 Is your data in any given file a bunch of key-value pairs?

No. The content of each file itself is the value we are interested in,
and I guess that its filename is the key.

 If that isn't the
 case, I'm wondering how writing a single large key-value into a sequence
 file helps. It won't. May be you can give an example of your input data?

Well, from the Hadoop O'Reilly book I rather got the impression that
HDFS does not like small files due to its 64 MB block size, and that it is
instead recommended to place small files into a Sequence file. Is that
not the case?

Our input data really varies between 130 different file types, it
could be Microsoft Office documents, Video Recordings, Audio, CAD
diagrams etc.

 If indeed they are a bunch of smaller sized key-value pairs, you can write
 your own custom InputFormat that reads the data from your input files one
 k-v pair after another, and feed it to your MR job. There isn't any need for
 converting them to sequence-files at that point.

As I mentioned in my initial email, each file cannot be split up!

 Thanks
 +Vinod
 Hortonworks Inc.
 http://hortonworks.com/





-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk


Re: hdfs fsck -locations

2014-01-24 Thread Mark Kerzner
Can you send me your output?

hadoop version
Hadoop 2.0.0-cdh4.5.0
Subversion
git://ubuntu64-12-04-mk1/var/lib/jenkins/workspace/generic-package-ubuntu64-12-04/CDH4.5.0-Packaging-Hadoop-2013-11-20_14-31-53/hadoop-2.0.0+1518-1.cdh4.5.0.p0.24~precise/src/hadoop-common-project/hadoop-common
-r 8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on Wed Nov 20 15:10:35 PST 2013
From source with checksum 9848b0f85b461913ed63fa19c2b79ccc
This command was run using /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.5.0.jar



On Fri, Jan 24, 2014 at 3:14 PM, Nascimento, Rodrigo 
rodrigo.nascime...@netapp.com wrote:

  Mark,

  Did you see that your output is different from mine?

  Which is your Hadoop version?

 Rodrigo Nascimento
 Systems Engineer @ Brazil
 Mobile +55 11 991.873.810

  Sent from my iPhone

 On 24/01/2014, at 18:31, Mark Kerzner mark.kerz...@shmsoft.com wrote:

   HI, Rodrigo, I am fine thank you :)

  Here is the _complete_ output

  mark@mark-7:~$ hdfs fsck /user/mark/data/word_count.csv -locations
 Connecting to namenode via http://mark-7:50070
 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 14:30:23 CST 2014
 .Status: HEALTHY
  Total size: 7217 B
  Total dirs: 0
  Total files: 1
  Total blocks (validated): 1 (avg. block size 7217 B)
  Minimally replicated blocks: 1 (100.0 %)
  Over-replicated blocks: 0 (0.0 %)
  Under-replicated blocks: 0 (0.0 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor: 1
  Average block replication: 1.0
  Corrupt blocks: 0
  Missing replicas: 0 (0.0 %)
  Number of data-nodes: 1
  Number of racks: 1
 FSCK ended at Fri Jan 24 14:30:23 CST 2014 in 0 milliseconds


  The filesystem under path '/user/mark/data/word_count.csv' is HEALTHY
 mark@mark-7:~$


 On Fri, Jan 24, 2014 at 1:35 PM, Nascimento, Rodrigo 
 rodrigo.nascime...@netapp.com wrote:

  Hi Mark,

  How are you?

 In your output from word_count.csv, the portion related to the block
locations is missing. This is the reason why I told you in my previous
message that I'm not seeing locations yet.

  please, take a look at my last e-mail.

 All the best,

 Rodrigo Nascimento
 Systems Engineer @ Brazil

  Sent from my iPhone

 On 24/01/2014, at 16:40, Mark Kerzner mark.kerz...@shmsoft.com wrote:


 hdfs fsck /user/mark/data/word_count.csv *-locations*


 On Fri, Jan 24, 2014 at 11:34 AM, Nascimento, Rodrigo 
 rodrigo.nascime...@netapp.com wrote:

  I’m not seeing locations flag yet.



 *Rod Nascimento*

 *Systems Engineer @ Brazil*



 *People **don’t** buy **WHAT** you do. They buy **WHY** you do it.*



 *From:* Mark Kerzner [mailto:mark.kerz...@shmsoft.com]
 *Sent:* Friday, January 24, 2014 3:16 PM
 *To:* Hadoop User
 *Subject:* Re: hdfs fsck -locations



 Sorry, did not copy the full command



 hdfs fsck /user/mark/data/word_count.csv -locations

 Connecting to namenode via http://mark-7:50070

 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 11:15:17 CST 2014

 .Status: HEALTHY

  Total size:   7217 B

  Total dirs:   0

  Total files:  1

  Total blocks (validated):1 (avg. block size 7217 B)

  Minimally replicated blocks:  1 (100.0 %)

  Over-replicated blocks:  0 (0.0 %)

  Under-replicated blocks:0 (0.0 %)

  Mis-replicated blocks:0 (0.0 %)

  Default replication factor:  1

  Average block replication: 1.0

  Corrupt blocks:  0

  Missing replicas: 0 (0.0 %)

  Number of data-nodes:  1

  Number of racks:   1

 FSCK ended at Fri Jan 24 11:15:17 CST 2014 in 1 milliseconds





 The filesystem under path '/user/mark/data/word_count.csv' is HEALTHY





 On Fri, Jan 24, 2014 at 11:08 AM, Harsh J ha...@cloudera.com wrote:

 Sorry, but what was the question? I also do not see a locations option
 flag.

 On Jan 24, 2014 7:17 PM, Mark Kerzner mark.kerz...@shmsoft.com
 wrote:

 Here is an example



  hdfs fsck /user/mark/data/word_count.csv

 Connecting to namenode via http://mark-7:50070

 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014

 .Status: HEALTHY

  Total size: 7217 B

  Total dirs: 0

  Total files: 1

  Total blocks (validated): 1 (avg. block size 7217 B)

  Minimally replicated blocks: 1 (100.0 %)

  Over-replicated blocks: 0 (0.0 %)

  Under-replicated blocks: 0 (0.0 %)

  Mis-replicated blocks: 0 (0.0 %)

  Default replication factor: 1

  Average block replication: 1.0

  Corrupt blocks: 0

  Missing replicas: 0 (0.0 %)

  Number of data-nodes: 1

  Number of racks: 1

 FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds





 On Fri, Jan 24, 2014 at 4:34 AM, Harsh J ha...@cloudera.com wrote:

 Hi Mark,

 Yes, the locations are shown as IP.


 On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner mark.kerz...@shmsoft.com
 wrote:
  

Re: hdfs fsck -locations

2014-01-24 Thread Mark Kerzner
Yes, Rodrigo,

that's what I was looking for. So in my install I somehow don't have it at
all. I was asked by my students, so now I have the answer.

Mark
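
In the fsck usage string, -locations is nested under -files -blocks, so the
per-block location lines generally only appear when all three flags are given,
along the lines of:

hdfs fsck /user/mark/data/word_count.csv -files -blocks -locations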


On Fri, Jan 24, 2014 at 4:00 PM, Nascimento, Rodrigo 
rodrigo.nascime...@netapp.com wrote:

  Mark,

  there we go ;-)

 Rodrigo Nascimento
 Systems Engineer @ Brazil
 Mobile +55 11 991.873.810

  Sent from my iPhone

 Begin forwarded message:

  *From:* Nascimento, Rodrigo rodrigo.nascime...@netapp.com
 *Date:* 24 de janeiro de 2014 15:59:33 BRST
 *To:* user@hadoop.apache.org user@hadoop.apache.org
 *Subject:* *RE: hdfs fsck -locations*

   Hi Mark,

 It is a sample from my sandbox. Your question is about the part that is in
 RED at the output below, right?

 [root@sandbox ~]# hdfs fsck /user/ambari-qa/passwd  -locations
 Connecting to namenode via http://sandbox.hortonworks.com:50070
 FSCK started by root (auth:SIMPLE) from /172.16.13.30 for path
 /user/ambari-qa/passwd at Fri Jan 24 09:53:43 PST 2014
 .
 */user/ambari-qa/passwd:  Under replicated
 BP-1578958328-10.0.2.15-1382306880516:blk_1073742464_1640. Target Replicas
 is 3 but found 1 replica(s).*
 Status: HEALTHY
  Total size:1708 B
  Total dirs:0
  Total files:1
  Total symlinks:0
  Total blocks (validated):1 (avg. block size 1708 B)
  Minimally replicated blocks:1 (100.0 %)
  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:1 (100.0 %)
  Mis-replicated blocks:0 (0.0 %)
  Default replication factor:3
  Average block replication:1.0
  Corrupt blocks:0
  Missing replicas:2 (66.64 %)
  Number of data-nodes:1
  Number of racks:1
 FSCK ended at Fri Jan 24 09:53:43 PST 2014 in 1 milliseconds


 The filesystem under path '/user/ambari-qa/passwd' is HEALTHY
 [root@sandbox ~]#

 Rod Nascimento

  --
 *From:* Nascimento, Rodrigo [rodrigo.nascime...@netapp.com]
 *Sent:* Friday, January 24, 2014 3:34 PM
 *To:* user@hadoop.apache.org
 *Subject:* RE: hdfs fsck -locations

   I’m not seeing locations flag yet.



 *Rod Nascimento*

 *Systems Engineer @ Brazil*



 *People **don’t** buy **WHAT** you do. They buy **WHY** you do it.*



 *From:* Mark Kerzner 
 [mailto:mark.kerz...@shmsoft.commark.kerz...@shmsoft.com]

 *Sent:* Friday, January 24, 2014 3:16 PM
 *To:* Hadoop User
 *Subject:* Re: hdfs fsck -locations



 Sorry, did not copy the full command



 hdfs fsck /user/mark/data/word_count.csv -locations

 Connecting to namenode via http://mark-7:50070

 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 11:15:17 CST 2014

 .Status: HEALTHY

  Total size:   7217 B

  Total dirs:   0

  Total files:  1

  Total blocks (validated):1 (avg. block size 7217 B)

  Minimally replicated blocks:  1 (100.0 %)

  Over-replicated blocks:  0 (0.0 %)

  Under-replicated blocks:0 (0.0 %)

  Mis-replicated blocks:0 (0.0 %)

  Default replication factor:  1

  Average block replication: 1.0

  Corrupt blocks:  0

  Missing replicas: 0 (0.0 %)

  Number of data-nodes:  1

  Number of racks:   1

 FSCK ended at Fri Jan 24 11:15:17 CST 2014 in 1 milliseconds





 The filesystem under path '/user/mark/data/word_count.csv' is HEALTHY





 On Fri, Jan 24, 2014 at 11:08 AM, Harsh J ha...@cloudera.com wrote:

 Sorry, but what was the question? I also do not see a locations option
 flag.

 On Jan 24, 2014 7:17 PM, Mark Kerzner mark.kerz...@shmsoft.com wrote:

 Here is an example



  hdfs fsck /user/mark/data/word_count.csv

 Connecting to namenode via http://mark-7:50070

 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014

 .Status: HEALTHY

  Total size: 7217 B

  Total dirs: 0

  Total files: 1

  Total blocks (validated): 1 (avg. block size 7217 B)

  Minimally replicated blocks: 1 (100.0 %)

  Over-replicated blocks: 0 (0.0 %)

  Under-replicated blocks: 0 (0.0 %)

  Mis-replicated blocks: 0 (0.0 %)

  Default replication factor: 1

  Average block replication: 1.0

  Corrupt blocks: 0

  Missing replicas: 0 (0.0 %)

  Number of data-nodes: 1

  Number of racks: 1

 FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds





 On Fri, Jan 24, 2014 at 4:34 AM, Harsh J ha...@cloudera.com wrote:

 Hi Mark,

 Yes, the locations are shown as IP.


 On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner mark.kerz...@shmsoft.com
 wrote:
  Hi,
 
  hdfs fsck -locations
 
  is supposed to show every block with its location? Is location the ip of
 the
  datanode?
 
  Thank you,
  Mark


   --
 Harsh J








Re: Memory problems with BytesWritable and huge binary files

2014-01-24 Thread Vinod Kumar Vavilapalli
Okay. Assuming you don't need a whole file (video) in memory for your 
processing, you can simply write an InputFormat/RecordReader implementation that 
streams through any given file and processes it.
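
A minimal sketch of that shape, assuming the mapper can work from a
java.io.InputStream: the record value carries only the file path, and the stream
is opened lazily in the mapper (class names and the processStream helper are
illustrative, not an existing API):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// One record per file: key = file name, value = full HDFS path (never the bytes).
public class FilePathInputFormat extends FileInputFormat<Text, Text> {
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;  // each binary file is handled as a whole
  }

  @Override
  public RecordReader<Text, Text> createRecordReader(InputSplit split, TaskAttemptContext ctx) {
    return new RecordReader<Text, Text>() {
      private FileSplit fileSplit;
      private boolean consumed = false;

      @Override public void initialize(InputSplit s, TaskAttemptContext c) { fileSplit = (FileSplit) s; }
      @Override public boolean nextKeyValue() { if (consumed) return false; consumed = true; return true; }
      @Override public Text getCurrentKey() { return new Text(fileSplit.getPath().getName()); }
      @Override public Text getCurrentValue() { return new Text(fileSplit.getPath().toString()); }
      @Override public float getProgress() { return consumed ? 1.0f : 0.0f; }
      @Override public void close() throws IOException { }
    };
  }
}

// In the mapper, stream the file instead of materialising it:
//   FileSystem fs = FileSystem.get(context.getConfiguration());
//   FSDataInputStream in = fs.open(new Path(value.toString()));
//   try { processStream(in); } finally { in.close(); }   // processStream stands in for your tool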

+Vinod

On Jan 24, 2014, at 12:44 PM, Adam Retter adam.ret...@googlemail.com wrote:

 Is your data in any given file a bunch of key-value pairs?
 
 No. The content of each file itself is the value we are interested in,
 and I guess that it's filename is the key.
 
 If that isn't the
 case, I'm wondering how writing a single large key-value into a sequence
 file helps. It won't. May be you can give an example of your input data?
 
 Well from the Hadoop O'Reilly book, I rather got the impression that
 HDFS does not like small files due to it's 64MB block size, and it is
 instead recommended to place small files into a Sequence file. Is that
 not the case?
 
 Our input data really varies between 130 different file types, it
 could be Microsoft Office documents, Video Recordings, Audio, CAD
 diagrams etc.
 
 If indeed they are a bunch of smaller sized key-value pairs, you can write
 your own custom InputFormat that reads the data from your input files one
 k-v pair after another, and feed it to your MR job. There isn't any need for
 converting them to sequence-files at that point.
 
 As I mentioned in my initial email, each file cannot be split up!
 
 Thanks
 +Vinod
 Hortonworks Inc.
 http://hortonworks.com/
 
 
 
 
 
 -- 
 Adam Retter
 
 skype: adam.retter
 tweet: adamretter
 http://www.adamretter.org.uk




Re: Memory problems with BytesWritable and huge binary files

2014-01-24 Thread Adam Retter
So I am not sure I follow you, as we already have a custom InputFormat
and RecordReader and that does not seem to help.

The reason it does not seem to help is that it needs to return the
data as a Writable so that the Writable can then be used in the
following map operation. The map operation needs access to the entire
file.

The only way to do this in Hadoop by default is to use BytesWritable,
but that places everything in memory.

What am I missing?

On 24 January 2014 22:42, Vinod Kumar Vavilapalli
vino...@hortonworks.com wrote:
 Okay. Assuming you don't need a whole file (video) in memory for your 
 processing, you can simply write a Inputformat/RecordReader implementation 
 that streams through any given file and processes it.

 +Vinod

 On Jan 24, 2014, at 12:44 PM, Adam Retter adam.ret...@googlemail.com wrote:

 Is your data in any given file a bunch of key-value pairs?

 No. The content of each file itself is the value we are interested in,
 and I guess that its filename is the key.

 If that isn't the
 case, I'm wondering how writing a single large key-value into a sequence
 file helps. It won't. May be you can give an example of your input data?

 Well from the Hadoop O'Reilly book, I rather got the impression that
 HDFS does not like small files due to its 64MB block size, and it is
 instead recommended to place small files into a SequenceFile. Is that
 not the case?

 Our input data really varies across 130 different file types; it
 could be Microsoft Office documents, Video Recordings, Audio, CAD
 diagrams etc.

 If indeed they are a bunch of smaller sized key-value pairs, you can write
 your own custom InputFormat that reads the data from your input files one
 k-v pair after another, and feed it to your MR job. There isn't any need for
 converting them to sequence-files at that point.

 As I mentioned in my initial email, each file cannot be split up!

 Thanks
 +Vinod
 Hortonworks Inc.
 http://hortonworks.com/





 --
 Adam Retter

 skype: adam.retter
 tweet: adamretter
 http://www.adamretter.org.uk





-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
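
A minimal sketch of the streaming InputFormat/RecordReader approach discussed in
this thread, assuming the new mapreduce API; the class name and the choice of
(file path, NullWritable) as the record are illustrative, not from the thread.
Each file is marked non-splittable and yields exactly one record whose key is
the file's path, so the map task can open and stream the file itself instead of
materialising it in a BytesWritable:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFilePathInputFormat extends FileInputFormat<Text, NullWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;   // never split: one map task sees one whole file
    }

    @Override
    public RecordReader<Text, NullWritable> createRecordReader(InputSplit split,
            TaskAttemptContext context) throws IOException, InterruptedException {
        return new RecordReader<Text, NullWritable>() {
            private Path file;
            private boolean processed = false;

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context) {
                file = ((FileSplit) split).getPath();
            }

            @Override
            public boolean nextKeyValue() {
                if (processed) {
                    return false;       // exactly one record per file
                }
                processed = true;
                return true;
            }

            @Override
            public Text getCurrentKey() {
                return new Text(file.toString());   // key = the file name, as Adam describes
            }

            @Override
            public NullWritable getCurrentValue() {
                return NullWritable.get();          // the mapper streams the file itself
            }

            @Override
            public float getProgress() {
                return processed ? 1.0f : 0.0f;
            }

            @Override
            public void close() {
            }
        };
    }
}

The mapper would then call FileSystem.open() on the path carried in the key and
consume the FSDataInputStream incrementally, so memory use stays flat no matter
how large the file is.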


HIVE versus SQL DB

2014-01-24 Thread Felipe Gutierrez
Hi,

I am on a project that has three databases with flat files. Our plan is to
normalize these DBs into one. We will need to follow the data warehouse
concept (ETL - Extract, Transform, Load).

We are thinking of using Hadoop for the Transform step, because we need to
relate data from the three databases. Do you think this is a good option?
Is there any tutorial/article about it?

We are also thinking of using Hive to extract the files, load them into Hadoop,
and use Hive to query the data. At this step we are going to eliminate
blank spaces and duplicate data, and transform a name register into an ID.

What is your experience with this?

Thanks a lot for any contribution!

Felipe

-- 



Felipe Oliveira Gutierrez -- felipe.o.gutier...@gmail.com
https://sites.google.com/site/lipe82/Home/diaadia


Re: hdfs fsck -locations

2014-01-24 Thread Harsh J
The right syntax is to use -files -blocks -locations, so it drills
down all the way. You are not missing a feature - this has existed
for as long as I've known HDFS.

In Rodrigo's output, he's seeing a BlockPool ID, which is not
equivalent to a location, but just carries an IP in it for
identification purposes.
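
As a usage sketch of the drill-down syntax above (the path is the one from
Mark's earlier fsck run):

  hdfs fsck /user/mark/data/word_count.csv -files -blocks -locations

This lists the file, each of its blocks, and the datanode IP for every replica.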

On Sat, Jan 25, 2014 at 3:53 AM, Mark Kerzner mark.kerz...@shmsoft.com wrote:
 Yes, Rodrigo,

 that's what I was looking for. So in my install I somehow don't have it at
 all. Was asked by my students, so I got the answer.

 Mark


 On Fri, Jan 24, 2014 at 4:00 PM, Nascimento, Rodrigo
 rodrigo.nascime...@netapp.com wrote:

 Mark,

 there we go ;-)

 Rodrigo Nascimento
 Systems Engineer @ Brazil
 Mobile +55 11 991.873.810

 Sent from my iPhone

 Begin forwarded message:

 From: Nascimento, Rodrigo rodrigo.nascime...@netapp.com
 Date: 24 de janeiro de 2014 15:59:33 BRST
 To: user@hadoop.apache.org user@hadoop.apache.org
 Subject: RE: hdfs fsck -locations

 Hi Mark,

 It is a sample from my sandbox. Your question is about the part that is in
 RED at the output below, right?

 [root@sandbox ~]# hdfs fsck /user/ambari-qa/passwd  -locations
 Connecting to namenode via http://sandbox.hortonworks.com:50070
 FSCK started by root (auth:SIMPLE) from /172.16.13.30 for path
 /user/ambari-qa/passwd at Fri Jan 24 09:53:43 PST 2014
 .
 /user/ambari-qa/passwd:  Under replicated
 BP-1578958328-10.0.2.15-1382306880516:blk_1073742464_1640. Target Replicas
 is 3 but found 1 replica(s).
 Status: HEALTHY
  Total size:    1708 B
  Total dirs:    0
  Total files:   1
  Total symlinks:    0
  Total blocks (validated):    1 (avg. block size 1708 B)
  Minimally replicated blocks:    1 (100.0 %)
  Over-replicated blocks:    0 (0.0 %)
  Under-replicated blocks:    1 (100.0 %)
  Mis-replicated blocks:    0 (0.0 %)
  Default replication factor:    3
  Average block replication:    1.0
  Corrupt blocks:    0
  Missing replicas:    2 (66.64 %)
  Number of data-nodes:    1
  Number of racks:    1
 FSCK ended at Fri Jan 24 09:53:43 PST 2014 in 1 milliseconds


 The filesystem under path '/user/ambari-qa/passwd' is HEALTHY
 [root@sandbox ~]#

 Rod Nascimento

 
 From: Nascimento, Rodrigo [rodrigo.nascime...@netapp.com]
 Sent: Friday, January 24, 2014 3:34 PM
 To: user@hadoop.apache.org
 Subject: RE: hdfs fsck -locations

 I’m not seeing the locations flag yet.



 Rod Nascimento

 Systems Engineer @ Brazil



 People don’t buy WHAT you do. They buy WHY you do it.



 From: Mark Kerzner [mailto:mark.kerz...@shmsoft.com]
 Sent: Friday, January 24, 2014 3:16 PM
 To: Hadoop User
 Subject: Re: hdfs fsck -locations



 Sorry, did not copy the full command



 hdfs fsck /user/mark/data/word_count.csv -locations

 Connecting to namenode via http://mark-7:50070

 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 11:15:17 CST 2014

 .Status: HEALTHY

  Total size:   7217 B

  Total dirs:   0

  Total files:  1

  Total blocks (validated):1 (avg. block size 7217 B)

  Minimally replicated blocks:  1 (100.0 %)

  Over-replicated blocks:  0 (0.0 %)

  Under-replicated blocks:0 (0.0 %)

  Mis-replicated blocks:0 (0.0 %)

  Default replication factor:  1

  Average block replication: 1.0

  Corrupt blocks:  0

  Missing replicas: 0 (0.0 %)

  Number of data-nodes:  1

  Number of racks:   1

 FSCK ended at Fri Jan 24 11:15:17 CST 2014 in 1 milliseconds





 The filesystem under path '/user/mark/data/word_count.csv' is HEALTHY





 On Fri, Jan 24, 2014 at 11:08 AM, Harsh J ha...@cloudera.com wrote:

 Sorry, but what was the question? I also do not see a locations option
 flag.

 On Jan 24, 2014 7:17 PM, Mark Kerzner mark.kerz...@shmsoft.com wrote:

 Here is an example



  hdfs fsck /user/mark/data/word_count.csv

 Connecting to namenode via http://mark-7:50070

 FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path
 /user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014

 .Status: HEALTHY

  Total size: 7217 B

  Total dirs: 0

  Total files: 1

  Total blocks (validated): 1 (avg. block size 7217 B)

  Minimally replicated blocks: 1 (100.0 %)

  Over-replicated blocks: 0 (0.0 %)

  Under-replicated blocks: 0 (0.0 %)

  Mis-replicated blocks: 0 (0.0 %)

  Default replication factor: 1

  Average block replication: 1.0

  Corrupt blocks: 0

  Missing replicas: 0 (0.0 %)

  Number of data-nodes: 1

  Number of racks: 1

 FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds





 On Fri, Jan 24, 2014 at 4:34 AM, Harsh J ha...@cloudera.com wrote:

 Hi Mark,

 Yes, the locations are shown as IP.


 On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner mark.kerz...@shmsoft.com
 wrote:
  Hi,
 
  hdfs fsck -locations
 
  is supposed to show every block with its location? Is 

Spoofing Ganglia Metrics

2014-01-24 Thread Calvin Jia
Is there a way to configure hdfs/hbase/mapreduce to spoof the ganglia
metrics being sent? This is because the machines are behind a NAT and the
monitoring box is outside, so all the metrics are recognized as coming from
the same machine.

Thanks!


Re: Fw: Hadoop 2 Namenode HA not working properly

2014-01-24 Thread Juan Carlos
Hi Bruno,
ha.zookeeper.quorum is a core-site property, but you have it in
hdfs-site; maybe that's your problem.
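
A minimal core-site.xml sketch of what Juan is describing, assuming a
three-node ZooKeeper ensemble (the hostnames below are placeholders, not from
this thread):

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>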


2014/1/24 Bruno Andrade b...@eurotux.com



 Begin forwarded message:

 Date: Tue, 21 Jan 2014 09:35:23 +
 From: Bruno Andrade b...@eurotux.com
 To: user@hadoop.apache.org
 Subject: Re: Hadoop 2 Namenode HA not working properly


 Hey,

 this is my hdfs-site.xml - http://pastebin.com/qpELkwH8
 this is my core-site.xml:

 <configuration>
  <property>
  <name>fs.defaultFS</name>
  <value>hdfs://blabla-hadoop</value>
  </property>

  <property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/hadoop/tmp</value>
  </property>
 </configuration>

 I killed only the namenode process and nothing happened; then I killed the zkfc
 process and the transition happened: the second namenode became active.

 Thanks.



 On 01/20/2014 06:44 PM, Jing Zhao wrote:
  Hi Bruno,
 
   Could you post your configuration? Also, when you killed one of
   the NN, did you kill only the NN process or shut down the whole
  machine?
 
  Thanks,
  -Jing
 
  On Mon, Jan 20, 2014 at 4:11 AM, Bruno Andrade b...@eurotux.com
  wrote:
  Hey,
 
  I have configured a Hadoop v2.2.0 cluster with QJM and Zookeeper for
  HA and automatic failover.
  But I'm having a problem. If I test the automatic failover, by
  killing one of the namenodes, nothing happens. But if I kill the
  zkfc of that namenode, then zookeeper elects the other namenode as
  active.
 
  What can it be the problem?
 
  Thanks.
 
  --
  Bruno Andrade b...@eurotux.com
  Programador (ID)
  Eurotux Informática, S.A. | www.eurotux.com
  (t) +351 253 680 300 (m) +351 936 293 858

 --
 Bruno Andrade b...@eurotux.com
 Programador (ID)
 Eurotux Informática, S.A. | www.eurotux.com
 (t) +351 253 680 300 (m) +351 936 293 858



 --
 Bruno Andrade b...@eurotux.com
 Programador (ID)
 Eurotux Informática, S.A. | www.eurotux.com
 (t) +351 253 680 300 (m) +351 936 293 858



Re: Datanode Shutting down automatically

2014-01-24 Thread Harsh J
You reformatted your NameNode at some point, but likely failed to also
clear out the DN data directories, which would not auto-wipe
themselves.

Clear the contents of /app/hadoop/tmp/dfs/data at the DN and it should
start up the next time you invoke it.
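
A minimal sketch of that cleanup on the DataNode host, assuming the
dfs.data.dir from the log above (/app/hadoop/tmp/dfs/data) and that the data
on this DN is disposable:

  rm -rf /app/hadoop/tmp/dfs/data/*

With the directory empty, the DN picks up the new namespaceID from the
reformatted NameNode when it registers.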

P.s. Please do not email gene...@hadoop.apache.org with any user
questions. It exists for project level discussions and announcements.

On Sat, Jan 25, 2014 at 11:15 AM, Pranav Gadekar ppgadekar...@gmail.com wrote:
 This is my log file.

 014-01-24 17:24:58,238 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
 STARTUP_MSG:
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = user/127.0.1.1
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 1.2.1
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r
 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
 STARTUP_MSG:   java = 1.6.0_27
 /
 2014-01-24 17:24:58,622 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
 loaded properties from hadoop-metrics2.properties
 2014-01-24 17:24:58,669 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
 MetricsSystem,sub=Stats registered.
 2014-01-24 17:24:58,670 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period
 at 10 second(s).
 2014-01-24 17:24:58,670 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system
 started
 2014-01-24 17:24:58,877 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
 registered.
 2014-01-24 17:24:58,880 WARN
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
 exists!
 2014-01-24 17:25:10,778 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
 Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID
 = 102782159; datanode namespaceID = 1227483104
 at
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
 at
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:321)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)

 2014-01-24 17:25:10,779 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down DataNode at user/127.0.1.1
 /
 2014-01-24 17:26:13,413 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = user/127.0.1.1
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 1.2.1
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r
 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
 STARTUP_MSG:   java = 1.6.0_27
 /
 2014-01-24 17:26:13,510 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
 loaded properties from hadoop-metrics2.properties
 2014-01-24 17:26:13,518 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
 MetricsSystem,sub=Stats registered.
 2014-01-24 17:26:13,518 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period
 at 10 second(s).
 2014-01-24 17:26:13,518 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system
 started
 2014-01-24 17:26:13,626 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
 registered.
 2014-01-24 17:26:13,628 WARN
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
 exists!
 2014-01-24 17:26:28,860 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
 Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID
 = 102782159; datanode namespaceID = 1227483104
 at
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
 at
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:321)
 at
 

Re: Datanode Shutting down automatically

2014-01-24 Thread Shekhar Sharma
This is the incompatible namespaceID error. It happens because you formatted the
namenode, but the datanode's folder still has the old namespace ID.

What are the values of the following properties?
dfs.data.dir
dfs.name.dir
hadoop.tmp.dir

The values of these properties are directories on the local file system.

The solution is to open the VERSION file of the namenode (under the
dfs/name/current folder); you will see the namespaceID near the top. Copy that
value, then open the VERSION file of the datanode which is not coming up
(under the dfs/data/current folder) and put the namenode's namespaceID there.
Now start the datanode process.

A dirty hack is to delete the folders (the directories specified by the above
properties) from all the machines, then format your namenode and start your
processes. Please note that the previous data will be lost.
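
A minimal sketch of that edit, assuming the hadoop.tmp.dir layout shown in the
log above (/app/hadoop/tmp); paths on your cluster may differ:

  # On the NameNode: note the namespaceID line
  cat /app/hadoop/tmp/dfs/name/current/VERSION

  # On the failing DataNode: set namespaceID to the NameNode's value,
  # then restart the DataNode
  vi /app/hadoop/tmp/dfs/data/current/VERSION
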
On 25 Jan 2014 11:55, Pranav Gadekar ppgadekar...@gmail.com wrote:

 This is my log file.

 014-01-24 17:24:58,238 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = user/127.0.1.1
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 1.2.1
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r
 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
 STARTUP_MSG:   java = 1.6.0_27
 /
 2014-01-24 17:24:58,622 INFO
 org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
 hadoop-metrics2.properties
 2014-01-24 17:24:58,669 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
 MetricsSystem,sub=Stats registered.
 2014-01-24 17:24:58,670 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
 period at 10 second(s).
 2014-01-24 17:24:58,670 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system
 started
 2014-01-24 17:24:58,877 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
 registered.
 2014-01-24 17:24:58,880 WARN
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
 exists!
 2014-01-24 17:25:10,778 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
 Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID
 = 102782159; datanode namespaceID = 1227483104
 at
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
 at
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:321)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
 at
 org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)

 2014-01-24 17:25:10,779 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down DataNode at user/127.0.1.1
 /
 2014-01-24 17:26:13,413 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = user/127.0.1.1
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 1.2.1
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r
 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
 STARTUP_MSG:   java = 1.6.0_27
 /
 2014-01-24 17:26:13,510 INFO
 org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
 hadoop-metrics2.properties
 2014-01-24 17:26:13,518 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
 MetricsSystem,sub=Stats registered.
 2014-01-24 17:26:13,518 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
 period at 10 second(s).
 2014-01-24 17:26:13,518 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system
 started
 2014-01-24 17:26:13,626 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
 registered.
 2014-01-24 17:26:13,628 WARN
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
 exists!
 2014-01-24 17:26:28,860 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
 Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID
 = 102782159; datanode namespaceID = 1227483104
 at