Re: HDFS upgrade problem of fsImage
Thanks Joshi, maybe I pasted the wrong log messages. Please look here for the real story: https://issues.apache.org/jira/browse/HDFS-5550

On Fri, Nov 22, 2013 at 6:25 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Yes, realized that, and I see your point :-) However it seems some fs inconsistency is present; did you attempt rollback/finalizeUpgrade and check? For that error, FSImage.java has found a previous fs state:

    // Upgrade is allowed only if there are
    // no previous fs states in any of the directories
    for (Iterator<StorageDirectory> it = storage.dirIterator(); it.hasNext();) {
      StorageDirectory sd = it.next();
      if (sd.getPreviousDir().exists())
        throw new InconsistentFSStateException(sd.getRoot(),
            "previous fs state should not exist during upgrade. "
            + "Finalize or rollback first.");
    }

Thanks, Rekha

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday 21 November 2013 5:19 PM
To: user@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org
Subject: Re: HDFS upgrade problem of fsImage

I insist on a hot upgrade on the test cluster because I want a hot upgrade on the prod cluster.

On 2013-11-21 7:23 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Hi Azuryy, this error occurs when FSImage finds a previous fs state, and as the log states you would need to either finalizeUpgrade or rollback to proceed. Below -

    bin/hadoop dfsadmin -finalizeUpgrade
    hadoop dfsadmin -rollback

On a side note, for a small test cluster on which one might suspect you are the only user, why would you insist on a hot upgrade? :-)

Thanks, Rekha

Some helpful guidelines for an upgrade are here -
http://wiki.apache.org/hadoop/Hadoop_Upgrade
https://twiki.grid.iu.edu/bin/view/Storage/HadoopUpgrade
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html#Upgrading_from_older_release_to_0.23_and_configuring_federation

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday 21 November 2013 9:48 AM
To: hdfs-...@hadoop.apache.org, user@hadoop.apache.org
Subject: HDFS upgrade problem of fsImage

Hi Dear, I have a small test cluster with hadoop-2.0.x and HA configured, but I want to upgrade to hadoop-2.2. I don't want to stop the cluster during the upgrade, so my steps are:
1) on the standby NN: hadoop-daemon.sh stop namenode
2) remove the HA configuration from the conf
3) hadoop-daemon.sh start namenode -upgrade -clusterID test-cluster
but I get an exception in the NN log, so how can I upgrade without stopping the whole cluster? Thanks.

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hdfs/name is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:323)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:248)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:858)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:620)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:692)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:677)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1345)
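For readers hitting the same InconsistentFSStateException, a minimal shell sketch of the choice the check above forces (the directory path is taken from the error in this thread and is otherwise illustrative; run this against the old version before retrying -upgrade):

    # A 'previous' checkpoint left over from an earlier -upgrade blocks a new one.
    ls /hdfs/name/previous                 # if this directory exists, -upgrade refuses to run

    # Option 1: keep the current state and discard the old one
    hadoop dfsadmin -finalizeUpgrade

    # Option 2: go back to the pre-upgrade state instead
    hadoop-daemon.sh start namenode -rollback

Which of the two is appropriate depends on whether the earlier upgrade should be kept; finalizing is irreversible.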
Re: Missing records from HDFS
I do think this is because of your RecordReader. Can you paste your code here and give a small sample of the data? Please use pastebin if you want.

On Fri, Nov 22, 2013 at 7:16 PM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote:

One more thing: if we split the files, then all the records are processed. The files are 70.5 MB. Thanks, Zoraida.

From: zoraida zora...@tid.es
Date: Friday, 22 November 2013 08:59
To: user@hadoop.apache.org
Subject: Re: Missing records from HDFS

Thanks for your response, Azuryy. My Hadoop version: 2.0.0-cdh4.3.0. InputFormat: a custom class that extends FileInputFormat (a CSV input format). These files are under the same directory, as separate files. My input path is configured through Oozie via the property mapred.input.dir. The same code and input running on Hadoop 2.0.0-cdh4.2.1 works fine and does not discard any record. Thanks.

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday, 21 November 2013 07:31
To: user@hadoop.apache.org
Subject: Re: Missing records from HDFS

What's your Hadoop version, and which InputFormat are you using? Are these files under one directory, or are there lots of subdirectories? How did you configure the input path in your main?

On Thu, Nov 21, 2013 at 12:25 AM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote:

Hi all, my job is not reading all the input records. In the input directory I have a set of files containing a total of 600 records, but only 5997000 are processed. The Map Input Records counter says 5997000. I have tried downloading the files with a getmerge to check how many records would be returned, and the correct number is returned (600). Do you have any suggestions? Thanks.
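Since splitting the files by hand makes the missing records reappear, the usual suspect is how the custom reader handles records that straddle an input-split boundary. A hypothetical sketch, with class names made up for illustration (this is not the poster's code), of how to take splitting out of the equation while debugging:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

    // If the whole file is read by a single mapper, no split-boundary handling is
    // needed, so any record loss that remains has a different cause.
    public class CsvDebugInputFormat extends FileInputFormat<LongWritable, Text> {
      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        return false;
      }

      @Override
      public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
          TaskAttemptContext context) {
        return new LineRecordReader(); // stand-in for the custom CSV reader
      }
    }

If the counts come out right with splitting disabled, the fix belongs in the reader's handling of its split start and end rather than in the data.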
Re: Missing records from HDFS
    );
        } else {
          return new LineReader(fileIn, job, this.recordDelimiterBytes);
        }
      }

      private int readData() throws IOException {
        if (data == null) {
          data = new Text();
        }
        int newSize = 0;
        while (pos < end) {
          newSize = in.readLine(data, maxLineLength,
              Math.max((int) Math.min(Integer.MAX_VALUE, end - pos), maxLineLength));
          if (newSize == 0) {
            break;
          }
          pos += newSize;
          if (newSize < maxLineLength) {
            break;
          }
          // line too long. try again
          LOG.info("Skipped line of size " + newSize + " at pos " + (pos - newSize));
        }
        return newSize;
      }

      @Override
      public FileValidatorDescriptor getCurrentKey() {
        return key;
      }

      @Override
      public Text getCurrentValue() {
        return value;
      }

      @Override
      public float getProgress() {
        if (start == end) {
          return 0.0f;
        } else {
          return Math.min(1.0f, (pos - start) / (float) (end - start));
        }
      }

      @Override
      public synchronized void close() throws IOException {
        if (in != null) {
          in.close();
        }
      }
    }

Thanks.

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Friday, 22 November 2013 12:19
To: user@hadoop.apache.org
Subject: Re: Missing records from HDFS

I do think this is because of your RecordReader. Can you paste your code here and give a small sample of the data? Please use pastebin if you want.

On Fri, Nov 22, 2013 at 7:16 PM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote:

One more thing: if we split the files, then all the records are processed. The files are 70.5 MB. Thanks, Zoraida.

From: zoraida zora...@tid.es
Date: Friday, 22 November 2013 08:59
To: user@hadoop.apache.org
Subject: Re: Missing records from HDFS

Thanks for your response, Azuryy. My Hadoop version: 2.0.0-cdh4.3.0. InputFormat: a custom class that extends FileInputFormat (a CSV input format). These files are under the same directory, as separate files. My input path is configured through Oozie via the property mapred.input.dir. The same code and input running on Hadoop 2.0.0-cdh4.2.1 works fine and does not discard any record. Thanks.

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday, 21 November 2013 07:31
To: user@hadoop.apache.org
Subject: Re: Missing records from HDFS

What's your Hadoop version, and which InputFormat are you using? Are these files under one directory, or are there lots of subdirectories? How did you configure the input path in your main?

On Thu, Nov 21, 2013 at 12:25 AM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote:

Hi all, my job is not reading all the input records. In the input directory I have a set of files containing a total of 600 records, but only 5997000 are processed. The Map Input Records counter says 5997000. I have tried downloading the files with a getmerge to check how many records would be returned, and the correct number is returned (600). Do you have any suggestions? Thanks.
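For context, the part of the reader that most often explains records vanishing only when files are split is the initialization convention used by Hadoop's own LineRecordReader: every reader skips its first partial line unless it starts at offset 0, and keeps reading past its nominal end until it completes the line it started. A simplified sketch (field names follow the snippet above; this is not the poster's actual initialize method):

    public void initialize(InputSplit genericSplit, TaskAttemptContext context) throws IOException {
      FileSplit split = (FileSplit) genericSplit;
      start = split.getStart();
      end = start + split.getLength();
      // ... open the file and seek to 'start' ...
      if (start != 0) {
        // Not the first split: throw away the first (usually partial) line.
        // The reader of the previous split owns it, because each reader keeps
        // reading until it finishes the line that crosses its own 'end'.
        start += in.readLine(new Text(), 0,
            (int) Math.min(Integer.MAX_VALUE, end - start));
      }
      pos = start;
    }

A reader that skips the first line unconditionally, or stops exactly at end without finishing the straddling record, loses records exactly at split boundaries, which matches a count that is only slightly short.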
Re: HDFS upgrade problem of fsImage
Thanks Joshi, I haven't done an upgrade before; the test cluster is a new cluster installed with hadoop-2.0.3, so I shouldn't need to run 'bin/hadoop dfsadmin -finalizeUpgrade'.

On Thu, Nov 21, 2013 at 7:22 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:

Hi Azuryy, this error occurs when FSImage finds a previous fs state, and as the log states you would need to either finalizeUpgrade or rollback to proceed. Below -

    bin/hadoop dfsadmin -finalizeUpgrade
    hadoop dfsadmin -rollback

On a side note, for a small test cluster on which one might suspect you are the only user, why would you insist on a hot upgrade? :-)

Thanks, Rekha

Some helpful guidelines for an upgrade are here -
http://wiki.apache.org/hadoop/Hadoop_Upgrade
https://twiki.grid.iu.edu/bin/view/Storage/HadoopUpgrade
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html#Upgrading_from_older_release_to_0.23_and_configuring_federation

From: Azuryy Yu azury...@gmail.com
Reply-To: user@hadoop.apache.org
Date: Thursday 21 November 2013 9:48 AM
To: hdfs-...@hadoop.apache.org, user@hadoop.apache.org
Subject: HDFS upgrade problem of fsImage

Hi Dear, I have a small test cluster with hadoop-2.0.x and HA configured, but I want to upgrade to hadoop-2.2. I don't want to stop the cluster during the upgrade, so my steps are:
1) on the standby NN: hadoop-daemon.sh stop namenode
2) remove the HA configuration from the conf
3) hadoop-daemon.sh start namenode -upgrade -clusterID test-cluster
but I get an exception in the NN log, so how can I upgrade without stopping the whole cluster? Thanks.

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hdfs/name is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:323)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:248)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:858)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:620)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:692)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:677)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1345)
Re: HDFS upgrade problem of fsImage
I insist hot upgrade on the test cluster because I want hot upgrade on the prod cluster. On 2013-11-21 7:23 PM, Joshi, Rekha rekha_jo...@intuit.com wrote: Hi Azurry, This error occurs when FSImage finds previous fs state, and as log states you would need to either finalizeUpgrade or rollback to proceed.Below - bin/hadoop dfsadmin –finalizeUpgrade hadoop dfsadmin –rollback On side note for a small test cluster on which one might suspect you are the only user, why wouldn't you insist on hot upgrade? :-) Thanks Rekha Some helpful guidelines for upgrade here - http://wiki.apache.org/hadoop/Hadoop_Upgrade https://twiki.grid.iu.edu/bin/view/Storage/HadoopUpgrade http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/Federation.html#Upgrading_from_older_release_to_0.23_and_configuring_federation From: Azuryy Yu azury...@gmail.com Reply-To: user@hadoop.apache.org user@hadoop.apache.org Date: Thursday 21 November 2013 9:48 AM To: hdfs-...@hadoop.apache.org hdfs-...@hadoop.apache.org, user@hadoop.apache.org user@hadoop.apache.org Subject: HDFS upgrade problem of fsImage Hi Dear, I have a small test cluster with hadoop-2.0x, and HA configuraded, but I want to upgrade to hadoop-2.2. I dont't want to stop cluster during upgrade, so my steps are: 1) on standby NN: hadoop-dameon.sh stop namenode 2) remove HA configuration in the conf 3) hadoop-daemon.sh start namenode -upgrade -clusterID test-cluster but Exception in the NN log, so how to upgrade and don't stop the whole cluster. Thanks. org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hdfs/name is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first. at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:323) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:248) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:858) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:620) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:692) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:677) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1345)
Re: Limit on total jobs running using fair scheduler
For MRv1, it's not possible.

On 2013-11-22 5:46 AM, Ivan Tretyakov itretya...@griddynamics.com wrote:

Thank you for your replies! We are using MR version 1 and my question is regarding this version. Omkar, are you talking about MR1 or MR2? I didn't find a property to limit the number of running jobs per queue for the capacity scheduler using MR1. Did I go wrong somewhere? Which option exactly do you mean? Sandy, thanks, I got it. But unfortunately we are using MR1 for now.

On Wed, Nov 20, 2013 at 2:12 AM, Sandy Ryza sandy.r...@cloudera.com wrote:

Unfortunately, this is not possible in the MR1 fair scheduler without setting the limits for individual pools. In MR2, fair scheduler hierarchical queues will allow setting maxRunningApps at the top of the hierarchy, which would have the effect you're looking for. -Sandy

On Tue, Nov 19, 2013 at 2:01 PM, Omkar Joshi ojo...@hortonworks.com wrote:

Not sure about the fair scheduler, but in the capacity scheduler you can achieve this by controlling the number of jobs/applications per queue. Thanks, Omkar Joshi, Hortonworks Inc. http://www.hortonworks.com

On Tue, Nov 19, 2013 at 3:26 AM, Ivan Tretyakov itretya...@griddynamics.com wrote:

Hello! We are using CDH 4.1.1 (Version: 2.0.0-mr1-cdh4.1.1) and the fair scheduler. We need to limit the total number of jobs which can run at the same time on the cluster. I can see the maxRunningJobs option, but it sets a limit per pool or per user. We wouldn't like to limit each pool or user; we just need to set a limit on the total number of jobs running. Is it possible to do this with the fair scheduler? Can the capacity scheduler help here? Maybe there are other options to achieve the goal? Thanks in advance! -- Best Regards, Ivan Tretyakov

-- Best Regards, Ivan Tretyakov, Deployment Engineer, Grid Dynamics, +7 812 640 38 76, Skype: ivan.v.tretyakov, www.griddynamics.com, itretya...@griddynamics.com
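A hedged illustration of the MR2 fair-scheduler approach Sandy describes (queue names and the cap value are made up): putting maxRunningApps on the root of the queue hierarchy caps running applications cluster-wide without limiting any individual pool.

    <?xml version="1.0"?>
    <!-- fair-scheduler allocations file (illustrative) -->
    <allocations>
      <queue name="root">
        <maxRunningApps>20</maxRunningApps>  <!-- cluster-wide cap on concurrently running apps -->
        <queue name="etl"/>
        <queue name="adhoc"/>
      </queue>
    </allocations>

For the MR1 fair scheduler, as stated above, the equivalent top-level knob does not exist; maxRunningJobs only applies per pool or per user.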
HDFS upgrade problem of fsImage
Hi Dear, I have a small test cluster with hadoop-2.0.x and HA configured, but I want to upgrade to hadoop-2.2. I don't want to stop the cluster during the upgrade, so my steps are:
1) on the standby NN: hadoop-daemon.sh stop namenode
2) remove the HA configuration from the conf
3) hadoop-daemon.sh start namenode -upgrade -clusterID test-cluster
but I get an exception in the NN log, so how can I upgrade without stopping the whole cluster? Thanks.

org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /hdfs/name is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:323)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:248)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:858)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:620)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:445)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:494)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:692)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:677)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1345)
Re: Missing records from HDFS
What's your Hadoop version, and which InputFormat are you using? Are these files under one directory, or are there lots of subdirectories? How did you configure the input path in your main?

On Thu, Nov 21, 2013 at 12:25 AM, ZORAIDA HIDALGO SANCHEZ zora...@tid.es wrote:

Hi all, my job is not reading all the input records. In the input directory I have a set of files containing a total of 600 records, but only 5997000 are processed. The Map Input Records counter says 5997000. I have tried downloading the files with a getmerge to check how many records would be returned, and the correct number is returned (600). Do you have any suggestions? Thanks.
Re: help with hadoop 1.2.1 aggregate framework (non-streaming)
Hi Nicholas, this is not Hadoop-related. edu.harvard.seas.scifs.ScifsStandard is your customized class, so you need to include this class in your ScifsStandard.jar.

On Thu, Nov 21, 2013 at 4:15 AM, Nicholas Murphy halcyo...@gmail.com wrote:

I'm trying to use the aggregate framework, but in a non-streaming fashion. I detailed what I'm doing pretty well here: http://stackoverflow.com/questions/20085532/trying-to-make-hadoop-1-2-1-aggregate-framework-work-non-streaming I have a feeling there's a simple solution (involving setting the appropriate classpath somewhere) but I don't know what it is. Any help appreciated. There's remarkably little information available on non-streaming use of aggregates that I can find. Thanks, Nick
Re: help with hadoop 1.2.1 aggregate framework (non-streaming)
Where is your ScifsStandard.jar? Is it located in the same directory you run the command from? If not, please specify the full path on the command line, for example: hadoop jar ../test/test.jar X

On Thu, Nov 21, 2013 at 2:42 PM, Nicholas Murphy halcyo...@gmail.com wrote:

Thanks, but as per the StackOverflow post, it is there (see the second code block where I list the contents of ScifsStandard.jar). Nick

On Nov 21, 2013, at 1:37 AM, Azuryy Yu azury...@gmail.com wrote:

Hi Nicholas, this is not Hadoop-related. edu.harvard.seas.scifs.ScifsStandard is your customized class, so you need to include this class in your ScifsStandard.jar.

On Thu, Nov 21, 2013 at 4:15 AM, Nicholas Murphy halcyo...@gmail.com wrote:

I'm trying to use the aggregate framework, but in a non-streaming fashion. I detailed what I'm doing pretty well here: http://stackoverflow.com/questions/20085532/trying-to-make-hadoop-1-2-1-aggregate-framework-work-non-streaming I have a feeling there's a simple solution (involving setting the appropriate classpath somewhere) but I don't know what it is. Any help appreciated. There's remarkably little information available on non-streaming use of aggregates that I can find. Thanks, Nick
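A hypothetical invocation along the lines being suggested (the input/output arguments and the extra-dependency jar are made up for illustration; the driver class and jar name are the ones from the thread). Giving the full jar path removes any doubt about which jar is picked up, and -libjars ships additional dependencies onto the task classpath, which is a common fix when a class is only missing at task time:

    # run from anywhere by giving the jar's absolute path
    hadoop jar /home/nick/ScifsStandard.jar edu.harvard.seas.scifs.ScifsStandard \
        -libjars /path/to/extra-deps.jar \
        input_dir output_dir

Note that -libjars is only honored if the driver parses generic options (for example via ToolRunner), so it may not apply as-is to this particular driver.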
hadoop maven repo
hi, please recommend a good Maven repo for compiling the Hadoop source code. It complains that it cannot find jdbm:bundle:2.0.0:m15 while compiling trunk. Thanks.
Re: hadoop maven repo
Ted, I am on Linux. On 2013-11-19 1:30 AM, Ted Yu yuzhih...@gmail.com wrote: Which platform did you perform the build on ? I was able to build trunk on Mac. I found the following dependency in dependency tree output: [INFO] +- org.apache.directory.server:apacheds-jdbm-partition:jar:2.0.0-M15:compile [INFO] | \- org.apache.directory.jdbm:apacheds-jdbm1:bundle:2.0.0-M2:compile On Mon, Nov 18, 2013 at 3:27 AM, Azuryy Yu azury...@gmail.com wrote: hi, please recommend a good maven repo to compile hadoop source code. It complain cannot find jdbm:bundle:2.0.0:m15 during compile trunk. thanks.
Re: hadoop maven repo
Thanks Ted. I missed new pom.xml, fixed it now. On 2013-11-19 3:09 AM, Ted Yu yuzhih...@gmail.com wrote: Compilation on Linux passed for me: [hortonzy@kiyo hadoop]$ uname -a Linux core.net 2.6.32-220.23.1.el6.20120713.x86_64 #1 SMP Fri Jul 13 11:40:51 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux [hortonzy@kiyo hadoop]$ mvn -version Apache Maven 3.0.3 (r1075438; 2011-02-28 17:31:09+) Cheers On Mon, Nov 18, 2013 at 10:51 AM, Azuryy Yu azury...@gmail.com wrote: Ted, I am on Linux. On 2013-11-19 1:30 AM, Ted Yu yuzhih...@gmail.com wrote: Which platform did you perform the build on ? I was able to build trunk on Mac. I found the following dependency in dependency tree output: [INFO] +- org.apache.directory.server:apacheds-jdbm-partition:jar:2.0.0-M15:compile [INFO] | \- org.apache.directory.jdbm:apacheds-jdbm1:bundle:2.0.0-M2:compile On Mon, Nov 18, 2013 at 3:27 AM, Azuryy Yu azury...@gmail.com wrote: hi, please recommend a good maven repo to compile hadoop source code. It complain cannot find jdbm:bundle:2.0.0:m15 during compile trunk. thanks.
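For anyone reproducing this, a minimal sketch of how the missing-artifact question above is usually narrowed down (the flags are standard Maven ones, not taken verbatim from the thread):

    # show the full dependency tree and locate the artifact that failed to resolve
    mvn dependency:tree | grep -B2 -i jdbm

    # rebuild trunk after refreshing/fixing the pom, skipping tests for speed
    mvn clean install -DskipTests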
RE: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory
dfs.ha.namenodes.mycluster = nn.domain,snn.domain

it should be:

dfs.ha.namenodes.mycluster = nn1,nn2

On Aug 27, 2013 11:22 PM, Smith, Joshua D. joshua.sm...@gd-ais.com wrote:

Harsh-

Here are all of the other values that I have configured.

hdfs-site.xml
-------------
dfs.webhdfs.enabled = true
dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.automatic-falover.enabled = true
ha.zookeeper.quorum = nn.domain:2181,snn.domain:2181,jt.domain:2181
dfs.journalnode.edits.dir = /opt/hdfs/data1/dfs/jn
dfs.namenode.shared.edits.dir = qjournal://nn.domain:8485;snn.domain:8485;jt.domain:8485/mycluster
dfs.nameservices = mycluster
dfs.ha.namenodes.mycluster = nn.domain,snn.domain
dfs.namenode.rpc-address.mycluster.nn1 = nn.domain:8020
dfs.namenode.rpc-address.mycluster.nn2 = snn.domain:8020
dfs.namenode.http-address.mycluster.nn1 = nn.domain:50070
dfs.namenode.http-address.mycluster.nn2 = snn.domain:50070
dfs.name.dir = /var/lib/hadoop-hdfs/cache/hdfs/dfs/name

core-site.xml
-------------
fs.trash.interval = 1440
fs.trash.checkpoint.interval = 1440
fs.defaultFS = hdfs://mycluster
dfs.datanode.data.dir = /hdfs/data1,/hdfs/data2,/hdfs/data3,/hdfs/data4,/hdfs/data5,/hdfs/data6,/hdfs/data7

mapred-site.xml
---------------
mapreduce.framework.name = yarn
mapreduce.jobhistory.address = jt.domain:10020
mapreduce.jobhistory.webapp.address = jt.domain:19888

yarn-site.xml
-------------
yarn.nodemanager.aux-service = mapreduce.shuffle
yarn.nodemanager.aux-services.mapreduce.shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
yarn.log-aggregation-enable = true
yarn.nodemanager.remote-app-log-dir = /var/log/hadoop-yarn/apps
yarn.application.classpath = $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$YARN_HOME/*,$YARN_HOME/lib/*
yarn.resourcemanager.resource-tracker.address = jt.domain:8031
yarn.resourcemanager.address = jt.domain:8032
yarn.resourcemanager.scheduler.address = jt.domain:8030
yarn.resourcemanager.admin.address = jt.domain:8033
yarn.reesourcemanager.webapp.address = jt.domain:8088

These are the only interesting entries in my HDFS log file when I try to start the NameNode with service hadoop-hdfs-namenode start.

WARN org.apache.hadoop.hdfs.server.common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: false
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Configured NNs: ((there's a blank line here implying no configured NameNodes!))
ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. Java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join Java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.

I don't like the blank line for Configured NNs. Not sure why it's not finding them. If I try the command hdfs zkfc -formatZK I get the following: Exception in thread main org.apache.hadoop.HadoopIllegalArgumentException: HA is not enabled for this namenode.

-Original Message- From: Smith, Joshua D.
[mailto:joshua.sm...@gd-ais.com] Sent: Tuesday, August 27, 2013 7:17 AM To: user@hadoop.apache.org Subject: RE: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory Harsh- Yes, I intend to use HA. That's what I'm trying to configure right now. Unfortunately I cannot share my complete configuration files. They're on a disconnected network. Are there any configuration items that you'd like me to post my settings for? The deployment is CDH 4.3 on a brand new cluster. There are 3 master nodes (NameNode, StandbyNameNode, JobTracker/ResourceManager) and 7 slave nodes. Each of the master nodes is configured to be a Zookeeper node as well as a Journal node. The HA configuration that I'm striving toward is the automatic fail-over with Zookeeper. Does that help? Josh -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Monday, August 26, 2013 6:11 PM To: user@hadoop.apache.org Subject: Re: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory It is not quite from your post, so a Q: Do you intend to use HA or not? Can you share your complete core-site.xml and hdfs-site.xml along with a brief note on the deployment? On Tue, Aug 27, 2013 at 12:48 AM, Smith, Joshua D.
RE: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory
not yet. please correct it.

On Aug 27, 2013 11:39 PM, Smith, Joshua D. joshua.sm...@gd-ais.com wrote:

nn.domain is a placeholder for the actual fully qualified hostname of my NameNode, and snn.domain is a placeholder for the actual fully qualified hostname of my StandbyNameNode.

Of course both the NameNode and the StandbyNameNode are running exactly the same software with the same configuration, since this is YARN. I'm not running a SecondaryNameNode.

The actual fully qualified hostnames are on another network and my customer is sensitive about privacy, so that's why I didn't post the actual values.

So, I think I have the equivalent of nn1,nn2, do I not?

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, August 27, 2013 11:32 AM
To: user@hadoop.apache.org
Subject: RE: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory

dfs.ha.namenodes.mycluster = nn.domain,snn.domain

it should be:

dfs.ha.namenodes.mycluster = nn1,nn2

On Aug 27, 2013 11:22 PM, Smith, Joshua D. joshua.sm...@gd-ais.com wrote:

Harsh-

Here are all of the other values that I have configured.

hdfs-site.xml
-------------
dfs.webhdfs.enabled = true
dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.automatic-falover.enabled = true
ha.zookeeper.quorum = nn.domain:2181,snn.domain:2181,jt.domain:2181
dfs.journalnode.edits.dir = /opt/hdfs/data1/dfs/jn
dfs.namenode.shared.edits.dir = qjournal://nn.domain:8485;snn.domain:8485;jt.domain:8485/mycluster
dfs.nameservices = mycluster
dfs.ha.namenodes.mycluster = nn.domain,snn.domain
dfs.namenode.rpc-address.mycluster.nn1 = nn.domain:8020
dfs.namenode.rpc-address.mycluster.nn2 = snn.domain:8020
dfs.namenode.http-address.mycluster.nn1 = nn.domain:50070
dfs.namenode.http-address.mycluster.nn2 = snn.domain:50070
dfs.name.dir = /var/lib/hadoop-hdfs/cache/hdfs/dfs/name

core-site.xml
-------------
fs.trash.interval = 1440
fs.trash.checkpoint.interval = 1440
fs.defaultFS = hdfs://mycluster
dfs.datanode.data.dir = /hdfs/data1,/hdfs/data2,/hdfs/data3,/hdfs/data4,/hdfs/data5,/hdfs/data6,/hdfs/data7

mapred-site.xml
---------------
mapreduce.framework.name = yarn
mapreduce.jobhistory.address = jt.domain:10020
mapreduce.jobhistory.webapp.address = jt.domain:19888

yarn-site.xml
-------------
yarn.nodemanager.aux-service = mapreduce.shuffle
yarn.nodemanager.aux-services.mapreduce.shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
yarn.log-aggregation-enable = true
yarn.nodemanager.remote-app-log-dir = /var/log/hadoop-yarn/apps
yarn.application.classpath = $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$YARN_HOME/*,$YARN_HOME/lib/*
yarn.resourcemanager.resource-tracker.address = jt.domain:8031
yarn.resourcemanager.address = jt.domain:8032
yarn.resourcemanager.scheduler.address = jt.domain:8030
yarn.resourcemanager.admin.address = jt.domain:8033
yarn.reesourcemanager.webapp.address = jt.domain:8088

These are the only interesting entries in my HDFS log file when I try to start the NameNode with service hadoop-hdfs-namenode start.

WARN org.apache.hadoop.hdfs.server.common.Util: Path /var/lib/hadoop-hdfs/cache/hdfs/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: false WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Configured NNs: ((there's a blank line here implying no configured NameNodes!)) ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed. Java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled. FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join Java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled. I don't like the blank line for Configured NNs. Not sure why it's not finding them. If I try the command hdfs zkfc -formatZK I get the following: Exception in thread main org.apache.hadoop.HadoopIllegalArgumentException: HA is not enabled for this namenode. -Original Message- From: Smith, Joshua D. [mailto:joshua.sm...@gd-ais.com] Sent: Tuesday, August 27, 2013 7:17 AM To: user@hadoop.apache.org Subject: RE: HDFS Startup Failure due to dfs.namenode.rpc-address and Shared Edits Directory Harsh- Yes, I intend to use HA. That's what I'm trying to configure right now. Unfortunately I cannot
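A hedged sketch of the correction being discussed (hostnames remain the thread's placeholders): the values in dfs.ha.namenodes.<nameservice> are logical NameNode IDs and must match the suffixes used on the per-NameNode keys, not the hostnames themselves. As an aside, the posted config also spells the automatic-failover property as dfs.ha.automatic-falover.enabled; the property HDFS reads is dfs.ha.automatic-failover.enabled, so that line is worth re-checking too.

    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>nn.domain:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>snn.domain:8020</value>
    </property>

With the IDs and key suffixes agreeing, the NameNode should report HA Enabled: true and list both configured NNs instead of the blank line seen in the log.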
Re: Why LineRecordWriter.write(..) is synchronized
Because we may use multiple threads to write to a single file.

On Aug 8, 2013 2:54 PM, Sathwik B P sath...@apache.org wrote:

Hi, LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations that define write as synchronized. Is there any specific reason for this? regards, sathwik
Re: Why LineRecordWriter.write(..) is synchronized
It's not that Hadoop forks threads; we may create a LineRecordWriter and then call that writer concurrently ourselves.

On Aug 8, 2013 4:00 PM, Sathwik B P sathwik...@gmail.com wrote:

Hi, thanks for your reply. May I know where Hadoop forks multiple threads to use a single RecordWriter? regards, sathwik

On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu azury...@gmail.com wrote:

Because we may use multiple threads to write to a single file.

On Aug 8, 2013 2:54 PM, Sathwik B P sath...@apache.org wrote:

Hi, LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations that define write as synchronized. Is there any specific reason for this? regards, sathwik
Re: issue about hadoop hardware choose
If you want HA, then do you want to deploy the journal nodes on the DNs?

On Aug 8, 2013 5:09 PM, ch huang justlo...@gmail.com wrote:

hi, all:

My company needs to build a 10-node Hadoop cluster (2 namenodes and 8 datanodes / node managers, for both data storage and data analysis). We have HBase and Hive on the Hadoop cluster, with 10G of data increment per day. We use CDH4.3 (for dual-namenode HA). My plan is:

namenode / resource manager: dual Quad Core, 24G RAM, 2 * 500GB SATA disks (JBOD)
datanode / node manager: dual Quad Core, 24G RAM, 2 * 1TB SATA disks (JBOD)

My questions are:
1. Does the resource manager need a dedicated server? (I plan to put the RM with one of the NNs.)
2. Is the RAM enough for the RM + NN machine?
3. Is RAID needed for the NN machine?
4. Is it OK if I place the JNs on other nodes (DN or NN)?
5. How many ZooKeeper server nodes do I need?
6. I want to place the YARN proxy server and the MapReduce history server with the other NN; is that OK?
Re: Why LineRecordWriter.write(..) is synchronized
The SequenceFile writer is also synchronized; I don't think this is bad. If you call the HDFS API to write concurrently, then it's necessary.

On Aug 8, 2013 7:53 PM, Jay Vyas jayunit...@gmail.com wrote:

Then is this a bug? Synchronization in the absence of any race condition is normally considered bad. In any case I'd like to know why this writer is synchronized whereas the others are not. That is, I think, the point at issue: either the other writers should be synchronized or else this one shouldn't be; consistency across the write implementations is probably desirable so that changes to output formats or record writers don't lead to bugs in multithreaded environments.

On Aug 8, 2013, at 6:50 AM, Harsh J ha...@cloudera.com wrote:

While we don't fork by default, we do provide a MultithreadedMapper implementation that would require such synchronization. But if you are asking whether it is necessary, then perhaps the answer is no.

On Aug 8, 2013 3:43 PM, Azuryy Yu azury...@gmail.com wrote:

It's not that Hadoop forks threads; we may create a LineRecordWriter and then call that writer concurrently ourselves.

On Aug 8, 2013 4:00 PM, Sathwik B P sathwik...@gmail.com wrote:

Hi, thanks for your reply. May I know where Hadoop forks multiple threads to use a single RecordWriter? regards, sathwik

On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu azury...@gmail.com wrote:

Because we may use multiple threads to write to a single file.

On Aug 8, 2013 2:54 PM, Sathwik B P sath...@apache.org wrote:

Hi, LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations that define write as synchronized. Is there any specific reason for this? regards, sathwik
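A short sketch of the MultithreadedMapper case Harsh mentions, which is the one place the framework itself shares a single RecordWriter across threads (the mapper class and thread count are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

    // Driver fragment; MyMapper is a hypothetical Mapper subclass holding the real map() logic.
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "multithreaded mapper example");
    job.setMapperClass(MultithreadedMapper.class);
    // Four threads run MyMapper concurrently and all of them funnel
    // context.write() into one RecordWriter, which is where a synchronized
    // write() matters.
    MultithreadedMapper.setMapperClass(job, MyMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 4);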
Re: Namenode is failing with expception to join
Manish, you stopped HDFS and then started HDFS from the standby namenode, right? Please look at https://issues.apache.org/jira/browse/HDFS-5058

There are two solutions:
1) start HDFS from the active namenode, not the SBN
2) copy {namenode.name.dir}/* to the SBN

I advise #1.

On Wed, Aug 7, 2013 at 3:00 PM, Manish Bhoge manishbh...@rocketmail.com wrote:

I have all the configuration fine, but whenever I start the namenode it fails with the exception below. No clue where to fix this.

2013-08-07 02:56:22,754 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
2013-08-07 02:56:22,751 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 1
2013-08-07 02:56:22,751 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 0
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 115 loaded in 0 seconds.
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded image for txid 0 from /data/1/dfs/nn/current/fsimage_000
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@5f18223d expecting start txid #1
2013-08-07 02:56:22,752 INFO org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding stream '/data/1/dfs/nn/current/edits_0515247-0515255' to transaction ID 1
2013-08-07 02:56:22,753 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2013-08-07 02:56:22,754 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2013-08-07 02:56:22,754 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2013-08-07 02:56:22,754 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join java.io.IOException: There appears to be a gap in the edit log. We expected txid 1, but got txid 515247.
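A minimal sketch of option 2 from the reply, for the case where starting from the active NN isn't possible (the hostname is illustrative; the metadata path matches the one in the log above). Copying the active NameNode's current fsimage and edits to the standby gives it a starting point whose txids line up with the shared edit log:

    # on the active NameNode, with HDFS stopped
    scp -r /data/1/dfs/nn/current standby-host:/data/1/dfs/nn/

    # then start the standby NameNode again
    hadoop-daemon.sh start namenode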
specify Mapred tasks and slots
Hi Dears, can I specify how many slots to use for a reduce? I know we can specify the number of reduce tasks, but does one task occupy exactly one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.
Re: specify Mapred tasks and slots
My question is: can I specify how many slots are used by each M/R task?

On Thu, Aug 8, 2013 at 10:29 AM, Shekhar Sharma shekhar2...@gmail.com wrote:

Slots are decided by the configuration of the machines, RAM, etc. Regards, Som Shekhar Sharma +91-8197243810

On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote:

Hi Dears, can I specify how many slots to use for a reduce? I know we can specify the number of reduce tasks, but does one task occupy exactly one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.
Re: specify Mapred tasks and slots
Thanks Harsh, and thanks to everyone who responded. That's helpful.

On Thu, Aug 8, 2013 at 11:55 AM, Harsh J ha...@cloudera.com wrote:

What Devaraj said, except that if you use the CapacityScheduler, then you can bind together memory requests and slot concepts, and a task can grab more than one slot for itself when needed. We've discussed this aspect previously at http://search-hadoop.com/m/gnFs91yIg1e

On Thu, Aug 8, 2013 at 8:34 AM, Devaraj k devara...@huawei.com wrote:

One task can use only one slot; it cannot use more than one slot. If the task is a map task then it will use one map slot, and if the task is a reduce task then it will use one reduce slot from the configured ones. Thanks, Devaraj k

From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: 08 August 2013 08:27
To: user@hadoop.apache.org
Subject: Re: specify Mapred tasks and slots

My question is: can I specify how many slots are used by each M/R task?

On Thu, Aug 8, 2013 at 10:29 AM, Shekhar Sharma shekhar2...@gmail.com wrote:

Slots are decided by the configuration of the machines, RAM, etc. Regards, Som Shekhar Sharma +91-8197243810

On Thu, Aug 8, 2013 at 7:19 AM, Azuryy Yu azury...@gmail.com wrote:

Hi Dears, can I specify how many slots to use for a reduce? I know we can specify the number of reduce tasks, but does one task occupy exactly one slot? Is it possible for one task to occupy more than one slot in Hadoop-1.1.2? Thanks.

-- Harsh J
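A hedged illustration of the CapacityScheduler mechanism Harsh refers to in MRv1 (values are examples only, and the cluster-level setting normally lives in mapred-site.xml): when a job requests more memory per task than one slot represents, the scheduler accounts for that task as occupying multiple slots.

    <!-- memory represented by a single map slot on the cluster -->
    <property>
      <name>mapred.cluster.map.memory.mb</name>
      <value>1024</value>
    </property>
    <!-- this job's map tasks each request two slots' worth of memory -->
    <property>
      <name>mapred.job.map.memory.mb</name>
      <value>2048</value>
    </property>

Equivalent reduce-side properties (mapred.cluster.reduce.memory.mb, mapred.job.reduce.memory.mb) exist as well; as discussed above, the default MRv1 scheduler does not do this kind of multi-slot accounting.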
Re: version 1.1.2 document error:
All the differences are listed in the last URL you provided: https://github.com/twitter/hadoop-lzo Did you read it carefully?

On Thu, Jul 25, 2013 at 11:28 AM, 周梦想 abloz...@gmail.com wrote:

Hello, on the page http://hadoop.apache.org/docs/r1.1.2/deployment_layout.html the entry "LZO - LZ0 codec from github.com/omally/hadoop-gpl-compression" should be https://github.com/omalley/hadoop-gpl-compression, I think. There is another lzo: https://github.com/twitter/hadoop-lzo. What is the difference between the two? Thanks. Andy Zhou
Re: ./hdfs namenode -bootstrapStandby error
hi, can you run 'hdfs namenode -initializeSharedEdits' on the active NN? Remember to start all journal nodes before trying this.

On Jul 19, 2013 5:17 PM, lei liu liulei...@gmail.com wrote:

I use the hadoop-2.0.5 version and use QJM for HA. I run ./hdfs namenode -bootstrapStandby for the StandbyNameNode, but it reports the error below:

=====================================================
About to bootstrap Standby ID nn2 from:
  Nameservice ID: mycluster
  Other Namenode ID: nn1
  Other NN's HTTP address: 10.232.98.77:20021
  Other NN's IPC address: dw77.kgb.sqa.cm4/10.232.98.77:20020
  Namespace ID: 1499625118
  Block pool ID: BP-2012507965-10.232.98.77-1372993302021
  Cluster ID: CID-921af0aa-b831-4828-965c-3b71a5149600
  Layout version: -40
=====================================================
Re-format filesystem in Storage Directory /home/musa.ll/hadoop2/cluster-data/name ? (Y or N) Y
13/07/19 17:04:28 INFO common.Storage: Storage directory /home/musa.ll/hadoop2/cluster-data/name has been successfully formatted.
13/07/19 17:04:29 FATAL ha.BootstrapStandby: Unable to read transaction ids 16317-16337 from the configured shared edits storage qjournal://10.232.98.61:20022;10.232.98.62:20022;10.232.98.63:20022/mycluster. Please copy these logs into the shared edits storage or call saveNamespace on the active node. Error: Gap in transactions. Expected to be able to read up until at least txid 16337 but unable to find any edit logs containing txid 16331
13/07/19 17:04:29 INFO util.ExitUtil: Exiting with status 6

The edit logs on the JournalNode contain the following:

-rw-r--r-- 1 musa.ll users      30 Jul 19 15:51 edits_0016327-0016328
-rw-r--r-- 1 musa.ll users      30 Jul 19 15:53 edits_0016329-0016330
-rw-r--r-- 1 musa.ll users 1048576 Jul 19 17:03 edits_inprogress_0016331

The edits_inprogress_0016331 file should contain transactions 16331-16337, so why does the ./hdfs namenode -bootstrapStandby command report an error? How can I initialize the StandbyNameNode? Thanks, LiuLei
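A minimal sketch of the other workaround the BootstrapStandby error itself suggests, run on the active NameNode so that the transactions in the in-progress segment are persisted where the standby can read them (the commands are standard hdfs ones, not quoted from the thread):

    # on the active NameNode: persist a checkpoint covering the recent txids
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
    hdfs dfsadmin -safemode leave

    # then, on the standby, retry:
    hdfs namenode -bootstrapStandby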
Re: Namenode automatically going to safemode with 2.1.0-beta
this is not a bug. it has been documented. On Jul 19, 2013 10:13 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi Harsh, I have made my dfs.namenode.name.dir point to a subdirectory of my home, and I don't see this issue again. So, is this a bug that we need to log into JIRA? Thanks, Kishore On Tue, Jul 16, 2013 at 6:39 AM, Harsh J ha...@cloudera.com wrote: 2013-07-12 11:04:26,002 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume 'null' is 0, which is below the configured reserved amount 104857600 This is interesting. Its calling your volume null, which may be more of a superficial bug. What is your dfs.namenode.name.dir set to? From /tmp/hadoop-dsadm/dfs/name I'd expect you haven't set it up and /tmp is being used off of the out-of-box defaults. Could you try to set it to a specific directory thats not on /tmp? On Mon, Jul 15, 2013 at 2:43 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I don't have it in my hdfs-site.xml, in which case probably the default value is taken.. On Mon, Jul 15, 2013 at 2:29 PM, Azuryy Yu azury...@gmail.com wrote: please check dfs.datanode.du.reserved in the hdfs-site.xml On Jul 15, 2013 4:30 PM, Aditya exalter adityaexal...@gmail.com wrote: Hi Krishna, Can you please send screenshots of namenode web UI. Thanks Aditya. On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I have had enough space on the disk that is used, like around 30 Gigs Thanks, Kishore On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla venkatarami.ne...@cloudwick.com wrote: Hi, pls see the available space for NN storage directory. Thanks Regards Venkat On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am doing no activity on my single node cluster which is using 2.1.0-beta, and still observed that it has gone to safe mode by itself after a while. I was looking at the name node log and see many of these kinds of entries.. Can anything be interpreted from these? 
2013-07-12 09:06:11,256 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 561 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 561 2013-07-12 09:07:11,291 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 14 2013-07-12 09:07:11,292 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 15 2013-07-12 09:07:11,293 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561 - /tmp/hadoop-dsadm/dfs/name/current/edits_561-562 2013-07-12 09:07:11,294 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 563 2013-07-12 09:08:11,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 563 2013-07-12 09:08:11,399 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 11 2013-07-12 09:08:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563 - /tmp/hadoop-dsadm/dfs/name/current/edits_563-564 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 565 2013-07-12
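Tying together Harsh's analysis in this thread: the resource checker saw 0 bytes available on volume 'null' because dfs.namenode.name.dir had not been set and fell back to the out-of-box default under /tmp, and 104857600 bytes (100 MB) is the default reserve it compares against. A hedged sketch of the hdfs-site.xml change the fix amounts to (the path is illustrative):

    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/dsadm/hadoop/dfs/name</value>
    </property>

The reserve itself can be tuned with dfs.namenode.resource.du.reserved, but moving the metadata off /tmp is the real fix.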
Re: Running a single cluster in multiple datacenters
Hi Bertrand, I guess you configured two racks in total: one IDC is one rack, and the other IDC is the other rack. If you want to avoid re-replication while one IDC is down, you would have to change the replica placement policy: if there is at least a minimum number of replicas on one rack, then do nothing (here the minimum should be 2, which guarantees you have at least two replicas in one IDC). So you would have to configure the replication factor to 4 if you adopt my advice.

On Jul 16, 2013, at 6:37 AM, Bertrand Dechoux decho...@gmail.com wrote:

According to your own analysis, you wouldn't be more available, but that was your aim. Did you consider having two separate clusters, one per datacenter, with an automatic copy of the data? I understand that load balancing of work and data would not be easy, but it seems to me a simple strategy (one that I have seen working). However, you are stating that the two datacenters are close and linked by a big network connection. What is the impact on latency and bandwidth (between two nodes in the same datacenter versus two nodes in different datacenters)? The main question is what happens when a job uses TaskTrackers from datacenter A but DataNodes from datacenter B. It will happen. Simply consider reducer tasks, which don't have any locality strategy because it doesn't really make sense in a general context. Regards, Bertrand

On Mon, Jul 15, 2013 at 11:56 PM, j...@nanthrax.net wrote:

Hi Niels, it depends on the number of replicas and the Hadoop rack configuration (level). It's possible to have replicas in the two datacenters. What's the rack configuration that you plan? You can implement your own one and define it using the topology.node.switch.mapping.impl property. Regards, JB

On 2013-07-15 23:49, Niels Basjes wrote:

Hi, last week we had a discussion at work about setting up our new Hadoop cluster(s). One of the things that has changed is that the importance of the Hadoop stack is growing, so we want to be more available. One of the points we talked about was setting up the cluster in such a way that the nodes are physically located in two separate datacenters (on opposite sides of the same city) with a big network connection in between. We're currently talking about a cluster in the 50-node range, but that will grow over time.

The advantages I see:
- More CPU power available for jobs.
- The data is automatically copied between the datacenters as long as we configure them to be different racks.

The disadvantages I see:
- If the network goes out then one half is dead and the other half will most likely go into safemode, because recovering the missing replicas will fill up the disks fast.

What things should we also consider? Has anyone any experience with such a setup? Is it a good idea to do this? What are better options for us to consider? Thanks for any input.

-- Bertrand Dechoux
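For readers wanting to try the rack mapping JB mentions: as an alternative to implementing topology.node.switch.mapping.impl, the default ScriptBasedMapping reads a script named by topology.script.file.name (net.topology.script.file.name on Hadoop 2). A hypothetical topology script (the IP-to-datacenter mapping is made up) that exposes each datacenter as its own rack:

    #!/bin/bash
    # Map each host/IP argument to a rack path that encodes its datacenter,
    # so HDFS treats the two IDCs as different racks.
    for arg in "$@"; do
      case "$arg" in
        10.1.*) echo -n "/dcA/rack1 " ;;   # addresses in datacenter A
        10.2.*) echo -n "/dcB/rack1 " ;;   # addresses in datacenter B
        *)      echo -n "/default-rack " ;;
      esac
    done
    echo

Whether replicas then survive a whole-IDC outage still depends on the placement-policy discussion above, since the default policy only guarantees replicas on two distinct racks.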
Re: Namenode automatically going to safemode with 2.1.0-beta
hi, from the log: NameNode low on available disk space. Entering safe mode. this is the root cause. On Jul 15, 2013 2:45 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am doing no activity on my single node cluster which is using 2.1.0-beta, and still observed that it has gone to safe mode by itself after a while. I was looking at the name node log and see many of these kinds of entries.. Can anything be interpreted from these? 2013-07-12 09:06:11,256 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 561 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 561 2013-07-12 09:07:11,291 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 14 2013-07-12 09:07:11,292 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 15 2013-07-12 09:07:11,293 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561 - /tmp/hadoop-dsadm/dfs/name/current/edits_561-562 2013-07-12 09:07:11,294 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 563 2013-07-12 09:08:11,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 563 2013-07-12 09:08:11,399 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 11 2013-07-12 09:08:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563 - /tmp/hadoop-dsadm/dfs/name/current/edits_563-564 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 13 2013-07-12 09:09:11,441 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 13 And after sometime it said: 2013-07-12 11:03:19,799 INFO 
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 795 2013-07-12 11:04:19,826 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 11:04:19,826 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 795 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 12 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12 2013-07-12 11:04:19,829 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_795 - /tmp/hadoop-dsadm/dfs/name/current/edits_795-796 2013-07-12 11:04:19,829 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 797 2013-07-12 11:04:26,002 WARN
Re: Namenode automatically going to safemode with 2.1.0-beta
please check dfs.datanode.du.reserved in the hdfs-site.xml On Jul 15, 2013 4:30 PM, Aditya exalter adityaexal...@gmail.com wrote: Hi Krishna, Can you please send screenshots of namenode web UI. Thanks Aditya. On Mon, Jul 15, 2013 at 1:54 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: I have had enough space on the disk that is used, like around 30 Gigs Thanks, Kishore On Mon, Jul 15, 2013 at 1:30 PM, Venkatarami Netla venkatarami.ne...@cloudwick.com wrote: Hi, pls see the available space for NN storage directory. Thanks Regards Venkat On Mon, Jul 15, 2013 at 12:14 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am doing no activity on my single node cluster which is using 2.1.0-beta, and still observed that it has gone to safe mode by itself after a while. I was looking at the name node log and see many of these kinds of entries.. Can anything be interpreted from these? 2013-07-12 09:06:11,256 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 561 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:07:11,290 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 561 2013-07-12 09:07:11,291 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 14 2013-07-12 09:07:11,292 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 15 2013-07-12 09:07:11,293 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_561 - /tmp/hadoop-dsadm/dfs/name/current/edits_561-562 2013-07-12 09:07:11,294 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 563 2013-07-12 09:08:11,397 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:08:11,398 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 563 2013-07-12 09:08:11,399 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 11 2013-07-12 09:08:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 12 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /tmp/hadoop-dsadm/dfs/name/current/edits_inprogress_563 - /tmp/hadoop-dsadm/dfs/name/current/edits_563-564 2013-07-12 09:08:11,402 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 565 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 09:09:11,440 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 565 2013-07-12 09:09:11,440 INFO 
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 13 2013-07-12 09:09:11,441 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 13 And after sometime it said: 2013-07-12 11:03:19,799 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 795 2013-07-12 11:04:19,826 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 9.70.137.114 2013-07-12 11:04:19,826 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 795 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 12 2013-07-12 11:04:19,827 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in
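The replies in this thread point at reserved-space settings, so a hedged hdfs-site.xml sketch follows; the values are illustrative assumptions rather than recommendations, and on 2.x the NameNode also drops into safemode on its own when the volume holding its storage directory (here under /tmp) falls below a reserved amount:
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>   <!-- bytes per volume kept free for non-DFS use; 10 GB assumed here -->
</property>
<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <value>104857600</value>     <!-- free space the NN requires on its storage dirs; 100 MB is the default -->
</property>
Once the underlying space problem is fixed, hdfs dfsadmin -safemode leave takes the NameNode back out of safemode.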
Re: Use a URL for the HADOOP_CONF_DIR?
Yes, it would be useful. On Jul 16, 2013 5:40 AM, Niels Basjes ni...@basjes.nl wrote: Hi, When giving users access to a Hadoop cluster they need a few XML config files (like hadoop-site.xml). They put these somewhere on their PC and start running their jobs on the cluster. Now when you change the settings you want those users to use (for example, you changed some TCP port), you need them all to update their config files. My question is: can you set HADOOP_CONF_DIR to be a URL on a webserver? A while ago I tried this and (back then) it didn't work. Would this be a useful enhancement? -- Best regards, Niels Basjes
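Since HADOOP_CONF_DIR apparently did not accept a URL at the time, a minimal workaround sketch (my addition, not from the thread) is to publish the config as a tarball and have clients fetch it before running jobs; the URL below is a made-up placeholder:
CONF_URL="http://config-host.example/hadoop-conf.tar.gz"   # hypothetical location of the published config
export HADOOP_CONF_DIR="$HOME/.hadoop-conf"
mkdir -p "$HADOOP_CONF_DIR"
curl -sf "$CONF_URL" | tar xz -C "$HADOOP_CONF_DIR"
hadoop fs -ls /    # now picks up the freshly fetched *-site.xml files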
Re: Hadoop property precedence
The conf on the machine the client is running on is what takes effect. On Jul 13, 2013 4:42 PM, Kiran Dangeti kirandkumar2...@gmail.com wrote: Shalish, The default block size is 64MB, which is fine at the client end. Make sure the same value is also set in the conf at your end. You can increase the size of each block to 128MB or greater; the only thing you will see is that processing is faster, but in the end there may be a chance of losing data. Thanks, Kiran On Fri, Jul 12, 2013 at 10:20 PM, Shalish VJ shalis...@yahoo.com wrote: Hi, Suppose the block size set in the configuration file at the client side is 64MB, the block size set in the configuration file at the namenode side is 128MB, and the block size set in the configuration file at the datanode side is something else. Please advise: if the client is writing a file to HDFS, which property takes effect? Thanks, Shalish.
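A small sketch of that client-side behaviour (my addition, not from the thread); dfs.block.size is the 1.x property name and the paths are placeholders:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// whatever the client sets here wins over the values in the server-side XML files
Configuration conf = new Configuration();
conf.setLong("dfs.block.size", 128L * 1024 * 1024);   // 128 MB, illustrative
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("local.txt"), new Path("/user/demo/local.txt"));
Equivalently from the shell: hadoop fs -D dfs.block.size=134217728 -put local.txt /user/demo/local.txt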
Re: Task failure in slave node
sorry for typo, mahout, not mahou. sent from mobile On Jul 11, 2013 9:40 PM, Azuryy Yu azury...@gmail.com wrote: hi, put all mahou jars under hadoop_home/lib, then restart cluster. On Jul 11, 2013 8:45 PM, Margusja mar...@roo.ee wrote: Hi I have tow nodes: n1 (master, salve) and n2 (slave) after set up I ran wordcount example and it worked fine: [hduser@n1 ~]$ hadoop jar /usr/local/hadoop/hadoop-**examples-1.0.4.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output 13/07/11 15:30:44 INFO input.FileInputFormat: Total input paths to process : 7 13/07/11 15:30:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library 13/07/11 15:30:44 WARN snappy.LoadSnappy: Snappy native library not loaded 13/07/11 15:30:44 INFO mapred.JobClient: Running job: job_201307111355_0015 13/07/11 15:30:45 INFO mapred.JobClient: map 0% reduce 0% 13/07/11 15:31:03 INFO mapred.JobClient: map 42% reduce 0% 13/07/11 15:31:06 INFO mapred.JobClient: map 57% reduce 0% 13/07/11 15:31:09 INFO mapred.JobClient: map 71% reduce 0% 13/07/11 15:31:15 INFO mapred.JobClient: map 100% reduce 0% 13/07/11 15:31:18 INFO mapred.JobClient: map 100% reduce 23% 13/07/11 15:31:27 INFO mapred.JobClient: map 100% reduce 100% 13/07/11 15:31:32 INFO mapred.JobClient: Job complete: job_201307111355_0015 13/07/11 15:31:32 INFO mapred.JobClient: Counters: 30 13/07/11 15:31:32 INFO mapred.JobClient: Job Counters 13/07/11 15:31:32 INFO mapred.JobClient: Launched reduce tasks=1 13/07/11 15:31:32 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=67576 13/07/11 15:31:32 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/07/11 15:31:32 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/07/11 15:31:32 INFO mapred.JobClient: Rack-local map tasks=3 13/07/11 15:31:32 INFO mapred.JobClient: Launched map tasks=7 13/07/11 15:31:32 INFO mapred.JobClient: Data-local map tasks=4 13/07/11 15:31:32 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=21992 13/07/11 15:31:32 INFO mapred.JobClient: File Output Format Counters 13/07/11 15:31:32 INFO mapred.JobClient: Bytes Written=1412505 13/07/11 15:31:32 INFO mapred.JobClient: FileSystemCounters 13/07/11 15:31:32 INFO mapred.JobClient: FILE_BYTES_READ=5414195 13/07/11 15:31:32 INFO mapred.JobClient: HDFS_BYTES_READ=6950820 13/07/11 15:31:32 INFO mapred.JobClient: FILE_BYTES_WRITTEN=8744993 13/07/11 15:31:32 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1412505 13/07/11 15:31:32 INFO mapred.JobClient: File Input Format Counters 13/07/11 15:31:32 INFO mapred.JobClient: Bytes Read=6950001 13/07/11 15:31:32 INFO mapred.JobClient: Map-Reduce Framework 13/07/11 15:31:32 INFO mapred.JobClient: Map output materialized bytes=3157469 13/07/11 15:31:32 INFO mapred.JobClient: Map input records=137146 13/07/11 15:31:32 INFO mapred.JobClient: Reduce shuffle bytes=2904836 13/07/11 15:31:32 INFO mapred.JobClient: Spilled Records=594764 13/07/11 15:31:32 INFO mapred.JobClient: Map output bytes=11435849 13/07/11 15:31:32 INFO mapred.JobClient: Total committed heap usage (bytes)=1128136704 13/07/11 15:31:32 INFO mapred.JobClient: CPU time spent (ms)=18230 13/07/11 15:31:32 INFO mapred.JobClient: Combine input records=1174991 13/07/11 15:31:32 INFO mapred.JobClient: SPLIT_RAW_BYTES=819 13/07/11 15:31:32 INFO mapred.JobClient: Reduce input records=218990 13/07/11 15:31:32 INFO mapred.JobClient: Reduce input groups=128513 13/07/11 15:31:32 INFO mapred.JobClient: Combine output records=218990 13/07/11 15:31:32 INFO mapred.JobClient: Physical memory (bytes) 
snapshot=1179656192 13/07/11 15:31:32 INFO mapred.JobClient: Reduce output records=128513 13/07/11 15:31:32 INFO mapred.JobClient: Virtual memory (bytes) snapshot=22992117760 13/07/11 15:31:32 INFO mapred.JobClient: Map output records=1174991 From the web interface (http://n1:50030/) I saw that both (n1 and n2) were used without any errors. Problems appear if I try to use the following command on the master (n1): [hduser@n1 ~]$ hadoop jar mahout-distribution-0.7/mahout-examples-0.7-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -p -d testdata/bal_ee_2009.csv -ds testdata/bal_ee_2009.csv.info -sl 10 -o bal_ee_2009_out -t 1 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [file:/usr/local/hadoop-1.0.4/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 13/07/11 15:36:50 INFO mapreduce.BuildForest: Partial Mapred implementation 13/07/11 15:36:50 INFO mapreduce.BuildForest
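A quick sketch of the suggestion at the top of this thread (put the Mahout jars on the cluster classpath); the paths are taken from the messages above and not re-verified:
# copy the Mahout job/core jars into Hadoop's lib directory on every node
for h in n1 n2; do
  scp mahout-distribution-0.7/mahout-*.jar $h:/usr/local/hadoop/lib/
done
# then restart the cluster from the master so the TaskTrackers pick them up
stop-all.sh && start-all.sh
An alternative, if restarting is not an option, is shipping the jars with the job via -libjars, but the reply above recommends the lib-directory route.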
Re: cannot submit a job via java client in hadoop- 2.0.5-alpha
you didn't set yarn.nodemanager.address in your yarn-site.xml On Wed, Jul 10, 2013 at 4:33 PM, Francis.Hu francis...@reachjunction.com wrote: Hi, All I have a hadoop-2.0.5-alpha cluster with 3 data nodes. I have the Resource Manager and all data nodes started and can access the web UI of the Resource Manager. I wrote a Java client to submit a job, as in the TestJob class below, but the job is never submitted successfully. It throws the exception below all the time. My configurations are attached. Can anyone help me? Thanks.
-- my java client --
public class TestJob {
  public void execute() {
    Configuration conf1 = new Configuration();
    conf1.addResource("resources/core-site.xml");
    conf1.addResource("resources/hdfs-site.xml");
    conf1.addResource("resources/yarn-site.xml");
    conf1.addResource("resources/mapred-site.xml");
    JobConf conf = new JobConf(conf1);
    conf.setJar("/home/francis/hadoop-jobs/MapReduceJob.jar");
    conf.setJobName("Test");
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(DisplayRequestMapper.class);
    conf.setReducerClass(DisplayRequestReducer.class);
    FileInputFormat.setInputPaths(conf, new Path("/home/francis/hadoop-jobs/2013070907.FNODE.2.txt"));
    FileOutputFormat.setOutputPath(conf, new Path("/home/francis/hadoop-jobs/result/"));
    try {
      JobClient client = new JobClient(conf);
      RunningJob job = client.submitJob(conf);
      job.waitForCompletion();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}
-- Exception --
jvm 1| java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
jvm 1| at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:119)
jvm 1| at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:81)
jvm 1| at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:74)
jvm 1| at org.apache.hadoop.mapred.JobClient.init(JobClient.java:482)
jvm 1| at org.apache.hadoop.mapred.JobClient.init(JobClient.java:461)
jvm 1| at com.rh.elastic.hadoop.job.TestJob.execute(TestJob.java:59)
Thanks, Francis.Hu
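The exception points at mapreduce.framework.name and the corresponding server addresses, so a hedged client-side config sketch follows (the hostname is a placeholder, not from the thread):
<!-- mapred-site.xml seen by the client -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<!-- yarn-site.xml seen by the client -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rm-host.example:8032</value>
</property>
Without mapreduce.framework.name set to yarn (and the ResourceManager address resolvable), JobClient typically cannot pick a client protocol provider and fails with exactly this IOException.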
Re: Distributed Cache
It should be like this: Configuration conf = new Configuration(); Job job = new Job(conf, "test"); job.setJarByClass(Test.class); DistributedCache.addCacheFile(new Path("your hdfs path").toUri(), job.getConfiguration()); but the best example is the test cases: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/filecache/TestClientDistributedCacheManager.java?view=markup On Wed, Jul 10, 2013 at 6:07 AM, Ted Yu yuzhih...@gmail.com wrote: You should use Job#addCacheFile() Cheers On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi, I was wondering if I can still use the DistributedCache class in the latest release of Hadoop (version 2.0.5). In my driver class, I use this code to try to add a file to the distributed cache: import java.net.URI; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.filecache.DistributedCache; import org.apache.hadoop.fs.*; import org.apache.hadoop.io.*; import org.apache.hadoop.mapreduce.*; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; Configuration conf = new Configuration(); DistributedCache.addCacheFile(new URI("file path in HDFS"), conf); Job job = Job.getInstance(); … However, I keep getting warnings that the method addCacheFile() is deprecated. Is there a more current way to add files to the distributed cache? Thanks in advance, Andrew
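A compact sketch of the non-deprecated route Ted mentions (Job#addCacheFile on Hadoop 2.x); the HDFS path is a placeholder:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "test");
job.setJarByClass(Test.class);                       // driver class from the thread
job.addCacheFile(new URI("/user/demo/lookup.txt"));  // placeholder HDFS path
// tasks can then list the cached files with context.getCacheFiles()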
Re: Can I move block data directly?
Thanks Harsh, always detailed answers each time. Yes, this is an unsupported scenario. The balancer is very slow even after I set bandwidthPerSec to a large value, so I want to go this way to solve the problem quickly. On Mon, Jul 8, 2013 at 1:46 PM, Viral Bajaria viral.baja...@gmail.com wrote: Out of curiosity, besides the bandwidthPerSec and threshold, what other parameters are tuneable? Thanks, Viral On Sun, Jul 7, 2013 at 10:39 PM, Harsh J ha...@cloudera.com wrote: If the balancer isn't cutting it for you with stock defaults, you should consider tuning that rather than doing these unsupported scenarios.
Re: Can I move block data directly?
bq. I'd also ensure the ownership of the block files are intact. Hi Harsh, what does 'ensure the ownership of the block files is intact' mean? And I want to ask more: to my understanding, after I restart the datanode daemon, the block report should tell the NN all the blocks owned by this DN, and the block scanner can remember all the blocks' structure locally, so the block file ownership would be confirmed during the startup period, and even if some pieces of the blk_ files are lost, the NN can find that they are under-replicated. Am I right? Thanks. On Mon, Jul 8, 2013 at 2:07 PM, Azuryy Yu azury...@gmail.com wrote: Thanks Harsh, always detailed answers each time. Yes, this is an unsupported scenario. The balancer is very slow even after I set bandwidthPerSec to a large value, so I want to go this way to solve the problem quickly. On Mon, Jul 8, 2013 at 1:46 PM, Viral Bajaria viral.baja...@gmail.com wrote: Out of curiosity, besides the bandwidthPerSec and threshold, what other parameters are tuneable? Thanks, Viral On Sun, Jul 7, 2013 at 10:39 PM, Harsh J ha...@cloudera.com wrote: If the balancer isn't cutting it for you with stock defaults, you should consider tuning that rather than doing these unsupported scenarios.
Re: Can I move block data directly?
Yeah. I got it. Thanks Harsh. On Mon, Jul 8, 2013 at 3:10 PM, Harsh J ha...@cloudera.com wrote: Yeah you're right. I only meant the ownership of the blk_* files to be owned by the same user as the DN daemon, for consistency more than anything else. On Mon, Jul 8, 2013 at 11:46 AM, Azuryy Yu azury...@gmail.com wrote: bq. I'd also ensure the ownership of the block files are intact. Hi Harsh, what does 'ensure the ownership of the block files is intact' mean? And I want to ask more: to my understanding, after I restart the datanode daemon, the block report should tell the NN all the blocks owned by this DN, and the block scanner can remember all the blocks' structure locally, so the block file ownership would be confirmed during the startup period, and even if some pieces of the blk_ files are lost, the NN can find that they are under-replicated. Am I right? Thanks. On Mon, Jul 8, 2013 at 2:07 PM, Azuryy Yu azury...@gmail.com wrote: Thanks Harsh, always detailed answers each time. Yes, this is an unsupported scenario. The balancer is very slow even after I set bandwidthPerSec to a large value, so I want to go this way to solve the problem quickly. On Mon, Jul 8, 2013 at 1:46 PM, Viral Bajaria viral.baja...@gmail.com wrote: Out of curiosity, besides the bandwidthPerSec and threshold, what other parameters are tuneable? Thanks, Viral On Sun, Jul 7, 2013 at 10:39 PM, Harsh J ha...@cloudera.com wrote: If the balancer isn't cutting it for you with stock defaults, you should consider tuning that rather than doing these unsupported scenarios. -- Harsh J
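For reference, a hedged sketch of the balancer-tuning route recommended in this thread (the numbers are arbitrary examples, not recommendations):
# raise the per-datanode balancing bandwidth, in bytes/sec (here ~100 MB/s)
hdfs dfsadmin -setBalancerBandwidth 104857600
# run the balancer with a tighter utilization threshold (percent)
hdfs balancer -threshold 5
On a 1.x cluster the same commands are spelled hadoop dfsadmin / hadoop balancer; -setBalancerBandwidth changes the live value, while dfs.datanode.balance.bandwidthPerSec in hdfs-site.xml sets it permanently.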
Re: New to hadoop - which version to start with ?
Oh, this is a fair scheduler issue; you need a patch for that. I will send it to you later. Or can you use another scheduler instead of FS? On Jul 6, 2013 4:42 PM, sudhir543-...@yahoo.com sudhir543-...@yahoo.com wrote: That did not work, the same error comes up. So I tried bin/hadoop fs -chmod -R 755 /tmp/hadoop-Sudhir/mapred/staging/sudhir-1664368101/.staging as well. However the next run would give 13/07/06 14:09:46 ERROR security.UserGroupInformation: PriviledgedActionException as:Sudhir cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sudhir\mapred\staging\Sudhir1731506911\.staging to 0700 java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sudhir\mapred\staging\Sudhir1731506911\.staging to 0700 Thanks Sudhir -- From: Azuryy Yu azury...@gmail.com To: user@hadoop.apache.org; sudhir543-...@yahoo.com sudhir543-...@yahoo.com Sent: Saturday, 6 July 2013 11:01 AM Subject: Re: New to hadoop - which version to start with ? hadoop fs -chmod -R 755 \tmp\hadoop-Sudhir\mapred\staging Then it should work. On Sat, Jul 6, 2013 at 1:27 PM, sudhir543-...@yahoo.com sudhir543-...@yahoo.com wrote: I am new to Hadoop, I just started reading 'Hadoop: The Definitive Guide'. I downloaded Hadoop 1.1.2 and tried to run a sample MapReduce job using Cygwin, but I got the following error: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sudhir\mapred\staging\Sudhir-1267269654\.staging to 0700 I read that some specific versions have this error. Can someone suggest which version I should start with (one that supports the MapReduce 1 runtime (Classic))? Thanks Sudhir
Re: Which InputFormat to use?
Use the InputFormat under the mapreduce package; the mapred package is the old one. Generally you can extend FileInputFormat under the o.a.h.mapreduce package. On Fri, Jul 5, 2013 at 1:23 PM, Devaraj k devara...@huawei.com wrote: Hi Ahmed, Hadoop 0.20.0 included the new MapReduce API, sometimes referred to as the context-objects API. It is designed to make the API easier to evolve in the future. There are some differences between the new and old APIs: the new API favours abstract classes rather than interfaces, since abstract classes are easier to evolve; the new API uses context objects like MapContext and ReduceContext to connect to the user code; the old API has a special JobConf object for job configuration, while in the new API job configuration is done using Configuration. You can find the new API in the org.apache.hadoop.mapreduce.lib.input.* package and its sub-packages, and the old API in the org.apache.hadoop.mapred.* package and its sub-packages. The new API is type-incompatible with the old, so we need to rewrite jobs to make use of these advantages. Based on these things you can select which API to use. Thanks Devaraj k From: Ahmed Eldawy [mailto:aseld...@gmail.com] Sent: 05 July 2013 00:00 To: user@hadoop.apache.org Subject: Which InputFormat to use? Hi I'm developing a new set of InputFormats that are used for a project I'm doing. I found that there are two ways to create a new InputFormat: 1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat 2- Implement the interface org.apache.hadoop.mapred.InputFormat I don't know why there are two versions which are incompatible. I found out that for each one, there is a whole set of interfaces for different classes such as InputSplit, RecordReader and the MapReduce job. Unfortunately, each set of classes is not compatible with the other one. This means that I have to choose one of the interfaces and go with it till the end. I have two questions, basically: 1- Which of these two interfaces should I go with? I didn't find any deprecation in either of them, so they both seem legitimate. Is there any plan to retire one of them? 2- I already have some classes implemented in one of the formats; is it worth refactoring these classes to use the other interface, in case I used the old format? Thanks in advance for your help. Best regards, Ahmed Eldawy
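A bare-bones sketch of the new-API route suggested above; the class name is made up for illustration and it simply reuses the stock line reader:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// extends the new (org.apache.hadoop.mapreduce) FileInputFormat
public class MyInputFormat extends FileInputFormat<LongWritable, Text> {
  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context) throws IOException {
    return new LineRecordReader();   // placeholder reader; replace with your own
  }
}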
Re: Decommission datanode - no response
I filed this issue at: https://issues.apache.org/jira/browse/HDFS-4959 On Fri, Jul 5, 2013 at 1:06 PM, Azuryy Yu azury...@gmail.com wrote: The client has no connection problem. On Fri, Jul 5, 2013 at 12:46 PM, Devaraj k devara...@huawei.com wrote: And also could you check whether the client is connecting to the NameNode, or whether there is any failure in connecting to the NN. Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 09:15 To: user@hadoop.apache.org Subject: Re: Decommission datanode - no response I added dfs.hosts.exclude before the NN started, and I updated /usr/local/hadoop/conf/dfs_exclude with new hosts, but it doesn't decommission. On Fri, Jul 5, 2013 at 11:39 AM, Devaraj k devara...@huawei.com wrote: When did you add this configuration in the NN conf? <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> If you have added this configuration after starting the NN, it won't take effect and you need to restart the NN. If you have added this config with the exclude file before the NN start, you can update the file with new hosts and issue the refreshNodes command, and then the newly added DNs will be decommissioned. Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:48 To: user@hadoop.apache.org Subject: Re: Decommission datanode - no response Thanks Devaraj, There are no related logs in the NN log or the DN log. On Fri, Jul 5, 2013 at 11:14 AM, Devaraj k devara...@huawei.com wrote: Do you see any log related to this in the Name Node logs when you issue the refreshNodes dfsadmin command? Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:12 To: user@hadoop.apache.org Subject: Decommission datanode - no response Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
Re: New to hadoop - which version to start with ?
hadoop fs -chmod -R 755 \tmp\hadoop-Sudhir\mapred\staging Then it should work. On Sat, Jul 6, 2013 at 1:27 PM, sudhir543-...@yahoo.com sudhir543-...@yahoo.com wrote: I am new to Hadoop, I just started reading 'Hadoop: The Definitive Guide'. I downloaded Hadoop 1.1.2 and tried to run a sample MapReduce job using Cygwin, but I got the following error: java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Sudhir\mapred\staging\Sudhir-1267269654\.staging to 0700 I read that some specific versions have this error. Can someone suggest which version I should start with (one that supports the MapReduce 1 runtime (Classic))? Thanks Sudhir
Re: 【dfs.namenode.shared.edits.dir can support different NameServices】
Hi Bing, HA does not conflict with HDFS federation. For example, if you have two name services, cluster1 and cluster2, then: <property> <name>dfs.namenode.shared.edits.dir</name> <value>qjournal://n1.com:8485;n2.com:8485/cluster1</value> </property> <property> <name>dfs.namenode.shared.edits.dir</name> <value>qjournal://n1.com:8485;n2.com:8485/cluster2</value> </property> On Thu, Jul 4, 2013 at 2:46 PM, Bing Jiang jiangbinglo...@gmail.com wrote: hi, Chris. I have traced the source code, and I find this issue comes from sbin/start-dfs.sh: SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir 2>&-) If I set the suffix on dfs.namenode.shared.edits.dir.[namespace id].[nn id], it will get null. So please take into consideration HA on multiple nameservices, and please change the way JournalNodes are launched in the shell scripts (sbin/start-dfs.sh). Thanks. 2013/7/4 Chris Nauroth cnaur...@hortonworks.com Hello Bing, I have not tested this configuration myself, but from reading the code it does appear that dfs.namenode.shared.edits.dir supports appending the nameservice ID to the key, so that you can specify different directories for different federated namenodes in a single hdfs-site.xml. https://github.com/apache/hadoop-common/blob/branch-2.0.5-alpha/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L147 Hope this helps, Chris Nauroth Hortonworks http://hortonworks.com/ On Wed, Jul 3, 2013 at 6:21 AM, Bing Jiang jiangbinglo...@gmail.com wrote: hi, all. Using hadoop-2.0.5-alpha, I meet a problem: I need to configure a different dfs.namenode.shared.edits.dir for each namespace. Could the dfs.namenode.shared.edits.dir item support multiple nameservices, to avoid maintaining multiple conf/hdfs-site.xml files on the different Namenodes? Thanks~ -- Bing Jiang Tel:(86)134-2619-1361 weibo: http://weibo.com/jiangbinglover BLOG: http://blog.sina.com.cn/jiangbinglover National Research Center for Intelligent Computing Systems Institute of Computing technology Graduate University of Chinese Academy of Science -- Bing Jiang Tel:(86)134-2619-1361 weibo: http://weibo.com/jiangbinglover BLOG: http://blog.sina.com.cn/jiangbinglover National Research Center for Intelligent Computing Systems Institute of Computing technology Graduate University of Chinese Academy of Science
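A hedged sketch of the per-nameservice form that Chris's reply describes (suffixing the nameservice ID onto the key), using the hosts and service names from the example above; note Bing's follow-up that sbin/start-dfs.sh did not resolve this suffixed form at the time:
<property>
  <name>dfs.namenode.shared.edits.dir.cluster1</name>
  <value>qjournal://n1.com:8485;n2.com:8485/cluster1</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir.cluster2</name>
  <value>qjournal://n1.com:8485;n2.com:8485/cluster2</value>
</property>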
Re: Datanode support different Namespace
This is because you don't use the same clusterID; all datanodes and namenodes should use the same clusterID. On Thu, Jul 4, 2013 at 3:12 PM, Bing Jiang jiangbinglo...@gmail.com wrote: Hi, all We try to use hadoop-2.0.5-alpha with two namespaces, one for the HBase cluster and the other for common use. At the same time, we use the Quorum Journal policy for HA. GS-CIX-SEV0001, GS-CIX-SEV0002: namenodes in the hbasecluster namespace. GS-CIX-SEV0003, GS-CIX-SEV0004: namenodes in the commoncluster namespace. GS-CIX-SEV0001~GS-CIX-SEV0008: 8 machines used as datanodes. After launching the whole HDFS cluster, something confuses me: each namespace has only half of the datanodes.
NameNode 'GS-CIX-SEV0004:9100' - Started: Thu Jul 04 10:28:00 CST 2013, Version: 2.0.5-alpha, 1488459, Compiled: 2013-06-01T04:05Z by jenkins from branch-2.0.5-alpha, Cluster ID: CID-15c48d78-2137-4c6e-aacf-0edbf2bb3db7, Block Pool ID: BP-1792015895-10.100.2.3-1372904504940. Live Datanodes: 4
  GS-CIX-SEV0001  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 116.04 GB, remaining 772.03 GB (86.93%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0002  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 135.50 GB, remaining 752.57 GB (84.74%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0005  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 97.61 GB, remaining 790.46 GB (89.01%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0006  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 122.30 GB, remaining 765.77 GB (86.23%), 0 blocks, 0 failed volumes
The other namespace's NameNode:
NameNode 'GS-CIX-SEV0001:9100' - Started: Thu Jul 04 10:19:03 CST 2013, Version: 2.0.5-alpha, 1488459, Compiled: 2013-06-01T04:05Z by jenkins from branch-2.0.5-alpha, Cluster ID: CID-1a53483d-000e-4726-aef1-f500bedb1df6, Block Pool ID: BP-1142418822-10.100.2.1-1372904314309. Live Datanodes: 4
  GS-CIX-SEV0003  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 150.54 GB, remaining 737.53 GB (83.05%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0004  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 177.22 GB, remaining 710.85 GB (80.04%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0007  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 62.91 GB, remaining 825.16 GB (92.92%), 0 blocks, 0 failed volumes
  GS-CIX-SEV0008  In Service  capacity 888.07 GB, used 0.00 GB, non-DFS used 125.25 GB, remaining 762.82 GB (85.90%), 0 blocks, 0 failed volumes
And checking the DN (GS-CIX-SEV0001)'s log, it prints: 2013-07-04 10:34:51,699 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1142418822-10.100.2.1-1372904314309 (storage id DS-1677272131-10.100.2.1-50010-1372905291690) service to GS-CIX-SEV0001/10.100.2.1:9100 java.io.IOException: Inconsistent storage IDs. Name-node returned DS811369792. Expecting DS-1677272131-10.100.2.1-50010-1372905291690 at org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:731) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:308) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:632) at
Re: Datanode support different Namespace
In addition: if these are two new clusters, then on each namenode run hdfs namenode -format -clusterID yourID. But if you want to upgrade these two clusters from non-HA to HA, then use bin/start-dfs.sh -upgrade -clusterID yourID. On Thu, Jul 4, 2013 at 3:14 PM, Azuryy Yu azury...@gmail.com wrote: This is because you don't use the same clusterID; all datanodes and namenodes should use the same clusterID. On Thu, Jul 4, 2013 at 3:12 PM, Bing Jiang jiangbinglo...@gmail.com wrote: Hi, all We try to use hadoop-2.0.5-alpha with two namespaces, one for the HBase cluster and the other for common use. At the same time, we use the Quorum Journal policy for HA. GS-CIX-SEV0001, GS-CIX-SEV0002: namenodes in the hbasecluster namespace. GS-CIX-SEV0003, GS-CIX-SEV0004: namenodes in the commoncluster namespace. GS-CIX-SEV0001~GS-CIX-SEV0008: 8 machines used as datanodes. After launching the whole HDFS cluster, something confuses me: each namespace has only half of the datanodes. [NameNode web UI status for both namespaces, as quoted in the original message above.] And checking the DN (GS-CIX-SEV0001)'s log, it prints: 2013-07-04 10:34:51,699 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1142418822-10.100.2.1-1372904314309 (storage id DS-1677272131-10.100.2.1-50010-1372905291690) service to GS-CIX-SEV0001/10.100.2.1:9100 java.io.IOException: Inconsistent storage IDs. Name-node returned DS811369792. Expecting DS-1677272131-10.100.2.1-50010-1372905291690 at org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:731
Re: Datanode support different Namespace
It's random. On Jul 4, 2013 3:33 PM, Bing Jiang jiangbinglo...@gmail.com wrote: If the cluster ID is not set when formatting the Namenode, is there a policy in HDFS to guarantee an even distribution of datanodes across the different namespaces, or is it just random? 2013/7/4 Azuryy Yu azury...@gmail.com In addition: if these are two new clusters, then on each namenode run hdfs namenode -format -clusterID yourID. But if you want to upgrade these two clusters from non-HA to HA, then use bin/start-dfs.sh -upgrade -clusterID yourID. On Thu, Jul 4, 2013 at 3:14 PM, Azuryy Yu azury...@gmail.com wrote: This is because you don't use the same clusterID; all datanodes and namenodes should use the same clusterID. On Thu, Jul 4, 2013 at 3:12 PM, Bing Jiang jiangbinglo...@gmail.com wrote: Hi, all We try to use hadoop-2.0.5-alpha with two namespaces, one for the HBase cluster and the other for common use. At the same time, we use the Quorum Journal policy for HA. GS-CIX-SEV0001, GS-CIX-SEV0002: namenodes in the hbasecluster namespace. GS-CIX-SEV0003, GS-CIX-SEV0004: namenodes in the commoncluster namespace. GS-CIX-SEV0001~GS-CIX-SEV0008: 8 machines used as datanodes. After launching the whole HDFS cluster, something confuses me: each namespace has only half of the datanodes. [NameNode web UI status for both namespaces, as quoted in the original message above.] And checking the DN (GS-CIX-SEV0001)'s log, it prints: 2013-07-04 10:34:51,699 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1142418822-10.100.2.1-1372904314309 (storage id DS-1677272131-10.100.2.1-50010-1372905291690
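A hedged sketch of the formatting step described above, so that all namenodes (and therefore all datanodes) end up under one cluster ID; the ID string is an arbitrary example:
# on the namenodes of the first nameservice (e.g. GS-CIX-SEV0001/0002):
hdfs namenode -format -clusterID my-federated-cluster
# on the namenodes of the second nameservice (e.g. GS-CIX-SEV0003/0004), reuse the exact same ID:
hdfs namenode -format -clusterID my-federated-cluster
# datanodes that then register with either nameservice join the same cluster instead of splitting in half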
Decommission datanode - no response
Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
Re: Decommission datanode - no response
Thanks Devaraj, There are no related logs in the NN log or the DN log. On Fri, Jul 5, 2013 at 11:14 AM, Devaraj k devara...@huawei.com wrote: Do you see any log related to this in the Name Node logs when you issue the refreshNodes dfsadmin command? Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:12 To: user@hadoop.apache.org Subject: Decommission datanode - no response Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
Re: reply: Decommission datanode - no response
It's been 20 minutes since I ran -refreshNodes, but there are no decommissioning nodes shown on the UI, and I cannot find any hints in the NN and DN logs. On Fri, Jul 5, 2013 at 11:16 AM, Francis.Hu francis...@reachjunction.com wrote: I know the default value is 10 minutes and 30 seconds for switching datanodes from live to dead. From: Azuryy Yu [mailto:azury...@gmail.com] Sent: Friday, July 05, 2013 10:42 To: user@hadoop.apache.org Subject: Decommission datanode - no response Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
Re: Decommission datanode - no response
I added dfs.hosts.exclude before the NN started, and I updated /usr/local/hadoop/conf/dfs_exclude with new hosts, but it doesn't decommission. On Fri, Jul 5, 2013 at 11:39 AM, Devaraj k devara...@huawei.com wrote: When did you add this configuration in the NN conf? <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> If you have added this configuration after starting the NN, it won't take effect and you need to restart the NN. If you have added this config with the exclude file before the NN start, you can update the file with new hosts and issue the refreshNodes command, and then the newly added DNs will be decommissioned. Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:48 To: user@hadoop.apache.org Subject: Re: Decommission datanode - no response Thanks Devaraj, There are no related logs in the NN log or the DN log. On Fri, Jul 5, 2013 at 11:14 AM, Devaraj k devara...@huawei.com wrote: Do you see any log related to this in the Name Node logs when you issue the refreshNodes dfsadmin command? Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:12 To: user@hadoop.apache.org Subject: Decommission datanode - no response Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
Re: Decommission datanode - no response
The client has no connection problem. On Fri, Jul 5, 2013 at 12:46 PM, Devaraj k devara...@huawei.com wrote: And also could you check whether the client is connecting to the NameNode, or whether there is any failure in connecting to the NN. Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 09:15 To: user@hadoop.apache.org Subject: Re: Decommission datanode - no response I added dfs.hosts.exclude before the NN started, and I updated /usr/local/hadoop/conf/dfs_exclude with new hosts, but it doesn't decommission. On Fri, Jul 5, 2013 at 11:39 AM, Devaraj k devara...@huawei.com wrote: When did you add this configuration in the NN conf? <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> If you have added this configuration after starting the NN, it won't take effect and you need to restart the NN. If you have added this config with the exclude file before the NN start, you can update the file with new hosts and issue the refreshNodes command, and then the newly added DNs will be decommissioned. Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:48 To: user@hadoop.apache.org Subject: Re: Decommission datanode - no response Thanks Devaraj, There are no related logs in the NN log or the DN log. On Fri, Jul 5, 2013 at 11:14 AM, Devaraj k devara...@huawei.com wrote: Do you see any log related to this in the Name Node logs when you issue the refreshNodes dfsadmin command? Thanks Devaraj k From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 05 July 2013 08:12 To: user@hadoop.apache.org Subject: Decommission datanode - no response Hi, I am using hadoop-2.0.5-alpha, and I added 5 datanodes into dfs_exclude. hdfs-site.xml: <property> <name>dfs.hosts.exclude</name> <value>/usr/local/hadoop/conf/dfs_exclude</value> </property> then: hdfs dfsadmin -refreshNodes but there are no decommissioning nodes shown on the web UI, and no related logs in the datanode log. What's wrong?
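Pulling the advice in this thread together, a hedged sketch of the usual decommission flow (the exclude-file path matches the thread; the hostname is a placeholder):
# dfs.hosts.exclude must already be in hdfs-site.xml when the NN starts (see Devaraj's note above)
echo "dn-host-01" >> /usr/local/hadoop/conf/dfs_exclude
hdfs dfsadmin -refreshNodes
# the affected nodes should then report "Decommission In Progress" on the NN web UI, or in:
hdfs dfsadmin -report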
Re: Could not get additional block while writing hundreds of files
Hi Manuel, 2013-07-03 15:03:16,427 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 3 2013-07-03 15:03:16,427 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1 This indicates you haven't enough space on the HDFS. can you check the cluster capacity used? On Thu, Jul 4, 2013 at 12:14 AM, Manuel de Ferran manuel.defer...@gmail.com wrote: Greetings all, we try to import data to an HDFS cluster, but we face random Exception. We try to figure out what is the root cause: misconfiguration, too much load, ... and how to solve that. The client writes hundred of files with a replication factor of 3. It crashes sometimes at the beginning, sometimes close to the end, and in rare case it succeeds. On failure, we have on client side: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696) which seems to be well known. We have followed the hints from the Troubleshooting page, but we're still stuck: lots of disk available on datanodes, free inodes, far below the open files limit , all datanodes are up and running. Note that we have other HDFS clients that are still able to write files while import is running. Here is the corresponding extract of the namenode log file: 2013-07-03 15:03:15,951 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 46009 Total time for transactions(ms): 153Number of transactions batched in Syncs: 5428 Number of syncs: 32889 SyncTimes(ms): 139555 2013-07-03 15:03:16,427 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 3 2013-07-03 15:03:16,427 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1 2013-07-03 15:03:16,427 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9002, call addBlock(/log/1372863795616, DFSClient_1875494617, null) from 192.168.1.141:41376: error: java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1 java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) During the process, fsck reports about 300 of open files. The cluster is running hadoop-1.0.3. Any advice about the configuration ? 
We tried lowering dfs.heartbeat.interval and we raised dfs.datanode.max.xcievers to 4k; maybe raise dfs.datanode.handler.count? Thanks for your help
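A quick, hedged way to check what the first reply asks about (overall capacity and per-datanode free space) on a 1.0.x cluster like this one:
# cluster-wide and per-node capacity, DFS used and non-DFS used
hadoop dfsadmin -report
# the files still open for write that fsck reported (~300 during the import)
hadoop fsck / -openforwrite
If the datanodes really do have free space, the remaining suspects raised in the thread are replica placement and the xciever/handler limits rather than raw capacity.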
Re: data loss after cluster wide power loss
Hi Uma, I think there is minimum performance degration if set dfs.datanode.synconclose to true. On Tue, Jul 2, 2013 at 3:31 PM, Uma Maheswara Rao G mahesw...@huawei.comwrote: Hi Dave, Looks like your analysis is correct. I have faced similar issue some time back. See the discussion link: http://markmail.org/message/ruev3aa4x5zh2l4w#query:+page:1+mid:33gcdcu3coodkks3+state:results On sudden restarts, it can lost the OS filesystem edits. Similar thing happened in our case, i.e, after restart blocks were moved back to BeingWritten directory even though they were finalized. After restart they were marked as corrupt. You could set dfs.datanode.synconclose to true to avoid this sort of things, but that will degrade performance. Regards, Uma -Original Message- From: ddlat...@gmail.com [mailto:ddlat...@gmail.com] On Behalf Of Dave Latham Sent: 01 July 2013 16:08 To: hdfs-u...@hadoop.apache.org Cc: hdfs-...@hadoop.apache.org Subject: Re: data loss after cluster wide power loss Much appreciated, Suresh. Let me know if I can provide any more information or if you'd like me to open a JIRA. Dave On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas sur...@hortonworks.com wrote: Dave, Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not related to data loss related to HBase log and hsync. I think you are right; the rename operation itself might not have hit the disk. I think we should either ensure metadata operation is synced on the datanode or handle it being reported as blockBeingWritten. Let me spend sometime to debug this issue. One surprising thing is, all the replicas were reported as blockBeingWritten. Regards, Suresh On Mon, Jul 1, 2013 at 6:03 PM, Dave Latham lat...@davelink.net wrote: (Removing hbase list and adding hdfs-dev list as this is pretty internal stuff). Reading through the code a bit: FSDataOutputStream.close calls DFSOutputStream.close calls DFSOutputStream.closeInternal - sets currentPacket.lastPacketInBlock = true - then calls DFSOutputStream.flushInternal - enqueues current packet - waits for ack BlockReceiver.run - if (lastPacketInBlock !receiver.finalized) calls FSDataset.finalizeBlock calls FSDataset.finalizeBlockInternal calls FSVolume.addBlock calls FSDir.addBlock calls FSDir.addBlock - renames block from blocksBeingWritten tmp dir to current dest dir This looks to me as I would expect a synchronous chain from a DFS client to moving the file from blocksBeingWritten to the current dir so that once the file is closed that it the block files would be in the proper directory - even if the contents of the file are still in the OS buffer rather than synced to disk. It's only after this moving of blocks that NameNode.complete file is called. There are several conditions and loops in there that I'm not certain this chain is fully reliable in all cases without a greater understanding of the code. Could it be the case that the rename operation itself is not synced and that ext3 lost the fact that the block files were moved? Or is there a bug in the close file logic that for some reason the block files are not always moved into place when a file is closed? Thanks for your patience, Dave On Mon, Jul 1, 2013 at 3:35 PM, Dave Latham lat...@davelink.net wrote: Thanks for the response, Suresh. I'm not sure that I understand the details properly. From my reading of HDFS-744 the hsync API would allow a client to make sure that at any point in time it's writes so far hit the disk. 
For example, for HBase it could apply a fsync after adding some edits to its WAL to ensure those edits are fully durable for a file which is still open. However, in this case the dfs file was closed and even renamed. Is it the case that even after a dfs file is closed and renamed that the data blocks would still not be synced and would still be stored by the datanode in blocksBeingWritten rather than in current? If that is case, would it be better for the NameNode not to reject replicas that are in blocksBeingWritten, especially if it doesn't have any other replicas available? Dave On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas sur...@hortonworks.comwrote: Yes this is a known issue. The HDFS part of this was addressed in https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not available in 1.x release. I think HBase does not use this API yet. On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham lat...@davelink.net wrote: We're running HBase over HDFS 1.0.2 on about 1000 nodes. On Saturday the data center we were in had a total power failure and the cluster went down hard. When we brought it back up, HDFS reported 4 files as CORRUPT. We recovered the data in question
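A hedged config sketch of the setting Uma mentions earlier in this thread (sync block files to disk when they are closed, trading some write performance for durability on power loss):
<!-- hdfs-site.xml on the datanodes -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>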
What's Yarn?
Hi all, I just found this by chance; maybe you all know it already, but I'll share it here again. Yet Another Resource Negotiator (YARN) from: http://adtmag.com/blogs/watersworks/2012/08/apache-yarn-promotion.aspx
Re: reply: a question about dfs.replication
It's not an HDFS issue. dfs.replication is a client-side configuration, not server-side, so you need to set it to '2' on your client side (where your application runs), and THEN execute a command such as hdfs dfs -put, or call the HDFS API from your Java application. On Tue, Jul 2, 2013 at 12:25 PM, Francis.Hu francis...@reachjunction.com wrote: Thanks all of you, I just got the problem fixed through the command: hdfs dfs -setrep -R -w 2 / Is that an issue of HDFS? Why do I need to manually execute a command to tell Hadoop the replication factor even though it is set in hdfs-site.xml? Thanks, Francis.Hu From: Francis.Hu [mailto:francis...@reachjunction.com] Sent: Tuesday, July 02, 2013 11:30 To: user@hadoop.apache.org Subject: Re: a question about dfs.replication Yes, it returns 2 correctly after hdfs getconf -confkey dfs.replication, but in the web page it is 3, as below. From: yypvsxf19870706 [mailto:yypvsxf19870...@gmail.com] Sent: Monday, July 01, 2013 23:24 To: user@hadoop.apache.org Subject: Re: a question about dfs.replication Hi, Could you please get the property value by using: hdfs getconf -confkey dfs.replication. iPhone On 2013-7-1, 15:51, Francis.Hu francis...@reachjunction.com wrote: Actually, my Java client is running with the same configuration as Hadoop's. dfs.replication is already set to 2 in my Hadoop configuration, so I think dfs.replication is already overridden by my configuration in hdfs-site.xml, but it seems it doesn't work even though I evidently overrode the parameter. From: emelya...@post.km.ru [mailto:emelya...@post.km.ru] Sent: Monday, July 01, 2013 15:18 To: user@hadoop.apache.org Subject: Re: a question about dfs.replication On 01.07.2013 10:19, Francis.Hu wrote: Hi, All I am installing a cluster with Hadoop 2.0.5-alpha. I have one namenode and two datanodes. dfs.replication is set to 2 in hdfs-site.xml. After all configuration work was done, I started all nodes. Then I saved a file into HDFS through the Java client. Now I can access the HDFS web page x.x.x.x:50070 and also see the file already listed in the HDFS listing. My question is: the replication column in the HDFS web page shows 3, not 2. Does anyone know what the problem is? ---Actual setting of hdfs-site.xml: <property> <name>dfs.replication</name> <value>2</value> </property> After that, I typed a dfsadmin command to check the file: hdfs fsck /test3/ The result of the above command: /test3/hello005.txt: Under replicated BP-609310498-192.168.219.129-1372323727200:blk_-1069303317294683372_1006. Target Replicas is 3 but found 2 replica(s). Status: HEALTHY Total size: 35 B Total dirs: 1 Total files: 1 Total blocks (validated): 1 (avg. block size 35 B) Minimally replicated blocks: 1 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 1 (100.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 2 Average block replication: 2.0 Corrupt blocks: 0 Missing replicas: 1 (33.32 %) Number of data-nodes: 3 Number of racks: 1 FSCK ended at Sat Jun 29 16:51:37 CST 2013 in 6 milliseconds Thanks, Francis Hu If I'm not mistaken, the dfs.replication parameter in the config sets only the default replication factor, which can be overridden when putting a file to HDFS.
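A small sketch of the client-side override described in the first reply (it applies to files written after the change; files already created at 3 still need setrep, as Francis found); the file paths are placeholders:
# set replication at write time from the shell client
hadoop fs -D dfs.replication=2 -put local.txt /test3/local.txt
# or fix up files that were already created with the default of 3
hdfs dfs -setrep -w 2 /test3/hello005.txt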
Re: java.lang.UnsatisfiedLinkError - Unable to load libGfarmFSNative library
From the log: libGfarmFSNative.so: libgfarm.so.1: cannot open shared object file: No such file or directory I don't think you put libgfarm.* under $HADOOP_HOME/lib/native/Linux-amd64-64 (Linux-i386-32 if running on 32 bits OS) on all nodes. On Thu, Jun 27, 2013 at 10:44 AM, Harsh J ha...@cloudera.com wrote: Is libgfarm.so.1 installed and available on all systems? You're facing a link error though hadoop did try to load the library it had ( libGfarmFSNative.so). If the gfarm guys have a mailing list, thats probably the best place to ask. On Thu, Jun 27, 2013 at 1:06 AM, Marília Melo mariliam...@gmail.comwrote: Hi all, I'm trying to install a plugin called gfarm_hadoop that allows me to use a filesystem called gfarm instead of HDFS ( https://sourceforge.net/projects/gfarm/files/gfarm_hadoop/). I have used it before, but now I'm trying to install it in a new cluster and for some reason it isn't working... After installing gfarm 2.5.8 at /data/local3/marilia/gfarm, hadoop 1.1.2 at /data/local3/marilia/hadoop-1.1.2 and the plugin, when I try to list the new filesystem it works fine: $ bin/hadoop fs -ls gfarm:/// Found 26 items -rwxrwxrwx 1101 2013-06-26 02:36 /foo drwxrwxrwx - 0 2013-06-26 02:43 /home But then when I try to run an example, the task eventually completes, but I get Unable to load libGfarmFSNative library errors. Looking at the logs message it seems to be a path problem, but I have tried almost everything and it doesn't work. The way I'm setting the path now is writing on conf/hadoop-env.sh the following line: export LD_LIBRARY_PATH=/data/local3/marilia/gfarm/lib I have even moved all the .so files to the hadoop directory, but I still get the same message... Any ideas? Thanks in advance. Log: $ bin/hadoop jar hadoop-examples-*.jar teragen 1000 gfarm:///inoa11 Generating 1000 using 2 maps with step of 500 13/06/27 03:57:32 INFO mapred.JobClient: Running job: job_201306270356_0001 13/06/27 03:57:33 INFO mapred.JobClient: map 0% reduce 0% 13/06/27 03:57:38 INFO mapred.JobClient: map 50% reduce 0% 13/06/27 03:57:43 INFO mapred.JobClient: Task Id : attempt_201306270356_0001_m_01_0, Status : FAILED java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused by: java.io.IOException: Task process exit with nonzero status of 1. 
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258) attempt_201306270356_0001_m_01_0: java.lang.UnsatisfiedLinkError: /data/local3/marilia/hadoop-1.1.2/lib/native/Linux-amd64-64/libGfarmFSNative.so: libgfarm.so.1: cannot open shared object file: No such file or directory attempt_201306270356_0001_m_01_0: at java.lang.ClassLoader$NativeLibrary.load(Native Method) attempt_201306270356_0001_m_01_0: at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1803) attempt_201306270356_0001_m_01_0: at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1728) attempt_201306270356_0001_m_01_0: at java.lang.Runtime.loadLibrary0(Runtime.java:823) attempt_201306270356_0001_m_01_0: at java.lang.System.loadLibrary(System.java:1028) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.gfarmfs.GfarmFSNative.clinit(GfarmFSNative.java:9) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.gfarmfs.GfarmFileSystem.initialize(GfarmFileSystem.java:34) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1411) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1429) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.mapred.FileOutputCommitter.getTempTaskOutputPath(FileOutputCommitter.java:234) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.mapred.Task.initialize(Task.java:522) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353) attempt_201306270356_0001_m_01_0: at org.apache.hadoop.mapred.Child$4.run(Child.java:255) attempt_201306270356_0001_m_01_0: at java.security.AccessController.doPrivileged(Native Method) attempt_201306270356_0001_m_01_0: at javax.security.auth.Subject.doAs(Subject.java:396) attempt_201306270356_0001_m_01_0: at
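A hedged sketch of the fix suggested in the first reply, with the paths taken from this thread (adjust per node and architecture):
# libGfarmFSNative.so links against libgfarm.so.1, so that library must be
# loadable on every node, e.g. next to the plugin under the native lib dir:
cp /data/local3/marilia/gfarm/lib/libgfarm.so* \
   /data/local3/marilia/hadoop-1.1.2/lib/native/Linux-amd64-64/
# repeat (or rsync) on all nodes, then restart the TaskTrackers so child tasks see it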
Re: reply: Help about build cluster on boxes which already has one?
There is no MN; NM is the NodeManager. --Sent from my Sony mobile. On Jun 26, 2013 6:31 AM, yuhe justlo...@gmail.com wrote: I plan to use CDH3u4, and what is MN? -- Sent via 语盒 @ 2013-06-25 22:36 http://www.yuchs.com -- Original message -- user@hadoop.apache.org @ 2013-06-25 15:12 What version of Hadoop are you planning on using? You will probably have to partition the resources too, e.g. if you are using 0.23 / 2.0, the NM's available resource memory will have to be split across all the nodes. From: Sandeep L sandeepvre...@outlook.com To: user@hadoop.apache.org Sent: Tuesday, June 25, 2013 3:53 AM Subject: RE: reply: Help about build cluster on boxes which already has one? Just try changing the ports; then if you get any errors, reply and I can help you. Date: Tue, 25 Jun 2013 10:57:06 + From: justlo...@gmail.com To: user@hadoop.apache.org Subject: reply: Help about build cluster on boxes which already has one? Thanks, anything else? -- Sent via 语盒 @ 2013-06-25 10:57 http://www.yuchs.com -- Original message -- user@hadoop.apache.org @ 2013-06-25 10:50 Port numbers should not conflict. You need to change the port numbers in all configuration files (hbase-*.xml, hdfs-*.xml, core-*.xml, mapred-*.xml). Date: Tue, 25 Jun 2013 10:26:47 + From: justlo...@gmail.com To: user@hadoop.apache.org Subject: Help about build cluster on boxes which already has one? Today I got a task to build a Hadoop and HBase cluster on boxes which already have one Hadoop cluster on them (using the default ports). What things do I need to change so I can deploy the new one successfully? I am a newbie to Hadoop and HBase, and have successfully built Hadoop and HBase on VMware. Thanks all -- Sent via 语盒 @ 2013-06-25 10:26 http://www.yuchs.com
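A hedged, partial list of the port-bearing properties the advice above refers to for a CDH3/1.x-era stack; the values shown are only the usual defaults that the second cluster would have to move away from:
fs.default.name                      hdfs://host:8020     (core-site.xml)
dfs.http.address                     0.0.0.0:50070        (hdfs-site.xml)
dfs.datanode.address                 0.0.0.0:50010
dfs.datanode.http.address            0.0.0.0:50075
dfs.datanode.ipc.address             0.0.0.0:50020
mapred.job.tracker                   host:8021            (mapred-site.xml)
mapred.job.tracker.http.address      0.0.0.0:50030
mapred.task.tracker.http.address     0.0.0.0:50060
hbase.master.info.port               60010                (hbase-site.xml)
hbase.regionserver.port              60020
hbase.zookeeper.property.clientPort  2181
Each of these (plus the data/log directories and hadoop.tmp.dir) needs a distinct value in the second cluster's config.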
Re: MapReduce job not running - I think I have all the correct configuration.
Can you paste some error logs here? You can find them on the JT or TT. And tell us the Hadoop version. On Sun, Jun 23, 2013 at 9:20 PM, Pavan Kumar Polineni smartsunny...@gmail.com wrote: Hi all, first I had a machine with all the daemons running on it. After that I added two data nodes. In this case the MR job worked fine. Now I changed the first machine to just a namenode by stopping all the daemons except the NN daemon, and changed one data node to (SNN, JT, DN, TT), and all are working. I kept the other data node as it was. I changed the configurations to link up the NN and JT. From here, when I tried to run an MR job it is not running. Please help me. Thanks -- Pavan Kumar Polineni
Re: which Hadoop version can I choose in a production env?
I advise the community version of Hadoop 1.1.2, which is a stable release; Hadoop 2 has no stable release yet, even though the alpha releases have been extensively tested. That said, I think HDFS2 is stable now (no?), and MR1 is also stable, but YARN still needs extensive testing (at least I think so), so our prod cluster is running HDFS2 with MR1 now. On Tue, Jun 25, 2013 at 10:11 AM, ch huang justlo...@gmail.com wrote: hadoop 1 vs hadoop 2, and apache community version vs Cloudera version, anyone can help?
Re: Inputformat
You would have to write a JSONInputFormat, or google first to find one. --Send from my Sony mobile. On Jun 23, 2013 7:06 AM, jamal sasha jamalsha...@gmail.com wrote: Then how should I approach this issue? On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes ni...@basjes.nl wrote: If you try to hammer in a nail (json file) with a screwdriver (XMLInputReader) then perhaps the reason it won't work may be that you are using the wrong tool? On Jun 21, 2013 11:38 PM, jamal sasha jamalsha...@gmail.com wrote: Hi, I am using one of the libraries which rely on InputFormat. Right now, it is reading xml files spanning across multiple lines. So currently the input format is like: public class XMLInputReader extends FileInputFormat<LongWritable, Text> { public static final String START_TAG = "<page>"; public static final String END_TAG = "</page>"; @Override public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf conf, Reporter reporter) throws IOException { conf.set(XMLInputFormat.START_TAG_KEY, START_TAG); conf.set(XMLInputFormat.END_TAG_KEY, END_TAG); return new XMLRecordReader((FileSplit) split, conf); } } So, in the above, if the data is like: <page> something \n something \n </page> it processes this sort of data. Now, I want to use the same framework but for json files spanning just a single line. So I guess my START_TAG can be "{". Will my END_TAG be "}\n"? It can't be "}" as there can be nested json in this data. Any clues? Thanks
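If every JSON record fits on a single line, one workable approach is to skip the XML-style record reader entirely: keep TextInputFormat and parse each line in the mapper. A minimal sketch, assuming a JSON parser such as Jackson is on the job's classpath (the "id" field name is made up for the example):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JsonLineMapper extends Mapper<LongWritable, Text, Text, Text> {
      private final ObjectMapper jsonParser = new ObjectMapper();

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // each input line is assumed to hold one complete JSON object, nested or not
        JsonNode record = jsonParser.readTree(line.toString());
        if (record.has("id")) {
          context.write(new Text(record.get("id").asText()), line);
        }
      }
    }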
Re: How to fail the Name Node or how to crash the Name Node for testing purposes.
$HADOOP_HOME/bin/hadoop-daemon.sh stop namenode On Wed, Jun 19, 2013 at 2:38 PM, Pavan Kumar Polineni smartsunny...@gmail.com wrote: For testing Name Node crashes and failures, for the single point of failure. -- Pavan Kumar Polineni
Re: how to close hadoop when tmp files were cleared
ps aux|grep java, you can find the pid, then just 'kill -9' to stop the Hadoop process. On Mon, Jun 17, 2013 at 4:34 PM, Harsh J ha...@cloudera.com wrote: Just send the processes a SIGTERM signal (regular kill). It's what the script does anyway. Ensure to change the PID directory before the next restart though. On Mon, Jun 17, 2013 at 1:09 PM, zhang.hen...@zte.com.cn wrote: Hi, My hadoop cluster has been running for a period of time. Now I want to close it for some system changes. But the command bin/stop-all.sh shows no jobtracker to stop, no tasktracker to stop, no namenode to stop and no datanode to stop. I use jps and get nothing but jps itself. However, hadoop is indeed running. I think some tmp files for hadoop may have been cleared by the operating system. Could someone tell me how to stop hadoop without breaking any data files? Any guidance would be greatly appreciated. Thanks! Jeff -- Harsh J
Re: Error in command: bin/hadoop fs -put conf input
from the log, there is no room on the HDFS. --Send from my Sony mobile. On Jun 16, 2013 5:12 AM, sumit piparsania sumitpiparsa...@yahoo.com wrote: Hi, I am getting the below error while executing the command. Kindly assist me in resolving this issue. $ bin/hadoop fs -put conf input bin/hadoop: line 320: C:\Program: command not found 13/06/16 02:29:13 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 13/06/16 02:29:18 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/Sumit/input/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) at org.apache.hadoop.ipc.Client.call(Client.java:1107) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) at $Proxy1.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) at $Proxy1.addBlock(Unknown Source) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989) 13/06/16 02:29:18 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null 13/06/16 02:29:18 WARN hdfs.DFSClient: Could not get block locations. Source file /user/Sumit/input/capacity-scheduler.xml - Aborting... 
put: java.io.IOException: File /user/Sumit/input/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1 13/06/16 02:29:18 ERROR hdfs.DFSClient: Failed to close file /user/Sumit/input/capacity-scheduler.xml org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/Sumit/input/capacity-scheduler.xml could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) at org.apache.hadoop.ipc.Client.call(Client.java:1107) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) at $Proxy1.addBlock(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) at $Proxy1.addBlock(Unknown Source) at
Re: read lucene index in mapper
You need to add the Lucene index tar.gz to the distributed cache as an archive, then create the index reader in the mapper's setup. --Send from my Sony mobile. On Jun 12, 2013 12:50 AM, parnab kumar parnab.2...@gmail.com wrote: Hi, I need to read an existing Lucene index in a map. Can someone point me in the right direction? Thanks, Parnab
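A rough sketch of that pattern, assuming Lucene 3.x-style calls and a hypothetical archive path; the archive is registered in the driver and then opened from local disk in the mapper's setup:

    // driver (before submitting the job): the index tarball must already be in HDFS
    //   DistributedCache.addCacheArchive(new URI("/user/me/lucene-index.tar.gz"), job.getConfiguration());

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.FSDirectory;

    public class IndexLookupMapper extends Mapper<LongWritable, Text, Text, Text> {
      private IndexReader reader;

      @Override
      protected void setup(Context context) throws IOException {
        // the cached archive is unpacked onto each node's local disk by the framework
        Path[] archives = DistributedCache.getLocalCacheArchives(context.getConfiguration());
        reader = IndexReader.open(FSDirectory.open(new File(archives[0].toString())));
      }

      @Override
      protected void cleanup(Context context) throws IOException {
        if (reader != null) {
          reader.close();
        }
      }
    }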
Re: hadoop 2.0 client configuration
If you want to work with HA, yes, all these configurations are needed. --Send from my Sony mobile. On Jun 11, 2013 8:05 AM, Praveen M lefthandma...@gmail.com wrote: Hello, I'm a hadoop n00b, and I recently upgraded from hadoop 0.20.2 to hadoop 2 (cdh-4.2.1). For a client configuration to connect to the hadoop cluster, in the earlier 0.20.2 case I had to specify only the fs.default.name and mapred.job.tracker config parameters. But now, with the HA configuration of hadoop 2, I have a lot of parameters in my client configuration: fs.defaultFS dfs.nameservices dfs.ha.namenodes.nameservice-id dfs.namenode.rpc-address.nameservice-id.namenode-1 dfs.namenode.rpc-address.nameservice-id.namenode-2 dfs.namenode.shared.edits.dir ha.zookeeper.quorum My question is, do I really need all these configurations in the client? I'm looking for the minimal client configuration that would let me work with the hadoop cluster in HA mode. Thanks in advance. Thank you, Praveen
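For a plain HDFS client, the nameservice mapping plus a failover proxy provider is usually what matters; dfs.namenode.shared.edits.dir and ha.zookeeper.quorum are used by the NameNodes and failover controllers rather than by clients. A minimal sketch with placeholder names (in practice these would sit in the client's core-site.xml/hdfs-site.xml rather than be set in code):

    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://mycluster");
    conf.set("dfs.nameservices", "mycluster");
    conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    FileSystem fs = FileSystem.get(conf);   // the proxy provider finds whichever NameNode is active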
Re: HadoopV2 and HDFS-fuse
Hi Harsh, yes, I've built the native profile with -Pnative successfully. I also used -Drequire.fuse=true, but I just found the contrib/fuse directory is empty, so I asked this question. Thanks Harsh. --Send from my Sony mobile. On Jun 9, 2013 9:09 PM, Harsh J ha...@cloudera.com wrote: Hi Azuryy, Are you not finding it compiled with the global native compile option? Do you face a specific error? Per the pom.xml of hadoop-hdfs, it will build fuse-dfs if the native profile is turned on, and you can assert the fuse requirement with -Drequire.fuse=true. On Sun, Jun 9, 2013 at 11:03 AM, Azuryy Yu azury...@gmail.com wrote: hi, Can anybody tell me how to compile hdfs-fuse based on Hadoop-2.0-*? Thanks. -- Harsh J
HadoopV2 and HDFS-fuse
hi, Can anybody tell me how to compile hdfs-fuse based on Hadoop-2.0-*? Thanks.
Re: how to locate the replicas of a file in HDFS?
ClientProtocol namenode = DFSClient.createNamenode(conf); HdfsFileStatus hfs = namenode.getFileInfo(your_hdfs_file_name); LocatedBlocks lbs = namenode.getBlockLocations(your_hdfs_file_name, 0, hfs.getLen()); for (LocatedBlock lb : lbs.getLocatedBlocks()) { DatanodeInfo[] info = lb.getLocations() ; //you can get data node name or address here. } On Tue, Jun 4, 2013 at 2:02 PM, Mahmood Naderan nt_mahm...@yahoo.comwrote: hadoop fsck mytext.txt -files -locations -blocks I expect something like a tag which is attached to each block (say block X) that shows the position of the replicated block of X. The method you mentioned is a user level task. Am I right? Regards, Mahmood* * -- *From:* Rahul Bhattacharjee rahul.rec@gmail.com *To:* user@hadoop.apache.org user@hadoop.apache.org; 一凡 李 zhuazhua_...@yahoo.com.cn *Sent:* Tuesday, June 4, 2013 9:34 AM *Subject:* Re: how to locate the replicas of a file in HDFS? hadoop fsck mytext.txt -files -locations -blocks Thanks, Rahul On Tue, Jun 4, 2013 at 10:19 AM, 一凡 李 zhuazhua_...@yahoo.com.cn wrote: Hi, Could you tell me how to locate where store each replica of a file in HDFS? Correctly speaking, if I create a file in HDFS(replicate factor:3),how to find the DataNodes which store its each block and replicas? Best Wishes, Yifan
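The same information is also reachable through the public FileSystem API, which avoids the internal DFSClient/ClientProtocol classes; a minimal sketch (hypothetical file name, run from any method that can throw IOException):

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FileStatus status = fs.getFileStatus(new Path("/user/me/mytext.txt"));
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      // one entry per block; getHosts() lists the datanodes holding that block's replicas
      System.out.println(block.getOffset() + " " + java.util.Arrays.toString(block.getHosts()));
    }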
Re:
can you upgrade to 1.1.2, which is also a stable release, and fixed the bug you facing now. --Send from my Sony mobile. On Jun 2, 2013 3:23 AM, Shahab Yunus shahab.yu...@gmail.com wrote: Thanks Harsh for the reply. I was confused too that why security is causing this. Regards, Shahab On Sat, Jun 1, 2013 at 12:43 PM, Harsh J ha...@cloudera.com wrote: Shahab - I see he has mentioned generally that security is enabled (but not that it happens iff security is enabled), and the issue here doesn't have anything to do with security really. Azurry - Lets discuss the code issues on the JIRA (instead of here) or on the mapreduce-dev lists. On Sat, Jun 1, 2013 at 10:05 PM, Shahab Yunus shahab.yu...@gmail.com wrote: HI Harsh, Quick question though: why do you think it only happens if the OP 'uses security' as he mentioned? Regards, Shahab On Sat, Jun 1, 2013 at 11:49 AM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at the sources, this turns out to be a rather funny Java issue (there's a divide by zero happening and [1] suggests Long.MAX_VALUE return in such a case). I've logged a bug report for this at https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a reproducible case. Does this happen consistently for you? [1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double) On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I stumbled upon this problem as well while trying to run the default wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2 virtual machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One node is used as JT+NN, the other as TT+DN. Security is enabled. The input file is about 600 kB and the error is 2013-06-01 12:22:51,999 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node 10.156.120.49 has 22854692864 bytes free; but we expect map to take 9223372036854775807 The logfile is attached, together with the configuration files. The version I'm using is Hadoop 1.2.0 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2-r 1479473 Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405 This command was run using /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar If I run the default configuration (i.e. no securty), then the job succeeds. Is there something missing in how I set up my nodes? How is it possible that the envisaged value for the needed space is so big? Thanks in advance. Matteo Which version of Hadoop are you using. A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: This the content of the jobtracker log file : 2013-03-23 12:06:48,912 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201303231139_0001 = 6950001. 
Number of splits = 7 2013-03-23 12:06:48,925 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_00 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,927 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_01 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,930 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_02 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,931 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_03 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,933 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_04 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,934 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_05 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,939 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_06 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,950 INFO org.apache.hadoop.mapred.JobInProgress: job_201303231139_0001 LOCALITY_WAIT_FACTOR=0.5 2013-03-23 12:06:48,978 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201303231139_0001 initialized successfully with 7 map tasks and 1 reduce tasks. 2013-03-23 12:06:50,855 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201303231139_0001_m_08_0' to tip task_201303231139_0001_m_08, for tracker
Re:
yes. hadoop-1.1.2 was released on Jan. 31st. just download it. On Tue, Jun 4, 2013 at 6:33 AM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi Azuryy, thanks for the update. Sorry for the silly question, but where can I download the patched version? If I look into the closest mirror (i.e. http://mirror.netcologne.de/apache.org/hadoop/common/), I can see that the Hadoop 1.1.2 version was last updated on Jan. 31st. Thanks in advance, Matteo PS: just to confirm that I tried a minimal Hadoop 1.2.0 setup, so without any security, and the problem is there. On Jun 3, 2013, at 3:02 PM, Azuryy Yu azury...@gmail.com wrote: can you upgrade to 1.1.2, which is also a stable release, and fixed the bug you facing now. --Send from my Sony mobile. On Jun 2, 2013 3:23 AM, Shahab Yunus shahab.yu...@gmail.com wrote: Thanks Harsh for the reply. I was confused too that why security is causing this. Regards, Shahab On Sat, Jun 1, 2013 at 12:43 PM, Harsh J ha...@cloudera.com wrote: Shahab - I see he has mentioned generally that security is enabled (but not that it happens iff security is enabled), and the issue here doesn't have anything to do with security really. Azurry - Lets discuss the code issues on the JIRA (instead of here) or on the mapreduce-dev lists. On Sat, Jun 1, 2013 at 10:05 PM, Shahab Yunus shahab.yu...@gmail.com wrote: HI Harsh, Quick question though: why do you think it only happens if the OP 'uses security' as he mentioned? Regards, Shahab On Sat, Jun 1, 2013 at 11:49 AM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at the sources, this turns out to be a rather funny Java issue (there's a divide by zero happening and [1] suggests Long.MAX_VALUE return in such a case). I've logged a bug report for this at https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a reproducible case. Does this happen consistently for you? [1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double) On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I stumbled upon this problem as well while trying to run the default wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2 virtual machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One node is used as JT+NN, the other as TT+DN. Security is enabled. The input file is about 600 kB and the error is 2013-06-01 12:22:51,999 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node 10.156.120.49 has 22854692864 bytes free; but we expect map to take 9223372036854775807 The logfile is attached, together with the configuration files. The version I'm using is Hadoop 1.2.0 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2-r 1479473 Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405 This command was run using /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar If I run the default configuration (i.e. no securty), then the job succeeds. Is there something missing in how I set up my nodes? How is it possible that the envisaged value for the needed space is so big? Thanks in advance. Matteo Which version of Hadoop are you using. A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. 
On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: This the content of the jobtracker log file : 2013-03-23 12:06:48,912 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201303231139_0001 = 6950001. Number of splits = 7 2013-03-23 12:06:48,925 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_00 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,927 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_01 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,930 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_02 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,931 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_03 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,933 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_04 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,934 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_05 has
Re:
Hi Harsh, I need to take care my eyes recently, I mis-read 1.2.0 to 1.0.2, so I said upgrade. Sorry. On Tue, Jun 4, 2013 at 9:46 AM, Harsh J ha...@cloudera.com wrote: Azuryy, 1.1.2 1.2.0. Its not an upgrade you're suggesting there. If you feel there's been a regression, can you comment that on the JIRA? On Tue, Jun 4, 2013 at 6:57 AM, Azuryy Yu azury...@gmail.com wrote: yes. hadoop-1.1.2 was released on Jan. 31st. just download it. On Tue, Jun 4, 2013 at 6:33 AM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi Azuryy, thanks for the update. Sorry for the silly question, but where can I download the patched version? If I look into the closest mirror (i.e. http://mirror.netcologne.de/apache.org/hadoop/common/), I can see that the Hadoop 1.1.2 version was last updated on Jan. 31st. Thanks in advance, Matteo PS: just to confirm that I tried a minimal Hadoop 1.2.0 setup, so without any security, and the problem is there. On Jun 3, 2013, at 3:02 PM, Azuryy Yu azury...@gmail.com wrote: can you upgrade to 1.1.2, which is also a stable release, and fixed the bug you facing now. --Send from my Sony mobile. On Jun 2, 2013 3:23 AM, Shahab Yunus shahab.yu...@gmail.com wrote: Thanks Harsh for the reply. I was confused too that why security is causing this. Regards, Shahab On Sat, Jun 1, 2013 at 12:43 PM, Harsh J ha...@cloudera.com wrote: Shahab - I see he has mentioned generally that security is enabled (but not that it happens iff security is enabled), and the issue here doesn't have anything to do with security really. Azurry - Lets discuss the code issues on the JIRA (instead of here) or on the mapreduce-dev lists. On Sat, Jun 1, 2013 at 10:05 PM, Shahab Yunus shahab.yu...@gmail.com wrote: HI Harsh, Quick question though: why do you think it only happens if the OP 'uses security' as he mentioned? Regards, Shahab On Sat, Jun 1, 2013 at 11:49 AM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at the sources, this turns out to be a rather funny Java issue (there's a divide by zero happening and [1] suggests Long.MAX_VALUE return in such a case). I've logged a bug report for this at https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a reproducible case. Does this happen consistently for you? [1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double) On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I stumbled upon this problem as well while trying to run the default wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2 virtual machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One node is used as JT+NN, the other as TT+DN. Security is enabled. The input file is about 600 kB and the error is 2013-06-01 12:22:51,999 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node 10.156.120.49 has 22854692864 bytes free; but we expect map to take 9223372036854775807 The logfile is attached, together with the configuration files. The version I'm using is Hadoop 1.2.0 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473 Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405 This command was run using /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar If I run the default configuration (i.e. no securty), then the job succeeds. Is there something missing in how I set up my nodes? 
How is it possible that the envisaged value for the needed space is so big? Thanks in advance. Matteo Which version of Hadoop are you using. A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: This the content of the jobtracker log file : 2013-03-23 12:06:48,912 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201303231139_0001 = 6950001. Number of splits = 7 2013-03-23 12:06:48,925 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_00 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,927 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_01 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12
Re:
This should be fixed in hadoop-1.1.2 stable release. if we determine completedMapsInputSize is zero, then job's map tasks MUST be zero, so the estimated output size is zero. below is the code: long getEstimatedMapOutputSize() { long estimate = 0L; if (job.desiredMaps() 0) { estimate = getEstimatedTotalMapOutputSize() / job.desiredMaps(); } return estimate; } On Sat, Jun 1, 2013 at 11:49 PM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at the sources, this turns out to be a rather funny Java issue (there's a divide by zero happening and [1] suggests Long.MAX_VALUE return in such a case). I've logged a bug report for this at https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a reproducible case. Does this happen consistently for you? [1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double) On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I stumbled upon this problem as well while trying to run the default wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2 virtual machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One node is used as JT+NN, the other as TT+DN. Security is enabled. The input file is about 600 kB and the error is 2013-06-01 12:22:51,999 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node 10.156.120.49 has 22854692864 bytes free; but we expect map to take 9223372036854775807 The logfile is attached, together with the configuration files. The version I'm using is Hadoop 1.2.0 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473 Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405 This command was run using /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar If I run the default configuration (i.e. no securty), then the job succeeds. Is there something missing in how I set up my nodes? How is it possible that the envisaged value for the needed space is so big? Thanks in advance. Matteo Which version of Hadoop are you using. A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: This the content of the jobtracker log file : 2013-03-23 12:06:48,912 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201303231139_0001 = 6950001. 
Number of splits = 7 2013-03-23 12:06:48,925 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_00 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,927 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_01 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,930 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_02 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,931 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_03 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,933 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_04 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,934 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_05 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,939 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_06 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,950 INFO org.apache.hadoop.mapred.JobInProgress: job_201303231139_0001 LOCALITY_WAIT_FACTOR=0.5 2013-03-23 12:06:48,978 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201303231139_0001 initialized successfully with 7 map tasks and 1 reduce tasks. 2013-03-23 12:06:50,855 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP) 'attempt_201303231139_0001_m_08_0' to tip task_201303231139_0001_m_08, for tracker 'tracker_hadoop0.novalocal:hadoop0.novalocal/127.0.0.1:44879' 2013-03-23 12:08:00,340 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201303231139_0001_m_08_0' has completed task_201303231139_0001_m_08 successfully. 2013-03-23 12:08:00,538 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node hadoop0.novalocal has 8791543808 bytes free; but we expect map to take 1317624576693539401 2013-03-23 12:08:00,543 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node hadoop0.novalocal has 8791543808 bytes free; but we expect map to take 1317624576693539401 2013-03-23 12:08:00,544 WARN
Re:
just add more, continue the above thread: protected synchronized long getEstimatedTotalMapOutputSize() { if(completedMapsUpdates threshholdToUse) { return 0; } else { long inputSize = job.getInputLength() + job.desiredMaps(); //add desiredMaps() so that randomwriter case doesn't blow up //the multiplication might lead to overflow, casting it with //double prevents it long estimate = Math.round(((double)inputSize * completedMapsOutputSize * 2.0)/completedMapsInputSize); if (LOG.isDebugEnabled()) { LOG.debug(estimate total map output will be + estimate); } return estimate; } } On Sun, Jun 2, 2013 at 12:34 AM, Azuryy Yu azury...@gmail.com wrote: This should be fixed in hadoop-1.1.2 stable release. if we determine completedMapsInputSize is zero, then job's map tasks MUST be zero, so the estimated output size is zero. below is the code: long getEstimatedMapOutputSize() { long estimate = 0L; if (job.desiredMaps() 0) { estimate = getEstimatedTotalMapOutputSize() / job.desiredMaps(); } return estimate; } On Sat, Jun 1, 2013 at 11:49 PM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at the sources, this turns out to be a rather funny Java issue (there's a divide by zero happening and [1] suggests Long.MAX_VALUE return in such a case). I've logged a bug report for this at https://issues.apache.org/jira/browse/MAPREDUCE-5288 with a reproducible case. Does this happen consistently for you? [1] http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html#round(double) On Sat, Jun 1, 2013 at 7:27 PM, Lanati, Matteo matteo.lan...@lrz.de wrote: Hi all, I stumbled upon this problem as well while trying to run the default wordcount shipped with Hadoop 1.2.0. My testbed is made up of 2 virtual machines: Debian 7, Oracle Java 7, 2 GB RAM, 25 GB hard disk. One node is used as JT+NN, the other as TT+DN. Security is enabled. The input file is about 600 kB and the error is 2013-06-01 12:22:51,999 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node 10.156.120.49 has 22854692864 bytes free; but we expect map to take 9223372036854775807 The logfile is attached, together with the configuration files. The version I'm using is Hadoop 1.2.0 Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1479473 Compiled by hortonfo on Mon May 6 06:59:37 UTC 2013 From source with checksum 2e0dac51ede113c1f2ca8e7d82fb3405 This command was run using /home/lu95jib/hadoop-exmpl/hadoop-1.2.0/hadoop-core-1.2.0.jar If I run the default configuration (i.e. no securty), then the job succeeds. Is there something missing in how I set up my nodes? How is it possible that the envisaged value for the needed space is so big? Thanks in advance. Matteo Which version of Hadoop are you using. A quick search shows me a bug https://issues.apache.org/jira/browse/HADOOP-5241 that seems to show similar symptoms. However, that was fixed a long while ago. On Sat, Mar 23, 2013 at 4:40 PM, Redwane belmaati cherkaoui reduno1...@googlemail.com wrote: This the content of the jobtracker log file : 2013-03-23 12:06:48,912 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201303231139_0001 = 6950001. 
Number of splits = 7 2013-03-23 12:06:48,925 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_00 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,927 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_01 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,930 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_02 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,931 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_03 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,933 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_04 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,934 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_05 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,939 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201303231139_0001_m_06 has split on node:/default-rack/hadoop0.novalocal 2013-03-23 12:06:48,950 INFO org.apache.hadoop.mapred.JobInProgress: job_201303231139_0001 LOCALITY_WAIT_FACTOR=0.5 2013-03-23 12:06:48,978 INFO org.apache.hadoop.mapred.JobInProgress: Job job_201303231139_0001 initialized successfully with 7 map tasks and 1 reduce tasks. 2013-03-23 12:06:50,855 INFO org.apache.hadoop.mapred.JobTracker: Adding task (JOB_SETUP
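Spelled out with the comparison operators and log-message quoting made explicit, the two estimator methods quoted above read roughly as follows (a reconstruction for readability, not a verbatim copy of the branch-1 source):

    long getEstimatedMapOutputSize() {
      long estimate = 0L;
      if (job.desiredMaps() > 0) {                        // "> 0" assumed; the operator was lost above
        estimate = getEstimatedTotalMapOutputSize() / job.desiredMaps();
      }
      return estimate;
    }

    protected synchronized long getEstimatedTotalMapOutputSize() {
      if (completedMapsUpdates < threshholdToUse) {       // "<" assumed; too few completed maps, no estimate yet
        return 0;
      } else {
        // add desiredMaps() so that the randomwriter case doesn't blow up;
        // the multiplication might overflow, casting to double prevents it
        long inputSize = job.getInputLength() + job.desiredMaps();
        long estimate = Math.round(((double) inputSize * completedMapsOutputSize * 2.0)
            / completedMapsInputSize);
        if (LOG.isDebugEnabled()) {
          LOG.debug("estimate total map output will be " + estimate);
        }
        return estimate;
      }
    }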
Re: Hint on EOFException's on datanodes
maybe network issue, datanode received an incomplete packet. --Send from my Sony mobile. On May 24, 2013 1:39 PM, Stephen Boesch java...@gmail.com wrote: On a smallish (10 node) cluster with only 2 mappers per node after a few minutes EOFExceptions are cropping up on the datanodes: an example is shown below. Any hint on what to tweak/change in hadoop / cluster settings to make this more happy? 2013-05-24 05:03:57,460 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataXceiver@1b1accfc): writeBlock blk_7760450154173670997_48372 received exception java.io.EOFException: while trying to read 65557 bytes 2013-05-24 05:03:57,262 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder 0 for Block blk_-3990749197748165818_48331): PacketResponder 0 for block blk_-3990749197748165818_48331 terminating 2013-05-24 05:03:57,460 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode (org.apache.hadoop.hdfs.server.datanode.DataXceiver@1b1accfc): DatanodeRegistration(10.254.40.79:9200, storageID=DS-1106090267-10.254.40.79-9200-1369343833886, infoPort=9102, ipcPort=9201):DataXceiver java.io.EOFException: while trying to read 65557 bytes at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:406) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112) at java.lang.Thread.run(Thread.java:662) 2013-05-24 05:03:57,261 INFO org.apache.hadoop.hdfs.server.datanode.Dat
Re: Keep Kerberos credentials valid after logging out
nohup ./your_bash 1temp.log 21 --Send from my Sony mobile. On May 21, 2013 6:32 PM, zheyi rong zheyi.r...@gmail.com wrote: Hi all, I would like to run my hadoop job in a bash file for several times, e.g. #!/usr/bin/env bash for i in {1..10} do my-hadoop-job done Since I don't want to keep my laptop on for hours, I run this bash script on a server via a SSH session. However, the bash script always terminated after my logging out of that server by 'ctrl-z, bg, disown, exit'. Using GNU 'screen' detaching and reattaching, I can see the following exceptions: Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:554) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278) at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:499) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:601) at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:212) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1292) at org.apache.hadoop.ipc.Client.call(Client.java:1121) ... 30 more Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194) at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:134) at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:415) at org.apache.hadoop.ipc.Client$Connection.access$1100(Client.java:212) at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:594) at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:591) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:590) ... 33 more Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:130) at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:106) at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:172) at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:209) at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:195) at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:162) at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:175) ... 42 more The cluster is deployed with cdh3. so how can I keep my script running after logging out ? Thank you in advance. Regards, Zheyi Rong
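The redirection in the nohup line above is presumably meant to be 1>temp.log 2>&1, i.e. detach the script and capture both stdout and stderr. Another way around the expiring credentials, if a keytab can be provisioned for the account and the jobs are submitted from Java code, is to log in from the keytab instead of relying on the interactive ticket cache; a rough sketch with a hypothetical principal and path, using org.apache.hadoop.security.UserGroupInformation:

    Configuration conf = new Configuration();   // hadoop.security.authentication=kerberos is assumed to be set
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab("zheyi@EXAMPLE.COM", "/home/zheyi/zheyi.keytab");
    // jobs submitted from this JVM now authenticate from the keytab, not the login session's TGT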
Re: Writing intermediate key,value pairs to file and read it again
You could look at the ChainReducer javadoc, which meets your requirement. --Send from my Sony mobile. On Apr 20, 2013 11:43 PM, Vikas Jadhav vikascjadha...@gmail.com wrote: Hello, Can anyone help me with the following issue: writing intermediate key,value pairs to a file and reading them again. Let us say I have to write each intermediate pair received at the reducer to a file, then read it back as key,value pairs and use it for further processing. I found IFile.java, which has a reader and a writer, but I am not able to understand how to use it; for example, I don't understand the Counter value as the last parameter (spilledRecordsCounter). Thanks. -- Regards, Vikas
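For completeness, a rough sketch of the chaining pattern with the old mapred API (the mapper/reducer class names are placeholders, and input/output paths and formats are omitted); the point is that the reducer's output is handed straight to the chained mapper inside the same reduce task, so nothing has to be written to a file and re-read in between:

    JobConf job = new JobConf(MyDriver.class);            // MyDriver is a placeholder driver class
    job.setJobName("chained-example");

    ChainMapper.addMapper(job, ParseMapper.class,
        LongWritable.class, Text.class, Text.class, IntWritable.class, true, new JobConf(false));

    ChainReducer.setReducer(job, SumReducer.class,
        Text.class, IntWritable.class, Text.class, IntWritable.class, true, new JobConf(false));

    // this mapper consumes the reducer's output records, still inside the reduce task
    ChainReducer.addMapper(job, PostProcessMapper.class,
        Text.class, IntWritable.class, Text.class, IntWritable.class, true, new JobConf(false));

    JobClient.runJob(job);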
when will hadoop-2.0 have a stable release?
I don't think this is easy to answer; maybe it's not decided yet. If so, can you tell me what important features are still being developed, or any other reasons? Appreciate it.
Re: Reading and Writing Sequencefile using Hadoop 2.0 Apis
You can use it even if it's deprecated. You can find this in org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.java: @Override public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException { FileSplit fileSplit = (FileSplit) split; conf = context.getConfiguration(); Path path = fileSplit.getPath(); FileSystem fs = path.getFileSystem(conf); this.in = new SequenceFile.Reader(fs, path, conf); this.end = fileSplit.getStart() + fileSplit.getLength(); if (fileSplit.getStart() > in.getPosition()) { in.sync(fileSplit.getStart()); // sync to start } this.start = in.getPosition(); more = start < end; } On Thu, Apr 18, 2013 at 6:44 AM, sumit ghosh sumi...@yahoo.com wrote: I am looking for an example which is using the new Hadoop 2.0 API to read and write Sequence Files. Effectively I need to know how to use these functions: createWriter(Configuration conf, org.apache.hadoop.io.SequenceFile.Writer.Option... opts) The old definition is not working for me: SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass()); Similarly I need to know what the code for reading the Sequence file will be, as the following is deprecated: SequenceFile.Reader(fs, path, conf); Thanks, Sumit
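With the option-based calls the email asks about, reading and writing look roughly like this (hypothetical path and key/value types):

    Configuration conf = new Configuration();
    Path path = new Path("/tmp/example.seq");

    // writing with the option-based API
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(IntWritable.class),
        SequenceFile.Writer.valueClass(Text.class));
    try {
      writer.append(new IntWritable(1), new Text("one"));
    } finally {
      writer.close();
    }

    // reading with the option-based API
    SequenceFile.Reader reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
    try {
      IntWritable key = new IntWritable();
      Text value = new Text();
      while (reader.next(key, value)) {     // next() fills the reusable key/value objects
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }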
Re: Physically moving HDFS cluster to new
Changing the data nodes' names or IPs cannot cause data loss. As long as you keep the fsimage (under the NameNode's name directory) and all the block data on the data nodes, everything can be recovered when you start the cluster. On Thu, Apr 18, 2013 at 1:20 AM, Tom Brown tombrow...@gmail.com wrote: We have a situation where we want to physically move our small (4 node) cluster from one data center to another. As part of this move, each node will receive both a new FQDN and a new IP address. As I understand it, HDFS is somehow tied to the FQDN or IP address, and changing them causes data loss. Is there any supported method of moving a cluster this way? Thanks in advance! --Tom
Re: jobtracker not starting - access control exception - folder not owned by me (it claims)
I supposed you start-mapred by user mapred. then hadoop fs -chown -R mpared:mapred /home/jbu/hadoop_local_install/ hadoop-1.0.4/tmp/mapred/system this is caused by fairscheduler, please reach MAPREDUCE-4398https://issues.apache.org/jira/browse/MAPREDUCE-4398 On Mon, Apr 15, 2013 at 6:43 PM, Julian Bui julian...@gmail.com wrote: Hello hadoop users, I can't start my jobtracker and am getting an org.apache.hadoop.security.AccessControlException saying that my hdfs://localhost:9000/home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/system is not owned by jbu (me, my user). However, I check the folder and it is indeed owned by me. Details follow. $ cd /home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/ $ ls -al drwxrwxr-x 6 jbu jbu 4096 Apr 15 03:30 local drwxrwxr-x 2 jbu jbu 4096 Apr 15 03:33 system Looking inside ./hadoop-jbu-jobtracker-jbu-laptop.log: org.apache.hadoop.security.AccessControlException: The systemdir hdfs://localhost:9000/home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/system is not owned by jbu at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2379) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2192) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2186) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978) 2013-04-15 03:34:13,697 FATAL org.apache.hadoop.mapred.JobTracker: org.apache.hadoop.security.AccessControlException: The systemdir hdfs://localhost:9000/home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/system is not owned by jbu at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2379) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2192) at org.apache.hadoop.mapred.JobTracker.init(JobTracker.java:2186) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:300) at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:291) at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4978) So it's still having problems thinking that directory is not owned by me. The log also said: 2013-04-15 03:34:13,695 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:9000/home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/system) because of permissions. 2013-04-15 03:34:13,695 WARN org.apache.hadoop.mapred.JobTracker: Manually delete the mapred.system.dir (hdfs://localhost:9000/home/jbu/hadoop_local_install/hadoop-1.0.4/tmp/mapred/system) and then start the JobTracker. So I deleted the system directory and restarted and the same problem appeared, that I didn't have ownership of the directory. Still won't start. I am using hadoop 1.0.4 on linux mint. Any ideas? Thanks, -Julian
Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
This is zookeeper issue. please paste zookeeper log here. thanks. On Tue, Apr 16, 2013 at 9:58 AM, dylan dwld0...@gmail.com wrote: It is hbase-0.94.2-cdh4.2.0. ** ** *发件人:* Ted Yu [mailto:yuzhih...@gmail.com] *发送时间:* 2013年4月16日 9:55 *收件人:* u...@hbase.apache.org *主题:* Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again ** ** I think this question would be more appropriate for HBase user mailing list. ** ** Moving hadoop user to bcc. ** ** Please tell us the HBase version you are using. ** ** Thanks On Mon, Apr 15, 2013 at 6:51 PM, dylan dwld0...@gmail.com wrote: Hi I am a newer for hadoop, and set up hadoop with tarball . I have 5 nodes for cluster, 2 NN nodes with QJM (3 Journal Nodes, one of them on DN node. ), 3 DN nodes with zookeepers, It works fine. When I reboot one data node machine which includes zookeeper, after that , restart all processes. The hadoop works fine, but hbase not. I cannot disable tables and drop tables. The logs an follows: The Hbase HMaster log: DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,6,1366001238313 ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again 10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining) The Hbase HRegionServer log: DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN The Hbase Web show: Region State 70236052-ROOT-,,0.70236052 state=CLOSING, ts=Mon Apr 15 12:52:38 CST 2013 (75440s ago), server=Master,6,1366001238313 How fix it? Thanks. ** **
Re: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
it located under hbase-home/logs/ if your zookeeper is managed by hbase. but I noticed you configured QJM, then did your QJM and Hbase share the same ZK cluster? if so, then just paste your QJM zk configuration in the hdfs-site.xml and hbase zk configuration in the hbase-site.xml. On Tue, Apr 16, 2013 at 10:37 AM, dylan dwld0...@gmail.com wrote: How to check zookeeper log?? It is the binary files, how to transform it to normal log? ** **I find the “ org.apache.zookeeper.server.LogFormatter”, how to run?** ** ** ** *发件人:* Azuryy Yu [mailto:azury...@gmail.com] *发送时间:* 2013年4月16日 10:01 *收件人:* user@hadoop.apache.org *主题:* Re: 答复: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again ** ** This is zookeeper issue. ** ** please paste zookeeper log here. thanks. ** ** On Tue, Apr 16, 2013 at 9:58 AM, dylan dwld0...@gmail.com wrote: It is hbase-0.94.2-cdh4.2.0. *发件人:* Ted Yu [mailto:yuzhih...@gmail.com] *发送时间:* 2013年4月16日 9:55 *收件人:* u...@hbase.apache.org *主题:* Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again I think this question would be more appropriate for HBase user mailing list. Moving hadoop user to bcc. Please tell us the HBase version you are using. Thanks On Mon, Apr 15, 2013 at 6:51 PM, dylan dwld0...@gmail.com wrote: Hi I am a newer for hadoop, and set up hadoop with tarball . I have 5 nodes for cluster, 2 NN nodes with QJM (3 Journal Nodes, one of them on DN node. ), 3 DN nodes with zookeepers, It works fine. When I reboot one data node machine which includes zookeeper, after that , restart all processes. The hadoop works fine, but hbase not. I cannot disable tables and drop tables. The logs an follows: The Hbase HMaster log: DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,6,1366001238313 ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again 10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining) The Hbase HRegionServer log: DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN The Hbase Web show: Region State 70236052-ROOT-,,0.70236052 state=CLOSING, ts=Mon Apr 15 12:52:38 CST 2013 (75440s ago), server=Master,6,1366001238313 How fix it? Thanks. ** **
Re: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
and paste ZK configuration in the zookeerp_home/conf/zoo.cfg On Tue, Apr 16, 2013 at 10:42 AM, Azuryy Yu azury...@gmail.com wrote: it located under hbase-home/logs/ if your zookeeper is managed by hbase. but I noticed you configured QJM, then did your QJM and Hbase share the same ZK cluster? if so, then just paste your QJM zk configuration in the hdfs-site.xml and hbase zk configuration in the hbase-site.xml. On Tue, Apr 16, 2013 at 10:37 AM, dylan dwld0...@gmail.com wrote: How to check zookeeper log?? It is the binary files, how to transform it to normal log? ** **I find the “ org.apache.zookeeper.server.LogFormatter”, how to run? ** ** ** ** *发件人:* Azuryy Yu [mailto:azury...@gmail.com] *发送时间:* 2013年4月16日 10:01 *收件人:* user@hadoop.apache.org *主题:* Re: 答复: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again ** ** This is zookeeper issue. ** ** please paste zookeeper log here. thanks. ** ** On Tue, Apr 16, 2013 at 9:58 AM, dylan dwld0...@gmail.com wrote: It is hbase-0.94.2-cdh4.2.0. *发件人:* Ted Yu [mailto:yuzhih...@gmail.com] *发送时间:* 2013年4月16日 9:55 *收件人:* u...@hbase.apache.org *主题:* Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again I think this question would be more appropriate for HBase user mailing list. Moving hadoop user to bcc. Please tell us the HBase version you are using. Thanks On Mon, Apr 15, 2013 at 6:51 PM, dylan dwld0...@gmail.com wrote: Hi I am a newer for hadoop, and set up hadoop with tarball . I have 5 nodes for cluster, 2 NN nodes with QJM (3 Journal Nodes, one of them on DN node. ), 3 DN nodes with zookeepers, It works fine. When I reboot one data node machine which includes zookeeper, after that , restart all processes. The hadoop works fine, but hbase not. I cannot disable tables and drop tables. The logs an follows: The Hbase HMaster log: DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,6,1366001238313 ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again 10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining) The Hbase HRegionServer log: DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN The Hbase Web show: Region State 70236052-ROOT-,,0.70236052 state=CLOSING, ts=Mon Apr 15 12:52:38 CST 2013 (75440s ago), server=Master,6,1366001238313 How fix it? Thanks. ** **
Re: Re: Re: Re: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
then, can you find the zookeeper log under zookeeper_home/zookeeper.out?

On Tue, Apr 16, 2013 at 11:04 AM, dylan dwld0...@gmail.com wrote: I use the hbase shell. It always shows: ERROR: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 16 April 2013 10:59 To: user@hadoop.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

Is your ZooKeeper managed by HBase, or did you set export HBASE_MANAGES_ZK=false in hbase-env.sh? If not, then that is a ZooKeeper port conflict.

On Tue, Apr 16, 2013 at 10:55 AM, dylan dwld0...@gmail.com wrote: # The number of milliseconds of each tick tickTime=2000 # The number of ticks that the initial # synchronization phase can take initLimit=10 # The number of ticks that can pass between # sending a request and getting an acknowledgement syncLimit=5 # the directory where the snapshot is stored. # do not use /tmp for storage, /tmp here is just # example sakes. dataDir=/usr/cdh4/zookeeper/data # the port at which the clients will connect clientPort=2181 server.1=Slave01:2888:3888 server.2=Slave02:2888:3888 server.3=Slave03:2888:3888

From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 16 April 2013 10:45 To: user@hadoop.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

and paste your ZK configuration from zookeeper_home/conf/zoo.cfg

On Tue, Apr 16, 2013 at 10:42 AM, Azuryy Yu azury...@gmail.com wrote: it is located under hbase-home/logs/ if your ZooKeeper is managed by HBase. But I noticed you configured QJM, so do your QJM and HBase share the same ZK cluster? If so, then just paste your QJM ZK configuration from hdfs-site.xml and the HBase ZK configuration from hbase-site.xml.

On Tue, Apr 16, 2013 at 10:37 AM, dylan dwld0...@gmail.com wrote: How to check the zookeeper log? It is a binary file; how to transform it into a normal log? I found "org.apache.zookeeper.server.LogFormatter" - how do I run it?

From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 16 April 2013 10:01 To: user@hadoop.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

This is a ZooKeeper issue. Please paste the ZooKeeper log here. Thanks.

On Tue, Apr 16, 2013 at 9:58 AM, dylan dwld0...@gmail.com wrote: It is hbase-0.94.2-cdh4.2.0.

From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: 16 April 2013 9:55 To: u...@hbase.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

I think this question would be more appropriate for the HBase user mailing list. Moving hadoop user to bcc. Please tell us the HBase version you are using. Thanks

On Mon, Apr 15, 2013 at 6:51 PM, dylan dwld0...@gmail.com wrote: Hi, I am new to Hadoop and set up the cluster from a tarball. I have 5 nodes in the cluster: 2 NN nodes with QJM (3 JournalNodes, one of them on a DN node) and 3 DN nodes running ZooKeeper. It works fine. When I rebooted one DataNode machine which also runs ZooKeeper and then restarted all processes, Hadoop works fine but HBase does not: I cannot disable or drop tables. The logs are as follows: The HBase HMaster log: DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Attempted to unassign region -ROOT-,,0.70236052 but it is not currently assigned anywhere ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: -ROOT-,,0.70236052 state=CLOSING, ts=1366001558865, server=Master,6,1366001238313 ,683 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again 10,684 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region -ROOT-,,0.70236052 (offlining) The HBase HRegionServer log: DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=7.44 MB, free=898.81 MB, max=906.24 MB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN The HBase web UI shows: Region State 70236052 -ROOT-,,0.70236052 state=CLOSING
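For context on the HBASE_MANAGES_ZK point discussed above: when HBase should use an existing external ZooKeeper ensemble instead of managing its own, the usual settings look roughly like the sketch below. The quorum hosts and client port match the zoo.cfg quoted in this thread; everything else is an assumption about the poster's layout, not taken from the thread:

    # hbase-env.sh
    export HBASE_MANAGES_ZK=false

    <!-- hbase-site.xml -->
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>Slave01,Slave02,Slave03</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
    </property>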
Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
I cannot find any useful information in the pasted logs.

On Tue, Apr 16, 2013 at 11:22 AM, dylan dwld0...@gmail.com wrote: yes, I have just discovered that. I found the Slave01 and Slave03 zookeeper.out under zookeeper_home/bin/, but on Slave02 (the node that was rebooted earlier), zookeeper_home is under the / directory after the reboot.

Slave02 zookeeper.out shows: WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker 2013-04-15 16:38:31,987 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2094) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:370) at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831) at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667) [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@688] - Send worker leaving thread [myid:2] - INFO [Slave02/192.168.75.243:3888:QuorumCnxManager$Listener@493] - Received connection request /192.168.75.242:51136 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x5037d (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x5 (n.peerEPoch), FOLLOWING (my state) [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 1 (n.leader), 0x5037d (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x5 (n.peerEPoch), FOLLOWING (my state)

Slave01 zookeeper.out shows: [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890005 type:create cxid:0x1e zxid:0xb003c txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired 2013-04-16 10:58:26,415 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890006 type:create cxid:0x7 zxid:0xb003d txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired 2013-04-16 10:58:26,431 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x13e0dc5a0890007 type:create cxid:0x7 zxid:0xb003e txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired 2013-04-16 10:58:26,489 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x23e0dc5a333000a type:create cxid:0x7 zxid:0xb003f txntype:-1 reqpath:n/a Error Path:/hbase/online-snapshot/acquired Error:KeeperErrorCode = NodeExists for /hbase/online-snapshot/acquired 2013-04-16 10:58:36,001 [myid:1] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x33e0dc5b4de0003, timeout of 4ms exceeded 2013-04-16 10:58:36,001 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x33e0dc5b4de0003 2013-04-16 11:03:44,000 [myid:1] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x23e0dc5a333000b, timeout of 4ms exceeded 2013-04-16 11:03:44,001 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x23e0dc5a333000b

From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 16 April 2013 11:13 To: user@hadoop.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

then, can you find the zookeeper log under zookeeper_home/zookeeper.out?

On Tue, Apr 16, 2013 at 11:04 AM, dylan dwld0...@gmail.com wrote: I use the hbase shell. It always shows: ERROR: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

From: Azuryy Yu [mailto:azury...@gmail.com] Sent: 16 April 2013 10:59 To: user@hadoop.apache.org Subject: Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again
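A quick way to confirm each ZooKeeper server came back healthy after a reboot like the one described above is the four-letter-word interface. A sketch, assuming nc is installed and using the host names from the zoo.cfg earlier in the thread:

    # prints "imok" if the server is running and serving requests
    echo ruok | nc Slave02 2181
    # shows the server's mode (leader/follower), connected clients and zxid
    echo stat | nc Slave02 2181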
Re: A question of QJM with HDFS federation
Hi Harsh, If they are two separate clusters instead of a federated one, having the same cluster ID but different nameservice IDs, can they use the same journal nodes and ZK nodes? And if they are separate clusters with different cluster IDs but the same nameservice ID, can they use the same journal nodes and ZK nodes? Thanks.

On Mon, Apr 8, 2013 at 2:57 PM, Azuryy Yu azury...@gmail.com wrote: Thank you very much, Harsh. No further questions for now. --Send from my Sony mobile.

On Apr 8, 2013 2:51 PM, Harsh J ha...@cloudera.com wrote: Hi Azuryy, QJM: Yes, multiple nameservices can share a single QJM set. The QJM configuration allows for a journal ID prefix path which you should configure to be the nameservice ID. You do not need to change disk paths/etc. at all. For example, NS1 NNs can have dfs.namenode.shared.edits.dir configured as: qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/NS1 NS2 NNs can have dfs.namenode.shared.edits.dir configured as: qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/NS2 which will separate the two logically and still make them use the same QJM set of nodes. ZKFC: Each NN needs its own HDFS ZKFC daemon, but all ZKFCs across multiple NSes can share a single ZK cluster. All ZKFCs' core-site.xml can have the same ha.zookeeper.quorum value since the ZKFC automatically reuses the nameservice ID as its parent znode name on the ZK instance, and won't collide with another NS's ZKFCs. Do post back if there are still some more doubts.

On Mon, Apr 8, 2013 at 10:53 AM, Azuryy Yu azury...@gmail.com wrote: Hi dears, I deployed Hadoop v2 with HA enabled using QJM, so my question is: 1) if we also configure HDFS federation, such as: NN1 is active, NN2 is standby; NN3 is active, NN4 is standby; and they are configured as HDFS federation, then can these four NNs use the same journal nodes and ZKs? If your answer is yes, is it enough to just use a different dfs.journalnode.edits.dir, such as NN1 and NN2 configuring dfs.journalnode.edits.dir as /data1 and NN3 and NN4 configuring it as /data2? Thanks. -- Harsh J
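Harsh's answer above maps onto hdfs-site.xml entries roughly like the following sketch; the nameservice IDs and journal node hosts are the example names from his reply, not from a real cluster:

    <!-- on NS1's NameNodes -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/NS1</value>
    </property>

    <!-- on NS2's NameNodes: the same value, but ending in /NS2 -->

The JournalNodes themselves keep a single dfs.journalnode.edits.dir and create one subdirectory per journal ID (NS1, NS2) beneath it, so the two nameservices do not need separate disk paths.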
Re: Mapper always hangs at the same spot
agree. just check your app. or paste your map code here. --Send from my Sony mobile.

On Apr 14, 2013 4:08 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Your application logic is likely stuck in a loop.

On Sat, Apr 13, 2013 at 12:47 PM, Chris Hokamp chris.hok...@gmail.com wrote: "When you say never progresses, do you see the MR framework kill it automatically after 10 minutes of inactivity or does it never ever exit?" The latter -- it never exits. Killing it manually seems like a good option for now. We already have mapred.max.map.failures.percent set to a non-zero value, but because the task never fails, this never comes into effect. Thanks for the help, Chris

On Sat, Apr 13, 2013 at 5:00 PM, Harsh J ha...@cloudera.com wrote: When you say never progresses, do you see the MR framework kill it automatically after 10 minutes of inactivity or does it never ever exit? You can lower the timeout period on tasks via mapred.task.timeout, set in msec. You could also set mapred.max.map.failures.percent to a non-zero value to allow that much percentage of tasks to fail without also marking the whole job as a failure. If the task itself does not get killed by the framework due to inactiveness, try doing a hadoop job -fail-task on its attempt ID manually.

On Sat, Apr 13, 2013 at 8:45 PM, Chris Hokamp chris.hok...@gmail.com wrote: Hello, We have a job where all mappers finish except for one, which always hangs at the same spot (i.e. it reaches 49%, then never progresses). This is likely due to a bug in the wiki parser in our Pig UDF. We can afford to lose the data this mapper is working on if it would allow the job to finish. Question: is there a hadoop configuration parameter similar to mapred.skip.map.max.skip.records that would let us skip a map that doesn't progress after X amount of time? Any other possible workarounds for this case would also be useful. We are currently using hadoop 1.1.0 and Pig 0.10.1. Thanks, Chris -- Harsh J
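For reference, the knobs Harsh mentions above would be set roughly like this; the property names are the MR1 names used by Hadoop 1.1.0 (as in the thread), while the values and the attempt ID are illustrative only:

    <!-- mapred-site.xml, or per-job configuration -->
    <property>
      <name>mapred.task.timeout</name>
      <value>600000</value>   <!-- milliseconds; 10 minutes of inactivity -->
    </property>
    <property>
      <name>mapred.max.map.failures.percent</name>
      <value>5</value>        <!-- tolerate up to 5% failed map tasks -->
    </property>

and the manual kill of a stuck attempt:

    hadoop job -fail-task attempt_201304131200_0001_m_000042_0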
Re: Copy Vs DistCP
yes, you are right.

On Thu, Apr 11, 2013 at 3:40 PM, Hemanth Yamijala yhema...@thoughtworks.com wrote: AFAIK, the cp command works fully from the DFS client. It reads bytes from the InputStream created when the file is opened and writes the same to the OutputStream of the file. It does not work at the level of data blocks. A configuration io.file.buffer.size is used as the size of the buffer used in the copy - set to 4096 by default. Thanks Hemanth

On Thu, Apr 11, 2013 at 9:42 AM, KayVajj vajjalak...@gmail.com wrote: If the cp command is not parallel, how does it work for a file partitioned across various data nodes?

On Wed, Apr 10, 2013 at 6:30 PM, Azuryy Yu azury...@gmail.com wrote: The cp command is not parallel; it just calls the FileSystem, even if DFSClient has multiple threads. DistCp can work well on the same cluster.

On Thu, Apr 11, 2013 at 8:17 AM, KayVajj vajjalak...@gmail.com wrote: The File System Copy utility copies files byte by byte if I'm not wrong. Could it be possible that the cp command works with blocks and moves them, which could be significantly more efficient? Also how does the cp command work if the file is distributed on different data nodes? Thanks Kay

On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas jayunit...@gmail.com wrote: DistCP is a full blown mapreduce job (mapper only, where the mappers do a fully parallel copy to the destination). CP appears (correct me if I'm wrong) to simply invoke the FileSystem and issue a copy command for every source file. I have an additional question: how is CP, which is internal to a cluster, optimized (if at all)?

On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 shurong@qunar.com wrote: Hi, I think it's better to use Copy in the same cluster and distCP between clusters; the cp command is a hadoop internal parallel process and will not copy files locally. -- 麦树荣

From: KayVajj vajjalak...@gmail.com Date: 2013-04-11 06:20 To: user@hadoop.apache.org Subject: Copy Vs DistCP I have a few questions regarding the usage of DistCP for copying files in the same cluster. 1) Which one is better within the same cluster, and what factors (like file size etc.) would influence the usage of one over the other? 2) When we run a cp command like the one below from a client node of the cluster (not a data node), how does the cp command work: i) like an MR job, or ii) by copying files locally and then copying them back to the new location? Example of the copy command: hdfs dfs -cp /some_location/file /new_location/ Thanks, your responses are appreciated. -- Kay -- Jay Vyas http://jayunit100.blogspot.com
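For concreteness, the two forms being compared in this thread look roughly like this; the paths are illustrative:

    # serial copy through a single DFS client process, as described above
    hdfs dfs -cp /some_location/file /new_location/

    # map-only MapReduce job; also usable within a single cluster
    hadoop distcp /some_location /new_location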
Re: Copy Vs DistCP
DistCP is preferred for your requirements.

On Fri, Apr 12, 2013 at 12:52 AM, KayVajj vajjalak...@gmail.com wrote: Summing up, what would be the recommendations for copy: 1) DistCP 2) shell cp command 3) Using the File System API (FileUtils, to be precise) inside of a Java program 4) An MR job with an identity mapper and no reducer (maybe this is what DistCP does). I did not run any comparisons as my dev cluster is just a two node cluster and I am not sure how this would perform on a production cluster. Kay

On Thu, Apr 11, 2013 at 5:44 AM, Jay Vyas jayunit...@gmail.com wrote: Yes, makes sense... cp is serialized and simpler, and does not rely on the JobTracker, whereas distcp actually submits a job and waits for completion. So it can fail if tasks start to fail or time out. I have seen distcp fail and hang before, albeit not often. Sent from my iPhone

On Apr 10, 2013, at 10:37 PM, Alexander Pivovarov apivova...@gmail.com wrote: If the cluster is busy with other jobs, distcp will wait for free map slots. Regular cp is more reliable and predictable, especially if you need to copy just several GB.

On Apr 10, 2013 6:31 PM, Azuryy Yu azury...@gmail.com wrote: The cp command is not parallel; it just calls the FileSystem, even if DFSClient has multiple threads. DistCp can work well on the same cluster.

On Thu, Apr 11, 2013 at 8:17 AM, KayVajj vajjalak...@gmail.com wrote: The File System Copy utility copies files byte by byte if I'm not wrong. Could it be possible that the cp command works with blocks and moves them, which could be significantly more efficient? Also how does the cp command work if the file is distributed on different data nodes? Thanks Kay

On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas jayunit...@gmail.com wrote: DistCP is a full blown mapreduce job (mapper only, where the mappers do a fully parallel copy to the destination). CP appears (correct me if I'm wrong) to simply invoke the FileSystem and issue a copy command for every source file. I have an additional question: how is CP, which is internal to a cluster, optimized (if at all)?

On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 shurong@qunar.com wrote: Hi, I think it's better to use Copy in the same cluster and distCP between clusters; the cp command is a hadoop internal parallel process and will not copy files locally. -- 麦树荣

From: KayVajj vajjalak...@gmail.com Date: 2013-04-11 06:20 To: user@hadoop.apache.org Subject: Copy Vs DistCP I have a few questions regarding the usage of DistCP for copying files in the same cluster. 1) Which one is better within the same cluster, and what factors (like file size etc.) would influence the usage of one over the other? 2) When we run a cp command like the one below from a client node of the cluster (not a data node), how does the cp command work: i) like an MR job, or ii) by copying files locally and then copying them back to the new location? Example of the copy command: hdfs dfs -cp /some_location/file /new_location/ Thanks, your responses are appreciated. -- Kay -- Jay Vyas http://jayunit100.blogspot.com
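Option 3 in Kay's summary (copying through the FileSystem API from a Java program) would look roughly like the sketch below. The class is org.apache.hadoop.fs.FileUtil; the paths are illustrative. Like the shell cp, this streams the bytes through the client process rather than running a parallel job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopy {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Client-side copy: reads the source stream and writes the
        // destination stream, one file at a time.
        FileUtil.copy(fs, new Path("/some_location/file"),
                      fs, new Path("/new_location/file"),
                      false /* do not delete the source */, conf);
      }
    }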
Re: The Job.xml file
Yes, you can start a job directly from a job.xml try hadoop job -submit JOB_FILE, replace JOB_FILE with your job.xml. On Wed, Apr 10, 2013 at 12:25 AM, Jay Vyas jayunit...@gmail.com wrote: Hi guys: I cant find much info about the life cycle for the job.xml file in hadoop. My thoughts are : 1) It is created by the job client 2) It is only read by the JobTracker 3) Task trackers (indirectly) are configured by information in job.xml because the JobTracker decomposes its contents into individual tasks So, my (related) questions are: Is there a way to start a job directly from a job.xml file? What components depend on and read the job.xml file? Where is the job.xml defined/documented (if anywhere)? -- Jay Vyas http://jayunit100.blogspot.com
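As a rough illustration of what such a file contains: job.xml is just a serialized Hadoop Configuration, so a heavily trimmed example might look like the fragment below. The property names are old-API (MR1) keys; the job name, mapper class, and paths are hypothetical:

    <configuration>
      <property><name>mapred.job.name</name><value>wordcount</value></property>
      <property><name>mapred.mapper.class</name><value>org.example.WordCountMapper</value></property>
      <property><name>mapred.input.dir</name><value>/data/in</value></property>
      <property><name>mapred.output.dir</name><value>/data/out</value></property>
    </configuration>

which could then be submitted with: hadoop job -submit /path/to/job.xml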
Re: backup node question
Hi Harsh, Do you mean BackupNameNode is Secondary NameNode in Hadoop1.x? On Sun, Apr 7, 2013 at 4:05 PM, Harsh J ha...@cloudera.com wrote: Yes, it need not keep an edits (transactions) stream locally cause those are passed synchronously to the BackupNameNode, which persists it on its behalf. On Sun, Apr 7, 2013 at 1:21 PM, Lin Ma lin...@gmail.com wrote: Thanks Harsh, For your comments, What it means is that the NameNode need not store anything locally, you mean Primary Name Node do not need to store checkpoint/journal locally, and only need to keep memory image up-to-date for edits? regards, Lin On Sun, Apr 7, 2013 at 3:31 PM, Harsh J ha...@cloudera.com wrote: Hi Lin, My reply inline. On Sun, Apr 7, 2013 at 12:36 PM, Lin Ma lin...@gmail.com wrote: Hi guys, I am reading from this paper to learn about backup nodes (http://www.storageconference.org/2010/Papers/MSST/Shvachko.pdf), It is mentioned, It contains all file system metadata information except for block locations. It can perform all operations of the regular NameNode that do not involve modification of the namespace or knowledge of block locations. , what kinds of operations do not need knowledge of block locations? Operations that do not involve data reads or writes would not require knowledge of block locations. Applying also the restriction of no namespace mutation, an example would be listing directories and looking up file information via FileStatus objects (perhaps the only examples - its like a safemode but no reads either). It is also mentioned, Use of a BackupNode provides the option of running the NameNode without persistent storage, delegating responsibility for the namespace state persisting to the BackupNode., what means running the NameNode without persistent storage and delegating responsibility for the namespace state persisting? What it means is that the NameNode need not store anything locally, but can rely on the edits being stored at the BackupNameNode which would continuously be receiving it. When restarted, it can grab a current checkpoint from the BNN and boot up anywhere, since there's no local storage requirement. -- Harsh J -- Harsh J
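An example of the "no block locations needed" operations Harsh describes above: a directory listing is answered entirely from namespace metadata. A minimal sketch using the standard FileSystem API (the path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListOnly {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Metadata only: names, sizes, permissions and times come from the
        // namespace; no block locations are ever requested.
        for (FileStatus st : fs.listStatus(new Path("/"))) {
          System.out.println(st.getPath() + " " + st.getLen() + " " + st.getPermission());
        }
      }
    }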
Re: backup node question
I am confused. Hadoopv2 has NN SNN DN JN(journal node), so whats Standby Namenode? --Send from my Sony mobile. On Apr 7, 2013 9:03 PM, Harsh J ha...@cloudera.com wrote: BackupNameNode is not present in the maintenance 1.x releases, it is a feature added to a higher version; you can try it out in 2.x today if you wish to. On Sun, Apr 7, 2013 at 3:12 PM, Azuryy Yu azury...@gmail.com wrote: Hi Harsh, Do you mean BackupNameNode is Secondary NameNode in Hadoop1.x? On Sun, Apr 7, 2013 at 4:05 PM, Harsh J ha...@cloudera.com wrote: Yes, it need not keep an edits (transactions) stream locally cause those are passed synchronously to the BackupNameNode, which persists it on its behalf. On Sun, Apr 7, 2013 at 1:21 PM, Lin Ma lin...@gmail.com wrote: Thanks Harsh, For your comments, What it means is that the NameNode need not store anything locally, you mean Primary Name Node do not need to store checkpoint/journal locally, and only need to keep memory image up-to-date for edits? regards, Lin On Sun, Apr 7, 2013 at 3:31 PM, Harsh J ha...@cloudera.com wrote: Hi Lin, My reply inline. On Sun, Apr 7, 2013 at 12:36 PM, Lin Ma lin...@gmail.com wrote: Hi guys, I am reading from this paper to learn about backup nodes (http://www.storageconference.org/2010/Papers/MSST/Shvachko.pdf), It is mentioned, It contains all file system metadata information except for block locations. It can perform all operations of the regular NameNode that do not involve modification of the namespace or knowledge of block locations. , what kinds of operations do not need knowledge of block locations? Operations that do not involve data reads or writes would not require knowledge of block locations. Applying also the restriction of no namespace mutation, an example would be listing directories and looking up file information via FileStatus objects (perhaps the only examples - its like a safemode but no reads either). It is also mentioned, Use of a BackupNode provides the option of running the NameNode without persistent storage, delegating responsibility for the namespace state persisting to the BackupNode., what means running the NameNode without persistent storage and delegating responsibility for the namespace state persisting? What it means is that the NameNode need not store anything locally, but can rely on the edits being stored at the BackupNameNode which would continuously be receiving it. When restarted, it can grab a current checkpoint from the BNN and boot up anywhere, since there's no local storage requirement. -- Harsh J -- Harsh J -- Harsh J
Re: backup node question
SNN=secondary name node in my last mail. --Send from my Sony mobile. On Apr 7, 2013 10:01 PM, Azuryy Yu azury...@gmail.com wrote: I am confused. Hadoopv2 has NN SNN DN JN(journal node), so whats Standby Namenode? --Send from my Sony mobile. On Apr 7, 2013 9:03 PM, Harsh J ha...@cloudera.com wrote: BackupNameNode is not present in the maintenance 1.x releases, it is a feature added to a higher version; you can try it out in 2.x today if you wish to. On Sun, Apr 7, 2013 at 3:12 PM, Azuryy Yu azury...@gmail.com wrote: Hi Harsh, Do you mean BackupNameNode is Secondary NameNode in Hadoop1.x? On Sun, Apr 7, 2013 at 4:05 PM, Harsh J ha...@cloudera.com wrote: Yes, it need not keep an edits (transactions) stream locally cause those are passed synchronously to the BackupNameNode, which persists it on its behalf. On Sun, Apr 7, 2013 at 1:21 PM, Lin Ma lin...@gmail.com wrote: Thanks Harsh, For your comments, What it means is that the NameNode need not store anything locally, you mean Primary Name Node do not need to store checkpoint/journal locally, and only need to keep memory image up-to-date for edits? regards, Lin On Sun, Apr 7, 2013 at 3:31 PM, Harsh J ha...@cloudera.com wrote: Hi Lin, My reply inline. On Sun, Apr 7, 2013 at 12:36 PM, Lin Ma lin...@gmail.com wrote: Hi guys, I am reading from this paper to learn about backup nodes (http://www.storageconference.org/2010/Papers/MSST/Shvachko.pdf), It is mentioned, It contains all file system metadata information except for block locations. It can perform all operations of the regular NameNode that do not involve modification of the namespace or knowledge of block locations. , what kinds of operations do not need knowledge of block locations? Operations that do not involve data reads or writes would not require knowledge of block locations. Applying also the restriction of no namespace mutation, an example would be listing directories and looking up file information via FileStatus objects (perhaps the only examples - its like a safemode but no reads either). It is also mentioned, Use of a BackupNode provides the option of running the NameNode without persistent storage, delegating responsibility for the namespace state persisting to the BackupNode., what means running the NameNode without persistent storage and delegating responsibility for the namespace state persisting? What it means is that the NameNode need not store anything locally, but can rely on the edits being stored at the BackupNameNode which would continuously be receiving it. When restarted, it can grab a current checkpoint from the BNN and boot up anywhere, since there's no local storage requirement. -- Harsh J -- Harsh J -- Harsh J
Re: backup node question
oh, got it. you are a good guy. --Send from my Sony mobile. On Apr 7, 2013 10:11 PM, Harsh J ha...@cloudera.com wrote: StandbyNameNode is the term we use to refer to a NameNode in HA that is currently not the active one (i.e. its state is 'Standby'). Its not a special type of daemon (i.e. it just runs the NameNode service), just a naming convention. On Sun, Apr 7, 2013 at 7:31 PM, Azuryy Yu azury...@gmail.com wrote: I am confused. Hadoopv2 has NN SNN DN JN(journal node), so whats Standby Namenode? --Send from my Sony mobile. On Apr 7, 2013 9:03 PM, Harsh J ha...@cloudera.com wrote: BackupNameNode is not present in the maintenance 1.x releases, it is a feature added to a higher version; you can try it out in 2.x today if you wish to. On Sun, Apr 7, 2013 at 3:12 PM, Azuryy Yu azury...@gmail.com wrote: Hi Harsh, Do you mean BackupNameNode is Secondary NameNode in Hadoop1.x? On Sun, Apr 7, 2013 at 4:05 PM, Harsh J ha...@cloudera.com wrote: Yes, it need not keep an edits (transactions) stream locally cause those are passed synchronously to the BackupNameNode, which persists it on its behalf. On Sun, Apr 7, 2013 at 1:21 PM, Lin Ma lin...@gmail.com wrote: Thanks Harsh, For your comments, What it means is that the NameNode need not store anything locally, you mean Primary Name Node do not need to store checkpoint/journal locally, and only need to keep memory image up-to-date for edits? regards, Lin On Sun, Apr 7, 2013 at 3:31 PM, Harsh J ha...@cloudera.com wrote: Hi Lin, My reply inline. On Sun, Apr 7, 2013 at 12:36 PM, Lin Ma lin...@gmail.com wrote: Hi guys, I am reading from this paper to learn about backup nodes (http://www.storageconference.org/2010/Papers/MSST/Shvachko.pdf ), It is mentioned, It contains all file system metadata information except for block locations. It can perform all operations of the regular NameNode that do not involve modification of the namespace or knowledge of block locations. , what kinds of operations do not need knowledge of block locations? Operations that do not involve data reads or writes would not require knowledge of block locations. Applying also the restriction of no namespace mutation, an example would be listing directories and looking up file information via FileStatus objects (perhaps the only examples - its like a safemode but no reads either). It is also mentioned, Use of a BackupNode provides the option of running the NameNode without persistent storage, delegating responsibility for the namespace state persisting to the BackupNode., what means running the NameNode without persistent storage and delegating responsibility for the namespace state persisting? What it means is that the NameNode need not store anything locally, but can rely on the edits being stored at the BackupNameNode which would continuously be receiving it. When restarted, it can grab a current checkpoint from the BNN and boot up anywhere, since there's no local storage requirement. -- Harsh J -- Harsh J -- Harsh J -- Harsh J
A question of QJM with HDFS federation
Hi dears, I deployed Hadoop v2 with HA enabled using QJM, so my question is: 1) if we also configure HDFS federation, such as: NN1 is active, NN2 is standby; NN3 is active, NN4 is standby; and they are configured as HDFS federation, then can these four NNs use the same journal nodes and ZKs? If your answer is yes, is it enough to just use a different dfs.journalnode.edits.dir, such as NN1 and NN2 configuring dfs.journalnode.edits.dir as /data1 and NN3 and NN4 configuring it as /data2? Thanks.
Re: Hadoop 1.0.4 with Eclipse
download hadoop-0.20.203; it ships a hadoop-eclipse plugin, which also supports hadoop-1.0.4. Send from my Sony mobile.

On Apr 5, 2013 11:14 PM, sahil soni whitepaper2...@gmail.com wrote: Hi All, I have installed Hadoop 1.0.4 on Red Hat Linux 5. I want to install Eclipse (any version) on Windows 7 and use that Windows Eclipse for Hadoop development. I tried to google and download Hadoop plugins for this, but I am not able to configure them on Windows, though with the same plugins I am able to configure everything on the Linux system itself where Hadoop is running. 1.) So back to the original question: how do I configure the Hadoop Eclipse plugin on Windows while Hadoop 1.0.4 is running on Linux? 2.) Please send me clear steps for how to build the plugin and how to add those 5 jars which we have to add; I tried to add those jars to the plugin and it did not work. There is no document online which clearly explains it step by step. Thanks in advance. Please send me clear instructions on how to follow these steps: The general approach is to 1. checkout the code from Apache's SVN 2. modify build.properties in /src/contrib/eclipse-plugin and add eclipse.home=path to eclipse 3. download apache forrest 0.8 and sun jdk 5 4. run the ant command as ant clean package -Djava5.home=/opt/java/jdk1.5.0_22 -Dforrest.home=/opt/apache-forrest-0.8 (replace the paths as per your config) 5. you should be online for this 6. after that the eclipse plugin should be in /build/contrib/eclipse-plugin 7. Now the plugin thus made is not correct 8. open the jar and add the following jars in /lib of the jar: commons-configuration, commons-lang, jackson-core-asl, jackson-mapper-asl 9. modify MANIFEST.MF in /META-INF of the jar to include these paths, such as Bundle-ClassPath: classes/,lib/hadoop-core.jar,lib/jackson-mapper-asl-1.8.8.jar,lib/jackson-core-asl-1.8.8.jar,lib/commons-configuration-1.6.jar,lib/commons-lang-2.4.jar 10. copy this jar to the plugins folder of eclipse 11. run eclipse -clean 12. switch to the map reduce perspective I tried to follow it but did not succeed. Regards
Re: MVN repository for hadoop trunk
hi, do you think trunk is as stable as the released stable version? --Send from my Sony mobile.

On Apr 7, 2013 5:01 AM, Harsh J ha...@cloudera.com wrote: I don't think we publish nightly or rolling jars anywhere on Maven Central from trunk builds.

On Sun, Apr 7, 2013 at 2:17 AM, Jay Vyas jayunit...@gmail.com wrote: Hi guys: Is there a mvn repo for hadoop's 3.0.0 trunk build? Clearly the hadoop pom.xml allows us to build hadoop from scratch and installs it as 3.0.0-SNAPSHOT -- but it's not clear whether there is a published version of this snapshot jar somewhere. -- Jay Vyas http://jayunit100.blogspot.com -- Harsh J
Re: What do ns_quota and ds_quota mean in a namenode entry
Namespace and disk space. The ns quota limits the number of names (files and directories) in the tree rooted at the directory; the ds quota limits the total disk space consumed by its files.

On Apr 4, 2013 3:12 PM, Bert Yuan bert.y...@gmail.com wrote: Below is the JSON form of a namenode entry: { inode:{ inodepath:'/anotherDir/biggerfile', replication:3, modificationtime:'2009-03-16 14:15', accesstime:'2009-03-16 14:15', blocksize:134217728, blocks:{ numblocks:3, block:[ { blockid:-3825289017228345300, numbytes:134217728, generationstamp:1002 }, { blockid:-561951562131659300, numbytes:134217728, generationstamp:1002 }, { blockid:524543674153269000, numbytes:18196208, generationstamp:1002 } ] }, nsquota:-1, dsquota:-1, permissions:{ username:'jhoman', groupname:'supergroup', permstring:'rw-r--r--' } } } I don't know what 'nsquota' and 'dsquota' mean; could anyone explain this to me?
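For reference, these are the quotas managed through dfsadmin, and a value of -1 in a dump like the one above simply means no quota has been set on that directory. A sketch of the usual commands, with the directory and values illustrative:

    hadoop dfsadmin -setQuota 10000 /anotherDir       # nsquota: max number of file/directory names
    hadoop dfsadmin -setSpaceQuota 1t /anotherDir     # dsquota: max bytes, counting all replicas
    hadoop fs -count -q /anotherDir                   # show both quotas and the remaining headroom
    hadoop dfsadmin -clrQuota /anotherDir             # back to nsquota=-1
    hadoop dfsadmin -clrSpaceQuota /anotherDir        # back to dsquota=-1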
Re: are we able to decommission multi nodes at one time?
not at all. so don't worry about that.

On Wed, Apr 3, 2013 at 2:04 PM, Yanbo Liang yanboha...@gmail.com wrote: It means that maybe some replicas will stay in an under-replicated state?

2013/4/3 Azuryy Yu azury...@gmail.com bq. "then namenode start to copy block replicates on DN-2 to another DN, supposed DN-2." Sorry for the typo; the correction is: then the namenode starts to copy the block replicas on DN-1 to another DN, say DN-2.

On Wed, Apr 3, 2013 at 9:51 AM, Azuryy Yu azury...@gmail.com wrote: It's different. If you just want to stop DN-1 for a short time, just kill the DataNode process on DN-1, then do what you want. During this time the Namenode cannot receive the heartbeat from DN-1, so the namenode starts to copy block replicates on DN-2 to another DN, supposed DN-2. But when you start DN-1 again, the Namenode receives DN-1's registration and stops copying DN-1's block replicas even if it has not finished copying. Did I explain it clearly?

On Wed, Apr 3, 2013 at 9:43 AM, Henry Junyoung Kim henry.jy...@gmail.com wrote: @Harsh What are the reasons for the big gap, when removing nodes, between decommissioning them and just taking them down? In my understanding, both need to copy under-replicated blocks to the remaining live nodes. If that is the main cost of both, the total elapsed time shouldn't be very different. Could you share some articles or documents that explain the decommissioning procedure? - explanations are always appreciated ;)

On Apr 2, 2013, at 5:37 PM, Harsh J ha...@cloudera.com wrote: Yes, you can do the downtime work in steps of 2 DNs at a time, especially since you mentioned the total work would be only ~30 mins at most.

On Tue, Apr 2, 2013 at 1:46 PM, Henry Junyoung Kim henry.jy...@gmail.com wrote: the rest of the nodes that stay alive have enough space to store it. for the point you mentioned, "it's easier to do so in a rolling manner without need of a decommission": to check my understanding, just shutting down 2 of them, then 2 more, and then 2 more, without decommissions - is this correct?

On Apr 2, 2013, at 4:54 PM, Harsh J ha...@cloudera.com wrote: Note though that it's only possible to decommission 7 nodes at the same time and expect it to finish iff the remaining 8 nodes have adequate free space for the excess replicas. If you're just going to take them down for a short while (a few mins each), it's easier to do so in a rolling manner without need of a decommission. You can take up to two down at a time on a replication average of 3 or 3+, and put them back in later without too much data movement impact.

On Tue, Apr 2, 2013 at 1:06 PM, Yanbo Liang yanboha...@gmail.com wrote: It's reasonable to decommission 7 nodes at the same time, but it may also take a long time to finish, because all the replicas on these 7 nodes need to be copied to the remaining 8 nodes. The size of the transfer from these nodes to the remaining nodes is equal.

2013/4/2 Henry Junyoung Kim henry.jy...@gmail.com :) currently, I have 15 data nodes. for some tests, I am trying to decommission down to 8 nodes. Now, the total dfs used size is 52 TB, which includes all replicated blocks. From 15 to 8, the total time spent is almost 4 days. ;( someone mentioned that I don't need to decommission node by node. for this case, are there no problems if I decommission 7 nodes at the same time?

On Apr 2, 2013, at 12:14 PM, Azuryy Yu azury...@gmail.com wrote: I can translate it to native English: how many nodes do you want to decommission?

On Tue, Apr 2, 2013 at 11:01 AM, Yanbo Liang yanboha...@gmail.com wrote: You want to decommission how many nodes?

2013/4/2 Henry JunYoung KIM henry.jy...@gmail.com 15 for datanodes and 3 for replication factor.

On Apr 1, 2013, at 3:23 PM, varun kumar varun@gmail.com wrote: How many nodes do you have, and what is the replication factor? -- Harsh J -- Harsh J
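For reference, the decommissioning discussed in this thread is driven by the NameNode's excludes file; a minimal sketch, with the file path illustrative:

    <!-- hdfs-site.xml on the NameNode -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>
    </property>

List the hostnames of the DataNodes to retire in that file, one per line (all 7 at once is fine), then run:

    hadoop dfsadmin -refreshNodes

The nodes show as "Decommission In Progress" on the NameNode web UI while their blocks are re-replicated, and "Decommissioned" once it is safe to shut them down.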
Re: are we able to decommission multi nodes at one time?
It's different. If you just want to stop DN-1 for a short time, just kill the DataNode process on DN-1, then do what you want. During this time the Namenode cannot receive the heartbeat from DN-1, so the namenode starts to copy block replicates on DN-2 to another DN, supposed DN-2. But when you start DN-1 again, the Namenode receives DN-1's registration and stops copying DN-1's block replicas even if it has not finished copying. Did I explain it clearly?

On Wed, Apr 3, 2013 at 9:43 AM, Henry Junyoung Kim henry.jy...@gmail.com wrote: @Harsh What are the reasons for the big gap, when removing nodes, between decommissioning them and just taking them down? In my understanding, both need to copy under-replicated blocks to the remaining live nodes. If that is the main cost of both, the total elapsed time shouldn't be very different. Could you share some articles or documents that explain the decommissioning procedure? - explanations are always appreciated ;)

On Apr 2, 2013, at 5:37 PM, Harsh J ha...@cloudera.com wrote: Yes, you can do the downtime work in steps of 2 DNs at a time, especially since you mentioned the total work would be only ~30 mins at most.

On Tue, Apr 2, 2013 at 1:46 PM, Henry Junyoung Kim henry.jy...@gmail.com wrote: the rest of the nodes that stay alive have enough space to store it. for the point you mentioned, "it's easier to do so in a rolling manner without need of a decommission": to check my understanding, just shutting down 2 of them, then 2 more, and then 2 more, without decommissions - is this correct?

On Apr 2, 2013, at 4:54 PM, Harsh J ha...@cloudera.com wrote: Note though that it's only possible to decommission 7 nodes at the same time and expect it to finish iff the remaining 8 nodes have adequate free space for the excess replicas. If you're just going to take them down for a short while (a few mins each), it's easier to do so in a rolling manner without need of a decommission. You can take up to two down at a time on a replication average of 3 or 3+, and put them back in later without too much data movement impact.

On Tue, Apr 2, 2013 at 1:06 PM, Yanbo Liang yanboha...@gmail.com wrote: It's reasonable to decommission 7 nodes at the same time, but it may also take a long time to finish, because all the replicas on these 7 nodes need to be copied to the remaining 8 nodes. The size of the transfer from these nodes to the remaining nodes is equal.

2013/4/2 Henry Junyoung Kim henry.jy...@gmail.com :) currently, I have 15 data nodes. for some tests, I am trying to decommission down to 8 nodes. Now, the total dfs used size is 52 TB, which includes all replicated blocks. From 15 to 8, the total time spent is almost 4 days. ;( someone mentioned that I don't need to decommission node by node. for this case, are there no problems if I decommission 7 nodes at the same time?

On Apr 2, 2013, at 12:14 PM, Azuryy Yu azury...@gmail.com wrote: I can translate it to native English: how many nodes do you want to decommission?

On Tue, Apr 2, 2013 at 11:01 AM, Yanbo Liang yanboha...@gmail.com wrote: You want to decommission how many nodes?

2013/4/2 Henry JunYoung KIM henry.jy...@gmail.com 15 for datanodes and 3 for replication factor.

On Apr 1, 2013, at 3:23 PM, varun kumar varun@gmail.com wrote: How many nodes do you have, and what is the replication factor? -- Harsh J -- Harsh J
Re: are we able to decommission multi nodes at one time?
bq. "then namenode start to copy block replicates on DN-2 to another DN, supposed DN-2." Sorry for the typo; the correction is: then the namenode starts to copy the block replicas on DN-1 to another DN, say DN-2.

On Wed, Apr 3, 2013 at 9:51 AM, Azuryy Yu azury...@gmail.com wrote: It's different. If you just want to stop DN-1 for a short time, just kill the DataNode process on DN-1, then do what you want. During this time the Namenode cannot receive the heartbeat from DN-1, so the namenode starts to copy block replicates on DN-2 to another DN, supposed DN-2. But when you start DN-1 again, the Namenode receives DN-1's registration and stops copying DN-1's block replicas even if it has not finished copying. Did I explain it clearly?

On Wed, Apr 3, 2013 at 9:43 AM, Henry Junyoung Kim henry.jy...@gmail.com wrote: @Harsh What are the reasons for the big gap, when removing nodes, between decommissioning them and just taking them down? In my understanding, both need to copy under-replicated blocks to the remaining live nodes. If that is the main cost of both, the total elapsed time shouldn't be very different. Could you share some articles or documents that explain the decommissioning procedure? - explanations are always appreciated ;)

On Apr 2, 2013, at 5:37 PM, Harsh J ha...@cloudera.com wrote: Yes, you can do the downtime work in steps of 2 DNs at a time, especially since you mentioned the total work would be only ~30 mins at most.

On Tue, Apr 2, 2013 at 1:46 PM, Henry Junyoung Kim henry.jy...@gmail.com wrote: the rest of the nodes that stay alive have enough space to store it. for the point you mentioned, "it's easier to do so in a rolling manner without need of a decommission": to check my understanding, just shutting down 2 of them, then 2 more, and then 2 more, without decommissions - is this correct?

On Apr 2, 2013, at 4:54 PM, Harsh J ha...@cloudera.com wrote: Note though that it's only possible to decommission 7 nodes at the same time and expect it to finish iff the remaining 8 nodes have adequate free space for the excess replicas. If you're just going to take them down for a short while (a few mins each), it's easier to do so in a rolling manner without need of a decommission. You can take up to two down at a time on a replication average of 3 or 3+, and put them back in later without too much data movement impact.

On Tue, Apr 2, 2013 at 1:06 PM, Yanbo Liang yanboha...@gmail.com wrote: It's reasonable to decommission 7 nodes at the same time, but it may also take a long time to finish, because all the replicas on these 7 nodes need to be copied to the remaining 8 nodes. The size of the transfer from these nodes to the remaining nodes is equal.

2013/4/2 Henry Junyoung Kim henry.jy...@gmail.com :) currently, I have 15 data nodes. for some tests, I am trying to decommission down to 8 nodes. Now, the total dfs used size is 52 TB, which includes all replicated blocks. From 15 to 8, the total time spent is almost 4 days. ;( someone mentioned that I don't need to decommission node by node. for this case, are there no problems if I decommission 7 nodes at the same time?

On Apr 2, 2013, at 12:14 PM, Azuryy Yu azury...@gmail.com wrote: I can translate it to native English: how many nodes do you want to decommission?

On Tue, Apr 2, 2013 at 11:01 AM, Yanbo Liang yanboha...@gmail.com wrote: You want to decommission how many nodes?

2013/4/2 Henry JunYoung KIM henry.jy...@gmail.com 15 for datanodes and 3 for replication factor.

On Apr 1, 2013, at 3:23 PM, varun kumar varun@gmail.com wrote: How many nodes do you have, and what is the replication factor? -- Harsh J -- Harsh J