RE: Hadoop 2.6.0 Error

2015-03-25 Thread Brahma Reddy Battula
Instead of exporting JAVA_HOME in your own script, please set JAVA_HOME at the 
system level (for example, in /etc/profile).

For more details please check the following jira.

https://issues.apache.org/jira/browse/HADOOP-11538
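
A minimal sketch of such a system-level setting (assuming the OpenJDK 7 path from 
the quoted post below; adjust to your JDK location):

    # /etc/profile (or a file under /etc/profile.d/)
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    export PATH=$PATH:$JAVA_HOME/bin

Log in again (or run: source /etc/profile) so the new setting is picked up.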




Thanks & Regards

Brahma Reddy Battula


From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 11:23 AM
To: User Hadoop
Subject: Hadoop 2.6.0 Error

Dear All:

Requesting help/advice as I am unable to start Hadoop. I performed the following 
steps on Ubuntu 14.10:

1. ssh localhost
2. Did the following exports in a user-defined hadoop.sh and ran it successfully
1. EXPORT JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
2. EXPORT HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
3. EXPORT PATH=:$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
3. Tested hadoop version successfully
4. Ran $hadoop namenode -format successfully
5. Modified core-site.xml, hdfs-site.xml and mapred-site.xml to 
pseudo-distributed mode in /home/anand_vihar/conf directory
6. Ran $start-dfs.sh --config /home/anand_vihar/conf

Got error JAVA_HOME not set and slaves not found in /conf. If I echo $JAVA_HOME 
it is pointing to /usr/lib/jvm/java-7-openjdk-amd6, correctly as set. Help 
appreciated.

Thanks

Regards,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)
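
For context, a sketch of what the directory passed to --config is generally 
expected to contain (an assumption based on the errors reported above, not 
something spelled out in this thread):

    /home/anand_vihar/conf/
        core-site.xml
        hdfs-site.xml
        mapred-site.xml
        hadoop-env.sh    # the daemons read JAVA_HOME from here, not from the calling shell
        slaves           # list of worker hosts; just localhost for pseudo-distributed mode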


Hadoop 2.6.0 Error

2015-03-25 Thread Anand Murali
Dear All:
Requesting help/advice as I am unable to start Hadoop. I performed the following 
steps on Ubuntu 14.10:

1. ssh localhost
2. Did the following exports in a user-defined hadoop.sh and ran it successfully
    1. EXPORT JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    2. EXPORT HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
    3. EXPORT PATH=:$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
3. Tested hadoop version successfully
4. Ran $hadoop namenode -format successfully
5. Modified core-site.xml, hdfs-site.xml and mapred-site.xml to pseudo-distributed 
mode in the /home/anand_vihar/conf directory
6. Ran $start-dfs.sh --config /home/anand_vihar/conf

Got error JAVA_HOME not set and slaves not found in /conf. If I echo $JAVA_HOME 
it is pointing to /usr/lib/jvm/java-7-openjdk-amd6, correctly as set. Help 
appreciated.

Thanks

Regards,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)

Re: Hadoop 2.6.0 Error

2015-03-25 Thread Anand Murali
Dear All:
I get this error; I shall try setting JAVA_HOME in .profile:
Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
cat: /home/anand_vihar/hadoop-2.6.0/conf/slaves: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
anand_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$
Thanks
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


 On Wednesday, March 25, 2015 12:22 PM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:
   

Instead of exporting JAVA_HOME in your own script, please set JAVA_HOME at the 
system level (for example, in /etc/profile).

For more details please check the following jira.

https://issues.apache.org/jira/browse/HADOOP-11538



Thanks & Regards
Brahma Reddy Battula
From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 11:23 AM
To: User Hadoop
Subject: Hadoop 2.6.0 Error

Dear All:
Requesting help/advice as I am unable to start Hadoop. I performed the following 
steps on Ubuntu 14.10:

1. ssh localhost
2. Did the following exports in a user-defined hadoop.sh and ran it successfully
    1. EXPORT JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    2. EXPORT HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
    3. EXPORT PATH=:$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
3. Tested hadoop version successfully
4. Ran $hadoop namenode -format successfully
5. Modified core-site.xml, hdfs-site.xml and mapred-site.xml to pseudo-distributed 
mode in the /home/anand_vihar/conf directory
6. Ran $start-dfs.sh --config /home/anand_vihar/conf

Got error JAVA_HOME not set and slaves not found in /conf. If I echo $JAVA_HOME 
it is pointing to /usr/lib/jvm/java-7-openjdk-amd6, correctly as set. Help 
appreciated.

Thanks

Regards,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)

  

Re: Hadoop 2.6.0 Error

2015-03-25 Thread Azuryy Yu
Please also set the correct JAVA_HOME in hadoop-env.sh.
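
A minimal sketch of that change (assuming the OpenJDK 7 path from the original 
post and the default config location under the install directory):

    # /home/anand_vihar/hadoop-2.6.0/etc/hadoop/hadoop-env.sh
    # (or the hadoop-env.sh inside the directory passed to --config)
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64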


On Wed, Mar 25, 2015 at 1:53 PM, Anand Murali anand_vi...@yahoo.com wrote:

 Dear All:

 Requesting help/advice as I am unable to start Hadoop. I performed the following 
 steps on Ubuntu 14.10:

 1. ssh localhost
 2. Did the following exports in a user-defined hadoop.sh and ran it successfully
 1. EXPORT JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
 2. EXPORT HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
 3. EXPORT PATH=:$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
 3. Tested hadoop version successfully
 4. Ran $hadoop namenode -format successfully
 5. Modified core-site.xml, hdfs-site.xml and mapred-site.xml to
 pseudo-distributed mode in /home/anand_vihar/conf directory
 6. Ran $start-dfs.sh --config /home/anand_vihar/conf

 Got error JAVA_HOME not set and slaves not found in /conf. If I echo
 $JAVA_HOME it is pointing to /usr/lib/jvm/java-7-openjdk-amd6, correctly
 as set. Help appreciated.

 Thanks

 Regards,

 Anand Murali
 11/7, 'Anand Vihar', Kandasamy St, Mylapore
 Chennai - 600 004, India
 Ph: (044)- 28474593/ 43526162 (voicemail)



RE: Hadoop 2.6.0 Error

2015-03-25 Thread Brahma Reddy Battula
Hi,

Ideally it should take effect if you configure it in .profile or hadoop-env.sh.

Since you said you set it in .profile (I hope you also ran source ~/.profile), 
did you verify that it took effect (for example, by checking echo $JAVA_HOME or 
jps)?
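
For example, a quick check along those lines (assuming a bash login shell):

    source ~/.profile
    echo $JAVA_HOME    # should print /usr/lib/jvm/java-7-openjdk-amd64
    jps                # should list the Java daemons once they are running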




Thanks  Regards

Brahma Reddy Battula





From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 1:30 PM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

Dear All:

Even after setting JAVA_HOME in .profile I still get the error:

JAVA_HOME is not set and could not be found.

If any of you know of a more stable version, please do let me know.

Thanks,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)



On Wednesday, March 25, 2015 12:57 PM, Anand Murali anand_vi...@yahoo.com 
wrote:


Dear Mr. Brahma Reddy:

Should I type

SET JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64


in root (/etc/profile) or at user level (.profile)? Reply most welcome.

Thanks

Regards


Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)



On Wednesday, March 25, 2015 12:37 PM, Anand Murali anand_vi...@yahoo.com 
wrote:


Dear All:

I get this error; I shall try setting JAVA_HOME in .profile:

Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
cat: /home/anand_vihar/hadoop-2.6.0/conf/slaves: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
anand_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$

Thanks

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)



On Wednesday, March 25, 2015 12:22 PM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:


Instead of exporting JAVA_HOME in your own script, please set JAVA_HOME at the 
system level (for example, in /etc/profile).

For more details please check the following jira.

https://issues.apache.org/jira/browse/HADOOP-11538



Thanks & Regards
Brahma Reddy Battula


From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 11:23 AM
To: User Hadoop
Subject: Hadoop 2.6.0 Error

Dear All:

Requesting help/advice as I am unable to start Hadoop. I performed the following 
steps on Ubuntu 14.10:

1. ssh localhost
2. Did the following exports in a user-defined hadoop.sh and ran it successfully
1. EXPORT JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
2. EXPORT HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
3. EXPORT PATH=:$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
3. Tested hadoop version successfully
4. Ran $hadoop namenode -format successfully
5. Modified core-site.xml, hdfs-site.xml and mapred-site.xml to 
pseudo-distributed mode in /home/anand_vihar/conf directory
6. Ran $start-dfs.sh --config /home/anand_vihar/conf

Got error JAVA_HOME not set and slaves not found in /conf. If I echo $JAVA_HOME 
it is pointing to /usr/lib/jvm/java-7-openjdk-amd6, correctly as set. Help 
appreciated.

Thanks

Regards,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)








Re: Hadoop 2.6.0 Error

2015-03-25 Thread Anand Murali
Dear All:
Even after setting JAVA_HOME in .profile I still get the error:

JAVA_HOME is not set and could not be found.

If any of you know of a more stable version, please do let me know.
Thanks,
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


 On Wednesday, March 25, 2015 12:57 PM, Anand Murali 
anand_vi...@yahoo.com wrote:
   

 Dear Mr. Brahma Reddy:
Should I type
SET JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

in root (/etc/profile) or at user level (.profile)? Reply most welcome.
Thanks
Regards
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


 On Wednesday, March 25, 2015 12:37 PM, Anand Murali 
anand_vi...@yahoo.com wrote:
   

 Dear All:
I get this error; I shall try setting JAVA_HOME in .profile:
Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
cat: /home/anand_vihar/hadoop-2.6.0/conf/slaves: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
anand_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$
Thanks
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


 On Wednesday, March 25, 2015 12:22 PM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:
   

Instead of exporting JAVA_HOME in your own script, please set JAVA_HOME at the 
system level (for example, in /etc/profile).

For more details please check the following jira.

https://issues.apache.org/jira/browse/HADOOP-11538



Thanks & Regards
Brahma Reddy Battula
From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 11:23 AM
To: User Hadoop
Subject: Hadoop 2.6.0 Error

Dear All:
Requesting help/advice as I am unable to start Hadoop. I performed the following 
steps on Ubuntu 14.10:

1. ssh localhost
2. Did the following exports in a user-defined hadoop.sh and ran it successfully
    1. EXPORT JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    2. EXPORT HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
    3. EXPORT PATH=:$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
3. Tested hadoop version successfully
4. Ran $hadoop namenode -format successfully
5. Modified core-site.xml, hdfs-site.xml and mapred-site.xml to pseudo-distributed 
mode in the /home/anand_vihar/conf directory
6. Ran $start-dfs.sh --config /home/anand_vihar/conf

Got error JAVA_HOME not set and slaves not found in /conf. If I echo $JAVA_HOME 
it is pointing to /usr/lib/jvm/java-7-openjdk-amd6, correctly as set. Help 
appreciated.

Thanks

Regards,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)

   

   

  

Re: Hadoop 2.6.0 Error

2015-03-25 Thread Anand Murali
Dear Mr. Brahma Reddy:
Should I type
SET JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

in root (/etc/profile) or at user level (.profile)? Reply most welcome.
Thanks
Regards
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


 On Wednesday, March 25, 2015 12:37 PM, Anand Murali 
anand_vi...@yahoo.com wrote:
   

 Dear All:
I get this error; I shall try setting JAVA_HOME in .profile:
Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
cat: /home/anand_vihar/hadoop-2.6.0/conf/slaves: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
anand_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$
Thanks
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


 On Wednesday, March 25, 2015 12:22 PM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:
   

Instead of exporting JAVA_HOME in your own script, please set JAVA_HOME at the 
system level (for example, in /etc/profile).

For more details please check the following jira.

https://issues.apache.org/jira/browse/HADOOP-11538



Thanks & Regards
Brahma Reddy Battula
From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 11:23 AM
To: User Hadoop
Subject: Hadoop 2.6.0 Error

Dear All:
Requesting help/advice as I am unable to start Hadoop. I performed the following 
steps on Ubuntu 14.10:

1. ssh localhost
2. Did the following exports in a user-defined hadoop.sh and ran it successfully
    1. EXPORT JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    2. EXPORT HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
    3. EXPORT PATH=:$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
3. Tested hadoop version successfully
4. Ran $hadoop namenode -format successfully
5. Modified core-site.xml, hdfs-site.xml and mapred-site.xml to pseudo-distributed 
mode in the /home/anand_vihar/conf directory
6. Ran $start-dfs.sh --config /home/anand_vihar/conf

Got error JAVA_HOME not set and slaves not found in /conf. If I echo $JAVA_HOME 
it is pointing to /usr/lib/jvm/java-7-openjdk-amd6, correctly as set. Help 
appreciated.

Thanks

Regards,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)

   

  

Re: Hadoop 2.6.0 Error

2015-03-25 Thread Anand Murali
Mr. Reddy:

I tried both options, and it does not work; starting the daemons fails with the 
JAVA_HOME not set error. However, if I echo $JAVA_HOME it points to the RIGHT 
directory. Have you worked on a lower version of Hadoop? I see that on the 
website they just have 1.2, 2.5 and 2.6; 1.2 does not have YARN support. Is 2.5 
stable? Path errors are a major issue in Hadoop.
Thanks for your suggestions.
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


 On Wednesday, March 25, 2015 1:58 PM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:
   

 Hi,

Ideally it should take effect if you configure it in .profile or hadoop-env.sh.

Since you said you set it in .profile (I hope you also ran source ~/.profile), 
did you verify that it took effect (for example, by checking echo $JAVA_HOME or 
jps)?



Thanks & Regards
Brahma Reddy Battula
From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 1:30 PM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

Dear All:
Even after setting JAVA_HOME in .profile I still get the error:

JAVA_HOME is not set and could not be found.

If any of you know of a more stable version, please do let me know.
Thanks,
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


On Wednesday, March 25, 2015 12:57 PM, Anand Murali anand_vi...@yahoo.com 
wrote:


Dear Mr. Brahma Reddy:
Should I type
SET JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

in root (/etc/profile) or at user level (.profile)? Reply most welcome.
Thanks
Regards
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


On Wednesday, March 25, 2015 12:37 PM, Anand Murali anand_vi...@yahoo.com 
wrote:


Dear All:
I get this error; I shall try setting JAVA_HOME in .profile:
Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
cat: /home/anand_vihar/hadoop-2.6.0/conf/slaves: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
anand_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$
Thanks
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


On Wednesday, March 25, 2015 12:22 PM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:


Instead of exporting JAVA_HOME in your own script, please set JAVA_HOME at the 
system level (for example, in /etc/profile).

For more details please check the following jira.

https://issues.apache.org/jira/browse/HADOOP-11538



Thanks & Regards
Brahma Reddy Battula
From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 11:23 AM
To: User Hadoop
Subject: Hadoop 2.6.0 Error

Dear All:
Requesting help/advice as I am unable to start Hadoop. I performed the following 
steps on Ubuntu 14.10:

1. ssh localhost
2. Did the following exports in a user-defined hadoop.sh and ran it successfully
    1. EXPORT JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
    2. EXPORT HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
    3. EXPORT PATH=:$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
3. Tested hadoop version successfully
4. Ran $hadoop namenode -format successfully
5. Modified core-site.xml, hdfs-site.xml and mapred-site.xml to pseudo-distributed 
mode in the /home/anand_vihar/conf directory
6. Ran $start-dfs.sh --config /home/anand_vihar/conf

Got error JAVA_HOME not set and slaves not found in /conf. If I echo $JAVA_HOME 
it is pointing to /usr/lib/jvm/java-7-openjdk-amd6, correctly as set. Help 
appreciated.

Thanks

Regards,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)







  

Re: Hadoop 2.6.0 Error

2015-03-25 Thread Olivier Renault
It should be:
export JAVA_HOME=…

Olivier


From: Brahma Reddy Battula
Reply-To: user@hadoop.apache.org
Date: Wednesday, 25 March 2015 08:28
To: user@hadoop.apache.org, Anand Murali
Subject: RE: Hadoop 2.6.0 Error

Hi,

Ideally it should take effect if you configure it in .profile or hadoop-env.sh.

Since you said you set it in .profile (I hope you also ran source ~/.profile), 
did you verify that it took effect (for example, by checking echo $JAVA_HOME or 
jps)?




Thanks  Regards

Brahma Reddy Battula





From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 1:30 PM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

Dear All:

Even after setting JAVA_HOME in .profile I still get the error:

JAVA_HOME is not set and could not be found.

If any of you know of a more stable version, please do let me know.

Thanks,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)



On Wednesday, March 25, 2015 12:57 PM, Anand Murali 
anand_vi...@yahoo.com wrote:


Dear Mr. Brahma Reddy:

Should I type

SET JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64


in root (/etc/profile) or at user level (.profile)? Reply most welcome.

Thanks

Regards


Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)



On Wednesday, March 25, 2015 12:37 PM, Anand Murali 
anand_vi...@yahoo.com wrote:


Dear All:

I get this error; I shall try setting JAVA_HOME in .profile:

Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
cat: /home/anand_vihar/hadoop-2.6.0/conf/slaves: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
anand_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$

Thanks

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)



On Wednesday, March 25, 2015 12:22 PM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:


Instead of exporting JAVA_HOME in your own script, please set JAVA_HOME at the 
system level (for example, in /etc/profile).

For more details please check the following jira.

https://issues.apache.org/jira/browse/HADOOP-11538



Thanks & Regards
Brahma Reddy Battula


From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 11:23 AM
To: User Hadoop
Subject: Hadoop 2.6.0 Error

Dear All:

Requesting help/advice as I am unable to start Hadoop. I performed the following 
steps on Ubuntu 14.10:

1. ssh localhost
2. Did the following exports in a user-defined hadoop.sh and ran it successfully
1. EXPORT JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
2. EXPORT HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
3. EXPORT PATH=:$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
3. Tested hadoop version successfully
4. Ran $hadoop namenode -format successfully
5. Modified core-site.xml, hdfs-site.xml and mapred-site.xml to 
pseudo-distributed mode in /home/anand_vihar/conf directory
6. Ran $start-dfs.sh --config /home/anand_vihar/conf

Got error JAVA_HOME not set and slaves not found in /conf. If I echo $JAVA_HOME 
it is pointing to /usr/lib/jvm/java-7-openjdk-amd6, correctly as set. Help 
appreciated.

Thanks

Regards,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)








RE: Hadoop 2.6.0 Error

2015-03-25 Thread Alexandru Pacurar
Hello,

I had a similar problem and my solution to this was setting JAVA_HOME in 
/etc/environment.

The problem is, from what I remember, that the start-dfs.sh script calls 
hadoop-daemons.sh with the necessary options to start the Hadoop daemons. 
hadoop-daemons.sh in turn calls hadoop-daemon.sh with the necessary options via 
ssh, in a non-interactive fashion. When you are executing a command via ssh in 
a non-interactive manner (e.g. ssh host1 'ls -la') you have a minimal 
environment and you do not source the .profile file and other environment-related 
files. But /etc/environment is sourced, so you could set JAVA_HOME there. 
Technically you should set BASH_ENV there, which should point to a file 
containing the environment variables you need.

For more info see 
http://stackoverflow.com/questions/216202/why-does-an-ssh-remote-command-get-fewer-environment-variables-then-when-run-man,
 or man bash
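
As a sketch of that approach (assuming the JDK path from earlier in the thread; 
note that on Ubuntu /etc/environment takes plain KEY=value lines, not shell syntax):

    # /etc/environment
    JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

    # verify what a non-interactive ssh session actually sees
    ssh localhost 'echo $JAVA_HOME'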

Thank you,
Alex

From: Olivier Renault [mailto:orena...@hortonworks.com]
Sent: Wednesday, March 25, 2015 10:44 AM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

It should be:
export JAVA_HOME=…

Olivier


From: Brahma Reddy Battula
Reply-To: user@hadoop.apache.org
Date: Wednesday, 25 March 2015 08:28
To: user@hadoop.apache.org, Anand Murali
Subject: RE: Hadoop 2.6.0 Error

Hi,

Ideally it should take effect if you configure it in .profile or hadoop-env.sh.

Since you said you set it in .profile (I hope you also ran source ~/.profile), 
did you verify that it took effect (for example, by checking echo $JAVA_HOME or 
jps)?



Thanks  Regards

Brahma Reddy Battula






From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 1:30 PM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error
Dear All:

Even after setting JAVA_HOME in .profile I still get the error:

JAVA_HOME is not set and could not be found.

If any of you know of a more stable version, please do let me know.

Thanks,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


On Wednesday, March 25, 2015 12:57 PM, Anand Murali 
anand_vi...@yahoo.com wrote:

Dear Mr. Brahma Reddy:

Should I type

SET JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64


in root (/etc/profile) or at user level (.profile)? Reply most welcome.

Thanks

Regards


Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


On Wednesday, March 25, 2015 12:37 PM, Anand Murali 
anand_vi...@yahoo.com wrote:

Dear All:

I get this error; I shall try setting JAVA_HOME in .profile:

Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
cat: /home/anand_vihar/hadoop-2.6.0/conf/slaves: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
anand_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$

Thanks

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


On Wednesday, March 25, 2015 12:22 PM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:

Instead of exporting JAVA_HOME in your own script, please set JAVA_HOME at the 
system level (for example, in /etc/profile).

For more details please check the following jira.

https://issues.apache.org/jira/browse/HADOOP-11538


Thanks & Regards
Brahma Reddy Battula


From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 11:23 AM
To: User Hadoop
Subject: Hadoop 2.6.0 Error
Dear All:

Requesting help/advice as I am unable to start Hadoop. I performed the following 
steps on Ubuntu 14.10:

1. ssh localhost
2. Did the following exports in a user-defined hadoop.sh and ran it successfully
1. EXPORT JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
2. EXPORT HADOOP_INSTALL=/home/anand_vihar/hadoop-2.6.0
3. EXPORT PATH=:$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
3. Tested hadoop version successfully
4. Ran $hadoop namenode -format successfully
5. Modified core-site.xml, hdfs-site.xml and mapred-site.xml to 
pseudo-distributed mode in /home/anand_vihar/conf directory
6. Ran $start-dfs.sh --config /home/anand_vihar/conf

Got error JAVA_HOME not set and slaves not found in /conf. If I echo $JAVA_HOME 
it is pointing to 

Re: Trusted-realm vs default-realm kerberos issue

2015-03-25 Thread Alexander Alten-Lorenz
Do you have mapping rules which tell Hadoop that the trusted realm is allowed to 
log in?
http://mapredit.blogspot.de/2015/02/hadoop-and-trusted-mitv5-kerberos-with.html

BR,
 Alex
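
For reference, a sketch of the kind of rule the linked post describes, set via 
hadoop.security.auth_to_local in core-site.xml (TRUSTED.EXAMPLE.COM is a 
placeholder realm, not taken from this thread):

    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[1:$1@$0](.*@TRUSTED\.EXAMPLE\.COM)s/@.*//
        RULE:[2:$1@$0](.*@TRUSTED\.EXAMPLE\.COM)s/@.*//
        DEFAULT
      </value>
    </property>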


 On 24 Mar 2015, at 18:21, Michael Segel michael_se...@hotmail.com wrote:
 
 So… 
 
 If I understand, you’re saying you have a one way trust set up so that the 
 cluster’s AD trusts the Enterprise AD? 
 
 And by AD you really mean KDC? 
 
 On Mar 17, 2015, at 2:22 PM, John Lilley john.lil...@redpoint.net wrote:
 
 AD
 
 The opinions expressed here are mine, while they may reflect a cognitive 
 thought, that is purely accidental. 
 Use at your own risk. 
 Michael Segel
 michael_segel (AT) hotmail.com
 
 
 
 
 



Re: Hadoop 2.6.0 Error

2015-03-25 Thread Anand Murali
Dear Mr. RagavendraGanesh:

I did first try setting JAVA_HOME to the JDK path in hadoop-env.sh. It still 
does not work. I have been looking for lower versions of Hadoop on Apache, but 
they have removed them except 1.x, which is antiquated. Suggestions welcome.
Thanks
Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)


 On Wednesday, March 25, 2015 5:33 PM, hadoop.supp...@visolve.com wrote:
   

Hello Anand,

Set your Java home in hadoop-env.sh (/usr/local/hadoop/etc/hadoop/hadoop-env.sh):

export JAVA_HOME='/usr/lib/jvm/java-7-openjdk-amd64'

It would resolve your error.

Thanks,
S.RagavendraGanesh
ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com  email: servi...@visolve.com | Phone: 408-850-2243
From: Alexandru Pacurar [mailto:alexandru.pacu...@propertyshark.com] 
Sent: Wednesday, March 25, 2015 3:17 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop 2.6.0 Error

Hello,

I had a similar problem and my solution to this was setting JAVA_HOME in 
/etc/environment.

The problem is, from what I remember, that the start-dfs.sh script calls 
hadoop-daemons.sh with the necessary options to start the Hadoop daemons. 
hadoop-daemons.sh in turn calls hadoop-daemon.sh with the necessary options via 
ssh, in a non-interactive fashion. When you are executing a command via ssh in 
a non-interactive manner (e.g. ssh host1 'ls -la') you have a minimal 
environment and you do not source the .profile file and other environment-related 
files. But /etc/environment is sourced, so you could set JAVA_HOME there. 
Technically you should set BASH_ENV there, which should point to a file 
containing the environment variables you need.

For more info see 
http://stackoverflow.com/questions/216202/why-does-an-ssh-remote-command-get-fewer-environment-variables-then-when-run-man,
or man bash

Thank you,
Alex

From: Olivier Renault [mailto:orena...@hortonworks.com]
Sent: Wednesday, March 25, 2015 10:44 AM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

It should be:
export JAVA_HOME=…

Olivier

From: Brahma Reddy Battula
Reply-To: user@hadoop.apache.org
Date: Wednesday, 25 March 2015 08:28
To: user@hadoop.apache.org, Anand Murali
Subject: RE: Hadoop 2.6.0 Error

Hi,

Ideally it should take effect if you configure it in .profile or hadoop-env.sh.

Since you said you set it in .profile (I hope you also ran source ~/.profile), 
did you verify that it took effect (for example, by checking echo $JAVA_HOME or 
jps)?

Thanks & Regards
Brahma Reddy Battula

From: Anand Murali [anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 1:30 PM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

Dear All:

Even after setting JAVA_HOME in .profile I still get the error:

JAVA_HOME is not set and could not be found.

If any of you know of a more stable version, please do let me know.

Thanks,

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)

On Wednesday, March 25, 2015 12:57 PM, Anand Murali anand_vi...@yahoo.com wrote:

Dear Mr. Brahma Reddy:

Should I type

SET JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

in root (/etc/profile) or at user level (.profile)? Reply most welcome.

Thanks

Regards

Anand Murali
11/7, 'Anand Vihar', Kandasamy St, Mylapore
Chennai - 600 004, India
Ph: (044)- 28474593/ 43526162 (voicemail)

On Wednesday, March 25, 2015 12:37 PM, Anand Murali anand_vi...@yahoo.com wrote:

Dear All:

I get this

namenode recovery

2015-03-25 Thread Brian Jeltema
I have a question about a recovery scenario for Hadoop 2.4.

I have a small development cluster, no HA configured, that was taken down 
cleanly, 
that is, all services were stopped (via Ambari) and all the nodes were then 
rebooted.
However, the reboot of the namenode system failed; that system is completely 
dead.
The only HDFS service running on the system was the namenode; the secondary 
namenode
was running elsewhere and came back, as well as all of the datanodes. 

In this scenario, can I just start a namenode on one of the other nodes? Will 
it recover
the fsimage that was checkpointed by the secondary namenode? 

Thanks
Brian

can block size for namenode be different from datanode block size?

2015-03-25 Thread Dr Mich Talebzadeh

Hi,

The block size for HDFS is currently set to 128MB by default. This is
configurable.

My point is that I assume this parameter in hadoop-core.xml sets the
block size for both namenode and datanode. However, the storage and
random access for metadata in the namenode is different and suits smaller
block sizes.

For example, in Linux the OS block size is 4k, which means one HDFS block
of 128MB can hold 32K OS blocks. For metadata this may not be useful and a
smaller block size would be suitable, hence my question.

Thanks,

Mich
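
A sketch of how that default is usually changed (assuming 2.x property names; 
the cluster-wide default lives in hdfs-site.xml rather than a hadoop-core.xml):

    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value>  <!-- 128 MB; suffixed values such as 128m are also accepted -->
    </property>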



Re: can block size for namenode be different from datanode block size?

2015-03-25 Thread Mirko Kämpf
Hi Mich,

please see the comments in your text.



2015-03-25 15:11 GMT+00:00 Dr Mich Talebzadeh m...@peridale.co.uk:


 Hi,

 The block size for HDFS is currently set to 128MB by default. This is
 configurable.

Correct, an HDFS client can overwrite the cfg-property and define a
different block size for HDFS blocks.


 My point is that I assume this  parameter in hadoop-core.xml sets the
 block size for both namenode and datanode.

Correct, the block-size is a HDFS wide setting but in general the
HDFS-client makes the blocks.


 However, the storage and
 random access for metadata in the namenode is different and suits smaller
 block sizes.

HDFS blocksize has no impact here. NameNode metadata is held in memory. For
reliability it is dumped to local discs of the server.



 For example in Linux the OS block size is 4k which means one HDFS block
 size of 128MB can hold 32K OS blocks. For metadata this may not be
 useful and a smaller block size would be suitable, hence my question.

Remember, metadata is in memory. The fsimage-file, which contains the
metadata
is loaded on startup of the NameNode.

Please don't be confused by the two types of block sizes.

Hope this helps a bit.
Cheers,
Mirko
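
As a sketch of the client-side override mentioned above (paths are placeholders; 
-D is the generic option accepted by the FsShell):

    # write one file with a 64 MB block size instead of the cluster default
    hdfs dfs -D dfs.blocksize=67108864 -put localfile.dat /user/anand/localfile.dat

    # confirm the block size actually used
    hdfs fsck /user/anand/localfile.dat -files -blocks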



 Thanks,

 Mich



Re: can block size for namenode be different from wdatanode block size?

2015-03-25 Thread Mirko Kämpf
Correct. Let's say you run the NameNode with just 1GB of RAM.
That would be a very strong limitation for the cluster. For each file we
need about 200 bytes and for each block as well. Now we can estimate the
max. capacity depending on the HDFS block size and the average file size.

Cheers,
Mirko
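
A rough back-of-the-envelope version of that estimate (the ~200 bytes per object 
is the approximation used above, not an exact figure):

    # assume a 1 GB NameNode heap, ~200 bytes per file object and ~200 bytes per
    # block object, and files small enough to fit in a single 128 MB block each
    heap_bytes=$((1024 * 1024 * 1024))
    bytes_per_object=200
    objects=$((heap_bytes / bytes_per_object))   # ~5.3 million namespace objects
    files=$((objects / 2))                       # one file object + one block object per file
    echo "$files single-block files, at most"    # ~2.7 million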

2015-03-25 15:34 GMT+00:00 Mich Talebzadeh m...@peridale.co.uk:

 Hi Mirko,

 Thanks for feedback.

 Since i have worked with in memory databases, this metadata caching sounds
 more like an IMDB that caches data at start up from disk resident storage.

 IMDBs tend to get issues when the cache cannot hold all data. Is this the
 case the case with metada as well?

 Regards,

 Mich
 Let your email find you with BlackBerry from Vodafone
 --
 From: Mirko Kämpf mirko.kae...@gmail.com
 Date: Wed, 25 Mar 2015 15:20:03 +
 To: user@hadoop.apache.org
 ReplyTo: user@hadoop.apache.org
 Subject: Re: can block size for namenode be different from datanode
 block size?

 Hi Mich,

 please see the comments in your text.



 2015-03-25 15:11 GMT+00:00 Dr Mich Talebzadeh m...@peridale.co.uk:


 Hi,

 The block size for HDFS is currently set to 128MB by defauilt. This is
 configurable.

 Correct, an HDFS client can overwrite the cfg-property and define a
 different block size for HDFS blocks.


 My point is that I assume this  parameter in hadoop-core.xml sets the
 block size for both namenode and datanode.

 Correct, the block-size is a HDFS wide setting but in general the
 HDFS-client makes the blocks.


 However, the storage and
 random access for metadata in nsamenode is different and suits smaller
 block sizes.

 HDFS blocksize has no impact here. NameNode metadata is held in memory.
 For reliability it is dumped to local discs of the server.



 For example in Linux the OS block size is 4k which means one HTFS blopck
 size  of 128MB can hold 32K OS blocks. For metadata this may not be
 useful and smaller block size will be suitable and hence my question.

 Remember, metadata is in memory. The fsimage-file, which contains the
 metadata
 is loaded on startup of the NameNode.

 Please be not confused by the two types of block-sizes.

 Hope this helps a bit.
 Cheers,
 Mirko



 Thanks,

 Mich





Re: can block size for namenode be different from wdatanode block size?

2015-03-25 Thread Mich Talebzadeh
Hi Mirko,

Thanks for feedback.

Since i have worked with in memory databases, this metadata caching sounds more 
like an IMDB that caches data at start up from disk resident storage.

IMDBs tend to get issues when the cache cannot hold all data. Is this the case 
the case with metada as well?

Regards,

Mich
Let your email find you with BlackBerry from Vodafone

-Original Message-
From: Mirko Kämpf mirko.kae...@gmail.com
Date: Wed, 25 Mar 2015 15:20:03 
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Re: can block size for namenode be different from datanode block size?

Hi Mich,

please see the comments in your text.



2015-03-25 15:11 GMT+00:00 Dr Mich Talebzadeh m...@peridale.co.uk:


 Hi,

 The block size for HDFS is currently set to 128MB by defauilt. This is
 configurable.

Correct, an HDFS client can overwrite the cfg-property and define a
different block size for HDFS blocks.


 My point is that I assume this  parameter in hadoop-core.xml sets the
 block size for both namenode and datanode.

Correct, the block-size is a HDFS wide setting but in general the
HDFS-client makes the blocks.


 However, the storage and
 random access for metadata in nsamenode is different and suits smaller
 block sizes.

HDFS blocksize has no impact here. NameNode metadata is held in memory. For
reliability it is dumped to local discs of the server.



 For example in Linux the OS block size is 4k which means one HTFS blopck
 size  of 128MB can hold 32K OS blocks. For metadata this may not be
 useful and smaller block size will be suitable and hence my question.

Remember, metadata is in memory. The fsimage-file, which contains the
metadata
is loaded on startup of the NameNode.

Please be not confused by the two types of block-sizes.

Hope this helps a bit.
Cheers,
Mirko



 Thanks,

 Mich




Identifying new files in HDFS

2015-03-25 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi,

We have a requirement to process only new files in HDFS on a daily basis. I
am sure this is a general requirement in many ETL kind of processing
scenarios. Just wondering if there is a way to identify new files that are
added to a path in HDFS? For example, assume already some files were
present for sometime. Now I have added new files today. So wanted to
process only those new files. What is the best way to achieve this.

Thanks  Regards
Vijay

-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.


Identifying new files on HDFS

2015-03-25 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi,

We have a requirement to process only new files in HDFS on a daily basis. I
am sure this is a general requirement in many ETL kind of processing
scenarios. Just wondering if there is a way to identify new files that are
added to a path in HDFS? For example, assume already some files were
present for sometime. Now I have added new files today. So wanted to
process only those new files. What is the best way to achieve this.

Thanks  Regards
Vijay


*Vijay Bhoomireddy*, Big Data Architect

1000 Great West Road, Brentford, London, TW8 9DW
*T:  +44 20 3475 7980*
*M: **+44 7481 298 360*
*W: *ww http://www.whishworks.com/w.whishworks.com
http://www.whishworks.com/

https://www.linkedin.com/company/whishworks
http://www.whishworks.com/blog/  https://twitter.com/WHISHWORKS
https://www.facebook.com/whishworksit

-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.


Re: Identifying new files on HDFS

2015-03-25 Thread Mich Talebzadeh
Hi,

Have you considered taking a snapshot of the files at close of business, 
comparing it with the new snapshot, and processing only the new ones? Just a 
simple shell script will do.

HTH
Let your email find you with BlackBerry from Vodafone
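
A minimal sketch of that idea (directory and temp-file names are assumptions; it 
simply diffs yesterday's recursive listing against today's):

    #!/bin/bash
    DIR=/data/incoming                           # hypothetical HDFS input path
    touch /tmp/listing.yesterday                 # first run starts from an empty baseline
    hdfs dfs -ls -R "$DIR" | awk '{print $NF}' | sort > /tmp/listing.today
    # lines present only in today's listing are the new files
    comm -13 /tmp/listing.yesterday /tmp/listing.today > /tmp/new_files.txt
    mv /tmp/listing.today /tmp/listing.yesterday
    # hand /tmp/new_files.txt to the daily ETL job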

-Original Message-
From: Vijaya Narayana Reddy Bhoomi Reddy vijaya.bhoomire...@whishworks.com
Date: Wed, 25 Mar 2015 09:55:57 
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Identifying new files on HDFS

Hi,

We have a requirement to process only new files in HDFS on a daily basis. I
am sure this is a general requirement in many ETL kind of processing
scenarios. Just wondering if there is a way to identify new files that are
added to a path in HDFS? For example, assume already some files were
present for sometime. Now I have added new files today. So wanted to
process only those new files. What is the best way to achieve this.

Thanks  Regards
Vijay


*Vijay Bhoomireddy*, Big Data Architect

1000 Great West Road, Brentford, London, TW8 9DW
*T:  +44 20 3475 7980*
*M: **+44 7481 298 360*
*W: *ww http://www.whishworks.com/w.whishworks.com
http://www.whishworks.com/

https://www.linkedin.com/company/whishworks
http://www.whishworks.com/blog/  https://twitter.com/WHISHWORKS
https://www.facebook.com/whishworksit

-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.



Re: Intermittent BindException during long MR jobs

2015-03-25 Thread Krishna Rao
Thanks for the responses. In our case the port is 0, and so from the link
http://wiki.apache.org/hadoop/BindException Ted mentioned it says that a
collision is highly unlikely:

If the port is 0, then the OS is looking for any free port -so the
port-in-use and port-below-1024 problems are highly unlikely to be the
cause of the problem.

I think load may be the culprit since the nodes will be heavily used during
the times that the exception occurs.

Is there any way to set/increase the timeout for the call/connection
attempt? In all cases so far it seems to be on a call to delete a file in
HDFS. I had a search through the HDFS code base but couldn't see an obvious
way to set a timeout, and couldn't see it being set.

Krishna
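
For reference, the client-side IPC connect settings that appear relevant to the 
timeout question, set in core-site.xml (a sketch; verify the names and defaults 
against core-default.xml for the Hadoop version in use):

    <property>
      <name>ipc.client.connect.timeout</name>
      <value>20000</value>  <!-- connect timeout in milliseconds -->
    </property>
    <property>
      <name>ipc.client.connect.max.retries.on.timeouts</name>
      <value>45</value>     <!-- retries after a connect timeout -->
    </property>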

On 28 February 2015 at 15:20, Ted Yu yuzhih...@gmail.com wrote:

 Krishna:
 Please take a look at:
 http://wiki.apache.org/hadoop/BindException

 Cheers

 On Thu, Feb 26, 2015 at 10:30 PM, hadoop.supp...@visolve.com wrote:

 Hello Krishna,



 Exception seems to be IP specific. It might be occurred due to
 unavailability of IP address in the system to assign. Double check the IP
 address availability and run the job.



 Thanks,

 S.RagavendraGanesh

 ViSolve Hadoop Support Team
 ViSolve Inc. | San Jose, California
 Website: www.visolve.com

 email: servi...@visolve.com | Phone: 408-850-2243





 From: Krishna Rao [mailto:krishnanj...@gmail.com]
 Sent: Thursday, February 26, 2015 9:48 PM
 To: u...@hive.apache.org; user@hadoop.apache.org
 Subject: Intermittent BindException during long MR jobs



 Hi,



 we occasionally run into a BindException causing long running jobs to
 occasionally fail.



 The stacktrace is below.



 Any ideas what this could be caused by?



 Cheers,



 Krishna





 Stacktrace:

 379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task  - Job
 Submission failed with exception 'java.net.BindException(Problem binding to
  [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address;
  For more details see: http://wiki.apache.org/hadoop/BindException)'

 java.net.BindException: Problem binding to [back10/10.4.2.10:0]
 java.net.BindException: Cannot assign requested address; For more details
 see:  http://wiki.apache.org/hadoop/BindException

 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)

 at org.apache.hadoop.ipc.Client.call(Client.java:1242)

 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)

 at com.sun.proxy.$Proxy10.create(Unknown Source)

 at
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193)

 at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)

 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)

 at com.sun.proxy.$Proxy11.create(Unknown Source)

 at
 org.apache.hadoop.hdfs.DFSOutputStream.init(DFSOutputStream.java:1376)

 at
 org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395)

 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255)

 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212)

 at
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276)

 at
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265)

 at
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82)

 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)

 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869)

 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768)

 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757)

 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558)

 at
 org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96)

 at
 org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85)

 at
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517)

 at
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487)

 at
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)

 at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)

 at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:396)

 at
 

Re: Intermittent BindException during long MR jobs

2015-03-25 Thread Krishna Rao
Thanks for the responses. In our case the port is 0, and so from the 
link http://wiki.apache.org/hadoop/BindException Ted mentioned it says that a 
collision is highly unlikely:

If the port is 0, then the OS is looking for any free port -so the 
port-in-use and port-below-1024 problems are highly unlikely to be the cause of 
the problem.

I think load may be the culprit since the nodes will be heavily used during the 
times that the exception occurs.

Is there any way to set/increase the timeout for the call/connection attempt? In 
all cases so far it seems to be on a call to delete a file in HDFS. I had a 
search through the HDFS code base but couldn't see an obvious way to set a 
timeout, and couldn't see it being set.


Krishna


On 28 February 2015 at 15:20, Ted Yu yuzhih...@gmail.com wrote:
Krishna:
Please take a look at:
http://wiki.apache.org/hadoop/BindException

Cheers

On Thu, Feb 26, 2015 at 10:30 PM, hadoop.supp...@visolve.com wrote:
Hello Krishna,

Exception seems to be IP specific. It might be occurred due to unavailability 
of IP address in the system to assign. Double check the IP address availability 
and run the job.

Thanks,
S.RagavendraGanesh
ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com
email: servi...@visolve.com | Phone: 408-850-2243


From: Krishna Rao [mailto:krishnanj...@gmail.com]
Sent: Thursday, February 26, 2015 9:48 PM
To: u...@hive.apache.org; user@hadoop.apache.org
Subject: Intermittent BindException during long MR jobs

Hi,

we occasionally run into a BindException causing long running jobs to 
occasionally fail.

The stacktrace is below.

Any ideas what this could be caused by?

Cheers,

Krishna


Stacktrace:
379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task  - Job Submission 
failed with exception 'java.net.BindException(Problem binding to 
[back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; 
For more details see: http://wiki.apache.org/hadoop/BindException)'
java.net.BindException: Problem binding to [back10/10.4.2.10:0] 
java.net.BindException: Cannot assign requested address; For more details see: 
http://wiki.apache.org/hadoop/BindException
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)
at org.apache.hadoop.ipc.Client.call(Client.java:1242)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy10.create(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193)
at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy11.create(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream.init(DFSOutputStream.java:1376)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558)
at 
org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96)
at 
org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517)
at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)
at java.security.AccessController.doPrivileged(Native 

Re: can block size for namenode be different from wdatanode block size?

2015-03-25 Thread Mirko Kämpf
These 200 bytes are just a mental helper, not a precise measure, and they do
NOT take replication into account.
Each replica of a block has another item of approx. 200 bytes in the NameNode
memory.
MK


2015-03-25 17:16 GMT+00:00 Mich Talebzadeh m...@peridale.co.uk:

 Great. Does that 200 bytes for each block include overhead for three
 replicas? So with 128MB block a 1GB file will be 8 blocks with 200 + 8x200
 around 1800 bytes memory in namenode?

 Thx
 Let your email find you with BlackBerry from Vodafone
 --
 From: Mirko Kämpf mirko.kae...@gmail.com
 Date: Wed, 25 Mar 2015 16:08:02 +
 To: user@hadoop.apache.org; m...@peridale.co.uk
 ReplyTo: user@hadoop.apache.org
 Subject: Re: can block size for namenode be different from wdatanode
 block size?

 Correct, let's say you run the NameNode with just 1GB of RAM.
 This would be a very strong limitation for the cluster. For each file we
 need about 200 bytes and for each block as well. Now we can estimate the
 max. capacity depending on HDFS-Blocksize and average File size.

 Cheers,
 Mirko

 2015-03-25 15:34 GMT+00:00 Mich Talebzadeh m...@peridale.co.uk:

 Hi Mirko,

 Thanks for feedback.

 Since i have worked with in memory databases, this metadata caching
 sounds more like an IMDB that caches data at start up from disk resident
 storage.

 IMDBs tend to get issues when the cache cannot hold all data. Is this the
 case the case with metada as well?

 Regards,

 Mich
 Let your email find you with BlackBerry from Vodafone
 --
 *From: * Mirko Kämpf mirko.kae...@gmail.com
 *Date: *Wed, 25 Mar 2015 15:20:03 +
 *To: *user@hadoop.apache.orguser@hadoop.apache.org
 *ReplyTo: * user@hadoop.apache.org
 *Subject: *Re: can block size for namenode be different from datanode
 block size?

 Hi Mich,

 please see the comments in your text.



 2015-03-25 15:11 GMT+00:00 Dr Mich Talebzadeh m...@peridale.co.uk:


 Hi,

 The block size for HDFS is currently set to 128MB by defauilt. This is
 configurable.

 Correct, an HDFS client can overwrite the cfg-property and define a
 different block size for HDFS blocks.


 My point is that I assume this  parameter in hadoop-core.xml sets the
 block size for both namenode and datanode.

 Correct, the block-size is a HDFS wide setting but in general the
 HDFS-client makes the blocks.


 However, the storage and
 random access for metadata in nsamenode is different and suits smaller
 block sizes.

 HDFS blocksize has no impact here. NameNode metadata is held in memory.
 For reliability it is dumped to local discs of the server.



 For example, in Linux the OS block size is 4k, which means one HDFS block
 of 128MB can hold 32K OS blocks. For metadata this may not be
 useful, and a smaller block size would be more suitable, hence my question.

 Remember, metadata is in memory. The fsimage-file, which contains the
 metadata
 is loaded on startup of the NameNode.

 Please don't be confused by the two types of block sizes.

 Hope this helps a bit.
 Cheers,
 Mirko



 Thanks,

 Mich






Re: can block size for namenode be different from datanode block size?

2015-03-25 Thread Mich Talebzadeh
Great. Does that 200 bytes for each block include the overhead for three replicas?
So with a 128MB block size, a 1GB file will be 8 blocks, i.e. 200 + 8x200 = around 1800
bytes of memory in the namenode?

Thx
Let your email find you with BlackBerry from Vodafone

-Original Message-
From: Mirko Kämpf mirko.kae...@gmail.com
Date: Wed, 25 Mar 2015 16:08:02 
To: user@hadoop.apache.orguser@hadoop.apache.org; m...@peridale.co.uk
Reply-To: user@hadoop.apache.org
Subject: Re: can block size for namenode be different from datanode block size?

Correct, let's say you run the NameNode with just 1GB of RAM.
This would be a very strong limitation for the cluster. For each file we
need about 200 bytes and for each block as well. Now we can estimate the
max. capacity depending on HDFS-Blocksize and average File size.

Cheers,
Mirko

2015-03-25 15:34 GMT+00:00 Mich Talebzadeh m...@peridale.co.uk:

 Hi Mirko,

 Thanks for feedback.

 Since I have worked with in-memory databases, this metadata caching sounds
 more like an IMDB that caches data at startup from disk-resident storage.

 IMDBs tend to run into issues when the cache cannot hold all the data. Is this
 the case with metadata as well?

 Regards,

 Mich
 Let your email find you with BlackBerry from Vodafone
 --
 *From: * Mirko Kämpf mirko.kae...@gmail.com
 *Date: *Wed, 25 Mar 2015 15:20:03 +
 *To: *user@hadoop.apache.orguser@hadoop.apache.org
 *ReplyTo: * user@hadoop.apache.org
 *Subject: *Re: can block size for namenode be different from datanode
 block size?

 Hi Mich,

 please see the comments in your text.



 2015-03-25 15:11 GMT+00:00 Dr Mich Talebzadeh m...@peridale.co.uk:


 Hi,

 The block size for HDFS is currently set to 128MB by default. This is
 configurable.

 Correct, an HDFS client can overwrite the cfg-property and define a
 different block size for HDFS blocks.


 My point is that I assume this  parameter in hadoop-core.xml sets the
 block size for both namenode and datanode.

 Correct, the block-size is a HDFS wide setting but in general the
 HDFS-client makes the blocks.


 However, the storage and
 random access for metadata in the namenode is different and suits smaller
 block sizes.

 HDFS blocksize has no impact here. NameNode metadata is held in memory.
 For reliability it is dumped to local discs of the server.



 For example, in Linux the OS block size is 4k, which means one HDFS block
 of 128MB can hold 32K OS blocks. For metadata this may not be
 useful, and a smaller block size would be more suitable, hence my question.

 Remember, metadata is in memory. The fsimage-file, which contains the
 metadata
 is loaded on startup of the NameNode.

 Please don't be confused by the two types of block sizes.

 Hope this helps a bit.
 Cheers,
 Mirko



 Thanks,

 Mich






Re: can block size for namenode be different from datanode block size?

2015-03-25 Thread Ravi Prakash
Hi Mich!

The block size you are referring to is used only on the datanodes. The file 
that the namenode writes (fsimage OR editlog) is not chunked using this block 
size.
HTH
Ravi
 


 On Wednesday, March 25, 2015 8:12 AM, Dr Mich Talebzadeh 
m...@peridale.co.uk wrote:
   

 
Hi,

The block size for HDFS is currently set to 128MB by default. This is
configurable.

My point is that I assume this  parameter in hadoop-core.xml sets the
block size for both namenode and datanode. However, the storage and
random access for metadata in the namenode is different and suits smaller
block sizes.

For example, in Linux the OS block size is 4k, which means one HDFS block
of 128MB can hold 32K OS blocks. For metadata this may not be
useful, and a smaller block size would be more suitable, hence my question.

Thanks,

Mich

  

Re: Can block size for namenode be different from datanode block size?

2015-03-25 Thread Harsh J
 2. The block size is only relevant to DataNodes (DN). NameNode (NN)
does not use this parameter

Actually, as a configuration, it's only relevant to the client. See also
http://www.quora.com/How-do-I-check-HDFS-blocksize-default-custom

Other points sound about right, except that (7) can now only be done if you
have the legacy mode of fsimage writes enabled. The new OIV tool in recent
releases only serves a REST-based web server for querying the file data.
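
For instance (a sketch; the path, file names and block size here are only examples), a client can both inspect and override the block size per command:

# show the block size recorded for an existing file, in bytes
hadoop fs -stat %o /user/hadoop/somefile

# write a file with a 256MB block size, overriding dfs.blocksize for this command only
hadoop fs -D dfs.blocksize=268435456 -put somefile /user/hadoop/somefile_256m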

On Thu, Mar 26, 2015 at 1:47 AM, Mich Talebzadeh m...@peridale.co.uk
wrote:

 Thank you all for your contribution.



 I have summarised the findings as below



 1. The Hadoop block size is a configurable parameter dfs.block.size
 in bytes . By default this is set to 134217728 bytes or 128MB

 2. The block size is only relevant to DataNodes (DN). NameNode (NN)
 does not use this parameter

 3. NN behaves like an in-memory database (IMDB) and uses an on-disk file
 called the FsImage to load the metadata at startup. This is the only
 place where I see value in Solid State Disks, to make this initial load faster

 4. For the remaining period, until HDFS is shut down, NN will
 use the in-memory cache to access metadata

 5. With regard to sizing of NN to store metadata, one can use the
 following rules of thumb (heuristics):

 a. NN consumes roughly 1GB for every 1 million blocks (source Hadoop
 Operations, Eric Sammer, ISBN: 978-1-499-3205-7). So if you have 128MB
 block size, you can store  128 * 1E6 / (3 *1024) = 41,666GB of data for
 every 1GB. Number 3 comes from the fact that the block is replicated three
 times. In other words just under 42TB of data. So if you have 10GB of
 namenode cache, you can have up to 420TB of data on your datanodes

 6. You can take FsImage file from Hadoop and convert it into a text
 file as follows:



 *hdfs dfsadmin -fetchImage nnimage*



 15/03/25 20:17:40 WARN util.NativeCodeLoader: Unable to load native-hadoop
 library for your platform... using builtin-java classes where applicable

 15/03/25 20:17:41 INFO namenode.TransferFsImage: Opening connection to
 http://rhes564:50070/imagetransfer?getimage=1txid=latest

 15/03/25 20:17:41 INFO namenode.TransferFsImage: Image Transfer timeout
 configured to 6 milliseconds

 15/03/25 20:17:41 WARN namenode.TransferFsImage: Overwriting existing file
 nnimage with file downloaded from
 http://rhes564:50070/imagetransfer?getimage=1txid=latest

 15/03/25 20:17:41 INFO namenode.TransferFsImage: Transfer took 0.03s at
 1393.94 KB/s



 7. That creates an image file in the current directory that can be
 converted to a text file

 *hdfs  oiv -i nnimage -o nnimage.txt*



 15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loading 2 strings

 15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loading 543
 inodes.

 15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loading inode
 references

 15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loaded 0 inode
 references

 15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loading inode
 directory section

 15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loaded 198
 directories

 15/03/25 20:20:07 INFO offlineImageViewer.WebImageViewer: WebImageViewer
 started. Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer.



 Let me know if I missed  anything or got it wrong.



 HTH



 Mich Talebzadeh



 http://talebzadehmich.wordpress.com



 *Publications due shortly:*

 *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
 Coherence Cache*



 NOTE: The information in this email is proprietary and confidential. This
 message is for the designated recipient only, if you are not the intended
 recipient, you should destroy it immediately. Any information in this
 message shall not be understood as given or endorsed by Peridale Ltd, its
 subsidiaries or their employees, unless expressly so stated. It is the
 responsibility of the recipient to ensure that this email is virus free,
 therefore neither Peridale Ltd, its subsidiaries nor their employees accept
 any responsibility.






-- 
Harsh J


Swap requirements

2015-03-25 Thread Abdul I Mohammed
Hello all,

I am trying to figure out what the swap requirements are for name nodes and data 
nodes. Some vendor docs that I read say to set vm.swappiness to 0, which is 
telling the kernel not to use swap?

What are the default settings the community is using for their clusters?

Also, does the below parameter use swap memory or regular memory in YARN?
yarn.nodemanager.vmem-pmem-ratio

RE: Swap requirements

2015-03-25 Thread Mich Talebzadeh
Hi,

I do not think DataNodes (DN) require any swapping, as swap is only used for 
paging when running out of memory.

NameNodes (NN) behave like an in-memory database, IMDB (like Oracle TimesTen), 
and use an on-disk file called the FsImage to load the metadata at startup. 
For the remaining period, until HDFS is shut down, the NN will use the in-memory 
cache to access metadata. The only occasion I see the swap space being used is 
when the NN runs out of memory and starts swapping to disk. That is using swap 
space like memory, which is undesirable. Not very clear to me.

HTH

Mich Talebzadeh

http://talebzadehmich.wordpress.com

Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.


-Original Message-
From: Abdul I Mohammed [mailto:oracle.bl...@gmail.com] 
Sent: 25 March 2015 20:45
To: HDP mailing list
Subject: Swap requirements

Hello all,

I am trying to figure out what are the swap requirements for name node and data 
nodes?  Some vendor docs that I read says to set vm.swappiness to 0 which is 
telling kernel to not to use swap?  

What is the default settings the community is using for there clusters?  

Also does the below parameter uses swap memory or regular memory in yarn?
Yarn.nodemanager.Vmem-pmem-ratio



Re: Identifying new files on HDFS

2015-03-25 Thread Harsh J
Look at the timestamps of the files? HDFS maintains both mtimes and atimes
(the latter is not exposed in -ls though).

In an ETL context, a simple workflow system also resolves this. You have an
incoming directory, a done directory, a destination directory, etc., and
you can move files around pre/post processing for every job, to manage new
content and avoid repeated processing (as one simple example).
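
A minimal sketch of the listing/timestamp approach (the path and file names are only examples; it assumes paths without spaces and that you touch listing.yesterday before the first run):

# capture today's recursive listing of the landing directory (regular files only)
hdfs dfs -ls -R /data/incoming | awk '$1 ~ /^-/ {print $NF}' | sort > listing.today
# paths present today but not yesterday are the new files to process
comm -13 listing.yesterday listing.today > new_files.txt
# rotate the listing for the next run
mv listing.today listing.yesterday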

On Wed, Mar 25, 2015 at 11:11 PM, Mich Talebzadeh m...@peridale.co.uk
wrote:

 Hi,

 Have you considered taking a snapshot of the files at close of business,
 comparing it with the new snapshot, and processing only the new ones? A simple
 shell script will do.

 HTH
 Let your email find you with BlackBerry from Vodafone
 --
 *From: * Vijaya Narayana Reddy Bhoomi Reddy 
 vijaya.bhoomire...@whishworks.com
 *Date: *Wed, 25 Mar 2015 09:55:57 +
 *To: *user@hadoop.apache.org
 *ReplyTo: * user@hadoop.apache.org
 *Subject: *Identifying new files on HDFS

 Hi,

 We have a requirement to process only new files in HDFS on a daily basis.
 I am sure this is a general requirement in many ETL kinds of processing
 scenarios. Just wondering if there is a way to identify new files that
 are added to a path in HDFS? For example, assume some files were already
 present for some time. Now I have added new files today, so I want to
 process only those new files. What is the best way to achieve this?

 Thanks  Regards
 Vijay


 *Vijay Bhoomireddy*, Big Data Architect

 1000 Great West Road, Brentford, London, TW8 9DW
 *T: * +44 20 3475 7980
 *M: * +44 7481 298 360
 *W: * www.whishworks.com

 The contents of this e-mail are confidential and for the exclusive use of
 the intended recipient. If you receive this e-mail in error please delete
 it from your system immediately and notify us either by e-mail or
 telephone. You should not copy, forward or otherwise disclose the content
 of the e-mail. The views expressed in this communication may not
 necessarily be the view held by WHISHWORKS.




-- 
Harsh J


Can block size for namenode be different from datanode block size?

2015-03-25 Thread Mich Talebzadeh
Thank you all for your contribution.

 

I have summarised the findings as below

 

1. The Hadoop block size is a configurable parameter dfs.block.size in 
bytes . By default this is set to 134217728 bytes or 128MB

2. The block size is only relevant to DataNodes (DN). NameNode (NN) does 
not use this parameter

3. NN behaves like an in-memory database (IMDB) and uses an on-disk file 
called the FsImage to load the metadata at startup. This is the only place where 
I see value in Solid State Disks, to make this initial load faster

4. For the remaining period, until HDFS is shut down, NN will use 
the in-memory cache to access metadata

5. With regard to sizing of NN to store metadata, one can use the following 
rules of thumb (heuristics):

a. NN consumes roughly 1GB for every 1 million blocks (source Hadoop 
Operations, Eric Sammer, ISBN: 978-1-499-3205-7). So if you have 128MB block 
size, you can store  128 * 1E6 / (3 *1024) = 41,666GB of data for every 1GB. 
Number 3 comes from the fact that the block is replicated three times. In other 
words just under 42TB of data. So if you have 10GB of namenode cache, you can 
have up to 420TB of data on your datanodes

6. You can take FsImage file from Hadoop and convert it into a text file as 
follows:

 

hdfs dfsadmin -fetchImage nnimage

 

15/03/25 20:17:40 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

15/03/25 20:17:41 INFO namenode.TransferFsImage: Opening connection to 
http://rhes564:50070/imagetransfer?getimage=1txid=latest

15/03/25 20:17:41 INFO namenode.TransferFsImage: Image Transfer timeout 
configured to 6 milliseconds

15/03/25 20:17:41 WARN namenode.TransferFsImage: Overwriting existing file 
nnimage with file downloaded from 
http://rhes564:50070/imagetransfer?getimage=1txid=latest

15/03/25 20:17:41 INFO namenode.TransferFsImage: Transfer took 0.03s at 1393.94 
KB/s

 

7. That creates an image file in the current directory that can be converted 
to a text file (a sketch of an alternative, using the XML processor, follows the output below)

hdfs  oiv -i nnimage -o nnimage.txt

 

15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loading 2 strings

15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loading 543 inodes.

15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loading inode 
references

15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loaded 0 inode 
references

15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loading inode 
directory section

15/03/25 20:20:07 INFO offlineImageViewer.FSImageHandler: Loaded 198 directories

15/03/25 20:20:07 INFO offlineImageViewer.WebImageViewer: WebImageViewer 
started. Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer.
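
As a side note to point 7 (a sketch; the image file name is just the one fetched above): on recent releases the OIV tool can also write a flat XML dump directly, instead of serving the image through the WebImageViewer, by selecting the XML processor:

hdfs oiv -p XML -i nnimage -o nnimage.xml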

 

Let me know if I missed  anything or got it wrong.

 

HTH

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.

 



RE: Swap requirements

2015-03-25 Thread Mich Talebzadeh
Yes I believe that is the case.

 

This is very common from the days of max shared memory settings on Solaris etc. Large 
applications tend to have processes with large virtual address spaces. This is 
typically the result of attaching to large shared memory segments used by 
applications and large copy-on-write (COW) segments that get mapped but 
sometimes never actually get touched. The net effect is that on a 
host supporting multiple applications, the virtual address space requirements 
grow quite large, typically exceeding the physical memory. 
Consequently, a fair amount of swap disk needs to be configured to support 
these applications with large virtual address spaces running concurrently. In 
the old days this would typically be 1.2x the shared memory segment or RAM.

 

 

HTH

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries 
or their employees, unless expressly so stated. It is the responsibility of the 
recipient to ensure that this email is virus free, therefore neither Peridale 
Ltd, its subsidiaries nor their employees accept any responsibility.

 

From: max scalf [mailto:oracle.bl...@gmail.com] 
Sent: 25 March 2015 23:05
To: user@hadoop.apache.org
Subject: Re: Swap requirements

 

Thank you Harsh. Can you please explain what you mean when you said just simple 
virtual memory used by the process? Doesn't virtual memory mean swap?

On Wednesday, March 25, 2015, Harsh J ha...@cloudera.com wrote:

The suggestion (regarding swappiness) is not for disabling swap as much as it 
is to 'not using swap (until really necessary)'. When you run a constant 
memory-consuming service such as HBase you'd ideally want the RAM to serve up 
as much as it can, which setting that swappiness value helps do (the OS 
otherwise begins swapping way before its available physical RAM is nearing full 
state).

 

The vmem-pmem ratio is something entirely else. The vmem of a process does not 
mean swap space usage, just simple virtual memory used by the process. I'd recommend 
disabling YARN's vmem checks on today's OSes (but keep pmem checks on). You can 
read some more on this at 
http://www.quora.com/Why-do-some-applications-use-significantly-more-virtual-memory-on-RHEL-6-compared-to-RHEL-5

 

On Thu, Mar 26, 2015 at 3:37 AM, Abdul I Mohammed oracle.bl...@gmail.com wrote:

Thanks Mith...any idea about Yarn.nodemanager.Vmem-pmem-ratio parameter...

If data nodes does not require swap then what about the above parameter?  What 
is that used for in yarn?





 

-- 

Harsh J



Re: Swap requirements

2015-03-25 Thread Harsh J
The suggestion (regarding swappiness) is not for disabling swap as much as
it is to 'not using swap (until really necessary)'. When you run a constant
memory-consuming service such as HBase you'd ideally want the RAM to serve
up as much as it can, which setting that swappiness value helps do (the OS
otherwise begins swapping way before its available physical RAM is nearing
full state).

The vmem-pmem ratio is something entirely else. The vmem of a process does
not mean swap space usage, just simple virtual memory used by the process.
I'd recommend disabling YARN's vmem checks on today's OSes (but keep pmem
checks on). You can read some more on this at
http://www.quora.com/Why-do-some-applications-use-significantly-more-virtual-memory-on-RHEL-6-compared-to-RHEL-5
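
As a concrete illustration (a sketch for a typical Linux worker node; use the value your kernel and vendor guidance suggest, usually 0 or 1):

# check the current value
cat /proc/sys/vm/swappiness
# lower it for the running system
sudo sysctl -w vm.swappiness=1
# persist it across reboots
echo 'vm.swappiness = 1' | sudo tee -a /etc/sysctl.conf

The vmem side of the question is governed by yarn.nodemanager.vmem-pmem-ratio and can be switched off entirely with yarn.nodemanager.vmem-check-enabled=false in yarn-site.xml; neither has anything to do with swap.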

On Thu, Mar 26, 2015 at 3:37 AM, Abdul I Mohammed oracle.bl...@gmail.com
wrote:

 Thanks Mith...any idea about Yarn.nodemanager.Vmem-pmem-ratio parameter...

 If data nodes does not require swap then what about the above parameter?
 What is that used for in yarn?




-- 
Harsh J


Re: namenode recovery

2015-03-25 Thread Harsh J
Not automatically. You'd need to copy over the VERSION and fsimage* files
from your SecondaryNN's checkpoint directory over to the new NameNode's
configured name directory to start it back up with the checkpointed data.
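
Roughly along these lines (a sketch only; the hostname and directories are examples and must match your dfs.namenode.checkpoint.dir and dfs.namenode.name.dir settings):

# on the SecondaryNN host: copy the checkpointed metadata to the new NameNode
scp /data/dfs/namesecondary/current/VERSION \
    /data/dfs/namesecondary/current/fsimage* \
    newnn-host:/data/dfs/nn/current/
# on the new NameNode host: bring the NameNode up against that name directory
sbin/hadoop-daemon.sh start namenode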

On Wed, Mar 25, 2015 at 5:40 PM, Brian Jeltema 
brian.jelt...@digitalenvoy.net wrote:

 I have a question about a recovery scenario for Hadoop 2.4.

 I have a small development cluster, no HA configured, that was taken down
 cleanly,
 that is, all services were stopped (via Ambari) and all the nodes were
 then rebooted.
 However, the reboot of the namenode system failed; that system is
 completely dead.
 The only HDFS service running on the system was the namenode; the
 secondary namenode
 was running elsewhere and came back, as well as all of the datanodes.

 In this scenario, can I just start a namenode on one of the other nodes?
 Will it recover
 the fsimage that was checkpointed by the secondary namenode?

 Thanks
 Brian




-- 
Harsh J


Re: Swap requirements

2015-03-25 Thread max scalf
Thank you Harsh. Can you please explain what you mean when you said just
simple virtual memory used by the process? Doesn't virtual memory mean
swap?

On Wednesday, March 25, 2015, Harsh J ha...@cloudera.com wrote:

 The suggestion (regarding swappiness) is not for disabling swap as much as
 it is to 'not using swap (until really necessary)'. When you run a constant
 memory-consuming service such as HBase you'd ideally want the RAM to serve
 up as much as it can, which setting that swappiness value helps do (the OS
 otherwise begins swapping way before its available physical RAM is nearing
 full state).

 The vmem-pmem ratio is something entirely else. The vmem of a process does
 not mean swap space usage, just simple virtual memory used by the process. I'd
 recommend disabling YARN's vmem checks on today's OSes (but keep pmem
 checks on). You can read some more on this at
 http://www.quora.com/Why-do-some-applications-use-significantly-more-virtual-memory-on-RHEL-6-compared-to-RHEL-5

 On Thu, Mar 26, 2015 at 3:37 AM, Abdul I Mohammed oracle.bl...@gmail.com wrote:

 Thanks Mith...any idea about Yarn.nodemanager.Vmem-pmem-ratio parameter...

 If data nodes does not require swap then what about the above parameter?
 What is that used for in yarn?




 --
 Harsh J



Re: Swap requirements

2015-03-25 Thread daemeon reiydelle
You will find much information if you Google configuring Linux page
stealing. This is actually the core of the problem with swap (and throwing
away pages of shared libraries). Or talk to your devops team about how to
avoid page stealing in systems with large memory storage footprints ...
especially as you may find java heaps in the 10-15gb are practicable with
tuning.



“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Mar 25, 2015 at 4:14 PM, Mich Talebzadeh m...@peridale.co.uk
wrote:

 Yes I believe that is the case.



 This is very common from days of Max shared memory on Solaris etc. Large
 applications tend to have processes with large virtual address spaces. This
 is typically the result of attaching to large shared memory segments used
 by applications and large copy-on-write (COW) segments that get mapped but
 sometimes never actually get touched. The net effect of this is that on the
 host supporting multiple applications, the virtual address space
 requirements will grow to be quite large, typically exceeding the physical
 memory. Consequently, a fair amount of swap disk needs to be configured to
 support these applications  with large virtual address space running
 concurrently. In the old days this would typically be 1.2* shared memory
 segment or RAM





 HTH



 Mich Talebzadeh



 http://talebzadehmich.wordpress.com



 *Publications due shortly:*

 *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
 Coherence Cache*



 NOTE: The information in this email is proprietary and confidential. This
 message is for the designated recipient only, if you are not the intended
 recipient, you should destroy it immediately. Any information in this
 message shall not be understood as given or endorsed by Peridale Ltd, its
 subsidiaries or their employees, unless expressly so stated. It is the
 responsibility of the recipient to ensure that this email is virus free,
 therefore neither Peridale Ltd, its subsidiaries nor their employees accept
 any responsibility.



 *From:* max scalf [mailto:oracle.bl...@gmail.com]
 *Sent:* 25 March 2015 23:05
 *To:* user@hadoop.apache.org
 *Subject:* Re: Swap requirements



 Thank you harsh.  Can you please explain what you mean when u said Just
 simple virtual memory used by the process ?  Doesn't virtual memory means
 swap?

 On Wednesday, March 25, 2015, Harsh J ha...@cloudera.com wrote:

 The suggestion (regarding swappiness) is not for disabling swap as much as
 it is to 'not using swap (until really necessary)'. When you run a constant
 memory-consuming service such as HBase you'd ideally want the RAM to serve
 up as much as it can, which setting that swappiness value helps do (the OS
 otherwise begins swapping way before its available physical RAM is nearing
 full state).



 The vmem-pmem ratio is something entirely else. The vmem of a process does
 not mean swap space usage, just simple virtual memory used by the process. I'd
 recommend disabling YARN's vmem checks on today's OSes (but keep pmem
 checks on). You can read some more on this at
 http://www.quora.com/Why-do-some-applications-use-significantly-more-virtual-memory-on-RHEL-6-compared-to-RHEL-5



 On Thu, Mar 26, 2015 at 3:37 AM, Abdul I Mohammed oracle.bl...@gmail.com
 wrote:

 Thanks Mith...any idea about Yarn.nodemanager.Vmem-pmem-ratio parameter...

 If data nodes does not require swap then what about the above parameter?
 What is that used for in yarn?





 --

 Harsh J



RE: Hadoop 2.6.0 Error

2015-03-25 Thread hadoop.support
Hello Anand,

 

Set your Java home in hadoop-env.sh - /usr/local/hadoop/etc/hadoop/hadoop-env.sh

 

export JAVA_HOME='/usr/lib/jvm/java-7-openjdk-amd64'

 

It would resolve your error.

 

Thanks,

S.RagavendraGanesh

ViSolve Hadoop Support Team
ViSolve Inc. | San Jose, California
Website: www.visolve.com

email: servi...@visolve.com | Phone: 408-850-2243

 

 

From: Alexandru Pacurar [mailto:alexandru.pacu...@propertyshark.com] 
Sent: Wednesday, March 25, 2015 3:17 PM
To: user@hadoop.apache.org
Subject: RE: Hadoop 2.6.0 Error

 

Hello,

 

I had a similar problem and my solution to this was setting JAVA_HOME in 
/etc/environment.

 

The problem is, from what I remember, that the start-dfs.sh script calls 
hadoop-daemons.sh with the necessary options to start the Hadoop daemons. 
hadoop-daemons.sh in turn calls hadoop-daemon.sh with the necessary options via 
ssh, in an non-interactive fashion. When you are executing a command via ssh in 
a non-interactive manner (ex. ssh host1 ‘ls -la’ ) you have a minimal 
environment and you do not source the .profile file, and other environment 
related files. But the /etc/environment is sourced so you could set JAVA_HOME 
there. Technically you should set BASH_ENV there which should point to a file 
containing the environment variables you need.

 

For more info see 
http://stackoverflow.com/questions/216202/why-does-an-ssh-remote-command-get-fewer-environment-variables-then-when-run-man,
 or man bash.
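
A quick way to see the difference (a sketch; the JDK path is only an example):

# run non-interactively over ssh: JAVA_HOME is typically empty here,
# because .profile is not sourced for non-interactive commands
ssh localhost 'echo JAVA_HOME=$JAVA_HOME'

# /etc/environment, on the other hand, is read for ssh sessions via PAM,
# so a line such as
#   JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
# placed there becomes visible to the daemons launched by start-dfs.sh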

 

Thank you,

Alex

 

From: Olivier Renault [mailto:orena...@hortonworks.com] 
Sent: Wednesday, March 25, 2015 10:44 AM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

 

It should be : 

export JAVA_HOME=…

 

Olivier

 

 

From: Brahma Reddy Battula
Reply-To: user@hadoop.apache.org
Date: Wednesday, 25 March 2015 08:28
To: user@hadoop.apache.org, Anand Murali
Subject: RE: Hadoop 2.6.0 Error

 

Hi,

Ideally it should take effect if you configure it in .profile or hadoop-env.sh.

As you said you set it in .profile (I hope you did source ~/.profile),

did you verify that it took effect? (e.g. by checking echo $JAVA_HOME, or
jps, etc.)
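
For example (a quick sanity check, nothing setup-specific):

source ~/.profile
echo $JAVA_HOME   # should print the JDK path you configured
jps               # works only if the JDK's bin directory is also on your PATH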

 

Thanks  Regards 

Brahma Reddy Battula

 

 

  _  

From: Anand Murali [mailto:anand_vi...@yahoo.com]
Sent: Wednesday, March 25, 2015 1:30 PM
To: user@hadoop.apache.org; Anand Murali
Subject: Re: Hadoop 2.6.0 Error

Dear All:

 

Even after setting JAVA_HOME in .profile I get

 

a "JAVA_HOME is not set and could not be found" error.

 

 

If any of you know of a more stable version, please do let me know.

 

Thanks,

 

Anand Murali  

11/7, 'Anand Vihar', Kandasamy St, Mylapore

Chennai - 600 004, India

Ph: (044)- 28474593/ 43526162 (voicemail)

 

 

On Wednesday, March 25, 2015 12:57 PM, Anand Murali anand_vi...@yahoo.com wrote:

 

Dear Mr. Bhrama Reddy:

 

Should I type

 

SET JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

 

 

in root (profile) or at user level (.profile). Reply most welcome

 

Thanks

 

Regards

 

 

Anand Murali  

11/7, 'Anand Vihar', Kandasamy St, Mylapore

Chennai - 600 004, India

Ph: (044)- 28474593/ 43526162 (voicemail)

 

 

On Wednesday, March 25, 2015 12:37 PM, Anand Murali anand_vi...@yahoo.com wrote:

 

Dear All:

 

I get this error shall try setting JAVA_HOME in .profile

 

Starting namenodes on [localhost]
localhost: Error: JAVA_HOME is not set and could not be found.
cat: /home/anand_vihar/hadoop-2.6.0/conf/slaves: No such file or directory
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
anand_vihar@Latitude-E5540:~/hadoop-2.6.0/sbin$

 

Thanks

 

Anand Murali  

11/7, 'Anand Vihar', Kandasamy St, Mylapore

Chennai - 600 004, India

Ph: (044)- 28474593/ 43526162 (voicemail)

 

 

On Wednesday, March 25, 2015 12:22 PM, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote:

 

Instead of exporting the JAVA_HOME, Please set JAVA_HOME in system level ( like 
putting in /etc/profile...)

For more details please check the following jira.

 
https://issues.apache.org/jira/browse/HADOOP-11538

 

Thanks  Regards

 Brahma Reddy Battula

 

  _  

From: Anand Murali [ 

Re: Swap requirements

2015-03-25 Thread Abdul I Mohammed
Thanks Mich... any idea about the yarn.nodemanager.vmem-pmem-ratio parameter?

If data nodes do not require swap, then what about the above parameter? What 
is it used for in YARN?