Re: After deleting data of Hbase table hdfs size is not decreasing HDFS-15812

2021-02-11 Thread Shashwat Shriparv
Please check the TTL configuration on the table.
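A few hedged checks that are often useful here (the table name is a placeholder
and the paths assume the default hbase.rootdir and trash layout):

# Check the column family TTL and KEEP_DELETED_CELLS settings
echo "describe 'my_table'" | hbase shell -n

# Force a major compaction so delete markers and expired cells are purged
echo "major_compact 'my_table'" | hbase shell -n

# Space is often still held by the HBase archive dir, snapshots, or HDFS trash
hdfs dfs -du -h /hbase/archive
echo "list_snapshots" | hbase shell -n
hdfs dfs -du -h /user/hbase/.Trash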


*Warm Regards,*
*Shashwat Shriparv*



On Thu, 11 Feb 2021 at 12:14, satya prakash gaurav 
wrote:

> Hi Team,
> Can anyone please help on this issue?
>
> Regards,
> Satya
>
> On Wed, Feb 3, 2021 at 7:27 AM satya prakash gaurav 
> wrote:
>
>> Hi Team,
>>
>> I have raised a jira HDFS-15812
>> We are using the hdp 3.1.4.0-315 and hbase 2.0.2.3.1.4.0-315.
>>
>> We are deleting the data with the normal HBase delete command, and also via
>> the API using Phoenix. The count is reducing in Phoenix and HBase, but the
>> HDFS size of the HBase directory is not reducing, even after I ran a major
>> compaction.
>>
>> Regards,
>> Satya
>>
>>
>>
>
> --
> --
> Regards,
> S.P.Gaurav
>
>


Re: Running pig script on a remote cluster

2020-02-11 Thread Shashwat Shriparv
nohup <command> &
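A minimal sketch of that pattern for the Pig run in question (the script and log
file names are placeholders):

# Launch the script detached from the terminal so it keeps running after logout
nohup pig -x mapreduce myscript.pig > myscript.out 2>&1 &

# Remember the background PID and follow the log later if needed
echo $! > myscript.pid
tail -f myscript.out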


*Warm Regards,*
*Shashwat Shriparv*



On Wed, 12 Feb 2020 at 04:48, Daniel Santos  wrote:

> Hello,
>
> I managed to create a properties file with the following contents :
>
> fs.defaultFS=hdfs://hadoopnamenode:9000
> mapreduce.framework.name=yarn
> yarn.resourcemanager.address=hadoopresourcemanager:8032
>
> It is now submitting the jobs to the cluster. I also set the HADOOP_HOME
> on my laptop to point to the same version of hadoop that is running on the
> cluster (2.7.0). I am running pig version  0.17
>
> Then a main class not found error happened on the yarn nodes where the job
> was scheduled to run. I had to add the following to yarn-site.xml and
> restart yarn and the nodes :
>
> <property>
>   <name>mapreduce.application.classpath</name>
>   <value>/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*,/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/common/*,/home/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/yarn/*,/home/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*,/home/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*</value>
> </property>
>
> After this change, the script ran. But the pig command only returned after
> the job finished.
> Does anyone know how to launch the script and exit immediately to the
> shell ?
> If the job takes a long time I will have to keep the terminal open.
>
> Thanks,
> Regards
>
>
> > On 11 Feb 2020, at 05:25, Vinod Kumar Vavilapalli 
> wrote:
> >
> > It’s running the job in local mode (LocalJobRunner), that’s why. Please
> check your configuration files and make sure that the right directories are
> on the classpath. Also look in mapred-site.xml for
> mapreduce.framework.name (should be yarn).
> >
> > Thanks
> > +Vinod
> >
> >> On Feb 11, 2020, at 2:09 AM, Daniel Santos 
> wrote:
> >>
> >> Hello all,
> >>
> >> I have developed a script in my laptop. The script is now ready to be
> unleashed in a non secured cluster.
> >> But when I do : pig -x mapreduce 

Re: Why hdfs don't have current working directory

2017-05-26 Thread Shashwat Shriparv
HDFS keeps no per-client session state, so there is no server-side current
working directory. But if you write your own HDFS shell wrapper, you can have
pwd and cd by tracking the directory on the client side.
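As a rough illustration (the function names and behaviour are invented for this
sketch), a few bash functions can track a working directory on the client side
and expand relative paths before calling hdfs dfs:

HCWD=/                                  # the tracked "current" HDFS directory

hcd() {                                 # change the tracked directory
  case "$1" in
    /*) HCWD="$1" ;;
    *)  HCWD="${HCWD%/}/$1" ;;
  esac
}

hpwd() { echo "$HCWD"; }                # print the tracked directory

hls() { hdfs dfs -ls "${HCWD%/}/$1"; }  # list relative to the tracked directory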



*Warm Regards,*
*Shashwat Shriparv*


On Fri, May 26, 2017 at 3:44 PM, Sidharth Kumar  wrote:

> Hi,
>
> Can you kindly explain why HDFS doesn't have a current-directory concept?
> Why doesn't Hadoop implement pwd? Why can't commands like cd and pwd be
> implemented in HDFS?
>
> Regards
> Sidharth
> Mob: +91 819799
> LinkedIn: www.linkedin.com/in/sidharthkumar2792
>


Re: Mapreduce job got stuck

2015-04-15 Thread shashwat shriparv
What is your yarn.nodemanager.address set to?
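For reference, that value lives in yarn-site.xml on each NodeManager node; one
way to look it up (the conf dir variable is the usual convention and may differ
on your install):

jps | grep -i nodemanager                                  # is the NodeManager up?
grep -A1 'yarn.nodemanager.address' "$HADOOP_CONF_DIR"/yarn-site.xml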



*Warm Regards,*
*Shashwat Shriparv*

On Wed, Apr 15, 2015 at 3:42 PM, shashwat shriparv <
dwivedishash...@gmail.com> wrote:

> Please check the error logs and send them.
>
>
>
> On Wed, Apr 15, 2015 at 3:33 PM, Vandana kumari 
> wrote:
>
>> nodemanager
>
>
>
>
>
>
> *Warm Regards,*
> *Shashwat Shriparv*
>


Re: Mapreduce job got stuck

2015-04-15 Thread shashwat shriparv
Please check the error logs and send them.



On Wed, Apr 15, 2015 at 3:33 PM, Vandana kumari 
wrote:

> nodemanager






*Warm Regards,*
*Shashwat Shriparv*


Re: Re: Question about the behavior of HDFS.

2014-12-18 Thread shashwat shriparv
It's opening for me anyhow; I am attaching the document for you... :)



On Fri, Dec 19, 2014 at 9:31 AM, firefly...@gmail.com 
wrote:
>
> http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf





*Warm Regards_**∞_*
* Shashwat Shriparv*


hdfs_design.pdf
Description: Adobe PDF document


Re: Question about the behavior of HDFS.

2014-12-18 Thread shashwat shriparv
Please read this once ..

http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf


*Warm Regards_**∞_*
* Shashwat Shriparv*


On Fri, Dec 19, 2014 at 9:18 AM, bit1...@163.com  wrote:
>
> Hi Hadoopers,
>
> I got a question about the behavior of HDFS.
>
> Say, there are 1 namenode and 10 data nodes.
>
> On the namenode machine, i upload a 1G file to HDFS. Will this 1G file be
> distributed evenly  to the data nodes, and there is no data stored on the
> namenode?
> If I upload the the data from the data node, will the file still distributed
> evenly to all the data nodes ? I think if most of the data reside on the
> node that i upload the data, it will save the network, but this leads to
> another problem, when MR this file,
> most of time will be spent on this node because it has to process most of
> the data.
>
> --
> bit1...@163.com
>


Re: The reduce copier failed

2014-03-21 Thread shashwat shriparv
Check whether the tmp dir, the remaining HDFS space, or the log directory is
getting filled up while this job runs.
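A quick way to watch that while the job runs (the mount points are examples):

# Local disk headroom for the MapReduce temp and log directories
df -h /tmp /var/log

# Remaining HDFS capacity
hadoop dfsadmin -report | grep -i remaining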

On Fri, Mar 21, 2014 at 12:11 PM, Mahmood Naderan wrote:

> that imply a *retry* process? Or I have to be wo


​



*Warm Regards_**∞_*
* Shashwat Shriparv*


Re: GC overhead limit exceeded

2014-03-07 Thread shashwat shriparv
Check this out

http://ask.gopivotal.com/hc/en-us/articles/201850408-Namenode-fails-with-java-lang-OutOfMemoryError-GC-overhead-limit-exceeded



* Warm Regards_**∞_*
* Shashwat Shriparv*



On Fri, Mar 7, 2014 at 12:04 PM, haihong lu  wrote:

> Hi:
>
>  i have a problem when run Hibench with hadoop-2.2.0, the wrong
> message list as below
>
> 14/03/07 13:54:53 INFO mapreduce.Job:  map 19% reduce 0%
> 14/03/07 13:54:54 INFO mapreduce.Job:  map 21% reduce 0%
> 14/03/07 14:00:26 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_20_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:00:27 INFO mapreduce.Job:  map 20% reduce 0%
> 14/03/07 14:00:40 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_08_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:00:41 INFO mapreduce.Job:  map 19% reduce 0%
> 14/03/07 14:00:59 INFO mapreduce.Job:  map 20% reduce 0%
> 14/03/07 14:00:59 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_15_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:01:00 INFO mapreduce.Job:  map 19% reduce 0%
> 14/03/07 14:01:03 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_23_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:01:11 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_26_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:01:35 INFO mapreduce.Job:  map 20% reduce 0%
> 14/03/07 14:01:35 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_19_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:01:36 INFO mapreduce.Job:  map 19% reduce 0%
> 14/03/07 14:01:43 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_07_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:02:00 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_00_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:02:01 INFO mapreduce.Job:  map 18% reduce 0%
> 14/03/07 14:02:23 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_21_0, Status : FAILED
> Error: Java heap space
> 14/03/07 14:02:24 INFO mapreduce.Job:  map 17% reduce 0%
> 14/03/07 14:02:31 INFO mapreduce.Job:  map 18% reduce 0%
> 14/03/07 14:02:33 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_29_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:02:34 INFO mapreduce.Job:  map 17% reduce 0%
> 14/03/07 14:02:38 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_10_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:02:41 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_18_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:02:43 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_14_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:02:47 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_28_0, Status : FAILED
> Error: Java heap space
> 14/03/07 14:02:50 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_02_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:02:51 INFO mapreduce.Job:  map 16% reduce 0%
> 14/03/07 14:02:51 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_05_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:02:52 INFO mapreduce.Job:  map 15% reduce 0%
> 14/03/07 14:02:55 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_06_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:02:57 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_27_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:02:58 INFO mapreduce.Job:  map 14% reduce 0%
> 14/03/07 14:03:04 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_09_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:03:05 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_17_0, Status : FAILED
> Error: GC overhead limit exceeded
> 14/03/07 14:03:05 INFO mapreduce.Job: Task Id :
> attempt_1394160253524_0010_m_22_0, Status : FAILED
> Error: GC overhead limi

Re: How to solve it ? java.io.IOException: Failed on local exception

2014-03-05 Thread shashwat shriparv
What is the code that you are trying?


*Warm Regards_**∞_*
* Shashwat Shriparv*



On Wed, Mar 5, 2014 at 2:12 PM, Stanley Shi  wrote:

> which version of hadoop you are using?
> This is something similar with your error log:
> http://stackoverflow.com/questions/19895969/can-access-hadoop-fs-through-shell-but-not-through-java-main
>
> Regards,
> *Stanley Shi,*
>
>
>
> On Wed, Mar 5, 2014 at 4:29 PM, 张超  wrote:
>
>> Hi all,
>> Here is a problem that confuses me.
>>
>> when I use java code to manipulate pseudo-distributed hadoop , it throws
>> an exception:
>>
>> java.io.IOException: Failed on local exception: java.io.EOFException;
>> Host Details : local host is: "localhost/127.0.0.1"; destination host
>> is: ""localhost":9000;
>>
>> I have imported the "core-site.xml" file, hadoop-common.jar and
>> hadoop-hdfs.jar into my project, and also run the  pseudo-distributed
>> cluster.
>>
>> How can I solve the problem ? Please tell me, thank you !
>>
>
>


Re: Streaming data access in HDFS: Design Feature

2014-03-05 Thread shashwat shriparv
Streaming means processing the data as it is coming to HDFS; relatedly, in
Hadoop, Hadoop Streaming enables Hadoop to consume and process data using
executables of different types.

i hope you have already read this :
http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming
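For example, a minimal Hadoop Streaming run that uses ordinary executables as
mapper and reducer; the streaming jar path below assumes a Hadoop 2.x tarball
layout, and the input/output paths are placeholders:

hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /user/me/input \
    -output /user/me/output \
    -mapper /bin/cat \
    -reducer "/usr/bin/wc -l"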


*Warm Regards_**∞_*
* Shashwat Shriparv*



On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe wrote:

> Hello All,
>
> Can anyone please explain what we mean by *Streaming data access in HDFS*.
>
> Data is usually copied to HDFS and in HDFS the data is splitted across
> DataNodes in blocks.
> Say for example, I have an input file of 10240 MB(10 GB) in size and a
> block size of 64 MB. Then there will be 160 blocks.
> These blocks will be distributed across DataNodes in blocks.
> Now the Mappers will read data from these DataNodes keeping the *data
> locality feature* in mind(i.e. blocks local to a DataNode will be read by
> the map tasks running in that DataNode).
>
> Can you please point me where is the "Streaming data access in HDFS" is
> coming into picture here?
>
> Thanks,
> RR
>


Re: Need help to understand hadoop.tmp.dir

2014-03-03 Thread shashwat shriparv
No need to format; just change the value and restart the cluster.
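If it helps, you can confirm which value is actually in effect and then bounce
HDFS (the sbin scripts below assume a standard tarball install):

# Show the value the daemons will use after the edit to core-site.xml
hdfs getconf -confKey hadoop.tmp.dir

# Restart HDFS so the new path takes effect
"$HADOOP_HOME"/sbin/stop-dfs.sh && "$HADOOP_HOME"/sbin/start-dfs.sh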


*Warm Regards_**∞_*
* Shashwat Shriparv*



On Mon, Mar 3, 2014 at 1:55 PM, Chengwei Yang wrote:

> On Mon, Mar 03, 2014 at 11:56:08AM +0530, shashwat shriparv wrote:
> > Yes, it's always better to change the temp dir path in Hadoop, as it will
> > prevent deletion of files when the server reboots.
>
> Thanks, so is there anyway to recovery from this state? Or I have to format
> namenode again?
>
> --
> Thanks,
> Chengwei
>
> >
> >
> > Warm Regards_∞_
> > Shashwat Shriparv
> >
> >
> >
> >
> > On Mon, Mar 3, 2014 at 11:52 AM, Chengwei Yang <
> chengwei.yang...@gmail.com>
> > wrote:
> >
> > On Mon, Mar 03, 2014 at 11:25:59AM +0530, shashwat shriparv wrote:
> > > You can use any directory you like, provided the permissions are right.
> >
> > I mean if it's better if we change the default hadoop.tmp.dir?
> Because it
> > can not work cross reboot in default Linux environment.
> >
> > --
> > Thanks,
> > Chengwei
> >
> > >
> > >
> > > Warm Regards_∞_
> > > Shashwat Shriparv
> > >
> > >
> > >
> > > On Mon, Mar 3, 2014 at 11:07 AM, Chengwei Yang <
> > chengwei.yang...@gmail.com>
> > > wrote:
> > >
> > > Hi List,
> > >
> > > I'm confusing by hadoop.tmp.dir currently because its default
> value
> > > "/tmp/hadoop-${user.name}" always means a directory in tmpfs
> in
> > Linux.
> > > So after the name node machine reboot, it gone away and then
> name
> > node
> > > fail to start.
> > >
> > > I found this was reported here.
> > >
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201205.mbox
> > /
> > > %3cbay148-w22bf95c5fbe2c40bf7cd9f86...@phx.gbl%3E
> > >
> > > As I found from http://hadoop.apache.org/docs/r2.3.0/, there
> are a
> > lot
> > > properties are based on hadoop.tmp.dir, like
> > > dfs.namenode.name.dir   file://${hadoop.tmp.dir}/dfs/name
> > >
> > > I'm wondering, if we can set the default value of
> hadoop.tmp.dir to
> > > a non-tmpfs direcotry if it doesn't work at all by using a
> real tmpfs
> > > directory?
> > >
> > > --
> > > Thanks,
> > > Chengwei
> > >
> > >
> >
> >
>


Re: Need help to understand hadoop.tmp.dir

2014-03-02 Thread shashwat shriparv
Yes, it's always better to change the temp dir path in Hadoop, as it will
prevent deletion of files when the server reboots.


*Warm Regards_**∞_*
* Shashwat Shriparv*



On Mon, Mar 3, 2014 at 11:52 AM, Chengwei Yang
wrote:

> On Mon, Mar 03, 2014 at 11:25:59AM +0530, shashwat shriparv wrote:
> > You can use any directory you like, provided the permissions are right.
>
> I mean if it's better if we change the default hadoop.tmp.dir? Because it
> can not work cross reboot in default Linux environment.
>
> --
> Thanks,
> Chengwei
>
> >
> >
> > Warm Regards_∞_
> > Shashwat Shriparv
> >
> >
> >
> >
> > On Mon, Mar 3, 2014 at 11:07 AM, Chengwei Yang <
> chengwei.yang...@gmail.com>
> > wrote:
> >
> > Hi List,
> >
> > I'm confusing by hadoop.tmp.dir currently because its default value
> > "/tmp/hadoop-${user.name}" always means a directory in tmpfs in
> Linux.
> > So after the name node machine reboot, it gone away and then name
> node
> > fail to start.
> >
> > I found this was reported here.
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201205.mbox/
> > %3cbay148-w22bf95c5fbe2c40bf7cd9f86...@phx.gbl%3E
> >
> > As I found from http://hadoop.apache.org/docs/r2.3.0/, there are a
> lot
> > properties are based on hadoop.tmp.dir, like
> > dfs.namenode.name.dir   file://${hadoop.tmp.dir}/dfs/name
> >
> > I'm wondering, if we can set the default value of hadoop.tmp.dir to
> > a non-tmpfs direcotry if it doesn't work at all by using a real tmpfs
> > directory?
> >
> > --
> > Thanks,
> > Chengwei
> >
> >
>


Re: Need help to understand hadoop.tmp.dir

2014-03-02 Thread shashwat shriparv
You can use any directory you like, provided the permissions are right.


*Warm Regards_**∞_*
* Shashwat Shriparv*



On Mon, Mar 3, 2014 at 11:07 AM, Chengwei Yang
wrote:

> Hi List,
>
> I'm confusing by hadoop.tmp.dir currently because its default value
> "/tmp/hadoop-${user.name}" always means a directory in tmpfs in Linux.
> So after the name node machine reboot, it gone away and then name node
> fail to start.
>
> I found this was reported here.
>
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201205.mbox/%3cbay148-w22bf95c5fbe2c40bf7cd9f86...@phx.gbl%3E
>
> As I found from http://hadoop.apache.org/docs/r2.3.0/, there are a lot
> properties are based on hadoop.tmp.dir, like
> dfs.namenode.name.dir   file://${hadoop.tmp.dir}/dfs/name
>
> I'm wondering, if we can set the default value of hadoop.tmp.dir to
> a non-tmpfs direcotry if it doesn't work at all by using a real tmpfs
> directory?
>
> --
> Thanks,
> Chengwei
>


Re: Re: hadoop Exception: java.io.IOException: Couldn't set up IO streams

2014-02-28 Thread shashwat shriparv
Great ...


* Warm Regards_**∞_*
* Shashwat Shriparv*



On Fri, Feb 28, 2014 at 7:33 AM, leiwang...@gmail.com
wrote:

>
> Thanks, it works after increase the ulimit number.
> --
>   leiwang...@gmail.com
>
>  *From:* shashwat shriparv 
> *Date:* 2014-02-27 12:43
> *To:* user 
> *CC:* leiwangouc 
> *Subject:* Re: hadoop Exception: java.io.IOException: Couldn't set up IO
> streams
>   ​Try to increase ulimit for the machine and the user under which
> process runs.
> ​
>
> On Thu, Feb 27, 2014 at 9:35 AM, sudhakara st wrote:
>
>> Caused by: java.lang.OutOfMemoryError: unable to create new native thread"
>
>
> ​​
>
>
>
> *Warm Regards_**∞_*
> *Shashwat Shriparv*
>
>


Re: Newbie, any tutorial for install hadoop 2.3 with proper linux version

2014-02-27 Thread shashwat shriparv
These links you can follow

http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/bk_installing_manually_book/content/rpm_chap3.html
http://codesfusion.blogspot.in/2013/10/setup-hadoop-2x-220-on-ubuntu.html?m=1
http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide
http://raseshmori.wordpress.com/2012/09/23/install-hadoop-2-0-1-yarn-nextgen/




*Warm Regards_**∞_*
* Shashwat Shriparv*



On Fri, Feb 28, 2014 at 1:04 PM, shashwat shriparv <
dwivedishash...@gmail.com> wrote:

> If you want to go for license free you can go for either Ubuntu or CentOS
>
>
> *Warm Regards_**∞_*
> * Shashwat Shriparv*
>
>
>
> On Fri, Feb 28, 2014 at 12:18 PM, Alex Lee  wrote:
>
>> Hi Zhi Jie,
>>
>> Thanks, I am going through it. But may need to select a linux os first.
>> Any suggestion?
>>
>> Alex
>>
>> --
>> Date: Thu, 27 Feb 2014 22:29:56 -0800
>> Subject: Re: Newbie, any tutorial for install hadoop 2.3 with proper
>> linux version
>> From: zs...@hortonworks.com
>> To: user@hadoop.apache.org
>>
>>
>> This is the link about cluster setup:
>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>
>> - Zhijie
>>
>>
>> On Thu, Feb 27, 2014 at 9:41 PM, Alex Lee  wrote:
>>
>> Hello,
>>
>> I am quite a newbie here. And want to setup hadoop 2.3 on 4 new PCs.
>> Later may add more PCs into it. Is there any tutorial I can learn from,
>> such as the which linux version I should use, how to setup the linux, and
>> how to install the hadoop step by step.
>>
>> I am trying to set up a cluster and aim to store TB-scale data. Any suggestion?
>>
>> With Best Regards,
>>
>> Alex
>>
>>
>>
>>
>> --
>> Zhijie Shen
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>
>


Re: Newbie, any tutorial for install hadoop 2.3 with proper linux version

2014-02-27 Thread shashwat shriparv
If you want to go for license free you can go for either Ubuntu or CentOS


*Warm Regards_**∞_*
* Shashwat Shriparv*



On Fri, Feb 28, 2014 at 12:18 PM, Alex Lee  wrote:

> Hi Zhi Jie,
>
> Thanks, I am going through it. But may need to select a linux os first.
> Any suggestion?
>
> Alex
>
> --
> Date: Thu, 27 Feb 2014 22:29:56 -0800
> Subject: Re: Newbie, any tutorial for install hadoop 2.3 with proper linux
> version
> From: zs...@hortonworks.com
> To: user@hadoop.apache.org
>
>
> This is the link about cluster setup:
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
>
> - Zhijie
>
>
> On Thu, Feb 27, 2014 at 9:41 PM, Alex Lee  wrote:
>
> Hello,
>
> I am quite a newbie here. And want to setup hadoop 2.3 on 4 new PCs. Later
> may add more PCs into it. Is there any tutorial I can learn from, such as
> the which linux version I should use, how to setup the linux, and how to
> install the hadoop step by step.
>
> I am trying to set up a cluster and aim to store TB-scale data. Any suggestion?
>
> With Best Regards,
>
> Alex
>
>
>
>
> --
> Zhijie Shen
> Hortonworks Inc.
> http://hortonworks.com/
>
>


Re: Client and NN communication in Hadoop

2014-02-26 Thread shashwat shriparv
The application that tries to read or write the file is the client. For example,
the "hadoop" binary found in Hadoop's bin directory is a Hadoop client; anything
that wants to read from or write to Hadoop is a client. You can also write a
Hadoop client of your own using the Java APIs that Hadoop provides.

The following is an example of a Hadoop client that reads a file from HDFS:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Cat {
    public static void main(String[] args) throws Exception {
        // Fully-qualified HDFS path of the file to read
        Path pt = new Path("hdfs://shashwat:9000/home/shriparv/fileonhdfstoread.txt");

        // Obtain the FileSystem that matches the path's scheme (hdfs://...)
        Configuration conf = new Configuration();
        FileSystem fs = pt.getFileSystem(conf);

        // Stream the file line by line and print it, like "hadoop fs -cat"
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(pt)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

Hope this clears some of your doubts.


* Warm Regards_**∞_*
* Shashwat Shriparv*



On Thu, Feb 27, 2014 at 9:53 AM, navaz  wrote:

> Hi
>
>
>
> I have a Hadoop cluster running with 3 slaves and 1 Master. Slaves are
> Datanodes and running Tasktarckers. Namenode is running jobtracker and
> secondary namenode. I am running sample mapreduce after loading a file into
> HDFS.
>
>
>
> According to the Hadoop architecture, before writing a file into HDFS the
> client will contact the NameNode and get the location details of the DNs, and
> then the client writes the file directly to the DNs. What is this client? Is it
> an application running on the NameNode? Are user and client different things?
> How can I see the messages between the client and the datanodes?
>
>
>
> Appreciate your response.
>
>
>
>
>
> Thanks & Regards
>
>
>
> Abdul Navaz
>
> Graduate Student
>
> University of Houston Main Campus,Texas
>
> Mob: (281) 685-0388
>
>
>


Re: hadoop Exception: java.io.IOException: Couldn't set up IO streams

2014-02-26 Thread shashwat shriparv
Try to increase the ulimit for the machine and for the user under which the
process runs.
​

On Thu, Feb 27, 2014 at 9:35 AM, sudhakara st wrote:

> Caused by: java.lang.OutOfMemoryError: unable to create new native thread"
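A sketch of the usual checks and changes; the user name and limit values below
are examples, not recommendations for this cluster:

# Check the limits for the user that runs the Hadoop daemons / tasks
ulimit -n      # max open files
ulimit -u      # max user processes (relevant to "unable to create new native thread")

# Raise them persistently by adding lines like these to /etc/security/limits.conf,
# then re-login and restart the daemons:
#   hadoop  soft  nofile  65536
#   hadoop  hard  nofile  65536
#   hadoop  soft  nproc   32768
#   hadoop  hard  nproc   32768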


​​



* Warm Regards_**∞_*
* Shashwat Shriparv*


Re: query

2014-02-25 Thread shashwat shriparv
Try these links:

http://wiki.apache.org/hadoop/EclipseEnvironment

http://blog.cloudera.com/blog/2013/05/how-to-configure-eclipse-for-hadoop-contributions/
http://blog.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/
http://ebiquity.umbc.edu/Tutorials/Hadoop/00%20-%20Intro.html



* Warm Regards_**∞_*
* Shashwat Shriparv*



On Wed, Feb 26, 2014 at 11:19 AM, Banty Sharma wrote:

> Hii,
>
> step by step i want to build and develop hadoop in eclipse in
> windows...can anybody help to find the source code of hadoop and document
> how i can import that in eclipse in windows..
>
> Thanx n Regards
>
> Jhanver sharma
>
>
> On Mon, Feb 24, 2014 at 4:03 PM, Banty Sharma wrote:
>
>> Hello! I want to get information about Hadoop development. Where can I find
>> the actual procedure to solve the issues?
>>
>
>


Re: Mappers vs. Map tasks

2014-02-24 Thread shashwat shriparv
You are really confused :) Please read this :

http://developer.yahoo.com/hadoop/tutorial/module4.html#closer
http://wiki.apache.org/hadoop/HowManyMapsAndReduces



* Warm Regards_**∞_*
* Shashwat Shriparv*



On Tue, Feb 25, 2014 at 11:27 AM, Sugandha Naolekar
wrote:

> Hello,
>
> As per the various articles I went through till date, the File(s) are
> split in chunks/blocks. On the same note, would like to ask few things:
>
>
>1. No. of mappers are decided as: Total_File_Size/Max. Block Size.
>Thus, if the file is smaller than the block size, only one mapper will be
>invoked. Right?
>2. If yes, it means, the map() will be called only once. Right? In
>this case, if there are two datanodes with a replication factor as 1: only
>one datanode(mapper machine) will perform the task. Right?
>3. The map() function is called by all the datanodes/slaves right? If
>the no. of mappers are more than the no. of slaves, what happens?
>
> --
> Thanks & Regards,
> Sugandha Naolekar
>
>
>
>


Re: Getting following error in JT logs while running MR jobs

2013-12-16 Thread shashwat shriparv
Does your job fail? Check for the error in the logs, or click on the failed task
in the JobTracker web UI; you will get a more precise error message.



*Warm Regards_**∞_*
*Shashwat Shriparv*
Big-Data Engineer(HPC)




On Mon, Dec 16, 2013 at 4:25 PM, shashwat shriparv <
dwivedishash...@gmail.com> wrote:

> Check if you are having correct permission for the user through which you
> are running the job.
>
>
>
> * Warm Regards_**∞_*
> *Shashwat Shriparv*
> Big-Data Engineer(HPC)
>
>
>
>
> On Mon, Dec 16, 2013 at 3:47 PM, Viswanathan J  > wrote:
>
>> Hi,
>>
>> I'm getting the following error frequently while running MR jobs.
>>
>> ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time for
>> task attempt_201312040159_126927_m_00_0 when no start time is set,
>> stackTrace is : java.lang.Exception
>> at
>> org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:145)
>> at
>> org.apache.hadoop.mapred.TaskInProgress.incompleteSubTask(TaskInProgress.java:700)
>> at
>> org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:3004)
>> at
>> org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1185)
>> at
>> org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:4471)
>> at
>> org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3306)
>> at
>> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3001)
>> at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
>>
>> Please help.
>>
>> --
>> Regards,
>> Viswa.J
>>
>
>


Re: Getting following error in JT logs while running MR jobs

2013-12-16 Thread shashwat shriparv
Check whether you have the correct permissions for the user through which you
are running the job.



* Warm Regards_**∞_*
*Shashwat Shriparv*
Big-Data Engineer(HPC)




On Mon, Dec 16, 2013 at 3:47 PM, Viswanathan J
wrote:

> Hi,
>
> I'm getting the following error frequently while running MR jobs.
>
> ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time for
> task attempt_201312040159_126927_m_00_0 when no start time is set,
> stackTrace is : java.lang.Exception
> at
> org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:145)
> at
> org.apache.hadoop.mapred.TaskInProgress.incompleteSubTask(TaskInProgress.java:700)
> at
> org.apache.hadoop.mapred.JobInProgress.failedTask(JobInProgress.java:3004)
> at
> org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:1185)
> at
> org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:4471)
> at
> org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:3306)
> at
> org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3001)
> at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
>
> Please help.
>
> --
> Regards,
> Viswa.J
>


Re: println ststements on hadoop

2013-12-16 Thread shashwat shriparv
It will not be displayed in the console, as the job is submitted to Hadoop, and
once it is submitted you don't have control through the terminal. Meanwhile, you
can execute the hadoop job command with the job id to get information about the
submitted job. How did you submit the job? Can you let us know?
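For what it's worth, stdout/stderr from map and reduce tasks end up in the
per-attempt task logs rather than your terminal; a sketch of where to look
(job and attempt ids are placeholders, and the userlogs path assumes the default
MR1 layout):

# View a task attempt's stdout on the tasktracker that ran it
ls "$HADOOP_HOME"/logs/userlogs/<job_id>/<attempt_id>/
cat "$HADOOP_HOME"/logs/userlogs/<job_id>/<attempt_id>/stdout

# Basic job information from the command line
hadoop job -status <job_id>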



*Warm Regards_**∞_*
*Shashwat Shriparv*
Big-Data Engineer(HPC)




On Mon, Dec 16, 2013 at 3:57 PM, unmesha sreeveni wrote:

> When i run an MR job it completed successfully in JobTracker web UI.
> Where can i see the println statements which i hv mentioned in map and
> reduce block?
> It is not displayed in my console.
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>
>
>


Re: issue when using HDFS

2013-12-16 Thread shashwat shriparv
Had your upgrade finished successfully? Check whether the datanode is able to
connect to the namenode, check the datanode logs, and please attach some logs
here if you are getting any errors while the datanode is running.
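A few quick checks on node32 when the NameNode port refuses connections (log
locations differ between tarball and packaged installs; the patterns below are
the tarball defaults):

jps                                       # is the NameNode process actually up?
netstat -tlnp | grep 9000                 # is anything listening on the NN port?
tail -n 100 "$HADOOP_HOME"/logs/hadoop-*-namenode-*.log   # bind/startup errors
tail -n 100 "$HADOOP_HOME"/logs/hadoop-*-datanode-*.log   # datanode connect errors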



*Warm Regards_**∞_*
*Shashwat Shriparv*
Big-Data Engineer(HPC)




On Mon, Dec 16, 2013 at 4:04 PM, Geelong Yao  wrote:

> Now the datanode is not working
> [inline screenshot omitted]
>
>
> 2013/12/16 Geelong Yao 
>
>> it is the namenode's problem.
>> How can I fix this problem?
>>
>>
>>
>> 2013/12/16 Shekhar Sharma 
>>
>>> Seems like DataNode is not running or went dead
>>> Regards,
>>> Som Shekhar Sharma
>>> +91-8197243810
>>>
>>>
>>> On Mon, Dec 16, 2013 at 1:40 PM, Geelong Yao 
>>> wrote:
>>> > Hi Everyone
>>> >
>>> > After I upgrade the hadoop to CDH 4.2.0 Hadoop 2.0.0,I try to running
>>> some
>>> > test
>>> > When I try to upload file to HDFS,error comes:
>>> >
>>> >
>>> >
>>> > node32:/software/hadoop-2.0.0-cdh4.2.0 # hadoop dfs -put
>>> > /public/data/carinput1G_BK carinput1G
>>> > DEPRECATED: Use of this script to execute hdfs command is deprecated.
>>> > Instead use the hdfs command for it.
>>> >
>>> > ls: Call From node32/11.11.11.32 to node32:9000 failed on connection
>>> > exception: java.net.ConnectException: Connection refused; For more
>>> details
>>> > see:  http://wiki.apache.org/hadoop/ConnectionRefused
>>> >
>>> >
>>> >
>>> > Something wrong with my setting?
>>> >
>>> > BRs
>>> > Geelong
>>> >
>>> >
>>> > --
>>> > From Good To Great
>>>
>>
>>
>>
>> --
>> From Good To Great
>>
>
>
>
> --
> From Good To Great
>

Re: How to set "hadoop.tmp.dir" if I have multiple disks per node?

2013-12-15 Thread shashwat shriparv
You can set the Hadoop tmp dir to a directory on any disk: mount the disk and
put its path into the configuration file (for example, a directory under /mnt).

You should also set the right permissions on the mounted disk.
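A rough sketch of those steps (the device, mount point, and user/group are
placeholders):

# Mount the data disk and create a directory for Hadoop on it
sudo mkdir -p /mnt/disk1
sudo mount /dev/sdb1 /mnt/disk1
sudo mkdir -p /mnt/disk1/hadoop-tmp

# Give the Hadoop user ownership, then point hadoop.tmp.dir at /mnt/disk1/hadoop-tmp
sudo chown -R hadoop:hadoop /mnt/disk1/hadoop-tmp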

*Thanks & Regards*

∞
Shashwat Shriparv



On Mon, Dec 16, 2013 at 12:32 PM, Tao Xiao  wrote:

> I have ten disks per node,and I don't know what value I should set to
> "hadoop.tmp.dir". Some said this property refers to a location in local
> disk while some other said it refers to a directory in HDFS. I'm confused,
> who can explain it ?
>
> I want to spread I/O since I have ten disks per node, so should I set a
> comma-separated list of directories (which are on different disks) to
> "hadoop.tmp.dir" ?
>


Re: Compression LZO class not found issue in Hadoop-2.2.0

2013-12-10 Thread shashwat shriparv
Set the classpath to where the hadoop-lzo jar file is, and then try again.
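One way to do that, as a sketch; the jar and native-library paths are examples
and depend on where hadoop-lzo was actually installed:

# Make the hadoop-lzo jar and its native libraries visible to Hadoop clients
export HADOOP_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo.jar:$HADOOP_CLASSPATH
export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:$LD_LIBRARY_PATH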

*Thanks & Regards*

∞
Shashwat Shriparv



On Tue, Dec 10, 2013 at 4:00 PM, Vinayakumar B wrote:

>  Hi Viswa,
>
>
>
> Sorry for the late reply,
>
>
>
> Have you restarted NodeManagers after copying the lzo jars to lib?
>
>
>
> Thanks and Regards,
>
> Vinayakumar B
>
> *From:* Viswanathan J [mailto:jayamviswanat...@gmail.com]
> *Sent:* 06 December 2013 23:32
> *To:* user@hadoop.apache.org
> *Subject:* Compression LZO class not found issue in Hadoop-2.2.0
>
>
>
> Hi Team,
>
> Have added the property in mapred/core site xml and copied the hadoop lzo
> jar in hadoop lib folder. Also installed lzop,lzo-devel package in CentOS
> version.
>
> Getting the below LZO issue in Hadoop-2.2.0,
>
>  3 AttemptID:attempt_1386352289515_0001_m_00_0 Info:Error:
> java.lang.IllegalArgumentException: Compression codec
> com.hadoop.compression.lzo.LzoCodec was not found.
>   4   at
> org.apache.hadoop.mapred.JobConf.getMapOutputCompressorClass(JobConf.java:796)
>   5   at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1000)
>   6   at
> org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:390)
>   7   at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:79)
>   8   at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.(MapTask.java:674)
>   9   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:746)
>  10   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
>  11   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>  12   at java.security.AccessController.doPrivileged(Native Method)
>  13   at javax.security.auth.Subject.doAs(Subject.java:396)
>  14   at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>  15   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>  16 Caused by: java.lang.ClassNotFoundException: Class
> com.hadoop.compression.lzo.LzoCodec not found
>  17   at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
>  18   at
> org.apache.hadoop.mapred.JobConf.getMapOutputCompressorClass(JobConf.java:794)
>  19   ... 11 more
>
>
>
> Do I need to add any other jars or property for this version?
>
> Please help. It will be very useful for production push.
>
> --
> Regards,
> Viswa.J
>


Re: how to handle the corrupt block in HDFS?

2013-12-10 Thread shashwat shriparv
How many nodes do you have?
If fsck is giving you a healthy status, there is no need to worry.
From the replication factor of 10, what I may conclude is that you have 10 listed
datanodes, hence 10 replicas of the jar files for the job to run.
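If you want fsck to list the affected files explicitly, this variant helps:

# Show only files that have corrupt (non-recoverable) blocks
hdfs fsck / -list-corruptfileblocks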

*Thanks & Regards*

∞
Shashwat Shriparv



On Tue, Dec 10, 2013 at 3:50 PM, Vinayakumar B wrote:

>  Hi ch huang,
>
>
>
> It may seem strange, but the fact is,
>
> *CorruptBlocks* through JMX means *“Number of blocks with corrupt
> replicas”. May not be all replicas are corrupt.  *This you can check
> though jconsole for description.
>
>
>
> Where as *Corrupt blocks* through fsck means, *blocks with all replicas
> corrupt(non-recoverable)/ missing.*
>
>
>
> In your case, may be one of the replica is corrupt, not all replicas of
> same block. This corrupt replica will be deleted automatically if one more
> datanode available in your cluster and block replicated to that.
>
>
>
>
>
> Related to replication 10, As Peter Marron said, *some of the important
> files of the mapreduce job will set the replication of 10, to make it
> accessible faster and launch map tasks faster. *
>
> Anyway, if the job is success these files will be deleted auomatically. I
> think only in some cases if the jobs are killed in between these files will
> remain in hdfs showing underreplicated blocks.
>
>
>
> Thanks and Regards,
>
> Vinayakumar B
>
>
>
> *From:* Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
> *Sent:* 10 December 2013 14:19
> *To:* user@hadoop.apache.org
> *Subject:* RE: how to handle the corrupt block in HDFS?
>
>
>
> Hi,
>
>
>
> I am sure that there are others who will answer this better, but anyway.
>
> The default replication level for files in HDFS is 3 and so most files
> that you
>
> see will have a replication level of 3. However when you run a Map/Reduce
>
> job the system knows in advance that every node will need a copy of
>
> certain files. Specifically the job.xml and the various jars containing
>
> classes that will be needed to run the mappers and reducers. So the
>
> system arranges that some of these files have a higher replication level.
> This increases
>
> the chances that a copy will be found locally.
>
> By default this higher replication level is 10.
>
>
>
> This can seem a little odd on a cluster where you only have, say, 3 nodes.
>
> Because it means that you will almost always have some blocks that are
> marked
>
> under-replicated. I think that there was some discussion a while back to
> change
>
> this to make the replication level something like min(10, #number of nodes)
>
> However, as I recall, the general consensus was that this was extra
>
> complexity that wasn’t really worth it. If it ain’t broke…
>
>
>
> Hope that this helps.
>
>
>
> *Peter Marron*
>
> Senior Developer, Research & Development
>
>
>
>
>
>
> *From:* ch huang [mailto:justlo...@gmail.com ]
> *Sent:* 10 December 2013 01:21
> *To:* user@hadoop.apache.org
> *Subject:* Re: how to handle the corrupt block in HDFS?
>
>
>
> more strange , in my HDFS cluster ,every block has three replicas,but i
> find some one has ten replicas ,why?
>
>
>
> # sudo -u hdfs hadoop fs -ls
> /data/hisstage/helen/.staging/job_1385542328307_0915
> Found 5 items
> -rw-r--r--   3 helen hadoop  7 2013-11-29 14:01
> /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens
> -rw-r--r--  10 helen hadoop2977839 2013-11-29 14:01
> /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
> -rw-r--r--  10 helen hadoop   3696 2013-11-29 14:01
> /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
>
> On Tue, Dec 10, 2013 at 9:15 AM, ch huang  wrote:
>
> the strange thing is when i use the following command i find 1 corrupt
> block
>
>
>
> #  curl -s http://ch11:50070/jmx |grep orrupt
> "CorruptBlocks" : 1,
>
> but when i run hdfs fsck / , i get none ,everything seems fine
>
>
>
> # sudo -u hdfs hdfs fsck /
>
> 
>
>
>
> Status: HEALTHY
>  Total size:1479728140875 B (Total open files size: 1677721600 B)
>  Total dirs:  

Re: Large datasets for Hadoop

2013-11-20 Thread shashwat shriparv
One suggestion: you can generate a file by writing a shell script, or go
through this: http://spatialhadoop.cs.umn.edu/
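
A rough sketch of the shell-script approach (the row and feature counts are
arbitrary placeholders; tune them until the output reaches about 1 GB of
numeric data with a +1/-1 class label):

awk 'BEGIN {
  srand();
  for (i = 0; i < 5000000; i++) {        # ~5M rows of ~200 bytes each is roughly 1 GB
    line = (rand() < 0.5 ? "1" : "-1");  # class label
    for (j = 0; j < 20; j++) line = line "," rand();   # 20 numeric features
    print line;
  }
}' > svm_dataset.csv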

*Thanks & Regards*

∞
Shashwat Shriparv



On Wed, Nov 20, 2013 at 11:24 AM, unmesha sreeveni wrote:

>
> Where can i find a large dataset nearly 1GB for examining SVM training
> phase
>
> I have gone through
> 1. https://www.kaggle.com/
> 2. http://archive.ics.uci.edu/ml/datasets/
>
> But this did nt solved my problem they are all KB files
>
> I am in search of a large dataset with numeric data 1/-1 class or 1/0 class
>
> Can anyone suggest me a repo or link to get the dataaet
>
> --
> *Thanks & Regards*
>
> Unmesha Sreeveni U.B
>
> *Junior Developer*
>
>
>


Re: Moving a file to HDFS

2013-09-26 Thread shashwat shriparv
Try using the full HDFS address (hdfs://<namenode-host>:<port>/path) and see if that helps.



On Thu, Sep 26, 2013 at 8:05 PM, Manickam P  wrote:

> /home/1gb-junk




*Thanks & Regards    *

∞
Shashwat Shriparv


Re: 2 Map tasks running for a small input file

2013-09-26 Thread shashwat shriparv
Just try passing -Dmapred.tasktracker.map.tasks.maximum=1 on the command
line and check how many map tasks it runs. You can also set this in
mapred-site.xml and check.
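
For reference, a minimal sketch of the settings discussed in this thread (MRv1
property names; the jar name and paths are just examples, and as Harsh explains
below it is mapred.map.tasks=1 that removes the out-of-box 2-map default for
small inputs):

hadoop jar hadoop-examples-1.0.4.jar wordcount -D mapred.map.tasks=1 input output

<property>
  <name>mapred.map.tasks</name>
  <value>1</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>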

*Thanks & Regards*

∞
Shashwat Shriparv



On Thu, Sep 26, 2013 at 5:24 PM, Harsh J  wrote:

> Hi Sai,
>
> What Viji indicated is that the default Apache Hadoop setting for any
> input is 2 maps. If the input is larger than one block, regular
> policies of splitting such as those stated by Shekhar would apply. But
> for smaller inputs, just for an out-of-box "parallelism experience",
> Hadoop ships with a 2-maps forced splitting default
> (mapred.map.tasks=2).
>
> This means your 5 lines is probably divided as 2:3 or other ratios and
> is processed by 2 different Tasks. As Viji also indicated, to turn off
> this behavior, you can set the mapred.map.tasks to 1 in your configs
> and then you'll see only one map task process all 5 lines.
>
> On Thu, Sep 26, 2013 at 4:59 PM, Sai Sai  wrote:
> > Thanks Viji.
> > I am confused a little when the data is small y would there b 2 tasks.
> > U will use the min as 2 if u need it but in this case it is not needed
> due
> > to size of the data being small
> > so y would 2 map tasks exec.
> > Since it results in 1 block with 5 lines of data in it
> > i am assuming this results in 5 map computations 1 per each line
> > and all of em in 1 process/node since i m using a pseudo vm.
> > Where is the second task coming from.
> > The 5 computations of map on each line is 1 task.
> > Is this right.
> > Please help.
> > Thanks
> >
> >
> > 
> > From: Viji R 
> > To: user@hadoop.apache.org; Sai Sai 
> > Sent: Thursday, 26 September 2013 5:09 PM
> > Subject: Re: 2 Map tasks running for a small input file
> >
> > Hi,
> >
> > Default number of map tasks is 2. You can set mapred.map.tasks to 1 to
> > avoid this.
> >
> > Regards,
> > Viji
> >
> > On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai  wrote:
> >> Hi
> >> Here is the input file for the wordcount job:
> >> **
> >> Hi This is a simple test.
> >> Hi Hadoop how r u.
> >> Hello Hello.
> >> Hi Hi.
> >> Hadoop Hadoop Welcome.
> >> **
> >>
> >> After running the wordcount successfully
> >> here r the counters info:
> >>
> >> ***
> >> Job Counters SLOTS_MILLIS_MAPS 0 0 8,386
> >> Launched reduce tasks 0 0 1
> >> Total time spent by all reduces waiting after reserving slots (ms) 0 0 0
> >> Total time spent by all maps waiting after reserving slots (ms) 0 0 0
> >> Launched map tasks 0 0 2
> >> Data-local map tasks 0 0 2
> >> SLOTS_MILLIS_REDUCES 0 0 9,199
> >> ***
> >> My question why r there 2 launched map tasks when i have only a small
> >> file.
> >> Per my understanding it is only 1 block.
> >> and should be only 1 split.
> >> Then for each line a map computation should occur
> >> but it shows 2 map tasks.
> >> Please let me know.
> >> Thanks
> >> Sai
> >>
> >
> >
>
>
>
> --
> Harsh J
>


Re: Is there any way to partially process HDFS edits?

2013-09-26 Thread shashwat shriparv
Just try doing a manual checkpoint.
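
A sketch of how to trigger one on a Hadoop 1.x setup (run as the HDFS
superuser; the second variant saves the namespace directly on the namenode):

hadoop secondarynamenode -checkpoint force

hadoop dfsadmin -safemode enter
hadoop dfsadmin -saveNamespace
hadoop dfsadmin -safemode leave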


*Thanks & Regards*

∞
Shashwat Shriparv



On Thu, Sep 26, 2013 at 5:35 PM, Harsh J  wrote:

> Hi Tom,
>
> The edits are processed sequentially, and aren't all held in memory.
> Right now there's no mid-way-checkpoint when it is loaded, such that
> it could resume only with remaining work if interrupted. Normally this
> is not a problem in deployments given that SNN or SBN runs for
> checkpointing the images and keeping the edits collection small
> periodically.
>
> If your NameNode is running out of memory _applying_ the edits, then
> the cause is not the edits but a growing namespace. You most-likely
> have more files now than before, and thats going to take up permanent
> memory from the NameNode heap size.
>
> On Thu, Sep 26, 2013 at 3:00 AM, Tom Brown  wrote:
> > Unfortunately, I cannot give it that much RAM. The machine has 4GB total
> > (though could be expanded somewhat-- it's a VM).
> >
> > Though if each edit is processed sequentially (in a streaming form), the
> > entire edits file will never be in RAM at once.
> >
> > Is the edits file format well defined (could I break off 100MB chunks and
> > process them individually to achieve the same result as processing the
> whole
> > thing at once)?
> >
> > --Tom
> >
> >
> > On Wed, Sep 25, 2013 at 1:53 PM, Ravi Prakash  wrote:
> >>
> >> Tom! I would guess that just giving the NN JVM lots of memory (64Gb /
> >> 96Gb) should be the easiest way.
> >>
> >>
> >> 
> >> From: Tom Brown 
> >> To: "user@hadoop.apache.org" 
> >> Sent: Wednesday, September 25, 2013 11:29 AM
> >> Subject: Is there any way to partially process HDFS edits?
> >>
> >> I have an edits file on my namenode that is 35GB. This is quite a bit
> >> larger than it should be (the secondary namenode wasn't running for some
> >> time, and HBASE-9648 caused a huge number of additional edits).
> >>
> >> The first time I tried to start the namenode, it chewed on the edits for
> >> about 4 hours and then ran out of memory. I have increased the memory
> >> available to the namenode (was 512MB, now 2GB), and started the process
> >> again.
> >>
> >> Is there any way that the edits file can be partially processed to avoid
> >> having to re-process the same edits over and over until I can allocate
> >> enough memory for it to be done in one shot?
> >>
> >> How long should it take (hours? days?) to process an edits file of that
> >> size?
> >>
> >> Any help is appreciated!
> >>
> >> --Tom
> >>
> >>
> >
>
>
>
> --
> Harsh J
>


Re: RE

2013-07-22 Thread shashwat shriparv
Do you really think this is the place to get interview questions?

Do the following:

www.google.com

hadoop+interview+questions

You will get a lot of links.



*Thanks & Regards*

∞
Shashwat Shriparv



On Mon, Jul 22, 2013 at 2:54 PM, sri harsha  wrote:

> Hi all,
> can some one do post interview questions on hadoop?
>
>
> --
> amiable harsha
>


Re:

2013-06-26 Thread shashwat shriparv
When you see this, it basically means that you are unable to connect to
the NameNode. It's either not running or running on a different port.
Please show us your core-site.xml.
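
For comparison, the relevant core-site.xml entry usually looks something like
this; the host and port must match what the NameNode was actually started with
(the value below just mirrors the address used in this thread):

<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.10.22:9000</value>
</property>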

*Thanks & Regards*

∞
Shashwat Shriparv



On Wed, Jun 26, 2013 at 5:44 PM, Mohammad Tariq  wrote:

> Or show the logs to us.
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Wed, Jun 26, 2013 at 2:03 PM, Devaraj k  wrote:
>
>>  Could you check the logs for the hadoop processes, are they started
>> successfully or any problem while starting?
>>
>> ** **
>>
>> Thanks
>>
>> Devaraj k
>>
>> ** **
>>
>> *From:* ch huang [mailto:justlo...@gmail.com]
>> *Sent:* 26 June 2013 12:38
>> *To:* user@hadoop.apache.org
>> *Subject:* 
>>
>> ** **
>>
>> hi i build a new hadoop cluster ,but i can not ACCESS hdfs ,why? i use
>> CDH3u4 ,redhat6.2
>>
>>  
>>
>> # hadoop fs -put /opt/test hdfs://192.168.10.22:9000/user/test
>> 13/06/26 15:00:47 INFO ipc.Client: Retrying connect to server: /
>> 192.168.10.22:9000. Already tried 0 time(s).
>> 13/06/26 15:00:48 INFO ipc.Client: Retrying connect to server: /
>> 192.168.10.22:9000. Already tried 1 time(s).
>> 13/06/26 15:00:49 INFO ipc.Client: Retrying connect to server: /
>> 192.168.10.22:9000. Already tried 2 time(s).
>> 13/06/26 15:00:50 INFO ipc.Client: Retrying connect to server: /
>> 192.168.10.22:9000. Already tried 3 time(s).
>> 13/06/26 15:00:51 INFO ipc.Client: Retrying connect to server: /
>> 192.168.10.22:9000. Already tried 4 time(s).
>> 13/06/26 15:00:52 INFO ipc.Client: Retrying connect to server: /
>> 192.168.10.22:9000. Already tried 5 time(s).
>> 13/06/26 15:00:53 INFO ipc.Client: Retrying connect to server: /
>> 192.168.10.22:9000. Already tried 6 time(s).
>> 13/06/26 15:00:54 INFO ipc.Client: Retrying connect to server: /
>> 192.168.10.22:9000. Already tried 7 time(s).
>> 13/06/26 15:00:55 INFO ipc.Client: Retrying connect to server: /
>> 192.168.10.22:9000. Already tried 8 time(s).
>> 13/06/26 15:00:56 INFO ipc.Client: Retrying connect to server: /
>> 192.168.10.22:9000. Already tried 9 time(s).
>> put: Call to /192.168.10.22:9000 failed on connection exception:
>> java.net.ConnectException: Connection refused
>>
>
>


Re: if i changed zookeeper port ,how can i let my hbase know the port?

2013-06-26 Thread shashwat shriparv

Set this in hbase-site.xml with your new ZooKeeper port as the value, then
start HBase:

<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2281</value>
</property>



*Thanks & Regards*

∞
Shashwat Shriparv



On Wed, Jun 26, 2013 at 4:55 PM, Ted Yu  wrote:

> Have you looked at http://hbase.apache.org/book.html#zookeeper ?
>
> Thanks
>
> On Wed, Jun 26, 2013 at 5:09 PM, ch huang  wrote:
>
>> i change zookeeper from 2181 to 2281,it cause hbase region
>> server auto-closed after it start for a while
>> any one can help?
>>
>
>


Re: master node abnormal ,help

2013-06-26 Thread shashwat shriparv
Your DataNode may not be running; verify that.
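
A quick way to verify, as a sketch (run on the datanode host, or anywhere with
HDFS client access for the second command):

jps                       # a DataNode process should be listed
hadoop dfsadmin -report   # shows how many datanodes the namenode can see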



*Thanks & Regards*

∞
Shashwat Shriparv



On Wed, Jun 26, 2013 at 2:55 PM, Devaraj k  wrote:

>  Can you ask this HBase question in the HBase user mailing list?
>
>
> Thanks
>
> Devaraj k
>
> ** **
>
> *From:* ch huang [mailto:justlo...@gmail.com]
> *Sent:* 26 June 2013 14:52
> *To:* user@hadoop.apache.org
> *Subject:* master node abnormal ,help
>
> ** **
>
> when i start master node ,it not work,anyone can help?
>
>  
>
> 2013-06-26 17:17:52,552 INFO
> org.apache.hadoop.hbase.master.ActiveMasterManager: Master=CH22:6
> 2013-06-26 17:17:52,859 DEBUG org.apache.hadoop.hbase.util.FSUtils:
> Created version file at hdfs://CH22:9000/hbaseroot set its version at:7
> 2013-06-26 17:17:52,863 WARN org.apache.hadoop.hdfs.DFSClient:
> DataStreamer Exception: org.apache.hadoop.ipc.RemoteException:
> java.io.IOException: File /hbaseroot/hbase.version could only be replicated
> to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1533)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:667)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1107)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
> at $Proxy6.addBlock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy6.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3647)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3514)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2720)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2915)
> 
>


Re: region server can not start

2013-06-26 Thread shashwat shriparv
If you have changed the port number, why is it still trying to connect to 2181:

2013-06-26 16:57:08,824 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server CH22/192.168.10.22:2181

Somewhere in your configuration you still have 2181; check
/etc/zookeeper/zoo.cfg and hbase-site.xml.

Please ask HBase questions on the HBase user group.
:)
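
As a sketch, assuming CDH-style config locations (adjust the paths to your
install), the stale port can be located with:

grep -R "2181" /etc/zookeeper /etc/hbase/conf /etc/hadoop/conf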

*Thanks & Regards*

∞
Shashwat Shriparv



On Wed, Jun 26, 2013 at 2:29 PM, ch huang  wrote:

> i change zookeeper port from 2181 to 2281 , region server can not start
>
> 2013-06-26 16:57:00,003 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=CH22:2181 sessionTimeout=18
> watcher=regionserver:60020
> 2013-06-26 16:57:00,030 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server CH22/192.168.10.22:2181
> 2013-06-26 16:57:00,039 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
> for server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
> 2013-06-26 16:57:00,254 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook
> thread: Shutdownhook:regionserver60020
> 2013-06-26 16:57:01,765 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server CH22/192.168.10.22:2181
> 2013-06-26 16:57:01,767 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
> for server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
> 2013-06-26 16:57:03,505 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server CH22/192.168.10.22:2181
> 2013-06-26 16:57:03,506 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
> for server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
> 2013-06-26 16:57:05,323 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server CH22/192.168.10.22:2181
> 2013-06-26 16:57:05,324 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
> for server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
> 2013-06-26 16:57:06,770 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server CH22/192.168.10.22:2181
> 2013-06-26 16:57:06,771 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
> for server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
> 2013-06-26 16:57:08,824 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server CH22/192.168.10.22:2181
> 2013-06-26 16:57:08,825 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
> for server null, unexpected error, closing socket connection and attempting
> reconnect
> java.net.ConnectException: Connection refused
>


Re: datanode can not start

2013-06-26 Thread shashwat shriparv
Remove

<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50011</value>
</property>

And try.
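
The log below also shows the datanode dying with "Address already in use"
while opening its web UI listener on 50075, so it is worth checking which
process already holds the ports; a sketch (flag spellings per common Linux
netstat builds):

netstat -nlp | egrep ':50011|:50075'
jps    # look for a DataNode left over from the old cluster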






*Thanks & Regards*

∞
Shashwat Shriparv



On Wed, Jun 26, 2013 at 3:29 PM, varun kumar  wrote:

> HI huang,
> *
> *
> *Some other service is running on the port or you did not stop the
> datanode service properly.*
> *
> *
> *Regards,*
> *Varun Kumar.P
> *
>
>
> On Wed, Jun 26, 2013 at 3:13 PM, ch huang  wrote:
>
>> i have running old cluster datanode,so it exist some conflict, i changed
>> default port, here is my hdfs-site.xml
>>
>>
>> <configuration>
>>
>> <property>
>> <name>dfs.name.dir</name>
>> <value>/data/hadoopnamespace</value>
>> </property>
>>
>> <property>
>> <name>dfs.data.dir</name>
>> <value>/data/hadoopdata</value>
>> </property>
>>
>> <property>
>> <name>dfs.datanode.address</name>
>> <value>0.0.0.0:50011</value>
>> </property>
>>
>> <property>
>> <name>dfs.permissions</name>
>> <value>false</value>
>> </property>
>>
>> <property>
>> <name>dfs.datanode.max.xcievers</name>
>> <value>4096</value>
>> </property>
>>
>> <property>
>> <name>dfs.webhdfs.enabled</name>
>> <value>true</value>
>> </property>
>>
>> <property>
>> <name>dfs.http.address</name>
>> <value>192.168.10.22:50070</value>
>> </property>
>>
>> </configuration>
>>
>>
>> 2013-06-26 17:37:24,923 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
>> /
>> STARTUP_MSG: Starting DataNode
>> STARTUP_MSG:   host = CH34/192.168.10.34
>> STARTUP_MSG:   args = []
>> STARTUP_MSG:   version = 0.20.2-cdh3u4
>> STARTUP_MSG:   build =
>> file:///data/1/tmp/topdir/BUILD/hadoop-0.20.2-cdh3u4 -r
>> 214dd731e3bdb687cb55988d3f47dd9e248c5690; compiled by 'root' on Mon May  7
>> 14:03:02 PDT 2012
>> /
>> 2013-06-26 17:37:25,335 INFO
>> org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already
>> set up for Hadoop, not re-installing.
>> 2013-06-26 17:37:25,421 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Registered
>> FSDatasetStatusMBean
>> 2013-06-26 17:37:25,429 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at
>> 50011
>> 2013-06-26 17:37:25,430 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is
>> 1048576 bytes/s
>> 2013-06-26 17:37:25,470 INFO org.mortbay.log: Logging to
>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> org.mortbay.log.Slf4jLog
>> 2013-06-26 17:37:25,513 INFO org.apache.hadoop.http.HttpServer: Added
>> global filtersafety
>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>> 2013-06-26 17:37:25,518 INFO org.apache.hadoop.http.HttpServer: Port
>> returned by webServer.getConnectors()[0].getLocalPort() before open() is
>> -1. Opening the listener on 50075
>> 2013-06-26 17:37:25,519 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for threadgroup to
>> exit, active threads is 0
>> 2013-06-26 17:37:25,619 INFO
>> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: Shutting
>> down all async disk service threads...
>> 2013-06-26 17:37:25,619 INFO
>> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All async
>> disk service threads have been shut down.
>> 2013-06-26 17:37:25,620 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: java.net.BindException:
>> Address already in use
>> at sun.nio.ch.Net.bind(Native Method)
>> at
>> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
>> at
>> sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>> at
>> org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>> at org.apache.hadoop.http.HttpServer.start(HttpServer.java:564)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:505)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:303)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1643)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1583)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1601)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1727)
>> at
>> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1744)
>> 2013-06-26 17:37:25,622 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
>> /
>> SHUTDOWN_MSG: Shutting down DataNode at CH34/192.168.10.34
>> /
>>
>
>
>
> --
> Regards,
> Varun Kumar.P
>


Re: Running Hadoop on two Ubuntu machines with Bridge connection

2013-06-02 Thread shashwat shriparv
Bro, make the IP static, or use the hostname everywhere: set your hostname in
the /etc/hosts and /etc/hostname files, so that when the IP changes you only
need to update the hosts file.

Or write a script that updates the IP-to-hostname mapping in the hosts file
automatically whenever the IP changes.
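
As a sketch, the /etc/hosts entries on both machines would look something like
this (the addresses are placeholders; use the real addresses reported by
ifconfig, not 127.0.x.x loopback entries):

192.168.1.10   hadoopmaster
192.168.1.11   hadoopslave1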


On Sun, Jun 2, 2013 at 5:05 PM, Shashidhar Rao
wrote:

> the configuration details or if anyone knows some kin




*Thanks & Regards    *

∞
Shashwat Shriparv


Re: Install hadoop on multiple VMs in 1 laptop like a cluster

2013-05-31 Thread shashwat shriparv
Try this:
http://www.youtube.com/watch?v=gIRubPl20oo
There are three videos (parts 1 to 3); watch them and you will be able to do
what you need.



*Thanks & Regards*

∞
Shashwat Shriparv



On Fri, May 31, 2013 at 5:52 PM, Jitendra Yadav
wrote:

> Hi,
>
> You can create a clone machine through an existing virtual machine
> in VMware and then run it as a separate virtual machine.
>
> http://www.vmware.com/support/ws55/doc/ws_clone_new_wizard.html
>
>
> After installing you have to make sure that all the virtual machines are
> setup with correct network set up so that they can ping each other (you
> should use Host only network settings in network configuration).
>
> I hope this will help you.
>
>
> Regards
> Jitendra
>
> On Fri, May 31, 2013 at 5:23 PM, Sai Sai  wrote:
>
>>  Just wondering if anyone has any documentation or references to any
>> articles how to simulate a multi node cluster setup in 1 laptop with hadoop
>> running on multiple ubuntu VMs. any help is appreciated.
>> Thanks
>> Sai
>>
>
>


Re: hi

2013-05-31 Thread shashwat shriparv
C:\Program: command not found??

From where are you running this command? Is your Hadoop on Windows or Linux?
(The "C:\Program: command not found" message usually means a Windows path
containing a space, such as C:\Program Files, is being expanded unquoted inside
the hadoop script, for example via JAVA_HOME.)

*Thanks & Regards    *

∞
Shashwat Shriparv



On Fri, May 31, 2013 at 4:18 PM, 王洪军  wrote:

> $jps
> conform jobtracker is running( namenode and datanode is also needed )
>
>
> 2013/5/31 Jagat Singh 
>
>> Please run
>>
>> $ jps
>>
>> This command will show all running Hadoop daemons and then you can find
>> whats wrong :)
>>
>>
>>
>>
>> On Fri, May 31, 2013 at 8:25 PM, Mohammad Tariq wrote:
>>
>>> Hello sumit,
>>>
>>>   Make sure all the Hadoop daemons are running .
>>>
>>> Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Fri, May 31, 2013 at 3:07 PM, sumit piparsania <
>>> sumitpiparsa...@yahoo.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am new to hadoop. I am facing some issues while executing the below
>>>> command.
>>>> Kindly help me resolving this issue.
>>>>
>>>>
>>>> command: bin/hadoop jar hadoop-examples-*.jar grep input output
>>>> 'dfs[a-z.]+'
>>>> Error:
>>>> bin/hadoop: line 320: C:\Program: command not found
>>>> 13/05/31 12:59:58 INFO ipc.Client: Retrying connect to server:
>>>> localhost/127.0.0.1:9001. Already tried 0 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>  13/05/31 13:00:00 INFO ipc.Client: Retrying connect to server:
>>>> localhost/127.0.0.1:9001. Already tried 1 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>  13/05/31 13:00:02 INFO ipc.Client: Retrying connect to server:
>>>> localhost/127.0.0.1:9001. Already tried 2 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>  13/05/31 13:00:04 INFO ipc.Client: Retrying connect to server:
>>>> localhost/127.0.0.1:9001. Already tried 3 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>  13/05/31 13:00:06 INFO ipc.Client: Retrying connect to server:
>>>> localhost/127.0.0.1:9001. Already tried 4 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>  13/05/31 13:00:08 INFO ipc.Client: Retrying connect to server:
>>>> localhost/127.0.0.1:9001. Already tried 5 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>  13/05/31 13:00:10 INFO ipc.Client: Retrying connect to server:
>>>> localhost/127.0.0.1:9001. Already tried 6 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>  13/05/31 13:00:12 INFO ipc.Client: Retrying connect to server:
>>>> localhost/127.0.0.1:9001. Already tried 7 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>  13/05/31 13:00:14 INFO ipc.Client: Retrying connect to server:
>>>> localhost/127.0.0.1:9001. Already tried 8 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>  13/05/31 13:00:16 INFO ipc.Client: Retrying connect to server:
>>>> localhost/127.0.0.1:9001. Already tried 9 time(s); retry policy is
>>>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
>>>>  java.net.ConnectException: Call to localhost/127.0.0.1:9001 failed on
>>>> connection exception: java.net.ConnectException: Connection refused: no
>>>> further information
>>>>  at org.apache.hadoop.ipc.Client.wrapException(Client.java:1136)
>>>>  at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>>  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>>  at org.apache.hadoop.mapred.$Proxy2.getProtocolVersion(Unknown Source)
>>>>  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411)
>>>>  at
>>>> org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:499)
>>>>  at org.apache.hadoop.mapred.JobClient.init(JobClient.java:490)
>>>>  at org.apache.hadoop.mapred.JobClient.(JobClient.java:473)
>>>>  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1322)
>>>>  at org.apache.hadoop.examples.Grep.run(Gre

Re: Hadoop Classpath issue.

2013-05-23 Thread shashwat shriparv
Check your HDFS web UI at <namenode>:50070 and browse the filesystem to see
whether these entries are really there...

*Thanks & Regards*

∞
Shashwat Shriparv



On Fri, May 24, 2013 at 9:45 AM, YouPeng Yang wrote:

> Hi
>You should check your /usr/bin/hadoop script.
>
>
>
> 2013/5/23 Dhanasekaran Anbalagan 
>
>> Hi Guys,
>>
>> When i trying to execute hadoop fs -ls / command
>> It's return extra two lines.
>>
>> 226:~# hadoop fs -ls /
>> *common ./*
>> *lib lib*
>> Found 9 items
>> drwxrwxrwx   - hdfs   supergroup  0 2013-03-07 04:46 /benchmarks
>> drwxr-xr-x   - hbase  hbase   0 2013-05-23 08:59 /hbase
>> drwxr-xr-x   - hdfs   supergroup  0 2013-02-20 13:21 /mapred
>> drwxr-xr-x   - tech   supergroup  0 2013-05-03 05:15 /test
>> drwxrwxrwx   - mapred supergroup  0 2013-05-23 09:33 /tmp
>> drwxrwxr-x   - hdfs   supergroup  0 2013-02-20 16:32 /user
>> drwxr-xr-x   - hdfs   supergroup  0 2013-02-20 15:10 /var
>>
>>
>> In other machines. Not return extra to lines. Please guide me how to
>> remove this line.
>>
>> 226:~# /usr/bin/hadoop classpath
>> common ./
>> lib lib
>>
>> /etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*
>>
>>
>> Please guide me How to fix this.
>>
>> -Dhanasekaran
>> Did I learn something today? If not, I wasted it.
>>
>
>


Re: Where to begin from??

2013-05-23 Thread shashwat shriparv
On Fri, May 24, 2013 at 10:09 AM, Lokesh Basu  wrote:

> cept that I don't know much about the real world problem and have to begin
> from scratch to get some insight of what is actually driving these
> technologies.
>


Dear Lokesh,

It's best to learn by starting to do it, and if you face problems, start asking.
:) Welcome to the Hadoop world.

*Thanks & Regards*

∞
Shashwat Shriparv


Re: Install Hadoop on Linux Pseudo Distributed Mode - Root Required?

2013-05-13 Thread shashwat shriparv
If you are installing the CDH version of Hadoop, tell your admin that you need
root access, as you need to install RPMs :)

*Thanks & Regards*

∞
Shashwat Shriparv


Re: 600s timeout during copy phase of job

2013-05-13 Thread shashwat shriparv
On Mon, May 13, 2013 at 11:35 AM, David Parks wrote:

> (I’ve got 8 reducers, 1-per-core, 25 i


Reduce the number of mappers and reducers and give it a try.

*Thanks & Regards    *

∞
Shashwat Shriparv


Re: The minimum memory requirements to datanode and namenode?

2013-05-13 Thread shashwat shriparv
Due to the small amount of memory available on the nodes, they are not able to
send responses in time, which leads to socket connection exceptions; there may
also be a network issue.

Please check which programs are using the memory, as some other co-hosted
application may be eating it up.

ps -e -orss=,args= | sort -b -k1,1n | pr -TW$COLUMNS

or

Run the top command, then press Shift+M (sort by memory) and c (show full
command lines), and check which application is eating up the memory.

There must be ample memory available on the nodes besides what is reserved for
the JVMs.

*Thanks & Regards*

∞
Shashwat Shriparv



On Mon, May 13, 2013 at 12:23 PM, Nitin Pawar wrote:

> 4GB memory on NN? this will run out of memory in few days.
>
> You will need to make sure your NN has atleast more than double RAM of
> your DNs if you have a miniature  cluster.
>
>
> On Mon, May 13, 2013 at 11:52 AM, sam liu  wrote:
>
>> I can issue a command 'hadoop dfsadmin -report', but it did not return
>> any result for a long time. Also, I can open the NN UI(
>> http://namenode:50070), but it is always keeping in the connecting
>> status, and could not return any cluster statistic.
>>
>> The mem of NN:
>>   total   used   free
>> Mem:  3834   3686148
>>
>> After running a top command, I can see following process are taking up
>> the memory: namenode, jobtracker, tasktracker, hbase, ...
>>
>> I can restart the cluster, and then the cluster will be healthy. But this
>> issue will probably occur in a few days later. I think it's caused by
>> lacking of free/available mem, but do not know how many extra
>> free/available mem of node is required, besides the necessary mem for
>> running datanode/tasktracker process?
>>
>>
>>
>>
>> 2013/5/13 Nitin Pawar 
>>
>>> just one node not having memory does not mean your cluster is down.
>>>
>>> Can you see your hdfs health on NN UI?
>>>
>>> how much memory do you have on NN? if there are no jobs running on the
>>> cluster then you can safely restart datanode and tasktracker.
>>>
>>> Also run a top command and figure out which processes are taking up the
>>> memory and for what purpose?
>>>
>>>
>>> On Mon, May 13, 2013 at 11:28 AM, sam liu wrote:
>>>
>>>> Nitin,
>>>>
>>>> In my cluster, the tasktracker and datanode already have been launched,
>>>> and are still running now. But the free/available mem of node3 now is just
>>>> 167 mb, and do you think it's the reason why my hadoop is unhealthy now(it
>>>> does not return result of command 'hadoop dfs -ls /')?
>>>>
>>>>
>>>> 2013/5/13 Nitin Pawar 
>>>>
>>>>> Sam,
>>>>>
>>>>> There is no formula for determining how much memory one should give to
>>>>> datanode and tasktracker. Ther formula is available for how many slots you
>>>>> want to have on a machine.
>>>>>
>>>>> In my prior experience, we did give 512MB memory each to a datanode
>>>>> and tasktracker.
>>>>>
>>>>>
>>>>> On Mon, May 13, 2013 at 11:18 AM, sam liu wrote:
>>>>>
>>>>>> For node3, the memory is:
>>>>>>total   used   free shared
>>>>>> buffers cached
>>>>>> Mem:  3834   3666167  0187
>>>>>> 1136
>>>>>> -/+ buffers/cache:   2342   1491
>>>>>> Swap: 8196  0   8196
>>>>>>
>>>>>> To a 3 nodes cluster as mine, what's the required minimum
>>>>>> free/available memory for the datanode process and tasktracker process,
>>>>>> without running any map/reduce task?
>>>>>> Any formula to determine it?
>>>>>>
>>>>>>
>>>>>> 2013/5/13 Rishi Yadav 
>>>>>>
>>>>>>> can you tell specs of node3. Even on a test/demo cluster, anything
>>>>>>> below 4 GB ram makes the node almost inaccessible as per my experience.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 12, 2013 at 8:25 PM, sam liu wrote:
>>>>>>>
>>>>>>>> Got some exceptions on node3:
>>>>>>>> 1. datanode log:
>>>>>>>> 2013-04-17 11:13:44,719 IN

Re: How to combine input files for a MapReduce job

2013-05-13 Thread shashwat shriparv
Look into mapred.max.split.size, mapred.min.split.size and the number of map
tasks in mapred-site.xml.
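
As a sketch, those properties would look like this in mapred-site.xml (64 MB is
just an example value); note that plain FileInputFormat still creates at least
one split per file, so to really pack many small files into one map task you
would typically also switch the job to CombineFileInputFormat:

<property>
  <name>mapred.min.split.size</name>
  <value>67108864</value>
</property>
<property>
  <name>mapred.max.split.size</name>
  <value>67108864</value>
</property>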

*Thanks & Regards*

∞
Shashwat Shriparv



On Mon, May 13, 2013 at 12:50 PM, Agarwal, Nikhil  wrote:

>  Hi,
>
> ** **
>
> I  have a 3-node cluster, with JobTracker running on one machine and
> TaskTrackers on other two. Instead of using HDFS, I have written my own
> FileSystem implementation. As an experiment, I kept 1000 text files (all of
> same size) on both the slave nodes and ran a simple Wordcount MR job. It
> took around 50 mins to complete the task. Afterwards, I concatenated all
> the 1000 files into a single file and then ran a Wordcount MR job, it took
> 35 secs. From the JobTracker UI I could make out that the problem is
> because of the number of mappers that JobTracker is creating. For 1000
> files it creates 1000 maps and for 1 file it creates 1 map (irrespective of
> file size). 
>
> ** **
>
> Thus, is there a way to reduce the number of mappers i.e. can I control
> the number of mappers through some configuration parameter so that Hadoop
> would club all the files until it reaches some specified size (say, 64 MB)
> and then make 1 map per 64 MB block?
>
> ** **
>
> Also, I wanted to know how to see which file is being submitted to which
> TaskTracker or if that is not possible then how do I check if some data
> transfer is happening in between my slave nodes during a MR job?
>
> ** **
>
> Sorry for so many questions and Thank you for your time.
>
> ** **
>
> Regards,
>
> Nikhil
>


Re: Submitting a hadoop job in large clusters.

2013-05-12 Thread shashwat shriparv
As Nitin said, it is the JobTracker's responsibility to distribute the job's
tasks to the TaskTrackers, so you need to submit the job to the JobTracker.
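
As a sketch, a client machine only needs configuration (or -D options) pointing
at the cluster; the jar and driver class below are placeholders, and the example
assumes the driver uses ToolRunner/GenericOptionsParser:

hadoop jar myjob.jar com.example.MyDriver \
  -D fs.default.name=hdfs://<namenode-host>:9000 \
  -D mapred.job.tracker=<jobtracker-host>:9001 \
  /input /output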

*Thanks & Regards*

∞
Shashwat Shriparv



On Sun, May 12, 2013 at 11:26 PM, Nitin Pawar wrote:

> nope
> in MRv1 only jobtracker can accept jobs. You can not trigger job on any
> other process in hadoop other than jobtracker.
>
>
> On Sun, May 12, 2013 at 11:25 PM, Shashidhar Rao <
> raoshashidhar...@gmail.com> wrote:
>
>> @shashwat shriparv
>>
>> Can the a hadoop job be submitted to any datanode in the cluster and not
>> to jobTracker.
>>
>> Correct me if it I am wrong , I was told that a hadoop job can be
>> submitted to datanode also apart from JobTracker. Is it correct?
>>
>> Advanced thanks
>>
>>
>> On Sun, May 12, 2013 at 11:02 PM, shashwat shriparv <
>> dwivedishash...@gmail.com> wrote:
>>
>>>
>>> On Sun, May 12, 2013 at 12:19 AM, Nitin Pawar 
>>> wrote:
>>>
>>>>
>>>> normally if you want to copy the jar then hadoop admins setu
>>>>
>>>
>>>  Submit you job to Job tracker it will distribute throughout the
>>> tasktrackers.
>>>
>>> *Thanks & Regards*
>>>
>>> ∞
>>> Shashwat Shriparv
>>>
>>>
>>
>
>
> --
> Nitin Pawar
>


Re: Problem while running simple WordCount program(hadoop-1.0.4) on eclipse.

2013-05-12 Thread shashwat shriparv
For the user through which you are running your Hadoop job, set permissions on
the tmp/staging directory so that user can write to it.
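
As a sketch on a Linux box (the path and user name are taken from the error
below; note that on Windows/Cygwin this particular "Failed to set permissions
... to 0700" check is a known limitation of Hadoop 1.0.4 itself):

sudo chown -R khaleelk /tmp/hadoop-khaleelk
chmod -R 755 /tmp/hadoop-khaleelk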

*Thanks & Regards*

∞
Shashwat Shriparv



On Fri, May 10, 2013 at 5:24 PM, Nitin Pawar wrote:

> What are the permission of your /tmp/ folder?
> On May 10, 2013 5:03 PM, "Khaleel Khalid" 
> wrote:
>
>>  Hi all,
>>
>> I am facing the following error when I run a simple WordCount
>> program using hadoop-1.0.4 on eclipse(Galileo).  The map/reduce plugin
>> version I use is 1.0.4 as well.  It would be really helpful if
>> someone gives me a solution for the problem.
>>
>> ERROR:
>>
>> 13/05/10 16:53:51 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>>
>> 13/05/10 16:53:51 ERROR security.UserGroupInformation:
>> *PriviledgedActionException* as:khaleelk *cause:java.io.IOException*:
>> Failed to set permissions of path:
>> \tmp\hadoop-khaleelk\mapred\staging\khaleelk-1067522586\.staging to 0700
>>
>> Exception in thread "main"
>> *java.io.IOException*: Failed to set permissions of path:
>> \tmp\hadoop-khaleelk\mapred\staging\khaleelk-1067522586\.staging to 0700
>>
>> at org.apache.hadoop.fs.FileUtil.checkReturnValue(
>> *FileUtil.java:689*)
>>
>> at org.apache.hadoop.fs.FileUtil.setPermission(
>> *FileUtil.java:662*)
>>
>> at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(
>> *RawLocalFileSystem.java:509*)
>>
>> at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(
>> *RawLocalFileSystem.java:344*)
>>
>> at org.apache.hadoop.fs.FilterFileSystem.mkdirs(
>> *FilterFileSystem.java:189*)
>>
>> at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(
>> *JobSubmissionFiles.java:116*)
>>
>> at org.apache.hadoop.mapred.JobClient$2.run(
>> *JobClient.java:856*)
>>
>> at org.apache.hadoop.mapred.JobClient$2.run(
>> *JobClient.java:850*)
>>
>> at java.security.AccessController.doPrivileged(
>> *Native Method*)
>>
>> at javax.security.auth.Subject.doAs(Unknown Source)
>>
>> at org.apache.hadoop.security.UserGroupInformation.doAs(
>> *UserGroupInformation.java:1121*)
>>
>> at org.apache.hadoop.mapred.JobClient.submitJobInternal(
>> *JobClient.java:850*)
>>
>> at org.apache.hadoop.mapreduce.Job.submit(
>> *Job.java:500*)
>>
>> at org.apache.hadoop.mapreduce.Job.waitForCompletion(
>> *Job.java:530*)
>>
>> at WordCount.main(
>> *WordCount.java:65*)
>>
>>
>>
>> Thank you in advance.
>>
>>
>>
>>
>>
>>
>>
>


Re: hadoop map-reduce errors

2013-05-12 Thread shashwat shriparv
Your connection settings to MySQL may not be correct; check that.
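
Since the stack trace below comes from Sqoop, a quick connectivity check, as a
sketch (the host, database and user are placeholders):

sqoop list-tables \
  --connect jdbc:mysql://<db-host>:3306/<database> \
  --username <user> -P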

*Thanks & Regards*

∞
Shashwat Shriparv



On Fri, May 10, 2013 at 6:12 PM, Shahab Yunus wrote:

> Have your checked your connection settings to the MySQL DB? Where and how
> are you passing the connection properties for the database? Is it
> accessible from the machine you are running this? Is the db up?
>
>
> On Thu, May 9, 2013 at 9:32 PM, 丙子  wrote:
>
>> When i  run a hadoop job ,there are some errors like this:
>> 13/05/10 08:20:59 ERROR manager.SqlManager: Error executing statement:
>> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications
>> link failure
>>
>> The last packet successfully received from the server was 28,484
>> milliseconds ago.  The last packet sent successfully to the server was 1
>> milliseconds ago.
>> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications
>> link failure
>>
>> The last packet successfully received from the server was 28,484
>> milliseconds ago.  The last packet sent successfully to the server was 1
>> milliseconds ago.
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>>
>> ……
>> ……
>> at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
>> at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
>> Caused by: java.io.EOFException: Can not read response from server.
>> Expected to read 4 bytes, read 0 bytes before connection was unexpectedly
>> lost.
>> at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3039)
>> at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3489)
>> ... 24 more
>>
>>
>> How  can i resolve it .
>>
>>
>


Re: issues with decrease the default.block.size

2013-05-12 Thread shashwat shriparv
The block size controls how files are split up for allocation, not how much
space a small file actually occupies on disk.
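
For reference, a sketch of lowering the block size cluster-wide in
hdfs-site.xml (property name per Hadoop 1.x; newer releases call it
dfs.blocksize):

<property>
  <name>dfs.block.size</name>
  <value>16777216</value>
</property>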

*Thanks & Regards*

∞
Shashwat Shriparv



On Fri, May 10, 2013 at 8:54 PM, Harsh J  wrote:

> Thanks. I failed to add: It should be okay to do if those cases are
> true and the cluster seems under-utilized right now.
>
> On Fri, May 10, 2013 at 8:29 PM, yypvsxf19870706
>  wrote:
> > Hi harsh
> >
> > Yep.
> >
> >
> >
> > Regards
> >
> >
> >
> >
> >
> >
> > 发自我的 iPhone
> >
> > 在 2013-5-10,13:27,Harsh J  写道:
> >
> >> Are you looking to decrease it to get more parallel map tasks out of
> >> the small files? Are you currently CPU bound on processing these small
> >> files?
> >>
> >> On Thu, May 9, 2013 at 9:12 PM, YouPeng Yang 
> wrote:
> >>> hi ALL
> >>>
> >>> I am going to setup a new hadoop  environment, .Because  of  there
>  are
> >>> lots of small  files, I would  like to change  the  default.block.size
> to
> >>> 16MB
> >>> other than adopting the ways to merge  the files into large  enough
> (e.g
> >>> using  sequencefiles).
> >>>I want to ask are  there  any bad influences or issues?
> >>>
> >>> Regards
> >>
> >>
> >>
> >> --
> >> Harsh J
>
>
>
> --
> Harsh J
>


Re: Permissions

2013-05-12 Thread shashwat shriparv
The user through which you are trying to run the task should have
permission on HDFS; just verify that.
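
As a sketch of what Amal describes below (run as the HDFS superuser; the paths
follow the staging layout mentioned in the thread):

sudo -u hdfs hadoop fs -mkdir /mapred/staging
sudo -u hdfs hadoop fs -chmod 777 /mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /mapred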

*Thanks & Regards*

∞
Shashwat Shriparv



On Sat, May 11, 2013 at 1:02 AM, Amal G Jose  wrote:

> After starting the hdfs, ie NN, SN and DN, create an hdfs directory
> structure in the form //mapred/staging.
> Then give 777 permission to staging. After that change the ownership of
> mapred directory to mapred user.
> After doing this start jobtracker, it will start. Otherwise, it will not
> start.
> The reason for not showing any datanodes may be due to firewall. Check
> whether the necessary ports are open.
>
>
>
> On Tue, Apr 30, 2013 at 2:28 AM,  wrote:
>
>> I look in the name node log and I get the following errors:
>>
>> 2013-04-29 15:25:11,646 ERROR
>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>> as:mapred (auth:SIMPLE)
>> cause:org.apache.hadoop.security.AccessControlException: Permission denied:
>> *user=mapred*, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
>>
>> 2013-04-29 15:25:11,646 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 6 on 9000, call
>> org.apache.hadoop.hdfs.protocol.ClientProtocol.mkdirs from
>> 172.16.26.68:45044: error:
>> org.apache.hadoop.security.AccessControlException: Permission denied: *
>> user=mapred*, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
>> org.apache.hadoop.security.AccessControlException: Permission denied: *
>> user=mapred,* access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:186)
>>
>> When I create the file system I have the user hdfs on the root folder.
>> (/). I am not sure now to have both the user mapred and hdfs have access to
>> the root (which it seems these errors are indicating).
>>
>> I get a page from 50070 put when I try to browse the filesystem from the
>> web UI I get an error that there are no nodes listening (I have 3 data
>> nodes and 1 namenode). The browser indicates that there is nothing
>> listening to port 50030, so it seems that the JobTracker is not up.
>>
>
>


Re: Submitting a hadoop job in large clusters.

2013-05-12 Thread shashwat shriparv
On Sun, May 12, 2013 at 12:19 AM, Nitin Pawar wrote:

>
> normally if you want to copy the jar then hadoop admins setu
>

Submit your job to the JobTracker; it will distribute it across the
TaskTrackers.

*Thanks & Regards*

∞
Shashwat Shriparv


Re: What are steps to make a Hadoop cluster Rack aware in hadoop 2.0.3-alpha

2013-05-11 Thread shashwat shriparv
check out
http://free-hadoop-tutorials.blogspot.in/2011/04/rack-awareness.html
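
In short, rack awareness comes from pointing Hadoop at a topology script in
core-site.xml and having that script map each address to a rack path. The
property name below is the Hadoop 2.x one (it was topology.script.file.name in
1.x); the script path, addresses and rack names are only placeholders:

<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>

#!/bin/bash
# /etc/hadoop/conf/topology.sh - print one rack path per address passed in
while [ $# -gt 0 ] ; do
  case "$1" in
    192.168.10.2*) echo -n "/rack1 " ;;
    192.168.10.3*) echo -n "/rack2 " ;;
    *)             echo -n "/default-rack " ;;
  esac
  shift
done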


*Thanks & Regards*

∞
Shashwat Shriparv



On Sat, May 11, 2013 at 9:47 PM, Mohammad Mustaqeem
<3m.mustaq...@gmail.com>wrote:

> Hi,
>Please, give me some link or matter that talks about how to make a
> hadoop cluster rack-aware.
> I am using Hadoop-2.0.3-alpha.
>
> --
> *With regards ---*
> *Mohammad Mustaqeem*,
> M.Tech (CSE)
> MNNIT Allahabad
>
>
>
>


Re: Hadoop noob question

2013-05-11 Thread shashwat shriparv
In our case we have written our own HDFS client to write the data and
download it.
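
Since the thread below also asks whether distcp can copy local data to HDFS, a
hedged sketch (the paths are placeholders; distcp runs as a MapReduce job, so a
file:// source only works when that path is readable from the nodes running the
map tasks, for example a shared mount or a single-node setup):

hadoop distcp file:///data/incoming hdfs://<namenode-host>:9000/data/incoming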

*Thanks & Regards*

∞
Shashwat Shriparv



On Sat, May 11, 2013 at 10:52 PM, Mohammad Tariq  wrote:

> You'r welcome :)
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Sat, May 11, 2013 at 10:46 PM, Rahul Bhattacharjee <
> rahul.rec@gmail.com> wrote:
>
>> Thanks Tariq!
>>
>>
>> On Sat, May 11, 2013 at 10:34 PM, Mohammad Tariq wrote:
>>
>>> @Rahul : Yes. distcp can do that.
>>>
>>> And, bigger the files lesser the metadata hence lesser memory
>>> consumption.
>>>
>>> Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Sat, May 11, 2013 at 9:40 PM, Rahul Bhattacharjee <
>>> rahul.rec@gmail.com> wrote:
>>>
>>>> IMHO,I think the statement about NN with regard to block metadata is
>>>> more like a general statement. Even if you put lots of small files of
>>>> combined size 10 TB , you need to have a capable NN.
>>>>
>>>> can disct cp be used to copy local - to - hdfs ?
>>>>
>>>> Thanks,
>>>> Rahul
>>>>
>>>>
>>>> On Sat, May 11, 2013 at 9:35 PM, Nitin Pawar 
>>>> wrote:
>>>>
>>>>> absolutely rite Mohammad
>>>>>
>>>>>
>>>>> On Sat, May 11, 2013 at 9:33 PM, Mohammad Tariq wrote:
>>>>>
>>>>>> Sorry for barging in guys. I think Nitin is talking about this :
>>>>>>
>>>>>> Every file and block in HDFS is treated as an object and for each
>>>>>> object around 200B of metadata get created. So the NN should be powerful
>>>>>> enough to handle that much metadata, since it is going to be in-memory.
>>>>>> Actually memory is the most important metric when it comes to NN.
>>>>>>
>>>>>> Am I correct @Nitin?
>>>>>>
>>>>>> @Thoihen : As Nitin has said, when you talk about that much data you
>>>>>> don't actually just do a "put". You could use something like "distcp" for
>>>>>> parallel copying. A better approach would be to use a data aggregation 
>>>>>> tool
>>>>>> like Flume or Chukwa, as Nitin has already pointed. Facebook uses their 
>>>>>> own
>>>>>> data aggregation tool, called Scribe for this purpose.
>>>>>>
>>>>>> Warm Regards,
>>>>>> Tariq
>>>>>> cloudfront.blogspot.com
>>>>>>
>>>>>>
>>>>>> On Sat, May 11, 2013 at 9:20 PM, Nitin Pawar >>>>> > wrote:
>>>>>>
>>>>>>> NN would still be in picture because it will be writing a lot of
>>>>>>> meta data for each individual file. so you will need a NN capable enough
>>>>>>> which can store the metadata for your entire dataset. Data will never 
>>>>>>> go to
>>>>>>> NN but lot of metadata about data will be on NN so its always good idea 
>>>>>>> to
>>>>>>> have a strong NN.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, May 11, 2013 at 9:11 PM, Rahul Bhattacharjee <
>>>>>>> rahul.rec@gmail.com> wrote:
>>>>>>>
>>>>>>>> @Nitin , parallel dfs to write to hdfs is great , but could not
>>>>>>>> understand the meaning of capable NN. As I know , the NN would not be a
>>>>>>>> part of the actual data write pipeline , means that the data would not
>>>>>>>> travel through the NN , the dfs would contact the NN from time to time 
>>>>>>>> to
>>>>>>>> get locations of DN as where to store the data blocks.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rahul
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, May 11, 2013 at 4:54 PM, Nitin Pawar <
>>>>>>>> nitinpawar...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> is it safe? .. there is no direct answer yes or no
>>>>>>>>>
>>>>>>>>> when you say , you have files worth 10TB 

Re: get recent changed files in hadoop

2013-05-08 Thread shashwat shriparv
hadoop dfs -ls | grep "<date you want to see>"

You can also pipe that through sort however you like.
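
As a concrete sketch (the path is a placeholder; in the 1.x fs -ls output the
modification date and time are columns 6 and 7):

hadoop fs -ls /user/<you> | grep "2013-05-07"
hadoop fs -lsr / | sort -k6,7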

*Thanks & Regards    *

∞
Shashwat Shriparv



On Wed, May 8, 2013 at 6:19 AM, Winston Lin  wrote:

> Any idea to get recent changed file in hadoop? e.g. files created
> yesterday?
>
> fs -ls will only give us all the files.
>
> Thanks
> Winston
>


Re: How to balance reduce job

2013-05-07 Thread shashwat shriparv
The number of reducers running depends on the data available and on the reduce
slots free on the TaskTrackers (see below).
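
As Bejoy suggests below, the first thing to try is raising the number of reduce
tasks for the job; a sketch (MRv1 property name; the jar and driver are
placeholders):

hadoop jar myjob.jar com.example.MyDriver -D mapred.reduce.tasks=8 /input /output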

*Thanks & Regards*

∞
Shashwat Shriparv



On Tue, May 7, 2013 at 8:43 PM, Tony Burton wrote:

> ** **
>
> The typical Partitioner method for assigning reducer r from reducers R is*
> ***
>
> ** **
>
> r = hash(key) % count(R)
>
> ** **
>
> However if you find your partitioner is assigning your data to too few or
> one reducers, I found that changing the count(R) to the next odd number or
> (even better) prime number above count(R) is a good rule of thumb to follow.
> 
>
> ** **
>
> Tony
>
>
> *From:* bejoy.had...@gmail.com [mailto:bejoy.had...@gmail.com]
> *Sent:* 17 April 2013 07:19
> *To:* user@hadoop.apache.org
> *Cc:* Mohammad Tariq
>
> *Subject:* Re: How to balance reduce job
>
> ** **
>
> Yes, That is a valid point.
>
> The partitioner might do non uniform distribution and reducers can be
> unevenly loaded.
>
> But this doesn't change the number of reducers and its distribution across
> nodes. The bottom issue as I understand is that his reduce tasks are
> scheduled on just a few nodes.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> --
>
> *From: *Ajay Srivastava  
>
> *Date: *Wed, 17 Apr 2013 06:02:30 +
>
> *To: *; <
> bejoy.had...@gmail.com>
>
> *ReplyTo: *user@hadoop.apache.org 
>
> *Cc: *Mohammad Tariq
>
> *Subject: *Re: How to balance reduce job
>
> ** **
>
> Tariq probably meant distribution of keys from  pair emitted
> by mapper.
>
> Partitioner distributes these pairs to different reducers based on key. If
> data is such that keys are skewed then most of the records may go to same
> reducer.
>
>
> Regards,
>
> Ajay Srivastava
>
>
> On 17-Apr-2013, at 11:08 AM, 
>
>   wrote:
>
>
>
> 
>
>
> Uniform Data distribution across HDFS is one of the factor that ensures
> map tasks are uniformly distributed across nodes. But reduce tasks doesn't
> depend on data distribution it is purely based on slot availability.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> --
>
> *From: *Mohammad Tariq  
>
> *Date: *Wed, 17 Apr 2013 10:46:27 +0530
>
> *To: *user@hadoop.apache.org; Bejoy Ks<
> bejoy.had...@gmail.com>
>
> *Subject: *Re: How to balance reduce job
>
> ** **
>
> Just to add to Bejoy's comments, it also depends on the data distribution.
> Is your data properly distributed across the HDFS?
>
>
> 
>
> Warm Regards, 
>
> Tariq
>
> https://mtariq.jux.com/
>
> cloudfront.blogspot.com
>
> ** **
>
> On Wed, Apr 17, 2013 at 10:39 AM,  wrote:
>
> Hi Rauljin
>
> Few things to check here.
> What is the number of reduce slots in each Task Tracker? What is the
> number of reduce tasks for your job?
> Based on the availability of slots the reduce tasks are scheduled on TTs.
>
> You can do the following
> Set the number of reduce tasks to 8 or more.
> Play with the number of slots (not very advisable for tweaking this on a
> job level )
>
> The reducers are scheduled purely based on the slot availability so it
> won't be that easy to ensure that all TT are evenly loaded with same number
> of reducers.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> --
>
> *From: *rauljin  
>
> *Date: *Wed, 17 Apr 2013 12:53:37 +0800
>
> *To: *user@hadoop.apache.org
>
> *ReplyTo: *user@hadoop.apache.org 
>
> *Subject: *How to balance reduce job
>
> ** **
>
> 8 datanode in my hadoop cluseter ,when running reduce job,there is only 2
> datanode running the job .
>
>  
>
> I want to use the 8 datanode to run the reduce job,so I can balance the
> I/O press.
>
>  
>
> Any ideas?
>
>  
>
> Thanks.
>
>  
> --
>
> rauljin
>
> ** **
>
> ** **
>
>
>
>
> *
> P *Please consider the environment before printing this email or
> attachments*
>
>
> This email and any attachments are confidential, protected by copyright
> and may be legally privileged. If you are not the intended recipient, then
> the diss

Re: there is not data-node

2013-05-01 Thread shashwat shriparv
Format your NameNode and start it again (note that formatting wipes the
existing HDFS metadata, so only do it if the data is disposable).
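
A sketch of the sequence on a Hadoop 1.x pseudo-distributed setup (the dfs
directory is the one from your log; the alternative mentioned below is to edit
the namespaceID in the datanode's current/VERSION file so that it matches the
namenode):

stop-all.sh
rm -rf /home/mohs/hadoop/dfsdirdata
hadoop namenode -format
start-all.sh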

*Thanks & Regards*

∞
Shashwat Shriparv



On Wed, May 1, 2013 at 8:40 PM, 姚吉龙  wrote:

> Id is different in namenode and data node.you can modify the id.I met the
> same issue and I completely remove all  file under hadoop
> —
> Sent from Mailbox <https://bit.ly/SZvoJe> for iPhone
>
>
> On Wed, May 1, 2013 at 8:32 PM, Mohsen B.Sarmadi <
> mohsen.bsarm...@gmail.com> wrote:
>
>> Dear Sirs/madams
>>
>> i am trying to run hadoop 1.0.4 in the pseudo distributed mode, but i am
>> facing with
>>
>> datanode log,
>>
>> 01/05/2013 13:16:54 2013-05-01 13:16:54,206 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
>> Incompatible namespaceIDs in /home/mohs/hadoop/dfsdirdata: namenode
>> namespaceID = 717658700; datanode namespaceID = 1318489331
>>
>>
>> job-tracker log:
>>
>> 01/05/2013 13:24:40 org.apache.hadoop.ipc.RemoteException:
>> java.io.IOException: File /home/mohs/hadoop/tmp/mapred/system/
>> jobtracker.info could only be replicated to 0 nodes, instead of 1
>>
>> data-node log:
>> 01/05/2013 13:26:10 2013-05-01 13:26:09,711 WARN
>> org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>> /home/mohs/hadoop/tmp/mapred/system/jobtracker.info could only be
>> replicated to 0 nodes, instead of 1
>>
>> i have tried to solve this by removing the files in /tmp/hadoop/* and
>> running hadoop namenode -format
>>
>> but i am facing with the same error.
>> do you have any solution for this?
>>
>> regards
>> Mohsen
>>
>>
>>
>>
>


Re: New to Hadoop-SSH communication

2013-05-01 Thread shashwat shriparv
Watch these for a successful configuration:

https://www.youtube.com/watch?v=gIRubPl20oo
https://www.youtube.com/watch?v=pgOKKl5P0to
https://www.youtube.com/watch?v=8CrgPUaNfjk

*Thanks & Regards*

∞
Shashwat Shriparv



On Wed, May 1, 2013 at 11:55 PM, shashwat shriparv <
dwivedishash...@gmail.com> wrote:

> Open /etc/hostname file : change master and slave in those file restart
> the system. and dont give IP as 127.0.0.1 and 127.0.0.2 just give ifconfig
> command it will show you the actual ip give that in hosts file.
>
> *Thanks & Regards    *
>
> ∞
> Shashwat Shriparv
>
>
>
> On Wed, May 1, 2013 at 9:48 PM, kishore alajangi <
> alajangikish...@gmail.com> wrote:
>
>> Might be you are copying by logging slave machine, Exit from slave in
>> Master.
>>
>> Thanks,
>> Kishore.
>>
>>
>>
>>
>> On Wed, May 1, 2013 at 3:00 AM, Automation Me wrote:
>>
>>> Thank you Tariq.
>>>
>>> I am using  the same username on both the machines and when i try to
>>> copy a file master to slave just to make sure SSH is working fine, The file
>>> is copying into master itself not an slave machine.
>>>
>>>   scp -r /usr/local/somefile hduser@slave:/usr/local/somefile
>>>
>>> Any suggestions...
>>>
>>>
>>> Thanks
>>> Annt
>>>
>>>
>>>
>>> On Tue, Apr 30, 2013 at 5:14 PM, Mohammad Tariq wrote:
>>>
>>>> ssh is actually *user@some_machine *to *user@some_other_machine*.
>>>> either use same username on both the machines or add the IPs along with
>>>> proper user@hostname in /etc/hosts file.
>>>>
>>>> HTH
>>>>
>>>> Warm Regards,
>>>> Tariq
>>>> https://mtariq.jux.com/
>>>> cloudfront.blogspot.com
>>>>
>>>>
>>>> On Wed, May 1, 2013 at 2:39 AM, Automation Me wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am new to Hadoop and trying to install multinode cluster on ubuntu
>>>>> VM's.I am not able to communicate between two clusters using SSH.
>>>>>
>>>>> My host file:
>>>>>
>>>>> 127.0.1.1 Master
>>>>> 127.0.1.2 Slave
>>>>>
>>>>> The following changes i made in two VM's
>>>>>
>>>>> 1.Updated the etc/hosts file in two vm's
>>>>>
>>>>> on Master VM
>>>>>  i did SSH keygen and trying to copy the key into Slave
>>>>>
>>>>> ssh-keygen -t rsa -P ""
>>>>>cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
>>>>>ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave@ubuntu.
>>>>>
>>>>> When i login into master and slave  and check
>>>>>
>>>>> master@ubuntu>Hostname it says UBUNTU
>>>>> slave@ubuntu>Hostname it says UBUNTU
>>>>>
>>>>>
>>>>> Could you assist me on this?
>>>>>
>>>>> Thanks
>>>>> Annt
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: New to Hadoop-SSH communication

2013-05-01 Thread shashwat shriparv
Open the /etc/hostname file, change it to master and slave on those machines, and restart the
system. And don't give the IPs as 127.0.0.1 and 127.0.0.2; just give the ifconfig
command, it will show you the actual IP, and give that in the hosts file.
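
As an illustration, /etc/hosts on both nodes would then look something like this (the addresses below are only placeholders for whatever ifconfig actually reports):

192.168.1.10    master
192.168.1.11    slave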

*Thanks & Regards*

∞
Shashwat Shriparv



On Wed, May 1, 2013 at 9:48 PM, kishore alajangi
wrote:

> Might be you are copying by logging slave machine, Exit from slave in
> Master.
>
> Thanks,
> Kishore.
>
>
>
>
> On Wed, May 1, 2013 at 3:00 AM, Automation Me wrote:
>
>> Thank you Tariq.
>>
>> I am using  the same username on both the machines and when i try to copy
>> a file master to slave just to make sure SSH is working fine, The file is
>> copying into master itself not an slave machine.
>>
>>   scp -r /usr/local/somefile hduser@slave:/usr/local/somefile
>>
>> Any suggestions...
>>
>>
>> Thanks
>> Annt
>>
>>
>>
>> On Tue, Apr 30, 2013 at 5:14 PM, Mohammad Tariq wrote:
>>
>>> ssh is actually *user@some_machine *to *user@some_other_machine*.
>>> either use same username on both the machines or add the IPs along with
>>> proper user@hostname in /etc/hosts file.
>>>
>>> HTH
>>>
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Wed, May 1, 2013 at 2:39 AM, Automation Me wrote:
>>>
>>>> Hello,
>>>>
>>>> I am new to Hadoop and trying to install multinode cluster on ubuntu
>>>> VM's.I am not able to communicate between two clusters using SSH.
>>>>
>>>> My host file:
>>>>
>>>> 127.0.1.1 Master
>>>> 127.0.1.2 Slave
>>>>
>>>> The following changes i made in two VM's
>>>>
>>>> 1.Updated the etc/hosts file in two vm's
>>>>
>>>> on Master VM
>>>>  i did SSH keygen and trying to copy the key into Slave
>>>>
>>>> ssh-keygen -t rsa -P ""
>>>>cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
>>>>ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave@ubuntu.
>>>>
>>>> When i login into master and slave  and check
>>>>
>>>> master@ubuntu>Hostname it says UBUNTU
>>>> slave@ubuntu>Hostname it says UBUNTU
>>>>
>>>>
>>>> Could you assist me on this?
>>>>
>>>> Thanks
>>>> Annt
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>


Re: M/R job to a cluster?

2013-04-28 Thread shashwat shriparv
Check the JobTracker web UI at namenode:50030; if the job appears there, it is not running in local mode,
else it is.
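
For reference, a minimal sketch of pointing the client at the cluster with the old mapred API (the host name and ports below are only assumptions, echoing the values mentioned further down in this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://devubuntu05:9000");    // NameNode
conf.set("mapred.job.tracker", "devubuntu05:9001");        // JobTracker (assumed port)
JobConf jobConf = new JobConf(conf, WordCount.class);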

*Thanks & Regards*

∞
Shashwat Shriparv



On Sun, Apr 28, 2013 at 1:18 AM, sudhakara st wrote:

> Hello Kevin,
>
> In the case:
>
> JobClient client = new JobClient();
> JobConf conf = new JobConf(WordCount.class);
>
> Job client(default in local system) picks  configuration information  by
> referring HADOOP_HOME in local system.
>
> if your job configuration like this:
> *Configuration conf = new Configuration();*
> *conf.set("fs.default.name", "hdfs://name_node:9000");*
> *conf.set("mapred.job.tracker", "job_tracker_node:9001");*
>
> It pickups configuration information  by referring HADOOP_HOME in
> specified namenode and job tracker.
>
> Regards,
> Sudhakara.st
>
>
> On Sat, Apr 27, 2013 at 2:52 AM, Kevin Burton wrote:
>
>> It is hdfs://devubuntu05:9000. Is this wrong? Devubuntu05 is the name of
>> the host where the NameNode and JobTracker should be running. It is also
>> the host where I am running the M/R client code.
>>
>> On Apr 26, 2013, at 4:06 PM, Rishi Yadav  wrote:
>>
>> check core-site.xml and see value of fs.default.name. if it has
>> localhost you are running locally.
>>
>>
>>
>>
>> On Fri, Apr 26, 2013 at 1:59 PM,  wrote:
>>
>>> I suspect that my MapReduce job is being run locally. I don't have any
>>> evidence but I am not sure how the specifics of my configuration are
>>> communicated to the Java code that I write. Based on the text that I have
>>> read online basically I start with code like:
>>>
>>> JobClient client = new JobClient();
>>> JobConf conf = new JobConf(WordCount.class);
>>> . . . . .
>>>
>>> Where do I communicate the configuration information so that the M/R job
>>> runs on the cluster and not locally? Or is the configuration location
>>> "magically determined"?
>>>
>>> Thank you.
>>>
>>
>>
>
>
> --
>
> Regards,
> .  Sudhakara.st
>
>


Re: Uploading file to HDFS

2013-04-23 Thread shashwat shriparv
On Tue, Apr 23, 2013 at 9:23 PM, Mohammad Tariq  wrote:

> What should I do on namenode and datanode? Thank you very much


As Tariq has asked, can you provide datanode log snapshots?

*Thanks & Regards    *

∞
Shashwat Shriparv


Re: Job launch from eclipse

2013-04-23 Thread shashwat shriparv
You need to generate a jar file, pass all the parameters at run time (unless they are
fixed in the code), and run it on hadoop like: hadoop jar jarfilename.jar <MainClass> <args>
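
A minimal sketch of that flow (the class, jar and HDFS paths are only placeholders):

# export a runnable jar from Eclipse (or build it with ant/maven), then on a cluster client:
hadoop jar wordcount.jar com.example.WordCount /user/hduser/input /user/hduser/output

Jobs submitted this way show up in the JobTracker web UI, unlike runs launched inside Eclipse in local mode.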

*Thanks & Regards*

∞
Shashwat Shriparv



On Tue, Apr 23, 2013 at 6:51 PM, Han JU  wrote:

> Hi,
>
> I'm getting my hands on hadoop. One thing I really want to know is how you
> launch MR jobs in a development environment.
>
> I'm currently using Eclipse 3.7 with hadoop plugin from hadoop 1.0.2. With
> this plugin I can manage HDFS and submit job to cluster. But the strange
> thing is, every job launch from Eclipse in this way is not recorded by the
> jobtracker (can't monitor it from web UI). But finally the output appears
> in HDFS path as the parameter I gave. It's really strange that makes me
> think it's a standalone job run then it writes output to HDFS.
>
> So how do you code and launch jobs to cluster?
>
> Many thanks.
>
> --
> *JU Han*
>
> UTC   -  Université de Technologie de Compiègne
> * **GI06 - Fouille de Données et Décisionnel*
>
> +33 061960
>


Re: why multiple checkpoint nodes?

2013-04-18 Thread shashwat shriparv
more checkpoint nodes means more backup of the metadata :)

*Thanks & Regards*

∞
Shashwat Shriparv



On Thu, Apr 18, 2013 at 9:35 PM, Thanh Do  wrote:

> Hi all,
>
> The document says "Multiple checkpoint nodes may be specified in the
> cluster configuration file".
>
> Can some one clarify me that why we really need to run multiple checkpoint
> nodes anyway? Is it possible that while checkpoint node A is doing
> checkpoint, and check point node B kicks in and does another checkpoint?
>
> Thanks,
> Thanh
>


Re: Get Hadoop cluster topology

2013-04-16 Thread shashwat shriparv
On Tue, Apr 16, 2013 at 11:34 PM, Diwakar Sharma
wrote:

> uster topology or uses an API to build it.


If you stop and start the cluster, Hadoop reads these configuration files for
sure.
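
If the question is where the topology itself comes from: a minimal sketch, assuming Hadoop 1.x, is to point core-site.xml at a rack-mapping script (the path below is just a placeholder). The script is given IPs/hostnames as arguments and must print one rack id, e.g. /rack1, per argument:

<property>
  <name>topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>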



∞
Shashwat Shriparv


Re: NameNode failure and recovery!

2013-04-03 Thread shashwat shriparv
If you are not in a position to go for HA, just keep your checkpoint period
shorter so that recent data is recoverable from the SNN.

And you always have the option of
hadoop namenode -recover
Try this on a testing cluster first and get versed with it.

And take a backup of the image on some solid state storage.

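For reference, the checkpoint interval in Hadoop 1.x is controlled by fs.checkpoint.period (in seconds); a rough sketch for core-site.xml, with an illustrative value of ten minutes:

<property>
  <name>fs.checkpoint.period</name>
  <value>600</value>
</property>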


∞
Shashwat Shriparv



On Wed, Apr 3, 2013 at 9:56 PM, Harsh J  wrote:

> There is a 3rd, most excellent way: Use HDFS's own HA, see
>
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
> :)
>
> On Wed, Apr 3, 2013 at 8:10 PM, Rahul Bhattacharjee
>  wrote:
> > Hi all,
> >
> > I was reading about Hadoop and got to know that there are two ways to
> > protect against the name node failures.
> >
> > 1) To write to a nfs mount along with the usual local disk.
> >  -or-
> > 2) Use secondary name node. In case of failure of NN , the SNN can take
> in
> > charge.
> >
> > My questions :-
> >
> > 1) SNN is always lagging , so when SNN becomes primary in event of a NN
> > failure ,  then the edits which have not been merged into the image file
> > would be lost , so the system of SNN would not be consistent with the NN
> > before its failure.
> >
> > 2) Also I have read that other purpose of SNN is to periodically merge
> the
> > edit logs with the image file. In case a setup goes with option #1
> (writing
> > to NFS, no SNN) , then who does this merging.
> >
> > Thanks,
> > Rahul
> >
> >
>
>
>
> --
> Harsh J
>


Re: Hadoop Debian Package

2013-03-17 Thread shashwat shriparv
Try:
find / -type f -iname "*site.xml"
It will show you wherever those files are.



∞
Shashwat Shriparv



On Sun, Mar 17, 2013 at 11:34 PM, Mohammad Alkahtani
wrote:

> The problem is I tried I read the configuration file by changing
> export HADOOP_CONF_DIR=${HADOOP_CONF_
> DIR:-"/usr/shar/hadoop/templates/conf"}
> but I think Hadoop dosen't get the configration from this dir, I trid and
> searched the system for conf dir the only dir is this one which I changed.
>
> Mohammad Alkahtani
> P.O.Box 102275
> Riyadh 11675
> Saudi Arabia
> mobile: 00966 555 33 1717
>
>
> On Sun, Mar 17, 2013 at 8:57 PM, shashwat shriparv <
> dwivedishash...@gmail.com> wrote:
>
>> Yes, it is asking for file:/// instead of hdfs://; just check if it is
>> taking its configuration settings from some other location...
>>
>>
>>
>> ∞
>> Shashwat Shriparv
>>
>>
>>
>> On Sun, Mar 17, 2013 at 11:07 PM, Luangsay Sourygna 
>> wrote:
>>
>>> Hi,
>>>
>>> What is the version of Hadoop you use?
>>>
>>> Try using fs.defaultFS instead of fs.default.name (see the list of all
>>> the deprecated properties here:
>>>
>>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
>>> ).
>>> I remember I once had a similar error message and it was due to the
>>> change in properties names.
>>>
>>> Regards,
>>>
>>> Sourygna
>>>
>>> On Sun, Mar 17, 2013 at 2:32 PM, Mohammad Alkahtani
>>>  wrote:
>>> > Hi to all users of Hadoop,
>>> >
>>> > I installed Hadoop the .deb file on Ubuntu 12.04 but I might could not
>>> > configure it right. The conf dir is under templates in
>>> /usr/shar/hadoop. I
>>> > edit the core-site.xml, mapred-site.xml files to give
>>> > <property>
>>> >   <name>fs.default.name</name>
>>> >   <value>hdfs://localhost:9000</value>
>>> > </property>
>>> > and for mapred
>>> > <property>
>>> >   <name>mapred.job.tracker</name>
>>> >   <value>localhost:9001</value>
>>> > </property>
>>> >
>>> > but i get these errors, I assume that there is problem, Hadoop cannot
>>> read
>>> > the configuration file.
>>> > I chaned the hadoop-env.sh to
>>> > export
>>> HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/shar/hadoop/templates/conf"}
>>> > but dosen't solve the problem.
>>> >
>>> > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
>>> > java.lang.IllegalArgumentException: Does not contain a valid host:port
>>> > authority: file:/// at
>>> > org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at
>>> >
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:201)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:231)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:225)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:347)
>>> > at org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:309)
>>> at
>>> >
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1651)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1590)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1608)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1734)
>>> > at
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1751)
>>> >
>>> > 
>>> >
>>> > FATAL org.apache.hadoop.mapred.JobTracker:
>>> > java.lang.IllegalArgumentException: Does not contain a valid host:port
>>> > authority: local at
>>> > org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at
>>> > org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130) at
>>> > org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2312) at
>>> > org.apache.hadoop.mapred.JobTracker.(JobTracker.java:2070) at
>>> > org.apache.hadoop.mapred.JobTracker.(JobTracker.java:1889) at
>>> > org.apache.hadoop.mapred.JobTracker.(Job

Re: Hadoop Debian Package

2013-03-17 Thread shashwat shriparv
Yes, it is asking for file:/// instead of hdfs://; just check if it is taking
its configuration settings from some other location...



∞
Shashwat Shriparv



On Sun, Mar 17, 2013 at 11:07 PM, Luangsay Sourygna wrote:

> Hi,
>
> What is the version of Hadoop you use?
>
> Try using fs.defaultFS instead of fs.default.name (see the list of all
> the deprecated properties here:
>
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
> ).
> I remember I once had a similar error message and it was due to the
> change in properties names.
>
> Regards,
>
> Sourygna
>
> On Sun, Mar 17, 2013 at 2:32 PM, Mohammad Alkahtani
>  wrote:
> > Hi to all users of Hadoop,
> >
> > I installed Hadoop the .deb file on Ubuntu 12.04 but I might could not
> > configure it right. The conf dir is under templates in /usr/shar/hadoop.
> I
> > edit the core-site.xml, mapred-site.xml files to give
> > <property>
> >   <name>fs.default.name</name>
> >   <value>hdfs://localhost:9000</value>
> > </property>
> > and for mapred
> > <property>
> >   <name>mapred.job.tracker</name>
> >   <value>localhost:9001</value>
> > </property>
> >
> > but i get these errors, I assume that there is problem, Hadoop cannot
> read
> > the configuration file.
> > I chaned the hadoop-env.sh to
> > export
> HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/shar/hadoop/templates/conf"}
> > but dosen't solve the problem.
> >
> > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > java.lang.IllegalArgumentException: Does not contain a valid host:port
> > authority: file:/// at
> > org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:201)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:231)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:225)
> > at
> >
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:347)
> > at org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:309) at
> >
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1651)
> > at
> >
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1590)
> > at
> >
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1608)
> > at
> >
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1734)
> > at
> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1751)
> >
> > 
> >
> > FATAL org.apache.hadoop.mapred.JobTracker:
> > java.lang.IllegalArgumentException: Does not contain a valid host:port
> > authority: local at
> > org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at
> > org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130) at
> > org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2312) at
> > org.apache.hadoop.mapred.JobTracker.(JobTracker.java:2070) at
> > org.apache.hadoop.mapred.JobTracker.(JobTracker.java:1889) at
> > org.apache.hadoop.mapred.JobTracker.(JobTracker.java:1883) at
> > org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:312) at
> > org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:303) at
> > org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:298)
> > at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4791)
> >
> > 
> >
> > ERROR org.apache.hadoop.hdfs.server.namenode.NameNode:
> > java.lang.IllegalArgumentException: Does not contain a valid host:port
> > authority: file:/// at
> > org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:201)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:231)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:265)
> > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:536) at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1419)
> >
> > 
> >
> > Exception in thread "main" java.lang.IllegalArgumentException: Does not
> > contain a valid host:port authority: file:/// at
> > org.apache.hadoop.net.NetUtils.createSocketAddr(NetUti

Re: Best way handle hadoop Java heap size space

2013-03-15 Thread shashwat shriparv
If you just want to see the utilization and monitor it for some time, try

Java VisualVM
or
JConsole

and connect to the JMX port. It will show you the heap usage and other details
too.

You can find these tools in the JDK bin folder.
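
To have a JMX port to connect to, a rough sketch (the port is just an example, and in anything beyond a test setup you would enable authentication/SSL instead of switching them off) is to add something like this to conf/hadoop-env.sh for the daemon you want to watch:

export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8004 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"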





∞
Shashwat Shriparv



On Tue, Feb 5, 2013 at 10:34 PM, Dhanasekaran Anbalagan
wrote:

> Hi Guys,
>
> We have configured many Heap size related thing. in Hadoop for ex.
>
> Namenode's Java Heap Size in bytes.
> Secondary namenode's Java Heap Size in bytes.
> Balancer's Java Heap Size in bytes.
> HttpFS's Java Heap Size in bytes.
> Failover Controller's Java Heap Size in bytes.
> MapReduce Child Java Maximum Heap Size
> Reduce Task Maximum Heap Siz
> TaskTracker's Java Heap Size in bytes.
>
> When i start my cluster these guys occupy the large number of memory, Best
> way handle or fine tune these parameter. please guide me.
>
> Bes monitor those configuration.
>
> Any graphical tool to help us to monitor?
>
> -Dhanasekaran.
>
>
> Did I learn something today? If not, I wasted it.
>


Re: Issue: Namenode is in safe mode

2013-03-06 Thread shashwat shriparv
You cannot directly remove a datanode from a cluster; that is not the proper way.
You need to decommission the node and wait till the data from the datanode to
be removed is copied to other nodes.
Just read the document below for proper decommissioning of nodes:

http://helpmetocode.blogspot.in/2012/11/datanode-decommissioning-from-hadoop.html
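
For reference, a quick sketch of the shell side of this on Hadoop 1.x (the exclude file is whatever dfs.hosts.exclude points to, as in the steps quoted below):

hadoop dfsadmin -safemode get      # check whether the namenode is still in safe mode
hadoop dfsadmin -refreshNodes      # after adding the host to the exclude file
hadoop dfsadmin -report            # watch until the node shows up as decommissioned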



∞
Shashwat Shriparv



On Wed, Mar 6, 2013 at 10:00 PM, Shumin Guo  wrote:

> To decommission a live datanode from the cluster, you can do the following
> steps:
>
> 1, edit configuration file $HADOOP_HOME/conf/hdfs-site.xml, and add the
> following property:
> <property>
>   <name>dfs.hosts.exclude</name>
>   <value>$HADOOP_HOME/conf/dfs-exclude.txt</value>
> </property>
>
> 2, put the host name of the node you want to decommission onto file
> $HADOOP_HOME/conf/dfs-exclude.xml (one host per line).
>
> 3, Run the following command:
> hadoop dfsadmin -refreshNodes
>
> Shumin
>
> On Wed, Mar 6, 2013 at 5:29 AM, AMARNATH, Balachandar <
> balachandar.amarn...@airbus.com> wrote:
>
>> The repliation factor was 1 when I removed the entry of A in slaves file.
>> I did not mark it retirement. I do not know yet how to mark a node for
>> retirement. I waited for few minutes and then I could see thte namenode
>> running again
>>
>> ** **
>>
>> *From:* Nitin Pawar [mailto:nitinpawar...@gmail.com]
>> *Sent:* 06 March 2013 15:31
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Issue: Namenode is in safe mode
>>
>> ** **
>>
>> what is your replication factor?
>>
>> when you removed node A as datanode .. did you first mark it for
>> retirement? if you just removed it from service then the blocks from that
>> datanode are missing and namenode when starts up it checks for the blocks.
>> Unless it reaches its threshold value it will not let you write any more
>> data on your hdfs. 
>>
>> ** **
>>
>> I will suggest to start datanode on A, then mark it for retirement so
>> namenode will move the blocks to new datanode and once it is done namenode
>> will retire that datanode. 
>>
>> ** **
>>
>> ** **
>>
>> ** **
>>
>> On Wed, Mar 6, 2013 at 3:21 PM, AMARNATH, Balachandar <
>> balachandar.amarn...@airbus.com> wrote:
>>
>>  
>>
>> Hi, 
>>
>>  
>>
>> I have created a hadoop cluster with two nodes (A and B). ‘A’ act both as
>> namenode and datanode, and ‘B’ act as datanode only. With this setup, I
>> could store, read files. Now, I added one more datanode ‘C’ and relieved
>> ‘A’ from datanode duty. This means, ‘A’ act only as namenode, and both B
>> and C act as datanodes. Now, I tried to create a directory, it says
>>
>>  
>>
>> ‘ org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create
>> directory Name node is in safe mode’
>>
>>  
>>
>>  
>>
>> Can someone tell me why the namenode now is in safe mode?
>>
>>  
>>
>> With thanks and regards
>>
>> Balachandar
>>
>>  
>>
>>  
>>
>>  
>>
>>
>>
>>
>> 
>>
>> ** **
>>
>> --
>> Nitin Pawar
>>
>>
>>
>


Re: Re: Socket does not have a channel

2013-03-05 Thread shashwat shriparv
Try setting dfs.client.use.legacy.blockreader to true
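
i.e., roughly, in the client's hdfs-site.xml:

<property>
  <name>dfs.client.use.legacy.blockreader</name>
  <value>true</value>
</property>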



∞
Shashwat Shriparv



On Tue, Mar 5, 2013 at 8:39 PM, 卖报的小行家 <85469...@qq.com> wrote:

> Yes, it's from hadoop 2.0. I just now read the 1.1.1 code; there are no such
> classes as the ones the log mentioned. Maybe you can read the code first.
>
>
> -- Original Message --
> *From:* "Subroto";
> *Sent:* Tuesday, March 5, 2013, 10:56 PM
> *To:* "user"; **
> *Subject:* Re: Socket does not have a channel
>
> Hi Julian,
>
> This is from CDH4.1.2 and I think its based on Apache Hadoop 2.0.
>
> Cheers,
> Subroto Sanyal
> On Mar 5, 2013, at 3:50 PM, 卖报的小行家 wrote:
>
> Hi,
> Which revision of hadoop?
> and in what situation does it report the Exception?
> BRs//Julian
>
> -- Original --
> *From: * "Subroto";
> *Date: * Tue, Mar 5, 2013 04:46 PM
> *To: * "user"; **
> *Subject: * Socket does not have a channel
>
> Hi
>
> java.lang.IllegalStateException: Socket 
> Socket[addr=/10.86.203.112,port=1004,localport=35170]
> does not have a channel
>  at
> com.google.common.base.Preconditions.checkState(Preconditions.java:172)
>  at
> org.apache.hadoop.net.SocketInputWrapper.getReadableByteChannel(SocketInputWrapper.java:83)
>  at
> org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:432)
>  at
> org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:82)
>  at
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:832)
>  at
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:444)
>
> While accessing the HDFS  I keep getting the above mentioned error.
> Setting the dfs.client.use.legacy.blockreader to true fixes the problem.
> I would like to know what exactly is the problem? Is it a problem/bug in
> hadoop ?
> Is there is JIRA ticket for this??
>
>
> Cheers,
> Subroto Sanyal
>
>
>


Re: Unknown processes unable to terminate

2013-03-04 Thread shashwat shriparv
You can use kill -9 13082

Is there an Eclipse or NetBeans project running? That may be the source of this RunJar process.
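
To confirm what that RunJar process actually is before killing it, something like this works (jps ships with the JDK):

jps -l | grep RunJar    # shows the jar/main class behind each RunJar entry
kill 13082              # try a normal TERM first
kill -9 13082           # only if it still refuses to exit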



∞
Shashwat Shriparv



On Mon, Mar 4, 2013 at 3:12 PM, Sai Sai  wrote:

> I have a list of following processes given below, i am trying to kill the
> process 13082 using:
>
> kill 13082
>
> Its not terminating RunJar.
>
> I have done a stop-all.sh hoping it would stop all the processes but only
> stopped the hadoop related processes.
> I am just wondering if it is necessary to stop all other processes before
> starting the hadoop process and how to stop these other processes.
>
> Here is the list of processes which r appearing:
>
>
> 30969 FileSystemCat
> 30877 FileSystemCat
> 5647 StreamCompressor
> 32200 DataNode
> 25015 Jps
> 2227 URLCat
> 5563 StreamCompressor
> 5398 StreamCompressor
> 13082 RunJar
> 32578 JobTracker
> 7215
> 385 TaskTracker
> 31884 NameNode
> 32489 SecondaryNameNode
>
> Thanks
> Sai
>


Re: How to start Data Replicated Blocks in HDFS manually.

2013-02-25 Thread shashwat shriparv
The problem may be in the default replication factor, which is 3, so first
check in hdfs-site.xml whether the replication factor is specified or not; if it
is not, add that parameter and restart the cluster   ---> first option

2nd option: change the replication factor of the root directory of HDFS to
2 using the following command

bin/hadoop dfs -setrep -R -w 2 /

This will change the replication factor to 2.

This problem may also be because you have two datanodes and a replication
factor of 3, so you can think of the scenario where you have two buckets and
you have 3 objects to keep.
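
For the first option, a minimal sketch of the hdfs-site.xml entry, matching the two-datanode cluster in this thread:

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>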




∞
Shashwat Shriparv



On Mon, Feb 25, 2013 at 7:50 PM, Nitin Pawar wrote:

> did you start the cluster with replication factor 3 and later changed it
> to 2?
> also did you enable rack awareness in your configs and both the nodes are
> on same rack?
>
>
>
>
> On Mon, Feb 25, 2013 at 7:45 PM, Dhanasekaran Anbalagan <
> bugcy...@gmail.com> wrote:
>
>> Hi Guys,
>>
>> We have cluster with two data nodes. We configured data replication
>> factor two.
>> when  i copy data  to hdfs, Data's are not fully replicated. It's says * 
>> Number
>> of Under-Replicated Blocks : 15115*
>> How to manually invoke the Data replication in HDFS.
>>
>> I restarted cluster also. It's not helps me
>>
>> Please guide me guys.
>>
>> -Dhanasekaran.
>>
>> Did I learn something today? If not, I wasted it.
>>
>
>
>
> --
> Nitin Pawar
>


Re: Host NameNode, DataNode, JobTracker or TaskTracker on the same machine

2013-02-14 Thread shashwat shriparv
If you are doing it for production, each of the processes should be running on a
separate machine, as that reduces the load on any single machine.



∞
Shashwat Shriparv



On Thu, Feb 14, 2013 at 10:40 PM, Jeff LI  wrote:

> Hello,
>
> Is there a good reason that we should not host NameNode, DataNode,
> JobTracker or TaskTracker services on the same machine?
>
> Not doing so is suggested here http://wiki.apache.org/hadoop/NameNode,
> but I'd like to know the reasoning of this.
>
> Thanks
>
> Cheers
>
> Jeff
>
>


Re: Decommissioning Nodes in Production Cluster.

2013-02-12 Thread shashwat shriparv
On Tue, Feb 12, 2013 at 11:43 PM, Robert Molina wrote:

> to do it, there should be some information he


This is the best way to remove a datanode from a cluster; you have done the
right thing.



∞
Shashwat Shriparv


Re: Prolonged safemode

2013-01-20 Thread shashwat shriparv
Check the integrity of the file system, and check the replication factor, in case
the default was left at 3 or so by mistake. If you have HBase configured, run
hbck to check whether everything is fine with the cluster.
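
Concretely, something along these lines:

hadoop fsck / -blocks -locations    # filesystem integrity, missing/under-replicated blocks
hbase hbck                          # HBase consistency check, if HBase is installed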



∞
Shashwat Shriparv



On Sun, Jan 20, 2013 at 3:09 PM, xin jiang  wrote:

>
>
> On Sun, Jan 20, 2013 at 7:50 AM, Mohammad Tariq wrote:
>
>> Hey Jean,
>>
>> Feels good to hear that ;) I don't have to feel
>> myself a solitary yonker anymore.
>>
>> Since I am working on a single node, the problem
>> becomes more sever. I don't have any other node
>> where MR files could get replicated.
>>
>> Warm Regards,
>> Tariq
>>  https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Sun, Jan 20, 2013 at 5:08 AM, Jean-Marc Spaggiari <
>> jean-m...@spaggiari.org> wrote:
>>
>>> Hi Tariq,
>>>
>>> I often have to force HDFS to go out of safe mode manually when I
>>> restart my cluster (or after power outage) I never tought about
>>> reporting that ;)
>>>
>>> I'm using hadoop-1.0.3. I think it was because of the MR files still
>>> not replicated on enought nodes. But not 100% sure.
>>>
>>> JM
>>>
>>> 2013/1/19, Mohammad Tariq :
>>> > Hello list,
>>> >
>>> >I have a pseudo distributed setup on my laptop. Everything was
>>> > working fine untill now. But lately HDFS has started taking a lot of
>>> time
>>> > to leave the safemode. Infact, I have to it manuaaly most of the times
>>> as
>>> > TT and Hbase daemons get disturbed because of this.
>>> >
>>> > I am using hadoop-1.0.4. Is it a problem with this version? I have
>>> never
>>> > faced any such issue with older versions. Or, is something going wrong
>>> on
>>> > my side??
>>> >
>>> > Thank you so much for your precious time.
>>> >
>>> > Warm Regards,
>>> > Tariq
>>> > https://mtariq.jux.com/
>>> > cloudfront.blogspot.com
>>> >
>>>
>>
>>
>


Re: On a lighter note

2013-01-19 Thread shashwat shriparv
Secondary namenode, which is not really a namenode... :)



∞
Shashwat Shriparv



On Sat, Jan 19, 2013 at 3:48 AM, Ted Dunning  wrote:

> Well, I think the actual name was "untergang".  Same meaning.
>
> Sent from my iPhone
>
> On Jan 17, 2013, at 8:09 PM, Mohammad Tariq  wrote:
>
> You are right Michael, as always :)
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Fri, Jan 18, 2013 at 6:33 AM, Michael Segel 
> wrote:
>
>> I'm thinking 'Downfall'
>>
>> But I could be wrong.
>>
>> On Jan 17, 2013, at 6:56 PM, Yongzhi Wang 
>> wrote:
>>
>> Who can tell me what is the name of the original film? Thanks!
>>
>> Yongzhi
>>
>>
>> On Thu, Jan 17, 2013 at 3:05 PM, Mohammad Tariq wrote:
>>
>>> I am sure you will suffer from severe stomach ache after watching this :)
>>> http://www.youtube.com/watch?v=hEqQMLSXQlY
>>>
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> cloudfront.blogspot.com
>>>
>>
>>
>>
>


Re: On a lighter note

2013-01-18 Thread shashwat shriparv
:)



∞
Shashwat Shriparv



On Fri, Jan 18, 2013 at 6:43 PM, Fabio Pitzolu wrote:

> Someone should made one about unsubscribing from this mailing list ! :D
>
>
> *Fabio Pitzolu*
> Consultant - BI & Infrastructure
>
> Mob. +39 3356033776
> Telefono 02 87157239
> Fax. 02 93664786
>
> *Gruppo Consulenza Innovazione - http://www.gr-ci.com*
>
>
> 2013/1/18 Mohammad Tariq 
>
>> Folks quite often get confused by the name. But this one is just
>> unbeatable :)
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Fri, Jan 18, 2013 at 4:52 PM, Viral Bajaria 
>> wrote:
>>
>>> LOL just amazing... I remember having a similar conversation with
>>> someone who didn't understand meaning of secondary namenode :-)
>>>
>>> Viral
>>> --
>>> From: iwannaplay games
>>> Sent: 1/18/2013 1:24 AM
>>>
>>> To: user@hadoop.apache.org
>>> Subject: Re: On a lighter note
>>>
>>> Awesome
>>> :)
>>>
>>>
>>>
>>> Regards
>>> Prabhjot
>>>
>>>
>>
>


Re: Query mongodb

2013-01-16 Thread shashwat shriparv
Look at Hive and Lily projects.



∞
Shashwat Shriparv



On Wed, Jan 16, 2013 at 8:10 PM, John Lilley wrote:

>  How does one schedule mappers to read MongoDB or HBase in a
> data-locality-aware fashion?
>
> -john
>
> ** **
>
> *From:* Mohammad Tariq [mailto:donta...@gmail.com]
> *Sent:* Wednesday, January 16, 2013 3:29 AM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Query mongodb
>
> ** **
>
> Yes. You can use MongoDB-Hadoop adapter to achieve that. Through this
> adapter you can pull the data, process it and push it back to your MongoDB
> backed datastore by writing MR jobs.
>
> ** **
>
> It is also 100% possible to query Hbase or JSON files, or anything else
> for that matter, stored in HDFS.
>
>
> 
>
> Warm Regards,
>
> Tariq
>
> https://mtariq.jux.com/
>
> cloudfront.blogspot.com
>
> ** **
>
> On Wed, Jan 16, 2013 at 3:50 PM, Panshul Whisper 
> wrote:
>
> Hello,
> Is it possible or how is it possible to query mongodb directly from hadoop.
> 
>
> Or is it possible to query hbase or json files stored in hdfs in a similar
> way as we can query the json documents in mongodb.
>
> Suggestions please.
>
> Thank you.
> Regards,
> Panshul.
>
> ** **
>


Re: request on behalf of newbies

2013-01-13 Thread Shashwat Shriparv
Lots are there on FB, lots of blogs are there, and last but not the least, Google is 
your friend; first do something yourself and then ask ;)




Sent from Samsung Galaxy Note

Monkey2Code wrote:
Hi all,
Am a newbie in the hadoop ecosystem, just wondering if there is any thread or 
group where newbies can hang in there and take advice or suggestions from 
Gurus. This will definitely be helpful for both Gurus and students to take 
part in this amazing ecosystem. I see all the group mailings are very hard to follow in terms of what 
they are talking about. If there is a group specially meant for people like me, 
please forgive me and pass those details on to me (really appreciate it!! it's a big 
favor to me)


Thank you!!
Monkey




Re: log server for hadoop MR jobs??

2013-01-11 Thread shashwat shriparv
Have a look at Flume.


On Fri, Jan 11, 2013 at 11:58 PM, Xiaowei Li  wrote:

> ct all log generated from






∞
Shashwat Shriparv


Re: HDFS disk space requirement

2013-01-10 Thread shashwat shriparv
115 GB * 5 replicas = 575 GB is the minimum you need; keep in mind that is the bare
minimum, and you will have other disk space needs too...
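
For the follow-up question below about dropping replication to 3, the same arithmetic gives:

115 GB * 3 replicas = 345 GB of raw HDFS capacity, versus the 130 GB currently available

so the data still does not fit without adding disks or nodes, and that is before counting the HBase tables or any job output.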



∞
Shashwat Shriparv



On Fri, Jan 11, 2013 at 11:19 AM, Alexander Pivovarov
wrote:

> finish elementary school first. (plus, minus operations at least)
>
>
> On Thu, Jan 10, 2013 at 7:23 PM, Panshul Whisper wrote:
>
>> Thank you for the response.
>>
>> Actually it is not a single file, I have JSON files that amount to 115
>> GB, these JSON files need to be processed and loaded into a Hbase data
>> tables on the same cluster for later processing. Not considering the disk
>> space required for the Hbase storage, If I reduce the replication to 3, how
>> much more HDFS space will I require?
>>
>> Thank you,
>>
>>
>> On Fri, Jan 11, 2013 at 4:16 AM, Ravi Mutyala wrote:
>>
>>> If the file is a txt file, you could get a good compression ratio.
>>> Changing the replication to 3 and the file will fit. But not sure what your
>>> usecase is what you want to achieve by putting this data there. Any
>>> transformation on this data and you would need more space to save the
>>> transformed data.
>>>
>>> If you have 5 nodes and they are not virtual machines, you should
>>> consider adding more harddisks to your cluster.
>>>
>>>
>>> On Thu, Jan 10, 2013 at 9:02 PM, Panshul Whisper 
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a hadoop cluster of 5 nodes with a total of available HDFS space
>>>> 130 GB with replication set to 5.
>>>> I have a file of 115 GB, which needs to be copied to the HDFS and
>>>> processed.
>>>> Do I need to have anymore HDFS space for performing all processing
>>>> without running into any problems? or is this space sufficient?
>>>>
>>>> --
>>>> Regards,
>>>> Ouch Whisper
>>>> 010101010101
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Ouch Whisper
>> 010101010101
>>
>
>


Re: queues in haddop

2013-01-10 Thread shashwat shriparv
The attached screenshot shows how Flume will work; you can also
consider RabbitMQ, as it is persistent too.



∞
Shashwat Shriparv



On Fri, Jan 11, 2013 at 10:24 AM, Mohit Anchlia wrote:

> Have you looked at flume?
>
> Sent from my iPhone
>
> On Jan 10, 2013, at 7:12 PM, Panshul Whisper 
> wrote:
>
> > Hello,
> >
> > I have a hadoop cluster setup of 10 nodes and I an in need of
> implementing queues in the cluster for receiving high volumes of data.
> > Please suggest what will be more efficient to use in the case of
> receiving 24 Million Json files.. approx 5 KB each in every 24 hours :
> > 1. Using Capacity Scheduler
> > 2. Implementing RabbitMQ and receive data from them using Spring
> Integration Data pipe lines.
> >
> > I cannot afford to loose any of the JSON files received.
> >
> > Thanking You,
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
>

Re: Log files occupy lot of Disk size

2012-11-23 Thread shashwat shriparv
When you run a Hive query, it internally runs a lot of MapReduce tasks, which
in turn generate a lot of temporary files, so your disk usage grows. Can
you tell which folder is taking most of the space?
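
To see where the space is actually going, a quick check like this helps (the log directory is just the usual default and may differ on your install):

du -sh $HADOOP_HOME/logs/*    # or /var/log/hadoop/*, whichever your distro uses

and, as Harsh notes below, retention can be capped in conf/log4j.properties via the MaxFileSize / MaxBackupIndex settings of whichever rolling appender your daemons log through.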



∞
Shashwat Shriparv




On Fri, Nov 23, 2012 at 1:24 PM, Mohammad Tariq  wrote:

> Harsh has got a point. I was thinking the same, but then I thought maybe
> you need all these log files. If not then do as Harsh has suggested. And
> deleting log files won't affect your Hdfs working, but it will not write
> logs for any operation until the next Hdfs restart.
>
> Regards,
> Mohammad Tariq
>
>
>
> On Fri, Nov 23, 2012 at 1:12 PM, Harsh J  wrote:
>
>> Lower your log levels if you do not need all that verbosity. You can
>> control log retention, max sizes to keep, max number of files to keep,
>> and logging levels, etc. via each components' log4j.properties file.
>>
>> On Fri, Nov 23, 2012 at 12:42 PM, iwannaplay games
>>  wrote:
>> > If i delete the log file without stopping the cluster  won't it
>> > terminate the session.
>> >
>> >
>> >
>> > On 11/23/12, Mohammad Tariq  wrote:
>> >> Hi there,
>> >>
>> >> You can write a small job or some script which periodically checks
>> for
>> >> the log growth and performs the delete after certain threshold.
>> >>
>> >> Regards,
>> >> Mohammad Tariq
>> >>
>> >>
>> >>
>> >> On Fri, Nov 23, 2012 at 12:28 PM, iwannaplay games <
>> >> funnlearnfork...@gmail.com> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Everytime i query hbase or hive ,there is a significant growth in my
>> >>> log files and it consumes lot of space from my hard disk(Approx 40
>> >>> gb)
>> >>> So i stop the cluster ,delete all the logs and free the space and then
>> >>> again start the cluster to start my work.
>> >>>
>> >>> Is there any other solution coz i cannot restart the cluster everyday.
>> >>>
>> >>
>>
>>
>>
>> --
>> Harsh J
>>
>
>


Re: High Availability - second namenode (master2) issue: Incompatible namespaceIDs

2012-11-15 Thread shashwat shriparv
It seems you have formatted the namenode twice. In this case the new namespace ID
is not propagated to the datanodes.

Stop all the hadoop daemons and then try the following :

1. vi $PATH_TO_HADOOP_DATASTORE/dfs/name/current/VERSION and copy the
namespaceID value.
2.Now open a terminal on every machine having a datanode and do the
following:
vi $PATH_TO_HADOOP_DATASTORE/dfs/data/current/VERSION
Delete the entry corresponding to namespaceID and paste the value copied in
Step 1.
Save and exit.

Restart the hadoop daemons without formatting the namenode.
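
If you prefer not to edit the files by hand, a rough equivalent per datanode (assuming the same $PATH_TO_HADOOP_DATASTORE layout, and using the namenode namespaceID from the log quoted below) would be:

sed -i 's/^namespaceID=.*/namespaceID=1356148070/' $PATH_TO_HADOOP_DATASTORE/dfs/data/current/VERSION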



∞
Shashwat Shriparv




On Fri, Nov 16, 2012 at 1:22 PM, shashwat shriparv <
dwivedishash...@gmail.com> wrote:

> Delete the VERSION for the datanode before format.
>
>
>
> ∞
> Shashwat Shriparv
>
>
>
>
> On Fri, Nov 16, 2012 at 1:15 PM, hadoop hive  wrote:
>
>> Seems like you havn't format your cluster (if its 1st time made).
>>
>> On Fri, Nov 16, 2012 at 9:58 AM, a...@hsk.hk  wrote:
>>
>>> Hi,
>>>
>>> Please help!
>>>
>>> I have installed a Hadoop Cluster with a single master (master1) and
>>> have HBase running on the HDFS.  Now I am setting up the second master
>>>  (master2) in order to form HA.  When I used JPS to check the cluster, I
>>> found :
>>>
>>> 2782 Jps
>>> 2126 NameNode
>>> 2720 SecondaryNameNode
>>> i.e. The datanode on this server could not be started
>>>
>>> In the log file, found:
>>> 2012-11-16 10:28:44,851 ERROR
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
>>> Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID
>>> = 1356148070; datanode namespaceID = 1151604993
>>>
>>>
>>>
>>> One of the possible solutions to fix this issue is to:  stop the
>>> cluster, reformat the NameNode, restart the cluster.
>>> QUESTION: As I already have HBASE running on the cluster, if I reformat
>>> the NameNode, do I need to reinstall the entire HBASE? I don't mind to have
>>> all data lost as I don't have many data in HBASE and HDFS, however I don't
>>> want to re-install HBASE again.
>>>
>>>
>>> On the other hand, I have tried another solution: stop the DataNode,
>>> edit the namespaceID in current/VERSION (i.e. set namespaceID=1151604993),
>>> restart the datanode, it doesn't work:
>>> Warning: $HADOOP_HOME is deprecated.
>>> starting master2, logging to
>>> /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-master2-master2.out
>>> Exception in thread "main" java.lang.NoClassDefFoundError: master2
>>> Caused by: java.lang.ClassNotFoundException: master2
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>> Could not find the main class: master2.  Program will exit.
>>> QUESTION: Any other solutions?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>


Re: High Availability - second namenode (master2) issue: Incompatible namespaceIDs

2012-11-15 Thread shashwat shriparv
Delete the VERSION file for the datanode before formatting.



∞
Shashwat Shriparv




On Fri, Nov 16, 2012 at 1:15 PM, hadoop hive  wrote:

> Seems like you havn't format your cluster (if its 1st time made).
>
> On Fri, Nov 16, 2012 at 9:58 AM, a...@hsk.hk  wrote:
>
>> Hi,
>>
>> Please help!
>>
>> I have installed a Hadoop Cluster with a single master (master1) and have
>> HBase running on the HDFS.  Now I am setting up the second master
>>  (master2) in order to form HA.  When I used JPS to check the cluster, I
>> found :
>>
>> 2782 Jps
>> 2126 NameNode
>> 2720 SecondaryNameNode
>> i.e. The datanode on this server could not be started
>>
>> In the log file, found:
>> 2012-11-16 10:28:44,851 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
>> Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID
>> = 1356148070; datanode namespaceID = 1151604993
>>
>>
>>
>> One of the possible solutions to fix this issue is to:  stop the cluster,
>> reformat the NameNode, restart the cluster.
>> QUESTION: As I already have HBASE running on the cluster, if I reformat
>> the NameNode, do I need to reinstall the entire HBASE? I don't mind to have
>> all data lost as I don't have many data in HBASE and HDFS, however I don't
>> want to re-install HBASE again.
>>
>>
>> On the other hand, I have tried another solution: stop the DataNode, edit
>> the namespaceID in current/VERSION (i.e. set namespaceID=1151604993),
>> restart the datanode, it doesn't work:
>> Warning: $HADOOP_HOME is deprecated.
>> starting master2, logging to
>> /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-master2-master2.out
>> Exception in thread "main" java.lang.NoClassDefFoundError: master2
>> Caused by: java.lang.ClassNotFoundException: master2
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> at java.security.AccessController.doPrivileged(Native Method)
>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>> Could not find the main class: master2.  Program will exit.
>> QUESTION: Any other solutions?
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>>
>


Re: what happens when a datanode rejoins?

2012-09-11 Thread shashwat shriparv
Yes, the cluster will be rebalanced.
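
If you want to trigger the rebalancing explicitly after the node rejoins, the balancer can also be run by hand (the threshold is a percentage and just an example value):

bin/start-balancer.sh -threshold 10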

On Tue, Sep 11, 2012 at 2:09 PM, mehul choube  wrote:

> Hi,
>
>
> What happens when an existing (not new) datanode rejoins a cluster for
> following scenarios:
>
>
> a) Some of the blocks it was managing are deleted/modified?
>
> b) The size of the blocks are now modified say from 64MB to 128MB?
>
> c) What if the block replication factor was one (yea not in most
> deployments but say in case) so does the namenode recreate a file once the
> datanode rejoins?
>
>
>
>
> Thanks,
>
> Mehul
>
>
>


-- 


∞
Shashwat Shriparv


Re: word count stopped at Map 100%, Reduce 23%

2012-09-05 Thread shashwat shriparv
"by downloading some files in text format"   ---> what does this mean? If
you think everything is fine with your configuration settings, then please take a
full Java thread dump and post it.
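
To grab the dump, the JDK's jps/jstack are enough; roughly:

jps                              # find the pid of the stuck TaskTracker or child task JVM
jstack <pid> > threaddump.txt    # or: kill -QUIT <pid>, which writes the dump to that JVM's stdout log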


∞
Shashwat Shriparv




On Tue, Sep 4, 2012 at 2:40 PM, Muzaffar Ali Ismail
wrote:

> Hi All,
>  I am new to Hadoop world. I just setup/configured Hadoop cluster with
> two nodes, a master and slave. The master node also acting as slave and
> running NameNode for the HDFS storage layer, and JobTracker for the
> MapReduce processing layer other than DataNode and JobTracker. Second
> machine (slave) is running DataNode and TaskTracker only.
> I started multi node cluster and is running fine. Every thing is ok. No
> errors, no warning.
>
> But when I run the WordCount examples (by downloading some files in text
> format), it hangs at the following message on console:
> "12/10/04 08:49:09 INFO mapred.JobClient:  map 100% reduce 23%"
> There is still no errors, exceptions in the logs.
>
> Anyone any idea. Really appreciate quick response.
>
> Thanks,
> Muzaffar
>



-- 


∞
Shashwat Shriparv


RE: Opening hdfs gz file

2012-08-18 Thread Shashwat Shriparv
Give the command:

jps

If you can see NameNode in the process list, you can say your NameNode is
running.

 

From: rahul p [mailto:rahulpoolancha...@gmail.com] 
Sent: Saturday, August 18, 2012 4:46 PM
To: user@hadoop.apache.org
Subject: Re: Opening hdfs gz file

 

hi , how do i make my namenode up.?

can someone help me?

On Fri, Aug 17, 2012 at 2:52 PM, Harsh J mailto:ha...@cloudera.com> > wrote:

On the shell, just do:

$ hadoop fs -text /user/hive/warehouse/sample/loc=US/01_0.gz


On Fri, Aug 17, 2012 at 12:18 PM, prabhu K mailto:prabhu.had...@gmail.com> > wrote:
> Hi Users,
>
> How to open HDFS zip file(.gz) file in hadoop.?
>
> example:
>
> bin/hadoop fs -ls /user/hive/warehouse/sample
> -rw-r--r--   4 root supergroup  465141227 2012-08-14 17:02
> /user/hive/warehouse/sample/loc=US/01_0.gz
>
> i want to see the '01_0.gz' file content. can anyone knows how to open
> this file?
>
> Thanks,
> Prabhu




--
Harsh J

 



RE: Relevance of mapreduce.* configuration properties for MR V2

2012-08-18 Thread Shashwat Shriparv
What is the content of your /etc/hosts file?

 

From: rahul p [mailto:rahulpoolancha...@gmail.com] 
Sent: Saturday, August 18, 2012 4:49 PM
To: user@hadoop.apache.org
Subject: Re: Relevance of mapreduce.* configuration properties for MR V2

 

Hi,

When i give bin/start-fs all 

namenode s not coming up..

On Fri, Aug 17, 2012 at 12:28 AM, mg mailto:userformailingli...@gmail.com> > wrote:

Hi,

I am currently trying to tune a CDH 4.0.1 (i~ hadoop 2.0.0-alpha) cluster
running HDFS, YARN, and HBase managed by Cloudera Manager 4.0.3 (Free
Edition).

In CM, there are a number of options for setting mapreduce.* configuration
properties on the YARN client page.

Some of the properties and their explanations in the GUI still refer to
JobTracker and TaskTracker, e.g.,
- mapreduce.jobtracker.handler.count,
- mapreduce.tasktracker.map.tasks.maximum,
- mapreduce.tasktracker.reduce.tasks.maximum

I wonder whether these and a number of other mapreduce.* (e.g.,
mapreduce.job.reduces) properties are observed by the MR2 ApplicationMaster
(and how), or not.

Can anyone clarify or point to respective documentation?

Thanks,
Martin

 



RE: hadoop 1.0.3 config exception

2012-08-18 Thread Shashwat Shriparv
What is your hosts file configuration? The problem related to the namenode is
almost always due to the host configuration.

 

From: rahul p [mailto:rahulpoolancha...@gmail.com] 
Sent: Saturday, August 18, 2012 5:07 PM
To: user@hadoop.apache.org
Subject: Re: hadoop 1.0.3 config exception

 

Hi Ben,

Can you help me resolve this issue.

i am new to hadoop and java.

i facing issue in bringing up my NameNode.

 

On Fri, Aug 17, 2012 at 11:59 PM, Ben Cuthbert mailto:bencuthb...@ymail.com> > wrote:

All

 

We are getting the following show in when we talk to hadoop 1.0.3

 

Seems it relates to these lines in Configuration.java 

 

public Configuration(boolean loadDefaults) {
225    this.loadDefaults = loadDefaults;
226    if (LOG.isDebugEnabled()) {
227      LOG.debug(StringUtils.stringifyException(new IOException("config()")));
228    }
229    synchronized(Configuration.class) {
230      REGISTRY.put(this, null);
231    }
232    this.storeResource = false;
233  }

 

 

Why is this here?

 

 

2012-08-17 16:53:11,133 (hdfs-hdfs-sink-call-runner-4) [DEBUG -
org.apache.hadoop.conf.Configuration.(Configuration.java:227)]
java.io.IOException: config()

at org.apache.hadoop.conf.Configuration.(Configuration.java:227)

at org.apache.hadoop.conf.Configuration.(Configuration.java:214)

at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:170)

at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:48)

at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:155)

at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:152)

at
org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:125)

at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:152)

at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:307)

at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:717)

at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:714)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)

at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
va:886)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
08)

at java.lang.Thread.run(Thread.java:680)

 

 



Re: namenode instantiation error

2012-08-10 Thread shashwat shriparv
I think you need to install and configure ssh
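
Also, the stack trace below shows "Permission denied" on /var/lib/hadoop-0.20/cache/hadoop/dfs/name/in_use.lock while running as the hive user, so it may simply be an ownership problem on the name directory; a rough fix (adjust the user/group to whatever your setup actually expects) would be:

sudo chown -R hive:hive /var/lib/hadoop-0.20/cache/hadoop/dfs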



On Thu, Aug 9, 2012 at 4:30 PM, anand sharma  wrote:

> Thanks all for reply, yes the user has access to that directory and i have
> already formatted the namenode; just for simplicity i am not using ssh as i
> am doing things for the first time.
>
> On Thu, Aug 9, 2012 at 3:58 PM, shashwat shriparv <
> dwivedishash...@gmail.com> wrote:
>
>> format the filesystem
>>
>> bin/hadoop namenode -format
>>
>> then try to start namenode :)
>>
>>
>> On Thu, Aug 9, 2012 at 3:51 PM, Mohammad Tariq wrote:
>>
>>> Hello Anand,
>>>
>>> Is there any specific reason behind not using ssh??
>>>
>>> Regards,
>>> Mohammad Tariq
>>>
>>>
>>> On Thu, Aug 9, 2012 at 3:46 PM, anand sharma 
>>> wrote:
>>> > Hi, i am just learning the Hadoop and i am setting the development
>>> > environment with CDH3 pseudo distributed mode without any ssh
>>> cofiguration
>>> > in CentOS 6.2 . i can run the sample programs as usual but when i try
>>> and
>>> > run namenode this is the error it logs...
>>> >
>>> > [hive@localhost ~]$ hadoop namenode
>>> > 12/08/09 20:56:57 INFO namenode.NameNode: STARTUP_MSG:
>>> > /
>>> > STARTUP_MSG: Starting NameNode
>>> > STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
>>> > STARTUP_MSG:   args = []
>>> > STARTUP_MSG:   version = 0.20.2-cdh3u4
>>> > STARTUP_MSG:   build =
>>> file:///data/1/tmp/topdir/BUILD/hadoop-0.20.2-cdh3u4
>>> > -r 214dd731e3bdb687cb55988d3f47dd9e248c5690; compiled by 'root' on Mon
>>> May
>>> > 7 14:01:59 PDT 2012
>>> > /
>>> > 12/08/09 20:56:57 INFO jvm.JvmMetrics: Initializing JVM Metrics with
>>> > processName=NameNode, sessionId=null
>>> > 12/08/09 20:56:57 INFO metrics.NameNodeMetrics: Initializing
>>> > NameNodeMeterics using context
>>> > object:org.apache.hadoop.metrics.spi.NoEmitMetricsContext
>>> > 12/08/09 20:56:57 INFO util.GSet: VM type   = 64-bit
>>> > 12/08/09 20:56:57 INFO util.GSet: 2% max memory = 17.77875 MB
>>> > 12/08/09 20:56:57 INFO util.GSet: capacity  = 2^21 = 2097152
>>> entries
>>> > 12/08/09 20:56:57 INFO util.GSet: recommended=2097152, actual=2097152
>>> > 12/08/09 20:56:57 INFO namenode.FSNamesystem: fsOwner=hive
>>> (auth:SIMPLE)
>>> > 12/08/09 20:56:57 INFO namenode.FSNamesystem: supergroup=supergroup
>>> > 12/08/09 20:56:57 INFO namenode.FSNamesystem: isPermissionEnabled=false
>>> > 12/08/09 20:56:57 INFO namenode.FSNamesystem:
>>> > dfs.block.invalidate.limit=1000
>>> > 12/08/09 20:56:57 INFO namenode.FSNamesystem:
>>> isAccessTokenEnabled=false
>>> > accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
>>> > 12/08/09 20:56:57 INFO metrics.FSNamesystemMetrics: Initializing
>>> > FSNamesystemMetrics using context
>>> > object:org.apache.hadoop.metrics.spi.NoEmitMetricsContext
>>> > 12/08/09 20:56:57 ERROR namenode.FSNamesystem: FSNamesystem
>>> initialization
>>> > failed.
>>> > java.io.FileNotFoundException:
>>> > /var/lib/hadoop-0.20/cache/hadoop/dfs/name/in_use.lock (Permission
>>> denied)
>>> > at java.io.RandomAccessFile.open(Native Method)
>>> > at java.io.RandomAccessFile.(RandomAccessFile.java:216)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:614)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:591)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:449)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:304)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:372)
>>> > at
>>> >
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:335)

Re: namenode instantiation error

2012-08-09 Thread shashwat shriparv
t; >
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:372)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:335)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:271)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:467)
> > at
> >
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1330)
> > at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1339)
> >
> > 12/08/09 20:56:57 INFO namenode.NameNode: SHUTDOWN_MSG:
> > /
> > SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
> > /
> >
> >
>



-- 


∞
Shashwat Shriparv