Re: hadoop knowledge gaining
On 07/10/11 15:25, Jignesh Patel wrote: Guys, I am able to deploy the first program, word count, using hadoop. I am interested in exploring more about hadoop and HBase and don't know the best way to grasp both of them. I have Hadoop in Action, but it covers the older API. -- Actually, the API covered in the 2nd edition is pretty much the one in widest use. The newer API is better, but is only complete in hadoop 0.21 and later, which aren't yet in wide use. -- I also have the HBase Definitive Guide, which I have not started exploring. -- Think of a problem, get some data, go through the books. Learning more about statistics and data mining is what you really need, more than just the hadoop APIs. -steve
Re: ways to expand hadoop.tmp.dir capacity?
2011/10/9 Harsh J ha...@cloudera.com: Hello Meng, On Wed, Oct 5, 2011 at 11:02 AM, Meng Mao meng...@gmail.com wrote: Currently, we've got defined:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/hadoop-metadata/cache/</value>
</property>
In our experiments with SOLR, the intermediate files are so large that they tend to blow out disk space and fail (and annoyingly leave behind their huge failed attempts). We've had issues with it in the past, but we're having real problems with SOLR if we can't comfortably get more space out of hadoop.tmp.dir somehow. 1) It seems we never set mapred.system.dir to anything special, so it's defaulting to ${hadoop.tmp.dir}/mapred/system. Is this a problem? The docs seem to recommend against it when hadoop.tmp.dir has ${user.name} in it, which ours doesn't. -- {mapred.system.dir} is an HDFS location, and you shouldn't really be worried about it as much. -- 1b) The doc says mapred.system.dir is the in-HDFS path to shared MapReduce system files. To me, that means there must be a single path for mapred.system.dir, which sort of forces hadoop.tmp.dir to be one path. Otherwise, one might imagine that you could specify multiple paths to store hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct interpretation -- that hadoop.tmp.dir could live on multiple paths/disks if there were more mapping/lookup between mapred.system.dir and hadoop.tmp.dir? -- {hadoop.tmp.dir} is indeed reused for {mapred.system.dir}, although the latter is on HDFS, which is confusing, but there should be just one mapred.system.dir, yes. Also, the config {hadoop.tmp.dir} doesn't support more than one path. What you need here is a proper {mapred.local.dir} configuration. -- 2) IIRC, there's a -D switch for supplying config name/value pairs to individual jobs. Does such a switch exist? Googling for single letters is fruitless. If we had a path on our workers with more space (in our case, another hard disk), could we simply pass that path in as hadoop.tmp.dir for our SOLR jobs, without incurring any consistency issues on future jobs that might use the SOLR output on HDFS? -- Only a few parameters of a job are user-configurable. Settings like hadoop.tmp.dir and mapred.local.dir are not override-able by user-set parameters, as they are server-side (static) configurations. -- Given that the default value is ${hadoop.tmp.dir}/mapred/local, would the expanded capacity we're looking for be as easily accomplished as defining mapred.local.dir to span multiple disks? (Setting aside the issue of temp files so big that they could still fill a whole disk.) -- 1. You can set mapred.local.dir independent of hadoop.tmp.dir. 2. mapred.local.dir can have comma-separated values in it, spanning multiple disks. 3. Intermediate outputs may spread across these disks, but a single task shall not consume more than one disk at a time. So if your largest configured disk is 500 GB while the total set of them may be 2 TB, then your intermediate output size can't really exceed 500 GB, because only one disk is consumed by one task -- the multiple disks are for better I/O parallelism between tasks. Know that hadoop.tmp.dir is a convenience property, for quickly starting up dev clusters and such. For a proper configuration, you need to remove the dependency on it (almost nothing uses hadoop.tmp.dir on the server side once the right properties are configured - e.g. dfs.data.dir, dfs.name.dir, fs.checkpoint.dir, mapred.local.dir, etc.) -- Harsh J -- Here is an excellent explanation of how to install Apache Hadoop manually; Lars explains this very well:
http://blog.lars-francke.de/2011/01/26/setting-up-a-hadoop-cluster-part-1-manual-installation/ Regards -- Marcos Luis Ortíz Valmaseda Linux Infrastructure Engineer Linux User # 418229 http://marcosluis2186.posterous.com http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186
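For reference, a minimal sketch of the mapred-site.xml entry matching Harsh's description of mapred.local.dir with comma-separated values spanning multiple disks; the /disk1../disk3 mount points are placeholders, not paths from this thread:
<property>
  <name>mapred.local.dir</name>
  <value>/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local</value>
</property>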
Developing MapReduce
I use eclipse. Is this http://wiki.apache.org/hadoop/EclipsePlugIn still the best way to develop mapreduce programs in hadoop? Just want to make sure before I go down this path. Or should I just add the hadoop jars to my Eclipse classpath and create my own MapReduce programs? Thanks
Re: Developing MapReduce
When you download hadoop, there is a related plugin in its dist directory (I don't remember the exact name). Go and get it from there. On Oct 10, 2011, at 10:34 AM, Mohit Anchlia wrote: I use eclipse. Is this http://wiki.apache.org/hadoop/EclipsePlugIn still the best way to develop mapreduce programs in hadoop? Just want to make sure before I go down this path. Or should I just add the hadoop jars to my Eclipse classpath and create my own MapReduce programs? Thanks
How to iterate over a hdfs folder with hadoop
Hi, I'm wondering how I can browse an hdfs folder using the classes in the org.apache.hadoop.fs package. The operation that I'm looking for is 'hadoop dfs -ls'. The standard file system equivalent would be:
File f = new File(outputPath);
if (f.isDirectory()) {
    String files[] = f.list();
    for (String file : files) {
        // Do your logic
    }
}
Thanks in advance, Raimon Bosch.
Re: How to iterate over a hdfs folder with hadoop
FileStatus[] files = fs.listStatus(new Path(path));
for (FileStatus fileStatus : files) {
    // ...do stuff here
}
On Mon, Oct 10, 2011 at 8:03 AM, Raimon Bosch raimon.bo...@gmail.com wrote: Hi, I'm wondering how I can browse an hdfs folder using the classes in the org.apache.hadoop.fs package. The operation that I'm looking for is 'hadoop dfs -ls'. The standard file system equivalent would be:
File f = new File(outputPath);
if (f.isDirectory()) {
    String files[] = f.list();
    for (String file : files) {
        // Do your logic
    }
}
Thanks in advance, Raimon Bosch. -- Thanks, John C
Re: hadoop input buffer size
I think the post below can give you more info about it. http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/ Nice explanation by Owen here. Regards, Uma - Original Message - From: Yang Xiaoliang yangxiaoliang2...@gmail.com Date: Wednesday, October 5, 2011 4:27 pm Subject: Re: hadoop input buffer size To: common-user@hadoop.apache.org Hi, Hadoop neither reads one line each time, nor fetches dfs.block.size worth of lines into a buffer. Actually, for the TextInputFormat, it reads io.file.buffer.size bytes of text into a buffer each time; this can be seen in the hadoop source file LineReader.java. 2011/10/5 Mark question markq2...@gmail.com Hello, Correct me if I'm wrong, but when a program opens n files at the same time to read from, and starts reading from each file one line at a time, isn't hadoop actually fetching dfs.block.size worth of lines into a buffer, and not actually one line? If this is correct: I set my dfs.block.size = 3MB and each line takes only about 650 bytes, so I would assume the performance for reading 1-4000 lines would be the same, but it isn't! Do you know a way to find the number of lines to be read at once? Thank you, Mark
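For reference, the buffer in question is controlled from core-site.xml; a minimal sketch follows (the 64 KB value is only an illustration, not a recommendation from this thread):
<property>
  <name>io.file.buffer.size</name>
  <value>65536</value>
</property>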
Custom InputFormat for Multiline Input File Hive/Hadoop
Hi all, Sending this to core-u...@hadoop.apache.org and d...@hive.apache.org. I am trying to process Omniture's data log files with Hadoop/Hive. The file format is tab delimited and, while being pretty simple for the most part, it does allow you to have multiple newlines and tabs within a field, escaped by a backslash (\\n and \\t). As a result I've opted to create my own InputFormat to handle the multiple newlines and convert those tabs to spaces when Hive is going to try to do a split on the tabs. I've found a fairly good reference for doing this using the newer InputFormat API at http://blog.rguha.net/?p=293 but unfortunately my version of Hive (0.7.0) still uses the old InputFormat API. I haven't been able to find many tutorials on writing a custom InputFormat using the older API, so I'm looking to see if I can get some guidance as to what may be wrong with the following two classes: https://gist.github.com/3141e9d27d4e07f5f9ed https://gist.github.com/79fdab227950a0776616 The SELECT statements within hive currently return nothing, and my other variations returned nothing but NULL values. This issue is also available on StackOverflow at http://stackoverflow.com/questions/7692994/custom-inputformat-with-hive. If there's a resource someone can point me to that'd also be great. Many thanks in advance, Mike
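For anyone searching the archives, a minimal sketch of the shape of an old-API (org.apache.hadoop.mapred) InputFormat pair. The class names are hypothetical and the escape handling is a simplification of what Mike's gists attempt; it only illustrates the getRecordReader plus RecordReader structure the old API requires, not a tested solution for Omniture logs:
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class EscapedTextInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new EscapedLineRecordReader(job, (FileSplit) split);
    }
}

class EscapedLineRecordReader implements RecordReader<LongWritable, Text> {
    private final LineRecordReader lineReader; // reuse Hadoop's plain line reader
    private final LongWritable lineKey = new LongWritable();
    private final Text lineValue = new Text();

    public EscapedLineRecordReader(JobConf job, FileSplit split) throws IOException {
        lineReader = new LineRecordReader(job, split);
    }

    public boolean next(LongWritable key, Text value) throws IOException {
        if (!lineReader.next(lineKey, lineValue)) {
            return false; // end of split
        }
        // Keep appending physical lines while the current one ends with a
        // backslash, i.e. an escaped newline in the Omniture format.
        StringBuilder record = new StringBuilder(lineValue.toString());
        while (record.length() > 0
                && record.charAt(record.length() - 1) == '\\'
                && lineReader.next(lineKey, lineValue)) {
            record.setLength(record.length() - 1); // drop the escape character
            record.append(' ').append(lineValue.toString());
        }
        key.set(lineKey.get());
        // Replace escaped tabs with spaces so Hive's split on real tabs stays intact.
        value.set(record.toString().replace("\\t", " "));
        return true;
    }

    public LongWritable createKey() { return new LongWritable(); }
    public Text createValue() { return new Text(); }
    public long getPos() throws IOException { return lineReader.getPos(); }
    public float getProgress() throws IOException { return lineReader.getProgress(); }
    public void close() throws IOException { lineReader.close(); }
}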
Re: How to iterate over a hdfs folder with hadoop
Yes, the FileStatus class would be the equivalent of list. FileStatus has the APIs isDir and getPath; both of these should satisfy your further usage. :-) I think one small difference would be that FileStatus will ensure sorted order. Regards, Uma - Original Message - From: John Conwell j...@iamjohn.me Date: Monday, October 10, 2011 8:40 pm Subject: Re: How to iterate over a hdfs folder with hadoop To: common-user@hadoop.apache.org
FileStatus[] files = fs.listStatus(new Path(path));
for (FileStatus fileStatus : files) {
    // ...do stuff here
}
On Mon, Oct 10, 2011 at 8:03 AM, Raimon Bosch raimon.bo...@gmail.com wrote: Hi, I'm wondering how I can browse an hdfs folder using the classes in the org.apache.hadoop.fs package. The operation that I'm looking for is 'hadoop dfs -ls'. The standard file system equivalent would be:
File f = new File(outputPath);
if (f.isDirectory()) {
    String files[] = f.list();
    for (String file : files) {
        // Do your logic
    }
}
Thanks in advance, Raimon Bosch. -- Thanks, John C
Re: How to iterate over a hdfs folder with hadoop
Thanks John! Here is the complete solution:
Configuration jc = new Configuration();
Object files[] = null;
List<String> files_in_hdfs = new ArrayList<String>();
FileSystem fs = FileSystem.get(jc);
FileStatus[] file_status = fs.listStatus(new Path(outputPath));
for (FileStatus fileStatus : file_status) {
    files_in_hdfs.add(fileStatus.getPath().getName());
}
files = files_in_hdfs.toArray();
2011/10/10 John Conwell j...@iamjohn.me
FileStatus[] files = fs.listStatus(new Path(path));
for (FileStatus fileStatus : files) {
    // ...do stuff here
}
On Mon, Oct 10, 2011 at 8:03 AM, Raimon Bosch raimon.bo...@gmail.com wrote: Hi, I'm wondering how I can browse an hdfs folder using the classes in the org.apache.hadoop.fs package. The operation that I'm looking for is 'hadoop dfs -ls'. The standard file system equivalent would be:
File f = new File(outputPath);
if (f.isDirectory()) {
    String files[] = f.list();
    for (String file : files) {
        // Do your logic
    }
}
Thanks in advance, Raimon Bosch. -- Thanks, John C
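Since Uma notes earlier in the thread that FileStatus also exposes isDir and getPath, a sketch of the closer equivalent of the File#isDirectory check from the original question could look like this (outputPath is assumed to already hold the directory path, as in Raimon's snippet):
FileSystem fs = FileSystem.get(new Configuration());
for (FileStatus status : fs.listStatus(new Path(outputPath))) {
    if (status.isDir()) {
        // a sub-directory: recurse or skip it here
    } else {
        System.out.println(status.getPath().getName());
    }
}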
Re: hdfs directory location
Jignesh, you are creating a dir in hdfs with that command. The dir won't be in your local file system but in hdfs. Issue a command like: hadoop fs -ls /user/hadoop-user/citation/ You can see the dir you created in hdfs. If you want to create a dir on local unix, use a simple linux command: mkdir /user/hadoop-user/citation/input --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: hdfs directory location Sent: Oct 10, 2011 23:45 I am using the following command to create a directory in a Unix (i.e. mac) system: bin/hadoop fs -mkdir /user/hadoop-user/citation/input While it creates the directory I need, I am struggling to figure out the exact location of the folder on my local box. Regards Bejoy K S
Re: hdfs directory location
Bejoy, if I create a directory in a unix box, then how can I link it with the HDFS directory structure? -Jignesh On Oct 10, 2011, at 2:59 PM, bejoy.had...@gmail.com wrote: Jignesh, you are creating a dir in hdfs with that command. The dir won't be in your local file system but in hdfs. Issue a command like: hadoop fs -ls /user/hadoop-user/citation/ You can see the dir you created in hdfs. If you want to create a dir on local unix, use a simple linux command: mkdir /user/hadoop-user/citation/input --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: hdfs directory location Sent: Oct 10, 2011 23:45 I am using the following command to create a directory in a Unix (i.e. mac) system: bin/hadoop fs -mkdir /user/hadoop-user/citation/input While it creates the directory I need, I am struggling to figure out the exact location of the folder on my local box. Regards Bejoy K S
Re: Developing MapReduce
Hi Mohit, I'm really not sure how many map reduce developers use the map reduce eclipse plugin; AFAIK the majority don't. As Jignesh mentioned, you can get it from the hadoop distribution folder as soon as you unzip the same. My suggested approach would be: if you are on Windows OS, you can test run your map reduce code in two ways. - Set up cygwin on Windows, atop which you can set up hadoop and related tools. It is a little messy. - Use a linux VM image. I'd recommend the Cloudera test VM, as it comes pre-configured with the whole hadoop technology stack. It really shields the developer from the hassles of installing the hadoop tools and getting them up and running. On Linux or Mac you can just add the hadoop jars to your class path and run the driver class just as you would run a java class within eclipse (here hadoop would be in standalone mode). Hope it helps!... --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Re: Developing MapReduce Sent: Oct 10, 2011 20:31 When you download hadoop, there is a related plugin in its dist directory (I don't remember the exact name). Go and get it from there. On Oct 10, 2011, at 10:34 AM, Mohit Anchlia wrote: I use eclipse. Is this http://wiki.apache.org/hadoop/EclipsePlugIn still the best way to develop mapreduce programs in hadoop? Just want to make sure before I go down this path. Or should I just add the hadoop jars to my Eclipse classpath and create my own MapReduce programs? Thanks Regards Bejoy K S
Re: hdfs directory location
Jignesh, sorry, I didn't get your query 'how can I link it with the HDFS directory structure?' You mean putting your unix dir contents into hdfs? If so, use: hadoop fs -copyFromLocal <src> <destn> --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org To: bejoy.had...@gmail.com Subject: Re: hdfs directory location Sent: Oct 11, 2011 01:18 Bejoy, if I create a directory in a unix box, then how can I link it with the HDFS directory structure? -Jignesh On Oct 10, 2011, at 2:59 PM, bejoy.had...@gmail.com wrote: Jignesh, you are creating a dir in hdfs with that command. The dir won't be in your local file system but in hdfs. Issue a command like: hadoop fs -ls /user/hadoop-user/citation/ You can see the dir you created in hdfs. If you want to create a dir on local unix, use a simple linux command: mkdir /user/hadoop-user/citation/input --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: hdfs directory location Sent: Oct 10, 2011 23:45 I am using the following command to create a directory in a Unix (i.e. mac) system: bin/hadoop fs -mkdir /user/hadoop-user/citation/input While it creates the directory I need, I am struggling to figure out the exact location of the folder on my local box. Regards Bejoy K S Regards Bejoy K S
Re: hdfs directory location
Bejoy, copyToLocal makes sense; it worked. But I am still wondering: if HDFS has a directory created on the local box, it should exist physically somewhere, but I couldn't locate it. Is the HDFS directory structure a virtual structure that doesn't exist physically? -Jignesh On Oct 10, 2011, at 3:53 PM, bejoy.had...@gmail.com wrote: Jignesh, sorry, I didn't get your query 'how can I link it with the HDFS directory structure?' You mean putting your unix dir contents into hdfs? If so, use: hadoop fs -copyFromLocal <src> <destn> --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org To: bejoy.had...@gmail.com Subject: Re: hdfs directory location Sent: Oct 11, 2011 01:18 Bejoy, if I create a directory in a unix box, then how can I link it with the HDFS directory structure? -Jignesh On Oct 10, 2011, at 2:59 PM, bejoy.had...@gmail.com wrote: Jignesh, you are creating a dir in hdfs with that command. The dir won't be in your local file system but in hdfs. Issue a command like: hadoop fs -ls /user/hadoop-user/citation/ You can see the dir you created in hdfs. If you want to create a dir on local unix, use a simple linux command: mkdir /user/hadoop-user/citation/input --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: hdfs directory location Sent: Oct 10, 2011 23:45 I am using the following command to create a directory in a Unix (i.e. mac) system: bin/hadoop fs -mkdir /user/hadoop-user/citation/input While it creates the directory I need, I am struggling to figure out the exact location of the folder on my local box. Regards Bejoy K S Regards Bejoy K S
Re: hdfs directory location
Jignesh, you are absolutely right. In hdfs the directory doesn't exist physically; it is just metadata on the name node. I don't think such a dir structure would be there in the name node's local file system either, as it is just metadata, and hence no physical dir structure is created. Regards Bejoy K S -Original Message- From: Jignesh Patel jign...@websoft.com Date: Mon, 10 Oct 2011 16:02:53 To: bejoy.had...@gmail.com Cc: common-user@hadoop.apache.org Subject: Re: hdfs directory location Bejoy, copyToLocal makes sense; it worked. But I am still wondering: if HDFS has a directory created on the local box, it should exist physically somewhere, but I couldn't locate it. Is the HDFS directory structure a virtual structure that doesn't exist physically? -Jignesh On Oct 10, 2011, at 3:53 PM, bejoy.had...@gmail.com wrote: Jignesh, sorry, I didn't get your query 'how can I link it with the HDFS directory structure?' You mean putting your unix dir contents into hdfs? If so, use: hadoop fs -copyFromLocal <src> <destn> --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org To: bejoy.had...@gmail.com Subject: Re: hdfs directory location Sent: Oct 11, 2011 01:18 Bejoy, if I create a directory in a unix box, then how can I link it with the HDFS directory structure? -Jignesh On Oct 10, 2011, at 2:59 PM, bejoy.had...@gmail.com wrote: Jignesh, you are creating a dir in hdfs with that command. The dir won't be in your local file system but in hdfs. Issue a command like: hadoop fs -ls /user/hadoop-user/citation/ You can see the dir you created in hdfs. If you want to create a dir on local unix, use a simple linux command: mkdir /user/hadoop-user/citation/input --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: hdfs directory location Sent: Oct 10, 2011 23:45 I am using the following command to create a directory in a Unix (i.e. mac) system: bin/hadoop fs -mkdir /user/hadoop-user/citation/input While it creates the directory I need, I am struggling to figure out the exact location of the folder on my local box. Regards Bejoy K S Regards Bejoy K S
Re: hdfs directory location
Hi, I guess what you want is to see your HDFS directory through normal file system commands like ls etc., or by browsing your directory structure. This is not possible, as neither your commands nor Finder (on Mac) have the ability to read / write HDFS, so they cannot show HDFS directories. Hence, the HDFS directory structure must be viewed using the HDFS tools and not the operating system FS commands. Hope this helps! Warm regards Arko On Mon, Oct 10, 2011 at 3:08 PM, bejoy.had...@gmail.com wrote: Jignesh, you are absolutely right. In hdfs the directory doesn't exist physically; it is just metadata on the name node. I don't think such a dir structure would be there in the name node's local file system either, as it is just metadata, and hence no physical dir structure is created. Regards Bejoy K S -Original Message- From: Jignesh Patel jign...@websoft.com Date: Mon, 10 Oct 2011 16:02:53 To: bejoy.had...@gmail.com Cc: common-user@hadoop.apache.org Subject: Re: hdfs directory location Bejoy, copyToLocal makes sense; it worked. But I am still wondering: if HDFS has a directory created on the local box, it should exist physically somewhere, but I couldn't locate it. Is the HDFS directory structure a virtual structure that doesn't exist physically? -Jignesh On Oct 10, 2011, at 3:53 PM, bejoy.had...@gmail.com wrote: Jignesh, sorry, I didn't get your query 'how can I link it with the HDFS directory structure?' You mean putting your unix dir contents into hdfs? If so, use: hadoop fs -copyFromLocal <src> <destn> --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org To: bejoy.had...@gmail.com Subject: Re: hdfs directory location Sent: Oct 11, 2011 01:18 Bejoy, if I create a directory in a unix box, then how can I link it with the HDFS directory structure? -Jignesh On Oct 10, 2011, at 2:59 PM, bejoy.had...@gmail.com wrote: Jignesh, you are creating a dir in hdfs with that command. The dir won't be in your local file system but in hdfs. Issue a command like: hadoop fs -ls /user/hadoop-user/citation/ You can see the dir you created in hdfs. If you want to create a dir on local unix, use a simple linux command: mkdir /user/hadoop-user/citation/input --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: hdfs directory location Sent: Oct 10, 2011 23:45 I am using the following command to create a directory in a Unix (i.e. mac) system: bin/hadoop fs -mkdir /user/hadoop-user/citation/input While it creates the directory I need, I am struggling to figure out the exact location of the folder on my local box. Regards Bejoy K S Regards Bejoy K S
ssh setup stop working
I have created a private key setup on my local box, and till this weekend everything was working great. But when I tried jps today, I found none of the services running, and when I tried to ssh localhost it started asking for a password. When I tried ssh-keygen -t rsa, the message appeared: /Users/hadoop-user/.ssh/id_rsa already exists. What went wrong? Do I need to recreate the key? -Jignesh
Re: ssh setup stop working
Nope, it still works. I have a mac system. On Oct 10, 2011, at 4:40 PM, Ilker Ozkaymak wrote: Has your user account's password expired? Best regards, IO On Mon, Oct 10, 2011 at 3:35 PM, Jignesh Patel jign...@websoft.com wrote: I have created a private key setup on my local box, and till this weekend everything was working great. But when I tried jps today, I found none of the services running, and when I tried to ssh localhost it started asking for a password. When I tried ssh-keygen -t rsa, the message appeared: /Users/hadoop-user/.ssh/id_rsa already exists. What went wrong? Do I need to recreate the key? -Jignesh
Re: Secondary namenode fsimage concept
hey patrick, i wanted to configure my cluster to write namenode metadata to multiple directories as well:
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/var/name,/mnt/hadoop/var/name</value>
</property>
in my case, /hadoop/var/name is a local directory and /mnt/hadoop/var/name is an NFS volume. i took down the cluster first, then copied over the files from /hadoop/var/name to /mnt/hadoop/var/name, and then tried to start up the cluster. but the cluster won't start up properly... here's the namenode log: http://pastebin.com/gmu0B7yd any ideas why it wouldn't start up? thx On Thu, Oct 6, 2011 at 6:58 PM, patrick sang silvianhad...@gmail.com wrote: I would say have your namenode write metadata to the local fs (where your secondary namenode will pull files from) and to the NFS mount:
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/name,/hadoop/nfs_server_name</value>
</property>
my 0.02$ P On Thu, Oct 6, 2011 at 12:04 AM, shanmuganathan.r shanmuganatha...@zohocorp.com wrote: Hi Kai, There is no data stored in the secondary namenode related to the Hadoop cluster. Am I correct? If that is correct: if we run the secondary namenode on a separate machine, then the fetching, merging and transferring time increases when the cluster has a lot of data in the namenode fsimage file. If a failover occurs at that point, how can we recover the nearly one hour of changes to the HDFS files? (the default checkpoint interval is one hour) Thanks R.Shanmuganathan On Thu, 06 Oct 2011 12:20:28 +0530 Kai Voigt <k...@123.org> wrote: Hi, the secondary namenode only fetches the two files when a checkpoint is needed. Kai On 06.10.2011 at 08:45, shanmuganathan.r wrote: > Hi Kai, > In the second part I meant: does the secondary namenode also contain the FSImage file, or are the two files (FSImage and EditLog) transferred from the namenode at checkpoint time? > Thanks > Shanmuganathan On Thu, 06 Oct 2011 11:37:50 +0530 Kai Voigt <k...@123.org> wrote: > Hi, > you're correct when saying the namenode hosts the fsimage file and the edits log file. > The fsimage file contains a snapshot of the HDFS metadata (a filename to blocks list mapping). Whenever there is a change to HDFS, it will be appended to the edits file. Think of it as a database transaction log, where changes will not be applied to the datafile, but appended to a log. > To prevent the edits file growing infinitely, the secondary namenode periodically pulls these two files, and the namenode starts writing changes to a new edits file. Then, the secondary namenode merges the changes from the edits file with the old snapshot from the fsimage file and creates an updated fsimage file. This updated fsimage file is then copied to the namenode. > Then, the entire cycle starts again. To answer your question: the namenode has both files, even if the secondary namenode is running on a different machine. > Kai On 06.10.2011 at 07:57, shanmuganathan.r wrote: >> Hi All, >> I have a doubt about the hadoop secondary namenode concept. Please correct me if the following statements are wrong. >> The namenode hosts the fsimage and edit log files. The secondary namenode hosts the fsimage file only. At checkpoint time the edit log file is transferred to the secondary namenode and the two files are merged; then the updated fsimage file is transferred to the namenode. Is that correct? >> If we run the secondary namenode on a separate machine, then both machines contain the fsimage file, and the namenode alone contains the edit log file. Is that true? >> Thanks R.Shanmuganathan -- Kai Voigt k...@123.org
Re: ssh setup stop working
In fact I have created the passphraseless key again and it still asks me for a password. On Oct 10, 2011, at 4:51 PM, Jignesh Patel wrote: Nope, it still works. I have a mac system. On Oct 10, 2011, at 4:40 PM, Ilker Ozkaymak wrote: Has your user account's password expired? Best regards, IO On Mon, Oct 10, 2011 at 3:35 PM, Jignesh Patel jign...@websoft.com wrote: I have created a private key setup on my local box, and till this weekend everything was working great. But when I tried jps today, I found none of the services running, and when I tried to ssh localhost it started asking for a password. When I tried ssh-keygen -t rsa, the message appeared: /Users/hadoop-user/.ssh/id_rsa already exists. What went wrong? Do I need to recreate the key? -Jignesh
Re: ssh setup stop working
The key requires specific permissions: 700 for the .ssh directory and 600 for the authorized_keys file; anything more open and it won't work. However, you said it worked before. I usually experience this problem when the password ages; the key then doesn't work until the password is reset. Anyhow, it might be a little different in your case. Best regards, On Mon, Oct 10, 2011 at 4:10 PM, Jignesh Patel jign...@websoft.com wrote: In fact I have created the passphraseless key again and it still asks me for a password. On Oct 10, 2011, at 4:51 PM, Jignesh Patel wrote: Nope, it still works. I have a mac system. On Oct 10, 2011, at 4:40 PM, Ilker Ozkaymak wrote: Has your user account's password expired? Best regards, IO On Mon, Oct 10, 2011 at 3:35 PM, Jignesh Patel jign...@websoft.com wrote: I have created a private key setup on my local box, and till this weekend everything was working great. But when I tried jps today, I found none of the services running, and when I tried to ssh localhost it started asking for a password. When I tried ssh-keygen -t rsa, the message appeared: /Users/hadoop-user/.ssh/id_rsa already exists. What went wrong? Do I need to recreate the key? -Jignesh
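For reference, a quick sketch of the permission fix Ilker describes, assuming the usual ~/.ssh layout on the local box:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys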
Subscribe to list
Hi, I would like to subscribe to this list. Many thanks :)
problem in running program
I'm trying to run the attached program. My input directory structure is /user/hadoop-user/input/cite65_77.txt. But it doesn't do anything: it doesn't read the file and doesn't create the output directory.
Re: ssh setup stop working
You are right, I had a problem with the access rights. Now it works. On Oct 10, 2011, at 5:36 PM, Ilker Ozkaymak wrote: The key requires specific permissions: 700 for the .ssh directory and 600 for the authorized_keys file; anything more open and it won't work. However, you said it worked before. I usually experience this problem when the password ages; the key then doesn't work until the password is reset. Anyhow, it might be a little different in your case. Best regards, On Mon, Oct 10, 2011 at 4:10 PM, Jignesh Patel jign...@websoft.com wrote: In fact I have created the passphraseless key again and it still asks me for a password. On Oct 10, 2011, at 4:51 PM, Jignesh Patel wrote: Nope, it still works. I have a mac system. On Oct 10, 2011, at 4:40 PM, Ilker Ozkaymak wrote: Has your user account's password expired? Best regards, IO On Mon, Oct 10, 2011 at 3:35 PM, Jignesh Patel jign...@websoft.com wrote: I have created a private key setup on my local box, and till this weekend everything was working great. But when I tried jps today, I found none of the services running, and when I tried to ssh localhost it started asking for a password. When I tried ssh-keygen -t rsa, the message appeared: /Users/hadoop-user/.ssh/id_rsa already exists. What went wrong? Do I need to recreate the key? -Jignesh
Re: ways to expand hadoop.tmp.dir capacity?
So the only way we can expand to multiple mapred.local.dir paths is to configure our site.xml and restart the DataNode? On Mon, Oct 10, 2011 at 9:36 AM, Marcos Luis Ortiz Valmaseda marcosluis2...@googlemail.com wrote: 2011/10/9 Harsh J ha...@cloudera.com: Hello Meng, On Wed, Oct 5, 2011 at 11:02 AM, Meng Mao meng...@gmail.com wrote: Currently, we've got defined:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/hadoop-metadata/cache/</value>
</property>
In our experiments with SOLR, the intermediate files are so large that they tend to blow out disk space and fail (and annoyingly leave behind their huge failed attempts). We've had issues with it in the past, but we're having real problems with SOLR if we can't comfortably get more space out of hadoop.tmp.dir somehow. 1) It seems we never set mapred.system.dir to anything special, so it's defaulting to ${hadoop.tmp.dir}/mapred/system. Is this a problem? The docs seem to recommend against it when hadoop.tmp.dir has ${user.name} in it, which ours doesn't. -- {mapred.system.dir} is an HDFS location, and you shouldn't really be worried about it as much. -- 1b) The doc says mapred.system.dir is the in-HDFS path to shared MapReduce system files. To me, that means there must be a single path for mapred.system.dir, which sort of forces hadoop.tmp.dir to be one path. Otherwise, one might imagine that you could specify multiple paths to store hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct interpretation -- that hadoop.tmp.dir could live on multiple paths/disks if there were more mapping/lookup between mapred.system.dir and hadoop.tmp.dir? -- {hadoop.tmp.dir} is indeed reused for {mapred.system.dir}, although the latter is on HDFS, which is confusing, but there should be just one mapred.system.dir, yes. Also, the config {hadoop.tmp.dir} doesn't support more than one path. What you need here is a proper {mapred.local.dir} configuration. -- 2) IIRC, there's a -D switch for supplying config name/value pairs to individual jobs. Does such a switch exist? Googling for single letters is fruitless. If we had a path on our workers with more space (in our case, another hard disk), could we simply pass that path in as hadoop.tmp.dir for our SOLR jobs, without incurring any consistency issues on future jobs that might use the SOLR output on HDFS? -- Only a few parameters of a job are user-configurable. Settings like hadoop.tmp.dir and mapred.local.dir are not override-able by user-set parameters, as they are server-side (static) configurations. -- Given that the default value is ${hadoop.tmp.dir}/mapred/local, would the expanded capacity we're looking for be as easily accomplished as defining mapred.local.dir to span multiple disks? (Setting aside the issue of temp files so big that they could still fill a whole disk.) -- 1. You can set mapred.local.dir independent of hadoop.tmp.dir. 2. mapred.local.dir can have comma-separated values in it, spanning multiple disks. 3. Intermediate outputs may spread across these disks, but a single task shall not consume more than one disk at a time. So if your largest configured disk is 500 GB while the total set of them may be 2 TB, then your intermediate output size can't really exceed 500 GB, because only one disk is consumed by one task -- the multiple disks are for better I/O parallelism between tasks. Know that hadoop.tmp.dir is a convenience property, for quickly starting up dev clusters and such. For a proper configuration, you need to remove the dependency on it (almost nothing uses hadoop.tmp.dir on the server side once the right properties are configured - e.g. dfs.data.dir, dfs.name.dir, fs.checkpoint.dir, mapred.local.dir, etc.) -- Harsh J -- Here is an excellent explanation of how to install Apache Hadoop manually; Lars explains this very well: http://blog.lars-francke.de/2011/01/26/setting-up-a-hadoop-cluster-part-1-manual-installation/ Regards -- Marcos Luis Ortíz Valmaseda Linux Infrastructure Engineer Linux User # 418229 http://marcosluis2186.posterous.com http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186
Re: hdfs directory location
Jignesh, it can be done. Use the fuse-dfs feature of HDFS to have your DFS as a 'physical' mount point on Linux. Instructions may be found here: http://wiki.apache.org/hadoop/MountableHDFS and in other resources across the web (search around for fuse hdfs). On Tue, Oct 11, 2011 at 1:32 AM, Jignesh Patel jign...@websoft.com wrote: Bejoy, copyToLocal makes sense; it worked. But I am still wondering: if HDFS has a directory created on the local box, it should exist physically somewhere, but I couldn't locate it. Is the HDFS directory structure a virtual structure that doesn't exist physically? -Jignesh On Oct 10, 2011, at 3:53 PM, bejoy.had...@gmail.com wrote: Jignesh, sorry, I didn't get your query 'how can I link it with the HDFS directory structure?' You mean putting your unix dir contents into hdfs? If so, use: hadoop fs -copyFromLocal <src> <destn> --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org To: bejoy.had...@gmail.com Subject: Re: hdfs directory location Sent: Oct 11, 2011 01:18 Bejoy, if I create a directory in a unix box, then how can I link it with the HDFS directory structure? -Jignesh On Oct 10, 2011, at 2:59 PM, bejoy.had...@gmail.com wrote: Jignesh, you are creating a dir in hdfs with that command. The dir won't be in your local file system but in hdfs. Issue a command like: hadoop fs -ls /user/hadoop-user/citation/ You can see the dir you created in hdfs. If you want to create a dir on local unix, use a simple linux command: mkdir /user/hadoop-user/citation/input --Original Message-- From: Jignesh Patel To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: hdfs directory location Sent: Oct 10, 2011 23:45 I am using the following command to create a directory in a Unix (i.e. mac) system: bin/hadoop fs -mkdir /user/hadoop-user/citation/input While it creates the directory I need, I am struggling to figure out the exact location of the folder on my local box. Regards Bejoy K S Regards Bejoy K S -- Harsh J
Re: problem in running program
Jignesh, please do not attach files to the mailing list. They are stripped away and the community will never receive them. Instead, if it's small enough, paste it along in the mail, or paste it at a service like pastebin.com and pass along the public links. On Tue, Oct 11, 2011 at 3:35 AM, Jignesh Patel jign...@websoft.com wrote: I'm trying to run the attached program. My input directory structure is /user/hadoop-user/input/cite65_77.txt. But it doesn't do anything: it doesn't read the file and doesn't create the output directory. -- Harsh J
Re: ways to expand hadoop.tmp.dir capacity?
Meng, yes, configure mapred-site.xml (mapred.local.dir) to add the property and roll-restart your TaskTrackers. If you'd like to expand your DataNodes to multiple disks as well (it helps HDFS I/O greatly), do the same with hdfs-site.xml (dfs.data.dir) and perform the same rolling restart of the DataNodes. Ensure that for each service, the directories you create are owned by the same user as the one running the process; this will help avoid permission nightmares. On Tue, Oct 11, 2011 at 3:58 AM, Meng Mao meng...@gmail.com wrote: So the only way we can expand to multiple mapred.local.dir paths is to configure our site.xml and restart the DataNode? On Mon, Oct 10, 2011 at 9:36 AM, Marcos Luis Ortiz Valmaseda marcosluis2...@googlemail.com wrote: 2011/10/9 Harsh J ha...@cloudera.com: Hello Meng, On Wed, Oct 5, 2011 at 11:02 AM, Meng Mao meng...@gmail.com wrote: Currently, we've got defined:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop/hadoop-metadata/cache/</value>
</property>
In our experiments with SOLR, the intermediate files are so large that they tend to blow out disk space and fail (and annoyingly leave behind their huge failed attempts). We've had issues with it in the past, but we're having real problems with SOLR if we can't comfortably get more space out of hadoop.tmp.dir somehow. 1) It seems we never set mapred.system.dir to anything special, so it's defaulting to ${hadoop.tmp.dir}/mapred/system. Is this a problem? The docs seem to recommend against it when hadoop.tmp.dir has ${user.name} in it, which ours doesn't. -- {mapred.system.dir} is an HDFS location, and you shouldn't really be worried about it as much. -- 1b) The doc says mapred.system.dir is the in-HDFS path to shared MapReduce system files. To me, that means there must be a single path for mapred.system.dir, which sort of forces hadoop.tmp.dir to be one path. Otherwise, one might imagine that you could specify multiple paths to store hadoop.tmp.dir, like you can for dfs.data.dir. Is this a correct interpretation -- that hadoop.tmp.dir could live on multiple paths/disks if there were more mapping/lookup between mapred.system.dir and hadoop.tmp.dir? -- {hadoop.tmp.dir} is indeed reused for {mapred.system.dir}, although the latter is on HDFS, which is confusing, but there should be just one mapred.system.dir, yes. Also, the config {hadoop.tmp.dir} doesn't support more than one path. What you need here is a proper {mapred.local.dir} configuration. -- 2) IIRC, there's a -D switch for supplying config name/value pairs to individual jobs. Does such a switch exist? Googling for single letters is fruitless. If we had a path on our workers with more space (in our case, another hard disk), could we simply pass that path in as hadoop.tmp.dir for our SOLR jobs, without incurring any consistency issues on future jobs that might use the SOLR output on HDFS? -- Only a few parameters of a job are user-configurable. Settings like hadoop.tmp.dir and mapred.local.dir are not override-able by user-set parameters, as they are server-side (static) configurations. -- Given that the default value is ${hadoop.tmp.dir}/mapred/local, would the expanded capacity we're looking for be as easily accomplished as defining mapred.local.dir to span multiple disks? (Setting aside the issue of temp files so big that they could still fill a whole disk.) -- 1. You can set mapred.local.dir independent of hadoop.tmp.dir. 2. mapred.local.dir can have comma-separated values in it, spanning multiple disks. 3. Intermediate outputs may spread across these disks, but a single task shall not consume more than one disk at a time. So if your largest configured disk is 500 GB while the total set of them may be 2 TB, then your intermediate output size can't really exceed 500 GB, because only one disk is consumed by one task -- the multiple disks are for better I/O parallelism between tasks. Know that hadoop.tmp.dir is a convenience property, for quickly starting up dev clusters and such. For a proper configuration, you need to remove the dependency on it (almost nothing uses hadoop.tmp.dir on the server side once the right properties are configured - e.g. dfs.data.dir, dfs.name.dir, fs.checkpoint.dir, mapred.local.dir, etc.) -- Harsh J -- Here is an excellent explanation of how to install Apache Hadoop manually; Lars explains this very well: http://blog.lars-francke.de/2011/01/26/setting-up-a-hadoop-cluster-part-1-manual-installation/ Regards -- Marcos Luis Ortíz Valmaseda Linux Infrastructure Engineer Linux User # 418229 http://marcosluis2186.posterous.com http://www.linkedin.com/in/marcosluis2186 Twitter: @marcosluis2186 -- Harsh J
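A sketch of the hdfs-site.xml counterpart Harsh mentions, again with placeholder mount points rather than paths from this thread:
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data</value>
</property>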
Re: Secondary namenode fsimage concept
Hi, It looks to me that the problem is with your NFS: it is not supporting locks. Which version of NFS are you using? Please check your NFS locking support by writing a simple program that locks a file. I think NFSv4 supports locking (I have not tried it). From http://nfs.sourceforge.net/ A6 (What are the main new features in version 4 of the NFS protocol?): NFS Versions 2 and 3 are stateless protocols, but NFS Version 4 introduces state. An NFS Version 4 client uses state to notify an NFS Version 4 server of its intentions on a file: locking, reading, writing, and so on. An NFS Version 4 server can return information to a client about what other clients have intentions on a file, to allow a client to cache file data more aggressively via delegation. To help keep state consistent, more sophisticated client and server reboot recovery mechanisms are built in to the NFS Version 4 protocol. NFS Version 4 introduces support for byte-range locking and share reservation. Locking in NFS Version 4 is lease-based, so an NFS Version 4 client must maintain contact with an NFS Version 4 server to continue extending its open and lock leases. Regards, Uma - Original Message - From: Shouguo Li the1plum...@gmail.com Date: Tuesday, October 11, 2011 2:31 am Subject: Re: Secondary namenode fsimage concept To: common-user@hadoop.apache.org hey patrick, i wanted to configure my cluster to write namenode metadata to multiple directories as well:
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/var/name,/mnt/hadoop/var/name</value>
</property>
in my case, /hadoop/var/name is a local directory and /mnt/hadoop/var/name is an NFS volume. i took down the cluster first, then copied over the files from /hadoop/var/name to /mnt/hadoop/var/name, and then tried to start up the cluster. but the cluster won't start up properly... here's the namenode log: http://pastebin.com/gmu0B7yd any ideas why it wouldn't start up? thx On Thu, Oct 6, 2011 at 6:58 PM, patrick sang silvianhad...@gmail.com wrote: I would say have your namenode write metadata to the local fs (where your secondary namenode will pull files from) and to the NFS mount:
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/name,/hadoop/nfs_server_name</value>
</property>
my 0.02$ P On Thu, Oct 6, 2011 at 12:04 AM, shanmuganathan.r shanmuganatha...@zohocorp.com wrote: Hi Kai, There is no data stored in the secondary namenode related to the Hadoop cluster. Am I correct? If that is correct: if we run the secondary namenode on a separate machine, then the fetching, merging and transferring time increases when the cluster has a lot of data in the namenode fsimage file. If a failover occurs at that point, how can we recover the nearly one hour of changes to the HDFS files? (the default checkpoint interval is one hour) Thanks R.Shanmuganathan On Thu, 06 Oct 2011 12:20:28 +0530 Kai Voigt <k...@123.org> wrote: Hi, the secondary namenode only fetches the two files when a checkpointing is needed. Kai On 06.10.2011 at 08:45, shanmuganathan.r wrote: > Hi Kai, > In the second part I meant: does the secondary namenode also contain the FSImage file, or are the two files (FSImage and EditLog) transferred from the namenode at checkpoint time? > Thanks > Shanmuganathan On Thu, 06 Oct 2011 11:37:50 +0530 Kai Voigt <k...@123.org> wrote: > Hi, > you're correct when saying the namenode hosts the fsimage file and the edits log file. > The fsimage file contains a snapshot of the HDFS metadata (a filename to blocks list mapping). Whenever there is a change to HDFS, it will be appended to the edits file. Think of it as a database transaction log, where changes will not be applied to the datafile, but appended to a log. > To prevent the edits file growing infinitely, the secondary namenode periodically pulls these two files, and the namenode starts writing changes to a new edits file. Then, the secondary namenode merges the changes from the edits file with the old snapshot from the fsimage file and creates an updated fsimage file. This updated fsimage file is then copied to the namenode. > Then, the entire cycle starts again. To answer your question: the namenode has both files, even if the secondary namenode is running on a different machine. > Kai On 06.10.2011 at 07:57, shanmuganathan.r wrote: >> Hi All, >> I have a doubt about the hadoop secondary namenode concept. Please correct me if the following statements are wrong. >> The namenode hosts the fsimage and edit log files.
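Uma's suggestion to check NFS locking support with a simple file-locking program could look like the following sketch; the class name is hypothetical, and the path argument would point at a file on the NFS mount under test (e.g. somewhere under /mnt/hadoop/var/name):
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class NfsLockCheck {
    public static void main(String[] args) throws Exception {
        // Open (creating if needed) a test file on the NFS mount.
        RandomAccessFile raf = new RandomAccessFile(new File(args[0]), "rw");
        // On a mount without locking support this typically fails with an IOException.
        FileLock lock = raf.getChannel().tryLock();
        if (lock != null) {
            System.out.println("lock acquired - NFS locking appears to work");
            lock.release();
        } else {
            System.out.println("could not acquire lock - another process holds it");
        }
        raf.close();
    }
}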
Re: Secondary namenode fsimage concept
Generally you just gotta ensure that your rpc.lockd service is up and running on both ends, to allow for locking over NFS. On Tue, Oct 11, 2011 at 8:16 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Hi, It looks to me that the problem is with your NFS: it is not supporting locks. Which version of NFS are you using? Please check your NFS locking support by writing a simple program that locks a file. I think NFSv4 supports locking (I have not tried it). From http://nfs.sourceforge.net/ A6 (What are the main new features in version 4 of the NFS protocol?): NFS Versions 2 and 3 are stateless protocols, but NFS Version 4 introduces state. An NFS Version 4 client uses state to notify an NFS Version 4 server of its intentions on a file: locking, reading, writing, and so on. An NFS Version 4 server can return information to a client about what other clients have intentions on a file, to allow a client to cache file data more aggressively via delegation. To help keep state consistent, more sophisticated client and server reboot recovery mechanisms are built in to the NFS Version 4 protocol. NFS Version 4 introduces support for byte-range locking and share reservation. Locking in NFS Version 4 is lease-based, so an NFS Version 4 client must maintain contact with an NFS Version 4 server to continue extending its open and lock leases. Regards, Uma - Original Message - From: Shouguo Li the1plum...@gmail.com Date: Tuesday, October 11, 2011 2:31 am Subject: Re: Secondary namenode fsimage concept To: common-user@hadoop.apache.org hey patrick, i wanted to configure my cluster to write namenode metadata to multiple directories as well:
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/var/name,/mnt/hadoop/var/name</value>
</property>
in my case, /hadoop/var/name is a local directory and /mnt/hadoop/var/name is an NFS volume. i took down the cluster first, then copied over the files from /hadoop/var/name to /mnt/hadoop/var/name, and then tried to start up the cluster. but the cluster won't start up properly... here's the namenode log: http://pastebin.com/gmu0B7yd any ideas why it wouldn't start up? thx On Thu, Oct 6, 2011 at 6:58 PM, patrick sang silvianhad...@gmail.com wrote: I would say have your namenode write metadata to the local fs (where your secondary namenode will pull files from) and to the NFS mount:
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/name,/hadoop/nfs_server_name</value>
</property>
my 0.02$ P On Thu, Oct 6, 2011 at 12:04 AM, shanmuganathan.r shanmuganatha...@zohocorp.com wrote: Hi Kai, There is no data stored in the secondary namenode related to the Hadoop cluster. Am I correct? If that is correct: if we run the secondary namenode on a separate machine, then the fetching, merging and transferring time increases when the cluster has a lot of data in the namenode fsimage file. If a failover occurs at that point, how can we recover the nearly one hour of changes to the HDFS files? (the default checkpoint interval is one hour) Thanks R.Shanmuganathan On Thu, 06 Oct 2011 12:20:28 +0530 Kai Voigt <k...@123.org> wrote: Hi, the secondary namenode only fetches the two files when a checkpointing is needed. Kai On 06.10.2011 at 08:45, shanmuganathan.r wrote: > Hi Kai, > In the second part I meant: does the secondary namenode also contain the FSImage file, or are the two files (FSImage and EditLog) transferred from the namenode at checkpoint time? > Thanks > Shanmuganathan On Thu, 06 Oct 2011 11:37:50 +0530 Kai Voigt <k...@123.org> wrote: > Hi, > you're correct when saying the namenode hosts the fsimage file and the edits log file. > The fsimage file contains a snapshot of the HDFS metadata (a filename to blocks list mapping). Whenever there is a change to HDFS, it will be appended to the edits file. Think of it as a database transaction log, where changes will not be applied to the datafile, but appended to a log. > To prevent the edits file growing infinitely, the secondary namenode periodically pulls these two files, and the namenode starts writing changes to a new edits file. Then, the secondary namenode merges the changes from the edits file with the old snapshot from the fsimage file and creates an updated fsimage file. This updated fsimage file is then copied to the namenode. > Then, the entire cycle starts again. To answer your question: the namenode has both files, even if the secondary namenode is running on a different machine. > Kai On 06.10.2011 at 07:57, shanmuganathan.r wrote: >> Hi All, >> I have a
Re: Is it possible to run multiple MapReduce against the same HDFS?
Thanks, Robert. I will look into hod. When the MapReduce framework accesses data stored in HDFS, which account is used: the account the MapReduce daemons (e.g. the job tracker) run as, or the account of the user who submits the job? If the HDFS and MapReduce clusters are run under different accounts, will the MapReduce cluster be able to access HDFS directories and files (if authentication in HDFS is enabled)? Thanks! Gerald On Mon, Oct 10, 2011 at 12:36 PM, Robert Evans ev...@yahoo-inc.com wrote: It should be possible to use multiple map/reduce clusters sharing the same HDFS; you can look at hod, where it launches a JT on demand. The only chance of collision that I can think of would be if by some odd chance both Job Trackers were started at exactly the same millisecond. The JT uses the time it was started as part of the job id for all jobs. Those job ids are assumed to be unique and are used to create files/directories in HDFS to store data for that job. --Bobby Evans On 10/7/11 12:09 PM, Zhenhua (Gerald) Guo jen...@gmail.com wrote: I plan to deploy a HDFS cluster which will be shared by multiple MapReduce clusters. I wonder whether this is possible. Will it incur any conflicts among the MapReduce clusters (e.g. different MapReduce clusters trying to use the same temp directory in HDFS)? If it is possible, how should the security parameters be set up (e.g. user identity, file permissions)? Thanks, Gerald
Re: hadoop input buffer size
Thanks for the clarifications guys :) Mark On Mon, Oct 10, 2011 at 8:27 AM, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: I think the post below can give you more info about it. http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/ Nice explanation by Owen here. Regards, Uma - Original Message - From: Yang Xiaoliang yangxiaoliang2...@gmail.com Date: Wednesday, October 5, 2011 4:27 pm Subject: Re: hadoop input buffer size To: common-user@hadoop.apache.org Hi, Hadoop neither reads one line each time, nor fetches dfs.block.size worth of lines into a buffer. Actually, for the TextInputFormat, it reads io.file.buffer.size bytes of text into a buffer each time; this can be seen in the hadoop source file LineReader.java. 2011/10/5 Mark question markq2...@gmail.com Hello, Correct me if I'm wrong, but when a program opens n files at the same time to read from, and starts reading from each file one line at a time, isn't hadoop actually fetching dfs.block.size worth of lines into a buffer, and not actually one line? If this is correct: I set my dfs.block.size = 3MB and each line takes only about 650 bytes, so I would assume the performance for reading 1-4000 lines would be the same, but it isn't! Do you know a way to find the number of lines to be read at once? Thank you, Mark
Re: Error using hadoop distcp
Distcp will run as a mapreduce job. Here the tasktrackers require the hostname mappings to contact the other nodes. Please configure the mapping correctly on both machines and try again. Regards, Uma - Original Message - From: trang van anh anh...@vtc.vn Date: Wednesday, October 5, 2011 1:41 pm Subject: Re: Error using hadoop distcp To: common-user@hadoop.apache.org Which host runs the task that throws the exception? Ensure that each data node knows the other data nodes in the hadoop cluster - add a ub16 entry in /etc/hosts on the node where the task is running. On 10/5/2011 12:15 PM, praveenesh kumar wrote: I am trying to use distcp to copy a file from one HDFS to another. But while copying I am getting the following exception: hadoop distcp hdfs://ub13:54310/user/hadoop/weblog hdfs://ub16:54310/user/hadoop/weblog
11/10/05 10:41:01 INFO mapred.JobClient: Task Id : attempt_201110031447_0005_m_07_0, Status : FAILED
java.net.UnknownHostException: unknown host: ub16
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
    at org.apache.hadoop.ipc.Client.call(Client.java:720)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:215)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:48)
    at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:124)
    at org.apache.hadoop.mapred.Task.runJobSetupTask(Task.java:835)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:296)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
It's saying it's not finding ub16, but the entry is there in the /etc/hosts files. I am able to ssh into both machines. Do I need passwordless ssh between these two NNs? What can be the issue? Anything I am missing before using distcp? Thanks, Praveenesh
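A sketch of the /etc/hosts mapping being suggested, which would need to be present on every node of both clusters; the addresses below are placeholders for the real IPs of ub13 and ub16:
192.168.0.13    ub13
192.168.0.16    ub16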