detecting stalled daemons?
Quick question for the hadoop / linux masters out there: I recently observed a stalled tasktracker daemon on our production cluster, and was wondering if there were common tests to detect failures so that administration tools (e.g. monit) can automatically restart the daemon. The particular observed symptoms were:
- the node was dropped by the jobtracker
- information in /proc listed the tasktracker process as sleeping, not zombie
- the web interface (port 50060) was unresponsive, though telnet did connect
- no error information in the hadoop logs -- they simply were no longer being updated
I certainly cannot be the first person to encounter this - anyone have a neat and tidy solution they could share? (And yes, we will eventually go down the nagios / ganglia / cloudera desktop path, but we're waiting until we're running CDH2.) Many thanks, -James Warren
hadoop startup problem
Hello everyone, I have a problem with Hadoop startup. Every time I try to start Hadoop, the namenode does not start, and when I try to stop the namenode it gives an error: no namenode to stop. I tried formatting the namenode and it works well, but now I have data in Hadoop, and formatting the namenode will erase all of it. What can I do? Thanks in advance, asmaa
Error: INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 0 time(s).
Hi, I am new to Hadoop. I just configured it based on the documentation. While running the example program wordcount.java, I am getting errors. When I give the command $ /bin/hadoop dfs -mkdir santhosh , I get the following:
09/10/08 13:30:12 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 0 time(s).
09/10/08 13:30:13 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 1 time(s).
09/10/08 13:30:14 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 2 time(s).
09/10/08 13:30:15 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 3 time(s).
09/10/08 13:30:16 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 4 time(s).
09/10/08 13:30:17 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 5 time(s).
09/10/08 13:30:18 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 6 time(s).
09/10/08 13:30:19 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 7 time(s).
09/10/08 13:30:20 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 8 time(s).
09/10/08 13:30:21 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 9 time(s).
Bad connection to FS. command aborted.
I am able to ssh to the server without any password, and I didn't get any errors while formatting HDFS with the command $ bin/hadoop namenode -format . Please help me - what should I do now? Thank you. -- Gandham Santhosh
Recommended file-system for DataNode
Hi. I'm using the stock Ext3 as the most tested one, but I wonder: has someone ever tried, or is even using in production these days, another file system, like JFS, XFS or maybe even Ext4? I'm exploring ways to boost the performance of DataNodes, and this seems like one of the possible avenues. Thanks for any info!
Re: hadoop startup problem
It sounds like the name node is crashing on startup. What kind of errors are there in the name node log?
On Thu, Oct 8, 2009 at 4:01 AM, asmaa.atef sw_as...@hotmail.com wrote: Every time I try to start Hadoop, the namenode does not start ... I tried formatting the namenode and it works well, but now I have data in Hadoop, and formatting the namenode will erase all of it.
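To find those errors, check the namenode's log under the Hadoop log directory. A generic sketch, assuming the stock log file naming (adjust the path for your install):

tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log
grep -i 'ERROR\|FATAL' $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail -n 20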
Re: Recommended file-system for DataNode
I have used xfs pretty extensively, and it seemed to be somewhat faster than ext3. The only trouble we had related to some machines running the PAE 32-bit kernels, where the filesystems would lock up. That is an obscure use case, however. Running JBOD with your dfs.data.dir listing a directory on each device speeds things up, as does keeping other users off of the disks/machine.
On Thu, Oct 8, 2009 at 4:37 AM, Stas Oskin stas.os...@gmail.com wrote: I'm using the stock Ext3 as the most tested one, but I wonder: has someone ever tried another file system, like JFS, XFS or maybe even Ext4? ...
-- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
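For anyone setting up the JBOD layout Jason describes, it just means one dfs.data.dir entry per physical disk. A minimal hdfs-site.xml sketch (the /data/N mount points are hypothetical):

<property>
  <name>dfs.data.dir</name>
  <!-- one directory per physical disk; hypothetical mount points -->
  <value>/data/1/dfs/data,/data/2/dfs/data,/data/3/dfs/data</value>
</property>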
Re: Recommended file-system for DataNode
Hi. Thanks for the info; the question is whether XFS performance justifies switching from the more common Ext3. JBOD is a great approach indeed. Regards.
2009/10/8 Jason Venner jason.had...@gmail.com I have used xfs pretty extensively, and it seemed to be somewhat faster than ext3 ... Running JBOD with your dfs.data.dir listing a directory on each device speeds things up, as does keeping other users off of the disks/machine.
Re: Recommended file-system for DataNode
As an aside, there's a short article comparing the two in the latest edition of Linux Journal. It was hardly scientific, but the main points were:
- XFS is faster than ext3, especially for large files
- XFS is currently unsupported on Red Hat Enterprise, but apparently will be soon.
On Thu, Oct 8, 2009 at 12:12 PM, Stas Oskin stas.os...@gmail.com wrote: Thanks for the info; the question is whether XFS performance justifies switching from the more common Ext3.
-- Tom Wheeler http://www.tomwheeler.com/
Re: Recommended file-system for DataNode
Busy datanodes become bound by the metadata lookup times for the directory and inode entries required to open a block. Anything that optimizes that will help substantially. We are thinking of playing with btrfs, using a small SSD for our file system metadata and the spinning disks for the block storage.
On Thu, Oct 8, 2009 at 10:26 AM, Tom Wheeler tomwh...@gmail.com wrote: As an aside, there's a short article comparing the two in the latest edition of Linux Journal ... XFS is faster than ext3, especially for large files.
Re: Recommended file-system for DataNode
Check out the bottom of this page: http://wiki.apache.org/hadoop/DiskSetup noatime is all we've done in our environment. I haven't found it worth the time to optimize further, since we're CPU bound in most of our jobs. -paul
On Thu, Oct 8, 2009 at 3:26 PM, Stas Oskin stas.os...@gmail.com wrote: What about JFS, any idea how well it compares to XFS? From what I read, JFS is considered more stable than XFS but less performant, so I wonder if this is true. Also, Ext4 is around the corner and was recently accepted into the kernel, so I wonder if anyone knows about this one.
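For reference, noatime is set per mount in /etc/fstab, as the DiskSetup page above describes. A hypothetical entry for a single data disk (device, mount point, and filesystem are examples only):

# hypothetical data disk; the mount options are the relevant part
/dev/sdb1  /data/1  ext3  defaults,noatime,nodiratime  0  2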
Re: Recommended file-system for DataNode
I've used XFS on Silicon Graphics machines and JFS on AIX systems -- both were quite fast and extremely reliable, though this long predates my use of Hadoop. To your question, I recently came across a blog that compares performance of several Linux filesystems: http://log.amitshah.net/2009/04/re-comparing-file-systems.html I'd consider his results anecdotal unless the tests reflect the actual workload of a datanode, but since he's made the code available, you could probably adapt it yourself to get a better measure.
On Thu, Oct 8, 2009 at 2:26 PM, Stas Oskin stas.os...@gmail.com wrote: What about JFS, any idea how well it compares to XFS? ...
-- Tom Wheeler http://www.tomwheeler.com/
Re: Recommended file-system for DataNode
On Thu, Oct 8, 2009 at 4:00 PM, Jason Venner jason.had...@gmail.com wrote: noatime is absolutely essential; I forgot to mention it because it is automatic now for me. I have a fun story about atime: I have some Solaris machines with ZFS file systems, and I was doing a find on a 6-level hashed directory tree with 25 leaf nodes. The find on a cold, idle file system was running slowly, and the machine was writing at 5-10MB/sec. Solaris lets you toggle atime at runtime; when I turned it off, the writes went to 0 and the find sped up drastically. This is very representative of a datanode with many blocks.
The good news is that you are not stuck with the file system you pick. Assuming you use the normal replication level of 3, you can pull out a datanode, format its disks with any FS you want, and then stick it back into the cluster. Hadoop should not care, after all. Not suggesting this...but you could theoretically run each node with a different file system, look at the performance, and say THIS is the one for me.
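A sketch of the node swap Edward describes, using the stock decommission mechanism (hostnames and paths are hypothetical, and dfs.hosts.exclude must already point at the exclude file):

echo "datanode1.example.com" >> conf/dfs.exclude   # hypothetical hostname
bin/hadoop dfsadmin -refreshNodes                  # wait for decommission to finish
bin/hadoop-daemon.sh stop datanode                 # on the datanode itself
mkfs.xfs /dev/sdb1 && mount /data/1                # reformat and remount each data disk
# remove the host from conf/dfs.exclude, then:
bin/hadoop dfsadmin -refreshNodes
bin/hadoop-daemon.sh start datanode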
University of Maryland: cloud computing assistant professor position
FYI---the University of Maryland is seeking an assistant professor in cloud computing. See job description below.

College of Information Studies, Maryland's iSchool
University of Maryland, College Park
Assistant Professor in Cloud Computing

The recently-formed Cloud Computing Center in the College of Information Studies (Maryland's iSchool) at the University of Maryland, College Park invites highly qualified individuals to apply for a tenure-track faculty position in the area of cloud computing applications. Expertise is sought in information processing systems, architectures, and applications. Areas of interest include, but are not limited to, information retrieval, natural language processing, machine learning, databases, data mining, data analytics, network analysis, and multimedia processing. An emphasis on scalability and parallel/distributed approaches to processing is especially desirable, as is expertise in working with Web and other large datasets.

The Cloud Computing Center (http://ccc.umiacs.umd.edu/) explores the many aspects of cloud computing as it impacts technology, people, and society. It is an interdisciplinary research center in the iSchool that draws faculty from computer science, engineering, business, and linguistics.

Qualifications. Ph.D. in an appropriate field; demonstrated record of research accomplishments or demonstrated potential for research excellence; demonstrated record of attracting and managing outside funding or demonstrated potential for attracting and managing outside funding; and evidence of effective and innovative teaching or demonstrated potential for excellence in teaching.

Application Submission. Applicants will be reviewed on an ongoing basis. For best consideration, submit applications prior to November 15, 2009. Send application materials, including a CV, a research statement, a teaching statement, and a cover letter that clearly describes your primary area(s) of expertise and the specific contributions that you would make to the College and the Cloud Computing Center, by email to ischoolcloudsea...@umd.edu. All applications will be acknowledged electronically within 3 business days; however, it is the responsibility of the applicant to make sure that the materials were received. Inquiries can be directed to Jimmy Lin, search committee chair, at jimmy...@umd.edu.

The University of Maryland is an equal opportunity, affirmative action employer. Minorities and women are encouraged to apply.
retrieving sequenceFile Position of Key in mapper task
Hi, I need to get the position of the key being processed in a mapper task. My input file is a sequence file. I tried the Context, but the best I could get was the input split position and the file name. My other option is to start recording the position in the key value while generating the sequence file, but that would mean rewriting all the files I already have :( Any thoughts? ishwar
Re: Recommended file-system for DataNode
Hi Jason. Btrfs is cool; I read that it has 10% better performance than any other FS coming next to it. Can you post the results of any of your findings here? Regards.
2009/10/8 Jason Venner jason.had...@gmail.com Busy datanodes become bound by the metadata lookup times for the directory and inode entries required to open a block. Anything that optimizes that will help substantially. We are thinking of playing with btrfs, using a small SSD for our file system metadata and the spinning disks for the block storage.
Re: Recommended file-system for DataNode
On Thu, Oct 8, 2009 at 9:15 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. I heard about this option before, but never actually tried it. There is also another option, called relatime, which is described as being more compatible than noatime. Can anyone comment on this? Regards.
Relatime is like a noatime that gets updated only periodically, not on every read. Hadoop does not use atime, so there is no benefit to relatime. Go with 'noatime nodiratime', although I think nodiratime is a subset of noatime. FYI, almost nothing really uses atime; I heard mutt does, and older versions of vim might have. But I turned it off and never had an issue.
Re: detecting stalled daemons?
Hi James, This doesn't quite answer your original question, but if you want to help track down these kinds of bugs, you should grab a stack trace next time this happens. You can do this using jstack from the command line, by visiting /stacks on the HTTP interface, or by sending the process a SIGQUIT (kill -QUIT pid). If you go the SIGQUIT route, the stack dump will show up in that daemon's stdout log (logs/hadoop-out). Oftentimes the stack trace will be enough for the developers to track down a deadlock, or it may point to some sort of configuration issue on your machine. -Todd
On Wed, Oct 7, 2009 at 11:19 PM, james warren ja...@rockyou.com wrote: I recently observed a stalled tasktracker daemon on our production cluster, and was wondering if there were common tests to detect failures so that administration tools (e.g. monit) can automatically restart the daemon ...
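A quick sketch of capturing that trace from a stalled tasktracker (the jps-based pid lookup is just one of several ways to find the process):

TT_PID=$(jps | awk '/TaskTracker/ {print $1}')   # find the tasktracker JVM
jstack $TT_PID > /tmp/tasktracker-stack.txt      # dump threads to a file
kill -QUIT $TT_PID                               # or dump into the daemon's stdout log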
Re: detecting stalled daemons?
On Thu, Oct 8, 2009 at 9:20 PM, Todd Lipcon t...@cloudera.com wrote: This doesn't quite answer your original question, but if you want to help track down these kinds of bugs, you should grab a stack trace next time this happens ...
James, I am using nagios to run a web_check on each of the components' web interfaces. http://www.jointhegrid.com/svn/hadoop-cacti-jtg/trunk/check_scripts/0_19/ I know there is a Jira open to add lifecycle methods to each Hadoop component that can be polled for progress. I don't know the number offhand. Edward
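Along the same lines, a minimal monit sketch that restarts a tasktracker whose web port stops answering; the pidfile path and daemon-script locations are assumptions for illustration:

check process tasktracker with pidfile /var/run/hadoop/hadoop-tasktracker.pid
  start program = "/usr/lib/hadoop/bin/hadoop-daemon.sh start tasktracker"
  stop program = "/usr/lib/hadoop/bin/hadoop-daemon.sh stop tasktracker"
  # mirrors the observed failure mode: port 50060 accepts connects but HTTP hangs
  if failed port 50060 protocol http for 2 cycles then restart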
Re: Error: INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 0 time(s).
Hi Santosh, Check whether all the datanodes are up and running, using the command 'bin/hadoop dfsadmin -report'.
On Thu, Oct 8, 2009 at 4:24 AM, santosh gandham santhosh...@gmail.com wrote: I am new to Hadoop. I just configured it based on the documentation. When I gave the command $ /bin/hadoop dfs -mkdir santhosh , I got: Retrying connect to server: /192.168.100.11:8020. Already tried 0 time(s). ... Bad connection to FS. command aborted.
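Since the client here cannot reach the namenode at all, dfsadmin -report will likely fail the same way, so it is also worth confirming on the master that the namenode actually started and is listening on port 8020. A generic sketch (the log path follows the stock naming; adjust for your install):

jps                                       # should list NameNode
netstat -tlnp | grep 8020                 # is the RPC port bound, and on which address?
tail -n 50 logs/hadoop-*-namenode-*.log   # look for bind or storage-directory errors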
Re: retrieving sequenceFile Position of Key in mapper task
Hi Ishwar, You can implement a custom MapRunner and retrieve the position from the reader before calling your map function. Be aware, though, that for block-compressed files the position returned represents the block start position, not the individual record position. Ahad.
On Thu, Oct 8, 2009 at 4:23 PM, ishwar ramani rvmish...@gmail.com wrote: I need to get the position of the key being processed in a mapper task. My input file is a sequence file ...
Re: retrieving sequenceFile Position of Key in mapper task
Oops, memory fails me. To correct my previous statement: for block-compressed files, getPosition reflects the position in the input stream of the NEXT compressed block of data, so you have to watch for the change in position after reading the key/value to capture a block transition. Ahad.
On Thu, Oct 8, 2009 at 10:22 PM, Ahad Rana a...@commoncrawl.org wrote: You can implement a custom MapRunner and retrieve the position from the reader before calling your map function ...
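Putting Ahad's two messages together, a rough sketch of the MapRunner approach against the old mapred API (the Text key/value types, the class name, and the block-transition handling are illustrative assumptions, not tested code):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapRunnable;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class PositionMapRunner implements MapRunnable<Text, Text, Text, LongWritable> {

  public void configure(JobConf job) { }

  public void run(RecordReader<Text, Text> input,
                  OutputCollector<Text, LongWritable> output,
                  Reporter reporter) throws IOException {
    Text key = input.createKey();
    Text value = input.createValue();
    long pos = input.getPos(); // stream position before reading the first record
    while (input.next(key, value)) {
      // For block-compressed SequenceFiles, getPos() points at the NEXT
      // compressed block, so it only changes at block boundaries; a change
      // relative to the previous value signals a block transition.
      output.collect(key, new LongWritable(pos));
      pos = input.getPos();
    }
  }
}

Enable it with job.setMapRunnerClass(PositionMapRunner.class) in the JobConf.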