Re: Can not start task tracker because java.lang.NullPointerException

2009-05-22 Thread Aaron Kimball
It hasn't been cut as a release, but there's a 19.2 development branch at
branches/branch-19
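A rough sketch of checking that branch out, assuming the usual hadoop/core
Subversion layout (list the branches first, since the directory on the server
may be named branch-0.19 rather than branch-19):

# confirm the exact branch directory name
svn ls http://svn.apache.org/repos/asf/hadoop/core/branches/
# then check out the 0.19 line
svn checkout http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 hadoop-branch-0.19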
- Aaron

On Fri, May 22, 2009 at 2:31 PM, Lance Riedel  wrote:

> Sure, I'll try out 19.2.. but where is it?  I don't see it here:
> http://svn.apache.org/repos/asf/hadoop/core/
> (looking under tags)
>
>
>
> On Fri, May 22, 2009 at 2:11 PM, Todd Lipcon  wrote:
>
> > Hi Lance,
> >
> > It's possible this is related to the other JIRA (HADOOP-5761). If it's
> not
> > too much trouble to try out the 19.2 branch from SVN, it would be helpful
> > in
> > determining whether this is a problem that's already fixed or if you've
> > discovered something new.
> >
> > Thanks
> > -Todd
> >
> > On Fri, May 22, 2009 at 2:01 PM, Lance Riedel 
> wrote:
> >
> > > Hi Todd,
> > > We had looked at that before.. here is the location of the tmp
> directory:
> > >
> > > [dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh
> > > /dist/app/hadoop-0.19.1/tmp
> > > 248G    /dist/app/hadoop-0.19.1/tmp
> > >
> > > There are no cron jobs that would have anything to do with that
> > directory.
> > >
> > > Here is the /tmp
> > > [dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh /tmp
> > > 204K    /tmp
> > >
> > > Does this look like a disk error? I had seen that the
> > > "org.apache.hadoop.util.DiskChecker$DiskErrorException" is bogus.
> > >
> > > Thanks!
> > > Lance
> > >
> > >
> > >
> > >
> > >
> > > On Fri, May 22, 2009 at 9:33 AM, Lance Riedel 
> > wrote:
> > >
> > > > Version 19.1 with patches:
> > > > 4780-2v19.patch (Jira  4780)
> > > > closeAll3.patch (Jira 3998)
> > > > I have confirmed that
> > > https://issues.apache.org/jira/browse/HADOOP-4924 patch is in, so that
> is
> > > not the fix.
> > > >
> > > >
> > > > We are having task trackers die every night with a null pointer
> > > exception.
> > > > Usually 2 or so out of 8 (25% each night).
> > > >
> > > >
> > > > Here are the logs:
> > > >
> > > > Version 19.1 with
> > > > 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > Received 'KillJobAction' for job: job_200905211749_0451
> > > > 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskRunner:
> > > > attempt_200905211749_0451_m_00_0 done; removing files.
> > > > 2009-05-22 02:46:54,911 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > > >
> > >
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > > > in any of the configured local directories
> > > > 2009-05-22 02:47:13,968 INFO org.apache.hadoop.mapred.TaskTracker:
> > > Received
> > > > 'KillJobAction' for job: job_200905211749_0444
> > > > 2009-05-22 02:47:13,969 INFO org.apache.hadoop.mapred.TaskRunner:
> > > > attempt_200905211749_0444_m_00_0 done; removing files.
> > > > 2009-05-22 02:47:18,968 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > > >
> > >
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > > > in any of the configured local directories
> > > > 2009-05-22 02:48:52,324 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > > >
> > >
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > > > in any of the configured local directories
> > > > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > LaunchTaskAction (registerTask): attempt_200905211749_0452_m_06_0
> > > task's
> > > > state:UNASSIGNED
> > > > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker:
> > Trying
> > > > to launch : attempt_200905211749_0452_m_06_0
> > > > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: In
> > > > TaskLauncher, current free slots : 4 and trying to launch
> > > > attempt_200905211749_0452_m_06_0
> > > > 2009-05-22 02:49:15,274 INFO org.apache.hadoop.mapred.JvmManager: JVM
> > > > Runner jvm_200905211749_0452_m_1998728288 spawned.
> > > > 2009-05-22 02:49:15,765 INFO org.apache.hadoop.mapred.TaskTracker:
> JVM
> > > with
> > > > ID: jvm_200905211749_0452_m_1998728288 given task:
> > > > attempt_200905211749_0452_m_06_0
> > > > 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > > >
> > >
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > > > in any of the configured local directories
> > > > 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > > >
> > >
> >
> taskTracker/jobcache/job_200905211749_0452/attempt_200905211749_0452_m_06_0/output/file.out
> > > > in any of the configured local directories
> > > > 2009-05-22 02:49:19,784 INFO org.apache.h

Re: Tutorial on building an AMI

2009-05-22 Thread Tom White
Hi Saptarshi,

You can use the guide at http://wiki.apache.org/hadoop/AmazonEC2 to
run Hadoop 0.19 or later on EC2. It includes instructions for building
your own customized AMI.
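Roughly, the workflow with the src/contrib/ec2 scripts that ship with 0.19
looks like the sketch below; treat the subcommand names and the
hadoop-ec2-env.sh settings as assumptions to verify against your checkout and
the wiki page above rather than as an exact recipe.

# fill in AWS credentials, key pair, S3 bucket and HADOOP_VERSION first
vi src/contrib/ec2/bin/hadoop-ec2-env.sh
# build and register a custom AMI with the Hadoop version you configured
src/contrib/ec2/bin/hadoop-ec2 create-image
# bring up a cluster from that AMI and log in to the master
src/contrib/ec2/bin/hadoop-ec2 launch-cluster my-cluster 2
src/contrib/ec2/bin/hadoop-ec2 login my-cluster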

Cheers,
Tom

On Fri, May 22, 2009 at 7:11 PM, Saptarshi Guha
 wrote:
> Hello,
> Is there a tutorial available to build a Hadoop AMI (like
> Cloudera's)? Cloudera has an 18.2 AMI and, for reasons I understand,
> they can't provide (as of now) AMIs for higher Hadoop versions until
> they become stable.
> I would like to create an AMI for 19.2, so I was hoping there is a
> guide for building one.
>
>
> Thank you
>
> Saptarshi Guha
>


Re: input/output error while setting up superblock

2009-05-22 Thread Amr Awadallah

You should take a look at GlusterFS; it should be a good fit for this
kind of workload:

http://www.gluster.org/

-- amr

Aaron Kimball wrote:
> More specifically:
>
> HDFS does not support operations such as opening a file for write/append
> after it has already been closed, or seeking to a new location in a writer.
> You can only write files linearly; all other operations will return a "not
> supported" error.
>
> You'll also find that random-access read performance, while implemented, is
> not particularly high-throughput. For serving Xen images even in read-only
> mode, you'll likely have much better luck with a different FS.
>
> - Aaron
>
>
> 2009/5/22 Taeho Kang 
>
>   
>> I don't think HDFS is a good place to store your Xen image file as it will
>> likely be updated/appended frequently in small blocks. Given the way HDFS is
>> designed, you can't quite use it like a regular filesystem (e.g. ones
>> that support frequent small block appends/updates in files). My suggestion
>> is to use another filesystem like NAS or SAN.
>>
>> /Taeho
>>
>> 2009/5/22 신승엽 
>>
>> 
>>> Hi, I have a problem to use hdfs.
>>>
>>> I mounted hdfs using fuse-dfs.
>>>
>>> I created a dummy file for 'Xen' in hdfs and then formated the dummy file
>>> using 'mke2fs'.
>>>
> >>> But the operation failed with an error. The error message is as follows.
>>>
>>> [r...@localhost hdfs]# mke2fs -j -F ./file_dumy
>>> mke2fs 1.40.2 (12-Jul-2007)
>>> ./file_dumy: Input/output error while setting up superblock
> >>> Also, I copied an image file of Xen to hdfs. But Xen couldn't use the
> >>> image files in hdfs.
>>>
>>> r...@localhost hdfs]# fdisk -l fedora6_demo.img
>>> last_lba(): I don't know how to handle files with mode 81a4
>>> You must set cylinders.
>>> You can do this from the extra functions menu.
>>>
>>> Disk fedora6_demo.img: 0 MB, 0 bytes
>>> 255 heads, 63 sectors/track, 0 cylinders
>>> Units = cylinders of 16065 * 512 = 8225280 bytes
>>>
>>>   Device Boot  Start End  Blocks   Id  System
>>> fedora6_demo.img1   *   1 156 1253038+  83  Linux
>>>
>>> Could you answer me anything about this problem.
>>>
>>> Thank you.
>>>
>>>   
>
>   


Re: Can not start task tracker because java.lang.NullPointerException

2009-05-22 Thread Lance Riedel
Sure, I'll try out 19.2.. but where is it?  I don't see it here:
http://svn.apache.org/repos/asf/hadoop/core/
(looking under tags)



On Fri, May 22, 2009 at 2:11 PM, Todd Lipcon  wrote:

> Hi Lance,
>
> It's possible this is related to the other JIRA (HADOOP-5761). If it's not
> too much trouble to try out the 19.2 branch from SVN, it would be helpful
> in
> determining whether this is a problem that's already fixed or if you've
> discovered something new.
>
> Thanks
> -Todd
>
> On Fri, May 22, 2009 at 2:01 PM, Lance Riedel  wrote:
>
> > Hi Todd,
> > We had looked at that before.. here is the location of the tmp directory:
> >
> > [dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh
> > /dist/app/hadoop-0.19.1/tmp
> > 248G    /dist/app/hadoop-0.19.1/tmp
> >
> > There are no cron jobs that would have anything to do with that
> directory.
> >
> > Here is the /tmp
> > [dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh /tmp
> > 204K    /tmp
> >
> > Does this look like a disk error? I had seen that the
> > "org.apache.hadoop.util.DiskChecker$DiskErrorException" is bogus.
> >
> > Thanks!
> > Lance
> >
> >
> >
> >
> >
> > On Fri, May 22, 2009 at 9:33 AM, Lance Riedel 
> wrote:
> >
> > > Version 19.1 with patches:
> > > 4780-2v19.patch (Jira  4780)
> > > closeAll3.patch (Jira 3998)
> > > I have confirmed that
> > https://issues.apache.org/jira/browse/HADOOP-4924 patch is in, so that is
> > not the fix.
> > >
> > >
> > > We are having task trackers die every night with a null pointer
> > exception.
> > > Usually 2 or so out of 8 (25% each night).
> > >
> > >
> > > Here are the logs:
> > >
> > > Version 19.1 with
> > > 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskTracker:
> > > Received 'KillJobAction' for job: job_200905211749_0451
> > > 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskRunner:
> > > attempt_200905211749_0451_m_00_0 done; removing files.
> > > 2009-05-22 02:46:54,911 INFO org.apache.hadoop.mapred.TaskTracker:
> > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > >
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > > in any of the configured local directories
> > > 2009-05-22 02:47:13,968 INFO org.apache.hadoop.mapred.TaskTracker:
> > Received
> > > 'KillJobAction' for job: job_200905211749_0444
> > > 2009-05-22 02:47:13,969 INFO org.apache.hadoop.mapred.TaskRunner:
> > > attempt_200905211749_0444_m_00_0 done; removing files.
> > > 2009-05-22 02:47:18,968 INFO org.apache.hadoop.mapred.TaskTracker:
> > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > >
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > > in any of the configured local directories
> > > 2009-05-22 02:48:52,324 INFO org.apache.hadoop.mapred.TaskTracker:
> > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > >
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > > in any of the configured local directories
> > > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker:
> > > LaunchTaskAction (registerTask): attempt_200905211749_0452_m_06_0
> > task's
> > > state:UNASSIGNED
> > > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker:
> Trying
> > > to launch : attempt_200905211749_0452_m_06_0
> > > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: In
> > > TaskLauncher, current free slots : 4 and trying to launch
> > > attempt_200905211749_0452_m_06_0
> > > 2009-05-22 02:49:15,274 INFO org.apache.hadoop.mapred.JvmManager: JVM
> > > Runner jvm_200905211749_0452_m_1998728288 spawned.
> > > 2009-05-22 02:49:15,765 INFO org.apache.hadoop.mapred.TaskTracker: JVM
> > with
> > > ID: jvm_200905211749_0452_m_1998728288 given task:
> > > attempt_200905211749_0452_m_06_0
> > > 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > >
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > > in any of the configured local directories
> > > 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> > >
> >
> taskTracker/jobcache/job_200905211749_0452/attempt_200905211749_0452_m_06_0/output/file.out
> > > in any of the configured local directories
> > > 2009-05-22 02:49:19,784 INFO org.apache.hadoop.mapred.TaskTracker:
> > > attempt_200905211749_0452_m_06_0 1.0% hdfs://
> > >
> >
> ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h#3e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> > <
> >
> http://ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h%233e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> > >
> > > 2009-0

Re: Can not start task tracker because java.lang.NullPointerException

2009-05-22 Thread Todd Lipcon
Hi Lance,

It's possible this is related to the other JIRA (HADOOP-5761). If it's not
too much trouble to try out the 19.2 branch from SVN, it would be helpful in
determining whether this is a problem that's already fixed or if you've
discovered something new.

Thanks
-Todd

On Fri, May 22, 2009 at 2:01 PM, Lance Riedel  wrote:

> Hi Todd,
> We had looked at that before.. here is the location of the tmp directory:
>
> [dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh
> /dist/app/hadoop-0.19.1/tmp
> 248G    /dist/app/hadoop-0.19.1/tmp
>
> There are no cron jobs that would have anything to do with that directory.
>
> Here is the /tmp
> [dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh /tmp
> 204K    /tmp
>
> Does this look like a disk error? I had seen that the
> "org.apache.hadoop.util.DiskChecker$DiskErrorException" is bogus.
>
> Thanks!
> Lance
>
>
>
>
>
> On Fri, May 22, 2009 at 9:33 AM, Lance Riedel  wrote:
>
> > Version 19.1 with patches:
> > 4780-2v19.patch (Jira  4780)
> > closeAll3.patch (Jira 3998)
> > I have confirmed that
> https://issues.apache.org/jira/browse/HADOOP-4924 patch is in, so that is
> not the fix.
> >
> >
> > We are having task trackers die every night with a null pointer
> exception.
> > Usually 2 or so out of 8 (25% each night).
> >
> >
> > Here are the logs:
> >
> > Version 19.1 with
> > 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskTracker:
> > Received 'KillJobAction' for job: job_200905211749_0451
> > 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskRunner:
> > attempt_200905211749_0451_m_00_0 done; removing files.
> > 2009-05-22 02:46:54,911 INFO org.apache.hadoop.mapred.TaskTracker:
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > in any of the configured local directories
> > 2009-05-22 02:47:13,968 INFO org.apache.hadoop.mapred.TaskTracker:
> Received
> > 'KillJobAction' for job: job_200905211749_0444
> > 2009-05-22 02:47:13,969 INFO org.apache.hadoop.mapred.TaskRunner:
> > attempt_200905211749_0444_m_00_0 done; removing files.
> > 2009-05-22 02:47:18,968 INFO org.apache.hadoop.mapred.TaskTracker:
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > in any of the configured local directories
> > 2009-05-22 02:48:52,324 INFO org.apache.hadoop.mapred.TaskTracker:
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > in any of the configured local directories
> > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker:
> > LaunchTaskAction (registerTask): attempt_200905211749_0452_m_06_0
> task's
> > state:UNASSIGNED
> > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> > to launch : attempt_200905211749_0452_m_06_0
> > 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: In
> > TaskLauncher, current free slots : 4 and trying to launch
> > attempt_200905211749_0452_m_06_0
> > 2009-05-22 02:49:15,274 INFO org.apache.hadoop.mapred.JvmManager: JVM
> > Runner jvm_200905211749_0452_m_1998728288 spawned.
> > 2009-05-22 02:49:15,765 INFO org.apache.hadoop.mapred.TaskTracker: JVM
> with
> > ID: jvm_200905211749_0452_m_1998728288 given task:
> > attempt_200905211749_0452_m_06_0
> > 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> >
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> > in any of the configured local directories
> > 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> >
> taskTracker/jobcache/job_200905211749_0452/attempt_200905211749_0452_m_06_0/output/file.out
> > in any of the configured local directories
> > 2009-05-22 02:49:19,784 INFO org.apache.hadoop.mapred.TaskTracker:
> > attempt_200905211749_0452_m_06_0 1.0% hdfs://
> >
> ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h#3e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> <
> http://ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h%233e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> >
> > 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: Task
> > attempt_200905211749_0452_m_06_0 is done.
> > 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker:
> reported
> > output size for attempt_200905211749_0452_m_06_0  was 0
> > 2009-05-22 02:49:19,787 INFO org.apache.hadoop.mapred.TaskTracker:
> > addFreeSlot : current free slots : 4
> > 2009-05-22 02:49:19,954 INFO org.apache.hadoop.m

Re: Can not start task tracker because java.lang.NullPointerException

2009-05-22 Thread Lance Riedel
Hi Todd,
We had looked at that before.. here is the location of the tmp directory:

[dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh
/dist/app/hadoop-0.19.1/tmp
248G    /dist/app/hadoop-0.19.1/tmp

There are no cron jobs that would have anything to do with that directory.

Here is the /tmp
[dotsp...@domu-12-31-38-00-80-21 hadoop-0.19.1]$ du -sh /tmp
204K    /tmp

Does this look like a disk error? I had seen that the
"org.apache.hadoop.util.DiskChecker$DiskErrorException" is bogus.

Thanks!
Lance





On Fri, May 22, 2009 at 9:33 AM, Lance Riedel  wrote:

> Version 19.1 with patches:
> 4780-2v19.patch (Jira  4780)
> closeAll3.patch (Jira 3998)
> I have confirmed that https://issues.apache.org/jira/browse/HADOOP-4924 patch
> is in, so that is not the fix.
>
>
> We are having task trackers die every night with a null pointer exception.
> Usually 2 or so out of 8 (25% each night).
>
>
> Here are the logs:
>
> Version 19.1 with
> 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskTracker:
> Received 'KillJobAction' for job: job_200905211749_0451
> 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_200905211749_0451_m_00_0 done; removing files.
> 2009-05-22 02:46:54,911 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:47:13,968 INFO org.apache.hadoop.mapred.TaskTracker: Received
> 'KillJobAction' for job: job_200905211749_0444
> 2009-05-22 02:47:13,969 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_200905211749_0444_m_00_0 done; removing files.
> 2009-05-22 02:47:18,968 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:48:52,324 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_200905211749_0452_m_06_0 task's
> state:UNASSIGNED
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_200905211749_0452_m_06_0
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 4 and trying to launch
> attempt_200905211749_0452_m_06_0
> 2009-05-22 02:49:15,274 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner jvm_200905211749_0452_m_1998728288 spawned.
> 2009-05-22 02:49:15,765 INFO org.apache.hadoop.mapred.TaskTracker: JVM with
> ID: jvm_200905211749_0452_m_1998728288 given task:
> attempt_200905211749_0452_m_06_0
> 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200905211749_0452/attempt_200905211749_0452_m_06_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:49:19,784 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200905211749_0452_m_06_0 1.0% hdfs://
> ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h#3e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: Task
> attempt_200905211749_0452_m_06_0 is done.
> 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: reported
> output size for attempt_200905211749_0452_m_06_0  was 0
> 2009-05-22 02:49:19,787 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 4
> 2009-05-22 02:49:19,954 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_200905211749_0452_m_1998728288 exited. Number of tasks it ran: 1
> 2009-05-22 02:59:19,297 INFO org.apache.hadoop.mapred.TaskTracker: Recieved
> RenitTrackerAction from JobTracker
> 2009-05-22 02:59:19,298 ERROR org.apache.hadoop.mapred.TaskTracker: Can not
> start task tracker because java.lang.NullPointerException
> at
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2300)
> at
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskT

Re: Can not start task tracker because java.lang.NullPointerException

2009-05-22 Thread Todd Lipcon
Hi Lance,

Is it possible that your mapred.local.dir is in /tmp and you have a cron job
that cleans it up at night (default on many systems)?
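A quick way to check both halves of that on a tasktracker node might look
like the sketch below (the tmpwatch path is an assumption; the cron cleaner,
if any, varies by distro):

# where do the local dirs point? mapred.local.dir defaults under hadoop.tmp.dir,
# which itself falls back to /tmp/hadoop-${user.name} if hadoop-site.xml is silent
grep -A1 -E "hadoop.tmp.dir|mapred.local.dir" conf/hadoop-site.xml
# is a nightly tmp cleaner in place that could be deleting jobcache files?
ls /etc/cron.daily/
cat /etc/cron.daily/tmpwatch 2>/dev/null
crontab -l | grep -i tmp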

Thanks
-Todd

On Fri, May 22, 2009 at 9:33 AM, Lance Riedel  wrote:

> Version 19.1 with patches:
> 4780-2v19.patch (Jira  4780)
> closeAll3.patch (Jira 3998)
> I have confirmed that
> https://issues.apache.org/jira/browse/HADOOP-4924 patch is in, so that
> is not the fix.
>
>
> We are having task trackers die every night with a null pointer exception.
> Usually 2 or so out of 8 (25% each night).
>
>
> Here are the logs:
>
> Version 19.1 with
> 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskTracker: Received
> 'KillJobAction' for job: job_200905211749_0451
> 2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_200905211749_0451_m_00_0 done; removing files.
> 2009-05-22 02:46:54,911 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:47:13,968 INFO org.apache.hadoop.mapred.TaskTracker: Received
> 'KillJobAction' for job: job_200905211749_0444
> 2009-05-22 02:47:13,969 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_200905211749_0444_m_00_0 done; removing files.
> 2009-05-22 02:47:18,968 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:48:52,324 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_200905211749_0452_m_06_0
> task's
> state:UNASSIGNED
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to
> launch : attempt_200905211749_0452_m_06_0
> 2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: In
> TaskLauncher, current free slots : 4 and trying to launch
> attempt_200905211749_0452_m_06_0
> 2009-05-22 02:49:15,274 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner
> jvm_200905211749_0452_m_1998728288 spawned.
> 2009-05-22 02:49:15,765 INFO org.apache.hadoop.mapred.TaskTracker: JVM with
> ID: jvm_200905211749_0452_m_1998728288 given task:
> attempt_200905211749_0452_m_06_0
> 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>
> taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>
> taskTracker/jobcache/job_200905211749_0452/attempt_200905211749_0452_m_06_0/output/file.out
> in any of the configured local directories
> 2009-05-22 02:49:19,784 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_200905211749_0452_m_06_0 1.0% hdfs://
>
> ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h#3e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> <
> http://ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h%233e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
> >
> 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: Task
> attempt_200905211749_0452_m_06_0 is done.
> 2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: reported
> output size for attempt_200905211749_0452_m_06_0  was 0
> 2009-05-22 02:49:19,787 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 4
> 2009-05-22 02:49:19,954 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_200905211749_0452_m_1998728288 exited. Number of tasks it ran: 1
> 2009-05-22 02:59:19,297 INFO org.apache.hadoop.mapred.TaskTracker: Recieved
> RenitTrackerAction from JobTracker
> 2009-05-22 02:59:19,298 ERROR org.apache.hadoop.mapred.TaskTracker: Can not
> start task tracker because java.lang.NullPointerException
>at
>
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2300)
>at
>
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:2273)
>at org.apache.hadoop.mapred.TaskTracker.close(TaskTracker.java:840)
>at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1728)
>at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2785)
>
> 2009-05-22 02:59:19,300 INFO org.apache.hadoop.mapred.TaskTracker:
> SHUTD

Tutorial on building an AMI

2009-05-22 Thread Saptarshi Guha
Hello,
Is there a tutorial available to build a Hadoop AMI (like
Cloudera's)? Cloudera has an 18.2 AMI and, for reasons I understand,
they can't provide (as of now) AMIs for higher Hadoop versions until
they become stable.
I would like to create an AMI for 19.2, so I was hoping there is a
guide for building one.


Thank you

Saptarshi Guha


Re: input/output error while setting up superblock

2009-05-22 Thread Aaron Kimball
More specifically:

HDFS does not support operations such as opening a file for write/append
after it has already been closed, or seeking to a new location in a writer.
You can only write files linearly; all other operations will return a "not
supported" error.

You'll also find that random-access read performance, while implemented, is
not particularly high-throughput. For serving Xen images even in read-only
mode, you'll likely have much better luck with a different FS.
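For concreteness, the limitation shows up roughly like this through a fuse-dfs
mount (a sketch using assumed paths, /mnt/hdfs and the image name from this
thread; the exact error reported can vary):

# a one-shot, linear copy into HDFS works fine
bin/hadoop fs -put fedora6_demo.img /images/fedora6_demo.img
bin/hadoop fs -cat /images/fedora6_demo.img | head -c 512 > /dev/null
# but rewriting a block in the middle of the file through the fuse mount fails,
# because a closed HDFS file cannot be reopened for write or written at an offset
dd if=/dev/zero of=/mnt/hdfs/images/fedora6_demo.img bs=4k seek=100 count=1 conv=notrunc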

- Aaron


2009/5/22 Taeho Kang 

> I don't think HDFS is a good place to store your Xen image file as it will
> likely be updated/appended frequently in small blocks. Given the way HDFS is
> designed, you can't quite use it like a regular filesystem (e.g. ones
> that support frequent small block appends/updates in files). My suggestion
> is to use another filesystem like NAS or SAN.
>
> /Taeho
>
> 2009/5/22 신승엽 
>
> > Hi, I have a problem to use hdfs.
> >
> > I mounted hdfs using fuse-dfs.
> >
> > I created a dummy file for 'Xen' in hdfs and then formated the dummy file
> > using 'mke2fs'.
> >
> > But the operation failed with an error. The error message is as follows.
> >
> > [r...@localhost hdfs]# mke2fs -j -F ./file_dumy
> > mke2fs 1.40.2 (12-Jul-2007)
> > ./file_dumy: Input/output error while setting up superblock
> > Also, I copied an image file of Xen to hdfs. But Xen couldn't use the
> > image files in hdfs.
> >
> > r...@localhost hdfs]# fdisk -l fedora6_demo.img
> > last_lba(): I don't know how to handle files with mode 81a4
> > You must set cylinders.
> > You can do this from the extra functions menu.
> >
> > Disk fedora6_demo.img: 0 MB, 0 bytes
> > 255 heads, 63 sectors/track, 0 cylinders
> > Units = cylinders of 16065 * 512 = 8225280 bytes
> >
> >   Device Boot  Start End  Blocks   Id  System
> > fedora6_demo.img1   *   1 156 1253038+  83  Linux
> >
> > Could you answer me anything about this problem.
> >
> > Thank you.
> >
>


Re: ssh issues

2009-05-22 Thread Edward Capriolo
Pankil,

I used to be very confused by hadoop and SSH keys. SSH is NOT
required. Each component can be started by hand. This gem of knowledge
is hidden away in the hundreds of DIGG style articles entitled 'HOW TO
RUN A HADOOP MULTI-MASTER CLUSTER!'

The SSH keys are only required by the shell scripts that are contained
with Hadoop like start-all. They are wrappers to kick off other
scripts on a list of nodes. I PERSONALLY dislike using SSH keys as a
software component and believe they should only be used by
administrators.
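In other words, something like the following works with no SSH trust at all,
run locally on each node (a sketch assuming a stock 0.19 tarball layout):

# on the master
bin/hadoop-daemon.sh start namenode
bin/hadoop-daemon.sh start jobtracker
# on each worker, run locally rather than via start-all.sh/slaves.sh
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker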

We chose the cloudera distribution.
http://www.cloudera.com/distribution. A big factor behind this was the
simple init.d scripts they provided. Each hadoop component has its own
start scripts hadoop-namenode, hadoop-datanode, etc.

My suggestion is taking a look at the Cloudera startup scripts. Even
if you decide not to use the distribution you can take a look at their
start up scripts and fit them to your needs.

On Fri, May 22, 2009 at 10:34 AM,   wrote:
> Steve,
>
> Security through obscurity is always a good practice from a development
> standpoint and one of the reasons why tricking you out is an easy task.
> Please, keep hiding relevant details from people in order to keep everyone
> smiling.
>
> Hal
>
>> Pankil Doshi wrote:
>>> Well, I made ssh keys with passphrases, as the systems I need to log in to
>>> require ssh with passphrases, and those systems have to be part of my
>>> cluster. So I need a way to specify -i path/to/key and the passphrase to
>>> hadoop beforehand.
>>>
>>> Pankil
>>>
>>
>> Well, you are trying to manage a system whose security policy is
>> incompatible with hadoop's current shell scripts. If you push out the
>> configs and manage the lifecycle using other tools, this becomes a
>> non-issue. Don't raise the topic of HDFS security to your ops team,
>> though, as they will probably be unhappy about what is currently on offer.
>>
>> -steve
>>
>>
>
>
>


Can not start task tracker because java.lang.NullPointerException

2009-05-22 Thread Lance Riedel
Version 19.1 with patches:
4780-2v19.patch (Jira  4780)
closeAll3.patch (Jira 3998)
I have confirmed that
https://issues.apache.org/jira/browse/HADOOP-4924 patch is in, so that
is not the fix.


We are having task trackers die every night with a null pointer exception.
Usually 2 or so out of 8 (25% each night).


Here are the logs:

Version 19.1 with
2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskTracker: Received
'KillJobAction' for job: job_200905211749_0451
2009-05-22 02:46:49,911 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_200905211749_0451_m_00_0 done; removing files.
2009-05-22 02:46:54,911 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
in any of the configured local directories
2009-05-22 02:47:13,968 INFO org.apache.hadoop.mapred.TaskTracker: Received
'KillJobAction' for job: job_200905211749_0444
2009-05-22 02:47:13,969 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_200905211749_0444_m_00_0 done; removing files.
2009-05-22 02:47:18,968 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
in any of the configured local directories
2009-05-22 02:48:52,324 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
in any of the configured local directories
2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction (registerTask): attempt_200905211749_0452_m_06_0 task's
state:UNASSIGNED
2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
launch : attempt_200905211749_0452_m_06_0
2009-05-22 02:49:10,779 INFO org.apache.hadoop.mapred.TaskTracker: In
TaskLauncher, current free slots : 4 and trying to launch
attempt_200905211749_0452_m_06_0
2009-05-22 02:49:15,274 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner
jvm_200905211749_0452_m_1998728288 spawned.
2009-05-22 02:49:15,765 INFO org.apache.hadoop.mapred.TaskTracker: JVM with
ID: jvm_200905211749_0452_m_1998728288 given task:
attempt_200905211749_0452_m_06_0
2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200905211749_0421/attempt_200905211749_0421_r_09_0/output/file.out
in any of the configured local directories
2009-05-22 02:49:15,781 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200905211749_0452/attempt_200905211749_0452_m_06_0/output/file.out
in any of the configured local directories
2009-05-22 02:49:19,784 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200905211749_0452_m_06_0 1.0% hdfs://
ec2-75-101-247-52.compute-1.amazonaws.com:54310/paragraphInstances/2009-05-22/rollup#06h#3e04c188-245a-4856-9a54-2fec60e85e3d.seq:0+9674259
2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: Task
attempt_200905211749_0452_m_06_0 is done.
2009-05-22 02:49:19,785 INFO org.apache.hadoop.mapred.TaskTracker: reported
output size for attempt_200905211749_0452_m_06_0  was 0
2009-05-22 02:49:19,787 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 4
2009-05-22 02:49:19,954 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_200905211749_0452_m_1998728288 exited. Number of tasks it ran: 1
2009-05-22 02:59:19,297 INFO org.apache.hadoop.mapred.TaskTracker: Recieved
RenitTrackerAction from JobTracker
2009-05-22 02:59:19,298 ERROR org.apache.hadoop.mapred.TaskTracker: Can not
start task tracker because java.lang.NullPointerException
at
org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2300)
at
org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:2273)
at org.apache.hadoop.mapred.TaskTracker.close(TaskTracker.java:840)
at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1728)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2785)

2009-05-22 02:59:19,300 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down TaskTracker at domU-12-31-38-01-AD-91/
10.253.178.95
/


Re: ssh issues

2009-05-22 Thread hmarti2
Steve,

Security through obscurity is always a good practice from a development
standpoint and one of the reasons why tricking you out is an easy task.
Please, keep hiding relevant details from people in order to keep everyone
smiling.

Hal

> Pankil Doshi wrote:
>> Well, I made ssh keys with passphrases, as the systems I need to log in to
>> require ssh with passphrases, and those systems have to be part of my
>> cluster. So I need a way to specify -i path/to/key and the passphrase to
>> hadoop beforehand.
>>
>> Pankil
>>
>
> Well, you are trying to manage a system whose security policy is
> incompatible with hadoop's current shell scripts. If you push out the
> configs and manage the lifecycle using other tools, this becomes a
> non-issue. Don't raise the topic of HDFS security to your ops team,
> though, as they will probably be unhappy about what is currently on offer.
>
> -steve
>
>




Re: ssh issues

2009-05-22 Thread Steve Loughran

Pankil Doshi wrote:

Well, I made ssh keys with passphrases, as the systems I need to log in to
require ssh with passphrases, and those systems have to be part of my
cluster. So I need a way to specify -i path/to/key and the passphrase to
hadoop beforehand.

Pankil



Well, you are trying to manage a system whose security policy is
incompatible with hadoop's current shell scripts. If you push out the
configs and manage the lifecycle using other tools, this becomes a
non-issue. Don't raise the topic of HDFS security to your ops team,
though, as they will probably be unhappy about what is currently on offer.


-steve


Re: ssh issues

2009-05-22 Thread Pankil Doshi
Well, I made ssh keys with passphrases, as the systems I need to log in to
require ssh with passphrases, and those systems have to be part of my
cluster. So I need a way to specify -i path/to/key and the passphrase to
hadoop beforehand.

Pankil

On Thu, May 21, 2009 at 9:35 PM, Aaron Kimball  wrote:

> Pankil,
>
> That means that either you're using the wrong ssh key and it's falling back
> to password authentication, or else you created your ssh keys with
> passphrases attached; try making new ssh keys with ssh-keygen and
> distributing those to start again?
>
> - Aaron
>
> On Thu, May 21, 2009 at 3:49 PM, Pankil Doshi  wrote:
>
> > The problem is that it also prompts for the pass phrase.
> >
> > On Thu, May 21, 2009 at 2:14 PM, Brian Bockelman  > >wrote:
> >
> > > Hey Pankil,
> > >
> > > Use ~/.ssh/config to set the default key location to the proper place
> for
> > > each host, if you're going down that route.
> > >
> > > I'd remind you that SSH is only used as a convenient method to launch
> > > daemons.  If you have a preferred way to start things up on your
> cluster,
> > > you can use that (I think most large clusters don't use ssh... could be
> > > wrong).
> > >
> > > Brian
> > >
> > >
> > > On May 21, 2009, at 2:07 PM, Pankil Doshi wrote:
> > >
> > >  Hello everyone,
> > >>
> > >> I got hint how to solve the problem where clusters have different
> > >> usernames.but now other problem I face is that i can ssh a machine by
> > >> using
> > >> -i path/to key/ ..I cant ssh them directly but I will have to always
> > pass
> > >> the key.
> > >>
> > >> Now i face problem in ssh-ing my machines.Does anyone have any ideas
> how
> > >> to
> > >> deal with that??
> > >>
> > >> Regards
> > >> Pankil
> > >>
> > >
> > >
> >
>


Re: Hama Problem

2009-05-22 Thread Edward J. Yoon
Hi,

Before considering this, let's talk about your problem and why you
want to use these. If your application isn't huge, then I think an
MPI-based matrix package could be more helpful to you, since Hama is
also aimed at large scale, not at high performance for small
matrices.

And, have you tried subscribing or mailing here:
http://incubator.apache.org/hama/mailing_lists.html

On Fri, May 22, 2009 at 4:51 PM, ykj  wrote:
>
>
> Currently in Hama, eigenvalue decomposition is not implemented, so in STEP 4
> it is hard to migrate it. So I worked out an idea to bypass it: before Step 4,
> I can let L be a DenseMatrix; when I come to Step 4, I can transform L into a
> SubMatrix. In Jama, eigenvalue decomposition is supported, although it is not
> parallel computing, so I can get the eigValues and eigVectors values. But
> after that, in step 5, it needs to sort the two matrices.
>
> I want to use the hbase sort function. So how can I transform these two
> SubMatrix objects into two DenseMatrix objects?
>
> Or is there another way?
>                /**
>                 * STEP 4
>                 *              Calculate the eigen values and vectors of this 
> covariance matrix
>                 *
>                 *              % Get the eigenvectors (columns of Vectors) 
> and eigenvalues (diag of
> Values)
>                 */
>                EigenvalueDecomposition eigen = L.eig();
>                eigValues       = eigen.getD();
>                eigVectors      = eigen.getV();
>
>
>                /**
>                 * STEP 5
>                 *              % Sort the vectors/values according to size of 
> eigenvalue
>                 */
>                Matrix[] eigDVSorted = sortem(eigValues, eigVectors);
>                eigValues = eigDVSorted[0];
>                eigVectors = eigDVSorted[1];
>
>
>                /**
>                 * STEP 6
>                 *              % Convert the eigenvectors of A'*A into 
> eigenvectors of A*A'
>                 */
>
>                eigVectors = A.times(eigVectors);
>
>
>                /**
>                 * STEP 7
>                 *              % Get the eigenvalues out of the diagonal 
> matrix and
>                 *              % normalize them so the evalues are 
> specifically for cov(A'), not
> A*A'.
>                 */
>                double[] values = diag(eigValues);
>                for(int i = 0; i < values.length; i++)
>                        values[i] /= A.getColumnDimension() - 1;
>
>
>                /**
>                 * STEP 8
>                 *              % Normalize Vectors to unit length, kill 
> vectors corr. to tiny
> evalues
>                 */
>                numEigenVecs = 0;
>                for(int i = 0; i < eigVectors.getColumnDimension(); i++) {
>                        Matrix tmp;
>                        if (values[i] < 0.0001)
>                        {
>                                tmp = new 
> Matrix(eigVectors.getRowDimension(),1);
>                        }
>                        else
>                        {
>                                tmp = 
> eigVectors.getMatrix(0,eigVectors.getRowDimension()-1,i,i).times(
>                                                1 / eigVectors.getMatrix(0, 
> eigVectors.getRowDimension() - 1, i,
> i).normF());
>                                numEigenVecs++;
>                        }
>                        
> eigVectors.setMatrix(0,eigVectors.getRowDimension()-1,i,i,tmp);
>                        //eigVectors.timesEquals(1 / eigVectors.getMatrix(0,
> eigVectors.getRowDimension() - 1, i, i).normInf());
>                }
>                eigVectors = 
> eigVectors.getMatrix(0,eigVectors.getRowDimension() - 1, 0,
> numEigenVecs - 1);
>
>                trained = true;
>
>
>                /*System.out.println("There are " + numGood + "
> eigenVectors\n\nEigenVectorSize");
>                System.out.println(eigVectors.getRowDimension());
>                System.out.println(eigVectors.getColumnDimension());
>                try {
>            PrintWriter pw = new PrintWriter("c:\\tmp\\test.txt");
>            eigVectors.print(pw, 8, 4);
>            pw.flush();
>            pw.close();
>        } catch (Exception e) {
>            e.printStackTrace();
>        }
>
>                int width = pics[0].img.getWidth(null);
>                BufferedImage biAvg = 
> imageFromMatrix(bigAvg.getArrayCopy()[0], width);
>
>                try {
>            saveImage(new File("c:\\tmp\\test.jpg"), biAvg);
>        } catch (IOException e1) {
>            e1.printStackTrace();
>        }*/
>        }
>
>        /**
>         * Returns a number of eigenFace values to be used in a feature space
>         * @param pic
>         * @param number number of eigen feature values.
>         * @return will be of length number or this.getNumEigenVecs whichever 
> is
> the smaller
>         */
>        public double[] getEigenFaces(Picture pic, int number)
>        {
>                if (number > numEigenVecs)              //adjust the number to

Re: input/output error while setting up superblock

2009-05-22 Thread Taeho Kang
I don't think HDFS is a good place to store your Xen image file as it will
likely be updated/appended frequently in small blocks. Given the way HDFS is
designed, you can't quite use it like a regular filesystem (e.g. ones
that support frequent small block appends/updates in files). My suggestion
is to use another filesystem like NAS or SAN.

/Taeho

2009/5/22 신승엽 

> Hi, I have a problem to use hdfs.
>
> I mounted hdfs using fuse-dfs.
>
> I created a dummy file for 'Xen' in hdfs and then formated the dummy file
> using 'mke2fs'.
>
> But the operation failed with an error. The error message is as follows.
>
> [r...@localhost hdfs]# mke2fs -j -F ./file_dumy
> mke2fs 1.40.2 (12-Jul-2007)
> ./file_dumy: Input/output error while setting up superblock
> Also, I copied an image file of Xen to hdfs. But Xen couldn't use the
> image files in hdfs.
>
> r...@localhost hdfs]# fdisk -l fedora6_demo.img
> last_lba(): I don't know how to handle files with mode 81a4
> You must set cylinders.
> You can do this from the extra functions menu.
>
> Disk fedora6_demo.img: 0 MB, 0 bytes
> 255 heads, 63 sectors/track, 0 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>   Device Boot  Start End  Blocks   Id  System
> fedora6_demo.img1   *   1 156 1253038+  83  Linux
>
> Could you answer me anything about this problem.
>
> Thank you.
>


input/output error while setting up superblock

2009-05-22 Thread 신승엽
Hi, I have a problem to use hdfs.

I mounted hdfs using fuse-dfs.

I created a dummy file for 'Xen' in hdfs and then formated the dummy file using 
'mke2fs'.

But the operation failed with an error. The error message is as follows.

[r...@localhost hdfs]# mke2fs -j -F ./file_dumy  
mke2fs 1.40.2 (12-Jul-2007)
./file_dumy: Input/output error while setting up superblock
Also, I copied an image file of Xen to hdfs. But Xen couldn't use the image
files in hdfs.

r...@localhost hdfs]# fdisk -l fedora6_demo.img
last_lba(): I don't know how to handle files with mode 81a4
You must set cylinders.
You can do this from the extra functions menu.

Disk fedora6_demo.img: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot  Start End  Blocks   Id  System
fedora6_demo.img1   *   1 156 1253038+  83  Linux

Could you answer me anything about this problem.

Thank you.


Re: Hama Problem

2009-05-22 Thread ykj


Currently in Hama, eigenvalue decomposition is not implemented, so in STEP 4
it is hard to migrate it. So I worked out an idea to bypass it: before Step 4,
I can let L be a DenseMatrix; when I come to Step 4, I can transform L into a
SubMatrix. In Jama, eigenvalue decomposition is supported, although it is not
parallel computing, so I can get the eigValues and eigVectors values. But
after that, in step 5, it needs to sort the two matrices.

I want to use the hbase sort function. So how can I transform these two
SubMatrix objects into two DenseMatrix objects?

Or is there another way?
/**
 * STEP 4
 *  Calculate the eigen values and vectors of this 
covariance matrix
 * 
 *  % Get the eigenvectors (columns of Vectors) and 
eigenvalues (diag of
Values)
 */
EigenvalueDecomposition eigen = L.eig();
eigValues   = eigen.getD();
eigVectors  = eigen.getV();


/**
 * STEP 5
 *  % Sort the vectors/values according to size of 
eigenvalue
 */
Matrix[] eigDVSorted = sortem(eigValues, eigVectors);
eigValues = eigDVSorted[0];
eigVectors = eigDVSorted[1];


/**
 * STEP 6
 *  % Convert the eigenvectors of A'*A into 
eigenvectors of A*A'
 */

eigVectors = A.times(eigVectors);


/**
 * STEP 7
 *  % Get the eigenvalues out of the diagonal 
matrix and
 *  % normalize them so the evalues are 
specifically for cov(A'), not
A*A'.
 */
double[] values = diag(eigValues);
for(int i = 0; i < values.length; i++)
values[i] /= A.getColumnDimension() - 1;


/**
 * STEP 8
 *  % Normalize Vectors to unit length, kill 
vectors corr. to tiny
evalues
 */
numEigenVecs = 0;
for(int i = 0; i < eigVectors.getColumnDimension(); i++) {
Matrix tmp;
if (values[i] < 0.0001)
{
tmp = new 
Matrix(eigVectors.getRowDimension(),1);
}
else
{
tmp = 
eigVectors.getMatrix(0,eigVectors.getRowDimension()-1,i,i).times(
1 / eigVectors.getMatrix(0, 
eigVectors.getRowDimension() - 1, i,
i).normF());
numEigenVecs++;
}

eigVectors.setMatrix(0,eigVectors.getRowDimension()-1,i,i,tmp);
//eigVectors.timesEquals(1 / eigVectors.getMatrix(0,
eigVectors.getRowDimension() - 1, i, i).normInf());
}
eigVectors = 
eigVectors.getMatrix(0,eigVectors.getRowDimension() - 1, 0,
numEigenVecs - 1);

trained = true;


/*System.out.println("There are " + numGood + "
eigenVectors\n\nEigenVectorSize");
System.out.println(eigVectors.getRowDimension());
System.out.println(eigVectors.getColumnDimension());
try {
PrintWriter pw = new PrintWriter("c:\\tmp\\test.txt");
eigVectors.print(pw, 8, 4);
pw.flush();
pw.close();
} catch (Exception e) {
e.printStackTrace();
}

int width = pics[0].img.getWidth(null);
BufferedImage biAvg = imageFromMatrix(bigAvg.getArrayCopy()[0], 
width);

try {
saveImage(new File("c:\\tmp\\test.jpg"), biAvg);
} catch (IOException e1) {
e1.printStackTrace();
}*/
}

/**
 * Returns a number of eigenFace values to be used in a feature space
 * @param pic
 * @param number number of eigen feature values.  
 * @return will be of length number or this.getNumEigenVecs whichever is
the smaller
 */
public double[] getEigenFaces(Picture pic, int number)
{
if (number > numEigenVecs)  //adjust the number to 
the maxium number of
eigen vectors availiable
number = numEigenVecs;

double[] ret = new double[number];

double[] pixels = pic.getImagePixels();
Matrix face = new Matrix(pixels, pixels.length);
Matrix Vecs = 
eigVectors.getMatrix(0,eigVectors.g