Re: JobConf.setJobEndNotificationURI

2010-03-23 Thread Tom White
I think you can set the URI on the configuration object with the key
JobContext.END_NOTIFICATION_URL.
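
For illustration, a rough Java sketch of setting it with the new API (the
notification URL and its $jobId/$jobStatus placeholders below are only examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobContext;

public class EndNotificationExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Rough equivalent of the old JobConf.setJobEndNotificationURI(); URL is illustrative.
    conf.set(JobContext.END_NOTIFICATION_URL,
             "http://example.com/notify?jobid=$jobId&status=$jobStatus");
    Job job = new Job(conf, "end-notification-example");
    // ...set mapper/reducer and input/output paths, then:
    // job.waitForCompletion(true);
  }
}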

Cheers,
Tom

On Tue, Feb 23, 2010 at 12:02 PM, Ted Yu  wrote:
> Hi,
> I am looking for counterpart to JobConf.setJobEndNotificationURI() in
> org.apache.hadoop.mapreduce
>
> Please advise.
>
> Thanks
>


Re:Re: HOD on Scyld

2010-03-23 Thread Andy
Hi Boyu, have you tried not using the package deployment and instead using the 
extracted folder on each machine? I'm using this method and it works fine.
 
Song Liu



On 2010-03-24 02:43:44, "Boyu Zhang" wrote:
>Thanks for the tip, I use -c to specify hod-conf-dir in the command, and I
>set both the two java-home, I still get the same error. I will keep looking
>and let you know.
>
>Boyu
>
>On Tue, Mar 23, 2010 at 11:09 AM, Antonio D'Ettole wrote:
>
>> Make sure you set HOD_CONF_DIR to /path/to/hod/conf
>> Also make sure that, in the file /path/to/hod/conf/hodrc you set
>> "java-home"
>> (under both [hod] and [hodring] ) to a working JRE or JDK in your system.
>>
>> Does that work?
>>
>> Antonio
>>
>> On Tue, Mar 23, 2010 at 3:34 PM, Boyu Zhang  wrote:
>>
>> > thanks a lot! I found out the HOD_PYTHON_HOME error too: )
>> >
>> > I had a new error after I use the correct version of Python:
>> >
>> > --
>> > INFO - Cluster Id 62.geronimo.xxx.xxx.xxx.edu
>> > CRITICAL - Cluster could not be allocated because of the following errors
>> > on
>> > the ringmaster host n3.
>> > Could not retrive the version of Hadoop. Check the Hadoop installation or
>> > the value of the hodring.java-home variable.
>> > CRITICAL - Cannot allocate cluster /home/zhang/cluster
>> >
>> >
>> -
>> >
>> > Is that because I set my java-home wrong? Thanks!
>> >
>> > Boyu
>> >
>> > On Tue, Mar 23, 2010 at 10:04 AM, Antonio D'Ettole > > >wrote:
>> >
>> > > Boyu,
>> > > I've found that only Python 2.5.x works with HOD. Version 2.6.x will
>> give
>> > > you the exception.
>> > > You should set HOD_PYTHON_HOME to the path to a 2.5.x executable (not
>> the
>> > > directory).
>> > >
>> > > Antonio
>> > >
>> > > On Mon, Mar 22, 2010 at 5:07 PM, Boyu Zhang 
>> > wrote:
>> > >
>> > > > Updata: I used the command: $ bin/hod allocate -d /home/zhang/cluster
>> >  -n
>> > > 4
>> > > > -c /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
>> > > > /home/zhang/hadoop-0.20.2.tar.gz -b 4
>> > > >
>> > > > and I get the errot:  Using Python: 2.4.3 (#1, Sep  3 2009, 15:37:37)
>> > > > [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
>> > > >
>> > > >  Uncaught Exception : need more than 2 values to unpack
>> > > >
>> > > > I have multiple versions of python running on my system, and I set
>> > > env-vars
>> > > > in hodrc file to point to the python 2.6.5 version(
>> > > > HOD_PYTHON_HOME=/opt/python/2.6.5/bin/python
>> > > > ). Do I need to do anything else, like export the HOD_PYTHON_HOM
>> > > > environment
>> > > > variable? Thanks a lot!
>> > > >
>> > > >
>> > > > On Mon, Mar 22, 2010 at 11:52 AM, Boyu Zhang 
>> > > > wrote:
>> > > >
>> > > > > Dear All,
>> > > > >
>> > > > > I have been trying to get HOD working on a cluster running Scyld.
>> But
>> > > > there
>> > > > > are some problems. I configured the minimum configurations.
>> > > > >
>> > > > > 1.  I executed the command:
>> > > > > $ bin/hod allocate -d /home/zhang/cluster  -n 4 -c
>> > > > > /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
>> > > > > /home/zhang/hadoop-0.20.2.tar.gz
>> > > > > I get the error: file hod, line 576, finally: Syntax error. So I
>> > > > commented
>> > > > > out the line 576, and try again.
>> > > > >
>> > > > > 2. #[zh...@geronimo hod]$ bin/hod allocate -d /home/zhang/cluster
>> >  -n
>> > > 4
>> > > > -c
>> > > > > /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
>> > > > > /home/zhang/hadoop-0.20.2.tar.gz
>> > > > >   Uncaught Exception : need more than 2 values to unpack
>> > > > >
>> > > > > Could anyone tell me why am I having this error? Is the problem the
>> > > > > operating system, or Torque, or because I commented out line 576,
>> or
>> > > > > anything else?
>> > > > >
>> > > > > Any comment is welcome and appreciated. Thanks a lot!
>> > > > >
>> > > > > Sincerely,
>> > > > >
>> > > > > Boyu Zhang
>> > > > >
>> > > >
>> > >
>> >
>>


Re: Stop reducers until all maps are finished

2010-03-23 Thread Allen Wittenauer



On 3/23/10 11:41 AM, "ANKITBHATNAGAR"  wrote:
> Is there a configuration option in 18.3 to to reducers to copy the data
> until all maps are finished.

mapred.reduce.slowstart.completed.maps
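
For example, a sketch of setting that property programmatically so reducers do
not start copying until every map has finished (the same key can also go in
mapred-site.xml):

import org.apache.hadoop.mapred.JobConf;

public class SlowStartExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf(SlowStartExample.class);
    // Fraction of maps that must complete before reducers begin fetching map output;
    // 1.0 means "wait until all maps are finished".
    conf.setFloat("mapred.reduce.slowstart.completed.maps", 1.0f);
    // ...rest of the job setup as usual.
  }
}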



Re: HOD on Scyld

2010-03-23 Thread Boyu Zhang
Thanks for the tip. I used -c to specify hod-conf-dir on the command line, and I
set both java-home values, but I still get the same error. I will keep looking
and let you know.

Boyu

On Tue, Mar 23, 2010 at 11:09 AM, Antonio D'Ettole wrote:

> Make sure you set HOD_CONF_DIR to /path/to/hod/conf
> Also make sure that, in the file /path/to/hod/conf/hodrc you set
> "java-home"
> (under both [hod] and [hodring] ) to a working JRE or JDK in your system.
>
> Does that work?
>
> Antonio
>
> On Tue, Mar 23, 2010 at 3:34 PM, Boyu Zhang  wrote:
>
> > thanks a lot! I found out the HOD_PYTHON_HOME error too: )
> >
> > I had a new error after I use the correct version of Python:
> >
> > --
> > INFO - Cluster Id 62.geronimo.xxx.xxx.xxx.edu
> > CRITICAL - Cluster could not be allocated because of the following errors
> > on
> > the ringmaster host n3.
> > Could not retrive the version of Hadoop. Check the Hadoop installation or
> > the value of the hodring.java-home variable.
> > CRITICAL - Cannot allocate cluster /home/zhang/cluster
> >
> >
> -
> >
> > Is that because I set my java-home wrong? Thanks!
> >
> > Boyu
> >
> > On Tue, Mar 23, 2010 at 10:04 AM, Antonio D'Ettole  > >wrote:
> >
> > > Boyu,
> > > I've found that only Python 2.5.x works with HOD. Version 2.6.x will
> give
> > > you the exception.
> > > You should set HOD_PYTHON_HOME to the path to a 2.5.x executable (not
> the
> > > directory).
> > >
> > > Antonio
> > >
> > > On Mon, Mar 22, 2010 at 5:07 PM, Boyu Zhang 
> > wrote:
> > >
> > > > Updata: I used the command: $ bin/hod allocate -d /home/zhang/cluster
> >  -n
> > > 4
> > > > -c /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > > > /home/zhang/hadoop-0.20.2.tar.gz -b 4
> > > >
> > > > and I get the errot:  Using Python: 2.4.3 (#1, Sep  3 2009, 15:37:37)
> > > > [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
> > > >
> > > >  Uncaught Exception : need more than 2 values to unpack
> > > >
> > > > I have multiple versions of python running on my system, and I set
> > > env-vars
> > > > in hodrc file to point to the python 2.6.5 version(
> > > > HOD_PYTHON_HOME=/opt/python/2.6.5/bin/python
> > > > ). Do I need to do anything else, like export the HOD_PYTHON_HOM
> > > > environment
> > > > variable? Thanks a lot!
> > > >
> > > >
> > > > On Mon, Mar 22, 2010 at 11:52 AM, Boyu Zhang 
> > > > wrote:
> > > >
> > > > > Dear All,
> > > > >
> > > > > I have been trying to get HOD working on a cluster running Scyld.
> But
> > > > there
> > > > > are some problems. I configured the minimum configurations.
> > > > >
> > > > > 1.  I executed the command:
> > > > > $ bin/hod allocate -d /home/zhang/cluster  -n 4 -c
> > > > > /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > > > > /home/zhang/hadoop-0.20.2.tar.gz
> > > > > I get the error: file hod, line 576, finally: Syntax error. So I
> > > > commented
> > > > > out the line 576, and try again.
> > > > >
> > > > > 2. #[zh...@geronimo hod]$ bin/hod allocate -d /home/zhang/cluster
> >  -n
> > > 4
> > > > -c
> > > > > /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > > > > /home/zhang/hadoop-0.20.2.tar.gz
> > > > >   Uncaught Exception : need more than 2 values to unpack
> > > > >
> > > > > Could anyone tell me why am I having this error? Is the problem the
> > > > > operating system, or Torque, or because I commented out line 576,
> or
> > > > > anything else?
> > > > >
> > > > > Any comment is welcome and appreciated. Thanks a lot!
> > > > >
> > > > > Sincerely,
> > > > >
> > > > > Boyu Zhang
> > > > >
> > > >
> > >
> >
>


Stop reducers until all maps are finished

2010-03-23 Thread ANKITBHATNAGAR

Hi All,

Is there a configuration option in 18.3 to stop reducers from copying the data
until all maps are finished?


Ankit

-- 
View this message in context: 
http://old.nabble.com/Stop-reducers-until-all-maps-are-finished-tp28005383p28005383.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.



Re: Hadoop for accounting?

2010-03-23 Thread Brian Bockelman
Yeah, that's what struck me immediately too.

I believe this was one of the reasons for moving (quickly) on Kerberos security 
for Hadoop.  Even with that, there's still a relatively high barrier if someone 
says the words "FIPS" or "HIPAA".

I love HDFS.  The damn thing never breaks, no matter what hardware or user we 
throw at it.  Our scientists love it.  However, there's a damn good reason that 
transactions were invented, especially for accounting/billing matters...

Brian

On Mar 23, 2010, at 11:30 AM, Allen Wittenauer wrote:

> 
> 
> 
> On 3/23/10 4:04 AM, "Marcos Medrado Rubinelli" 
> wrote:
>> If not, what are your main concerns? What parts would you consider
>> stable enough for this kind of use?
> 
> While we're not doing any sort of billing on Hadoop, my #1 concern would be
> the fact that Hadoop (today) has zero security.  No way it would pass any
> reasonable PCI audit.





Re: Hadoop for accounting?

2010-03-23 Thread Allen Wittenauer



On 3/23/10 4:04 AM, "Marcos Medrado Rubinelli" 
wrote:
> If not, what are your main concerns? What parts would you consider
> stable enough for this kind of use?

While we're not doing any sort of billing on Hadoop, my #1 concern would be
the fact that Hadoop (today) has zero security.  No way it would pass any
reasonable PCI audit.



Re: Hadoop for accounting?

2010-03-23 Thread Eric Sammer
Marcos:

It is extremely common that data processed with Hadoop is eventually
used for billing purposes. Regarding your example, one could use map
reduce to calculate usage totals by customer ID and drive billing from
there.
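
As a rough sketch of that idea (the record format, field layout and class names
here are hypothetical), a mapper can emit (customerId, usage) pairs and a
reducer can sum them per customer:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class UsageTotals {
  // Input lines are assumed to look like: customerId<TAB>unitsUsed
  public static class UsageMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t");
      if (fields.length == 2) {
        // Emit one (customerId, usage) pair per record.
        ctx.write(new Text(fields[0]), new LongWritable(Long.parseLong(fields[1])));
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text customerId, Iterable<LongWritable> usages, Context ctx)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable u : usages) {
        total += u.get();  // accumulate this customer's usage
      }
      ctx.write(customerId, new LongWritable(total));
    }
  }
}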

On Tue, Mar 23, 2010 at 7:04 AM, Marcos Medrado Rubinelli
 wrote:
> Hi,
>
> The wiki's "Powered By" page ( http://wiki.apache.org/hadoop/PoweredBy )
> lists dozens of companies using Hadoop in production, some of them for
> mission-critical operations, but is anyone using it - or planning to - for
> anything that involves money, like calculating a bill from API usage, or
> microtransactions?
>
> If not, what are your main concerns? What parts would you consider stable
> enough for this kind of use?
>
> Thanks,
> Marcos
>



-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com


Re: execute mapreduce job on multiple hdfs files

2010-03-23 Thread Amogh Vasekar
Hi,
Piggybacking on Gang's reply: to add files/dirs recursively, you can use 
listStatus together with FileStatus to determine whether each path is a file or 
a directory and add it as needed (check the FileStatus API for this). There is 
a patch which does this for FileInputFormat:

http://issues.apache.org/jira/browse/MAPREDUCE-1501
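
A minimal sketch of that recursive walk against the 0.20 mapred API (class and
variable names below are placeholders):

import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class RecursiveInputs {
  // Walk 'path' and add every plain file found underneath it as an input path.
  static void addInputsRecursively(FileSystem fs, Path path, JobConf conf)
      throws IOException {
    for (FileStatus status : fs.listStatus(path)) {
      if (status.isDir()) {
        addInputsRecursively(fs, status.getPath(), conf);     // descend into subdirectory
      } else {
        FileInputFormat.addInputPath(conf, status.getPath()); // plain file: add it
      }
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(RecursiveInputs.class);
    FileSystem fs = FileSystem.get(conf);
    addInputsRecursively(fs, new Path(args[0]), conf);
    // ...continue the usual job setup with 'conf'.
  }
}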


Amogh


On 3/23/10 6:25 PM, "Gang Luo"  wrote:

Hi Oleg,
you can use FileInputFormat.addInputPath(JobConf, Path) multiple times in your 
program to add arbitrary paths. Instead, if you use 
FileInputFormat.setInputPath, there could be only one input path.

If you are talking about output, the path you give is an output directory, all 
the output files (part-0, part-1...) will be generated in that 
directory.

-Gang




----- Original Message -----
From: Oleg Ruchovets 
To: common-user@hadoop.apache.org
Sent: 2010/3/23 (Tue) 6:18:34 AM
Subject: execute mapreduce job on multiple hdfs files

Hi ,
All examples that I found executes mapreduce job on a single file but in my
situation I have more than one.

Suppose I have such folder on HDFS which contains some files:

/my_hadoop_hdfs/my_folder:
/my_hadoop_hdfs/my_folder/file1.txt
/my_hadoop_hdfs/my_folder/file2.txt
/my_hadoop_hdfs/my_folder/file3.txt


How can I execute  hadoop mapreduce on file1.txt , file2.txt and file3.txt?

Is it possible to provide to hadoop job folder as parameter and all files
will be produced by mapreduce job?

Thanks In Advance
Oleg







Re: a problem when starting datanodes

2010-03-23 Thread 毛宏
I edited the first line of the file
"~/Software/Development/Hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh"
and then executed bin/start-all.sh; it displays the following message:

...
...
slave1: 
/home/maohong/Software/Development/Hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh: 
line 39: dirname: command not found
slave1: 
/home/maohong/Software/Development/Hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh: 
line 42: 
/home/maohong/Software/Development/Hadoop/hadoop-0.20.2/hadoop-config.sh: No 
such file or directory
slave1: 
/home/maohong/Software/Development/Hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh: 
line 75: mkdir: command not found
slave1: Usage: hadoop-daemon.sh [--config ] [--hosts
hostlistfile] (start|stop)  
..
..

It seems that the problem lies in the first line of the
file /bin/hadoop-daemon.sh. 

What do you think? How do I configure it correctly?


On Tue, 2010-03-23 at 21:23 +0800, liu chang wrote:

> On Tue, Mar 23, 2010 at 9:11 PM, 毛宏  wrote:
> > I use "file /usr/bin/env" to check if /usr/bin/env is present in my
> > system and the answer is yes.
> > But why does it still display
> >  datanode1:/usr/bin/env : bash:  No such file or directory
> >  datanode2:/usr/bin/env : bash:  No such file or directory?
> 
> Can you execute the 'bash' command at your shell? What UNIX system do you use?
> 
> > On Tue, 2010-03-23 at 19:56 +0800, liu chang wrote:
> >> Sorry, your error message says bash is not found. You should already
> >> have /usr/bin/env.
> >>
> >> Bash is installed by default in most Linux distributions. It could be
> >> that bash is not installed on your system, or your PATH environmental
> >> variable is somehow messed up. Can you execute 'bash' in your shell?
> >>
> >> On Tue, Mar 23, 2010 at 7:51 PM, liu chang  wrote:
> >> > Check if /usr/bin/env is present in your system:
> >> >
> >> > file /usr/bin/env
> >> >
> >> > If not, you probably have /bin/env instead. Verify using:
> >> >
> >> > file /bin/env
> >> >
> >> > If you have /bin/env but not /usr/bin/env, you can make a symbolic link 
> >> > for it:
> >> >
> >> > ln -s /usr/bin/env /bin/env
> >> >
> >> > You need to execute the command above as root.
> >> >
> >> > Liu Chang
> >> >
> >> > On Tue, Mar 23, 2010 at 7:48 PM, 毛宏  wrote:
> >> >> Hi all,
> >> >>I install Hadoop in three machines, my pc is the namenode, two 
> >> >> other pc
> >> >> are the datanodes, but when I execute  bin/start-dfs.sh, it displays
> >> >> these two line as follows:
> >> >>   datanode1: /usr/bin/env: bash: No such
> >> >> file or directory
> >> >>   datanode2: /usr/bin/env: bash: No such
> >> >> file or directory
> >> >>
> >> >>What does it mean?  How to solve this problem?
> >> >>Thanks for your attention~
> >> >>
> >> >>
> >> >
> >
> >
> >




Re: HOD on Scyld

2010-03-23 Thread Antonio D'Ettole
Make sure you set HOD_CONF_DIR to /path/to/hod/conf
Also make sure that, in the file /path/to/hod/conf/hodrc, you set "java-home"
(under both [hod] and [hodring]) to a working JRE or JDK on your system.

Does that work?

Antonio

On Tue, Mar 23, 2010 at 3:34 PM, Boyu Zhang  wrote:

> thanks a lot! I found out the HOD_PYTHON_HOME error too: )
>
> I had a new error after I use the correct version of Python:
>
> --
> INFO - Cluster Id 62.geronimo.xxx.xxx.xxx.edu
> CRITICAL - Cluster could not be allocated because of the following errors
> on
> the ringmaster host n3.
> Could not retrive the version of Hadoop. Check the Hadoop installation or
> the value of the hodring.java-home variable.
> CRITICAL - Cannot allocate cluster /home/zhang/cluster
>
> -
>
> Is that because I set my java-home wrong? Thanks!
>
> Boyu
>
> On Tue, Mar 23, 2010 at 10:04 AM, Antonio D'Ettole  >wrote:
>
> > Boyu,
> > I've found that only Python 2.5.x works with HOD. Version 2.6.x will give
> > you the exception.
> > You should set HOD_PYTHON_HOME to the path to a 2.5.x executable (not the
> > directory).
> >
> > Antonio
> >
> > On Mon, Mar 22, 2010 at 5:07 PM, Boyu Zhang 
> wrote:
> >
> > > Updata: I used the command: $ bin/hod allocate -d /home/zhang/cluster
>  -n
> > 4
> > > -c /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > > /home/zhang/hadoop-0.20.2.tar.gz -b 4
> > >
> > > and I get the errot:  Using Python: 2.4.3 (#1, Sep  3 2009, 15:37:37)
> > > [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
> > >
> > >  Uncaught Exception : need more than 2 values to unpack
> > >
> > > I have multiple versions of python running on my system, and I set
> > env-vars
> > > in hodrc file to point to the python 2.6.5 version(
> > > HOD_PYTHON_HOME=/opt/python/2.6.5/bin/python
> > > ). Do I need to do anything else, like export the HOD_PYTHON_HOM
> > > environment
> > > variable? Thanks a lot!
> > >
> > >
> > > On Mon, Mar 22, 2010 at 11:52 AM, Boyu Zhang 
> > > wrote:
> > >
> > > > Dear All,
> > > >
> > > > I have been trying to get HOD working on a cluster running Scyld. But
> > > there
> > > > are some problems. I configured the minimum configurations.
> > > >
> > > > 1.  I executed the command:
> > > > $ bin/hod allocate -d /home/zhang/cluster  -n 4 -c
> > > > /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > > > /home/zhang/hadoop-0.20.2.tar.gz
> > > > I get the error: file hod, line 576, finally: Syntax error. So I
> > > commented
> > > > out the line 576, and try again.
> > > >
> > > > 2. #[zh...@geronimo hod]$ bin/hod allocate -d /home/zhang/cluster
>  -n
> > 4
> > > -c
> > > > /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > > > /home/zhang/hadoop-0.20.2.tar.gz
> > > >   Uncaught Exception : need more than 2 values to unpack
> > > >
> > > > Could anyone tell me why am I having this error? Is the problem the
> > > > operating system, or Torque, or because I commented out line 576, or
> > > > anything else?
> > > >
> > > > Any comment is welcome and appreciated. Thanks a lot!
> > > >
> > > > Sincerely,
> > > >
> > > > Boyu Zhang
> > > >
> > >
> >
>


Re: HOD on Scyld

2010-03-23 Thread Boyu Zhang
Thanks a lot! I found the HOD_PYTHON_HOME error too :)

I had a new error after I used the correct version of Python:

--
INFO - Cluster Id 62.geronimo.xxx.xxx.xxx.edu
CRITICAL - Cluster could not be allocated because of the following errors on
the ringmaster host n3.
Could not retrive the version of Hadoop. Check the Hadoop installation or
the value of the hodring.java-home variable.
CRITICAL - Cannot allocate cluster /home/zhang/cluster
-

Is that because I set my java-home wrong? Thanks!

Boyu

On Tue, Mar 23, 2010 at 10:04 AM, Antonio D'Ettole wrote:

> Boyu,
> I've found that only Python 2.5.x works with HOD. Version 2.6.x will give
> you the exception.
> You should set HOD_PYTHON_HOME to the path to a 2.5.x executable (not the
> directory).
>
> Antonio
>
> On Mon, Mar 22, 2010 at 5:07 PM, Boyu Zhang  wrote:
>
> > Updata: I used the command: $ bin/hod allocate -d /home/zhang/cluster  -n
> 4
> > -c /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > /home/zhang/hadoop-0.20.2.tar.gz -b 4
> >
> > and I get the errot:  Using Python: 2.4.3 (#1, Sep  3 2009, 15:37:37)
> > [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
> >
> >  Uncaught Exception : need more than 2 values to unpack
> >
> > I have multiple versions of python running on my system, and I set
> env-vars
> > in hodrc file to point to the python 2.6.5 version(
> > HOD_PYTHON_HOME=/opt/python/2.6.5/bin/python
> > ). Do I need to do anything else, like export the HOD_PYTHON_HOM
> > environment
> > variable? Thanks a lot!
> >
> >
> > On Mon, Mar 22, 2010 at 11:52 AM, Boyu Zhang 
> > wrote:
> >
> > > Dear All,
> > >
> > > I have been trying to get HOD working on a cluster running Scyld. But
> > there
> > > are some problems. I configured the minimum configurations.
> > >
> > > 1.  I executed the command:
> > > $ bin/hod allocate -d /home/zhang/cluster  -n 4 -c
> > > /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > > /home/zhang/hadoop-0.20.2.tar.gz
> > > I get the error: file hod, line 576, finally: Syntax error. So I
> > commented
> > > out the line 576, and try again.
> > >
> > > 2. #[zh...@geronimo hod]$ bin/hod allocate -d /home/zhang/cluster  -n
> 4
> > -c
> > > /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > > /home/zhang/hadoop-0.20.2.tar.gz
> > >   Uncaught Exception : need more than 2 values to unpack
> > >
> > > Could anyone tell me why am I having this error? Is the problem the
> > > operating system, or Torque, or because I commented out line 576, or
> > > anything else?
> > >
> > > Any comment is welcome and appreciated. Thanks a lot!
> > >
> > > Sincerely,
> > >
> > > Boyu Zhang
> > >
> >
>


Re: HOD on Scyld

2010-03-23 Thread Antonio D'Ettole
Boyu,
I've found that only Python 2.5.x works with HOD. Version 2.6.x will give
you the exception.
You should set HOD_PYTHON_HOME to the path to a 2.5.x executable (not the
directory).

Antonio

On Mon, Mar 22, 2010 at 5:07 PM, Boyu Zhang  wrote:

> Update: I used the command: $ bin/hod allocate -d /home/zhang/cluster  -n 4
> -c /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> /home/zhang/hadoop-0.20.2.tar.gz -b 4
>
> and I get the error:  Using Python: 2.4.3 (#1, Sep  3 2009, 15:37:37)
> [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)]
>
>  Uncaught Exception : need more than 2 values to unpack
>
> I have multiple versions of python running on my system, and I set env-vars
> in hodrc file to point to the python 2.6.5 version(
> HOD_PYTHON_HOME=/opt/python/2.6.5/bin/python
> ). Do I need to do anything else, like export the HOD_PYTHON_HOME
> environment
> variable? Thanks a lot!
>
>
> On Mon, Mar 22, 2010 at 11:52 AM, Boyu Zhang 
> wrote:
>
> > Dear All,
> >
> > I have been trying to get HOD working on a cluster running Scyld. But
> there
> > are some problems. I configured the minimum configurations.
> >
> > 1.  I executed the command:
> > $ bin/hod allocate -d /home/zhang/cluster  -n 4 -c
> > /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > /home/zhang/hadoop-0.20.2.tar.gz
> > I get the error: file hod, line 576, finally: Syntax error. So I
> commented
> > out the line 576, and try again.
> >
> > 2. #[zh...@geronimo hod]$ bin/hod allocate -d /home/zhang/cluster  -n 4
> -c
> > /home/zhang/hadoop-0.20.2/contrib/hod/conf/hodrc -t
> > /home/zhang/hadoop-0.20.2.tar.gz
> >   Uncaught Exception : need more than 2 values to unpack
> >
> > Could anyone tell me why am I having this error? Is the problem the
> > operating system, or Torque, or because I commented out line 576, or
> > anything else?
> >
> > Any comment is welcome and appreciated. Thanks a lot!
> >
> > Sincerely,
> >
> > Boyu Zhang
> >
>


Re: a problem when starting datanodes

2010-03-23 Thread 毛宏
Yes, I can. I am using Ubuntu 9.10 on my namenode and Debian 4.0 on my
datanodes.

They all have /usr/bin/env


On Tue, 2010-03-23 at 21:23 +0800, liu chang wrote:
> On Tue, Mar 23, 2010 at 9:11 PM, 毛宏  wrote:
> > I use "file /usr/bin/env" to check if /usr/bin/env is present in my
> > system and the answer is yes.
> > But why does it still display
> >  datanode1:/usr/bin/env : bash:  No such file or directory
> >  datanode2:/usr/bin/env : bash:  No such file or directory?
> 
> Can you execute the 'bash' command at your shell? What UNIX system do you use?
> 
> > On Tue, 2010-03-23 at 19:56 +0800, liu chang wrote:
> >> Sorry, your error message says bash is not found. You should already
> >> have /usr/bin/env.
> >>
> >> Bash is installed by default in most Linux distributions. It could be
> >> that bash is not installed on your system, or your PATH environmental
> >> variable is somehow messed up. Can you execute 'bash' in your shell?
> >>
> >> On Tue, Mar 23, 2010 at 7:51 PM, liu chang  wrote:
> >> > Check if /usr/bin/env is present in your system:
> >> >
> >> > file /usr/bin/env
> >> >
> >> > If not, you probably have /bin/env instead. Verify using:
> >> >
> >> > file /bin/env
> >> >
> >> > If you have /bin/env but not /usr/bin/env, you can make a symbolic link 
> >> > for it:
> >> >
> >> > ln -s /usr/bin/env /bin/env
> >> >
> >> > You need to execute the command above as root.
> >> >
> >> > Liu Chang
> >> >
> >> > On Tue, Mar 23, 2010 at 7:48 PM, 毛宏  wrote:
> >> >> Hi all,
> >> >>I install Hadoop in three machines, my pc is the namenode, two 
> >> >> other pc
> >> >> are the datanodes, but when I execute  bin/start-dfs.sh, it displays
> >> >> these two line as follows:
> >> >>   datanode1: /usr/bin/env: bash: No such
> >> >> file or directory
> >> >>   datanode2: /usr/bin/env: bash: No such
> >> >> file or directory
> >> >>
> >> >>What does it mean?  How to solve this problem?
> >> >>Thanks for your attention~
> >> >>
> >> >>
> >> >
> >
> >
> >




Re: a problem when starting datanodes

2010-03-23 Thread liu chang
On Tue, Mar 23, 2010 at 9:11 PM, 毛宏  wrote:
> I use "file /usr/bin/env" to check if /usr/bin/env is present in my
> system and the answer is yes.
> But why does it still display
>  datanode1:/usr/bin/env : bash:  No such file or directory
>  datanode2:/usr/bin/env : bash:  No such file or directory        ?

Can you execute the 'bash' command at your shell? What UNIX system do you use?

> On Tue, 2010-03-23 at 19:56 +0800, liu chang wrote:
>> Sorry, your error message says bash is not found. You should already
>> have /usr/bin/env.
>>
>> Bash is installed by default in most Linux distributions. It could be
>> that bash is not installed on your system, or your PATH environmental
>> variable is somehow messed up. Can you execute 'bash' in your shell?
>>
>> On Tue, Mar 23, 2010 at 7:51 PM, liu chang  wrote:
>> > Check if /usr/bin/env is present in your system:
>> >
>> > file /usr/bin/env
>> >
>> > If not, you probably have /bin/env instead. Verify using:
>> >
>> > file /bin/env
>> >
>> > If you have /bin/env but not /usr/bin/env, you can make a symbolic link 
>> > for it:
>> >
>> > ln -s /usr/bin/env /bin/env
>> >
>> > You need to execute the command above as root.
>> >
>> > Liu Chang
>> >
>> > On Tue, Mar 23, 2010 at 7:48 PM, 毛宏  wrote:
>> >> Hi all,
>> >>        I install Hadoop in three machines, my pc is the namenode, two 
>> >> other pc
>> >> are the datanodes, but when I execute  bin/start-dfs.sh, it displays
>> >> these two line as follows:
>> >>                               datanode1: /usr/bin/env: bash: No such
>> >> file or directory
>> >>                               datanode2: /usr/bin/env: bash: No such
>> >> file or directory
>> >>
>> >>        What does it mean?  How to solve this problem?
>> >>        Thanks for your attention~
>> >>
>> >>
>> >
>
>
>


Re: a problem when starting datanodes

2010-03-23 Thread 毛宏
I use "file /usr/bin/env" to check if /usr/bin/env is present in my
system and the answer is yes.  
But why does it still display   
 datanode1:/usr/bin/env : bash:  No such file or directory  
 datanode2:/usr/bin/env : bash:  No such file or directory?


On Tue, 2010-03-23 at 19:56 +0800, liu chang wrote:
> Sorry, your error message says bash is not found. You should already
> have /usr/bin/env.
> 
> Bash is installed by default in most Linux distributions. It could be
> that bash is not installed on your system, or your PATH environmental
> variable is somehow messed up. Can you execute 'bash' in your shell?
> 
> On Tue, Mar 23, 2010 at 7:51 PM, liu chang  wrote:
> > Check if /usr/bin/env is present in your system:
> >
> > file /usr/bin/env
> >
> > If not, you probably have /bin/env instead. Verify using:
> >
> > file /bin/env
> >
> > If you have /bin/env but not /usr/bin/env, you can make a symbolic link for 
> > it:
> >
> > ln -s /usr/bin/env /bin/env
> >
> > You need to execute the command above as root.
> >
> > Liu Chang
> >
> > On Tue, Mar 23, 2010 at 7:48 PM, 毛宏  wrote:
> >> Hi all,
> >>I install Hadoop in three machines, my pc is the namenode, two 
> >> other pc
> >> are the datanodes, but when I execute  bin/start-dfs.sh, it displays
> >> these two line as follows:
> >>   datanode1: /usr/bin/env: bash: No such
> >> file or directory
> >>   datanode2: /usr/bin/env: bash: No such
> >> file or directory
> >>
> >>What does it mean?  How to solve this problem?
> >>Thanks for your attention~
> >>
> >>
> >




Re: execute mapreduce job on multiple hdfs files

2010-03-23 Thread Gang Luo
Hi Oleg,
you can call FileInputFormat.addInputPath(JobConf, Path) multiple times in your 
program to add arbitrary paths. If you use FileInputFormat.setInputPath instead, 
there can be only one input path.
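
For example, a sketch with the old mapred API (the paths reuse Oleg's example;
the class name is a placeholder):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class MultiInputExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MultiInputExample.class);
    // Add individual files one by one...
    FileInputFormat.addInputPath(conf, new Path("/my_hadoop_hdfs/my_folder/file1.txt"));
    FileInputFormat.addInputPath(conf, new Path("/my_hadoop_hdfs/my_folder/file2.txt"));
    FileInputFormat.addInputPath(conf, new Path("/my_hadoop_hdfs/my_folder/file3.txt"));
    // ...or simply point at the directory; every file inside it becomes an input:
    // FileInputFormat.setInputPaths(conf, new Path("/my_hadoop_hdfs/my_folder"));
    // (then set mapper/reducer/output path and run with JobClient.runJob(conf))
  }
}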

If you are talking about output, the path you give is an output directory; all 
the output files (part-0, part-1...) will be generated in that 
directory.

-Gang
 



----- Original Message -----
From: Oleg Ruchovets 
To: common-user@hadoop.apache.org
Sent: 2010/3/23 (Tue) 6:18:34 AM
Subject: execute mapreduce job on multiple hdfs files

Hi ,
All examples that I found executes mapreduce job on a single file but in my
situation I have more than one.

Suppose I have such folder on HDFS which contains some files:

/my_hadoop_hdfs/my_folder:
/my_hadoop_hdfs/my_folder/file1.txt
/my_hadoop_hdfs/my_folder/file2.txt
/my_hadoop_hdfs/my_folder/file3.txt


How can I execute  hadoop mapreduce on file1.txt , file2.txt and file3.txt?

Is it possible to provide to hadoop job folder as parameter and all files
will be produced by mapreduce job?

Thanks In Advance
Oleg






Re: a problem when starting datanodes

2010-03-23 Thread liu chang
Sorry, your error message says bash is not found. You should already
have /usr/bin/env.

Bash is installed by default in most Linux distributions. It could be
that bash is not installed on your system, or your PATH environment
variable is somehow messed up. Can you execute 'bash' in your shell?

On Tue, Mar 23, 2010 at 7:51 PM, liu chang  wrote:
> Check if /usr/bin/env is present in your system:
>
> file /usr/bin/env
>
> If not, you probably have /bin/env instead. Verify using:
>
> file /bin/env
>
> If you have /bin/env but not /usr/bin/env, you can make a symbolic link for 
> it:
>
> ln -s /usr/bin/env /bin/env
>
> You need to execute the command above as root.
>
> Liu Chang
>
> On Tue, Mar 23, 2010 at 7:48 PM, 毛宏  wrote:
>> Hi all,
>>        I install Hadoop in three machines, my pc is the namenode, two other 
>> pc
>> are the datanodes, but when I execute  bin/start-dfs.sh, it displays
>> these two line as follows:
>>                               datanode1: /usr/bin/env: bash: No such
>> file or directory
>>                               datanode2: /usr/bin/env: bash: No such
>> file or directory
>>
>>        What does it mean?  How to solve this problem?
>>        Thanks for your attention~
>>
>>
>


Re: a problem when starting datanodes

2010-03-23 Thread liu chang
Check if /usr/bin/env is present in your system:

file /usr/bin/env

If not, you probably have /bin/env instead. Verify using:

file /bin/env

If you have /bin/env but not /usr/bin/env, you can make a symbolic link for it:

ln -s /bin/env /usr/bin/env

You need to execute the command above as root.

Liu Chang

On Tue, Mar 23, 2010 at 7:48 PM, 毛宏  wrote:
> Hi all,
>        I install Hadoop in three machines, my pc is the namenode, two other pc
> are the datanodes, but when I execute  bin/start-dfs.sh, it displays
> these two line as follows:
>                               datanode1: /usr/bin/env: bash: No such
> file or directory
>                               datanode2: /usr/bin/env: bash: No such
> file or directory
>
>        What does it mean?  How to solve this problem?
>        Thanks for your attention~
>
>


a problem when starting datanodes

2010-03-23 Thread 毛宏
Hi all,
I installed Hadoop on three machines; my PC is the namenode and the two other
PCs are the datanodes. But when I execute bin/start-dfs.sh, it displays these
two lines:
   datanode1: /usr/bin/env: bash: No such file or directory
   datanode2: /usr/bin/env: bash: No such file or directory

What does it mean? How do I solve this problem?
Thanks for your attention~



Hadoop for accounting?

2010-03-23 Thread Marcos Medrado Rubinelli

Hi,

The wiki's "Powered By" page ( http://wiki.apache.org/hadoop/PoweredBy ) 
lists dozens of companies using Hadoop in production, some of them for 
mission-critical operations, but is anyone using it - or planning to - 
for anything that involves money, like calculating a bill from API 
usage, or microtransactions?


If not, what are your main concerns? What parts would you consider 
stable enough for this kind of use?


Thanks,
Marcos


execute mapreduce job on multiple hdfs files

2010-03-23 Thread Oleg Ruchovets
Hi ,
All the examples I found execute a mapreduce job on a single file, but in my
situation I have more than one.

Suppose I have such folder on HDFS which contains some files:

/my_hadoop_hdfs/my_folder:
/my_hadoop_hdfs/my_folder/file1.txt
/my_hadoop_hdfs/my_folder/file2.txt
/my_hadoop_hdfs/my_folder/file3.txt


How can I execute Hadoop mapreduce on file1.txt, file2.txt and file3.txt?

Is it possible to provide a folder as a parameter to the hadoop job so that all
the files in it will be processed by the mapreduce job?

Thanks In Advance
Oleg