Compile Hadoop 1.0.3 native library failed on mac 10.7.4

2012-06-08 Thread Yongwei Xing
Hello

I am trying to compile the Hadoop native library on Mac OS X.

My Mac OS X version is 10.7.4. My Hadoop version is 1.0.3.

I have installed zlib 1.2.7 and LZO 2.0.6 as below:

./configure -shared --prefix=/usr/local/[zlib/lzo]

make

make install


I checked /usr/local/zlib-1.2.7 and /usr/local/lzo-2.0.6; the header
files and libraries are there.

I changed .bash_profile as below:

export C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/local/zlib-1.2.7/include:/usr/local/lzo-2.0.6/include

export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/zlib-1.2.7/lib:/usr/local/lzo-2.0.6/lib

export CFLAGS="-arch x86_64"

I switch to the Hadoop folder and run:

ant -Dcompile.native=true compile-native

I get output like the following:

[exec] checking stddef.h usability... yes

 [exec] checking stddef.h presence... yes

 [exec] checking for stddef.h... yes

 [exec] checking jni.h usability... yes

 [exec] checking jni.h presence... yes

 [exec] checking for jni.h... yes

 [exec] checking zlib.h usability... yes

 [exec] checking zlib.h presence... yes

 [exec] checking for zlib.h... yes

 [exec] checking Checking for the 'actual' dynamic-library for '-lz'...

 [exec] configure: error: Can't find either 'objdump' or 'ldd' to
compute the dynamic library for '-lz'


BUILD FAILED
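
For context: Mac OS X does not ship GNU ldd, and this configure check only probes for
objdump or ldd; otool -L is the closest OS X equivalent for listing a binary's
dynamic-library dependencies, but the check does not look for it. A quick way to confirm
what is available locally (an illustrative session, not taken from the original post; the
libz path assumes the install prefix above):

    which objdump ldd      # typically finds neither on a stock OS X 10.7 install
    which otool            # /usr/bin/otool
    otool -L /usr/local/zlib-1.2.7/lib/libz.dylib   # lists the dylib's runtime dependencies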

Has anyone met this issue before?

Best Regards,

--


memory usage tasks

2012-06-08 Thread Koert Kuipers
silly question, but i have our hadoop slave boxes configured with 7 mappers
each, yet i see 14 java processes for user mapred on each box. and each
process takes up about 2GB, which is equal to my memory allocation
(mapred.child.java.opts=-Xmx2048m). so it is using twice as much memory as
i expected! why is that?


Sync and Data Replication

2012-06-08 Thread Mohit Anchlia
I am wondering about the role of sync in the replication of data to other nodes. Say a
client writes a line to a file in Hadoop; at this point the file handle is open
and sync has not been called. In this scenario, is the data also replicated to
other nodes as defined by the replication factor? In other words, if a crash
occurs at this point, do I have the data on other nodes?
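
For concreteness, a minimal sketch of the write-then-sync sequence being asked about
(the path and content are illustrative; FSDataOutputStream.sync() is the Hadoop 1.x call,
renamed hflush() in later APIs):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SyncExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/tmp/sync-example.txt"));
        out.writeBytes("one line of data\n");  // written, file handle still open
        out.sync();                            // flush buffered data out to the datanode pipeline
        out.close();                           // close completes the block
      }
    }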


hbase client security (cluster is secure)

2012-06-08 Thread Tony Dean
Hi all,

I have created a hadoop/hbase/zookeeper cluster that is secured and verified.
Now a simple test is to connect an hbase client (e.g., the shell) to see its
behavior.

Well, I get the following message on the hbase master: AccessControlException: 
authentication is required.

Looking at the code, it appears that the client passed the "simple" authentication
byte in the RPC header.  Why, I don't know.

My client configuration is as follows:

hbase-site.xml:

<property>
  <name>hbase.security.authentication</name>
  <value>kerberos</value>
</property>

<property>
  <name>hbase.rpc.engine</name>
  <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>

hbase-env.sh:
export HBASE_OPTS="$HBASE_OPTS 
-Djava.security.auth.login.config=/usr/local/hadoop/hbase/conf/hbase.jaas"

hbase.jaas:
Client {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=false
   useTicketCache=true
 };

I issue kinit for the client principal I want to use, then invoke the hbase shell.
I simply issue 'list' and see the error on the server.
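
For reference, the reproduction sequence described above as a shell session (the
principal name is illustrative, not from the original post):

    kinit hbaseuser@EXAMPLE.COM      # obtain a Kerberos ticket for the client principal
    hbase shell
    hbase(main):001:0> list          # triggers the AccessControlException on the master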

Any ideas what I am doing wrong?

Thanks so much!


_
From: Tony Dean
Sent: Tuesday, June 05, 2012 5:41 PM
To: common-user@hadoop.apache.org
Subject: hadoop file permission 1.0.3 (security)


Can someone detail the options that are available to set file permissions at the
Hadoop and OS level?  Here's what I have discovered thus far:

dfs.permissions = true|false (works as advertised)
dfs.supergroup = supergroup (works as advertised)
dfs.umaskmode = umask (I believe this should be used in lieu of dfs.umask) - it
appears to set the permissions for files created in the Hadoop fs (minus execute
permission). Why was dfs.umask deprecated? What's the difference between the two?
dfs.datanode.data.dir.perm = perm (not sure this is working at all?) - I thought
it was supposed to set permissions on block files at the OS level.
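
For illustration, a minimal sketch of how two of the entries above would look in
hdfs-site.xml (the values are illustrative, not from the original message):

    <property>
      <name>dfs.umaskmode</name>
      <!-- illustrative umask applied to files created through HDFS -->
      <value>077</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir.perm</name>
      <!-- illustrative permission for the datanode's local block directories -->
      <value>700</value>
    </property>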

Are there any other file permission configuration properties?

What I would really like to do is set data block file permissions at the OS level
so that the blocks are locked down from all users except the superuser and
supergroup, but still allow access through the Hadoop API as governed by HDFS
permissions.  Is this possible?

Thanks.


Tony Dean
SAS Institute Inc.
Senior Software Developer
919-531-6704






Re: decommissioning datanodes

2012-06-08 Thread Chris Grier
Thanks, this seems to work now.

Note that the parameter is 'dfs.hosts' instead of 'dfs.hosts.include'.
(Also, the normal caveats like hostnames are case sensitive).
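
For reference, a minimal hdfs-site.xml sketch of the combination described in this
thread (paths as used below; per the note above, the include-list property is
dfs.hosts, not dfs.hosts.include):

    <property>
      <name>dfs.hosts</name>
      <value>/opt/hadoop/hadoop-1.0.0/conf/include</value>
    </property>
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
    </property>

followed by 'hadoop dfsadmin -refreshNodes' as in the original post.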

-Chris

On Fri, Jun 8, 2012 at 12:19 PM, Serge Blazhiyevskyy <
serge.blazhiyevs...@nice.com> wrote:

> Your config should be something like this:
>
> >
> >dfs.hosts.exclude
> >/opt/hadoop/hadoop-1.0.0/conf/exclude
> >
>
> >
> >dfs.hosts.include
> >/opt/hadoop/hadoop-1.0.0/conf/include
> >
>
>
>
> >
> >Add to exclude file:
> >
> >host1
> >host2
> >
>
>
>
> Add to include file
> >host1
> >host2
> Plus the rest of the nodes
>
>
>
>
> On 6/8/12 12:15 PM, "Chris Grier"  wrote:
>
> >Do you mean the file specified by the 'dfs.hosts' parameter? That is not
> >currently set in my configuration (the hosts are only specified in the
> >slaves file).
> >
> >-Chris
> >
> >On Fri, Jun 8, 2012 at 11:56 AM, Serge Blazhiyevskyy <
> >serge.blazhiyevs...@nice.com> wrote:
> >
> >> Your nodes need to be in include and exclude file in the same time
> >>
> >>
> >> Do you use both files?
> >>
> >> On 6/8/12 11:46 AM, "Chris Grier"  wrote:
> >>
> >> >Hello,
> >> >
> >> >I'm in the trying to figure out how to decommission data nodes. Here's
> >> >what
> >> >I do:
> >> >
> >> >In hdfs-site.xml I have:
> >> >
> >> >
> >> >dfs.hosts.exclude
> >> >/opt/hadoop/hadoop-1.0.0/conf/exclude
> >> >
> >> >
> >> >Add to exclude file:
> >> >
> >> >host1
> >> >host2
> >> >
> >> >Then I run 'hadoop dfsadmin -refreshNodes'. On the web interface the
> >>two
> >> >nodes now appear in both the 'Live Nodes' and 'Dead Nodes' (but there's
> >> >nothing in the Decommissioning Nodes list). If I look at the datanode
> >>logs
> >> >running on host1 or host2, I still see blocks being copied in and it
> >>does
> >> >not appear that any additional replication was happening.
> >> >
> >> >What am I missing during the decommission process?
> >> >
> >> >-Chris
> >>
> >>
>
>


Re: decommissioning datanodes

2012-06-08 Thread Serge Blazhiyevskyy
Your config should be something like this:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
</property>

<property>
  <name>dfs.hosts.include</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/include</value>
</property>

Add to the exclude file:

host1
host2

Add to the include file:

host1
host2
(plus the rest of the nodes)




On 6/8/12 12:15 PM, "Chris Grier"  wrote:

>Do you mean the file specified by the 'dfs.hosts' parameter? That is not
>currently set in my configuration (the hosts are only specified in the
>slaves file).
>
>-Chris
>
>On Fri, Jun 8, 2012 at 11:56 AM, Serge Blazhiyevskyy <
>serge.blazhiyevs...@nice.com> wrote:
>
>> Your nodes need to be in include and exclude file in the same time
>>
>>
>> Do you use both files?
>>
>> On 6/8/12 11:46 AM, "Chris Grier"  wrote:
>>
>> >Hello,
>> >
>> >I'm in the trying to figure out how to decommission data nodes. Here's
>> >what
>> >I do:
>> >
>> >In hdfs-site.xml I have:
>> >
>> >
>> >dfs.hosts.exclude
>> >/opt/hadoop/hadoop-1.0.0/conf/exclude
>> >
>> >
>> >Add to exclude file:
>> >
>> >host1
>> >host2
>> >
>> >Then I run 'hadoop dfsadmin -refreshNodes'. On the web interface the
>>two
>> >nodes now appear in both the 'Live Nodes' and 'Dead Nodes' (but there's
>> >nothing in the Decommissioning Nodes list). If I look at the datanode
>>logs
>> >running on host1 or host2, I still see blocks being copied in and it
>>does
>> >not appear that any additional replication was happening.
>> >
>> >What am I missing during the decommission process?
>> >
>> >-Chris
>>
>>



Re: decommissioning datanodes

2012-06-08 Thread Chris Grier
Do you mean the file specified by the 'dfs.hosts' parameter? That is not
currently set in my configuration (the hosts are only specified in the
slaves file).

-Chris

On Fri, Jun 8, 2012 at 11:56 AM, Serge Blazhiyevskyy <
serge.blazhiyevs...@nice.com> wrote:

> Your nodes need to be in include and exclude file in the same time
>
>
> Do you use both files?
>
> On 6/8/12 11:46 AM, "Chris Grier"  wrote:
>
> >Hello,
> >
> >I'm in the trying to figure out how to decommission data nodes. Here's
> >what
> >I do:
> >
> >In hdfs-site.xml I have:
> >
> >
> >dfs.hosts.exclude
> >/opt/hadoop/hadoop-1.0.0/conf/exclude
> >
> >
> >Add to exclude file:
> >
> >host1
> >host2
> >
> >Then I run 'hadoop dfsadmin -refreshNodes'. On the web interface the two
> >nodes now appear in both the 'Live Nodes' and 'Dead Nodes' (but there's
> >nothing in the Decommissioning Nodes list). If I look at the datanode logs
> >running on host1 or host2, I still see blocks being copied in and it does
> >not appear that any additional replication was happening.
> >
> >What am I missing during the decommission process?
> >
> >-Chris
>
>


Re: decommissioning datanodes

2012-06-08 Thread Serge Blazhiyevskyy
Your nodes need to be in the include and the exclude file at the same time.


Do you use both files?

On 6/8/12 11:46 AM, "Chris Grier"  wrote:

>Hello,
>
>I'm in the trying to figure out how to decommission data nodes. Here's
>what
>I do:
>
>In hdfs-site.xml I have:
>
>
>dfs.hosts.exclude
>/opt/hadoop/hadoop-1.0.0/conf/exclude
>
>
>Add to exclude file:
>
>host1
>host2
>
>Then I run 'hadoop dfsadmin -refreshNodes'. On the web interface the two
>nodes now appear in both the 'Live Nodes' and 'Dead Nodes' (but there's
>nothing in the Decommissioning Nodes list). If I look at the datanode logs
>running on host1 or host2, I still see blocks being copied in and it does
>not appear that any additional replication was happening.
>
>What am I missing during the decommission process?
>
>-Chris



decommissioning datanodes

2012-06-08 Thread Chris Grier
Hello,

I'm trying to figure out how to decommission data nodes. Here's what
I do:

In hdfs-site.xml I have:


<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
</property>


Add to exclude file:

host1
host2

Then I run 'hadoop dfsadmin -refreshNodes'. On the web interface the two
nodes now appear in both the 'Live Nodes' and 'Dead Nodes' (but there's
nothing in the Decommissioning Nodes list). If I look at the datanode logs
running on host1 or host2, I still see blocks being copied in and it does
not appear that any additional replication was happening.

What am I missing during the decommission process?

-Chris


Re: Hadoop-Git-Eclipse

2012-06-08 Thread shashwat shriparv
Check out these threads:

http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/22976
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201012.mbox/%3c4cff292d.3090...@corp.mail.ru%3E


On Fri, Jun 8, 2012 at 6:24 PM, Prajakta Kalmegh  wrote:

> Hi
>
> Yes I did configure using the wiki link at
> http://wiki.apache.org/hadoop/EclipseEnvironment.
> I am facing a new problem while setting up Hadoop in Psuedo-distributed
> mode on my laptop.  I am trying to execute the following commands for
> setting up Hadoop:
> hdfs namenode -format
> hdfs namenode
> hdfs datanode
> yarn resourcemanager
> yarn nodemanager
>
> It gives me a "Hadoop Common not found." error for all the commands. When I
> try to use "hadoop namenode -format" instead, it gives me a deprecated
> command warning.
>
> I am following the instructions for setting up Hadoop with Eclipse given in
> - http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
> -
>
> http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html
>
> This issue is discussed in JIRA <
> https://issues.apache.org/jira/browse/HDFS-2014 > and is resolved. Not
> sure
> why I am getting the error.
>
> My environment variables look something like:
>
> HADOOP_COMMON_HOME=/home/Projects/hadoop-common/hadoop-common-project/hadoop-common/target/hadoop-common-3.0.0-SNAPSHOT
>
> HADOOP_CONF_DIR=/home/Projects/hadoop-common/hadoop-common-project/hadoop-common/target/hadoop-common-3.0.0-SNAPSHOT/etc/hadoop
>
> HADOOP_HDFS_HOME=/home/Projects/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-3.0.0-SNAPSHOT
>
> HADOOP_MAPRED_HOME=/home/Projects/hadoop-common/hadoop-mapreduce-project/target/hadoop-mapreduce-3.0.0-SNAPSHOT
>
> YARN_HOME=/home/Projects/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/target/hadoop-yarn-common-3.0.0-SNAPSHOT
>
> YARN_CONF_DIR=/home/Projects/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/conf
>
> I have included them in the PATH. I am trying to build and setup from
> apache-hadoop-common git repository (my own cloned fork). Any idea why
> 'Hadoop Common Not found' error is coming? Do I have to add anything to the
> hadoop-config.sh or hdfs-config.sh?
>
> Regards,
> Prajakta
>
>
>
>
>
> Deniz Demir 
> 06/08/2012 05:35 PM
> Please respond to
> common-user@hadoop.apache.org
>  To
> common-user@hadoop.apache.org,
>  cc
>  Subject
>  Re: Hadoop-Git-Eclipse
>
>
> I did not find that screencast useful. This one worked for me:
>
> http://wiki.apache.org/hadoop/EclipseEnvironment
>
> Best,
> Deniz
>
> On Jun 8, 2012, at 1:08 AM, shashwat shriparv wrote:
>
> > Check out this link:
> >
>
> http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/
> >
> > Regards
> >
> > ∞
> > Shashwat Shriparv
> >
> >
> >
> >
> > On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh  >wrote:
> >
> >> Hi
> >>
> >> I have done MapReduce programming using Eclipse before but now I need to
> >> learn the Hadoop code internals for one of my projects.
> >>
> >> I have forked Hadoop from github (
> https://github.com/apache/hadoop-common
> >> ) and need to configure it to work with Eclipse. All the links I could
> >> find list steps for earlier versions of Hadoop. I am right now following
> >> instructions given in these links:
> >> - http://wiki.apache.org/hadoop/GitAndHadoop
> >> - http://wiki.apache.org/hadoop/EclipseEnvironment
> >> - http://wiki.apache.org/hadoop/HowToContribute
> >>
> >> Can someone please give me a link to the steps to be followed for
> getting
> >> Hadoop (latest from trunk) started in Eclipse? I need to be able to
> commit
> >> changes to my forked repository on github.
> >>
> >> Thanks in advance.
> >> Regards,
> >> Prajakta
> >
> >
> >
> >
> > --
> >
> >
> > ∞
> > Shashwat Shriparv
>



-- 


∞
Shashwat Shriparv


Re: Hadoop-Git-Eclipse

2012-06-08 Thread Prajakta Kalmegh
Hi

Yes I did configure using the wiki link at
http://wiki.apache.org/hadoop/EclipseEnvironment.
I am facing a new problem while setting up Hadoop in pseudo-distributed
mode on my laptop.  I am trying to execute the following commands for
setting up Hadoop:
hdfs namenode -format
hdfs namenode
hdfs datanode
yarn resourcemanager
yarn nodemanager

It gives me a "Hadoop Common not found." error for all the commands. When I
try to use "hadoop namenode -format" instead, it gives me a deprecated
command warning.

I am following the instructions for setting up Hadoop with Eclipse given in
- http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
-
http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html

This issue is discussed in JIRA <
https://issues.apache.org/jira/browse/HDFS-2014 > and is resolved. Not sure
why I am getting the error.

My environment variables look something like:
HADOOP_COMMON_HOME=/home/Projects/hadoop-common/hadoop-common-project/hadoop-common/target/hadoop-common-3.0.0-SNAPSHOT
HADOOP_CONF_DIR=/home/Projects/hadoop-common/hadoop-common-project/hadoop-common/target/hadoop-common-3.0.0-SNAPSHOT/etc/hadoop
HADOOP_HDFS_HOME=/home/Projects/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-3.0.0-SNAPSHOT
HADOOP_MAPRED_HOME=/home/Projects/hadoop-common/hadoop-mapreduce-project/target/hadoop-mapreduce-3.0.0-SNAPSHOT
YARN_HOME=/home/Projects/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/target/hadoop-yarn-common-3.0.0-SNAPSHOT
YARN_CONF_DIR=/home/Projects/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/conf

I have included them in the PATH. I am trying to build and set up from the
apache-hadoop-common git repository (my own cloned fork). Any idea why the
'Hadoop Common not found' error is occurring? Do I have to add anything to
hadoop-config.sh or hdfs-config.sh?

Regards,
Prajakta





Deniz Demir 
06/08/2012 05:35 PM
Please respond to
common-user@hadoop.apache.org
 To
common-user@hadoop.apache.org,
 cc
 Subject
 Re: Hadoop-Git-Eclipse


I did not find that screencast useful. This one worked for me:

http://wiki.apache.org/hadoop/EclipseEnvironment

Best,
Deniz

On Jun 8, 2012, at 1:08 AM, shashwat shriparv wrote:

> Check out this link:
>
http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/
>
> Regards
>
> ∞
> Shashwat Shriparv
>
>
>
>
> On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh wrote:
>
>> Hi
>>
>> I have done MapReduce programming using Eclipse before but now I need to
>> learn the Hadoop code internals for one of my projects.
>>
>> I have forked Hadoop from github (https://github.com/apache/hadoop-common
>> ) and need to configure it to work with Eclipse. All the links I could
>> find list steps for earlier versions of Hadoop. I am right now following
>> instructions given in these links:
>> - http://wiki.apache.org/hadoop/GitAndHadoop
>> - http://wiki.apache.org/hadoop/EclipseEnvironment
>> - http://wiki.apache.org/hadoop/HowToContribute
>>
>> Can someone please give me a link to the steps to be followed for getting
>> Hadoop (latest from trunk) started in Eclipse? I need to be able to
commit
>> changes to my forked repository on github.
>>
>> Thanks in advance.
>> Regards,
>> Prajakta
>
>
>
>
> --
>
>
> ∞
> Shashwat Shriparv


Re: Hadoop-Git-Eclipse

2012-06-08 Thread Deniz Demir
I did not find that screencast useful. This one worked for me:

http://wiki.apache.org/hadoop/EclipseEnvironment

Best,
Deniz

On Jun 8, 2012, at 1:08 AM, shashwat shriparv wrote:

> Check out this link:
> http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/
> 
> Regards
> 
> ∞
> Shashwat Shriparv
> 
> 
> 
> 
> On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh wrote:
> 
>> Hi
>> 
>> I have done MapReduce programming using Eclipse before but now I need to
>> learn the Hadoop code internals for one of my projects.
>> 
>> I have forked Hadoop from github (https://github.com/apache/hadoop-common
>> ) and need to configure it to work with Eclipse. All the links I could
>> find list steps for earlier versions of Hadoop. I am right now following
>> instructions given in these links:
>> - http://wiki.apache.org/hadoop/GitAndHadoop
>> - http://wiki.apache.org/hadoop/EclipseEnvironment
>> - http://wiki.apache.org/hadoop/HowToContribute
>> 
>> Can someone please give me a link to the steps to be followed for getting
>> Hadoop (latest from trunk) started in Eclipse? I need to be able to commit
>> changes to my forked repository on github.
>> 
>> Thanks in advance.
>> Regards,
>> Prajakta
> 
> 
> 
> 
> -- 
> 
> 
> ∞
> Shashwat Shriparv



RE: InvalidJobConfException

2012-06-08 Thread Devaraj k
By default it uses TextOutputFormat (a subclass of FileOutputFormat), which
checks for an output path.

You can use NullOutputFormat, or a custom output format that doesn't do
anything, for your job.



Thanks
Devaraj


From: huanchen.zhang [huanchen.zh...@ipinyou.com]
Sent: Friday, June 08, 2012 4:16 PM
To: common-user
Subject: InvalidJobConfException

Hi,

Here I'm developing a MapReduce web crawler which reads URL lists and writes
HTML to MongoDB.
Each map task reads one URL list file, fetches the HTML, and inserts it into MongoDB.
There is no reduce phase and no map output. So how should I set the output directory
in this case? If I do not set the output directory, I get the following exception:

Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: 
Output directory not set.
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
at 
com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)


Thank you !

Best,
Huanchen


2012-06-08



huanchen.zhang


Re: InvalidJobConfException

2012-06-08 Thread Harsh J
Hi Huanchen,

Just set your output format class to NullOutputFormat
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.html
if you don't need any direct outputs to HDFS/etc. from your M/R
classes.
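
A minimal map-only driver along those lines, as a sketch (the class names and the
input-path argument are illustrative, not from the original post):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class CrawlJob {
      // Map-only task: fetch each URL and write the page to an external store (e.g. MongoDB).
      public static class FetchMapper
          extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
        @Override
        protected void map(LongWritable offset, Text url, Context ctx)
            throws IOException, InterruptedException {
          // fetch url.toString() and insert the HTML into MongoDB here;
          // nothing is emitted through the Hadoop output path
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "url-crawl");
        job.setJarByClass(CrawlJob.class);
        job.setMapperClass(FetchMapper.class);
        job.setNumReduceTasks(0);                             // no reduce phase
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0])); // directory of URL list files
        job.setOutputFormatClass(NullOutputFormat.class);     // no output directory required
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }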

On Fri, Jun 8, 2012 at 4:16 PM, huanchen.zhang
 wrote:
> Hi,
>
> Here I'm developing a MapReduce web crawler which reads url lists and writes 
> html to MongoDB.
> So, each map read one url list file, get the html and insert to MongoDB. 
> There is no reduce and no output of map. So, how to set the output directory 
> in this case? If I do not set the output directory, it gives me following 
> exception,
>
> Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: 
> Output directory not set.
>        at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
>        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
>        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>        at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
>        at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
>        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
>        at 
> com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:56)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>
>
> Thank you !
>
> Best,
> Huanchen
>
>
> 2012-06-08
>
>
>
> huanchen.zhang



-- 
Harsh J


InvalidJobConfException

2012-06-08 Thread huanchen.zhang
Hi,

Here I'm developing a MapReduce web crawler which reads URL lists and writes
HTML to MongoDB.
Each map task reads one URL list file, fetches the HTML, and inserts it into MongoDB.
There is no reduce phase and no map output. So how should I set the output directory
in this case? If I do not set the output directory, I get the following exception:

Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: 
Output directory not set.
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
at 
com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)


Thank you ! 

Best,
Huanchen
  

2012-06-08 



huanchen.zhang 


Re: Hadoop command not found:hdfs and yarn

2012-06-08 Thread Jagat Singh
Hello ,

Can you quickly review your Hadoop install against the page below? Maybe you will get
some hints for the install.

http://jugnu-life.blogspot.in/2012/05/hadoop-20-install-tutorial-023x.html

The deprecation warning is correct, as the hadoop command's jobs have now been divided up.

Regards,

Jagat Singh

On Fri, Jun 8, 2012 at 2:56 PM, Prajakta Kalmegh wrote:

> Hi
>
> I am trying to execute the following commands for setting up Hadoop:
> # Format the namenode
> hdfs namenode -format
> # Start the namenode
> hdfs namenode
> # Start a datanode
> hdfs datanode
>
> yarn resourcemanager
> yarn nodemanager
>
> It gives me a "Hadoop Command not found." error for all the commands. When
> I try to use "hadoop namenode -format" instead, it gives me a deprecated
> command warning. Can someone please tell me if I am missing including any
> env variables? I have included HADOOP_COMMON_HOME, HADOOP_HDFS_HOME,
> HADOOP_MAPRED_HOME, YARN_HOME, HADOOP_CONF_DIR, YARN_CONF_DIR,
> HADOOP_PREFIX in my path (apart from java etc).
>
> I am following the instructions for setting up Hadoop with Eclipse given
> in
> - http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
> -
>
> http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html
>
> Regards,
> Prajakta
>
>


AUTO: Prabhat Pandey is out of the office (returning 06/28/2012)

2012-06-08 Thread Prabhat Pandey


I am out of the office until 06/28/2012.
For any issues please contact Dispatcher: dbqor...@us.ibm.com
Thanks.

Prabhat Pandey


Note: This is an automated response to your message  "Nutch hadoop
integration" sent on 06/08/2012 1:59:22.

This is the only notification you will receive while this person is away.

Re: Nutch hadoop integration

2012-06-08 Thread abhishek tiwari
http://wiki.apache.org/nutch/NutchHadoopTutorial

The above tutorial is not working for me.
I am using Nutch 1.4. Can you give the steps? What property do I have to set
in nutch-site.xml?

On Fri, Jun 8, 2012 at 1:34 PM, shashwat shriparv  wrote:

> Check out these links :
>
> http://wiki.apache.org/nutch/NutchHadoopTutorial
>
> http://wiki.apache.org/nutch/NutchTutorial
> http://joey.mazzarelli.com/2007/07/25/nutch-and-hadoop-as-user-with-nfs/
>
> http://stackoverflow.com/questions/5301883/run-nutch-on-existing-hadoop-cluster
>
> Regards
>
> ∞
> Shashwat Shriparv
>
> On Fri, Jun 8, 2012 at 1:29 PM, abhishek tiwari <
> abhishektiwari.u...@gmail.com> wrote:
>
> > how can i integrate hadood and nutch ..anyone please brief me .
> >
>
>
>
> --
>
>
> ∞
> Shashwat Shriparv
>


Hadoop command not found:hdfs and yarn

2012-06-08 Thread Prajakta Kalmegh
Hi

I am trying to execute the following commands for setting up Hadoop:
# Format the namenode
hdfs namenode -format
# Start the namenode
hdfs namenode
# Start a datanode
hdfs datanode

yarn resourcemanager
yarn nodemanager

It gives me a "Hadoop Command not found." error for all the commands. When 
I try to use "hadoop namenode -format" instead, it gives me a deprecated 
command warning. Can someone please tell me if I am missing any
env variables? I have included HADOOP_COMMON_HOME, HADOOP_HDFS_HOME,
HADOOP_MAPRED_HOME, YARN_HOME, HADOOP_CONF_DIR, YARN_CONF_DIR, and
HADOOP_PREFIX in my path (apart from Java etc.).

I am following the instructions for setting up Hadoop with Eclipse given 
in 
- http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
- 
http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html

Regards,
Prajakta



Re: Hadoop-Git-Eclipse

2012-06-08 Thread shashwat shriparv
Check out this link:
http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/

Regards

∞
Shashwat Shriparv




On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh wrote:

> Hi
>
> I have done MapReduce programming using Eclipse before but now I need to
> learn the Hadoop code internals for one of my projects.
>
> I have forked Hadoop from github (https://github.com/apache/hadoop-common
> ) and need to configure it to work with Eclipse. All the links I could
> find list steps for earlier versions of Hadoop. I am right now following
> instructions given in these links:
> - http://wiki.apache.org/hadoop/GitAndHadoop
> - http://wiki.apache.org/hadoop/EclipseEnvironment
> - http://wiki.apache.org/hadoop/HowToContribute
>
> Can someone please give me a link to the steps to be followed for getting
> Hadoop (latest from trunk) started in Eclipse? I need to be able to commit
> changes to my forked repository on github.
>
> Thanks in advance.
> Regards,
> Prajakta




-- 


∞
Shashwat Shriparv


Re: Nutch hadoop integration

2012-06-08 Thread shashwat shriparv
Check out these links :

http://wiki.apache.org/nutch/NutchHadoopTutorial

http://wiki.apache.org/nutch/NutchTutorial
http://joey.mazzarelli.com/2007/07/25/nutch-and-hadoop-as-user-with-nfs/
http://stackoverflow.com/questions/5301883/run-nutch-on-existing-hadoop-cluster

Regards

∞
Shashwat Shriparv

On Fri, Jun 8, 2012 at 1:29 PM, abhishek tiwari <
abhishektiwari.u...@gmail.com> wrote:

> how can i integrate hadood and nutch ..anyone please brief me .
>



-- 


∞
Shashwat Shriparv


Hadoop-Git-Eclipse

2012-06-08 Thread Prajakta Kalmegh
Hi 

I have done MapReduce programming using Eclipse before but now I need to 
learn the Hadoop code internals for one of my projects. 

I have forked Hadoop from github (https://github.com/apache/hadoop-common 
) and need to configure it to work with Eclipse. All the links I could 
find list steps for earlier versions of Hadoop. I am right now following 
instructions given in these links:
- http://wiki.apache.org/hadoop/GitAndHadoop 
- http://wiki.apache.org/hadoop/EclipseEnvironment 
- http://wiki.apache.org/hadoop/HowToContribute 

Can someone please give me a link to the steps to be followed for getting 
Hadoop (latest from trunk) started in Eclipse? I need to be able to commit 
changes to my forked repository on github. 

Thanks in advance.
Regards,
Prajakta

Re: Nutch hadoop integration

2012-06-08 Thread Biju Balakrishnan
> how can i integrate hadood and nutch ..anyone please brief me .
>

Just configure the Hadoop cluster.
Configure the Nutch path so that the Nutch crawl index and crawl list are stored in HDFS.
That's it.

-- 
*Biju*


Re: Nutch hadoop integration

2012-06-08 Thread Nitin Pawar
Maybe this will help you, if you have not already checked it:

http://wiki.apache.org/nutch/NutchHadoopTutorial

On Fri, Jun 8, 2012 at 1:29 PM, abhishek tiwari <
abhishektiwari.u...@gmail.com> wrote:

> how can i integrate hadood and nutch ..anyone please brief me .
>



-- 
Nitin Pawar