Re: HBase is able to connect to ZooKeeper but the connection closes immediately

2012-06-07 Thread Manu S
Hi All,

Thank you for your reply.

I tried all these options but I am still facing this issue.

@Mayank: I tried the same, but I am still getting the error.
export
HADOOP_CLASSPATH=/usr/lib/hadoop/:/usr/lib/hadoop/lib/:/usr/lib/hadoop/conf/
export
HBASE_CLASSPATH=/usr/lib/hbase/:/usr/lib/hbase/lib/:/usr/lib/hbase/conf/:/usr/lib/zookeeper/:/usr/lib/zookeeper/conf/:/usr/lib/zookeeper/lib/
export CLASSPATH=${HADOOP_CLASSPATH}:${HBASE_CLASSPATH}

@Marcos & Tariq:
We are using HBase version 0.90.4.
The job creates only a single HBaseConfiguration object.

@Kevin:
No luck, same error
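
For reference, a minimal sketch of the single-HBaseConfiguration reuse the error message asks for. In 0.90.x each distinct Configuration instance typically gets its own ZooKeeper connection, so creating a fresh HBaseConfiguration per HTable or per task can exhaust maxClientCnxns. The table, family, and value names below are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ReuseHBaseConfiguration {
  public static void main(String[] args) throws Exception {
    // Create ONE HBaseConfiguration and share it everywhere; it reads
    // hbase-site.xml (and hence hbase.zookeeper.quorum) from the classpath.
    Configuration conf = HBaseConfiguration.create();
    // If the config files are not on the job classpath, the quorum can be set explicitly.
    conf.set("hbase.zookeeper.quorum", "namenode");

    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    table.put(put);
    table.close();
  }
}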


Thanks,
Manu S

On Thu, Jun 7, 2012 at 3:50 AM, Mayank Bansal may...@apache.org wrote:


 zookeeper conf is not on the class path for the mapreduce job. Add conf
 file to class path for the job.

 Thanks,
 Mayank


 On Wed, Jun 6, 2012 at 7:25 AM, Manu S manupk...@gmail.com wrote:

 Hi All,

 We are running a mapreduce job in a fully distributed cluster. The output
 of the job is written to HBase.

 While running this job we are getting an error:

 *Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is 
 able to connect to ZooKeeper but the connection closes immediately. This 
 could be a sign that the server has too many connections (30 is the 
 default). Consider inspecting your ZK server logs for that error and then 
 make sure you are reusing HBaseConfiguration as often as you can. See 
 HTable's javadoc for more information.*
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:155)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1002)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:304)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:295)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:157)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:169)
 at 
 org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:36)


 I had gone through some threads related to this issue and modified the
 *zoo.cfg* accordingly. These configurations are the same on all the nodes.
 Please find the configuration of HBase & ZooKeeper:

 Hbase-site.xml:

 <configuration>

   <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
   </property>

   <property>
     <name>hbase.rootdir</name>
     <value>hdfs://namenode/hbase</value>
   </property>

   <property>
     <name>hbase.zookeeper.quorum</name>
     <value>namenode</value>
   </property>

 </configuration>


 Zoo.cfg:

 # The number of milliseconds of each tick
 tickTime=2000
 # The number of ticks that the initial
 # synchronization phase can take
 initLimit=10
 # The number of ticks that can pass between
 # sending a request and getting an acknowledgement
 syncLimit=5
 # the directory where the snapshot is stored.
 dataDir=/var/zookeeper
 # the port at which the clients will connect
 clientPort=2181
 #server.0=localhost:2888:3888
 server.0=namenode:2888:3888

 # Max Client connections ###
 *maxClientCnxns=1000
 minSessionTimeout=4000
 maxSessionTimeout=4*


 It would be really great if anyone can help me to resolve this issue by
 giving your thoughts/suggestions.

 Thanks,
 Manu S






java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException

2012-06-07 Thread huanchen.zhang
Hi,

I wrote a MapReduce program with the Hadoop Java API.

When I submitted the job to the cluster, I got the following errors:

Exception in thread main java.lang.NoClassDefFoundError: 
org/codehaus/jackson/map/JsonMappingException
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:489)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:487)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:475)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
at 
com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:52)
Caused by: java.lang.ClassNotFoundException: 
org.codehaus.jackson.map.JsonMappingException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 8 more


I found that the missing classes are in jackson-core-asl-1.5.2 and 
jackson-mapper-asl-1.5.2, so I added these two jars to the project and 
resubmitted the job. But I got the following errors:

Jun 7, 2012 4:18:55 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
Jun 7, 2012 4:18:55 PM org.apache.hadoop.util.NativeCodeLoader clinit
WARNING: Unable to load native-hadoop library for your platform... using 
builtin-java classes where applicable
Jun 7, 2012 4:18:55 PM org.apache.hadoop.mapred.JobClient copyAndConfigureFiles
WARNING: Use GenericOptionsParser for parsing the arguments. Applications 
should implement Tool for the same.
Jun 7, 2012 4:18:55 PM org.apache.hadoop.mapred.JobClient$2 run
INFO: Cleaning up the staging area 
file:/tmp/hadoop-huanchen/mapred/staging/huanchen757608919/.staging/job_local_0001
Exception in thread main 
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does 
not exist: file:/data/huanchen/pagecrawler/url
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
at 
com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:51)


Note that the error is "Input path does not exist: file:/" instead of "Input 
path does not exist: hdfs:/". So does it mean the job does not successfully 
connect to the hadoop cluster? Is the first NoClassDefFoundError: 
org/codehaus/jackson/map/JsonMappingException error also for this reason?

Does anyone have any ideas?
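
That reading is likely right: the file:/ paths and the job_local_0001 staging directory suggest the client fell back to the local filesystem and the local job runner because the cluster's *-site.xml files were not on the classpath. Below is a minimal sketch of loading the cluster configuration explicitly and shipping the job jar; the /etc/hadoop/conf paths, driver class name, and output path are assumptions, not taken from the original post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExtractFeatureFromURLDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Without core-site.xml / mapred-site.xml the defaults are file:/// and the
    // local job runner; loading them explicitly (paths are an assumption) points
    // the job at HDFS and the real JobTracker.
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));

    Job job = new Job(conf, "ExtractFeatureFromURL");
    // Ships the jar containing this class to the cluster; extra dependencies such
    // as the jackson jars still need to be bundled into that jar or passed via -libjars.
    job.setJarByClass(ExtractFeatureFromURLDriver.class);
    FileInputFormat.addInputPath(job, new Path("/data/huanchen/pagecrawler/url"));
    FileOutputFormat.setOutputPath(job, new Path("/data/huanchen/pagecrawler/url_features"));
    // Mapper/reducer classes would be set here as in the original job.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}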

Thank you !


Best,
Huanchen

2012-06-07 



huanchen.zhang 


Re: HBase is able to connect to ZooKeeper but the connection closes immediately

2012-06-07 Thread Manu S
Hi Tariq,

Thank you!!
I already changed the maxClientCnxns to 1000.
Also we have set CLASSPATH to include all the Hadoop, HBase & ZooKeeper
paths. I think copying the hadoop .jar files into the HBase lib folder has the
same effect as setting CLASSPATH with all the folders.
There is no commons-configuration-*.jar inside the hadoop/lib folder.

Any other options?

Thanks,
Manu S

On Thu, Jun 7, 2012 at 1:31 PM, Mohammad Tariq donta...@gmail.com wrote:

 Actually zookeeper servers have an active connections limit, which by
 default is 30. You can increase this limit by setting the maxClientCnxns
 property accordingly in your zookeeper config file, zoo.cfg. For
 example: maxClientCnxns=100. But before that, copy the
 hadoop-core-*.jar present inside the hadoop folder to the hbase/lib
 folder. Also copy commons-configuration-1.6.jar from the hadoop/lib folder
 to the hbase/lib folder, then check once and see if it works for you.

 Regards,
 Mohammad Tariq


 On Thu, Jun 7, 2012 at 1:13 PM, Manu S manupk...@gmail.com wrote:
  Hi All,
 
  Thank you for your reply.
 
  I tried all these options but still I am facing this issue.
 
  @Mayank: I tried the same, but still getting error.
  export
 
 HADOOP_CLASSPATH=/usr/lib/hadoop/:/usr/lib/hadoop/lib/:/usr/lib/hadoop/conf/
  export
 
 HBASE_CLASSPATH=/usr/lib/hbase/:/usr/lib/hbase/lib/:/usr/lib/hbase/conf/:/usr/lib/zookeeper/:/usr/lib/zookeeper/conf/:/usr/lib/zookeeper/lib/
  export CLASSPATH=${HADOOP_CLASSPATH}:${HBASE_CLASSPATH}
 
  @Marcos  Tariq:
  We are using Hbase version 0.90.4
  Job creating single HBaseConfiguration object only
 
  @Kevin:
  No luck, same error
 
 
  Thanks,
  Manu S
 
  On Thu, Jun 7, 2012 at 3:50 AM, Mayank Bansal may...@apache.org wrote:
 
 
  zookeeper conf is not on the class path for the mapreduce job. Add conf
  file to class path for the job.
 
  Thanks,
  Mayank
 
 
  On Wed, Jun 6, 2012 at 7:25 AM, Manu S manupk...@gmail.com wrote:
 
  Hi All,
 
  We are running a mapreduce job in a fully distributed cluster.The
 output
  of the job is writing to HBase.
 
  While running this job we are getting an error:
 
  *Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException:
 HBase is able to connect to ZooKeeper but the connection closes
 immediately. This could be a sign that the server has too many connections
 (30 is the default). Consider inspecting your ZK server logs for that error
 and then make sure you are reusing HBaseConfiguration as often as you can.
 See HTable's javadoc for more information.*
  at
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:155)
  at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1002)
  at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:304)
  at
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:295)
  at
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:157)
  at org.apache.hadoop.hbase.client.HTable.init(HTable.java:169)
  at
 org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:36)
 
 
  I had gone through some threads related to this issue and I modified
 the
  *zoo.cfg* accordingly. These configurations are same in all the nodes.
  Please find the configuration of HBase  ZooKeeper:
 
  Hbase-site.xml:
 
   <configuration>
  
   <property>
   <name>hbase.cluster.distributed</name>
   <value>true</value>
   </property>
  
   <property>
   <name>hbase.rootdir</name>
   <value>hdfs://namenode/hbase</value>
   </property>
  
   <property>
   <name>hbase.zookeeper.quorum</name>
   <value>namenode</value>
   </property>
  
   </configuration>
 
 
  Zoo.cfg:
 
  # The number of milliseconds of each tick
  tickTime=2000
  # The number of ticks that the initial
  # synchronization phase can take
  initLimit=10
  # The number of ticks that can pass between
  # sending a request and getting an acknowledgement
  syncLimit=5
  # the directory where the snapshot is stored.
  dataDir=/var/zookeeper
  # the port at which the clients will connect
  clientPort=2181
  #server.0=localhost:2888:3888
  server.0=namenode:2888:3888
 
  # Max Client connections ###
  *maxClientCnxns=1000
  minSessionTimeout=4000
  maxSessionTimeout=4*
 
 
  It would be really great if anyone can help me to resolve this issue
 by
  giving your thoughts/suggestions.
 
  Thanks,
  Manu S
 
 
 
 



kerberos mapreduce question

2012-06-07 Thread Koert Kuipers
with kerberos enabled a mapreduce job runs as the user that submitted it.
does this mean the user that submitted the job needs to have linux accounts
on all machines on the cluster?

how does mapreduce do this (run jobs as the user)? do the tasktrackers use
secure impersonation to run-as the user?

thanks! koert


Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain

2012-06-07 Thread shashwat shriparv
Are you able to ping:

yourpcipaddress
domainnameyougaveformachine
hostnameofthemachine


HBase stopping means it is not able to start itself on the IP or hostname which
you are giving.


On Thu, Jun 7, 2012 at 2:48 PM, Manu S manupk...@gmail.com wrote:

 Hi All,

 In a pseudo-distributed node, the HBase Master is stopping automatically when we
 start the HBase RegionServer.

 I have changed all the configuration files of Hadoop, HBase & ZooKeeper to
 set the exact hostname of the machine. I also commented out the localhost entry
 in /etc/hosts and cleared the cache. There is no localhost.localdomain entry
 in these configurations, but it is still resolving to localhost.localdomain.

 Please find the error:
 2012-06-07 12:13:11,995 INFO
 org.apache.hadoop.hbase.master.MasterFileSystem: No logs to split
 *2012-06-07 12:13:12,103 ERROR org.apache.hadoop.hbase.HServerAddress:
 Could not resolve the DNS name of localhost.localdomain
 2012-06-07 12:13:12,104 FATAL org.apache.hadoop.hbase.master.HMaster:
 Unhandled exception. Starting shutdown.*
 *java.lang.IllegalArgumentException: hostname can't be null*
at java.net.InetSocketAddress.init(InetSocketAddress.java:121)
at

 org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
at
 org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:64)
at

 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82)
at

 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73)
at

 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:222)
at

 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:240)
at

 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:487)
at
 org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:455)
at

 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:406)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:293)
 2012-06-07 12:13:12,106 INFO org.apache.hadoop.hbase.master.HMaster:
 Aborting
 2012-06-07 12:13:12,106 DEBUG org.apache.hadoop.hbase.master.HMaster:
 Stopping service threads

 Thanks,
 Manu S




-- 


∞
Shashwat Shriparv


Re: Ideal file size

2012-06-07 Thread M. C. Srivas
On Wed, Jun 6, 2012 at 10:14 AM, Mohit Anchlia mohitanch...@gmail.comwrote:

 On Wed, Jun 6, 2012 at 9:48 AM, M. C. Srivas mcsri...@gmail.com wrote:

  There are many more factors to consider than just the size of the file. How long can
  you wait before you *have to* process the data?  5 minutes? 5 hours? 5
  days?  If you want good timeliness, you need to roll over faster.  The
  longer you wait:
 
  1.  the lesser the load on the NN.
  2.  but the poorer the timeliness
  3.  and the larger chance of lost data  (ie, the data is not saved until
  the file is closed and rolled over, unless you want to sync() after every
  write)
 
  To begin with, I was going to use Flume and specify a rollover file size. I
 understand the above parameters; I just want to ensure that too many small
 files don't cause a problem on the NameNode. For instance, there would be
 times when we get GBs of data in an hour and at times only a few hundred MB. From
 what Harsh, Edward and you have described, it doesn't cause issues with the
 NameNode but rather an increase in processing time if there are too many
 small files. Looks like I need to find that balance.

 It would also be interesting to see how others solve this problem when not
 using Flume.



They use NFS with MapR.

Any and all log-rotators (like the one in log4j) simply just work over NFS,
and MapR does not have a NN, so the problems with small files or number of
files do not exist.





 
 
  On Wed, Jun 6, 2012 at 7:00 AM, Mohit Anchlia mohitanch...@gmail.com
  wrote:
 
   We have continuous flow of data into the sequence file. I am wondering
  what
   would be the ideal file size before file gets rolled over. I know too
  many
   small files are not good but could someone tell me what would be the
  ideal
   size such that it doesn't overload NameNode.
  
 



ArrayIndexOutOfBounds in TestFSMainOperationsLocalFileSystem

2012-06-07 Thread Amith D K
Hi



Currently I am using Hadoop 2.0.1, when I run the 
TestFSMainOperationsLocalFileSystem test class I am getting



org.apache.hadoop.fs.viewfs.TestFSMainOperationsLocalFileSystem.testWDAbsolute
java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.hadoop.fs.viewfs.InodeTree.createLink(InodeTree.java:237)
at 
org.apache.hadoop.fs.viewfs.InodeTree.init(InodeTree.java:334)
at 
org.apache.hadoop.fs.viewfs.ViewFileSystem$1.init(ViewFileSystem.java:178)
at 
org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:178)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2150)
at 
org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2184)
at 
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2166)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:302)
at 
org.apache.hadoop.fs.viewfs.ViewFileSystemTestSetup.setupForViewFileSystem(ViewFileSystemTestSetup.java:64)
at 
org.apache.hadoop.fs.viewfs.TestFSMainOperationsLocalFileSystem.setUp(TestFSMainOperationsLocalFileSystem.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at 
org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at 
org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)

Standard Output
2012-06-07 15:11:09,325 INFO  mortbay.log (Slf4jLog.java:info(67)) - Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-06-07 15:11:09,328 INFO  mortbay.log (Slf4jLog.java:info(67)) - Home dir 
base /

I am running the tests in SuSE 11,



can anyone please tell me what could be the problem



Thanks and Regards

Amith
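
In case it helps narrow it down: createLink in InodeTree is driven by the viewfs mount-table entries in the configuration, and a malformed link entry is one plausible way to hit an index out of bounds there (this is an assumption, not a confirmed diagnosis for this test). A minimal sketch of a well-formed mount table set programmatically; the mount-table name "default" and the target URIs are assumptions:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsMountTableCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // ViewFileSystem expects link keys of the form
    //   fs.viewfs.mounttable.<mounttable-name>.link.<mount-point> = <target-uri>
    conf.set("fs.viewfs.mounttable.default.link./user", "hdfs://namenode:8020/user");
    conf.set("fs.viewfs.mounttable.default.link./tmp", "hdfs://namenode:8020/tmp");

    // With no authority in the URI, the "default" mount table is used.
    FileSystem viewFs = FileSystem.get(URI.create("viewfs:///"), conf);
    System.out.println(java.util.Arrays.toString(viewFs.listStatus(new Path("/"))));
  }
}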


Re: kerberos mapreduce question

2012-06-07 Thread Mapred Learn
Yes, the user submitting a job needs to have an account on all the nodes.

Sent from my iPhone

On Jun 7, 2012, at 6:20 AM, Koert Kuipers ko...@tresata.com wrote:

 with kerberos enabled a mapreduce job runs as the user that submitted it.
 does this mean the user that submitted the job needs to have linux accounts
 on all machines on the cluster?
 
 how does mapreduce do this (run jobs as the user)? do the tasktrackers use
 secure impersonation to run-as the user?
 
 thanks! koert


Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain

2012-06-07 Thread Manu S
Thank you Harsh & Shashwat

I have set the hostname in /etc/sysconfig/network to pseudo-distributed; the
hostname command also returns this name. I added this name to the /etc/hosts
file and changed all the configuration accordingly, but ZooKeeper is still trying
to resolve localhost.localdomain. There were no entries for localhost.localdomain
in any conf files or hostname-related files.

Yes, everything is pinging, as I added the names to /etc/hosts.
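
For what it is worth, here is a small diagnostic sketch of what the JVM (and hence HBase) resolves for the local host; the hostname pseudo-distributed is taken from the description above, and the expected output is only an assumption:

import java.net.InetAddress;

public class ResolveCheck {
  public static void main(String[] args) throws Exception {
    // What Java reports as the local host; this is what shows up as
    // localhost.localdomain when /etc/hosts and the system hostname disagree.
    InetAddress local = InetAddress.getLocalHost();
    System.out.println("local host: " + local.getHostName() + " / " + local.getHostAddress());

    // Forward and reverse lookup of the configured hostname.
    InetAddress byName = InetAddress.getByName("pseudo-distributed");
    System.out.println("forward:    " + byName.getHostAddress());
    System.out.println("reverse:    " + byName.getCanonicalHostName());
  }
}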

On Thu, Jun 7, 2012 at 7:13 PM, shashwat shriparv dwivedishash...@gmail.com
 wrote:

 Are you able ping to

 yourpcipaddress
 domainnameyougaveformachine
 hostnameofthemachine


 Hbase stops means its not able to start itself on the ip or hostname which
 you are giving.


 On Thu, Jun 7, 2012 at 2:48 PM, Manu S manupk...@gmail.com wrote:

  Hi All,
 
  In pseudo distributed node HBaseMaster is stopping automatically when we
  starts HbaseRegion.
 
  I have changed all the configuration files of Hadoop,Hbase  Zookeeper to
  set the exact hostname of the machine. Also commented the localhost entry
  from /etc/hosts  cleared the cache as well. There is no entry of
  localhost.localdomain entry in these configurations, but this it is
  resolving to localhost.localdomain.
 
  Please find the error:
  2012-06-07 12:13:11,995 INFO
  org.apache.hadoop.hbase.master.MasterFileSystem: No logs to split
  *2012-06-07 12:13:12,103 ERROR org.apache.hadoop.hbase.HServerAddress:
  Could not resolve the DNS name of localhost.localdomain
  2012-06-07 12:13:12,104 FATAL org.apache.hadoop.hbase.master.HMaster:
  Unhandled exception. Starting shutdown.*
  *java.lang.IllegalArgumentException: hostname can't be null*
 at java.net.InetSocketAddress.init(InetSocketAddress.java:121)
 at
 
 
 org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
 at
  org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:64)
 at
 
 
 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82)
 at
 
 
 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73)
 at
 
 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:222)
 at
 
 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:240)
 at
 
 
 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:487)
 at
 
 org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:455)
 at
 
 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:406)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:293)
  2012-06-07 12:13:12,106 INFO org.apache.hadoop.hbase.master.HMaster:
  Aborting
  2012-06-07 12:13:12,106 DEBUG org.apache.hadoop.hbase.master.HMaster:
  Stopping service threads
 
  Thanks,
  Manu S
 



 --


 ∞
 Shashwat Shriparv



Re: kerberos mapreduce question

2012-06-07 Thread Koert Kuipers
thanks for your answer.

so at a large place like say yahoo, or facebook, assuming they use
kerberos, every analyst that uses hive has an account on every node of
their large cluster? sounds like an admin nightmare to me

On Thu, Jun 7, 2012 at 10:46 AM, Mapred Learn mapred.le...@gmail.comwrote:

 Yes, User submitting a job needs to have an account on all the nodes.

 Sent from my iPhone

 On Jun 7, 2012, at 6:20 AM, Koert Kuipers ko...@tresata.com wrote:

  with kerberos enabled a mapreduce job runs as the user that submitted
 it.
  does this mean the user that submitted the job needs to have linux
 accounts
  on all machines on the cluster?
 
  how does mapreduce do this (run jobs as the user)? do the tasktrackers
 use
  secure impersonation to run-as the user?
 
  thanks! koert



Re: kerberos mapreduce question

2012-06-07 Thread slim tebourbi
Hi,
take a look at this :
http://hadoop.apache.org/common/docs/r1.0.3/Secure_Impersonation.html

I think that it can help you.

Slim Tebourbi.
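
For reference, a minimal sketch of the secure-impersonation pattern that page describes, where a single trusted service account submits work on behalf of end users (the proxyuser settings in core-site.xml are still required; the user name "analyst" and the job details are assumptions):

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.UserGroupInformation;

public class SubmitAsProxyUser {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();

    // The Kerberos-authenticated service user impersonates "analyst".
    // core-site.xml must allow it via hadoop.proxyuser.<service>.hosts/groups.
    UserGroupInformation proxy =
        UserGroupInformation.createProxyUser("analyst", UserGroupInformation.getLoginUser());

    proxy.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        Job job = new Job(conf, "analyst-job");
        // ... input/output/mapper configuration as usual; the job runs as "analyst" ...
        job.waitForCompletion(true);
        return null;
      }
    });
  }
}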

2012/6/7 Koert Kuipers ko...@tresata.com

 thanks for your answer.

 so at a large place like say yahoo, or facebook, assuming they use
 kerberos, every analyst that uses hive has an account on every node of
 their large cluster? sounds like an admin nightmare to me

 On Thu, Jun 7, 2012 at 10:46 AM, Mapred Learn mapred.le...@gmail.com
 wrote:

  Yes, User submitting a job needs to have an account on all the nodes.
 
  Sent from my iPhone
 
  On Jun 7, 2012, at 6:20 AM, Koert Kuipers ko...@tresata.com wrote:
 
   with kerberos enabled a mapreduce job runs as the user that submitted
  it.
   does this mean the user that submitted the job needs to have linux
  accounts
   on all machines on the cluster?
  
   how does mapreduce do this (run jobs as the user)? do the tasktrackers
  use
   secure impersonation to run-as the user?
  
   thanks! koert
 



Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain

2012-06-07 Thread Stack
On Thu, Jun 7, 2012 at 2:18 AM, Manu S manupk...@gmail.com wrote:
 *2012-06-07 12:13:12,103 ERROR org.apache.hadoop.hbase.HServerAddress:
 Could not resolve the DNS name of localhost.localdomain

This is pretty basic.  Fix this first and then your hbase will work.

Please stop spraying your queries across multiple lists.  Doing so
makes us think you arrogant which I am sure is not the case.  Pick the
list that seems most appropriate.  For example, in this case, it seems
like the hbase-user list would have been the right place to write; not
common-user and cdh-user.  If it turns out you've chosen wrong,
usually the chosen list will help you figure the proper target.

Thanks,
St.Ack


Re: kerberos mapreduce question

2012-06-07 Thread Alejandro Abdelnur
If you provision your user/group information via LDAP to all your nodes it
is not a nightmare.

On Thu, Jun 7, 2012 at 7:49 AM, Koert Kuipers ko...@tresata.com wrote:

 thanks for your answer.

 so at a large place like say yahoo, or facebook, assuming they use
 kerberos, every analyst that uses hive has an account on every node of
 their large cluster? sounds like an admin nightmare to me

 On Thu, Jun 7, 2012 at 10:46 AM, Mapred Learn mapred.le...@gmail.com
 wrote:

  Yes, User submitting a job needs to have an account on all the nodes.
 
  Sent from my iPhone
 
  On Jun 7, 2012, at 6:20 AM, Koert Kuipers ko...@tresata.com wrote:
 
   with kerberos enabled a mapreduce job runs as the user that submitted
  it.
   does this mean the user that submitted the job needs to have linux
  accounts
   on all machines on the cluster?
  
   how does mapreduce do this (run jobs as the user)? do the tasktrackers
  use
   secure impersonation to run-as the user?
  
   thanks! koert
 




-- 
Alejandro


Re: Hadoop Eclipse Plugin set up

2012-06-07 Thread vonmixer
Hi Ali,

You need to go to the VM setting and set the network setting to 'Bridged'. 
Enjoy 
:)






Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain

2012-06-07 Thread shashwat shriparv
Hey Manu, which Linux distribution are you using?

On Thu, Jun 7, 2012 at 8:18 PM, Manu S manupk...@gmail.com wrote:

 Thank you Harsh  Shashwat

 I given the hostname in /etc/sysconfig/network as pseudo-distributed.
 hostname command returns this name also. I added this name in /etc/hosts
 file and changed all the configuration accordingly. But zookeeper is trying
 to resolve to localhost.localdomain. There was no entries in any conf files
 or hostname related files for localhost.localdomain.

 Yea, everything is pinging as I given the names in /etc/hosts.

 On Thu, Jun 7, 2012 at 7:13 PM, shashwat shriparv 
 dwivedishash...@gmail.com
  wrote:

  Are you able ping to
 
  yourpcipaddress
  domainnameyougaveformachine
  hostnameofthemachine
 
 
  Hbase stops means its not able to start itself on the ip or hostname
 which
  you are giving.
 
 
  On Thu, Jun 7, 2012 at 2:48 PM, Manu S manupk...@gmail.com wrote:
 
   Hi All,
  
   In pseudo distributed node HBaseMaster is stopping automatically when
 we
   starts HbaseRegion.
  
   I have changed all the configuration files of Hadoop,Hbase  Zookeeper
 to
   set the exact hostname of the machine. Also commented the localhost
 entry
   from /etc/hosts  cleared the cache as well. There is no entry of
   localhost.localdomain entry in these configurations, but this it is
   resolving to localhost.localdomain.
  
   Please find the error:
   2012-06-07 12:13:11,995 INFO
   org.apache.hadoop.hbase.master.MasterFileSystem: No logs to split
   *2012-06-07 12:13:12,103 ERROR org.apache.hadoop.hbase.HServerAddress:
   Could not resolve the DNS name of localhost.localdomain
   2012-06-07 12:13:12,104 FATAL org.apache.hadoop.hbase.master.HMaster:
   Unhandled exception. Starting shutdown.*
   *java.lang.IllegalArgumentException: hostname can't be null*
  at java.net.InetSocketAddress.init(InetSocketAddress.java:121)
  at
  
  
 
 org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
  at
   org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:64)
  at
  
  
 
 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82)
  at
  
  
 
 org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73)
  at
  
  
 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:222)
  at
  
  
 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:240)
  at
  
  
 
 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:487)
  at
  
 
 org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:455)
  at
  
  
 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:406)
  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:293)
   2012-06-07 12:13:12,106 INFO org.apache.hadoop.hbase.master.HMaster:
   Aborting
   2012-06-07 12:13:12,106 DEBUG org.apache.hadoop.hbase.master.HMaster:
   Stopping service threads
  
   Thanks,
   Manu S
  
 
 
 
  --
 
 
  ∞
  Shashwat Shriparv
 




-- 


∞
Shashwat Shriparv


Re: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException

2012-06-07 Thread shashwat shriparv
If you have

<property>
  <name>hadoop.tmp.dir</name>
  <value>../Hadoop/hdfs/tmp</value>
</property>

in your configuration file then remove it and try.

Thanks and regards

∞
Shashwat Shriparv


On Thu, Jun 7, 2012 at 1:56 PM, huanchen.zhang
huanchen.zh...@ipinyou.comwrote:

 Hi,

 I coded a map reduce program with hadoop java api.

 When I submitted the job to the cluster, I got the following errors:

 Exception in thread main java.lang.NoClassDefFoundError:
 org/codehaus/jackson/map/JsonMappingException
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:489)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:487)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:475)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
at
 com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:52)
 Caused by: java.lang.ClassNotFoundException:
 org.codehaus.jackson.map.JsonMappingException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 8 more


 I found the classes not found here is in jackson-core-asl-1.5.2 and
 jackson-mapper-asl-1.5.2, so added these two jars to the project and
 resubmitted the job. But I got the following errors:

 Jun 7, 2012 4:18:55 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
 INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
 Jun 7, 2012 4:18:55 PM org.apache.hadoop.util.NativeCodeLoader clinit
 WARNING: Unable to load native-hadoop library for your platform... using
 builtin-java classes where applicable
 Jun 7, 2012 4:18:55 PM org.apache.hadoop.mapred.JobClient
 copyAndConfigureFiles
 WARNING: Use GenericOptionsParser for parsing the arguments. Applications
 should implement Tool for the same.
 Jun 7, 2012 4:18:55 PM org.apache.hadoop.mapred.JobClient$2 run
 INFO: Cleaning up the staging area
 file:/tmp/hadoop-huanchen/mapred/staging/huanchen757608919/.staging/job_local_0001
 Exception in thread main
 org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
 does not exist: file:/data/huanchen/pagecrawler/url
at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at
 org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
at
 org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
at
 com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:51)


 Note that the error is Input path does not exist: file:/ instead of
  Input path does not exist: hdfs:/ . So does it mean the job does not
 successfully connect to the hadoop cluster? The first
 NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException error
 is also for this reason?

 Any one has any ideas?

 Thank you !


 Best,
 Huanchen

 2012-06-07



 huanchen.zhang




-- 


∞
Shashwat Shriparv


Re: namenode restart

2012-06-07 Thread Abhishek Pratap Singh
Hi Rita,

What kind of clients do you have? Please be sure that there is no job running,
and especially that no data is being copied, while you do the restart.
Also, are you restarting by using stop-dfs.sh and start-dfs.sh?

Regards,
Abhishek

On Thu, Jun 7, 2012 at 3:29 AM, Rita rmorgan...@gmail.com wrote:

 Running Hadoop 0.22 and I need to restart the namenode so my new rack
 configuration will be set into place. I am thinking of doing a quick stop
 and start of the namenode but what will happen to the current clients? Can
 they tolerate a 30 second hiccup by retrying?

 --
 --- Get your facts first, then you can distort them as you please.--



Re: Ideal file size

2012-06-07 Thread Abhishek Pratap Singh
Almost all the answers have already been provided in this post. My 2 cents: try
to have a file size that is a multiple of the block size, so that processing uses
fewer mappers and job performance is better.
You can also merge files in HDFS later on for processing (see the sketch after
this message).

Regards,
Abhishek
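
A minimal sketch of the merge idea mentioned above: packing many small HDFS files into one SequenceFile keyed by filename. The input and output paths are assumptions, not taken from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFileMerger {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path input = new Path("/data/incoming");            // hypothetical directory of small files
    Path output = new Path("/data/merged/part-0.seq");  // hypothetical merged output

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, output, Text.class, BytesWritable.class);
    try {
      for (FileStatus status : fs.listStatus(input)) {
        if (status.isDir()) continue;
        byte[] content = new byte[(int) status.getLen()];
        FSDataInputStream in = fs.open(status.getPath());
        try {
          in.readFully(content);   // files are small, so reading each whole file is fine
        } finally {
          IOUtils.closeStream(in);
        }
        // key = original filename, value = file contents
        writer.append(new Text(status.getPath().getName()), new BytesWritable(content));
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}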



On Thu, Jun 7, 2012 at 7:29 AM, M. C. Srivas mcsri...@gmail.com wrote:

 On Wed, Jun 6, 2012 at 10:14 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

  On Wed, Jun 6, 2012 at 9:48 AM, M. C. Srivas mcsri...@gmail.com wrote:
 
   Many factors to consider than just the size of the file.  . How long
 can
   you wait before you *have to* process the data?  5 minutes? 5 hours? 5
   days?  If you want good timeliness, you need to roll-over faster.  The
   longer you wait:
  
   1.  the lesser the load on the NN.
   2.  but the poorer the timeliness
   3.  and the larger chance of lost data  (ie, the data is not saved
 until
   the file is closed and rolled over, unless you want to sync() after
 every
   write)
  
   To Begin with I was going to use Flume and specify rollover file size.
 I
  understand the above parameters, I just want to ensure that too many
 small
  files doesn't cause problem on the NameNode. For instance there would be
  times when we get GBs of data in an hour and at times only few 100 MB.
 From
  what Harsh, Edward and you've described it doesn't cause issues with the
  NameNode but rather increase in processing times if there are too many
  small files. Looks like I need to find that balance.
 
  It would also be interesting to see how others solve this problem when
 not
  using Flume.
 


 They use NFS with MapR.

 Any and all log-rotators (like the one in log4j) simply just work over NFS,
 and MapR does not have a NN, so the problems with small files or number of
 files do not exist.



 
 
  
  
   On Wed, Jun 6, 2012 at 7:00 AM, Mohit Anchlia mohitanch...@gmail.com
   wrote:
  
We have continuous flow of data into the sequence file. I am
 wondering
   what
would be the ideal file size before file gets rolled over. I know too
   many
small files are not good but could someone tell me what would be the
   ideal
size such that it doesn't overload NameNode.