Re: HBase is able to connect to ZooKeeper but the connection closes immediately
Hi All,

Thank you for your reply. I tried all these options but still I am facing this issue.

@Mayank: I tried the same, but still getting the error.

export HADOOP_CLASSPATH=/usr/lib/hadoop/:/usr/lib/hadoop/lib/:/usr/lib/hadoop/conf/
export HBASE_CLASSPATH=/usr/lib/hbase/:/usr/lib/hbase/lib/:/usr/lib/hbase/conf/:/usr/lib/zookeeper/:/usr/lib/zookeeper/conf/:/usr/lib/zookeeper/lib/
export CLASSPATH=${HADOOP_CLASSPATH}:${HBASE_CLASSPATH}

@Marcos, Tariq: We are using HBase version 0.90.4. The job creates only a single HBaseConfiguration object.

@Kevin: No luck, same error.

Thanks,
Manu S

On Thu, Jun 7, 2012 at 3:50 AM, Mayank Bansal may...@apache.org wrote:

The zookeeper conf is not on the classpath for the mapreduce job. Add the conf file to the classpath for the job.

Thanks,
Mayank

On Wed, Jun 6, 2012 at 7:25 AM, Manu S manupk...@gmail.com wrote:

Hi All,

We are running a mapreduce job in a fully distributed cluster. The output of the job is written to HBase. While running this job we are getting an error:

Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.init(ZooKeeperWatcher.java:155)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1002)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:304)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:295)
    at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:157)
    at org.apache.hadoop.hbase.client.HTable.init(HTable.java:169)
    at org.apache.hadoop.hbase.client.HTableFactory.createHTableInterface(HTableFactory.java:36)

I had gone through some threads related to this issue and modified zoo.cfg accordingly. These configurations are the same on all the nodes.

Please find the HBase and ZooKeeper configuration:

hbase-site.xml:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>namenode</value>
  </property>
</configuration>

zoo.cfg:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored
dataDir=/var/zookeeper
# the port at which the clients will connect
clientPort=2181
#server.0=localhost:2888:3888
server.0=namenode:2888:3888
# Max client connections
maxClientCnxns=1000
minSessionTimeout=4000
maxSessionTimeout=4

It would be really great if anyone can help me to resolve this issue by giving your thoughts/suggestions.

Thanks,
Manu S
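The exception text itself points at HBaseConfiguration reuse. Below is a minimal sketch of that pattern against the 0.90-era client API mentioned in the thread: one configuration and one HTable per task attempt, rather than one per record. The table name, column family and key handling are placeholders, not taken from the thread.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HBaseWriteMapper extends Mapper<LongWritable, Text, Text, Text> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // One HBaseConfiguration -> one shared ZooKeeper/HBase connection per task attempt.
    Configuration conf = HBaseConfiguration.create(context.getConfiguration());
    table = new HTable(conf, "mytable"); // placeholder table name
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Do NOT call HBaseConfiguration.create()/new HTable() here, once per record --
    // creating a fresh connection per record is what exhausts the ZooKeeper connection limit.
    Put put = new Put(Bytes.toBytes(key.get()));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("line"), Bytes.toBytes(value.toString())); // placeholder family/qualifier
    table.put(put);
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    table.close();
  }
}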
java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
Hi,

I coded a map reduce program with the hadoop java api. When I submitted the job to the cluster, I got the following errors:

Exception in thread main java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:489)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:487)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:475)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
    at com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:52)
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.JsonMappingException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 8 more

I found that the missing classes are in jackson-core-asl-1.5.2 and jackson-mapper-asl-1.5.2, so I added these two jars to the project and resubmitted the job. But then I got the following errors:

Jun 7, 2012 4:18:55 PM org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
Jun 7, 2012 4:18:55 PM org.apache.hadoop.util.NativeCodeLoader clinit
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Jun 7, 2012 4:18:55 PM org.apache.hadoop.mapred.JobClient copyAndConfigureFiles
WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Jun 7, 2012 4:18:55 PM org.apache.hadoop.mapred.JobClient$2 run
INFO: Cleaning up the staging area file:/tmp/hadoop-huanchen/mapred/staging/huanchen757608919/.staging/job_local_0001
Exception in thread main org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/data/huanchen/pagecrawler/url
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
    at com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:51)

Note that the error is "Input path does not exist: file:/" instead of "Input path does not exist: hdfs:/". So does it mean the job does not successfully connect to the hadoop cluster? Is the first NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException error also caused by this? Does anyone have any ideas?

Thank you!

Best,
Huanchen

2012-06-07
huanchen.zhang
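For what it's worth, the file:/ prefix and the "job_local_0001" staging directory usually mean the submitting JVM never loaded the cluster's core-site.xml/mapred-site.xml, so fs.default.name falls back to the local filesystem and the job runs with the local runner. A minimal sketch of loading the cluster configuration explicitly; the conf-file paths and addresses below are assumptions, not taken from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SubmitWithClusterConf {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Option 1: put the cluster's conf directory on the client classpath.
    // Option 2: load the cluster settings explicitly (placeholder paths):
    conf.addResource(new Path("/usr/lib/hadoop/conf/core-site.xml"));   // fs.default.name -> hdfs://...
    conf.addResource(new Path("/usr/lib/hadoop/conf/mapred-site.xml")); // mapred.job.tracker -> jobtracker:8021
    // Or hard-code them (again, placeholder values):
    // conf.set("fs.default.name", "hdfs://namenode:8020");
    // conf.set("mapred.job.tracker", "jobtracker:8021");

    Job job = new Job(conf, "ExtractFeatureFromURLJob");
    // With the cluster defaults loaded, a bare path now resolves against HDFS, not file:/
    FileInputFormat.addInputPath(job, new Path("/data/huanchen/pagecrawler/url"));
    // ... set mapper/reducer/output as usual, then call job.waitForCompletion(true)
  }
}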
Re: HBase is able to connect to ZooKeeper but the connection closes immediately
Hi Tariq,

Thank you!! I already changed maxClientCnxns to 1000. Also we have set CLASSPATH to include all the Hadoop, HBase and ZooKeeper paths. I think copying the hadoop *.jar files to the HBase lib folder has the same effect as setting CLASSPATH with all the folders. There is no commons-configuration-*.jar inside the hadoop/lib folder. Any other options?

Thanks,
Manu S

On Thu, Jun 7, 2012 at 1:31 PM, Mohammad Tariq donta...@gmail.com wrote:

Actually zookeeper servers have an active connections limit, which by default is 30. You can increase this limit by setting the maxClientCnxns property accordingly in your zookeeper config file, zoo.cfg. For example: maxClientCnxns=100. But before that, copy the hadoop-core-*.jar present inside the hadoop folder to the hbase/lib folder. Also copy commons-configuration-1.6.jar from the hadoop/lib folder to the hbase/lib folder, then check once and see if it works for you.

Regards,
Mohammad Tariq
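A hedged sketch of the classpath approach Mayank and Tariq describe, for a layout like the /usr/lib paths quoted earlier; the jar names and versions are illustrative, not taken from the thread. Note that a bare directory on a Java classpath exposes .class and resource files such as hbase-site.xml and zoo.cfg, but not the jars inside it, so jars have to be listed individually (or via a /* wildcard on Java 6+):

# conf directories (resource files) plus explicitly listed jars for the client JVM
export HADOOP_CLASSPATH=/usr/lib/hbase/conf:/usr/lib/zookeeper/conf:/usr/lib/hbase/hbase-0.90.4.jar:/usr/lib/zookeeper/zookeeper-3.3.4.jar

# ship the HBase/ZooKeeper jars with the job so the map tasks can see them too
# (this requires the driver to use Tool/GenericOptionsParser)
hadoop jar myjob.jar com.example.MyHBaseJob \
  -libjars /usr/lib/hbase/hbase-0.90.4.jar,/usr/lib/zookeeper/zookeeper-3.3.4.jar \
  input output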
kerberos mapreduce question
with kerberos enabled a mapreduce job runs as the user that submitted it. does this mean the user that submitted the job needs to have linux accounts on all machines on the cluster? how does mapreduce do this (run jobs as the user)? do the tasktrackers use secure impersonation to run-as the user? thanks! koert
Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain
Are you able to ping <your pc ip address>, <domain name you gave for the machine> and <hostname of the machine>? HBase stopping means it is not able to start itself on the IP or hostname which you are giving.

On Thu, Jun 7, 2012 at 2:48 PM, Manu S manupk...@gmail.com wrote:

Hi All,

In pseudo distributed mode the HBase master is stopping automatically when we start the HBase region server. I have changed all the configuration files of Hadoop, HBase and ZooKeeper to set the exact hostname of the machine. I also commented out the localhost entry from /etc/hosts and cleared the cache as well. There is no localhost.localdomain entry in these configurations, but it is still resolving to localhost.localdomain.

Please find the error:

2012-06-07 12:13:11,995 INFO org.apache.hadoop.hbase.master.MasterFileSystem: No logs to split
2012-06-07 12:13:12,103 ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain
2012-06-07 12:13:12,104 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.lang.IllegalArgumentException: hostname can't be null
    at java.net.InetSocketAddress.init(InetSocketAddress.java:121)
    at org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:108)
    at org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:64)
    at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.dataToHServerAddress(RootRegionTracker.java:82)
    at org.apache.hadoop.hbase.zookeeper.RootRegionTracker.waitRootRegionLocation(RootRegionTracker.java:73)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:222)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:240)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:487)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:455)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:406)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:293)
2012-06-07 12:13:12,106 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2012-06-07 12:13:12,106 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads

Thanks,
Manu S

--
∞
Shashwat Shriparv
Re: Ideal file size
On Wed, Jun 6, 2012 at 10:14 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

On Wed, Jun 6, 2012 at 9:48 AM, M. C. Srivas mcsri...@gmail.com wrote:

Many factors to consider than just the size of the file. How long can you wait before you *have to* process the data? 5 minutes? 5 hours? 5 days? If you want good timeliness, you need to roll-over faster. The longer you wait:
1. the lesser the load on the NN.
2. but the poorer the timeliness
3. and the larger chance of lost data (ie, the data is not saved until the file is closed and rolled over, unless you want to sync() after every write)

To begin with I was going to use Flume and specify rollover file size. I understand the above parameters, I just want to ensure that too many small files doesn't cause problems on the NameNode. For instance there would be times when we get GBs of data in an hour and at times only few 100 MB. From what Harsh, Edward and you've described it doesn't cause issues with the NameNode but rather increase in processing times if there are too many small files. Looks like I need to find that balance. It would also be interesting to see how others solve this problem when not using Flume.

They use NFS with MapR. Any and all log-rotators (like the one in log4j) simply just work over NFS, and MapR does not have a NN, so the problems with small files or number of files do not exist.

On Wed, Jun 6, 2012 at 7:00 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

We have continuous flow of data into the sequence file. I am wondering what would be the ideal file size before the file gets rolled over. I know too many small files are not good but could someone tell me what would be the ideal size such that it doesn't overload the NameNode.
ArrayIndexOutOfBounds in TestFSMainOperationsLocalFileSystem
Hi,

Currently I am using Hadoop 2.0.1. When I run the TestFSMainOperationsLocalFileSystem test class I am getting:

org.apache.hadoop.fs.viewfs.TestFSMainOperationsLocalFileSystem.testWDAbsolute
java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.fs.viewfs.InodeTree.createLink(InodeTree.java:237)
    at org.apache.hadoop.fs.viewfs.InodeTree.init(InodeTree.java:334)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.init(ViewFileSystem.java:178)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:178)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2150)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:80)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2184)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2166)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:302)
    at org.apache.hadoop.fs.viewfs.ViewFileSystemTestSetup.setupForViewFileSystem(ViewFileSystemTestSetup.java:64)
    at org.apache.hadoop.fs.viewfs.TestFSMainOperationsLocalFileSystem.setUp(TestFSMainOperationsLocalFileSystem.java:40)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)

Standard Output:

2012-06-07 15:11:09,325 INFO mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-06-07 15:11:09,328 INFO mortbay.log (Slf4jLog.java:info(67)) - Home dir base /

I am running the tests on SuSE 11. Can anyone please tell me what could be the problem?

Thanks and Regards,
Amith
Re: kerberos mapreduce question
Yes, the user submitting a job needs to have an account on all the nodes.

Sent from my iPhone

On Jun 7, 2012, at 6:20 AM, Koert Kuipers ko...@tresata.com wrote:

with kerberos enabled a mapreduce job runs as the user that submitted it. does this mean the user that submitted the job needs to have linux accounts on all machines on the cluster? how does mapreduce do this (run jobs as the user)? do the tasktrackers use secure impersonation to run-as the user? thanks! koert
Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain
Thank you Harsh and Shashwat,

I have given the hostname in /etc/sysconfig/network as pseudo-distributed, and the hostname command returns this name as well. I added this name to the /etc/hosts file and changed all the configuration accordingly. But zookeeper is still trying to resolve localhost.localdomain. There were no entries for localhost.localdomain in any conf files or hostname related files.

Yes, everything is pinging, as I have given the names in /etc/hosts.

On Thu, Jun 7, 2012 at 7:13 PM, shashwat shriparv dwivedishash...@gmail.com wrote:

Are you able to ping <your pc ip address>, <domain name you gave for the machine> and <hostname of the machine>? HBase stopping means it is not able to start itself on the IP or hostname which you are giving.
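For reference, a hedged illustration (the IP address is a placeholder) of the /etc/hosts layout being described, together with quick checks that the hostname resolves to the machine's real address rather than localhost.localdomain:

# /etc/hosts
127.0.0.1      localhost
192.168.1.10   pseudo-distributed

# sanity checks
hostname                        # should print: pseudo-distributed
getent hosts pseudo-distributed # should print the real IP, not 127.0.0.1
ping -c 1 pseudo-distributed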
Re: kerberos mapreduce question
thanks for your answer. so at a large place like say yahoo, or facebook, assuming they use kerberos, every analyst that uses hive has an account on every node of their large cluster? sounds like an admin nightmare to me

On Thu, Jun 7, 2012 at 10:46 AM, Mapred Learn mapred.le...@gmail.com wrote:

Yes, the user submitting a job needs to have an account on all the nodes.

Sent from my iPhone
Re: kerberos mapreduce question
Hi, take a look at this: http://hadoop.apache.org/common/docs/r1.0.3/Secure_Impersonation.html

I think that it can help you.

Slim Tebourbi

2012/6/7 Koert Kuipers ko...@tresata.com:

thanks for your answer. so at a large place like say yahoo, or facebook, assuming they use kerberos, every analyst that uses hive has an account on every node of their large cluster? sounds like an admin nightmare to me
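For reference, the linked page describes proxy-user settings along these lines in core-site.xml, which let a trusted service account submit jobs on behalf of other users. The user name "gateway" and the group/host values below are placeholders, not something taken from this thread:

<property>
  <name>hadoop.proxyuser.gateway.groups</name>
  <value>analysts</value>
</property>
<property>
  <name>hadoop.proxyuser.gateway.hosts</name>
  <value>gateway-host1,gateway-host2</value>
</property>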
Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain
On Thu, Jun 7, 2012 at 2:18 AM, Manu S manupk...@gmail.com wrote: *2012-06-07 12:13:12,103 ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain This is pretty basic. Fix this first and then your hbase will work. Please stop spraying your queries across multiple lists. Doing so makes us think you arrogant which I am sure is not the case. Pick the list that seems most appropriate. For example, in this case, it seems like the hbase-user list would have been the right place to write; not common-user and cdh-user. If it turns out you've chosen wrong, usually the chosen list will help you figure the proper target. Thanks, St.Ack
Re: kerberos mapreduce question
If you provision your user/group information via LDAP to all your nodes it is not a nightmare.

On Thu, Jun 7, 2012 at 7:49 AM, Koert Kuipers ko...@tresata.com wrote:

thanks for your answer. so at a large place like say yahoo, or facebook, assuming they use kerberos, every analyst that uses hive has an account on every node of their large cluster? sounds like an admin nightmare to me

--
Alejandro
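A small, hedged sanity check (the account name is a placeholder) that LDAP-provisioned users are visible on a worker node without local passwd entries:

getent passwd some.analyst   # resolves through nsswitch (files, then ldap/sssd)
id some.analyst              # shows the uid/gid and group membership tasks would run with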
Re: Hadoop Eclipse Plugin set up
Hi Ali, You need to go to the VM setting and set the network setting to 'Bridged'. Enjoy :)
Re: Pseudo Distributed: ERROR org.apache.hadoop.hbase.HServerAddress: Could not resolve the DNS name of localhost.localdomain
Hey Manu, which linux distribution are you using?

On Thu, Jun 7, 2012 at 8:18 PM, Manu S manupk...@gmail.com wrote:

Thank you Harsh and Shashwat,

I have given the hostname in /etc/sysconfig/network as pseudo-distributed, and the hostname command returns this name as well. I added this name to the /etc/hosts file and changed all the configuration accordingly. But zookeeper is still trying to resolve localhost.localdomain. There were no entries for localhost.localdomain in any conf files or hostname related files. Yes, everything is pinging, as I have given the names in /etc/hosts.

--
∞
Shashwat Shriparv
Re: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
If you have

<property>
  <name>hadoop.tmp.dir</name>
  <value>../Hadoop/hdfs/tmp</value>
</property>

in your configuration file, then remove it and try.

Thanks and regards,
∞
Shashwat Shriparv

On Thu, Jun 7, 2012 at 1:56 PM, huanchen.zhang huanchen.zh...@ipinyou.com wrote:

Hi,

I coded a map reduce program with the hadoop java api. When I submitted the job to the cluster, I got the following errors:

Exception in thread main java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.JsonMappingException

I found that the missing classes are in jackson-core-asl-1.5.2 and jackson-mapper-asl-1.5.2, so I added these two jars to the project and resubmitted the job. But then I got the following errors:

Exception in thread main org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/data/huanchen/pagecrawler/url

Note that the error is "Input path does not exist: file:/" instead of "Input path does not exist: hdfs:/". So does it mean the job does not successfully connect to the hadoop cluster? Is the first NoClassDefFoundError error also for this reason? Does anyone have any ideas?

Thank you!

Best,
Huanchen

2012-06-07
huanchen.zhang

--
∞
Shashwat Shriparv
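To make the suggestion concrete, here is a hedged example of what the relevant client-side core-site.xml entries might look like instead; the host name and paths are placeholders. The fs.default.name entry is what makes bare input paths resolve against hdfs:// rather than file:/, and if hadoop.tmp.dir is kept at all it should be an absolute path rather than a relative one like ../Hadoop/hdfs/tmp:

<!-- core-site.xml fragment on the submitting machine (illustrative values) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode:8020</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop/tmp</value>
</property>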
Re: namenode restart
Hi Rita,

What kind of client do you have? Please be sure that there is no job running, and especially that no data is being copied, while you do the restart. Also, are you restarting by using stop-dfs.sh and start-dfs.sh?

Regards,
Abhishek

On Thu, Jun 7, 2012 at 3:29 AM, Rita rmorgan...@gmail.com wrote:

Running Hadoop 0.22 and I need to restart the namenode so my new rack configuration will be set into place. I am thinking of doing a quick stop and start of the namenode but what will happen to the current clients? Can they tolerate a 30 second hiccup by retrying?

--
--- Get your facts first, then you can distort them as you please.--
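For context, the rack configuration that usually requires the NameNode restart is the topology script, configured in core-site.xml via topology.script.file.name (the 0.22/1.x property name) and pointing at a script like the hedged sketch below; the script path and subnets are made-up placeholders, not taken from this thread:

#!/bin/bash
# rack-topology.sh: the NameNode invokes this with one or more IPs/hostnames and
# expects one rack path per argument on stdout. The subnets are purely illustrative.
for node in "$@"; do
  case "$node" in
    10.1.1.*) echo /rack1 ;;
    10.1.2.*) echo /rack2 ;;
    *)        echo /default-rack ;;
  esac
done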
Re: Ideal file size
Almost all the answers are already provided in this post. My 2 cents: try to have a file size that is a multiple of the block size, so that processing needs fewer mappers and the performance of the job is better. You can also merge files in HDFS later on for processing, as sketched below.

Regards,
Abhishek
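A hedged illustration of the merge suggestion (all paths are placeholders). Plain concatenation like this is only safe for newline-delimited text; SequenceFiles, as in the original question, would instead need an identity MapReduce job or a SequenceFile-aware merge:

# pull a directory of small files together locally, then push the result back as one file
hadoop fs -getmerge /flume/events/2012-06-07 /tmp/events-2012-06-07.txt
hadoop fs -put /tmp/events-2012-06-07.txt /merged/events-2012-06-07.txt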