Re: how to use ContentSummary
Thanks a lot, I can successfully execute the following code:

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://192.1.1.1:8020");
FileSystem fs = FileSystem.get(conf);
ContentSummary cs = fs.getContentSummary(new Path("/sequence"));
System.out.println("cs is what? " + cs.toString());
System.out.println("directory count is: " + cs.getDirectoryCount());
System.out.println("file count is: " + cs.getFileCount());

2013/10/9 Brahma Reddy Battula brahmareddy.batt...@huawei.com

Please check the following for the same:

DistributedFileSystem dfs = new DistributedFileSystem();
dfs.initialize(URI.create("hdfs://hacluster"), conf);
ContentSummary cnSum = dfs.getContentSummary(new Path(dirName));
cnSum.getQuota();
cnSum.getSpaceQuota();
cnSum.getSpaceConsumed();
...

Note: you need to pass the conf correctly.

--
From: kun yan [yankunhad...@gmail.com]
Sent: Wednesday, October 09, 2013 10:20 AM
To: user@hadoop.apache.org
Subject: how to use ContentSummary

hi all
In org.apache.hadoop.fs I found ContentSummary, but I am not sure how to use it. Who can help me? Thanks a lot.

--
In the Hadoop world, I am just a novice exploring the entire Hadoop ecosystem; I hope one day I can contribute my own code.

YanBit
yankunhad...@gmail.com
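For readers trying this later, here is a self-contained version of the snippet above, with the imports it needs. The NameNode address and path are the ones from this thread, so adjust them for your own cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ContentSummaryDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address from the thread; replace with your own fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://192.1.1.1:8020");
        FileSystem fs = FileSystem.get(conf);

        // getContentSummary() walks the subtree rooted at the path and
        // aggregates counts, sizes, and quota information.
        ContentSummary cs = fs.getContentSummary(new Path("/sequence"));
        System.out.println("directory count: " + cs.getDirectoryCount());
        System.out.println("file count     : " + cs.getFileCount());
        System.out.println("length (bytes) : " + cs.getLength());
        System.out.println("space consumed : " + cs.getSpaceConsumed()); // includes replicas
        System.out.println("name quota     : " + cs.getQuota());
        System.out.println("space quota    : " + cs.getSpaceQuota());
    }
}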
Problem with streaming exact binary chunks
Hello,

I wrote a very simple InputFormat and RecordReader to send binary data to mappers. The binary data can contain anything (including \n, \t, \r); here is what next() may actually send:

public class MyRecordReader implements RecordReader<BytesWritable, BytesWritable> {
    ...
    public boolean next(BytesWritable key, BytesWritable ignore) throws IOException {
        ...
        byte[] result = new byte[8];
        for (int i = 0; i < result.length; ++i)
            result[i] = (byte) (i + 1);
        result[3] = (byte) '\n';
        result[4] = (byte) '\n';
        key.set(result, 0, result.length);
        return true;
    }
}

As you can see, I am using BytesWritable to send eight bytes: 01 02 03 0a 0a 06 07 08. I also use HADOOP-1722 typed bytes (by setting -D stream.map.input=typedbytes). According to the documentation of typed bytes, the mapper should receive the following byte sequence:

00 00 00 08 01 02 03 0a 0a 06 07 08

However, the bytes are somehow modified and I get the following sequence instead:

00 00 00 08 01 02 03 09 0a 09 0a 06 07 08

(0a = '\n', 09 = '\t')

It seems that Hadoop (streaming?) parsed each newline character as a separator and inserted '\t', which I assume is the key/value separator for streaming. Is there any workaround to send *exactly* the same byte sequence, no matter what characters are in it?

Thanks in advance.

Best regards,
Youssef Hatem
RE: Problem with streaming exact binary chunks
Hi,

The only way that I could find was to override the various InputWriter and OutputWriter classes, as defined by the configuration settings:

stream.map.input.writer.class
stream.map.output.reader.class
stream.reduce.input.writer.class
stream.reduce.output.reader.class

which was painful. Hopefully someone will tell you the _correct_ way to do this. If not, I will provide more details.

Regards,

Peter Marron
Trillium Software UK Limited
Tel: +44 (0) 118 940 7609
Fax: +44 (0) 118 940 7699
E: peter.mar...@trilliumsoftware.com
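For anyone landing on this thread later, here is a minimal sketch of what such an override can look like. The class name LengthPrefixedBytesInputWriter is illustrative, and the sketch assumes the org.apache.hadoop.streaming.io.InputWriter API (initialize/writeKey/writeValue) as found in contemporary streaming releases; treat it as a starting point, not the definitive fix:

import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.streaming.PipeMapRed;
import org.apache.hadoop.streaming.io.InputWriter;

// Writes each key/value as a 4-byte big-endian length followed by the raw
// bytes, so newlines and tabs in the payload are never treated as separators.
public class LengthPrefixedBytesInputWriter
        extends InputWriter<BytesWritable, BytesWritable> {

    private DataOutput clientOut;

    @Override
    public void initialize(PipeMapRed pipeMapRed) throws IOException {
        super.initialize(pipeMapRed);
        clientOut = pipeMapRed.getClientOutput();
    }

    @Override
    public void writeKey(BytesWritable key) throws IOException {
        writeLengthPrefixed(key);
    }

    @Override
    public void writeValue(BytesWritable value) throws IOException {
        writeLengthPrefixed(value);
    }

    private void writeLengthPrefixed(BytesWritable w) throws IOException {
        clientOut.writeInt(w.getLength());               // length prefix
        clientOut.write(w.getBytes(), 0, w.getLength()); // raw payload, no escaping
    }
}

It would be wired in through one of the settings listed above, e.g. -D stream.map.input.writer.class=LengthPrefixedBytesInputWriter.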
Oozie SSH Action Issue
Issue: We are trying to run a few commands on a particular host machine of our cluster, and chose the SSH action for this. We have been facing this SSH issue for some time now. What might be the real issue here? Please point me towards the solution.

Logs:

AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 USER@1.2.3.4 mkdir -p oozie-oozi/000-131008185935754-oozie-oozi-W/action1--ssh/ ] | ErrorStream: Warning: Permanently added 'host,1.2.3.4' (RSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
org.apache.oozie.action.ActionExecutorException: AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 user@1.2.3.4 mkdir -p oozie-oozi/000-131008185935754-oozie-oozi-W/action1--ssh/ ] | ErrorStream: Warning: Permanently added '1.2.3.4,192.168.34.208' (RSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
    at org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:589)
    at org.apache.oozie.action.ssh.SshActionExecutor.start(SshActionExecutor.java:204)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59)
    at org.apache.oozie.command.XCommand.call(XCommand.java:277)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 user@1.2.3.4 mkdir -p oozie-oozi/000-131008185935754-oozie-oozi-W/action1--ssh/ ] | ErrorStream: Warning: Permanently added '1.2.3.4,1.2.3.4' (RSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
    at org.apache.oozie.action.ssh.SshActionExecutor.executeCommand(SshActionExecutor.java:340)
    at org.apache.oozie.action.ssh.SshActionExecutor.setupRemote(SshActionExecutor.java:373)
    at org.apache.oozie.action.ssh.SshActionExecutor$1.call(SshActionExecutor.java:206)
    at org.apache.oozie.action.ssh.SshActionExecutor$1.call(SshActionExecutor.java:204)
    at org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:547)
    ... 10 more

2013-10-09 12:48:25,982 WARN org.apache.oozie.command.wf.ActionStartXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W] ACTION[000-131008185935754-oozie-oozi-W@action1] Suspending Workflow Job id=000-131008185935754-oozie-oozi-W
2013-10-09 12:48:27,204 WARN org.apache.oozie.command.coord.CoordActionUpdateXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W] ACTION[000-131008185935754-oozie-oozi-W@action1] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
2013-10-09 12:59:57,477 INFO org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W] ACTION[-] STARTED WorkflowKillXCommand for jobId=000-131008185935754-oozie-oozi-W
2013-10-09 12:59:57,685 WARN org.apache.oozie.command.coord.CoordActionUpdateXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W] ACTION[-] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
2013-10-09 12:59:57,686 INFO org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W] ACTION[-] ENDED WorkflowKillXCommand for jobId=000-131008185935754-oozie-oozi-W
2013-10-09 13:41:32,654 WARN org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W] ACTION[-] E0725: Workflow instance can not be killed, 000-131008185935754-oozie-oozi-W, Error Code: E0725
2013-10-09 13:41:45,199 WARN org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W] ACTION[-] E0725: Workflow instance can not be killed, 000-131008185935754-oozie-oozi-W, Error Code: E0725
2013-10-09 13:42:04,869 WARN org.apache.oozie.command.wf.ResumeXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W] ACTION[-] E1100:
using environment variables in XML conf files
I have the following path hard-coded for hadoop.tmp.dir in conf/core-site.xml:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/aim/tmp/hadoop-${user.name}</value>
  </property>
</configuration>

Is it possible to replace the /home/aim with a substitutable version of $HOME? On a whim I tried ${env.home}, but that didn't work...

--
andy
Java version with Hadoop 2.0
I am preparing to deploy multiple clusters / distros of Hadoop for testing / benchmarking. In my research I have noticed discrepancies in the version of the JDK that various groups are using. For example: Hortonworks suggests JDK6u31, CDH recommends either 6 or 7 provided you stick to some guidelines for each, and Apache Hadoop seems to be somewhat of a no man's land, with a lot of people using a lot of different versions. Does anyone have any insight they could share about how to approach choosing the best JDK release? (I'm a total Java newb, so any info / further reading you guys can provide is appreciated.) Thanks.

sf
Re: Java version with Hadoop 2.0
Maybe you've already seen this: http://wiki.apache.org/hadoop/HadoopJavaVersions
Re: Java version with Hadoop 2.0
I hadn't. Thank you!!! Very helpful.

Andy
Re: Java version with Hadoop 2.0
Also keep in mind that Java 6 no longer gets public updates from Oracle:
http://www.oracle.com/technetwork/java/eol-135779.html

- André

--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com
Re: Java version with Hadoop 2.0
Does that mean that for a new cluster we should probably aim to test/use/deploy on Java 7?
issue about combine map task output
hi, all:

I read the docs and have a question about combine. If I set min.num.spills.for.combine = 3, then when the 3rd spill happens the combine also happens. But I do not know how the combine happens: does it take the three spill files as input and merge them into one big spill file as output?
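For anyone who wants to experiment with this, here is a minimal sketch of where these knobs are set on a job. The combiner class (the stock IntSumReducer) and the job name are illustrative; the property name is the one from this thread (newer releases also spell it mapreduce.map.combine.minspills):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class CombineDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Only run the combiner during the merge of spill files when at
        // least this many spill files exist.
        conf.setInt("min.num.spills.for.combine", 3);

        Job job = Job.getInstance(conf, "combine-demo");
        // The combiner is applied to map output at spill time, and again
        // when multiple spill files are merged on disk.
        job.setCombinerClass(IntSumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // ... set mapper, reducer, and input/output paths as usual ...
    }
}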
Re: using environment variables in XML conf files
Java provides a system property, ${user.home}, that is usually set to the same value as $HOME of the session on Linux. You can therefore use ${user.home}/tmp/hadoop-${user.name}.

--
Harsh J
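A quick way to check the expansion, sketched against Hadoop's Configuration class: as I understand its variable substitution, get() resolves ${...} first against other configuration properties and then against Java system properties such as user.home and user.name. The class name below is illustrative:

import org.apache.hadoop.conf.Configuration;

public class TmpDirExpansionDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The same value you would put in core-site.xml for hadoop.tmp.dir.
        conf.set("hadoop.tmp.dir", "${user.home}/tmp/hadoop-${user.name}");

        // get() expands the variables, so for user "aim" this prints
        // something like /home/aim/tmp/hadoop-aim.
        System.out.println(conf.get("hadoop.tmp.dir"));
    }
}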