Re: how to use ContentSummary

2013-10-09 Thread kun yan
Thanks a lot, I can successfully execute the following code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "192.1.1.1:8020");
FileSystem fs = FileSystem.get(conf);
ContentSummary cs = fs.getContentSummary(new Path("/sequence"));
System.out.println("cs is what? " + cs.toString());
System.out.println("directory count is: " + cs.getDirectoryCount());
System.out.println("file count is: " + cs.getFileCount());


2013/10/9 Brahma Reddy Battula brahmareddy.batt...@huawei.com

  Please check the following for the same:



 DistributedFileSystem dfs = new DistributedFileSystem();

 dfs.initialize(URI.create("hdfs://hacluster"), conf);

 ContentSummary cnSum = dfs.getContentSummary(new Path(dirName));
 cnSum.getQuota();
 cnSum.getSpaceQuota();
 cnSum.getSpaceConsumed();

 ...

 ...

 ...

 Note: you need to pass the conf correctly.
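
 In case it is useful, here is a minimal, self-contained sketch combining the
 snippets in this thread. The NameNode address and directory are placeholders,
 not values taken from the thread:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ContentSummaryExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The client has to know where the NameNode is; replace with your own address.
    conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

    DistributedFileSystem dfs = new DistributedFileSystem();
    dfs.initialize(URI.create("hdfs://namenode-host:8020"), conf);

    ContentSummary cnSum = dfs.getContentSummary(new Path("/some/dir"));
    System.out.println("quota: " + cnSum.getQuota());
    System.out.println("space quota: " + cnSum.getSpaceQuota());
    System.out.println("space consumed: " + cnSum.getSpaceConsumed());

    dfs.close();
  }
}

 FileSystem.get(conf), as in the snippet at the top of this thread, works just
 as well once fs.defaultFS points at the cluster.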
  --
 *From:* kun yan [yankunhad...@gmail.com]
 *Sent:* Wednesday, October 09, 2013 10:20 AM
 *To:* user@hadoop.apache.org
 *Subject:* how to use ContentSummary

   Hi all,
 In org.apache.hadoop.fs I found ContentSummary, but I am not sure how to use
 it. Who can help me? Thanks a lot.


  --

 In the Hadoop world, I am just a novice exploring the entire Hadoop
 ecosystem. I hope one day I can contribute my own code.

 YanBit
 yankunhad...@gmail.com




-- 

In the Hadoop world, I am just a novice exploring the entire Hadoop
ecosystem. I hope one day I can contribute my own code.

YanBit
yankunhad...@gmail.com


Problem with streaming exact binary chunks

2013-10-09 Thread Youssef Hatem
Hello,

I wrote a very simple InputFormat and RecordReader to send binary data to
mappers. Binary data can contain anything (including \n, \t, \r); here is what
next() may actually send:

public class MyRecordReader implements
    RecordReader<BytesWritable, BytesWritable> {
  ...
  public boolean next(BytesWritable key, BytesWritable ignore)
      throws IOException {
    ...

    byte[] result = new byte[8];
    for (int i = 0; i < result.length; ++i)
      result[i] = (byte) (i + 1);
    result[3] = (byte) '\n';
    result[4] = (byte) '\n';

    key.set(result, 0, result.length);
    return true;
  }
}

As you can see, I am using BytesWritable to send eight bytes: 01 02 03 0a 0a
06 07 08. I also use HADOOP-1722 typed bytes (by setting -D
stream.map.input=typedbytes).

According to the documentation of typed bytes the mapper should receive the 
following byte sequence: 
00 00 00 08 01 02 03 0a 0a 06 07 08

However bytes are somehow modified and I get the following sequence instead:
00 00 00 08 01 02 03 09 0a 09 0a 06 07 08

0a = '\n'
09 = '\t'

It seems that Hadoop (streaming?) parsed the newline character as a separator
and inserted '\t', which I assume is the key/value separator for streaming.

Is there any workaround to send *exactly* the same byte sequence no matter
what characters are in it? Thanks in advance.

Best regards,
Youssef Hatem


RE: Problem with streaming exact binary chunks

2013-10-09 Thread Peter Marron
Hi,

The only way that I could find was to override the various InputWriter and
OutputWriter classes, as defined by the configuration settings
stream.map.input.writer.class
stream.map.output.reader.class
stream.reduce.input.writer.class
stream.reduce.output.reader.class
which was painful. Hopefully someone will tell you the _correct_ way to do this.
If not, I will provide more details.
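
In case it helps in the meantime, here is a rough sketch of what an
InputWriter override along those lines might look like: it frames each record
as raw bytes with a 4-byte length prefix and no textual separators, so '\n'
and '\t' inside the data pass through untouched. This is only a sketch from
memory of the streaming io API; the InputWriter base class and the
PipeMapRed.getClientOutput() accessor are assumptions you should verify
against your Hadoop version.

import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.streaming.PipeMapRed;
import org.apache.hadoop.streaming.io.InputWriter;

// Sketch: write <4-byte length><raw bytes> instead of text, so binary values
// never go through the key/value separator logic.
public class RawBytesNoSeparatorInputWriter
    extends InputWriter<BytesWritable, BytesWritable> {

  private DataOutput clientOut;

  @Override
  public void initialize(PipeMapRed pipeMapRed) throws IOException {
    super.initialize(pipeMapRed);
    clientOut = pipeMapRed.getClientOutput(); // assumed accessor
  }

  @Override
  public void writeKey(BytesWritable key) throws IOException {
    writeLengthPrefixed(key);
  }

  @Override
  public void writeValue(BytesWritable value) throws IOException {
    writeLengthPrefixed(value);
  }

  private void writeLengthPrefixed(BytesWritable w) throws IOException {
    clientOut.writeInt(w.getLength());
    clientOut.write(w.getBytes(), 0, w.getLength());
  }
}

Depending on the version, streaming may also ship a built-in raw-bytes mode
(e.g. -io rawbytes) that does something very similar.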

Regards,

Peter Marron
Trillium Software UK Limited

Tel : +44 (0) 118 940 7609
Fax : +44 (0) 118 940 7699
E: peter.mar...@trilliumsoftware.com

-Original Message-
From: Youssef Hatem [mailto:youssef.ha...@rwth-aachen.de] 
Sent: 09 October 2013 12:14
To: user@hadoop.apache.org
Subject: Problem with streaming exact binary chunks



Oozie SSH Action Issue

2013-10-09 Thread Kasa V Varun Tej
#Oozie SSH Action Issue:

*Issue:*
We are trying to run a few commands on a particular host machine of our
cluster. We chose the SSH action for this. We have been facing this SSH
issue for some time now. What might be the real issue here? Please point me
towards the solution.


*logs:*

AUTH_FAILED: Not able to perform operation [ssh -o
PasswordAuthentication=no -o KbdInteractiveDevices=no -o
StrictHostKeyChecking=no -o ConnectTimeout=20 USER@1.2.3.4 mkdir -p
oozie-oozi/000-131008185935754-oozie-oozi-W/action1--ssh/ ] |
ErrorStream: Warning: Permanently added host,1.2.3.4 (RSA) to the list of
known hosts. Permission denied
(publickey,gssapi-keyex,gssapi-with-mic,password).

org.apache.oozie.action.ActionExecutorException: AUTH_FAILED: Not able to
perform operation [ssh -o PasswordAuthentication=no -o
KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20
user@1.2.3.4  mkdir -p
oozie-oozi/000-131008185935754-oozie-oozi-W/action1--ssh/ ] |
ErrorStream: Warning: Permanently added 1.2.3.4,192.168.34.208 (RSA) to the
list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

 at
org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:589)
 at
org.apache.oozie.action.ssh.SshActionExecutor.start(SshActionExecutor.java:204)
 at
org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211)
 at
org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59)
 at org.apache.oozie.command.XCommand.call(XCommand.java:277)
 at
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
 at
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
 at
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Not able to perform operation [ssh -o
PasswordAuthentication=no -o KbdInteractiveDevices=no -o
StrictHostKeyChecking=no -o ConnectTimeout=20 user@1.2.3.4  mkdir -p
oozie-oozi/000-131008185935754-oozie-oozi-W/action1--ssh/ ] |
ErrorStream: Warning: Permanently added '1.2.3.4,1.2.3.4' (RSA) to the list
of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

 at
org.apache.oozie.action.ssh.SshActionExecutor.executeCommand(SshActionExecutor.java:340)
 at
org.apache.oozie.action.ssh.SshActionExecutor.setupRemote(SshActionExecutor.java:373)
 at
org.apache.oozie.action.ssh.SshActionExecutor$1.call(SshActionExecutor.java:206)
 at
org.apache.oozie.action.ssh.SshActionExecutor$1.call(SshActionExecutor.java:204)
 at
org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:547)
 ... 10 more
2013-10-09 12:48:25,982 WARN
org.apache.oozie.command.wf.ActionStartXCommand: USER[user] GROUP[-]
TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W]
ACTION[000-131008185935754-oozie-oozi-W@action1] Suspending Workflow
Job id=000-131008185935754-oozie-oozi-W
2013-10-09 12:48:27,204 WARN
org.apache.oozie.command.coord.CoordActionUpdateXCommand: USER[user]
GROUP[-] TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W]
ACTION[000-131008185935754-oozie-oozi-W@action1] E1100: Command
precondition does not hold before execution, [, coord action is null],
Error Code: E1100
2013-10-09 12:59:57,477 INFO org.apache.oozie.command.wf.KillXCommand:
USER[user] GROUP[-] TOKEN[] APP[Test]
JOB[000-131008185935754-oozie-oozi-W] ACTION[-] STARTED
WorkflowKillXCommand for jobId=000-131008185935754-oozie-oozi-W
2013-10-09 12:59:57,685 WARN
org.apache.oozie.command.coord.CoordActionUpdateXCommand: USER[user]
GROUP[-] TOKEN[] APP[Test] JOB[000-131008185935754-oozie-oozi-W]
ACTION[-] E1100: Command precondition does not hold before execution, [,
coord action is null], Error Code: E1100
2013-10-09 12:59:57,686 INFO org.apache.oozie.command.wf.KillXCommand:
USER[user] GROUP[-] TOKEN[] APP[Test]
JOB[000-131008185935754-oozie-oozi-W] ACTION[-] ENDED
WorkflowKillXCommand for jobId=000-131008185935754-oozie-oozi-W
2013-10-09 13:41:32,654 WARN org.apache.oozie.command.wf.KillXCommand:
USER[user] GROUP[-] TOKEN[] APP[Test]
JOB[000-131008185935754-oozie-oozi-W] ACTION[-] E0725: Workflow
instance can not be killed, 000-131008185935754-oozie-oozi-W, Error
Code: E0725
2013-10-09 13:41:45,199 WARN org.apache.oozie.command.wf.KillXCommand:
USER[user] GROUP[-] TOKEN[] APP[Test]
JOB[000-131008185935754-oozie-oozi-W] ACTION[-] E0725: Workflow
instance can not be killed, 000-131008185935754-oozie-oozi-W, Error
Code: E0725
2013-10-09 13:42:04,869 WARN org.apache.oozie.command.wf.ResumeXCommand:
USER[user] GROUP[-] TOKEN[] APP[Test]
JOB[000-131008185935754-oozie-oozi-W] ACTION[-] E1100: 

using environment variables in XML conf files

2013-10-09 Thread Andrew McDermott

I have the following path hard-coded for hadoop.tmp.dir in
conf/core-site.xml:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/aim/tmp/hadoop-${user.name}</value>
  </property>
</configuration>

Is it possible to replace the /home/aim part with something substitutable
such as $HOME?  On a whim I tried ${env.home} but that didn't work...

-- 
andy




Java version with Hadoop 2.0

2013-10-09 Thread SF Hadoop
I am preparing to deploy multiple clusters / distros of Hadoop for testing /
benchmarking.

In my research I have noticed discrepancies in the version of the JDK that
various groups are using. For example, Hortonworks suggests JDK 6u31, CDH
recommends either 6 or 7 provided you stick to some guidelines for each, and
Apache Hadoop seems to be somewhat of a no man's land, with a lot of people
using a lot of different versions.

Does anyone have any insight they could share about how to approach
choosing the best JDK release?  (I'm a total Java newb, so any info /
further reading you guys can provide is appreciated.)

Thanks.

sf


Re: Java version with Hadoop 2.0

2013-10-09 Thread Patai Sangbutsarakum
maybe you've already seen this.

http://wiki.apache.org/hadoop/HadoopJavaVersions





Re: Java version with Hadoop 2.0

2013-10-09 Thread SF Hadoop
I hadn't.  Thank you!!!  Very helpful.

Andy







Re: Java version with Hadoop 2.0

2013-10-09 Thread Andre Kelpe
Also keep in mind that Java 6 no longer gets public updates from
Oracle: http://www.oracle.com/technetwork/java/eol-135779.html

- André







-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


Re: Java version with Hadoop 2.0

2013-10-09 Thread Patai Sangbutsarakum
Does that mean that for a new cluster, we should probably aim to
test/use/deploy on Java 7?





issue about combine map task output

2013-10-09 Thread ch huang
Hi all,
 I read the docs and have a question about the combiner. If I set
min.num.spills.for.combine = 3, then the combine also happens when the 3rd
spill happens. But I do not know whether, when the combine happens, it takes
the three spill files as input and merges them into one big spill file as
output?
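
For context, a minimal sketch of how these settings are wired into a job,
using the classic mapred API (the stock LongSumReducer is used here only as a
placeholder combiner). As I understand it, the combiner runs again during the
merge of spill files at the end of the map task only when the spill count
reaches this threshold:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;

public class CombineSpillExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Run the combiner on map output before it is written to disk.
    conf.setCombinerClass(LongSumReducer.class);
    // Also run the combiner while merging spill files, but only if at
    // least this many spill files were produced (3 is the default).
    conf.setInt("min.num.spills.for.combine", 3);
    System.out.println(conf.getInt("min.num.spills.for.combine", 3));
  }
}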


Re: using environment variables in XML conf files

2013-10-09 Thread Harsh J
Java provides a system property for ${user.home} that is usually set
to the same as $HOME of the session on Linux. You can therefore use
${user.home}/tmp/hadoop-${user.name}.
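
For example, a sketch of the core-site.xml entry from the original mail
rewritten that way:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>${user.home}/tmp/hadoop-${user.name}</value>
  </property>
</configuration>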






-- 
Harsh J