benchmark my application on hadoop cluster

2015-06-18 Thread Pa Rö
hello,

I want to benchmark my MapReduce, Mahout, Spark, and Flink k-means implementations on a Hadoop
cluster.
I have written a JMH benchmark, but I get an error when running it on the cluster; locally
it works fine.
Maybe someone can solve this problem. I have posted it on Stack Overflow:

http://stackoverflow.com/questions/30892720/jmh-benchmark-on-hadoop-yarn

Or maybe somebody has experience with benchmarking on a cluster? Which
framework can I use for it?

best regards,
paul
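One common fallback when a full JMH harness will not launch inside a YARN container is a plain wall-clock timer around the job submission. A minimal sketch, where the Runnable is a hypothetical stand-in for the real k-means job submission (no names here are taken from the thread):

```java
public class WallClockBench {
    // Time one run of the given job in milliseconds.
    public static long timeMillis(Runnable job) {
        long start = System.nanoTime();
        job.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int warmups = 2, runs = 5;
        // Warm-up runs so JIT compilation does not distort the measured runs.
        for (int i = 0; i < warmups; i++) timeMillis(() -> {});
        long total = 0;
        for (int i = 0; i < runs; i++) {
            // Replace the empty lambda with the actual job submission.
            long ms = timeMillis(() -> {});
            total += ms;
            System.out.println("run " + i + ": " + ms + " ms");
        }
        System.out.println("mean: " + (total / runs) + " ms");
    }
}
```

This lacks JMH's statistical rigor (no forking, no dead-code elimination guards), but for multi-second cluster jobs the JVM-level noise JMH controls for is usually negligible.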


Help with hadoop 2.5.2 in windows

2015-06-18 Thread Nishanth S
Hey friends,

I built Hadoop 2.5.2 on my PC and I am able to run MapReduce jobs
locally after setting HADOOP_HOME. I am trying to set this up on another
machine using the same tar file that I built on mine, but I am getting the
error below. Can you please help?

Exception in thread "main" java.lang.UnsatisfiedLinkError:
org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V
	at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native Method)
	at org.apache.hadoop.util.NativeCrc32.calculateChunkedSumsByteArray(NativeCrc32.java:86)
	at org.apache.hadoop.util.DataChecksum.calculateChunkedSums(DataChecksum.java:430)
	at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:202)
	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163)
	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:400)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:80)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:603)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)


-Nishanth
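For what it's worth, this particular UnsatisfiedLinkError on Windows usually means the native Hadoop binaries (hadoop.dll and winutils.exe, conventionally under %HADOOP_HOME%\bin) are missing on the second machine, or were built for a different JVM bitness; the usual first check is whether the JVM can see them at all. A small, Hadoop-free sketch of that check (the library file names are the conventional ones, not taken from this thread):

```java
import java.io.File;

public class NativeLibCheck {
    // Return true if any directory in the search path contains the file.
    public static boolean findIn(String[] dirs, String libName) {
        for (String dir : dirs) {
            if (new File(dir, libName).isFile()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] path = System.getProperty("java.library.path", "")
                .split(File.pathSeparator);
        // The native checksum code lives in hadoop.dll on Windows,
        // libhadoop.so on Linux.
        String lib = System.getProperty("os.name").startsWith("Windows")
                ? "hadoop.dll" : "libhadoop.so";
        System.out.println(lib + (findIn(path, lib) ? " found" : " NOT found")
                + " on java.library.path");
    }
}
```

Running `hadoop checknative -a` on both machines is another quick way to compare what native code each installation actually loads.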


copy data from one hadoop cluster to another hadoop cluster + can't use distcp

2015-06-18 Thread Divya Gehlot
Hi,
I need to copy data from one Hadoop cluster to another.
I can't access the second cluster from the first due to a
security issue.
Can anyone point me to a way to do this apart from the distcp command?
For instance:
Cluster 1 (secured zone) -> copy HDFS data -> cluster 2 (non-secured
zone)



Thanks,
Divya


Re: copy data from one hadoop cluster to another hadoop cluster + can't use distcp

2015-06-18 Thread Nitin Pawar
What's the size of the data?
If you cannot do distcp between the clusters, the other way is to do an hdfs get
on the data and then an hdfs put on the other cluster.


Re: copy data from one hadoop cluster to another hadoop cluster + can't use distcp

2015-06-18 Thread Divya Gehlot
In that case it will be a three-step process:
1. first cluster (secure zone) HDFS -> copyToLocal -> user's local file
system
2. user's local file system -> copy data -> second cluster's user local file system
3. second cluster's user local file system -> copyFromLocal -> second
cluster's HDFS

Am I on the right track?
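Those three steps might look like the following on the command line; all paths and host names here are hypothetical, not taken from the thread:

```shell
# Sketch of the three-step copy between clusters with no direct HDFS access.
SRC=/data/input                # HDFS path on cluster 1 (hypothetical)
STAGE=/tmp/hdfs-staging        # local scratch directory on each edge node
EDGE2=user@cluster2-edge       # login on cluster 2's edge node (hypothetical)

# 1. cluster 1 HDFS -> local disk (run on cluster 1's edge node)
hdfs dfs -copyToLocal "$SRC" "$STAGE"

# 2. local disk -> cluster 2's edge node (scp crosses the security boundary)
scp -r "$STAGE" "$EDGE2:$STAGE"

# 3. local disk -> cluster 2's HDFS (run on cluster 2's edge node)
ssh "$EDGE2" "hdfs dfs -copyFromLocal '$STAGE' '$SRC'"
```

The main caveat is that the data must fit on the intermediate local disks, so for large datasets it is worth compressing the staging copy or moving it in chunks.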





Using UserGroupInformation in multithread process

2015-06-18 Thread Gaurav Gupta
I am using UserGroupInformation to get the Kerberos tokens.
I have a process in a YARN container that spawns another thread
(slave). I am renewing the Kerberos tokens in the master thread, but the slave
thread is still using the older tokens.
Are tokens not shared across threads in the same JVM?

Thanks
Gaurav
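In Hadoop, tokens live in the Credentials object attached to a UserGroupInformation instance, so a thread holding a reference to the same UGI sees renewals, while a thread that took its own copy of the credentials at startup keeps the stale ones; how the slave thread obtained its UGI is the usual first thing to check. A Hadoop-free sketch of that shared-reference-versus-snapshot difference (all names here are illustrative):

```java
import java.util.concurrent.atomic.AtomicReference;

public class TokenSharingDemo {
    // Stands in for the Credentials held inside a shared UGI instance.
    static final AtomicReference<String> sharedToken =
            new AtomicReference<>("token-v1");

    public static void main(String[] args) throws InterruptedException {
        // A snapshot taken before renewal: this is what a slave thread
        // effectively holds if it copied the credentials at startup.
        String snapshot = sharedToken.get();

        sharedToken.set("token-v2");   // master thread renews the token

        Thread slave = new Thread(() -> {
            System.out.println("via shared holder: " + sharedToken.get()); // token-v2
            System.out.println("via snapshot:      " + snapshot);          // token-v1
        });
        slave.start();
        slave.join();
    }
}
```

So the JVM does share the object across threads; what goes stale is any private copy made before the renewal, which is the behavior described in the question.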