Regarding containers not launching

2018-01-29 Thread nishchay malhotra
What should I be looking for if my 24-node cluster is not launching enough
containers?
Only 40/288 cores and 87 GB/700 GB of memory are in use.
The yarn.nodemanager memory/core settings look good, and so do the container
memory/core settings.
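Two quick ways to see what the scheduler itself reports (a hedged sketch; the
ResourceManager host is a placeholder and port 8088 assumes a stock setup):

```shell
# Per-node resources as the ResourceManager sees them.
yarn node -list -all

# Cluster-wide metrics and scheduler view; queue capacity limits in the
# scheduler config can cap allocation even when per-node settings look right.
curl -s http://<rm-host>:8088/ws/v1/cluster/metrics
curl -s http://<rm-host>:8088/ws/v1/cluster/scheduler
```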

Thanks
Nishchay Malhotra


RE: HDFS latency and bandwidth/speed

2018-01-29 Thread Manuel Sopena Ballesteros
Thank you very much Anu,

This is very useful

Manuel

From: Anu Engineer [mailto:aengin...@hortonworks.com]
Sent: Tuesday, January 30, 2018 12:25 PM
To: Manuel Sopena Ballesteros; user@hadoop.apache.org
Subject: Re: HDFS latency and bandwidth/speed



Re: HDFS latency and bandwidth/speed

2018-01-29 Thread Anu Engineer
Hi Manuel,



Depending on your use case, there are several tools; unfortunately, most of
them require some familiarity with HDFS.

Here is a quick set of links that Google returns:

https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/Benchmarking.html

An old blog post, but most of these applications still work. A set of
benchmark applications ships with Hadoop; both TestDFSIO and TeraGen are
useful benchmarks.

http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
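As a hedged sketch, the shipped benchmarks can be invoked like this (the jar
paths, file counts, and sizes are illustrative and vary by Hadoop
version/distribution):

```shell
# TestDFSIO write then read: 10 files of 1 GB each; throughput and average
# I/O rate are printed in the job output.
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
  TestDFSIO -write -nrFiles 10 -size 1GB
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
  TestDFSIO -read -nrFiles 10 -size 1GB

# TeraGen/TeraSort: generate 100 million 100-byte rows (~10 GB), then sort.
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  teragen 100000000 /benchmarks/teragen
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  terasort /benchmarks/teragen /benchmarks/terasort
```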

If this is the first time you are using HDFS, you might want to take this as
an opportunity to write a small program that reads local files and puts them
onto HDFS.

When you start working against the cluster, it is the applications that
matter, and some familiarity with how applications are written will be very
useful.

Thanks
Anu

From: Manuel Sopena Ballesteros 
Date: Monday, January 29, 2018 at 5:08 PM
To: "user@hadoop.apache.org" 
Subject: HDFS latency and bandwidth/speed



HDFS latency and bandwidth/speed

2018-01-29 Thread Manuel Sopena Ballesteros
Hi all,

I am going to start working with HDFS. How can I test HDFS latency and
throughput? Is there an equivalent of ioping, hdparm, or fio that I can use
with HDFS?

Thank you very much

Manuel Sopena Ballesteros | Big data Engineer
Garvan Institute of Medical Research
The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
T: + 61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E: 
manuel...@garvan.org.au

NOTICE
Please consider the environment before printing this email. This message and 
any attachments are intended for the addressee named and may contain legally 
privileged/confidential/copyright information. If you are not the intended 
recipient, you should not read, use, disclose, copy or distribute this 
communication. If you have received this message in error please notify us at 
once by return email and then delete both messages. We accept no liability for 
the distribution of viruses or similar in electronic communications. This 
notice should not be removed.


Re: performance about writing data to HDFS

2018-01-29 Thread Miklos Szegedi
Hello,

Here is an example.

You can set an initial low replication like this code does:
https://github.com/apache/hadoop/blob/56feaa40bb94fcaa96ae668eebfabec4611928c0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-uploader/src/main/java/org/apache/hadoop/mapred/uploader/FrameworkUploader.java#L193

Create and write to a stream instead of dealing with a local copy:
https://github.com/apache/hadoop/blob/56feaa40bb94fcaa96ae668eebfabec4611928c0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-uploader/src/main/java/org/apache/hadoop/mapred/uploader/FrameworkUploader.java#L195

Once you are done, you can set a final replication count and HDFS will
replicate in the background:
https://github.com/apache/hadoop/blob/56feaa40bb94fcaa96ae668eebfabec4611928c0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-uploader/src/main/java/org/apache/hadoop/mapred/uploader/FrameworkUploader.java#L250

You can optionally even wait until an acceptable replication count is
reached:
https://github.com/apache/hadoop/blob/56feaa40bb94fcaa96ae668eebfabec4611928c0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-uploader/src/main/java/org/apache/hadoop/mapred/uploader/FrameworkUploader.java#L256
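Put together, the pattern in the links above looks roughly like this (a
sketch; the path, payload, and replication counts are illustrative, and a
reachable HDFS cluster configured via fs.defaultFS is assumed):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LowReplicationWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path target = new Path("/tmp/example.dat");

    // Create with a single replica: the write pipeline contains only one
    // DataNode, so the client does not wait on the other copies.
    try (FSDataOutputStream out = fs.create(target, true,
        conf.getInt("io.file.buffer.size", 4096),
        (short) 1, fs.getDefaultBlockSize(target))) {
      out.write("payload".getBytes("UTF-8"));
    }

    // Raise the replication factor afterwards; the NameNode schedules the
    // extra copies in the background without blocking this client.
    fs.setReplication(target, (short) 3);
  }
}
```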

Thanks,
Miklos

On Sun, Jan 28, 2018 at 10:57 PM, 徐传印  wrote:



performance about writing data to HDFS

2018-01-29 Thread 徐传印

Hi community:
  I have a question about the performance of writing to HDFS.

  I've learned that when we write data to HDFS using an interface such as 
'FileSystem.create', the client blocks until all the blocks and their replicas 
are written. This causes an efficiency problem if we use HDFS as our final 
data storage. Many of my colleagues write the data to local disk in the main 
thread and copy it to HDFS in another thread, but that obviously increases 
disk I/O.

  So, is there a way to optimize this usage? I don't want to increase disk 
I/O, nor do I want to block while the extra replicas are being written.



  How about writing to HDFS with a replication factor of one in the main 
thread and then setting the actual replication factor from another thread? Or 
is there a better way to do this?

Re: Anyone has tried accessing TDE using HDFS Java APIs

2018-01-29 Thread praveenesh kumar
Hi Ajay

Did you get any chance to look into this. Thanks

Regards
Prav

On Fri, Jan 26, 2018 at 8:48 AM, praveenesh kumar 
wrote:

> Hi Ajay
>
> We are using HDP 2.5.5 with HDFS 2.7.1.2.5
>
> Thanks
> Prav
>
> On Thu, Jan 25, 2018 at 5:47 PM, Ajay Kumar 
> wrote:
>
>> Hi Praveenesh,
>>
>>
>>
>> What version of Hadoop are you using?
>>
>>
>>
>> Thanks,
>>
>> Ajay
>>
>>
>>
>> *From: *praveenesh kumar 
>> *Date: *Thursday, January 25, 2018 at 8:22 AM
>> *To: *"user@hadoop.apache.org" 
>> *Subject: *Anyone has tried accessing TDE using HDFS Java APIs
>>
>>
>>
>> Hi
>>
>>
>>
>> We are trying to access TDE files using the HDFS Java API. The user
>> running the job has access to the TDE zone, and we have successfully
>> accessed the file from the Hadoop FS command shell.
>>
>>
>>
>> If we read the same file in Spark with the same user, it is also read
>> properly.
>>
>>
>>
>> It is only when we use the vanilla HDFS APIs that it cannot read the
>> file, and when it does read it, it cannot decipher the text: the data is
>> not getting decrypted. My understanding is that when you pass
>> hdfs-site.xml, core-site.xml and kms-site.xml to the configuration object,
>> it should be able to handle the keys automatically.
>>
>>
>>
>> Not sure if we need to do anything extra in the Java API. Any pointers?
>> There is not a single example in the documentation of reading TDE files
>> via the HDFS Java API.
>>
>>
>>
>> Any suggestions would be much appreciated.
>>
>>
>>
>> Regards
>>
>> Prav
>>
>
>
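For reference, the plain-API access pattern being described looks roughly
like the following (a sketch, not a confirmed fix; the config and file paths
are assumptions, and on a 2.7-era client the KMS key provider, e.g.
dfs.encryption.key.provider.uri, must be visible for transparent decryption):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ReadEncryptedZoneFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Client-side configs: the key-provider setting in these files is what
    // lets the DFS client fetch the decryption key from the KMS.
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

    FileSystem fs = FileSystem.get(conf);
    try (FSDataInputStream in = fs.open(new Path("/secure/zone/data.txt"));
         BufferedReader reader =
             new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
```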