Re: Performance with large no of files

2022-10-10 Thread Wei-Chiu Chuang
Do you have security enabled?

We did some preliminary benchmarks around webhdfs (I really want to revisit
them) and, with security enabled, a lot of the overhead is between the
client and the KDC (SPNEGO). Running webhdfs with delegation tokens should
help remove that bottleneck.
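With security enabled, each WebHDFS request otherwise does a SPNEGO round trip to the KDC. A minimal sketch of the delegation-token flow, assuming a NameNode at namenode:9870 (hostname, port, and renewer are placeholders):

```shell
# Fetch a delegation token once, authenticating via Kerberos/SPNEGO
curl --negotiate -u : \
  "http://namenode:9870/webhdfs/v1/?op=GETDELEGATIONTOKEN&renewer=backupuser"

# The JSON response contains Token.urlString; pass it on subsequent calls
# so each request skips the KDC round trip:
curl "http://namenode:9870/webhdfs/v1/path/to/file?op=OPEN&delegation=<token-url-string>"
```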

On Sat, Oct 8, 2022 at 8:26 PM Abhishek  wrote:

> Hi,
> We want to back up a large number of small Hadoop files (~1 million) with
> the webhdfs API.
> We are hitting a performance bottleneck here and it's taking days to back
> them up.
> Does anyone know of a solution where performance could be improved using
> any xml settings?
> This would really help us.
> Version: 3.1.1
>
> Appreciate your help !!
>
> --
> ~
> *Abhishek...*
>


Re: Performance with large no of files

2022-10-08 Thread Brahma Reddy Battula
Not sure what your backup approach is. One option is archiving[1] the
files, which is what was done for YARN logs[2].
To speed this up, you can write one MapReduce job to archive the files.
Please refer to the documentation for a sample MapReduce job[3].


1. https://hadoop.apache.org/docs/stable/hadoop-archives/HadoopArchives.html
2. https://hadoop.apache.org/docs/stable/hadoop-archive-logs/HadoopArchiveLogs.html
3. https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
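For reference, the archiving step in [1] boils down to a single command, which itself runs as a MapReduce job; the paths below are placeholders:

```shell
# Pack the many small files under /user/data into one HAR at /backup
hadoop archive -archiveName backup.har -p /user/data /backup

# Archived files remain readable through the har:// scheme:
hdfs dfs -ls har:///backup/backup.har
```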

On Sun, Oct 9, 2022 at 9:22 AM Ayush Saxena  wrote:

> Using DistCp is the only option AFAIK. DistCp does support webhdfs; try
> playing with the number of mappers and so on to tune it for better
> performance.
>
> -Ayush
>
>
> On 09-Oct-2022, at 8:56 AM, Abhishek  wrote:
>
> 
> Hi,
> We want to back up a large number of small Hadoop files (~1 million) with
> the webhdfs API.
> We are hitting a performance bottleneck here and it's taking days to back
> them up.
> Does anyone know of a solution where performance could be improved using
> any xml settings?
> This would really help us.
> Version: 3.1.1
>
> Appreciate your help !!
>
> --
>
> ~
> *Abhishek...*
>
>


Re: Performance with large no of files

2022-10-08 Thread Ayush Saxena
Using DistCp is the only option AFAIK. DistCp does support webhdfs; try
playing with the number of mappers and so on to tune it for better performance.
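A sketch of what that can look like; the hostnames, ports, and mapper count below are placeholders to tune:

```shell
# Copy from the source cluster over webhdfs with 50 parallel mappers
hadoop distcp -m 50 \
  webhdfs://source-nn:9870/user/data \
  hdfs://dest-nn:8020/backup/data
```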

-Ayush


> On 09-Oct-2022, at 8:56 AM, Abhishek  wrote:
> 
> 
> Hi,
> We want to back up a large number of small Hadoop files (~1 million) with
> the webhdfs API.
> We are hitting a performance bottleneck here and it's taking days to back
> them up.
> Does anyone know of a solution where performance could be improved using
> any xml settings?
> This would really help us.
> Version: 3.1.1
> 
> Appreciate your help !!
> 
> -- 
> ~
> Abhishek...


Performance with large no of files

2022-10-08 Thread Abhishek
Hi,
We want to back up a large number of small Hadoop files (~1 million) with
the webhdfs API.
We are hitting a performance bottleneck here and it's taking days to back
them up.
Does anyone know of a solution where performance could be improved using
any xml settings?
This would really help us.
Version: 3.1.1

Appreciate your help !!

-- 
~
*Abhishek...*