Re: Performance with large no of files
Do you have security enabled? We did some preliminary benchmarks around WebHDFS (I really want to revisit them) and, with security enabled, a lot of the overhead is in the exchange between the client and the KDC (SPNEGO). Try running WebHDFS with delegation tokens; that should help remove that bottleneck.

On Sat, Oct 8, 2022 at 8:26 PM Abhishek wrote:
> We want to back up a large number of small Hadoop files (~1 million) with the WebHDFS API. [...]
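A minimal Python sketch of the delegation-token approach, assuming the standard WebHDFS REST endpoints (`op=GETDELEGATIONTOKEN` and the `delegation` query parameter) and the Hadoop 3 default NameNode HTTP port 9870. The idea is to pay the SPNEGO cost once, when fetching the token with a Kerberos-authenticated request (for example `curl --negotiate -u :`), and then pass the token on every per-file call. The helper names and the host `nn.example.com` are hypothetical.

```python
import json
from urllib.parse import quote, urlencode

WEBHDFS_PORT = 9870  # Hadoop 3.x default NameNode HTTP port (assumption: not overridden)


def webhdfs_url(host, path, op, port=WEBHDFS_PORT, delegation=None, **params):
    """Build a WebHDFS REST URL (hypothetical helper).

    If a delegation token is supplied it is sent as the `delegation` query
    parameter, so the request is authenticated by the token rather than by
    a SPNEGO handshake involving the KDC.
    """
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute")
    query = {"op": op, **params}
    if delegation is not None:
        query["delegation"] = delegation
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(query)}"


def token_from_response(body):
    """Extract the token string from a GETDELEGATIONTOKEN JSON response body."""
    return json.loads(body)["Token"]["urlString"]


if __name__ == "__main__":
    # Step 1: the only call that needs SPNEGO (send it with a Kerberos-aware client).
    print(webhdfs_url("nn.example.com", "/", "GETDELEGATIONTOKEN", renewer="backup"))
    # Step 2: reuse the token for every file instead of re-authenticating each time.
    token = token_from_response('{"Token": {"urlString": "EXAMPLE-TOKEN"}}')
    for f in ["/data/small/part-00000", "/data/small/part-00001"]:
        print(webhdfs_url("nn.example.com", f, "OPEN", delegation=token))
```

Note that `OPEN` answers with a redirect to a DataNode, so whatever HTTP client issues these URLs must follow redirects.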
Re: Performance with large no of files
Not sure what your backup approach is. One option is archiving the files [1], as is done for YARN logs [2]. To speed this up, you can write a MapReduce job that archives the files; please refer to the documentation for a sample MapReduce job [3].

[1] https://hadoop.apache.org/docs/stable/hadoop-archives/HadoopArchives.html
[2] https://hadoop.apache.org/docs/stable/hadoop-archive-logs/HadoopArchiveLogs.html
[3] https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

On Sun, Oct 9, 2022 at 9:22 AM Ayush Saxena wrote:
> Using DistCp is the only option AFAIK. [...]
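A sketch of the archiving option, assuming the `hadoop archive -archiveName <name>.har -p <parent> [-r <replication>] <src>* <dest>` syntax from the Hadoop Archives guide. Packing the small files into a few HAR files means the backup moves a handful of large files instead of about a million small ones, and the archive tool itself runs as a MapReduce job. The paths and archive name are hypothetical, and the script only prints the command unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def har_command(archive_name, parent, sources, dest, replication=None):
    """Build a `hadoop archive` command line (hypothetical helper).

    `sources` are paths relative to `parent`; `dest` is the directory that
    will receive the archive.
    """
    if not archive_name.endswith(".har"):
        raise ValueError("the archive name should have a .har extension")
    cmd = ["hadoop", "archive", "-archiveName", archive_name, "-p", parent]
    if replication is not None:
        cmd += ["-r", str(replication)]  # replication factor for the archive files
    cmd += list(sources)
    cmd.append(dest)
    return cmd


def run(cmd, dry_run=True):
    """Print the command, or execute it and raise on a non-zero exit code."""
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    cmd = har_command("small-files-2022-10.har", "/data/small",
                      ["dir1", "dir2"], "/backup/archives", replication=3)
    run(cmd)
```

Files inside the archive remain readable through the `har://` scheme, e.g. `hdfs dfs -ls har:///backup/archives/small-files-2022-10.har`.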
Re: Performance with large no of files
Using DistCp is the only option AFAIK. DistCp does support WebHDFS, so try tuning the number of mappers and similar settings for better performance.

-Ayush

On 09-Oct-2022, at 8:56 AM, Abhishek wrote:
> We want to back up a large number of small Hadoop files (~1 million) with the WebHDFS API. [...]
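A sketch of a tuned DistCp run that reads over WebHDFS, as suggested above. It assumes the DistCp options `-m` (maximum number of simultaneous copy maps), `-strategy dynamic` (faster maps pick up more paths than slower ones, which can help with very many small files), `-numListstatusThreads` (parallel listing while building the copy list) and `-update`. The cluster addresses and values are hypothetical starting points to benchmark, not recommendations.

```python
import shlex


def distcp_command(src, dst, maps=None, strategy=None, listing_threads=None, update=False):
    """Build a `hadoop distcp` command line (hypothetical helper)."""
    cmd = ["hadoop", "distcp"]
    if maps is not None:
        cmd += ["-m", str(maps)]  # upper bound on concurrent copy maps
    if strategy is not None:
        if strategy not in ("uniformsize", "dynamic"):
            raise ValueError("strategy must be 'uniformsize' or 'dynamic'")
        cmd += ["-strategy", strategy]
    if listing_threads is not None:
        # DistCp caps this value; check the documentation for your version.
        cmd += ["-numListstatusThreads", str(listing_threads)]
    if update:
        cmd.append("-update")  # copy only files missing or differing at dst
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    for maps in (50, 100, 200):  # benchmark a few mapper counts
        print(shlex.join(distcp_command(
            "webhdfs://nn.example.com:9870/data/small",
            "hdfs://backup-nn.example.com:8020/backup/small",
            maps=maps, strategy="dynamic", listing_threads=20, update=True)))
```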
Performance with large no of files
Hi,

We want to back up a large number of small Hadoop files (~1 million) with the WebHDFS API. We are hitting a performance bottleneck, and the backup is taking days.

Does anyone know of a solution, for example XML configuration settings, that could improve performance? This would really help us.

Version: 3.1.1

Appreciate your help!

--
Abhishek