Re: Performance with large no of files
Not sure what your backup approach is. One option is archiving [1] the files, as is done for YARN logs [2]. To speed this up, you can write a MapReduce job to archive the files; please refer to the documentation for a sample MapReduce job [3].

1. https://hadoop.apache.org/docs/stable/hadoop-archives/HadoopArchives.html
2. https://hadoop.apache.org/docs/stable/hadoop-archive-logs/HadoopArchiveLogs.html
3. https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

On Sun, Oct 9, 2022 at 9:22 AM Ayush Saxena wrote:
> Using DistCp is the only option AFAIK. DistCp does support webhdfs, so
> try playing with the number of mappers and so on to tune it for better
> performance.
>
> -Ayush
>
> > On 09-Oct-2022, at 8:56 AM, Abhishek wrote:
> >
> > Hi,
> > We want to back up a large number of small Hadoop files (~1mn) with the webhdfs API.
> > We are getting a performance bottleneck here and it's taking days to back it up.
> > Does anyone know any solution where performance could be improved using any XML settings?
> > This would really help us.
> > v 3.1.1
> >
> > Appreciate your help!!
> >
> > ~
> > *Abhishek...*
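As a rough sketch of the archiving suggestion above (all paths and names here are illustrative, not from the original thread), packing the small files into a Hadoop Archive looks something like:

```shell
# Pack the small files under /data/backup-src into a single HAR archive.
# "hadoop archive" itself runs as a MapReduce job, which is what
# parallelizes the work across the cluster.
#   -archiveName  name of the resulting .har archive
#   -p            parent path that the source paths are relative to
hadoop archive -archiveName backup-2022-10.har -p /data/backup-src /data/backup-har

# The archive is exposed as a filesystem, so its contents can still be
# listed (and read) through the har:// scheme:
hdfs dfs -ls har:///data/backup-har/backup-2022-10.har
```

The win for a backup of ~1mn small files is that the NameNode and the copy pipeline then deal with a handful of large HAR part files instead of a million tiny ones.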
Re: Performance with large no of files
Using DistCp is the only option AFAIK. DistCp does support webhdfs, so try playing with the number of mappers and so on to tune it for better performance.

-Ayush

> On 09-Oct-2022, at 8:56 AM, Abhishek wrote:
>
> Hi,
> We want to back up a large number of small Hadoop files (~1mn) with the webhdfs API.
> We are getting a performance bottleneck here and it's taking days to back it up.
> Does anyone know any solution where performance could be improved using any XML settings?
> This would really help us.
> v 3.1.1
>
> Appreciate your help!!
>
> ~
> Abhishek...
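A minimal sketch of the DistCp-over-webhdfs approach, with the mapper count raised as suggested (hostnames, ports, and paths here are placeholders, not from the thread):

```shell
# Copy a source tree to a backup destination over webhdfs.
#   -m 100             run up to 100 mappers (parallel copy tasks);
#                      tune this to the cluster's capacity
#   -strategy dynamic  let faster mappers steal work, which helps when
#                      file sizes are skewed (many tiny files)
hadoop distcp \
  -m 100 \
  -strategy dynamic \
  webhdfs://source-nn:9870/data/backup-src \
  hdfs://dest-nn:8020/data/backup-dest
```

With ~1mn small files the per-file open/close overhead dominates, so more mappers helps only up to a point; combining this with archiving the files first (as in the other reply) usually gives the bigger improvement.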
Performance with large no of files
Hi,
We want to back up a large number of small Hadoop files (~1mn) with the webhdfs API.
We are getting a performance bottleneck here and it's taking days to back it up.
Does anyone know any solution where performance could be improved using any XML settings?
This would really help us.
v 3.1.1

Appreciate your help!!

~
*Abhishek...*
Re: Communicating between yarn and tasks after delegation token renewal
There's no way to do that. Once YARN launches containers, it doesn't communicate with them for anything after that. The tasks/containers can obviously always reach out to YARN services, but even that is not helpful in this case, because YARN never exposes through APIs what it is doing with the tokens or when it is renewing them.

What is it that you are doing? What new information are you trying to share with the tasks? What framework is this? A custom YARN app, or MapReduce / Tez / Spark / Flink etc.?

Thanks
+Vinod

> On Oct 7, 2022, at 10:40 PM, Julien Phalip wrote:
>
> Hi,
>
> IIUC, when a distributed job is started, YARN first obtains a delegation
> token from the target resource, then securely pushes the delegation token to
> the individual tasks. If the job lasts longer than a given period of time,
> then YARN renews the delegation token (or, more precisely, extends its
> lifetime), thereby allowing the tasks to continue using the delegation
> token. This is based on the assumption that the delegation token itself is
> static and doesn't change (only its lifetime can be extended on the target
> resource's server).
>
> I'm building a custom service where I'd like to share new information with
> the tasks once the delegation token has been renewed. Is there a way to let
> YARN push new information to the running tasks right after renewing the token?
>
> Thanks,
>
> Julien