Re: Nifi - Content-repo on AWS-EBS volumes
I just switched a cluster using 3 EBS volumes for the content repo from gp2 to gp3, and it resolved definite I/O throughput issues. The improvement from gp3 was significant enough that I might actually reduce from 3 volumes to 2; perhaps even a single volume would be sufficient. Of course, every use case is unique.

(In reply to Gregory M. Foreman, Dec 15, 2023 at 5:37 PM -0500)
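For a rough sense of why the gp2-to-gp3 switch mentioned above can matter, here is a minimal Python sketch of sustained baseline IOPS. It assumes AWS's published baseline figures (gp2: 3 IOPS per GiB, minimum 100, maximum 16,000; gp3: 3,000 IOPS regardless of size) and ignores gp2 burst credits, extra provisioned gp3 IOPS or throughput, and instance-level EBS bandwidth limits. The 500 GiB volume size is hypothetical.

```python
# Baseline-IOPS comparison for content-repo volumes (sketch, assumptions above).

def gp2_baseline_iops(size_gib: int) -> int:
    """Sustained gp2 baseline: 3 IOPS per GiB, clamped to [100, 16000].
    Burst credits are deliberately ignored."""
    return max(100, min(3 * size_gib, 16_000))


def gp3_baseline_iops(size_gib: int) -> int:
    """gp3 baseline is a flat 3,000 IOPS regardless of volume size
    (additional provisioned IOPS are ignored here)."""
    return 3_000


def aggregate_iops(per_volume, size_gib: int, volumes: int) -> int:
    """Sum the baseline IOPS of several independent, equally sized volumes."""
    return per_volume(size_gib) * volumes


if __name__ == "__main__":
    size = 500  # hypothetical GiB per volume
    print("3 x gp2:", aggregate_iops(gp2_baseline_iops, size, 3))  # 4500
    print("2 x gp3:", aggregate_iops(gp3_baseline_iops, size, 2))  # 6000
    print("1 x gp3:", aggregate_iops(gp3_baseline_iops, size, 1))  # 3000
```

Under these assumptions, two gp3 volumes provide more baseline IOPS than three 500 GiB gp2 volumes, which fits the idea of dropping a volume; real results depend on the workload and the instance type.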
Re: Nifi - Content-repo on AWS-EBS volumes
Mark:

Got it. Thank you for the help.

Greg

(In reply to Mark Payne, Dec 15, 2023 at 4:14 PM)
Re: Nifi - Content-repo on AWS-EBS volumes
Greg,

Whether or not multiple content repos will have any impact depends very much on where your system’s bottleneck is. If your bottleneck is disk I/O, it will absolutely help. If your bottleneck is CPU, it won’t. If, for example, you’re running on bare metal and have 48 cores on your machine and you’re running with spinning disks, you’ll definitely want to use multiple spinning disks. But if you’re running in AWS on a VM that has 4 cores and you’re using gp3 EBS volumes, it’s unlikely that multiple content repos will help.

Thanks
-Mark

(In reply to Gregory M. Foreman, Dec 15, 2023 at 3:25 PM)
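Mark's point about bottlenecks can be illustrated with a toy model, sketched below in Python. It is a simplification, not NiFi's actual behavior or measured data: it assumes sustained throughput is the minimum of a CPU ceiling and the aggregate disk bandwidth, and that round-robin placement makes disk bandwidth scale linearly with the number of independent volumes. All MB/s figures are hypothetical.

```python
# Toy bottleneck model: whichever resource saturates first caps throughput.

def effective_mb_per_s(cpu_limit: float, per_repo_disk: float, repos: int) -> float:
    """Sustained throughput with `repos` independent content-repo volumes.
    Disk capacity grows with the repo count; the CPU ceiling does not."""
    return min(cpu_limit, per_repo_disk * repos)


if __name__ == "__main__":
    # Disk-bound case (e.g. many cores, slow spinning disks): repos help.
    for n in (1, 2, 3):
        print("disk-bound,", n, "repo(s):", effective_mb_per_s(1000.0, 100.0, n))
        # 100.0, 200.0, 300.0
    # CPU-bound case (e.g. small VM, one volume already outpaces the CPU): no gain.
    for n in (1, 2, 3):
        print("cpu-bound,", n, "repo(s):", effective_mb_per_s(100.0, 125.0, n))
        # 100.0 each time
```

Adding repos only raises the disk term, so it helps until the CPU ceiling becomes the limit and does nothing once it already is.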
Re: Nifi - Content-repo on AWS-EBS volumes
Mark:

I was just discussing multiple content repos on EBS volumes with a colleague. I found your post from a long time ago:

https://lists.apache.org/thread/nq3mpry0wppzrodmldrcfnxwzp3n1cjv

“Re #2: I don't know that i've used any SAN to back my repositories other than the EBS provided by Amazon EC2. In that environment, I found that having one or having multiple repos was essentially equivalent.”

Does that statement still hold true today? In other words, is there no real performance benefit to having multiple content repos on multiple EBS volumes?

Thanks,
Greg

(In reply to Mark Payne, Dec 11, 2023 at 8:50 PM)
Re: Nifi - Content-repo on AWS-EBS volumes
Hey Phil,

NiFi will not spread the content of a single file over multiple partitions. It will write the content of FlowFile 1 to content repo 1, then write the next FlowFile to repo 2, etc., so it does round-robin but does not spread a single FlowFile across multiple repos.

Thanks
-Mark

(In reply to Phillip Lord, Dec 11, 2023 at 8:45 PM)
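The placement Mark describes can be modeled as a simple round-robin assignment. This Python sketch is a simplified illustration of that description, not NiFi's actual content repository code; the repo names mirror Phil's cont1/cont2/cont3 and the FlowFile sizes are hypothetical.

```python
from itertools import cycle


def assign_round_robin(flowfile_sizes, repos):
    """Map each FlowFile to exactly one content repo, rotating through repos.

    Returns (flowfile_number, size_bytes, repo) tuples. A FlowFile's whole
    content goes to a single repo; it is never split between repos.
    """
    repo_cycle = cycle(repos)
    return [(i, size, next(repo_cycle)) for i, size in enumerate(flowfile_sizes, start=1)]


if __name__ == "__main__":
    repos = ["cont1", "cont2", "cont3"]
    sizes = [1_048_576, 2_048, 500_000, 75_000]  # hypothetical sizes in bytes
    for ff_id, size, repo in assign_round_robin(sizes, repos):
        print(f"FlowFile {ff_id} ({size} bytes) -> {repo}")
    # FlowFile 1 -> cont1, 2 -> cont2, 3 -> cont3, 4 -> cont1
```

In this model the 1 MB FlowFile lands entirely in cont1, so reading it back touches one volume; the rotation spreads different FlowFiles, not pieces of one FlowFile, across the volumes.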
Nifi - Content-repo on AWS-EBS volumes
Hello NiFi comrades,

Here's my scenario...

Let's say I have a NiFi cluster running on EC2 instances with attached EBS volumes serving as their repos. The content repository is split into three content repos per node (cont1, cont2, cont3), each a dedicated EBS volume. My understanding is that the content claims for a single file can potentially span more than one of these repos (correct me if I've lost my mind over the years).

For instance, if you have a 1 MB file and, let's say, your max.content.claim.size is 100 KB, that's roughly ten 100 KB claims, potentially split across the 3 EBS volumes. So if NiFi is trying to move that file to S3, for instance, it needs to read from each of the volumes.

Whereas if it were a single EBS volume for the content repo, it would read from the single volume, which I would think would be more performant? Or does spreading out any I/O contention across volumes provide more of a benefit?

I know there are different levels of EBS volumes, but I'm not factoring that in for right now.

Appreciate any insight... trying to determine the best configuration.

Thanks,
Phil
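For reference, a per-node layout like cont1/cont2/cont3 is declared in nifi.properties with one entry per directory under the nifi.content.repository.directory. prefix. The Python sketch below extracts those entries from a properties excerpt; the mount paths are hypothetical, and the parser only handles simple key=value lines rather than the full Java properties syntax.

```python
PREFIX = "nifi.content.repository.directory."


def content_repo_dirs(properties_text: str) -> dict:
    """Return {suffix: path} for every content-repository directory entry."""
    dirs = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and lines without a key=value pair
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(PREFIX):
            dirs[key[len(PREFIX):]] = value.strip()
    return dirs


# Hypothetical excerpt: one mount point per dedicated EBS volume.
SAMPLE = """
nifi.content.repository.directory.cont1=/mnt/cont1/content_repository
nifi.content.repository.directory.cont2=/mnt/cont2/content_repository
nifi.content.repository.directory.cont3=/mnt/cont3/content_repository
nifi.flowfile.repository.directory=/mnt/flowfile/flowfile_repository
"""

if __name__ == "__main__":
    for name, path in content_repo_dirs(SAMPLE).items():
        print(name, "->", path)
```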