[GitHub] [hadoop-ozone] sodonnel commented on issue #668: HDDS-3139. Pipeline placement should max out pipeline usage

2020-04-21 Thread GitBox


sodonnel commented on issue #668:
URL: https://github.com/apache/hadoop-ozone/pull/668#issuecomment-617114038


   @timmylicheng Thanks for the update and the Jira. Unfortunately GitHub is 
now reporting conflicts, possibly due to HDDS-3101, which was merged. Could you 
resolve the conflicts and push again, and hopefully we will get a green build 
so we can commit this. Thanks!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org



[GitHub] [hadoop-ozone] sodonnel commented on issue #668: HDDS-3139. Pipeline placement should max out pipeline usage

2020-04-20 Thread GitBox


sodonnel commented on issue #668:
URL: https://github.com/apache/hadoop-ozone/pull/668#issuecomment-616706028


   With the second look @fapifta gave this, I am +1 to commit. We just need to 
do two more things:
   
   1. Create a Jira with my benchmark results and some of the details from my 
comment earlier today in it, and add a comment to `filterViableNodes()` 
mentioning this new Jira as a "todo for performance on large clusters".
   
   2. Push a new commit to get a green CI run. We have one failing integration 
test, which is probably just flaky. However, I cannot commit the change until 
all tests are green.






[GitHub] [hadoop-ozone] sodonnel commented on issue #668: HDDS-3139. Pipeline placement should max out pipeline usage

2020-04-20 Thread GitBox


sodonnel commented on issue #668:
URL: https://github.com/apache/hadoop-ozone/pull/668#issuecomment-616555768


   I am a little concerned about the expense of forming the list of healthy 
nodes on large clusters. We have to do quite a lot of work to form the list and 
then use only 3 nodes from it. Even the method `currentPipelineCount()` 
needs to do a few map lookups per node to get the current pipeline count. This 
was the case even before this change. Creating a pipeline on a large cluster 
was already expensive, but this change probably makes it worse, due to the 
sort needed - and I know it was me who suggested the sort.
   
   I think the code as it is will work OK up to about 1000 nodes, and then the 
performance will drop off as the node count goes toward 10k.
   
   E.g. here are some benchmarks I ran with this test code, which is similar 
to what we are doing in `filterViableNodes()`:
   
   ```
   public List<Object> sortingWithMap(BenchmarkState state) {
     return state.otherList.stream()
         .map(o -> new Mock(o, state.rand.nextInt(20)))
         .filter(o -> o.getSize() <= 20)
         .sorted(Comparator.comparingInt(Mock::getSize))
         .map(o -> o.getObject())
         .collect(Collectors.toList());
   }
   ```
   
   The ops per second for various list sizes are:
   
   ```
   Benchmark               (listSize)   Mode  Cnt       Score     Error  Units
   Sorting.sortingWithMap         100  thrpt    3  113948.345 ± 446.426  ops/s
   Sorting.sortingWithMap        1000  thrpt    3    9468.507 ± 894.138  ops/s
   Sorting.sortingWithMap        5000  thrpt    3    1931.612 ± 263.919  ops/s
   Sorting.sortingWithMap       10000  thrpt    3     970.745 ±  25.823  ops/s
   Sorting.sortingWithMap      100000  thrpt    3      87.684 ±  35.438  ops/s
   ```
   
   For a 1000 node cluster, with 10 pipelines per node, we would be looking at 
about 1 second to form all the pipelines.
   
   For a 5k node cluster, it would be about 25 seconds.
   
   For a 10k node cluster it would be 103 seconds, but even here, that would be 
at close to 1000 pipelines per second.
   
   Those approximate times are just for the `filterViableNodes()` step, and do 
not include any other logic.
   
   Removing the sort step from the above code makes it go about 5 times faster, 
and removing the two map steps as well as the sort makes it about 10x faster 
overall.
   
   If we want to allocate the pipelines evenly, where the lowest loaded nodes 
always get the new pipeline, then I don't think we can avoid the sort. However, 
on a larger cluster, picking nodes that have not exceeded the limit at random 
would probably result in quite an even spread, and I think we could do that 
more efficiently.
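   A rough sketch of that random approach (the names here are hypothetical 
stand-ins; `pipelineCount` and `limit` represent the real 
`currentPipelineCount()` logic and the heavy-node limit):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;
   import java.util.Random;
   
   public class RandomPlacementSketch {
     /**
      * Keep only nodes below the pipeline limit (a single O(n) pass),
      * then pick one at random - no O(n log n) sort. Over many
      * placements this should spread pipelines fairly evenly.
      */
     static String pickNode(List<String> healthyNodes,
         Map<String, Integer> pipelineCount, int limit, Random rand) {
       List<String> viable = new ArrayList<>();
       for (String node : healthyNodes) {
         if (pipelineCount.getOrDefault(node, 0) < limit) {
           viable.add(node);
         }
       }
       if (viable.isEmpty()) {
         return null;  // no node has capacity for another pipeline
       }
       return viable.get(rand.nextInt(viable.size()));
     }
   }
   ```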
   
   What do you think? 
   
   If Sammi does not have bandwidth to give this a further review, I will see 
if I can get someone else to take a look too.






[GitHub] [hadoop-ozone] sodonnel commented on issue #668: HDDS-3139. Pipeline placement should max out pipeline usage

2020-04-14 Thread GitBox
sodonnel commented on issue #668: HDDS-3139. Pipeline placement should max out 
pipeline usage
URL: https://github.com/apache/hadoop-ozone/pull/668#issuecomment-613407726
 
 
   I think this looks better and I have just a few minor comments. I added a 
few inline, plus these two:
   
   At line 243:
   
   ```
   // First choose an anchor nodes randomly
   DatanodeDetails anchor = chooseNode(healthyNodes);
   ```
   
   This no longer picks a random node - it just picks the lowest loaded node. I 
wonder if this first anchor node should be a random node?
   
   The method `getHigherLoadNodes()` does not seem to be used any longer, so we 
can remove it.
   
   This code has changed a lot with this patch, and this is really my first 
time looking at it. It would be great if @ChenSammi could give this a review 
too, once the minor points I mentioned here are addressed.





[GitHub] [hadoop-ozone] sodonnel commented on issue #668: HDDS-3139. Pipeline placement should max out pipeline usage

2020-04-10 Thread GitBox
sodonnel commented on issue #668: HDDS-3139. Pipeline placement should max out 
pipeline usage
URL: https://github.com/apache/hadoop-ozone/pull/668#issuecomment-611950545
 
 
   Thanks for the update. It's a holiday weekend here in Europe, so it will 
probably be Tuesday before I get to look at this.





[GitHub] [hadoop-ozone] sodonnel commented on issue #668: HDDS-3139 Pipeline placement should max out pipeline usage

2020-04-01 Thread GitBox
sodonnel commented on issue #668: HDDS-3139 Pipeline placement should max out 
pipeline usage
URL: https://github.com/apache/hadoop-ozone/pull/668#issuecomment-607397167
 
 
   
   
[example.txt](https://github.com/apache/hadoop-ozone/files/4416772/example.txt)
   





[GitHub] [hadoop-ozone] sodonnel commented on issue #668: HDDS-3139 Pipeline placement should max out pipeline usage

2020-04-01 Thread GitBox
sodonnel commented on issue #668: HDDS-3139 Pipeline placement should max out 
pipeline usage
URL: https://github.com/apache/hadoop-ozone/pull/668#issuecomment-607396397
 
 
   @timmylicheng I did some more experimenting and I have attached a patch file 
that should apply on top of your latest PR. In it, I quickly implemented what I 
outlined above. It passes all the existing tests, but the code is probably not 
production ready - there may be some cases I have not considered yet. It is one 
option we could consider for changing things here. Please let me know your 
thoughts.





[GitHub] [hadoop-ozone] sodonnel commented on issue #668: HDDS-3139 Pipeline placement should max out pipeline usage

2020-04-01 Thread GitBox
sodonnel commented on issue #668: HDDS-3139 Pipeline placement should max out 
pipeline usage
URL: https://github.com/apache/hadoop-ozone/pull/668#issuecomment-607342420
 
 
   > @sodonnel I rebase the code with minor conflicts in test class, but the 
test won't pass. I took a close look and made some change. But I realize the 
issue that I mention in the last comment about how to leverage with 
chooseNodeFromTopology. Wanna learn your thoughts.
   
   I think one problem is this line:
   
   ```
   datanodeDetails = nodes.stream().findAny().get();
   ```
   The `findAny()` method does not seem to return a random entry - so the same 
node is returned until it uses up its pipeline allocation.
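   This is easy to reproduce in isolation. On a sequential stream, `findAny()` 
is allowed to return any element, but in practice (at least on OpenJDK) it just 
returns the first one every time, so an explicit random index is needed for a 
genuinely random pick (the node names below are made up):
   
   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.Random;
   
   public class FindAnyDemo {
     public static void main(String[] args) {
       List<String> nodes = Arrays.asList("dn1", "dn2", "dn3");
   
       // findAny() on a sequential stream: the spec allows any element,
       // but in practice the same (first) element comes back each call.
       for (int i = 0; i < 3; i++) {
         System.out.println(nodes.stream().findAny().get());
       }
   
       // An explicit random index does give a random node on each call.
       Random rand = new Random();
       System.out.println(nodes.get(rand.nextInt(nodes.size())));
     }
   }
   ```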
   
   I am also not sure about the limit calculation in `getLowerLoadNodes()`:
   
   ```
   int limit = nodes.size() * heavyNodeCriteria
       / HddsProtos.ReplicationFactor.THREE.getNumber();
   ```
   
   Adding some debug output, I found that this method starts to return an empty 
list while there are still nodes available to handle the pipeline.
   
   Also, in `filterViableNodes()` via the `meetCriteria()` method, nodes over 
the heavy load limit are already filtered out, so you are guaranteed that your 
healthy node list contains only nodes with the capacity to take another 
pipeline. So I wonder why we need to filter the nodes further.
   
   > But I realize the issue that I mention in the last comment about how to 
leverage with chooseNodeFromTopology.
   
   There seems to be some inconsistency in how we pick the nodes (not just in 
this PR, but in the wider code). E.g. in `chooseNodeBasedOnRackAwareness()` we 
don't call into `NetworkTopology`, but instead we use the 
`getNetworkLocation()` method on the `DatanodeDetails` object to find nodes 
that do not match the anchor's location.
   
   Then later, in `chooseNodeFromNetworkTopology()`, we try to find a node 
whose location is equal to the anchor's, and that is where we call into 
`networkTopology.chooseRandom()`. Could we not avoid that call, and avoid 
generating a new list of nodes, by doing something similar to 
`chooseNodeBasedOnRackAwareness()`, using the `getNetworkLocation()` method to 
find matching nodes? That would probably be more efficient than the current 
implementation.
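   A minimal sketch of that `getNetworkLocation()`-based matching (the 
`NodeInfo` type and names here are made-up stand-ins for the real 
`DatanodeDetails`, not the actual Ozone API):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   public class LocationFilterSketch {
     // Stand-in for DatanodeDetails with a network location, e.g. "/rack1".
     static class NodeInfo {
       final String name;
       final String networkLocation;
       NodeInfo(String name, String networkLocation) {
         this.name = name;
         this.networkLocation = networkLocation;
       }
       String getNetworkLocation() {
         return networkLocation;
       }
     }
   
     /**
      * Find nodes on the same rack as the anchor by comparing
      * getNetworkLocation() directly, instead of calling back into the
      * topology and building a new node list.
      */
     static List<NodeInfo> sameLocationAs(NodeInfo anchor, List<NodeInfo> nodes) {
       List<NodeInfo> matches = new ArrayList<>();
       for (NodeInfo node : nodes) {
         if (node != anchor
             && node.getNetworkLocation().equals(anchor.getNetworkLocation())) {
           matches.add(node);
         }
       }
       return matches;
     }
   }
   ```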
   
   As we are then also able to re-use the same list of healthy nodes everywhere 
without more filtering, maybe we could sort that list once by pipeline count in 
`filterViableNodes()` or `meetCriteria()`, and then later always pick the node 
with the lowest load, filling the nodes up that way.
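   To sketch that sort-once idea (hypothetical names again; `pipelineCount` 
stands in for the real `currentPipelineCount()` lookup):
   
   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   import java.util.Map;
   
   public class SortOncePlacementSketch {
     /**
      * Sort the healthy node list once by current pipeline count
      * (ascending). Later placement steps can then just take nodes from
      * the front of the list, so the least loaded nodes fill up first
      * without re-sorting on every pipeline creation.
      */
     static List<String> sortByLoad(List<String> healthyNodes,
         Map<String, Integer> pipelineCount) {
       List<String> sorted = new ArrayList<>(healthyNodes);
       sorted.sort(Comparator.comparingInt(n -> pipelineCount.getOrDefault(n, 0)));
       return sorted;
     }
   }
   ```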
   
   I hope this comment makes sense as it is very long.





[GitHub] [hadoop-ozone] sodonnel commented on issue #668: HDDS-3139 Pipeline placement should max out pipeline usage

2020-03-30 Thread GitBox
sodonnel commented on issue #668: HDDS-3139 Pipeline placement should max out 
pipeline usage
URL: https://github.com/apache/hadoop-ozone/pull/668#issuecomment-605964160
 
 
   Hi @timmylicheng, this patch is giving some conflicts now, probably because 
we merged the other pipeline-related change. Could you rebase against master 
and push it again please?

