[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375821309 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375770793 > If we were using stock ConcurrentHashMap it'd be less risky, but since the hashmap is our code too, it's better to err on the safe side. `ConcurrentLongHashMap` is already everywhere in the code base :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375752857 @ivankelly > it seems pointless to have locks around a concurrent hashmap well. depends on how you see stuffs. concurrent hashmap is section based implementation, it has better parallelism and less contention on accessing stuffs. especially most of the time the map is protected under a read lock and only be blocked when the map is going to modified. > I've made a PR into your branch with a lockless approach. I have thought about lockless approach but I didn't do it. A lockless approach moves the mutation out of the protected of a write lock, which can potentially make it exposed to race conditions that we don't think of. This area has been very tricky based on the PRs and discussions we had around it. It is a too-risky change for me. That's also the reason I explained above and the reason I don't feel comfortable about your lockless change. My immediate concern is only "don't do IO under one lock". I am much comfortable to keep the read/write lock until there is an evidence showing read/write lock is a problem. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375587424 > Ah yes. At minimum we should be checking again for existence inside the write lock. this is one of the change I made here, checking the existence. > But again, my question is whether this has been observed to be a problem? no at benchmark now. I hit the race condition then I noticed this behavior that file lookup is under a giant lock, which I would like to remove. because we have been working hard at SortedLedgerStorage to avoid doing any IO under lock. This FileInfoBackingCache unfortunately added this behavior back. So I have a strong concern about this and want to move the I/O operation out. However I don't want to change the locking behavior on getting FileInfo, reasons explained as below. > if we want to get rid of the locking, we should get rid of the big lock completely, which shouldn't be too hard with the concurrent map. a lot of race conditions can come around when competing on getting the fileinfo. so check-existence and put is much safer to happen under the write lock. and they are just memory operation, which is fine to happen under the write lock. However find the directory for a ledger is an I/O operation, it is expensive, unpredictable than memory operation. so it should not happen under the lock. so the current approach is the most comfortable and safest approach I can think of. However if you really feel strong about this, feel free to start your approach. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375587424 > Ah yes. At minimum we should be checking again for existence inside the write lock. this is one of the change I made here, checking the existence. > But again, my question is whether this has been observed to be a problem? no at benchmark now. I hit the race condition then I noticed this behavior that file lookup is under a giant lock, which I would like to remove. because we have been working hard at SortedLedgerStorage to avoid doing any IO under lock. This FileInfoBackingCache unfortunately added this behavior back. So I have a strong concern about this and want to move the I/O operation out. However I don't want to change the locking behavior on getting FileInfo, reasons explained as below. > if we want to get rid of the locking, we should get rid of the big lock completely, which shouldn't be too hard with the concurrent map. a lot of race conditions can come around when competing on getting the fileinfo. so check-existence and put is much safer to happen under the write lock. and they are just memory operation, which is fine to happen under the write lock. However check if the file exists or not is an I/O operation, it is expensive, unpredictable than memory operation. so it should not happen under the lock. so the current approach is the most comfortable and safest approach I can think of. However if you really feel strong about this, feel free to start your approach. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375420639 > Could you give a sequence of events for how the other race occurs? I am using your diagram and using the existing logic ``` | Read Fi CacheLoader | Write Fi CacheLoader | | == | == | | calls loadFileInfo | | | - checks fileInfos under read lock | | || | || calls loadFileInfo | || - checks fileInfos under read lock | || - create a file info and put into fileInfos under write lock | | - create another file info and put it | | | into fileInfos under write lock| | ``` The Read Fi CacheLoader will overwrite the fileinfo created by Write Fi CacheLoader. > Has the locking shown to be a problem in benching? it is a giant write lock. so every thread is blocking waiting for file lookup (which is done by the file loader). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375420639 > Could you give a sequence of events for how the other race occurs? I am using your diagram and using the existing logic ``` | Read Fi CacheLoader | Write Fi CacheLoader | | == | == | | calls loadFileInfo | | | - checks fileInfos under read lock | | || | || calls loadFileInfo | || - checks fileInfos under read lock | || - create a file info and put into fileInfos under write lock | | - create another file info and put it | | | into fileInfos under write lock| | ``` > Has the locking shown to be a problem in benching? it is a giant write lock. so every thread is blocking waiting for file lookup (which is done by the file loader). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375420639 > Could you give a sequence of events for how the other race occurs? I am using your diagram and using the existing logic ``` | Read Fi CacheLoader | Write Fi CacheLoader | | == | == | | calls loadFileInfo | | | - checks fileInfos under read lock | | || | || calls loadFileInfo | || - checks fileInfos under read lock | || - create a file info and put into fileInfos under write lock | | - create another file || |info and put it into|| | fileInfos under write lock || ``` > Has the locking shown to be a problem in benching? it is a giant write lock. so every thread is blocking waiting for file lookup (which is done by the file loader). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375420639 > Could you give a sequence of events for how the other race occurs? I am using your diagram and using the existing logic ``` | Read Fi CacheLoader | Write Fi CacheLoader | | == | == | | calls loadFileInfo | | | - checks fileInfos under read lock | | ||| || calls loadFileInfo | || - checks fileInfos under read lock | || - create a file info and put into fileInfos under write lock | | - create another file info and put it into fileInfos under write lock | | ``` > Has the locking shown to be a problem in benching? it is a giant write lock. so every thread is blocking waiting for file lookup (which is done by the file loader). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375420639 > Could you give a sequence of events for how the other race occurs? I am using your diagram and using the existing logic ``` | Read Fi CacheLoader | Write Fi CacheLoader | | == | == | | calls loadFileInfo | | | - checks fileInfos under read lock | | ||| || calls loadFileInfo | || - checks fileInfos under read lock | || - create a file info and put into fileInfos under write lock | | - create another file | | |info and put it into fileInfos under write lock | | ``` > Has the locking shown to be a problem in benching? it is a giant write lock. so every thread is blocking waiting for file lookup (which is done by the file loader). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] sijie commented on issue #1284: Improve FileInfoBackingCache
sijie commented on issue #1284: Improve FileInfoBackingCache URL: https://github.com/apache/bookkeeper/pull/1284#issuecomment-375182391 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services