[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4623: Enable file handle cache
......................................................................

IMPALA-4623: Enable file handle cache

Currently, every scan range maintains its own file handle, even when
multiple scan ranges are accessing the same file. Opening all of these
file handles puts load on the NameNode, which can lead to scaling
issues. There are two parts to this change:
1. Enable file handle caching by default for local files
2. Share the file handle between scan ranges reading the same file

Local scan ranges no longer maintain their own HDFS file handles. On
each read, the I/O thread gets the HDFS file handle from the cache
(opening it if necessary) and uses it for the read. This allows
multiple scan ranges on the same file to use the same file handle.
Since the file offset is no longer consistent for an individual scan
range, every HDFS read must either use hdfsPread or seek before
reading. Additionally, since HDFS read statistics are maintained on
the file handle, the read statistics must be retrieved and cleared
after each read.

To manage contention, the file handle cache is now partitioned by a
hash of the key into independent caches with independent locks. The
allowed capacity of the file handle cache is split evenly among the
partitions, and file handles are evicted independently in each
partition. The cache maintains ownership of the file handles at all
times, but it will not evict a file handle that is in use.

If max_cached_file_handles is set to 0, or the scan range is accessing
data cached by HDFS, or the scan range is remote, the scan range gets
a file handle from the cache and holds it until the scan range is
closed. This mimics the existing behavior, except that the file handle
stays in the cache and is owned by the cache; since it is in use, it
will not be evicted.

If a file handle in the cache becomes invalid, Read() calls may fail.
Consequently, if Read() encounters an error using a file handle from
the cache, it destroys the handle and retries once with a new file
handle. Any subsequent error is unrelated to the file handle cache and
is returned to the caller.

Tests:
query_test/test_hdfs_fd_caching.py copies the files from an existing
table into a new directory and uses that directory to create an
external table. It queries the external table, then uses the hdfs
command line to manipulate the underlying file (delete, move, etc.).
It queries again to make sure Impala does not crash. Then it runs
"invalidate metadata". It checks the row counts before the
modification and after "invalidate metadata", but it does not check
the results in between.

custom_cluster/test_hdfs_fd_caching.py starts a cluster with a small
file handle cache size. It verifies that a file handle can be reused
(i.e., rerunning a query does not result in more file handles being
cached). It also verifies that the cache capacity is enforced.

Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Reviewed-on: http://gerrit.cloudera.org:8080/6478
Reviewed-by: Dan Hecht
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/runtime/buffered-block-mgr.h
A be/src/runtime/disk-io-mgr-handle-cache.h
A be/src/runtime/disk-io-mgr-handle-cache.inline.h
M be/src/runtime/disk-io-mgr-internal.h
M be/src/runtime/disk-io-mgr-reader-context.cc
M be/src/runtime/disk-io-mgr-scan-range.cc
M be/src/runtime/disk-io-mgr.cc
M be/src/runtime/disk-io-mgr.h
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/metadata/test_refresh_partition.py
M tests/query_test/test_hdfs_fd_caching.py
13 files changed, 866 insertions(+), 419 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 24
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Tim Armstrong
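The partitioning scheme described in the commit message can be sketched as follows. This is a minimal illustration of the idea (hash the file key to pick a partition, each partition with its own lock and eviction, in-use handles never evicted), not Impala's actual `FileHandleCache` from disk-io-mgr-handle-cache.h; all class and member names here are invented for the sketch.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Illustrative stand-in for an HDFS file handle.
struct FileHandle {
  std::string path;
  bool in_use = false;
};

// One independently locked partition. Its capacity is the total cache
// capacity divided evenly among the partitions.
class HandleCachePartition {
 public:
  explicit HandleCachePartition(size_t capacity) : capacity_(capacity) {}

  // Returns a handle for 'path', opening one if none is cached, and
  // marks it in use so it cannot be evicted while a read is in flight.
  FileHandle* GetHandle(const std::string& path) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = handles_.find(path);
    if (it == handles_.end()) {
      EvictIfNeeded();
      it = handles_.emplace(path, FileHandle{path, false}).first;
    }
    it->second.in_use = true;
    return &it->second;
  }

  void ReleaseHandle(FileHandle* fh) {
    std::lock_guard<std::mutex> l(lock_);
    fh->in_use = false;  // now eligible for eviction
  }

 private:
  // Evicts only handles that are not in use; if everything is in use,
  // the partition temporarily exceeds its capacity.
  void EvictIfNeeded() {
    while (handles_.size() >= capacity_) {
      bool evicted = false;
      for (auto it = handles_.begin(); it != handles_.end(); ++it) {
        if (!it->second.in_use) {
          handles_.erase(it);
          evicted = true;
          break;
        }
      }
      if (!evicted) break;
    }
  }

  std::mutex lock_;
  size_t capacity_;
  std::map<std::string, FileHandle> handles_;
};

// The cache hashes the key to choose a partition, so scan ranges on
// unrelated files contend on different locks.
class FileHandleCache {
 public:
  FileHandleCache(size_t num_partitions, size_t total_capacity) {
    for (size_t i = 0; i < num_partitions; ++i) {
      partitions_.push_back(std::make_unique<HandleCachePartition>(
          total_capacity / num_partitions));
    }
  }
  FileHandle* GetHandle(const std::string& path) {
    return Partition(path)->GetHandle(path);
  }
  void ReleaseHandle(const std::string& path, FileHandle* fh) {
    Partition(path)->ReleaseHandle(fh);
  }

 private:
  HandleCachePartition* Partition(const std::string& path) {
    return partitions_[std::hash<std::string>{}(path) % partitions_.size()].get();
  }
  std::vector<std::unique_ptr<HandleCachePartition>> partitions_;
};
```

A second lookup for the same path returns the same cached handle, which is the sharing behavior the change introduces for local scan ranges.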
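The retry behavior in the commit message (destroy a possibly stale cached handle and retry exactly once with a fresh one; only a second failure propagates) can be sketched like this. `ReadWithRetry` and its parameters are hypothetical names for illustration, not the actual `ScanRange::Read()` code.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Illustrative: a read callback that may throw if the handle is stale.
using ReadFn = std::function<int(const std::string& handle)>;

// Retry-once policy: the first failure is blamed on a possibly stale
// cached file handle, so the handle is destroyed and reopened. Any
// failure on the retry is assumed unrelated to the cache and is
// propagated to the caller unchanged.
int ReadWithRetry(const ReadFn& read, const std::string& cached_handle,
                  const std::function<std::string()>& reopen) {
  try {
    return read(cached_handle);
  } catch (const std::exception&) {
    std::string fresh_handle = reopen();  // destroy + reopen the handle
    return read(fresh_handle);            // errors here propagate
  }
}
```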
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Impala Public Jenkins has posted comments on this change.

Patch Set 23: Verified+1
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Impala Public Jenkins has posted comments on this change.

Patch Set 23: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/654/
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Dan Hecht has posted comments on this change.

Patch Set 23: Code-Review+2

(1 comment)

File be/src/runtime/disk-io-mgr.h
(http://gerrit.cloudera.org:8080/#/c/6478/23/be/src/runtime/disk-io-mgr.h):

Line 201: // for its partitions.
Technically, the alignment declaration isn't needed here (a struct's
alignment equals the maximum alignment of its fields), but the
overloaded new operator is needed. The comment is fine, though.
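Dan's point about needing the overloaded new operator comes from the fact that, before C++17 aligned allocation, plain `operator new` only guarantees `alignof(std::max_align_t)` (typically 16 bytes), so a cache-line-aligned type allocated on the heap can land misaligned unless it supplies its own allocator. A minimal sketch of the idea, with invented names (this is not Impala's actual code):

```cpp
#include <cstdint>
#include <cstdlib>
#include <new>

constexpr std::size_t CACHE_LINE_SIZE = 64;

// A struct over-aligned to a cache line to avoid false sharing.
// Pre-C++17, the default operator new may return storage aligned to
// only alignof(std::max_align_t), so we must overload operator new to
// honor the 64-byte requirement.
struct alignas(CACHE_LINE_SIZE) Partition {
  std::uint8_t data[CACHE_LINE_SIZE];

  static void* operator new(std::size_t size) {
    void* p = nullptr;
    // posix_memalign guarantees the requested alignment (POSIX).
    if (posix_memalign(&p, CACHE_LINE_SIZE, size) != 0) throw std::bad_alloc();
    return p;
  }
  static void operator delete(void* p) { free(p); }
};
```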
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Hello Marcel Kornacker, Impala Public Jenkins, Dan Hecht,

I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6478 to look at the new patch set (#23).

Change subject: IMPALA-4623: Enable file handle cache

13 files changed, 866 insertions(+), 419 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/23
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change.

Patch Set 22: (2 comments)

File be/src/runtime/disk-io-mgr-handle-cache.h
(http://gerrit.cloudera.org:8080/#/c/6478/22/be/src/runtime/disk-io-mgr-handle-cache.h):

Line 155: uint8_t padding[CACHE_LINE_SIZE];
> What happened? The clang-tidy complaints seem to be about other types; does
I misunderstood and thought that changing the alignas(64) on this
struct to a CacheLineAligned would eliminate the problem, but aligning
it either way requires DiskIoMgr to be aligned as well.

File be/src/runtime/disk-io-mgr.h
(http://gerrit.cloudera.org:8080/#/c/6478/22/be/src/runtime/disk-io-mgr.h):

Line 199: class DiskIoMgr {
> This is apparently a type that should be cache-aligned.
This cache alignment is required because FileHandleCachePartition is
cache aligned. To pass clang-tidy, cache alignment should apply to
both or to neither. This upload does neither; I'm switching it to both.
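Joe's "both or neither" observation follows from how alignment propagates in C++: a class containing an over-aligned member becomes over-aligned itself, so any aligned-allocation requirement on the member type applies equally to the container. A small sketch under that assumption (`DiskIoMgrLike` is a hypothetical stand-in, not the real DiskIoMgr):

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t CACHE_LINE_SIZE = 64;

// A cache-line-aligned partition, as in disk-io-mgr-handle-cache.h.
struct alignas(CACHE_LINE_SIZE) FileHandleCachePartition {
  std::uint8_t data[CACHE_LINE_SIZE];
};

// The member's alignment propagates: this container is forced to
// 64-byte alignment because it holds a partition, so clang-tidy's
// aligned-allocation check applies to both classes or to neither.
struct DiskIoMgrLike {
  int other_state;
  FileHandleCachePartition partition;
};

static_assert(alignof(DiskIoMgrLike) == CACHE_LINE_SIZE,
              "a member's over-alignment propagates to its container");
```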
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Impala Public Jenkins has posted comments on this change.

Patch Set 21: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/653/
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Jim Apple has posted comments on this change.

Patch Set 22: (2 comments)

File be/src/runtime/disk-io-mgr-handle-cache.h
(http://gerrit.cloudera.org:8080/#/c/6478/22/be/src/runtime/disk-io-mgr-handle-cache.h):

Line 155: uint8_t padding[CACHE_LINE_SIZE];
What happened? The clang-tidy complaints seem to be about other types;
doesn't the alignment trick fix it for this one?

File be/src/runtime/disk-io-mgr.h
(http://gerrit.cloudera.org:8080/#/c/6478/22/be/src/runtime/disk-io-mgr.h):

Line 199: class DiskIoMgr {
This is apparently a type that should be cache-aligned.
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Hello Marcel Kornacker, Impala Public Jenkins, Dan Hecht,

I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6478 to look at the new patch set (#22).

Change subject: IMPALA-4623: Enable file handle cache

13 files changed, 866 insertions(+), 418 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/22
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Impala Public Jenkins has posted comments on this change.

Patch Set 21: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/653/
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Dan Hecht has posted comments on this change.

Patch Set 21: Code-Review+2

Carry +2.
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change.

Patch Set 20: (1 comment)

File be/src/runtime/disk-io-mgr-handle-cache.h
(http://gerrit.cloudera.org:8080/#/c/6478/20/be/src/runtime/disk-io-mgr-handle-cache.h):

Line 155: uint8_t padding[64];
> You can do this without padding by publicly inheriting from CacheLineAligne
Thanks, that is just what I needed.
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Hello Marcel Kornacker, Impala Public Jenkins, Dan Hecht,

I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6478 to look at the new patch set (#21).

Change subject: IMPALA-4623: Enable file handle cache

13 files changed, 862 insertions(+), 418 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/21
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Jim Apple has posted comments on this change.

Patch Set 20: (1 comment)

File be/src/runtime/disk-io-mgr-handle-cache.h
(http://gerrit.cloudera.org:8080/#/c/6478/20/be/src/runtime/disk-io-mgr-handle-cache.h):

Line 155: uint8_t padding[64];
You can do this without padding by publicly inheriting from
CacheLineAligned from util/aligned-new.h.
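The CacheLineAligned pattern Jim refers to can be sketched as a base class that both over-aligns the derived type and supplies an aligned allocator, replacing the manual padding member. This is a hypothetical reconstruction of the idea, not the actual contents of be/src/util/aligned-new.h:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

constexpr std::size_t CACHE_LINE = 64;

// Hypothetical sketch of a CacheLineAligned base: deriving from it
// forces the derived class to cache-line alignment and gives it an
// operator new that actually honors that alignment on the heap.
struct alignas(CACHE_LINE) CacheLineAlignedSketch {
  static void* operator new(std::size_t size) {
    void* p = nullptr;
    if (posix_memalign(&p, CACHE_LINE, size) != 0) throw std::bad_alloc();
    return p;
  }
  static void operator delete(void* p) { free(p); }
};

// Public inheritance replaces the manual 'uint8_t padding[64]' member:
// the derived type inherits both the alignment and the allocator.
struct Partition : public CacheLineAlignedSketch {
  int counter = 0;
};
```

The design advantage over a padding array is that the compiler enforces the alignment (via `alignas`) rather than relying on member layout, and heap allocations go through the aligned `operator new` automatically.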
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Hello Marcel Kornacker, Impala Public Jenkins, Dan Hecht,

I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6478 to look at the new patch set (#20).

Change subject: IMPALA-4623: Enable file handle cache

13 files changed, 866 insertions(+), 418 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/20
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Impala Public Jenkins has posted comments on this change.

Patch Set 19: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/648/
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Impala Public Jenkins has posted comments on this change.

Patch Set 19: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/648/
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Dan Hecht has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 19: Code-Review+2 Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6478 to look at the new patch set (#19). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this change: 1. Enable file handle caching by default for local files 2. Share the file handle between scan ranges from the same file Local scan ranges no longer maintain their own Hdfs file handles. On each read, the I/O thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0, or the scan range is accessing data cached by Hdfs, or the scan range is remote, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted.
If a file handle in the cache becomes invalid, it may result in Read() calls failing. Consequently, if Read() encounters an error using a file handle from the cache, it will destroy the handle and retry once with a new file handle. Any subsequent error is unrelated to the file handle cache and will be returned. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs command line to manipulate the hdfs file (delete, move, etc.). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e., rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced.
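The partitioning scheme the commit message describes — independent caches with independent locks, total capacity split evenly among partitions, per-partition eviction that never touches a handle in use — can be sketched roughly as follows. This is an illustrative Python sketch, not Impala's C++ implementation; `PartitionedHandleCache` and its method names are hypothetical.

```python
import threading
from collections import OrderedDict

class PartitionedHandleCache:
    """Hypothetical sketch of a cache partitioned by a hash of the key."""

    def __init__(self, capacity, num_partitions=4):
        # The allowed capacity is split evenly among the partitions, each of
        # which has its own lock and its own LRU ordering.
        self.partitions = [
            {"lock": threading.Lock(),
             "entries": OrderedDict(),   # key -> (handle, in_use)
             "capacity": capacity // num_partitions}
            for _ in range(num_partitions)
        ]

    def _partition(self, key):
        return self.partitions[hash(key) % len(self.partitions)]

    def get(self, key, open_fn):
        """Return a cached handle for key, opening one via open_fn on a miss."""
        p = self._partition(key)
        with p["lock"]:
            if key in p["entries"]:
                handle, _ = p["entries"].pop(key)   # cache hit; no open needed
            else:
                handle = open_fn(key)               # miss: open a new handle
            p["entries"][key] = (handle, True)      # in use: exempt from eviction
            self._evict(p)
            return handle

    def release(self, key):
        """Mark the handle idle; it stays cached and owned by the cache."""
        p = self._partition(key)
        with p["lock"]:
            handle, _ = p["entries"][key]
            p["entries"][key] = (handle, False)

    def _evict(self, p):
        # Evict least-recently-inserted idle handles, independently of every
        # other partition; handles that are in use are never evicted.
        idle = [k for k, (_, in_use) in p["entries"].items() if not in_use]
        while len(p["entries"]) > p["capacity"] and idle:
            p["entries"].pop(idle.pop(0))

# Small demo: a released handle is reused instead of reopened.
cache = PartitionedHandleCache(capacity=4, num_partitions=1)
opens = []
def opener(key):
    opens.append(key)
    return "handle:" + key

h = cache.get("f1", opener)
cache.release("f1")
h2 = cache.get("f1", opener)
assert h2 == h and opens == ["f1"]   # second get was a hit; opener ran once
```

Because a handle in use is exempt from eviction, a partition can temporarily exceed its capacity share when every cached handle is borrowed, matching the "will not evict a file handle that is in use" behavior above.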
Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 861 insertions(+), 418 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/19 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6478 to look at the new patch set (#18). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this change: 1. Enable file handle caching by default for local files 2. Share the file handle between scan ranges from the same file Local scan ranges no longer maintain their own Hdfs file handles. On each read, the I/O thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0, or the scan range is accessing data cached by Hdfs, or the scan range is remote, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted.
If a file handle in the cache becomes invalid, it may result in Read() calls failing. Consequently, if Read() encounters an error using a file handle from the cache, it will destroy the handle and retry once with a new file handle. Any subsequent error is unrelated to the file handle cache and will be returned. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs command line to manipulate the hdfs file (delete, move, etc.). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e., rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced.
Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 861 insertions(+), 418 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/18 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
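The commit message notes that once scan ranges share one handle, the handle's file offset is no longer private to any scan range, so every read must either be positional (hdfsPread) or seek explicitly before reading. A POSIX analogue of that choice, using Python's stdlib `os.pread`; `read_at` is a hypothetical helper, not part of the change:

```python
import os
import tempfile

def read_at(fd, offset, length, use_pread=True):
    """Read length bytes at an absolute offset from a shared descriptor."""
    if use_pread:
        # os.pread reads at an absolute offset without moving the shared
        # file offset -- the analogue of hdfsPread on a shared handle.
        return os.pread(fd, length, offset)
    # Fallback: seek explicitly before every read, since another scan range
    # may have moved the offset since our last read.
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, length)

# Demo: two "scan ranges" interleave positional reads on one descriptor.
with tempfile.NamedTemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()
    fd = f.fileno()
    assert read_at(fd, 7, 3) == b"789"                    # range B reads its bytes
    assert read_at(fd, 0, 3, use_pread=False) == b"012"   # range A seeks first
```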
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 17: (2 comments) http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 288: if (use_file_handle_cache && expected_local_) return Status::OK(); > you need to document that behavior in the scanrange declaration Updated ScanRange::Open documentation and the comment for exclusive_hdfs_fh_. http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 453: /// file handle from the file handle cache to hold for its exclusive use. > also document the behavior for non-local scan ranges Done -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 453: /// file handle from the file handle cache to hold for its exclusive use. also document the behavior for non-local scan ranges -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 17: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6478/17/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 288: if (use_file_handle_cache && expected_local_) return Status::OK(); you need to document that behavior in the scanrange declaration -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 16: This upload disables the caching for remote files. This also rebases to the latest (which involved minor merges on the tests due to the ADLS change). -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#17). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this change: 1. Enable file handle caching by default for local files 2. Share the file handle between scan ranges from the same file Local scan ranges no longer maintain their own Hdfs file handles. On each read, the I/O thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0, or the scan range is accessing data cached by Hdfs, or the scan range is remote, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. If a file handle in the cache becomes invalid, it may result in Read() calls failing.
Consequently, if Read() encounters an error using a file handle from the cache, it will destroy the handle and retry once with a new file handle. Any subsequent error is unrelated to the file handle cache and will be returned. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs command line to manipulate the hdfs file (delete, move, etc.). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e., rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced. Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 857 insertions(+), 418 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/17 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
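The retry policy the commit message describes — on a read error with a cached handle, destroy the handle, retry exactly once with a fresh one, and surface any second error unchanged — can be sketched like this. All names here (`read_with_retry`, `cache.get`, `cache.destroy`, `handle.read`) are hypothetical stand-ins, not Impala's actual interfaces.

```python
def read_with_retry(cache, key, offset, length, open_fn):
    """Read via a cached handle, retrying once with a fresh handle on error.

    Assumed interfaces: cache.get(key, open_fn) returns a handle,
    cache.destroy(key) drops a cached handle, handle.read(offset, length)
    raises IOError on failure.
    """
    handle = cache.get(key, open_fn)
    for attempt in range(2):             # original attempt + at most one retry
        try:
            return handle.read(offset, length)
        except IOError:
            if attempt == 1:
                raise                    # second failure: unrelated to caching
            # The cached handle may have gone stale (file deleted, moved, or
            # replaced): destroy it and open a fresh handle for the retry.
            cache.destroy(key)
            handle = cache.get(key, open_fn)

# Stub cache and handles to exercise the policy.
class StaleHandle:
    def read(self, offset, length):
        raise IOError("stale file handle")

class GoodHandle:
    def read(self, offset, length):
        return b"x" * length

class StubCache:
    def __init__(self, seeded):
        self.handles = dict(seeded)
        self.destroyed = 0
    def get(self, key, open_fn):
        if key not in self.handles:
            self.handles[key] = open_fn(key)
        return self.handles[key]
    def destroy(self, key):
        self.destroyed += 1
        self.handles.pop(key, None)

cache = StubCache({"f": StaleHandle()})   # cached handle is already stale
data = read_with_retry(cache, "f", 0, 4, lambda k: GoodHandle())
assert data == b"xxxx" and cache.destroyed == 1   # one destroy, one retry
```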
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 15: (5 comments) http://gerrit.cloudera.org:8080/#/c/6478/15/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 283: if (exclusive_hdfs_fh_ != nullptr) return Status::OK(); > when will this be true? The IO thread is calling ReadRange on each ScanRange. ReadRange calls Open whether or not that ScanRange was already open. This check short-circuits in the case that the ScanRange was already open. Line 351: VLOG_FILE << "Cache HDFS file handle file=" << file(); > this is not a meaningful log message, let's remove it Removed. Line 420: HdfsFileHandle* hdfs_file_from_cache = nullptr; > the name is somewhat misleading, because exclusive_hdfs_fh_ is still from t Changed to "borrowed_hdfs_fh" Line 484: // - or if the file handle is not from the cache > but exclusive_hdfs_fh_ is also from the cache, why wouldn't we retry in tha Changed to do retry for exclusive file handles. http://gerrit.cloudera.org:8080/#/c/6478/15/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1271: bool cache_hit; > rename to dummy or something like that Done -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#16). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this change: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the I/O thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0, or the scan range is accessing data cached by Hdfs, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. If a file handle in the cache becomes invalid, it may result in Read() calls failing.
Consequently, if Read() encounters an error using a file handle from the cache, it will destroy the handle and retry once with a new file handle. Any subsequent error is unrelated to the file handle cache and will be returned. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs command line to manipulate the hdfs file (delete, move, etc.). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e., rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced. Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 849 insertions(+), 418 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/16 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 15: (5 comments) http://gerrit.cloudera.org:8080/#/c/6478/15/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 283: if (exclusive_hdfs_fh_ != nullptr) return Status::OK(); when will this be true? Line 351: VLOG_FILE << "Cache HDFS file handle file=" << file(); this is not a meaningful log message, let's remove it Line 420: HdfsFileHandle* hdfs_file_from_cache = nullptr; the name is somewhat misleading, because exclusive_hdfs_fh_ is still from the cache Line 484: // - or if the file handle is not from the cache but exclusive_hdfs_fh_ is also from the cache, why wouldn't we retry in that case (and reset exclusive_hdfs_fh_)? http://gerrit.cloudera.org:8080/#/c/6478/15/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1271: bool cache_hit; rename to dummy or something like that -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#12). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this change: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the I/O thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0, or the scan range is accessing data cached by Hdfs, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. If a file handle in the cache becomes invalid, it may result in Read() calls failing.
Consequently, if Read() encounters an error using a file handle from the cache, it will destroy the handle and retry once with a new file handle. Any subsequent error is unrelated to the file handle cache and will be returned. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs command line to manipulate the hdfs file (delete, move, etc.). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e., rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced. Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 850 insertions(+), 418 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/12 -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/6478/11/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 445: // bad file handle; record the error, but allow for a retry to fix it. > wouldn't the same apply to the pread path? Added retry for hdfsSeek in Open() and hadoopReadZero in ReadFromCache. Since there are multiple locations for retry, I created a separate function that reopens the file handle. Changed the hdfsPread/hdfsRead paths to fill in status inside the loop. Moved this comment up top and made it apply to all the operations. Line 467: if (current_bytes_read != -1 || num_retries >= 1 || > num_retries == 1 and dcheck that it's not >1 somewhere? Done Line 468: (hdfs_file_from_cache == nullptr)) { > remove ()? Done Line 471: // If the file handle is from the cache, it might have become invalid. Destroy > rephrase, we know it's from the cache Done -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 11: (4 comments) http://gerrit.cloudera.org:8080/#/c/6478/11/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 445: // bad file handle; record the error, but allow for a retry to fix it. wouldn't the same apply to the pread path? also, wouldn't the same apply to l293? Line 467: if (current_bytes_read != -1 || num_retries >= 1 || num_retries == 1 and dcheck that it's not >1 somewhere? Line 468: (hdfs_file_from_cache == nullptr)) { remove ()? Line 471: // If the file handle is from the cache, it might have become invalid. Destroy rephrase, we know it's from the cache -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#11). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0 or the the scan range is accessing data cached by Hdfs, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. If a file handle in the cache becomes invalid, it may result in Read() calls failing. 
Consequently, if Read() encounters an error using a file handle from the cache, it will destroy the handle and retry once with a new file handle. Any subsequent error is unrelated to the file handle cache and will be returned. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e. rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced. Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 815 insertions(+), 410 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/11
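The retry-once behavior described in the commit message above ("destroy the handle and retry once with a new file handle") can be sketched as follows. This is a minimal Python illustration, not Impala's actual C++ implementation; `FileHandleCache`, `read_range`, and the `pread` callback are hypothetical stand-ins for the DiskIoMgr's handle cache and the hdfsPread call.

```python
# Minimal sketch (not Impala's code) of the retry-once rule: if a read
# using a cached file handle fails, destroy that handle and retry exactly
# once with a freshly opened one; a second failure propagates to the caller.

class FileHandleCache:
    """Toy cache keyed by (filename, mtime); open_fn stands in for the
    hdfsOpenFile call that loads the NameNode."""

    def __init__(self, open_fn):
        self._open_fn = open_fn
        self._cached = {}  # (filename, mtime) -> handle

    def get_handle(self, filename, mtime):
        key = (filename, mtime)
        if key in self._cached:
            return self._cached[key], True   # (handle, came_from_cache)
        handle = self._open_fn(filename)
        self._cached[key] = handle
        return handle, False

    def destroy_handle(self, filename, mtime):
        # Drop a handle that produced an error so it cannot be reused.
        self._cached.pop((filename, mtime), None)


def read_range(cache, filename, mtime, offset, length, pread):
    """Read via a cached handle; on error, evict the handle and retry once."""
    handle, from_cache = cache.get_handle(filename, mtime)
    try:
        return pread(handle, offset, length)
    except IOError:
        if not from_cache:
            raise  # the handle was just opened; the error is not staleness
        cache.destroy_handle(filename, mtime)           # evict stale handle
        handle, _ = cache.get_handle(filename, mtime)   # reopen once
        return pread(handle, offset, length)            # 2nd failure propagates
```

A stale cached handle thus triggers exactly one reopen; any error on the fresh handle surfaces to the caller, matching "Any subsequent error is unrelated to the file handle cache and will be returned."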
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 10: (3 comments) About to post another upload. http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 88: /// is set to true. Otherwise, the partition will try to construct a file handle and > true, but you're describing a scenario with concurrent queries and updates. That makes sense. We don't provide guarantees, and it seems like an uncommon case. I will include this on the JIRA I file for the time based eviction. http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 417: } else { > so it's just num_bytes_read then. when i see 'last' or 'prev' or something That makes sense. I think this came about trying to avoid a naming collision with the function's argument "bytes_read". Since we already have bytes_read and bytes_read_, I switched this to current_bytes_read rather than num_bytes_read. http://gerrit.cloudera.org:8080/#/c/6478/10/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 352: int success = hdfsGetHedgedReadMetrics(fs_, &hedged_metrics); > you're getting metrics for the whole fs_, but every scan range does that on This came in due to a rebase. IMPALA-4764 added this. I had to move it around because the code changed in Close(). It is setting the metric to this value rather than adding it.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 10: (4 comments) http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 88: /// is set to true. Otherwise, the partition will try to construct a file handle and > Added a TODO for this. I plan on dealing with this when doing eviction by t true, but you're describing a scenario with concurrent queries and updates. we don't make any guarantees in that case. it would be fine for step 6 to notice that the mtime changed when it tries to reopen that file, and return an error at that point. http://gerrit.cloudera.org:8080/#/c/6478/10/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: Line 130: // return code, and we close the file handle and remove it from the cache. thanks for the explanation http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 417: } else { > This is the number of bytes read by either hdfsPread or hdfsRead. In case o so it's just num_bytes_read then. when i see 'last' or 'prev' or something like that in a loop i usually expect this value gets set in iteration n and used in n+1. http://gerrit.cloudera.org:8080/#/c/6478/10/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 352: int success = hdfsGetHedgedReadMetrics(fs_, &hedged_metrics); you're getting metrics for the whole fs_, but every scan range does that on its own, and then updates a global counter? how is that not double-counting? 
Joe McDonnell has uploaded a new patch set (#10). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0 or the scan range is accessing data cached by Hdfs, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. 
It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e. rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced. Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 769 insertions(+), 410 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/10
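The partitioning scheme described in the commit message (a hash of the key selects one of several independent caches, capacity split evenly, LRU eviction that never evicts an in-use handle, and the mtime matched while scanning entries rather than hashed) can be sketched roughly as below. This is a hypothetical Python analogue, not the patch's C++ code; the class names, entry layout, and partition count are invented for illustration.

```python
# Hypothetical sketch of a partitioned file handle cache: N independent
# caches with independent locks, each holding an even share of the total
# capacity. Only handles that are not in use are eligible for LRU eviction.
import threading
from collections import OrderedDict
from itertools import count

NUM_PARTITIONS = 16  # illustrative; the patch's constant may differ

class _Partition:
    def __init__(self, capacity):
        self.lock = threading.Lock()
        self.capacity = capacity
        # entry id -> {"key": (filename, mtime), "handle": ..., "in_use": bool}
        # OrderedDict insertion order doubles as LRU order (oldest first).
        self.entries = OrderedDict()

class PartitionedHandleCache:
    def __init__(self, max_handles, open_fn):
        per_partition = max(1, max_handles // NUM_PARTITIONS)
        self._partitions = [_Partition(per_partition)
                            for _ in range(NUM_PARTITIONS)]
        self._open_fn = open_fn
        self._ids = count()

    def _partition(self, filename):
        # Hash only the filename; mtime is matched while scanning entries.
        return self._partitions[hash(filename) % NUM_PARTITIONS]

    def get_handle(self, filename, mtime):
        part = self._partition(filename)
        with part.lock:
            for eid, e in part.entries.items():
                if e["key"] == (filename, mtime) and not e["in_use"]:
                    e["in_use"] = True        # in-use handles are never evicted
                    return eid, e["handle"]
            handle = self._open_fn(filename)  # no unused match: open a new one
            eid = next(self._ids)
            part.entries[eid] = {"key": (filename, mtime),
                                 "handle": handle, "in_use": True}
            self._evict_locked(part)
            return eid, handle

    def release_handle(self, filename, eid):
        part = self._partition(filename)
        with part.lock:
            e = part.entries.pop(eid)
            e["in_use"] = False
            part.entries[eid] = e             # re-insert as most recently used
            self._evict_locked(part)

    def _evict_locked(self, part):
        # Evict least-recently-used entries, skipping handles still in use;
        # a partition may exceed capacity while everything in it is in use.
        while len(part.entries) > part.capacity:
            victim = next((eid for eid, e in part.entries.items()
                           if not e["in_use"]), None)
            if victim is None:
                break
            del part.entries[victim]

    def size(self):
        return sum(len(p.entries) for p in self._partitions)
```

Because the cache owns every handle and eviction skips in-use entries, a scan range that holds a handle for its whole lifetime (the Hdfs-cached case) is simply a handle that stays marked in use.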
Joe McDonnell has uploaded a new patch set (#9). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0 or the scan range is accessing data cached by Hdfs, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. Tests: query_test/test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. 
It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It checks the row counts before the modification and after "invalidate metadata", but it does not check the results in between. custom_cluster/test_hdfs_fd_caching.py starts up a cluster with a small file handle cache size. It verifies that a file handle can be reused (i.e. rerunning a query does not result in more file handles cached). It also verifies that the cache capacity is enforced. Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 757 insertions(+), 399 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/9
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 8: (15 comments) http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 88: /// with a call to ReleaseFileHandle to release exclusive control. > this should also evict handles if a more recent mtime for a file with exist Added a TODO for this. I plan on dealing with this when doing eviction by timeout. Here is a scenario that I might be concerned about for immediate eviction when seeing a higher mtime: 1. ScanRange #1 starts up 2. ScanRange #1 does a read, putting a file handle with mtime=5 in the cache. 3. The file is modified and now has an mtime=6. 4. ScanRange #2 starts up 5. ScanRange #2 does a read, putting a file handle with mtime=6 in the cache and removing the mtime=5 file handle. 6. ScanRange #1 wants to do another read and goes to get a file handle from the cache, but mtime=5 file handles aren't available. What happens now? It could return an error, which makes some sense. It could open a new file handle, but it would be looking at the mtime=6 file. The mtime=6 file might have different offsets, so it might be reading garbage. A lot of this is hypothetical, but it would be nice for a ScanRange to deal with a single version of a file. That is a property that the old code had. Line 98: static const uint32_t HASH_SEED = 0x87654321; > where from? No need for a non-zero hash seed for our purpose, so I have removed this and substituted zero in the code. http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: Line 65: // Keep going until find unused entry with the same mtime > fix grammar Done Line 113: // to its place in the map > add it? I started working on this. The templating starts to infect everything. 
I will keep looking into it for a followup, but I think I will leave this as a TODO for the first version. Line 117: if (elem->fh.get() == fh) { > single line Done Line 129: // File can be unbuffered > where does the corresponding buffering take place? Added information in the comment. The buffering here is readahead buffering that Hdfs does for a read. That is what we are unbuffering. http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 417: int last_read = -1; > what does this mean? does it count anything, and if so, what? This is the number of bytes read by either hdfsPread or hdfsRead. In case of an error, it returns -1. I changed the name to "last_bytes_read" Line 429: return Status(GetHdfsErrorMsg("Error seeking HDFS file: ", file_)); > also print position Done Line 435: return Status(GetHdfsErrorMsg("Error reading from HDFS file: ", file_)); > leave todo that you need to retry once to deal with stale cache entries Done Line 549: remote_bytes_ += stats->totalBytesRead - stats->totalLocalBytesRead; > why can't remote_bytes_ go into reader_? The only real obstacle is this piece of logging: https://github.com/apache/incubator-impala/blob/master/be/src/runtime/disk-io-mgr-scan-range.cc#L341 (It is in a different location after this code change.) The logging is printing the number of unexpected remote bytes for this scan range. If the remote bytes go the reader_, I don't have a way to separate it out. If we don't need the logging, then the state on the ScanRange can be a boolean, and we can send remote bytes directly to reader_. http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1300 > this comment disappeared. Added a comment in the new location. Hdfs can do some readahead buffering, so we want to reduce this buffering when putting a file handle in the cache. 
Line 278: cached_read_options_(nullptr), > move into .h file and remove here Done http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 498: int64_t remote_bytes_; > num_... Done http://gerrit.cloudera.org:8080/#/c/6478/8/tests/query_test/test_hdfs_fd_caching.py File tests/query_test/test_hdfs_fd_caching.py: Line 74: self.execute_query_expect_success(self.client, count_query) > this should give some expected result Done Line 99: self.execute_query_expect_success(self.client, count_query) > you should check for the expected result, depending on the modification typ Done
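The read-statistics handling behind the remote_bytes_ discussion above can be illustrated as follows. This is a hypothetical Python analogue of the libhdfs flow (fetch the per-handle counters after each read, account remote bytes as totalBytesRead minus totalLocalBytesRead, then clear the counters so the next scan range sharing the handle starts from zero, as hdfsFileGetReadStatistics and hdfsFileClearReadStatistics allow); the class and method names are invented for the sketch.

```python
# Hypothetical sketch: read statistics live on the shared file handle, so
# every read must fetch them, account the remote portion (total - local),
# and clear them before another scan range reuses the same handle.

class ReadStatistics:
    def __init__(self):
        self.total_bytes_read = 0
        self.total_local_bytes_read = 0

class SharedFileHandle:
    def __init__(self):
        self.stats = ReadStatistics()

    def pread(self, offset, length, local):
        # The simulated filesystem accumulates counters on the handle,
        # the way libhdfs does for an hdfsFile.
        self.stats.total_bytes_read += length
        if local:
            self.stats.total_local_bytes_read += length
        return b"\x00" * length

def get_and_clear_read_statistics(handle):
    # Analogue of hdfsFileGetReadStatistics + hdfsFileClearReadStatistics.
    stats = handle.stats
    handle.stats = ReadStatistics()
    return stats

class ScanRange:
    def __init__(self):
        self.remote_bytes = 0

    def read(self, handle, offset, length, local=True):
        data = handle.pread(offset, length, local)
        stats = get_and_clear_read_statistics(handle)
        # Remote bytes for this read = total minus the local portion.
        self.remote_bytes += (stats.total_bytes_read
                              - stats.total_local_bytes_read)
        return data
```

Clearing after every read is what keeps per-scan-range accounting correct even though several scan ranges now share one handle.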
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 8: (16 comments) http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 88: /// with a call to ReleaseFileHandle to release exclusive control. this should also evict handles if a more recent mtime for a file with existing cache entries is being requested (rather than waiting for them to age out). Line 98: static const uint32_t HASH_SEED = 0x87654321; where from? http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: Line 65: // Keep going until find unused entry with the same mtime fix grammar Line 113: // to its place in the map add it? Line 117: if (elem->fh.get() == fh) { single line Line 129: // File can be unbuffered where does the corresponding buffering take place? this looks asymmetrical. http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 417: int last_read = -1; what does this mean? does it count anything, and if so, what? Line 429: return Status(GetHdfsErrorMsg("Error seeking HDFS file: ", file_)); also print position Line 435: return Status(GetHdfsErrorMsg("Error reading from HDFS file: ", file_)); leave todo that you need to retry once to deal with stale cache entries Line 549: remote_bytes_ += stats->totalBytesRead - stats->totalLocalBytesRead; why can't remote_bytes_ go into reader_? http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1300 this comment disappeared. why is this being called? 
Line 278: cached_read_options_(nullptr), move into .h file and remove here http://gerrit.cloudera.org:8080/#/c/6478/6/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 494: /// times for a scan range, it is necessary to keep some state about whether this the description makes it sound as if you need a flag, not a count. http://gerrit.cloudera.org:8080/#/c/6478/8/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 498: int64_t remote_bytes_; num_... http://gerrit.cloudera.org:8080/#/c/6478/8/tests/query_test/test_hdfs_fd_caching.py File tests/query_test/test_hdfs_fd_caching.py: Line 74: self.execute_query_expect_success(self.client, count_query) this should give some expected result Line 99: self.execute_query_expect_success(self.client, count_query) you should check for the expected result, depending on the modification type
Joe McDonnell has uploaded a new patch set (#8). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0 or the scan range is accessing data cached by Hdfs, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. Tests: test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. 
It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It does not check the content of the rows returned. TODO: 1. Determine appropriate defaults. 2. Other tests a. File overwrite b. Any others? 3. For scan ranges that use Hdfs caching, should there be some sharing at the scanner level? Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/custom_cluster/test_hdfs_fd_caching.py M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 13 files changed, 718 insertions(+), 390 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/8
Joe McDonnell has uploaded a new patch set (#7). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0 or the scan range is accessing data cached by Hdfs, the scan range will get a file handle from the cache and hold it until the scan range is closed. This mimics the existing behavior, except the file handle stays in the cache and is owned by the cache. Since it is in use, it will not be evicted. Tests: test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. 
It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". It does not check the content of the rows returned. TODO: 1. Determine appropriate defaults. 2. Other tests a. File overwrite b. Any others? 3. For scan ranges that use Hdfs caching, should there be some sharing at the scanner level? Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/metadata/test_refresh_partition.py M tests/query_test/test_hdfs_fd_caching.py 12 files changed, 702 insertions(+), 374 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/7
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 92: tnode.hdfs_scan_node.parquet_mr_write_zone : ""), > you can also initialize that inline in the .h file Done, also did this for counters_running_ http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 105 > no trailing _ for structs Done http://gerrit.cloudera.org:8080/#/c/6478/6/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 116 > provide comments for variables, in particular invariants (e.g., it looks li Done http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: Line 33 > could these be null? hdfs_file_ can be null when we fail to open the file (hdfsOpenFile returns null). http://gerrit.cloudera.org:8080/#/c/6478/6/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: Line 61 > also: use something out of util/hash-util.h Done http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1309: string key = string(fname) + ":" + std::to_string(mtime); > probably a good idea, it's unlikely that we'll see a lot of overwritten fil Changed this to hash the filename and then match the mtime on iterating. 
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 6: (6 comments) quick comments http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 92: initial_ranges_issued_(false), you can also initialize that inline in the .h file http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 105: std::unique_ptr fh_; no trailing _ for structs http://gerrit.cloudera.org:8080/#/c/6478/6/be/src/runtime/disk-io-mgr-handle-cache.h File be/src/runtime/disk-io-mgr-handle-cache.h: Line 116: MapType cache_; provide comments for variables, in particular invariants (e.g., it looks like the lru list only contains unused handles) http://gerrit.cloudera.org:8080/#/c/6478/5/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: Line 33: if (hdfs_file_ != nullptr && fs_ != nullptr) { could these be null? http://gerrit.cloudera.org:8080/#/c/6478/6/be/src/runtime/disk-io-mgr-handle-cache.inline.h File be/src/runtime/disk-io-mgr-handle-cache.inline.h: Line 61: int index = hash()(key) % NUM_PARTITIONS; also: use something out of util/hash-util.h http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1309 > I use this as the key for the map, so switching the hash alone wouldn't eli probably a good idea, it's unlikely that we'll see a lot of overwritten files. 
-- Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
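Marcel's suggestion above — hash fname and mtime separately with something out of util/hash-util.h instead of materializing a "fname:mtime" string — can be sketched roughly as follows. This is a hedged stand-in, not Impala's code: `HashCombine`, `NUM_PARTITIONS`, and `PartitionOf` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string_view>

// Illustrative stand-in for util/hash-util.h: mix two hashes without
// building (and heap-allocating) a "fname:mtime" key string.
inline size_t HashCombine(size_t seed, size_t h) {
  // Boost-style mixer; the constants are illustrative, not Impala's.
  return seed ^ (h + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2));
}

constexpr int NUM_PARTITIONS = 16;

// Pick a cache partition directly from the raw filename bytes and the mtime.
inline int PartitionOf(const char* fname, int64_t mtime) {
  size_t h = std::hash<std::string_view>()(fname);  // hashes the bytes, no copy
  h = HashCombine(h, std::hash<int64_t>()(mtime));
  return static_cast<int>(h % NUM_PARTITIONS);
}
```

The point of the review comment is the allocation: building the key string does a heap allocation per lookup, while hashing the two fields directly touches only the existing buffers.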
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#6). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. Scan ranges that are accessing data cached by Hdfs will get a file handle from the cache, but the file handle will be kept on the scan range for the time that the scan range is in use. This prevents the cache from closing the file handle while the data buffer is in use. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0, file handle caching is off and the previous behavior applies. Tests: test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. 
It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". In the delete case, we expect zero rows. In the move case, we expect the same number of rows. TODO: 1. Determine appropriate defaults. 2. Other tests a. File overwrite b. Any others? 3. For scan ranges that use Hdfs caching, should there be some sharing at the scanner level? Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/query_test/test_hdfs_fd_caching.py 11 files changed, 679 insertions(+), 368 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/6 -- Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
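The commit message's point about offsets — once a handle is shared, its file position can be moved by any scan range, so every read must carry its own offset (hdfsPread) or seek first — can be illustrated with POSIX pread as a local stand-in for hdfsPread. This is a sketch, not Impala's DiskIoMgr code:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Like hdfsPread: the offset travels with the call, so two scan ranges can
// interleave reads on one shared descriptor without clobbering each other's
// file position.
ssize_t ReadRange(int fd, char* buf, size_t len, off_t offset) {
  return ::pread(fd, buf, len, offset);
}
```

With plain read(), the second range's read would start wherever the first range left the shared position; a positional read makes each call self-describing, which is exactly why shared handles force either hdfsPread or a seek-before-read.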
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 4: (4 comments) Fixed the test comments and fixed some errors in the code found by other existing tests (custom_cluster/test_hdfs_fd_caching.py tests the metrics). http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1331: ImpaladMetrics::IO_MGR_NUM_CACHED_FILE_HANDLES->Increment(-1L); > I'm splitting hairs about the exact meaning of cached. I found an existing test that was checking the metrics, and I've changed this code to match what that test was expecting. IO_MGR_NUM_FILE_HANDLES_OUTSTANDING is the number of file handles in use. IO_MGR_NUM_CACHED_FILE_HANDLES now is the count of all file handles. It would also make sense for this to exclude the outstanding file handles, but right now, this counts all file handles. http://gerrit.cloudera.org:8080/#/c/6478/4/tests/query_test/test_hdfs_fd_caching.py File tests/query_test/test_hdfs_fd_caching.py: Line 76: # File deleted: either # rows is the same or # rows = 0 > we don't make any guarantees for this, so might as well not check the resul Changed to not check the results. Line 98: new_table_location = "{0}/test-warehouse/{1}".format(FILESYSTEM_PREFIX, unique_database) > long lines Done Line 105: # Delete the whole directory (including data file) > this function is pretty much identical to the one above (and below), except Done -- Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#5). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. Scan ranges that are accessing data cached by Hdfs will get a file handle from the cache, but the file handle will be kept on the scan range for the time that the scan range is in use. This prevents the cache from closing the file handle while the data buffer is in use. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0, file handle caching is off and the previous behavior applies. Tests: test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. 
It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". In the delete case, we expect zero rows. In the move case, we expect the same number of rows. TODO: 1. Determine appropriate defaults. 2. Other tests a. File overwrite b. Any others? 3. For scan ranges that use Hdfs caching, should there be some sharing at the scanner level? Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h A be/src/runtime/disk-io-mgr-handle-cache.h A be/src/runtime/disk-io-mgr-handle-cache.inline.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/query_test/test_hdfs_fd_caching.py 11 files changed, 732 insertions(+), 368 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/5 -- Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 4: (23 comments) Still fixing the test, but here is an update on the rest of it. http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 894: runtime_state_->io_mgr()->cached_file_handles_hit_count(reader_context_)); > pretty awkward. why not call the reader directly (and maybe introduce a new DiskIoRequestContext is an opaque type. disk-io-mgr.h declares it, but the body is defined in disk-io-mgr-internal.h, and we don't include that anywhere outside of disk-io-mgr*. It is annoying. http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 441: RuntimeProfile::Counter* cached_file_handles_hit_count_; > initialize to nullptr here, that way you don't need to do it in the c'tor ( Done http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 25: #include > why? boost/unordered_map.hpp is needed, but it was being implicitly included by including lru-cache.h in disk-io-mgr.h. I removed lru-cache.h from disk-io-mgr.h, which broke this. http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 342: io_mgr_->CacheOrCloseFileHandle(file(), exclusive_hdfs_fh_); > yes, the name really needs to distinguish between cache operations and dire Done Line 548: remote_bytes_ += stats->totalBytesRead - stats->totalLocalBytesRead; > why not stick that into reader_ as well? 
Added a comment in disk-io-mgr.h http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1309: string key = string(fname) + ":" + std::to_string(mtime); > you can get rid of dealing with strings here (and doing a bunch of allocati I use this as the key for the map, so switching the hash alone wouldn't eliminate this. However, I could hash on the filename and use the filename as the key, then as it is iterating through file handles for that filename, find one that is free with a matching mtime. Line 1315: // Get lock > that's obvious, no need to paraphrase a single statement (unless there's so Done Line 1318: // Look up in the multimap (find range of values for key) > same here Done Line 1326: if (!elem->reference_count_) { > use explicit numeric comparisons, this isn't a bool (although maybe it shou Changed to bool. Line 1331: ImpaladMetrics::IO_MGR_NUM_CACHED_FILE_HANDLES->Increment(-1L); > hm, i'm not sure we should do that. it's still a cached file handle. I'm splitting hairs about the exact meaning of cached. Every file handle is always in the cache, so IO_MGR_NUM_CACHED_FILE_HANDLES could be the same as IO_MGR_NUM_FILE_HANDLES_OUTSTANDING. That would be understandable. The meaning I'm using here is that this is a file handle that is in the cache and not in use. I'm not sure if that is useful. Line 1395: if (p.size_ > p.capacity_) { > single line Done http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 231: /// Threads checkout a file handle for exclusive access and return it when finished. > "check out" (checkout is a noun) Done Line 237: template > please pull this along with hdfsfilehandle out into a separate file, this f Moved this out to disk-io-mgr-handle-cache.h/disk-io-mgr-handle-cache.inline.h. disk-io-mgr-handle-cache.h is included in disk-io-mgr.h. 
disk-io-mgr-handle-cache.inline.h is only used in disk-io-mgr.cc Line 242: /// first k partitions will each have one additional file handle. > let's make the size uniform across all partitions (e.g., by rounding up), t Done Line 273:: fh_(fh), reference_count_(0) {} > single line Done Line 276: int reference_count_; > why refcount instead of just 'bool in_use' or something like that? Switched to in_use. Line 282: /// the partitions are aligned to cache line boundaries (64 bytes). > hm, isn't there some predefined constant somewhere for the cache line size? std::hardware_destructive_interference_size is in C++17. There is a way to query the system to get it, but I haven't seen a compile time definition. Line 549: Status Open(bool use_file_handle_cache) WARN_UNUSED_RESULT; > does it ever make sense to call open on the same scan range with different This happens on the HDFS caching codepath. If we think a scan range is cached, we call Open(false), but then if caching falls through, we call Close and then reuse the scan range on the normal codepath, which can call it with either true or false. Clarified the comment. Line 573: void* meta_data_; > initialize nullptr here (for all poi
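The remote_bytes_ exchange above hinges on the pattern the commit message describes: libhdfs keeps read statistics per file handle, so once handles are shared, each read must harvest the delta and clear the counters before the next user of the handle reads. A minimal sketch of that harvest-and-reset pattern, with a plain struct standing in for libhdfs's statistics (field names chosen to mirror the totalBytesRead/totalLocalBytesRead fields quoted above; this is not Impala's code):

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the per-handle counters libhdfs maintains.
struct ReadStats {
  int64_t total_bytes_read = 0;
  int64_t total_local_bytes_read = 0;
};

// After each read: harvest the remote-byte delta, then clear the handle's
// counters so the next scan range using the shared handle starts from zero.
int64_t HarvestRemoteBytes(ReadStats* handle_stats) {
  int64_t remote =
      handle_stats->total_bytes_read - handle_stats->total_local_bytes_read;
  *handle_stats = ReadStats{};  // the "cleared after each read" step
  return remote;
}
```

If the clear step were skipped, a second scan range on the same handle would double-count the first range's bytes, which is why the commit message couples retrieval and clearing.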
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 1309: string key = string(fname) + ":" + std::to_string(mtime); you can get rid of dealing with strings here (and doing a bunch of allocations) entirely by using something out of util/hash-util.h to compute the hash on fname and mtime separately (no need to convert it to a string). Line 1315: // Get lock that's obvious, no need to paraphrase a single statement (unless there's something subtle going on). Line 1318: // Look up in the multimap (find range of values for key) same here Line 1326: if (!elem->reference_count_) { use explicit numeric comparisons, this isn't a bool (although maybe it should be). Line 1331: ImpaladMetrics::IO_MGR_NUM_CACHED_FILE_HANDLES->Increment(-1L); hm, i'm not sure we should do that. it's still a cached file handle. Line 1395: if (p.size_ > p.capacity_) { single line http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 273:: fh_(fh), reference_count_(0) {} single line Line 276: int reference_count_; why refcount instead of just 'bool in_use' or something like that? -- Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
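Marcel's last question above (refcount vs. a plain bool) is resolved later in the thread in favor of an in_use flag, with the invariant that eviction never removes a checked-out handle. A toy single-partition version of that invariant (ints standing in for file handles; all names hypothetical):

```cpp
#include <cassert>
#include <list>

struct Entry {
  int handle;
  bool in_use = false;  // set while a thread has the handle checked out
};

// Evict the oldest entry that is not checked out. Returns true if something
// was evicted; false means every entry is in use and the partition stays
// over capacity until a handle is returned.
bool EvictOne(std::list<Entry>* partition) {
  for (auto it = partition->begin(); it != partition->end(); ++it) {
    if (!it->in_use) {
      partition->erase(it);
      return true;
    }
  }
  return false;
}
```

A bool suffices here because checkout is exclusive: a handle is either held by exactly one thread or sitting idle, so there is never a count above one to track.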
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 4: (18 comments) looking good. http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 894: runtime_state_->io_mgr()->cached_file_handles_hit_count(reader_context_)); pretty awkward. why not call the reader directly (and maybe introduce a new getter)? http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 441: RuntimeProfile::Counter* cached_file_handles_hit_count_; initialize to nullptr here, that way you don't need to do it in the c'tor (which you have to edit if the member order changes, or which you might have to duplicate if you multiple c'tors). do that for all pointer members. http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 25: #include why? http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 342: io_mgr_->CacheOrCloseFileHandle(file(), exclusive_hdfs_fh_); yes, the name really needs to distinguish between cache operations and direct operations on handles. Line 548: remote_bytes_ += stats->totalBytesRead - stats->totalLocalBytesRead; why not stick that into reader_ as well? http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 231: /// Threads checkout a file handle for exclusive access and return it when finished. "check out" (checkout is a noun) Line 237: template please pull this along with hdfsfilehandle out into a separate file, this file is large enough as it is. Line 242: /// first k partitions will each have one additional file handle. let's make the size uniform across all partitions (e.g., by rounding up), that should simplify the code. 
Line 282: /// the partitions are aligned to cache line boundaries (64 bytes). hm, isn't there some predefined constant somewhere for the cache line size? Line 549: Status Open(bool use_file_handle_cache) WARN_UNUSED_RESULT; does it ever make sense to call open on the same scan range with different values of the bool? if not, make a parameter of the c'tor? point out clearly in the comment that if the param is true, open does nothing. Line 573: void* meta_data_; initialize nullptr here (for all pointer members) Line 587: /// Total number of bytes read remotely point out why we need to maintain this particular metric instead of in requestcontext. Line 601: /// handle, and no sharing takes place. also mention when this is populated and how long it's valid. Line 856: HdfsFileHandle* OpenHdfsFile(const hdfsFS& fs, if this always returns a cached handle, would be good to reflect that in the name. Line 860: void CacheOrCloseFileHandle(const char* fname, HdfsFileHandle* fid); rename http://gerrit.cloudera.org:8080/#/c/6478/4/tests/query_test/test_hdfs_fd_caching.py File tests/query_test/test_hdfs_fd_caching.py: Line 76: # File deleted: either # rows is the same or # rows = 0 we don't make any guarantees for this, so might as well not check the result. we just don't want to crash. Line 98: new_table_location = "{0}/test-warehouse/{1}".format(FILESYSTEM_PREFIX, unique_database) long lines Line 105: # Delete the whole directory (including data file) this function is pretty much identical to the one above (and below), except for the specific modification action. consolidate into a single function and drive that with an enum parameter. 
-- Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
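On the cache-line question above: Joe notes std::hardware_destructive_interference_size is C++17-only, so the patch hard-codes 64 bytes. A sketch of what aligning the partitions buys — each partition's lock lands on its own cache line, so threads hammering different partitions don't false-share (names here are illustrative, not the patch's):

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

constexpr size_t CACHE_LINE_SIZE = 64;  // hard-coded, as discussed; C++17
    // later offers std::hardware_destructive_interference_size for this

// alignas pads each partition out to a cache-line boundary, so two
// partitions' locks can never share a line.
struct alignas(CACHE_LINE_SIZE) CachePartition {
  std::mutex lock;
  size_t size = 0;
  size_t capacity = 0;
};

CachePartition partitions[16];
```

Without the alignment, adjacent partitions' mutexes could sit on one cache line, and uncontended locks in different partitions would still ping-pong that line between cores.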
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#4). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. Scan ranges that are accessing data cached by Hdfs will get a file handle from the cache, but the file handle will be kept on the scan range for the time that the scan range is in use. This prevents the cache from closing the file handle while the data buffer is in use. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. The file handle cache maintains ownership of the file handles at all times, but it will not evict a file handle that is in use. If max_cached_file_handles is set to 0, file handle caching is off and the previous behavior applies. Tests: test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. 
It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash. Then, it runs "invalidate metadata". In the delete case, we expect zero rows. In the move case, we expect the same number of rows. TODO: 1. Determine appropriate defaults. 2. Other tests a. File overwrite b. Any others? 3. For scan ranges that use Hdfs caching, should there be some sharing at the scanner level? Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M tests/query_test/test_hdfs_fd_caching.py 9 files changed, 639 insertions(+), 311 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/4 -- Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 3: (9 comments) i don't understand the need for unique_ptrs for the file handles, please see comments inline. http://gerrit.cloudera.org:8080/#/c/6478/3/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 286: if (fs_ != NULL) { replace w/ nullptr throughout (and in the other files you touch). http://gerrit.cloudera.org:8080/#/c/6478/3/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 205: /// additionally encapsulates the last modified time of the associated file when it was fix this sentence. Line 207: class HdfsCachedFileHandle { we use this class exclusively to access hdfs data, even if the file handle isn't cached. remove 'cached' from the name, it's distracting. Line 476: /// Opens the file for this range. This function only modifies state in this range. explain param Line 489: void GetHdfsStatistics(hdfsFile fh); is this okay to pass by value? what diskiorequestcontext is this referring to? the comment should make sense on its own. Line 521: /// File handle for local fs (FILE*) remove (file*) Line 526: std::unique_ptr hdfs_file_; requestrange already contains an fs_. what of the functionality provided by hdfscachedfilehandle do you actually need here? this class also contains an mtime_, so lots of confusing and error-prone duplication. this encapsulates a file handle out of the cache. but if the cache is used to manage all file handles, why do you need to store this with a unique_ptr (instead of a raw ptr, and then simply hand that back to the cache when you're done)? this looks like it contradicts the responsibility of the cache for managing file handles. Line 774: /// and return an instance of HdfsCachedFileHandle. In case of an error returns NULL. 
explain all parameters Line 784: std::unique_ptr& fid); no ref parameters -- Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 3: (11 comments) initial thoughts on the partitioned map. http://gerrit.cloudera.org:8080/#/c/6478/3/be/src/util/lru-cache.h File be/src/util/lru-cache.h: Line 67: typedef void (*DeleterFn)(Value&); why now a ref? Line 82: /// Variant of Put with move semantics. is this needed? Line 91: bool Pop(const Key& k, Value& out); we typically don't use refs for this purpose. why the switch? Line 102: static void DummyDeleter(Value& v) {} make private Line 115: typedef std::pair ValueType; value of what? Line 139: /// Wrapper around FifoMultimap that hashes the keys into a fixed number of partitions which hash fn? Line 146: template you can make the number of partitions another template parameter, that way you can use a std::array for cache_partitions_. i also don't see a real need for the unique_ptr in there. fewer indirection should be faster. it would also be good to put the individual Fifomultimap objects on cache line boundaries. alignas might help here. Line 151: /// See FifoMultimap this doesn't explain how you deal with the remainder. you could also simplify this and stipulate that capacity is per-partition. http://gerrit.cloudera.org:8080/#/c/6478/3/be/src/util/lru-cache.inline.h File be/src/util/lru-cache.inline.h: Line 36: const typename MapType::value_type& val = i think if you use 'using' for MapType instead of typedef you can get rid of the typename. Line 37: std::make_pair(k, lru_list_.emplace(lru_list_.end(), k, std::move(v))); is the explicit move needed for the rvalue v? 
Line 51: void FifoMultimap::Put(const Key& k, const Value& v) { there should be a way not to require basically a verbatim copy of this function -- Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
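Marcel's last comment — the copy and move Put overloads shouldn't need verbatim duplicate bodies — is usually solved with a single private forwarding implementation. A minimal sketch (hypothetical FifoCache, not the review's actual FifoMultimap):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// One PutImpl serves both public overloads: the forwarding reference V&&
// collapses to a const lvalue ref (copy) or an rvalue ref (move), so the
// copy and move variants share a single body.
template <typename Key, typename Value>
class FifoCache {
 public:
  void Put(const Key& k, const Value& v) { PutImpl(k, v); }
  void Put(const Key& k, Value&& v) { PutImpl(k, std::move(v)); }
  size_t size() const { return list_.size(); }
  const Value& back() const { return list_.back().second; }

 private:
  template <typename V>
  void PutImpl(const Key& k, V&& v) {
    list_.emplace_back(k, std::forward<V>(v));
  }
  std::list<std::pair<Key, Value>> list_;
};
```

std::forward preserves the caller's value category, so the copy overload copies and the move overload moves, with no duplicated insertion logic to keep in sync.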
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#3). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. Scan ranges that are accessing data cached by Hdfs will get a file handle from the cache, but the file handle will be kept on the scan range for the time that the scan range is in use. This prevents the cache from closing the file handle while the data buffer is in use. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. If max_cached_file_handles is set to 0, file handle caching is off and the previous behavior applies. Tests: test_hdfs_fd_caching.py copies the files from an existing table into a new directory and uses that to create an external table. It queries the external table, then uses the hdfs commandline to manipulate the hdfs file (delete, move, etc). It queries again to make sure we don't crash.
Then, it runs "invalidate metadata". In the delete case, we expect zero rows. In the move case, we expect the same number of rows. TODO: 1. Determine appropriate defaults. 2. Other tests a. File overwrite b. Any others? 3. For scan ranges that use Hdfs caching, should there be some sharing at the scanner level? Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M be/src/util/lru-cache-test.cc M be/src/util/lru-cache.h M be/src/util/lru-cache.inline.h M tests/query_test/test_hdfs_fd_caching.py 12 files changed, 498 insertions(+), 166 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/3 -- Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Joe McDonnell has uploaded a new patch set (#2). Change subject: IMPALA-4623: Enable file handle cache .. IMPALA-4623: Enable file handle cache Currently, every scan range maintains a file handle, even when multiple scan ranges are accessing the same file. Opening the file handles causes load on the NameNode, which can lead to scaling issues. There are two parts to this transaction: 1. Enable file handle caching by default 2. Share the file handle between scan ranges from the same file The scan range no longer maintains its own Hdfs file handle. On each read, the io thread will get the Hdfs file handle from the cache (opening it if necessary) and use that for the read. This allows multiple scan ranges on the same file to use the same file handle. Since the file offsets are no longer consistent for an individual scan range, all Hdfs reads need to either use hdfsPread or do a seek before reading. Additionally, since Hdfs read statistics are maintained on the file handle, the read statistics must be retrieved and cleared after each read. Scan ranges that are accessing data cached by Hdfs will get a file handle from the cache, but the file handle will be kept on the scan range for the time that the scan range is in use. This prevents the cache from closing the file handle while the data buffer is in use. To manage contention, the file handle cache is now partitioned by a hash of the key into independent caches with independent locks. The allowed capacity of the file handle cache is split evenly among the partitions. File handles are evicted independently for each partition. If max_cached_file_handles is set to 0, file handle caching is off and the previous behavior applies. TODO: 1. Determine appropriate defaults. 2. Write tests a. Delete test b. Block rebalance test? 3. For scan ranges that use Hdfs caching, should there be some sharing at the scanner level?
Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d --- M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/runtime/buffered-block-mgr.h M be/src/runtime/disk-io-mgr-internal.h M be/src/runtime/disk-io-mgr-reader-context.cc M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/disk-io-mgr.h M be/src/util/lru-cache-test.cc M be/src/util/lru-cache.h M be/src/util/lru-cache.inline.h 11 files changed, 387 insertions(+), 124 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6478/2 -- Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong