[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Laszlo Gaal has abandoned this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > How can we separate them though? The new API returns a UUID. oops, i didn't see that. in that case, shipping those around would cause an immediate size regression. so we need to translate them into ordinals. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Alex Behm has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > in this patch, let's focus on using the new api, rather than shrinking the How can we separate them though? The new API returns a UUID. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > Agree, that's even better. We need to group the UUIDs by host and then assi in this patch, let's focus on using the new api, rather than shrinking the disk ids down to their minimum size. that's probably best handled in a follow-on. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Mostafa Mokhtar has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 437: BlockLocation[] locations = fs.getFileBlockLocations(file, 0, file.getLen()); > still using the old API? I don't think this change is complete. This call stack need to be moved to the new API as well. com.cloudera.impala.catalog.TableLoader.load(Db,String) com.cloudera.impala.catalog.HdfsTable.load(boolean,IMetaStoreClient,Table) com.cloudera.impala.catalog.HdfsTable.load(boolean,IMetaStoreClient,Table,boolean,boolean,Set) com.cloudera.impala.catalog.HdfsTable.loadAllPartitions(List,Table) com.cloudera.impala.catalog.HdfsTable.createPartition(StorageDescriptor,Partition,Map) com.cloudera.impala.catalog.HdfsTable.updatePartitionFds(Path,boolean,HdfsFileFormat,Map) com.cloudera.impala.catalog.HdfsTable.loadBlockMetadata(FileSystem,FileStatus,HdfsPartition$FileDescriptor,HdfsFileFormat,Map) org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(FileStatus,long,long) org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(Path,long,long) org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystem,Path) org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(Path) org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(Path) org.apache.hadoop.hdfs.DFSClient.getBlockLocations(String,long,long) org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(String,long,long) org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(ClientProtocol,String,long,long) -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Alex Behm has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > why would we ship the uuid around, or a 16-byte int for that matter? why no Agree, that's even better. We need to group the UUIDs by host and then assign the disk ids. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > Agree and that's basically what I think I recommended in my comments here. why would we ship the uuid around, or a 16-byte int for that matter? why not stick to 1-byte ints right after we get the disk uuids and convert them? -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Alex Behm has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > there is no need to do what you just described. Agree and that's basically what I think I recommended in my comments here. We can shop the UUID around as a TUniqueId though (and not as a string). -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > the UUIDs are generated randomly and we hash those, so it's possible we get there is no need to do what you just described. what we should do: - map uuids to per-node ordinals. in your example, that machine would see disk ordinals 0-7 - use the disk ordinal to directly address the corresponding disk queue with this approach, only servers with >=127 disks would see degradation. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Alex Behm has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > it's very unusual for a server to have more than 128 disks (i don't think w the UUIDs are generated randomly and we hash those, so it's possible we get a series of disk IDs which when modded all land on the same disk queue for example, we could get hashes 8, 16, 24, 32, etc. on a machine with 8 disks Of course, in aggregate these collisions are unlikely, but like you said a single machine has few disks, so I think this effect can easily happen on a single machine. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > In the BE we mod the disk id to determine which disk queue to put reads on, it's very unusual for a server to have more than 128 disks (i don't think we've ever seen one), so from that perspective i'm not worried about using fewer bits to represent the disk "ordinal". keep in mind that we really don't care about distinguishing disks globally. using a designated "out of range" value (such as 127) if a node should really exceed 127 disks, and then placing scan ranges with that value on random disk queues should degrade performance gracefully in cases where the number of disks is just a bit above that threshold. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Alex Behm has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > could you go into detail about the performance problems? In the BE we mod the disk id to determine which disk queue to put reads on, so I'm worried about pathological cases with "collisions" where many ranges are assigned to only one disk thread. That will be quite hard to debug. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 379: private static int getDiskId(String storageId) { > As much as I'd like to use this simpler alternative, I think it could lead could you go into detail about the performance problems? the goal is to switch to an *8-bit* node-relative disk id in order to reduce the amount of storage space required for recording this info. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (1 comment) An additional high-level comment http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 526: if (SUPPORTS_STORAGE_ID) { > This seems fine as a first cut, but we should consider taking this a step f On Hadoop 3 the old API is not available at all, so the old API calls won't even compile. Can we factor out the logic into two different implementation classes with a common interface for the two different APIs? That will be necessary anyway and I think will make the logic easier to understand. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Alex Behm has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: (11 comments) Some high-level comments before digging in deeper. http://gerrit.cloudera.org:8080/#/c/4914/1//COMMIT_MSG Commit Message: Line 9: This change enables Impala to use BlockLocation#getStorageIds, nit: we usually use () after names to denote they are e function/method i.e. getStorageIds -> getStorageIds() getFileBlockLocations -> getFileBlockLocations() Line 12: getFileBlockLocations call which will be deprecated in Hadoop-3. extra space after "be" http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 287: clazz = null; remove Line 290: if ( clazz != null ) { style: extra spaces, we do if (clazz != null) { ... } Line 295: m = null; remove Line 311: LOG.info("SUPPORTS_STORAGE_ID is: " + (SUPPORTS_STORAGE_ID ? "true" : "false")); no need for this ternary logic, just: LOG.info("SUPPORTS_STORAGE_ID is: " + SUPPORTS_STORAGE_ID); Line 352: private static int getSequentialDiskId(String storageId) While this solution is minimally invasive, I'm not a big fan of the extra locking and memory consumption of this map. An overall better design is to convert the UUID to a 128-bit int and use that everywhere. Also see my comments below. Let me know what you think. Line 353: { move to previous line Line 379: private static int getDiskId(String storageId) { As much as I'd like to use this simpler alternative, I think it could lead to serious performance problems that could be very hard to debug. I don't think we can use this. Perhaps we should consider switching to a 128-bit integer to represent the UUID everywhere (including all thrift structures and the BE). The BE only needs a number that it can mod against the number of local data dirs (see DiskIoMgr::AssignQueue()). Using a 128-bit int would also work with the old volume id API. We have an existing TUniqueId which may be suitable. Line 437: BlockLocation[] locations = fs.getFileBlockLocations(file, 0, file.getLen()); still using the old API? My understanding is that switching to a new API for these calls here will make the extra loadDiskIds() unnecessary. Line 526: if (SUPPORTS_STORAGE_ID) { This seems fine as a first cut, but we should consider taking this a step further in a follow-on change. This is still grabbing the storage ids partition-by-partition, but ideally we could get all the blocks and their locations for an entire table. We'd need to handle partitions with non-standard locations specially. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Laszlo Gaal has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. Patch Set 1: First pass on the review to make sure it's going in the right direction. I also wonder about the best way to test it in the current dev environment; the minicluster seems to be too constrained in instances and (simulated) spindles. The old code path is still preserved in case the new HDFS method is not present on the platform. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs
Laszlo Gaal has uploaded a new change for review. http://gerrit.cloudera.org:8080/4914 Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs .. IMPALA-4172: Switch to BlockLocation methods for disk IDs This change enables Impala to use BlockLocation#getStorageIds, the new call in Hadoop 3 for getting HDFS data block location info from the NameNode. This call supercedes the old getFileBlockLocations call which will be deprecated in Hadoop-3. The presence of BlockLocation#getStorageIds is determined dynamically. This is necessary because the implementation was backported to various Hadoop-2 releases after it appeared in Hadoop-3, so simple version checking is not suitable to decide whether the call can be used. In cases where both getFileBlockLocations and BlockLocation#getStorageIds are supported, the code prefers the latter for performance reasons: getStorageIds does not have to query the DataNodes for this information. BlockLocation#getStorageIds returns disk IDs as UUID-based strings, which would be too expensive to ship around and store. This patch maps these strings to small integers, preserving compatibility with the existing representation of diskIDs. Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java 1 file changed, 155 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/4914/1 -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal