[Impala-ASF-CR] [PREVIEW] IMPALA-5058: Improve concurrency of DDL/DML operations
Dimitris Tsirogiannis has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8545 Change subject: [PREVIEW] IMPALA-5058: Improve concurrency of DDL/DML operations .. [PREVIEW] IMPALA-5058: Improve concurrency of DDL/DML operations Problem: A long running table metadata operation (e.g. refresh) could prevent any other metadata operation from making progress if it coincided with the gather catalog updates operation. The problem was due to the conservative locking scheme used when catalog updates were collected. In particular, in order to collect a consistent snapshot of metadata changes, the global catalog lock was held for the entire duration of that operation. Solution: To improve the concurrency of catalog operations the following changes are performed: * A range of catalog versions determines the catalog changes to be included in a catalog update. Any catalog changes that do not fall in the specified range are ignored (to be processed in subsequent catalog updates). * The catalog allows metadata operations to make progress while collecting catalog updates. * To prevent starvation of catalog updates (i.e. frequently updated catalog objects skipping catalog updates indefinitely), we keep track of the number of times a catalog object has skipped an update and if that number exceeds a threshold it is included in the next catalog update even if its version is not in the specified catalog update version range. Hence, the same catalog object may be sent in two consecutive catalog updates. Change-Id: I3032437f83d39bcc8cff14d897c7c106a4ab62d3 --- M be/src/exec/catalog-op-executor.cc M be/src/exec/catalog-op-executor.h M be/src/service/client-request-state.cc M be/src/service/frontend.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M common/thrift/CatalogService.thrift M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionManager.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSource.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/HdfsCachePool.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/Role.java M fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/util/SentryProxy.java M fe/src/test/java/org/apache/impala/testutil/CatalogServiceTestCatalog.java M fe/src/test/java/org/apache/impala/testutil/ImpaladTestCatalog.java 30 files changed, 1,153 insertions(+), 458 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/8545/1 -- To view, visit http://gerrit.cloudera.org:8080/8545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3032437f83d39bcc8cff14d897c7c106a4ab62d3 Gerrit-Change-Number: 8545 Gerrit-PatchSet: 1 Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8529 ) Change subject: [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI. .. Patch Set 1: Screenshots of the catalog web UI to illustrate the changes: https://cloudera.box.com/s/tgvt0yc3mfhvc25i2tm30m3h49769jfp https://cloudera.box.com/s/hc0ru8q2ajfne6gzy5x4s5ri5p6kgawz -- To view, visit http://gerrit.cloudera.org:8080/8529 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e Gerrit-Change-Number: 8529 Gerrit-PatchSet: 1 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Mon, 13 Nov 2017 19:40:35 + Gerrit-HasComments: No
[Impala-ASF-CR] [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI.
Dimitris Tsirogiannis has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8529 Change subject: [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI. .. [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI. The following changes are included in this commit: * Adds a lightweight framework for registering metrics in the JVM. * Adds table-level metrics and enables these metrics to be exposed through the catalog web UI. * Adds a CatalogUsageMonitor class that monitors and reports the catalog usage in terms of the tables with the highest memory requirements and the tables with the highest number of metadata operations. The catalog usage information is exposed in the /catalog page of the catalog web UI. Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M common/thrift/Frontend.thrift M common/thrift/JniCatalog.thrift M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java A fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java A fe/src/main/java/org/apache/impala/common/Metrics.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java A fe/src/main/java/org/apache/impala/util/TopNCache.java M www/catalog.tmpl A www/table_metrics.tmpl 21 files changed, 953 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/8529/1 -- To view, visit http://gerrit.cloudera.org:8080/8529 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e Gerrit-Change-Number: 8529 Gerrit-PatchSet: 1 Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 22: Code-Review+2 (7 comments) Minor formatting nits. Tests seem reasonable to me. http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impala-server.cc@1930 PS22, Line 1930: int32_t hs2_port) { fix indentation http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impala-server.cc@1948 PS22, Line 1948: ; remove http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impala-server.cc@1980 PS22, Line 1980: boost::shared_ptr beeswax_processor(new ImpalaServiceProcessor(handler)); nit: long line http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impalad-main.cc File be/src/service/impalad-main.cc: http://gerrit.cloudera.org:8080/#/c/8202/22/be/src/service/impalad-main.cc@88 PS22, Line 88: FLAGS_hs2_port nit: indentation http://gerrit.cloudera.org:8080/#/c/8202/22/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/8202/22/bin/start-impala-cluster.py@316 PS22, Line 316: """Waits for a catalog copy to be received by the impalad. When its received, additionally nit: long line http://gerrit.cloudera.org:8080/#/c/8202/22/tests/common/custom_cluster_test_suite.py File tests/common/custom_cluster_test_suite.py: http://gerrit.cloudera.org:8080/#/c/8202/22/tests/common/custom_cluster_test_suite.py@125 PS22, Line 125: cluster_size=CLUSTER_SIZE, num_coordinators=NUM_COORDINATORS, : use_exclusive_coordinators=False, log_level=1, : expected_num_executors=CLUSTER_SIZE): indentation http://gerrit.cloudera.org:8080/#/c/8202/22/tests/custom_cluster/test_catalog_wait.py File tests/custom_cluster/test_catalog_wait.py: http://gerrit.cloudera.org:8080/#/c/8202/22/tests/custom_cluster/test_catalog_wait.py@81 PS22, Line 81: # TODO: add more tests : # - impalads that are just coordinators : # - impalads that are just executors : # - process reincarnation, e.g., healthy impalad crashes and is restarted remove -- To view, visit http://gerrit.cloudera.org:8080/8202 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Gerrit-Change-Number: 8202 Gerrit-PatchSet: 22 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Juan Yu Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Fri, 10 Nov 2017 07:00:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 ) Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h File be/src/statestore/statestore.h: http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385 PS3, Line 385: std::pair> It will until doSubscriberUpdate is called which will remove the entry. You I was chatting with Michael and he mentioned that in general we don't recommend the use of weak_ptrs. So, if keeping the entries for some time is a concern by using shared_ptrs, you may want to ignore my recommendation. -- To view, visit http://gerrit.cloudera.org:8080/8449 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 Gerrit-Change-Number: 8449 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Mon, 06 Nov 2017 21:44:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 ) Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h File be/src/statestore/statestore.h: http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385 PS3, Line 385: std::pair> Wouldn't that keep a bunch of unregistered 'Subscriber' objects around due It will until doSubscriberUpdate is called which will remove the entry. You can even use a weak_ptr here if keeping these entries for some time is a concern. -- To view, visit http://gerrit.cloudera.org:8080/8449 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 Gerrit-Change-Number: 8449 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Mon, 06 Nov 2017 21:28:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3613: Avoid topic updates to unregistered subscriber instances
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8449 ) Change subject: IMPALA-3613: Avoid topic updates to unregistered subscriber instances .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h File be/src/statestore/statestore.h: http://gerrit.cloudera.org:8080/#/c/8449/3/be/src/statestore/statestore.h@385 PS3, Line 385: std::pairI think the code would be much simpler if you stored a pointer (probably a shared_ptr is needed) to the Subscriber here and simply compared it to the registered Subscriber. -- To view, visit http://gerrit.cloudera.org:8080/8449 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0329ae7d23dc6e9b04b7bc3ee8d89cbc73756f65 Gerrit-Change-Number: 8449 Gerrit-PatchSet: 3 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Mon, 06 Nov 2017 20:08:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 10: Code-Review+2 Nice work! -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 10 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 25 Oct 2017 22:22:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 9: (16 comments) http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG@42 PS9, Line 42: datastructure nit: data structure http://gerrit.cloudera.org:8080/#/c/8235/9//COMMIT_MSG@50 PS9, Line 50: speed up speedup http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@122 PS9, Line 122: Limits the "Maximum" http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@223 PS9, Line 223: public FileMetadataLoadStats(Path path) { hdfsPath = path; } nit: move the ctor below the class variable members. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@228 PS9, Line 228: nit: extra space http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@232 PS9, Line 232: the remove http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@248 PS9, Line 248: runnable callable http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@266 PS9, Line 266: // Computes the file metadata from scratch (by calling resetAndLoadFileMetadata()) : // when reuseFileMd_ is false, else refreshes the existing file metadata for : // modified files using refreshFileMetadata(). nit: you can actually remove the comment as it simply describes the code below. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@465 PS9, Line 465: hasFileChanged I think you need to put back the caching check. Alex made a good point that it is used for updating the cached bytes size. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@773 PS9, Line 773: loadParitionFileMetadataFromScratch "load from scratch" and "reset and load" essentially mean the same thing. Maybe rename this to resetAndLoadPartitionFileMetadata()? http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@792 PS9, Line 792: S3 non-HDFS (S3/ADLS) http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@792 PS9, Line 792: S3 "the latter" http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@844 PS9, Line 844: if (LOG.isTraceEnabled()) { : LOG.trace(loadStats.debugString()); : } nit: single line http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/service/BackendConfig.java File fe/src/main/java/org/apache/impala/service/BackendConfig.java: http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/service/BackendConfig.java@71 PS9, Line 71: maxS3PartsParallelLoad Why "maxS3..."? Everywhere else we use "maxNonHdfs..". http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java File fe/src/main/java/org/apache/impala/util/ListMap.java: http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java@43 PS9, Line 43: getList() { return list_; } Hm, why is list_ exposed to the world? You may want to check who wants this and what do they do with it (i.e modify or simply iterate). If it's the latter you can pass an ImmutableList(). If it is the former, see if we need to expand the public api of this class instead. http://gerrit.cloudera.org:8080/#/c/8235/9/fe/src/main/java/org/apache/impala/util/ListMap.java@76 PS9, Line 76: synchronized (this) { this is equivalent to public synchronized void populate... -- To view, visit http://gerrit.cloudera.org:8080/8235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I07eaa7151dfc4d56da8db8c2654bd65d8f808481 Gerrit-Change-Number: 8235 Gerrit-PatchSet: 9 Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 25 Oct 2017 19:24:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8322 ) Change subject: IMPALA-1422: support a constant on LHS of IN predicates. .. Patch Set 5: (8 comments) Did you add any planner tests? http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@228 PS5, Line 228:if (newConjunct instanceof BoolLiteral) { :smap.put(conjunct, newConjunct); :continue; :} nit: remove tabs. Also, I am not sure I understand what is happening here. http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@308 PS5, Line 308: correlated aggregates what is a correlated aggregate? http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@323 PS5, Line 323: c use capital C just to be consistent with L321 http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@329 PS5, Line 329: 10 IS NULL OR Since we know what the constant literal is, can't we avoid the is null check and the disjunction in some cases? http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@355 PS5, Line 355: or 10 IS NOT NULL Same comment as above. http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@360 PS5, Line 360: . nit: remove '.' http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@366 PS5, Line 366: nit: remove empty lines (L366, L369 and 371) http://gerrit.cloudera.org:8080/#/c/8322/5/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@428 PS5, Line 428: newSubquery.reset(); : newSubquery.analyze(rhsQuery.getAnalyzer()); Why do you need to go through reset/analyze? Same question for L459-460. -- To view, visit http://gerrit.cloudera.org:8080/8322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb Gerrit-Change-Number: 8322 Gerrit-PatchSet: 5 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 25 Oct 2017 18:42:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1422: support a constant on LHS of IN predicates.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8322 ) Change subject: IMPALA-1422: support a constant on LHS of IN predicates. .. Patch Set 4: (20 comments) First pass. I think you may want to add a few planner tests to show the resulting plans. http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@96 PS4, Line 96: Preconditions.checkState(rawPath_ != null); Just by looking at some of the constructors, this precondition doesn't seem particularly correct. http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@225 PS4, Line 225: if (conjunct instanceof InPredicate) { I think it's better if you extract the check from L347 and merge it here. Then you can also remove the comment. http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@288 PS4, Line 288: the resulting expression :* is not analyzed Out of curiosity, why not analyzing the rewritten expression here? http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@296 PS4, Line 296: 1) For each case, you may want to mention if it works for correlated/uncorrelated subqueries. http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@302 PS4, Line 302: Example: Case with "limit 1"? http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@313 PS4, Line 313: NOT EXISTS (RHS AND C = e OR e IS NULL) Not sure I understand what this means. Is C = e OR e is NULL injected into the WHERE clause of RHS or something else? http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@332 PS4, Line 332: NOT EXISTS (S ? F (RHS) t WHERE C = e or e IS NOT NULL) :) I am not following this notation. Let's talk, maybe I am missing something obvious. Alternatively, simple examples might just work for all these cases. http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340 PS4, Line 340:* @param inPred :* @return rewritten expression or null :* @throws AnalysisException Remove, we don't use javadoc :) http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@349 PS4, Line 349: Expr rhs = inPred.getChild(1); : Preconditions.checkArgument(rhs.contains(Subquery.class)); I believe this check is redundant. Subqueries can only be in the rhs of the in predicate. http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@353 PS4, Line 353: nit: empty lines (remove) http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@381 PS4, Line 381: new ArrayList() nit: we use guava's static functions. i.e. Lists.newArrayList(); even if you use the standard java classes, you don't need to specify the type after the class name, it can be inferred by path's type (e.g. ArrayList path = new ArrayList<>();). http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@408 PS4, Line 408: Subquery newSubquery = new Subquery(rewriteQuery); : newSubquery.analyze(rhsQuery.getAnalyzer()); Hm, rewriteQuery is already analyzed, so I am wondering if the analyze() call will have any affect at all. http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@414 PS4, Line 414: aggregation "or analytic function". http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@430 PS4, Line 430: new ArrayList<>(); use guava's static functions http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@679 PS4, Line 679: remove empty lines http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/8322/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@341 PS4, Line 341: TODO for 2.3 I know it's not your fault but can you remove the reference to 2.3 from all the TODOs in this file?
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 11: Code-Review+1 (4 comments) You may want to ask someone from BE and QE to sing off. http://gerrit.cloudera.org:8080/#/c/8202/11/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8202/11/be/src/service/impala-server.cc@1979 PS11, Line 1979: // Wait for the frontend catalog to be ready prior to opening client ports. nit: Maybe expand this comment to mention that this call will block indefinitely in the case when the first catalog update never arrives for some reason. http://gerrit.cloudera.org:8080/#/c/8202/11/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/8202/11/bin/start-impala-cluster.py@82 PS11, Line 82: action="store_true", : help=SUPPRESS_HELP) nit: merge these two and save a line :) http://gerrit.cloudera.org:8080/#/c/8202/11/bin/start-impala-cluster.py@304 PS11, Line 304: wait_for_client I think this should be named wait_for_catalog since that is the high level operation we're interested in. http://gerrit.cloudera.org:8080/#/c/8202/11/tests/common/custom_cluster_test_suite.py File tests/common/custom_cluster_test_suite.py: http://gerrit.cloudera.org:8080/#/c/8202/11/tests/common/custom_cluster_test_suite.py@127 PS11, Line 127: if disable_catalog: : cmd.append("--disable_catalog") nit: single line -- To view, visit http://gerrit.cloudera.org:8080/8202 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52b881cba18a7e4533e21a78751c2e35c3d4c8a6 Gerrit-Change-Number: 8202 Gerrit-PatchSet: 11 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 23 Oct 2017 18:43:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4704: Turns on client connections when local catalog initialized.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Turns on client connections when local catalog initialized. .. Patch Set 9: (6 comments) http://gerrit.cloudera.org:8080/#/c/8202/9/be/src/testutil/in-process-servers.cc File be/src/testutil/in-process-servers.cc: http://gerrit.cloudera.org:8080/#/c/8202/9/be/src/testutil/in-process-servers.cc@75 PS9, Line 75: if (started.ok()) return impala; : LOG(WARNING) << started.GetDetail(); : : delete impala; The changes in this file seem to alter the behavior a bit. In particular, previously even if SetCatalogInitialized would return an error, impala_server_ would still be initialized (see L107-108). With your change, if SetCatalogInitialized() throws an error, StartWithClientServers () will return with an error causing impala to be deleted. I don't recall how we use this in-process-server thing and your changes are most likely doing the right thing, but just wanted to point out the change in behavior. http://gerrit.cloudera.org:8080/#/c/8202/9/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/8202/9/bin/start-impala-cluster.py@80 PS9, Line 80: parser.add_option("--disable_catalog", dest="disable_catalog", default=False, : action="store_true", : help="Starts all processes except catalogd.") This is used only for testing and using this doesn't result in a valid cluster configuration. So, I think it's best if we hide/remove this option. One option is to hide it using something like SUPPRESS_HELP/USAGE. Other option is to control this behavior using an env variable and not a startup option. Maybe the first one is not too bad. Thoughts? http://gerrit.cloudera.org:8080/#/c/8202/9/bin/start-impala-cluster.py@289 PS9, Line 289: impalad's catalog "catalog cache" http://gerrit.cloudera.org:8080/#/c/8202/9/bin/start-impala-cluster.py@304 PS9, Line 304: def wait_for_catalog(impalad, timeout_in_seconds): : """Waits for the impalad catalog to become ready""" : start_time = time() : catalog_ready = False : attempt = 0 : while (time() - start_time < timeout_in_seconds and not catalog_ready): : try: : num_dbs = impalad.service.get_metric_value('catalog.num-databases') : num_tbls = impalad.service.get_metric_value('catalog.num-tables') : catalog_ready = impalad.service.get_metric_value('catalog.ready') : if catalog_ready or attempt % 4 == 0: : print 'Waiting for Catalog... Status: %s DBs / %s tables (ready=%s)' %\ : (num_dbs, num_tbls, catalog_ready) : attempt += 1 : except Exception, e: : print e : sleep(0.5) : if not catalog_ready: : raise RuntimeError('Catalog was not initialized in expected time period.') : : def wait_for_client(impalad, timeout_in_seconds=CLUSTER_WAIT_TIMEOUT_IN_SECONDS): : """Waits for the client ports to become ready.""" : start_time = time() : client_beeswax = None : client_hs2 = None : while (time() - start_time < timeout_in_seconds): : try: : client_beeswax = impalad.service.create_beeswax_client() : client_hs2 = impalad.service.create_hs2_client() : break; : except Exception as e: : print 'Client services not ready: %s. Trying again ...' : finally: : if client_beeswax is not None: client_beeswax.close() : sleep(0.5) : : if client_beeswax is None or client_hs2 is None: : raise RuntimeError('Client port not openned in expected time period.') Does it make sense to merge these two functions into a single wait_for_catalog function? wait_for_client() is only used for checking if the catalog is ready because we can't rely on the 'catalog.ready' metric, no? So, if this thing is not useful why not remove it and check for the client connections instead? And then we can retrieve the num_dbs and num_tbls from the metrics. http://gerrit.cloudera.org:8080/#/c/8202/9/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8202/9/fe/src/main/java/org/apache/impala/service/Frontend.java@860 PS9, Line 860: state update http://gerrit.cloudera.org:8080/#/c/8202/9/fe/src/main/java/org/apache/impala/service/Frontend.java@904
[Impala-ASF-CR] IMPALA-5058: Improve concurrency of DDL/DML during catalog updates
Dimitris Tsirogiannis has abandoned this change. ( http://gerrit.cloudera.org:8080/8266 ) Change subject: IMPALA-5058: Improve concurrency of DDL/DML during catalog updates .. Abandoned There are a number of serious issues in this patch. Abandoning for now. -- To view, visit http://gerrit.cloudera.org:8080/8266 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I16ae27be79eb0c5a9648a15d368ca60a7d04507b Gerrit-Change-Number: 8266 Gerrit-PatchSet: 1 Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 4: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974 PS3, Line 1974: // HMS because some of them may already exist there. In that case, we load in the : // catalog the partitions that already exist in HMS but aren't in the catalog yet. : if (allHmsPartitionsToAdd.size() != addedHmsPartitions.size()) { : List difference = computeDifference(allHmsPartitionsToAdd, : addedHmsPartitions); : addedHmsPartitions.addAll( : getPartitionsFromHms(msTbl, msClient, tableName, difference)); : } : : for (Partition partition: addedHmsPartitions) { : // Create and add the HdfsPartition to catalog. Return the table object with an : > Sure. I tightened the partitioned loop to only include the msClient call. Well, you're only calling it for the diff, right? I would expect in most cases the diff to be small, so I am not really worried about this call. That said, we can reevaluate that decision if we have any reason to believe this is going to cause any problems. http://gerrit.cloudera.org:8080/#/c/8238/4/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: http://gerrit.cloudera.org:8080/#/c/8238/4/tests/metadata/test_ddl.py@522 PS4, Line 522: alter_stmt = "alter table t add " + " ".join("partition(p=%d)" % (i,) for i in xrange(MAX_PARTITION_UPDATES_PER_RPC + 2)) nit: long line -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 4 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 16 Oct 2017 23:15:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4704: Disallow client connections to imapalad until catalog is received.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8202 ) Change subject: IMPALA-4704: Disallow client connections to imapalad until catalog is received. .. Patch Set 7: (15 comments) http://gerrit.cloudera.org:8080/#/c/8202/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8202/7//COMMIT_MSG@7 PS7, Line 7: imapalad nit: typo http://gerrit.cloudera.org:8080/#/c/8202/7//COMMIT_MSG@7 PS7, Line 7: catalog catalog update http://gerrit.cloudera.org:8080/#/c/8202/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8202/7/be/src/service/impala-server.cc@a2045 PS7, Line 2045: :) nice http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py File bin/start-impala-cluster.py: http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py@81 PS7, Line 81: action="store_true", help="Starts all cluster processes except catalogd.") nit: long line http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py@254 PS7, Line 254: catalog_needs_wait I think that function tests if the catalog is ready given that we haven't disabled it. So maybe "is_catalog_ready" is a better name. Thoughts? http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py@323 PS7, Line 323: def wait_for_client(impalad, timeout_in_seconds=CLUSTER_WAIT_TIMEOUT_IN_SECONDS): Add a function comment. http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py@332 PS7, Line 332: ready = True You can just break and use client_hs2 or client_beeswax in the check in L339. http://gerrit.cloudera.org:8080/#/c/8202/7/bin/start-impala-cluster.py@336 PS7, Line 336: client_beeswax hs2_client doesn't have a close() function? http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/Frontend.java@859 PS7, Line 859: ready Maybe also mention what "ready" means, i.e. receive the first update from the catalog. http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/Frontend.java@867 PS7, Line 867: if (getCatalog().isReady()) return; It may make sense to add a log message here to indicate that the catalog is now ready after waiting for MAX_CATALOG_UPDATE_WAIT_TIME_MS * num_triews msec. http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/Frontend.java@868 PS7, Line 868: catalog to be ready That phrase kind of suggests that the catalog server is not ready. Maybe best to say that we're waiting for the initial catalog update from the statestore or for the local catalog cache to be initialized. http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/Frontend.java@899 PS7, Line 899: support when the catalog is not ready. Maybe "is not supported if the local catalog cache is not initialized." http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/main/java/org/apache/impala/service/JniFrontend.java@587 PS7, Line 587: public void waitForCatalog() { : frontend_.waitForCatalog(); : } single line http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/test/java/org/apache/impala/service/FrontendTest.java File fe/src/test/java/org/apache/impala/service/FrontendTest.java: http://gerrit.cloudera.org:8080/#/c/8202/7/fe/src/test/java/org/apache/impala/service/FrontendTest.java@117 PS7, Line 117: private void testCatalogIsNotReady(String stmt, Frontend fe) { : TQueryCtx queryCtx = TestUtils.createQueryContext( : Catalog.DEFAULT_DB, AuthorizationTest.USER.getName()); : queryCtx.client_request.setStmt(stmt); : try { : fe.createExecRequest(queryCtx, new StringBuilder()); : fail("Expected failure to due uninitialized catalog."); : } catch (IllegalStateException e) { : assertEquals("Analyzing a query is not support when the catalog is not ready.", : e.getMessage()); : } catch (Exception e) { : fail("Failed to create exec request due to: " + ExceptionUtils.getStackTrace(e)); : } : } Does this even make sense to keep given the introduced behavior in this patch? http://gerrit.cloudera.org:8080/#/c/8202/7/tests/common/custom_cluster_test_suite.py File tests/common/custom_cluster_test_suite.py: http://gerrit.cloudera.org:8080/#/c/8202/7/tests/common/custom_cluster_test_suite.py@138 PS7, Line 138:
[Impala-ASF-CR] IMPALA-5058: Improve concurrency of DDL/DML during catalog updates
Dimitris Tsirogiannis has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8266 Change subject: IMPALA-5058: Improve concurrency of DDL/DML during catalog updates .. IMPALA-5058: Improve concurrency of DDL/DML during catalog updates Problem: A long running table metadata operation (e.g. refresh) could prevent any other metadata operation from making progress if it coincided with the gather catalog updates operation. The problem was due to the conservative locking scheme used when catalog updates were collected. In particular, in order to collect a consistent snapshot of metadata changes, the global catalog lock was held for the entire duration of that operation. Solution: To improve the concurrency of catalog operations the following changes are performed: * A range of catalog versions determines the catalog changes to be included in a catalog update. Any catalog changes that do not fall in the specified range are ignored (to be processed in subsequent catalog updates). * The catalog allows metadata operations to make progress while collecting catalog updates. * To prevent starvation of catalog updates (i.e. frequently updated catalog objects skipping catalog updates indefinitely), we keep track of the number of times a catalog object has skipped an update and if that number exceeds a threshold it is included in the next catalog update even if its version is not in the specified catalog update version range. Hence, the same catalog object may be sent in two consecutive catalog updates. Change-Id: I16ae27be79eb0c5a9648a15d368ca60a7d04507b --- M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java A fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSource.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/HdfsCachePool.java M fe/src/main/java/org/apache/impala/catalog/Role.java M fe/src/main/java/org/apache/impala/catalog/RolePrivilege.java M fe/src/main/java/org/apache/impala/catalog/Table.java 11 files changed, 265 insertions(+), 191 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/8266/1 -- To view, visit http://gerrit.cloudera.org:8080/8266 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I16ae27be79eb0c5a9648a15d368ca60a7d04507b Gerrit-Change-Number: 8266 Gerrit-PatchSet: 1 Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 3: (2 comments) Re testing, a simple python test that adds at least MAX_PARTITION_UPDATES_PER_SEC + 1 partitions and then validates that they were actually added in the HMS (e.g. by doing show partitions) is sufficient. If you want you can convert the analysis test in AnalyzeDDLTest.java to a positive test but it is kind of redundant if you add the python test I just mentioned. http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@a1954 PS3, Line 1954: Why did you remove this? http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974 PS3, Line 1974: if (hmsSublist.size() != addedHmsPartitions.size()) { : List difference = computeDifference(hmsSublist, : addedHmsPartitions); : addedHmsPartitions.addAll( : getPartitionsFromHms(msTbl, msClient, tableName, difference)); : } : : for (Partition partition: addedHmsPartitions) { : // Create and add the HdfsPartition to catalog. Return the table object with an : // updated catalog version. : addHdfsPartition(tbl, partition); : } I think you can refactor this so that you make only one call to getPartitionsFromHms() outside the for loop. -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 3 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Thu, 12 Oct 2017 16:59:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5429: Multi threaded block metadata loading
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8235 ) Change subject: IMPALA-5429: Multi threaded block metadata loading .. Patch Set 5: (22 comments) http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@215 PS5, Line 215: // Block metadata loading stats for a single HDFS path. nit: File/block (since we're also loading/refreshing files). Also, you may want to change the name of the private class to reflect that, e.g. PathMetadataLoadingStats or something like that. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@217 PS5, Line 217: loadedFiles_ You may want to add a comment here. What is loaded vs refreshed? Is the one superset of the other. Good to clarify to avoid confusion. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@218 PS5, Line 218: _ I believe the convention is that we don't use '_' for public members. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@231 PS5, Line 231: Runnable I don't know how this is used later on, but alternatively you can make PathBlockMetadataLoadRequest implement Callable, hence returning the stats when calling call(). Now you seem to access the stats only through the debugString() function. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@247 PS5, Line 247: Blocks on the loadBlockMetadata() call. Not following this comment. run() either calls refreshBlockMetadata() or loadBlockMetadata(), so I can't really interpret what this comment is saying. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@333 PS5, Line 333: loadBlockMetadata I know I am guilty for some of these names but maybe rename this to "resetAndLoadBlockMetadata"? Then it is more clear what the differences are between this function and rerfreshBlockMetadata(). http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@363 PS5, Line 363: numUnknownDiskIds Are you overriding or incrementing the value of numUnknownDiskIds in the create() function? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@368 PS5, Line 368: newFileDescs Hm, that doesn't seem particularly safe (i.e. using the same list for every partition). Are we certain that any other partition modification operation (e.g. alter partition set location) will not try to override the file descriptors, thereby affecting all the other partitions? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@380 PS5, Line 380: changed :* mtime It's not just the changed mtime that we're looking for in order to determiner modification, so you may want to either remove this or mention all the criteria we use. I prefer the former. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@383 PS5, Line 383: The initial table load still uses the listFiles() :* on the data directory that fetches both the FileStatus as well as BlockLocations in :* a single call. remove http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@398 PS5, Line 398: get(0) Comment why we take the file descriptors of one partition and that it doesn't really matter which one we choose. http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@399 PS5, Line 399: public String apply(FileDescriptor desc) { : return desc.getFileName(); : } Add @Override and make it a single line (if it fits) http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@448 PS5, Line 448: (fd == null) || (fd.getFileLength() != status.getLen()) || : (fd.getModificationTime() != status.getModificationTime()); I think we were also checking if the partition was cached. Isn't this check needed anymore? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@455 PS5, Line 455: refreshFileMetadata Maybe call it refreshPartitionStorageMetadata? Overall, it may make sense to replace "File/Block" with "Storage" whenever it makes sense. Thoughts? http://gerrit.cloudera.org:8080/#/c/8235/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@694 PS5, Line 694: Exception Why this change?
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 9: Code-Review+1 (5 comments) Nice. So much better. I'll let Alex take a look as well and do the final +2. http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/main/java/org/apache/impala/service/FeSupport.java File fe/src/main/java/org/apache/impala/service/FeSupport.java: http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/main/java/org/apache/impala/service/FeSupport.java@292 PS9, Line 292: paresResult I believe you meant parseResult? http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java: http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@125 PS9, Line 125: public int getStartingLineNum() { : return startLineNum; : } single line http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@129 PS9, Line 129: public TQueryOptions getOptions() { : return this.options; : } nit: single line http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@133 PS9, Line 133: public void setOptions(TQueryOptions options) { : this.options = options; : } single line http://gerrit.cloudera.org:8080/#/c/7564/9/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@367 PS9, Line 367: Add a Preconditions.checkNotNull(result) here. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 9 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 05 Oct 2017 18:55:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4682 Fix IllegalStateException issue
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8143 ) Change subject: IMPALA-4682 Fix IllegalStateException issue .. Patch Set 1: Zoram, let's try to move this forward. We don't want too many open CRs. -- To view, visit http://gerrit.cloudera.org:8080/8143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I57c20aeed401275d45913fedfd61c206c38641b7 Gerrit-Change-Number: 8143 Gerrit-PatchSet: 1 Gerrit-Owner: Zoram ThangaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 04 Oct 2017 18:12:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 8: (3 comments) Thanks! That looks much better. http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java: http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java@734 PS8, Line 734: if (options == null) { : options = defaultQueryOptions(); : } else { : options = mergeQueryOptions(defaultQueryOptions(), options); : } Move above L732 and make options an input parameter to TestFileParser constructor. http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java File fe/src/test/java/org/apache/impala/testutil/TestFileParser.java: http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@26 PS8, Line 26: import java.lang.reflect.Method; Is this needed? http://gerrit.cloudera.org:8080/#/c/7564/8/fe/src/test/java/org/apache/impala/testutil/TestFileParser.java@355 PS8, Line 355: private void parseOneQueryOption(String line, TestCase testCase) { Why parsing one query option at a time? ParseQueryOptions() simply expects a comma-separated list of options and parses all of them. Wouldn't be simpler to make a JNI call to ParseQueryOptions() instead? -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 8 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Wed, 04 Oct 2017 18:09:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6009: Upgrade Guava to 14.0.1
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8198 ) Change subject: IMPALA-6009: Upgrade Guava to 14.0.1 .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8198 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddc5da8849d5aa7317d3dc572884d05dee859bdd Gerrit-Change-Number: 8198 Gerrit-PatchSet: 4 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Comment-Date: Tue, 03 Oct 2017 21:01:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8168 ) Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Fri, 29 Sep 2017 20:40:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5951: Remove flaky test catalogd timeout
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8154 ) Change subject: IMPALA-5951: Remove flaky test_catalogd_timeout .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05 Gerrit-Change-Number: 8154 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Fri, 29 Sep 2017 20:40:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8168 ) Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@744 PS2, Line 744: revoke role grant_revoke_test_ROOT from group root Is this needed for your test? http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@750 PS2, Line 750: QUERY : show current roles : RESULTS: VERIFY_IS_SUBSET : 'grant_revoke_test_ALL_SERVER' : TYPES : STRING : No need for that. Let's remove to make the test a bit smaller. http://gerrit.cloudera.org:8080/#/c/8168/2/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@808 PS2, Line 808: # IMPALA-4951: make sure database is visible to a user having only column level access : # to a table in the database I think it may be best to move this comment at the beginning of your changes so that it's more obvious what these added statements test. -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Fri, 29 Sep 2017 04:56:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8168 ) Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@761 PS1, Line 761: root > I wanted to use a different user for this test, otherwise I would have to g I think it's fine to use $GROUP_NAME throughout. -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Thu, 28 Sep 2017 21:00:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8136 ) Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE. .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java File fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java: http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java@74 PS1, Line 74: import com.google.common.base.Joiner; Is it needed? http://gerrit.cloudera.org:8080/#/c/8136/1/testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test File testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test: http://gerrit.cloudera.org:8080/#/c/8136/1/testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test@37 PS1, Line 37: 0 Is this correct? http://gerrit.cloudera.org:8080/#/c/8136/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test File testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test: http://gerrit.cloudera.org:8080/#/c/8136/1/testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test@38 PS1, Line 38: 71 There are a bunch of second digit changes in most of the tests. Are these ok? http://gerrit.cloudera.org:8080/#/c/8136/1/tests/metadata/test_explain.py File tests/metadata/test_explain.py: http://gerrit.cloudera.org:8080/#/c/8136/1/tests/metadata/test_explain.py@131 PS1, Line 131: nit: extra space -- To view, visit http://gerrit.cloudera.org:8080/8136 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f3e72471ac563adada4a4156033a85852b7c8b7 Gerrit-Change-Number: 8136 Gerrit-PatchSet: 1 Gerrit-Owner: Alex BehmGerrit-Reviewer: Balazs Jeszenszky Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 28 Sep 2017 19:48:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5951: test catalogd timeout fails to cause expected exception
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8154 ) Change subject: IMPALA-5951: test_catalogd_timeout fails to cause expected exception .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8154/1/testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test File testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test: http://gerrit.cloudera.org:8080/#/c/8154/1/testdata/workloads/functional-query/queries/QueryTest/kudu-timeouts-catalogd.test@24 PS1, Line 24: describe functional_kudu.alltypes Isn't this test also susceptible to the same timing issue? I don't see much value in these tests and it seems to me they will always be kind of flaky. I suggest we remove kudu-timeouts-catalog.test altogether. -- To view, visit http://gerrit.cloudera.org:8080/8154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I29fd67d0acc0ee15943c416f2179ad716d2cac05 Gerrit-Change-Number: 8154 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Thu, 28 Sep 2017 18:52:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4951: Fix database visibility for user with only column privilege
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8168 ) Change subject: IMPALA-4951: Fix database visibility for user with only column privilege .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@761 PS1, Line 761: root Why are you using root and not $GROUP_NAME for this test? http://gerrit.cloudera.org:8080/#/c/8168/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test@786 PS1, Line 786: # IMPALA-4951: make sure database is visible to a user having only column level access : # to a table in the database : show databases : RESULTS : 'default','Default Hive database' : 'grant_rev_db','' : TYPES : STRING,STRING : I think the test for this jira should work as follows: 1. You have/create a db/table for which user has no privileges. 2. You do show databases and verify the db is not listed 3. You add a column privilege on the table. 4. You do show databases and see that db is now listed. -- To view, visit http://gerrit.cloudera.org:8080/8168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id77904876729c0223fd6ace2d5e7199bd700a33a Gerrit-Change-Number: 8168 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Thu, 28 Sep 2017 18:44:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/7564 ) Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 6: Thanks for adding the support for parsing the query options. Did you try to reuse the c++ implementation through a JNI call? It would be nice if we could avoid implementing the same functionality in both c++ and java. See FeSupport.java class for other cases where we call into the backend from the java side. Thanks. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-Change-Number: 7564 Gerrit-PatchSet: 6 Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Comment-Date: Thu, 28 Sep 2017 17:51:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5310: Add COMPUTE STATS TABLESAMPLE.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8136 ) Change subject: IMPALA-5310: Add COMPUTE STATS TABLESAMPLE. .. Patch Set 1: (26 comments) First pass. High level approach seems reasonable to me. Will take a look at the tests next. http://gerrit.cloudera.org:8080/#/c/8136/1/be/src/exec/catalog-op-executor.cc File be/src/exec/catalog-op-executor.cc: http://gerrit.cloudera.org:8080/#/c/8136/1/be/src/exec/catalog-op-executor.cc@218 PS1, Line 218: // The first column is the COUNT(*) expr of the original query. Can you put a similar comment above L206. I had a hard time figuring out what rows[0].colVals[0] represented in L208. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@58 PS1, Line 58: Existing partition-level row counts are not modified. Mention what happens to the partitions that don't have the row count set. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@60 PS1, Line 60: unmodified remove http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@61 PS1, Line 61: extrapolated Maybe add some details about how extrapolation is computed. I haven't looked at the entire patch so you may mention it elsewhere. If so, you can just point to the file that provides that description. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@62 PS1, Line 62: so as not to confuse other engines like Hive/SparkSQL which may rely on : * the shared HMS fields being accurate. That part may be confusing to someone with no background knowledge. For instance, the fact that other engines require "accurate" statistics which we update using sampling and extrapolation :), kind of oxymoron. I think it is ok to just say that we store the extrapolated stats and remove that part. Thoughts? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@64 PS1, Line 64: Independently, row-count extrapolation is also done during planning based on the : * numRows / totalSize ratio because the table contents may have changed since the : * last time COMPUTE STATS was run. Remove and put it near the relevant code (planning). http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@77 PS1, Line 77: Not supported for now to keep the logic/code simple. Ha, this is the opposite in the COMPUTE STATS case. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@106 PS1, Line 106: totalFileBytes_ Can't you use the HdfsTable.totalFileBytes_ instead? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@108 PS1, Line 108: Only set when : // TABLESAMPLE was specified. Set to -1 for non-HDFS tables Maybe "Set to -1 for non-HDFS tables or when TABLESAMPLE is not specified"? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@305 PS1, Line 305: expectAllPartitions_ = !updateTableStatsOnly() So if I do not update table stats only I should expect to compute stats on all partitions? Essentially, is isIncremental check blended in this function? I think it may make sense to extract that check. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@344 PS1, Line 344: expectAllPartitions_ = false; Why is this needed? Isn't L305 sufficient? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@440 PS1, Line 440: // Only add group by clause for HdfsTables. Comment doesn't seem relevant to the code that follows. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@442 PS1, Line 442: !updateTableStatsOnly() expectAllPartitions_? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@452 PS1, Line 452: !updateTableStatsOnly() expectAllPartitions_? http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@506 PS1, Line 506: sets 'expectAllPartitions_' to false Hm, I don't see that in the code. http://gerrit.cloudera.org:8080/#/c/8136/1/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@511 PS1, Line 511: base remove?
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 8: Code-Review+2 Rebase and keep Dan's +2 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 8 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 26 Sep 2017 16:20:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Hello Bharath Vissapragada, Alex Behm, Vuk Ercegovac, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7731 to look at the new patch set (#7). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M tests/statestore/test_statestore.py 21 files changed, 480 insertions(+), 441 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/7 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 7 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/7731 ) Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.h@131 PS6, Line 131: Also determines any : /// deletions of catalog objects by looking at the : /// difference of the last set of topic entry keys that were sent and the current set : /// of topic entry keys > Remove? Done http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog-server.cc@308 PS6, Line 308: bool topi > should that be DCHECK_GT? The old conditional to continue on was <= ... Correct. Done http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h File be/src/catalog/catalog.h: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/catalog/catalog.h@59 PS6, Line 59: Gets all Catalog objects and the metadata that is applicable for the given request. : /// Always returns all object names that exist in the Catalog, but allows for extended : /// metadata for objects that were modified after the specified version. > is that accurate? from our discussion, my understanding was that only obje Yeah, wording is not clear. Changed the comment. http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc@177 PS5, Line 177: entry_it->second.SetDeleted(true); : entry_it->second.SetVersion(last_version_); > What I mean is that topics that are transient do not get values in deleted Hm, it's still not clear to me why a transient topic cannot get values in deleted topics, in the sense that I see the topic being transient and how to process deletions two orthogonal issues and I don't understand why one should influence the other. http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: http://gerrit.cloudera.org:8080/#/c/7731/6/be/src/statestore/statestore.cc@170 PS6, Line 170: topic_update_log_.erase(version); > also, does that still make sense (now that we no longer do the SetValue(NUL Done http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/7731/6/common/thrift/CatalogInternalService.thrift@32 PS6, Line 32: > does it include objects with changes in version > or >= the from_version? Updated the comment (>). Done http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/7731/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@310 PS6, Line 310: Views, Databases, and :* Functions) > datasources, cachepools, sentry stuff ... I think the list is too big, may Done -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-Change-Number: 7731 Gerrit-PatchSet: 5 Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 25 Sep 2017 17:55:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5955: Use totalSize tblproperty instead of rawDataSize.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8110 ) Change subject: IMPALA-5955: Use totalSize tblproperty instead of rawDataSize. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8110/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8110/1//COMMIT_MSG@9 PS1, Line 9: Today, Impala populates the 'rawDataSize' property : during COMPUTE STATS for the purpose of extrapolating : row counts based on file sizes. : : Intended meaning/use of tblproperties: : - rawDataSize' is the estimated in-memory size of a table : (without encoding and compression) : - 'totalSize' represents the on-disk size : : Using the fields correctly is important for compatibility : with other users of the HMS such as Hive and SparkSQL. : For example, SparkSQL relies on the 'totalSize' for : join ordering. Although this is very informative, I don't think I understand what this commit changes. Will we be populating both rawDataSize and totalSize, replace one with the other, or something else? http://gerrit.cloudera.org:8080/#/c/8110/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8110/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1178 PS1, Line 1178: droppedRawDataSize rename to droppedTotalSize to be consistent with the tblproperty being updated? -- To view, visit http://gerrit.cloudera.org:8080/8110 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If7c2c4e1e99b297c849f9f0d18b2bef34ad811c6 Gerrit-Change-Number: 8110 Gerrit-PatchSet: 1 Gerrit-Owner: Alex BehmGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Thu, 21 Sep 2017 18:21:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5954: Set DO NOT UPDATE STATS in alterTable() RPCs to HMS.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8112 ) Change subject: IMPALA-5954: Set DO_NOT_UPDATE_STATS in alterTable() RPCs to HMS. .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8112 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I05c63315579d66efeaa4c65105372ae68820ab2f Gerrit-Change-Number: 8112 Gerrit-PatchSet: 2 Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Thu, 21 Sep 2017 18:00:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7731 to look at the new patch set (#6). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M tests/statestore/test_statestore.py 21 files changed, 473 insertions(+), 425 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/6 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (27 comments) http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: PS5, Line 231: // If this is not a delta update, request an update from version 0 from the local : // catalog. There is an optimization that checks if pending_topic_updates_ was just : // reloaded from version 0. If so, then skip this step and use that data. > I found it impossible to understand this comment without reading through mo Done PS5, Line 234: delta.from_version == 0 && delta.to_version == 0 > from the comment above, presumably this condition means "this is not a delt Yeah, you're right. This condition is really confusing. from_version = 0 implies delta = false and can be used here instead. The part that is not clear to me is the to_version check but I couldn't find any case where it would make sense to check the to_version in order to decide whether to send the full catalog update or not. I will remove it and see if it breaks something. PS5, Line 310: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue; > given that last_sent_catalog_version_ was passed to GetAllCatalogObjects(), Right, it's not needed. Replaced it with a DCHECK. http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: PS5, Line 150: added > nit: May be 'added/updated/removed' to be consistent with other places? Done http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS5, Line 1392: > can we delete this function now? Yes. Done PS5, Line 1366: ; > Could you add 'len' too, given this is trace logging anyway. (might be help Done PS5, Line 1399: // Nothing to do here. > Remove? Kinda seems obvious Done http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: PS5, Line 177: entry_it->second.SetDeleted(true); : entry_it->second.SetVersion(last_version_); > now that some subscribes require deleted entries to have a value, how does The behavior is topic specific and currently only subscribers of catalog-update topic rely on the value for deleted entries. Not sure how to make it less fragile in the sense that "value_" is opaque to the statestore and is interpreted only by subscribers. PS5, Line 490: > would be easier to read if you line break it there to keep the last arg all Done PS5, Line 538: if (topic_entry.value() != Statestore::TopicEntry::NULL_VALUE) > why do we need that guard? why can't we set topic_item.value even in that c Yeap, removed it. PS5, Line 546: int64_t topic_size = topic.total_key_size_bytes() - deleted_key_size_bytes : + topic.total_value_size_bytes(); > that math doesn't look correct anymore (deleted entries can have non-zero v Done http://gerrit.cloudera.org:8080/#/c/7731/5/be/src/statestore/statestore.h File be/src/statestore/statestore.h: PS5, Line 180: may > may or will? "will". Changed it. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 25: GetAllCatalogObjects > see below. I think we should rename this. Done PS5, Line 25: Contains all known Catalog objects : // (databases, tables/views, and functions) from the CatalogService's cache. > I think that should be updated to explain that this is relative to a partic Done PS5, Line 38: empty list if no objects detected in the Catalog > Thanks. As we talked about in person, I think we should rename GetAllCatalo Done http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: PS5, Line 87: If true, this item was deleted > How about: Done PS5, Line 87: If true, this item was deleted > Actually, the parenthesis I suggested doesn't clarify. Maybe that part sho Done PS5, Line 97: List of changes to topic entries, including deletions. > That's only true when is_delta==true. So, how about we say: Done PS5, Line 100: applied to in-memory state, > that's kinda dependent on the subscriber's implementation, right? would it Done http://gerrit.cloudera.org:8080/#/c/7731/5/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: PS5, Line 53: identiy > nit: typo Done PS5, Line 123: if (first.getType() != second.getType()) return false; > This seems to already be handled by the next line (by generating a differen Good point. Removed
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (2 comments) Addressing some questions by Dan. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 38: empty list if no objects detected in the Catalog > s/through/trouble/g Yes, there is some context that is missing here, hence it's not easy to understand what these comments mean. Let me try to explain first here and I will expand the comments to clarify some things. 1. What I think that comment should say is "no deleted objects were detected since the last GetAllCatalogObjects request". 2. Similarly to the previous comment. Deleted and updated/new objects are with respect to the previous GetAllCatalogObjects call. The base from which these deltas are computed is the "fromVersion" param of the TGetAllCatalogObjects struct. I'll update the comments. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 84: // is not included in the topic key (e.g. catalog version of deleted catalog objects). > Sorry, opaque was the wrong word. I meant to say agnostic. Let me try to explain. Statestore only uses that flag to avoid sending deleted topic entries when the topic update is not a delta one (this is an optimization). The statestore subscribers (e.g. impalad) are the ones that really need what's in the 'value' field in order to determine how to process a deleted topic entry. For the case of catalog update, the value contains among other things the catalog version when the deletion occurred and the impalad uses that information to determine if the deletion should be applied at the local catalog or not. We can definitely add the deleted flag inside whatever object we're putting in 'value' (e.g. TCatalogObject) and I am happy to try this out if you think it's better. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS5, Line 38: empty list if no objects detected in the Catalog > that seems kinda misleading. won't the list also be empty if nothing was de Sorry not following. This says detected not deleted. Maybe I misunderstood you comment. http://gerrit.cloudera.org:8080/#/c/7731/5/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 84: // is not included in the topic key (e.g. catalog version of deleted catalog objects). > hmm, without reading through how the catalog topic works, this is still pre Hm, I don't see why the value should be opaque to the subscriber. On the contrary, it should be opaque to the statestore but each topic subscriber could have different logic implemented around deletions. If we want to be able to handle deletions with only the information from the key then we need to expand the key and for the case of catalog updates, it should also include the catalog version. Unfortunately, that won't work well with the current implementation of the statestore. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: PS3, Line 86: deleted > What I meant is that we should explain in the comment here what 'value' mea Fair enough. Done PS3, Line 105: The from_version will always be 0 for non-delta updates > Yeah. what I meant was we should document it in this comment, right? Done -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7731 to look at the new patch set (#5). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/statestore/test_statestore.py 16 files changed, 427 insertions(+), 338 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/5 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7731 to look at the new patch set (#4). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/statestore/test_statestore.py 16 files changed, 422 insertions(+), 336 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/4 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/CatalogInternalService.thrift File common/thrift/CatalogInternalService.thrift: PS3, Line 34: updated > does "updated" also include new objects? or is it really just objects that It includes both added and changed objects. I updated the comment but let me know if you have a better name option. http://gerrit.cloudera.org:8080/#/c/7731/3/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: PS3, Line 86: deleted > in this case, what is in 'value'? Is it defined to be set or is only the ke What I tried to explain in the commit msg (let me know if it still unclear) was that we should not use the contents of "value" to indicate if a topic item is deleted or not. It doesn't mean that a topic item with deleted set to true should have a specific value, i.e. each topic can have a separate behavior. PS3, Line 103: all changes > your commit message seems to contradict that. Oh, maybe i'm confusing cata Hm, sorry for the confusion. What the commit is describing is related to catalog versions and the follow up changes for fixing IMPALA-5058. PS3, Line 105: The from_version will always be 0 for non-delta updates > what about to_version in this case? The to_version is 0 only when there are no changes in the topic. It's value doesn't depend on whether the update is delta or not. -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5856: Fix outer join predicate assignment.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5856: Fix outer join predicate assignment. .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/8014/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS7, Line 333: Bool tests BoolTestPredicates? http://gerrit.cloudera.org:8080/#/c/8014/7/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java: PS7, Line 28: IS [NOT] Replace with the one in L31. PS7, Line 30: is one of (TRUE | FALSE | UNKNOWN). Remove, it's kind of redundant. http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: PS6, Line 591: > added negative tests and coverage to rewrites and end-to-end. the analysis Sorry, I don't think I understand this comment. What do you mean by analysis for the expr does not do much? Aren't the negative test cases captured by AnalysisError()? I think rewrite tests should handle the rewrite logic but everything else (analysis related) should be tested here. Maybe I am missing something. http://gerrit.cloudera.org:8080/#/c/8014/7/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: PS7, Line 217: // Incorrect type for LHS. : AnalyzedRewritesError("'foo' is true", rule, "'foo' = TRUE AND 'foo' IS NOT NULL", : "operands of type STRING and BOOLEAN are not comparable: 'foo' = TRUE"); : // No subqueries allowed. : RewritesError( : "(select max(tinyint_col) from functional.alltypessmall) is true", rule, : "Subqueries are not supported in the select list."); Hm, it's not clear to me why this should be here and not as an Analysis test. -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5856: Fix outer join predicate assignment.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5856: Fix outer join predicate assignment. .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/8039/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS1, Line 1381: // Right-most full-outer join table ref that is equal to or to the left of the : // table ref materializing 'tids'. Trying to visualize this hurts my brain :) Not sure if it helps to understand the logic in this function. The comment in L1385 should suffice. PS1, Line 1385: this What is "this" refer to? After reading this more carefully, I think I understand what it means. I think it is better to move the comment below L1388. PS1, Line 1398: globalState_.ojClauseByConjunct.get(e.getId()); getOjRef(e)? http://gerrit.cloudera.org:8080/#/c/8039/1/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test File testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test: PS1, Line 1020: from the On-clause of a left outer join Out of curiosity, did you check if we have the same issue if there is a left outer join followed by a full outer join? If we don't have a test like that, maybe add it for completion. Line 1028: and t1.bigint_col > 10 and t2.bigint_col > 30 To make it more interesting, add a where clause with one predicate that references t1 and t2 and one that references t3. -- To view, visit http://gerrit.cloudera.org:8080/8039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/statestore/test_statestore.py 16 files changed, 421 insertions(+), 336 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/3 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 2: (16 comments) http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: PS2, Line 233: 0, if they have then skip this step and use that data > nit: perhaps easier to read? "...version 0. If so, then skip ..." Done Line 305: BuildTopicUpdatesHelper(catalog_objects.updated_objects, false); > I like the previous version more where this is inlined into L292. A quick c Done Line 317: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue; > Move this above L312? Done PS2, Line 336: if (topic_deletions) { : VLOG(1) << "Publishing deletion: " << entry_key << "@" : << catalog_object.catalog_version; : } else { : VLOG(1) << "Publishing update: " << entry_key << "@" : << catalog_object.catalog_version; : } > perhaps remove some redundancy this way: Done http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1340: LOG(INFO) << "Received large catalog update(>100mb): " > Received a large catalog item? Done PS2, Line 1366: if (item.deleted) { : VLOG(3) << "Deleted item: " << item.key << " version: " : << catalog_object.catalog_version; : } else { : VLOG(3) << "Added item: " << item.key << " version: " : << catalog_object.catalog_version; : } > same suggestion as in catalog_server to reduce redundancy. Done http://gerrit.cloudera.org:8080/#/c/7731/2/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 86: 3: required bool deleted = false; > Isn't it better to not have a default value to make sure this is always set Done http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: Line 56: * determine which catalog objects were deleted since the last catalog update topic. > catalog topic update Done Line 57: * Once the catalog update topic is constructed, the old deleted catalog objects are > catalog topic update Done Line 74: // Retrieve all the removed catalog objects with version > 'fromVersion'. > /** style comment Done http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: PS2, Line 152: . > the word "delta" implies that we're dealing with inserts, deletes, and poss Done Line 347: // Identify the catalog objects that were updated (added/dropped) in the catalog with > /** style comment Done Line 351: Set addedCatalogObjectKeys = Sets.newHashSet(); > updatedCatalogObjects? Done Line 363: TCatalogObject catalogTbl = new TCatalogObject(TCatalogObjectType.TABLE, > Why not just iterate over db.getTables()? Done Line 441: addedCatalogObjectKeys.add(CatalogDeltaLog.toCatalogObjectKey(privilege)); > Instead of having a duplicate add() in all places, how about creating the a Done Line 448: for (TCatalogObject removedObject: deltaLog_.retrieveObjects(fromVersion)) { > Nice! Much clearer. Done -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 18: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/7955/18/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS18, Line 117: DCHECK_LE(len, numeric_limits::max()); Is this actually needed? Isn't the cast in the line above going to fail if the condition in the DCHECK is false? http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: PS18, Line 155: Catalog nit: catalog -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 18: Taking a look now. -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. Patch Set 6: (22 comments) http://gerrit.cloudera.org:8080/#/c/8014/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS1, Line 2876: don't think you need that PS1, Line 2878: same here Line 2891: | UNMATCHED_STRING_LITERAL:l expr:e nit: extra spaces (see marked as red). http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS2, Line 273: KW_UNBOUNDED, KW_UNCACHED, KW_UNION, KW_UPDATE, KW_UPDATE_FN, KW_UPSERT, KW_USE, nit: long line (see vertical separator, that's ok for .test files but for everything else, we try to stay < 90). http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS6, Line 337: rules.add(BoolTestToCompoundRule.INSTANCE); Maybe add a brief comment about this rule as well? PS6, Line 1093: it nit: them PS6, Line 1098: For all conjuncts other than : // the ones listed below, the method should be a no-op. remove. Similar comment above (L1082). http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java: PS2, Line 26: predicate that tests a boolean value using IS. "a boolean test predicate."? I think you can also add the grammar spec here besides the example. PS2, Line 33: it to be executable, i.e., it is illegal to call For literals we usually make them static as well; also, remove "_" from the name. PS2, Line 38: ivate No need for "this". We use "_" in the name to indicate private fields. PS2, Line 41: super(); : this.isNegated_ = You can use the addChild() function here, same thing. PS2, Line 88: Will that work with something like "select 1 is true"? For instance, "select 1 = true" returns true. Type is not boolean but it can be casted into one. PS2, Line 94: new AnalysisException("Ex Similar comment as above. e.g. select 1 is 1; http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java: PS2, Line 29: expressions nit re terminology: "compound predicate". Technically, it is an expression (subclass of Expr) but try to use the more specific term. http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java: PS6, Line 29: to compound expressions "a compound predicate." Line 64: } Preconditions.checkNotNull(result); http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: PS6, Line 578: nit: extra space PS6, Line 580: TestBoolTestPredicate A few more test cases to consider: 1. bool test predicate in the select list 2. nested bool test predicates, e.g. ((bool_col is true) is true) 3. other bool exprs than bool_col, e.g. function returning bool, case expr 4. wrap the bool test in an istrue/isfalse function PS6, Line 586: null Also use "unknown" PS6, Line 591: where 1 is true" This doesn't seem to be consistent with how we handle for instance expr like 1 = true (we allow that). PS6, Line 591: AnalysisError Add a few more negative cases. Example: 1. subquery 2. literals that can't be cast to true/false, e.g. 10 is true http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: PS6, Line 168: // TODO: figure out how/if parens are printed to reflect expression tree. Did you figure that out? If so, remove the TODO. -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/7955/13/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java File fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java: PS13, Line 43: return 0; > It can throw an OutOfMemory*Error* which is fatal. Are you saying we should It is but we're catching it and trying to release resources. We need to make sure that this path works as expected. Right? -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/7955/13/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java File fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java: PS13, Line 43: return 0; > Not sure I understand, NBAOS.tryAllocate() reads this 0 and throws IllegalS My point is that the allocator doesn't only return 0, is it? It could always throw an OutOfMemoryException which the tryAllocate will catch, try to free any allocated memory and then re-throw. Shouldn't we be testing that path as well? -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 13: (2 comments) http://gerrit.cloudera.org:8080/#/c/7955/13/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java: PS13, Line 100: private void tryAllocate(long len) { : Preconditions.checkState(bufferPtr_ >= 0); : try { : if (len <= 0) { : throw new IllegalArgumentException(INVALID_BUFFER_LEN_MSG + ": " + len); : } : if (len >= BUFFER_MAX_SIZE_LIMIT) { : throw new IllegalArgumentException(BUFFER_LIMIT_EXCEEDED_MSG + ": " + len); : } : long newBufferPtr = nativeAllocator_.allocate(bufferPtr_, len); : if (newBufferPtr == 0) { : throw new IllegalStateException(ALLOCATION_FAILED_MSG + ": " + len : + " Is realloc: " + (bufferPtr_ > 0)); : } : bufferPtr_ = newBufferPtr; : bufferLen_ = len; : } catch (Throwable e) { : nativeAllocator_.free(bufferPtr_); : throw e; : } : } I think that belongs to the Allocator class. http://gerrit.cloudera.org:8080/#/c/7955/13/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java File fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java: PS13, Line 43: return 0; Shouldn't the allocate throw an OutOfMemoryException as well? -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/statestore/test_statestore.py 16 files changed, 438 insertions(+), 331 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/2 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 1: (28 comments) http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 288: BuildTopicUpdates(catalog_objects.updated_objects, false); > Is there a requirement/assumption that the deletions must come after the up No, it doesn't matter. Line 300: void CatalogServer::BuildTopicUpdates(const vector& catalog_objects, > IMO, it is more readable if the signature of BuildTopicUpdates takes TGetAl These two "do something" blocks are almost identical. I refactored the code a bit to make it a bit less flaky. Line 333: << catalog_object.catalog_version; > indent Done http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: Line 154: /// determine items that have been deleted, it saves the set of topic entry keys sent > Update comment Done Line 164: bool topic_deletions); > What does this param do? Updated the comment. http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler-test-util.cc File be/src/scheduling/scheduler-test-util.cc: Line 481: item.key = host.ip; > Use __set fn like above to be consistent Done http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: Line 141: if (delta.is_delta && delta.topic_entries.empty()) { > single line Done http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1323: TCatalogObject catalog_object; > Please move this right before DeserializeThriftMsg() so we can see where it Done Line 1343: << item.key << " is " > indentation Done Line 1367: LOG(INFO) << "Deleted item: " << item.key << " version: " << catalog_object.catalog_version; > That's a lot of logging, sure we need this at INFO level? Sorry, that was for debugging. Line 1398: if (existing_object.catalog_version <= catalog_object.catalog_version) { > Shouldn't this be < and not <=? Done PS1, Line 1406: batch_size_bytes > Looks like we didn't account this earlier for deletions. This logic makes m Done Line 1527: if (item.deleted) { > Is it possible that within a single list of topic_entries there is both an No, it's not possible to have both an addition and a deletion in the same list of topic_entries. http://gerrit.cloudera.org:8080/#/c/7731/1/be/src/statestore/statestore.cc File be/src/statestore/statestore.cc: Line 485: item.value.empty() ? Statestore::TopicEntry::NULL_VALUE : item.value, > Why not just 'item.value'? The value gets copied. Good point. Done Line 527: deleted_key_size_bytes -= itr->first.size(); > += ? Done http://gerrit.cloudera.org:8080/#/c/7731/1/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 92 > - I kind of like this separation of updates and deletions in different list By switching to TTopicItem for topic deletions there is now significant overlap between operations performed on both added and deleted items. We could try to extract most of the common logic in separate functions and keep the existing logic in place, but for now it seems simple enough to have one loop over one list of items and separate the logic with an if statement when needed. If we had to add many of these if statements in multiple places, I'd agree that splitting the logic would have been better. Line 97: 5: required bool is_delta; > fix member numbering Done http://gerrit.cloudera.org:8080/#/c/7731/1/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: Line 56: private MapremovedCatalogObjectVersionsByKey_ = Maps.newHashMap(); > I might be wrong, but I think we can end up having the same object deleted Good catch, that was a bug indeed. Removed the second map and simplified the logic around handling the deleted objects. Line 70:* key 'key'. > update comment No longer applicable. Function is removed. Line 90: public synchronized void garbageCollect(long currentCatalogVersion) { > Although these methods are synchronized, the underlying maps aren't thread That's not correct. The essence of having synchronized functions is exactly that, to ensure proper synchronization even though the underlying collections are not thread safe. Two threads calling garbageCollect and addRemoved cannot both proceed, one will grab the lock on the CatalogDeltaLog object and make progress while the other will simply block. In any case, this class is simplified and it only contains one collection. Line 154: private String toCatalogObjectKey(TCatalogObject
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/7955/9/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java: PS9, Line 159: public void writeTo(OutputStream out) { : throw new IllegalStateException("Not implemented."); : } : : public byte toByteArray()[] { : throw new IllegalStateException("Not implemented."); : } : : public int size() { : throw new IllegalStateException("Not implemented."); : } > These are from ByteArrayOutputStream. Given we mimic it, someone might expe I get that, but you're not extending the ByteArrayOutputStream class, you're extending OutputStream. So, either change that or remove anything that is not defined in the parent class. -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 9: (9 comments) http://gerrit.cloudera.org:8080/#/c/7955/9/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java: PS9, Line 67: . nit: remove PS9, Line 94: reAllocateMemory Unsafe.reallocateMemory() PS9, Line 159: public void writeTo(OutputStream out) { : throw new IllegalStateException("Not implemented."); : } : : public byte toByteArray()[] { : throw new IllegalStateException("Not implemented."); : } : : public int size() { : throw new IllegalStateException("Not implemented."); : } These are not defined in OutputStream, so what is the point of defining them here? Am I missing something? http://gerrit.cloudera.org:8080/#/c/7955/9/fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java File fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java: PS9, Line 38: @SuppressWarnings("restriction") explain? Also add a brief description of what this class tests. PS9, Line 78: huge "huge" doesn't really convey much information here. Maybe a bit more explicit what we're testing (>1GB, < 4GB)? Also, you may want to test multiple sizes to exercise all paths in the reallocation logic (e.g. 500MB, 1GB, 1.5GB, 2GB). You can decide which ones make more sense. PS9, Line 85: char[] chars = new char[128 * 1024 * 1024]; : Arrays.fill(chars, 'a'); : String hugeString = new String(chars); nit: see if you can use Guava's Strings.repeat() here, may be cheaper. PS9, Line 99: nit: extra space PS9, Line 99: Create a huge string of size 512MB. The combined size of the test object : // crosses 4GB. Maybe say something like "create an object with a serialization size which is larger than 4GB" PS9, Line 105: never ha, famous last words :) -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 8: (2 comments) Flushing minor comments from patch #8 and switching to patch #9. http://gerrit.cloudera.org:8080/#/c/7955/8/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: PS8, Line 603: // Struct containing eight strings, used for testing. Mention what are we testing with this weird looking struct. Also, leave a TODO to move this out of here. http://gerrit.cloudera.org:8080/#/c/7955/8/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: PS8, Line 51: import org.apache.impala.thrift.TNativeByteBuffer; dup (L49) -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 4: > For this options serialization question, we have a C++ > implementation of "set x=y" -> Thrift at impala::SetQueryOption() > (https://github.com/apache/incubator-impala/blob/545eab6d6202ca3968279d14fad28bd2a6d566f6/be/src/service/query-options.cc#L113 > ). I don't know how important it is to keep one implementation of > that, but that one can be made available to us here. Good point. It's not clear how are we going to expose that to the PlannerTests through. Thoughts? -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 4: > > > > (1 comment) > > > > > > The tests in this change use only 3 query options. Adding a new > > > section to the .test files to support only these 3 options > > wouldn't > > > be too much work. > > > > > > On the other hand, adding support for all the query options > that > > > Impala supports would be a lot harder. Probably we would have > to > > > implement that using some Java reflection trickery. > > > > I don't think you need to use Java reflection. The generated Java > > class for TQueryOptions has a number of helper functions to > search > > and set a field by name. So, for instance the QUERY_OPTIONS > section > > could have key=value pairs that correspond to query options. Then > > we could write a small function that parses the key value pairs > and > > uses the helper functions to check for valid query options and > set > > the values. Do you want to give it a try? Thanks > > Thanks. I'm still confused though how to implement setting enum > query options without reflection. e.g.: > QUERYOPTIONS > COMPRESSION_CODEC=GZIP > > Here we have to know that the type of TQueryOptions.compression_codec > is org.apache.impala.thrift.THdfsCompression, otherwise we cannot > parse "GZIP". Am I missing something? If you detect that this query option refers to a compression_codec, the only thing you need to do is map the string literal "GZIP" to an instance of HdfsCompression (the latter is just an enum). From that you can get the THdfsCompression object that you pass in the setFieldValue() method. You don't need to handle all the query options in one go. You can add the infrastructure needed, i.e. parsing the key=value pairs, validating the key part and then simply add the logic to handle the options we're interesting in for this patch. Once the infrastructure is in place, then adding support for other options should be quite easy, but you don't need to do everything. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] [PREVIEW] Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: [PREVIEW] Use native allocation while building catalog updates .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/7955/3//COMMIT_MSG Commit Message: PS3, Line 20: 4-5GB You need to comment about the limit (4GB) that currently thrift is imposing. http://gerrit.cloudera.org:8080/#/c/7955/3/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: PS3, Line 142: This method serializes :* the output of CatalogService#getCatallogObjects() into native memory allocated :* using a custom TNativeSerializer and returns the corresponding TNativeByteBuffer :* serialized into bytes. Alternatively: "Uses a custom serializer (TNativeSerializer) to serialize the catalog objects into a byte buffer that is backed by native memory." http://gerrit.cloudera.org:8080/#/c/7955/3/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java: PS3, Line 36: 512MB update value Line 54: // Limit on the size to which the underlying buffer can grow Comment why that limit exists PS3, Line 103: off offset PS3, Line 115: if (bufferLen_ >= BUFFER_DOUBLING_RESIZE_LIMIT) { : while (newBufferSize < bytesWritten_ + len) { : newBufferSize += BUFFER_RESIZE_INCREMENTS; : } : } else { : while (newBufferSize < bytesWritten_ + len) { : newBufferSize <<= 1; : } : } I think if you put the if/else block inside the while block it may be a bit easier to follow plus it will give you a slightly better behavior because you'll be doubling as long as newBufferSize < 1GB and then you'll switch to incremental increases if you cross the 1GB boundary. PS3, Line 135: public synchronized void reset() { : } single line (here and for some of the other functions as well). -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 4: > > (1 comment) > > The tests in this change use only 3 query options. Adding a new > section to the .test files to support only these 3 options wouldn't > be too much work. > > On the other hand, adding support for all the query options that > Impala supports would be a lot harder. Probably we would have to > implement that using some Java reflection trickery. I don't think you need to use Java reflection. The generated Java class for TQueryOptions has a number of helper functions to search and set a field by name. So, for instance the QUERY_OPTIONS section could have key=value pairs that correspond to query options. Then we could write a small function that parses the key value pairs and uses the helper functions to check for valid query options and set the values. Do you want to give it a try? Thanks -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] [PREVIEW] Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: [PREVIEW] Use native allocation while building catalog updates .. Patch Set 1: (17 comments) http://gerrit.cloudera.org:8080/#/c/7955/1/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS1, Line 106: RETURN_IF_ERROR(DeserializeThriftMsg(jni_env, result_bytes, )); What happens if DeserializeThriftMsg returns a non-ok status? Who will release the allocated memory? PS1, Line 108: uint32_t len = static_cast(nbuffer.buffer_len); Isn't buffer_len i64? What will happen if the cast fails? PS1, Line 110: buf Interestingly, DeserializeThriftMsg() casts the const away of the first param (buf). The comment in this function still says that it is treated as a const even though the const is casted away but let's make sure we check the implementation to ensure nothing bad happens to buf after that call. PS1, Line 111: desrialize nit: deserialize (typo) PS1, Line 113: free(buf); > I used free after checking that the unsafe.freeMemory() also calls the same Yes, I think we need to place it safe here. If the implementation changes in some later Java version, we may run into issues. http://gerrit.cloudera.org:8080/#/c/7955/1/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: PS1, Line 596: native byte buffer In the context of reading this thrift file, I don't think it is clear what "native byte buffer" is. Maybe say at the beginning that it represents a buffer allocated by the JVM using native memory (unsafe). http://gerrit.cloudera.org:8080/#/c/7955/1/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: Line 137:* Gets all catalog objects Please expand the comment. We're adding some non-trivial behavior here. http://gerrit.cloudera.org:8080/#/c/7955/1/fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java File fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java: Line 1: package org.apache.impala.thrift; Apache header? PS1, Line 26: @SuppressWarnings("restriction") explain? PS1, Line 27: class nit: make it final (do you plan to extend it?) PS1, Line 33: private static final long BUFFER_DOUBLING_RESIZE_LIMIT = 1 * 1024 * 1024 * 1024; /* 1GB */ nit: long line PS1, Line 38: length length (in bytes) PS1, Line 39: protected why protected? PS1, Line 48: public NativeByteArrayOutputStream() { : this(BUFFER_INITIAL_SIZE_DEFAULT); : } single line? PS1, Line 57: // Unsafe#allocateMemory() can handle negative inputs. Why do I need to know about this here? Maybe remove. PS1, Line 94: if (bufferLen_ >= BUFFER_DOUBLING_RESIZE_LIMIT) { : newBufferSize = bufferLen_ + BUFFER_RESIZE_INCREMENTS; : } else { : newBufferSize = bufferLen_ << 1; : } How do you guarantee that newBufferSize > bytesWritten_ + len? E.g. bufferLen_ = 128MB, len = 1GB. Isn't bufferLen_ going to be 256MB although you need 1128MB? Am I missing something? http://gerrit.cloudera.org:8080/#/c/7955/1/fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java File fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java: PS1, Line 31: native by array nit: "native by array"? -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] [PREVIEW] Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: [PREVIEW] Use native allocation while building catalog updates .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7955/1/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS1, Line 113: free(buf); This freaks me out a bit. Any documentation I've read says that you should be using Unsafe.freeMemory() to release memory you allocated using Unsafe. How do we know this is safe? -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7564/2/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: PS2, Line 87: def test_distrib_plan_filter_pruning(self, vector): : vector.get_value('exec_option')['num_nodes'] = 0 : self.run_test_case('QueryTest/runtime_filters_distrib_pruning', vector) : : def test_singlenode_plan_filter_pruning(self, vector): : vector.get_value('exec_option')['num_nodes'] = 1 : self.run_test_case('QueryTest/runtime_filters_singlenode_pruning', vector) > Let me give you some context: Can't we expand our testing infrastructure to enable query options to be set on a per query basis in addition to those set before the call to runPlannerTestFile()? I am not suggesting supporting SET statements inside the .test files but we could add another section e.g. QUERY_OPTIONS, that we parse and use to set query options accordingly. You may want to take a look at PlannerTestBase.java (testPlan() function) and see how much work it would require to do something like that. -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5850: Cast sender partition exprs under unions.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5850: Cast sender partition exprs under unions. .. Patch Set 1: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/7884/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: PS1, Line 184: In particular, partitioned hash joins under a union are :* treated as different series of joins (could have different data partitions). Can you expand the comment to explain what 'node' represents? Also, that part of the comment doesn't really help understanding this function and is similar to the comment in L178. Maybe remove? http://gerrit.cloudera.org:8080/#/c/7884/1/testdata/workloads/functional-query/queries/QueryTest/joins.test File testdata/workloads/functional-query/queries/QueryTest/joins.test: PS1, Line 761: nit: extra space -- To view, visit http://gerrit.cloudera.org:8080/7884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0aa801bcad8c2324d848349c7967d949224404e0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3548: Prune runtime filters based on query options in the FE
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3548: Prune runtime filters based on query options in the FE .. Patch Set 2: (4 comments) Just some answers on the comments. Haven't looked at the new patch yet. http://gerrit.cloudera.org:8080/#/c/7564/2/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS2, Line 119: singleNodePlan > The single node plan is created on L104 and assigned to 'singleNodePlan'. D Let's expand to the comment to mention that 'singleNodePlan' now points to the root of the distributed plan. PS2, Line 117: // create runtime filters : if (ctx_.getQueryOptions().getRuntime_filter_mode() != TRuntimeFilterMode.OFF) { : RuntimeFilterGenerator.generateRuntimeFilters(ctx_, singleNodePlan); : ctx_.getAnalysisResult().getTimeline().markEvent("Runtime filters computed"); : } > The block was moved here because the distributed plan is created on L114 an Thanks for explaining. http://gerrit.cloudera.org:8080/#/c/7564/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: PS2, Line 165: tFilterTarget.setIs_bound_by_partition_columns(isBoundByPartitionColumns); : tFilterTarget.setIs_local_target(isLocalTarget); > The BE uses these flags to create instances of Coordinator::FilterTarget th I see. Thanks http://gerrit.cloudera.org:8080/#/c/7564/2/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: PS2, Line 87: def test_distrib_plan_filter_pruning(self, vector): : vector.get_value('exec_option')['num_nodes'] = 0 : self.run_test_case('QueryTest/runtime_filters_distrib_pruning', vector) : : def test_singlenode_plan_filter_pruning(self, vector): : vector.get_value('exec_option')['num_nodes'] = 1 : self.run_test_case('QueryTest/runtime_filters_singlenode_pruning', vector) > These are written as query tests and not planner tests because the planner I don't follow. Did you check how the query option is set in testRuntimeFilterPropagation() in PlannerTest.java? -- To view, visit http://gerrit.cloudera.org:8080/7564 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0f0b200e02442edcad8df3979f652d66c6e52eb Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5836: Improvements to Eclipse frontend configuration.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5836: Improvements to Eclipse frontend configuration. .. Patch Set 1: Code-Review+2 Thank you! -- To view, visit http://gerrit.cloudera.org:8080/7803 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia723fbf706cf409a8fb6b5ff0297c2b1ff7c9590 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate subqueries .. Patch Set 5: Code-Review+2 Rebase, keep Alex's +2 -- To view, visit http://gerrit.cloudera.org:8080/7706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries
Dimitris Tsirogiannis has uploaded a new patch set (#4). Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate subqueries .. IMPALA-5531: Fix correctness issue in correlated aggregate subqueries This commit fixes an issue where a query will return wrong results if it has an aggregate subquery with a correlated inequality predicate. Since the rewrite for this case is not currently supported, an exception is now thrown during the analysis. Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab --- M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java 3 files changed, 115 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/7706/4 -- To view, visit http://gerrit.cloudera.org:8080/7706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate subqueries .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/7706/3/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: Line 1417: public static String toSqlHelper(List exprs) { > listToSql()? Done http://gerrit.cloudera.org:8080/#/c/7706/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: Line 399: // Ensure that all the extracted correlated predicates can be added to the on-clause > nit: we usually use caps for clauses and such, i.e. ON-clause Done Line 648: private static void canRewriteCorrelatedSubquery(Expr expr) throws AnalysisException { > validateCorrelatedSubqueryStmt()? Done Line 676:* added to the on-clause of the join that results from the subquert rewrite; It throws > ON-clause and typo: subquery. Also use "." instead of ";" Done Line 683: Preconditions.checkNotNull(correlatedPredicates); > Preconditions.checkState(inlineView.isAnalyzed()); Done -- To view, visit http://gerrit.cloudera.org:8080/7706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate subqueries .. IMPALA-5531: Fix correctness issue in correlated aggregate subqueries This commit fixes an issue where a query will return wrong results if it has an aggregate subquery with a correlated inequality predicate. Since the rewrite for this case is not currently supported, an exception is now thrown during the analysis. Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab --- M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java 3 files changed, 115 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/7706/3 -- To view, visit http://gerrit.cloudera.org:8080/7706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate subqueries .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/7706/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: Line 341: canRewriteCorrelatedSubquery(expr, onClauseConjuncts); > Why not fold the new logic into this function and move everything down? Our Done Line 721: com.google.common.base.FunctiontoSql = > Add a static helper that does toSql() on a list. Might live in Expr or ToSq Done http://gerrit.cloudera.org:8080/#/c/7706/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: Line 733: "id %s (select %s from functional.alltypestiny t where t.bool_col = false " + > Add a correlated predicate in the usbquery that does not reference 't' from Done Line 737: AnalyzesOk(String.format("select id from functional.allcomplextypes t where id " + > Just to be sure: Your original fix broke this test right? Yes Line 777: "where t.int_struct_col.f1 = v.id)", cmpOp, aggFn), > Why this change? Forgot to revert. Done -- To view, visit http://gerrit.cloudera.org:8080/7706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate subqueries .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/7706/1/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: Line 681: List slotRefs = Lists.newArrayList(); > Might be easier/shorter: Done Line 690: // Inequality correlated predicates are not currently supported in aggregate > This bug is not specific to inequality predicates. I think it's really that Done Line 691: // subqueries (see IMPALA-5531). > We should be able to run queries with arbitrary correlated predicates if th Done Line 692: if (expr instanceof BinaryPredicate && !correlatedPredicates.isEmpty() && > Let's report which predicate makes the subquery unsupported. I am not sure the BinaryPredicate check is redundant. The first is applied on 'expr' which is the subquery expr while the latter is applied on the correlated predicates. Is this what you mean? -- To view, visit http://gerrit.cloudera.org:8080/7706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate subqueries .. IMPALA-5531: Fix correctness issue in correlated aggregate subqueries This commit fixes an issue where a query will return wrong results if it has an aggregate subquery with a correlated inequality predicate. Since the rewrite for this case is not currently supported, an exception is now thrown during the analysis. Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab --- M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java 3 files changed, 82 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/7706/2 -- To view, visit http://gerrit.cloudera.org:8080/7706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/7731 Change subject: [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects .. [PREVIEW] IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/statestore/test_statestore.py 16 files changed, 401 insertions(+), 289 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/1 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5531: Fix correctness issue in correlated aggregate subqueries
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/7706 Change subject: IMPALA-5531: Fix correctness issue in correlated aggregate subqueries .. IMPALA-5531: Fix correctness issue in correlated aggregate subqueries This commit fixes an issue where a query will return wrong results if it has an aggregate subquery with a correlated inequality predicate. Since the rewrite for this case is not currently supported, an exception is now thrown during the analysis. Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab --- M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java 3 files changed, 64 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/7706/1 -- To view, visit http://gerrit.cloudera.org:8080/7706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6ca7b60ef0543430d2f5a802285254ebb52db2ab Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4847: Simplify HdfsTable block metadata loading code
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4847: Simplify HdfsTable block metadata loading code .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7652 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I963d647bd2ba11e3843c6ef2ac6be113c74280bf Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4847: Simplify HdfsTable block metadata loading code
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4847: Simplify HdfsTable block metadata loading code .. Patch Set 2: (3 comments) Yay, more lined deleted :) http://gerrit.cloudera.org:8080/#/c/7652/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS2, Line 283: fsBlockLocationSupport nit: maybe rename to synthesizeBlocks? synthesizeBlocks = !FileSystemUtil.supporsStorageIds(fs); PS2, Line 616: fsBlockLocationSupport same comment as above PS2, Line 645: Preconditions.checkNotNull(fd); move this above L647 -- To view, visit http://gerrit.cloudera.org:8080/7652 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I963d647bd2ba11e3843c6ef2ac6be113c74280bf Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4847: Simplify HdfsTable block metadata loading code
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4847: Simplify HdfsTable block metadata loading code .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/7652/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS1, Line 252: for all partitions in 'partitions' nit: "of all the 'partitions' that ..." Line 258:* and disk IDs. Expand this to mention that we synthesize block metadata for FS that don't support disk ids. Also, mention (maybe in the beginning) that 'pathDir' could be in HDFS, S3 and ADLS. PS1, Line 331: if (partitions == null || partitions.isEmpty()) return; : RemoteIterator fileStatusIter = : FileSystemUtil.listFiles(fs, partDir, false); : if (fileStatusIter == null) return; : while (fileStatusIter.hasNext()) { : LocatedFileStatus fileStatus = fileStatusIter.next(); : if (!FileSystemUtil.isValidDataFile(fileStatus)) continue; : // For the purpose of synthesizing block metadata, we assume that all partitions : // with the same location have the same file format. : FileDescriptor fd = FileDescriptor.createWithSynthesizedBlockMd(fileStatus, : partitions.get(0).getFileFormat(), hostIndex_); : // Update the partitions' metadata that this file belongs to. : for (HdfsPartition partition: partitions) { : partition.getFileDescriptors().add(fd); : } : } With the exception of L340, this function resembles loadBlockMetadata. Maybe see if there is a nice way to merge these two. PS1, Line 581: Path tblLocation = FileSystemUtil.createFullyQualifiedPath(getHdfsBaseDirPath()); It looks like this is only used in L590. Maybe move it closer to that. PS1, Line 695: .keys() remove -- To view, visit http://gerrit.cloudera.org:8080/7652 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I963d647bd2ba11e3843c6ef2ac6be113c74280bf Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/7257/4/testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test File testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test: PS4, Line 261: nit: trailing space PS4, Line 327: Also, consider adding one or two cases with outer joins. You may expand some existing query if you want. Also, you may want to add a few cases that we know are kind of problematic, similar to q64, that has regression due to the group by or analytic function. -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. .. Patch Set 2: (2 comments) Just minor responses; thanks for the clarifications. I'll take a look at the tests when submitted. http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: PS2, Line 280: fkPkCandidate.get(0) > Might have misunderstood your comment, but I'll give it a shot. Ah yes, thanks for clarifying. I forgot the grouping effect :) PS2, Line 299: Preconditions.checkState(lhsCard >= 0 && rhsCard >= 0); > Maybe you misread the code: It's greater or equal to zero, so zero is valid oops yes. sorry for the confusion -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: PS2, Line 27: import org.apache.impala.analysis.SlotDescriptor; Is this used? PS2, Line 40: import org.apache.kudu.shaded.com.google.common.base.Joiner; replace http://gerrit.cloudera.org:8080/#/c/7257/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: PS2, Line 79: - nit: remove PS2, Line 80: ordered based on the tuple ids I am not sure I get what that ordering is. Do you mean grouped? PS2, Line 202: based on the single most :* selective join predicate. We do not attempt to estimate the joint selectivity of :* multiple join predicates to avoid underestimation. Much better. Thanks PS2, Line 280: fkPkCandidate.get(0) Can you plz explain this? I could be wrong about this but it seems that the decision of whether this is a pk/fk depends on which equi-join slots we process first. Maybe I am missing something. In either case, plz add a comment. PS2, Line 299: Preconditions.checkState(lhsCard >= 0 && rhsCard >= 0); Are we certain this is a valid precondition check? Why isn't a 0 rhsCard possible? Unless this is guaranteed to be the case... PS2, Line 337: we adjustments nit: grammar -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5489: Improve Sentry authorization for Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5489: Improve Sentry authorization for Kudu tables .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/7307/1/testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test File testdata/workloads/functional-query/queries/QueryTest/grant_revoke.test: PS1, Line 712: QUERY : insert into grant_rev_db.test_kudu_tbl values (1, "foo"); : : QUERY : upsert into grant_rev_db.test_kudu_tbl values (1, "bar"); Maybe you also want to run these statements before granting insert to show that authz fails. PS1, Line 737: SELECT hm, why 'SELECT'? PS1, Line 740: GRANT SELECT(a) ON TABLE grant_rev_db.test_kudu_tbl TO grant_revoke_test_KUDU : RESULTS : : QUERY : update grant_rev_db.test_kudu_tbl set a = "zzz" This is kind of weird to me. What if you had a where clause in the update statement on column i? Would that work or would it require select privs on i as well? My point is that insert + select (on something) = UPDATE is not intuitive. Maybe it would be simpler to just allow UPDATE only on ALL. What do you think? Line 804: drop role grant_revoke_test_ROOT; Don't you need to clean up the role here? -- To view, visit http://gerrit.cloudera.org:8080/7307 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib12d2b32fa3e142e69bd8b0f24f53f9e5cbf7460 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5547: Rework FK/PK join detection.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. .. Patch Set 1: (17 comments) I have one comment about testing. We have lots of test changes but it's not easy to verify if the end result makes sense or not. I think it may worth having a planner test with explain_level > 2 that covers a few interesting join cases so that we can see what kind of pk/fk conjuncts are detected and what the impact is on estimated join cardinalities. http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: PS1, Line 170: else if (fkPkEqJoinConjuncts_.isEmpty()) { : output.append("assumed fk/pk"); I haven't read the entire patch yet, so I am curious what this case represents. PS1, Line 178: for (int j = 0; j < slots.size(); ++j) { : output.append(slots.get(j).toString()); : if (j + 1 != slots.size()) output.append(", "); : } Can't you use the Guava's Joiner class here? Joiner.on(",").join(slots) http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: PS1, Line 37: import org.apache.kudu.client.shaded.com.google.common.collect.Maps; ? PS1, Line 271: scanSlotsByTids nit: Each pair represents two joined tables, no? So, maybe rename this to 'scanSlotsByJoinedTids'? A bit more explicit on what this thing represents. PS1, Line 281: numRows rhsTableCardinality? Line 290: nit: extra line PS1, Line 294: conjuncts conjunct slots PS1, Line 300: Preconditions.checkState(joinOp_.isInnerJoin() || joinOp_.isOuterJoin()); This private function is only called in L253? I think you can remove this check. PS1, Line 312: lhsNdv Can this ever be 0? PS1, Line 313: rhsNumRows Same here, can this be 0? PS1, Line 318: result = Math.min(result, joinCard); That part I think is very important and I am not sure it is explained anywhere. Maybe expand the comment in L209? PS1, Line 329: conjuncts conjunct slots PS1, Line 341: slots.lhs().getStats().getNumDistinctValues(); nit: these are used in couple of places. Since you already have the EqJoinConjunctScanSlots class, why don't your wrap them in some nice util functions there (e.g. getLhsNdvs() or something like that). PS1, Line 343: slots.lhs().getParent().getTable().getNumRows(); same comment as above. PS1, Line 373: Preconditions I don't think these checks are useful. This private constructor is only called through the static function that has tight control on what gets passed in this ctor. PS1, Line 382: . nit: remove PS1, Line 399: tupleDesc nit: inline -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5500: Reduce catalog update topic size .. Patch Set 10: Code-Review+2 Rebase, keep Alex's +2 -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5500: Reduce catalog update topic size .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7268 to look at the new patch set (#9). Change subject: IMPALA-5500: Reduce catalog update topic size .. IMPALA-5500: Reduce catalog update topic size Problem: IMPALA-4029 introduced the use of the flatbuffers serialization libary for storing file and block metadata. That change reduced the effectiveness of the Thrift compaction protocol (when --compact_catalog_topic is used), thereby causing a 2X increase in catalog update topic size when the compact protocol is used. Fix: LZ4 compress the catalog topic updates before sent to the statestore when --compact_catalog_topic is set to true. Results: ~4X reduction in catalog update topic size Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/service/impala-server.cc A tests/custom_cluster/test_compact_catalog_updates.py 5 files changed, 136 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/9 -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7268 to look at the new patch set (#8). Change subject: IMPALA-5500: Reduce catalog update topic size .. IMPALA-5500: Reduce catalog update topic size Problem: IMPALA-4029 introduced the use of the flatbuffers serialization libary for storing file and block metadata. That change reduced the effectiveness of the Thrift compaction protocol (when --compact_catalog_topic is used), thereby causing a 2X increase in catalog update topic size when the compact protocol is used. Fix: LZ4 compress the catalog topic updates before sent to the statestore when --compact_catalog_topic is set to true. Results: ~4X reduction in catalog update topic size Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/service/impala-server.cc M fe/src/main/java/org/apache/impala/analysis/TableSampleClause.java M fe/src/main/java/org/apache/impala/analysis/TimestampLiteral.java M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java A tests/custom_cluster/test_compact_catalog_updates.py 8 files changed, 139 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/8 -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5500: Reduce catalog update topic size .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/7268/7/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 335: pending_topic_updates_.pop_back(); > I think we could incorrectly end up popping two pending topic updates here. Good catch. Done http://gerrit.cloudera.org:8080/#/c/7268/7/be/src/catalog/catalog-util.cc File be/src/catalog/catalog-util.cc: Line 202: ReadWriteUtil::PutInt(output_buffer_ptr, (uint32_t)catalog_object->size()); > static_cast instead of C-style cast Done -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5500: Reduce catalog update topic size .. Patch Set 7: > The code changes look good. We don't seem to have any test coverage > for this flag though - I grepped for it and don't see it referenced > in any tests. Can we add a custom cluster test that starts up > Impala with this flag and runs a query? > > Although I'm actually wondering if this should just be the > default/only mode. It seems like compression and decompression are > probably less of a bottleneck than the network. But I added a custom test for in this patch. Do you mean an additional test? -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has uploaded a new patch set (#7). Change subject: IMPALA-5500: Reduce catalog update topic size .. IMPALA-5500: Reduce catalog update topic size Problem: IMPALA-4029 introduced the use of the flatbuffers serialization libary for storing file and block metadata. That change reduced the effectiveness of the Thrift compaction protocol (when --compact_catalog_topic is used), thereby causing a 2X increase in catalog update topic size when the compact protocol is used. Fix: LZ4 compress the catalog topic updates before sent to the statestore when --compact_catalog_topic is set to true. Results: ~4X reduction in catalog update topic size Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/service/impala-server.cc A tests/custom_cluster/test_compact_catalog_updates.py 5 files changed, 135 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/7 -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has uploaded a new patch set (#6). Change subject: IMPALA-5500: Reduce catalog update topic size .. IMPALA-5500: Reduce catalog update topic size Problem: IMPALA-4029 introduced the use of the flatbuffers serialization libary for storing file and block metadata. That change reduced the effectiveness of the Thrift compaction protocol (when --compact_catalog_topic is used), thereby causing a 2X increase in catalog update topic size when the compact protocol is used. Fix: Snappy compress the catalog topic updates before sent to the statestore when --compact_catalog_topic is set to true. Results: ~4X reduction in catalog update topic size Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-util.cc M be/src/catalog/catalog-util.h M be/src/service/impala-server.cc A tests/custom_cluster/test_compact_catalog_updates.py 5 files changed, 135 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/6 -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5500: Reduce catalog update topic size .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7268/5/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 358: RETURN_IF_ERROR(Codec::CreateCompressor(nullptr, false, THdfsCompression::SNAPPY, > Why not LZ4 to be consistent with what we do for exchanges? I don't have a LZ4 decompressor requires a pre-allocated output. The problem in this case is that we don't know the initial (pre-compressed) size of the catalog object. -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has uploaded a new patch set (#5). Change subject: IMPALA-5500: Reduce catalog update topic size .. IMPALA-5500: Reduce catalog update topic size Problem: IMPALA-4029 introduced the use of the flatbuffers serialization libary for storing file and block metadata. That change reduced the effectiveness of the Thrift compaction protocol (when --compact_catalog_topic is used), thereby causing a 2X increase in catalog update topic size when the compact protocol is used. Fix: Snappy compress the catalog topic updates before sent to the statestore when --compact_catalog_topic is set to true. Results: ~4X reduction in catalog update topic size Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/service/impala-server.cc M be/src/service/impala-server.h A tests/custom_cluster/test_compact_catalog_updates.py 5 files changed, 138 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/7268/5 -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5500: Reduce catalog update topic size
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5500: Reduce catalog update topic size .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7268/4/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: Line 328: Status status = thrift_serializer_.Serialize(_object, _object); > Let's try to avoid the extra copy for the !FLAGS_compact_catalog_topic case Done -- To view, visit http://gerrit.cloudera.org:8080/7268 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f725cd8596205e6101d5b56abf08125faa30b0a Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes