[Impala-ASF-CR] IMPALA-3224: De-Cloudera non-docs JIRA URLs
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3224: De-Cloudera non-docs JIRA URLs .. IMPALA-3224: De-Cloudera non-docs JIRA URLs John Russell is planning to fix the URLS in docs in a separate commit. Fixed using: (git ls-files | xargs replace \ 'https://issues.cloudera.org/browse/IMPALA' 'IMPALA' --) && \ git checkout HEAD docs Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182 Reviewed-on: http://gerrit.cloudera.org:8080/6487 Reviewed-by: Jim AppleTested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M shell/shell_output.py M testdata/bin/compute-table-stats.sh M testdata/bin/create-load-data.sh M testdata/bin/load-test-warehouse-snapshot.sh M testdata/bin/setup-hdfs-env.sh M tests/comparison/db_connection.py M tests/comparison/discrepancy_searcher.py M tests/comparison/query_generator.py M tests/custom_cluster/test_kudu_not_available.py M tests/stress/concurrent_select.py 11 files changed, 24 insertions(+), 36 deletions(-) Approvals: Impala Public Jenkins: Verified Jim Apple: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6487 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 6: (9 comments) http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 46: // the storage ID of a particular disk is unique across all the nodes in the cluster. > ? Oops, sorry missed that one. Done http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 112: ListMap hostIndex, Reference unknownDiskIds) > numUnknown...? Done Line 275:* File Block metadata > provide more information (such as the fact that it's used in conjunction wi Done Line 283: * Constructs the metadata of a file block from its block location metadata > "Constructs an FbFileBlock..."? Done Line 291: ListMap hostIndex, Reference unknownDiskIds) > numUnknown...? Done Line 305: boolean isReplicaCached = cachedHosts.contains(loc.getHosts()[i]); > doesn't guava have some kind of arrayutils.contains(loc.getcachedhosts(), l Not in Guava, ArrayUtils is a class in Apache Commons. Line 327: * using 'fbb' and returns the offset in the underlying buffer where the encoded file > "in the underlying...starts": fine to leave out, that's implied by fb seman Agreed but maybe it's good to explicitly mention it for future readers that may not be familiar with FB semantics. http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 287: Reference unknownDiskIds = new Reference(); > numUnknown...? Unfortunately not, generics don't work with primitive types. Miss C++ :)? Line 741:* Helper method to load the partition file metadata from scratch. This method is > is this from a rebase? Yes, this is from the performance improvements of REFRESH. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
For consistency, I believe we should at least allow an empty SORT BY() clause in the CREATE TABLE statement, but I'll defer the decision to Alex or Marcel. Dimitris On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < ger...@cloudera.org> wrote: > Lars Volker has posted comments on this change. > > Change subject: IMPALA-4166: Add SORT BY sql clause > .. > > > Patch Set 20: > > > There is still one thing that is not clear to me. Why is it allowed > > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE > > TABLE? > > The easiest way to specify an empty list of sort by columns during CREATE > TABLE is to omit the SORT BY clause altogether. To keep things simple I did > not add additional support for an empty SORT BY () clause. > > For ALTER TABLE there needs to be a way to specify an empty list of > columns, but since SORT BY is used to identify the command, the most simple > form seemed to be an empty column list(). > > Do you think we should allow CREATE TABLE SORT BY() in addition to specify > an empty set of column? > > -- > To view, visit http://gerrit.cloudera.org:8080/6495 > To unsubscribe, visit http://gerrit.cloudera.org:8080/settings > > Gerrit-MessageType: comment > Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 > Gerrit-PatchSet: 20 > Gerrit-Project: Impala-ASF > Gerrit-Branch: master > Gerrit-Owner: Lars Volker> Gerrit-Reviewer: Alex Behm > Gerrit-Reviewer: Dimitris Tsirogiannis > Gerrit-Reviewer: Lars Volker > Gerrit-Reviewer: Marcel Kornacker > Gerrit-Reviewer: Thomas Tauber-Marshall > Gerrit-HasComments: No >
[Impala-ASF-CR] IMPALA-5031: remove undefined behavior: call to strncmp with nullptr
Jim Apple has abandoned this change. Change subject: IMPALA-5031: remove undefined behavior: call to strncmp with nullptr .. Abandoned Done in another commit -- To view, visit http://gerrit.cloudera.org:8080/6767 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: Id563e81720a0a4847664fa2828ecfdcad870da5b Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-3224: De-Cloudera non-docs JIRA URLs
Jim Apple has posted comments on this change. Change subject: IMPALA-3224: De-Cloudera non-docs JIRA URLs .. Patch Set 6: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/6487 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3224: De-Cloudera non-docs JIRA URLs
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3224: De-Cloudera non-docs JIRA URLs .. Patch Set 6: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/539/ -- To view, visit http://gerrit.cloudera.org:8080/6487 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5273: Replace StringCompare with glibc memcmp
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5273: Replace StringCompare with glibc memcmp .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3224: De-Cloudera non-docs JIRA URLs
Lars Volker has posted comments on this change. Change subject: IMPALA-3224: De-Cloudera non-docs JIRA URLs .. Patch Set 5: Code-Review+2 Thank you for iterating over this. I had another look and it looks good to me. -- To view, visit http://gerrit.cloudera.org:8080/6487 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc
Dan Hecht has posted comments on this change. Change subject: IMPALA-2550: Switch to per-query exec rpc .. Patch Set 10: (20 comments) http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS10, Line 204: GetPrepareStatus > the barrier isn't really present, because getfinstancestate already waits f Still, it's not a pattern I think we should encourage in the code base (DCHECK(x) where x has side-effects). How about at least making the non-barrier explicit by shortcircuiting it like this: DCHECK(coord_instance_->IsPrepared() && coord_instance_->GetPrepareStatus().ok()); PS10, Line 413: backend_state->GetStatus(); > good catch, this could be cancelled at this point. In your current code, will the real status eventually be propagated as the query status? Or will this race cause the query to result in CANCELLED even though there was a real error? PS10, Line 424: DCHECK(query_status_.ok()); // nobody should have been able to cancel > when does the query get displayed on the debug page? you still haven't retu It gets displayed as soon as there is a client request state registered, I believe. (i.e. RegisterQuery()). Which is what I meant - webserver cancellation is definitely available before exec() returns. PS10, Line 964: #if 0 : do we really want this? > remove it unless someone says we want it. fine with me to delete. http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS11, Line 200: // fragment instance's executor will not comple DCHECK(coord_instance_->IsPrepared() && coord_instance_->GetPrepareStatus().ok()); to make it explicit that this has no side-effects. http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS10, Line 51: Prepare > i think we should do some initial setup here so we can return an error stat the renaming helped. PS10, Line 59: exec-fragment-instance-$0 > i left everything as is, because i didn't want to break existing tests. hap Which tests? CAn't they be fixed? You renamed the other thread - I think it would be best to do this one as well. The current name is really confusing because it's not plural. http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: PS11, Line 57: n) QueryState reference PS11, Line 59: exec-fragment-instance start-query-finstances http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS10, Line 195: // TODO: this might flood the log : LOG(WARNING) << "Couldn't get a client for " << query_ctx().coord_address : <<"\tReason: " << coord_status.GetDetail(); > address what, flooding the log? Yes, flooding the log. i.e. the TODO would be better if it says what should happen in the future. And before it looks like we'd remember the status via UpdateStatus() and then potentially report it back the next time we try (for periodic reports). Not saying that's the right solution but it does look like a subtle behavior change so want to think it through. I think the new behavior is probably not really worse though (for periodic reports, just report and try again later; for final report, in both old and new we'd drop it on the floor). PS10, Line 255: // TODO: should we try to keep rpc_status for the final report? (but the final : // report, following this Cancel(), may not succeed anyway.) > if the report rpc fails we fail the query on this node and then do nothing/ Okay. I think the old code used to remember this status (via UpdateStatus()) and then would report it back if it could talk to the coordinator later when the fragment is complete. But I agree the protocol needs to be fixed. Line 319: prepare_status = instance_status; > fis.h points out that all public non-getter functions block until the prepa That fis.h comment was just added. But I don't see how it's relevant to my initial comment here -- how can prepare_status be CANCELLED? http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: PS11, Line 211: arams.instance_exec_status.emplace_back(); : params.__isset.instance_exec_status = true; but why doesn't the coordinator initiate cancellation if any FIS reports error? Is this to simplify the coordinator logic? I guess I'm okay with how you have this currently, but it just seems clearer to me to keep FIS status in only one place. PS11, Line 264: despite http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.h File be/src/runtime/query-state.h: PS10, Line 59: cancelled > vague in what way? Vague because it's not clear what all the side effects of "cancel" are, and when they
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 5: (10 comments) http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 46: private ConcurrentHashMapstorageUuidToDiskId = > add trailing _ to member vars ? http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 161: * Synthesizes the block metadata of a file represented by 'fileStatus' that resides > Done i meant: how does it "synthesize" it? e.g., by creating artificial blocks on some boundary. http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 112: ListMap hostIndex, FileDescStats stats) throws IOException { numUnknown...? Line 275: } provide more information (such as the fact that it's used in conjunction with an FbFileBlock...) Line 283: String[] ip_port = location.split(":"); "Constructs an FbFileBlock..."? Line 291: numUnknown...? Line 305: // Use ~REPLICA_HOST_IDX_MASK to extract the cache info (stored in MSB). doesn't guava have some kind of arrayutils.contains(loc.getcachedhosts(), loc.gethosts()[i])? Line 327: "in the underlying...starts": fine to leave out, that's implied by fb semantics. http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 287: // Find the partition that this file belongs (if any). numUnknown...? does Reference also work? Line 741: HashMap partsByPath) { is this from a rebase? -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4623: Enable file handle cache
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4623: Enable file handle cache .. Patch Set 4: (18 comments) looking good. http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: Line 894: runtime_state_->io_mgr()->cached_file_handles_hit_count(reader_context_)); pretty awkward. why not call the reader directly (and maybe introduce a new getter)? http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: Line 441: RuntimeProfile::Counter* cached_file_handles_hit_count_; initialize to nullptr here, that way you don't need to do it in the c'tor (which you have to edit if the member order changes, or which you might have to duplicate if you multiple c'tors). do that for all pointer members. http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 25: #include why? http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr-scan-range.cc File be/src/runtime/disk-io-mgr-scan-range.cc: Line 342: io_mgr_->CacheOrCloseFileHandle(file(), exclusive_hdfs_fh_); yes, the name really needs to distinguish between cache operations and direct operations on handles. Line 548: remote_bytes_ += stats->totalBytesRead - stats->totalLocalBytesRead; why not stick that into reader_ as well? http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.h File be/src/runtime/disk-io-mgr.h: Line 231: /// Threads checkout a file handle for exclusive access and return it when finished. "check out" (checkout is a noun) Line 237: template please pull this along with hdfsfilehandle out into a separate file, this file is large enough as it is. Line 242: /// first k partitions will each have one additional file handle. let's make the size uniform across all partitions (e.g., by rounding up), that should simplify the code. Line 282: /// the partitions are aligned to cache line boundaries (64 bytes). hm, isn't there some predefined constant somewhere for the cache line size? Line 549: Status Open(bool use_file_handle_cache) WARN_UNUSED_RESULT; does it ever make sense to call open on the same scan range with different values of the bool? if not, make a parameter of the c'tor? point out clearly in the comment that if the param is true, open does nothing. Line 573: void* meta_data_; initialize nullptr here (for all pointer members) Line 587: /// Total number of bytes read remotely point out why we need to maintain this particular metric instead of in requestcontext. Line 601: /// handle, and no sharing takes place. also mention when this is populated and how long it's valid. Line 856: HdfsFileHandle* OpenHdfsFile(const hdfsFS& fs, if this always returns a cached handle, would be good to reflect that in the name. Line 860: void CacheOrCloseFileHandle(const char* fname, HdfsFileHandle* fid); rename http://gerrit.cloudera.org:8080/#/c/6478/4/tests/query_test/test_hdfs_fd_caching.py File tests/query_test/test_hdfs_fd_caching.py: Line 76: # File deleted: either # rows is the same or # rows = 0 we don't make any guarantees for this, so might as well not check the result. we just don't want to crash. Line 98: new_table_location = "{0}/test-warehouse/{1}".format(FILESYSTEM_PREFIX, unique_database) long lines Line 105: # Delete the whole directory (including data file) this function is pretty much identical to the one above (and below), except for the specific modification action. consolidate into a single function and drive that with an enum parameter. -- To view, visit http://gerrit.cloudera.org:8080/6478 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5273: Replace StringCompare with glibc memcmp
Jim Apple has posted comments on this change. Change subject: IMPALA-5273: Replace StringCompare with glibc memcmp .. Patch Set 3: Code-Review+2 Forgot to carry +2 -- To view, visit http://gerrit.cloudera.org:8080/6768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5273: Replace StringCompare with glibc memcmp
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5273: Replace StringCompare with glibc memcmp .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/538/ -- To view, visit http://gerrit.cloudera.org:8080/6768 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP .. Patch Set 6: (9 comments) http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 150: row_format_flags |= kudu::client::KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES; why start with no_flags instead of setting this directly in l149? Line 201: if (slot->type().type != TYPE_TIMESTAMP) continue; pull out timestamp slots into a separate vector (which you set up in prepare), no need to check every column on every row whether it's a timestamp. http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/exec/kudu-util.cc File be/src/exec/kudu-util.cc: Line 111: Status WriteKuduRowValue(kudu::KuduPartialRow* row, int col, PrimitiveType type, fix signature (output params go at end). also, is "row value" a kudu term (that means a column value)? to me this sounds like it's writing a whole row, which isn't the case. http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.cc File be/src/runtime/timestamp-value.cc: Line 19: #include "runtime/timestamp-value.inline.h" remove this, unless you're calling these functions here (which doesn't appear to be the case). by including this you avoided linker errors, which would have forced you to include the .inline.h in the referencing .cc files. Line 98: if (UNLIKELY(nullptr == localtime_r(, ))) { fix order http://gerrit.cloudera.org:8080/#/c/6526/5/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 193: /// Interpret 'this' as a timestamp in UTC and convert to unix time in microseconds. > I'd prefer to keep this consistent with the ToUnixTime() code (see l224), i i don't know when that's going to happen. at least leave a todo in the implementation to revisit. http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 25 are there other includes you can remove? http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.inline.h File be/src/runtime/timestamp-value.inline.h: Line 34: bool TimestampValue::UtcToUnixTime(time_t* unix_time) const { you need to prefix all of these with inline. what you have right now removes the inlining and potentially degrades performance. (you need to include the .inline.h file in every .cc file that includes the .h file right now and calls the inlined functions.) http://gerrit.cloudera.org:8080/#/c/6526/5/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test: Line 6:ts timestamp) > Done ? -- To view, visit http://gerrit.cloudera.org:8080/6526 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: Line 104: StringValue* result = reinterpret_cast(slot); > I assumed that the comparisons on StringValues work on variable length data let's disable support. (there is a jira, but the number escapes me.) with what you have here, you'd do padding for char(30) but not for char(300). it doesn't make sense. -- To view, visit http://gerrit.cloudera.org:8080/6563 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 20: > There is still one thing that is not clear to me. Why is it allowed > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE > TABLE? The easiest way to specify an empty list of sort by columns during CREATE TABLE is to omit the SORT BY clause altogether. To keep things simple I did not add additional support for an empty SORT BY () clause. For ALTER TABLE there needs to be a way to specify an empty list of columns, but since SORT BY is used to identify the command, the most simple form seemed to be an empty column list(). Do you think we should allow CREATE TABLE SORT BY() in addition to specify an empty set of column? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No