[Impala-ASF-CR] IMPALA-3224: De-Cloudera non-docs JIRA URLs

2017-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3224: De-Cloudera non-docs JIRA URLs
..


IMPALA-3224: De-Cloudera non-docs JIRA URLs

John Russell is planning to fix the URLS in docs in a separate commit.

Fixed using:

(git ls-files | xargs replace \
'https://issues.cloudera.org/browse/IMPALA' 'IMPALA' --) && \
git checkout HEAD docs

Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182
Reviewed-on: http://gerrit.cloudera.org:8080/6487
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M shell/shell_output.py
M testdata/bin/compute-table-stats.sh
M testdata/bin/create-load-data.sh
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/setup-hdfs-env.sh
M tests/comparison/db_connection.py
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/custom_cluster/test_kudu_not_available.py
M tests/stress/concurrent_select.py
11 files changed, 24 insertions(+), 36 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Jim Apple: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6487
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-06 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 46: // the storage ID of a particular disk is unique across all the 
nodes in the cluster.
> ?
Oops, sorry missed that one. Done


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 112: ListMap hostIndex, Reference 
unknownDiskIds)
> numUnknown...?
Done


Line 275:* File Block metadata
> provide more information (such as the fact that it's used in conjunction wi
Done


Line 283:  * Constructs the metadata of a file block from its block 
location metadata
> "Constructs an FbFileBlock..."?
Done


Line 291: ListMap hostIndex, Reference 
unknownDiskIds)
> numUnknown...?
Done


Line 305: boolean isReplicaCached = 
cachedHosts.contains(loc.getHosts()[i]);
> doesn't guava have some kind of arrayutils.contains(loc.getcachedhosts(), l
Not in Guava, ArrayUtils is a class in Apache Commons.


Line 327:  * using 'fbb' and returns the offset in the underlying buffer 
where the encoded file
> "in the underlying...starts": fine to leave out, that's implied by fb seman
Agreed but maybe it's good to explicitly mention it for future readers that may 
not be familiar with FB semantics.


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 287:   Reference unknownDiskIds = new Reference();
> numUnknown...?
Unfortunately not, generics don't work with primitive types. Miss C++ :)?


Line 741:* Helper method to load the partition file metadata from scratch. 
This method is
> is this from a rebase?
Yes, this is from the performance improvements of REFRESH.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-06 Thread Dimitris Tsirogiannis
For consistency, I believe we should at least allow an empty SORT BY()
clause in the CREATE TABLE statement, but I'll defer the decision to Alex
or Marcel.

Dimitris

On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
ger...@cloudera.org> wrote:

> Lars Volker has posted comments on this change.
>
> Change subject: IMPALA-4166: Add SORT BY sql clause
> ..
>
>
> Patch Set 20:
>
> > There is still one thing that is not clear to me. Why is it allowed
>  > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
>  > TABLE?
>
> The easiest way to specify an empty list of sort by columns during CREATE
> TABLE is to omit the SORT BY clause altogether. To keep things simple I did
> not add additional support for an empty SORT BY () clause.
>
> For ALTER TABLE there needs to be a way to specify an empty list of
> columns, but since SORT BY is used to identify the command, the most simple
> form seemed to be an empty column list().
>
> Do you think we should allow CREATE TABLE SORT BY() in addition to specify
> an empty set of column?
>
> --
> To view, visit http://gerrit.cloudera.org:8080/6495
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
> Gerrit-PatchSet: 20
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Lars Volker 
> Gerrit-Reviewer: Alex Behm 
> Gerrit-Reviewer: Dimitris Tsirogiannis 
> Gerrit-Reviewer: Lars Volker 
> Gerrit-Reviewer: Marcel Kornacker 
> Gerrit-Reviewer: Thomas Tauber-Marshall 
> Gerrit-HasComments: No
>


[Impala-ASF-CR] IMPALA-5031: remove undefined behavior: call to strncmp with nullptr

2017-05-06 Thread Jim Apple (Code Review)
Jim Apple has abandoned this change.

Change subject: IMPALA-5031: remove undefined behavior: call to strncmp with 
nullptr
..


Abandoned

Done in another commit

-- 
To view, visit http://gerrit.cloudera.org:8080/6767
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Id563e81720a0a4847664fa2828ecfdcad870da5b
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-3224: De-Cloudera non-docs JIRA URLs

2017-05-06 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3224: De-Cloudera non-docs JIRA URLs
..


Patch Set 6: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6487
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3224: De-Cloudera non-docs JIRA URLs

2017-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-3224: De-Cloudera non-docs JIRA URLs
..


Patch Set 6:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/539/

-- 
To view, visit http://gerrit.cloudera.org:8080/6487
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5273: Replace StringCompare with glibc memcmp

2017-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5273: Replace StringCompare with glibc memcmp
..


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3224: De-Cloudera non-docs JIRA URLs

2017-05-06 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-3224: De-Cloudera non-docs JIRA URLs
..


Patch Set 5: Code-Review+2

Thank you for iterating over this. I had another look and it looks good to me.

-- 
To view, visit http://gerrit.cloudera.org:8080/6487
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I28ea06e89341de234f9005fdc72a2e43f0ab8182
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2550: Switch to per-query exec rpc

2017-05-06 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2550: Switch to per-query exec rpc
..


Patch Set 10:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS10, Line 204: GetPrepareStatus
> the barrier isn't really present, because getfinstancestate already waits f
Still, it's not a pattern I think we should encourage in the code base 
(DCHECK(x) where x has side-effects).  How about at least making the 
non-barrier explicit by shortcircuiting it like this:

DCHECK(coord_instance_->IsPrepared() && 
coord_instance_->GetPrepareStatus().ok());


PS10, Line 413:  backend_state->GetStatus();
> good catch, this could be cancelled at this point.
In your current code, will the real status eventually be propagated as the 
query status? Or will this race cause the query to result in CANCELLED even 
though there was a real error?


PS10, Line 424: DCHECK(query_status_.ok()); // nobody should have been able to 
cancel
> when does the query get displayed on the debug page? you still haven't retu
It gets displayed as soon as there is a client request state registered, I 
believe. (i.e. RegisterQuery()). Which is what I meant - webserver cancellation 
is definitely available before exec() returns.


PS10, Line 964: #if 0
  : do we really want this?
> remove it unless someone says we want it.
fine with me to delete.


http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS11, Line 200: // fragment instance's executor will not comple
DCHECK(coord_instance_->IsPrepared() && 
coord_instance_->GetPrepareStatus().ok());

to make it explicit that this has no side-effects.


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS10, Line 51: Prepare
> i think we should do some initial setup here so we can return an error stat
the renaming helped.


PS10, Line 59: exec-fragment-instance-$0
> i left everything as is, because i didn't want to break existing tests. hap
Which tests? CAn't they be fixed? You renamed the other thread - I think it 
would be best to do this one as well. The current name is really confusing 
because it's not plural.


http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS11, Line 57: n)
QueryState reference


PS11, Line 59: exec-fragment-instance
start-query-finstances


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS10, Line 195: // TODO: this might flood the log
  : LOG(WARNING) << "Couldn't get a client for " << 
query_ctx().coord_address
  : <<"\tReason: " << coord_status.GetDetail();
> address what, flooding the log?
Yes, flooding the log. i.e. the TODO would be better if it says what should 
happen in the future.

And before it looks like we'd remember the status via UpdateStatus() and then 
potentially report it back the next time we try (for periodic reports). Not 
saying that's the right solution but it does look like a subtle behavior change 
so want to think it through. I think the new behavior is probably not really 
worse though (for periodic reports, just report and try again later; for final 
report, in both old and new we'd drop it on the floor).


PS10, Line 255: // TODO: should we try to keep rpc_status for the final 
report? (but the final
  : // report, following this Cancel(), may not succeed anyway.)
> if the report rpc fails we fail the query on this node and then do nothing/
Okay. I think the old code used to remember this status (via UpdateStatus()) 
and then would report it back if it could talk to the coordinator later when 
the fragment is complete.

But I agree the protocol needs to be fixed.


Line 319:   prepare_status = instance_status;
> fis.h points out that all public non-getter functions block until the prepa
That fis.h comment was just added.

But I don't see how it's relevant to my initial comment here -- how can 
prepare_status be CANCELLED?


http://gerrit.cloudera.org:8080/#/c/6535/11/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

PS11, Line 211: arams.instance_exec_status.emplace_back();
  : params.__isset.instance_exec_status = true;
but why doesn't the coordinator initiate cancellation if any FIS reports error? 
 Is this to simplify the coordinator logic? I guess I'm okay with how you have 
this currently, but it just seems clearer to me to keep FIS status in only one 
place.


PS11, Line 264: 
despite


http://gerrit.cloudera.org:8080/#/c/6535/10/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

PS10, Line 59: cancelled
> vague in what way?
Vague because it's not clear what all the side effects of "cancel" are, and 
when they 

[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-05-06 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 46: private ConcurrentHashMap storageUuidToDiskId =
> add trailing _ to member vars
?


http://gerrit.cloudera.org:8080/#/c/6406/5/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 161:  * Synthesizes the block metadata of a file represented by 
'fileStatus' that resides
> Done
i meant: how does it "synthesize" it? e.g., by creating artificial blocks on 
some boundary.


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 112: ListMap hostIndex, FileDescStats stats) 
throws IOException {
numUnknown...?


Line 275: }
provide more information (such as the fact that it's used in conjunction with 
an FbFileBlock...)


Line 283:   String[] ip_port = location.split(":");
"Constructs an FbFileBlock..."?


Line 291: 
numUnknown...?


Line 305: // Use ~REPLICA_HOST_IDX_MASK to extract the cache info (stored 
in MSB).
doesn't guava have some kind of arrayutils.contains(loc.getcachedhosts(), 
loc.gethosts()[i])?


Line 327: 
"in the underlying...starts": fine to leave out, that's implied by fb semantics.


http://gerrit.cloudera.org:8080/#/c/6406/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 287: // Find the partition that this file belongs (if any).
numUnknown...?

does Reference also work?


Line 741:   HashMap partsByPath) {
is this from a rebase?


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4623: Enable file handle cache

2017-05-06 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4623: Enable file handle cache
..


Patch Set 4:

(18 comments)

looking good.

http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

Line 894: 
runtime_state_->io_mgr()->cached_file_handles_hit_count(reader_context_));
pretty awkward. why not call the reader directly (and maybe introduce a new 
getter)?


http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

Line 441:   RuntimeProfile::Counter* cached_file_handles_hit_count_;
initialize to nullptr here, that way you don't need to do it in the c'tor 
(which you have to edit if the member order changes, or which you might have to 
duplicate if you multiple c'tors).

do that for all pointer members.


http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/buffered-block-mgr.h
File be/src/runtime/buffered-block-mgr.h:

Line 25: #include 
why?


http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr-scan-range.cc
File be/src/runtime/disk-io-mgr-scan-range.cc:

Line 342:   io_mgr_->CacheOrCloseFileHandle(file(), exclusive_hdfs_fh_);
yes, the name really needs to distinguish between cache operations and direct 
operations on handles.


Line 548: remote_bytes_ += stats->totalBytesRead - 
stats->totalLocalBytesRead;
why not stick that into reader_ as well?


http://gerrit.cloudera.org:8080/#/c/6478/4/be/src/runtime/disk-io-mgr.h
File be/src/runtime/disk-io-mgr.h:

Line 231:   /// Threads checkout a file handle for exclusive access and return 
it when finished.
"check out" (checkout is a noun)


Line 237:   template 
please pull this along with hdfsfilehandle out into a separate file, this file 
is large enough as it is.


Line 242: /// first k partitions will each have one additional file handle.
let's make the size uniform across all partitions (e.g., by rounding up), that 
should simplify the code.


Line 282: /// the partitions are aligned to cache line boundaries (64 
bytes).
hm, isn't there some predefined constant somewhere for the cache line size?


Line 549: Status Open(bool use_file_handle_cache) WARN_UNUSED_RESULT;
does it ever make sense to call open on the same scan range with different 
values of the bool? if not, make a parameter of the c'tor?

point out clearly in the comment that if the param is true, open does nothing.


Line 573: void* meta_data_;
initialize nullptr here (for all pointer members)


Line 587: /// Total number of bytes read remotely
point out why we need to maintain this particular metric instead of in 
requestcontext.


Line 601: /// handle, and no sharing takes place.
also mention when this is populated and how long it's valid.


Line 856:   HdfsFileHandle* OpenHdfsFile(const hdfsFS& fs,
if this always returns a cached handle, would be good to reflect that in the 
name.


Line 860:   void CacheOrCloseFileHandle(const char* fname, HdfsFileHandle* fid);
rename


http://gerrit.cloudera.org:8080/#/c/6478/4/tests/query_test/test_hdfs_fd_caching.py
File tests/query_test/test_hdfs_fd_caching.py:

Line 76: # File deleted: either # rows is the same or # rows = 0
we don't make any guarantees for this, so might as well not check the result. 
we just don't want to crash.


Line 98: new_table_location = 
"{0}/test-warehouse/{1}".format(FILESYSTEM_PREFIX, unique_database)
long lines


Line 105: # Delete the whole directory (including data file)
this function is pretty much identical to the one above (and below), except for 
the specific modification action. consolidate into a single function and drive 
that with an enum parameter.


-- 
To view, visit http://gerrit.cloudera.org:8080/6478
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe5ff60971dd653c3b6a0e13928cfa9fc59d078d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5273: Replace StringCompare with glibc memcmp

2017-05-06 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-5273: Replace StringCompare with glibc memcmp
..


Patch Set 3: Code-Review+2

Forgot to carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5273: Replace StringCompare with glibc memcmp

2017-05-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5273: Replace StringCompare with glibc memcmp
..


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/538/

-- 
To view, visit http://gerrit.cloudera.org:8080/6768
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4786a4a75fdaffedd6e17cf076b5368ba4b4e3e
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5137: Support Kudu UNIXTIME MICROS as Impala TIMESTAMP

2017-05-06 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5137: Support Kudu UNIXTIME_MICROS as Impala TIMESTAMP
..


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 150:   row_format_flags |= 
kudu::client::KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES;
why start with no_flags instead of setting this directly in l149?


Line 201:   if (slot->type().type != TYPE_TIMESTAMP) continue;
pull out timestamp slots into a separate vector (which you set up in prepare), 
no need to check every column on every row whether it's a timestamp.


http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

Line 111: Status WriteKuduRowValue(kudu::KuduPartialRow* row, int col, 
PrimitiveType type,
fix signature (output params go at end).

also, is "row value" a kudu term (that means a column value)? to me this sounds 
like it's writing a whole row, which isn't the case.


http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 19: #include "runtime/timestamp-value.inline.h"
remove this, unless you're calling these functions here (which doesn't appear 
to be the case). by including this you avoided linker errors, which would have 
forced you to include the .inline.h in the referencing .cc files.


Line 98: if (UNLIKELY(nullptr == localtime_r(, ))) {
fix order


http://gerrit.cloudera.org:8080/#/c/6526/5/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 193:   /// Interpret 'this' as a timestamp in UTC and convert to unix time 
in microseconds.
> I'd prefer to keep this consistent with the ToUnixTime() code (see l224), i
i don't know when that's going to happen. at least leave a todo in the 
implementation to revisit.


http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 25
are there other includes you can remove?


http://gerrit.cloudera.org:8080/#/c/6526/6/be/src/runtime/timestamp-value.inline.h
File be/src/runtime/timestamp-value.inline.h:

Line 34: bool TimestampValue::UtcToUnixTime(time_t* unix_time) const {
you need to prefix all of these with inline.

what you have right now removes the inlining and potentially degrades 
performance. (you need to include the .inline.h file in every .cc file that 
includes the .h file right now and calls the inlined functions.)


http://gerrit.cloudera.org:8080/#/c/6526/5/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test:

Line 6:ts timestamp)
> Done
?


-- 
To view, visit http://gerrit.cloudera.org:8080/6526
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iae6ccfffb79118a9036fb2227dba3a55356c896d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet Statistics for remaining types

2017-05-06 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4815, IMPALA-4817, IMPALA-4819: Populate Parquet 
Statistics for remaining types
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6563/6/be/src/exec/parquet-column-stats.cc
File be/src/exec/parquet-column-stats.cc:

Line 104: StringValue* result = reinterpret_cast(slot);
> I assumed that the comparisons on StringValues work on variable length data
let's disable support. (there is a jira, but the number escapes me.)

with what you have here, you'd do padding for char(30) but not for char(300). 
it doesn't make sense.


-- 
To view, visit http://gerrit.cloudera.org:8080/6563
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ef4a5d25a57c82577fd498d6d1c4297ecf39312
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-06 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 20:

> There is still one thing that is not clear to me. Why is it allowed
 > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
 > TABLE?

The easiest way to specify an empty list of sort by columns during CREATE TABLE 
is to omit the SORT BY clause altogether. To keep things simple I did not add 
additional support for an empty SORT BY () clause.

For ALTER TABLE there needs to be a way to specify an empty list of columns, 
but since SORT BY is used to identify the command, the most simple form seemed 
to be an empty column list().

Do you think we should allow CREATE TABLE SORT BY() in addition to specify an 
empty set of column?

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No