Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11575 )

Change subject: IMPALA-6964: Track stats about column and page sizes in Parquet 
reader
......................................................................


Patch Set 14:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11575/12/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/11575/12/be/src/exec/hdfs-scan-node-base.cc@904
PS12, Line 904:     boost::shared_lock<boost::shared_mutex> 
bytes_read_per_col_guard_read_lock(
> Please wrap the lock in a scope block to release it after the for() stateme
Done


http://gerrit.cloudera.org:8080/#/c/11575/12/be/src/exec/hdfs-scan-node-base.cc@962
PS12, Line 962:   if (bytes_read_itr != bytes_read_per_col_.end()) {
> There's a race between unlocking and re-locking here. You can use boost:upg
boost:upgrade_lock doesn't really work for this use case. For upgrade_lock to 
work, you first have to obtain an upgradeable lock. Only a single thread can 
acquire the upgrade lock at a time. So for this use case it would be equivalent 
of just using a single exclusive lock.

There is a race condition, but it should be harmless. If the slot_id key, value 
pair is created after the shared_lock is released, but before the 
exclusive_lock is acquired, the code will just increment the the value of the 
existing key, value pair.

The R/W locking mechanism is just meant to ensure that multiple threads don't 
try to add a new key, value pair to the map at the same time.

Looking at the code a bit more though, I realized that the map should contain a 
std::atomic<int64_t>, which I've added in the most recent patch.


http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py
File tests/infra/test_utils.py:

http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py@36
PS12, Line 36: def test_get_bytes_summary_stats_counter():
> Can you double-check that this actually gets executed? :)
Yeah it does. I double checked the Jenkins job and it runs the tests in this 
class.


http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py@37
PS12, Line 37: T
> nit: remove this whitespace
Done


http://gerrit.cloudera.org:8080/#/c/11575/12/tests/infra/test_utils.py@40
PS12, Line 40:   runtime_profile = "- ExampleCounter: (Avg: 8.00 KB (8192) ; " \
> maybe pick an example that's not all the same values?
Done



--
To view, visit http://gerrit.cloudera.org:8080/11575
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I322f9b324b6828df28e5caf79529085c43d7c817
Gerrit-Change-Number: 11575
Gerrit-PatchSet: 14
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Nov 2018 21:42:10 +0000
Gerrit-HasComments: Yes

Reply via email to