Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-2979: Fix scheduling on remote hosts
......................................................................


Patch Set 24:

(3 comments)

Just passing through.

http://gerrit.cloudera.org:8080/#/c/2200/24/be/src/scheduling/simple-scheduler.cc
File be/src/scheduling/simple-scheduler.cc:

PS24, Line 1089: std::mt19937 g(rand())
You'd probably want to use a std::random_device() to seed the generator here as 
rand() returns a deterministic value always; seeing as you aren't seeding with 
srand() first. We already do this here:
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/rpc/authentication.cc#L489-490


http://gerrit.cloudera.org:8080/#/c/2200/24/be/src/scheduling/simple-scheduler.h
File be/src/scheduling/simple-scheduler.h:

PS24, Line 192: int next_unused_backend_idx_;
Shouldn't this be an atomicInt?
If multiple queries are running in parallel and they all are scheduling at the 
same time, this could break the logic.


PS24, Line 353: replica_preference: This value is used as a minimum memory 
distance for all
              :   ///                     replicas. For example, by setting 
this to DISK_LOCAL, all
              :   ///                     cached replicas will be treated as if 
they were not cached,
              :   ///                     but local disk replicas. This can 
help prevent hot-spots by
              :   ///                     spreading the assignments over more 
replicas. Allowed values
              :   ///                     are CACHE_LOCAL, DISK_LOCAL and 
REMOTE.
              :   ///
              :   /// disable_cached_reads: Setting this value to true is 
equivalent to setting
              :   ///                       replica_preference to DISK_LOCAL 
and takes precedence over
              :   ///                       replica_preference.
              :   ///
              :   /// schedule_random_replica: When equivalent backends are 
found for a scan range (same
              :   ///                          memory distance, same amount of 
assigned work), then
              :   ///                          the first one will be picked 
deterministically. This aims
              :   ///                          to make better use of OS buffer 
caches, but can lead to
              :   ///                          performance bottlenecks on 
single hosts. Setting this
              :   ///                          option to true will randomly 
change the order in which
              :   ///                          equivalent replicas are picked 
for different plan nodes.
              :   ///                          This helps to compute a more 
even assignment, on the
              :   ///                          downside of increased memory 
usage for OS buffer caches.
It be useful to mention the defaults of these query options here.


-- 
To view, visit http://gerrit.cloudera.org:8080/2200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I044f83806fcde820fcb38047cf6b8e780d803858
Gerrit-PatchSet: 24
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: anujphadke <apha...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to