Xikui Wang has posted comments on this change.

Change subject: [ASTERIXDB-2152][FUN][COMP] Enable specifying computation 
location
......................................................................


Patch Set 12:

(2 comments)

Added two comments. One of them clearly exceeds the reviewer-friendly comment 
size limit. Sorry about that. :)

https://asterix-gerrit.ics.uci.edu/#/c/2114/12/asterixdb/asterix-common/src/main/resources/asx_errormsg/en.properties
File asterixdb/asterix-common/src/main/resources/asx_errormsg/en.properties:

PS12, Line 121: Invalid computation location
> Yes, but it might be nice to report the invalid location if one is invalid.
Oh, I misunderstood your question. I thought you were asking whether it was 
possible at all here. I will address this in the next patch.
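
Just to sketch what I have in mind for the next patch (the message template 
below is illustrative and assumes the %1$s-style placeholders used elsewhere in 
en.properties; the class name is made up for the example):

    public final class InvalidLocationMessageSketch {
        // Hypothetical en.properties-style entry: Invalid computation location %1$s
        private static final String TEMPLATE = "Invalid computation location %1$s";

        public static void main(String[] args) {
            // Prints: Invalid computation location nc42
            System.out.println(String.format(TEMPLATE, "nc42"));
        }
    }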


https://asterix-gerrit.ics.uci.edu/#/c/2114/12/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AssignPOperator.java
File 
hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AssignPOperator.java:

PS12, Line 118: setLocationConstraint
> But I'm wondering why a location constraint is always needed for an assign 
Alright, I spent some time investigating the constraints. Let me see if I can 
convince you. :) 
Note that the points below cover the UDF-in-feed case only, since we currently 
don't do anything special for UDF evaluation in regular queries.
1. The partition constraint here is slightly different from the location 
constraint on dataset operators, which is tied to physical properties. The 
location constraint here depends on the computation locations (i.e., 
partitions) and is decided dynamically during query compilation. The 
user-specified parallelism level, which plays a role similar to the count 
constraint, is translated into location constraints with computation locations 
assigned in a round-robin fashion (see the sketch after this list). 
2. We could instead assign only a count constraint and let Hyracks decide at 
runtime which node to run on. However, in the current implementation that node 
assignment is random, which cannot distribute the workload evenly. (As a side 
note, there is also a bug in the random assignment; I submitted another patch 
for it.)
3. Another possibility is to do round-robin node assignment for start tasks. 
However, Hyracks treats all tasks equally, so we can't do round-robin for the 
UDF-evaluation tasks only. Given that, I think assigning the location 
constraint here is probably the better option.
4. Currently, the location constraint for assign is only set in the feed 
context. The feed datasource obtains the list of computation nodes, and we use 
that as the count constraint for UDF evaluation. My feeling is that we already 
have the full workload-distribution information, yet by passing only a count 
we ignore that detailed answer and cross our fingers hoping Hyracks gives us a 
good one.
5. Further, once advanced load balancing is implemented in Hyracks, this 
should definitely go away. :)
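
To make point 1 concrete, here is a minimal, self-contained sketch of the 
round-robin translation (class and method names are illustrative only, not 
taken from the patch; the actual patch hands the resulting locations to 
AssignPOperator via setLocationConstraint):

    import java.util.List;

    public final class RoundRobinLocationSketch {

        // Translate a user-specified parallelism level into one location per
        // partition, cycling over the available computation nodes.
        static String[] assignLocations(List<String> computationNodes, int parallelism) {
            String[] locations = new String[parallelism];
            for (int i = 0; i < parallelism; i++) {
                // Partition i runs on node (i mod |nodes|), so the UDF-evaluation
                // partitions are spread evenly instead of being placed randomly.
                locations[i] = computationNodes.get(i % computationNodes.size());
            }
            return locations;
        }

        public static void main(String[] args) {
            List<String> nodes = List.of("nc1", "nc2", "nc3");
            // Parallelism 5 -> nc1, nc2, nc3, nc1, nc2
            System.out.println(String.join(", ", assignLocations(nodes, 5)));
        }
    }

In the actual operator the resulting array would presumably back an 
AlgebricksAbsolutePartitionConstraint rather than a plain String[], but the 
distribution logic is the same.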


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/2114
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id7eed5dac03c2f260507e16cf687162d65787bd1
Gerrit-PatchSet: 12
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Xikui Wang <xkk...@gmail.com>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Xikui Wang <xkk...@gmail.com>
Gerrit-HasComments: Yes
