Xikui Wang has posted comments on this change.

Change subject: [ASTERIXDB-2152][FUN][COMP] Enable specifying computation 
location
......................................................................


Patch Set 12:

(2 comments)

Added two comments. One of them clearly exceeds the reviewer-friendly comment 
size limit. Sorry about that. :)

https://asterix-gerrit.ics.uci.edu/#/c/2114/12/asterixdb/asterix-common/src/main/resources/asx_errormsg/en.properties
File asterixdb/asterix-common/src/main/resources/asx_errormsg/en.properties:

PS12, Line 121: Invalid computation location
> Yes, but it might be nice to report the invalid location if one is invalid.
Oh, I misunderstood your question. I thought you were asking whether it was 
possible at all here. I will address this in the next patch.
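
Just to sketch what I have in mind for the next patch (the message template 
below is illustrative and assumes the %1$s-style placeholders used elsewhere in 
en.properties; the class name is made up for the example):

    public final class InvalidLocationMessageSketch {
        // Hypothetical en.properties-style entry: Invalid computation location %1$s
        private static final String TEMPLATE = "Invalid computation location %1$s";

        public static void main(String[] args) {
            // Prints: Invalid computation location nc42
            System.out.println(String.format(TEMPLATE, "nc42"));
        }
    }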


https://asterix-gerrit.ics.uci.edu/#/c/2114/12/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AssignPOperator.java
File 
hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AssignPOperator.java:

PS12, Line 118: setLocationConstraint
> But I'm wondering why a location constraint is always needed for an assign 
Alright, I spent some time investigating the constraints. Let me see if I can 
convince you. :) 
Note that the points below cover the UDF-in-feed case only, since we currently 
don't do anything special for UDF evaluation in regular queries.
1. The partition constraint here is slightly different from the location 
constraint on dataset operators, which is tied to physical properties. The 
location constraint here depends on the computation locations (i.e., 
partitions) and is decided dynamically during query compilation. The 
user-specified parallelism level, which plays a role similar to the count 
constraint, is translated into location constraints with computation locations 
assigned in a round-robin fashion (see the sketch after this list). 
2. We could instead assign only a count constraint and let Hyracks decide at 
runtime which node to run on. However, in the current implementation that node 
assignment is random, which cannot distribute the workload evenly. (As a side 
note, there is also a bug in the random assignment; I submitted another patch 
for it.)
3. Another possibility is to do round-robin node assignment for start tasks. 
However, Hyracks treats all tasks equally, so we can't do round-robin for the 
UDF-evaluation tasks only. Given that, I think assigning the location 
constraint here is probably the better option.
4. Currently, the location constraint for assign is only set in the feed 
context. The feed datasource obtains the list of computation nodes, and we use 
that as the count constraint for UDF evaluation. My feeling is that we already 
have the full workload-distribution information, yet by passing only a count 
we ignore that detailed answer and cross our fingers hoping Hyracks gives us a 
good one.
5. Further, once advanced load balancing is implemented in Hyracks, this 
should definitely go away. :)
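
To make point 1 concrete, here is a minimal, self-contained sketch of the 
round-robin translation (class and method names are illustrative only, not 
taken from the patch; the actual patch hands the resulting locations to 
AssignPOperator via setLocationConstraint):

    import java.util.List;

    public final class RoundRobinLocationSketch {

        // Translate a user-specified parallelism level into one location per
        // partition, cycling over the available computation nodes.
        static String[] assignLocations(List<String> computationNodes, int parallelism) {
            String[] locations = new String[parallelism];
            for (int i = 0; i < parallelism; i++) {
                // Partition i runs on node (i mod |nodes|), so the UDF-evaluation
                // partitions are spread evenly instead of being placed randomly.
                locations[i] = computationNodes.get(i % computationNodes.size());
            }
            return locations;
        }

        public static void main(String[] args) {
            List<String> nodes = List.of("nc1", "nc2", "nc3");
            // Parallelism 5 -> nc1, nc2, nc3, nc1, nc2
            System.out.println(String.join(", ", assignLocations(nodes, 5)));
        }
    }

In the actual operator the resulting array would presumably back an 
AlgebricksAbsolutePartitionConstraint rather than a plain String[], but the 
distribution logic is the same.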


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/2114
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id7eed5dac03c2f260507e16cf687162d65787bd1
Gerrit-PatchSet: 12
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Xikui Wang <xkk...@gmail.com>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Xikui Wang <xkk...@gmail.com>
Gerrit-HasComments: Yes
