reductionista commented on pull request #525:
URL: https://github.com/apache/madlib/pull/525#issuecomment-732492520
Thanks for testing it out, Frank!
You're the first to test this with TensorFlow 1.13, and based on one of
the errors you're seeing, I'm guessing you're also the first to test it with
gpdb5. I only tested it on gpdb6 and postgres.
> ```
> WARNING: This version of tensorflow does not support XLA auto-cluster JIT optimization. HINT: upgrading tensorflow may improve performance. (seg1 slice1 10.128.0.41:40001 pid=6271)
> CONTEXT: PL/Python function "fit_transition"
> ```
>
> What does user need to do to enable XLA? I am on TF 1.13.1 currently.
Nothing, probably just upgrading to TF 1.14. I should add the specific
minimum version requirement to the warning message. I think XLA auto-clustering
was added in 1.14, but I've only used 1.10 and 1.14, so I'll need to confirm that.
It could also be that it's been in there for a while and they just changed how
you access it. I'll look both of those up, and also run some more
comprehensive performance tests to see how much of a difference it actually
makes and in what kinds of setups. If it looks like it's not helping much
anyway (even though the TF docs say it should), then we can get rid of this
message... or maybe even not bother to enable it. I figure if it does give
consistently better performance across models, then we may as well let the user
know there's an easy way to get that, without having to tune any parameters or
upgrade hardware. It would also be important to know whether there are cases
where it could hurt performance.
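
For reference, this is roughly how XLA auto-clustering gets turned on for a
TF 1.x session. Treat it as a sketch only; the 1.14 minimum is the assumption
I still need to confirm:

```python
import tensorflow as tf
from distutils.version import LooseVersion

# Assumed minimum version for XLA auto-clustering; still to be confirmed.
MIN_XLA_TF_VERSION = '1.14'

def make_session_config():
    config = tf.ConfigProto()
    if LooseVersion(tf.__version__) >= LooseVersion(MIN_XLA_TF_VERSION):
        # Enable XLA auto-cluster JIT for ops run in sessions created with
        # this config; older TF versions just skip it (and get the warning).
        config.graph_options.optimizer_options.global_jit_level = \
            tf.OptimizerOptions.ON_1
    return config

# Usage: sess = tf.Session(config=make_session_config())
```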
> ```
> ERROR: plpy.Error: madlib_keras_fit_multiple_model error: No GPUs configured on hosts. (plpython.c:5038)
> CONTEXT: Traceback (most recent call last):
>   PL/Python function "madlib_keras_fit_multiple_model", line 23, in <module>
>     fit_obj = madlib_keras_fit_multiple_model.FitMultipleModel(**globals())
>   PL/Python function "madlib_keras_fit_multiple_model", line 147, in __init__
>   PL/Python function "madlib_keras_fit_multiple_model", line 295, in get_accessible_gpus_for_seg
>   PL/Python function "madlib_keras_fit_multiple_model"
> ```
>
> so it looks like `use_gpus=NULL` is now defaulting to `TRUE` but it should
default to `FALSE`, i.e., CPUs. It used to default to CPUs.
Good catch! For some reason, I got a different error when I tried to
reproduce this (about using BOOLEAN::None in a query). But I think I see the
cause of it--fixing. I thought we had some dev-check tests for this; I'll add
one if we don't.
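
The fix I have in mind is basically to coerce a NULL `use_gpus` to FALSE
before anything touches the GPU code path. A minimal sketch
(`normalize_use_gpus` is a hypothetical helper name, not the actual function):

```python
# Sketch only: plpy passes SQL NULL into PL/Python as None, so treat None
# the same as FALSE to keep the default on CPUs.
def normalize_use_gpus(use_gpus):
    return False if use_gpus is None else bool(use_gpus)
```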
> ```
> ERROR: plpy.SPIError: PRIMARY KEY and DISTRIBUTED BY definitions incompatible
> HINT: When there is both a PRIMARY KEY, and a DISTRIBUTED BY clause, the DISTRIBUTED BY clause must be equal to or a left-subset of the PRIMARY KEY
> CONTEXT: Traceback (most recent call last):
>   PL/Python function "madlib_keras_fit_multiple_model", line 24, in <module>
>     fit_obj.fit_multiple_model()
>   PL/Python function "madlib_keras_fit_multiple_model", line 241, in fit_multiple_model
>   PL/Python function "madlib_keras_fit_multiple_model", line 509, in init_model_output_tbl
>   PL/Python function "madlib_keras_fit_multiple_model"
> ```
>
> I have also seen this error when running on a single segment.
Interesting! Looks like this was due to an unreasonably strict check in the
gpdb5 server code, where it required a specific order for the keys in the
PRIMARY KEY constraint. The server team has fixed that in gpdb6:
https://github.com/greenplum-db/gpdb/commit/e572af54e4255d501d261d51d32e1f3d3cf4b9c9
I can't recall for sure if we decided we care about supporting deep learning
in gpdb5 for this release. But since it's an easy fix, I'll just reverse the
order of the keys so that it works on both.
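
To illustrate what gpdb5 is complaining about, here is a rough sketch of the
kind of DDL the model output table creation involves (table and column names
are illustrative, not the actual MADlib ones): listing the distribution key
first makes the DISTRIBUTED BY clause a left-subset of the PRIMARY KEY, which
both gpdb5 and gpdb6 accept.

```python
# Illustrative only: real table/column names in MADlib differ.  The point is
# the key order -- gpdb5 requires the DISTRIBUTED BY column to be a leading
# column of the PRIMARY KEY.
create_output_table_sql = """
    CREATE TABLE model_output_tbl (
        dist_key       INTEGER,
        mst_key        INTEGER,
        model_weights  BYTEA,
        PRIMARY KEY (dist_key, mst_key)  -- dist_key first, so it is a left-subset
    ) DISTRIBUTED BY (dist_key)
"""
# Inside the PL/Python function this would be run via plpy.execute(...).
```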