Below my opinion about initial python part refactoring PR -
https://github.com/apache/incubator-nlpcraft/pull/18
I tried to review how it works and want to shortly describe my remarks
which should be fixed in current PR.
After we 'll do detailed code review code, etc
If somebody has another opinion - please let's discuss.
1) python part should be optional.
- required libs for project are just: java 11 and maven.
- `mvn clean package/verify/install` should work as in master branch
without python stuff.
- Apache actions should work
https://github.com/apache/incubator-nlpcraft/actions
- ctx server and spacy are disabled by default.
So, by default everything should be same as in master branch and
requires installed java and maven only.
2) python stuff usage.
there are few python stuff installations ways:
- via scripts,
- via maven to simplify OS depended issues.
if maven way is prefered - it should be processed with some parameter -
like `mvn clean package/verify/install -some_parameter_use_python` (if
it is possible with maven)
- required libs for project: java 11, maven, python, conda (? -is it
possible to install it on backstage?)
(Suggested flag -Dauto-rm-existing-python-env= - is good thinks I guess,
but it is about general approach)
3) main current remarks about implementation.
3.1) We can't keep such progress bar in out installation output
(Look please at attached file)
How to avoid it - another question, but our final installation product
can't print it I guess
3.2) Also we have to try to find some way to avoid warning like
>> python/fasttext_module/fasttext/pybind/fasttext_pybind.cc:346:35:
warning: comparison of integers of different signs: 'int32_t' (aka
'int') and 'std::__1::vector<long long>::size_type' (aka 'unsigned
long') [-Wsign-compare]
>> for (int32_t i = 0; i < vocab_freq.size(); i++) {
I see that some of these things are out of our control - but we have to
do some workarounds, change our approaches etc - we can't provide it to
our users (Especially progressbar)
3.3) start_server.sh
Duplicated output - should be fixed (it is still here)
>>CUDA is not available for Torch.
>>CUDA is not available for Torch.
>>[2021-08-23 12:52:42]: 80055 DEBUG Registering ctxserver blueprint
>>[2021-08-23 12:52:42]: 80057 DEBUG Registering ctxserver blueprint
3.3) server starts too slowly comparing to previous version
I think 3.3 and 3.4 - significant issues and should be resolved in this
ticket (we can't merge implementations with such degradation)
Conclusions:
I guess after these 6 points (1, 2, 3.1-3.4) resolving, we can start
deep code review and detailed tests on various OS.
Other improvements etc can be moved on next tickets.
Regards,
Sergey