skhdl edited a comment on pull request #18:
URL: https://github.com/apache/incubator-nlpcraft/pull/18#issuecomment-903092819
Hi!
I haven't reviewed the code yet, but I tried to test it:
1) I updated the sources.
2) I deleted `<user_home>/.nlpcraft-python` and all local models that were
previously downloaded into the source folder.
3) Ran `mvn clean package` from the project root folder.
Some remarks:
3.1) We get many suspicious warnings like the one below; they can confuse users.
```
src/fasttext.cc:701:26: warning: comparison of integers of different signs:
'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long long') [-Wsign-compare]
for (size_t j = 0; j < dim; j++) {
                   ~ ^ ~~~
```
3.2) The progress bar is very nice, but is it possible to decrease the progress step?
```
.....
[INFO] 931227/4398040K
[INFO] 931243/4398040K
[INFO] 931259/4398040K
.....
```
We get far too many lines in the console; they drown out all other output.
Maybe it would be better to show just 20 steps (every 5%) or 50 steps (every
2%)? (Or any other way to reduce the number of progress lines.)
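To make the suggestion concrete, a percentage threshold along these lines would cap the output at a fixed number of lines. This is only a minimal sketch; the class name and the exact log format are illustrative, and the PR's actual download code may be structured quite differently:

```python
class ProgressThrottle:
    """Print a progress line only every `step_pct` percent of completion."""

    def __init__(self, total_kb, step_pct=5):
        self.total_kb = total_kb
        self.step_pct = step_pct
        # Start below zero so the very first update (0%) is also reported.
        self.last_pct = -step_pct

    def update(self, done_kb):
        pct = done_kb * 100 // self.total_kb
        if pct - self.last_pct >= self.step_pct:
            self.last_pct = pct
            print(f"[INFO] {done_kb}/{self.total_kb}K ({pct}%)")
```

With `step_pct=5` a 4.4 GB download produces at most 21 lines (0%, 5%, ..., 100%) no matter how often `update()` is called.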
4) The model is downloaded into
`<project_root>/nlpcraft/src/main/python/ctxword/data/cc.en.300.bin`
Is that OK? Maybe it would be better to download it to, and load it from,
`<user_home>/.nlpcraft-python`?
The reason: if we install a new NLPCraft version (even a minor binary
release), we have to install and download everything again.
Maybe it is better to keep these model files in a separate folder rather than
in the installation folder?
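The resolution step could then look roughly like this. This is a sketch only: the cache location and model name are the ones from this comment, and `downloader` stands in for whatever download routine the build actually uses:

```python
from pathlib import Path

MODEL_NAME = "cc.en.300.bin"

def resolve_model_path(downloader, cache_dir=None):
    """Return the model path inside a per-user cache, downloading only if missing.

    `downloader` is a callable taking the target Path; it is a hypothetical
    stand-in for the PR's real download code.
    """
    if cache_dir is None:
        # Per-user folder as proposed above, so the model survives
        # re-installation of NLPCraft itself.
        cache_dir = Path.home() / ".nlpcraft-python" / "models"
    cache_dir.mkdir(parents=True, exist_ok=True)
    model = cache_dir / MODEL_NAME
    if not model.exists():
        downloader(model)
    return model
```

On a second build the file already exists, so nothing is downloaded and the path is simply returned.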
5)
```
cd nlpcraft/src/main/python/ctxword
./start_server.sh
```
OK, it works!
5.1) There are some warnings:
```
./start_server.sh: line 43: deactivate: command not found
IMPORTANT REMINDER: Please do not forget to set the path to your conda.sh file
DeprecationWarning: 'source deactivate' is deprecated. Use 'conda deactivate'.
```
5.2) Some duplicated log messages:
```
BERT ROOT DIR: /Users/skamov/apache/incubator-nlpcraft/nlpcraft/src/main/python/ctxword/bertft/..
BERT ROOT DIR: /Users/skamov/apache/incubator-nlpcraft/nlpcraft/src/main/python/ctxword/bertft/..
CUDA is not available for Torch.
CUDA is not available for Torch.
[2021-08-21 12:09:01]: 64618 DEBUG Registering ctxserver blueprint
[2021-08-21 12:09:01]: 64616 DEBUG Registering ctxserver blueprint
```
Note that the PIDs differ (64618 vs 64616), so two processes seem to be started.
6) I stopped the server.
7) `mvn clean package` (again).
It seems it doesn't download the models again (right? Do you copy them into
`<project_root>/nlpcraft/src/main/python/ctxword/data` from some cache, or how
do they appear there?)
However, it starts to download and build some Python-related stuff again.
Should it? I guess that should happen only once.
8) I deleted the `<project_root>/nlpcraft/src/main/python/ctxword/data` folder.
9) `mvn clean package` again. It re-creates the folder. OK.
BTW:
`start_server.sh` works very slowly for me. Previously the server started
slowly only the first time and subsequent starts were quick enough, but now
every start is too slow in my environment.
General
1) I guess we have to provide detailed instructions and explanations for users
about the installation process, e.g.:
- on the first `mvn clean package` run, the Python stuff and the models are
downloaded;
- on subsequent runs, we check that the models are already downloaded and do
nothing with them. I also guess the Python stuff shouldn't be reinstalled.
What do you think?
- if you want to re-download the models for some reason (broken installation,
etc.), find and delete the folder (which folder?) and run `mvn clean package`
again (as on the first run);
- the same applies to the Python stuff.
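For the "broken installation" case, the existence check could be strengthened with a checksum instead of just deleting folders by hand. A minimal sketch, assuming an expected SHA-256 digest would be published alongside the model (which this PR may or may not do):

```python
import hashlib
from pathlib import Path

def is_model_valid(path, expected_sha256):
    """True if the model file exists and matches the published SHA-256 digest."""
    p = Path(path)
    if not p.exists():
        return False
    h = hashlib.sha256()
    with p.open("rb") as f:
        # Hash in 1 MB chunks so multi-GB models don't need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256
```

The build could then re-download automatically whenever this returns `False`, instead of asking the user to find and delete the right folder.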
2) Please note that we have some checks:
https://github.com/apache/incubator-nlpcraft/pull/18/checks.
These tests are currently failing.
We have to keep these tests working. I guess we have to skip the Python
installation and all related tests based on some flag.
Something like this (nlpcraft/pom.xml, line 392):
```
<excludes>
<!--
Some tests skipped on maven `verify` phase.
===========================================
-->
<!-- Reason: output is too big. -->
<exclude>**/NCDateGeneratorSpec.*</exclude>
<!-- Reason: 'contextWordServer' should be started. -->
<exclude>**/NCRestModelSpec1.*</exclude>
</excludes>
```
The check is configured in `<project_root>/.github/workflows/build.yml`.
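One common way to wire such a flag is a Maven property checked by the plugin executions that install the Python stuff. A rough sketch only: the property name `nlpcraft.python.skip` and the execution wiring are hypothetical, not taken from the actual pom.xml (a `skip` parameter like this is supported by plugins such as exec-maven-plugin, but the real plugin used here may differ):

```
<properties>
    <!-- Set via -Dnlpcraft.python.skip=true (e.g. on CI) to skip
         the Python installation and the model download. -->
    <nlpcraft.python.skip>false</nlpcraft.python.skip>
</properties>
...
<configuration>
    <skip>${nlpcraft.python.skip}</skip>
</configuration>
```

The GitHub workflow could then run `mvn clean package -Dnlpcraft.python.skip=true` for the checks that don't need the context word server.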
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]