rahul3 commented on pull request #18:
URL: https://github.com/apache/incubator-nlpcraft/pull/18#issuecomment-903142689
Hello @skhdl ,
Thank you for the review and the remarks. While all of the remarks are
certainly valid and need to be worked on, I find that some of them are out of
scope for this pull request. In my opinion, this is a large architectural
change and we need to make progress incrementally. I would recommend raising
separate tickets for things such as changing the model download location, for
one. My replies are below:
> 3.1) We have many suspicious warnings like below, it can confuse users.
src/fasttext.cc:701:26: warning: comparison of integers of different signs:
'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long long') [-Wsign-compare]
for (size_t j = 0; j < dim; j++) {
~ ^ ~~~
I think this is a valid concern, but it is out of scope and out of our
control given that we use FastText. The warnings come from the [FastText
installation](https://fasttext.cc/docs/en/python-module.html) step, where the
gcc compiler emits them while building FastText's C++ sources.
> 3.2) progress bar - very good, but is it possible to decrease progress
step ?
.....
[INFO] 931227/4398040K
[INFO] 931243/4398040K
[INFO] 931259/4398040K
.....
Downloading is done via a Maven plugin. From what I have seen (on the
internet), two plugins are commonly used for this, which gives us two options:
either show progress as a byte counter (m/n K) or as a series of
'`.......`'. The dot output is also long, but I don't think it will impact the
terminal buffer as much.
**Action taken:** I've changed it to the dot style (`.................. +`)
and switched to a different plugin (antrun). Let me know if this way is better.
The only other option is to not show download progress at all.
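For reference, a minimal sketch of what the antrun-based download could look like. This is an assumption, not the PR's actual configuration: the phase, destination path, and model URL here are illustrative. Ant's `<get>` task prints dot-style progress when `verbose` is set, and `skipexisting` gives the skip-if-present behavior:

```xml
<!-- Hypothetical sketch of an antrun download step (not the PR's exact config).
     verbose="true" prints a dot per downloaded chunk instead of byte counters;
     skipexisting="true" skips the download when the file is already on disk. -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-antrun-plugin</artifactId>
    <executions>
        <execution>
            <phase>generate-resources</phase>
            <goals><goal>run</goal></goals>
            <configuration>
                <target>
                    <get src="https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz"
                         dest="${project.basedir}/src/main/python/ctxword/data/"
                         verbose="true"
                         skipexisting="true"/>
                </target>
            </configuration>
        </execution>
    </executions>
</plugin>
```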
>Model downloaded into
<project_root>/nlpcraft/src/main/python/ctxword/data/cc.en.300.bin
Is it ok? Maybe better to download and use it from
<user_home>/.nlpcraft-python ?
Reason - if we download a new NLPCraft version (even a minor binary release)
we have to install and download everything again.
Maybe better to save these model files in a separate folder, not in the
installation folder?
I have already mentioned this in my first reply on this thread. I think it
is better to download and use the models from
<user_home>/.nlpcraft-python/<some_folder>.
This is, however, out of scope for this pull request (and for all the JIRA
tickets referenced). I would change the way we use Python first, and then make
these changes incrementally after creating a separate JIRA ticket for the
change of model location.
> 5.1) Have some warnings
./start_server.sh: line 43: deactivate: command not found
This is to deactivate a virtual environment that the user may have created
with `virtualenv`. The thing is, `virtualenv` is deactivated with `deactivate`,
but `conda` environments are deactivated with `conda deactivate`. `conda` also
used to be deactivated with plain `deactivate`, but that has changed and you
now see a DeprecationWarning. I think we can look into this once we have
successfully stepped away from using system Python.
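As a sketch, one way `start_server.sh` could handle both cases without the "command not found" warning, assuming the standard `CONDA_DEFAULT_ENV` and `VIRTUAL_ENV` variables that conda and virtualenv set when an environment is active:

```shell
# Deactivate whichever Python environment is currently active, without
# printing "deactivate: command not found" when neither tool is present.
deactivate_current_env() {
    if [ -n "${CONDA_DEFAULT_ENV:-}" ] && command -v conda >/dev/null 2>&1; then
        # conda environments are deactivated with 'conda deactivate'
        conda deactivate
    elif [ -n "${VIRTUAL_ENV:-}" ] && command -v deactivate >/dev/null 2>&1; then
        # virtualenv defines a 'deactivate' shell function on activation
        deactivate
    fi
    # if neither variable is set, there is nothing to deactivate
}

deactivate_current_env
```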
> mvn clean package (again)
Seems it doesn't download models again (right? Do you copy them to
<project_root>/nlpcraft/src/main/python/ctxword/data from some cache,
or how do they appear here?)
The build will not download the models if they are already present. As you
know, the models are large files. The Maven `clean` phase does not erase them,
and I do not think deleting the models during `clean` is a good idea, because
these large files would then need to be downloaded on every build.
As to how the models appeared there so quickly after you deleted them: the
previous plugin I used saved them to a cache. I have since changed the plugin
(because of the progress bar). Please check now.
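The skip-if-present behavior amounts to a check like this. This is a sketch only: the real check is done by the Maven download plugin, and the path is the one this PR currently uses:

```shell
# Only download the model if it is not already on disk.
MODEL="nlpcraft/src/main/python/ctxword/data/cc.en.300.bin"

if [ -f "$MODEL" ]; then
    echo "model already present, skipping download"
else
    echo "model missing, download would run here"
fi
```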
> it starts to download and build some python related stuff again. Should
it? I guess it should be one time
Yes, it should. If there are changes to the Python code or its dependencies,
the `requirements.txt` file will change. For regular use, and for the most
part, it will be fast because most of it should be cached; the installer will
take care of this.
However, I added a property that lets the developer skip conda environment
re-creation. If you do not want the conda env to be re-created, you can use:
`mvn clean compile -Dauto-rm-existing-python-env=`
Skipping the dependency install is more complex; there are a few ways of
doing it, and we can look at it as an enhancement in the future. As of now,
the dependencies are cached by `pip` after the first run in any case, so it
does not take very long.
> start_server.sh works very slow for me. Previously server started slowly
first time, but next starts were quick enough, but now all starts are too slow
on my env.
I suspect there is something going on with FastText that we can look into
as a separate issue: even when the `cc.en.200.bin.gz` model is not available,
I see it being auto-downloaded by FastText. This can be a separate ticket.
There is a separate ticket about booting up the ctxserver automatically
when we start the NLPCraft server. I think that would be a good time to look
into performance issues and ways to optimize this.
> second run - we check that models are already downloaded and do nothing
with them - Also I guess python stuff shouldn't be reinstalled. What do you
think ?
I think it should be reinstalled for now. If there is a change to the
`requirements.txt` Python dependencies (as a result of code changes), the
environment will then be adjusted automatically.
>if you want to re-download models for some reason (broken installation, etc.)
- find and delete the folder (which folder?) and run mvn clean package again
(like the first run)
This only requires deleting the models folder. We can include this in the
documentation later. I would suggest doing this AFTER we change the model
download directory.
This is also a valid concern, but I feel it is out of scope for this pull
request as well.
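When we do document it, the recovery step would look roughly like this. The path is assumed from the PR's current layout and the helper name is hypothetical:

```shell
# Force a clean re-download of the models: remove the data folder and
# rebuild, so the build's download step runs again from scratch.
force_model_redownload() {
    rm -rf nlpcraft/src/main/python/ctxword/data
    mvn clean package
}
```

(Deliberately wrapped in a function here so nothing destructive runs by accident; a user would just run the two commands by hand.)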
> Note please that we have some checks:
https://github.com/apache/incubator-nlpcraft/pull/18/checks.
Now these tests failed.
We have to support these tests. I guess we have to skip python files
installations and all related tests based on some flag.
Something like this nlpcraft/pom.xml line 392
`conda` is not installed there; I have sent an email about this to the dev
group. I could be mistaken, but I don't think the Python installation itself
would make the tests fail. I think these tests should pass if `conda` is
available; its absence is the only error I have seen so far. Would it be
possible to have conda on the git build? I'm not sure how to do this.
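As a sketch of the kind of guard such a skip flag could key off (hypothetical; the real flag would live in nlpcraft/pom.xml or the CI configuration):

```shell
# Detect whether conda is available; a build flag could skip the Python
# environment setup and the related tests when it is not.
if command -v conda >/dev/null 2>&1; then
    echo "conda available"
else
    echo "conda missing: python steps would be skipped"
fi
```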
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]