rahul3 commented on pull request #18:
URL: https://github.com/apache/incubator-nlpcraft/pull/18#issuecomment-903142689


   Hello @skhdl ,
   
   Thank you for the review and the remarks. While all of them are certainly 
valid and need to be worked on, I find that some are out of scope for this 
pull request. In my opinion, this is a large architectural change and we need 
to make progress incrementally. I would recommend raising separate tickets for 
items such as changing the model download location. My replies are below:
   
   > 3.1) We have many suspicious warnings like below, it can confuse users.
   src/fasttext.cc:701:26: warning: comparison of integers of different signs: 
'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long long') [-Wsign-compare]
   for (size_t j = 0; j < dim; j++) {
   ~ ^ ~~~
   
   I think this is a valid concern, but it is out of scope and out of our 
control given that we use FastText. The warnings are produced by the compiler 
while building FastText itself during the [FastText 
installation](https://fasttext.cc/docs/en/python-module.html); they do not 
come from our code. 
   
   
   > 3.2) progress bar - very good, but is it possible to decrease progress 
step ?
   .....
   [INFO] 931227/4398040K
   [INFO] 931243/4398040K
   [INFO] 931259/4398040K
   .....
   
   Downloading is done via a Maven plugin. From what I have seen, two plugins 
are commonly used, and between them there are two display options: show the 
progress as a running `m/n K` counter, or print a series of '`.......`'. The 
dot output is also long, but it should not impact the terminal buffer as much. 
   
   **Action taken:** I've changed it to the other way (`.................. +`) 
and used a different plugin (antrun). Let me know if this way is better.  
   
   The only other option is that we do not show download progress.
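   The request to decrease the progress step could in principle also be met by 
logging only when progress crosses a coarse percentage boundary. A minimal 
sketch (the function name and step size are illustrative, not part of the 
actual plugin configuration):

```python
# Illustrative sketch: throttle progress output to fixed percentage steps
# instead of logging every downloaded chunk.
def crossed_step(prev_done: int, done: int, total: int, step_pct: int = 5) -> bool:
    """Return True only when progress crosses the next step_pct boundary."""
    prev_pct = prev_done * 100 // total
    pct = done * 100 // total
    return pct // step_pct > prev_pct // step_pct
```

With a 5% step, the `931227/4398040K`-style lines from consecutive chunks 
would collapse to roughly twenty log lines for the whole download.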
   
   >Model downloaded into 
<project_root>/nlpcraft/src/main/python/ctxword/data/cc.en.300.bin
   Is it ok? Maybe better to download and use it from 
<user_home>/.nlpcraft-python ?
   Reason - if we download a new NLPCraft version (even a minor binary release) 
we have to install and download everything again.
   Maybe better to save these model files in a separate folder, not in the 
installation folder?
   
   I have already mentioned this in my first reply on this thread. I think it 
is better to download and use the models from 
<user_home>/.nlpcraft-python/<some_folder>. This is, however, out of scope for 
this pull request (and for all the JIRA tickets referenced). I would change 
the way we use Python first and then make these changes incrementally after 
creating a separate JIRA ticket for the change of model location.
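   For when that ticket is picked up, resolving the per-user location could be 
as simple as the sketch below (the `models` subfolder name is my assumption, 
not a decided layout):

```python
from pathlib import Path

# Illustrative sketch of the future per-user model location discussed above.
def model_dir() -> Path:
    """Resolve (and create if missing) <user_home>/.nlpcraft-python/models."""
    d = Path.home() / ".nlpcraft-python" / "models"
    d.mkdir(parents=True, exist_ok=True)
    return d
```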
   
   > 5.1) Have some warnings
   ./start_server.sh: line 43: deactivate: command not found
   
   This line deactivates a `virtualenv` environment that the user may have 
created. The thing is, `virtualenv` is deactivated with `deactivate`, but 
`conda` environments are deactivated with `conda deactivate`. `conda` also 
used to be deactivated with `deactivate`, but that has changed and now 
triggers a deprecation warning. I think we can look into this once we have 
successfully stepped away from using the system Python.
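   A possible direction for that follow-up: detect which environment type is 
active before choosing the deactivation command, using the 
`CONDA_DEFAULT_ENV` / `VIRTUAL_ENV` variables each tool exports. A sketch 
(function name is illustrative):

```python
import os

# Illustrative sketch: tell which kind of Python environment is active so a
# script can pick `conda deactivate` vs plain `deactivate` accordingly.
def active_env_kind(env=None) -> str:
    env = os.environ if env is None else env
    if env.get("CONDA_DEFAULT_ENV"):
        return "conda"
    if env.get("VIRTUAL_ENV"):
        return "virtualenv"
    return "none"
```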
   
   
   > mvn clean package (again)
   Seems it doesn't download models again (right?) - Do you copy them to 
<project_root>/nlpcraft/src/main/python/ctxword/data from some cache, or how 
do they appear here?
   
   
   It will not download models if it already finds them. As you know, the 
models are large files. The Maven `clean` phase does not erase the models. I do 
not think that it is a good idea to delete the models during the `clean` phase 
because these large files will need to be downloaded each time.
   
   To your question as to how the models appear there so quickly if you deleted 
them, the previous plugin I used saved them to a cache. I have changed the 
plugin (due to the progress bar). Please check now.
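   The presence check described above boils down to something like the 
following sketch (the size threshold is my addition, to guard against a 
zero-byte file left behind by an interrupted download; it is not the plugin's 
actual logic):

```python
from pathlib import Path

# Illustrative sketch of "download only if the model is not already present".
def needs_download(model_path: Path, min_bytes: int = 1) -> bool:
    return not model_path.exists() or model_path.stat().st_size < min_bytes
```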
   
   > it starts to download and build some python related staff again. Should 
it? I guess it should be one time
   
   Yes, it should. If there are changes to the Python code and its 
dependencies, the `requirements.txt` file will change. For regular use it will 
be fast for the most part, because most of it should be cached; the installer 
will take care of this.
   
   However, I added a property that lets the developer skip the conda 
environment re-creation. If you do not want the conda env to be re-created, 
you can use:
   
   `mvn clean compile -Dauto-rm-existing-python-env=`
   
   Skipping the dependency install is more complex; there are a few ways of 
doing it, and we can look at it as an enhancement in the future. As of now, 
the dependencies are basically cached by `pip` after the first time in any 
case, so it does not take very long.
   
   > start_server.sh works very slow for me. Previously the server started 
slowly the first time, but subsequent starts were quick enough; now all starts 
are too slow on my env.
   
   I suspect that there is something going on with FastText that we can look 
into as a separate issue. Even when the `cc.en.200.bin.gz` model is not 
available, I see it being auto-downloaded by FastText. This can be a separate 
ticket.
   
   There is a separate ticket for booting up the ctxword server automatically 
when we start the NLPCraft server. I think that is a good time to look into 
performance issues and ways to optimize this.
   
   > second run - we check that models are already downloaded and do nothing 
with them - Also I guess python stuff shouldn't be reinstalled. What do you 
think ?
   
   I think it should be reinstalled for now. If there is a change to the 
`requirements.txt` Python dependencies (as a result of code changes), the 
environment will be automatically adjusted.
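   One common way to make that reinstall conditional, should we want it later, 
is to record a hash of `requirements.txt` and reinstall only when it changes. 
A sketch of the idea (not the build's actual logic; file names are 
illustrative):

```python
import hashlib
from pathlib import Path

# Illustrative sketch: reinstall dependencies only when requirements.txt has
# changed since the hash recorded on the previous run.
def requirements_changed(req: Path, stamp: Path) -> bool:
    digest = hashlib.sha256(req.read_bytes()).hexdigest()
    if stamp.exists() and stamp.read_text() == digest:
        return False
    stamp.write_text(digest)
    return True
```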
   
   >if you want to re-download models for some reason (broken installation 
etc.) - find and delete the folder (which folder?) and run mvn clean package 
again (like the first run)
   
   This would only require deleting the models folder. We can include this in 
the documentation later. I would suggest doing this AFTER we change the model 
download directory.
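   The recovery step itself is tiny; a sketch (the data directory path is the 
current one from this PR and would change once the model location moves):

```python
import shutil
from pathlib import Path

# Illustrative sketch: delete the models folder so the next `mvn clean package`
# re-downloads everything, as described above for broken installations.
def reset_models(data_dir: Path) -> None:
    if data_dir.is_dir():
        shutil.rmtree(data_dir)
```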
   
   This is also a valid concern but I feel that this is out of the scope of 
this pull request as well.
   
   
   > Note please that we have some checks: 
https://github.com/apache/incubator-nlpcraft/pull/18/checks.
   Now these tests failed.
   
   `conda` is not installed there. I have sent an email about this to the dev 
group.
   
   We have to support these tests. I guess we have to skip the Python 
installation and all related tests based on some flag.
   Something like nlpcraft/pom.xml line 392.
   
   I could be mistaken, but I don't think the Python installation would make 
the build fail. I think these tests should pass if `conda` is available; that 
is the only error I have seen so far. Would it be possible to have conda on 
the CI build? I'm not sure how to do this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

