rahul3 commented on pull request #18:
URL: https://github.com/apache/incubator-nlpcraft/pull/18#issuecomment-900704222


   Hello,
   Thank you for reviewing this thoroughly.
   
   To answer your questions (I'm sorry this is lengthy, but I don't know how else to convey it):
   
   > I don't think that we have to install python stuff from maven command. I 
guess it is not standard way.
   > I think it should be scripts - install.sh(cmd), maybe with uninstall.sh(cmd)
   
   **If the Ctxserver is considered an integral part of NLPCraft**, I would disagree with this, for the following reasons:
   
   - Verification of every release becomes more complex. This is what happens today: `mvn clean package verify` (or a variant) is used to decide whether a release is good, without checking the Python part at all. Release validation is much cleaner and more accurate when there is a single way to build the project that tests all of its areas and their dependencies. Writing the setup scripts in Python with only standard libraries (given we require Python 3+) also keeps them OS independent, which helps, in my opinion.
   
   - In my opinion, how a Maven build is implemented is project dependent, especially when a project depends on multiple languages (and their dependencies). Since our project falls into this category, I think the approach in this pull request is a good way to do it. It would be even more beneficial if we built profiles per operating system. In my personal experience, I have seen this implementation used in a production system.
   
   - One issue with `.sh` and `.cmd` scripts is that they are OS dependent. Sure, we could build Maven profiles and then pick the script based on the OS; that is also a solution, just a different one.
   
   
   > Do we need conda ? (I am not familiar with python tools, is it necessary 
for python?)
   > I ask this question because we require too many tools for nlpcraft installation.
   
   
   Managing Python and its dependencies is not as streamlined as it is in Java (in my opinion).
   
   Regarding Python, most users today download Anaconda or Miniconda and install Python that way. [Conda](https://docs.conda.io/en/latest/) is a package management and environment management system for Python. It ships with both Anaconda and Miniconda and is extremely common in the Python world.
   
   Case example (why we need a virtual environment for Python):
   Suppose your system Python is 3.5 and you use dependency X at version 1.1. If NLPCraft needs X at version 1.3, installing NLPCraft will uninstall 1.1, and anything else that relied on 1.1 will break (assuming something you use changed between 1.1 and 1.3).
   
   Also, a dependency sometimes supports only certain Python versions. For example, there was a period when TensorFlow supported Python 3.6 but not the newly released 3.7+.
   
   @skhdl Note: I'm only spelling this out in case someone less familiar with the project and with dependency management reads this. I know you are very familiar with dependency management, and I don't mean to be condescending :)
   
   
   The production-like way to do this (leaving Docker aside), in my opinion, is to use a virtual environment so that the system Python is not affected. That leaves largely two options:
   
   - [venv / virtualenv](https://docs.python.org/3/library/venv.html)
   - [conda](https://docs.conda.io/)
   
   The limitation of venv/virtualenv is that they create environments from a Python interpreter that is already installed; they cannot install or manage Python versions themselves. For example, if your system only has 3.5, your virtual environment will use Python 3.5.
   
   Conda also lets you manage the Python version itself, which is the primary reason I chose it. For our project this eliminates technical debt: we support one specific Python version and do not have to care when 3.9, 4.0, etc. are released, unless our dependencies are affected. We also give the developer the option to set the Python version and environment (in the way I currently designed it).
   
   In my opinion, a conda environment is a good option in production as well. The developer either creates an `environment.yml` file to recreate the conda environment, or zips the entire environment folder and installs it that way. I work on the assumption that, in production, no internet connection is available after packaging; this is the case in many companies (especially financial ones). My goal was to get our project as close to a real-world production setup as possible, so that minimal work is left for the developer using our project.
   
   > If we need conda I guess we have to require user to configure CONDA_HOME 
env variable, but shouldn't ask him to update scripts to set path to conda.
   
   I agree. I could work on this. 
   
   
   > Where is bert models etc installation place?
   
   Same location; I did not change this:
   `<project_root>/nlpcraft/src/main/python/ctxword/`
   
   Eventually, we should move this into `.nlpcraft-python` as well.
   
   
   
   > When the models should be downloaded?
   > - during installation (I think so) or
   
   I think so too. This pull request first establishes a virtual environment for Python and gets everything set up there. Once that is complete, we can deal with the model download location. This is my opinion.
   
   
   > I guess we need to add some progressbar for such long processes as 
downloading models (It is not clear - is the process going and hanging)
   
   Yes, you are right. I can work on that after we establish a Python environment separate from the user's system Python. It works fine for me on Ubuntu 20.04; on macOS you are correct, it hangs. I will work on fixing this.
   
   
   > When downloading process is killed on some intermediate phase (for 
example, some files downloaded, some aren't, some broken etc) - how should it 
be resolved?
   > Is it supported to fix such situations?
   
   Can you give me an example? I believe you might be talking about FastText; I noticed this recently too, and I am fixing it. Thank you for pointing it out.
   
   
   > I guess server should be started and accept HTTP connections after all 
models downloading process finished.
   
   I agree.
   
   > When I deleted models from 
<project_root>/nlpcraft/src/main/python/ctxword/, start-server script printed 
that downloading in progress, but new models files weren't appear.
   > The process hangs and I don't see any errors.
   
   Yes, as I mentioned above, I found this issue on macOS. I'll fix it.
   
   > Other minor issues
   
   > org.apache.nlpcraft.server.nlp.core.spacy.NCSpaCyNerEnricher.Config - 
should be fixed
   
   I will look into it and fix it.
   
   > Why some logs seem duplicated?
   
   There was a previous issue, which I fixed, where I mistakenly added a second log handler in the logging module. I thought I had resolved this. Does it still happen after you pull the most recent updates?
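   For reference, duplicated log lines usually come from attaching a handler more than once; the usual guard looks like this (an illustration of the pattern, not the exact code in this PR):

```python
import logging

def get_logger(name="ctxword"):
    """Return a logger, attaching a handler only once.

    Calling addHandler() on every call is the classic cause of duplicated
    log lines; checking logger.handlers first (as sketched here) avoids it.
    """
    logger = logging.getLogger(name)
    if not logger.handlers:          # attach the handler only on first use
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```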
   
   
   Note: I also found some issues on Windows that I am working on fixing. I'm not sure whether I should keep this pull request open and fix them here, or close it and start a new one. Please feel free to advise me on this.
   
   Thank you for your review.
   
   

