skhdl commented on pull request #18:
URL: https://github.com/apache/incubator-nlpcraft/pull/18#issuecomment-900512842


   Hi!
   I have a few questions for discussion:
   1) I don't think we should install the Python dependencies via a Maven command; I don't believe that's a standard approach.
   I think this should be handled by scripts: install.sh (install.cmd), perhaps with a matching uninstall.sh (uninstall.cmd).
   
   2) Do we need conda? (I am not familiar with Python tooling - is it really necessary here?)
   I ask because we already require quite a few tools for the NLPCraft installation:
    - Java
    - Maven
    - Python 3.8/3.9
    - and now conda
   
   If we do need conda, I think we should require the user to set a CONDA_HOME environment variable rather than asking them to edit the scripts to hard-code the path to conda.
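   Just to illustrate the idea - a minimal sketch of resolving conda from CONDA_HOME with a PATH fallback (the `bin/conda` layout is my assumption for Linux/macOS; on Windows it would be `Scripts\conda.exe`):

   ```python
   import os
   import pathlib
   import shutil

   def find_conda():
       """Locate the conda executable: prefer CONDA_HOME, fall back to PATH."""
       home = os.environ.get("CONDA_HOME")
       if home:
           # Assumed Unix-style layout; Windows would need Scripts\conda.exe.
           return str(pathlib.Path(home) / "bin" / "conda")
       return shutil.which("conda")  # None if conda is not on PATH
   ```

   That way the scripts never need to be edited by hand.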
   
   
   3) Where should the BERT models etc. be installed?
    - Currently it is <project_root>/nlpcraft/src/main/python/ctxword/
    - Is the new location going to be <user_home>/.nlpcraft-python?
   
   Off topic: if yes, we should remove `**/__pycache__`, `**/data` and `**/.ipynb_checkpoints` from .gitignore.
   
   4) When should the models be downloaded?
    - during installation (my preference), or
    - on the first attempt to start the server?
   
   Related questions:
    - I think we need a progress bar for long-running operations like model downloads (otherwise it is unclear whether the process is progressing or hanging).
    - If the download is killed at some intermediate point (for example, some files downloaded, some missing, some corrupted) - how should that be resolved?
      Is recovering from such situations supported?
    - I think the server should only start accepting HTTP connections after the entire model download has finished.
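   To make the download-recovery point concrete, here is a rough sketch of the building blocks I have in mind (helper names and the checksum approach are my own illustration, not the current ctxword code): resume via an HTTP Range offset, detect corrupted files via a checksum, and print a simple progress bar.

   ```python
   import hashlib
   import os

   def resume_offset(path):
       """Bytes already on disk; usable as the start of an HTTP Range
       request (e.g. 'Range: bytes=<offset>-') to resume an interrupted
       download instead of starting over."""
       return os.path.getsize(path) if os.path.exists(path) else 0

   def is_intact(path, expected_sha256):
       """Detect a corrupted or partial model file by comparing checksums."""
       h = hashlib.sha256()
       with open(path, "rb") as f:
           for chunk in iter(lambda: f.read(1 << 20), b""):
               h.update(chunk)
       return h.hexdigest() == expected_sha256

   def progress_line(done, total):
       """Simple text progress bar so the user can see the download is alive."""
       pct = 100 * done // total
       bar = "#" * (pct // 5)
       return f"[{bar:<20}] {pct}% ({done}/{total} bytes)"
   ```

   With checksums published next to the model files, a broken intermediate state could be detected and repaired automatically on the next start.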
   
   
   Problems:
    - When I deleted the models from <project_root>/nlpcraft/src/main/python/ctxword/, the start-server script reported that a download was in progress, but no new model files appeared.
     The process hangs and I don't see any errors.
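   On the "accept HTTP connections only after download" point above - a minimal readiness-gate sketch (the function names and 503 response are my assumptions for illustration, not the current ctxword API); this would also make hangs like the one above visible to clients instead of silent:

   ```python
   import threading

   models_ready = threading.Event()

   def handle_request():
       """Refuse work with 503 until the models are downloaded and verified;
       a real server would wire this check into its HTTP framework."""
       if not models_ready.is_set():
           return 503, "models are still downloading"
       return 200, "ok"

   def on_models_downloaded():
       models_ready.set()  # flip once the download is verified complete
   ```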
   
   
    Other minor issues:
    1) org.apache.nlpcraft.server.nlp.core.spacy.NCSpaCyNerEnricher.Config - this should be fixed.
    2) Why do some log lines appear duplicated?
   
   Below is the log from a single start:
   2021-08-16 11:32:02,783 - bertft - WARNING - CUDA is not available for Torch.
   2021-08-16 11:32:02,784 - bertft - INFO - Initializing fast text
   2021-08-16 11:32:02,784 - bertft - INFO - Found existing model, loading.
   
   2021-08-16 11:32:02,871 - bertft - WARNING - CUDA is not available for Torch.
   2021-08-16 11:32:02,872 - bertft - INFO - Initializing fast text
   2021-08-16 11:32:02,872 - bertft - INFO - Found existing model, loading.
   ….
   2021-08-16 11:32:40,614 - bertft - INFO - Server started in 37.8301 seconds
   2021-08-16 11:32:40,639 - transformers.modeling_utils - INFO - Weights of RobertaForMaskedLM not initialized from pretrained model: ['lm_head.decoder.bias']
   2021-08-16 11:32:40,641 - bertft - INFO - Server started in 37.7691 seconds
   2021-08-16 11:32:41,903 - root - DEBUG - Registering ctxserver blueprint
   2021-08-16 11:32:41,903 - root - DEBUG - Registering ctxserver blueprint
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

