skhdl commented on pull request #18:
URL: https://github.com/apache/incubator-nlpcraft/pull/18#issuecomment-900512842
Hi!
I have a few questions for discussion:
1) I don't think we should install the Python dependencies from a Maven
command; that doesn't seem like a standard approach.
I think this should be done via scripts - install.sh(cmd), maybe with an
uninstall.sh(cmd) as well.
2) Do we need conda? (I am not familiar with Python tooling - is it really
necessary here?)
I ask because we already require quite a few tools for the nlpcraft
installation:
- java
- maven
- python 3.8/3.9
- and conda now
If we do need conda, I guess we should require the user to set a CONDA_HOME
environment variable, rather than asking them to edit the scripts to set the
path to conda.
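To illustrate the CONDA_HOME idea - a minimal sketch of how the server startup code could resolve conda from that variable instead of a hard-coded path (`find_conda` is a hypothetical helper, not existing NLPCraft code; the `bin/conda` layout is an assumption about a Unix-style conda install):

```python
import os
import pathlib
import sys

def find_conda():
    # Resolve the conda executable from the CONDA_HOME environment variable.
    # Hypothetical helper: fails fast with a clear message instead of asking
    # the user to edit the start scripts.
    home = os.environ.get("CONDA_HOME")
    if not home:
        sys.exit("CONDA_HOME is not set; point it at your conda installation root.")
    conda = pathlib.Path(home) / "bin" / "conda"  # assumed Unix layout
    if not conda.is_file():
        sys.exit(f"No conda executable found at {conda}.")
    return conda
```

The scripts would then only ever read CONDA_HOME and never need per-user edits.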
3) Where should the bert models etc. be installed?
- Currently it is <project_root>/nlpcraft/src/main/python/ctxword/
- Is the new place under <user_home>/.nlpcraft-python?
Off topic - if yes, we should delete `**/__pycache__`, `**/data` and
`**/.ipynb_checkpoints` from .gitignore.
4) When should the models be downloaded?
- during installation (I think so), or
- on the first attempt to start the server?
Related questions:
- I guess we need to add a progress bar for long-running processes such as
downloading the models (otherwise it is not clear whether the process is
progressing or hanging).
- If the download process is killed at some intermediate phase (for example,
some files downloaded, some not, some corrupted, etc.) - how should that be
resolved? Is recovering from such situations supported?
- I guess the server should only start accepting HTTP connections after the
whole model download process has finished.
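On the interrupted-download question above - one common way to make a killed download both detectable and safely restartable is to write to a temp file and only rename it into place once the content is complete and verified. A rough sketch (`download` is a hypothetical helper, not existing NLPCraft code; the checksum parameter is an assumption):

```python
import hashlib
import os
import sys
import urllib.request

def download(url, dest, expected_sha256=None, chunk=1 << 20):
    # Download url to dest with a simple progress line.
    # Writing to dest + ".part" and renaming only on success means a killed
    # download never leaves a file that looks complete; rerunning simply
    # redoes the partial file.
    part = dest + ".part"
    digest = hashlib.sha256()
    with urllib.request.urlopen(url) as resp, open(part, "wb") as out:
        total = int(resp.headers.get("Content-Length") or 0)
        done = 0
        while True:
            buf = resp.read(chunk)
            if not buf:
                break
            out.write(buf)
            digest.update(buf)
            done += len(buf)
            pct = f"{100 * done / total:5.1f}%" if total else f"{done} bytes"
            sys.stderr.write(f"\rdownloading {os.path.basename(dest)}: {pct}")
        sys.stderr.write("\n")
    if expected_sha256 and digest.hexdigest() != expected_sha256:
        os.remove(part)  # corrupted transfer: discard so a rerun starts clean
        raise IOError(f"checksum mismatch for {url}")
    os.replace(part, dest)  # atomic: dest appears only when fully downloaded
```

With this pattern the server startup check is simple: if dest exists, the model is complete; any leftover .part files can just be deleted and re-fetched.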
Problems:
- When I deleted the models from
<project_root>/nlpcraft/src/main/python/ctxword/, the start-server script
printed that downloading was in progress, but no new model files appeared.
The process hangs and I don't see any errors.
Other minor issues:
1) org.apache.nlpcraft.server.nlp.core.spacy.NCSpaCyNerEnricher.Config -
should be fixed
2) Why do some log lines seem duplicated?
Below is the log for a single start:
2021-08-16 11:32:02,783 - bertft - WARNING - CUDA is not available for Torch.
2021-08-16 11:32:02,784 - bertft - INFO - Initializing fast text
2021-08-16 11:32:02,784 - bertft - INFO - Found existing model, loading.
2021-08-16 11:32:02,871 - bertft - WARNING - CUDA is not available for Torch.
2021-08-16 11:32:02,872 - bertft - INFO - Initializing fast text
2021-08-16 11:32:02,872 - bertft - INFO - Found existing model, loading.
….
2021-08-16 11:32:40,614 - bertft - INFO - Server started in 37.8301 seconds
2021-08-16 11:32:40,639 - transformers.modeling_utils - INFO - Weights of
RobertaForMaskedLM not initialized from pretrained model:
['lm_head.decoder.bias']
2021-08-16 11:32:40,641 - bertft - INFO - Server started in 37.7691 seconds
2021-08-16 11:32:41,903 - root - DEBUG - Registering ctxserver blueprint
2021-08-16 11:32:41,903 - root - DEBUG - Registering ctxserver blueprint
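The pairs of identical lines look like the classic double-handler pattern in Python's logging module: if the init code runs twice (which the two "Initializing fast text" lines suggest) and adds a handler to the same logger each time, every record is emitted once per handler. This is an assumption about what bertft does, not a confirmed diagnosis; a minimal guard would look like:

```python
import logging

def get_logger(name="bertft"):
    # Return a configured logger, adding the stream handler only once.
    # Without the `if not log.handlers` guard, calling this twice would
    # attach two handlers and print every log record twice.
    log = logging.getLogger(name)
    if not log.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        log.addHandler(handler)
        log.setLevel(logging.INFO)
    return log
```

If the duplication instead comes from the whole init path running twice, the handler guard only hides the symptom and the double initialization itself should be fixed.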
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]