GitHub user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/93
  
    • Shall I commit all the changes? Or would it be better to push only my own 
files?
    
    Just push your own files, staging only them with `git add <select only your files>`.
    
    • Could you change the title of this PR to `[WIP][HIVEMALL-126] Maximum 
Entropy Model using OpenNLP` to make its scope clearer?
    
    • As commented in the review, why are you using multiple threads for 
training and OOB tests? Single-threaded execution is enough for the MaxEntropy 
classifier, and your OOB test scheme is incorrect.
    
    • I have a slight concern about the memory usage of the OpenNLP MaxEnt 
implementation. Unlike other online classifier implementations, it holds the 
entire training data in memory. Hivemall's RandomForest implementation also holds 
the entire dataset in memory, but tries to consume less memory by using CSRMatrix.
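    To illustrate why a CSR (Compressed Sparse Row) layout saves memory, here is a minimal Java sketch. It uses a hypothetical `CsrSketch` class, not Hivemall's actual `CSRMatrix` API. The class stores only the non-zero entries of each row plus a row-pointer array. It then compares its approximate payload size against a dense `double[][]`. The byte counts cover only the primitive array elements (4 bytes per `int`, 8 per `double`) and ignore JVM object headers.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal CSR matrix sketch. Hypothetical class for illustration only;
 * it is not Hivemall's CSRMatrix.
 */
final class CsrSketch {
    final int numRows;
    final int numCols;
    final int[] rowPointers;   // row i's non-zeros live in [rowPointers[i], rowPointers[i + 1])
    final int[] columnIndices; // column index of each stored non-zero
    final double[] values;     // value of each stored non-zero

    /** Builds CSR from a dense array (toy input; real code would build row by row). */
    CsrSketch(double[][] dense) {
        numRows = dense.length;
        numCols = numRows == 0 ? 0 : dense[0].length;
        List<Integer> cols = new ArrayList<>();
        List<Double> vals = new ArrayList<>();
        rowPointers = new int[numRows + 1];
        for (int i = 0; i < numRows; i++) {
            for (int j = 0; j < numCols; j++) {
                if (dense[i][j] != 0.0) { // keep only non-zero cells
                    cols.add(j);
                    vals.add(dense[i][j]);
                }
            }
            rowPointers[i + 1] = cols.size();
        }
        columnIndices = cols.stream().mapToInt(Integer::intValue).toArray();
        values = vals.stream().mapToDouble(Double::doubleValue).toArray();
    }

    /** Looks up a cell by scanning only the non-zeros of the given row. */
    double get(int row, int col) {
        for (int k = rowPointers[row]; k < rowPointers[row + 1]; k++) {
            if (columnIndices[k] == col) {
                return values[k];
            }
        }
        return 0.0;
    }

    /** Payload bytes of the three arrays: 4 per int, 8 per double (no object headers). */
    long payloadBytes() {
        return 4L * rowPointers.length + 4L * columnIndices.length + 8L * values.length;
    }

    /** Payload bytes of an equivalent dense double[rows][cols]. */
    static long densePayloadBytes(int rows, int cols) {
        return 8L * rows * cols;
    }

    public static void main(String[] args) {
        double[][] dense = {
            {0, 0, 3, 0},
            {1, 0, 0, 0},
            {0, 0, 0, 2}
        };
        CsrSketch csr = new CsrSketch(dense);
        System.out.println(csr.get(0, 2)); // prints 3.0
        System.out.println(csr.payloadBytes() + " vs " + densePayloadBytes(3, 4)); // prints 52 vs 96
    }
}
```

    Dense storage costs 8 bytes per cell. CSR costs about 12 bytes per non-zero plus 4 bytes per row, so the saving grows with the sparsity of the feature vectors.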
    
    The Max Entropy classifier is also known as Multinomial Logistic Regression. 
It might be worth evaluating [Smile's 
implementation](https://github.com/haifengl/smile/blob/master/core/src/main/java/smile/classification/LogisticRegression.java#L264)
as well. I need to evaluate the memory consumption of both your implementation and 
OpenNLP maxent, because each task on a Hadoop worker can only use a limited, 
relatively small amount of memory.
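    Since MaxEnt is multinomial logistic regression, the model is one weight vector per class with a softmax over the class scores: P(y = k | x) = exp(w_k · x) / Σ_c exp(w_c · x). The hypothetical `SoftmaxSketch` class below shows that form in Java. It is trained with plain per-example SGD purely for illustration. That is not the training algorithm used by OpenNLP maxent or by Smile's `LogisticRegression`. An online update like this needs only the current example in memory.

```java
/**
 * Minimal multinomial logistic regression (maximum entropy) sketch.
 * Hypothetical class for illustration; trained with per-example SGD,
 * which is not how OpenNLP maxent or Smile train their models.
 */
final class SoftmaxSketch {
    final int numClasses;
    final int numFeatures;
    final double[][] weights; // weights[k][j]: weight of feature j for class k

    SoftmaxSketch(int numClasses, int numFeatures) {
        this.numClasses = numClasses;
        this.numFeatures = numFeatures;
        this.weights = new double[numClasses][numFeatures];
    }

    /** Returns P(y = k | x) for every class k via a numerically stable softmax. */
    double[] predictProba(double[] x) {
        double[] scores = new double[numClasses];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < numClasses; k++) {
            double s = 0.0;
            for (int j = 0; j < numFeatures; j++) {
                s += weights[k][j] * x[j];
            }
            scores[k] = s;
            max = Math.max(max, s);
        }
        double sum = 0.0;
        for (int k = 0; k < numClasses; k++) {
            scores[k] = Math.exp(scores[k] - max); // shift by max to avoid overflow
            sum += scores[k];
        }
        for (int k = 0; k < numClasses; k++) {
            scores[k] /= sum;
        }
        return scores;
    }

    /** One SGD step on the log-loss of a single example (x, label). */
    void update(double[] x, int label, double learningRate) {
        double[] p = predictProba(x);
        for (int k = 0; k < numClasses; k++) {
            double grad = p[k] - (k == label ? 1.0 : 0.0); // d(loss)/d(score_k)
            for (int j = 0; j < numFeatures; j++) {
                weights[k][j] -= learningRate * grad * x[j];
            }
        }
    }

    /** Returns the class with the highest predicted probability. */
    int predict(double[] x) {
        double[] p = predictProba(x);
        int best = 0;
        for (int k = 1; k < numClasses; k++) {
            if (p[k] > p[best]) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: feature 0 is a bias term; features 1..3 each indicate one class.
        double[][] xs = {{1, 1, 0, 0}, {1, 0, 1, 0}, {1, 0, 0, 1}};
        int[] ys = {0, 1, 2};
        SoftmaxSketch model = new SoftmaxSketch(3, 4);
        for (int epoch = 0; epoch < 200; epoch++) {
            for (int i = 0; i < xs.length; i++) {
                model.update(xs[i], ys[i], 0.5);
            }
        }
        for (double[] x : xs) {
            System.out.println(model.predict(x));
        }
    }
}
```

    Subtracting the maximum score before calling `Math.exp` prevents overflow without changing the resulting probabilities. The memory footprint of the model itself is just `numClasses * numFeatures` doubles.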

