[GitHub] incubator-hivemall issue #93: [WIP][HIVEMALL-126] Maximum Entropy Model usin...

2017-08-02 Thread kottmann
Github user kottmann commented on the issue:

https://github.com/apache/incubator-hivemall/pull/93
  
@myui that was done for the 1.6.0 release, and in maxent 3.0.3 it was 
modified to run in multiple threads. 

You probably need to take a similar approach to the one we took for 
multi-threaded training: split the work done per iteration and scale it out to 
multiple machines, merge the parameters, and repeat for the next iteration.
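The split/merge/repeat loop described above can be sketched roughly as follows. This is an illustrative outline only, not OpenNLP or Hivemall code; all class and method names are hypothetical, and the per-shard GIS update itself is stubbed out:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of iterative parameter merging across workers: each worker runs
// one training iteration on its own shard of the data, the resulting
// parameter vectors are merged (here: averaged), and the merged model is
// redistributed for the next iteration.
public class DistributedGisSketch {

    /** One training pass over a worker's local data shard (GIS update omitted). */
    static double[] localIteration(double[] params, double[][] shard) {
        double[] updated = params.clone();
        // ... compute expected feature counts over `shard` and apply
        // the GIS update to `updated` ...
        return updated;
    }

    /** Merge step: average the per-worker parameter vectors element-wise. */
    static double[] merge(List<double[]> workerParams) {
        int n = workerParams.get(0).length;
        double[] merged = new double[n];
        for (double[] p : workerParams) {
            for (int i = 0; i < n; i++) {
                merged[i] += p[i] / workerParams.size();
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two workers, each holding one shard of the training data.
        double[] w1 = localIteration(new double[] {0.0, 0.0}, new double[][] {{1, 0}});
        double[] w2 = localIteration(new double[] {0.0, 0.0}, new double[][] {{0, 1}});
        double[] merged = merge(Arrays.asList(w1, w2));
        System.out.println("merged parameters: " + Arrays.toString(merged));
        // `merged` would seed the next iteration on every worker.
    }
}
```

The merge step is the simplest possible choice (plain averaging); a real implementation would also need to decide how to synchronize workers between iterations.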


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hivemall issue #93: [WIP][HIVEMALL-126] Maximum Entropy Model usin...

2017-08-02 Thread kottmann
Github user kottmann commented on the issue:

https://github.com/apache/incubator-hivemall/pull/93
  
@myui the maxent 3.0.1 version went through Apache IP clearance when the 
code base was moved from SourceForge, and should be almost identical to 3.0.0.




[GitHub] incubator-hivemall issue #93: [WIP][HIVEMALL-126] Maximum Entropy Model usin...

2017-08-02 Thread kottmann
Github user kottmann commented on the issue:

https://github.com/apache/incubator-hivemall/pull/93
  
Sure, there are ways to make this work across multiple machines, but then 
you can't use it as we ship it. Maybe the best solution for you would be to 
just take the code you need, strip it down, and drop OpenNLP as a 
dependency?




[GitHub] incubator-hivemall issue #93: [WIP][HIVEMALL-126] Maximum Entropy Model usin...

2017-08-02 Thread kottmann
Github user kottmann commented on the issue:

https://github.com/apache/incubator-hivemall/pull/93
  
@helenahm as far as I know the training data is stored once in memory, and 
then a copy of the parameters is stored for each thread. 

So if you have a lot of training data, running out of memory is one symptom 
you run into, but that is not the actual problem of this implementation. The 
actual cause is that it won't scale beyond one machine.

Bottom line: if you want to use GIS training with lots of data, don't use 
this implementation. Training requires a certain amount of CPU time, and it 
increases with the amount of training data. Even if you manage to make this 
run with much more data, the time it takes will be uncomfortably high.
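The memory layout described above (data stored once, parameters copied per thread) can be made concrete with back-of-the-envelope arithmetic. All numbers here are hypothetical, chosen only to show how the per-thread copies dominate as thread count grows:

```java
// Rough memory estimate for per-thread parameter copies in a maxent model.
// The figures (predicate count, outcome count) are invented for illustration.
public class MaxentMemoryEstimate {
    public static void main(String[] args) {
        long predicates = 2_000_000L;  // distinct context features (hypothetical)
        long outcomes = 50L;           // distinct labels (hypothetical)
        long bytesPerParam = 8L;       // one double per (predicate, outcome) pair
        int threads = 8;

        long oneCopy = predicates * outcomes * bytesPerParam;
        long allCopies = oneCopy * threads;

        // >> 20 converts bytes to MiB (integer division by 1024 * 1024).
        System.out.printf("one parameter copy: %d MiB%n", oneCopy >> 20);
        System.out.printf("%d threads: %d MiB%n", threads, allCopies >> 20);
    }
}
```

With these made-up numbers a single parameter copy is already several hundred MiB, and eight threads multiply that by eight, while the CPU time per iteration still grows with the data size, which is the scaling wall the comment describes.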




[GitHub] incubator-hivemall pull request #93: [WIP][HIVEMALL-126] Maximum Entropy Mod...

2017-08-01 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/93#discussion_r130568528
  
--- Diff: core/pom.xml ---
@@ -103,6 +103,12 @@
            <version>${guava.version}</version>
            <scope>provided</scope>
        </dependency>

+       <dependency>
+           <groupId>opennlp</groupId>
+           <artifactId>maxent</artifactId>
+           <version>3.0.0</version>
--- End diff --

The same as before. I recommend using a recent version.

Since you are including parts of the code directly, I kindly ask you to also 
update the NOTICE file with Apache OpenNLP attribution.




[GitHub] incubator-hivemall pull request #93: [WIP][HIVEMALL-126] Maximum Entropy Mod...

2017-07-31 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/93#discussion_r130337244
  
--- Diff: core/pom.xml ---
@@ -103,6 +103,12 @@
            <version>${guava.version}</version>
            <scope>provided</scope>
        </dependency>

+       <dependency>
+           <groupId>opennlp</groupId>
+           <artifactId>maxent</artifactId>
+           <version>3.0.0</version>
--- End diff --

We fixed a couple of bugs over time, added new features, and did the usual 
maintenance (e.g. testing on recent Java versions), so yeah, it probably makes 
sense to use a recent version when you build something new. It also now 
supports multi-threaded training.

The version you are linking above is seven years old.




[GitHub] incubator-hivemall pull request #93: [WIP][HIVEMALL-126] Maximum Entropy Mod...

2017-07-31 Thread kottmann
Github user kottmann commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/93#discussion_r130319727
  
--- Diff: core/pom.xml ---
@@ -103,6 +103,12 @@
            <version>${guava.version}</version>
            <scope>provided</scope>
        </dependency>

+       <dependency>
+           <groupId>opennlp</groupId>
+           <artifactId>maxent</artifactId>
+           <version>3.0.0</version>
--- End diff --

Consider using a recent version of OpenNLP instead. The maxent code was 
moved into opennlp-tools (the latest version is 1.8.1); the version you use 
here is several years old.
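For reference, switching to the post-merge artifact would mean replacing the `opennlp:maxent` coordinates with something like the following (1.8.1 was current as of this thread; check for newer releases):

```xml
<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-tools</artifactId>
    <version>1.8.1</version>
</dependency>
```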

