[GitHub] incubator-hivemall issue #93: [WIP][HIVEMALL-126] Maximum Entropy Model usin...
Github user kottmann commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/93

    @myui that was done for the 1.6.0 release, and in maxent 3.0.3 it was modified to run in multiple threads. You will probably need an approach similar to the one we took for multi-threaded training: split the work done in each iteration, scale it out to multiple machines, merge the parameters, and repeat for the next iteration.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
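The split/merge/repeat scheme described in the comment above can be illustrated with a minimal sketch. This is not the OpenNLP or Hivemall code: the class `ShardedGisSketch`, the `Event` type, and the method names are hypothetical, threads stand in for machines, and the update is a simplified GIS-style step (no correction feature, no smoothing; features never observed with an outcome keep weight 0). Each shard computes model-expected feature counts against the same parameters, the partial counts are summed, one update is applied, and the next iteration starts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; names are illustrative, not part of OpenNLP or Hivemall.
class ShardedGisSketch {

    /** One training event: ids of the active predicates plus the observed outcome id. */
    static final class Event {
        final int[] preds;
        final int outcome;
        Event(int[] preds, int outcome) { this.preds = preds; this.outcome = outcome; }
    }

    /** p(outcome | preds) under the given parameters (normalized exponential of summed weights). */
    static double[] predict(double[][] params, int[] preds) {
        double[] probs = new double[params[0].length];
        for (int p : preds)
            for (int o = 0; o < probs.length; o++) probs[o] += params[p][o];
        double norm = 0.0;
        for (int o = 0; o < probs.length; o++) { probs[o] = Math.exp(probs[o]); norm += probs[o]; }
        for (int o = 0; o < probs.length; o++) probs[o] /= norm;
        return probs;
    }

    /** Work done by one shard per iteration: model-expected feature counts over its events. */
    static double[][] expectedCounts(List<Event> shard, double[][] params) {
        double[][] expected = new double[params.length][params[0].length];
        for (Event e : shard) {
            double[] probs = predict(params, e.preds);
            for (int p : e.preds)
                for (int o = 0; o < probs.length; o++) expected[p][o] += probs[o];
        }
        return expected;
    }

    static double[][] train(List<Event> events, int numPreds, int numOutcomes,
                            int shards, int iterations) throws Exception {
        // Observed feature counts and the constant C (max active predicates per event).
        double[][] observed = new double[numPreds][numOutcomes];
        int c = 1;
        for (Event e : events) {
            c = Math.max(c, e.preds.length);
            for (int p : e.preds) observed[p][e.outcome] += 1.0;
        }
        double[][] params = new double[numPreds][numOutcomes];
        ExecutorService pool = Executors.newFixedThreadPool(shards);
        try {
            for (int it = 0; it < iterations; it++) {
                // Split: every shard gets a contiguous slice and reads the same parameters.
                List<Future<double[][]>> parts = new ArrayList<>();
                for (int s = 0; s < shards; s++) {
                    int from = s * events.size() / shards;
                    int to = (s + 1) * events.size() / shards;
                    List<Event> shard = events.subList(from, to);
                    Callable<double[][]> task = () -> expectedCounts(shard, params);
                    parts.add(pool.submit(task));
                }
                // Merge: sum the partial expected counts (waits for all shards to finish).
                double[][] expected = new double[numPreds][numOutcomes];
                for (Future<double[][]> f : parts) {
                    double[][] part = f.get();
                    for (int p = 0; p < numPreds; p++)
                        for (int o = 0; o < numOutcomes; o++) expected[p][o] += part[p][o];
                }
                // Update once, then repeat for the next iteration.
                for (int p = 0; p < numPreds; p++)
                    for (int o = 0; o < numOutcomes; o++)
                        if (observed[p][o] > 0)
                            params[p][o] += Math.log(observed[p][o] / expected[p][o]) / c;
            }
        } finally {
            pool.shutdown();
        }
        return params;
    }
}
```

Because only summed counts cross the shard boundary, the result should not depend on the number of shards beyond floating-point summation order. In a multi-machine setting the per-iteration merge becomes a network round trip, which is one reason the training time grows uncomfortable with large data.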
[GitHub] incubator-hivemall issue #93: [WIP][HIVEMALL-126] Maximum Entropy Model usin...
Github user kottmann commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/93

    @helenahm as far as I know, the training data is stored once in memory, and a copy of the parameters is stored for each thread. So yes, with a lot of training data, running out of memory is one symptom you will run into, but it is not the actual problem with this implementation. The actual problem is that it won't scale beyond one machine.

    Bottom line: if you want to use GIS training with lots of data, don't use this implementation. Training requires a certain amount of CPU time, and that time increases with the amount of training data. Even if you manage to make this run with much more data, the time it takes to run will be uncomfortably high.
[GitHub] incubator-hivemall pull request #93: [WIP][HIVEMALL-126] Maximum Entropy Mod...
Github user kottmann commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/93#discussion_r130319727

    --- Diff: core/pom.xml ---
    @@ -103,6 +103,12 @@
     			<version>${guava.version}</version>
     			<scope>provided</scope>
     		</dependency>
    +		<dependency>
    +			<groupId>opennlp</groupId>
    +			<artifactId>maxent</artifactId>
    +			<version>3.0.0</version>
    --- End diff --

    Consider using a recent version of OpenNLP instead. The maxent code was moved into opennlp-tools (the latest version is 1.8.1). The version you use here is a couple of years old.