GitHub user KellenSunderland reopened a pull request: https://github.com/apache/incubator-joshua/pull/51

    Changes to improve performance of KenLM

    This change cleans up the kenlm_wrap.cc file to get rid of the multimap
    that was recently added. It replaces the multimap with a
    vector/unordered_set which should allow for faster lookups (and is also
    less code).

    Also included in this change is a modification to the probRule call that
    packs the state and probability returned from that call into 64 bits,
    which is then unpacked on the Java side. This eliminates the need to
    reference a Java object across the JNI boundary.

    Finally the scope of Chart objects in KenLM is changed to be per
    sentence, per language model, which should guarantee that there are no
    crashes due to collisions.

    Many thanks to @kpu for providing the ideas behind these optimizations.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KellenSunderland/incubator-joshua master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-joshua/pull/51.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #51

----
commit 5e9547526ad4bc15f48e665608897def552cb9ab
Author: Kenneth Heafield <git...@kheafield.com>
Date:   2016-09-13T08:58:26Z

    Probably won't compile but gets the idea across

commit 929760a35dda5f88792c44d6eef41f3e58cf7250
Author: Kellen Sunderland <kell...@amazon.com>
Date:   2016-09-13T09:23:42Z

    Merge branch 'master' of https://github.com/KellenSunderland/incubator-joshua

commit 4e07bb66d28e55357ee6b19b3c60a76a31d8dd75
Author: Kellen Sunderland <kell...@amazon.com>
Date:   2016-09-13T10:39:41Z

    Adapted Java side of JNI interface to get state and prob from packed long

commit 0252942dafc1679f2c5d6b8d6da7cd6884ca40c3
Author: Kellen Sunderland <kell...@amazon.com>
Date:   2016-09-13T11:58:05Z

    Manage pool of states on a per LM, per sentence basis

commit 3aa528d3129809bf6a5caefbe95da7dddb655a78
Author: Kellen Sunderland <kell...@amazon.com>
Date:   2016-09-13T14:32:37Z

    Merge branch 'master' of https://github.com/KellenSunderland/incubator-joshua

commit 797f4ff606e9f573038f1e824097ec2d72815c20
Author: Kellen Sunderland <kell...@amazon.com>
Date:   2016-09-13T15:28:51Z

    Make ChartState start at index 1. Fixes bug with state 0 which was
    getting confused for the vocab id 0 aka <unk>. The sign bit
    distinguishes a word from a ChartState id.

    Written by @kpu on Kellen's laptop

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
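
For readers following the probRule change, here is a minimal sketch of how a
state id and a probability might travel in a single 64-bit value. It is an
illustration only, not the code from this pull request: the class name
PackedLmResult, the method names, and the bit layout (state in the upper 32
bits, float bits in the lower 32) are all assumptions. In the pull request the
packing is done in kenlm_wrap.cc and only the unpacking happens in Java; both
halves are written in Java here so the round trip can be run on its own. The
last two helpers sketch one way a sign bit could separate ChartState ids from
vocab ids, assuming ids are encoded by negation. Under that assumption index 0
would negate to 0 and be indistinguishable from vocab id 0.

```java
// Hypothetical illustration; names and bit layout are assumptions,
// not taken from the Joshua or KenLM sources.
class PackedLmResult {

    // Pack a 32-bit state id into the upper half and the raw bits of a
    // float probability into the lower half of one long.
    static long pack(int stateId, float prob) {
        long high = ((long) stateId) << 32;
        // Mask the float bits so a negative int does not sign-extend
        // into the upper half and overwrite the state id.
        long low = Float.floatToRawIntBits(prob) & 0xFFFFFFFFL;
        return high | low;
    }

    // Recover the state id from the upper 32 bits.
    static int unpackState(long packed) {
        return (int) (packed >>> 32);
    }

    // Recover the probability from the lower 32 bits.
    static float unpackProb(long packed) {
        return Float.intBitsToFloat((int) packed);
    }

    // Assumed encoding: a chart state index (starting at 1) is sent as its
    // negation, so the sign bit marks it as a state rather than a word.
    static long encodeChartState(int stateIndex) {
        if (stateIndex < 1) {
            throw new IllegalArgumentException("state index must start at 1");
        }
        return -(long) stateIndex;
    }

    // Any negative id is treated as a chart state; vocab ids are >= 0.
    static boolean isChartState(long id) {
        return id < 0;
    }

    public static void main(String[] args) {
        long packed = pack(7, -2.5f);
        System.out.println("state = " + unpackState(packed));
        System.out.println("prob  = " + unpackProb(packed));
        System.out.println("state 1 is chart state: " + isChartState(encodeChartState(1)));
        System.out.println("vocab 0 is chart state: " + isChartState(0L));
    }
}
```

Returning a primitive long means the native side never has to construct or
look up a Java object, which is the saving the description refers to.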