On Oct 10, 7:32 am, Akira <ak...@yayakoshi.net> wrote:
> Out of curiosity did you, in the early stages, create a single
> javascript object with all the words as properties? I.e. one object
> with approx 90,000 properties, one for each entry in JDIC. I tried
> this once in a firefox extension and got an awful amount of
> collisions. It seems the max may have been 64k properties and as you
> approach that the collision rate approaches 100%.
Hi Akira. That was my first approach: each word, in dictionary form, as a property on an object literal. In Chrome, I didn't notice any collisions, though that might simply be because I don't know the language and my ability to spot wrong results is low.

I changed approaches after my friend told me about conjugation, and how many words, as they appear in use, must be transformed back into dictionary form before lookup. With a single flat object, I have to fetch each candidate as a property on the object as many times as the number of characters in the longest word in the dictionary, or until I run out of text under the mouse cursor. For example, copying and pasting from Google News, "島根県出雲市多伎町で出土した、....": even though the longest valid match starting at the first character is only 3 characters, I have to first try 島, then 島根, then 島根県, then 島根県出, and so on. In the actual version of the extension this was done in reverse, so that I could stop after the first valid match, since that would have been the longest (most specific). However, it turns out to be useful to display more than one match, so I ended up having it continue anyway, in order to collect all of the possible matches. This is fast and works well. But when you must also test for conjugations at each step, and then for conjugations on top of each previous conjugation, the search space grows dramatically.

The way it is now, a search tree is constructed. The root is a JavaScript object literal, and it has the first character of every word as a property. The value for each of these characters is any word definitions (or none, if no word is formed by the characters up to this position), plus the next object with all of the possible next characters, and so on. As soon as there are no longer any matches, I can stop searching. This tree is created for both the kanji form of each word and for the hiragana reading, so that colloquial uses are found too. To avoid redundancy, the actual word data is not stored in the tree.
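The search tree can be sketched roughly like this (the names addWord, matchesAt, and the defs property are illustrative, not from the extension itself):

```javascript
// Each node is a plain object literal: single-character properties lead to
// child nodes, and a "defs" property (safe, since keys for characters are
// always length 1) holds any definitions ending at that node.
function addWord(root, word, definition) {
  var node = root;
  for (var i = 0; i < word.length; i++) {
    var ch = word.charAt(i);
    if (!node[ch]) node[ch] = {};
    node = node[ch];
  }
  (node.defs = node.defs || []).push(definition);
}

// Walk the text starting at position 0, collecting every match along the
// way, and stop as soon as the tree has no branch for the next character.
function matchesAt(root, text) {
  var node = root, found = [];
  for (var i = 0; i < text.length; i++) {
    node = node[text.charAt(i)];
    if (!node) break;
    if (node.defs) found.push({ length: i + 1, defs: node.defs });
  }
  return found;
}
```

With this shape, a cursor position over "島根県出雲..." needs only one walk down the tree instead of a separate property fetch per candidate prefix, and the walk terminates the moment no word continues.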
The word data lives in a separate object instead, and the tree merely contains index references into that object. Still, a rather huge number of objects are created, leading to high memory usage. I've minimized the number of objects by consolidating redundant definition texts and munging the definition fields together into a single string with a simple separator character, to avoid needlessly creating objects in memory. This alone saves about 30 MB. The definitions are split back apart into objects after being looked up. There's probably room for further optimization.

On Oct 10, 5:14 pm, edvakf <taka.atsu...@googlemail.com> wrote:
> As a Japanese native speaker, I thought you can probably adjust it so
> that it won't look up one-letter Hiragana & Katakana characters. (but
> maybe the current approach is good for a learner, I don't know)

Hi edvakf. There are definitely a lot of things like that which should be added, such as a minimum valid word size, etc.

> For the dictionary data, I think Web Database would be THE way to go,
> although it's not working properly on Mac yet (openDatabase returns
> null).

That sounds like a good idea, actually. In a long string of valid characters, there tend to be around 20-30 property lookups in the current version. The current "lookup rate" is one lookup every 50 ms as you pan the cursor around, for easy reading (speed was more important than memory usage). SQLite is very fast, but I've never used it for something like this. Would it work? I'd love to be able to use relational data to reduce redundancy, instead of the current methods. I could always just decrease the lookup rate; there's a lot of room for it to slow down, and a lot of memory that it would be nice to regain :) Decreasing the extension's load time would be nice, too, as it takes a few seconds to read the objects into memory when Chrome starts or the extension is installed (though that happens once, in the background, so it's hard to notice).
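The consolidation described above, definitions munged into one string and referenced by index, might look something like this (SEP, defs, and splitDefinition are hypothetical names for illustration; the real extension's layout may differ):

```javascript
// One consolidated string per definition, with the fields joined by a
// separator character that won't occur in dictionary text. Storing a flat
// array of strings avoids keeping ~90,000 small objects alive in memory.
var SEP = '\u0001';
var defs = [
  '島根県' + SEP + 'しまねけん' + SEP + 'Shimane prefecture',
  '出雲'   + SEP + 'いずも'     + SEP + 'Izumo (city)'
];

// The search tree stores only integer indices into "defs"; the definition
// object is rebuilt on demand, after a successful lookup.
function splitDefinition(index) {
  var parts = defs[index].split(SEP);
  return { word: parts[0], reading: parts[1], gloss: parts[2] };
}
```

The object-creation cost is paid only for the handful of words actually under the cursor, rather than for the whole dictionary at load time.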
Since I'm on a Mac and the toolbar stuff doesn't seem to be working yet, I haven't added any preferences and such just yet. I'll have to wait for that to make its way here before trying it, as well as Web Database. Thanks!