Hello all, I've been interested in this topic for some time but have never taken the time to run any tests. I don't have the time now (for sure), but I took some anyway. What I did was grab a huge unique word file, clear out words that are obviously illegal JSON key names and tried doing lookups three ways:
* Sequential Find in array * Binary Find in sorted array * Object lookup For reference, here's a link to the original file full of words: https://github.com/dwyl/english-words/blob/master/words.zip I tried not to bias the tests as what I'd like are useful results. Still, I didn't test a whole lot of different ways and bias is nearly impossible to avoid. Even if my test is totally fair, there's no way that it's complete - results always depend on the data under test. This is part of why algorithms are described using big O terminology. You have a way of talking about the performance envelope around the algorithm under various sorts of conditions. (That's a terrible description, but so be it.) As an example, it's very easy to compare s sequential find in array with a binary search. There are only a couple of cases where sequential is faster, no matter the size of the array. With 4D's object lookups, we just don't know. Even if they are a hash table (likely but not confirmed), this doesn't tell us much. (Hash tables have a whole lot of components in their implementation, some of which can behave in weird ways, depending on your data set+hashing function. It also matters what you use to find actual values, not just hash bins.) Anyway, here are a set of results in a compiled system with ~465,000 words/keys: Words: 466,474 Tests: 10,000 Sequential: 107,777 Binary: 153 Object: 9 The three times are in milliseconds. As in, "Searching for 10,000 different words in an array of 466,474 unique words took a little over 1/10th of a second using a sorted array." That's roughly 1/2 of a blink. (Not kidding.) Comments and take-aways: * Binary search is great. * Object search is great. * Sequential search is not so great, but it still only took about 11 seconds. * I noticed that setting up the sorted array took no time and that setting up the object took time that I could feel. I didn't do timing results on this. But if it's true, the *overall* time (including setup) for the object was *unfavorable.* Conclusion: I'll use objects when I need them and sorted arrays when I need them. The performance difference is too small to be a factor, it will come down to other properties of these data structures. ++ To Justin on the whole 'use the lookup value as the key' tip (Rob has mentioned this too.) I sue that all of the time in objects, it's a really excellent practice. If anyone wants to re-run the tests or check my code for logic errors, dumb errors, bias, etc., here's the code with comments: If (False) // https://github.com/dwyl/english-words/blob/master/words.zip // Imported into a new table in 4D. // Cleared ones starting with numbers or punctuation. // 466,475 words left. End if //---------------------------------------------------------------- // Setup //---------------------------------------------------------------- ALL RECORDS([word]) ORDER BY([word];[word]word) // 4D indexed sort ARRAY TEXT($words_at;0) SELECTION TO ARRAY([word]word;$words_at) // Sorted arraa of 464K+ words C_OBJECT($words_object) C_LONGINT($words_count) C_LONGINT($word_index) $words_object:=JSON Parse("{}") $words_count:=Size of array($words_at) For ($word_index;1;$words_count) C_TEXT($word) $word:=$words_at{$word_index} // {"hello":"HELLO"} - no reason for the lower/upper other than to make it read in the Debugger. OB SET($words_object;Lowercase($word);Uppercase($word)) End for // Let's build an array of random words from the main array of words. C_LONGINT($test_words_count) $test_words_count:=10000 // Note: Cannot be larger than the $words_count // Hmmm. Not getting a good distribution of indexes from Random. // Instead, I'll pick words from different positions along the array. ARRAY TEXT($test_words_at;$test_words_count) $test_words_at{1}:=$words_at{1} // Best case for a sequential scan $test_words_at{2}:=$words_at{$words_count} // Worst case for a sequential scan C_LONGINT($interval) // Now we want to fill in the rest of the test array. // The selected words are grabbed from even intervals along the array. // The speed difference for a sequential search should be linear. // The speed difference for a binary search should be very small amongst words. // It should take up to about ~18 reads to find the word. // The speed difference for the object? No clue, we don't know how they're implemented. // With a fast hash and a large hash table, it could be very quick. Hard to say. $interval:=$words_count\$test_words_count C_LONGINT($test_word_index) For ($test_word_index;3;$test_words_count) // Start at 3 because we just filled in 1 & 2 by hand. C_LONGINT($word_index) $word_index:=$interval*$test_word_index $test_words_at{$test_word_index}:=$words_at{$word_index} End for //---------------------------------------------------------------- // Tests //---------------------------------------------------------------- // And we're good to go. //--------------------------- // Sequential search in array //--------------------------- C_LONGINT($sequential_start) $sequential_start:=Milliseconds C_LONGINT($test_word_index) For ($test_word_index;1;$test_words_count) C_TEXT($word) C_LONGINT($match_index) $word:=$test_words_at{$test_word_index} $match_index:=Find in array($words_at;$word) End for C_LONGINT($sequential_elapsed) $sequential_elapsed:=Milliseconds-$sequential_start //--------------------------- // Binary search in array //--------------------------- C_LONGINT($binary_start) $binary_start:=Milliseconds C_LONGINT($test_word_index) For ($test_word_index;1;$test_words_count) C_TEXT($word) C_LONGINT($match_index) C_BOOLEAN($found) $word:=$test_words_at{$test_word_index} $found:=Find in sorted array($words_at;$word;>;$match_index) End for C_LONGINT($binary_elapsed) $binary_elapsed:=Milliseconds-$binary_start //--------------------------- // Object lookup //--------------------------- C_LONGINT($object_start) $object_start:=Milliseconds C_LONGINT($test_word_index) For ($test_word_index;1;$test_words_count) C_TEXT($word) C_TEXT($word_returned) $word:=$test_words_at{$test_word_index} $word_returned:=OB Get($words_object;$word) End for C_LONGINT($object_elapsed) $object_elapsed:=Milliseconds-$object_start //--------------------------- // Results summary //--------------------------- C_TEXT($tab) C_TEXT($cr) $tab:=Char(Tab) $cr:=Char(Carriage return) C_TEXT($results_text) $results_text:="" $results_text:=$results_text+"Words:"+$tab+String($words_count;"###,###,###,##0")+$cr $results_text:=$results_text+"Tests:"+$tab+String($test_words_count;"###,###,###,##0")+$cr $results_text:=$results_text+"Sequential:"+$tab+String($sequential_elapsed;"###,###,###,##0")+$cr $results_text:=$results_text+"Binary:"+$tab+String($binary_elapsed;"###,###,###,##0")+$cr $results_text:=$results_text+"Object:"+$tab+String($object_elapsed;"###,###,###,##0")+$cr SET TEXT TO PASTEBOARD($results_text) ALERT($results_text) ********************************************************************** 4D Internet Users Group (4D iNUG) FAQ: http://lists.4d.com/faqnug.html Archive: http://lists.4d.com/archives.html Options: http://lists.4d.com/mailman/options/4d_tech Unsub: mailto:4d_tech-unsubscr...@lists.4d.com **********************************************************************