Good questions. Didn't realize anyone would think it through so much :-). Answers are inline. I appreciate your help.
Pradeep

--------

Are you going to process the strings-and-associated-integers all at once in a run-it-once task, then distribute an on-disk rendition of the info for access later? Or does the data structure need to be one that's able to be updated on the fly as the strings come in (possibly while the structure is queried to retrieve data), and it will eventually have to deal with 100 million string-and-ints items?

Ans: The program is an analytics application. The data is fetched just once from a database and indexed. There is no need to fetch the data again until the user explicitly requests it.

What does it mean, in your case, to "index...strings...associated with a list of integers"? What do you need to be able to do after the strings have been indexed?
- display (or print) the strings in alpha order (with or without their associated integers)
- count # occurrences of each distinct string (or are there 100 million distinct strings?)
- get associated integers based on exact match only, or case-insensitive match, or either
- search for individual words or phrases (case-sensitively?) within the strings
- find which strings are associated with some particular integer value

Ans: The users will specify regular expressions to search for strings.

Can there be duplicate strings that have different associated sets of integers, and you need to be able to get all the integer_sets for a particular string?
Ans: No

What range are the integers (16-bit, 32-bit; are they signed)?
Ans: Unsigned, 32-bit.

How many integers are associated with the typical string?
Ans: Depends on the input data.

(Do you need variable-length storage of the groups of integers, or can you decide you'll store e.g. 8 integers for each string and have that be a rational approach?)
Ans: Variable length.

Does the amount of disk space used for the sets of integers matter much?
Ans: Not really.

Multiple tasks / threads querying the data at the same time?
Ans: Yes

Queries coming in while updates take place?
Ans: No

===================================
This list is hosted by DevelopMentor® http://www.develop.com
View archives and manage your subscription(s) at http://discuss.develop.com
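For illustration, here is a minimal Python sketch of a structure that fits the answers above: built once from the fetched data, one variable-length set of unsigned 32-bit integers per distinct string, regex search over the strings, and concurrent read-only queries with no updates in flight. It assumes everything fits in memory. The class `StringIndex` and its `get`/`search` methods are hypothetical names, not anything from the thread or a library.

```python
import re
from array import array
from typing import Dict, Iterable, Iterator, List, Tuple

# Prefer typecode "I" (4 bytes on mainstream platforms); fall back to "L",
# which is guaranteed to be at least 4 bytes wide.
_U32 = "I" if array("I").itemsize == 4 else "L"
_U32_MAX = 0xFFFFFFFF


class StringIndex:
    """Hypothetical build-once, read-only index: string -> list of uint32."""

    def __init__(self, items: Iterable[Tuple[str, Iterable[int]]]) -> None:
        table: Dict[str, array] = {}
        for key, ints in items:
            if key in table:
                # Per the thread, each string has exactly one integer set.
                raise ValueError(f"duplicate string: {key!r}")
            values = list(ints)
            for v in values:
                if not 0 <= v <= _U32_MAX:
                    raise ValueError(f"{v} is not an unsigned 32-bit integer")
            # Variable-length storage: each string keeps exactly its own ints.
            table[key] = array(_U32, values)
        # Nothing mutates these after construction, so several threads can
        # query one instance concurrently without locking.
        self._table = table
        self._keys = sorted(table)  # alphabetical order for scans and display

    def get(self, key: str) -> List[int]:
        """Exact-match lookup; raises KeyError if the string is absent."""
        return self._table[key].tolist()

    def search(self, pattern: str) -> Iterator[Tuple[str, List[int]]]:
        """Yield (string, ints) for every string the regex matches anywhere."""
        regex = re.compile(pattern)
        for key in self._keys:
            if regex.search(key):
                yield key, self._table[key].tolist()


if __name__ == "__main__":
    idx = StringIndex([("apple pie", [1, 2, 3]), ("banana", [42]), ("apple tart", [7])])
    print([k for k, _ in idx.search(r"^apple")])  # ['apple pie', 'apple tart']
    print(idx.get("banana"))  # [42]
```

Because `search` scans every key, a regex query over 100 million strings touches every entry. If that proves too slow, common refinements (not shown) are narrowing the scan with `bisect` on the sorted keys when the pattern begins with a literal prefix, or prefiltering candidates with a trigram index. When the user explicitly asks for a refetch, one option is to build a fresh `StringIndex` and swap the reference, so readers never see a half-updated structure. Note also that 100 million CPython `str` objects in a dict carry substantial per-object overhead; the sketch shows the access pattern, not a memory-tuned layout.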