[algogeeks] Searching In a large file
Given a file containing roughly 300 million social security numbers(9-digit numbers), find a 9-digit number that is not in the file. You have unlimited drive space but only 2megabytes of RAM at your disposal. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
Re: [algogeeks] Searching In a large file
Clearly this is an external sorting question.. Merge sort Can Be Used.. On 10/27/11, shiva@Algo shiv.jays...@gmail.com wrote: Given a file containing roughly 300 million social security numbers(9-digit numbers), find a 9-digit number that is not in the file. You have unlimited drive space but only 2megabytes of RAM at your disposal. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
Re: [algogeeks] Searching In a large file
I've been working on similar things lately. 1st, if the file is unsorted, you'll have to just go through it, it'll be O(n) worst case, which is preferred rather than sorting it. 2nd if it is sorted, read and store the maximum chunk of the file that you can, a program is allotted 4 GB of address space, out of which in reality may be 3 GB is useable. So an array worth 3 GB, which is roughly ( ( 3 * 1024 * 1024 * 1024 ) / 4 ) 805306368 integer elements ( assuming each number fits in 4 bytes ), do a binary search. If not found, go to the next chunk.. etc On Thu, Oct 27, 2011 at 10:17 PM, Prem Krishna Chettri hprem...@gmail.comwrote: Clearly this is an external sorting question.. Merge sort Can Be Used.. On 10/27/11, shiva@Algo shiv.jays...@gmail.com wrote: Given a file containing roughly 300 million social security numbers(9-digit numbers), find a 9-digit number that is not in the file. You have unlimited drive space but only 2megabytes of RAM at your disposal. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en. -- Anup Ghatage -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
Re: [algogeeks] Searching In a large file
read the question guys, only 2 mb of memory... i would perhaps try to find the range of the numbers... say the range is 300 million here... so i would maintain a count array, count[i]+=1 if on traversing the large file i find a number from chunk i, here chunk can be decided suitably, for example ith chunk means all numbers occuring between 1i and 1(i+1)... (chunk size decided upon the size of the RAM)... then suppose each chunk should have say 5000 elements, so each count should have 5000 elements... suppose the jth count variable has less than 5k elements... so on the nest pass, i will maintain a 1 element bit array, and set correcponding bit to 1 if the number is found in the file... after going through the bit array once, we can find the desired number... -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.