[algogeeks] Searching In a large file

2011-10-27 Thread shiva@Algo
Given a file containing roughly 300 million social security
numbers(9-digit numbers), find a 9-digit number that is not in the file.
You have unlimited drive space but only 2megabytes of RAM at your
disposal.

-- 
You received this message because you are subscribed to the Google Groups 
Algorithm Geeks group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.



Re: [algogeeks] Searching In a large file

2011-10-27 Thread Prem Krishna Chettri
Clearly this is an external sorting question..
  Merge sort Can Be Used..

On 10/27/11, shiva@Algo shiv.jays...@gmail.com wrote:
 Given a file containing roughly 300 million social security
 numbers(9-digit numbers), find a 9-digit number that is not in the file.
 You have unlimited drive space but only 2megabytes of RAM at your
 disposal.

 --
 You received this message because you are subscribed to the Google Groups
 Algorithm Geeks group.
 To post to this group, send email to algogeeks@googlegroups.com.
 To unsubscribe from this group, send email to
 algogeeks+unsubscr...@googlegroups.com.
 For more options, visit this group at
 http://groups.google.com/group/algogeeks?hl=en.



-- 
You received this message because you are subscribed to the Google Groups 
Algorithm Geeks group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.



Re: [algogeeks] Searching In a large file

2011-10-27 Thread Anup Ghatage
I've been working on similar things lately.

1st, if the file is unsorted, you'll have to just go through it, it'll be
O(n) worst case, which is preferred rather than sorting it.

2nd if it is sorted, read and store the maximum chunk of the file that you
can, a program is allotted 4 GB of address space, out of which in reality
may be 3 GB is useable. So an array worth 3 GB, which is roughly ( ( 3 *
1024 * 1024 * 1024 ) / 4 ) 805306368 integer elements ( assuming each number
fits in 4 bytes ), do a binary search.
If not found, go to the next chunk.. etc



On Thu, Oct 27, 2011 at 10:17 PM, Prem Krishna Chettri
hprem...@gmail.comwrote:

 Clearly this is an external sorting question..
  Merge sort Can Be Used..

 On 10/27/11, shiva@Algo shiv.jays...@gmail.com wrote:
  Given a file containing roughly 300 million social security
  numbers(9-digit numbers), find a 9-digit number that is not in the file.
  You have unlimited drive space but only 2megabytes of RAM at your
  disposal.
 
  --
  You received this message because you are subscribed to the Google Groups
  Algorithm Geeks group.
  To post to this group, send email to algogeeks@googlegroups.com.
  To unsubscribe from this group, send email to
  algogeeks+unsubscr...@googlegroups.com.
  For more options, visit this group at
  http://groups.google.com/group/algogeeks?hl=en.
 
 

 --
 You received this message because you are subscribed to the Google Groups
 Algorithm Geeks group.
 To post to this group, send email to algogeeks@googlegroups.com.
 To unsubscribe from this group, send email to
 algogeeks+unsubscr...@googlegroups.com.
 For more options, visit this group at
 http://groups.google.com/group/algogeeks?hl=en.




-- 
Anup Ghatage

-- 
You received this message because you are subscribed to the Google Groups 
Algorithm Geeks group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.



Re: [algogeeks] Searching In a large file

2011-10-27 Thread Siddhartha Banerjee
read the question guys, only 2 mb of memory...
i would perhaps try to find the range of the numbers... say the range is 300
million here...
so i would maintain a count array, count[i]+=1 if on traversing the large
file i find a number from chunk i, here chunk can be decided suitably, for
example ith chunk means all numbers occuring between 1i and
1(i+1)... (chunk size decided upon the size of the RAM)... then suppose
each chunk should have say 5000 elements, so each count should have 5000
elements... suppose the jth count variable has less than 5k elements... so
on the nest pass, i will maintain a 1 element bit array, and set
correcponding bit to 1 if the number is found in the file... after going
through the bit array once, we can find the desired number...

-- 
You received this message because you are subscribed to the Google Groups 
Algorithm Geeks group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.