Why not convert the hash into a series of disk directories+file entries?

For example, if the hash is "ABCDEF0123456789" (assuming hex),
create a directory /ABC/DEF/012/345/ and an empty file "6789".

In this way, you let the disk become a primitive lookup hash table.
To do a look-up, you convert the hash to a dir+file path and then
check whether that path exists.
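
A rough sketch in Perl, just to illustrate the idea (the base
directory name and the 16-hex-digit key here are assumptions taken
from the example above, not anything fixed):

#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);

# Base directory for the on-disk hash table (pick any writable path).
my $base = '/var/tmp/diskhash';

# Split a 16-hex-digit hash like "ABCDEF0123456789" into four 3-digit
# directory levels plus a 4-digit file name: /ABC/DEF/012/345/6789
sub hash_to_path {
    my ($hash) = @_;
    my @dirs = (substr($hash, 0, 12) =~ /(.{3})/g);
    my $file = substr($hash, 12);
    return ("$base/" . join('/', @dirs), $file);
}

# Insert: create the directories and touch an empty file.
sub store_key {
    my ($hash) = @_;
    my ($dir, $file) = hash_to_path($hash);
    make_path($dir) unless -d $dir;
    open my $fh, '>>', "$dir/$file" or die "cannot touch $dir/$file: $!";
    close $fh;
}

# Look-up: just test whether the dir+file path exists.
sub key_exists {
    my ($hash) = @_;
    my ($dir, $file) = hash_to_path($hash);
    return -e "$dir/$file";
}

store_key('ABCDEF0123456789');
print key_exists('ABCDEF0123456789') ? "found\n" : "missing\n";

If you need to store a value and not just test membership, you can
write the record into the file instead of leaving it empty.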

Using 3 hex digits per level sets a maximum of 4,096 entries per
directory, which most filesystems can handle well. Plus, if you share
the disk, multiple systems can reference the hash data.


You can do this in any programming language, and a simple disk backup
preserves your hash information in the event of a server crash.

On the downside, this disk lookup will be slow due to the random disk
I/O seeks needed to look up the hashes... an SSD will come in handy
in this situation.

Ambo

--- On Thu, 4/7/11, Ludwig Isaac Lim <[email protected]> wrote:

From: Ludwig Isaac Lim <[email protected]>
Subject: Re: [plug] Disk Hash in Perl
To: [email protected]
Date: Thursday, April 7, 2011, 10:56 AM


Hi Homer:


> Message: 2
> Date: Tue, 29 Mar 2011 18:59:55 +0800
> From: John Homer H Alvero <[email protected]>
> Subject: Re: [plug] Disk Hash in Perl
> To: "Philippine Linux Users' Group (PLUG) Technical Discussion List"
>     <[email protected]>
>
> Hello Ludwig,
> 
> What exactly is the problem you are encountering right now? Is there
> any issue that you want to address? Maybe this issue can be taken from
> a different approach; as the saying goes, there's more than one way to
> skin a cat (poor cat).
> 

    Presently, no problem really. I'm figuring out ways to prevent the perl
program from using a large amount of memory without suffering a large
performance loss. I'd just like to learn clever hacks from folks also :-)



Regards,
Ludwig

> 
> On Tue, Mar 29, 2011 at 6:22 PM, Ludwig Isaac Lim <[email protected]> wrote:
> > Hi Guys:
> >
> >         Here's the scenario:
> >         A program written in perl reads a large file and creates a very
> > large hash (actually a hash of records, a hash of hashes) and then uses
> > that hash in a lookup for other data processing. The hash has about
> > 3,096,000 entries and is growing day by day. Right now the perl process
> > consumes 1 GB of RAM using a 32-bit perl, and there's no problem (i.e.,
> > no out-of-memory error).
> >
> >         Is there a way to put it into a disk hash written in pure perl
> > (no DBMS or tools such as Berkeley DB)? I saw on Google that people
> > recommend DBM::Deep. Does anyone here use DBM::Deep? I'd like to know
> > whether people use DBM::Deep for large disk hashes, and how it performs.
> >
> >         Thanks in advance.
> >
> > Regards,
> > Ludwig
> >
_________________________________________________
Philippine Linux Users' Group (PLUG) Mailing List
http://lists.linux.org.ph/mailman/listinfo/plug
Searchable Archives: http://archives.free.net.ph



      