[Ferret-talk] Using ferret as a base64-encoded numerical db

Kelly Jones Mon, 06 Jul 2009 07:15:48 -0700

I'm using Ferret to store random base64 strings of length 72 (generated
with "dd if=/dev/random ... | mmencode"), with the long-term goal of
storing floating-point and integer numbers (converted to
base64). Problems:


 % Ferret treats the base64 characters "+" and "/" as word
 separators, so a search for "content:[xji xjj]" returns things like
 "FqWu9uXM99HXZEJMl0Ux/jdOSP0+XJiL9v1ZDK24D0LMp60PUMPdhkbnFQykVMfilxecQFU6",
 where "XJi" follows a plus sign and is matched as a separate word. How
 do I avoid this? I could change "+" to "_", but I'm not sure whether
 changing "/" to ".", ":", "-", or "!" would work.

 % Ferret's default search is case-insensitive, so I get things like
 "xJiQf0PEagWJME9Tf5pFu6dk4UGGFw5Lc0PIfa9N70Mb2IG2IWO36VCsC0y7Q1zOrLjk2Lz4",
 which starts with "xJi" rather than "xji". How do I make the search
 case-sensitive?

 % When I do a range query, does Ferret return *all* documents
 matching the query or only the 10 highest-scoring ones? For my
 purposes, I need *all* documents matching a query, not just the first
 few.
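
The first two problems can be reproduced outside Ferret with a small
Ruby sketch. It assumes an analyzer that lowercases text and splits it
on every non-alphanumeric character, which approximates the behavior
described above but is not Ferret's actual tokenizer. The helper names
simple_tokens and in_range? are hypothetical.

```ruby
# Hypothetical stand-in for an analyzer: lowercase the text, then split on
# anything that is not a letter or digit. This is NOT Ferret's real tokenizer.
def simple_tokens(text)
  text.downcase.split(/[^a-z0-9]+/).reject(&:empty?)
end

# Inclusive range test using Ruby's bytewise (case-sensitive) String comparison.
def in_range?(term, lo, hi)
  term >= lo && term <= hi
end

# The two example strings quoted in the post.
plus_hit = "FqWu9uXM99HXZEJMl0Ux/jdOSP0+XJiL9v1ZDK24D0LMp60PUMPdhkbnFQykVMfilxecQFU6"
case_hit = "xJiQf0PEagWJME9Tf5pFu6dk4UGGFw5Lc0PIfa9N70Mb2IG2IWO36VCsC0y7Q1zOrLjk2Lz4"

# Tokenized and lowercased: any token inside [xji, xjj] makes the document match.
tokenized_matches = [plus_hit, case_hit].select do |doc|
  simple_tokens(doc).any? { |token| in_range?(token, "xji", "xjj") }
end

# Whole string kept as a single case-sensitive term: neither document matches.
whole_term_matches = [plus_hit, case_hit].select do |doc|
  in_range?(doc, "xji", "xjj")
end
```

Under these assumptions both strings match the tokenized query and
neither matches the whole-term query, which suggests the fix is to
index each string as a single, case-sensitive term rather than to pick
different separator characters.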

Is anyone else using ferret as a db? Since it's hash-based, it's much
faster at indexing large numbers of strings than sqlite3.

I realize I could just 0-pad my numbers (e.g., "000005" for 5), but I've
got a LOT of data (400M pairs of floating-point numbers), so I prefer
compactness.
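
One caveat for range queries: the standard base64 alphabet is not in
ASCII order ("+" and "/" sort before the digits, which sort before the
letters), so comparing base64 strings bytewise does not follow
numerical order. A minimal sketch of a compact alternative to decimal
zero-padding follows. It is not something the post proposes: it
encodes non-negative integers at a fixed width in a 62-character
alphabet listed in ascending ASCII order. The name encode_sortable is
hypothetical, and floating-point values are not handled.

```ruby
# 62 characters in ascending ASCII order: digits, upper case, lower case.
# No "+" or "/", so there is nothing for a tokenizer to split on.
ALPHABET = [*('0'..'9'), *('A'..'Z'), *('a'..'z')].join.freeze

# Encode a non-negative integer as exactly `width` characters.
# Fixed width plus an ASCII-ordered alphabet means that bytewise string
# order equals numerical order.
def encode_sortable(n, width)
  raise ArgumentError, "negative numbers are not supported" if n < 0
  digits = []
  width.times do
    digits.unshift(ALPHABET[n % ALPHABET.length])
    n /= ALPHABET.length
  end
  raise ArgumentError, "value does not fit in #{width} characters" unless n.zero?
  digits.join
end

numbers = [5, 61, 62, 1000, 3843]
encoded = numbers.map { |n| encode_sortable(n, 4) }
```

Four characters cover values up to 62**4 - 1 = 14776335, compared with
999999 for six decimal digits. The result only behaves as intended if
each string is indexed as one case-sensitive term.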

-- 
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
