Re: best way to index numerical data ?

2006-03-31 Thread Liu Jin
> "Jack" == Jack  <[EMAIL PROTECTED]> writes:
> Hi, I have a lot of data in a TEXT file that consists of numbers.
> Does anyone have a good suggestion for indexing TEXT numbers
> (zip codes, other codes, dollar amounts, quantities, etc.), since
> Lucene and other indexers are really optimized for alphabetic
> character indexing? What approaches are typically taken in
> computer science to index text numbers: hash maps or
> something else?

Lucene is not optimized for alphabetic characters. It is optimized for
natural-language text. The assumption is that the dictionary of
distinct terms is relatively small (say, <1M words for English) and
doesn't grow linearly with the amount of text being indexed. If your
data fits this model, Lucene can index it efficiently, no matter what
the characters are.
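
To make the "dictionary" point concrete, here is a minimal sketch of an
inverted index in plain Python. It is not Lucene's implementation; the
function name build_inverted_index and the three sample documents are
made up for illustration. Numeric tokens are treated as opaque terms,
exactly like words, and the number of keys in the index is the
dictionary size referred to above.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each whitespace-separated token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.split():
            index[token].add(doc_id)
    return index

# Hypothetical documents: numbers are indexed as ordinary terms.
docs = [
    "zip 35136 status 72",
    "zip 35136 status 84",
    "zip 89005 status 72",
]
index = build_inverted_index(docs)

print(sorted(index["35136"]))  # ids of the documents containing the token "35136"
print(len(index))              # number of distinct terms, i.e. the dictionary size
```

If a field holds a limited set of codes, the dictionary stays small as
more records arrive. If every record carries a unique number, the
dictionary grows with the data, which is the case the model above does
not cover.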

Regards,
Liu Jin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: best way to index numerical data ?

2006-03-31 Thread benwbrewster
I want to search for whole numbers. Fuzzy search would be nice too, if
possible, but it is not mandatory. Here is a sample line from the .txt
file:
1975|Y|35136|72|1927|||3|005503|003|19870301|19950301|14416887|151|2301|100039292|N|84|F|50||10|A|100|Y|037|Y|89005|3042|M|S|P|

Thanks!
Jack
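
One possible approach to the request above, sketched in Python under
stated assumptions: each line is a pipe-delimited record, every
non-empty field is indexed as a whole string in a dict (a hash map),
and fuzzy lookup is approximated with the standard library's
difflib.get_close_matches. The helper names build_index, exact and
fuzzy, and the cutoff value, are hypothetical choices, not something
proposed in the thread.

```python
import difflib
from collections import defaultdict

# The sample record from the post above.
SAMPLE = ("1975|Y|35136|72|1927|||3|005503|003|19870301|19950301|14416887|"
          "151|2301|100039292|N|84|F|50||10|A|100|Y|037|Y|89005|3042|M|S|P|")

def build_index(lines):
    """Map each non-empty field value to the (line_no, column) pairs where it occurs."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for col, value in enumerate(line.rstrip("\n").split("|")):
            if value:
                index[value].add((line_no, col))
    return index

def exact(index, value):
    """Return the sorted positions of an exact whole-field match (empty if none)."""
    return sorted(index.get(value, ()))

def fuzzy(index, value, n=3, cutoff=0.7):
    """Return up to n indexed values that are textually similar to value."""
    return difflib.get_close_matches(value, list(index), n=n, cutoff=cutoff)

index = build_index([SAMPLE])
print(exact(index, "35136"))  # positions of the field "35136"
print(fuzzy(index, "35137"))  # indexed values close to the mistyped "35137"
```

Values are kept as strings so that leading zeros (005503, 037) survive.
The fuzzy lookup compares the query against every key, so it is only
practical while the number of distinct values is modest.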



Re: best way to index numerical data ?

2006-03-31 Thread Paddy
What do you want to search for in the file?
How big is the file?
What format is the data in the file?

- Paddy.



best way to index numerical data ?

2006-03-31 Thread Jack
Hi, I have a lot of data in a TEXT file that consists of numbers. Does
anyone have a good suggestion for indexing TEXT numbers (zip codes,
other codes, dollar amounts, quantities, etc.), since Lucene and other
indexers are really optimized for alphabetic character indexing? What
approaches are typically taken in computer science to index text
numbers: hash maps or something else?

Thanks,

Jack
