Sorry for cross posting, but why the word 'Farsi' instead of 'Persian'? No
one says Lucene français or Español, or Deutsch - so why Farsi?
Please read the following article, I found it quite enlightening.
http://www.cais-soas.com/CAIS/Languages/persian_not_farsi.htm
PV
Thanks so much, Mike. Those runtime errors were caused by one corrupted
index, somehow corrupted during scp. It has nothing to do with Lucene 2.3.2.
For those who come by this thread:
Please run "CheckIndex".
That would have saved me many hours of fruitless debugging.
Cheers,
Charlie
Michael M
Yes, there is a many-to-one mapping to the content index. And the size of
the content data varies, say from 1K to multiple Gs. That is why it is not wise
to repeat the same content in an index document.
Thanks for telling me that the doc IDs are not constant. Yes, the keys to
content are generated on the f
Hi Mike:
Well, the problem occurs consistently, but testing the code and the
project requires an Oracle 11g database :(
I don't know why the computation of the bufferUpto variable is wrong in
the last step; during all other calls pool.buffers.length is
consistently 366, so I assume that it's OK. And t
Hi Marcelo,
Hmmm something is not right.
Somehow the byte slices, which DocumentsWriter uses to hold the
postings in RAM, became corrupt.
Is this easily reproduced?
Mike
Marcelo Ochoa wrote:
Hi Lucene experts:
I am working on upgrading the Lucene-Oracle integration project to the latest
Lucene 2.
Hi Lucene experts:
I am working on upgrading the Lucene-Oracle integration project to the latest
Lucene 2.3.1 code.
After correcting a minor issue in the OJVMDirectory file implementation, I
have the integration running with the latest 2.3.1 code.
But it only works with small indexes; I think indexes which are lower
Sure, just include different fields in different docs in your index.
Then, when you search, since each term is on a field, docs without
that field are excluded from the search.
But this is really not very different in terms of a solution than
your earlier one. You still have the issue of searching
My problem is: the [content] value can be huge. Duplicating it in more than
one index document wastes disk space (and search time?). In addition, when
new documents are added to the second index, it will be faster to just index
the linked [content] once (in the first index file) and any subsequent refe
Hi,
Could you run org.apache.lucene.index.CheckIndex on your index and
post the result?
Are these exceptions easily reproduced starting from scratch (new
index)?
More responses/questions below:
crspan wrote:
-- OS: Linux lg99 2.6.5-7.276-smp #1 SMP Fri Sep 28 20:33:22 AKDT
2007 x86
-- OS: Linux lg99 2.6.5-7.276-smp #1 SMP Fri Sep 28 20:33:22 AKDT 2007
x86_64 x86_64 x86_64 GNU/Linux
-- Lucene: 2.3.2 (tried 2.2.0 as well, since the index was built
around 2.2.0, jdk1.6.0_01 )
-- JDK: Sun jdk1.6.0_06 ( from jdk-6u6-linux-x64.bin ) & Sun
jdk1.5.0_15 ( from jdk-1_5_0_
Can you not convert all postcodes to coordinates and do actual distance-based
matching?
You will have to pay Royal Mail or 3rd party suppliers to get hold of the PAF
data required for this geocoding (despite having funded this already as a UK
tax payer- g)
Cheers
Mark
- Original Messa
Maybe I'm oversimplifying it, and maybe this isn't what you desire, but...
What about breaking the postcode into two (or three) different fields? Seems
easy to parse on the ingestion-side, as you just break the string at the
"middle" space. Then store "postal_area", "postal_street", and option
You could split up the field into 2 separate fields:
Postcode:NW10 7NY -> post1:NW10 post2:7NY
Then rewrite users' queries using the same logic: i.e. if they enter 1 term
'NW10' it gets rewritten to post1:NW10, if they enter 2 terms post1:NW10 AND
post2:7NY.
It also lets you do fuzzy search i.e. pos
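
The split-and-rewrite idea above can be sketched in plain Java. This is only a minimal illustration: the class and method names are hypothetical, the field names post1 and post2 are taken from the message, and it assumes the postcode halves are separated by whitespace. It builds query strings only and does not call any Lucene API.

```java
// Hypothetical helper (not part of Lucene): splits a UK postcode at the
// whitespace between its two halves and rewrites user input into a
// fielded query string using the post1/post2 field names from the message.
final class PostcodeQueryRewriter {

    // "NW10 7NY" -> ["NW10", "7NY"]; a single term yields a one-element array.
    static String[] split(String postcode) {
        return postcode.trim().split("\\s+");
    }

    // One term  -> "post1:NW10"
    // Two terms -> "post1:NW10 AND post2:7NY"
    static String rewrite(String userInput) {
        String[] terms = split(userInput);
        if (terms.length == 1 && !terms[0].isEmpty()) {
            return "post1:" + terms[0];
        }
        if (terms.length == 2) {
            return "post1:" + terms[0] + " AND post2:" + terms[1];
        }
        // Assumption: anything else is not a postcode query we know how to rewrite.
        throw new IllegalArgumentException("Expected 1 or 2 terms: " + userInput);
    }
}
```

The same split would be applied at indexing time, so that the two halves end up in the post1 and post2 fields of each document.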
Well, it's the one I'd use. Whether it's the best or not is...er...not so
certain.
Erick
On Tue, May 6, 2008 at 12:37 PM, Kelvin Foo Chuan Lyi <[EMAIL PROTECTED]>
wrote:
> Thanks... that's what I thought of ... but was wondering if that was the
> best method to do so... i guess it is then... :)
Have you looked at PrefixQuery? If that doesn't work for you, could you give
a few more examples of expected inputs and outputs?
Best
Erick
On Tue, May 6, 2008 at 12:28 PM, Chris Mannion <[EMAIL PROTECTED]>
wrote:
> Hi all
>
> I've got a bit of a niggling problem with how one of my searches is
>
No easy way unless you merge your 2 indexes into:
Index: [who]    [accessed]  [key]  [content]
       David    1/1/2007    Abc    "blah blah 123 ..."
       Someone  1/2/2005    Abc    "blah blah 123 ..."
       Guess    12/1/2000   Xyz
You might have a look at using a phrase query when you have more than
one term in the query in addition to your term query, but giving the
phrase query more weight (i.e. give an exact match more weight) and
keep your original tokenization process.
Something like:
"NW10 7NY"^5 OR NW10 OR 7NY
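
A rough sketch of how such a query string might be assembled from user input, in plain Java. The class name and the boost parameter are hypothetical, and the sketch assumes the input contains no query-syntax special characters (it does no escaping). It only produces the string; parsing it is left to whatever query parser is already in use.

```java
// Hypothetical helper (not part of Lucene): builds a query string that
// gives an exact phrase match more weight than the individual terms.
final class BoostedPhraseQueryBuilder {

    static String build(String userInput, int phraseBoost) {
        String[] terms = userInput.trim().split("\\s+");
        if (terms.length < 2) {
            // A single term needs no phrase clause.
            return terms[0];
        }
        StringBuilder sb = new StringBuilder();
        // Quoted phrase with a boost, e.g. "NW10 7NY"^5
        sb.append('"').append(String.join(" ", terms)).append('"')
          .append('^').append(phraseBoost);
        // Followed by each term on its own, OR-ed together.
        for (String term : terms) {
            sb.append(" OR ").append(term);
        }
        return sb.toString();
    }
}
```

Calling build("NW10 7NY", 5) yields the string shown in the message; the boost value 5 is just the one used there, not a recommendation.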
Thanks... that's what I thought of ... but was wondering if that was the
best method to do so... i guess it is then... :)
On Wed, May 7, 2008 at 12:32 AM, Erick Erickson <[EMAIL PROTECTED]>
wrote:
> One of my favorite quotes from Roger Zelazny... "postulating
> infinity, the rest is easy".
>
>
You don't. You really have to roll your own solution here, there's
no "inter-index" awareness that I know of in Lucene.
Typically, people either do a half-half solution (that is, put the
text search in Lucene and leave the DB parts in the DB) or
de-normalize the data in a Lucene index so you don't
One of my favorite quotes from Roger Zelazny... "postulating
infinity, the rest is easy".
In this case, "infinity" is how you break up your query. The easy part is
making your search return what you want.
Assuming you know that you want "greatest" and
"hits" to go against the title field and "bea
Hi all
I've got a bit of a niggling problem with how one of my searches is working
as opposed to how my users would like it to work. We're indexing on UK
postcodes, which are in the format of a 3 or 4 character area code followed
by a 3 or 4 character street specific code, e.g. "NW10 7NY" or "M1
Hi,
I am a newbie to Lucene. I have a question about making a query that associates
2 index files:
- One index has the content index for a list of documents and a key to the
document. That means the Lucene document of this index contains 2 fields:
the 'content' and the 'key'.
- another index
I'm new to lucene and have a question on how to create a query for the
following example... Say I have two fields, Title and Description, with the
following data
Item 1
Title: The greatest hits
Description : Collection of the best music from The Beatles.
Item 2
Title: U2 collections
Description :
Eran,
On Tuesday 06 May 2008 10:15:10, Eran Sevi wrote:
> Hi,
>
> I am looking for a way to filter a SpanQuery according to some other
> query (on another field from the one used for the SpanQuery). I need
> to get access to the spans themselves of course.
> I don't care about the scoring of the
Hi Steven,
I tried the class and it works fine with the locale parameter "ar".
Actually we are using "fa" for Farsi and "ar" for Arabic.
I have added a little control for the locale parameter in my code and now I
can see the correct results.
Thank you very much for your help.
Esra.
Thanks Mike. Sorry, I should have mentioned that I'm using 1.6.0_04. I
happened to look at the thread a while ago and used -Xbatch but that didn't
help, which made me think maybe it's a different issue. I'll try with -Xint
before downgrading to 1.6.0_03 to be doubly sure.
-Gopi
On 5/6/08, Michae
Could you provide more detail on how you hit these two exceptions?
Are they reproducible from scratch (creating a new index)?
Are you using multiple threads against IndexWriter? Is autoCommit
true or false? Any prior exceptions hit? Do your documents have
varying number/configuration
Are you using JRE 1.6.0_04 or 1.6.0_05?
This sounds exactly the same as this:
http://www.gossamer-threads.com/lists/lucene/java-user/59650
If it is the same issue, which seems to be a bug in the hotspot
compiler, downgrading to JRE 1.6.0_03, or running Java with -Xbatch
(forces up-fron
[ Sorry if I'm hijacking this thread, if you feel this error is unrelated to
this thread, I'll move this to a separate thread. ]
Even after upgrading to 2.3.1 I'm running into index corruption problems.
I'm posting below the exception that is generated while searching. The stack
trace looks like,
Hi,
I am looking for a way to filter a SpanQuery according to some other query
(on another field from the one used for the SpanQuery). I need to get access
to the spans themselves of course.
I don't care about the scoring of the filter results and just need the
positions of hits found in the docu