date:20040430

potential synchronization problem

2004-04-30 Thread Sebastian Ho

Hi

I foresee the following scenario in my project and hope to get a reply to
this before I start coding:

I have a standalone application which runs Lucene indexing in the
background at a user-specified interval (e.g. every 2 days). In the
meantime, the user will be able to force an indexing operation any time he
wishes. I assume this will cause two Lucene processes to write to the
same index files (one from the background run and the other one started by
the user). Will this cause any problems with race conditions or
synchronization?

Thanks

Sebastian Ho
BII


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Problems From the Word Go

2004-04-30 Thread Marten Senkel


Hi Alex,

I installed Lucene just one week ago on a W2K box, and it took me some time to get it
running. But now Lucene is fully integrated in our intranet. With Lucene I index and
search our menu, all documents and the user profiles, and display them according to
each user's access rights.

By habit I always try to compile the sources of such 'external modules' inside our
projects in order to see at once whether things are incompatible or missing. Lucene
did actually compile very well, as it does not depend on any other modules. At
runtime, however, I always got ClassNotFound exceptions for classes which had been
compiled and which existed at the needed locations. This is really strange.
I first upgraded to 1.4.2 and then installed the Enterprise Edition.

I tried many other things, but the only way to get things working was to list the JARs
in the compile classpath of my IDE (for you, including them in the environment
classpath may work) and to list the same JARs in the runtime classpath of my
application server.

This way things worked fine.

I had more trouble getting PDFBox to run in order to index PDFs, as it would never
compile: PDFBox depended on Ant, Ant on BSF, and so on.
In order to get that running I just placed the following JARs in my IDE's compile
classpath and ran the program.
IDE: lucene-1.3-final.jar;lucene-demos-1.3-final.jar;pdfbox-0.6.5.jar
At runtime I was required to place some more JARs in the runtime classpath of the
application server:
Appserv: lucene-1.3-final.jar;lucene-demos-1.3-final.jar;PDFBox-0.6.5.jar;log4j-1.2.8.jar

If you need to run Ant, then just add the Ant JAR to your classpaths and go ahead!
You'll then see at compile time and run time which classes are still needed, and then
it's up to you to find and download them.

Let me know if this helps you advance in the matter.

At the beginning it was a bit hard to get started, but once the demo is running one
progresses really fast. Lucene is really great!

-Marten



   
   
From: Alex Wybraniec [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: 2004-04-29 17:53
Subject: Problems From the Word Go
Please respond to: Lucene Users List




I'm sorry if this is not the correct place to post this, but I'm very
confused, and getting towards the end of my tether.

I need to install/compile and run Lucene on a Windows XP Pro based machine,
running J2SE 1.4.2, with ANT.

I downloaded both the source code and the pre-compiled versions, and as yet
have not been able to get either running. I've been through the
documentation, and still I can find little to help me set it up properly.

All I want to do (to start with) is compile and run the demo version.

I'm sorry to ask such a newbie question, but I'm really stuck.

So if anyone can point me to an idiot's guide, or offer me some help, I would
be most grateful.

Once I get past this stage, I'll have all sorts of juicier questions for you,
but at the minute, I can't even get past stage 1.

Thank you in advance
Alex

Re: potential synchronization problem

2004-04-30 Thread Otis Gospodnetic

Yes.
I suggest you devise an 'index request queue' mechanism to handle
situations like this.  This can probably be made quite generic (i.e.
neither Lucene-specific nor indexing-specific).  How you go about implementing
this is up to you.

Otis
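
A minimal sketch of such an 'index request queue', offered as one possible reading of the suggestion above rather than anything posted in this thread. It assumes both the scheduled run and the user-forced run happen inside the same JVM and can be expressed as one Runnable; the class name IndexRequestQueue and its methods are hypothetical. A single worker thread guarantees that two indexing runs never overlap, and a request that arrives while another is still waiting is dropped as redundant.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: serializes indexing runs so only one is active at a time. */
class IndexRequestQueue {
    // One worker thread means queued runs execute strictly one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // True while a run is queued but has not started yet.
    private final AtomicBoolean pending = new AtomicBoolean(false);
    private final Runnable indexJob;

    IndexRequestQueue(Runnable indexJob) {
        this.indexJob = indexJob;
    }

    /** Returns true if a run was queued, false if one was already waiting. */
    boolean requestIndexing() {
        if (!pending.compareAndSet(false, true)) {
            return false; // an equivalent run is already waiting in the queue
        }
        worker.execute(() -> {
            pending.set(false); // requests arriving from now on queue a fresh run
            indexJob.run();
        });
        return true;
    }

    /** Stops accepting requests and waits for queued runs; true if all finished in time. */
    boolean shutdownAndWait(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Both the background timer and the "index now" button would call requestIndexing(); neither touches the index directly. This only helps within one process. If the two writers really are separate processes, some cross-process coordination would still be needed.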



Re: read only file system

2004-04-30 Thread Otis Gospodnetic

If you have a very recent Lucene, then you can disable locks with
command line parameters.  I believe a page describing various command
line parameters is on Lucene's Wiki.

Otis
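
To the question of how such a parameter is passed: a flag given to the JVM as -Dname=value becomes a system property that code can read at startup. The sketch below only demonstrates that mechanism. The property name disableLuceneLocks is taken from the question in this thread; whether a given Lucene version honours it, and under exactly that name, is an assumption to check against the Wiki page mentioned above. The class LockFlagCheck is hypothetical.

```java
/** Shows how a -D command-line flag surfaces as a JVM system property. */
class LockFlagCheck {
    // Assumed property name, copied from the question in this thread.
    static final String PROPERTY = "disableLuceneLocks";

    /** True only when the property is set to "true" (case-insensitive). */
    static boolean locksDisabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        // Run as: java -DdisableLuceneLocks=true LockFlagCheck
        System.out.println(PROPERTY + " = " + locksDisabled());
    }
}
```

In a web application the same -D flag would go into the servlet container's JVM options rather than onto a hand-typed java command line.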

--- Supun Edirisinghe [EMAIL PROTECTED] wrote:
 I think I'm a little confused about how an index is put into use on a
 read-only file system.
 
 I'm using Lucene in my web application. Our indexes are built off our
 database nightly and copied onto our web app servers.
 
 I think our web app dies from time to time, and sometimes a lock is
 left behind by Lucene in /tmp/.
 
 I have read that there is a disableLuceneLocks system property (is
 that the full name, or is it something like
 org.apache.jakarta...disableLuceneLocks?). But I'm still not sure
 how I can set that. Do I give it as a command-line argument to the Java VM?
 
 thanks
 
 

Re: [Lucene] XML Indexing

2004-04-30 Thread Erik Hatcher

On Apr 29, 2004, at 11:29 PM, Samuel Tang wrote:
I've fixed the problem by myself. Thank you.

What was the solution?  Choosing a different analyzer?

I've recently been doing some work on Chinese analysis:

	http://www.blogscene.org/erik/LuceneInAction/i18n.html

But not within the context of XML.  There are obviously many variables 
in the equation (XML file encoding, the analyzer, and more).

	Erik

Samuel Tang [EMAIL PROTECTED] wrote:
Any comments and suggestions? Please help!

Note: forwarded message attached.

Date: Wed, 28 Apr 2004 23:39:30 +0800 (CST)
From: Samuel Tang
Subject: [Lucene] XML Indexing
To: [EMAIL PROTECTED]

XMLIndexingDemo does not seem able to index traditional Chinese 
characters. I can only search for English text and not Chinese. In 
fact, my XML document contains both Chinese and English text. How can 
I fix this problem? Is it necessary for me to convert the Chinese 
characters from BIG5 to UTF-8 before indexing the file? If it is, 
then how can we do it? This problem doesn't happen when indexing bilingual 
HTML files (Chinese & English) with the Lucene demo HTML parser.
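
On the BIG5 question, a hedged sketch of the general Java mechanism, not the fix Samuel actually used (he never says what it was). Java strings are Unicode internally, so the usual approach is to decode the bytes with the right charset when reading, rather than to rewrite the file as UTF-8 first. The class Big5Text and method readAll are hypothetical names, and the sketch assumes the JVM ships the Big5 charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class Big5Text {
    /** Decodes a byte stream in the named charset into a Java (Unicode) String. */
    static String readAll(InputStream in, String charsetName) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n); // characters are already Unicode here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

Calling readAll(stream, "Big5") yields text that can be handed to an analyzer like any other String. For XML specifically, a parser normally picks the encoding from the XML declaration, so a wrong or missing declaration is one of the variables worth checking, along with the analyzer.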

Re: Problems From the Word Go

2004-04-30 Thread Erik Hatcher

Unfortunately the demo that comes with Lucene is harder to run than it 
really should be.  My suggestion is to just get the Lucene JAR, and try 
out examples from the many articles available.  My intro Lucene article 
at java.net should be easy to get up and running in only a few minutes 
of having the JAR (and basic Java know-how with classpath and such).

	Erik


Re: Problems From the Word Go

2004-04-30 Thread Terry Steichen

Erik,

Maybe you could donate some of those demo modules (and the accompanying
article/text) to Lucene, so they'd be incorporated officially in the
website?

Regards,

Terry


Re: Problems From the Word Go

2004-04-30 Thread Erik Hatcher

On Apr 30, 2004, at 8:52 AM, Terry Steichen wrote:
Erik,

Maybe you could donate some of those demo modules (and the accompanying
article/text) to Lucene, so they'd be incorporated officially in the
website?

Sure... and in fact that has been my intention all along.  One idea 
that I had with the Lucene book effort was to build a complete 
Searchblox-like (no offense guys!) application that could be used as a 
real intranet search system.  It turned out that this was too bold an 
idea to develop for the book: it is only marginally useful as a source 
of book examples, since it could not demonstrate all the various bells 
and whistles without being contrived.

I have no problem with any code I've done for the articles or the book 
becoming part of Lucene as examples.  It will be a couple of months 
before my plate is clear enough to package it up nicely enough though, 
so for now the articles will have to suffice.

Rest assured, though, that my intention is to eventually flesh out a 
really nice example web application that is easily usable.  The current 
example app is usable now, it just takes jumping through some odd 
hoops to get running unfortunately.

By the way, my JavaDevWithAnt project has been freely available for 
quite some time now, and in its current form it is an easy-to-build web 
app that searches a Lucene index of a snapshot of Ant's documentation 
(the HTML files).  You can grab it here:

	http://www.ehatchersolutions.com/JavaDevWithAnt

Eventually I'll beef that application up with new Ant 1.6 best 
practices and replace Struts with Tapestry.

	Erik


Re: Understanding Boolean Queries

2004-04-30 Thread Gerard Sychay

FWIW, I'll relate a general note from my brief experience.  I try to
structure the index to avoid the need for boolean queries as much as
possible, in order to avoid issues like yours.

For example, I was indexing dozens of columns from a database table. 
Each database row was a document, each column a field.  In order to
query all these fields, I had to construct huge boolean queries (via
MultiFieldQueryParser).  It was too slow.  After browsing some of the mail
archives, I realized the proper way to do this was to combine all the
columns into one field, and then add a second stored field with the name
of the column.  Now I had only one searchable field, and the queries
sped up dramatically.
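
A rough sketch of the "combine all the columns into one field" step only, with the database row modelled as a plain map. CatchAllField and combine are hypothetical names, and the Lucene side (adding the result to a document as one indexed field) is deliberately left out.

```java
import java.util.Map;

class CatchAllField {
    /** Joins all non-empty column values of one row into a single searchable text. */
    static String combine(Map<String, String> row) {
        StringBuilder all = new StringBuilder();
        for (String value : row.values()) {
            if (value == null || value.trim().isEmpty()) {
                continue; // skip NULL or blank columns
            }
            if (all.length() > 0) {
                all.append(' '); // keep a token boundary between columns
            }
            all.append(value.trim());
        }
        return all.toString();
    }
}
```

Searching one such field needs a single query clause, instead of one clause per column as with the multi-field approach.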

Tate Avery [EMAIL PROTECTED] wrote on 04/29/04 12:12PM:
Hello,

I have been reviewing some of the code related to boolean queries and I
wanted to see if my understanding is approximately correct regarding how
they are handled and, more importantly, the limitations.





Re: Count for a keyword occurance in a file

2004-04-30 Thread Gerard Sychay

I had the same need recently.  Specifically, I wanted the ability to
display along with the results something like:

- The query "jra" occurred 1000 times in 600 documents.

For simple queries, the IndexReader.docFreq(Term) and
IndexReader.termDocs(Term) methods are the way to go.  But for
phrases like:

- The query "juvenile arthritis" occurred 100 times in 20 documents.

and wildcard queries (rheum*):

- The query "rheumatology" occurred 10 times in 5 documents.
- The query "rheumatoid" occurred 10 times in 5 documents.
- The query "rheumatic" occurred 10 times in 5 documents.

I had to do quite a bit more.  I ended up modifying all of the Query
classes and writing a Frequencies class. If you're interested, mail me
directly.

BTW, I joined the list only recently.  Lucene is GREAT!

Ype Kingma [EMAIL PROTECTED] wrote on 04/29/04 02:56AM:
On Thursday 29 April 2004 08:14, Nader S. Henein wrote:
 Tricky, scoring has to do with the frequency of the occurrence of the word
 as opposed to the amount of words in the file in general (somebody correct
 me if I'm wrong), so short of an educated approximation, you could hack

Lucene uses two frequencies for a term: the number of docs in which it occurs
in an index (the basis for IDF), and the number of times a term occurs in a
document.

 the indexer to dynamically store the frequency of a word (oh so
 inadvisable). Personally I recommend the educated approximation, because
 you could index the document with the number of words in it (you would
 have to make sure you're not using Stop Word Analyzer or Porter Stemmer) and
 then, based on the score, reverse engineer the result you want.

 Nader Henein

 -Original Message-
 From: hemal bhatt [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, April 28, 2004 5:50 PM
 To: Lucene Users List
 Subject: Count for a keyword occurance in a file

 Hi,

 How can I get a count of the score given by Hits.Score().
 i.e. I want to know how many times a keyword occurs in a file. Any help on
 this would be appreciated.

The easiest way is to use IndexReader. I don't know what you mean by file
(index or document), but you can get both frequencies I mentioned above
from an IndexReader, possibly using skipTo() to go to the document.
The methods are docFreq(Term) and termDocs(Term).

Regards,
Ype
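
To make the two frequencies concrete, here is a small stand-alone illustration. It is plain Java over already-tokenized documents, not Lucene code, and the class TermCounts is hypothetical. It computes the same two numbers described above: how many documents contain a term, and how often the term occurs in total.

```java
import java.util.List;
import java.util.Map;

class TermCounts {
    /** Number of documents that contain the term at least once. */
    static int docFreq(Map<String, List<String>> docs, String term) {
        int count = 0;
        for (List<String> tokens : docs.values()) {
            if (tokens.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** Total number of occurrences of the term across all documents. */
    static int totalFreq(Map<String, List<String>> docs, String term) {
        int total = 0;
        for (List<String> tokens : docs.values()) {
            for (String token : tokens) {
                if (token.equals(term)) {
                    total++;
                }
            }
        }
        return total;
    }
}
```

A line such as 'The query "jra" occurred 1000 times in 600 documents' is then totalFreq followed by docFreq. An index reader answers the same questions from its term dictionary without rescanning the text.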




Zilverline webapplication

2004-04-30 Thread info

All,

For those that are interested, I've created a web application based on 
Lucene that's ready to roll, and can simply be dropped into a servlet 
engine. It runs out of the box, handling PDF, Word, HTML and TXT, and can (on 
Windows for now) index ZIP, RAR and CHM.  I've just put up a website 
for it, and will release the source under the GPL later, if people are interested.

Please take a look at zilverline.org, and have a swing at the WAR.

cheers,

  Michael Franken


Which Field contained the hit?

2004-04-30 Thread Szakács Botond



Hi,

I have an index which contains documents with multiple fields. I'm searching in
several fields. If I find a document matching my search criteria, is it possible
to determine which field(s) contained the matching values? Those fields
include large documents, so not all of them are stored in the index.

Thanks in advance,
Botond. 



Re: Which Field contained the hit?

2004-04-30 Thread Erik Hatcher

Have a look at IndexSearcher.explain and see if that gives you the info 
you want.  Also, the Highlighter code may be able to help here as well 
(it is in the sandbox).

	Erik


RE: read only file system

2004-04-30 Thread Nader Henein

I hate to speak after Otis, but the way we deal with this is by clearing
locks on server restart, in case a server crash occurred mid-indexing. We
also optimize on server restart. It doesn't happen often (God bless Resin),
but when it has, we faced no problems from Lucene.

Just for the record, we have a validate function that LuceneInit calls. It
looks something like this:

try {
    Directory directory = FSDirectory.getDirectory(indexPath, false);
    // clear() is defined elsewhere (not shown)
    if (directory.list().length == 0) clear();
    Lock writeLock = directory.makeLock(writeFileName);
    if (!writeLock.obtain()) {
        // could not obtain the lock: treat it as stale and force-unlock
        IndexReader.unlock(directory);
    } else {
        // the lock was free; release the one we just took
        writeLock.release();
    }
} catch (IOException e) {
    logger.error("Index Validate", e);
}


Nader 


RE: Disappearing segments

2004-04-30 Thread Nader Henein

Could you share your indexing code? And, just to make sure, is there
anything running on your machine that could delete these files, like a
cron job that backs up the index?

You could go by process of elimination: shut down your server and see if
the files still disappear. If the problem is contained within the server, you
know that you can safely go on the DEBUG rampage.

Nader 

-Original Message-
From: Kelvin Tan [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 30, 2004 9:15 AM
To: Lucene Users List
Subject: Re: Disappearing segments

An update:

Daniel Naber suggested using IndexWriter.setUseCompoundFile() to see if it
happens with the compound index format. Before I had a chance to try it out,
this happened: 

java.io.FileNotFoundException: C:\index\segments (The system cannot find the file specified)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:200)
        at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:321)
        at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java:329)
        at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:71)
        at org.apache.lucene.index.IndexWriter$1.doBody(IndexWriter.java:154)
        at org.apache.lucene.store.Lock$With.run(Lock.java:116)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:149)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:131)

so even the segments file somehow got deleted. Hoping someone can shed some
light on this...

Kelvin

On Thu, 29 Apr 2004 11:45:36 +0800, Kelvin Tan said:
 Errr, sorry for the cross-post to lucene-dev as well, but I realized 
 this mail really belongs on lucene-user...
 
 I've been experiencing intermittent disappearing segments which result 
 in the following stacktrace:
 
 Caused by: java.io.FileNotFoundException: C:\index\_1ae.fnm (The system cannot find the file specified)
         at java.io.RandomAccessFile.open(Native Method)
         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:200)
         at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:321)
         at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java:329)
         at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
         at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:78)
         at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:104)
         at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
         at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:112)
         at org.apache.lucene.store.Lock$With.run(Lock.java:116)
         at org.apache.lucene.index.IndexReader.open(IndexReader.java:103)
         at org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
         at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
 
 The segment that disappears (_1ae.fnm) varies.
 
 I can't seem to reproduce this error consistently, so don't have a 
 clue what might cause it, but it usually happens after the application 
 has been running for some time. Has anyone experienced something 
 similar, or can anyone point me in the right direction?
 
 When this occurs, I need to rebuild the entire index for it to be 
 usable. Very troubling indeed...
 
 Kelvin
 
 

ordering Search results

2004-04-30 Thread Supun Edirisinghe

I have read these 2 threads in the mail group:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06943.html

http://www.mail-archive.com/[EMAIL PROTECTED]/msg06319.html

I'm still wondering how people order results by a field. I know that it is
possible to sort results lexicographically.

I've been trying to play with boost values to massage Hits into a good
order, but I think I need to order by the fields to get a strict order.

thanks



Preventing duplicate document insertion during optimize

2004-04-30 Thread Kevin A. Burton

Let's say you have two indexes, each containing the same document literal.  All 
the fields hash the same, and the document in the first index is a binary 
duplicate of a document in the second index.

What happens when you do a merge to create a 3rd index from the first 
two?  I assume you now have two documents that are identical in one 
index.  Is there any way to prevent this?

It would be nice to figure out if there's a way to flag a field as a 
primary key, so that a document whose key has already been added is just skipped.

Kevin

--

Please reply using PGP.

   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster




RemoteSearchable

2004-04-30 Thread Venu Durgam


I was wondering if you implemented search using RemoteSearchable.
Would you kindly email sample source code.

Thanks.
Venu Durgam

Re: Preventing duplicate document insertion during optimize

2004-04-30 Thread James Dunn

Kevin,

I have a similar issue.  The only solution I have been
able to come up with is, after the merge, to open an
IndexReader against the merge index, iterate over all
the docs and delete duplicate docs based on my
primary key field.

Jim
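
The duplicate-finding pass Jim describes can be sketched independently of Lucene. The version below is only an illustration under stated assumptions: the primary-key values are presumed to have been read out of the merged index in document order, and DuplicateFinder and duplicatePositions are hypothetical names. Reading the keys and deleting the reported documents are left to whatever reader API your Lucene version provides.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateFinder {
    /**
     * Given the primary-key value of each document in index order, returns the
     * positions of every document whose key was already seen earlier.
     */
    static List<Integer> duplicatePositions(List<String> keysInIndexOrder) {
        Set<String> seen = new HashSet<>();
        List<Integer> duplicates = new ArrayList<>();
        for (int i = 0; i < keysInIndexOrder.size(); i++) {
            String key = keysInIndexOrder.get(i);
            if (key == null) {
                continue; // no key stored: leave the document alone
            }
            if (!seen.add(key)) {
                duplicates.add(i); // add() returned false, so this key is a repeat
            }
        }
        return duplicates;
    }
}
```

The first document with a given key is kept and later ones are reported, so which copy survives depends on the order in which the indexes were merged. The set of keys is held in memory, which is worth keeping in mind for very large indexes.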

