Thanks, Otis, for your response. I have a few more questions:
1) Is it recommended to do index partitioning for large indexes?
- We index around 35 fields (storing only two of them - simple ids)
- Each document is around 200 bytes
- Our index grows by around 50 GB a week
2) The reaso
Hi all,
I've been tracking down a problem happening in our production environment.
When we switch an index after doing deletes & adds, running some searches,
and finally changing the pointer from the old index to the new one, all the
threads start stacking up, waiting on isDeleted(). The threads seem to fin
Release 2.3.0 of Lucene Java is now available!
Many new features, optimizations, and bug fixes have been added since
2.2, including:
* significantly improved indexing performance
* segment merging in background threads
* refreshable IndexReaders
* faster StandardAnalyzer and improved Toke
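For anyone upgrading, here is a minimal sketch of the refreshable-reader
pattern that announcement bullet refers to, assuming the 2.3
IndexReader.reopen() API (the wrapper class and index path are invented
for illustration):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;

public class RefreshExample {
  // Swaps in a fresh reader only if the index has changed; reopen()
  // shares unchanged segments with the old reader, so this is cheap.
  static IndexReader refresh(IndexReader reader) throws IOException {
    IndexReader refreshed = reader.reopen();
    if (refreshed != reader) {
      reader.close();  // close the old one once in-flight searches finish
    }
    return refreshed;
  }
}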
> I've been poking around the list archives and didn't really come up against
> anything interesting. Anyone using Lucene to index OCR text? Any
> strategies/algorithms/packages you recommend?
>
> I have a large collection (10^7 docs) that's mostly the result of OCR. We
> index/search/etc. with Luc
Lots of luck to you, because I haven't a clue. My company deals with
OCR data and we haven't had a single workable idea. Of course, our
data sets are minuscule compared to what you're talking about, so we
haven't tried to heuristically clean up the data.
But given that Google is scanning the entir
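One heuristic in that "clean up the data" direction, purely as a sketch
(the filter name and the threshold are invented, built against the 2.3
TokenStream API): drop tokens whose ratio of alphabetic characters is low,
since OCR noise like "l1|!i" rarely survives such a test while real words do.

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Drops tokens that are mostly non-letters -- typical OCR garbage.
public class AlphaRatioFilter extends TokenFilter {
  private final float minRatio;

  public AlphaRatioFilter(TokenStream input, float minRatio) {
    super(input);
    this.minRatio = minRatio;  // e.g. 0.7f: at least 70% letters
  }

  public Token next() throws IOException {
    for (Token t = input.next(); t != null; t = input.next()) {
      String text = t.termText();
      int letters = 0;
      for (int i = 0; i < text.length(); i++) {
        if (Character.isLetter(text.charAt(i))) letters++;
      }
      if (text.length() > 0 && (float) letters / text.length() >= minRatio) {
        return t;  // looks like a real word; keep it
      }
      // otherwise skip the token and keep scanning
    }
    return null;
  }
}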
Hi,
I am very new to Lucene & Hadoop, and I have a project where I need to
use Lucene to index some input given either as a huge collection of
Java objects or as one huge Java object.
I read about Hadoop's MapReduce utilities and I want to leverage that feature
in my case described above.
I've been poking around the list archives and didn't really come up against
anything interesting. Anyone using Lucene to index OCR text? Any
strategies/algorithms/packages you recommend?
I have a large collection (10^7 docs) that's mostly the result of OCR. We
index/search/etc. with Lucene withou
> Or, you could just do things twice. That is, send your text through
> a TokenStream, then call next() and count. Then send it all
> through doc.add().
Hm.
This means reading the content twice, no matter whether we use our own
analyzer or override/wrap the main analyzer.
Is there a hook anywhere
Oh, also, I don't think not using CFS would lead to this, unless it's
somehow triggering too many file descriptors...
Mike
Cam Bazz wrote:
no. only after that there was a gc error.
I am also not using the compound index file format in order to increase
indexing speed. could it be because of that?
Hmm, you should have seen an exception before that one from optimize.
Can you post the GC error? Was it an OutOfMemoryError situation?
Mike
On Jan 24, 2008, at 5:32 PM, Cam Bazz wrote:
no. only after that there was a gc error.
I am also not using the compound index file format in order to increase
indexing speed. could it be because of that?
no. only after that there was a gc error.
I am also not using the compound index file format in order to increase
indexing speed. could it be because of that?
I will run the test case again tomorrow. What can I do to increase logging?
Best,
-C.B.
On Jan 24, 2008 11:52 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
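On the logging question above, a sketch (index path and analyzer choice
are placeholders): IndexWriter.setInfoStream() makes the writer print
segment flushes, merges, and any exception a background merge thread hits,
which is usually the fastest way to see what preceded an optimize() failure.

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class VerboseIndexing {
  public static void main(String[] args) throws IOException {
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
    writer.setInfoStream(System.out);  // verbose flush/merge diagnostics to stdout
    // ... addDocument() calls go here ...
    writer.close();
  }
}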
That means that one of the merges, which run in the background by
default with 2.3, hit an unhandled exception.
Did you see another exception logged / printed to stderr before this
one?
Mike
Cam Bazz wrote:
Does anyone have any idea about the error I got while indexing?
Best Regards,
Hi Itamar,
On 01/24/2008 at 2:55 PM, Itamar Syn-Hershko wrote:
> > Lucene does not store proximity relations between data in different
> > fields, only within individual fields
>
> So are two calls to doc->add with the same field but different
> texts considered as one field (the latter call being i
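A sketch of what that looks like in practice (field name and values are
arbitrary): two add() calls with the same field name are indexed as one
logical field, and the analyzer's getPositionIncrementGap() controls the
position gap that phrase and proximity queries see across the boundary.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
doc.add(new Field("body", "first chunk of text", Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("body", "second chunk of text", Field.Store.NO, Field.Index.TOKENIZED));
// Both values end up in the single indexed field "body". Override
// Analyzer.getPositionIncrementGap("body") with a large value to keep
// phrases from matching across the boundary between the two values.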
Does anyone have any idea about the error I got while indexing?
Best Regards,
-C.B.
Exception in thread "main" java.io.IOException: background merge hit
exception: _kq:C962870 _kr:C2591 into _ks [optimize]
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1749)
at org.apach
OK, I will give this a try.
Now I have the problem that I do not know how to get the offsets (or
positions? What is the difference?) back from the matched document...
There is an IndexReader#termPositions(Term t), but this returns the
positions for the whole index, not for a single document.
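Briefly, on the difference: positions are token counts (first token,
second token, ...) and are what PhraseQuery works with; character offsets
into the original text are only available if you index term vectors with
offsets (Field.TermVector.WITH_POSITIONS_OFFSETS). For positions limited
to one document, a sketch assuming you already know the docId (field and
term are placeholders):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

TermPositions tp = reader.termPositions(new Term("content", "foo"));
try {
  if (tp.skipTo(docId) && tp.doc() == docId) {  // jump straight to one document
    int freq = tp.freq();
    for (int i = 0; i < freq; i++) {
      int pos = tp.nextPosition();  // token position within this document
      System.out.println("position: " + pos);
    }
  }
} finally {
  tp.close();
}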
Hi all,
Just FYI, perhaps this is old news for you ... This large corpus is
freely available and it is pairwise sentence-aligned for all language
combinations. This looks like a good resource for linguistic
information, such as frequent words and phrases, n-gram profiles, etc.
http://wt.jrc.
Steve and all,
I didn't know whether to send a detailed description of my case to aid with
seeing the whole picture, or to send a list of short questions which will
require loads of follow-up. I guess I know what is better now, thanks
>> Lucene does not store proximity relations between data
I think you'll have to implement your own Analyzer and count.
That is, every call to next() that returns a token will have to
also increment some counter by 1.
To use this, you must have some way of knowing when a page
ends, and at that point you call your instance of your custom
analyzer to see w
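A sketch of that counting wrapper against the 2.3 Token API (the class
name is invented); because it counts while the indexer pulls tokens, the
content is only read once:

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Counts tokens as they stream past during normal indexing.
public class CountingAnalyzer extends Analyzer {
  private final Analyzer delegate;
  private int count = 0;

  public CountingAnalyzer(Analyzer delegate) { this.delegate = delegate; }

  public int getCount() { return count; }
  public void reset() { count = 0; }  // call between pages/documents

  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new TokenFilter(delegate.tokenStream(fieldName, reader)) {
      public Token next() throws IOException {
        Token t = input.next();
        if (t != null) count++;  // the "hook": bump the counter per token
        return t;
      }
    };
  }
}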
> -Original Message-
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 11, 2008 16:16
> To: java-user@lucene.apache.org
> Subject: Re: Design questions
> But you could also vary this scheme by simply storing in your document
> the offsets for the beginning of each page
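A sketch of that offset-storing variant (field names, the int[] of page
starts, and the encode() helper are all invented): keep the character
offset of each page start in a stored field, then map a match offset back
to its page with a binary search.

import java.util.Arrays;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Index time: pageStarts[i] holds the character offset where page i begins.
Document doc = new Document();
doc.add(new Field("body", fullText, Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("pageStarts", encode(pageStarts),  // encode(): made-up int[] -> String helper
        Field.Store.YES, Field.Index.NO));

// Search time: which page does a character offset fall on?
int page = Arrays.binarySearch(pageStarts, matchOffset);
if (page < 0) page = -page - 2;  // not an exact page start: take the preceding page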
Yes, sorry, that's the case.
Thank you!
> -Original Message-
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 24, 2008 19:49
> To: java-user@lucene.apache.org
> Subject: Re: Creating search query
>
> That should work fine, assuming that foo and bar are the untokenized
> fields and content is the tokenized content.
That should work fine, assuming that foo and bar are the untokenized
fields and content is the tokenized content.
Erick
On Jan 24, 2008 1:18 PM, <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have an index with some fields which are indexed and un_tokenized
> (keywords) and one field which is indexed and tokenized (content).
Thank you.
> -Original Message-
> From: Lukas Vlcek [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 23, 2008 08:23
> To: java-user@lucene.apache.org
> Subject: Re: Compass
>
> Hi,
>
> I am using Compass with Spring and JPA. It works pretty nicely.
> I don't store the index in a database
Hi,
I have an index with some fields which are indexed and un_tokenized
(keywords) and one field which is indexed and tokenized (content).
Now I want to create a Query-Object:
TermQuery k1 = new TermQuery(new Term("foo", "some foo"));
TermQuery k2 = new TermQuery(new Term("bar",
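For completeness, one way the combined query might look, assuming all
clauses are required (the "bar" value and the content term are
placeholders, since the message is cut off here):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term("foo", "some foo")), BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("bar", "some bar")), BooleanClause.Occur.MUST);
// For the tokenized content field, the term must match what the analyzer
// produced at index time (typically lowercased).
query.add(new TermQuery(new Term("content", "word")), BooleanClause.Occur.MUST);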
In general, you just need to denormalize the data and create a list of
Genes, adding each Gene's related information via SQL. Ranking can easily
be adjusted via each field's weight; not a big deal.
This seems like an ideal case for DBSight. It can also do incremental
indexing, which you may need.
--
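On the "each field's weight" point, a small sketch (field name, value,
and boost factor are invented): per-field boosts are set at index time.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
Field name = new Field("geneName", "some gene", Field.Store.YES, Field.Index.TOKENIZED);
name.setBoost(3.0f);  // matches on this field count 3x toward the score
doc.add(name);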
Thank you Steven and Yonik,
I think I got it. And I can see that LogMergePolicy uses
Math.log() when picking merges. :-)
Thank you again,
Koji
On Jan 24, 2008 8:40 AM, Steven Parkes <[EMAIL PROTECTED]> wrote:
> I'm curious, why is LogMergePolicy named *Log*MergePolicy?
> (Why not ExpMergePolicy? :-)
>
> Well, I guess it's a matter of perspective. When you look at the way the
> algorithm works, the merge decisions are based
I'm curious, why is LogMergePolicy named *Log*MergePolicy?
(Why not ExpMergePolicy? :-)
Well, I guess it's a matter of perspective. When you look at the way the
algorithm works, the merge decisions are based on a concept of level, and
levels are assigned based on the log of the number of documents in a
segment.
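Paraphrasing (not the exact wording of the source), each segment is
assigned a level roughly like

  level = (int) (Math.log(segmentSize) / Math.log(mergeFactor));

so with mergeFactor=10, sizes 1-9 land on level 0, 10-99 on level 1,
100-999 on level 2, and a merge is triggered once mergeFactor segments
accumulate on the same level. The levels are logarithmic while the
segment sizes they describe grow exponentially, hence the naming question
cuts both ways.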
Hello,
I'm curious, why is LogMergePolicy named *Log*MergePolicy?
(Why not ExpMergePolicy? :-)
Thank you,
Koji
Hi,
(Warning: not for the faint-hearted)
I'm currently working on a project where we have a large and complex data
model, related to Genomics. We are trying to build a search engine that
provides "full text" and "field-based text" searches for our customer base
(mostly academic research), and are
vivek sar wrote:
I have a field as NO_NORMS; does it have to be untokenized to be able to
sort on it?
NO_NORMS is the same as UNTOKENIZED + omitNorms, so you can sort on that.
Antony
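A sketch of the combination, assuming one single-token value per document
(field name, value, and the search call are illustrative):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Sort;

Document doc = new Document();
// NO_NORMS: indexed as a single token with norms omitted -- sortable.
doc.add(new Field("optime", "20080124173205", Field.Store.NO, Field.Index.NO_NORMS));

// Later, at search time, sort results on that field:
Hits hits = searcher.search(query, new Sort("optime"));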
Hi all.
I need to check two conditions in a search:
first I need to find documents matching a bank name, and then, among
those, I need to find the documents containing a particular city.
Finally I need the documents which satisfy both conditions,
i.e., documents with bank + city.
Please, can anyone help me?
Thanks,
prathiba.P
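One sketch of such a query (field names and values are made up, since the
schema isn't given): make both conditions mandatory, either with
QueryParser as below or with a BooleanQuery of two MUST clauses as in the
foo/bar thread above.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

QueryParser parser = new QueryParser("bank", new StandardAnalyzer());
// AND requires both clauses: the bank name and the city must both match.
Query query = parser.parse("bank:\"national bank\" AND city:springfield");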
On Thu, 2008-01-24 at 08:18 +1100, Antony Bowesman wrote:
> These are odd. The last case in both of the above shows a slowdown
> compared to the 2.1 index and version, and in the first 50K queries the
> 2.3 index and version is even slower than 2.3 with the 2.1 index. It
> catches up in the longer
Is there anything I can do to make my unit test pass?
Or is it impossible?
Thanks a lot,
Fabrice
Fabrice Robini wrote:
>
> Hi Srikant,
>
> I really thank you for your reply, it's very interesting.
> I have to say I am confused with that now...
> I do not know what I can do to pass this U
I have a field as NO_NORMS; does it have to be untokenized to be able to
sort on it?
On Jan 21, 2008 12:47 PM, Antony Bowesman <[EMAIL PROTECTED]> wrote:
> vivek sar wrote:
> > I need to be able to sort on optime as well, thus need to store it .
>
> Lucene's default sorting does not need the field to be stored, only
> indexed.