Comments inline: ===
-----Original Message-----
From: Siva Thumma [mailto:[email protected]]
Sent: Saturday, November 15, 2014 8:06 AM
To: [email protected]
Subject: Re: How can I make better project than Lucene?

To build such a big product, One would obviously attribute the license.

Sent from iPhone

> On 15-Nov-2014, at 5:12 pm, Will Martin <[email protected]> wrote:
>
> Btw: SwSong should not steal code; which implies an existing license whose
> terms he is willing to break. Not a good first step. ;-)
>
> will
>
> -----Original Message-----
> From: Michael McCandless [mailto:[email protected]]
> Sent: Saturday, November 15, 2014 6:22 AM
> To: [email protected]
> Subject: Re: How can I make better project than Lucene?
>
> Actually I think competing projects is very healthy for open source
> development.
>
> There are many things you could explore to "contrast" with Lucene, e.g. write
> your new search engine in Go not Java: Java has many problems, maybe Go fixes
> them. Go also has a low-latency garbage collector in development ... and
> Java's GC options still can't scale to the heap sizes that are practical now.
:::> wmartin: If there is a problem with GC for a domain, then the JDK team should be contacted, or our index design maybe revisited.
>
> Lucene has many limitations, so your competing engine could focus on them.
> E.g. the "schemalessness" of Lucene has become a big problem, and near
> impossible to fix at this point, and prevents new important features like
> LUCENE-5879 from being possible, so you could give your engine a "gentle"
> schema from the start.
:::> I'm always amazed when I find references to field names rather than enums or ids. A schema should, and often does in Lucene, result in an automaton or maybe an FST, so why isn't the schema implemented as such? Too slow?
>
> The Lucene Filter/Query situation is a mess: one should extend the other.
>
:::> Um, doesn't IntelliJ (JetBrains) refactor? Is it too dumb?
> Lucene has weak support for proximity queries (SpanQuery is slow and does not
> get much attention).
>
:::> I wrote proximity for CPL (DataTimes, DOW-JONES, LoC, AOL). Give me more information.
> Lucene is showing its age, missing some compelling features like a builtin
> transaction log, "core" support for numerics (they are sort of hacked on
> top), optimistic concurrency support (sequence ids, versions, something),
> distributed support (near real time replication, etc.), multi-tenancy, an
> example server implementation, so the search servers on top of Lucene have
> had to fill these gaps. Maybe you could make your engine distributed from
> the start (Go is a great match for that, from what little I know).
>
> All 3 highlighter options have problems.
>
:::> Well, since Postings uses a plain-jane tune of BM25raw and uses intermediaries to read posts, it's not surprising. Maybe the first thing that should be done is to profile the damn things. The DAPO (DAGO) benchmark framework has Lucene search and indexing. Maybe an extension to the search collector there.
> The analysis chain (attributes) is overly complex.
>
> In your competing engine you can borrow/copy/steal from Lucene's good parts
> to get started...
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
>> On Fri, Nov 14, 2014 at 8:43 PM, swsong_dev <[email protected]> wrote:
>> I’m developing search engine, Fastcatsearch.
>> http://github.com/fastcatsearch/fastcatsearch
>>
>> Lucene is widely known and famous project and I cannot beat Lucene for now.
>>
>> But is there any chance to beat Lucene?
>>
>> Anything like features, performance.
>>
>> Please, let me know what to do to make better product than Lucene.
>>
>> Thank you.
>
