IMHO, Instantiated sucks GC-wise. Put more docs in it, do enough
queries, and RAMDir eventually outperforms it.
On Thu, Aug 26, 2010 at 11:24, Li Li wrote:
> I have about 70k documents, the total indexed size is about 15MB (the
> original text files' size).
> dir=new RAMDirectory();
>
[
https://issues.apache.org/jira/browse/LUCENE-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897325#action_12897325
]
Earwin Burrfoot commented on LUCENE-2593:
-
Yeehaw! This looks very much li
I believe I've seen a similar condition a few times.
A segments file referring to zero-length segment files after a disk-full event.
On Fri, Jul 9, 2010 at 13:37, Michael McCandless
wrote:
> I responded on the original thread.
>
> Disk full should never cause index corruption except on very old
> ve
On Wed, Jun 9, 2010 at 15:39, Doron Cohen wrote:
> I think you'd still not modify a nicely extendible/wrapable API just to
> avoid the extra call, unless benchmarking shows that the cost is high.
Current Query API is NOT nicely extensible :)
Look above for BM25BooleanQuery mention.
--
Kirill Za
n his own? Do we need more than that?
>
> I'm not saying we should refactor the API to Matcher + Scorer, just thinking
> on what do we really need to do and what's the best way to achieve that.
>
> Shai
>
> On Wed, Jun 9, 2010 at 2:24 PM, Earwin Burrfoot wrote:
> Can we represent the Query
> state in some general structure, that no matter which Query you get, you'll
> know how to score it?
No. You could go for a unified interface that allows you to express
different query states, like a set of untyped key-values, but you'll
end up switching on these keyval
Lies, lies, lies :)
I mean, Sun JIT is overrelied on. Especially in regards to inlining.
But, there are some cases when you can trust it. I.e. if you call a
virtual method and this exact call-site gets refs to different objects
at runtime (meaning here - you wrap different Queries in your
WrapperQ
uery wants it to. Wouldn't it make it more flexible?
>> -John
>>
>> On Tue, Jun 8, 2010 at 10:54 AM, Earwin Burrfoot wrote:
>>>
>>> To compute a score you have to see which of your subqueries did not
>>> match, which did, and what are the docfreqs/
ouldn't it make it more flexible?
> -John
>
> On Tue, Jun 8, 2010 at 10:54 AM, Earwin Burrfoot wrote:
>>
>> To compute a score you have to see which of your subqueries did not
>> match, which did, and what are the docfreqs/positions for them.
>> When iterating, and
BQ?
>
> please elaborate.
>
> Thanks
>
> -John
>
> On Tue, Jun 8, 2010 at 10:10 AM, Earwin Burrfoot wrote:
>>
>> The problem with your proposal is that, currently, Lucene uses current
>> iteration state to compute score.
>> I.e. it already knows wh
Shai, his wrapper Scorer will just look like:

  DISI getDISI() {
    return delegate.getDISI();
  }

  float score(int doc) {
    return calcMyAwesomeScore(doc);
  }

This saves the delegate.nextDoc(), delegate.advance() indirection calls.
But I already offered a better alternative :)
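
For illustration, here is a self-contained sketch of that wrapper shape. The interfaces `DocIterator` and `SplitScorer` are hypothetical stand-ins invented for this example, not Lucene's API, and `calcMyAwesomeScore` is replaced by an arbitrary formula. The only point is that the wrapper returns the delegate's iterator untouched and overrides scoring alone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a doc-id iterator (not Lucene's DocIdSetIterator).
interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
}

// Hypothetical scorer shape: iteration and scoring are separate concerns.
interface SplitScorer {
    DocIterator getDISI();
    float score(int doc);
}

// Iterates over a fixed, sorted array of matching doc ids.
final class ArrayDocIterator implements DocIterator {
    private final int[] docs;
    private int pos = -1;

    ArrayDocIterator(int[] docs) { this.docs = docs; }

    public int nextDoc() {
        pos++;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

// Base scorer: a constant score for every matching doc.
final class ConstantScorer implements SplitScorer {
    private final int[] docs;

    ConstantScorer(int[] docs) { this.docs = docs; }

    public DocIterator getDISI() { return new ArrayDocIterator(docs); }
    public float score(int doc) { return 1.0f; }
}

// The wrapper: passes the delegate's iterator through, replaces only the score.
final class WrapperScorer implements SplitScorer {
    private final SplitScorer delegate;

    WrapperScorer(SplitScorer delegate) { this.delegate = delegate; }

    public DocIterator getDISI() { return delegate.getDISI(); }

    // Arbitrary formula standing in for calcMyAwesomeScore(doc).
    public float score(int doc) { return delegate.score(doc) * 2.0f + doc; }
}

final class WrapperScorerDemo {
    // Drives the iterator directly; the wrapper adds no per-doc iteration hop.
    static List<Float> collect(SplitScorer scorer) {
        List<Float> scores = new ArrayList<>();
        DocIterator it = scorer.getDISI();
        for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            scores.add(scorer.score(doc));
        }
        return scores;
    }

    public static void main(String[] args) {
        SplitScorer wrapped = new WrapperScorer(new ConstantScorer(new int[] {1, 4, 9}));
        System.out.println(collect(wrapped));
    }
}
```

The caller iterates the delegate's own iterator, so wrapping costs one extra call per `score()` and nothing per `nextDoc()`.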
On Tue, Jun 8, 2010 at 21:09
The problem with your proposal is that, currently, Lucene uses current
iteration state to compute score.
I.e. it already knows which of SHOULD BQ clauses matched for current
doc, so it's easier to calculate the score.
If you change API to allow scoring arbitrary documents (even those
that didn't ma
[
https://issues.apache.org/jira/browse/LUCENE-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876203#action_12876203
]
Earwin Burrfoot commented on LUCENE-2491:
-
Or we can force the same Codec for
[
https://issues.apache.org/jira/browse/LUCENE-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874857#action_12874857
]
Earwin Burrfoot commented on LUCENE-2355:
-
* NRT Reader shared live SegmentI
[
https://issues.apache.org/jira/browse/LUCENE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874687#action_12874687
]
Earwin Burrfoot commented on LUCENE-2485:
-
bq. w/o this ability, there&
[
https://issues.apache.org/jira/browse/LUCENE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874650#action_12874650
]
Earwin Burrfoot commented on LUCENE-2485:
-
bq. As long as warming a new seg
[
https://issues.apache.org/jira/browse/LUCENE-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874634#action_12874634
]
Earwin Burrfoot commented on LUCENE-2311:
-
bq. Does your pending patch (wh
[
https://issues.apache.org/jira/browse/LUCENE-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874224#action_12874224
]
Earwin Burrfoot commented on LUCENE-2311:
-
This is not the issue of re
[
https://issues.apache.org/jira/browse/LUCENE-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873469#action_12873469
]
Earwin Burrfoot commented on LUCENE-2480:
-
bq. Strange, there were lines i
[
https://issues.apache.org/jira/browse/LUCENE-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873465#action_12873465
]
Earwin Burrfoot commented on LUCENE-2480:
-
Wow! So fast! :)
bq. You di
[
https://issues.apache.org/jira/browse/LUCENE-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873324#action_12873324
]
Earwin Burrfoot commented on LUCENE-2481:
-
Still dislike that string key you
> I disagree about time limiting MS. It may not be useful in many cases,
> true. But I have a scenario in which machines are used to perform all
> sorts of tasks and there are windows in which I'm allowed to do 'heavy
> operations'.
>
> It's true I can just choose not to merge large segments, but I t
[
https://issues.apache.org/jira/browse/LUCENE-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Earwin Burrfoot updated LUCENE-2480:
Comment: was deleted
(was: Doing that now, plus some additions to Shai's patch)
>
[
https://issues.apache.org/jira/browse/LUCENE-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Earwin Burrfoot updated LUCENE-2480:
Attachment: LUCENE-2480.patch
Here we go. Pre-utf8/compressed fields support removed
[
https://issues.apache.org/jira/browse/LUCENE-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873292#action_12873292
]
Earwin Burrfoot commented on LUCENE-2480:
-
Doing that now, plus some addition
them together - MS would know of both ME and MP, and
> IW would interact w/ MS only. This I admit is something that just
> popped into my mind when writing this email, so perhaps it doesn't
> make a lot of sense and needs some more mulling.
>
> Maybe MS should be renamed to
We just need an Executor-based MS, then we can throw all the others out
the window, as threading concerns are now resolved by a proper choice
of Executor supplied to the constructor.
Also, an application has much more control over threading in
multiple-index situations, as a single Executor can be reused fo
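
A rough sketch of what an Executor-based scheduler could look like. `ExecutorMergeScheduler` is a hypothetical class made up for this example and does not follow Lucene's MergeScheduler API; a merge is modeled as a plain `Runnable`. The threading policy is whatever `java.util.concurrent.Executor` the application passes in, and two schedulers can share one.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical scheduler: NOT Lucene's MergeScheduler API, only the shape of the idea.
final class ExecutorMergeScheduler {
    private final Executor executor;

    ExecutorMergeScheduler(Executor executor) { this.executor = executor; }

    // A pending merge is modeled as a Runnable; the Executor decides where it runs.
    void merge(Runnable pendingMerge) { executor.execute(pendingMerge); }
}

final class ExecutorMergeDemo {
    // Serial behavior: every merge runs in the calling thread.
    static final Executor SAME_THREAD = Runnable::run;

    // Two "indexes" share one Executor, so the application controls total merge threads.
    static void submitMerges(Executor executor, AtomicInteger completed, int mergesPerIndex) {
        ExecutorMergeScheduler indexA = new ExecutorMergeScheduler(executor);
        ExecutorMergeScheduler indexB = new ExecutorMergeScheduler(executor);
        for (int i = 0; i < mergesPerIndex; i++) {
            indexA.merge(completed::incrementAndGet);
            indexB.merge(completed::incrementAndGet);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger serial = new AtomicInteger();
        submitMerges(SAME_THREAD, serial, 3);
        System.out.println("serial merges done: " + serial.get());

        // Concurrent behavior: same scheduler class, different Executor.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger concurrent = new AtomicInteger();
        submitMerges(pool, concurrent, 3);
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("pooled merges done: " + concurrent.get());
    }
}
```

Serial versus concurrent merging becomes a constructor argument rather than a separate scheduler class.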
I've wanted to do this for some time, so let's open an issue!
On Thu, May 27, 2010 at 19:13, Shai Erera wrote:
> Ok ... that was rather fast and short !
>
> So regarding trunk, is SegmentInfos the only place to look in? Can you give
> me more pointers? I'd like to create an issue for that (not sure
I wonder, what's going to be the relationship between this and Lucy?
Also, how do both of them compare to Sphinx?
2010/5/27 Itamar Syn-Hershko :
> Ryan, thanks. I understand, and obviously if the PMC will think the same
> this is what we'll be doing.
>
> Unfortunately, I haven't heard from the PMC
> The QP should work like that:
> (1) It parses the query, creating fragments
> (2) It does some out-of-the-box handling of those fragments
>
> People should be able to override that handling of fragments. But people
> should not touch (1).
In fact QP should work like that:
(1) Tokenizer parses th
[
https://issues.apache.org/jira/browse/LUCENE-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869813#action_12869813
]
Earwin Burrfoot commented on LUCENE-2355:
-
* Norms are now in fact loaded upf
[
https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869595#action_12869595
]
Earwin Burrfoot commented on LUCENE-2471:
-
I actually suggested separatin
[
https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869579#action_12869579
]
Earwin Burrfoot commented on LUCENE-2471:
-
The only reason for keeping
[
https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869537#action_12869537
]
Earwin Burrfoot commented on LUCENE-2471:
-
Ah. Actually there were two met
[
https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869522#action_12869522
]
Earwin Burrfoot commented on LUCENE-2471:
-
Ahem. Why did you remove them? :)
[
https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869504#action_12869504
]
Earwin Burrfoot commented on LUCENE-2471:
-
Bad link? The issue is closed alr
Supporting bulk copies in Directory
---
Key: LUCENE-2471
URL: https://issues.apache.org/jira/browse/LUCENE-2471
Project: Lucene - Java
Issue Type: Improvement
Reporter: Earwin Burrfoot
A method
[
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868604#action_12868604
]
Earwin Burrfoot commented on LUCENE-2468:
-
Reusing fieldCacheKey is probab
[
https://issues.apache.org/jira/browse/LUCENE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868571#action_12868571
]
Earwin Burrfoot commented on LUCENE-2468:
-
Or, you do it so various caches
[
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868268#action_12868268
]
Earwin Burrfoot commented on LUCENE-2454:
-
I think, here - LUCENE-
[
https://issues.apache.org/jira/browse/LUCENE-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868019#action_12868019
]
Earwin Burrfoot commented on LUCENE-2465:
-
My special use case requires accep
[
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866134#action_12866134
]
Earwin Burrfoot commented on LUCENE-2454:
-
Both things can be combined for
[
https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866133#action_12866133
]
Earwin Burrfoot commented on LUCENE-2454:
-
An alternate approach - there w
I've used something very similar to fold matching documents by some
field value, like author_id.
The very same issue with keeping all the parts in same segment, solved
with composite documents that go through all the pipeline and flushing
segments manually.
On Fri, May 7, 2010 at 20:25, mark harwo
[
https://issues.apache.org/jira/browse/LUCENE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865133#action_12865133
]
Earwin Burrfoot commented on LUCENE-2369:
-
FieldCache should move to beco
[
https://issues.apache.org/jira/browse/LUCENE-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864477#action_12864477
]
Earwin Burrfoot commented on LUCENE-2440:
-
+1
> Add support for
[
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864476#action_12864476
]
Earwin Burrfoot commented on LUCENE-2447:
-
I think it was LUCENE-2041?
But
[
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864460#action_12864460
]
Earwin Burrfoot commented on LUCENE-2447:
-
I think there was a recent issue a
[
https://issues.apache.org/jira/browse/LUCENE-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864448#action_12864448
]
Earwin Burrfoot commented on LUCENE-2447:
-
Is creating MultiSearcher/MultiRe
[
https://issues.apache.org/jira/browse/LUCENE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863744#action_12863744
]
Earwin Burrfoot commented on LUCENE-2439:
-
I got sidetracked for some time,
I believe Uwe tagged trunk just before flex landing.
On Fri, Apr 30, 2010 at 23:00, Shai Erera wrote:
> I would take the last rev just before the first rev when flex landed,
> so that it includes as much as possible. I don't know though which rev
> is it and whether it was before/after the lucene
The best way to match documents that have no values for a specific
field is to have a special term in that (or another) field that you
add to the index when, well, a document has no values for that field.
Let's call this term - NULL. You then directly match on it with a
TermFilter/Query.
With you
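
To make the NULL-term trick concrete, here is a toy inverted index in plain Java. `TinyIndex` is a hypothetical class written for this example and involves no Lucene API; the marker string "NULL" follows the name used in the message. At index time a document with no values for a field gets the marker term, and matching "no value" is then an ordinary term lookup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy index: field -> term -> sorted doc ids. Not Lucene.
final class TinyIndex {
    static final String NULL_TERM = "NULL";

    private final Map<String, Map<String, SortedSet<Integer>>> postings = new HashMap<>();

    // Index a document's values for one field; an empty list gets the marker term.
    void add(int docId, String field, List<String> values) {
        List<String> terms = values.isEmpty() ? Collections.singletonList(NULL_TERM) : values;
        for (String term : terms) {
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new TreeSet<>())
                    .add(docId);
        }
    }

    // Plain term match; "documents without a value" is match(field, NULL_TERM).
    SortedSet<Integer> match(String field, String term) {
        Map<String, SortedSet<Integer>> byTerm = postings.get(field);
        if (byTerm == null) {
            return new TreeSet<Integer>();
        }
        SortedSet<Integer> docs = byTerm.get(term);
        return docs == null ? new TreeSet<Integer>() : docs;
    }
}

final class NullTermDemo {
    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add(1, "author", Arrays.asList("alice"));
        index.add(2, "author", Collections.<String>emptyList());
        index.add(3, "author", Arrays.asList("bob"));
        System.out.println("docs without author: " + index.match("author", TinyIndex.NULL_TERM));
    }
}
```

The cost is one extra posting per empty document, and the marker must never collide with a real value of the field.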
I think we should enhance SDP.
I also think we shouldn't do IDs. snapshot() returns IndexCommitPoint,
release() should get a parameter accepting IndexCommitPoint, that's
all.
On Tue, Apr 27, 2010 at 18:54, Michael McCandless
wrote:
> This would be great!
>
> I think we should just enhance SDP rat
We use ANTLR for query parsing. Works good for the lazy guys :)
On Tue, Apr 27, 2010 at 06:17, Tavi Nathanson wrote:
> Hey everyone,
>
> My organization uses our own homebrew QueryParser class, unrelated to
> Lucene's JavaCC-based QueryParser, to parse our queries. We don't currently
> use anythi
I'd like to +1 on this with all my tiny non-committer might.
On Mon, Apr 26, 2010 at 23:06, Michael McCandless
wrote:
> This is exactly the intention behind the proposal we are voting on.
>
> Big changes, that'd be destabilizing if attempted on the stable
> branch, would be done only on unstable
> And, it's not the committer's job to port each little commit to stable
> over to the unstable branch. Instead, we periodically re-sync stable
> --> unstable, like we did with the long-lived flex branch.
>
> So, then, little would change on how stable is developed, today. And
> stable would stil
There's also a place for alternate Directories, which can throw
readable-loggable exceptions without waiting for nio2.
On Fri, Apr 23, 2010 at 14:20, Michael McCandless
wrote:
> Deletion can conceivably fail for a number of interesting reasons :)
> File doesn't exist, permission is denied, file sys
> My main problem with developing new features on trunk first and then porting
> by adding backwards cruft is that you first don’t care about backwards
> compatibility and then suddenly have to think about it. This may change the API on trunk
> again, to get nearer to backwards or maybe because a backwards layer
Shai. People are free to bash their brains out against back-compat on
a stable branch. IF they want.
If they don't want, they work on trunk. When stuff is ported from
stable to trunk, cruft is removed. When (if) stuff is ported from
trunk to stable, cruft is added.
The only point Mike's offer diff
Okay, let's live with parallel development, but make sure we 'always'
port things from stable to trunk, and 'always' remove possible
back-compat layers when doing such a port?
On Thu, Apr 22, 2010 at 18:04, Mark Miller wrote:
> I'd vote -1 on Shai's variation and +1 on Mike's proposal.
>
> I don'
+1 for developing in a single place (trunk) and backporting on an on-demand basis.
The other points are fine.
On Wed, Apr 21, 2010 at 21:56, Shai Erera wrote:
> So basically, API-wise, the stable branch will remain like it is
> today: API changes under deprecation path, bw breaks as long as they
>
I believe the big part of the speedup is due to HPPC's ability to
mutate Map values inplace, doing a single key lookup instead of two?
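
The one-lookup-versus-two point can be shown with `java.util.HashMap` alone. This is not HPPC's API: the in-place variant below stores a mutable `int[]` cell as the value, a stdlib stand-in for what a primitive map does internally. `CountingKey` is a hypothetical key type that counts `hashCode()` calls so that lookups become visible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key that counts hashCode() calls, one per map lookup.
final class CountingKey {
    static int hashCalls = 0;
    private final String name;

    CountingKey(String name) { this.name = name; }

    @Override public int hashCode() { hashCalls++; return name.hashCode(); }

    @Override public boolean equals(Object o) {
        return o instanceof CountingKey && ((CountingKey) o).name.equals(name);
    }
}

final class LookupDemo {
    // Immutable boxed values: incrementing needs a get() and then a put().
    static int getThenPut() {
        Map<CountingKey, Integer> counts = new HashMap<>();
        CountingKey key = new CountingKey("term");
        counts.put(key, 0);
        CountingKey.hashCalls = 0;               // measure only the increment
        counts.put(key, counts.get(key) + 1);
        return CountingKey.hashCalls;
    }

    // Mutable cell as the value: one get(), then the value is changed in place.
    static int mutateInPlace() {
        Map<CountingKey, int[]> cells = new HashMap<>();
        CountingKey key = new CountingKey("term");
        cells.put(key, new int[] {0});
        CountingKey.hashCalls = 0;               // measure only the increment
        cells.get(key)[0]++;
        return CountingKey.hashCalls;
    }

    public static void main(String[] args) {
        System.out.println("hash calls, get then put: " + getThenPut());
        System.out.println("hash calls, in-place:     " + mutateInPlace());
    }
}
```

Whether this accounts for most of the measured speedup is the open question in the message; the sketch only shows where the second lookup comes from.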
On Wed, Apr 21, 2010 at 13:56, Dawid Weiss wrote:
> I have some cross-checks offline (fastutil, pcj, colt, trove). I
> didn't want to publish them because a great
Hmmm.. can anybody compare these to fastutil?
On Mon, Apr 19, 2010 at 20:44, Ted Dunning wrote:
>
> The cycle is closed.
>
> Lucene begat Mahout. Mahout incorporated Colt. Benson Margulies ripped
> apart the Colt collections to produce the Mahout collections. Then Dawid
> Weisz picked up the ba
[
https://issues.apache.org/jira/browse/LUCENE-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858426#action_12858426
]
Earwin Burrfoot commented on LUCENE-2402:
-
Let's reuse IW.deleteUnusedFiles()
Oh, I don't really care myself (= won't notice it not showing up), I'm
on mercurial :)
Just pointed out the dead link.
On Mon, Apr 19, 2010 at 03:27, Jukka Zitting wrote:
> Hi,
>
> On Mon, Apr 19, 2010 at 1:18 AM, Earwin Burrfoot wrote:
>> Aha, missed that. Github
[
https://issues.apache.org/jira/browse/LUCENE-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858358#action_12858358
]
Earwin Burrfoot commented on LUCENE-2401:
-
I think at least I will hit
Aha, missed that. Github's not yet synced though, says - that page
doesn't exist!
On Mon, Apr 19, 2010 at 02:58, Lance Norskog wrote:
> Already done.
>
> http://github.com/apache/lucene-solr
>
> Search git.apache.org for 'lucene' and you'll see.
>