Re: Moving towards Lucene 4.0

2011-05-19 Thread Earwin Burrfoot
On Thu, May 19, 2011 at 21:44, Chris Hostetter wrote: > > : I think we should focus on everything that's *infrastructure* in 4.0, so > : that we can develop additional features in subsequent 4.x releases. If we > : end up releasing 4.0 just to discover many things will need to wait to 5.0, > : it'

Re: FST and FieldCache?

2011-05-19 Thread Earwin Burrfoot
On Thu, May 19, 2011 at 20:43, Michael McCandless wrote: > On Thu, May 19, 2011 at 12:35 PM, Jason Rutherglen > wrote: >>> And I do agree there are times when mmap is appropriate, eg if query >>> latency is unimportant to you, but it's not a panacea and it comes >>> with serious downsides >> >> D

Re: FST and FieldCache?

2011-05-19 Thread Earwin Burrfoot
This is more about compressing strings in TermsIndex, I think. And ability to use said TermsIndex directly in some cases that required FieldCache before. (Maybe FC is still needed, but it can be degraded to docId->ord map, storing actual strings in TI). This yields fat space savings when we, eg, n

Re: FST and FieldCache?

2011-05-19 Thread Earwin Burrfoot
On Thu, May 19, 2011 at 16:45, Dawid Weiss wrote: > >> That's what I invented, and yes, it was invented by countless people >> before :) > You know I didn't mean to sound rude, right? I'm really admiring your > ability to come up with these solutions by yourself, I'm merely copying > other folks'

Re: FST and FieldCache?

2011-05-19 Thread Earwin Burrfoot
>> I think, if we add ord as an output to the FST, then it builds >> everything we need?  Ie no further data structures should be needed? >> Maybe I'm confused :) > > If you put the ord as an output the common part will be shifted towards the > front of the tree. This will work if you want to look

Re: FST and FieldCache?

2011-05-19 Thread Earwin Burrfoot
You cannot get a string out of automaton by its ordinal without storing additional data. The string is stored there not as a single arc, but as a sequence of them (basically.. err.. as a string), so referencing them is basically writing the string asis. Space savings here come from sharing arcs bet
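A minimal sketch of the "additional data" point above, using a plain trie rather than Lucene's FST. The assumption here (not stated in the thread) is that the extra data is a per-node count of the strings stored below that node; with it, an ordinal can be resolved to a string by walking arcs in sorted order. The names `Node`, `insert` and `lookup_by_ord` are hypothetical.

```python
class Node:
    """Trie node; `count` is the extra data that makes ord -> string possible."""
    def __init__(self):
        self.children = {}     # char -> Node
        self.terminal = False  # True if a stored string ends here
        self.count = 0         # number of stored strings in this subtree


def insert(root, word):
    """Insert a word; assumes the word is not already present."""
    node = root
    node.count += 1
    for ch in word:
        node = node.children.setdefault(ch, Node())
        node.count += 1
    node.terminal = True


def lookup_by_ord(root, ordinal):
    """Return the string at position `ordinal` in sorted order."""
    if not 0 <= ordinal < root.count:
        raise IndexError(ordinal)
    node, out = root, []
    while True:
        if node.terminal:
            if ordinal == 0:
                return "".join(out)
            ordinal -= 1  # skip the string that ends at this node
        for ch in sorted(node.children):
            child = node.children[ch]
            if ordinal < child.count:
                out.append(ch)
                node = child
                break
            ordinal -= child.count  # skip the whole subtree


root = Node()
for w in ["car", "card", "care", "cat", "dog"]:
    insert(root, w)
```

Without the `count` fields the walk has no way to know which arc to follow for a given ordinal, which is the limitation the message describes.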

Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Earwin Burrfoot
mailing list. > Thanks anyway. > > On Wed, May 18, 2011 at 6:32 PM, Earwin Burrfoot wrote: >> >> You aren't likely to encounter strings like "abc company inc" in >> Lucene index, as it will be tokenized into three tokens "abc", >> "c

Re: Lucene/Solr JIRA

2011-05-18 Thread Earwin Burrfoot
+1 to Chris. Even if the code is partially shared and project is the same, the end products are completely different. Merging lists/jira will force niche developers/users to manually sift through heaps of irrelevant emails/issues. On Thu, May 19, 2011 at 00:53, Chris Hostetter wrote: > > : just

Re: Fuzzy search always returning docs sorted by the highest match

2011-05-18 Thread Earwin Burrfoot
You aren't likely to encounter strings like "abc company inc" in a Lucene index, as it will be tokenized into three tokens "abc", "company", "inc" under most Analyzers. So, for this exact example you don't even need fuzzy matching. Also, maybe you should try the 'user' mailing list for questions regardi
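A rough illustration of the tokenization point, not Lucene's Analyzer API: `simple_analyze` is a hypothetical stand-in that lowercases and splits on whitespace, which is roughly what many analyzers do for plain ASCII text.

```python
def simple_analyze(text):
    """Hypothetical stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


tokens = simple_analyze("abc company inc")
# The index holds the individual tokens, so a query is matched
# against "abc", "company" and "inc" separately, not the whole string.
print(tokens)
```

Under this assumption a query for `company` matches the document with a plain term lookup, without any fuzzy expansion.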

[jira] [Commented] (LUCENE-3105) String.intern() calls slow down IndexWriter.close() and IndexReader.open() for index with large number of unique field names

2011-05-17 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034640#comment-13034640 ] Earwin Burrfoot commented on LUCENE-3105: - Hmm.. Ok, it *is* still used,

[jira] [Commented] (LUCENE-3105) String.intern() calls slow down IndexWriter.close() and IndexReader.open() for index with large number of unique field names

2011-05-17 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034639#comment-13034639 ] Earwin Burrfoot commented on LUCENE-3105: - StringInterner is in fact faster

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-13 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033279#comment-13033279 ] Earwin Burrfoot commented on LUCENE-2793: - As mentioned @LUCENE-3092, it w

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir

2011-05-13 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032997#comment-13032997 ] Earwin Burrfoot commented on LUCENE-3092: - bq. The IOCtx should reference

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir

2011-05-13 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032989#comment-13032989 ] Earwin Burrfoot commented on LUCENE-3092: - bq. but I couldn't disagree

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir

2011-05-13 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032936#comment-13032936 ] Earwin Burrfoot commented on LUCENE-3092: - Chris, I don't like th

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir

2011-05-12 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032841#comment-13032841 ] Earwin Burrfoot commented on LUCENE-3092: - *highfive Uwe* was going to sug

[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List not SegmentInfos

2011-05-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032099#comment-13032099 ] Earwin Burrfoot commented on LUCENE-3084: - bq. Merges are ordered Hmm..

[jira] [Commented] (LUCENE-3084) MergePolicy.OneMerge.segments should be List not SegmentInfos

2011-05-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032046#comment-13032046 ] Earwin Burrfoot commented on LUCENE-3084: - * Speaking logically, merges ope

[jira] [Commented] (LUCENE-3077) DWPT doesn't see changes to DW#infoStream

2011-05-06 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029881#comment-13029881 ] Earwin Burrfoot commented on LUCENE-3077: - We should just make it f

[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-05 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029421#comment-13029421 ] Earwin Burrfoot commented on LUCENE-3065: - It's sad NumericFields are

[jira] [Commented] (LUCENE-2904) non-contiguous LogMergePolicy should be careful to not select merges already running

2011-05-05 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029408#comment-13029408 ] Earwin Burrfoot commented on LUCENE-2904: - Ok, I'm wrong. We need both

[jira] [Commented] (LUCENE-2904) non-contiguous LogMergePolicy should be careful to not select merges already running

2011-05-05 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029403#comment-13029403 ] Earwin Burrfoot commented on LUCENE-2904: - I think we should simply change

Re: I was accepted in GSoC!!!

2011-05-05 Thread Earwin Burrfoot
By the way, guys. LuSolr SVN repository is mirrored @ git://git.apache.org/lucene-solr.git , which is in turn mirrored @ https://github.com/apache/lucene-solr . Working with git (maybe with stgit) is easier than juggling patches by hand. On Wed, May 4, 2011 at 15:00, David Nemeskey wrote: > Hi Uw

Re: MergePolicy Thresholds

2011-05-02 Thread Earwin Burrfoot
e may be. And this is exactly what I want. And defining max cap on segment size is not what I want. So the same set of knobs can be intuitive and meaningful for one person, and useless for another. And you can't pick the "best" one. > Will BalancedMP stop merging such seg

Re: MergePolicy Thresholds

2011-05-02 Thread Earwin Burrfoot
f > those parameters? If so, then we only need two thresholds (size + > mergeFactor), and we can reuse BalancedMP's findBalancedMerges logic > (perhaps w/ some adaptations) to derive a merge plan. > > Shai > > On Mon, May 2, 2011 at 4:42 PM, Earwin Burrfoot wrote: >>

Re: MergePolicy Thresholds

2011-05-02 Thread Earwin Burrfoot
Have you checked BalancedSegmentMergePolicy? It has some more knobs :) On Mon, May 2, 2011 at 17:03, Shai Erera wrote: > Hi > > Today, LogMP allows you to set different thresholds for segments sizes, > thereby allowing you to control the largest segment that will be > considered for merge + the l

[jira] [Commented] (LUCENE-3061) Open IndexWriter API to allow custom MergeScheduler implementation

2011-05-02 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027626#comment-13027626 ] Earwin Burrfoot commented on LUCENE-3061: - Mark these as @experimental? >

[jira] [Issue Comment Edited] (LUCENE-3041) Support Query Visting / Walking

2011-05-02 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027612#comment-13027612 ] Earwin Burrfoot edited comment on LUCENE-3041 at 5/2/11 10:3

[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking

2011-05-02 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027612#comment-13027612 ] Earwin Burrfoot commented on LUCENE-3041: - The static cache is now

Re: Setting the max number of merge threads across IndexWriters

2011-05-01 Thread Earwin Burrfoot
>> required. >> Then, instead of trying to factor out IW members from this MS, you could >> share the same ES with all MS instances, each will keep a reference to a >> different IW member. This is just a thought though, I haven't tried it. >> Shai >> >> On Thu

Re: Setting the max number of merge threads across IndexWriters

2011-05-01 Thread Earwin Burrfoot
(e.g., stalling), but some juggling will be > required. > Then, instead of trying to factor out IW members from this MS, you could > share the same ES with all MS instances, each will keep a reference to a > different IW member. This is just a thought though, I haven't tried it. >

[jira] [Commented] (LUCENE-3055) LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers

2011-04-30 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027361#comment-13027361 ] Earwin Burrfoot commented on LUCENE-3055: - Could anyone remind me, why the

[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking

2011-04-29 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027080#comment-13027080 ] Earwin Burrfoot commented on LUCENE-3041: - I vehemently oppose introducing

[jira] [Commented] (LUCENE-2571) Indexing performance tests with realtime branch

2011-04-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020217#comment-13020217 ] Earwin Burrfoot commented on LUCENE-2571: - bq. Merges are NOT blocking inde

Re: Setting the max number of merge threads across IndexWriters

2011-04-14 Thread Earwin Burrfoot
Can't remember. Probably no. I started an experimental MS api rewrite (incorporating ability to share MSs between IWs) some time ago, but never had the time to finish it. On Thu, Apr 14, 2011 at 19:56, Simon Willnauer wrote: > On Thu, Apr 14, 2011 at 5:52 PM, Earwin Burrfoot wro

Re: Setting the max number of merge threads across IndexWriters

2011-04-14 Thread Earwin Burrfoot
I proposed to decouple MergeScheduler from IW (stop keeping a reference to it). Then you can create a single CMS and pass it to all your IWs. On Thu, Apr 14, 2011 at 19:40, Jason Rutherglen wrote: > I think the proposal involved using a ThreadPoolExecutor, which seemed > to not quite work as well
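The decoupling idea can be sketched as follows. This is not Lucene code: `SharedMergeScheduler` and `Writer` are hypothetical classes, and a "merge" is reduced to summing segment sizes. The only point illustrated is that the scheduler holds no reference to any writer, so one instance (and one bounded thread pool) can serve several writers.

```python
from concurrent.futures import ThreadPoolExecutor


class SharedMergeScheduler:
    """Hypothetical scheduler that knows nothing about the writers using it."""
    def __init__(self, max_threads):
        self._pool = ThreadPoolExecutor(max_workers=max_threads)

    def submit(self, merge_fn, *args):
        # Work is passed in explicitly instead of being pulled from a writer.
        return self._pool.submit(merge_fn, *args)

    def close(self):
        self._pool.shutdown(wait=True)


class Writer:
    """Hypothetical index writer that receives its scheduler from outside."""
    def __init__(self, name, scheduler):
        self.name = name
        self.scheduler = scheduler
        self.segments = []

    def add_segment(self, size):
        self.segments.append(size)

    def merge_all(self):
        # Hand the pending segments to the shared scheduler; "merging" is a sum here.
        pending, self.segments = self.segments, []
        return self.scheduler.submit(sum, pending)


scheduler = SharedMergeScheduler(max_threads=2)  # one thread cap for all writers
w1, w2 = Writer("a", scheduler), Writer("b", scheduler)
for size in (10, 20, 30):
    w1.add_segment(size)
for size in (5, 5):
    w2.add_segment(size)
f1, f2 = w1.merge_all(), w2.merge_all()
merged_sizes = (f1.result(), f2.result())
scheduler.close()
```

Because the thread cap lives in the shared object, the total number of merge threads is bounded across all writers, which is the goal stated in the thread subject.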

An IDF variation with penalty for very rare terms

2011-04-12 Thread Earwin Burrfoot
Excuse me for being somewhat off-topic, but has anybody ever seen/used -subj-? Something that looks like http://dl.dropbox.com/u/920413/IDFplusplus.png Traditional log(N/x) tail, but when nearing zero freq, instead of going to +inf you do a nice round bump (with controlled height/location/sha
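One possible shape for such a curve, given purely as an illustration: the formula below is an assumption and is not the function plotted in the linked image. It multiplies the classic log(N/x) by a saturating factor x/(x + knee), so the weight falls off for very rare terms and approaches plain IDF for frequent ones. The name `damped_idf` and the `knee` parameter are hypothetical.

```python
import math


def damped_idf(doc_freq, num_docs, knee=50.0):
    """Classic IDF times a damping factor that penalizes very rare terms.

    Assumes 1 <= doc_freq <= num_docs. `knee` controls where the bump sits:
    terms much rarer than `knee` documents are pushed down.
    """
    classic = math.log(num_docs / doc_freq)
    damping = doc_freq / (doc_freq + knee)
    return classic * damping


N = 1_000_000
for df in (1, 50, 1000, 100_000):
    print(df, round(damped_idf(df, N), 3))
```

With these settings the weight rises from df=1 to a peak in the low thousands and then follows the usual logarithmic tail; changing `knee` moves the peak.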

Re: Numerical ids for terms?

2011-04-12 Thread Earwin Burrfoot
On Tue, Apr 12, 2011 at 13:41, Gregor Heinrich wrote: > Hi -- has there been any effort to create a numerical representation of > Lucene indices. That is, to use the Lucene Directory backend as a large > term-document matrix at index level. As this would require bijective mapping > between terms (

Re: character escapes in source? ... was: Re: Eclipse: Invalid character constant

2011-04-07 Thread Earwin Burrfoot
On Fri, Apr 8, 2011 at 03:01, Robert Muir wrote: > On Thu, Apr 7, 2011 at 6:48 PM, Chris Hostetter > wrote: >> >> : -1. These files should be readable, for maintaining, debugging and >> : knowing whats going on. >> >> Readability is my main concern ... i don't know (and frequently can't >> tell)

Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Earwin Burrfoot
On Thu, Apr 7, 2011 at 01:11, Robert Muir wrote: > On Wed, Apr 6, 2011 at 5:07 PM, Earwin Burrfoot wrote: >> >> Handling Unicode code points outside of BMP is highly expert stuff as >> well. And is totally unneeded by 80% of the users for any other reason >> except

Re: [POLL] JTS compile/test dependency

2011-04-06 Thread Earwin Burrfoot
On Wed, Apr 6, 2011 at 22:43, Robert Muir wrote: > On Wed, Apr 6, 2011 at 2:12 PM, Ryan McKinley wrote: >> Some may be following the thread on spatial development...  here is a >> quick summary, and a poll to help decide what may be the best next >> move. >> >> I'm hoping to introduce a high leve

[jira] [Commented] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs

2011-03-31 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014108#comment-13014108 ] Earwin Burrfoot commented on LUCENE-2981: - Bye-bye, DB. Few things can com

Re: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-22 Thread Earwin Burrfoot
On Tue, Mar 22, 2011 at 06:21, Chris Hostetter wrote: > > (replying to the dev list, see context below) > > : Unfortunately, you can't easily recover from this (except by > : reindexing your docs again). > : > : Failing to call IW.commit() or IW.close() means no segments file was > written... > >

Re: IndexReader.indexExists declares throwing IOE, but never does

2011-03-21 Thread Earwin Burrfoot
e.exists() parallel is a good one. So, maybe, it's ok ) >> Otherwise please keep the throws declaration so that you won't break >> public APIs if this changes implementation. > > Removing the throws declaration doesn't break apps. In the worse case, > they'll hav

Re: IndexReader.indexExists declares throwing IOE, but never does

2011-03-21 Thread Earwin Burrfoot
Technically, there's a big difference between "I checked, and there was no index", and "I was unable to check the disk because file system went BANG!". So the proper behaviour is to return false & IOE (on proper occasion)? On Mon, Mar 21, 2011 at 13:53, Michael McCandless wrote: > On Mon, Mar 21,
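The distinction drawn above can be sketched outside Lucene. This is a hypothetical `index_exists`, not the real `IndexReader.indexExists`; it assumes, for illustration only, that an index is present when the directory contains a file whose name starts with `segments_`. A readable directory without such a file yields `False`, while a directory that cannot be listed raises an I/O error.

```python
import os
import tempfile


def index_exists(directory):
    """Return True/False if the directory could be checked; raise OSError otherwise."""
    # os.listdir raises OSError (e.g. FileNotFoundError, PermissionError)
    # when the directory cannot be read: "I was unable to check".
    names = os.listdir(directory)
    # Reaching this line means the check succeeded: "I checked".
    return any(name.startswith("segments_") for name in names)


def demo():
    """Exercise the three outcomes using a temporary directory."""
    results = []
    with tempfile.TemporaryDirectory() as d:
        results.append(index_exists(d))             # checked, no index
        open(os.path.join(d, "segments_1"), "w").close()
        results.append(index_exists(d))             # checked, index present
    try:
        index_exists(os.path.join(d, "missing"))    # directory is gone by now
    except OSError:
        results.append("io-error")                  # unable to check
    return results
```

Callers can then treat `False` as a normal answer and handle the exception separately, instead of both cases collapsing into the same return value.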

[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007136#comment-13007136 ] Earwin Burrfoot commented on LUCENE-2960: - You avoid deprecation/undepreca

[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007048#comment-13007048 ] Earwin Burrfoot commented on LUCENE-2960: - bq. Oh yeah. But then we'd

[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-14 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006759#comment-13006759 ] Earwin Burrfoot commented on LUCENE-2960: - bq. infoStream is a PrintSt

[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-13 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006227#comment-13006227 ] Earwin Burrfoot commented on LUCENE-2960: - {quote} Why such purity? What d

Re: GPU acceleration

2011-03-13 Thread Earwin Burrfoot
On Sun, Mar 13, 2011 at 00:15, Ken O'Brien wrote: > To clarify, I've not yet written any code. I aim to bring a large speedup to > any functionality that is computationally expensive. I'm wondering which > components are candidates for this. > > I'll be looking through the code but if anyone is aw

[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005891#comment-13005891 ] Earwin Burrfoot commented on LUCENE-2960: - bq. Furthermore, closing the IW

Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-11 Thread Earwin Burrfoot
ime) settings, then I > don't think this issue should block 3.1? We can anyway add other runtime > settings following 3.1, and we won't undeprecate anything. So maybe mark > that issue as a non-blocker? > > Shai > > On Fri, Mar 11, 2011 at 2:20 PM, Earwin Burrfoo

[jira] Commented: (LUCENE-2960) Allow (or bring back) the ability to setRAMBufferSizeMB on an open IndexWriter

2011-03-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005617#comment-13005617 ] Earwin Burrfoot commented on LUCENE-2960: - As I said on the list - if one n

Re: IndexWriter#setRAMBufferSizeMB removed in trunk

2011-03-11 Thread Earwin Burrfoot
Is it really that hard to recreate IndexWriter if you have to change the settings?? Yeah, yeah, you lose all your precious reused buffers, and maybe there's a small indexing latency spike, when switching from old IW to new one, but people aren't changing their IW configs several times a second? I

[jira] Commented: (LUCENE-2908) clean up serialization in the codebase

2011-02-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994769#comment-12994769 ] Earwin Burrfoot commented on LUCENE-2908: - Oh, damn :) On my project

Re: [REINDEX] Note: re-indexing required !

2011-02-07 Thread Earwin Burrfoot
Lucene maintains compatibility with earlier stable release index versions, and to some extent transparently upgrades them. But there is no guaranteed compatibility between different in-development indexes. E.g. 3.2 reads 3.1 indexes and upgrades them, but 3.2-dev-snapshot-10 (while happily handlin

[jira] Commented: (LUCENE-2871) Use FileChannel in FSDirectory

2011-01-20 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984222#action_12984222 ] Earwin Burrfoot commented on LUCENE-2871: - Before arguing where to put this

Re: Let's drop Maven Artifacts !

2011-01-18 Thread Earwin Burrfoot
On Tue, Jan 18, 2011 at 20:13, Robert Muir wrote: >>> Unfortunately there is a very loud minority that care about maven >> >> I would wager that there is a sizable silent *majority* of users who >> literally depend on Lucene's Maven artifacts. > > I can't help but remind myself, this is the same

Re: Let's drop Maven Artifacts !

2011-01-18 Thread Earwin Burrfoot
On Tue, Jan 18, 2011 at 17:00, Robert Muir wrote: > On Tue, Jan 18, 2011 at 8:54 AM, Grant Ingersoll wrote: >> It seems to me that if we have a fix for the things that ail our Maven >> support (Steve's work), that it isn't then the reason for holding up a >> release and we should just keep them

[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-18 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983162#action_12983162 ] Earwin Burrfoot commented on LUCENE-2657: - Thanks, but I'm not the one

[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-18 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983160#action_12983160 ] Earwin Burrfoot commented on LUCENE-2657: - bq. we need to be very clear

[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-18 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983152#action_12983152 ] Earwin Burrfoot commented on LUCENE-2657: - I am *amazed* at how idea/eclipse

Re: Let's drop Maven Artifacts !

2011-01-18 Thread Earwin Burrfoot
Somehow, they were made available since 2.0 -> http://repo2.maven.org/maven2/org/apache/lucene/lucene-core/ The pom's are minimal, sans dependencies, so eg if your project depends on lucene-spellchecker, lucene-core won't be transitively included and your build is gonna fail (you therefore had to

Re: Let's drop Maven Artifacts !

2011-01-17 Thread Earwin Burrfoot
You're not alone. :) But, I bet, many more people would like to skip that step and have their artifacts downloaded from central. On Mon, Jan 17, 2011 at 19:06, Steven A Rowe wrote: > On 1/17/2011 at 1:53 AM, Michael Busch wrote: >> I don't think any user needs the ability to run an ant target on

[jira] Commented: (LUCENE-2755) Some improvements to CMS

2011-01-17 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982564#action_12982564 ] Earwin Burrfoot commented on LUCENE-2755: - bq. if you still want to work o

[jira] Commented: (LUCENE-2374) Add introspection API to AttributeSource/AttributeImpl

2011-01-16 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982440#action_12982440 ] Earwin Burrfoot commented on LUCENE-2374: - Another step in the same direc

[jira] Commented: (LUCENE-2374) Add introspection API to AttributeSource/AttributeImpl

2011-01-16 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982437#action_12982437 ] Earwin Burrfoot commented on LUCENE-2374: - Nice. Except maybe introduce a si

Re: Let's drop Maven Artifacts !

2011-01-16 Thread Earwin Burrfoot
Maven is a de facto package/dependency manager for Java. Like it or not. All "better" tools out there, like Ant+Ivy or SBT, support Maven repositories. Lots of people rely on Maven or "better" tools for their builds and as soon as you're on the declarative dependency management train, it's a bother to

[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982166#action_12982166 ] Earwin Burrfoot commented on LUCENE-2858: - APIs have to be there still. All

[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982132#action_12982132 ] Earwin Burrfoot commented on LUCENE-2858: - bq. Still, i think we would need

[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-15 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982126#action_12982126 ] Earwin Burrfoot commented on LUCENE-2858: - bq. Any comments about removing w

[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-01-14 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981774#action_12981774 ] Earwin Burrfoot commented on LUCENE-2868: - We here use an intermediate query

[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-13 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981388#action_12981388 ] Earwin Burrfoot commented on LUCENE-2324: - Maan, this comment list is infi

[jira] Commented: (LUCENE-2863) Updating a documenting looses its fields that only indexed, also NumericField tries are completely lost

2011-01-12 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980965#action_12980965 ] Earwin Burrfoot commented on LUCENE-2863: - updateDocument() is an atomic ver

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-12 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980736#action_12980736 ] Earwin Burrfoot commented on LUCENE-2793: - {quote} As I said before thoug

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-12 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980732#action_12980732 ] Earwin Burrfoot commented on LUCENE-2793: - bq. Because in your example code a

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-12 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980649#action_12980649 ] Earwin Burrfoot commented on LUCENE-2793: - What's with ongoing crazynes

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980458#action_12980458 ] Earwin Burrfoot commented on LUCENE-2793: - In fact, I suggest dropping buffer

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980454#action_12980454 ] Earwin Burrfoot commented on LUCENE-2793: - {quote} bq. You get IOFactory

[jira] Commented: (LUCENE-2856) Create IndexWriter event listener, specifically for merges

2011-01-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980448#action_12980448 ] Earwin Burrfoot commented on LUCENE-2856: - A SegmentListener that has a numbe

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980400#action_12980400 ] Earwin Burrfoot commented on LUCENE-2793: - Looks crazy. In a -bad- tangled

[jira] Commented: (LUCENE-2856) Create IndexWriter event listener, specifically for merges

2011-01-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980390#action_12980390 ] Earwin Burrfoot commented on LUCENE-2856: - A CompositeSegmentListener nif

[jira] Commented: (LUCENE-2858) Separate SegmentReaders (and other atomic readers) from composite IndexReaders

2011-01-11 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980388#action_12980388 ] Earwin Burrfoot commented on LUCENE-2858: - bq. On the other side, atomic rea

[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-10 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979888#action_12979888 ] Earwin Burrfoot commented on LUCENE-2474: - bq. Earwin's working on impro

[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-01-10 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979522#action_12979522 ] Earwin Burrfoot commented on LUCENE-2312: - Some questions to align myself

[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979366#action_12979366 ] Earwin Burrfoot commented on LUCENE-2843: - bq. Nope, haven't looked at their

[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979346#action_12979346 ] Earwin Burrfoot commented on LUCENE-2843: - bq. I don't like the reaso

[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2011-01-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979306#action_12979306 ] Earwin Burrfoot commented on LUCENE-2840: - A lot of fork-join type framew

[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979305#action_12979305 ] Earwin Burrfoot commented on LUCENE-2843: - As I said, there's already

[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979277#action_12979277 ] Earwin Burrfoot commented on LUCENE-2843: - And we're nearing a day whe

[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2011-01-09 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979276#action_12979276 ] Earwin Burrfoot commented on LUCENE-2840: - bq. But doesn't that mean th

Re: [jira] Commented: (SOLR-2218) Performance of start= and rows= parameters are exponentially slow with large data sets

2011-01-08 Thread Earwin Burrfoot
On Mon, Jan 3, 2011 at 18:18, Yonik Seeley wrote: > On Thu, Nov 11, 2010 at 3:22 PM, Jan Høydahl / > Cominvent wrote: >> The problem with large "start" is probably worse when sharding is involved. >> Anyone know how the shard component goes about fetching >> start=100&rows=10 from say 10 sh

Re: strange problem of PForDelta decoder

2010-12-30 Thread Earwin Burrfoot
>>>until we fix Lucene to run a single search concurrently (which we >>>badly need to do). > I am interested in this idea (I have posted about it before). Do you have any > resources, such as papers or tech articles, about it? > I have tried, but it needs to modify the index format dramatically, and we use > solr

[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2010-12-30 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976027#action_12976027 ] Earwin Burrfoot commented on LUCENE-2840: - I use the following scheme: * Ther

Re: is the classes ended with PerThread(*PerThread) multithread

2010-12-28 Thread Earwin Burrfoot
There is a single indexchain, with a single instance of each chain component, except those ending in -PerThread. Though that's gonna change with https://issues.apache.org/jira/browse/LUCENE-2324 On Tue, Dec 28, 2010 at 13:10, Simon Willnauer wrote: > On Tue, Dec 28, 2010 at 10:57 AM, xu cheng w

[jira] Commented: (LUCENE-2825) FSDirectory.open should return MMap on 64-bit Solaris

2010-12-26 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975212#action_12975212 ] Earwin Burrfoot commented on LUCENE-2825: - {quote} bq. CPU cache / TLB eff

Re: LuceneTestCase.threadCleanup incorrectly reports left running threads

2010-12-25 Thread Earwin Burrfoot
I think we can take some public algos like lookup3 / murmurhash2/3, and stuff them into Lucene utils. Java implementations (very simple and fast ones) exist for both of these. I.e. lookup3 done by Yonik (http://people.apache.org/~yonik/code/hash/), murmurhash2 - by Andrzej Bialecki ( http://www.g

Re: RT branch status

2010-12-22 Thread Earwin Burrfoot
Cool! I'm getting to this on a weekend. On Tue, Dec 21, 2010 at 11:44, Michael Busch wrote: > After merging trunk into the RT branch it's finally compiling again and > up-to-date. > > Several tests are failing now after the merge (43 out of 1427 are failing), > which is not too surprising, becaus

[jira] Commented: (LUCENE-2829) improve termquery "pk lookup" performance

2010-12-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974350#action_12974350 ] Earwin Burrfoot commented on LUCENE-2829: - Nobody halts your progress, w

[jira] Commented: (LUCENE-2829) improve termquery "pk lookup" performance

2010-12-22 Thread Earwin Burrfoot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974274#action_12974274 ] Earwin Burrfoot commented on LUCENE-2829: - Term lookup misses can be allevi
