[jira] Commented: (SOLR-1395) Integrate Katta

2011-01-10 Thread JohnWu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979502#action_12979502
 ] 

JohnWu commented on SOLR-1395:
--

TomLiu:

On the slave node, is katta.node.properties also set as follows?

#node.server.class=net.sf.katta.lib.lucene.LuceneServer
node.server.class=org.apache.solr.katta.DeployableSolrKattaServer


 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Next

 Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, 
 katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, 
 katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, 
 solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, 
 solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, 
 solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, 
 solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, 
 SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, 
 zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * Distributed search uses Hadoop RPC
 * Shard/SolrCore distribution and management are handled by Katta
 * Failover is Zookeeper-based
 * Indexes may be built using Hadoop




Plugin Idea: Index configuration at runtime?

2011-01-10 Thread Tanguy
Hi all,

I'd like to contribute to Apache Solr with a small plugin, but I'd like to
have your opinion first.

In my project, the user wants to configure the indexes on the web app side, not
in solrconfig.xml. For example, the user can select an "Ignore case" option
for the index "myindex". This way, at indexing or querying time, a Lowercase
filter must be applied.

My idea is to develop a SolrPlugin that will use configuration keys sent
at runtime with the string to index or query. Based on the configuration
keys, the plugin will apply the appropriate tokenizer/filters:

{lowercase=true,stopwords={this,the,to,is}}This is the string to index!

At index or query time, the plugin will apply the filters, and the string
becomes "string index".
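
To make the idea concrete, here is a minimal sketch of such a runtime-configured
analysis chain, assuming the Lucene 3.x analysis API; the class and field names
are hypothetical:

{code}
import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Hypothetical analyzer built from per-request configuration keys.
public class RuntimeConfigAnalyzer extends Analyzer {
  private final boolean lowercase;      // from the "lowercase" key
  private final Set<String> stopwords;  // from the "stopwords" key; may be null

  public RuntimeConfigAnalyzer(boolean lowercase, Set<String> stopwords) {
    this.lowercase = lowercase;
    this.stopwords = stopwords;
  }

  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream ts = new WhitespaceTokenizer(reader);
    if (lowercase) {
      ts = new LowerCaseFilter(ts);              // the "Ignore case" option
    }
    if (stopwords != null) {
      ts = new StopFilter(true, ts, stopwords);  // drop the configured stopwords
    }
    return ts;
  }
}
{code}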

What do you think about such a plugin?

Thanks,
Tanguy


[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-01-10 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979522#action_12979522
 ] 

Earwin Burrfoot commented on LUCENE-2312:
-

Some questions to align myself with impending reality.

Is it right that future RT readers are no longer immutable snapshots (in the 
sense that they have a variable maxDoc)?
If so, are you keeping the current NRT mode, with fast turnaround yet 
immutable readers?

 Search on IndexWriter's RAM Buffer
 --

 Key: LUCENE-2312
 URL: https://issues.apache.org/jira/browse/LUCENE-2312
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Assignee: Michael Busch
 Fix For: Realtime Branch

 Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch


 In order to offer users near-realtime search, without incurring
 an indexing performance penalty, we can implement search on
 IndexWriter's RAM buffer. This is the buffer that is filled in
 RAM as documents are indexed. Currently the RAM buffer is
 flushed to the underlying directory (usually disk) before being
 made searchable. 
 Today's Lucene-based NRT systems must incur the cost of merging
 segments, which can slow indexing. 
 Michael Busch has good suggestions regarding how to handle deletes using max 
 doc ids.  
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
 The area that isn't fully fleshed out is the terms dictionary,
 which needs to be sorted prior to queries executing. Currently
 IW implements a specialized hash table. Michael B has a
 suggestion here: 
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915




Re: Lucene-trunk - Build # 1421 - Failure

2011-01-10 Thread Michael McCandless
OK I snatched this index and indeed I can reproduce the OOME during
CheckIndex!  The hunt begins :)

Mike

On Sun, Jan 9, 2011 at 10:35 PM, Robert Muir rcm...@gmail.com wrote:
 On Sun, Jan 9, 2011 at 9:40 PM, Apache Hudson Server
 hud...@hudson.apache.org wrote:
 Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1421/

 1 tests failed.
 REGRESSION:  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

 Error Message:
 CheckIndex failed

 maybe this is specific to pulsing? I noticed it's failed 3 times with
 this identical pulsing stacktrace:
 Lucene-trunk/1421, tests-only/3590, tests-only/3570

 However, this time it failed in a nightly build (perhaps the indexes
 are still available on the hudson machine if we salvage before the
 next nightly build?)
 it should be under lucene/build/test/N/jrecrashXXtmp/

 all 3 times the stacktrace is:
 test: terms, freq, prox...ERROR [Java heap space]
 java.lang.OutOfMemoryError: Java heap space
        at 
 org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Position.clone(PulsingPostingsWriterImpl.java:104)
        at 
 org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Document.clone(PulsingPostingsWriterImpl.java:74)
        at 
 org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingTermState.clone(PulsingPostingsReaderImpl.java:72)
        at 
 org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingDocsEnum.reset(PulsingPostingsReaderImpl.java:234)
        at 
 org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl.docs(PulsingPostingsReaderImpl.java:189)
        at 
 org.apache.lucene.index.codecs.PrefixCodedTermsReader$FieldReader$SegmentTermsEnum.docs(PrefixCodedTermsReader.java:515)
        at 
 org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:756)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:489)
        at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:83)
        at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73)
        at 
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131)
        at 
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137)
        at 
 org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)




[jira] Resolved: (SOLR-2296) Upgrade Carrot2 binaries to version 3.4.2

2011-01-10 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2296.
--

Resolution: Fixed

Thanks, I committed the 1.5 jar file.

trunk: Committed revision 1057149.
3x: Committed revision 1057150.

Unfortunately, I think the 3.4.2 jar doesn't solve the SOLR-2282 test issue (see 
the error above on trunk), but I'm marking this issue as resolved.
I'll look at the test error in SOLR-2282.

 Upgrade Carrot2 binaries to version 3.4.2
 -

 Key: SOLR-2296
 URL: https://issues.apache.org/jira/browse/SOLR-2296
 Project: Solr
  Issue Type: Task
  Components: contrib - Clustering
Reporter: Stanislaw Osinski
Assignee: Koji Sekiguchi
 Fix For: 3.1, 4.0

 Attachments: carrot2-core-3.4.2-jdk1.5.jar, carrot2-core-3.4.2.jar, 
 SOLR-2296-branch_3.1.patch, SOLR-2296-trunk.patch


 Version 3.4.2 fixes a concurrency bug in Carrot2 that may be causing 
 SOLR-2282. I'll attach patches in a minute.




Re: Intermittent Failure on TestFST

2011-01-10 Thread Michael McCandless
I too cannot repro with this seed (separately: not good!).  I started
up a while(1) on beast but so far no failure... hmm.

Mike

On Mon, Jan 10, 2011 at 5:37 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 I just ran into this while working on LUCENE-2694. I cannot reproduce
 it so far, but I don't think it is related to my changes.

 [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
    [junit] Testcase:
 testRealTerms(org.apache.lucene.util.automaton.fst.TestFSTs):   FAILED
    [junit] expected:<59507> but was:<16982>
    [junit] junit.framework.AssertionFailedError: expected:<59507> but
 was:<16982>
    [junit]     at
 org.apache.lucene.util.automaton.fst.TestFSTs.assertSame(TestFSTs.java:1058)
    [junit]     at
 org.apache.lucene.util.automaton.fst.TestFSTs.testRealTerms(TestFSTs.java:1018)
    [junit]     at
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:)
    [junit]     at
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049)
    [junit]
    [junit]
    [junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 13.364 sec
    [junit]
    [junit] - Standard Error -
    [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
    [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs
 -Dtestmethod=testRealTerms -Dtests.seed=6383451564727439626:0
    [junit] NOTE: test params are: codec=RandomCodecProvider:
 {id=MockSep, body=MockRandom, title=MockRandom,
 titleTokenized=Standard, date=MockSep}, locale=ar,
 timezone=America/Argentina/Tucuman
    [junit] NOTE: all tests run in this JVM:
    [junit] [TestToken, TestAddIndexes, TestCrash, TestFlex,
 TestIndexWriter, TestIndexWriterOnJRECrash, TestMultiReader,
 TestOmitTf, TestPersistentSnapshotDeletionPolicy,
 TestSnapshotDeletionPolicy, TestTransactionRollback,
 TestBooleanScorer, TestDateSort, TestFieldCacheTermsFilter,
 TestMultiTermQueryRewrites, TestPositionIncrement, TestRegexpQuery,
 TestSimpleExplanations, TestTermScorer, TestDocValues,
 TestFieldMaskingSpanQuery, TestSpansAdvanced, TestBufferedIndexInput,
 TestRAMDirectory, TestArrayUtil, TestFieldCacheSanityChecker,
 TestSmallFloat, TestFSTs]




Re: Lucene-trunk - Build # 1421 - Failure

2011-01-10 Thread Michael McCandless
OK, so this looks to be caused by 1) the fact that we are indexing Greek
stop words with the TestNRTThreads test, and 2) the Pulsing codec being
horribly inefficient in how it handles pulsed terms that have many,
many positions.

But it's odd we've only hit this failure in the JRE crash test... I've
added a CheckIndex to TestNRTThreads itself and I'll see if that too
can provoke an OOME.  It should.

Mike

On Mon, Jan 10, 2011 at 5:14 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 OK I snatched this index and indeed I can reproduce the OOME during
 CheckIndex!  The hunt begins :)

 Mike




[jira] Updated: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2011-01-10 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2694:


Attachment: LUCENE-2694.patch

Another iteration on this after LUCENE-2831 was committed last week. 

- updated to trunk & all tests pass
- re-added all ord() related stuff back to TermsEnum since I think we should 
decouple this and solve it in a different issue. There are already enough 
changes in here and discussion should be focused on making MTQ single pass.
- Changed IndexSearcher to run concurrent searches on a leaf slice rather 
than on a leaf converted to a Top-Level Context. That made the callables a bit 
simpler and is more consistent since the hierarchy is preserved.
- TermState is now referenced by leaf ordinal and asserted using the leaf's 
top-level ctx.
- TermQuery is now single pass for all queries, while state is only held in the 
Weight unless a PerReaderTermState was set. But even then the top-level ctx 
must be identical to the given IS's top-level ctx, otherwise the given 
PerReaderTermState is not used.

This one seems pretty close.

 MTQ rewrite + weight/scorer init should be single pass
 --

 Key: LUCENE-2694
 URL: https://issues.apache.org/jira/browse/LUCENE-2694
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch, LUCENE-2694.patch


 Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
 Once we fix MTQ rewrite to be per-segment, we should take it further and make 
 weight/scorer init also run in the same single pass as rewrite.




Re: Consolidating FunctionQuery

2011-01-10 Thread Doron Cohen
+1 for having a single function query - actually this is what LUCENE-1081
and SOLR-192 are about. I'd look at this after LUCENE-1812, but that has
been waiting so long now; please go ahead with this, I'll follow/join.

-0 for moving function to modules - I think this is used as a core
capability by many applications/users and I don't see why it should be in
modules. I was going to say this is more like payloads, but that is not
true; it is not nearly as involved and as internal as payloads, so I
replaced -1 with -0 here. But could you explain why move this from core
to modules?

Doron

On Mon, Jan 10, 2011 at 1:03 PM, Chris Male gento...@gmail.com wrote:

 +1 to this idea.

 I recall talking to Robert and Mark about it as a good first step as part
 of the spatial code consolidation as well.


 On Mon, Jan 10, 2011 at 11:59 PM, Simon Willnauer 
 simon.willna...@googlemail.com wrote:

 hey,

 today I came across function query in lucene and that reminded me that
 Solr is already using its own derived version, which is no good IMO. We
 should try to consolidate the two versions and make solr use the
 consolidated version, which would even be good for lucene users. It
 seems it would make a lot of sense to make the entire function query
 stuff a module and drop it from core in 4.0. I didn't look too closely
 into the solr version but it seems to be not that hard to luceneify it
 and move it to modules. Thoughts?

 simon





 --
 Chris Male | Software Developer | JTeam BV.| www.jteam.nl



[jira] Updated: (SOLR-2310) DocBuilder's getTimeElapsedSince Error

2011-01-10 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-2310:
-

 Priority: Trivial  (was: Major)
Affects Version/s: (was: 4.0)
   1.3
   1.4
   1.4.1
Fix Version/s: 4.0
   3.1

 DocBuilder's getTimeElapsedSince Error
 --

 Key: SOLR-2310
 URL: https://issues.apache.org/jira/browse/SOLR-2310
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4, 1.4.1
 Environment: JDK1.6
Reporter: tom liu
Assignee: Koji Sekiguchi
Priority: Trivial
 Fix For: 3.1, 4.0


 I have a job which runs for about 65 hours, but the dataimport?command=status HTTP 
 request returns 5 hours.
 In the getTimeElapsedSince method of DocBuilder:
 {noformat} 
 static String getTimeElapsedSince(long l) {
   l = System.currentTimeMillis() - l;
   return (l / (60 * 60 * 1000)) % 60 + ":" + (l / (60 * 1000)) % 60 + ":" + (l / 1000)
       % 60 + "." + l % 1000;
 }
 {noformat} 
 The hours computation is wrong; it should be:
 {noformat} 
 static String getTimeElapsedSince(long l) {
   l = System.currentTimeMillis() - l;
   return (l / (60 * 60 * 1000)) + ":" + (l / (60 * 1000)) % 60 + ":" + (l / 1000)
       % 60 + "." + l % 1000;
 }
 {noformat} 
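
A quick sanity check of the reported numbers (illustration only, not part of the issue):

{code}
long l = 65L * 60 * 60 * 1000;                    // 65 hours = 234,000,000 ms
System.out.println((l / (60 * 60 * 1000)) % 60);  // prints 5  -- the bogus "5 hours"
System.out.println(l / (60 * 60 * 1000));         // prints 65 -- the fixed value
{code}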




[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979553#action_12979553
 ] 

Michael McCandless commented on LUCENE-2843:


bq. If a system is forced to swap out, it'll swap your explicitly managed RAM 
just as likely as memory-mapped files.

In fact, even if it's not under any real memory pressure the OS will swap out 
your not-recently-accessed RAM.  Net/net this is a good policy, if your metric 
is total throughput accomplished by all programs.

But if your metric is latency to search queries, this is an awful policy.

Fortunately OSs (at least Windows & Linux) give you some tunability here.  
Unfortunately, the tunable is global and it defaults badly for those programs 
that do make a careful distinction b/w what data structures are best held in 
RAM and what data is best left on disk.

If I could I would offer an option to pin these pages, so the OS cannot swap 
them out, but I don't think we can (easily) do that from javaland (and I think 
you'd have to be root).  Lacking pinning, the best (approximation) we can do is 
pull these ourselves into RAM.
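
For illustration, a minimal sketch of the "pull it into RAM ourselves" 
approximation: read the structure fully onto the Java heap once, so later 
accesses never depend on the OS keeping the file's pages cached (names here 
are illustrative, not from the patch):

{code}
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class HeapLoader {
  // Read a file fully onto the Java heap; afterwards, lookups hit this
  // byte[] directly rather than going back to the (evictable) page cache.
  static byte[] readFully(File f) throws IOException {
    byte[] buf = new byte[(int) f.length()];
    DataInputStream in = new DataInputStream(new FileInputStream(f));
    try {
      in.readFully(buf);
    } finally {
      in.close();
    }
    return buf;
  }
}
{code}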

 Add variable-gap terms index impl.
 --

 Key: LUCENE-2843
 URL: https://issues.apache.org/jira/browse/LUCENE-2843
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2843.patch, LUCENE-2843.patch


 PrefixCodedTermsReader/Writer (used by all real core codecs) already
 supports pluggable terms index impls.
 The only impl we have now is FixedGapTermsIndexReader/Writer, which
 picks every Nth (default 32) term and holds it in efficient packed
 int/byte arrays in RAM.  This is already an enormous improvement (RAM
 reduction, init time) over 3.x.
 This patch adds another impl, VariableGapTermsIndexReader/Writer,
 which lets you specify an arbitrary IndexTermSelector to pick which
 terms are indexed, and then uses an FST to hold the indexed terms.
 This is typically even more memory efficient than packed int/byte
 arrays, though it does not support ord(), so it's not quite a fair
 comparison.
 I had to relax the terms index plugin api for
 PrefixCodedTermsReader/Writer to not assume that the terms index impl
 supports ord.
 I also did some cleanup of the FST/FSTEnum APIs and impls, and broke
 out separate seekCeil and seekFloor in FSTEnum.  Eg we need seekFloor
 when the FST is used as a terms index but seekCeil when it's holding
 all terms in the index (ie what SimpleText uses FSTs for).
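
As a rough illustration of the pluggable selector idea (hypothetical names; 
the actual IndexTermSelector API in the patch may differ):

{code}
import org.apache.lucene.util.BytesRef;

// A selector that indexes every Nth term, mimicking the fixed-gap policy.
abstract class IndexTermSelector {
  abstract boolean isIndexTerm(BytesRef term, int docFreq);
}

class EveryNTermSelector extends IndexTermSelector {
  private final int interval;
  private long count;

  EveryNTermSelector(int interval) {
    this.interval = interval;
  }

  @Override
  boolean isIndexTerm(BytesRef term, int docFreq) {
    return count++ % interval == 0;  // index terms 0, N, 2N, ...
  }
}
{code}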




[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979554#action_12979554
 ] 

Michael McCandless commented on LUCENE-2312:


I believe the goal for RT readers is still point in time reader semantics.

 Search on IndexWriter's RAM Buffer
 --

 Key: LUCENE-2312
 URL: https://issues.apache.org/jira/browse/LUCENE-2312
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Assignee: Michael Busch
 Fix For: Realtime Branch

 Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch


 In order to offer users near-realtime search, without incurring
 an indexing performance penalty, we can implement search on
 IndexWriter's RAM buffer. This is the buffer that is filled in
 RAM as documents are indexed. Currently the RAM buffer is
 flushed to the underlying directory (usually disk) before being
 made searchable. 
 Today's Lucene-based NRT systems must incur the cost of merging
 segments, which can slow indexing. 
 Michael Busch has good suggestions regarding how to handle deletes using max 
 doc ids.  
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
 The area that isn't fully fleshed out is the terms dictionary,
 which needs to be sorted prior to queries executing. Currently
 IW implements a specialized hash table. Michael B has a
 suggestion here: 
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915




[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979558#action_12979558
 ] 

Michael McCandless commented on LUCENE-2324:


I think on commit, if we hit an aborting exception flushing a given DWPT, we 
throw it then & there.

Any segs already flushed remain flushed (but not committed).  Any segs not yet 
flushed remain not yet flushed...

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.
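
A rough sketch of that isolated-buffer flow (the pool and accounting methods 
here are assumed for illustration, not actual trunk code):

{code}
// Each indexing thread locks one of N private writers; flushes are independent.
DocumentsWriterPerThread dwpt = pool.lockAny();  // assumed pool method
try {
  dwpt.addDocument(doc, analyzer);
  if (dwpt.bytesUsed() > perThreadRamLimit) {    // assumed accounting method
    dwpt.flush();                                // writes its own private segment
  }
} finally {
  pool.release(dwpt);
}
{code}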




Re: Consolidating FunctionQuery

2011-01-10 Thread Simon Willnauer
On Mon, Jan 10, 2011 at 12:56 PM, Doron Cohen cdor...@gmail.com wrote:
 +1 for having a single function query - actually this is what LUCENE-1081
 and SOLR-192 are about. I'd look at this after LUCENE-1812, but that has
 been waiting so long now; please go ahead with this, I'll follow/join.

good stuff...

 -0 for moving function to modules - I think this is used as a core
 capability by many applications/users and I don't see why it should be in
 modules. I was going to say this is more like payloads, but that is not
 true; it is not nearly as involved and as internal as payloads, so I
 replaced -1 with -0 here. But could you explain why move this from core
 to modules?

So moving stuff to modules has several advantages compared to keeping it
in core. I think we all agree that we want lucene to be able to make use
of this functionality, right?! So if we keep it in core, we can either try
to make it right this time or we stick with bw compat for a long time
again, risking that the same thing happens again and solr changes stuff in
incompatible ways. While this is a minor risk, I see FQ not really as a
core part of lucene but rather something similar to Analysis: a package
with a small base interface (which could stay in lucene core) and a large,
useful impl base like all the funcs in solr. So it seems to me that
modules is the right place for at least those implementations.

does that make sense?

 Doron




[jira] Commented: (LUCENE-2831) Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context

2011-01-10 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979561#action_12979561
 ] 

Simon Willnauer commented on LUCENE-2831:
-

bq. And also Collector?
Yeah, I think that one can move to ARC too.

{code}
  ValueSource#getValues(IndexReader)
{code}

is another one
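
Concretely, the cutover could look something like this (a sketch only, not 
committed API):

{code}
// today:
DocValues getValues(IndexReader reader)
// after moving to reader contexts (ARC = AtomicReaderContext):
DocValues getValues(AtomicReaderContext readerContext)
{code}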

 Revise Weight#scorer & Filter#getDocIdSet API to pass Readers context
 -

 Key: LUCENE-2831
 URL: https://issues.apache.org/jira/browse/LUCENE-2831
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch, 
 LUCENE-2831.patch, LUCENE-2831.patch, LUCENE-2831.patch


 Spinoff from LUCENE-2694 - instead of passing a reader into Weight#scorer(IR, 
 boolean, boolean) we should / could revise the API and pass in a struct that 
 has parent reader, sub reader, ord of that sub. The ord mapping plus the 
 context with its parent would make several issues way easier. See 
 LUCENE-2694, LUCENE-2348 and LUCENE-2829 to name some.




[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2011-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979562#action_12979562
 ] 

Robert Muir commented on LUCENE-2694:
-

One question: I'm looking at the definition of TermState:
{noformat}
Holds all state required for {@link TermsEnum} to produce a {@link 
DocsEnum} without re-seeking the terms dict.
{noformat}

So why do we have seek(BytesRef, TermState)?
Shouldn't it just be seek(TermState)?
I think it's confusing that it takes an unnecessary bytes parameter.
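
In other words, the shape being argued for is roughly (a sketch of the 
discussion, not committed code):

{code}
import java.io.IOException;

public abstract class TermsEnum {
  // Capture everything needed to come back to the current term later.
  public abstract TermState termState() throws IOException;

  // Reposition purely from the captured state: no extra bytes argument,
  // and no re-seek of the terms dict.
  public abstract void seek(TermState state) throws IOException;
}
{code}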

 MTQ rewrite + weight/scorer init should be single pass
 --

 Key: LUCENE-2694
 URL: https://issues.apache.org/jira/browse/LUCENE-2694
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch, LUCENE-2694.patch


 Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
 Once we fix MTQ rewrite to be per-segment, we should take it further and make 
 weight/scorer init also run in the same single pass as rewrite.




Re: Consolidating FunctionQuery

2011-01-10 Thread Doron Cohen
On Mon, Jan 10, 2011 at 2:13 PM, Simon Willnauer 
simon.willna...@googlemail.com wrote:

 So moving stuff to modules has several advantages compared to keeping it
 in core. I think we all agree that we want lucene to be able to make use
 of this functionality, right?! So if we keep it in core, we can either try
 to make it right this time or we stick with bw compat for a long time
 again, risking that the same thing happens again and solr changes stuff in
 incompatible ways. While this is a minor risk, I see FQ not really as a
 core part of lucene but rather something similar to Analysis: a package
 with a small base interface (which could stay in lucene core) and a large,
 useful impl base like all the funcs in solr. So it seems to me that
 modules is the right place for at least those implementations.

 does that make sense?


Do you mean that modules are not subject to backcompat guidelines? I was not
aware of that.
Relieving the backcompat burden is a motivation by itself, but that aside,
function query (or custom score query) seems like a basic capability to me -
it is not something that both Lucene and Solr are using, but rather
something that Lucene is providing and Solr and user applications are using,
right?


[jira] Issue Comment Edited: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-10 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979496#action_12979496
 ] 

Steven Rowe edited comment on LUCENE-2657 at 1/10/11 8:03 AM:
--

Added profiles to populate internal repositories at {{lucene/dist/maven/}} and 
{{solr/dist/maven/}} with generated artifacts.

To populate {{lucene/dist/maven/}} with POMs and binary, source and javadoc 
artifacts, run the following from the top level Lucene/Solr directory:

{code}
mvn -N -P bootstrap,deploy-to-lucene-dist-maven-repository deploy
cd lucene
mvn -DskipTests 
-Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy
cd ../modules
mvn -DskipTests 
-Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy
{code}

To populate {{solr/dist/maven/}}, run the following from the top level 
Lucene/Solr directory:

{code}
mvn -N -P bootstrap install
cd solr
mvn -DskipTests -Pdeploy-to-solr-dist-maven-repository,javadocs-jar,source-jar 
deploy
{code}

  was (Author: steve_rowe):
Added profiles to populate internal repositories at {{lucene/dist/maven/}} 
and {{solr/dist/maven/}} with generated artifacts.

To populate {{lucene/dist/maven/}} with POMs and binary, source and javadoc 
artifacts, run the following from the top level Lucene/Solr directory:

{code}
mvn -N -P bootstrap,deploy-to-lucene-dist-maven-repository deploy
cd lucene
mvn -DskipTests 
-Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy
cd ../modules
mvn -DskipTests 
-Pdeploy-to-lucene-dist-maven-repository,javadocs-jar,source-jar deploy
{code}

To populate {{lucene/dist/solr/}}, run the following from the top level 
Lucene/Solr directory:

{code}
mvn -N -P bootstrap install
cd solr
mvn -DskipTests -Pdeploy-to-solr-dist-maven-repository,javadocs-jar,source-jar 
deploy
{code}
  
 Replace Maven POM templates with full POMs, and change documentation 
 accordingly
 

 Key: LUCENE-2657
 URL: https://issues.apache.org/jira/browse/LUCENE-2657
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch


 The current Maven POM templates only contain dependency information, the bare 
 bones necessary for uploading artifacts to the Maven repository.
 The full Maven POMs in the attached patch include the information necessary 
 to run a multi-module Maven build, in addition to serving the same purpose as 
 the current POM templates.
 Several dependencies are not available through public maven repositories.  A 
 profile in the top-level POM can be activated to install these dependencies 
 from the various {{lib/}} directories into your local repository.  From the 
 top-level directory:
 {code}
 mvn -N -Pbootstrap install
 {code}
 Once these non-Maven dependencies have been installed, to run all Lucene/Solr 
 tests via Maven's surefire plugin, and populate your local repository with 
 all artifacts, from the top level directory, run:
 {code}
 mvn install
 {code}
 When one Lucene/Solr module depends on another, the dependency is declared on 
 the *artifact(s)* produced by the other module and deposited in your local 
 repository, rather than on the other module's un-jarred compiler output in 
 the {{build/}} directory, so you must run {{mvn install}} on the other module 
 before its changes are visible to the module that depends on it.
 To create all the artifacts without running tests:
 {code}
 mvn -DskipTests install
 {code}
 I almost always include the {{clean}} phase when I do a build, e.g.:
 {code}
 mvn -DskipTests clean install
 {code}




[jira] Updated: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2011-01-10 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2694:


Attachment: LUCENE-2694_hack.patch

Here's a hack patch (I don't think it actually works) just showing what I mean.

I think TermsEnum should only have seek(TermState).
In the hack patch, I made the termState() and seekTermState() non-abstract:
the default impl returns a 'SimpleTermState' containing the term bytes and 
saved docFreq and implements seek(TermState) with those bytes.

This is basically what the patch had everywhere anyway as an implementation 
(for many of these, we should use more efficient impls; I fixed this for 
MemoryIndex as an example, but MultiTermsEnum comes to mind).

Also, I don't understand what was going on with setting bytes on the 
DeltaBytesReader with your seek(BytesRef, TermState) before.

If StandardCodec needs to know the shared byte[] prefix or something like that 
to reposition the enum, then it should put this in its termstate.
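
The default impl described above would look roughly like this (hypothetical 
field names):

{code}
// Saved by the default termState(): the term's bytes plus its stats.
class SimpleTermState extends TermState {
  BytesRef bytes;  // copy of the current term
  int docFreq;     // saved so it need not be re-read from the terms dict
}

// The default seek(TermState) can then reposition with the saved bytes:
//   SimpleTermState s = (SimpleTermState) state;
//   seek(s.bytes);   // ordinary seek using the stored term
//   // docFreq etc. restored from s rather than re-read
{code}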




 MTQ rewrite + weight/scorer init should be single pass
 --

 Key: LUCENE-2694
 URL: https://issues.apache.org/jira/browse/LUCENE-2694
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694_hack.patch


 Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
 Once we fix MTQ rewrite to be per-segment, we should take it further and make 
 weight/scorer init also run in the same single pass as rewrite.




RE: LICENSE/NOTICE file contents

2011-01-10 Thread karl.wright
Everyone should (carefully) read the Apache License 2.0, section 4(d).  It turns 
out that Apache has a somewhat unusual definition for the term "derivative 
work".  It has to be something you actually modified, not just included.  So the 
incubator approach seems correct; neither the HSQLDB notice nor the Jetty 
notice belongs in the Solr NOTICE.txt file.

For ManifoldCF, I just moved them to the end of the README.txt.  The old notice 
text is different from the corresponding LICENSE.txt text in both cases, so it 
did not make sense to either eliminate them or move them to LICENSE.txt.

Thanks,
Karl  

-Original Message-
From: ext Robert Muir [mailto:rcm...@gmail.com] 
Sent: Saturday, January 08, 2011 10:16 AM
To: dev@lucene.apache.org; yo...@lucidimagination.com
Subject: Re: LICENSE/NOTICE file contents

On Sat, Jan 8, 2011 at 10:06 AM, Yonik Seeley
yo...@lucidimagination.com wrote:

 There also wasn't any business about "and then add _nothing_ unless
 you can find explicit policy documented
 somewhere in the ASF that says it is required".  I was following
 examples from other projects and any docs I could find at the time,
 but this was back in '06.


Not sure there is now either, this is likely just someone's opinion.




[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-10 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2657:


Attachment: LUCENE-2657.patch

carrot2 dependency upgraded to 3.4.2: SOLR-2296

 Replace Maven POM templates with full POMs, and change documentation 
 accordingly
 

 Key: LUCENE-2657
 URL: https://issues.apache.org/jira/browse/LUCENE-2657
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch


 The current Maven POM templates only contain dependency information, the bare 
 bones necessary for uploading artifacts to the Maven repository.
 The full Maven POMs in the attached patch include the information necessary 
 to run a multi-module Maven build, in addition to serving the same purpose as 
 the current POM templates.
 Several dependencies are not available through public maven repositories.  A 
 profile in the top-level POM can be activated to install these dependencies 
 from the various {{lib/}} directories into your local repository.  From the 
 top-level directory:
 {code}
 mvn -N -Pbootstrap install
 {code}
 Once these non-Maven dependencies have been installed, to run all Lucene/Solr 
 tests via Maven's surefire plugin, and populate your local repository with 
 all artifacts, from the top level directory, run:
 {code}
 mvn install
 {code}
 When one Lucene/Solr module depends on another, the dependency is declared on 
 the *artifact(s)* produced by the other module and deposited in your local 
 repository, rather than on the other module's un-jarred compiler output in 
 the {{build/}} directory, so you must run {{mvn install}} on the other module 
 before its changes are visible to the module that depends on it.
 To create all the artifacts without running tests:
 {code}
 mvn -DskipTests install
 {code}
 I almost always include the {{clean}} phase when I do a build, e.g.:
 {code}
 mvn -DskipTests clean install
 {code}




[jira] Commented: (LUCENE-2500) A Linux-specific Directory impl that bypasses the buffer cache

2011-01-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979588#action_12979588
 ] 

Christian Kohlschütter commented on LUCENE-2500:


I guess it would not be difficult to add Mac OS X support (via F_NOCACHE)?

see http://evanjones.ca/write-latency-alignment.html


 A Linux-specific Directory impl that bypasses the buffer cache
 --

 Key: LUCENE-2500
 URL: https://issues.apache.org/jira/browse/LUCENE-2500
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2500.patch


 I've been testing how we could prevent Lucene's merges from evicting
 pages from the OS's buffer cache.  I tried fadvise/madvise (via JNI)
 but (frustratingly), I could not get them to work (details at
 http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html).
 The only thing that worked was to use Linux's O_DIRECT flag, which
 forces all IO to bypass the buffer cache entirely... so I created a
 Linux-specific Directory impl to do this.




[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup

2011-01-10 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979589#action_12979589
 ] 

David Smiley commented on LUCENE-2611:
--

I don't know why the vcs.xml change isn't working for you, but it's absolutely 
wonderful for the commit log history to show the JIRA references as links that 
work.  FWIW, I'm using IntelliJ 10. 

I understand RE workspace.xml.

 IntelliJ IDEA and Eclipse setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x-part2.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-part2.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming.
 The attached patches add a new top level directory {{dev-tools/}} with 
 sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, 
 as well as top-level ant targets named idea and eclipse that copy these 
 files into the proper locations.  This arrangement avoids the messiness 
 attendant to in-place project configuration files directly checked into 
 source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit run configuration per module 
 is included.
 The Eclipse configuration includes a source entry for each 
 source/test/resource location and classpath setup: a library entry for each 
 jar.
 For IDEA, once {{ant idea}} has been run, the only configuration that must be 
 performed manually is configuring the project-level JDK.  For Eclipse, once 
 {{ant eclipse}} has been run, the user has to refresh the project 
 (right-click on the project and choose Refresh).
 If these patches are committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination IDEA and Eclipse configuration 
 locations.
 Iam Jambour has written up on the Lucene wiki a detailed set of instructions 
 for applying the 3.X branch patch for IDEA: 
 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ




[jira] Commented: (SOLR-2310) DocBuilder's getTimeElapsedSince Error

2011-01-10 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979593#action_12979593
 ] 

Koji Sekiguchi commented on SOLR-2310:
--

Good catch, tom. I'll commit shortly.

 DocBuilder's getTimeElapsedSince Error
 --

 Key: SOLR-2310
 URL: https://issues.apache.org/jira/browse/SOLR-2310
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4, 1.4.1
 Environment: JDK1.6
Reporter: tom liu
Assignee: Koji Sekiguchi
Priority: Trivial
 Fix For: 3.1, 4.0


 I have a job which runs for about 65 hours, but the dataimport?command=status HTTP 
 request returns 5 hours.
 In the getTimeElapsedSince method of DocBuilder:
 {noformat} 
 static String getTimeElapsedSince(long l) {
   l = System.currentTimeMillis() - l;
   return (l / (60 * 60 * 1000)) % 60 + ":" + (l / (60 * 1000)) % 60 + ":" + (l / 1000)
       % 60 + "." + l % 1000;
 }
 {noformat} 
 The hours computation is wrong; it should be:
 {noformat} 
 static String getTimeElapsedSince(long l) {
   l = System.currentTimeMillis() - l;
   return (l / (60 * 60 * 1000)) + ":" + (l / (60 * 1000)) % 60 + ":" + (l / 1000)
       % 60 + "." + l % 1000;
 }
 {noformat} 




[jira] Resolved: (SOLR-2310) DocBuilder's getTimeElapsedSince Error

2011-01-10 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2310.
--

Resolution: Fixed

trunk: Committed revision 1057221.
3x: Committed revision 1057226.

 DocBuilder's getTimeElapsedSince Error
 --

 Key: SOLR-2310
 URL: https://issues.apache.org/jira/browse/SOLR-2310
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3, 1.4, 1.4.1
 Environment: JDK1.6
Reporter: tom liu
Assignee: Koji Sekiguchi
Priority: Trivial
 Fix For: 3.1, 4.0


 I have a job which runs for about 65 hours, but the dataimport?command=status HTTP 
 request returns 5 hours.
 In the getTimeElapsedSince method of DocBuilder:
 {noformat} 
 static String getTimeElapsedSince(long l) {
   l = System.currentTimeMillis() - l;
   return (l / (60 * 60 * 1000)) % 60 + ":" + (l / (60 * 1000)) % 60 + ":" + (l / 1000)
       % 60 + "." + l % 1000;
 }
 {noformat} 
 The hours computation is wrong; it should be:
 {noformat} 
 static String getTimeElapsedSince(long l) {
   l = System.currentTimeMillis() - l;
   return (l / (60 * 60 * 1000)) + ":" + (l / (60 * 1000)) % 60 + ":" + (l / 1000)
       % 60 + "." + l % 1000;
 }
 {noformat} 




[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-10 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979606#action_12979606
 ] 

Ryan McKinley commented on LUCENE-2657:
---

This is looking good!

IIUC, this will be a parallel build system to ant.  The build and test are 
independent of anything the ant build does.

If we take this route, we should probably drop the -pom.xml.templates and the 
ant generate-maven-artifacts target.



 Replace Maven POM templates with full POMs, and change documentation 
 accordingly
 

 Key: LUCENE-2657
 URL: https://issues.apache.org/jira/browse/LUCENE-2657
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch


 The current Maven POM templates only contain dependency information, the bare 
 bones necessary for uploading artifacts to the Maven repository.
 The full Maven POMs in the attached patch include the information necessary 
 to run a multi-module Maven build, in addition to serving the same purpose as 
 the current POM templates.
 Several dependencies are not available through public maven repositories.  A 
 profile in the top-level POM can be activated to install these dependencies 
 from the various {{lib/}} directories into your local repository.  From the 
 top-level directory:
 {code}
 mvn -N -Pbootstrap install
 {code}
 Once these non-Maven dependencies have been installed, to run all Lucene/Solr 
 tests via Maven's surefire plugin, and populate your local repository with 
 all artifacts, from the top level directory, run:
 {code}
 mvn install
 {code}
 When one Lucene/Solr module depends on another, the dependency is declared on 
 the *artifact(s)* produced by the other module and deposited in your local 
 repository, rather than on the other module's un-jarred compiler output in 
 the {{build/}} directory, so you must run {{mvn install}} on the other module 
 before its changes are visible to the module that depends on it.
 To create all the artifacts without running tests:
 {code}
 mvn -DskipTests install
 {code}
 I almost always include the {{clean}} phase when I do a build, e.g.:
 {code}
 mvn -DskipTests clean install
 {code}




Lucene-Solr-tests-only-3.x - Build # 3594 - Failure

2011-01-10 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/3594/

1 tests failed.
REGRESSION:  org.apache.lucene.search.TestThreadSafe.testLazyLoadThreadSafety

Error Message:
unable to create new native thread

Stack Trace:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:614)
at 
org.apache.lucene.search.TestThreadSafe.doTest(TestThreadSafe.java:133)
at 
org.apache.lucene.search.TestThreadSafe.testLazyLoadThreadSafety(TestThreadSafe.java:152)
at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:255)




Build Log (for compile errors):
[...truncated 8565 lines...]






[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-10 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979625#action_12979625
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

bq. Any segs already flushed remain flushed (but not committed). Any segs not 
yet flushed remain not yet flushed...

If the segments are flushed, will they be deleted?  Or will they be made 
available in a subsequent and completely successful commit?

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-01-10 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979631#action_12979631
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

{quote}Is that right that future RT readers are no longer immutable snapshots
(in a sense that they have variable maxDoc)?{quote}

The RT readers will be point-in-time. There are several mechanisms to make this
happen, which mainly revolve around a static maxDoc per reader while allowing
some of the underlying data structures to change during indexing. There are two
overall design issues right now: how to handle norms, and the System.arraycopy
per getReader needed to create static read-only parallel upto arrays.

I think System.arraycopy should be fast enough, given it maps to a native
instruction on Intel. And for norms we may need to relax their accuracy in
order to create less garbage. That would involve either using a byte[][] for
point-in-timeness, or a byte[] that is recalculated only as it's grown (meaning
newer readers created since the last array growth may see a slightly inaccurate
norm value). The norm byte[] would essentially be grown every N docs.
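
A minimal sketch of that copy-on-grow idea (illustrative names only, not
actual Lucene classes): norms live in a shared byte[] that is recopied only
when it grows, and each getReader snapshot captures a fixed maxDoc over the
current array.

{code}
// Illustrative sketch only; not actual Lucene code.
class RTNorms {
  private byte[] norms = new byte[16];  // shared; grown every N docs
  private int docCount;

  synchronized void setNorm(int doc, byte value) {
    if (doc >= norms.length) {
      // copy-on-grow: snapshots holding the old array are unaffected
      byte[] grown = new byte[Math.max(norms.length * 2, doc + 1)];
      System.arraycopy(norms, 0, grown, 0, norms.length);
      norms = grown;
    }
    norms[doc] = value;
    docCount = doc + 1;
  }

  synchronized NormsSnapshot snapshot() {  // taken per getReader
    return new NormsSnapshot(norms, docCount);
  }
}

class NormsSnapshot {
  final byte[] norms;
  final int maxDoc;  // static per reader
  NormsSnapshot(byte[] norms, int maxDoc) {
    this.norms = norms;
    this.maxDoc = maxDoc;
  }
  byte norm(int doc) { return norms[doc]; }  // caller ensures doc < maxDoc
}
{code}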



 Search on IndexWriter's RAM Buffer
 --

 Key: LUCENE-2312
 URL: https://issues.apache.org/jira/browse/LUCENE-2312
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Assignee: Michael Busch
 Fix For: Realtime Branch

 Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch


 In order to offer users near-realtime search, without incurring
 an indexing performance penalty, we can implement search on
 IndexWriter's RAM buffer. This is the buffer that is filled in
 RAM as documents are indexed. Currently the RAM buffer is
 flushed to the underlying directory (usually disk) before being
 made searchable. 
 Today's Lucene-based NRT systems must incur the cost of merging
 segments, which can slow indexing. 
 Michael Busch has good suggestions regarding how to handle deletes using max 
 doc ids.  
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
 The area that isn't fully fleshed out is the terms dictionary,
 which needs to be sorted prior to queries executing. Currently
 IW implements a specialized hash table. Michael B has a
 suggestion here: 
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-01-10 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979633#action_12979633
 ] 

Michael Busch commented on LUCENE-2312:
---

bq. I believe the goal for RT readers is still point in time reader semantics.

True.  At twitter our RT solution also guarantees point-in-time readers (with 
one exception; see below).  We have to provide at least a fixed maxDoc 
per-query to guarantee consistency across terms (posting lists).  E.g. imagine 
your query is 'a AND NOT b'. Say a occurs in doc 100. Now you don't find a 
posting in b's posting list for doc 100.  Did doc 100 not have term b, or is 
doc 100 still being processed and that particular posting hasn't been written 
yet?  If the reader's maxDoc however is set to 99 (the last completely indexed 
document) you can't get into this situation.

Before every query we reopen the readers, which effectively simply updates the 
maxDoc.

The one exception to point-in-time-ness are the df values in the dictionary, 
which for obvious reasons is tricky.  I think a straightforward way to solve 
this problem is to count the df by iterating the corresponding posting list 
when requested. We could add a special counting method that just uses the skip 
lists to perform this task. Here the term buffer becomes even more important, 
and also documenting that docFreq() can be expensive in RT mode, ie. not O(1) 
as in non-RT mode, but rather O(log indexSize) in case we can get multi-level 
skip lists working in RT.
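
A rough sketch of that on-demand counting, assuming a hypothetical 
PostingsCursor in place of the real postings/skip-list machinery:

{code}
import java.io.IOException;

// Illustrative only: walk the posting list and stop at the reader's
// point-in-time maxDoc. A real implementation would use the skip lists
// to jump rather than visit every posting.
interface PostingsCursor {
  int NO_MORE_DOCS = Integer.MAX_VALUE;
  int nextDoc() throws IOException;  // returns ascending doc ids
}

class OnDemandDocFreq {
  static int docFreq(PostingsCursor postings, int maxDoc) throws IOException {
    int df = 0;
    int doc;
    while ((doc = postings.nextDoc()) != PostingsCursor.NO_MORE_DOCS) {
      if (doc >= maxDoc) {
        break;  // posting belongs to a doc still being indexed; ignore it
      }
      df++;
    }
    return df;
  }
}
{code}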

 Search on IndexWriter's RAM Buffer
 --

 Key: LUCENE-2312
 URL: https://issues.apache.org/jira/browse/LUCENE-2312
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Assignee: Michael Busch
 Fix For: Realtime Branch

 Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch


 In order to offer users near-realtime search, without incurring
 an indexing performance penalty, we can implement search on
 IndexWriter's RAM buffer. This is the buffer that is filled in
 RAM as documents are indexed. Currently the RAM buffer is
 flushed to the underlying directory (usually disk) before being
 made searchable. 
 Today's Lucene-based NRT systems must incur the cost of merging
 segments, which can slow indexing. 
 Michael Busch has good suggestions regarding how to handle deletes using max 
 doc ids.  
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
 The area that isn't fully fleshed out is the terms dictionary,
 which needs to be sorted prior to queries executing. Currently
 IW implements a specialized hash table. Michael B has a
 suggestion here: 
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-10 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979635#action_12979635
 ] 

Steven Rowe commented on LUCENE-2657:
-

bq. IIUC, this will be a parallel build system to ant. The build and test is 
independent of anything the ant build does.

Yes, except that the two systems share build output directories.

bq. If we take this route, we should probably drop the -pom.xml.templates and 
the generate-maven-artifacts target.

I agree.  

The -pom.xml.templates have never been fully correct (e.g. missing 
dependencies) and are unmaintained.  

I'm working on replacing generate-maven-artifacts.


 Replace Maven POM templates with full POMs, and change documentation 
 accordingly
 

 Key: LUCENE-2657
 URL: https://issues.apache.org/jira/browse/LUCENE-2657
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, 
 LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch


 The current Maven POM templates only contain dependency information, the bare 
 bones necessary for uploading artifacts to the Maven repository.
 The full Maven POMs in the attached patch include the information necessary 
 to run a multi-module Maven build, in addition to serving the same purpose as 
 the current POM templates.
 Several dependencies are not available through public Maven repositories.  A 
 profile in the top-level POM can be activated to install these dependencies 
 from the various {{lib/}} directories into your local repository.  From the 
 top-level directory:
 {code}
 mvn -N -Pbootstrap install
 {code}
 Once these non-Maven dependencies have been installed, to run all Lucene/Solr 
 tests via Maven's surefire plugin, and populate your local repository with 
 all artifacts, from the top level directory, run:
 {code}
 mvn install
 {code}
 When one Lucene/Solr module depends on another, the dependency is declared on 
 the *artifact(s)* produced by the other module and deposited in your local 
 repository, rather than on the other module's un-jarred compiler output in 
 the {{build/}} directory, so you must run {{mvn install}} on the other module 
 before its changes are visible to the module that depends on it.
 To create all the artifacts without running tests:
 {code}
 mvn -DskipTests install
 {code}
 I almost always include the {{clean}} phase when I do a build, e.g.:
 {code}
 mvn -DskipTests clean install
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup

2011-01-10 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979637#action_12979637
 ] 

Steven Rowe commented on LUCENE-2611:
-

bq. I've used the copyright plugin a lot and it's a great way to ensure that the 
ASL is added to any new files. Might be useful to add it to reduce the hassle 
for new contributors.

OK, I'll investigate.

 IntelliJ IDEA and Eclipse setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x-part2.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-part2.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming.
 The attached patches add a new top level directory {{dev-tools/}} with 
 sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, 
 as well as top-level ant targets named idea and eclipse that copy these 
 files into the proper locations.  This arrangement avoids the messiness 
 attendant to in-place project configuration files directly checked into 
 source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit run configuration per module 
 is included.
 The Eclipse configuration includes a source entry for each 
 source/test/resource location and classpath setup: a library entry for each 
 jar.
 For IDEA, once {{ant idea}} has been run, the only configuration that must be 
 performed manually is configuring the project-level JDK.  For Eclipse, once 
 {{ant eclipse}} has been run, the user has to refresh the project 
 (right-click on the project and choose Refresh).
 If these patches are committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination IDEA and Eclipse configuration 
 locations.
 Iam Jambour has written up on the Lucene wiki a detailed set of instructions 
 for applying the 3.X branch patch for IDEA: 
 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2611) IntelliJ IDEA and Eclipse setup

2011-01-10 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979643#action_12979643
 ] 

Steven Rowe commented on LUCENE-2611:
-

bq. I don't know why the vcs.xml change isn't working for you, but it's 
absolutely wonderful for the commit log history to show the JIRA references as 
links that work.

I agree, that would be nice.

bq. FWIW, I'm using IntelliJ 10.

I'm running both ATM, in part to ensure that the configuration provided by this 
issue works under both IntelliJ 9 and 10.  I haven't tried the log comment 
issue auto-linkification yet under IntelliJ 10.

I *do* see auto-linkified issue IDs in the Repository Changes view, as well as 
in the [Version Control | Subversion | Show History] view in IntelliJ 9 - very 
nice!  (Just not in the log comment editor or in the svnbar plugin's SVN Diff 
popup.)

 IntelliJ IDEA and Eclipse setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x-part2.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-part2.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA or Eclipse can be time-consuming.
 The attached patches add a new top level directory {{dev-tools/}} with 
 sub-dirs {{idea/}} and {{eclipse/}} containing basic setup files for trunk, 
 as well as top-level ant targets named idea and eclipse that copy these 
 files into the proper locations.  This arrangement avoids the messiness 
 attendant to in-place project configuration files directly checked into 
 source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit run configuration per module 
 is included.
 The Eclipse configuration includes a source entry for each 
 source/test/resource location and classpath setup: a library entry for each 
 jar.
 For IDEA, once {{ant idea}} has been run, the only configuration that must be 
 performed manually is configuring the project-level JDK.  For Eclipse, once 
 {{ant eclipse}} has been run, the user has to refresh the project 
 (right-click on the project and choose Refresh).
 If these patches are committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination IDEA and Eclipse configuration 
 locations.
 Iam Jambour has written up on the Lucene wiki a detailed set of instructions 
 for applying the 3.X branch patch for IDEA: 
 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-01-10 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979646#action_12979646
 ] 

Jason Rutherglen commented on LUCENE-2312:
--

{quote}The one exception to point-in-time-ness are the df values in the
dictionary, which for obvious reasons is tricky.{quote}

Right, forgot about those. I think we'd planned on using a multi-dimensional
array, eg int[][]. However we'd need to test how they'll affect indexing
performance. If that doesn't work then we'll need to think about other
solutions, like building them on demand, which just offloads the problem
somewhere else. It looks like docFreq is used only for phrase queries? However,
I think paying a potentially small penalty during indexing (only when RT is on)
is better than a somewhat random penalty during querying.
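
One purely illustrative reading of the int[][] idea (none of these names are
Lucene's): page the counters and copy a page only on the first write after a
snapshot, so getReader stays cheap while each reader keeps point-in-time df
values:

{code}
import java.util.Arrays;

// Copy-on-write paged df counters: a sketch, not Lucene code.
class CowDocFreqs {
  private static final int PAGE = 1024;
  private int[][] pages = new int[0][];
  private boolean[] shared = new boolean[0];  // page referenced by a snapshot?

  synchronized void increment(int termId) {
    int p = termId / PAGE;
    ensurePage(p);
    if (shared[p]) {               // first write since a snapshot: clone,
      pages[p] = pages[p].clone(); // so snapshots keep the old page
      shared[p] = false;
    }
    pages[p][termId % PAGE]++;
  }

  synchronized int[][] snapshot() {  // called once per getReader
    Arrays.fill(shared, true);
    return pages.clone();            // copies page pointers only
  }

  private void ensurePage(int p) {
    if (p >= pages.length) {
      int[][] np = Arrays.copyOf(pages, p + 1);
      for (int i = pages.length; i <= p; i++) {
        np[i] = new int[PAGE];
      }
      pages = np;
      shared = Arrays.copyOf(shared, p + 1);
    }
  }
}
{code}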

 Search on IndexWriter's RAM Buffer
 --

 Key: LUCENE-2312
 URL: https://issues.apache.org/jira/browse/LUCENE-2312
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Assignee: Michael Busch
 Fix For: Realtime Branch

 Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch


 In order to offer users near-realtime search, without incurring
 an indexing performance penalty, we can implement search on
 IndexWriter's RAM buffer. This is the buffer that is filled in
 RAM as documents are indexed. Currently the RAM buffer is
 flushed to the underlying directory (usually disk) before being
 made searchable. 
 Today's Lucene-based NRT systems must incur the cost of merging
 segments, which can slow indexing. 
 Michael Busch has good suggestions regarding how to handle deletes using max 
 doc ids.  
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
 The area that isn't fully fleshed out is the terms dictionary,
 which needs to be sorted prior to queries executing. Currently
 IW implements a specialized hash table. Michael B has a
 suggestion here: 
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-10 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979649#action_12979649
 ] 

Michael Busch commented on LUCENE-2324:
---

bq. Longer term c) would be great, or, if IW has an ES then it'd send multiple 
flush jobs to the ES.

Lost in abbreviations :) - Can you remind me what 'ES' is?

bq. But, you're right: maybe we should sometimes prune DWPTs. Or simply stop 
recycling any RAM, so that a just-flushed DWPT is an empty shell.

I'm not sure I understand what the problem here with recycling RAM is.  Could 
someone elaborate?


 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-10 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979654#action_12979654
 ] 

Michael Busch commented on LUCENE-2324:
---

bq. I think aborting a flush should only lose the docs in that one DWPT (as it 
is today).

Yeah I'm convinced now I don't want the "nuke the world" approach.  Btw, Mike, 
you're very good at giving things intuitive names :)


bq. I think on commit if we hit an aborting exception flushing a given DWPT, we 
throw it then  there.

Yes sounds good.


{quote}
bq. Any segs already flushed remain flushed (but not committed). Any segs not 
yet flushed remain not yet flushed...

If the segments are flushed, will they then be deleted? Or will they be made 
available in a subsequent, completely successful commit?
{quote}

The aborting exception might be thrown due to a disk-full situation.  This can 
be fixed and commit() called again, which then would flush the remaining DWPTs 
and commit all flushed segments.
Otherwise, those flushed segments will be orphaned and deleted sometime later 
by a different IW because they don't belong to any SegmentInfos.
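
In calling-pattern form, that recovery might look like this (a hedged sketch; 
freeSomeDiskSpace is a hypothetical application-level step, not a Lucene API):

{code}
import java.io.IOException;
import org.apache.lucene.index.IndexWriter;

class CommitRetry {
  static void commitWithRetry(IndexWriter writer) throws IOException {
    try {
      writer.commit();
    } catch (IOException e) {
      // e.g. disk full: segments flushed before the failure remain
      // flushed, but uncommitted.
      freeSomeDiskSpace();  // hypothetical application-level recovery
      writer.commit();      // flushes the remaining DWPTs and commits
                            // all flushed segments
    }
  }

  static void freeSomeDiskSpace() {
    // application specific
  }
}
{code}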

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-10 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979655#action_12979655
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

bq. Lost in abbreviations :) - Can you remind me what 'ES' is?

I read it as ExecutorService, ie, a thread pool.

bq. I'm not sure I understand what the problem here with recycling RAM is. 
Could someone elaborate?

Mainly that we could have DWPT(s) lying around unused, consuming [recycled] 
RAM, eg, from a sudden drop in the number of incoming threads after a flush.  
This is a "drop the code, and put it back in if that was a bad idea" solution.



 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2307) PHPSerialized fails with sharded queries

2011-01-10 Thread Antonio Verni (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antonio Verni updated SOLR-2307:


Attachment: TestPHPSerializedResponseWriter.java
PHPSerializedResponseWriter.java.patch

Updated the patch to handle single values in multi-valued fields. Added a JUnit 
test case just to exercise the issue.

 PHPSerialized fails with sharded queries
 

 Key: SOLR-2307
 URL: https://issues.apache.org/jira/browse/SOLR-2307
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 1.3, 1.4.1
Reporter: Antonio Verni
Priority: Minor
 Attachments: PHPSerializedResponseWriter.java.patch, 
 PHPSerializedResponseWriter.java.patch, TestPHPSerializedResponseWriter.java


 Solr throws a java.lang.IllegalArgumentException ("Map size must not be 
 negative") when using the PHP Serialized response writer with 
 sharded queries. 
 To reproduce the issue, start your preferred example and try the following 
 query:
 http://localhost:8983/solr/select/?q=*:*&wt=phps&shards=localhost:8983/solr,localhost:8983/solr
 It is caused by the JSONWriter implementation of writeSolrDocumentList and 
 writeSolrDocument. Overriding these two methods in the 
 PHPSerializedResponseWriter to handle the SolrDocument size seems to solve 
 the issue.
 Attached is my patch made against trunk rev 1055588.
 cheers,
 Antonio

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-10 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979671#action_12979671
 ] 

Michael Busch commented on LUCENE-2324:
---

{quote}
Mainly that we could have DWPT(s) lying around unused, consuming [recycled] 
RAM, eg, from a sudden drop in the number of incoming threads after a flush. 
This is a drop the code, and put it back in if that was a bad idea solution.
{quote}

Ah thanks, got it.  


bq. Or simply stop recycling any RAM, so that a just-flushed DWPT is an empty 
shell.

+1

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2307) PHPSerialized fails with sharded queries

2011-01-10 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979683#action_12979683
 ] 

Ahmet Arslan commented on SOLR-2307:


This patch solves this problem http://search-lucene.com/m/lr7t42uWp4g , right?

 PHPSerialized fails with sharded queries
 

 Key: SOLR-2307
 URL: https://issues.apache.org/jira/browse/SOLR-2307
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 1.3, 1.4.1
Reporter: Antonio Verni
Priority: Minor
 Attachments: PHPSerializedResponseWriter.java.patch, 
 PHPSerializedResponseWriter.java.patch, TestPHPSerializedResponseWriter.java


 Solr throws a java.lang.IllegalArgumentException ("Map size must not be 
 negative") when using the PHP Serialized response writer with 
 sharded queries. 
 To reproduce the issue, start your preferred example and try the following 
 query:
 http://localhost:8983/solr/select/?q=*:*&wt=phps&shards=localhost:8983/solr,localhost:8983/solr
 It is caused by the JSONWriter implementation of writeSolrDocumentList and 
 writeSolrDocument. Overriding these two methods in the 
 PHPSerializedResponseWriter to handle the SolrDocument size seems to solve 
 the issue.
 Attached is my patch made against trunk rev 1055588.
 cheers,
 Antonio

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979700#action_12979700
 ] 

Michael McCandless commented on LUCENE-2474:


I started to implement forwarding to all subs and to all reopened readers 
and... it's kinda hairy.  I mean there are TONS of places where we make new 
readers (Earwin's working on improving this, I think, under LUCENE-2355).

So then I wondered: what if we just make this a static method, eg on 
IndexReader, add/removeReaderFinishedListener?  (Or we could put it on 
FieldCache).  That'd be a tiny change...
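
Roughly this shape, say (illustrative only; nothing here is committed API):

{code}
import org.apache.lucene.index.IndexReader;

// A hypothetical listener, notified when a (sub-)reader is closed:
interface ReaderFinishedListener {
  void finished(IndexReader reader);
}

// Static registration on IndexReader would avoid threading listeners
// through every place that creates or reopens readers, e.g.:
//   IndexReader.addReaderFinishedListener(listener);
//   IndexReader.removeReaderFinishedListener(listener);
{code}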

 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
 Attachments: LUCENE-2474.patch


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin-off of https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it 
 makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache relies on being called explicitly to purge its cache when possible 
 (which is tricky to know from the outside, especially when using NRT - 
 reader attack of the clones).
 The provided patch allows to plug a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-2851) Highlighting in UTF-8 documents

2011-01-10 Thread Maricris Villareal (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maricris Villareal closed LUCENE-2851.
--

Resolution: Not A Problem

Sorry, it turned out that the issue was in the way I was opening the file.

BufferedReader br = new BufferedReader(new FileReader(pageFile));

was changed to 

BufferedReader br = new BufferedReader(new java.io.InputStreamReader(new 
java.io.FileInputStream(pageFile), "UTF-8"));

and it worked.

 Highlighting in UTF-8 documents
 ---

 Key: LUCENE-2851
 URL: https://issues.apache.org/jira/browse/LUCENE-2851
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/highlighter
Affects Versions: 2.9.3
Reporter: Maricris Villareal

 When I try to highlight a Chinese document using 
 org.apache.lucene.search.highlight.Highlighter, I end up with a corrupted 
 document from getBestTextFragments().  This corruption happens both when I 
 try to highlight a pure English query or when I try to highlight a pure 
 Chinese query.  
 I believe that this issue is related to LUCENE-1500 which was closed by 
 preventing an exception from being thrown but did not fix the underlying 
 problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979713#action_12979713
 ] 

Michael McCandless commented on LUCENE-2324:


{quote}
bq. Lost in abbreviations :) - Can you remind me what 'ES' is?

I read it as ExecutorService, ie, a thread pool.
{quote}

Yes, sorry that's what I meant.

I.e. someday IW can take an ES too and farm things out to it when it could make 
use of concurrency (like flush the world).  But that's for later /dream.

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here Mike's summary he posted on 2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2307) PHPSerialized fails with sharded queries

2011-01-10 Thread Antonio Verni (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979715#action_12979715
 ] 

Antonio Verni commented on SOLR-2307:
-

Yes, exactly, and it could also fix SOLR-2278, but I haven't tested that.
As I discovered reading your SOLR-2291, the current patch I uploaded does not 
respect the returnFields parameter. I've fixed it, but I need to upload a third 
version of the patch (so please excuse the mess) and a new test file.
In detail, to fix the issue I've added a numeric index to the response array 
of documents in writeSolrDocumentList, as required by the PHP serialization 
protocol, and handled the field count in writeSolrDocument.
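
For illustration (a hand-constructed example, not output from the patch), two 
documents each carrying a single id field would serialize in phps roughly as:

{code}
a:2:{i:0;a:1:{s:2:"id";s:1:"1";}i:1;a:1:{s:2:"id";s:1:"2";}}
{code}

The a:2 and a:1 size headers are why writeSolrDocumentList and 
writeSolrDocument must know the exact document and field counts up front.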


 PHPSerialized fails with sharded queries
 

 Key: SOLR-2307
 URL: https://issues.apache.org/jira/browse/SOLR-2307
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 1.3, 1.4.1
Reporter: Antonio Verni
Priority: Minor
 Attachments: PHPSerializedResponseWriter.java.patch, 
 PHPSerializedResponseWriter.java.patch, TestPHPSerializedResponseWriter.java


 Solr throws a java.lang.IllegalArgumentException ("Map size must not be 
 negative") when using the PHP Serialized response writer with 
 sharded queries. 
 To reproduce the issue, start your preferred example and try the following 
 query:
 http://localhost:8983/solr/select/?q=*:*&wt=phps&shards=localhost:8983/solr,localhost:8983/solr
 It is caused by the JSONWriter implementation of writeSolrDocumentList and 
 writeSolrDocument. Overriding these two methods in the 
 PHPSerializedResponseWriter to handle the SolrDocument size seems to solve 
 the issue.
 Attached is my patch made against trunk rev 1055588.
 cheers,
 Antonio

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2795) Genericize DirectIOLinuxDir - UnixDir

2011-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979719#action_12979719
 ] 

Michael McCandless commented on LUCENE-2795:


https://issues.apache.org/jira/browse/LUCENE-2500?focusedCommentId=12979588&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12979588
 has details on what flags to pass to OS X to bypass its buffer cache...

 Genericize DirectIOLinuxDir - UnixDir
 --

 Key: LUCENE-2795
 URL: https://issues.apache.org/jira/browse/LUCENE-2795
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today DirectIOLinuxDir is tricky/dangerous to use, because you only want to 
 use it for IndexWriter and not IndexReader (searching).  It's a trap.
 But, once we do LUCENE-2793, we can make it fully general purpose because 
 then a single native Dir impl can be used.
 I'd also like to make it generic to other Unices, if we can, so that it 
 becomes UnixDirectory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2500) A Linux-specific Directory impl that bypasses the buffer cache

2011-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979718#action_12979718
 ] 

Michael McCandless commented on LUCENE-2500:


Nice!

Actually I'd like to generalize this Dir impl to be a UnixFSDirectory (adding 
ifdefs to handle the flags for the various flavors) and, once we have 
IOContext, fix it to properly decide when to use direct IO and when not to.  
This way it's safe to just use on any Unix platform... (see LUCENE-2795).


 A Linux-specific Directory impl that bypasses the buffer cache
 --

 Key: LUCENE-2500
 URL: https://issues.apache.org/jira/browse/LUCENE-2500
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2500.patch


 I've been testing how we could prevent Lucene's merges from evicting
 pages from the OS's buffer cache.  I tried fadvise/madvise (via JNI)
 but (frustratingly), I could not get them to work (details at
 http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html).
 The only thing that worked was to use Linux's O_DIRECT flag, which
 forces all IO to bypass the buffer cache entirely... so I created a
 Linux-specific Directory impl to do this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-10 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979720#action_12979720
 ] 

Jason Rutherglen commented on LUCENE-2474:
--

{quote}make this a static method, eg on IndexReader, 
add/removeReaderFinishedListener? (Or we could put it on FieldCache). That'd be 
a tiny change...{quote}

This makes the most sense; however, it feels temporary, as we should probably 
move to a unified IWC/IRC config where all parameters are set and shared for 
writers and readers.  This way we can eventually coordinate things like IO 
scheduling, eg, LUCENE-2793's IOContext.  Also, shouldn't there simply be a 
reader event listener, and perhaps even a writer event listener?

 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
 Attachments: LUCENE-2474.patch


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin-off of https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it 
 makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache relies on being called explicitly to purge its cache when possible 
 (which is tricky to know from the outside, especially when using NRT - 
 reader attack of the clones).
 The provided patch allows to plug a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2307) PHPSerialized fails with sharded queries

2011-01-10 Thread Antonio Verni (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antonio Verni updated SOLR-2307:


Attachment: TestPHPSerializedResponseWriter.java
PHPSerializedResponseWriter.java.patch

The previous implementation did not respect the returnFields parameter. The 
changes are reflected in the test code.

 PHPSerialized fails with sharded queries
 

 Key: SOLR-2307
 URL: https://issues.apache.org/jira/browse/SOLR-2307
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 1.3, 1.4.1
Reporter: Antonio Verni
Priority: Minor
 Attachments: PHPSerializedResponseWriter.java.patch, 
 PHPSerializedResponseWriter.java.patch, 
 PHPSerializedResponseWriter.java.patch, TestPHPSerializedResponseWriter.java, 
 TestPHPSerializedResponseWriter.java


 Solr throws a java.lang.IllegalArgumentException ("Map size must not be 
 negative") when using the PHP Serialized response writer with 
 sharded queries. 
 To reproduce the issue, start your preferred example and try the following 
 query:
 http://localhost:8983/solr/select/?q=*:*&wt=phps&shards=localhost:8983/solr,localhost:8983/solr
 It is caused by the JSONWriter implementation of writeSolrDocumentList and 
 writeSolrDocument. Overriding these two methods in the 
 PHPSerializedResponseWriter to handle the SolrDocument size seems to solve 
 the issue.
 Attached is my patch made against trunk rev 1055588.
 cheers,
 Antonio

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-10 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979728#action_12979728
 ] 

Jason Rutherglen commented on LUCENE-2793:
--

Shall I take this one?  With the plan being to add config options to IWC so 
that IW uses the DirectIOLinuxDirectory (and its variants) only for merging?

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979731#action_12979731
 ] 

Michael McCandless commented on LUCENE-2793:


Yes please!

But, this issue only adds the IOContext, threading it down to when you open an 
input / create an output.  That context should hold enough information to 
allow the Dir impl to make decisions like buffer sizes and avoiding buffer 
cache, etc.
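
A minimal sketch of what such a context could hold (illustrative, not a 
committed API):

{code}
import java.util.EnumSet;

// Hypothetical IOContext: carries the read buffer size plus access hints,
// threaded down through Directory.openInput/createOutput.
public class IOContext {
  public enum Flag { DIRECT, SEQUENTIAL }

  public final int readBufferSize;
  public final EnumSet<Flag> flags;

  public IOContext(int readBufferSize, EnumSet<Flag> flags) {
    this.readBufferSize = readBufferSize;
    this.flags = flags;
  }
}

// e.g. a merge might pass:
//   new IOContext(4096, EnumSet.of(IOContext.Flag.DIRECT,
//                                  IOContext.Flag.SEQUENTIAL))
{code}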


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-975) admin-extra.html not correctly displayed when using multicore configuration

2011-01-10 Thread Edward Rudd (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979730#action_12979730
 ] 

Edward Rudd commented on SOLR-975:
--

I have confirmed this issue is fixed in the 4.0 nightly build from today.

 admin-extra.html not correctly displayed when using multicore configuration
 -

 Key: SOLR-975
 URL: https://issues.apache.org/jira/browse/SOLR-975
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.4
 Environment: Jetty openjdk 1.6.0 1.0.b12 (EPEL package for EL5)
Reporter: Edward Rudd

 I'm having cross-talk issues when using the Solr nightlies (and probably with 
 the 1.3.0 release, but I have not tested it, as I needed newer features of the 
 DataImportHandler in the nightlies). 
 The basic scenario for this bug is as follows:
 I have two cores configured and BOTH have a customized admin-extra.html; 
 however, the admin pages use the SAME admin-extra.html for all 
 cores.  The one used is whichever core is browsed first.  This looks like 
 a caching bug where the cache does not take the core into account.
 Basically my admin-extra.html has a link to the data importer script and a 
 link to reload the core (which has to have the core name explicitly in the 
 per-core admin-extra.html).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979733#action_12979733
 ] 

Michael McCandless commented on LUCENE-2474:


I think we can generalize this to any event, and to writers, in the future... 
for today, just letting something external be notified when a reader is gone, 
just as FieldCache is privately notified today, is a good baby step.

 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
 Attachments: LUCENE-2474.patch


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin-off of https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it 
 makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache relies on being called explicitly to purge its cache when possible 
 (which is tricky to know from the outside, especially when using NRT - 
 reader attack of the clones).
 The provided patch allows to plug a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-10 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979736#action_12979736
 ] 

Jason Rutherglen commented on LUCENE-2793:
--

{quote}this issue only adds the IOContext, threading it down to when you open 
an input / create an output{quote}

Does this mean we're not implementing this part?

{quote}This will require fixing how IW pools readers, so that a reader opened 
for merging is not then used for searching, and vice versa. Really, it's only 
all the open file handles that need to be different - we could in theory share 
del docs, norms, etc, if that were somehow possible.{quote}

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2307) PHPSerialized fails with sharded queries

2011-01-10 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979737#action_12979737
 ] 

Ahmet Arslan commented on SOLR-2307:


By the way, I think you should call req.close() at the end of the test.

 PHPSerialized fails with sharded queries
 

 Key: SOLR-2307
 URL: https://issues.apache.org/jira/browse/SOLR-2307
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 1.3, 1.4.1
Reporter: Antonio Verni
Priority: Minor
 Attachments: PHPSerializedResponseWriter.java.patch, 
 PHPSerializedResponseWriter.java.patch, 
 PHPSerializedResponseWriter.java.patch, TestPHPSerializedResponseWriter.java, 
 TestPHPSerializedResponseWriter.java


 Solr throws a java.lang.IllegalArgumentException ("Map size must not be 
 negative") when using the PHP Serialized response writer with sharded 
 queries. 
 To reproduce the issue, start your preferred example and try the following 
 query:
 http://localhost:8983/solr/select/?q=*:*&wt=phps&shards=localhost:8983/solr,localhost:8983/solr
 It is caused by the JSONWriter implementation of writeSolrDocumentList and 
 writeSolrDocument. Overriding these two methods in the 
 PHPSerializedResponseWriter to handle the SolrDocument size seems to solve 
 the issue.
 Attached is my patch, made against trunk rev 1055588.
 cheers,
 Antonio
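
For readers unfamiliar with the format: PHP's serialize() writes a map's entry
count up front (a:<n>:{...}), which is why a bad precomputed size breaks
deserialization.  A self-contained sketch of the encoding -- illustrative only,
not the Solr writer code:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal illustration of PHP's serialized map format.  The entry count
    // must be known (and non-negative) before any entries are written --
    // this is the count that goes wrong for shard responses.
    public class PhpMapSketch {
      static String serialize(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("a:").append(doc.size()).append(":{");
        for (Map.Entry<String, String> e : doc.entrySet()) {
          sb.append("s:").append(e.getKey().length()).append(":\"").append(e.getKey()).append("\";");
          sb.append("s:").append(e.getValue().length()).append(":\"").append(e.getValue()).append("\";");
        }
        return sb.append("}").toString();
      }

      public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "1");
        System.out.println(serialize(doc)); // prints: a:1:{s:2:"id";s:1:"1";}
      }
    }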

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979738#action_12979738
 ] 

Michael McCandless commented on LUCENE-2793:


Sorry, I think we should also do that as part of this issue.  Basically the 
IOContext needs to become part of the cache key used in IW's ReaderPool?

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2856) Create IndexWriter event listener, specifically for merges

2011-01-10 Thread Jason Rutherglen (JIRA)
Create IndexWriter event listener, specifically for merges
--

 Key: LUCENE-2856
 URL: https://issues.apache.org/jira/browse/LUCENE-2856
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Jason Rutherglen


This issue will allow users to monitor merges occurring within IndexWriter via 
a callback event listener.  This can be used by external applications 
such as Solr to monitor large segment merges.
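
No patch is attached yet; a minimal sketch of the kind of callback interface
the description suggests (all names invented):

    // Invented names, for illustration only -- the issue above has no patch yet.
    public interface MergeEventListener {
      // Called when IndexWriter kicks off a merge producing the named segment.
      void mergeStarted(String mergedSegmentName, long estimatedMergeBytes);

      // Called when the merge completes or aborts.
      void mergeFinished(String mergedSegmentName, boolean aborted);
    }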

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-10 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979740#action_12979740
 ] 

Jason Rutherglen commented on LUCENE-2793:
--

bq. Basically the IOContext needs to become part of the cache key used in IW's 
ReaderPool? 

Great, I'll implement this.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2857) Fix various problems with PulsingCodec

2011-01-10 Thread Michael McCandless (JIRA)
Fix various problems with PulsingCodec
--

 Key: LUCENE-2857
 URL: https://issues.apache.org/jira/browse/LUCENE-2857
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Codecs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2857) Fix various problems with PulsingCodec

2011-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2857:
---

Attachment: LUCENE-2857.patch

Patch.  I changed PulsingCodec to:

  * Not use absurd RAM when cloning TermState

  * Not decode the byte[] entry in the terms dict until the docs/positions enum 
is needed

  * Use total TF (number of term positions across all docs) as the
threshold for storing in the terms dict vs the wrapped codec (see the sketch below)

This fixes the intermittent failure in
TestIndexWriterOnJRECrash.testNRTThreads that we've seen lately.
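
The gist of the third change, as a hedged sketch (the threshold name and value
are assumptions, not the patch's code):

    // Pulsing inlines ("pulses") a term's postings into the terms dict only
    // if its total number of positions across all docs is small; otherwise
    // it delegates to the wrapped codec.  Names and threshold invented.
    public class PulsingCutoffSketch {
      static final int MAX_PULSED_TOTAL_TF = 3; // illustrative threshold

      static boolean shouldPulse(long totalTermFreq) {
        return totalTermFreq <= MAX_PULSED_TOTAL_TF;
      }

      public static void main(String[] args) {
        System.out.println(shouldPulse(2));   // true: inline in the terms dict
        System.out.println(shouldPulse(500)); // false: use the wrapped codec
      }
    }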


 Fix various problems with PulsingCodec
 --

 Key: LUCENE-2857
 URL: https://issues.apache.org/jira/browse/LUCENE-2857
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Codecs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2857.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2829) improve termquery pk lookup performance

2011-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2829.


Resolution: Fixed

 improve termquery pk lookup performance
 -

 Key: LUCENE-2829
 URL: https://issues.apache.org/jira/browse/LUCENE-2829
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Robert Muir
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2829.patch, LUCENE-2829.patch, LUCENE-2829.patch


 For terms that act like primary keys and don't exist in some segments (worst 
 case: a primary/unique key that exists in only one segment),
 we do wasted seeks.
 While LUCENE-2694 tries to solve some of this issue with TermState, I'm 
 concerned whether we could ever backport that to 3.1, for example.
 This is a simpler solution, here just to solve this one problem in 
 TermQuery... we could just revert it in trunk when we resolve LUCENE-2694,
 but I don't think we should leave things as they are in 3.x.
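
To make the wasted-seek point concrete, a toy sketch -- invented types, not the
committed patch:

    import java.util.Arrays;
    import java.util.List;

    // Toy illustration: with a unique key, at most one segment contains the
    // term, so a cheap per-segment reject avoids a full seek everywhere else.
    public class PkLookupSketch {
      static class Segment {
        final List<String> termIndex; // stand-in for the segment's terms index
        Segment(String... terms) { this.termIndex = Arrays.asList(terms); }
        boolean mightContain(String term) { return termIndex.contains(term); }
      }

      public static void main(String[] args) {
        Segment[] segments = { new Segment("doc1"), new Segment("doc7"), new Segment("doc9") };
        for (Segment seg : segments) {
          if (!seg.mightContain("doc7")) continue; // skip the wasted seek
          System.out.println("seek only in the one segment that can match");
        }
      }
    }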

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENENET-387) Not able to sort by Title

2011-01-10 Thread Reshmi Kumari (JIRA)
Not able to sort by Title
-

 Key: LUCENENET-387
 URL: https://issues.apache.org/jira/browse/LUCENENET-387
 Project: Lucene.Net
  Issue Type: Bug
 Environment: Lucene.net incorporated in Sitecore CMS 6.2, OS: Windows 
Server 2008, IE 8
Reporter: Reshmi Kumari


Sorting by date is working perfectly fine, but I am not able to sort by Title. I 
have indexed Title but not tokenized it (<field target="sortTitle" 
indexType="untokenized">Title</field>).

For sorting I have used: sort = new Sort(new SortField("sortTitle", 
SortField.STRING));

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Lucene-trunk - Build # 1421 - Failure

2011-01-10 Thread Michael McCandless
OK I got TestNRTThreads (alone, no crashing) to fail, once I added a
CheckIndex to it (and ran under while(1)).

So this is not particular to crashing... it's just because
PulsingCodec was using crazy RAM on cloning its TermState.  I fixed
this in LUCENE-2857.

Mike

On Mon, Jan 10, 2011 at 6:26 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 OK, so this looks to be caused by 1) the fact that we are indexing Greek
 stop words with the TestNRTThreads test, and 2) Pulsing codec is
 horribly inefficient in how it handles pulsed terms that have many
 many positions.

 But it's odd we've only hit this failure in the JRE crash test... I've
 added a CheckIndex to TestNRTThreads itself and I'll see if that too
 can provoke an OOME.  It should.

 Mike

 On Mon, Jan 10, 2011 at 5:14 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
 OK I snatched this index and indeed I can reproduce the OOME during
 CheckIndex!  The hunt begins :)

 Mike

 On Sun, Jan 9, 2011 at 10:35 PM, Robert Muir rcm...@gmail.com wrote:
 On Sun, Jan 9, 2011 at 9:40 PM, Apache Hudson Server
 hud...@hudson.apache.org wrote:
 Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1421/

 1 tests failed.
 REGRESSION:  
 org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

 Error Message:
 CheckIndex failed

 Maybe this is specific to pulsing? I noticed it's failed 3 times with
 this identical pulsing stacktrace:
 Lucene-trunk/1421, tests-only/3590, tests-only/3570

 However, this time it failed in a nightly build (perhaps the indexes
 are still available on the hudson machine if we salvage them before the
 next nightly build?).
 They should be under lucene/build/test/N/jrecrashXXtmp/

 all 3 times the stacktrace is:
 test: terms, freq, prox...ERROR [Java heap space]
 java.lang.OutOfMemoryError: Java heap space
        at 
 org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Position.clone(PulsingPostingsWriterImpl.java:104)
        at 
 org.apache.lucene.index.codecs.pulsing.PulsingPostingsWriterImpl$Document.clone(PulsingPostingsWriterImpl.java:74)
        at 
 org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingTermState.clone(PulsingPostingsReaderImpl.java:72)
        at 
 org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl$PulsingDocsEnum.reset(PulsingPostingsReaderImpl.java:234)
        at 
 org.apache.lucene.index.codecs.pulsing.PulsingPostingsReaderImpl.docs(PulsingPostingsReaderImpl.java:189)
        at 
 org.apache.lucene.index.codecs.PrefixCodedTermsReader$FieldReader$SegmentTermsEnum.docs(PrefixCodedTermsReader.java:515)
        at 
 org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:756)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:489)
        at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:83)
        at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73)
        at 
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131)
        at 
 org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137)
        at 
 org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 3634 - Failure

2011-01-10 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3634/

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

Error Message:
CheckIndex failed

Stack Trace:
java.lang.RuntimeException: CheckIndex failed
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:87)
at org.apache.lucene.util._TestUtil.checkIndex(_TestUtil.java:73)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:131)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.checkIndexes(TestIndexWriterOnJRECrash.java:137)
at 
org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads(TestIndexWriterOnJRECrash.java:61)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1049)




Build Log (for compile errors):
[...truncated 3053 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] Commented: (LUCENENET-380) Evaluate Sharpen as a port tool

2011-01-10 Thread Peter Mateja
The amount of custom work required for the conversion is starting to concern
me a bit.  Well, to clarify, the work itself doesn't concern me, but rather
I'm worried that this is going to make a purely automated conversion process
very difficult to pull off and probably very fragile.  The devil is
definitely in the details.

What are your thoughts on how we can begin to tackle this?

How many of these issues can be handled by Sharpen, or a modified, custom
version of Sharpen?  What items are best handled by a pre/post processor?

A number of the items DIGY listed (thanks!) seem to fall under the scope of
code intent, vs pure syntactical mapping.  I'd suggest that it's
unrealistic to expect any conversion tool to manage those types of issues.

Perhaps a process such as the following should be our initial draft:

1) Start with Lucene.Java source, initially the latest 3.0.3 release.
2) Make specific hand coded changes to the java source code to assist with
certain automated conversion issues.  These changes should be expressed as a
set of patch files, to be automatically applied to the java source on
subsequent iterations of this process.  Any patch rejections should break
the build.  These patches should be maintained as new code updates come from
the java source.
3) Run an automated conversion tool (Sharpen, most likely).
4) Perform any desired post processing to modify the source code structure,
set up project / solution files, etc.  Essentially, get the project into a
state where it's loadable by Visual Studio.  At this point there will be
errors (lots of them).  The output of this step should be checked in as the
raw conversion source.
5) Make changes to the converted C# code, including necessary helper
classes, in order to fix all the remaining issues alluded to by DIGY.  Also,
run any automated post processing, such as Resharper code formatting (the
formatting settings should be standardized across the project to ensure
normalized and repeatable refactorings), inline docs tweaks, etc.  These
changes should also be expressed as a set of patch files, to be
automatically applied to the raw conversion source on subsequent iterations
of this process.  Any patch rejections should break the build.  These
patches should represent the bulk of the efforts of the Lucene.Net core dev
team.  The output of this step should be checked in as the official
Lucene.Net source code.

This entire process needs to be checked into a conversion process branch.
 After the initial build of this system, workflow would be split into the
following 2 vectors:
A) On java source changes (probably at a coarser level than individual
commits), steps 1-4 would be run to build a new base raw conversion source.
 With the java changes, it's possible that changes to the patch files in
step 2 would be required.  Then step 5 would be run to create the official
Lucene.Net source.  Again, fixes to the patches may be in order depending on
the complexity of the original java changes.
B) Most other changes would be considered C#-side specific.  This might
involve platform specific bug fixes, desired code refactorings, etc.  These
changes would be made based on the current checked in Lucene.Net source, and
the patch files for step 5 would be updated to reflect those changes.

Conversion process changes would fall outside the scope of standard
development, being fairly disruptive.

Of course, this process does complicate the development / maintenance
process quite a bit, by making many more vectors of change.  And, I'm aware
that what I've blathered on about here has probably already been discussed,
but I wanted to get some discussion going.  Thoughts?

Peter Mateja
peter.mat...@gmail.com



On Sun, Jan 9, 2011 at 4:09 PM, Digy digyd...@gmail.com wrote:

 Having buildable & clean code is just a beginning and should not
 result in a loss of know-how.
 Before trying to fix the bugs in the output of these tools, everyone should
 see how they were fixed in Lucene.Net 2.9.2.
 There is no need to reinvent the wheel.

 Here is a quick list of tips & tricks, as far as I can remember.

 * The decimal separator is not always '.'; some locales use ',' (this matters
 while parsing float/double).
 * Set in Java accepts null as an argument.  A null check is needed while
 porting.
 * ReadResolve should be ported by implementing the interface
 System.Runtime.Serialization.IObjectReference:
       public Object GetRealObject(System.Runtime.Serialization.StreamingContext context)
       {
           return ReadResolve();
       }
 * .NET emits \ufffd as the invalid char, but Java emits \x00.
 * Use StringComparer.Ordinal while comparing strings.
 * FIPS compliance: use SHA1 instead of MD5.
 * Use the System.Runtime.Serialization.OnDeserialized attribute on
 Serializable classes:
       void OnDeserialized(System.Runtime.Serialization.StreamingContext context)
       {
           ...
       }
 * Use System.IO.Path.DirectorySeparatorChar or Path.Combine instead of
 using \\. (causes problems on 

[jira] Updated: (SOLR-2307) PHPSerialized fails with sharded queries

2011-01-10 Thread Antonio Verni (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antonio Verni updated SOLR-2307:


Comment: was deleted

(was: The previous implementation was not respecting the returnField parameter. 
Changes reflected in test code)

 PHPSerialized fails with sharded queries
 

 Key: SOLR-2307
 URL: https://issues.apache.org/jira/browse/SOLR-2307
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 1.3, 1.4.1
Reporter: Antonio Verni
Priority: Minor
 Attachments: PHPSerializedResponseWriter.java.patch, 
 PHPSerializedResponseWriter.java.patch, 
 PHPSerializedResponseWriter.java.patch, TestPHPSerializedResponseWriter.java, 
 TestPHPSerializedResponseWriter.java


 Solr throws a java.lang.IllegalArgumentException ("Map size must not be 
 negative") when using the PHP Serialized response writer with sharded 
 queries. 
 To reproduce the issue, start your preferred example and try the following 
 query:
 http://localhost:8983/solr/select/?q=*:*&wt=phps&shards=localhost:8983/solr,localhost:8983/solr
 It is caused by the JSONWriter implementation of writeSolrDocumentList and 
 writeSolrDocument. Overriding these two methods in the 
 PHPSerializedResponseWriter to handle the SolrDocument size seems to solve 
 the issue.
 Attached is my patch, made against trunk rev 1055588.
 cheers,
 Antonio

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2011-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2837.


Resolution: Fixed

3rd time's a charm?

 Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
 into IndexSearcher
 ---

 Key: LUCENE-2837
 URL: https://issues.apache.org/jira/browse/LUCENE-2837
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2837.patch, LUCENE-2837.patch, LUCENE-2837.patch


 We've discussed cleaning up our *Searcher stack for some time... I
 think we should try to do this before releasing 4.0.
 So I'm attaching an initial patch which:
   * Removes Searcher, Searchable, absorbing all their methods into 
 IndexSearcher
   * Removes contrib/remote
   * Removes MultiSearcher
   * Absorbs ParallelMultiSearcher into IndexSearcher (i.e. you can now
 pass useThreads=true, or a custom ES, to the ctor; see the sketch below)
 The patch is rough -- I just ripped stuff out, did search/replace to
 IndexSearcher, etc.  E.g., nothing directly tests using threads with
 IndexSearcher, but before committing I think we should add a
 newSearcher to LuceneTestCase, which randomly chooses whether the
 searcher uses threads, and cut tests over to use this instead of making
 their own IndexSearcher.
 I think MultiSearcher has a useful purpose, but as it is today it's
 too low-level, e.g. it shouldn't be involved in rewriting queries: the
 Query.combine method is scary.  Maybe in its place we make a higher
 level class, with a limited API, that's able to federate search across
 multiple IndexSearchers?  It'd also be able to optionally use a thread
 per IndexSearcher.
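
Under the proposed API, threaded search might look roughly like this (the
exact ctor signature is an assumption until the patch is final):

    import java.util.concurrent.ExecutorService;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    // Hedged sketch: pass an ExecutorService to IndexSearcher so per-segment
    // searches can run in parallel (what ParallelMultiSearcher used to do).
    // The caller owns the ES and should shut it down when done searching.
    public class ThreadedSearchSketch {
      public static IndexSearcher open(IndexReader reader, ExecutorService es) {
        return new IndexSearcher(reader, es); // assumed ctor from the description
      }
    }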

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2474:
---

Attachment: LUCENE-2474.patch

OK, here's a patch exposing the readerFinishedListeners as static methods on 
IndexReader.

It was also nice to consolidate all the various places we were previously 
purging the FieldCache.
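
Usage might look roughly like this, assuming the patch exposes a listener hook
with these (unverified) names:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.lucene.index.IndexReader;

    // Hedged sketch -- the listener/method names are assumed from the comment
    // above, not checked against the patch: evict custom cache entries when a
    // reader is finished.
    public class EvictOnReaderFinished {
      static final Map<Object, Object> CACHE = new ConcurrentHashMap<Object, Object>();

      public static void install() {
        IndexReader.addReaderFinishedListener(new IndexReader.ReaderFinishedListener() {
          public void finished(IndexReader reader) {
            CACHE.remove(reader.getFieldCacheKey()); // keyed the same way FieldCache is
          }
        });
      }
    }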

 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
 Attachments: LUCENE-2474.patch, LUCENE-2474.patch


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin-off of https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it 
 makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache benefits from being called explicitly to purge its cache when 
 possible (which is tricky to know from the outside, especially when using NRT 
 - reader attack of the clones).
 The provided patch allows to plug a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-3.x - Build # 238 - Failure

2011-01-10 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/238/

All tests passed

Build Log (for compile errors):
[...truncated 21034 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2756) MultiSearcher.rewrite() incorrectly rewrites queries

2011-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2756.


   Resolution: Fixed
Fix Version/s: 4.0
   3.1

MultiSearcher is now deprecated/removed.

 MultiSearcher.rewrite() incorrectly rewrites queries
 

 Key: LUCENE-2756
 URL: https://issues.apache.org/jira/browse/LUCENE-2756
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2756_testcase.patch


 This was reported on the userlist, in the context of range queries.
 Its also easy to make our existing tests fail with my patch on LUCENE-2751:
 {noformat}
 ant test-core -Dtestcase=TestBoolean2 -Dtestmethod=testRandomQueries 
 -Dtests.seed=7679849347282878725:-903778383189134045
 {noformat}
 The fundamental problem is that MultiSearcher first rewrites against 
 individual subs, then uses Query.combine(), which simply OR's these 
 sub-clauses.
 This is incorrect for expanded MUST_NOT queries (e.g. from a wildcard), as it 
 violates De Morgan's laws: OR-ing the per-searcher rewrites of a negation 
 yields NOT(a1) OR NOT(a2), which matches more than the correct NOT(a1 OR a2).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2474) Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

2011-01-10 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979888#action_12979888
 ] 

Earwin Burrfoot commented on LUCENE-2474:
-

bq. Earwin's working on improving this, I think, under LUCENE-2355
I stalled, and then there were just so many changes under trunk, so I have to 
restart now :) Thanks for another kick.

 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey)
 

 Key: LUCENE-2474
 URL: https://issues.apache.org/jira/browse/LUCENE-2474
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Shay Banon
 Attachments: LUCENE-2474.patch, LUCENE-2474.patch


 Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean 
 custom caches that use the IndexReader (getFieldCacheKey).
 A spin-off of https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it 
 makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; 
 even Lucene itself uses it, for example, with the CachingWrapperFilter. 
 FieldCache benefits from being called explicitly to purge its cache when 
 possible (which is tricky to know from the outside, especially when using NRT 
 - reader attack of the clones).
 The provided patch allows to plug a CacheEvictionListener which will be 
 called when the cache should be purged for an IndexReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2855) Contrib queryparser should not use CharSequence as Map key

2011-01-10 Thread Adriano Crestani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano Crestani resolved LUCENE-2855.
--

Resolution: Fixed

Patch applied on revision 1057454.

 Contrib queryparser should not use CharSequence as Map key
 --

 Key: LUCENE-2855
 URL: https://issues.apache.org/jira/browse/LUCENE-2855
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/*
Affects Versions: 3.0.3
Reporter: Adriano Crestani
Assignee: Adriano Crestani
 Fix For: 3.0.4

 Attachments: lucene_2855_adriano_crestani_2011_01_08.patch, 
 lucene_2855_adriano_crestani_2011_01_09.patch


 Today, contrib query parser uses Map<CharSequence,...> in many different 
 places, which may lead to problems, since the CharSequence interface does not 
 enforce the implementation of the hashCode and equals methods. Today it's 
 causing a problem with the QueryTreeBuilder.setBuilder(CharSequence, 
 QueryBuilder) method, which does not work as expected.
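
The pitfall is easy to demonstrate in isolation (self-contained example, not
contrib code):

    import java.util.HashMap;
    import java.util.Map;

    // CharSequence does not mandate equals/hashCode, so a StringBuilder key
    // falls back to identity semantics and can never be found via an equal
    // String -- exactly the failure mode described above.
    public class CharSequenceKeyPitfall {
      public static void main(String[] args) {
        Map<CharSequence, String> map = new HashMap<CharSequence, String>();
        map.put(new StringBuilder("field"), "builder");
        System.out.println(map.get("field")); // null: entry unreachable by value
        map.put("field", "string");
        System.out.println(map.get("field")); // "string": String keys behave
      }
    }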

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-trunk - Build # 1422 - Still Failing

2011-01-10 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1422/

All tests passed

Build Log (for compile errors):
[...truncated 16681 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Solr-3.x - Build # 224 - Failure

2011-01-10 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Solr-3.x/224/

All tests passed

Build Log (for compile errors):
[...truncated 20277 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org