Re: Initial committers list for Incubator Proposal
Awesome work Troy. Looks like we're getting some positive feedback. Thanks for managing this process!

Peter Mateja
peter.mat...@gmail.com

On Thu, Jan 13, 2011 at 10:18 AM, Troy Howard thowar...@gmail.com wrote:

Yes. I sent an announcement to lucene-net-dev and lucene-general yesterday. We are now waiting on the Incubator community/PMC to provide feedback and vote on our proposal. You can track that on the Incubator general mailing list.

Thanks,
Troy

On Thu, Jan 13, 2011 at 4:23 AM, Simone Chiaretta simone.chiare...@gmail.com wrote:

Was wondering how the proposal is going: has it been published or sent to the ASF?

Simone

On Fri, Dec 31, 2010 at 1:01 AM, Troy Howard thowar...@gmail.com wrote:

All,

I'm working on the Incubator Proposal now, and need to establish a list of initial committers. So far, the following people have come forward and offered to be committers (in alphabetical order):

Alex Thompson
Ben Martz
Chris Currens
Heath Aldrich
Michael Herndon
Prescott Nasser
Scott Lombard
Simone Chiaretta
Troy Howard

I would like to place an open request for any interested parties to respond to this message with their request to be a Committer. Whether you are already on that list or would like to be added, please send a message explaining (briefly) why you think you are qualified to be involved in the project and specifically in what ways you hope to be able to contribute.

One thing I would like to point out is that in the Apache world there is a distinction between Committers and Contributors (aka developers). See this link for details: http://incubator.apache.org/guides/participation.html#committer

Please consider whether you wish to be a Committer or a Contributor. Some quick rules of thumb:

Committers:
- Committers must be willing to submit a Contributor License Agreement (CLA). See: http://www.apache.org/licenses/#clas
- Committers must have enough *consistent* free time to fulfill the expectations of the ASF in terms of reporting, process, and documentation, and to remain responsive to the community in terms of communication and listening to, considering, and discussing community opinion. These kinds of tasks can consume a lot of time and are some of the first things people stop doing when they start running out of time.
- A Committer may not even write code, but may simply accept, review and commit code written by others. This is the primary responsibility of a Committer -- to commit code, whether they wrote it themselves or not.
- Committers may have to perform the unpleasant task of rejecting contributions from Contributors and explaining why in a fair and objective manner. This can be frustrating and time consuming. You may need to play the part of a mentor or engage in debates. You may even be proved wrong and have to swallow your pride.
- Committers have direct access to source control and other resources, and so must be personally accountable for the quality of the same, and will need to operate under the process and restrictions the ASF expects.

Contributors:
- Contributors might have a lot of free time this month, but get really busy next month and have no time at all. They can develop code in short bursts and then drop off the face of the planet indefinitely after that.
- Contributors can focus on code only, or work from a task list, without any need to interact with and be accountable to the community (as this is the responsibility of the Committers).
- Contributors can do one-time or infrequently needed tasks like updating the website, documentation, wikis, etc..
- Contributors will need to have anything they create reviewed by a Committer and ultimately included by a Committer. Some people find this frustrating if the Committers are slow to respond or critical of their work.

So in your responses, please be clear about whether you would like to offer your help as a Committer or as a Contributor.

Thanks,
Troy

--
Simone Chiaretta
Microsoft MVP ASP.NET - ASPInsider
Blog: http://codeclimber.net.nz
RSS: http://feeds2.feedburner.com/codeclimber
twitter: @simonech

Any sufficiently advanced technology is indistinguishable from magic
Life is short, play hard
--module option not playing nicely with relative paths
Hi,

Until recently, I wasn't using the --module parameter. But now I do, and the compilation was failing, because I am not building things in the top folder, but from inside build - to avoid clutter. I believe I discovered a bug and I am sending a patch.

Basically, jcc.py is copying modules into the build dir. My project is organized as:

build
java
python
    packageA
    packageB

I build things inside build. If I specify a relative path, --module '../python/packageA', jcc will correctly copy the tree structure, resulting in:

extension
    packageA
    packageB

However, the package names (for distutils setup) will be set to ['extension', 'extension..python.packageA', 'extension..python.packageB'], which ends up in this error:

[exec] running install
[exec] running bdist_egg
[exec] running egg_info
[exec] writing solrpie_java.egg-info/PKG-INFO
[exec] writing top-level names to solrpie_java.egg-info/top_level.txt
[exec] writing dependency_links to solrpie_java.egg-info/dependency_links.txt
[exec] warning: manifest_maker: standard file '__main__.py' not found
[exec] error: package directory 'build/solrpie_java/python/solrpye' does not exist

Cheers,
roman
Re: --module option not playing nicely with relative paths
Hi Roman,

On Jan 13, 2011, at 5:47, Roman Chyla roman.ch...@gmail.com wrote:

Until recently, I wasn't using the --module parameter. But now I do, and the compilation was failing, because I am not building things in the top folder, but from inside build - to avoid clutter. I believe I discovered a bug and I am sending a patch.

I think you forgot to attach the patch ?

Andi..

Basically, jcc.py is copying modules into the build dir. My project is organized as:

build
java
python
    packageA
    packageB

I build things inside build. If I specify a relative path, --module '../python/packageA', jcc will correctly copy the tree structure, resulting in:

extension
    packageA
    packageB

However, the package names (for distutils setup) will be set to ['extension', 'extension..python.packageA', 'extension..python.packageB'], which ends up in this error:

[exec] running install
[exec] running bdist_egg
[exec] running egg_info
[exec] writing solrpie_java.egg-info/PKG-INFO
[exec] writing top-level names to solrpie_java.egg-info/top_level.txt
[exec] writing dependency_links to solrpie_java.egg-info/dependency_links.txt
[exec] warning: manifest_maker: standard file '__main__.py' not found
[exec] error: package directory 'build/solrpie_java/python/solrpye' does not exist

Cheers,
roman
Re: --module option not playing nicely with relative paths
Hi Roman,

On Thu, 13 Jan 2011, Roman Chyla wrote:

By mistake it had a .py suffix, trying now with .patch.

I integrated your patch into rev 1058713 of jcc's trunk with a minor change. (I renamed the _package_track variable)

Thanks !

Andi..

Best,

roman

On Thu, Jan 13, 2011 at 4:54 PM, Andi Vajda va...@apache.org wrote:

Hi Roman,

On Jan 13, 2011, at 5:47, Roman Chyla roman.ch...@gmail.com wrote:

Until recently, I wasn't using the --module parameter. But now I do, and the compilation was failing, because I am not building things in the top folder, but from inside build - to avoid clutter. I believe I discovered a bug and I am sending a patch.

I think you forgot to attach the patch ?

Andi..

Basically, jcc.py is copying modules into the build dir. My project is organized as:

build
java
python
    packageA
    packageB

I build things inside build. If I specify a relative path, --module '../python/packageA', jcc will correctly copy the tree structure, resulting in:

extension
    packageA
    packageB

However, the package names (for distutils setup) will be set to ['extension', 'extension..python.packageA', 'extension..python.packageB'], which ends up in this error:

[exec] running install
[exec] running bdist_egg
[exec] running egg_info
[exec] writing solrpie_java.egg-info/PKG-INFO
[exec] writing top-level names to solrpie_java.egg-info/top_level.txt
[exec] writing dependency_links to solrpie_java.egg-info/dependency_links.txt
[exec] warning: manifest_maker: standard file '__main__.py' not found
[exec] error: package directory 'build/solrpie_java/python/solrpye' does not exist

Cheers,
roman
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981168#action_12981168 ]

Stanislaw Osinski commented on SOLR-2282:
-----------------------------------------

Hi Robert,

What's the configuration (OS / JVM) on which the test is failing for you? I can't get it to fail on my machines (Win 7 64-bit with Sun JVM 1.6.0_20 and Oracle 1.6.0_23, Ubuntu 64-bit with Sun JVM 1.6.0_20). I'm running the test using the command I found in Hudson logs (ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=41204997274180:6405396687385598457 -Dtests.multiplier=3).

S.

Distributed Support for Search Result Clustering
------------------------------------------------

Key: SOLR-2282
URL: https://issues.apache.org/jira/browse/SOLR-2282
Project: Solr
Issue Type: New Feature
Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch

Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2864) add maxtf to fieldinvertstate
add maxtf to fieldinvertstate
-----------------------------

Key: LUCENE-2864
URL: https://issues.apache.org/jira/browse/LUCENE-2864
Project: Lucene - Java
Issue Type: New Feature
Components: Query/Scoring
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2864.patch

The maximum within-document TF is a very useful scoring value; we should expose it so that people can use it in scoring. Consider the following sim:

{code}
@Override
public float idf(int docFreq, int numDocs) {
  return 1.0F; /* not used */
}

@Override
public float computeNorm(String field, FieldInvertState state) {
  return state.getBoost() / (float) Math.sqrt(state.getMaxTF());
}
{code}

which is surprisingly effective, but more interesting for practical reasons.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
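As an aside, here is how the sim above might be wired in end-to-end. This is a minimal sketch, assuming the 3.1-era IndexWriterConfig API; getMaxTF() is only the accessor proposed in this issue (the final method name may differ), and the class names below are illustrative, not part of the patch.

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MaxTfNormExample {
  // Same sim as in the issue description, wrapped in a named class.
  static class MaxTfSimilarity extends DefaultSimilarity {
    @Override
    public float idf(int docFreq, int numDocs) {
      return 1.0F; // disable IDF, as in the snippet above
    }

    @Override
    public float computeNorm(String field, FieldInvertState state) {
      // Norm based on the largest within-document term frequency,
      // not the field length. getMaxTF() is the proposed accessor.
      return state.getBoost() / (float) Math.sqrt(state.getMaxTF());
    }
  }

  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriterConfig cfg = new IndexWriterConfig(
        Version.LUCENE_31, new StandardAnalyzer(Version.LUCENE_31));
    cfg.setSimilarity(new MaxTfSimilarity());
    IndexWriter writer = new IndexWriter(dir, cfg);
    // ... add documents; norms are computed with the sim above ...
    writer.close();
  }
}
{code}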
[jira] Updated: (LUCENE-2864) add maxtf to fieldinvertstate
[ https://issues.apache.org/jira/browse/LUCENE-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2864:
--------------------------------

Attachment: LUCENE-2864.patch

add maxtf to fieldinvertstate
-----------------------------

Key: LUCENE-2864
URL: https://issues.apache.org/jira/browse/LUCENE-2864
Project: Lucene - Java
Issue Type: New Feature
Components: Query/Scoring
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2864.patch

The maximum within-document TF is a very useful scoring value; we should expose it so that people can use it in scoring. Consider the following sim:

{code}
@Override
public float idf(int docFreq, int numDocs) {
  return 1.0F; /* not used */
}

@Override
public float computeNorm(String field, FieldInvertState state) {
  return state.getBoost() / (float) Math.sqrt(state.getMaxTF());
}
{code}

which is surprisingly effective, but more interesting for practical reasons.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981168#action_12981168 ]

Stanislaw Osinski edited comment on SOLR-2282 at 1/13/11 3:19 AM:
------------------------------------------------------------------

Hi Robert,

What's the configuration (OS / JVM) on which the test is failing for you? I can't get it to fail on my machines (Win 7 64-bit with Sun JVM 1.6.0_20 Client VM and Oracle 1.6.0_23 Server VM, Ubuntu 64-bit with Sun JVM 1.6.0_20 Server VM). I'm running the test using the command I found in Hudson logs (ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=41204997274180:6405396687385598457 -Dtests.multiplier=3).

S.

was (Author: stanislaw.osinski):

Hi Robert,

What's the configuration (OS / JVM) on which the test is failing for you? I can't get it to fail on my machines (Win 7 64-bit with Sun JVM 1.6.0_20 and Oracle 1.6.0_23, Ubuntu 64-bit with Sun JVM 1.6.0_20). I'm running the test using the command I found in Hudson logs (ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=41204997274180:6405396687385598457 -Dtests.multiplier=3).

S.

Distributed Support for Search Result Clustering
------------------------------------------------

Key: SOLR-2282
URL: https://issues.apache.org/jira/browse/SOLR-2282
Project: Solr
Issue Type: New Feature
Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch

Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981178#action_12981178 ]

Robert Muir commented on SOLR-2282:
-----------------------------------

Stanislaw: it is true that with that exact random seed, the test passes for me. But if i just run 'ant test', often it fails. Below is the output... i put my OS configuration first here... sorry for the noise.

{noformat}
[junit] NOTE: Windows Vista 6.0 x86/Sun Microsystems Inc. 1.6.0_23 (32-bit)/cpus=4,threads=4,free=5267640,total=16384000

test:
[junit] Testsuite: org.apache.solr.handler.clustering.DistributedClusteringComponentTest
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 13.18 sec
[junit] - Standard Error -
[junit] 2011-1-13 3:35:19 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine cluster
[junit] SEVERE: Carrot2 clustering failed
[junit] java.lang.IndexOutOfBoundsException
[junit] at java.io.StringReader.read(StringReader.java:76)
[junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.zzRefill(ExtendedWhitespaceTokenizerImpl.java:557)
[junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.getNextToken(ExtendedWhitespaceTokenizerImpl.java:754)
[junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizer.nextToken(ExtendedWhitespaceTokenizer.java:46)
[junit] at org.carrot2.text.preprocessing.Tokenizer.tokenize(Tokenizer.java:147)
[junit] at org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline.preprocess(CompletePreprocessingPipeline.java:54)
[junit] at org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline.preprocess(BasicPreprocessingPipeline.java:92)
[junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.cluster(LingoClusteringAlgorithm.java:198)
[junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.access$000(LingoClusteringAlgorithm.java:43)
[junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm$1.process(LingoClusteringAlgorithm.java:177)
[junit] at org.carrot2.text.clustering.MultilingualClustering.clusterByLanguage(MultilingualClustering.java:223)
[junit] at org.carrot2.text.clustering.MultilingualClustering.process(MultilingualClustering.java:111)
[junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.process(LingoClusteringAlgorithm.java:170)
[junit] at org.carrot2.core.ControllerUtils.performProcessing(ControllerUtils.java:102)
[junit] at org.carrot2.core.Controller.process(Controller.java:347)
[junit] at org.carrot2.core.Controller.process(Controller.java:239)
[junit] at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:106)
[junit] at org.apache.solr.handler.clustering.ClusteringComponent.finishStage(ClusteringComponent.java:167)
[junit] at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:336)
[junit] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
[junit] at org.apache.solr.core.SolrCore.execute(SolrCore.java:1296)
[junit] at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
[junit] at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
[junit] at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
[junit] at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
[junit] at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
[junit] at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
[junit] at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
[junit] at org.mortbay.jetty.Server.handle(Server.java:326)
[junit] at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
[junit] at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
[junit] at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
[junit] at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
[junit] at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
[junit] at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
[junit] at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
[junit] 2011-1-13 3:35:19 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine cluster
[junit] SEVERE: Carrot2 clustering failed
[junit] java.lang.IndexOutOfBoundsException
[junit] at java.io.StringReader.read(StringReader.java:76)
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981179#action_12981179 ]

JohnWu commented on SOLR-1395:
------------------------------

TomLiu:

In Katta's lib there are many jars, but some of them must be there - you know, Solr must include Lucene's jar. I added some libs to Katta. Do you mean the Solr embedded in Katta?

Now the request can go from the master to the slave, but how does the subproxy send the query to the query core?

I configured the subproxy (Katta with the SOLR-1395 patch) with:

node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

and the Solr home is set in the Katta shell script; in the Solr home, the solr config uses solr.SearchHandler. But I do not see how Katta can dispatch the query to the query core; the solr,jar of Katta will search the query in its data directory. How should the shard be configured in the subproxy?

Can you give me a detailed reply? Thanks a lot!

Integrate Katta
---------------

Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981179#action_12981179 ]

JohnWu edited comment on SOLR-1395 at 1/13/11 3:44 AM:
-------------------------------------------------------

TomLiu:

In Katta's lib there are many jars, but some of them must be there - you know, Solr must include Lucene's jar. I added some libs to Katta. Do you mean the Solr embedded in Katta?

Now the request can go from the master to the slave, but how does the subproxy send the query to the query core?

I configured the subproxy (Katta with the SOLR-1395 patch) with:

node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

and the Solr home is set in the Katta shell script; in the Solr home, the solr config uses solr.SearchHandler. But I do not see how Katta can dispatch the query to the query core; the solr.jar of Katta will search the query in its data directory. How should the shard be configured in the subproxy?

Can you give me a detailed reply? Thanks a lot!

was (Author: johnwu):

TomLiu:

In Katta's lib there are many jars, but some of them must be there - you know, Solr must include Lucene's jar. I added some libs to Katta. Do you mean the Solr embedded in Katta?

Now the request can go from the master to the slave, but how does the subproxy send the query to the query core?

I configured the subproxy (Katta with the SOLR-1395 patch) with:

node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

and the Solr home is set in the Katta shell script; in the Solr home, the solr config uses solr.SearchHandler. But I do not see how Katta can dispatch the query to the query core; the solr,jar of Katta will search the query in its data directory. How should the shard be configured in the subproxy?

Can you give me a detailed reply? Thanks a lot!

Integrate Katta
---------------

Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stanislaw Osinski updated SOLR-2282:
------------------------------------

Attachment: SOLR-2282-diagnostics.patch

Robert: I was using the random seed from the build result in the hope that it would fail the test for me. I'm still unable to get the exception though, with or without the seed. I suppose it shouldn't matter whether I run the complete test suite or just this one test method? (I was doing the latter to save time)

If you have a spare moment, would you be able to check the following two things on your machine:

1. Apply the attached diagnostics patch and run the tests. If the test doesn't fail after the change, this means there's some concurrency issue in Carrot2's internal resource pooling mechanisms that we'll need to find. This patch is not a solution to the problem though, just a diagnostic measure.

2. It's paranoid, but can you run the test with the {{-Dargs=-XX:+TraceClassLoading}} option and check that there's no old (v3.4.0) Carrot2 JAR hiding in the bushes? Version 3.4.0 had a subtle bug that could be causing the exception. If there are no traces of the Carrot2 3.4.0 JAR in the classpath, we'll need to do further inspection of our code.

Distributed Support for Search Result Clustering
------------------------------------------------

Key: SOLR-2282
URL: https://issues.apache.org/jira/browse/SOLR-2282
Project: Solr
Issue Type: New Feature
Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2282-diagnostics.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch

Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981192#action_12981192 ]

Michael Busch commented on LUCENE-2324:
---------------------------------------

I made some progress with the concurrency model, especially removing the need for various locks to make everything easier.

- DocumentsWriterPerThreadPool.ThreadState now extends ReentrantLock, which means that standard methods like lock() and unlock() can be used to reserve a DWPT for a task (see the sketch after this message).
- The max. number of DWPTs allowed (config.maxThreadStates) is instantiated up-front. Creating a DWPT is cheap, so this is not a performance concern; this makes it easier to push config changes to the DWPTs without synchronizing on the pool and without having to worry about newly created DWPTs getting the same config settings.
- DocumentsWriterPerThreadPool.getActivePerThreadsIterator() gives the caller a static snapshot of the active DWPTs at the time the iterator was acquired, e.g. for flushAllThreads() or DW.abort(). Here synchronizing on the pool isn't necessary either.
- Deletes are now pushed to DW.pendingDeletes() if no active DWPTs are present.

TODOs:
- fix remaining testcases that still fail
- fix RAM tracking and flush-by-RAM
- write new testcases to test thread pool, thread assignment, etc
- review if all cases that were discussed in the recent comments here work as expected (likely not :) )
- performance testing and code cleanup

Per thread DocumentsWriters that write their own private segments
------------------------------------------------------------------

Key: LUCENE-2324
URL: https://issues.apache.org/jira/browse/LUCENE-2324
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out

See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293:

Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
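The ThreadState-extends-ReentrantLock idea from the first bullet above can be sketched as follows. This is illustrative only: the class bodies, pool policy, and acquire() fallback are guesses, not the actual patch.

{code}
import java.util.concurrent.locks.ReentrantLock;

// Placeholder for the real per-thread indexing state.
final class DocumentsWriterPerThread { }

// A ThreadState IS a lock: holding it reserves its DWPT for one task.
final class ThreadState extends ReentrantLock {
  final DocumentsWriterPerThread dwpt = new DocumentsWriterPerThread();
}

final class ThreadStatePool {
  private final ThreadState[] states;

  ThreadStatePool(int maxThreadStates) {
    // All DWPTs are instantiated up-front; creating one is cheap.
    states = new ThreadState[maxThreadStates];
    for (int i = 0; i < states.length; i++) {
      states[i] = new ThreadState();
    }
  }

  // Reserve a free DWPT with plain tryLock(); no pool-wide
  // synchronized block is needed.
  ThreadState acquire() {
    for (ThreadState s : states) {
      if (s.tryLock()) {
        return s; // caller must call s.unlock() when the task is done
      }
    }
    ThreadState s = states[0]; // all busy: block on an arbitrary one
    s.lock();
    return s;
  }
}
{code}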
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981204#action_12981204 ]

tom liu commented on SOLR-1395:
-------------------------------

In a Katta-integrated environment, Solr is embedded. Katta acts as the distributed compute manager, which manages:
# node startup/shutdown
# shard deploy/undeploy
# RPC invocation of the application (Solr)
and Solr acts as the application on the distributed compute environment.

In the master box, the QueryHandler must be solr.KattaSearchHandler in solrconfig.xml, so that the KattaClient is invoked by the Solr app and then makes the RPC call to the slave.

In the slave box, Katta starts up the embedded Solr, which is the subproxy. The shard, that is the query SolrCore, is deployed by Katta's script:

bin/katta addIndex indexName indexPath

Integrate Katta
---------------

Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2864) add maxtf to fieldinvertstate
[ https://issues.apache.org/jira/browse/LUCENE-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981219#action_12981219 ]

Michael McCandless commented on LUCENE-2864:
--------------------------------------------

+1

add maxtf to fieldinvertstate
-----------------------------

Key: LUCENE-2864
URL: https://issues.apache.org/jira/browse/LUCENE-2864
Project: Lucene - Java
Issue Type: New Feature
Components: Query/Scoring
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2864.patch

The maximum within-document TF is a very useful scoring value; we should expose it so that people can use it in scoring. Consider the following sim:

{code}
@Override
public float idf(int docFreq, int numDocs) {
  return 1.0F; /* not used */
}

@Override
public float computeNorm(String field, FieldInvertState state) {
  return state.getBoost() / (float) Math.sqrt(state.getMaxTF());
}
{code}

which is surprisingly effective, but more interesting for practical reasons.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2751) add LuceneTestCase.newSearcher()
[ https://issues.apache.org/jira/browse/LUCENE-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981220#action_12981220 ]

Michael McCandless commented on LUCENE-2751:
--------------------------------------------

bq. There is a downside to this whole issue of course... i think its going to be harder to reproduce test fails since we will be using more multithreading.

Right. But I think this (losing reproducibility sometimes) is the lesser evil? Ie, making sure we tease out thread safety bugs trumps reproducibility...

add LuceneTestCase.newSearcher()
--------------------------------

Key: LUCENE-2751
URL: https://issues.apache.org/jira/browse/LUCENE-2751
Project: Lucene - Java
Issue Type: Test
Components: Build
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2751.patch, LUCENE-2751.patch

Most tests in the search package don't care about what kind of searcher they use. we should randomly use MultiSearcher or ParallelMultiSearcher sometimes in tests.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API
[ https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981222#action_12981222 ]

Michael McCandless commented on LUCENE-2723:
--------------------------------------------

bq. I merged us up to yesterday (1052991:1057836),

Awesome, thanks!!

bq. Mike can you assist in merging r1057897?

Will do.

Speed up Lucene's low level bulk postings read API
--------------------------------------------------

Key: LUCENE-2723
URL: https://issues.apache.org/jira/browse/LUCENE-2723
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0
Attachments: LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723_bulkvint.patch, LUCENE-2723_facetPerSeg.patch, LUCENE-2723_facetPerSeg.patch, LUCENE-2723_openEnum.patch, LUCENE-2723_termscorer.patch, LUCENE-2723_wastedint.patch

Spinoff from LUCENE-1410. The flex DocsEnum has a simple bulk-read API that reads the next chunk of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR (from LUCENE-1410). This is not unlike sucking coffee through those tiny plastic coffee stirrers they hand out on airplanes that, surprisingly, also happen to function as a straw. As a result we see no perf gain from using FOR/PFOR.

I had hacked up a fix for this, described in my blog post at http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html

I'm opening this issue to get that work to a committable point.

So... I've worked out a new bulk-read API to address the performance bottleneck. It has some big changes over the current bulk-read API:

* You can now also bulk-read positions (but not payloads), but, I have yet to cutover positional queries.
* The buffer contains doc deltas, not absolute values, for docIDs and positions (freqs are absolute).
* Deleted docs are not filtered out.
* The doc freq buffers need not be aligned. For fixed intblock codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16, Group varint, etc.) they won't be.

It's still a work in progress...

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
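To make the buffer semantics in the issue description concrete: under the proposed API, a consumer has to turn doc deltas back into absolute docIDs and skip deleted docs itself. A minimal sketch follows; every name except org.apache.lucene.util.Bits is illustrative, not the real API.

{code}
import org.apache.lucene.util.Bits;

final class BulkPostingsConsumer {
  // docDeltas: per-doc gaps as delivered by the bulk API (not absolute).
  // freqs: absolute frequencies, aligned with docDeltas in this sketch
  // (the proposal notes the real buffers need not be aligned).
  // liveDocs: null means no deletions; the API does NOT pre-filter them.
  static void consume(int[] docDeltas, int[] freqs, int len, Bits liveDocs) {
    int doc = 0;
    for (int i = 0; i < len; i++) {
      doc += docDeltas[i];                        // delta -> absolute docID
      if (liveDocs != null && !liveDocs.get(doc)) {
        continue;                                 // caller skips deleted docs
      }
      collect(doc, freqs[i]);
    }
  }

  static void collect(int doc, int freq) {
    System.out.println("doc=" + doc + " freq=" + freq);
  }
}
{code}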
[jira] Resolved: (LUCENE-1260) Norm codec strategy in Similarity
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-1260.
---------------------------------

Resolution: Fixed
Fix Version/s: 3.1

(Updating fix-version correctly, also.) I think it's safe to mark this resolved... the issues are totally cleared up in 4.0, and only some (documented) corner cases remain in 3.x where we still use the default sim.

Norm codec strategy in Similarity
---------------------------------

Key: LUCENE-1260
URL: https://issues.apache.org/jira/browse/LUCENE-1260
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.3.1
Reporter: Karl Wettin
Assignee: Michael McCandless
Fix For: 3.1, 4.0
Attachments: Lucene-1260-1.patch, Lucene-1260-2.patch, Lucene-1260.patch, LUCENE-1260.txt, LUCENE-1260.txt, LUCENE-1260.txt, LUCENE-1260_defaultsim.patch

The static span and resolution of the 8 bit norms codec might not fit all applications. My use case requires that 100f-250f is discretized into 60 bags instead of the default.. 10?

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981237#action_12981237 ]

Robert Muir commented on SOLR-2282:
-----------------------------------

bq. Robert: I was using the random seed from the build result in the hope that it would fail the test for me. I'm still unable to get the exception though, with or without the seed. I suppose it shouldn't matter whether I run the complete test suite or just this one test method? (I was doing the latter to save time)

Well, it's not completely consistent even with the seed for me (smells like a concurrency issue).

Silly question, but did you remove the @Ignore on DistributedClusteringComponentTest? Otherwise, the reproducibility problem could be that it doesn't consistently fail every time, even with the same seed. I ran my previous fail three times, with the patch:

{noformat}
ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=8909233178291932652:-4859244606911873252
{noformat}

This failed two out of three times. I also then ran it with traceclassloading, logging to a file:

{noformat}
ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=8909233178291932652:-4859244606911873252 -Dargs=-XX:+TraceClassLoading > test.out
{noformat}

All the carrot classes are being loaded from solr/contrib/clustering/lib/carrot2-core-3.4.2.jar

Distributed Support for Search Result Clustering
------------------------------------------------

Key: SOLR-2282
URL: https://issues.apache.org/jira/browse/SOLR-2282
Project: Solr
Issue Type: New Feature
Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2282-diagnostics.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch

Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981241#action_12981241 ]

Stanislaw Osinski commented on SOLR-2282:
-----------------------------------------

{quote}
Well, it's not completely consistent even with the seed for me (smells like a concurrency issue).
{quote}

This is what I've been suspecting from the beginning; I hope Dawid has better luck reproducing the problem on his 4-core HT machine.

{quote}
Silly question, but did you remove the @Ignore on DistributedClusteringComponentTest? Otherwise, the reproducibility problem could be that it doesn't consistently fail every time, even with the same seed.
{quote}

Yeah, I did remove the @Ignore; I'm getting "Testsuite: org.apache.solr.handler.clustering.DistributedClusteringComponentTest, Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 59,658 sec" in the test results dir. When it comes to reproducibility, I wasn't able to reproduce some other concurrency issue on my 2-core machine, while on Dawid's 4-core hardware the tests would fail sometimes, so I hope we can eventually get the exception locally.

{quote}
I ran my previous fail three times, with the patch. This failed two out of three times.
{quote}

Thanks for verifying this! It looks like the bug may be somewhere in the C2 code other than where I initially thought. We'll review the code once again; as soon as we come up with the fix, I'll attach a patch.

Distributed Support for Search Result Clustering
------------------------------------------------

Key: SOLR-2282
URL: https://issues.apache.org/jira/browse/SOLR-2282
Project: Solr
Issue Type: New Feature
Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2282-diagnostics.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch

Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
[ https://issues.apache.org/jira/browse/LUCENE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2865:
------------------------------------

Attachment: LUCENE-2865.patch

Here is a patch that adds a ScorerContext to replace those two booleans. ScorerContext follows a copy-on-write pattern, similar to a builder pattern, that only modifies the context if the values actually change. Seems pretty straightforward so far.

Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2865.patch

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2866) Unexpected search results
Unexpected search results
-------------------------

Key: LUCENE-2866
URL: https://issues.apache.org/jira/browse/LUCENE-2866
Project: Lucene - Java
Issue Type: Bug
Components: Search
Environment:
*Operating System:* Windows Server 2003 and Windows Server 2008 R2
*System type:* 32 bits (Win Server 2003) and 64 bits (Win Server 2008)
*Platform:* Alfresco Community 3.3.g
*Processor:* Intel Celeron 1.80GHz
*RAM Memory:* 2GB
Reporter: Alejandro

Hello... I'm using Lucene search with Alfresco 3.3.g (I'm not sure which version of Lucene is used), and I'm having problems with the results the search gives me... sometimes a search brings me just 1 result, but when I immediately do the same search a second time it can bring a lot of results... Sometimes the search takes too much time to bring results... and sometimes the search stops at 1000 results. I'm using simple and boolean searches, and both types show the same mistakes.

Thanks for reading and for your support.

Alejandro Villa Betancur

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
[ https://issues.apache.org/jira/browse/LUCENE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981259#action_12981259 ]

Uwe Schindler commented on LUCENE-2865:
---------------------------------------

Looks good! I would make the ctor private and then use ScorerContext.default().x().y() as the pattern (default returns the template). I like this design more :-)

Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2865.patch

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981260#action_12981260 ]

JohnWu commented on SOLR-1395:
------------------------------

Tomliu:

Maybe this is the last step for me, but it's taking so long! Katta uses Lucene 3.0, but SOLR-1395 uses a Lucene 4.0 snapshot. I packaged SOLR-1395 into a jar and put it on the Katta classpath, but the Lucene versions are different, so if I use katta search SPIndex02 content:lovealice 1, the slave returns a Lucene exception.

How did you make the Lucene versions the same? By adding the keywordAnalyzer.class to the lucene-4.0-snapshot.jar?

Thanks!

JohnWu

Integrate Katta
---------------

Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
[ https://issues.apache.org/jira/browse/LUCENE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2865:
------------------------------------

Attachment: LUCENE-2865.patch

bq. I would make the ctor private and then use ScorerContext.default().x().y() as the pattern (default returns the template). I like this design more

Jawohl! :) - Since default is a keyword in Java, I used ScorerContext#def() instead. I fixed some JDoc issues, made all ScorerContext ctors private, and added a changes.txt entry. Seems like we are good to go.

Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2865.patch, LUCENE-2865.patch

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
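The copy-on-write pattern described above can be sketched as follows. This is a simplified illustration assembled from the discussion (the two existing booleans, private ctors, a shared def() template), not the exact committed code.

{code}
// Immutable context struct replacing the two naked booleans of
// Weight#scorer(AtomicReaderContext, boolean, boolean).
final class ScorerContext {
  private static final ScorerContext DEFAULT = new ScorerContext(false, false);

  final boolean scoreDocsInOrder;
  final boolean topScorer;

  private ScorerContext(boolean scoreDocsInOrder, boolean topScorer) {
    this.scoreDocsInOrder = scoreDocsInOrder;
    this.topScorer = topScorer;
  }

  // "default" is a Java keyword, hence def(); returns the shared template.
  static ScorerContext def() {
    return DEFAULT;
  }

  // Copy-on-write setters: return this when the value is unchanged,
  // otherwise a new immutable copy.
  ScorerContext scoreDocsInOrder(boolean value) {
    return value == scoreDocsInOrder ? this : new ScorerContext(value, topScorer);
  }

  ScorerContext topScorer(boolean value) {
    return value == topScorer ? this : new ScorerContext(scoreDocsInOrder, value);
  }
}
{code}

A call site would then read ScorerContext.def().scoreDocsInOrder(true).topScorer(true); asking for the default flags allocates nothing, because unchanged values hand back the shared instance, and adding a future flag like needsScoring only means one more field and setter rather than another naked boolean in every Weight#scorer signature.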
[jira] Commented: (SOLR-2311) FileListEntityProcessor Fields Stored in SolrDocument do not Match Documentation
[ https://issues.apache.org/jira/browse/SOLR-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981271#action_12981271 ]

Koji Sekiguchi commented on SOLR-2311:
--------------------------------------

Thank you for reporting this, Matt! For back-compat reasons, your patch:

{code}
@@ -254,7 +254,7 @@
     if (newerThan != null && lastModified.before(newerThan))
       return;
     details.put(DIR, dir.getAbsolutePath());
-    details.put(FILE, name);
+    details.put(FILE_NAME, name);
     details.put(ABSOLUTE_FILE, aFile.getAbsolutePath());
     details.put(SIZE, sz);
     details.put(LAST_MODIFIED, lastModified);
{code}

should be:

{code}
@@ -254,7 +254,7 @@
     if (newerThan != null && lastModified.before(newerThan))
       return;
     details.put(DIR, dir.getAbsolutePath());
     details.put(FILE, name);
+    details.put(FILE_NAME, name);
     details.put(ABSOLUTE_FILE, aFile.getAbsolutePath());
     details.put(SIZE, sz);
     details.put(LAST_MODIFIED, lastModified);
{code}

But IMO updating documentation is enough in this case.

FileListEntityProcessor Fields Stored in SolrDocument do not Match Documentation
---------------------------------------------------------------------------------

Key: SOLR-2311
URL: https://issues.apache.org/jira/browse/SOLR-2311
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 1.4.1
Environment: Java 1.6
Reporter: Matt Parker
Priority: Minor
Attachments: SOLR-2311.patch

The implicit fields generated by the FileListEntityProcessor do not match the documentation, which are listed in the following excerpt:

{quote}
The implicit fields generated by the FileListEntityProcessor are fileAbsolutePath, fileSize, fileLastModified, fileName and these are available for use within the entity X as shown above.
{quote}

The fileName field is not populated. The file's name is stored in the implicit field named file. The hashmap that holds the metadata (FileListEntityProcessor.java at line 255) stores the following using the associated constants:

{quote}
details.put(DIR, dir.getAbsolutePath());
details.put(FILE, name);
details.put(ABSOLUTE_FILE, aFile.getAbsolutePath());
details.put(SIZE, sz);
details.put(LAST_MODIFIED, lastModified);
{quote}

where DIR = fileDir, FILE = file, ABSOLUTE_FILE = fileAbsolutePath, SIZE = fileSize, and LAST_MODIFIED = fileLastModified.

Either the documentation must be updated, or the constant storing the return value must be updated.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
[ https://issues.apache.org/jira/browse/LUCENE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981273#action_12981273 ]

Uwe Schindler commented on LUCENE-2865:
---------------------------------------

+1 to commit, looks good. For later, we should fix BooleanQuery.explain() to use the default context, too. topScorer=true is wrong for explain (but has no effect here).

Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2865.patch, LUCENE-2865.patch

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
[ https://issues.apache.org/jira/browse/LUCENE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-2865.
-------------------------------------

Resolution: Fixed
Lucene Fields: [New, Patch Available] (was: [New])

Committed revision 1058592. thanks uwe for the review

Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2865.patch, LUCENE-2865.patch

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981305#action_12981305 ]

Jason Rutherglen commented on LUCENE-2324:
------------------------------------------

{quote}DocumentsWriterPerThreadPool.ThreadState now extends ReentrantLock, which means that standard methods like lock() and unlock() can be used to reserve a DWPT for a task.{quote}

Really? That makes synchronized seem simpler?

bq. the max. number of DWPTs allowed (config.maxThreadStates) is instantiated up-front.

What about the memory used, eg, the non-use of byte[] recycling? I guess it'll be cleared on flush.

bq. fix RAM tracking and flush-by-RAM

I created a BytesUsed object that cascades changes to parent BytesUsed objects; this allows each individual SD, DWPT, DW, etc to keep track of its own bytes used, while also propagating the changes to the higher level objects, eg, SD -> DWPT, DWPT -> DW.

Per thread DocumentsWriters that write their own private segments
------------------------------------------------------------------

Key: LUCENE-2324
URL: https://issues.apache.org/jira/browse/LUCENE-2324
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out

See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293:

Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
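The cascading counter Jason describes might look roughly like this; the real BytesUsed on the branch may differ, and this sketch only shows the propagation scheme (child counters forward every delta to their parent, so SD totals roll up into DWPT and DW totals).

{code}
import java.util.concurrent.atomic.AtomicLong;

// Hierarchical byte counter: each level tracks its own total while
// forwarding deltas up the chain, e.g. SD -> DWPT -> DW.
final class BytesUsed {
  private final BytesUsed parent;          // null at the root (DW) level
  private final AtomicLong bytes = new AtomicLong();

  BytesUsed(BytesUsed parent) {
    this.parent = parent;
  }

  void addBytes(long delta) {
    bytes.addAndGet(delta);
    if (parent != null) {
      parent.addBytes(delta);              // cascade to the higher level
    }
  }

  long get() {
    return bytes.get();
  }
}
{code}

Usage would look like: BytesUsed dw = new BytesUsed(null); BytesUsed dwpt = new BytesUsed(dw); dwpt.addBytes(1024); after which both dwpt.get() and dw.get() report 1024, which is exactly what flush-by-RAM needs at each level.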
Re: Initial committers list for Incubator Proposal
- Contributors will need to have anything they create reviewed by a Committer and ultimately included by a Committer. Some people find this frustrating if the Committers are slow to respond or critical of their work.
So in your responses, please be clear about whether you would like to offer your help as a Committer or as a Contributor. Thanks, Troy -- Simone Chiaretta Microsoft MVP ASP.NET - ASPInsider Blog: http://codeclimber.net.nz RSS: http://feeds2.feedburner.com/codeclimber twitter: @simonech Any sufficiently advanced technology is indistinguishable from magic Life is short, play hard
[jira] Created: (LUCENE-2867) Change contrib QP API that uses CharSequence as string identifier
Change contrib QP API that uses CharSequence as string identifier - Key: LUCENE-2867 URL: https://issues.apache.org/jira/browse/LUCENE-2867 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 3.0.3 Reporter: Adriano Crestani Priority: Minor Fix For: 3.0.4 There are some API methods on contrib queryparser that expect CharSequence as an identifier. This is wrong, since it may lead to incorrect or misleading behavior, as shown on LUCENE-2855. To avoid this problem, these APIs will be changed to enforce the use of String instead of CharSequence in version 4. This patch already deprecates the old API methods and adds new substitute methods that use only String. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
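To see why CharSequence identifiers are risky, consider this hypothetical lookup (not code from the patch): equals()/hashCode() are not defined consistently across CharSequence implementations, so a StringBuilder key never matches a String key in a HashMap.
{code}
import java.util.HashMap;
import java.util.Map;

public class CharSequenceKeyDemo {
  public static void main(String[] args) {
    Map<CharSequence, String> attrs = new HashMap<>();
    attrs.put("boost", "2.0");                    // String key
    CharSequence sameChars = new StringBuilder("boost");
    System.out.println(attrs.get(sameChars));     // prints null!
    System.out.println(attrs.get("boost"));       // prints 2.0
  }
}
{code}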
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981322#action_12981322 ] Ahmet Arslan commented on SOLR-1604: Use the most recent file, the one that is not grayed out; each file also has a date attached. It works for "(a b) c"~10, which is equivalent to "a c"~10 OR "b c"~10. SurroundQueryParser does not use an Analyzer; it is recommended to lean heavily on the wildcard operator instead, e.g. instead of searching for "foo bar" you search for "foo* bar*". But if you are using StandardAnalyzer, which does not do stemming, I think you can use Surround. You can pre-lowercase your queries, etc. You can even pre-analyze your queries, since your analyzer does not inject new tokens. But your queries must be well formed; there is no default operator in this. I think it is better to discuss these things on the solr/lucene user mailing list. Wildcards, ORs etc inside Phrase Queries Key: SOLR-1604 URL: https://issues.apache.org/jira/browse/SOLR-1604 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Ahmet Arslan Priority: Minor Fix For: Next Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, SOLR-1604.patch Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
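A hedged usage sketch for the underlying ComplexPhraseQueryParser (LUCENE-1486) that this plugin wraps; the package path and constructor follow later Lucene releases and may differ in the code this issue targets:
{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.Query;

public class ComplexPhraseDemo {
  public static void main(String[] args) throws Exception {
    ComplexPhraseQueryParser parser =
        new ComplexPhraseQueryParser("body", new StandardAnalyzer());
    // Wildcards and ORs inside a phrase, with slop:
    Query q = parser.parse("\"(a b) c\"~10");
    System.out.println(q);
  }
}
{code}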
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981380#action_12981380 ] Michael Busch commented on LUCENE-2324: --- bq. Really? That makes synchronized seem simpler? Well, look at ThreadAffinityDocumentsWriterThreadPool. There I'm able to use things like tryLock() and getQueueLength(). Also, DocumentsWriterPerThreadPool has a getAndLock() method that can be used by DW for addDocument(), whereas DW.flush(), which needs to iterate the DWPTs, can lock the individual DWPTs directly. I think it's simpler, but I'm open to other suggestions of course :) bq. What about the memory used, e.g., the non-use of byte[] recycling? I guess it'll be cleared on flush. Yeah, sure. That is independent of whether they're all created upfront or not. But yeah, after flush or abort we need to clear the DWPT's state to make sure they're not consuming unused RAM (as you described in your earlier comment). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
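A hedged sketch of the locking scheme described in these comments; the names follow the discussion, but the real branch code differs in detail:
{code}
import java.util.concurrent.locks.ReentrantLock;

final class DWPT { /* stands in for DocumentsWriterPerThread */ }

// ThreadState IS-A lock here, so callers get tryLock()/getQueueLength()
// for free when picking a DWPT.
final class ThreadState extends ReentrantLock {
  final DWPT dwpt;
  ThreadState(DWPT dwpt) { this.dwpt = dwpt; }
}

final class Pool {
  private final ThreadState[] states;

  Pool(int n) {
    states = new ThreadState[n];
    for (int i = 0; i < n; i++) states[i] = new ThreadState(new DWPT());
  }

  // Used by addDocument(): grab any free DWPT, or block on the first one
  // if all are busy. flush() can instead lock() each ThreadState directly
  // while it iterates the DWPTs.
  ThreadState getAndLock() {
    for (ThreadState s : states) {
      if (s.tryLock()) return s;
    }
    states[0].lock();
    return states[0];
  }
}
{code}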
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981388#action_12981388 ] Earwin Burrfoot commented on LUCENE-2324: - Maan, this comment list is infinite. How do I currently get the ..er.. current version? Latest branch + latest Jason's patch? Regardless of everything else, I'd ask you not to extend random things :) at least if you can't say is-a about them. DocumentsWriterPerThreadPool.ThreadState IS A ReentrantLock? No. So you're better off encapsulating it rather than extending. Same can be applied to SegmentInfos that extends Vector :/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
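For contrast, a minimal sketch of the HAS-A alternative Earwin is arguing for (illustrative only):
{code}
// Composition instead of inheritance: ThreadState holds a lock rather
// than being one, exposing only the operations callers actually need.
import java.util.concurrent.locks.ReentrantLock;

final class ThreadStateEncapsulated {
  private final ReentrantLock lock = new ReentrantLock();

  boolean tryLock()    { return lock.tryLock(); }
  void unlock()        { lock.unlock(); }
  int getQueueLength() { return lock.getQueueLength(); }
  // Nothing else from ReentrantLock's API leaks into callers.
}
{code}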
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981390#action_12981390 ] Michael Busch commented on LUCENE-2324: --- bq. How do I currently get the ..er.. current version? Just do 'svn up' on the RT branch. bq. Regardless of everything else, I'd ask you not to extend random things This was a conscious decision, not random. Extending ReentrantLock is not an uncommon pattern, e.g. ConcurrentHashMap.Segment does exactly that. ThreadState basically is nothing but a lock that has a reference to the corresponding DWPT it protects. I encourage you to look at the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981453#action_12981453 ] Dawid Weiss commented on SOLR-2282: --- I confirm this must be something related to concurrency, although from whitebox code review I have no clue how this can happen. Seems like a long, fascinating weekend is waiting for me (I am busy tomorrow and won't be able to look into it). What is weird is that we're running this code on our demo server, we do have parallel stress tests and still this happens only here. Life. {noformat} test: [junit] Testsuite: org.apache.solr.handler.clustering.DistributedClusteringComponentTest [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 12.311 sec [junit] - Standard Error - [junit] 2011-1-13 20:05:39 org.apache.solr.common.SolrException log [junit] SEVERE: java.lang.Error: Error: could not match input [junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.zzScanError(ExtendedWhitespaceTokenizerImpl.java:687) [junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.getNextToken(ExtendedWhitespaceTokenizerImpl.java:836) [junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizer.nextToken(ExtendedWhitespaceTokenizer.java:46) [junit] at org.carrot2.text.preprocessing.Tokenizer.tokenize(Tokenizer.java:147) [junit] at org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline.preprocess(CompletePreprocessingPipeline.java:54) [junit] at org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline.preprocess(BasicPreprocessingPipeline.java:92) [junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.cluster(LingoClusteringAlgorithm.java:198) [junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.access$000(LingoClusteringAlgorithm.java:43) [junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm$1.process(LingoClusteringAlgorithm.java:177) [junit] at org.carrot2.text.clustering.MultilingualClustering.clusterByLanguage(MultilingualClustering.java:223) [junit] at org.carrot2.text.clustering.MultilingualClustering.process(MultilingualClustering.java:111) [junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.process(LingoClusteringAlgorithm.java:170) [junit] at org.carrot2.core.ControllerUtils.performProcessing(ControllerUtils.java:102) [junit] at org.carrot2.core.Controller.process(Controller.java:347) [junit] at org.carrot2.core.Controller.process(Controller.java:239) [junit] at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:106) [junit] at org.apache.solr.handler.clustering.ClusteringComponent.finishStage(ClusteringComponent.java:167) [junit] at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:336) [junit] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) [junit] at org.apache.solr.core.SolrCore.execute(SolrCore.java:1296) [junit] at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) [junit] at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) [junit] at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) [junit] at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) [junit] at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) [junit] at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) [junit] at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) [junit] at org.mortbay.jetty.Server.handle(Server.java:326) [junit] at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) [junit] at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) [junit] at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) [junit] at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) [junit] at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) [junit] at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) [junit] at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) [junit] [junit] NOTE: reproduce with: ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=8909233178291932652:-4859244606911873252 [junit] The following exceptions were thrown by threads: [junit] *** Thread: Thread-28 *** [junit] junit.framework.AssertionFailedError:
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981501#action_12981501 ] Robert Muir commented on SOLR-2282: --- Guys, thanks for the debugging help already. Just as a side note: for these tricky non-reproducible ones, sometimes it's helpful to use something like -Dtests.iter=10; it's just a convenient way to run the test method multiple times. Distributed Support for Search Result Clustering Key: SOLR-2282 URL: https://issues.apache.org/jira/browse/SOLR-2282 Project: Solr Issue Type: New Feature Components: contrib - Clustering Affects Versions: 1.4, 1.4.1 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2282-diagnostics.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
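As a concrete example of that tip, a run that repeats the failing method with the seed from the log above; the flag spellings follow these reports and should be treated as assumptions for your checkout:
{code}
ant test -Dtestcase=DistributedClusteringComponentTest \
    -Dtestmethod=testDistribSearch \
    -Dtests.iter=10 \
    -Dtests.seed=8909233178291932652:-4859244606911873252
{code}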
Lucene-Solr-tests-only-trunk - Build # 3732 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3732/ 4 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock Error Message: null Stack Trace: junit.framework.AssertionFailedError: at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059) at org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:2135) REGRESSION: org.apache.lucene.index.TestIndexWriter.testTermUTF16SortOrder Error Message: this writer hit an OutOfMemoryError; cannot commit Stack Trace: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2334) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2416) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2398) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2382) at org.apache.lucene.index.RandomIndexWriter.commit(RandomIndexWriter.java:114) at org.apache.lucene.index.TestIndexWriter.testTermUTF16SortOrder(TestIndexWriter.java:2416) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059) REGRESSION: org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting Error Message: GC overhead limit exceeded Stack Trace: java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:33) at org.apache.lucene.index.TermVectorsTermsWriterPerField$TermVectorsPostingsArray.<init>(TermVectorsTermsWriterPerField.java:274) at org.apache.lucene.index.TermVectorsTermsWriterPerField$TermVectorsPostingsArray.newInstance(TermVectorsTermsWriterPerField.java:285) at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48) at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:306) at org.apache.lucene.util.BytesRefHash.addByPoolOffset(BytesRefHash.java:375) at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:141) at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:238) at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:168) at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:248) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:743) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1266) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1240) at org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting(TestIndexWriter.java:2604) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059) REGRESSION: org.apache.lucene.index.TestIndexWriter.testRandomStoredFields Error Message: this writer hit an OutOfMemoryError; cannot flush Stack Trace: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot flush at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2484) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2473) at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1273) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1240) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:95) at org.apache.lucene.index.TestIndexWriter.testRandomStoredFields(TestIndexWriter.java:2830) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059) Build Log (for compile errors): [...truncated 3145 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981519#action_12981519 ] Jason Rutherglen commented on LUCENE-2324: -- bq. look at ThreadAffinityDocumentsWriterThreadPool. There I'm able to use things like tryLock() and getQueueLength(). Makes sense, I had only read the DocumentsWriterPerThreadPool part. * DWPT.perDocAllocator and freeLevel can be removed? * DWPT's RecyclingByteBlockAllocator -> DirectAllocator? * Looks like the deletes handling is updated in the patch * I don't think we need FlushControl anymore, as the RAM tracking should occur in DW and there's no need for IW to [globally] wait for flushes. * The locking is clearer now: I can see DW.updateDocument locks the threadstate, as does flushAllThreads. I'll reincorporate the RAM tracking and then try the unit tests again. I'm curious if the file not found errors are gone. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2867) Change contrib QP API that uses CharSequence as string identifier
[ https://issues.apache.org/jira/browse/LUCENE-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani updated LUCENE-2867: - Attachment: lucene_2867_adriano_crestani_2011_01_13.patch Here is the patch that deprecates methods using CharSequence. Can someone please review whether I did the API deprecation correctly? I was thinking initially that deprecated methods would be removed in version 4, but I'm not sure anymore. Will they be removed in 4.0 or 3.1? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
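A hedged illustration of the deprecation pattern under discussion; the method names are hypothetical, not from the actual patch:
{code}
public class QueryNodeExample {
  /** @deprecated use {@link #getFieldAsString()} instead. */
  @Deprecated
  public CharSequence getField() {
    return getFieldAsString();
  }

  /** Replacement API: identifiers are plain Strings. */
  public String getFieldAsString() {
    return "field";
  }
}
{code}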
[jira] Updated: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections
[ https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-1540: Attachment: LUCENE-1540.patch Initial patch - against 3.x - not ready to commit - refactors parsing of TREC text from TrecContentSource into an interface, TrecDocParser, currently with a single impl, TrecGov2Parser. The interaction between TCS and TDP is less clean than I hoped, for two reasons: # trying to keep the synchronization pattern added a while ago to that class, in which the reading of data from the file is synced but the parsing can go in parallel. For this reason there are two methods in that interface. # allowing the TDP impls to use whatever is in TCS required exposing some of its methods, and also passing TCS as a param to TDP. With this patch: # TDP was cleaned up to use ContentSource's method getInputStream(), which also supports .gz, .bz2, and plain text (before the patch only .gz was supported). # it should be easy to add parsers for other formats. I removed the retry logic for opening the stream - I don't remember why it was added in the first place and it seems strange: if opening failed on the first try, why would the next try succeed? Remaining to do: - add parsers for the other formats - add tests for the other formats, and also for bz2 and plain text - allow a single run to ingest files of different formats (needed for the disks 4+5 track) - fix some documentation - allow specifying the TDP to use in a property - changes.txt - port to trunk, so as to first commit in trunk and then backport to 3.x. Improvements to contrib.benchmark for TREC collections -- Key: LUCENE-1540 URL: https://issues.apache.org/jira/browse/LUCENE-1540 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Affects Versions: 2.4 Reporter: Tim Armstrong Assignee: Doron Cohen Priority: Minor Attachments: LUCENE-1540.patch The benchmarking utilities for TREC test collections (http://trec.nist.gov) are quite limited and do not support some of the variations in format of older TREC collections. I have been doing some benchmarking work with Lucene and have had to modify the package to support: * Older TREC document formats, which the current parser fails on due to missing document headers. * Variations in query format - newlines after title tag causing the query parser to get confused. * Ability to detect and read in uncompressed text collections * Storage of document numbers by default without storing full text. I can submit a patch if there is interest, although I will probably want to write unit tests for the new functionality first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
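A hedged sketch of the two-method interface described above; the method names and signatures are assumptions, not the actual patch. The split mirrors the synchronization pattern: reading happens under TrecContentSource's lock, while parsing can run in parallel.
{code}
import org.apache.lucene.benchmark.byTask.feeds.DocData;
import org.apache.lucene.benchmark.byTask.feeds.TrecContentSource;

public interface TrecDocParser {
  /** Called while synchronized on the content source: read one raw doc. */
  StringBuilder readNextDoc(TrecContentSource tcs) throws java.io.IOException;

  /** Called outside the lock: turn the raw text into a DocData. */
  DocData parse(TrecContentSource tcs, DocData reuse, StringBuilder rawDoc);
}
{code}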
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981548#action_12981548 ] Michael Busch commented on LUCENE-2324: --- bq. DWPT.perDocAllocator and freeLevel can be removed? Done. bq. DWPT's RecyclingByteBlockAllocator -> DirectAllocator? Done. Also removed more recycling code. bq. I don't think we need FlushControl anymore as the RAM tracking should occur in DW and there's no need for IW to [globally] wait for flushes. I removed flushControl from DW. bq. I'm curious if the file not found errors are gone. I think there's something wrong with TermVectors - several related test cases fail. We need to investigate more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2311) FileListEntityProcessor Fields Stored in SolrDocument do not Match Documentation
[ https://issues.apache.org/jira/browse/SOLR-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981558#action_12981558 ] Matt Parker commented on SOLR-2311: --- I'm not sure I agree. I thought the change would improve the code's clarity. File is actually a misnomer for what is captured in the field; fileName would be more appropriate. Also, I thought the test case I wrote would have been of value and worth including. Regardless of whether it's accepted, the documentation needs to be changed to reflect whatever you decide to implement. FileListEntityProcessor Fields Stored in SolrDocument do not Match Documentation Key: SOLR-2311 URL: https://issues.apache.org/jira/browse/SOLR-2311 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.4.1 Environment: Java 1.6 Reporter: Matt Parker Priority: Minor Attachments: SOLR-2311.patch The implicit fields generated by the FileListEntityProcessor do not match the documentation, as listed in the following excerpt: {quote} The implicit fields generated by the FileListEntityProcessor are fileAbsolutePath, fileSize, fileLastModified, fileName and these are available for use within the entity X as shown above. {quote} The fileName field is not populated. The file's name is stored in the implicit field named file. The hashmap that holds the metadata (FileListEntityProcessor.java at line 255) stores the following using the associated constants: {quote} details.put(DIR, dir.getAbsolutePath()); details.put(FILE, name); details.put(ABSOLUTE_FILE, aFile.getAbsolutePath()); details.put(SIZE, sz); details.put(LAST_MODIFIED, lastModified); {quote} where DIR = fileDir, FILE = file, ABSOLUTE_FILE = fileAbsolutePath, SIZE = fileSize, and LAST_MODIFIED = fileLastModified. Either the documentation must be updated, or the constant storing the return value must be updated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
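A hedged sketch of the one-line fix the issue implies; the constant values come from the issue text, while the interface shape is only illustrative:
{code}
// Make the implicit field name match the documentation.
public interface FileListEntityProcessorConstants {
  String DIR = "fileDir";
  String FILE = "fileName";          // was "file"; the docs say "fileName"
  String ABSOLUTE_FILE = "fileAbsolutePath";
  String SIZE = "fileSize";
  String LAST_MODIFIED = "fileLastModified";
}
{code}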
[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2324: - Attachment: test.out Here's the latest test.out. There's a lot of these: {code} [junit] junit.framework.AssertionFailedError: IndexFileDeleter doesn't know about file _5.fdt [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1156) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1088) [junit] at org.apache.lucene.index.IndexWriter.filesExist(IndexWriter.java:3273) [junit] at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3321) [junit] at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2339) [junit] at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2410) [junit] at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1083) [junit] at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1027) [junit] at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:991) [junit] at org.apache.lucene.index.TestAddIndexes.testMergeAfterCopy(TestAddIndexes.java:432) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2867) Change contrib QP API that uses CharSequence as string identifier
[ https://issues.apache.org/jira/browse/LUCENE-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981640#action_12981640 ] Simon Willnauer commented on LUCENE-2867: - bq. Here is the patch that deprecates methods using CharSequence. Can someone please review whether I did the API deprecation correctly? Those comments and annotations look good! bq. I was thinking initially that deprecated methods would be removed in version 4, but I'm not sure anymore. Will they be removed in 4.0 or 3.1? You should drop those methods from the 4.0 code if you want to deprecate. Since this is a contrib you don't necessarily need to deprecate, so you could also drop them from 3.1 or even from 3.0.x. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981649#action_12981649 ] Nick Pellow commented on LUCENE-2666: - Hi, I am getting this issue as well. We are doing quite a lot of updates during indexing. Could this be causing the problem? This seems to only have happened when we deployed to our Linux test server - it didn't appear to occur on Mac OS X during development, with the same data set. Does this only affect Lucene 3.0.2? Would a rollback be a good workaround? ArrayIndexOutOfBoundsException when iterating over TermDocs --- Key: LUCENE-2666 URL: https://issues.apache.org/jira/browse/LUCENE-2666 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 3.0.2 Reporter: Shay Banon A user got this very strange exception, and I managed to get the index that it happens on. Basically, iterating over the TermDocs causes an AIOOB exception. I easily reproduced it using the FieldCache, which does exactly that (the field in question is indexed as numeric). Here is the exception: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127) at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501) at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183) at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470) at TestMe.main(TestMe.java:56) It happens on the following segment: _26t docCount: 914 delCount: 1 delFileName: _26t_1.del And as you can see, it smells like a corner case (it fails for document number 912, the AIOOB happens from the deleted docs). The code to recreate it is simple: FSDirectory dir = FSDirectory.open(new File("index")); IndexReader reader = IndexReader.open(dir, true); IndexReader[] subReaders = reader.getSequentialSubReaders(); for (IndexReader subReader : subReaders) { Field field = subReader.getClass().getSuperclass().getDeclaredField("si"); field.setAccessible(true); SegmentInfo si = (SegmentInfo) field.get(subReader); System.out.println("--" + si); if (si.getDocStoreSegment().contains("_26t")) { // this is the problematic one... System.out.println("problematic one..."); FieldCache.DEFAULT.getLongs(subReader, "__documentdate", FieldCache.NUMERIC_UTILS_LONG_PARSER); } } Here is the result of a check index on that segment: 8 of 10: name=_26t docCount=914 compound=true hasProx=true numFiles=2 size (MB)=1.641 diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.} has deletions [delFileName=_26t_1.del] test: open reader.OK [1 deleted docs] test: fields..OK [32 fields] test: field norms.OK [32 fields] test: terms, freq, prox...ERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127) at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102) at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299) at TestMe.main(TestMe.java:47) test: stored fields...ERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34) at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299) at TestMe.main(TestMe.java:47) test: term vectorsERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34) at org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721) at
[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981650#action_12981650 ] Nick Pellow commented on LUCENE-2666: - I've also noticed this occurring since I started using a numeric field and accessing its field cache for boosting. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981649#action_12981649 ] Nick Pellow edited comment on LUCENE-2666 at 1/14/11 2:17 AM: -- Hi, I am getting this issue as well. We are doing quite a lot of updates during indexing. Could this be causing the problem? This seems to only have happened when we deployed to our Linux test server - it didn't appear to occur on Mac OS X during development, with the same data set. Does this only affect Lucene 3.0.2? Would a rollback be a good workaround? The exact stack trace: {code} java.lang.ArrayIndexOutOfBoundsException: 5475 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127) at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102) at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java:207) at org.apache.lucene.search.PhrasePositions.skipTo(PhrasePositions.java:52) at org.apache.lucene.search.PhraseScorer.advance(PhraseScorer.java:120) at org.apache.lucene.search.IndexSearcher.searchWithFilter(IndexSearcher.java:249) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:218) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:199) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:177) at org.apache.lucene.search.MultiSearcher$MultiSearcherCallableWithSort.call(MultiSearcher.java:410) at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:230) at org.apache.lucene.search.Searcher.search(Searcher.java:49) {code} was (Author: npellow): Hi, I am getting this issue as well. We are doing quite a lot of updates during indexing. Could this be causing the problem? This seems to only have happened when we deployed to our Linux test server - it didn't appear to occur on Mac OS X during development, with the same data set. Does this only affect Lucene 3.0.2? Would a rollback be a good workaround? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981651#action_12981651 ] Dawid Weiss commented on SOLR-2282: --- This tests.iter is exactly what I will need :) I'll most likely weave a runtime aspect into the code to verify when two threads enter the same critical section. Again, from whitebox review it seems impossible, but then actually detecting and fixing impossible things are what we love in our profession... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
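A minimal sketch of the check Dawid describes, whether woven in as an aspect or called by hand; this guard fails fast if two threads ever enter a section that is assumed single-threaded:
{code}
import java.util.concurrent.atomic.AtomicInteger;

final class CriticalSectionGuard {
  private final AtomicInteger inside = new AtomicInteger();

  void enter() {
    if (inside.incrementAndGet() != 1) {
      throw new AssertionError("two threads inside the critical section");
    }
  }

  void exit() {
    inside.decrementAndGet();
  }
}
{code}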
[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-2657: Attachment: LUCENE-2657.patch * Set snapshot {{updatePolicy}} to {{never}} for both the {{apache.snapshots}} and the {{carrot2.org}} Maven repositories, so that they won't constantly be checked for snapshot updates. * Consolidated distribution-related profiles to just one named {{dist}}; Solr-specific noggit and commons-csv jars are now properly placed in {{solr/dist/maven/}} when deploying with the {{dist}} profile. To populate both {{lucene/dist/maven/}} and {{solr/dist/maven/}}, run from the top level: {code} mvn -Pdist -DskipTests deploy {code} To populate only {{lucene/dist/maven/}}, run from the top level: {code} mvn -N -Pdist deploy cd lucene mvn -Pdist -DskipTests deploy cd ../modules mvn -Pdist -DskipTests deploy {code} Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top level directory, run: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981662#action_12981662 ]

Steven Rowe edited comment on LUCENE-2657 at 1/14/11 2:57 AM:
--------------------------------------------------------------

* Set snapshot {{updatePolicy}} to {{never}} for both the {{apache.snapshots}} and the {{carrot2.org}} Maven repositories, so that they won't constantly be checked for snapshot updates.
* Consolidated distribution-related profiles to just one named {{dist}}
* Solr-specific noggit and commons-csv jars are now properly placed in {{solr/dist/maven/}} when deploying with the {{dist}} profile
* No longer setting the repositories' {{uniqueVersion}} to {{false}} when deploying under the {{dist}} profile; as a result, snapshot artifacts' names will include build timestamps instead of {{SNAPSHOT}} in {{*/dist/maven/}}.
* {{mvn clean}} from {{lucene/src/}} and {{solr/src/}} now removes {{lucene/dist/}} and {{solr/dist/}}, respectively, in addition to the build directories already being removed.

To populate both {{lucene/dist/maven/}} and {{solr/dist/maven/}}, run from the top level:

{code}
mvn -Pdist -DskipTests deploy
{code}

To populate only {{lucene/dist/maven/}}, run from the top level:

{code}
mvn -N -Pdist deploy
cd lucene
mvn -Pdist -DskipTests deploy
cd ../modules
mvn -Pdist -DskipTests deploy
{code}
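For reference, the {{uniqueVersion}} switch mentioned above is a Maven 2-era setting in the POM's {{distributionManagement}} section; a minimal sketch of the configuration that was removed, with placeholder id and URL:

{code}
<distributionManagement>
  <snapshotRepository>
    <!-- id and url are placeholders for the */dist/maven/ file repository -->
    <id>dist.maven</id>
    <url>file:../dist/maven</url>
    <!-- removed by the patch: with the default (true), snapshots now deploy
         under timestamped file names instead of -SNAPSHOT -->
    <uniqueVersion>false</uniqueVersion>
  </snapshotRepository>
</distributionManagement>
{code}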
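And the extended {{mvn clean}} behavior from the last bullet is the sort of thing usually wired up with extra {{maven-clean-plugin}} filesets; again a hedged sketch, with an illustrative directory rather than the patch's actual paths:

{code}
<!-- Sketch: extending mvn clean to remove a directory outside target/.
     The directory shown is illustrative; the patch wires up lucene/dist/
     and solr/dist/ from lucene/src/ and solr/src/ respectively. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <configuration>
    <filesets>
      <fileset>
        <directory>${basedir}/../dist</directory>
      </fileset>
    </filesets>
  </configuration>
</plugin>
{code}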