Re: Initial committers list for Incubator Proposal
Awesome work Troy. Looks like we're getting some positive feedback. Thanks for managing this process!

Peter Mateja
peter.mat...@gmail.com

On Thu, Jan 13, 2011 at 10:18 AM, Troy Howard thowar...@gmail.com wrote:

Yes. I sent an announcement to lucene-net-dev and lucene-general yesterday. We are now waiting on the Incubator community/PMC to provide feedback and vote on our proposal. You can track that on the Incubator general mailing list.

Thanks,
Troy

On Thu, Jan 13, 2011 at 4:23 AM, Simone Chiaretta simone.chiare...@gmail.com wrote:

Was wondering how the proposal is going: has it been published or sent to the ASF?

Simone

On Fri, Dec 31, 2010 at 1:01 AM, Troy Howard thowar...@gmail.com wrote:

All,

I'm working on the Incubator Proposal now, and need to establish a list of initial committers. So far, the following people have come forward and offered to be committers (in alphabetical order):

Alex Thompson
Ben Martz
Chris Currens
Heath Aldrich
Michael Herndon
Prescott Nasser
Scott Lombard
Simone Chiaretta
Troy Howard

I would like to place an open request for any interested parties to respond to this message with their request to be a Committer. Whether you are already on that list or would like to be added, please send a message explaining (briefly) why you think you are qualified to be involved in the project and specifically in what ways you hope to be able to contribute.

One thing I would like to point out is that in the Apache world there is a distinction between Committers and Contributors (aka developers). See this link for details: http://incubator.apache.org/guides/participation.html#committer

Please consider whether you wish to be a Committer or a Contributor. Some quick rules of thumb:

Committers:
- Committers must be willing to submit a Contributor License Agreement (CLA). See: http://www.apache.org/licenses/#clas
- Committers must have enough *consistent* free time to fulfill the expectations of the ASF in terms of reporting, process, and documentation, and to remain responsive to the community in terms of communication and listening to, considering, and discussing community opinion. These kinds of tasks can consume a lot of time and are some of the first things people stop doing when they start running out of time.
- A Committer may not even write code, but may simply accept, review and commit code written by others. This is the primary responsibility of a Committer -- to commit code, whether they wrote it themselves or not.
- Committers may have to perform the unpleasant task of rejecting contributions from Contributors and explaining why in a fair and objective manner. This can be frustrating and time consuming. You may need to play the part of a mentor or engage in debates. You may even be proved wrong and have to swallow your pride.
- Committers have direct access to source control and other resources, and so must be personally accountable for the quality of the same, and will need to operate under the process and restrictions the ASF expects.

Contributors:
- Contributors might have a lot of free time this month, but get really busy next month and have no time at all. They can develop code in short bursts and then drop off the face of the planet indefinitely after that.
- Contributors can focus on code only, or work from a task list, without any need to interact with and be accountable to the community (as this is the responsibility of the Committers).
- Contributors can do one-time or infrequently needed tasks like updating the website, documentation, wikis, etc..
- Contributors will need to have anything they create reviewed by a Committer and ultimately included by a Committer. Some people find this frustrating if the Committers are slow to respond or critical of their work.

So in your responses, please be clear about whether you would like to offer your help as a Committer or as a Contributor.

Thanks,
Troy

--
Simone Chiaretta
Microsoft MVP ASP.NET - ASPInsider
Blog: http://codeclimber.net.nz
RSS: http://feeds2.feedburner.com/codeclimber
twitter: @simonech

Any sufficiently advanced technology is indistinguishable from magic
Life is short, play hard
--module option not playing nicely with relative paths
Hi,

Until recently, I wasn't using the --module parameter. But now I do, and the compilation was failing, because I am not building things in the top folder, but from inside build - to avoid clutter. I believe I discovered a bug and I am sending a patch.

Basically, jcc.py is copying modules into the build dir. My project is organized as:

build
java
python
    packageA
    packageB

I build things inside build. If I specify a relative path, --module '../python/packageA', jcc will correctly copy the tree structure, resulting in:

extension
    packageA
    packageB

However, the package names (for distutils setup) will be set to ['extension', 'extension..python.packageA', 'extension..python.packageB'], which ends up in this error:

[exec] running install
[exec] running bdist_egg
[exec] running egg_info
[exec] writing solrpie_java.egg-info/PKG-INFO
[exec] writing top-level names to solrpie_java.egg-info/top_level.txt
[exec] writing dependency_links to solrpie_java.egg-info/dependency_links.txt
[exec] warning: manifest_maker: standard file '__main__.py' not found
[exec] error: package directory 'build/solrpie_java/python/solrpye' does not exist

Cheers,
roman
Re: --module option not playing nicely with relative paths
Hi Roman,

On Jan 13, 2011, at 5:47, Roman Chyla roman.ch...@gmail.com wrote:

Until recently, I wasn't using the --module parameter. But now I do, and the compilation was failing, because I am not building things in the top folder, but from inside build - to avoid clutter. I believe I discovered a bug and I am sending a patch.

I think you forgot to attach the patch ?

Andi..

Basically, jcc.py is copying modules into the build dir. My project is organized as:

build
java
python
    packageA
    packageB

I build things inside build. If I specify a relative path, --module '../python/packageA', jcc will correctly copy the tree structure, resulting in:

extension
    packageA
    packageB

However, the package names (for distutils setup) will be set to ['extension', 'extension..python.packageA', 'extension..python.packageB'], which ends up in this error:

[exec] running install
[exec] running bdist_egg
[exec] running egg_info
[exec] writing solrpie_java.egg-info/PKG-INFO
[exec] writing top-level names to solrpie_java.egg-info/top_level.txt
[exec] writing dependency_links to solrpie_java.egg-info/dependency_links.txt
[exec] warning: manifest_maker: standard file '__main__.py' not found
[exec] error: package directory 'build/solrpie_java/python/solrpye' does not exist

Cheers,
roman
Re: --module option not playing nicely with relative paths
Hi Roman,

On Thu, 13 Jan 2011, Roman Chyla wrote:

By mistake it had a .py suffix, trying now with .patch.

I integrated your patch into rev 1058713 of jcc's trunk with a minor change. (I renamed the _package_track variable)

Thanks !

Andi..

Best,

roman

On Thu, Jan 13, 2011 at 4:54 PM, Andi Vajda va...@apache.org wrote:

Hi Roman,

On Jan 13, 2011, at 5:47, Roman Chyla roman.ch...@gmail.com wrote:

Until recently, I wasn't using the --module parameter. But now I do, and the compilation was failing, because I am not building things in the top folder, but from inside build - to avoid clutter. I believe I discovered a bug and I am sending a patch.

I think you forgot to attach the patch ?

Andi..

Basically, jcc.py is copying modules into the build dir. My project is organized as:

build
java
python
    packageA
    packageB

I build things inside build. If I specify a relative path, --module '../python/packageA', jcc will correctly copy the tree structure, resulting in:

extension
    packageA
    packageB

However, the package names (for distutils setup) will be set to ['extension', 'extension..python.packageA', 'extension..python.packageB'], which ends up in this error:

[exec] running install
[exec] running bdist_egg
[exec] running egg_info
[exec] writing solrpie_java.egg-info/PKG-INFO
[exec] writing top-level names to solrpie_java.egg-info/top_level.txt
[exec] writing dependency_links to solrpie_java.egg-info/dependency_links.txt
[exec] warning: manifest_maker: standard file '__main__.py' not found
[exec] error: package directory 'build/solrpie_java/python/solrpye' does not exist

Cheers,
roman
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981168#action_12981168 ]

Stanislaw Osinski commented on SOLR-2282:
-----------------------------------------

Hi Robert,

What's the configuration (OS / JVM) on which the test is failing for you? I can't get it to fail on my machines (Win 7 64-bit with Sun JVM 1.6.0_20 and Oracle 1.6.0_23, Ubuntu 64-bit with Sun JVM 1.6.0_20). I'm running the test using the command I found in Hudson logs (ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=41204997274180:6405396687385598457 -Dtests.multiplier=3).

S.

Distributed Support for Search Result Clustering
------------------------------------------------

Key: SOLR-2282
URL: https://issues.apache.org/jira/browse/SOLR-2282
Project: Solr
Issue Type: New Feature
Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch

Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2864) add maxtf to fieldinvertstate
add maxtf to fieldinvertstate
-----------------------------

Key: LUCENE-2864
URL: https://issues.apache.org/jira/browse/LUCENE-2864
Project: Lucene - Java
Issue Type: New Feature
Components: Query/Scoring
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2864.patch

The maximum within-document TF is a very useful scoring value; we should expose it so that people can use it in scoring. Consider the following sim:

{code}
@Override
public float idf(int docFreq, int numDocs) {
  return 1.0F; /* not used */
}

@Override
public float computeNorm(String field, FieldInvertState state) {
  return state.getBoost() / (float) Math.sqrt(state.getMaxTF());
}
{code}

which is surprisingly effective, but more interesting for practical reasons.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
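As an aside, here is how the sim above might be wired in end-to-end. This is a minimal sketch, assuming the 3.1-era IndexWriterConfig API; getMaxTF() is only the accessor proposed in this issue (the final method name may differ), and the class names below are illustrative, not part of the patch.

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MaxTfNormExample {
  // Same sim as in the issue description, wrapped in a named class.
  static class MaxTfSimilarity extends DefaultSimilarity {
    @Override
    public float idf(int docFreq, int numDocs) {
      return 1.0F; // disable IDF, as in the snippet above
    }

    @Override
    public float computeNorm(String field, FieldInvertState state) {
      // Norm based on the largest within-document term frequency,
      // not the field length. getMaxTF() is the proposed accessor.
      return state.getBoost() / (float) Math.sqrt(state.getMaxTF());
    }
  }

  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriterConfig cfg = new IndexWriterConfig(
        Version.LUCENE_31, new StandardAnalyzer(Version.LUCENE_31));
    cfg.setSimilarity(new MaxTfSimilarity());
    IndexWriter writer = new IndexWriter(dir, cfg);
    // ... add documents; norms are computed with the sim above ...
    writer.close();
  }
}
{code}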
[jira] Updated: (LUCENE-2864) add maxtf to fieldinvertstate
[ https://issues.apache.org/jira/browse/LUCENE-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2864:
--------------------------------

Attachment: LUCENE-2864.patch

add maxtf to fieldinvertstate
-----------------------------

Key: LUCENE-2864
URL: https://issues.apache.org/jira/browse/LUCENE-2864
Project: Lucene - Java
Issue Type: New Feature
Components: Query/Scoring
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2864.patch

The maximum within-document TF is a very useful scoring value; we should expose it so that people can use it in scoring. Consider the following sim:

{code}
@Override
public float idf(int docFreq, int numDocs) {
  return 1.0F; /* not used */
}

@Override
public float computeNorm(String field, FieldInvertState state) {
  return state.getBoost() / (float) Math.sqrt(state.getMaxTF());
}
{code}

which is surprisingly effective, but more interesting for practical reasons.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981168#action_12981168 ]

Stanislaw Osinski edited comment on SOLR-2282 at 1/13/11 3:19 AM:
------------------------------------------------------------------

Hi Robert,

What's the configuration (OS / JVM) on which the test is failing for you? I can't get it to fail on my machines (Win 7 64-bit with Sun JVM 1.6.0_20 Client VM and Oracle 1.6.0_23 Server VM, Ubuntu 64-bit with Sun JVM 1.6.0_20 Server VM). I'm running the test using the command I found in Hudson logs (ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=41204997274180:6405396687385598457 -Dtests.multiplier=3).

S.

was (Author: stanislaw.osinski):

Hi Robert,

What's the configuration (OS / JVM) on which the test is failing for you? I can't get it to fail on my machines (Win 7 64-bit with Sun JVM 1.6.0_20 and Oracle 1.6.0_23, Ubuntu 64-bit with Sun JVM 1.6.0_20). I'm running the test using the command I found in Hudson logs (ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=41204997274180:6405396687385598457 -Dtests.multiplier=3).

S.

Distributed Support for Search Result Clustering
------------------------------------------------

Key: SOLR-2282
URL: https://issues.apache.org/jira/browse/SOLR-2282
Project: Solr
Issue Type: New Feature
Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch

Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981178#action_12981178 ]

Robert Muir commented on SOLR-2282:
-----------------------------------

Stanislaw: it is true that with that exact random seed, the test passes for me. But if i just run 'ant test', often it fails. Below is the output... i put my OS configuration first here... sorry for the noise.

{noformat}
[junit] NOTE: Windows Vista 6.0 x86/Sun Microsystems Inc. 1.6.0_23 (32-bit)/cpus=4,threads=4,free=5267640,total=16384000

test:
[junit] Testsuite: org.apache.solr.handler.clustering.DistributedClusteringComponentTest
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 13.18 sec
[junit] - Standard Error -
[junit] 2011-1-13 3:35:19 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine cluster
[junit] SEVERE: Carrot2 clustering failed
[junit] java.lang.IndexOutOfBoundsException
[junit] at java.io.StringReader.read(StringReader.java:76)
[junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.zzRefill(ExtendedWhitespaceTokenizerImpl.java:557)
[junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.getNextToken(ExtendedWhitespaceTokenizerImpl.java:754)
[junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizer.nextToken(ExtendedWhitespaceTokenizer.java:46)
[junit] at org.carrot2.text.preprocessing.Tokenizer.tokenize(Tokenizer.java:147)
[junit] at org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline.preprocess(CompletePreprocessingPipeline.java:54)
[junit] at org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline.preprocess(BasicPreprocessingPipeline.java:92)
[junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.cluster(LingoClusteringAlgorithm.java:198)
[junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.access$000(LingoClusteringAlgorithm.java:43)
[junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm$1.process(LingoClusteringAlgorithm.java:177)
[junit] at org.carrot2.text.clustering.MultilingualClustering.clusterByLanguage(MultilingualClustering.java:223)
[junit] at org.carrot2.text.clustering.MultilingualClustering.process(MultilingualClustering.java:111)
[junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.process(LingoClusteringAlgorithm.java:170)
[junit] at org.carrot2.core.ControllerUtils.performProcessing(ControllerUtils.java:102)
[junit] at org.carrot2.core.Controller.process(Controller.java:347)
[junit] at org.carrot2.core.Controller.process(Controller.java:239)
[junit] at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:106)
[junit] at org.apache.solr.handler.clustering.ClusteringComponent.finishStage(ClusteringComponent.java:167)
[junit] at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:336)
[junit] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
[junit] at org.apache.solr.core.SolrCore.execute(SolrCore.java:1296)
[junit] at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
[junit] at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
[junit] at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
[junit] at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
[junit] at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
[junit] at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
[junit] at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
[junit] at org.mortbay.jetty.Server.handle(Server.java:326)
[junit] at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
[junit] at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
[junit] at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
[junit] at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
[junit] at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
[junit] at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
[junit] at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
[junit] 2011-1-13 3:35:19 org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine cluster
[junit] SEVERE: Carrot2 clustering failed
[junit] java.lang.IndexOutOfBoundsException
[junit] at java.io.StringReader.read(StringReader.java:76)
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981179#action_12981179 ]

JohnWu commented on SOLR-1395:
------------------------------

TomLiu:

In Katta's lib there are many jars, but some of them must be there - you know, Solr must include Lucene's jar. I added some libs to Katta. Do you mean the Solr embedded in Katta?

Now the request can go from the master to the slave, but how does the subproxy send the query to the query core?

I configured the subproxy (Katta with the SOLR-1395 patch) with:

node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

and the Solr home is set in the Katta shell script; in the Solr home, the solr config uses solr.SearchHandler. But I do not see how Katta can dispatch the query to the query core; the solr,jar of Katta will search the query in its data directory. How should the shard be configured in the subproxy?

Can you give me a detailed reply? Thanks a lot!

Integrate Katta
---------------

Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981179#action_12981179 ]

JohnWu edited comment on SOLR-1395 at 1/13/11 3:44 AM:
-------------------------------------------------------

TomLiu:

In Katta's lib there are many jars, but some of them must be there - you know, Solr must include Lucene's jar. I added some libs to Katta. Do you mean the Solr embedded in Katta?

Now the request can go from the master to the slave, but how does the subproxy send the query to the query core?

I configured the subproxy (Katta with the SOLR-1395 patch) with:

node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

and the Solr home is set in the Katta shell script; in the Solr home, the solr config uses solr.SearchHandler. But I do not see how Katta can dispatch the query to the query core; the solr.jar of Katta will search the query in its data directory. How should the shard be configured in the subproxy?

Can you give me a detailed reply? Thanks a lot!

was (Author: johnwu):

TomLiu:

In Katta's lib there are many jars, but some of them must be there - you know, Solr must include Lucene's jar. I added some libs to Katta. Do you mean the Solr embedded in Katta?

Now the request can go from the master to the slave, but how does the subproxy send the query to the query core?

I configured the subproxy (Katta with the SOLR-1395 patch) with:

node.server.class=org.apache.solr.katta.DeployableSolrKattaServer

and the Solr home is set in the Katta shell script; in the Solr home, the solr config uses solr.SearchHandler. But I do not see how Katta can dispatch the query to the query core; the solr,jar of Katta will search the query in its data directory. How should the shard be configured in the subproxy?

Can you give me a detailed reply? Thanks a lot!

Integrate Katta
---------------

Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stanislaw Osinski updated SOLR-2282:
------------------------------------

Attachment: SOLR-2282-diagnostics.patch

Robert: I was using the random seed from the build result in the hope that it would fail the test for me. I'm still unable to get the exception though, with or without the seed. I suppose it shouldn't matter whether I run the complete test suite or just this one test method? (I was doing the latter to save time)

If you have a spare moment, would you be able to check the following two things on your machine:

1. Apply the attached diagnostics patch and run the tests. If the test doesn't fail after the change, this means there's some concurrency issue in Carrot2's internal resource pooling mechanisms that we'll need to find. This patch is not a solution to the problem though, just a diagnostic measure.

2. It's paranoid, but can you run the test with the {{-Dargs=-XX:+TraceClassLoading}} option and check that there's no old (v3.4.0) Carrot2 JAR hiding in the bushes? Version 3.4.0 had a subtle bug that could be causing the exception. If there are no traces of the Carrot2 3.4.0 JAR in the classpath, we'll need to do further inspection of our code.

Distributed Support for Search Result Clustering
------------------------------------------------

Key: SOLR-2282
URL: https://issues.apache.org/jira/browse/SOLR-2282
Project: Solr
Issue Type: New Feature
Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2282-diagnostics.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch

Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981192#action_12981192 ]

Michael Busch commented on LUCENE-2324:
---------------------------------------

I made some progress with the concurrency model, especially removing the need for various locks to make everything easier.

- DocumentsWriterPerThreadPool.ThreadState now extends ReentrantLock, which means that standard methods like lock() and unlock() can be used to reserve a DWPT for a task (see the sketch after this message).
- The max. number of DWPTs allowed (config.maxThreadStates) is instantiated up-front. Creating a DWPT is cheap, so this is not a performance concern; this makes it easier to push config changes to the DWPTs without synchronizing on the pool and without having to worry about newly created DWPTs getting the same config settings.
- DocumentsWriterPerThreadPool.getActivePerThreadsIterator() gives the caller a static snapshot of the active DWPTs at the time the iterator was acquired, e.g. for flushAllThreads() or DW.abort(). Here synchronizing on the pool isn't necessary either.
- Deletes are now pushed to DW.pendingDeletes() if no active DWPTs are present.

TODOs:
- fix remaining testcases that still fail
- fix RAM tracking and flush-by-RAM
- write new testcases to test thread pool, thread assignment, etc
- review if all cases that were discussed in the recent comments here work as expected (likely not :) )
- performance testing and code cleanup

Per thread DocumentsWriters that write their own private segments
------------------------------------------------------------------

Key: LUCENE-2324
URL: https://issues.apache.org/jira/browse/LUCENE-2324
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out

See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293:

Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
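The ThreadState-extends-ReentrantLock idea from the first bullet above can be sketched as follows. This is illustrative only: the class bodies, pool policy, and acquire() fallback are guesses, not the actual patch.

{code}
import java.util.concurrent.locks.ReentrantLock;

// Placeholder for the real per-thread indexing state.
final class DocumentsWriterPerThread { }

// A ThreadState IS a lock: holding it reserves its DWPT for one task.
final class ThreadState extends ReentrantLock {
  final DocumentsWriterPerThread dwpt = new DocumentsWriterPerThread();
}

final class ThreadStatePool {
  private final ThreadState[] states;

  ThreadStatePool(int maxThreadStates) {
    // All DWPTs are instantiated up-front; creating one is cheap.
    states = new ThreadState[maxThreadStates];
    for (int i = 0; i < states.length; i++) {
      states[i] = new ThreadState();
    }
  }

  // Reserve a free DWPT with plain tryLock(); no pool-wide
  // synchronized block is needed.
  ThreadState acquire() {
    for (ThreadState s : states) {
      if (s.tryLock()) {
        return s; // caller must call s.unlock() when the task is done
      }
    }
    ThreadState s = states[0]; // all busy: block on an arbitrary one
    s.lock();
    return s;
  }
}
{code}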
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981204#action_12981204 ]

tom liu commented on SOLR-1395:
-------------------------------

In a Katta-integrated environment, Solr is embedded. Katta acts as the distributed compute manager, which manages:
# node startup/shutdown
# shard deploy/undeploy
# RPC invocation of the application (Solr)
and Solr acts as the application on the distributed compute environment.

In the master box, the QueryHandler must be solr.KattaSearchHandler in solrconfig.xml, so that the KattaClient is invoked by the Solr app and then makes the RPC call to the slave.

In the slave box, Katta starts up the embedded Solr, which is the subproxy. The shard, that is the query SolrCore, is deployed by Katta's script:

bin/katta addIndex indexName indexPath

Integrate Katta
---------------

Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2864) add maxtf to fieldinvertstate
[ https://issues.apache.org/jira/browse/LUCENE-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981219#action_12981219 ]

Michael McCandless commented on LUCENE-2864:
--------------------------------------------

+1

add maxtf to fieldinvertstate
-----------------------------

Key: LUCENE-2864
URL: https://issues.apache.org/jira/browse/LUCENE-2864
Project: Lucene - Java
Issue Type: New Feature
Components: Query/Scoring
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2864.patch

The maximum within-document TF is a very useful scoring value; we should expose it so that people can use it in scoring. Consider the following sim:

{code}
@Override
public float idf(int docFreq, int numDocs) {
  return 1.0F; /* not used */
}

@Override
public float computeNorm(String field, FieldInvertState state) {
  return state.getBoost() / (float) Math.sqrt(state.getMaxTF());
}
{code}

which is surprisingly effective, but more interesting for practical reasons.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2751) add LuceneTestCase.newSearcher()
[ https://issues.apache.org/jira/browse/LUCENE-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981220#action_12981220 ]

Michael McCandless commented on LUCENE-2751:
--------------------------------------------

bq. There is a downside to this whole issue of course... i think its going to be harder to reproduce test fails since we will be using more multithreading.

Right. But I think this (losing reproducibility sometimes) is the lesser evil? Ie, making sure we tease out thread safety bugs trumps reproducibility...

add LuceneTestCase.newSearcher()
--------------------------------

Key: LUCENE-2751
URL: https://issues.apache.org/jira/browse/LUCENE-2751
Project: Lucene - Java
Issue Type: Test
Components: Build
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2751.patch, LUCENE-2751.patch

Most tests in the search package don't care about what kind of searcher they use. we should randomly use MultiSearcher or ParallelMultiSearcher sometimes in tests.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API
[ https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981222#action_12981222 ]

Michael McCandless commented on LUCENE-2723:
--------------------------------------------

bq. I merged us up to yesterday (1052991:1057836),

Awesome, thanks!!

bq. Mike can you assist in merging r1057897?

Will do.

Speed up Lucene's low level bulk postings read API
--------------------------------------------------

Key: LUCENE-2723
URL: https://issues.apache.org/jira/browse/LUCENE-2723
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0
Attachments: LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723_bulkvint.patch, LUCENE-2723_facetPerSeg.patch, LUCENE-2723_facetPerSeg.patch, LUCENE-2723_openEnum.patch, LUCENE-2723_termscorer.patch, LUCENE-2723_wastedint.patch

Spinoff from LUCENE-1410. The flex DocsEnum has a simple bulk-read API that reads the next chunk of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR (from LUCENE-1410). This is not unlike sucking coffee through those tiny plastic coffee stirrers they hand out on airplanes that, surprisingly, also happen to function as a straw. As a result we see no perf gain from using FOR/PFOR.

I had hacked up a fix for this, described in my blog post at http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html

I'm opening this issue to get that work to a committable point.

So... I've worked out a new bulk-read API to address the performance bottleneck. It has some big changes over the current bulk-read API:

* You can now also bulk-read positions (but not payloads), but, I have yet to cutover positional queries.
* The buffer contains doc deltas, not absolute values, for docIDs and positions (freqs are absolute).
* Deleted docs are not filtered out.
* The doc freq buffers need not be aligned. For fixed intblock codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16, Group varint, etc.) they won't be.

It's still a work in progress...

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
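To make the buffer semantics in the issue description concrete: under the proposed API, a consumer has to turn doc deltas back into absolute docIDs and skip deleted docs itself. A minimal sketch follows; every name except org.apache.lucene.util.Bits is illustrative, not the real API.

{code}
import org.apache.lucene.util.Bits;

final class BulkPostingsConsumer {
  // docDeltas: per-doc gaps as delivered by the bulk API (not absolute).
  // freqs: absolute frequencies, aligned with docDeltas in this sketch
  // (the proposal notes the real buffers need not be aligned).
  // liveDocs: null means no deletions; the API does NOT pre-filter them.
  static void consume(int[] docDeltas, int[] freqs, int len, Bits liveDocs) {
    int doc = 0;
    for (int i = 0; i < len; i++) {
      doc += docDeltas[i];                        // delta -> absolute docID
      if (liveDocs != null && !liveDocs.get(doc)) {
        continue;                                 // caller skips deleted docs
      }
      collect(doc, freqs[i]);
    }
  }

  static void collect(int doc, int freq) {
    System.out.println("doc=" + doc + " freq=" + freq);
  }
}
{code}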
[jira] Resolved: (LUCENE-1260) Norm codec strategy in Similarity
[ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-1260.
---------------------------------

Resolution: Fixed
Fix Version/s: 3.1

(Updating fix-version correctly, also.) I think it's safe to mark this resolved... the issues are totally cleared up in 4.0, and only some (documented) corner cases remain in 3.x where we still use the default sim.

Norm codec strategy in Similarity
---------------------------------

Key: LUCENE-1260
URL: https://issues.apache.org/jira/browse/LUCENE-1260
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.3.1
Reporter: Karl Wettin
Assignee: Michael McCandless
Fix For: 3.1, 4.0
Attachments: Lucene-1260-1.patch, Lucene-1260-2.patch, Lucene-1260.patch, LUCENE-1260.txt, LUCENE-1260.txt, LUCENE-1260.txt, LUCENE-1260_defaultsim.patch

The static span and resolution of the 8 bit norms codec might not fit all applications. My use case requires that 100f-250f is discretized into 60 bags instead of the default.. 10?

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981237#action_12981237 ]

Robert Muir commented on SOLR-2282:
-----------------------------------

bq. Robert: I was using the random seed from the build result in the hope that it would fail the test for me. I'm still unable to get the exception though, with or without the seed. I suppose it shouldn't matter whether I run the complete test suite or just this one test method? (I was doing the latter to save time)

Well, it's not completely consistent even with the seed for me (smells like a concurrency issue).

Silly question, but did you remove the @Ignore on DistributedClusteringComponentTest? Otherwise, the reproducibility problem could be that it doesn't consistently fail every time, even with the same seed. I ran my previous fail three times, with the patch:

{noformat}
ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=8909233178291932652:-4859244606911873252
{noformat}

This failed two out of three times. I also then ran it with traceclassloading, logging to a file:

{noformat}
ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=8909233178291932652:-4859244606911873252 -Dargs=-XX:+TraceClassLoading > test.out
{noformat}

All the carrot classes are being loaded from solr/contrib/clustering/lib/carrot2-core-3.4.2.jar

Distributed Support for Search Result Clustering
------------------------------------------------

Key: SOLR-2282
URL: https://issues.apache.org/jira/browse/SOLR-2282
Project: Solr
Issue Type: New Feature
Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2282-diagnostics.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch

Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981241#action_12981241 ]

Stanislaw Osinski commented on SOLR-2282:
-----------------------------------------

{quote}
Well, it's not completely consistent even with the seed for me (smells like a concurrency issue).
{quote}

This is what I've been suspecting from the beginning; I hope Dawid has better luck reproducing the problem on his 4-core HT machine.

{quote}
Silly question, but did you remove the @Ignore on DistributedClusteringComponentTest? Otherwise, the reproducibility problem could be that it doesn't consistently fail every time, even with the same seed.
{quote}

Yeah, I did remove the @Ignore; I'm getting "Testsuite: org.apache.solr.handler.clustering.DistributedClusteringComponentTest, Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 59,658 sec" in the test results dir. When it comes to reproducibility, I wasn't able to reproduce some other concurrency issue on my 2-core machine, while on Dawid's 4-core hardware the tests would fail sometimes, so I hope we can eventually get the exception locally.

{quote}
I ran my previous fail three times, with the patch. This failed two out of three times.
{quote}

Thanks for verifying this! It looks like the bug may be somewhere in the C2 code other than where I initially thought. We'll review the code once again; as soon as we come up with the fix, I'll attach a patch.

Distributed Support for Search Result Clustering
------------------------------------------------

Key: SOLR-2282
URL: https://issues.apache.org/jira/browse/SOLR-2282
Project: Solr
Issue Type: New Feature
Components: contrib - Clustering
Affects Versions: 1.4, 1.4.1
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 3.1, 4.0
Attachments: SOLR-2282-diagnostics.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch

Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
[ https://issues.apache.org/jira/browse/LUCENE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2865:
------------------------------------

Attachment: LUCENE-2865.patch

Here is a patch that adds a ScorerContext to replace those two booleans. ScorerContext follows a copy-on-write pattern, similar to a builder pattern, that only modifies the context if the values actually change. Seems pretty straightforward so far.

Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2865.patch

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2866) Unexpected search results
Unexpected search results
-------------------------

Key: LUCENE-2866
URL: https://issues.apache.org/jira/browse/LUCENE-2866
Project: Lucene - Java
Issue Type: Bug
Components: Search
Environment:
*Operating System:* Windows Server 2003 and Windows Server 2008 R2
*System type:* 32 bits (Win Server 2003) and 64 bits (Win Server 2008)
*Platform:* Alfresco Community 3.3.g
*Processor:* Intel Celeron 1.80GHz
*RAM Memory:* 2GB
Reporter: Alejandro

Hello... I'm using Lucene search with Alfresco 3.3.g (I'm not sure which version of Lucene is used), and I'm having problems with the results the search gives me... sometimes a search brings me just 1 result, but when I immediately do the same search a second time it can bring a lot of results... Sometimes the search takes too much time to bring results... and sometimes the search stops at 1000 results. I'm using simple and boolean searches, and both types show the same mistakes.

Thanks for reading and for your support.

Alejandro Villa Betancur

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
[ https://issues.apache.org/jira/browse/LUCENE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981259#action_12981259 ]

Uwe Schindler commented on LUCENE-2865:
---------------------------------------

Looks good! I would make the ctor private and then use ScorerContext.default().x().y() as the pattern (default returns the template). I like this design more :-)

Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2865.patch

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981260#action_12981260 ]

JohnWu commented on SOLR-1395:
------------------------------

Tomliu:

Maybe this is the last step for me, but it's taking so long! Katta uses Lucene 3.0, but SOLR-1395 uses a Lucene 4.0 snapshot. I packaged SOLR-1395 into a jar and put it on the Katta classpath, but the Lucene versions are different, so if I use katta search SPIndex02 content:lovealice 1, the slave returns a Lucene exception.

How did you make the Lucene versions the same? By adding the keywordAnalyzer.class to the lucene-4.0-snapshot.jar?

Thanks!

JohnWu

Integrate Katta
---------------

Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta-solrcores.jpg, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, solr-1395-katta-0.6.2.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
[ https://issues.apache.org/jira/browse/LUCENE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2865:
------------------------------------

Attachment: LUCENE-2865.patch

bq. I would make the ctor private and then use ScorerContext.default().x().y() as the pattern (default returns the template). I like this design more

Jawohl! :) - Since default is a keyword in Java, I used ScorerContext#def() instead. I fixed some JDoc issues, made all ScorerContext ctors private, and added a changes.txt entry. Seems like we are good to go.

Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2865.patch, LUCENE-2865.patch

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
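The copy-on-write pattern described above can be sketched as follows. This is a simplified illustration assembled from the discussion (the two existing booleans, private ctors, a shared def() template), not the exact committed code.

{code}
// Immutable context struct replacing the two naked booleans of
// Weight#scorer(AtomicReaderContext, boolean, boolean).
final class ScorerContext {
  private static final ScorerContext DEFAULT = new ScorerContext(false, false);

  final boolean scoreDocsInOrder;
  final boolean topScorer;

  private ScorerContext(boolean scoreDocsInOrder, boolean topScorer) {
    this.scoreDocsInOrder = scoreDocsInOrder;
    this.topScorer = topScorer;
  }

  // "default" is a Java keyword, hence def(); returns the shared template.
  static ScorerContext def() {
    return DEFAULT;
  }

  // Copy-on-write setters: return this when the value is unchanged,
  // otherwise a new immutable copy.
  ScorerContext scoreDocsInOrder(boolean value) {
    return value == scoreDocsInOrder ? this : new ScorerContext(value, topScorer);
  }

  ScorerContext topScorer(boolean value) {
    return value == topScorer ? this : new ScorerContext(scoreDocsInOrder, value);
  }
}
{code}

A call site would then read ScorerContext.def().scoreDocsInOrder(true).topScorer(true); asking for the default flags allocates nothing, because unchanged values hand back the shared instance, and adding a future flag like needsScoring only means one more field and setter rather than another naked boolean in every Weight#scorer signature.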
[jira] Commented: (SOLR-2311) FileListEntityProcessor Fields Stored in SolrDocument do not Match Documentation
[ https://issues.apache.org/jira/browse/SOLR-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981271#action_12981271 ]

Koji Sekiguchi commented on SOLR-2311:
--------------------------------------

Thank you for reporting this, Matt! For back-compat reasons, your patch:

{code}
@@ -254,7 +254,7 @@
     if (newerThan != null && lastModified.before(newerThan))
       return;
     details.put(DIR, dir.getAbsolutePath());
-    details.put(FILE, name);
+    details.put(FILE_NAME, name);
     details.put(ABSOLUTE_FILE, aFile.getAbsolutePath());
     details.put(SIZE, sz);
     details.put(LAST_MODIFIED, lastModified);
{code}

should be:

{code}
@@ -254,7 +254,7 @@
     if (newerThan != null && lastModified.before(newerThan))
       return;
     details.put(DIR, dir.getAbsolutePath());
     details.put(FILE, name);
+    details.put(FILE_NAME, name);
     details.put(ABSOLUTE_FILE, aFile.getAbsolutePath());
     details.put(SIZE, sz);
     details.put(LAST_MODIFIED, lastModified);
{code}

But IMO updating documentation is enough in this case.

FileListEntityProcessor Fields Stored in SolrDocument do not Match Documentation
---------------------------------------------------------------------------------

Key: SOLR-2311
URL: https://issues.apache.org/jira/browse/SOLR-2311
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 1.4.1
Environment: Java 1.6
Reporter: Matt Parker
Priority: Minor
Attachments: SOLR-2311.patch

The implicit fields generated by the FileListEntityProcessor do not match the documentation, which are listed in the following excerpt:

{quote}
The implicit fields generated by the FileListEntityProcessor are fileAbsolutePath, fileSize, fileLastModified, fileName and these are available for use within the entity X as shown above.
{quote}

The fileName field is not populated. The file's name is stored in the implicit field named file. The hashmap that holds the metadata (FileListEntityProcessor.java at line 255) stores the following using the associated constants:

{quote}
details.put(DIR, dir.getAbsolutePath());
details.put(FILE, name);
details.put(ABSOLUTE_FILE, aFile.getAbsolutePath());
details.put(SIZE, sz);
details.put(LAST_MODIFIED, lastModified);
{quote}

where DIR = fileDir, FILE = file, ABSOLUTE_FILE = fileAbsolutePath, SIZE = fileSize, and LAST_MODIFIED = fileLastModified.

Either the documentation must be updated, or the constant storing the return value must be updated.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
[ https://issues.apache.org/jira/browse/LUCENE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981273#action_12981273 ]

Uwe Schindler commented on LUCENE-2865:
---------------------------------------

+1 to commit, looks good. For later, we should fix BooleanQuery.explain() to use the default context, too. topScorer=true is wrong for explain (but has no effect here).

Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2865.patch, LUCENE-2865.patch

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-2865) Pass a context struct to Weight#scorer instead of naked booleans
[ https://issues.apache.org/jira/browse/LUCENE-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-2865.
-------------------------------------

Resolution: Fixed
Lucene Fields: [New, Patch Available] (was: [New])

Committed revision 1058592. thanks uwe for the review

Pass a context struct to Weight#scorer instead of naked booleans
----------------------------------------------------------------

Key: LUCENE-2865
URL: https://issues.apache.org/jira/browse/LUCENE-2865
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2865.patch, LUCENE-2865.patch

Weight#scorer(AtomicReaderContext, boolean, boolean) is hard to extend if another boolean like needsScoring or similar flags / information need to be passed to Scorers. An immutable struct would make such an extension trivial / way easier.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981305#action_12981305 ]

Jason Rutherglen commented on LUCENE-2324:
------------------------------------------

{quote}DocumentsWriterPerThreadPool.ThreadState now extends ReentrantLock, which means that standard methods like lock() and unlock() can be used to reserve a DWPT for a task.{quote}

Really? That makes synchronized seem simpler?

bq. the max. number of DWPTs allowed (config.maxThreadStates) is instantiated up-front.

What about the memory used, eg, the non-use of byte[] recycling? I guess it'll be cleared on flush.

bq. fix RAM tracking and flush-by-RAM

I created a BytesUsed object that cascades changes to parent BytesUsed objects; this allows each individual SD, DWPT, DW, etc to keep track of its own bytes used, while also propagating the changes to the higher level objects, eg, SD -> DWPT, DWPT -> DW.

Per thread DocumentsWriters that write their own private segments
------------------------------------------------------------------

Key: LUCENE-2324
URL: https://issues.apache.org/jira/browse/LUCENE-2324
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
Fix For: Realtime Branch
Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out

See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293:

Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
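The cascading counter Jason describes might look roughly like this; the real BytesUsed on the branch may differ, and this sketch only shows the propagation scheme (child counters forward every delta to their parent, so SD totals roll up into DWPT and DW totals).

{code}
import java.util.concurrent.atomic.AtomicLong;

// Hierarchical byte counter: each level tracks its own total while
// forwarding deltas up the chain, e.g. SD -> DWPT -> DW.
final class BytesUsed {
  private final BytesUsed parent;          // null at the root (DW) level
  private final AtomicLong bytes = new AtomicLong();

  BytesUsed(BytesUsed parent) {
    this.parent = parent;
  }

  void addBytes(long delta) {
    bytes.addAndGet(delta);
    if (parent != null) {
      parent.addBytes(delta);              // cascade to the higher level
    }
  }

  long get() {
    return bytes.get();
  }
}
{code}

Usage would look like: BytesUsed dw = new BytesUsed(null); BytesUsed dwpt = new BytesUsed(dw); dwpt.addBytes(1024); after which both dwpt.get() and dw.get() report 1024, which is exactly what flush-by-RAM needs at each level.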
Re: Initial committers list for Incubator Proposal
- Contributors will need to have anything they create reviewed by a Committer and ultimately included by a Committer. Some people find this frustrating if the Committers are slow to respond or critical of their work.
So in your responses, please be clear about whether you would like to offer your help as a Committer or as a Contributor. Thanks, Troy -- Simone Chiaretta Microsoft MVP ASP.NET - ASPInsider Blog: http://codeclimber.net.nz RSS: http://feeds2.feedburner.com/codeclimber twitter: @simonech Any sufficiently advanced technology is indistinguishable from magic Life is short, play hard
[jira] Created: (LUCENE-2867) Change contrib QP API that uses CharSequence as string identifier
Change contrib QP API that uses CharSequence as string identifier - Key: LUCENE-2867 URL: https://issues.apache.org/jira/browse/LUCENE-2867 Project: Lucene - Java Issue Type: Improvement Components: contrib/* Affects Versions: 3.0.3 Reporter: Adriano Crestani Priority: Minor Fix For: 3.0.4 There are some API methods on contrib queryparser that expect CharSequence as an identifier. This is wrong, since it may lead to incorrect or misleading behavior, as shown on LUCENE-2855. To avoid this problem, these APIs will be changed to enforce the use of String instead of CharSequence in version 4. This patch already deprecates the old API methods and adds new substitute methods that use only String. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
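To see why CharSequence identifiers are risky, consider this hypothetical lookup (not code from the patch): equals()/hashCode() are not defined consistently across CharSequence implementations, so a StringBuilder key never matches a String key in a HashMap.
{code}
import java.util.HashMap;
import java.util.Map;

public class CharSequenceKeyDemo {
  public static void main(String[] args) {
    Map<CharSequence, String> attrs = new HashMap<>();
    attrs.put("boost", "2.0");                    // String key
    CharSequence sameChars = new StringBuilder("boost");
    System.out.println(attrs.get(sameChars));     // prints null!
    System.out.println(attrs.get("boost"));       // prints 2.0
  }
}
{code}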
[jira] Commented: (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981322#action_12981322 ] Ahmet Arslan commented on SOLR-1604: Use the most recent file, the one that is not grayed out; each file also has a date attached. It works for "(a b) c"~10, which is equivalent to "a c"~10 OR "b c"~10. SurroundQueryParser does not use an Analyzer; it is recommended to lean heavily on the wildcard operator instead, e.g. instead of searching for "foo bar" you search for "foo* bar*". But if you are using StandardAnalyzer, which does not do stemming, I think you can use Surround. You can pre-lowercase your queries, etc. You can even pre-analyze your queries, since your analyzer does not inject new tokens. But your queries must be well formed; there is no default operator in this. I think it is better to discuss these things on the solr/lucene user mailing list. Wildcards, ORs etc inside Phrase Queries Key: SOLR-1604 URL: https://issues.apache.org/jira/browse/SOLR-1604 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Ahmet Arslan Priority: Minor Fix For: Next Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, SOLR-1604.patch Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
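A hedged usage sketch for the underlying ComplexPhraseQueryParser (LUCENE-1486) that this plugin wraps; the package path and constructor follow later Lucene releases and may differ in the code this issue targets:
{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.Query;

public class ComplexPhraseDemo {
  public static void main(String[] args) throws Exception {
    ComplexPhraseQueryParser parser =
        new ComplexPhraseQueryParser("body", new StandardAnalyzer());
    // Wildcards and ORs inside a phrase, with slop:
    Query q = parser.parse("\"(a b) c\"~10");
    System.out.println(q);
  }
}
{code}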
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981380#action_12981380 ] Michael Busch commented on LUCENE-2324: --- bq. Really? That makes synchronized seem simpler? Well, look at ThreadAffinityDocumentsWriterThreadPool. There I'm able to use things like tryLock() and getQueueLength(). Also, DocumentsWriterPerThreadPool has a getAndLock() method that can be used by DW for addDocument(), whereas DW.flush(), which needs to iterate the DWPTs, can lock the individual DWPTs directly. I think it's simpler, but I'm open to other suggestions of course :) bq. What about the memory used, e.g., the non-use of byte[] recycling? I guess it'll be cleared on flush. Yeah, sure. That is independent of whether they're all created upfront or not. But yeah, after flush or abort we need to clear the DWPT's state to make sure they're not consuming unused RAM (as you described in your earlier comment). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
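A hedged sketch of the locking scheme described in these comments; the names follow the discussion, but the real branch code differs in detail:
{code}
import java.util.concurrent.locks.ReentrantLock;

final class DWPT { /* stands in for DocumentsWriterPerThread */ }

// ThreadState IS-A lock here, so callers get tryLock()/getQueueLength()
// for free when picking a DWPT.
final class ThreadState extends ReentrantLock {
  final DWPT dwpt;
  ThreadState(DWPT dwpt) { this.dwpt = dwpt; }
}

final class Pool {
  private final ThreadState[] states;

  Pool(int n) {
    states = new ThreadState[n];
    for (int i = 0; i < n; i++) states[i] = new ThreadState(new DWPT());
  }

  // Used by addDocument(): grab any free DWPT, or block on the first one
  // if all are busy. flush() can instead lock() each ThreadState directly
  // while it iterates the DWPTs.
  ThreadState getAndLock() {
    for (ThreadState s : states) {
      if (s.tryLock()) return s;
    }
    states[0].lock();
    return states[0];
  }
}
{code}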
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981388#action_12981388 ] Earwin Burrfoot commented on LUCENE-2324: - Maan, this comment list is infinite. How do I currently get the ..er.. current version? Latest branch + latest Jason's patch? Regardless of everything else, I'd ask you not to extend random things :) at least if you can't say is-a about them. DocumentsWriterPerThreadPool.ThreadState IS A ReentrantLock? No. So you're better off encapsulating it rather than extending. Same can be applied to SegmentInfos that extends Vector :/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
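For contrast, a minimal sketch of the HAS-A alternative Earwin is arguing for (illustrative only):
{code}
// Composition instead of inheritance: ThreadState holds a lock rather
// than being one, exposing only the operations callers actually need.
import java.util.concurrent.locks.ReentrantLock;

final class ThreadStateEncapsulated {
  private final ReentrantLock lock = new ReentrantLock();

  boolean tryLock()    { return lock.tryLock(); }
  void unlock()        { lock.unlock(); }
  int getQueueLength() { return lock.getQueueLength(); }
  // Nothing else from ReentrantLock's API leaks into callers.
}
{code}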
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981390#action_12981390 ] Michael Busch commented on LUCENE-2324: --- bq. How do I currently get the ..er.. current version? Just do 'svn up' on the RT branch. bq. Regardless of everything else, I'd ask you not to extend random things This was a conscious decision, not random. Extending ReentrantLock is not an uncommon pattern, e.g. ConcurrentHashMap.Segment does exactly that. ThreadState basically is nothing but a lock that has a reference to the corresponding DWPT it protects. I encourage you to look at the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981453#action_12981453 ] Dawid Weiss commented on SOLR-2282: --- I confirm this must be something related to concurrency, although from whitebox code review I have no clue how this can happen. Seems like a long, fascinating weekend is waiting for me (I am busy tomorrow and won't be able to look into it). What is weird is that we're running this code on our demo server, we do have parallel stress tests and still this happens only here. Life. {noformat} test: [junit] Testsuite: org.apache.solr.handler.clustering.DistributedClusteringComponentTest [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 12.311 sec [junit] - Standard Error - [junit] 2011-1-13 20:05:39 org.apache.solr.common.SolrException log [junit] SEVERE: java.lang.Error: Error: could not match input [junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.zzScanError(ExtendedWhitespaceTokenizerImpl.java:687) [junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.getNextToken(ExtendedWhitespaceTokenizerImpl.java:836) [junit] at org.carrot2.text.analysis.ExtendedWhitespaceTokenizer.nextToken(ExtendedWhitespaceTokenizer.java:46) [junit] at org.carrot2.text.preprocessing.Tokenizer.tokenize(Tokenizer.java:147) [junit] at org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline.preprocess(CompletePreprocessingPipeline.java:54) [junit] at org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline.preprocess(BasicPreprocessingPipeline.java:92) [junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.cluster(LingoClusteringAlgorithm.java:198) [junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.access$000(LingoClusteringAlgorithm.java:43) [junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm$1.process(LingoClusteringAlgorithm.java:177) [junit] at org.carrot2.text.clustering.MultilingualClustering.clusterByLanguage(MultilingualClustering.java:223) [junit] at org.carrot2.text.clustering.MultilingualClustering.process(MultilingualClustering.java:111) [junit] at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.process(LingoClusteringAlgorithm.java:170) [junit] at org.carrot2.core.ControllerUtils.performProcessing(ControllerUtils.java:102) [junit] at org.carrot2.core.Controller.process(Controller.java:347) [junit] at org.carrot2.core.Controller.process(Controller.java:239) [junit] at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:106) [junit] at org.apache.solr.handler.clustering.ClusteringComponent.finishStage(ClusteringComponent.java:167) [junit] at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:336) [junit] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) [junit] at org.apache.solr.core.SolrCore.execute(SolrCore.java:1296) [junit] at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) [junit] at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240) [junit] at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) [junit] at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) [junit] at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) [junit] at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) [junit] at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) [junit] at org.mortbay.jetty.Server.handle(Server.java:326) [junit] at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) [junit] at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) [junit] at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) [junit] at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) [junit] at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) [junit] at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) [junit] at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) [junit] [junit] NOTE: reproduce with: ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=8909233178291932652:-4859244606911873252 [junit] The following exceptions were thrown by threads: [junit] *** Thread: Thread-28 *** [junit] junit.framework.AssertionFailedError:
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981501#action_12981501 ] Robert Muir commented on SOLR-2282: --- Guys, thanks for the debugging help already. Just as a side note: for these tricky non-reproducible ones, sometimes it's helpful to use something like -Dtests.iter=10; it's just a convenient way to run the test method multiple times. Distributed Support for Search Result Clustering Key: SOLR-2282 URL: https://issues.apache.org/jira/browse/SOLR-2282 Project: Solr Issue Type: New Feature Components: contrib - Clustering Affects Versions: 1.4, 1.4.1 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2282-diagnostics.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282_test.patch Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
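As a concrete example of that tip, a run that repeats the failing method with the seed from the log above; the flag spellings follow these reports and should be treated as assumptions for your checkout:
{code}
ant test -Dtestcase=DistributedClusteringComponentTest \
    -Dtestmethod=testDistribSearch \
    -Dtests.iter=10 \
    -Dtests.seed=8909233178291932652:-4859244606911873252
{code}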
Lucene-Solr-tests-only-trunk - Build # 3732 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3732/ 4 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock Error Message: null Stack Trace: junit.framework.AssertionFailedError: at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059) at org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:2135) REGRESSION: org.apache.lucene.index.TestIndexWriter.testTermUTF16SortOrder Error Message: this writer hit an OutOfMemoryError; cannot commit Stack Trace: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2334) at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2416) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2398) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2382) at org.apache.lucene.index.RandomIndexWriter.commit(RandomIndexWriter.java:114) at org.apache.lucene.index.TestIndexWriter.testTermUTF16SortOrder(TestIndexWriter.java:2416) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059) REGRESSION: org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting Error Message: GC overhead limit exceeded Stack Trace: java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.lucene.index.ParallelPostingsArray.<init>(ParallelPostingsArray.java:33) at org.apache.lucene.index.TermVectorsTermsWriterPerField$TermVectorsPostingsArray.<init>(TermVectorsTermsWriterPerField.java:274) at org.apache.lucene.index.TermVectorsTermsWriterPerField$TermVectorsPostingsArray.newInstance(TermVectorsTermsWriterPerField.java:285) at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48) at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:306) at org.apache.lucene.util.BytesRefHash.addByPoolOffset(BytesRefHash.java:375) at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:141) at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:238) at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:168) at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:248) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:743) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1266) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1240) at org.apache.lucene.index.TestIndexWriter.testIndexingThenDeleting(TestIndexWriter.java:2604) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059) REGRESSION: org.apache.lucene.index.TestIndexWriter.testRandomStoredFields Error Message: this writer hit an OutOfMemoryError; cannot flush Stack Trace: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot flush at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2484) at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2473) at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1273) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1240) at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:95) at org.apache.lucene.index.TestIndexWriter.testRandomStoredFields(TestIndexWriter.java:2830) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1127) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1059) Build Log (for compile errors): [...truncated 3145 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981519#action_12981519 ] Jason Rutherglen commented on LUCENE-2324: -- bq. look at ThreadAffinityDocumentsWriterThreadPool. There I'm able to use things like tryLock() and getQueueLength(). Makes sense, I had only read the DocumentsWriterPerThreadPool part. * DWPT.perDocAllocator and freeLevel can be removed? * DWPT's RecyclingByteBlockAllocator -> DirectAllocator? * Looks like the deletes handling is updated in the patch * I don't think we need FlushControl anymore, as the RAM tracking should occur in DW and there's no need for IW to [globally] wait for flushes. * The locking is clearer now: I can see DW.updateDocument locks the threadstate, as does flushAllThreads. I'll reincorporate the RAM tracking and then try the unit tests again. I'm curious if the file not found errors are gone. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2867) Change contrib QP API that uses CharSequence as string identifier
[ https://issues.apache.org/jira/browse/LUCENE-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani updated LUCENE-2867: - Attachment: lucene_2867_adriano_crestani_2011_01_13.patch Here is the patch that deprecates methods using CharSequence. Can someone please review whether I did the API deprecation correctly? I was thinking initially that deprecated methods would be removed in version 4, but I'm not sure anymore. Will they be removed in 4.0 or 3.1? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
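A hedged illustration of the deprecation pattern under discussion; the method names are hypothetical, not from the actual patch:
{code}
public class QueryNodeExample {
  /** @deprecated use {@link #getFieldAsString()} instead. */
  @Deprecated
  public CharSequence getField() {
    return getFieldAsString();
  }

  /** Replacement API: identifiers are plain Strings. */
  public String getFieldAsString() {
    return "field";
  }
}
{code}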
[jira] Updated: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections
[ https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-1540: Attachment: LUCENE-1540.patch Initial patch - against 3.x - not ready to commit - refactors parsing of TREC text from TrecContentSource into an interface, TrecDocParser, currently with a single impl, TrecGov2Parser. The interaction between TCS and TDP is less clean than I hoped, for two reasons: # trying to keep the synchronization pattern added a while ago to that class, in which the reading of data from the file is synced but the parsing can go in parallel. For this reason there are two methods in that interface. # allowing the TDP impls to use whatever is in TCS required exposing some of its methods, and also passing TCS as a param to TDP. With this patch: # TDP was cleaned up to use ContentSource's method getInputStream(), which also supports .gz, .bz2, and plain text (before the patch only .gz was supported). # it should be easy to add parsers for other formats. I removed the retry logic for opening the stream - I don't remember why it was added in the first place and it seems strange: if opening failed on the first try, why would the next try succeed? Remaining to do: - add parsers for the other formats - add tests for the other formats, and also for bz2 and plain text - allow a single run to ingest files of different formats (needed for the disks 4+5 track) - fix some documentation - allow specifying the TDP to use in a property - changes.txt - port to trunk, so as to first commit in trunk and then backport to 3.x. Improvements to contrib.benchmark for TREC collections -- Key: LUCENE-1540 URL: https://issues.apache.org/jira/browse/LUCENE-1540 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Affects Versions: 2.4 Reporter: Tim Armstrong Assignee: Doron Cohen Priority: Minor Attachments: LUCENE-1540.patch The benchmarking utilities for TREC test collections (http://trec.nist.gov) are quite limited and do not support some of the variations in format of older TREC collections. I have been doing some benchmarking work with Lucene and have had to modify the package to support: * Older TREC document formats, which the current parser fails on due to missing document headers. * Variations in query format - newlines after title tag causing the query parser to get confused. * Ability to detect and read in uncompressed text collections * Storage of document numbers by default without storing full text. I can submit a patch if there is interest, although I will probably want to write unit tests for the new functionality first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
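A hedged sketch of the two-method interface described above; the method names and signatures are assumptions, not the actual patch. The split mirrors the synchronization pattern: reading happens under TrecContentSource's lock, while parsing can run in parallel.
{code}
import org.apache.lucene.benchmark.byTask.feeds.DocData;
import org.apache.lucene.benchmark.byTask.feeds.TrecContentSource;

public interface TrecDocParser {
  /** Called while synchronized on the content source: read one raw doc. */
  StringBuilder readNextDoc(TrecContentSource tcs) throws java.io.IOException;

  /** Called outside the lock: turn the raw text into a DocData. */
  DocData parse(TrecContentSource tcs, DocData reuse, StringBuilder rawDoc);
}
{code}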
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981548#action_12981548 ] Michael Busch commented on LUCENE-2324: --- bq. DWPT.perDocAllocator and freeLevel can be removed? Done. bq. DWPT's RecyclingByteBlockAllocator -> DirectAllocator? Done. Also removed more recycling code. bq. I don't think we need FlushControl anymore as the RAM tracking should occur in DW and there's no need for IW to [globally] wait for flushes. I removed flushControl from DW. bq. I'm curious if the file not found errors are gone. I think there's something wrong with TermVectors - several related test cases fail. We need to investigate more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2311) FileListEntityProcessor Fields Stored in SolrDocument do not Match Documentation
[ https://issues.apache.org/jira/browse/SOLR-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981558#action_12981558 ] Matt Parker commented on SOLR-2311: --- I'm not sure I agree. I thought the change would improve the code's clarity. File is actually a misnomer for what is captured in the field; fileName would be more appropriate. Also, I thought the test case I wrote would have been of value and worth including. Regardless of whether it's accepted, the documentation needs to be changed to reflect whatever you decide to implement. FileListEntityProcessor Fields Stored in SolrDocument do not Match Documentation Key: SOLR-2311 URL: https://issues.apache.org/jira/browse/SOLR-2311 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.4.1 Environment: Java 1.6 Reporter: Matt Parker Priority: Minor Attachments: SOLR-2311.patch The implicit fields generated by the FileListEntityProcessor do not match the documentation, as listed in the following excerpt: {quote} The implicit fields generated by the FileListEntityProcessor are fileAbsolutePath, fileSize, fileLastModified, fileName and these are available for use within the entity X as shown above. {quote} The fileName field is not populated. The file's name is stored in the implicit field named file. The hashmap that holds the metadata (FileListEntityProcessor.java at line 255) stores the following using the associated constants: {quote} details.put(DIR, dir.getAbsolutePath()); details.put(FILE, name); details.put(ABSOLUTE_FILE, aFile.getAbsolutePath()); details.put(SIZE, sz); details.put(LAST_MODIFIED, lastModified); {quote} where DIR = fileDir, FILE = file, ABSOLUTE_FILE = fileAbsolutePath, SIZE = fileSize, and LAST_MODIFIED = fileLastModified. Either the documentation must be updated, or the constant storing the return value must be updated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
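A hedged sketch of the one-line fix the issue implies; the constant values come from the issue text, while the interface shape is only illustrative:
{code}
// Make the implicit field name match the documentation.
public interface FileListEntityProcessorConstants {
  String DIR = "fileDir";
  String FILE = "fileName";          // was "file"; the docs say "fileName"
  String ABSOLUTE_FILE = "fileAbsolutePath";
  String SIZE = "fileSize";
  String LAST_MODIFIED = "fileLastModified";
}
{code}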
[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2324: - Attachment: test.out Here's the latest test.out. There's a lot of these: {code} [junit] junit.framework.AssertionFailedError: IndexFileDeleter doesn't know about file _5.fdt [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1156) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1088) [junit] at org.apache.lucene.index.IndexWriter.filesExist(IndexWriter.java:3273) [junit] at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3321) [junit] at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2339) [junit] at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2410) [junit] at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1083) [junit] at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1027) [junit] at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:991) [junit] at org.apache.lucene.index.TestAddIndexes.testMergeAfterCopy(TestAddIndexes.java:432) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2867) Change contrib QP API that uses CharSequence as string identifier
[ https://issues.apache.org/jira/browse/LUCENE-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981640#action_12981640 ] Simon Willnauer commented on LUCENE-2867: - bq. Here is the patch that deprecates methods using CharSequence. Can someone please review whether I did the API deprecation correctly? Those comments and annotations look good! bq. I was thinking initially that deprecated methods would be removed in version 4, but I'm not sure anymore. Will they be removed in 4.0 or 3.1? You should drop those methods from the 4.0 code if you want to deprecate. Since this is a contrib you don't necessarily need to deprecate, so you could also drop them from 3.1 or even from 3.0.x. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981649#action_12981649 ] Nick Pellow commented on LUCENE-2666: - Hi, I am getting this issue as well. We are doing quite a lot of updates during indexing. Could this be causing the problem? This seems to only have happened when we deployed to our Linux test server - it didn't appear to occur on Mac OS X during development, with the same data set. Does this only affect Lucene 3.0.2? Would a rollback be a good workaround? ArrayIndexOutOfBoundsException when iterating over TermDocs --- Key: LUCENE-2666 URL: https://issues.apache.org/jira/browse/LUCENE-2666 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 3.0.2 Reporter: Shay Banon A user got this very strange exception, and I managed to get the index that it happens on. Basically, iterating over the TermDocs causes an AIOOB exception. I easily reproduced it using the FieldCache, which does exactly that (the field in question is indexed as numeric). Here is the exception: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127) at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501) at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183) at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470) at TestMe.main(TestMe.java:56) It happens on the following segment: _26t docCount: 914 delCount: 1 delFileName: _26t_1.del And as you can see, it smells like a corner case (it fails for document number 912, the AIOOB happens from the deleted docs). The code to recreate it is simple: FSDirectory dir = FSDirectory.open(new File("index")); IndexReader reader = IndexReader.open(dir, true); IndexReader[] subReaders = reader.getSequentialSubReaders(); for (IndexReader subReader : subReaders) { Field field = subReader.getClass().getSuperclass().getDeclaredField("si"); field.setAccessible(true); SegmentInfo si = (SegmentInfo) field.get(subReader); System.out.println("--" + si); if (si.getDocStoreSegment().contains("_26t")) { // this is the problematic one... System.out.println("problematic one..."); FieldCache.DEFAULT.getLongs(subReader, "__documentdate", FieldCache.NUMERIC_UTILS_LONG_PARSER); } } Here is the result of a check index on that segment: 8 of 10: name=_26t docCount=914 compound=true hasProx=true numFiles=2 size (MB)=1.641 diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.} has deletions [delFileName=_26t_1.del] test: open reader.OK [1 deleted docs] test: fields..OK [32 fields] test: field norms.OK [32 fields] test: terms, freq, prox...ERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127) at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102) at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299) at TestMe.main(TestMe.java:47) test: stored fields...ERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34) at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512) at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299) at TestMe.main(TestMe.java:47) test: term vectorsERROR [114] java.lang.ArrayIndexOutOfBoundsException: 114 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34) at org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721) at
[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981650#action_12981650 ] Nick Pellow commented on LUCENE-2666: - I've also noticed this occurring since I started using a numeric field and accessing its field cache for boosting. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981649#action_12981649 ] Nick Pellow edited comment on LUCENE-2666 at 1/14/11 2:17 AM: -- Hi, I am getting this issue as well. We are doing quite a lot of updates during indexing. Could this be causing the problem? This seems to only have happened when we deployed to our Linux test server - it didn't appear to occur on Mac OS X during development, with the same data set. Does this only affect Lucene 3.0.2? Would a rollback be a good workaround? The exact stack trace: {code} java.lang.ArrayIndexOutOfBoundsException: 5475 at org.apache.lucene.util.BitVector.get(BitVector.java:104) at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127) at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102) at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java:207) at org.apache.lucene.search.PhrasePositions.skipTo(PhrasePositions.java:52) at org.apache.lucene.search.PhraseScorer.advance(PhraseScorer.java:120) at org.apache.lucene.search.IndexSearcher.searchWithFilter(IndexSearcher.java:249) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:218) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:199) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:177) at org.apache.lucene.search.MultiSearcher$MultiSearcherCallableWithSort.call(MultiSearcher.java:410) at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:230) at org.apache.lucene.search.Searcher.search(Searcher.java:49) {code} was (Author: npellow): Hi, I am getting this issue as well. We are doing quite a lot of updates during indexing. Could this be causing the problem? This seems to only have happened when we deployed to our Linux test server - it didn't appear to occur on Mac OS X during development, with the same data set. Does this only affect Lucene 3.0.2? Would a rollback be a good workaround? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12981651#action_12981651 ] Dawid Weiss commented on SOLR-2282: --- This tests.iter is exactly what I will need :) I'll most likely weave a runtime aspect into the code to verify when two threads enter the same critical section. Again, from whitebox review it seems impossible, but then actually detecting and fixing impossible things are what we love in our profession... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
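A minimal sketch of the check Dawid describes, whether woven in as an aspect or called by hand; this guard fails fast if two threads ever enter a section that is assumed single-threaded:
{code}
import java.util.concurrent.atomic.AtomicInteger;

final class CriticalSectionGuard {
  private final AtomicInteger inside = new AtomicInteger();

  void enter() {
    if (inside.incrementAndGet() != 1) {
      throw new AssertionError("two threads inside the critical section");
    }
  }

  void exit() {
    inside.decrementAndGet();
  }
}
{code}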
[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-2657: Attachment: LUCENE-2657.patch * Set snapshot {{updatePolicy}} to {{never}} for both the {{apache.snapshots}} and the {{carrot2.org}} Maven repositories, so that they won't constantly be checked for snapshot updates. * Consolidated distribution-related profiles to just one named {{dist}}; Solr-specific noggit and commons-csv jars are now properly placed in {{solr/dist/maven/}} when deploying with the {{dist}} profile. To populate both {{lucene/dist/maven/}} and {{solr/dist/maven/}}, run from the top level: {code} mvn -Pdist -DskipTests deploy {code} To populate only {{lucene/dist/maven/}}, run from the top level: {code} mvn -N -Pdist deploy cd lucene mvn -Pdist -DskipTests deploy cd ../modules mvn -Pdist -DskipTests deploy {code} Replace Maven POM templates with full POMs, and change documentation accordingly Key: LUCENE-2657 URL: https://issues.apache.org/jira/browse/LUCENE-2657 Project: Lucene - Java Issue Type: Improvement Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Fix For: 3.1, 4.0 Attachments: LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch, LUCENE-2657.patch The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. The full Maven POMs in the attached patch include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates. Several dependencies are not available through public maven repositories. A profile in the top-level POM can be activated to install these dependencies from the various {{lib/}} directories into your local repository. From the top-level directory: {code} mvn -N -Pbootstrap install {code} Once these non-Maven dependencies have been installed, to run all Lucene/Solr tests via Maven's surefire plugin, and populate your local repository with all artifacts, from the top level directory, run: {code} mvn install {code} When one Lucene/Solr module depends on another, the dependency is declared on the *artifact(s)* produced by the other module and deposited in your local repository, rather than on the other module's un-jarred compiler output in the {{build/}} directory, so you must run {{mvn install}} on the other module before its changes are visible to the module that depends on it. To create all the artifacts without running tests: {code} mvn -DskipTests install {code} I almost always include the {{clean}} phase when I do a build, e.g.: {code} mvn -DskipTests clean install {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981662#action_12981662 ]

Steven Rowe edited comment on LUCENE-2657 at 1/14/11 2:57 AM:
--------------------------------------------------------------

* Set snapshot {{updatePolicy}} to {{never}} for both the {{apache.snapshots}} and the {{carrot2.org}} Maven repositories, so that they won't constantly be checked for snapshot updates.
* Consolidated distribution-related profiles to just one named {{dist}}
* Solr-specific noggit and commons-csv jars are now properly placed in {{solr/dist/maven/}} when deploying with the {{dist}} profile
* No longer setting the repositories' {{uniqueVersion}} to {{false}} when deploying under the {{dist}} profile; as a result, snapshot artifacts' names will include build timestamps instead of {{SNAPSHOT}} in {{*/dist/maven/}}.
* {{mvn clean}} from {{lucene/src/}} and {{solr/src/}} now removes {{lucene/dist/}} and {{solr/dist/}}, respectively, in addition to the build directories already being removed.

To populate both {{lucene/dist/maven/}} and {{solr/dist/maven/}}, run from the top level:

{code}
mvn -Pdist -DskipTests deploy
{code}

To populate only {{lucene/dist/maven/}}, run from the top level:

{code}
mvn -N -Pdist deploy
cd lucene
mvn -Pdist -DskipTests deploy
cd ../modules
mvn -Pdist -DskipTests deploy
{code}
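For reference, the {{uniqueVersion}} switch mentioned above is a Maven 2-era setting in the POM's {{distributionManagement}} section; a minimal sketch of the configuration that was removed, with placeholder id and URL:

{code}
<distributionManagement>
  <snapshotRepository>
    <!-- id and url are placeholders for the */dist/maven/ file repository -->
    <id>dist.maven</id>
    <url>file:../dist/maven</url>
    <!-- removed by the patch: with the default (true), snapshots now deploy
         under timestamped file names instead of -SNAPSHOT -->
    <uniqueVersion>false</uniqueVersion>
  </snapshotRepository>
</distributionManagement>
{code}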
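And the extended {{mvn clean}} behavior from the last bullet is the sort of thing usually wired up with extra {{maven-clean-plugin}} filesets; again a hedged sketch, with an illustrative directory rather than the patch's actual paths:

{code}
<!-- Sketch: extending mvn clean to remove a directory outside target/.
     The directory shown is illustrative; the patch wires up lucene/dist/
     and solr/dist/ from lucene/src/ and solr/src/ respectively. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-clean-plugin</artifactId>
  <configuration>
    <filesets>
      <fileset>
        <directory>${basedir}/../dist</directory>
      </fileset>
    </filesets>
  </configuration>
</plugin>
{code}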