[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649284#comment-13649284 ]

Steve Rowe commented on LUCENE-4956:

bq. I've created branch lucene4956 and checked in an arirang module in lucene/analysis. I've added a basic test that tests segmentation, offsets, etc.

Cool!

bq. License headers have been added to all source code files

I can see one that doesn't have one: TestKoreanAnalyzer.java. I'll take a pass over all the files.

bq. Eclipse is TODO.

I ran {{ant eclipse}} and it seemed to do the right thing already - I can see Arirang entries in the .classpath file that gets produced - I don't think there's anything to be done. I don't use Eclipse, though, so I can't be sure.

I added Maven config and an IntelliJ Arirang module test run configuration.

the korean analyzer that has a korean morphological analyzer and dictionaries

Key: LUCENE-4956
URL: https://issues.apache.org/jira/browse/LUCENE-4956
Project: Lucene - Core
Issue Type: New Feature
Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
Labels: newbie
Attachments: kr.analyzer.4x.tar

The Korean language has specific characteristics. When developing a search service with Lucene/Solr in Korean, there are some problems in searching and indexing. The Korean analyzer solves these problems with a Korean morphological analyzer. It consists of a Korean morphological analyzer, dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene and Solr. If you develop a search service with Lucene in Korean, it is the best idea to choose the Korean analyzer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649296#comment-13649296 ]

Christian Moen commented on LUCENE-4956:

Thanks, Steve. I've added the missing license header to {{TestKoreanAnalyzer.java}}.
[jira] [Updated] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-4975:

Attachment: LUCENE-4975.patch

Added testConsistencyOnException to test the client and handlers' behavior when they encounter exceptions (I use MockDirWrapp diskFull and randomIOE to simulate that).

I think this module is basically ready. I.e. it comes with tests, javadocs and pretty much does what it was written to do. I'm sure there's room for improvement, but I don't think this should hold off the commit. So unless there are any objections, I intend to commit it by Tuesday. If people want to do a thorough review, I don't mind waiting with the commit, but just drop a comment on the issue to let me know.

Add Replication module to Lucene

Key: LUCENE-4975
URL: https://issues.apache.org/jira/browse/LUCENE-4975
Project: Lucene - Core
Issue Type: New Feature
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch

I wrote a replication module which I think will be useful to Lucene users who want to replicate their indexes for e.g. high-availability, taking hot backups, etc. I will upload a patch soon where I'll describe in general how it works.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649300#comment-13649300 ]

Uwe Schindler commented on LUCENE-4956:

I have seen that the Tokenizer also uses JFlex, but an older version than the one used for Lucene's other tokenizers (like StandardTokenizer). Can we add ANT tasks, like we have for StandardTokenizer, to regenerate the source file from build.xml? Finally, we should regenerate the Java files with the JFlex trunk version and compare with the ones committed here (if there are differences).
Re: VOTE: solr no longer webapp
+1
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b86) - Build # 5504 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/5504/
Java: 32bit/jdk1.8.0-ea-b86 -server -XX:+UseG1GC

1 tests failed.
REGRESSION: org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest.testRandom

Error Message: doc=336 maxDoc=336

Stack Trace:
java.lang.AssertionError: doc=336 maxDoc=336
    at __randomizedtesting.SeedInfo.seed([3847A35261F6C0F0:4A0B865DD0967683]:0)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:456)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:494)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2650)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2794)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2774)
    at org.apache.lucene.index.RandomIndexWriter.maybeCommit(RandomIndexWriter.java:163)
    at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:155)
    at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:114)
    at org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest.testRandom(AllGroupHeadsCollectorTest.java:293)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:490)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
    at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at
Re: VOTE: solr no longer webapp
On 5 May 2013 09:07, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:
 I feel the same as Shawn; I was quite skeptical until the reasons were finally given. And I agree that the war file distribution needs to stick around longer. [...]

Agreed on both points. Given the reasons put forward, I have seen the error of my ways, and am now +1 for dropping the .war. Phasing in this change, with a deprecation announced in the 4.x series, would help users ease into the change.

Regards,
Gora

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b86) - Build # 5504 - Failure!
Doesn't reproduce on java7 or java 8.

On Sun, May 5, 2013 at 9:19 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote:
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/5504/
 Java: 32bit/jdk1.8.0-ea-b86 -server -XX:+UseG1GC
 1 tests failed.
 REGRESSION: org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest.testRandom
 Error Message: doc=336 maxDoc=336
 [...]
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b86) - Build # 5504 - Failure!
I'm away from home, but I just thought: should have tried the master seed + tests.jvms from the grouping module. If we can repro, we could then start switching off hotspot flags and so on.

On May 5, 2013 11:12 AM, Robert Muir rcm...@gmail.com wrote:
 Doesn't reproduce on java7 or java 8.

 On Sun, May 5, 2013 at 9:19 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote:
  Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/5504/
  Java: 32bit/jdk1.8.0-ea-b86 -server -XX:+UseG1GC
  [...]
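Rerunning a randomized-test failure with the master seed, as suggested above, usually looks something like the following sketch. The property names follow the randomizedtesting-based Lucene build; the module path and the use of the master seed alone (the first half of the seed chain in the report) are assumptions here:

```
# Sketch: rerun the failing test with the master seed from the report above
cd lucene/grouping
ant test -Dtestcase=AllGroupHeadsCollectorTest -Dtests.method=testRandom \
    -Dtests.seed=3847A35261F6C0F0 -Dtests.jvms=1
```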
[jira] [Commented] (SOLR-4448) Allow the solr internal load balancer to be more easily pluggable.
[ https://issues.apache.org/jira/browse/SOLR-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649378#comment-13649378 ]

Shawn Heisey commented on SOLR-4448:

[~rjernst] first let me say that I am having this discussion because what you're saying goes against my limited understanding, and by stating what I think and listening to your response, I might learn something. You probably already know the things that I am saying. I might even find that I misunderstood what you were saying and that I agree with you.

bq. Load balancing is used by distributed search. It happens to also be used for uploading documents, which is a client feature. Clients shouldn't be using this for sending distributed search requests. Solr does that.

I've just done a non-detailed review of CloudSolrServer. It uses a new LBHttpSolrServer object with a customized URL list for every request. Queries get sent to all replicas; updates only get sent to leaders. A TODO says that currently there is no support in the object for sending updates to the correct leader based on a hashing algorithm.

Outside of SolrCloud, the LB object makes sense for clients in master-slave replication environments, but only on the query side. Updates have to be directed to the master only. A separate load balancer does give you more flexibility, but not everyone wants to invest the time (or possibly money) required.

If the client on the server side and the client on the client side need identical functionality, then the existing situation makes sense -- one implementation in the org.apache.solr.client.solrj namespace. If we think they'll ever diverge, even a little bit, then having an abstract class in the org.apache.solr.common namespace makes sense, although it should still be in the solrj source tree.

Allow the solr internal load balancer to be more easily pluggable.

Key: SOLR-4448
URL: https://issues.apache.org/jira/browse/SOLR-4448
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: philip hoy
Priority: Minor
Attachments: SOLR-4448.patch, SOLR-4448.patch

Widen some access level modifiers to allow the load balancer to be extended and plugged into an HttpShardHandler instance using an extended HttpShardHandlerFactory.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649379#comment-13649379 ]

Christian Moen commented on LUCENE-4956:

Good points, Uwe. I'll look into this.
[jira] [Commented] (SOLR-4448) Allow the solr internal load balancer to be more easily pluggable.
[ https://issues.apache.org/jira/browse/SOLR-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649384#comment-13649384 ]

Shawn Heisey commented on SOLR-4448:

My review of CloudSolrServer obviously wasn't deep enough. I thought it was making a new LB object for every request, which seemed very inefficient. Turns out it was making a new LBHttpSolrServer.Req object. The Req class includes a URL list.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649395#comment-13649395 ]

Steve Rowe commented on LUCENE-4956:

bq. Thanks, Steve. I've added the missing license header to TestKoreanAnalyzer.java.

I looked over the rest of the files, and the only things missing license headers are the dictionary files and the {{korean.properties}} file, all under {{src/resources/}}. I committed a license header to {{korean.properties}}. I tried adding '#'-commented-out headers to the .dic files (a couple of them already have '##' and '//##' lines), but that triggered a test failure, so more work will need to be done to make the license headers inline in the dictionary files.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649404#comment-13649404 ]

Steve Rowe commented on LUCENE-4956:

bq. Could you comment about the origins and authorship of org.apache.lucene.analysis.kr.utils.StringUtil in your tar file? I'm seeing a lot of authors in this file. Is this from Apache Commons Lang? Thanks!

I looked at the file content, and it's definitely from Apache Commons Lang, sometime early 2010, maybe with a little pulled in from another Commons Lang version. I've eliminated StringUtil - it's almost all calls to StringUtils.split(String, separators) - its javadoc is:

{code:java}
/**
 * <p>Splits the provided text into an array, separators specified.
 * This is an alternative to using StringTokenizer.</p>
 *
 * <p>The separator is not included in the returned String array.
 * Adjacent separators are treated as one separator.
 * For more control over the split use the StrTokenizer class.</p>
 *
 * <p>A <code>null</code> input String returns <code>null</code>.
 * A <code>null</code> separatorChars splits on whitespace.</p>
 *
 * <pre>
 * StringUtil.split(null, *)         = null
 * StringUtil.split("", *)           = []
 * StringUtil.split("abc def", null) = ["abc", "def"]
 * StringUtil.split("abc def", " ")  = ["abc", "def"]
 * StringUtil.split("abc  def", " ") = ["abc", "def"]
 * StringUtil.split("ab:cd:ef", ":") = ["ab", "cd", "ef"]
 * </pre>
 *
 * @param str  the String to parse, may be null
 * @param separatorChars  the characters used as the delimiters,
 *  <code>null</code> splits on whitespace
 * @return an array of parsed Strings, <code>null</code> if null String input
 */
{code}

I'm replacing calls to this method with calls to String.split(regex), where regex is [char]+, and char is the (in all cases singular) split character. I'll commit the changes and the StringUtil.java removal in a little bit once I've got it compiling and the tests succeed.
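The replacement described above - String.split with a [char]+ character-class regex - can be sketched as follows. The class name SplitSketch and the separator characters are illustrative only:

```java
import java.util.Arrays;

public class SplitSketch {
    public static void main(String[] args) {
        // The "+" after the character class collapses runs of the separator,
        // matching StringUtil.split's "adjacent separators are treated as
        // one separator" behavior for these inputs.
        String[] parts = "ab:cd:ef".split("[:]+");
        System.out.println(Arrays.toString(parts));   // [ab, cd, ef]

        String[] words = "abc  def".split("[ ]+");
        System.out.println(Arrays.toString(words));   // [abc, def]
    }
}
```

Note the two methods are not equivalent in all corner cases: String.split throws a NullPointerException on a null input rather than returning null, and a leading separator yields an empty first element, so the swap only holds for inputs like the ones above.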
[jira] [Updated] (SOLR-4478) Allow cores to specify a named config set
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-4478: - Attachment: SOLR-4478.patch Updated patch, with a problem. First I had the bright idea to interleave the configset-style and new-style core.properties files so we'd get some added testing done. Tests passed the first time! Except for the stack traces; it turns out I was eating an exception in the test that I shouldn't have been. Fortunately it seems to be a stack trace only thrown by the new code. But looking at the stack trace, there's an NPE at SearchHandler.180 or so, this line: ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler(); Of course the shardHandlerFactory is null here. So two things: 1) Does it even make sense to share the SolrConfig object? I can imagine all sorts of threading issues here, but don't know the underlying code well enough to know whether to be terrified or not. 2) Any clue why the shardHandlerFactory would be null? Near as I can tell, the SolrResourceLoader.inform method is where the problem starts: it sets the live member variable, and later the NPE happens since the live member var aborts processing in the newInstance method. And if it's as simple as giving each core a new ResourceLoader, is there any point, or is the work required at that point enough that sharing the solrconfig isn't worth the effort? Of course it may just be a sequencing issue, but I'm a little lost today; any wisdom gratefully received. Allow cores to specify a named config set - Key: SOLR-4478 URL: https://issues.apache.org/jira/browse/SOLR-4478 Project: Solr Issue Type: Improvement Affects Versions: 4.2, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-4478.patch, SOLR-4478.patch Part of moving forward to the new way, after SOLR-4196 etc...
I propose an additional parameter specified on the core node in solr.xml, or as a parameter in the discovery-mode core.properties file; call it configSet, where the value provided is a path to a directory, either absolute or relative. Really, this is as though you copied the conf directory somewhere to be used by more than one core. Straw-man: There will be a directory solr_home/configsets, which will be the default. If the configSet parameter is, say, myconf, then I'd expect a directory named myconf to exist in solr_home/configsets, which would look something like solr_home/configsets/myconf/ containing schema.xml, solrconfig.xml, stopwords.txt, velocity/query.vm, etc. If multiple cores used the same configSet, the schema, solrconfig, etc. would all be shared (i.e. shareSchema=true would be assumed). I don't see a good use-case for _not_ sharing schemas, so I don't propose to allow this to be turned off. Hmmm, what if shareSchema is explicitly set to false in the solr.xml or properties file? I'd guess it should be honored, but maybe log a warning? Mostly I'm putting this up for comments. I know that there are already thoughts about how this all should work floating around, so before I start any work on this I thought I'd at least get an idea of whether this is the way people are thinking about going. configSet can be either a relative or absolute path; if relative, it's assumed to be relative to solr_home. Thoughts?
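To make the straw-man concrete, a discovery-mode core.properties using the proposed parameter might look like the following. This is a sketch of the proposal under discussion, not a shipped feature; the property name configSet and the myconf layout are the hypothetical names from the comment above.

```properties
# solr_home/mycore/core.properties (sketch of the proposal)
name=mycore
# relative path, resolved against solr_home/configsets:
configSet=myconf
# an absolute path would also be allowed, e.g.:
# configSet=/opt/solr/shared-configs/myconf
```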
[jira] [Commented] (SOLR-4448) Allow the solr internal load balancer to be more easily pluggable.
[ https://issues.apache.org/jira/browse/SOLR-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649407#comment-13649407 ] Ryan Ernst commented on SOLR-4448: -- My apologies. I do not use SolrJ and was making some bad assumptions. I see now that a client would use this to round-robin among all the hosts in a cluster for top-level requests, and then Solr would *also* use a different LB (running in Solr instead of the client) for distributing requests to slices. As I've said here, I'm fine with this staying in SolrJ. I only hope (in the future, not asking for it here) to see a better abstraction for the load balancer. Allow the solr internal load balancer to be more easily pluggable. -- Key: SOLR-4448 URL: https://issues.apache.org/jira/browse/SOLR-4448 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: philip hoy Priority: Minor Attachments: SOLR-4448.patch, SOLR-4448.patch Widen some access level modifiers to allow the load balancer to be extended and plugged into an HttpShardHandler instance using an extended HttpShardHandlerFactory.
[jira] [Comment Edited] (SOLR-4478) Allow cores to specify a named config set
[ https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649405#comment-13649405 ] Erick Erickson edited comment on SOLR-4478 at 5/5/13 7:58 PM: -- Updated patch, with a problem. First I had the bright idea to interleave the configset-style and new-style core.properties files so we'd get some added testing done in OpenCloseCoreStressTest. Tests passed the first time! Except for the stack traces; it turns out I was eating an exception in the test that I shouldn't have been. Fortunately it seems to be a stack trace only thrown by the new code. NOTE: There's a nocommit in OpenCloseCoreStressTest that forces all cores to be configset-only for easier debugging on this issue. But looking at the stack trace, there's an NPE at SearchHandler.180 or so, this line: ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler(); Of course the shardHandlerFactory is null here. So two things: 1) Does it even make sense to share the SolrConfig object? I can imagine all sorts of threading issues here, but don't know the underlying code well enough to know whether to be terrified or not. 2) Any clue why the shardHandlerFactory would be null? Near as I can tell, the SolrResourceLoader.inform method is where the problem starts: it sets the live member variable, and later the NPE happens since the live member var aborts processing in the newInstance method. And if it's as simple as giving each core a new ResourceLoader, is there any point, or is the work required at that point enough that sharing the solrconfig isn't worth the effort? Of course it may just be a sequencing issue, but I'm a little lost today; any wisdom gratefully received.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649410#comment-13649410 ] Commit Tag Bot commented on LUCENE-4956: [lucene4956 commit] sarowe http://svn.apache.org/viewvc?view=revision&revision=1479362 LUCENE-4956: Remove o.a.l.analysis.kr.utils.StringUtil and all calls to it (mostly StringUtil.split, replaced with String.split)
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649420#comment-13649420 ] Steve Rowe commented on LUCENE-4956: This looks like a typo to me, in {{KoreanEnv.java}} - the second {{FILE_DICTIONARY}} should instead be {{FILE_EXTENSION}}: {code:java}
/**
 * Initialize the default property values.
 */
private void initDefaultProperties() {
  defaults = new Properties();
  defaults.setProperty(FILE_SYLLABLE_FEATURE, "org/apache/lucene/analysis/kr/dic/syllable.dic");
  defaults.setProperty(FILE_DICTIONARY, "org/apache/lucene/analysis/kr/dic/dictionary.dic");
  defaults.setProperty(FILE_DICTIONARY, "org/apache/lucene/analysis/kr/dic/extension.dic");
{code}
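The reason this typo matters: java.util.Properties keys are unique, so the second setProperty on FILE_DICTIONARY silently overwrites the first, and the dictionary.dic default is never registered. A minimal standalone demonstration follows; the constant is a stand-in, not the actual KoreanEnv field.

```java
import java.util.Properties;

public class DuplicateKeyDemo {
    // Hypothetical stand-in for the KoreanEnv constant.
    static final String FILE_DICTIONARY = "kr.dictionary";

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty(FILE_DICTIONARY, "org/apache/lucene/analysis/kr/dic/dictionary.dic");
        // The typo: reusing FILE_DICTIONARY instead of FILE_EXTENSION
        // replaces the previous value rather than adding a second entry.
        defaults.setProperty(FILE_DICTIONARY, "org/apache/lucene/analysis/kr/dic/extension.dic");

        System.out.println(defaults.size());                       // 1 (not 2)
        System.out.println(defaults.getProperty(FILE_DICTIONARY)); // ...extension.dic wins
    }
}
```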
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649421#comment-13649421 ] Commit Tag Bot commented on LUCENE-4956: [lucene4956 commit] sarowe http://svn.apache.org/viewvc?view=revision&revision=1479386 LUCENE-4956: fix typo
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649424#comment-13649424 ] Commit Tag Bot commented on LUCENE-4956: [lucene4956 commit] sarowe http://svn.apache.org/viewvc?view=revision&revision=1479391 LUCENE-4956: Add license headers to dictionary files, and modify FileUtil.readLines() to ignore lines beginning with comment char '!'
[jira] [Commented] (LUCENE-4976) PersistentSnapshotDeletionPolicy should save to a single file
[ https://issues.apache.org/jira/browse/LUCENE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649428#comment-13649428 ] Commit Tag Bot commented on LUCENE-4976: [trunk commit] mikemccand http://svn.apache.org/viewvc?view=revision&revision=1479394 LUCENE-4976: add missing sync / delete old save files PersistentSnapshotDeletionPolicy should save to a single file - Key: LUCENE-4976 URL: https://issues.apache.org/jira/browse/LUCENE-4976 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-4976.patch, LUCENE-4976.patch Today it creates a single-document Lucene index, and calls commit() after each snapshot/release. I think we can just use a single file instead, and remove Closeable.
[jira] [Commented] (LUCENE-4976) PersistentSnapshotDeletionPolicy should save to a single file
[ https://issues.apache.org/jira/browse/LUCENE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649430#comment-13649430 ] Commit Tag Bot commented on LUCENE-4976: [branch_4x commit] mikemccand http://svn.apache.org/viewvc?view=revision&revision=1479395 LUCENE-4976: add missing sync / delete old save files
[jira] [Created] (LUCENE-4979) LiveFieldValues should accept any ReferenceManager
Michael McCandless created LUCENE-4979: -- Summary: LiveFieldValues should accept any ReferenceManager Key: LUCENE-4979 URL: https://issues.apache.org/jira/browse/LUCENE-4979 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-4979.patch Today it requires ReferenceManager<IndexSearcher> but it doesn't rely on that at all (it just forwards that IndexSearcher to the subclass's lookup method).
[jira] [Updated] (LUCENE-4979) LiveFieldValues should accept any ReferenceManager
[ https://issues.apache.org/jira/browse/LUCENE-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4979: --- Attachment: LUCENE-4979.patch Simple patch ... LiveFieldValues should accept any ReferenceManager
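The idea behind the issue - loosening a signature from ReferenceManager<IndexSearcher> to an arbitrary type parameter when the class only forwards the reference - can be sketched as below. This is a simplified illustration with hypothetical stand-in types, not the actual Lucene API or the patch contents.

```java
// Minimal stand-in for the acquire/release pattern; the real
// Lucene ReferenceManager has more to it (refresh, listeners, etc.).
interface RefManager<G> {
    G acquire();
    void release(G ref);
}

// Before the change, the field type was fixed to the concrete searcher class.
// After: stay generic in S, since the reference is only forwarded onward.
abstract class LiveValuesSketch<S, T> {
    private final RefManager<S> mgr;

    LiveValuesSketch(RefManager<S> mgr) {
        this.mgr = mgr;
    }

    public T get(String id) {
        S ref = mgr.acquire();            // acquire the current reference
        try {
            return lookupFromSearcher(ref, id); // forward it unchanged
        } finally {
            mgr.release(ref);             // always release
        }
    }

    protected abstract T lookupFromSearcher(S s, String id);
}
```

Because nothing in the class body touches the concrete searcher type, the generalization compiles without any behavioral change for existing callers.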
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649436#comment-13649436 ] Steve Rowe commented on LUCENE-4956: I added license headers to the dictionary files, so AFAICT all files now have Apache License headers. I've updated [http://incubator.apache.org/ip-clearance/lucene-korean-analyzer.html] - it looks ready to go to me. (Again, I can only control the XML version of this, at [http://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/lucene-korean-analyzer.xml], so it might be a day or so before the HTML version catches up.) I think we're ready for the incubator-general vote. [~cm], do you agree? We don't need to wait for the vote result to continue making improvements, e.g. tabs-to-spaces, svn:eol-style=native, etc. - the vote email will point to the revision on the branch we think is vote-worthy: r1479391.
[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649404#comment-13649404 ] Steve Rowe edited comment on LUCENE-4956 at 5/5/13 9:32 PM: {quote} Could you comment about the origins and authorship of org.apache.lucene.analysis.kr.utils.StringUtil in your tar file? I'm seeing a lot of authors in this file. Is this from Apache Commons Lang? Thanks! {quote} I looked at the file content, and it's definitely from Apache Commons Lang (the class is named {{StringUtils}} there, renamed {{StringUtil}} here), circa early 2010, maybe with a little pulled in from another Commons Lang class. I've eliminated StringUtil - it's almost all calls to StringUtil.split(String, separatorChars) - its javadoc is: {code:java}
/**
 * <p>Splits the provided text into an array, separators specified.
 * This is an alternative to using StringTokenizer.</p>
 *
 * <p>The separator is not included in the returned String array.
 * Adjacent separators are treated as one separator.
 * For more control over the split use the StrTokenizer class.</p>
 *
 * <p>A <code>null</code> input String returns <code>null</code>.
 * A <code>null</code> separatorChars splits on whitespace.</p>
 *
 * <pre>
 * StringUtil.split(null, *)         = null
 * StringUtil.split("", *)           = []
 * StringUtil.split("abc def", null) = ["abc", "def"]
 * StringUtil.split("abc def", " ")  = ["abc", "def"]
 * StringUtil.split("abc  def", " ") = ["abc", "def"]
 * StringUtil.split("ab:cd:ef", ":") = ["ab", "cd", "ef"]
 * </pre>
 *
 * @param str  the String to parse, may be null
 * @param separatorChars  the characters used as the delimiters,
 *  <code>null</code> splits on whitespace
 * @return an array of parsed Strings, <code>null</code> if null String input
 */
{code} I'm replacing calls to this method with calls to String.split(regex), where regex is [char]+, and char is the (in all cases singular) split character. I'll commit the changes and the StringUtil.java removal in a little bit once I've got it compiling and the tests succeed.
Re: VOTE: solr no longer webapp
On May 5, 2013, at 9:30 AM, Gora Mohanty g...@mimirtech.com wrote: Given the reasons put forward, I have seen the error of my ways, and am now +1 for dropping the .war. Phasing in this change, with a deprecation announced in the 4.x series, would help users in easing into the change. My main argument is really that your imagination can run wild. I can't even predict all the good things that can come out of owning this layer of the app. The same way we couldn't predict all the changes that would come with the freedom of flexible indexing. I also don't think we need to drop the war right away - the first step is a mindset change. Start calling this an implementation detail in version whatever. It's hard to say who will have the time to do what work here and when - but removing limits will allow many changes that are hard or impossible in a webapp world. Help, I'm stuck in a webapp and I can't get out! - Mark
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649445#comment-13649445 ] Steve Rowe commented on LUCENE-4956: [~smlee0818], I don't understand the following method in {{WordSpaceAnalyzer.java}} - what's the point of the method always returning false? (i.e.: {{if(true) return false;}}): {code:java}
private boolean isNounPart(String str, int jstart) throws MorphException {
  if(true) return false;
  for(int i = jstart - 1; i >= 0; i--) {
    if(DictionaryUtil.getWordExceptVerb(str.substring(i, jstart + 1)) != null)
      return true;
  }
  return false;
}
{code} {{isNounPart()}} is only called from one method in the same class: {{findJosaEnd(snipt,jstart)}}: {code:java}
if(DictionaryUtil.existJosa(str) && !findNounWithinStr(snipt, i, i + 2)
    && !isNounPart(snipt, jstart)) {
{code}
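For context on why this pattern compiles at all: javac rejects statements after a bare return as unreachable, but the language's reachability rules special-case if statements, so {{if(true) return ...;}} leaves the code below it formally reachable even though it can never run - a common way to temporarily disable a method. A standalone illustration (not the arirang code):

```java
public class DeadCodeDemo {
    static boolean isNounPartStub() {
        if (true) return false; // everything below can never execute...
        return true;            // ...yet javac accepts it, unlike a bare
                                // `return false; return true;` which is a
                                // compile error ("unreachable statement").
    }

    public static void main(String[] args) {
        System.out.println(isNounPartStub()); // false
    }
}
```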
Re: VOTE: solr no longer webapp
On 6 May 2013 03:23, Mark Miller markrmil...@gmail.com wrote: I also don't think we need to drop the war right away - the first step is a mindset change. Completely agreed: As I said, I have seen the error of my ways :-) People arguing for this change are right, and your point about a mindset change is important. Being largely an end-user of Solr, for me it took the arguments put forward in this thread to make me realise the importance of such a mindset change. IMHO, the intermediate step of deprecation is also important, so that users who do view Solr as a webapp, manage multiple installations from such a perspective, and might not be subscribed to the dev list are given reasonable warning. Regards, Gora
[jira] [Created] (LUCENE-4980) Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest
Michael McCandless created LUCENE-4980: -- Summary: Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest Key: LUCENE-4980 URL: https://issues.apache.org/jira/browse/LUCENE-4980 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-4980.patch I tried to combine these two and there were several issues: * It's ... really tricky to manage the two different FacetAccumulators across the N FacetCollectors that DrillSideways creates ... to fix this I added a new MultiFacetsAccumulator that switches for you. * There was still one place in DS/DDQ that wasn't properly handling a non-Term drill-down. * There was a bug in the collector method for DrillSideways whereby if a given segment had no hits, it was skipped, which is incorrect because it must still be visited to tally up the sideways counts. * Separately I noticed that DrillSideways was doing too much work: it would count up drill-down counts *and* drill-sideways counts against the same dim (but then discard the drill-down counts in the end).
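The third bullet (skipping segments with no hits) is worth a toy illustration. Sideways counts for a dimension are tallied from documents that match the base query regardless of that dimension's drill-down, so a segment with zero fully-matching hits can still contribute. This is a plain-Java simulation of the counting logic only, not the Lucene facet API; all names are invented:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SidewaysSketch {
    // Each doc: does it match the base query, and what is its "color" facet value?
    record Doc(boolean matchesBase, String color) {}

    // Sideways counts for the "color" dimension under a drill-down on drillDownColor:
    // tally every base-query match, ignoring the color drill-down itself.
    static Map<String, Integer> sidewaysCounts(List<List<Doc>> segments,
                                               String drillDownColor,
                                               boolean skipHitlessSegments) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<Doc> seg : segments) {
            // A full hit matches the base query AND the drill-down.
            boolean hasFullHit = seg.stream().anyMatch(
                d -> d.matchesBase() && d.color().equals(drillDownColor));
            if (skipHitlessSegments && !hasFullHit) continue; // simulates the reported bug
            for (Doc d : seg) {
                if (d.matchesBase()) counts.merge(d.color(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Segment 2 has no doc matching base + color=red, yet its blue doc must
        // still count toward the sideways tally for "color".
        List<List<Doc>> segments = List.of(
            List.of(new Doc(true, "red"), new Doc(true, "blue")),
            List.of(new Doc(true, "blue")));
        System.out.println(sidewaysCounts(segments, "red", false)); // {blue=2, red=1}
        System.out.println(sidewaysCounts(segments, "red", true));  // {blue=1, red=1} -- undercounts blue
    }
}
```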
[jira] [Updated] (LUCENE-4980) Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest
[ https://issues.apache.org/jira/browse/LUCENE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4980: --- Attachment: LUCENE-4980.patch Patch w/ test case + fixes.
Re: solr no longer webapp
I don't think that I have ever thought of Solr as an application. I mean, it is a tool/server that application developers use to develop THEIR application. I've always thought of Solr as a server, and it has only been incidental (and fairly annoying) that it has been packaged as a webapp for deployment in a container. If anything, I think of Solr as a server or app server that supports development and deployment of search apps - coded primarily in XML. Hardly your typical webapp. Are there any examples of a tool/server comparable to Solr’s role in “application development” and support for “search apps” that happily live as a mere “webapp”? That said, I'll stay agnostic for now as to the disposition of the war and support for containers such as Tomcat. Keeping war support in 4.x and dumping it in 5.0 seems fine, but... Meanwhile, I'd like to see the Solr example grow up into a production/deployment example. -- Jack Krupansky From: Walter Underwood Sent: Saturday, May 04, 2013 6:35 PM To: dev@lucene.apache.org Subject: Re: VOTE: solr no longer webapp Let's not reject the classic replication mode out of hand. The tight coupling in Solr Cloud brings along a host of failure modes. For systems that do not have tight freshness requirements, regular old replication is awesome. The loose coupling allows extremely simple failure recovery. For example, AWS region failover is trivial with good ol' Solr replication. With Solr Cloud, I cannot figure out a way to do it. That happens to be a requirement from ops. Also, comparing this to MySQL and Postgres is silly. There is no web framework for C or C++. Instead, let's name a few major Java applications that do not use servlet containers. wunder On May 4, 2013, at 2:25 PM, Mark Miller wrote: Supporting both just compounds our problems and doesn't go very far towards solving any. The only place the webapp will end up still making sense after a bit of time is in non solrcloud mode.
The improvements it will bring will make it a dumb choice if you use SolrCloud. We already have enough baggage holding up SolrCloud because of supporting the std mode - adding to the list only makes my life even harder. We need to reduce the number of configurations we ship, not multiply them. I believe *very* strongly. We must start to focus the beam, there is already too much diffraction. - Mark On May 4, 2013, at 2:09 PM, Grant Ingersoll gsing...@apache.org wrote: Why not just support both? It really isn't all that hard. While I agree w/ Robert and Mark that it's time to consider alternatives, I also don't think it is all that hard to support both from a user's perspective. We could have a Netty (or other) version and a WAR version and I don't think it is that big of a deal to maintain. After all, we already have the component pieces as JARs, just bundle them up differently for Netty, etc. -Grant On May 3, 2013, at 8:09 PM, Otis Gospodnetic wrote: But I'm really curious, what is the problem with Solr inside a container? Which problem is this solving? I feel like I missed some important thread which is highly possible. :) Thanks, Otis On Fri, May 3, 2013 at 1:14 PM, Robert Muir rcm...@gmail.com wrote: # rm -rf tomcat # gzip -dc solr.tgz | tar -xvf - # cd solr/example # java -jar start.jar On Fri, May 3, 2013 at 9:52 AM, Steve Molloy smol...@opentext.com wrote: So, if ever this passes, what would be the upgrade path for all the deployments using Solr as a webapp inside tomcat or other container? From: Michael McCandless [luc...@mikemccandless.com] Sent: May 3, 2013 12:09 PM To: Lucene/Solr dev Subject: Re: VOTE: solr no longer webapp On Fri, May 3, 2013 at 12:14 AM, Robert Muir rcm...@gmail.com wrote: I think solr should no longer be a war file but a search app. I don't care how it accomplishes this: jetty, netty, its all up to us. Let me know your ideas: I think its a necessary step to move solr forwards.
+1 Mike McCandless http://blog.mikemccandless.com
Re: VOTE: solr no longer webapp
So, if we said 5.x was non-WAR, we could move forward with it and maintain 4.x as WAR. I'll put my vote at +0.5. There are a lot of people using Solr that are putting it into a standard, corporate approved web container. I could see that argument going both ways. On the one hand, no one asks what container MySQL runs in, on the other, people have been trained for a lot of years on Solr as a WAR. Frankly, I like how Restlet handles this stuff, for the most part. Jetty (or other containers) are an implementation detail. -Grant Grant Ingersoll | @gsingers http://www.lucidworks.com
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649452#comment-13649452 ] Commit Tag Bot commented on LUCENE-4956: [lucene4956 commit] sarowe http://svn.apache.org/viewvc?view=revision&revision=1479410 LUCENE-4956: - svn:eol-style -> native - tabs -> spaces - regularized java code indents to 2 spaces per level
[jira] [Created] (LUCENE-4981) Deprecate PositionFilter
Adrien Grand created LUCENE-4981: Summary: Deprecate PositionFilter Key: LUCENE-4981 URL: https://issues.apache.org/jira/browse/LUCENE-4981 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor According to the documentation (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory), PositionFilter is mainly useful to make query parsers generate boolean queries instead of phrase queries, although this problem can be solved at the query-parsing level instead of the analysis level (e.g. using QueryParser.setAutoGeneratePhraseQueries). So given that PositionFilter corrupts token graphs (see TestRandomChains), I propose to deprecate it.
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649457#comment-13649457 ] Michael McCandless commented on LUCENE-4975: +1 to commit and iterate from here on... this new module looks very nice! I like the new testConsistencyOnException ... maybe also call MDW.setRandomIOExceptionRateOnOpen? This will additionally randomly throw exceptions from openInput/createOutput. Add Replication module to Lucene Key: LUCENE-4975 URL: https://issues.apache.org/jira/browse/LUCENE-4975 Project: Lucene - Core Issue Type: New Feature Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch I wrote a replication module which I think will be useful to Lucene users who want to replicate their indexes for e.g. high availability, taking hot backups, etc. I will upload a patch soon where I'll describe in general how it works.
Including JTS in an Apache project
Hi all, I saw that Apache Solr uses JTS (Java Topology Suite) [http://www.vividsolutions.com/jts/JTSHome.htm] for supporting a spatial data type [http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4]. Using JTS in an Apache project is not straightforward, as JTS is licensed under the LGPL, which has some compatibility issues when included in an Apache project. Now I need to do something very similar in another Apache project (Pig [http://pig.apache.org/]) and I'm faced with the licensing issue. I'm asking for your advice on the best way to use JTS without violating the license. Does referring to JTS classes from the code of an Apache project, without actually including the classes, violate the license? Do we have to load the classes dynamically (using Class#forName), or is there another way to do it? Thanks in advance Best regards, Ahmed Eldawy
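Setting aside the legal question (which only Apache's legal-discuss list can settle), the mechanics of the Class#forName approach Ahmed mentions look roughly like this. The JTS class name below is the library's real one; the point of the pattern is that the LGPL jar becomes an optional runtime dependency with no compile-time reference:

```java
public class OptionalDep {
    // Check for an optional library by name only -- no compile-time reference to it.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // JTS's Geometry class; false unless the JTS jar happens to be on the classpath.
        System.out.println(isAvailable("com.vividsolutions.jts.geom.Geometry"));
        // A JDK class, to show the positive case.
        System.out.println(isAvailable("java.util.ArrayList")); // true
    }
}
```

In practice the available/unavailable check gates a reflective factory (or a separate adapter jar the user installs themselves), which is essentially what the Solr adapters referenced above do by keeping JTS an optional dependency.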
[jira] [Created] (SOLR-4787) Join Contrib
Joel Bernstein created SOLR-4787: Summary: Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.2.1 This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. PostFilterQParserPlugin aka pjoin The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar to the join implementation in the JoinQParserPlugin but differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. The second difference is that the pjoin builds memory structures that can be used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main Query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather than join. fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin. <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml. <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the from core must have the join SolrCache configured. <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> JoinValueSourceParserPlugin aka vjoin The second implementation is the JoinValueSourceParserPlugin aka vjoin. This implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey) This example shows vjoin being called by the edismax boost function parameter. This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
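The pjoin semantics described above reduce to: run the query on the from core, collect the integer from-field values into a memory structure, then post-filter the main query's hits by membership of their to-field value. A toy sketch of that flow in plain Java (not Solr's API; all names are invented):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PjoinSketch {
    // A main-core hit: document id plus its integer "to" join key (e.g. id_i).
    record Hit(int docId, int toKey) {}

    // Toy post-filter join: keep only hits whose join key was produced by the
    // query against the other core (the "from" keys).
    static List<Hit> postFilterJoin(List<Hit> mainHits, Set<Integer> fromKeys) {
        return mainHits.stream()
                       .filter(h -> fromKeys.contains(h.toKey()))
                       .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Pretend fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1
        // matched from-keys {7, 9} in collection2:
        Set<Integer> fromKeys = Set.of(7, 9);
        List<Hit> mainHits = List.of(new Hit(1, 7), new Hit(2, 8), new Hit(3, 9));
        System.out.println(postFilterJoin(mainHits, fromKeys)); // docs 1 and 3 survive
    }
}
```

As the description notes, the real implementation builds these structures from intermediate DocSets held in the join SolrCache, which is where the extra memory relative to JoinQParserPlugin goes.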
[jira] [Updated] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-4787: - Description: This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. *PostFilterQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar to the join implementation in the JoinQParserPlugin but differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. The second difference is that the pjoin builds memory structures that can be used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main Query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather than join. fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin. <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml. <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the from core must have the join SolrCache configured. <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> *JoinValueSourceParserPlugin aka vjoin* The second implementation is the JoinValueSourceParserPlugin aka vjoin. This implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey) This example shows vjoin being called by the edismax boost function parameter. This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
[jira] [Updated] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-4787: - Description: This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. *PostFilterQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar to the join implementation in the JoinQParserPlugin but differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. The second difference is that the pjoin builds memory structures that can be used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main Query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather than join. fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin. <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml. <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the from core must have the join SolrCache configured. <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> *JoinValueSourceParserPlugin aka vjoin* The second implementation is the JoinValueSourceParserPlugin aka vjoin. This implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey) This example shows vjoin being called by the edismax boost function parameter. This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
[jira] [Updated] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-4787: - Attachment: SOLR-4787.patch Initial pjoin and vjoin contrib. TODO: Tests need to be created and the vjoin has some insanity issues with the FieldCache that will eventually be solved by using on-disk DocValues. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.2.1 Attachments: SOLR-4787.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. *PostFilterQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar to the join implementation in the JoinQParserPlugin but differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. The second difference is that the pjoin builds memory structures that can be used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main Query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather then join. 
fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin. queryParser name=pjoin class=org.apache.solr.joins.PostFilterJoinQParserPlugin/ And the join contrib jars must be registed in the solrconfig.xml. lib dir=../../../dist/ regex=solr-joins-\d.*\.jar / The solrconfig.xml in the fromcore must have the join SolrCache configured. cache name=join class=solr.LRUCache size=4096 initialSize=1024 / *JoinValueSourceParserPlugin aka vjoin* The second implementation is the JoinValueSourceParserPlugin aka vjoin. This implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey) This example shows vjoin being called by the edismax boost function parameter. This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: valueSourceParser name=vjoin class=org.apache.solr.joins.JoinValueSourceParserPlugin / -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
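The pjoin mechanics described above (search the from core, collect integer join keys, then post-filter the main query results) can be illustrated with a small standalone sketch. This is a hypothetical illustration, not Solr's actual implementation: the "cores" are plain in-memory lists, and only the field names (id_i, user) and the join semantics come from the issue description.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of the pjoin semantics -- not Solr's implementation.
public class PJoinSemanticsSketch {

    // A document is just a field->value map in this sketch.
    static Map<String, Object> doc(int id, String user) {
        Map<String, Object> d = new HashMap<>();
        d.put("id_i", id);
        d.put("user", user);
        return d;
    }

    /**
     * Emulates fq={!pjoin fromCore=... from=id_i to=id_i}user:customer1 :
     * 1) run the query against the from core,
     * 2) collect the integer "from" keys of the matching docs,
     * 3) keep only main-query docs whose "to" key is in that set.
     */
    static List<Map<String, Object>> pjoin(List<Map<String, Object>> mainResults,
                                           List<Map<String, Object>> fromCore,
                                           String fromField, String toField,
                                           String queryField, String queryValue) {
        // Steps 1+2: integer keys make the join structure cheap to build
        // and fast to probe -- the reason pjoin restricts keys to ints.
        Set<Integer> fromKeys = fromCore.stream()
                .filter(d -> queryValue.equals(d.get(queryField)))
                .map(d -> (Integer) d.get(fromField))
                .collect(Collectors.toSet());
        // Step 3: post-filter the main query results against the key set.
        return mainResults.stream()
                .filter(d -> fromKeys.contains((Integer) d.get(toField)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> collection2 = List.of(
                doc(1, "customer1"), doc(2, "customer2"), doc(3, "customer1"));
        List<Map<String, Object>> mainResults = List.of(
                doc(1, null), doc(2, null), doc(4, null));
        List<Map<String, Object>> joined =
                pjoin(mainResults, collection2, "id_i", "id_i", "user", "customer1");
        // ids 1 and 3 match user:customer1 in the from core; of the main
        // results (1, 2, 4), only id 1 survives the post-filter.
        System.out.println(joined.size()); // 1
    }
}
```

Because the filtering happens after the main query (the PostFilter contract), only matching records ever need to be probed against the key set, which is what lets the approach scale to millions of keys.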
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649512#comment-13649512 ] Joel Bernstein commented on SOLR-4787: -- The integer keys are faster to join and take up less memory in the in-memory join structures, so string keys won't scale nearly as well. It may be possible to make them work, but they would likely scale about the same as the JoinQParserPlugin. Other high-performance string joins could possibly be contributed as well.
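The vjoin side can be sketched the same way, and the sketch also shows why integer keys matter for memory: the whole join structure reduces to a single int-keyed map built once from the from core. Again this is a hypothetical illustration of the semantics described in the issue (return fromVal for the from-core record whose fromKey matches a document's toKey), not Solr's implementation; the 0.0f default for unmatched keys is an assumption.

```java
import java.util.*;

// Hypothetical sketch of the vjoin value-source semantics -- not Solr's
// implementation.
public class VJoinSemanticsSketch {

    /** Build the join structure once: fromKey -> fromVal. */
    static Map<Integer, Float> buildLink(int[] fromKeys, float[] fromVals) {
        Map<Integer, Float> link = new HashMap<>();
        for (int i = 0; i < fromKeys.length; i++) {
            link.put(fromKeys[i], fromVals[i]);
        }
        return link;
    }

    /**
     * Emulates vjoin(fromCore, fromKey, fromVal, toKey) for one document:
     * look the document's toKey up in the from-core link table.
     * 0.0f for unmatched keys is an assumed default, not documented behavior.
     */
    static float vjoin(Map<Integer, Float> link, int toKey) {
        return link.getOrDefault(toKey, 0.0f);
    }

    public static void main(String[] args) {
        // from core: relevance data keyed by integer ids
        Map<Integer, Float> link = buildLink(new int[]{10, 20}, new float[]{1.5f, 2.5f});
        System.out.println(vjoin(link, 10)); // 1.5
        System.out.println(vjoin(link, 30)); // 0.0 (no matching from record)
    }
}
```

With string keys, each map entry would carry a full String object instead of a boxed (or, in a real implementation, primitive) int, which is the memory and speed gap the comment above describes.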
[jira] [Updated] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-4787: - Description: This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. The second difference is that the pjoin builds memory structures that can be used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather then join. fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin. 
queryParser name=pjoin class=org.apache.solr.joins.PostFilterJoinQParserPlugin/ And the join contrib jars must be registed in the solrconfig.xml. lib dir=../../../dist/ regex=solr-joins-\d.*\.jar / The solrconfig.xml in the fromcore must have the join SolrCache configured. cache name=join class=solr.LRUCache size=4096 initialSize=1024 / *JoinValueSourceParserPlugin aka vjoin* The second implementation is the JoinValueSourceParserPlugin aka vjoin. This implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey) This example shows vjoin being called by the edismax boost function parameter. This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: valueSourceParser name=vjoin class=org.apache.solr.joins.JoinValueSourceParserPlugin / was: This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar to the join implementation in the JoinQParserPlugin but differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. 
The second difference is that the pjoin builds memory structures that can be used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather then join. fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be
[jira] [Updated] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-4787: - Description: This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. The second difference is that the pjoin builds memory structures that are used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory then the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather then join. fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin. 
<queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>

And the join contrib jars must be registered in the solrconfig.xml:

<lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />

The solrconfig.xml in the from core must have the join SolrCache configured:

<cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" />

*JoinValueSourceParserPlugin aka vjoin*

The second implementation is the JoinValueSourceParserPlugin aka vjoin. It implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query, for example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows vjoin being called by the edismax boost function parameter. It returns the fromVal from the fromCore; the fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows:

<valueSourceParser name="vjoin" class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
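The pjoin flow described above, collecting the from-core's integer join keys and post-filtering the main query by membership, can be sketched with plain collections. This is an illustrative sketch of the idea only, not the Solr API; `fromKeys` and `postFilter` are hypothetical names, and real join keys would come from the FieldCache rather than arrays.

```java
import java.util.*;

// Sketch of the pjoin idea: the from-core query yields a set of integer
// join keys; the post filter keeps only main-query docs whose "to" key
// is a member of that set. Names are illustrative, not Solr's API.
public class PjoinSketch {
    // Keys matched by the from-core query (e.g. id_i values for user:customer1)
    static Set<Integer> fromKeys(int[] fromCoreMatches) {
        Set<Integer> keys = new HashSet<>();
        for (int k : fromCoreMatches) keys.add(k);
        return keys;
    }

    // Post-filter: keep only main-query docs whose "to" key joins to the from set
    static List<Integer> postFilter(int[] mainQueryToKeys, Set<Integer> fromKeys) {
        List<Integer> kept = new ArrayList<>();
        for (int k : mainQueryToKeys) {
            if (fromKeys.contains(k)) kept.add(k);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<Integer> keys = fromKeys(new int[]{3, 7, 11});
        System.out.println(postFilter(new int[]{1, 7, 9, 11}, keys)); // [7, 11]
    }
}
```

Because the filtering happens after the main query, only matching documents ever consult the key set, which is why the approach scales to large key counts.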
[jira] [Updated] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-4787: - Description: This contrib provides a place where different join implementations can be contributed to Solr. It currently includes two join implementations. The initial patch was generated from the Solr 4.2.1 tag. Because of changes in the FieldCache API, this patch will only build with Solr 4.2 or above.

*PostFilterJoinQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. It is similar in functionality to the JoinQParserPlugin, but the implementation differs in a couple of important ways. The first is that the pjoin is designed to work with integer join keys only, so in order to use pjoin, integer join keys must be present in both the to and from cores. The second is that the pjoin builds memory structures that are used to quickly connect the join keys. It also uses a custom SolrCache named "join" to hold the intermediate DocSets needed to build those memory structures. The pjoin will therefore need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores: because it's a PostFilter, it only needs to join records that match the main query.

The syntax of the pjoin is the same as the JoinQParserPlugin, except that the plugin is referenced by the string pjoin rather than join:

fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1

The example filter query above searches the fromCore (collection2) for user:customer1. This query generates a list of values from the from field, which is then used to filter the main query. Only records from the main query whose to field is present in the from list are included in the results.
The solrconfig.xml in the main query core must contain the reference to the pjoin:

<queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>

And the join contrib jars must be registered in the solrconfig.xml:

<lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />

The solrconfig.xml in the from core must have the join SolrCache configured:

<cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" />

*JoinValueSourceParserPlugin aka vjoin*

The second implementation is the JoinValueSourceParserPlugin aka vjoin. It implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query, for example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows vjoin being called by the edismax boost function parameter. It returns the fromVal from the fromCore; the fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the join SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows:

<valueSourceParser name="vjoin" class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
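The vjoin described above amounts to a per-document lookup from the to-core's join key into values stored in the from core. The following is a hedged sketch of that idea with plain collections; `buildLookup` and `value` are hypothetical names, not Solr's ValueSource API, and the real plugin would read keys and values via the FieldCache.

```java
import java.util.*;

// Sketch of the vjoin idea: build a joinKey -> storedValue map from the
// from core, then resolve each main-query document's "to" key against it,
// the way a ValueSource returns a per-document value. Illustrative only.
public class VjoinSketch {
    // Build the lookup from parallel key/value arrays read from the from core
    static Map<Integer, Float> buildLookup(int[] fromKeys, float[] fromVals) {
        Map<Integer, Float> lookup = new HashMap<>();
        for (int i = 0; i < fromKeys.length; i++) {
            lookup.put(fromKeys[i], fromVals[i]);
        }
        return lookup;
    }

    // Per-document value: 0 when the document's key has no match in the from core
    static float value(Map<Integer, Float> lookup, int toKey) {
        return lookup.getOrDefault(toKey, 0f);
    }

    public static void main(String[] args) {
        Map<Integer, Float> lookup = buildLookup(new int[]{7, 11}, new float[]{2.5f, 4.0f});
        System.out.println(value(lookup, 7));   // 2.5
        System.out.println(value(lookup, 99));  // 0.0
    }
}
```

Used as a boost function, each main-query document's score would be adjusted by the value its toKey resolves to, which is how relevance data kept in a separate core feeds the main query.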
Re: Including JTS in an Apache project
Hi Ahmed, I faced your conundrum with JTS early last year. As you know, the Apache Software Foundation doesn't like its projects depending on GPL- and even LGPL-licensed libraries. The ASF does not have clear, unambiguous language on how its projects can depend on them in a limited sense, and different PMCs (projects) have different standards. I've heard of one project (CXF?) that uses Java reflection to use an LGPL library. I think another downloads the LGPL library as part of the build, and then the code has a compile-time dependency (I could be mistaken). If memory serves, in both cases the dependency fit an optional role and was not a core purpose of the software. The Lucene PMC in particular didn't formally vote to my knowledge, but there was a time when it was clear to me that such approaches were not acceptable. The approach that the Lucene spatial developers took (me, Ryan, Chris) was to create a non-ASF project called Spatial4j that is ASL licensed. Spatial4j *optionally* depends on JTS -- it's only used for advanced shapes (namely polygons) and for WKT parsing. https://github.com/spatial4j/spatial4j BTW, WKT parsing will be handled by Spatial4j itself in the near future, without JTS. Spatial4j is not a subset of JTS; it critically has things JTS doesn't, like a native circle (not a polygon approximation) and the concept of the world being a sphere instead of flat ;-) That's right: JTS, as critical as it is in the world of open-source spatial, doesn't have any geodetic calculations, just Euclidean. Spatial4j adds dateline-wrap support to JTS shapes so you can represent Fiji, for example, but not yet Antarctica (no pole wrap). So I encourage the Apache Pig project to take a look at using Spatial4j instead of directly using JTS, for the same reasons that the Lucene project uses it. If you ultimately decide not to, then please let me know why, as I see Spatial4j being an excellent fit for ASF projects in particular because of the licensing issue.
So your statement that Apache Solr *uses* JTS is incorrect. No it doesn't, and nor does Lucene; not at all. Instead, those projects use Spatial4j, which has an abstraction (Shape), and Spatial4j has an implementation of that abstraction that depends on JTS. It also has implementations that don't depend on JTS. p.s. Last week I did a long presentation on spatial in Lucene/Solr/Spatial4j and I'd be happy to share the slides with you. The organizers will post them, but they haven't yet. ~ David Smiley Ahmed El-dawy wrote: Hi all, I saw that Apache Solr uses JTS (Java Topology Suite) [ http://www.vividsolutions.com/jts/JTSHome.htm] for supporting a spatial data type [http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4]. Using JTS in an Apache project is not a straightforward thing, as JTS is licensed under LGPL, which has some compatibility issues when included in an Apache project. Now, I need to do something very similar in another Apache project (Pig [http://pig.apache.org/]) and I'm faced with the licensing issue. I'm asking for your advice on the best way to use JTS without breaking the license. Does referring to JTS classes from the code of an Apache project, without actually including the classes, violate the license? Do we have to load the classes dynamically (using Class#forName), or is there another way to do it? Thanks in advance. Best regards, Ahmed Eldawy - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Including-JTS-in-an-Apache-project-tp4060944p4060969.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
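The reflection approach David mentions (and Ahmed asks about via Class#forName) can be sketched as follows. This is a hedged illustration, not any project's actual code: the idea is to probe the classpath at runtime so the core code carries no compile-time dependency on the optional (LGPL-licensed) library. The probed JTS class name reflects JTS's package naming at the time.

```java
// Sketch: detect an optional library at runtime via reflection instead of
// linking against it at compile time. "classPresent" is an illustrative
// helper name, not an API from Solr, Spatial4j, or JTS.
public class OptionalDep {
    // True when the named class can be loaded from the current classpath
    static boolean classPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // JTS's Geometry class is a natural probe target for advanced shapes
        boolean jts = classPresent("com.vividsolutions.jts.geom.Geometry");
        System.out.println(jts
                ? "JTS on classpath: advanced shapes could be enabled"
                : "JTS absent: fall back to built-in shapes");
    }
}
```

A real implementation would then reach the library's API reflectively (or via a separately compiled adapter class loaded only when the probe succeeds), which is how Spatial4j can ship an ASL-licensed core with JTS-backed Shape implementations as an optional extra.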
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649530#comment-13649530 ] David Smiley commented on SOLR-4787: Nice, Joel! I've done a custom join query recently, but it's a bit different from either of yours. I read your pjoin code in particular and it looks very good, mostly. Your BSearch class is the only thing that made me frown. Instead of putting each name-value pair into its own key object (which isn't GC friendly), I suggest you take a look at Lucene's SorterTemplate, which will allow you to collect your key and value integers directly into an array each, and then sort in place when done. I like your idea of caching the join; I should do that with mine.

Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.2.1 Attachments: SOLR-4787.patch

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
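David's SorterTemplate suggestion, collect keys and values into two parallel int arrays and sort them in place with paired swaps, avoiding one heap object per pair, can be sketched like this. It is an illustrative quicksort, not Lucene's actual SorterTemplate class, and the names are hypothetical.

```java
import java.util.Arrays;

// Sketch of the parallel-array idea behind Lucene's SorterTemplate: sort an
// int key array in place, applying every swap to the parallel value array
// too, so no per-pair wrapper objects are allocated (GC friendly).
public class ParallelArraySort {
    // Quicksort keys[lo..hi] ascending, permuting vals identically
    static void sort(int[] keys, int[] vals, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = keys[(lo + hi) >>> 1];
        int i = lo, j = hi;
        while (i <= j) {
            while (keys[i] < pivot) i++;
            while (keys[j] > pivot) j--;
            if (i <= j) {
                swap(keys, i, j);
                swap(vals, i, j);  // keep vals aligned with keys
                i++; j--;
            }
        }
        sort(keys, vals, lo, j);
        sort(keys, vals, i, hi);
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        int[] keys = {42, 7, 19, 7};
        int[] vals = {1, 2, 3, 4};
        sort(keys, vals, 0, keys.length - 1);
        System.out.println(Arrays.toString(keys) + " " + Arrays.toString(vals));
    }
}
```

Once sorted, a join key can be resolved with a plain binary search over the key array, which is presumably the role the BSearch class plays, but without boxing each pair.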