[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649284#comment-13649284
 ] 

Steve Rowe commented on LUCENE-4956:


bq. I've created branch lucene4956 and checked in an arirang module in 
lucene/analysis. I've added a basic test that tests segmentation, offsets, etc.

Cool!

bq. License headers have been added to all source code files

I can see one that doesn't have one: TestKoreanAnalyzer.java.  I'll take a pass 
over all the files.

bq. Eclipse is TODO.

I ran {{ant eclipse}} and it seemed to do the right thing already - I can see 
Arirang entries in the .classpath file that gets produced - so I don't think 
there's anything to be done.  I don't use Eclipse, though, so I can't be sure.

I added Maven config and an IntelliJ Arirang module test run configuration.

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 The Korean language has specific characteristics. When developing a search 
 service with Lucene & Solr in Korean, there are some problems in searching and 
 indexing. The Korean analyzer solves those problems with a Korean morphological 
 analyzer. It consists of a Korean morphological analyzer, dictionaries, a 
 Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene 
 and Solr. If you develop a search service with Lucene in Korean, the Korean 
 analyzer is the best choice.




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649296#comment-13649296
 ] 

Christian Moen commented on LUCENE-4956:


Thanks, Steve.  I've added the missing license header to 
{{TestKoreanAnalyzer.java}}.




[jira] [Updated] (LUCENE-4975) Add Replication module to Lucene

2013-05-05 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4975:
---

Attachment: LUCENE-4975.patch

Added testConsistencyOnException to test the client and handlers' behavior when 
they encounter exceptions (I use MockDirectoryWrapper's diskFull and random 
IOException modes to simulate that).

I think this module is basically ready, i.e. it comes with tests and javadocs, and 
pretty much does what it was written to do. I'm sure there's room for 
improvement, but I don't think that should hold off the commit. So unless there 
are any objections, I intend to commit it by Tuesday. If people want to do a 
thorough review, I don't mind waiting to commit - just drop a comment 
on the issue to let me know.
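
For readers who haven't seen this test pattern, a minimal sketch of the 
fault-injection setup, using only MockDirectoryWrapper's standard knobs - the 
replication call itself is a placeholder, not the patch's actual API:

{code:java}
import org.apache.lucene.store.MockDirectoryWrapper;
import org.apache.lucene.util.LuceneTestCase;

public class ConsistencyOnExceptionSketch extends LuceneTestCase {
  public void testConsistencyOnException() throws Exception {
    MockDirectoryWrapper dir = newMockDirectory();
    dir.setMaxSizeInBytes(1024);        // simulate a full disk
    dir.setRandomIOExceptionRate(0.1);  // randomly fail some I/O operations
    try {
      // runReplication(dir);  // placeholder for the client/handler under test
    } catch (Exception expected) {
      // failures are expected; the real test then verifies the replicated
      // index is still consistent (or cleanly absent)
    } finally {
      dir.setMaxSizeInBytes(0);         // 0 disables the size cap
      dir.setRandomIOExceptionRate(0.0);
      dir.close();
    }
  }
}
{code}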

 Add Replication module to Lucene
 

 Key: LUCENE-4975
 URL: https://issues.apache.org/jira/browse/LUCENE-4975
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch, 
 LUCENE-4975.patch


 I wrote a replication module which I think will be useful to Lucene users who 
 want to replicate their indexes, e.g. for high availability, taking hot backups, 
 etc.
 I will upload a patch soon, where I'll describe in general how it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649300#comment-13649300
 ] 

Uwe Schindler commented on LUCENE-4956:
---

I have seen that the Tokenizer also uses JFlex, but an older version than the 
one used for Lucene's other tokenizers (like StandardTokenizer). Can we add 
Ant tasks, like the ones we have for StandardTokenizer, to regenerate the 
source file from build.xml? Finally, we should regenerate the Java files with 
the JFlex trunk version and compare them with the ones committed here (to see 
if there are differences).





Re: VOTE: solr no longer webapp

2013-05-05 Thread Peter Mitchell
+1


[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b86) - Build # 5504 - Failure!

2013-05-05 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/5504/
Java: 32bit/jdk1.8.0-ea-b86 -server -XX:+UseG1GC

1 tests failed.
REGRESSION:  
org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest.testRandom

Error Message:
doc=336 maxDoc=336

Stack Trace:
java.lang.AssertionError: doc=336 maxDoc=336
at 
__randomizedtesting.SeedInfo.seed([3847A35261F6C0F0:4A0B865DD0967683]:0)
at 
org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:456)
at 
org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
at 
org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:494)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:559)
at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2650)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2794)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2774)
at 
org.apache.lucene.index.RandomIndexWriter.maybeCommit(RandomIndexWriter.java:163)
at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:155)
at 
org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:114)
at 
org.apache.lucene.search.grouping.AllGroupHeadsCollectorTest.testRandom(AllGroupHeadsCollectorTest.java:293)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:490)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

Re: VOTE: solr no longer webapp

2013-05-05 Thread Gora Mohanty
On 5 May 2013 09:07, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:
 I feel the same as Shawn; I was quite skeptical until the reasons were
 finally given.  And I agree that the war file distribution needs to stick
 around longer.
[...]

Agreed on both points.

Given the reasons put forward, I have seen the error of my ways, and 
am now +1 for dropping the .war. Phasing in this change, with a 
deprecation announced in the 4.x series, would help ease users into 
the change.

Regards,
Gora




Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b86) - Build # 5504 - Failure!

2013-05-05 Thread Robert Muir
Doesn't reproduce on java7 or java 8.


Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b86) - Build # 5504 - Failure!

2013-05-05 Thread Robert Muir
I'm away from home, but I just thought: should have tried the master seed + 
tests.jvms from the grouping module. If we can repro, we could then start 
switching off hotspot flags and so on.

[jira] [Commented] (SOLR-4448) Allow the solr internal load balancer to be more easily pluggable.

2013-05-05 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649378#comment-13649378
 ] 

Shawn Heisey commented on SOLR-4448:


[~rjernst] first let me say that I am having this discussion because what 
you're saying goes against my limited understanding, and by stating what I 
think and listening to your response, I might learn something.  You probably 
already know the things that I am saying.  I might even find that I 
misunderstood what you were saying and that I agree with you.

bq. Load balancing is used by distributed search. It happens to also be used 
for uploading documents, which is a client feature. Clients shouldn't be using 
this for sending distributed search requests. Solr does that.

I've just done a non-detailed review of CloudSolrServer.  It uses a new 
LBHttpSolrServer object with a customized URL list for every request.  Queries 
get sent to all replicas, updates only get sent to leaders.  A TODO says that 
currently there is no support in the object for sending updates to the correct 
leader based on a hashing algorithm.

Outside of SolrCloud, the LB object makes sense for clients in master-slave 
replication environments, but only on the query side.  Updates have to be 
directed to the master only.  A separate load balancer does give you more 
flexibility, but not everyone wants to invest the time (or possibly money) 
required.

If the client on the server side and the client on the client side need 
identical functionality, then the existing situation makes sense -- one 
implementation in the org.apache.solr.client.solrj namespace.  If we think 
they'll ever diverge, even a little bit, then having an abstract class in the 
org.apache.solr.common namespace makes sense, although it should still be in 
the solrj source tree.
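
To make the query-side use case concrete, a minimal SolrJ sketch for a 
master-slave setup - the host URLs are placeholders:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MasterSlaveClientSketch {
  public static void main(String[] args) throws Exception {
    // Queries are load balanced across the slaves...
    LBHttpSolrServer queryServer = new LBHttpSolrServer(
        "http://slave1:8983/solr/collection1",
        "http://slave2:8983/solr/collection1");
    // ...while updates must be sent directly to the master.
    HttpSolrServer updateServer =
        new HttpSolrServer("http://master:8983/solr/collection1");

    QueryResponse rsp = queryServer.query(new SolrQuery("*:*"));
    System.out.println("hits: " + rsp.getResults().getNumFound());

    queryServer.shutdown();
    updateServer.shutdown();
  }
}
{code}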


 Allow the solr internal load balancer to be more easily pluggable.
 --

 Key: SOLR-4448
 URL: https://issues.apache.org/jira/browse/SOLR-4448
 Project: Solr
  Issue Type: New Feature
  Components: SolrCloud
Reporter: philip hoy
Priority: Minor
 Attachments: SOLR-4448.patch, SOLR-4448.patch


 Widen some access level modifiers to allow the load balancer to be extended 
 and plugged into an HttpShardHandler instance using an extended 
 HttpShardHandlerFactory.




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Christian Moen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649379#comment-13649379
 ] 

Christian Moen commented on LUCENE-4956:


Good points, Uwe.  I'll look into this.




[jira] [Commented] (SOLR-4448) Allow the solr internal load balancer to be more easily pluggable.

2013-05-05 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649384#comment-13649384
 ] 

Shawn Heisey commented on SOLR-4448:


My review of CloudSolrServer obviously wasn't deep enough.  I thought it was 
making a new LB object for every request, which seemed very inefficient.  Turns 
out it was making a new LBHttpSolrServer.Req object.  The Req class includes a 
URL list.
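
For context, a sketch of that per-request pattern as seen from a caller - the 
URLs are placeholders, and this is an illustration rather than CloudSolrServer's 
exact code:

{code:java}
import java.util.Arrays;

import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class PerRequestUrlListSketch {
  public static void main(String[] args) throws Exception {
    LBHttpSolrServer lb = new LBHttpSolrServer("http://host1:8983/solr");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "*:*");

    // Only a lightweight Req is created per request; it carries the URL list
    // for this request rather than a whole new load balancer.
    LBHttpSolrServer.Req req = new LBHttpSolrServer.Req(
        new QueryRequest(params),
        Arrays.asList("http://host1:8983/solr", "http://host2:8983/solr"));
    LBHttpSolrServer.Rsp rsp = lb.request(req);
    System.out.println(rsp.getResponse());

    lb.shutdown();
  }
}
{code}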




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649395#comment-13649395
 ] 

Steve Rowe commented on LUCENE-4956:


bq. Thanks, Steve. I've added the missing license header to 
TestKoreanAnalyzer.java.

I looked over the rest of the files, and the only things missing license 
headers are the dictionary files and the {{korean.properties}} file, all under 
{{src/resources/}}.  I committed a license header to {{korean.properties}}.

I tried adding '#'-commented-out headers to the .dic files (a couple of them 
already have '##' and '//##' lines), but that triggered a test failure, 
so more work will be needed before the license headers can live inline in the 
dictionary files.




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649404#comment-13649404
 ] 

Steve Rowe commented on LUCENE-4956:


bq. Could you comment about the origins and authorship of 
org.apache.lucene.analysis.kr.utils.StringUtil in your tar file?
I'm seeing a lot of authors in this file. Is this from Apache Commons Lang? 
Thanks!

I looked at the file content, and it's definitely from Apache Commons Lang, 
sometime early 2010, maybe with a little pulled in from another Commons Lang 
class.

I've eliminated StringUtil - it's almost all calls to StringUtils.split(String, 
separators) - its javadoc is:

{code:java}
/**
 * <p>Splits the provided text into an array, separators specified.
 * This is an alternative to using StringTokenizer.</p>
 *
 * <p>The separator is not included in the returned String array.
 * Adjacent separators are treated as one separator.
 * For more control over the split use the StrTokenizer class.</p>
 *
 * <p>A <code>null</code> input String returns <code>null</code>.
 * A <code>null</code> separatorChars splits on whitespace.</p>
 *
 * <pre>
 * StringUtil.split(null, *)           = null
 * StringUtil.split("", *)             = []
 * StringUtil.split("abc def", null)   = ["abc", "def"]
 * StringUtil.split("abc def", " ")    = ["abc", "def"]
 * StringUtil.split("abc  def", " ")   = ["abc", "def"]
 * StringUtil.split("ab:cd:ef", ":")   = ["ab", "cd", "ef"]
 * </pre>
 *
 * @param str  the String to parse, may be null
 * @param separatorChars  the characters used as the delimiters,
 *  <code>null</code> splits on whitespace
 * @return an array of parsed Strings, <code>null</code> if null String input
 */
{code}

I'm replacing calls to this method with calls to String.split(regex), where 
regex is {{[char]+}} and char is the (in all cases singular) split character.

I'll commit the changes and the StringUtil.java removal in a little bit once 
I've got it compiling and the tests succeed.
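
The {{[char]+}} form matters because the two methods differ on adjacent 
separators: Commons Lang collapses them, while plain String.split with a 
single-character regex emits empty tokens. A quick illustration:

{code:java}
public class SplitSemantics {
  public static void main(String[] args) {
    // Plain String.split keeps empty tokens between adjacent separators:
    String[] plain = "ab::cd".split(":");        // ["ab", "", "cd"]
    // Collapsing the separator in the regex matches the old StringUtils.split
    // behavior (adjacent separators treated as one):
    String[] collapsed = "ab::cd".split("[:]+"); // ["ab", "cd"]
    System.out.println(plain.length + " vs " + collapsed.length); // 3 vs 2
  }
}
{code}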




[jira] [Updated] (SOLR-4478) Allow cores to specify a named config set

2013-05-05 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-4478:
-

Attachment: SOLR-4478.patch

Updated patch with a problem. First I had the bright idea to interleave the 
configset-style and new-style core.properties files so we'd get some added 
testing done. Tests passed the first time! Except for the stack traces - it 
turns out I was eating an exception in the test that I shouldn't have been. 
Fortunately the stack trace seems to be thrown only by the new code.

But looking at the stack trace, there's an NPE at SearchHandler.180 or so, this 
line:
ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler();

Of course the shardHandlerFactory is null here.

So two things:

1) Does it even make sense to share the SolrConfig object? I can imagine all 
sorts of threading issues here, but I don't know the underlying code well enough 
to know whether to be terrified or not.

2) Any clue why the shardHandlerFactory would be null? Near as I can tell, the 
SolrResourceLoader.inform method is where the problem starts: it sets the 
"live" member variable, and later the NPE happens since the "live" member var 
aborts processing in the newInstance method.

And if it's as simple as giving each core a new ResourceLoader, is there any 
point, or is the work required at that point enough that sharing the solrconfig 
isn't worth the effort?

Of course it may just be a sequencing issue, but I'm a little lost today; any 
wisdom gratefully received.

 Allow cores to specify a named config set
 -

 Key: SOLR-4478
 URL: https://issues.apache.org/jira/browse/SOLR-4478
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.2, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-4478.patch, SOLR-4478.patch


 Part of moving forward to the new way, after SOLR-4196 etc... I propose an 
 additional parameter specified on the core node in solr.xml or as a 
 parameter in the discovery mode core.properties file, call it configSet, 
 where the value provided is a path to a directory, either absolute or 
 relative. Really, this is as though you copied the conf directory somewhere 
 to be used by more than one core.
 Straw-man: There will be a directory solr_home/configsets which will be the 
 default. If the configSet parameter is, say, myconf, then I'd expect a 
 directory named myconf to exist in solr_home/configsets, which would look 
 something like
 solr_home/configsets/myconf/schema.xml
   solrconfig.xml
   stopwords.txt
   velocity
   velocity/query.vm
 etc.
 If multiple cores used the same configSet, schema, solrconfig etc. would all 
 be shared (i.e. shareSchema=true would be assumed). I don't see a good 
 use-case for _not_ sharing schemas, so I don't propose to allow this to be 
 turned off. Hmmm, what if shareSchema is explicitly set to false in the 
 solr.xml or properties file? I'd guess it should be honored but maybe log a 
 warning?
 Mostly I'm putting this up for comments. I know that there are already 
 thoughts about how this all should work floating around, so before I start 
 any work on this I thought I'd at least get an idea of whether this is the 
 way people are thinking about going.
 Configset can be either a relative or absolute path, if relative it's assumed 
 to be relative to solr_home.
 Thoughts?
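
For illustration, one reading of the straw-man's path resolution, with 
hypothetical names (absolute paths used as-is, bare names resolved under 
solr_home/configsets):

{code:java}
import java.io.File;

final class ConfigSetResolverSketch {
  /** Resolves a configSet value per the straw-man above; names are hypothetical. */
  static File resolve(File solrHome, String configSet) {
    File dir = new File(configSet);
    if (dir.isAbsolute()) {
      return dir;
    }
    // e.g. configSet=myconf -> solr_home/configsets/myconf
    return new File(new File(solrHome, "configsets"), configSet);
  }
}
{code}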




[jira] [Commented] (SOLR-4448) Allow the solr internal load balancer to be more easily pluggable.

2013-05-05 Thread Ryan Ernst (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649407#comment-13649407
 ] 

Ryan Ernst commented on SOLR-4448:
--

My apologies.  I do not use SolrJ and was making some bad assumptions.  I see 
now that a client would use this to round-robin between all the hosts in a 
cluster for top-level requests, and then Solr would *also* use a different LB 
(running in Solr instead of the client) for distributing requests to slices.

As I've said here, I'm fine with this staying in SolrJ.  I only hope (in the 
future, not asking for it here) to see a better abstraction for the load 
balancer.




[jira] [Comment Edited] (SOLR-4478) Allow cores to specify a named config set

2013-05-05 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649405#comment-13649405
 ] 

Erick Erickson edited comment on SOLR-4478 at 5/5/13 7:58 PM:
--

Updated patch with a problem. First I had the bright idea to interleave the 
configset-style and new-style core.properties files so we'd get some added 
testing done in OpenCloseCoreStressTest. Tests passed the first time! Except 
for the stack traces - it turns out I was eating an exception in the test that 
I shouldn't have been. Fortunately the stack trace seems to be thrown only by 
the new code. NOTE: There's a nocommit in OpenCloseCoreStressTest that forces 
all cores to be configset-only for easier debugging on this issue.

But looking at the stack trace, there's an NPE at SearchHandler.180 or so, this 
line:
ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler();

Of course the shardHandlerFactory is null here.

So two things:

1) Does it even make sense to share the SolrConfig object? I can imagine all 
sorts of threading issues here, but I don't know the underlying code well enough 
to know whether to be terrified or not.

2) Any clue why the shardHandlerFactory would be null? Near as I can tell, the 
SolrResourceLoader.inform method is where the problem starts: it sets the 
"live" member variable, and later the NPE happens since the "live" member var 
aborts processing in the newInstance method.

And if it's as simple as giving each core a new ResourceLoader, is there any 
point, or is the work required at that point enough that sharing the solrconfig 
isn't worth the effort?

Of course it may just be a sequencing issue, but I'm a little lost today; any 
wisdom gratefully received.


[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649410#comment-13649410
 ] 

Commit Tag Bot commented on LUCENE-4956:


[lucene4956 commit] sarowe
http://svn.apache.org/viewvc?view=revision&revision=1479362

LUCENE-4956: Remove o.a.l.analysis.kr.utils.StringUtil and all calls to it 
(mostly StringUtil.split, replaced with String.split)




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649420#comment-13649420
 ] 

Steve Rowe commented on LUCENE-4956:


This looks like a typo to me, in {{KoreanEnv.java}} - the second 
{{FILE_DICTIONARY}} should instead be {{FILE_EXTENSION}}:

{code:java}
/**
 * Initialize the default property values.
 */
private void initDefaultProperties() {
  defaults = new Properties();

  defaults.setProperty(FILE_SYLLABLE_FEATURE, "org/apache/lucene/analysis/kr/dic/syllable.dic");
  defaults.setProperty(FILE_DICTIONARY, "org/apache/lucene/analysis/kr/dic/dictionary.dic");
  defaults.setProperty(FILE_DICTIONARY, "org/apache/lucene/analysis/kr/dic/extension.dic");
{code}
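
Presumably the second {{FILE_DICTIONARY}} line should then read:

{code:java}
  defaults.setProperty(FILE_EXTENSION, "org/apache/lucene/analysis/kr/dic/extension.dic");
{code}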





[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649421#comment-13649421
 ] 

Commit Tag Bot commented on LUCENE-4956:


[lucene4956 commit] sarowe
http://svn.apache.org/viewvc?view=revision&revision=1479386

LUCENE-4956: fix typo




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649424#comment-13649424
 ] 

Commit Tag Bot commented on LUCENE-4956:


[lucene4956 commit] sarowe
http://svn.apache.org/viewvc?view=revision&revision=1479391

LUCENE-4956: Add license headers to dictionary files, and modify 
FileUtil.readLines() to ignore lines beginning with comment char '!'
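
A sketch of the comment-skipping read that commit message describes - 
illustrative only, not the committed FileUtil code:

{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public final class CommentSkippingReadLines {
  /** Reads all lines, dropping those that start with the comment char '!'. */
  public static List<String> readLines(Reader input) throws IOException {
    List<String> lines = new ArrayList<String>();
    BufferedReader reader = new BufferedReader(input);
    String line;
    while ((line = reader.readLine()) != null) {
      if (!line.startsWith("!")) { // '!' marks license/comment lines in the .dic files
        lines.add(line);
      }
    }
    return lines;
  }
}
{code}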




[jira] [Commented] (LUCENE-4976) PersistentSnapshotDeletionPolicy should save to a single file

2013-05-05 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649428#comment-13649428
 ] 

Commit Tag Bot commented on LUCENE-4976:


[trunk commit] mikemccand
http://svn.apache.org/viewvc?view=revision&revision=1479394

LUCENE-4976: add missing sync / delete old save files

 PersistentSnapshotDeletionPolicy should save to a single file
 -

 Key: LUCENE-4976
 URL: https://issues.apache.org/jira/browse/LUCENE-4976
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.4

 Attachments: LUCENE-4976.patch, LUCENE-4976.patch


 Today it creates a single-document Lucene index, and calls commit() after 
 each snapshot/release.
 I think we can just use a single file instead, and remove Closeable.
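
A sketch of the single-file scheme, including the sync and old-generation 
cleanup the commits above mention - names are illustrative, not the patch's:

{code:java}
import java.io.IOException;
import java.util.Collections;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;

final class SnapshotsFileSketch {
  private long nextGen;

  /** Writes snapshots_N, syncs it to disk, then deletes the previous generation. */
  void persist(Directory dir, byte[] state) throws IOException {
    String fileName = "snapshots_" + nextGen;
    IndexOutput out = dir.createOutput(fileName, IOContext.DEFAULT);
    try {
      out.writeBytes(state, state.length);
    } finally {
      out.close();
    }
    dir.sync(Collections.singleton(fileName));      // the "missing sync"
    if (nextGen > 0) {
      dir.deleteFile("snapshots_" + (nextGen - 1)); // "delete old save files"
    }
    nextGen++;
  }
}
{code}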




[jira] [Commented] (LUCENE-4976) PersistentSnapshotDeletionPolicy should save to a single file

2013-05-05 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649430#comment-13649430
 ] 

Commit Tag Bot commented on LUCENE-4976:


[branch_4x commit] mikemccand
http://svn.apache.org/viewvc?view=revision&revision=1479395

LUCENE-4976: add missing sync / delete old save files




[jira] [Created] (LUCENE-4979) LiveFieldValues should accept any ReferenceManager

2013-05-05 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-4979:
--

 Summary: LiveFieldValues should accept any ReferenceManager
 Key: LUCENE-4979
 URL: https://issues.apache.org/jira/browse/LUCENE-4979
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.4
 Attachments: LUCENE-4979.patch

Today it requires ReferenceManager<IndexSearcher> but it doesn't rely on that 
at all (it just forwards that IndexSearcher to the subclass's lookup method).
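
A sketch of the generalization - hypothetical class and method names, not the 
committed API:

{code:java}
import java.io.IOException;

import org.apache.lucene.search.ReferenceManager;

// Accept any ReferenceManager<S> instead of requiring ReferenceManager<IndexSearcher>:
abstract class LiveFieldValuesSketch<S, T> implements ReferenceManager.RefreshListener {
  private final ReferenceManager<S> mgr;

  protected LiveFieldValuesSketch(ReferenceManager<S> mgr) {
    this.mgr = mgr;
    mgr.addListener(this);
  }

  public T get(String id) throws IOException {
    S ref = mgr.acquire();
    try {
      return lookup(ref, id); // the reference is just forwarded to the subclass
    } finally {
      mgr.release(ref);
    }
  }

  /** Subclass resolves the id against the current reference. */
  protected abstract T lookup(S ref, String id) throws IOException;

  @Override
  public void beforeRefresh() {}

  @Override
  public void afterRefresh(boolean didRefresh) {}
}
{code}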




[jira] [Updated] (LUCENE-4979) LiveFieldValues should accept any ReferenceManager

2013-05-05 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4979:
---

Attachment: LUCENE-4979.patch

Simple patch ...




[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649436#comment-13649436
 ] 

Steve Rowe commented on LUCENE-4956:


I added license headers to the dictionary files, so AFAICT all files now have 
Apache License headers.

I've updated 
[http://incubator.apache.org/ip-clearance/lucene-korean-analyzer.html] - it 
looks ready to go to me.  (Again, I can only control the XML version of 
this, at 
[http://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/lucene-korean-analyzer.xml],
 so it might be a day or so before the HTML version catches up.)

I think we're ready for the incubator-general vote.  [~cm], do you agree?

We don't need to wait for the vote result to continue making improvements, e.g. 
tabs-to-spaces, svn:eol-style=native, etc. - the vote email will point to the 
revision on the branch we think is vote-worthy: r1479391.


 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 The Korean language has specific characteristics. When developing a search 
 service with Lucene & Solr in Korean, there are some problems in searching and 
 indexing. The Korean analyzer solves these problems with a Korean morphological 
 analyzer. It consists of a Korean morphological analyzer, dictionaries, a 
 Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene 
 and Solr. If you develop a Korean search service with Lucene, the Korean 
 analyzer is the best choice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649404#comment-13649404
 ] 

Steve Rowe edited comment on LUCENE-4956 at 5/5/13 9:32 PM:


{quote}
Could you comment about the origins and authorship of 
org.apache.lucene.analysis.kr.utils.StringUtil in your tar file?
I'm seeing a lot of authors in this file. Is this from Apache Commons Lang? 
Thanks!
{quote}

I looked at the file content, and it's definitely from Apache Commons Lang (the 
class is named {{StringUtils}} there, renamed {{StringUtil}} here), circa early 
2010, maybe with a little pulled in from another Commons Lang class.

I've eliminated StringUtil - it's almost all calls to StringUtils.split(String, 
separators) - its javadoc is:

{code:java}
/**
 * <p>Splits the provided text into an array, separators specified.
 * This is an alternative to using StringTokenizer.</p>
 *
 * <p>The separator is not included in the returned String array.
 * Adjacent separators are treated as one separator.
 * For more control over the split use the StrTokenizer class.</p>
 *
 * <p>A <code>null</code> input String returns <code>null</code>.
 * A <code>null</code> separatorChars splits on whitespace.</p>
 *
 * <pre>
 * StringUtil.split(null, *)         = null
 * StringUtil.split("", *)           = []
 * StringUtil.split("abc def", null) = ["abc", "def"]
 * StringUtil.split("abc def", " ")  = ["abc", "def"]
 * StringUtil.split("abc  def", " ") = ["abc", "def"]
 * StringUtil.split("ab:cd:ef", ":") = ["ab", "cd", "ef"]
 * </pre>
 *
 * @param str  the String to parse, may be null
 * @param separatorChars  the characters used as the delimiters,
 *  <code>null</code> splits on whitespace
 * @return an array of parsed Strings, <code>null</code> if null String input
 */
{code}

I'm replacing calls to this method with calls to String.split(regex), where 
regex is [char]+, and char is the (in all cases singular) split character.
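
For illustration (not part of the commit), the property being relied on - 
adjacent separators collapsing to one - holds for the regex replacement, with 
one caveat:

{code:java}
import java.util.Arrays;

public class SplitCheck {
  public static void main(String[] args) {
    // Adjacent separators are treated as one, as in StringUtils.split:
    System.out.println(Arrays.toString("abc  def".split(" +"))); // [abc, def]
    System.out.println(Arrays.toString("ab:cd:ef".split(":+"))); // [ab, cd, ef]
    // Caveat: unlike StringUtils.split, String.split keeps a leading empty
    // token when the input starts with a separator:
    System.out.println(Arrays.toString(":ab".split(":+")));      // [, ab]
  }
}
{code}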

I'll commit the changes and the StringUtil.java removal in a little bit once 
I've got it compiling and the tests succeed.

  was (Author: steve_rowe):
bq. Could you comment about the origins and authorship of 
org.apache.lucene.analysis.kr.utils.StringUtil in your tar file?
I'm seeing a lot of authors in this file. Is this from Apache Commons Lang? 
Thanks!

I looked at the file content, and it's definitely from Apache Commons Lang, 
sometime early 2010, maybe with a little pulled in from another Commons Lang 
class.

I've eliminated StringUtil - it's almost all calls to StringUtils.split(String, 
separators) - its javadoc is:

{code:java}
/**
 * <p>Splits the provided text into an array, separators specified.
 * This is an alternative to using StringTokenizer.</p>
 *
 * <p>The separator is not included in the returned String array.
 * Adjacent separators are treated as one separator.
 * For more control over the split use the StrTokenizer class.</p>
 *
 * <p>A <code>null</code> input String returns <code>null</code>.
 * A <code>null</code> separatorChars splits on whitespace.</p>
 *
 * <pre>
 * StringUtil.split(null, *)         = null
 * StringUtil.split("", *)           = []
 * StringUtil.split("abc def", null) = ["abc", "def"]
 * StringUtil.split("abc def", " ")  = ["abc", "def"]
 * StringUtil.split("abc  def", " ") = ["abc", "def"]
 * StringUtil.split("ab:cd:ef", ":") = ["ab", "cd", "ef"]
 * </pre>
 *
 * @param str  the String to parse, may be null
 * @param separatorChars  the characters used as the delimiters,
 *  <code>null</code> splits on whitespace
 * @return an array of parsed Strings, <code>null</code> if null String input
 */
{code}

I'm replacing calls to this method with calls to String.split(regex), where 
regex is [char]+, and char is the (in all cases singular) split character.

I'll commit the changes and the StringUtil.java removal in a little bit once 
I've got it compiling and the tests succeed.
  
 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 The Korean language has specific characteristics. When developing a search 
 service with Lucene & Solr in Korean, there are some problems in searching and 
 indexing. The Korean analyzer solves these problems with a Korean morphological 
 analyzer. It consists of a Korean morphological analyzer, dictionaries, a 
 Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene 
 and Solr. If you develop a Korean search service with Lucene, the Korean 
 analyzer is the best choice.

--
This message is automatically 

Re: VOTE: solr no longer webapp

2013-05-05 Thread Mark Miller

On May 5, 2013, at 9:30 AM, Gora Mohanty g...@mimirtech.com wrote:

 Given the reasons put forward, I have seen the error of my ways, and
 am now +1 for dropping the .war. Phasing in this change, with a
 deprecation announced in the 4.x series, would help users ease
 into the change.

My main argument is really that your imagination can run wild. I can't even 
predict all the good things that can come out of owning this layer of the app. 
The same way we couldn't predict all the changes that would come with the 
freedom of flexible indexing. I also don't think we need to drop the war right 
away - the first step is a mindset change. Start calling this an implementation 
detail in version whatever. It's hard to say who will have the time to do what 
work here when - but removing limits will allow many changes that are hard to 
impossible in a webapp world.

Help, I'm stuck in a webapp and I can't get out!

- Mark
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649445#comment-13649445
 ] 

Steve Rowe commented on LUCENE-4956:


[~smlee0818], I don't understand the following method in 
{{WordSpaceAnalyzer.java}} - what's the point of the method always returning 
false? (i.e.: {{if(true) return false;}}): 

{code:java}
private boolean isNounPart(String str, int jstart) throws MorphException  {

  if(true) return false;

  for(int i=jstart-1;i>=0;i--) {
if(DictionaryUtil.getWordExceptVerb(str.substring(i,jstart+1))!=null)
  return true;
  }

  return false;
}
{code}

{{isNounPart()}} is only called from one method in the same class: 
{{findJosaEnd(snipt,jstart)}}:

{code:java}
if(DictionaryUtil.existJosa(str) && !findNounWithinStr(snipt,i,i+2) && 
    !isNounPart(snipt,jstart)) {
{code}

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 The Korean language has specific characteristics. When developing a search 
 service with Lucene & Solr in Korean, there are some problems in searching and 
 indexing. The Korean analyzer solves these problems with a Korean morphological 
 analyzer. It consists of a Korean morphological analyzer, dictionaries, a 
 Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene 
 and Solr. If you develop a Korean search service with Lucene, the Korean 
 analyzer is the best choice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: VOTE: solr no longer webapp

2013-05-05 Thread Gora Mohanty
On 6 May 2013 03:23, Mark Miller markrmil...@gmail.com wrote:

 On May 5, 2013, at 9:30 AM, Gora Mohanty g...@mimirtech.com wrote:

 Given the reasons put forward, I have seen the error of my ways, and
 am now +1 for dropping the .war. Phasing in this change, with a
 deprecation announced in the 4.x series, would help users ease
 into the change.

 My main argument is really that your imagination can run wild. I can't even 
 predict all the good things that can come out of owning this layer of the 
 app. The same way we couldn't predict all the changes that would come with 
 the freedom of flexible indexing. I also don't think we need to drop the war 
 right away - the first step is a mindset change. Start calling this an 
 implementation detail in version whatever. It's hard to say who will have the 
 time to do what work here when - but removing limits will allow many changes 
 that are hard to impossible in a webapp world.

 Help, I'm stuck in a webapp and I can't get out!

Completely agreed: As I said, I have seen the error of my ways :-)

People arguing for this change are right, and your point about a
mindset change is important. Being largely an end-user of Solr,
for me it took the arguments put forward in this thread to make
me realise the importance of such a mindset change. IMHO, the
intermediate step of deprecation is also important so that users
who do view Solr as a webapp, manage multiple installations
from such a perspective, and might not be subscribed to the dev
list are given reasonable warning.

Regards,
Gora

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4980) Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest

2013-05-05 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-4980:
--

 Summary: Can't use DrillSideways with both RangeFacetRequest and 
non-RangeFacetRequest
 Key: LUCENE-4980
 URL: https://issues.apache.org/jira/browse/LUCENE-4980
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.4
 Attachments: LUCENE-4980.patch

I tried to combine these two and there were several issues:

  * It's ... really tricky to manage the two different
FacetAccumulators across the N FacetCollectors that DrillSideways
creates ... to fix this I added a new MultiFacetsAccumulator that
switches for you.

  * There was still one place in DS/DDQ that wasn't properly handling
a non-Term drill-down.

  * There was a bug in the collector method for DrillSideways
whereby if a given segment had no hits, it was skipped, which is
incorrect because it must still be visited to tally up the
sideways counts.

  * Separately I noticed that DrillSideways was doing too much work:
it would count up drill-down counts *and* drill-sideways counts
against the same dim (but then discard the drill-down counts in
the end).


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4980) Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest

2013-05-05 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4980:
---

Attachment: LUCENE-4980.patch

Patch w/ test case + fixes.

 Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest
 -

 Key: LUCENE-4980
 URL: https://issues.apache.org/jira/browse/LUCENE-4980
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.4

 Attachments: LUCENE-4980.patch


 I tried to combine these two and there were several issues:
   * It's ... really tricky to manage the two different
  FacetAccumulators across the N FacetCollectors that DrillSideways
 creates ... to fix this I added a new MultiFacetsAccumulator that
 switches for you.
   * There was still one place in DS/DDQ that wasn't properly handling
 a non-Term drill-down.
   * There was a bug in the collector method for DrillSideways
 whereby if a given segment had no hits, it was skipped, which is
 incorrect because it must still be visited to tally up the
 sideways counts.
   * Separately I noticed that DrillSideways was doing too much work:
 it would count up drill-down counts *and* drill-sideways counts
 against the same dim (but then discard the drill-down counts in
 the end).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: solr no longer webapp

2013-05-05 Thread Jack Krupansky
I don't think that I have ever thought of Solr as an application. I mean, it 
is a tool/server that application developers use to develop THEIR application. 
I've always thought of Solr as a server, and it has only been incidental (and 
fairly annoying) that it has been packaged as a webapp for deployment in a 
container.

If anything, I think of Solr as a server or app server that supports 
development and deployment of search apps - coded primarily in XML. Hardly 
your typical webapp.

Are there any examples of a tool/server comparable to Solr’s role in 
“application development” and support for “search apps” that happily live as a 
mere “webapp”?

That said, I'll stay agnostic for now as to the disposition of the war and 
support for containers such as Tomcat. Keeping war support in 4.x and dumping 
it in 5.0 seems fine, but...

Meanwhile, I'd like to see the Solr example grow up into a 
production/deployment example.

-- Jack Krupansky

From: Walter Underwood 
Sent: Saturday, May 04, 2013 6:35 PM
To: dev@lucene.apache.org 
Subject: Re: VOTE: solr no longer webapp

Let's not reject the classic replication mode out of hand. The tight coupling 
in Solr Cloud brings along a host of failure modes. 

For systems that do not have tight freshness requirements, regular old 
replication is awesome. The loose coupling allows extremely simple failure 
recovery. 

For example, AWS region failover is trivial with good ol' Solr replication. 
With Solr Cloud, I cannot figure out a way to do it. That happens to be a 
requirement from ops.

Also, comparing this to MySQL and Postgres is silly. There is no web framework 
for C or C++. Instead, let's name a few major Java applications that do not use 
servlet containers.

wunder

On May 4, 2013, at 2:25 PM, Mark Miller wrote:

  Supporting both just compounds our problems and doesn't go very far towards 
  solving any.

  The only place the webapp will end up still making sense after a bit of time 
  is in non-SolrCloud mode. The improvements it will bring will make it a dumb 
  choice if you use SolrCloud. We already have enough baggage holding up 
  SolrCloud because of supporting the std mode - adding to the list only makes 
  my life even harder.

  We need to reduce the number of configurations we ship, not multiply them. I 
  believe *very* strongly. We must start to focus the beam; there is already 
  too much diffraction.

  - Mark

  On May 4, 2013, at 2:09 PM, Grant Ingersoll gsing...@apache.org wrote:

    Why not just support both?  It really isn't all that hard.  While I agree 
    w/ Robert and Mark that it's time to consider alternatives, I also don't 
    think it is all that hard to support both from a user's perspective.  We 
    could have a Netty (or other) version and a WAR version and I don't think 
    it is that big of a deal to maintain.  After all, we already have the 
    component pieces as JARs, just bundle them up differently for Netty, etc.

    -Grant

    On May 3, 2013, at 8:09 PM, Otis Gospodnetic wrote:

      But I'm really curious, what is the problem with Solr inside a 
      container?  Which problem is this solving?  I feel like I missed some 
      important thread which is highly possible. :)

      Thanks,
      Otis

      On Fri, May 3, 2013 at 1:14 PM, Robert Muir rcm...@gmail.com wrote:

        # rm -rf tomcat
        # gzip -dc solr.tgz | tar -xvf -
        # cd solr/example
        # java -jar start.jar

        On Fri, May 3, 2013 at 9:52 AM, Steve Molloy smol...@opentext.com wrote:

          So, if ever this passes, what would be the upgrade path for all the 
          deployments using Solr as a webapp inside tomcat or other container?

          From: Michael McCandless [luc...@mikemccandless.com]
          Sent: May 3, 2013 12:09 PM
          To: Lucene/Solr dev
          Subject: Re: VOTE: solr no longer webapp

          On Fri, May 3, 2013 at 12:14 AM, Robert Muir rcm...@gmail.com wrote:

            I think solr should no longer be a war file but a search app. I 
            don't care how it accomplishes this: jetty, netty, it's all up to 
            us.

            Let me know your ideas: I think it's a necessary step to move 
            solr forwards.

          +1

          Mike McCandless

          http://blog.mikemccandless.com

          -
          To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
          For additional commands, e-mail: dev-h...@lucene.apache.org

        -
        To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
        For additional commands, e-mail: dev-h...@lucene.apache.org

  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
  For additional commands, 

Re: VOTE: solr no longer webapp

2013-05-05 Thread Grant Ingersoll
So, if we said 5.x was non-WAR, we could move forward with it and maintain 4.x 
as WAR.

I'll put my vote at +0.5.  There are a lot of people using Solr who are 
putting it into a standard, corporate-approved web container.  I could see that 
argument going both ways.  On the one hand, no one asks what container MySQL 
runs in; on the other, people have been trained for a lot of years on Solr as a 
WAR.

Frankly, I like how Restlet handles this stuff, for the most part.  Jetty (or 
other containers) are an implementation detail.

-Grant


On May 4, 2013, at 5:25 PM, Mark Miller wrote:

 Supporting both just compounds our problems and doesn't go very far towards 
 solving any.
 
 The only place the webapp will end up still making sense after a bit of time 
 is in non-SolrCloud mode. The improvements it will bring will make it a dumb 
 choice if you use SolrCloud. We already have enough baggage holding up 
 SolrCloud because of supporting the std mode - adding to the list only makes 
 my life even harder.
 
 We need to reduce the number of configurations we ship, not multiply them. I 
 believe *very* strongly. We must start to focus the beam; there is already too 
 much diffraction.
 
 - Mark
 
 On May 4, 2013, at 2:09 PM, Grant Ingersoll gsing...@apache.org wrote:
 
 Why not just support both?  It really isn't all that hard.  While I agree w/ 
 Robert and Mark that it's time to consider alternatives, I also don't think 
 it is all that hard to support both from a user's perspective.  We could 
 have a Netty(or other) version and a WAR version and I don't think it is 
 that big of a deal to maintain.  After all, we already have the component 
 pieces as JARs, just bundle them up differently for Netty, etc.
 
 -Grant
 
 
 On May 3, 2013, at 8:09 PM, Otis Gospodnetic wrote:
 
 But I'm really curious, what is the problem with Solr inside a
 container?  Which problem is this solving?  I feel like I missed some
 important thread which is highly possible. :)
 
 Thanks,
 Otis
 
 
 
 
 
 On Fri, May 3, 2013 at 1:14 PM, Robert Muir rcm...@gmail.com wrote:
 
 # rm -rf tomcat
 # gzip -dc solr.tgz | tar -xvf -
 # cd solr/example
 # java -jar start.jar
 
 
 On Fri, May 3, 2013 at 9:52 AM, Steve Molloy smol...@opentext.com wrote:
 
 So, if ever this passes, what would be the upgrade path for all the
 deployments using Solr as a webapp inside tomcat or other container?
 
 From: Michael McCandless [luc...@mikemccandless.com]
 Sent: May 3, 2013 12:09 PM
 To: Lucene/Solr dev
 Subject: Re: VOTE: solr no longer webapp
 
 On Fri, May 3, 2013 at 12:14 AM, Robert Muir rcm...@gmail.com wrote:
 I think solr should no longer be a war file but a search app. I don't
 care how it accomplishes this: jetty, netty, it's all up to us.
 
 Let me know your ideas: I think it's a necessary step to move solr
 forwards.
 
 +1
 
 Mike McCandless
 
 http://blog.mikemccandless.com
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 Grant Ingersoll | @gsingers
 http://www.lucidworks.com
 
 
 
 
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 


Grant Ingersoll | @gsingers
http://www.lucidworks.com







[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

2013-05-05 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649452#comment-13649452
 ] 

Commit Tag Bot commented on LUCENE-4956:


[lucene4956 commit] sarowe
http://svn.apache.org/viewvc?view=revision&revision=1479410

LUCENE-4956:
- svn:eol-style -> native
- tabs -> spaces
- regularized java code indents to 2 spaces per level

 the korean analyzer that has a korean morphological analyzer and dictionaries
 -

 Key: LUCENE-4956
 URL: https://issues.apache.org/jira/browse/LUCENE-4956
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
  Labels: newbie
 Attachments: kr.analyzer.4x.tar


 The Korean language has specific characteristics. When developing a search 
 service with Lucene & Solr in Korean, there are some problems in searching and 
 indexing. The Korean analyzer solves these problems with a Korean morphological 
 analyzer. It consists of a Korean morphological analyzer, dictionaries, a 
 Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene 
 and Solr. If you develop a Korean search service with Lucene, the Korean 
 analyzer is the best choice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4981) Deprecate PositionFilter

2013-05-05 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-4981:


 Summary: Deprecate PositionFilter
 Key: LUCENE-4981
 URL: https://issues.apache.org/jira/browse/LUCENE-4981
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor


According to the documentation 
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory),
 PositionFilter is mainly useful to make query parsers generate boolean queries 
instead of phrase queries, although this problem can be solved at query parsing 
level instead of analysis level (e.g. using 
QueryParser.setAutoGeneratePhraseQueries).
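
For reference, the query-parser-level alternative mentioned above looks like 
this in Lucene 4.x (field name and analyzer here are placeholders):

{code:java}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class AutoPhraseDemo {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_42);
    QueryParser qp = new QueryParser(Version.LUCENE_42, "body", analyzer);
    // false (the default) emits a BooleanQuery when analysis yields several
    // tokens for one term; true emits a PhraseQuery instead.
    qp.setAutoGeneratePhraseQueries(false);
    Query q = qp.parse("wi-fi");
    System.out.println(q); // e.g. "body:wi body:fi" with StandardAnalyzer
  }
}
{code}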

So given that PositionFilter corrupts token graphs (see TestRandomChains), I 
propose to deprecate it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene

2013-05-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649457#comment-13649457
 ] 

Michael McCandless commented on LUCENE-4975:


+1 to commit and iterate from here on... this new module looks very nice!

I like the new testConsistencyOnException ... maybe also call 
MDW.setRandomIOExceptionRateOnOpen?  This will additionally randomly throw 
exceptions from openInput/createOutput.
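
For context, a sketch of what that suggestion would add to the test (the rates 
are arbitrary; newMockDirectory() and MockDirectoryWrapper come from the Lucene 
test framework):

{code:java}
import org.apache.lucene.store.MockDirectoryWrapper;
import org.apache.lucene.util.LuceneTestCase;

public class TestConsistencyOnException extends LuceneTestCase {
  public void testConsistencyOnException() throws Exception {
    MockDirectoryWrapper dir = newMockDirectory();
    dir.setRandomIOExceptionRate(0.1);        // randomly throw from reads/writes
    dir.setRandomIOExceptionRateOnOpen(0.05); // also throw from openInput/createOutput
    // ... exercise the replication code against dir ...
    dir.close();
  }
}
{code}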

 Add Replication module to Lucene
 

 Key: LUCENE-4975
 URL: https://issues.apache.org/jira/browse/LUCENE-4975
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Shai Erera
Assignee: Shai Erera
 Attachments: LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch, 
 LUCENE-4975.patch


 I wrote a replication module which I think will be useful to Lucene users who 
 want to replicate their indexes, e.g. for high availability, taking hot backups, 
 etc.
 I will upload a patch soon where I'll describe in general how it works.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Including JTS in an Apache project

2013-05-05 Thread Ahmed Eldawy
Hi all,
 I saw that Apache Solr uses JTS (Java Topology Suite) [
http://www.vividsolutions.com/jts/JTSHome.htm] for supporting a spatial
data type [http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4].
Using JTS in an Apache project is not a straightforward thing, as JTS is
licensed under the LGPL, which has some compatibility issues when included
in an Apache project. Now, I need to do something very similar in another
Apache project (Pig [http://pig.apache.org/]) and I'm faced with the same
licensing issue. I'm asking for your advice on the best way to use JTS
without violating its license. Does referring to JTS classes from the code
of an Apache project, without actually including the classes, violate the
license? Do we have to load the classes dynamically (using Class#forName),
or is there another way to do it?
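
For illustration, the dynamic-loading approach mentioned above would look
roughly like this (whether it satisfies the licensing policy is exactly the
open question; the JTS class and method names are real, the pattern is generic):

    public class JtsViaReflection {
      public static void main(String[] args) throws Exception {
        // Reflection-only use of JTS: no compile-time dependency on the LGPL jar.
        Class<?> wktReaderClass = Class.forName("com.vividsolutions.jts.io.WKTReader");
        Object wktReader = wktReaderClass.getConstructor().newInstance();
        Object geometry = wktReaderClass.getMethod("read", String.class)
            .invoke(wktReader, "POINT (30 10)");
        System.out.println(geometry);
      }
    }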
Thanks in advance

Best regards,
Ahmed Eldawy


[jira] [Created] (SOLR-4787) Join Contrib

2013-05-05 Thread Joel Bernstein (JIRA)
Joel Bernstein created SOLR-4787:


 Summary: Join Contrib
 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.2.1



This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

PostFilterQParserPlugin aka pjoin

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin, but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
"to" and "from" cores.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
"join" to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the "from" 
field that will be used to filter the main query. Only records from the main 
query where the "to" field is present in the "from" list will be included in 
the results.

The solrconfig.xml in the main query core must contain the reference to the 
pjoin:

<queryParser name="pjoin" 
class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>

And the join contrib jars must be registered in the solrconfig.xml:

<lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />

The solrconfig.xml in the "from" core must have the "join" SolrCache configured:

<cache name="join"
       class="solr.LRUCache"
       size="4096"
       initialSize="1024"/>
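
To make the post-filter behavior concrete, here is a hypothetical sketch (not 
the attached SOLR-4787.patch) of what the pjoin's core collect loop amounts 
to, against the Solr 4.x DelegatingCollector and Lucene FieldCache APIs; the 
field name and the way fromKeys is gathered are assumptions:

{code:java}
import java.io.IOException;
import java.util.Set;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.FieldCache;
import org.apache.solr.search.DelegatingCollector;

// Hypothetical sketch of a post-filter join on integer keys: only documents
// whose "to" key appeared among the "from" core's result keys are forwarded.
public class JoinPostFilterCollector extends DelegatingCollector {

  private final Set<Integer> fromKeys; // keys gathered by querying the "from" core
  private final String toField;        // e.g. "id_i"
  private FieldCache.Ints toKeys;      // per-segment "to" field values

  public JoinPostFilterCollector(Set<Integer> fromKeys, String toField) {
    this.fromKeys = fromKeys;
    this.toField = toField;
  }

  @Override
  public void setNextReader(AtomicReaderContext context) throws IOException {
    toKeys = FieldCache.DEFAULT.getInts(context.reader(), toField, false);
    super.setNextReader(context);
  }

  @Override
  public void collect(int doc) throws IOException {
    if (fromKeys.contains(toKeys.get(doc))) {
      super.collect(doc); // document survives the join filter
    }
  }
}
{code}

The real plugin would also have to build and cache fromKeys, via the "join" 
SolrCache described above.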


JoinValueSourceParserPlugin aka vjoin

The second implementation is the JoinValueSourceParserPlugin aka "vjoin". This 
implements a ValueSource function query that can return values from a second 
core based on join keys. This allows relevance data to be stored in a separate 
core and then joined in the main query.

The vjoin is called using the "vjoin" function query. For example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows vjoin being called by the edismax boost function 
parameter. This example will return the "fromVal" from the "fromCore". The 
"fromKey" and "toKey" are used to link the records from the main query to the 
records in the "fromCore".

As with the pjoin, both the "fromKey" and "toKey" must be integers. Also like 
the pjoin, the "join" SolrCache is used to hold the join memory structures.

To configure the vjoin you must register the ValueSource plugin in the 
solrconfig.xml as follows:

<valueSourceParser name="vjoin" 
class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
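
Similarly, a hypothetical sketch of the shape of such a ValueSource against 
the Lucene 4.x function-query API (again not the attached patch; the map from 
"from" keys to values is assumed to be built and cached elsewhere, e.g. via 
the "join" SolrCache):

{code:java}
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.IntDocValues;
import org.apache.lucene.search.FieldCache;

// Hypothetical sketch of the vjoin idea: map each main-query document's
// integer "toKey" to the "fromVal" gathered from the other core.
public class JoinValueSource extends ValueSource {

  private final String toKeyField;
  private final Map<Integer, Integer> fromKeyToVal;

  public JoinValueSource(String toKeyField, Map<Integer, Integer> fromKeyToVal) {
    this.toKeyField = toKeyField;
    this.fromKeyToVal = fromKeyToVal;
  }

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
      throws IOException {
    final FieldCache.Ints keys =
        FieldCache.DEFAULT.getInts(readerContext.reader(), toKeyField, false);
    return new IntDocValues(this) {
      @Override
      public int intVal(int doc) {
        Integer val = fromKeyToVal.get(keys.get(doc));
        return val == null ? 0 : val; // 0 when there is no matching "from" record
      }
    };
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof JoinValueSource
        && toKeyField.equals(((JoinValueSource) o).toKeyField);
  }

  @Override
  public int hashCode() {
    return toKeyField.hashCode();
  }

  @Override
  public String description() {
    return "vjoin(" + toKeyField + ")";
  }
}
{code}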







--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4787) Join Contrib

2013-05-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Description: 
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*PostFilterQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin, but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
"to" and "from" cores.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
"join" to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the "from" 
field that will be used to filter the main query. Only records from the main 
query where the "to" field is present in the "from" list will be included in 
the results.

The solrconfig.xml in the main query core must contain the reference to the 
pjoin:

<queryParser name="pjoin" 
class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>

And the join contrib jars must be registered in the solrconfig.xml:

<lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />

The solrconfig.xml in the "from" core must have the "join" SolrCache 
configured:

<cache name="join"
       class="solr.LRUCache"
       size="4096"
       initialSize="1024"/>

*JoinValueSourceParserPlugin aka vjoin*

The second implementation is the JoinValueSourceParserPlugin aka "vjoin". This 
implements a ValueSource function query that can return values from a second 
core based on join keys. This allows relevance data to be stored in a separate 
core and then joined in the main query.

The vjoin is called using the "vjoin" function query. For example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows vjoin being called by the edismax boost function 
parameter. This example will return the "fromVal" from the "fromCore". The 
"fromKey" and "toKey" are used to link the records from the main query to the 
records in the "fromCore".

As with the pjoin, both the "fromKey" and "toKey" must be integers. Also like 
the pjoin, the "join" SolrCache is used to hold the join memory structures.

To configure the vjoin you must register the ValueSource plugin in the 
solrconfig.xml as follows:

<valueSourceParser name="vjoin" 
class="org.apache.solr.joins.JoinValueSourceParserPlugin" />







  was:

This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

PostFilterQParserPlugin aka pjoin

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin, but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
"to" and "from" cores.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
"join" to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the "from" 
field that will be used to filter the main query. Only records from the main 
query where the "to" field is present in the "from" list will be included in 
the results.

The 

[jira] [Updated] (SOLR-4787) Join Contrib

2013-05-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Description: 
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*PostFilterQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin, but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
"to" and "from" cores.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
"join" to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the "from" 
field that will be used to filter the main query. Only records from the main 
query where the "to" field is present in the "from" list will be included in 
the results.

The solrconfig.xml in the main query core must contain the reference to the 
pjoin:

<queryParser name="pjoin" 
class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>

And the join contrib jars must be registered in the solrconfig.xml:

<lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />

The solrconfig.xml in the "from" core must have the "join" SolrCache 
configured:

<cache name="join"
       class="solr.LRUCache"
       size="4096"
       initialSize="1024"/>

*JoinValueSourceParserPlugin aka vjoin*

The second implementation is the JoinValueSourceParserPlugin aka "vjoin". This 
implements a ValueSource function query that can return values from a second 
core based on join keys. This allows relevance data to be stored in a separate 
core and then joined in the main query.

The vjoin is called using the "vjoin" function query. For example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows vjoin being called by the edismax boost function 
parameter. This example will return the "fromVal" from the "fromCore". The 
"fromKey" and "toKey" are used to link the records from the main query to the 
records in the "fromCore".

As with the pjoin, both the "fromKey" and "toKey" must be integers. Also like 
the pjoin, the "join" SolrCache is used to hold the join memory structures.

To configure the vjoin you must register the ValueSource plugin in the 
solrconfig.xml as follows:

<valueSourceParser name="vjoin" 
class="org.apache.solr.joins.JoinValueSourceParserPlugin" />







  was:
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*PostFilterQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin, but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
"to" and "from" cores.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
"join" to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the "from" 
field that will be used to filter the main query. Only records from the main 
query where the "to" field is present in the "from" list will be included in 
the results.

The 

[jira] [Updated] (SOLR-4787) Join Contrib

2013-05-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Attachment: SOLR-4787.patch

Initial pjoin and vjoin contrib.

TODO: Tests need to be created and the vjoin has some insanity issues with the 
FieldCache that will eventually be solved by using on-disk DocValues.

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.2.1

 Attachments: SOLR-4787.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 2 join implementations.
 *PostFilterQParserPlugin aka pjoin*
 The pjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar to the join 
 implementation in the JoinQParserPlugin, but differs in a couple of important 
 ways.
 The first way is that the pjoin is designed to work with integer join keys 
 only. So, in order to use pjoin, integer join keys must be included in both 
 the "to" and "from" cores.
 The second difference is that the pjoin builds memory structures that can be 
 used to quickly connect the join keys. It also uses a custom SolrCache named 
 "join" to hold intermediate DocSets which are needed to build the join memory 
 structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
 perform the join.
 The main advantage of the pjoin is that it can scale to join millions of keys 
 between cores.
 Because it's a PostFilter, it only needs to join records that match the main 
 query.
 The syntax of the pjoin is the same as the JoinQParserPlugin's, except that 
 the plugin is referenced by the string "pjoin" rather than "join".
 fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1
 The example filter query above will search the fromCore (collection2) for 
 user:customer1. This query will generate a list of values from the "from" 
 field that will be used to filter the main query. Only records from the main 
 query where the "to" field is present in the "from" list will be included in 
 the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 pjoin:
 <queryParser name="pjoin" 
 class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>
 And the join contrib jars must be registered in the solrconfig.xml:
 <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />
 The solrconfig.xml in the "from" core must have the "join" SolrCache 
 configured:
 <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024"/>
 *JoinValueSourceParserPlugin aka vjoin*
 The second implementation is the JoinValueSourceParserPlugin aka "vjoin". 
 This implements a ValueSource function query that can return values from a 
 second core based on join keys. This allows relevance data to be stored in a 
 separate core and then joined in the main query.
 The vjoin is called using the "vjoin" function query. For example:
 bf=vjoin(fromCore, fromKey, fromVal, toKey)
 This example shows vjoin being called by the edismax boost function 
 parameter. This example will return the "fromVal" from the "fromCore". The 
 "fromKey" and "toKey" are used to link the records from the main query to the 
 records in the "fromCore".
 As with the pjoin, both the "fromKey" and "toKey" must be integers. Also like 
 the pjoin, the "join" SolrCache is used to hold the join memory structures.
 To configure the vjoin you must register the ValueSource plugin in the 
 solrconfig.xml as follows:
 <valueSourceParser name="vjoin" 
 class="org.apache.solr.joins.JoinValueSourceParserPlugin" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4787) Join Contrib

2013-05-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Description: 
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*.PostFilterJoinQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
to and from core.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
join to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
Query.

The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
plugin is referenced by the string pjoin rather then join.

fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the from 
field that will be used to filter main query. Only records from the main query, 
where the to field is present in the from list will be included in the 
results.

The solrconfig.xml in the main query core must contain the reference to the 
pjoin.

queryParser name=pjoin 
class=org.apache.solr.joins.PostFilterJoinQParserPlugin/

And the join contrib jars must be registed in the solrconfig.xml.

lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /

The solrconfig.xml in the fromcore must have the join SolrCache configured.

 cache name=join
  class=solr.LRUCache
  size=4096
  initialSize=1024
  /


*JoinValueSourceParserPlugin aka vjoin*

The second implementation is the JoinValueSourceParserPlugin aka vjoin. This 
implements a ValueSource function query that can return values from a second 
core based on join keys. This allows relevance data to be stored in a separate 
core and then joined in the main query.

The vjoin is called using the vjoin function query. For example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows vjoin being called by the edismax boost function 
parameter. This example will return the fromVal from the fromCore. The 
fromKey and toKey are used to link the records from the main query to the 
records in the fromCore.

As with the pjoin, both the fromKey and toKey must be integers. Also like
the pjoin, the join SolrCache is used to hold the join memory structures.

To configure the vjoin you must register the ValueSource plugin in the 
solrconfig.xml as follows:

 valueSourceParser name=vjoin 
class=org.apache.solr.joins.JoinValueSourceParserPlugin /







  was:
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*PostFilterQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin, but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
"to" and "from" cores.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
"join" to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the "from" 
field that will be used to filter the main query. Only records from the main 
query where the "to" field is present in the "from" list will be included in 
the results.


[jira] [Updated] (SOLR-4787) Join Contrib

2013-05-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Description: 
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*PostFilterJoinQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin, but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
"to" and "from" cores.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
"join" to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the "from" 
field that will be used to filter the main query. Only records from the main 
query where the "to" field is present in the "from" list will be included in 
the results.

The solrconfig.xml in the main query core must contain the reference to the 
pjoin:

<queryParser name="pjoin" 
class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>

And the join contrib jars must be registered in the solrconfig.xml:

<lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />

The solrconfig.xml in the "from" core must have the "join" SolrCache 
configured:

<cache name="join"
       class="solr.LRUCache"
       size="4096"
       initialSize="1024"/>

*JoinValueSourceParserPlugin aka vjoin*

The second implementation is the JoinValueSourceParserPlugin aka "vjoin". This 
implements a ValueSource function query that can return values from a second 
core based on join keys. This allows relevance data to be stored in a separate 
core and then joined in the main query.

The vjoin is called using the "vjoin" function query. For example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows vjoin being called by the edismax boost function 
parameter. This example will return the "fromVal" from the "fromCore". The 
"fromKey" and "toKey" are used to link the records from the main query to the 
records in the "fromCore".

As with the pjoin, both the "fromKey" and "toKey" must be integers. Also like 
the pjoin, the "join" SolrCache is used to hold the join memory structures.

To configure the vjoin you must register the ValueSource plugin in the 
solrconfig.xml as follows:

<valueSourceParser name="vjoin" 
class="org.apache.solr.joins.JoinValueSourceParserPlugin" />







  was:
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*PostFilterJoinQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin, but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
"to" and "from" cores.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
"join" to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the "from" 
field that will be used to filter the main query. Only records from the main 
query where the "to" field is present in the "from" list will be included in the 

[jira] [Updated] (SOLR-4787) Join Contrib

2013-05-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Description: 
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*PostFilterJoinQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin, but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
"to" and "from" cores.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
"join" to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the "from" 
field that will be used to filter the main query. Only records from the main 
query where the "to" field is present in the "from" list will be included in 
the results.

The solrconfig.xml in the main query core must contain the reference to the 
pjoin:

<queryParser name="pjoin" 
class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>

And the join contrib jars must be registered in the solrconfig.xml:

<lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />

The solrconfig.xml in the "from" core must have the "join" SolrCache 
configured:

<cache name="join"
       class="solr.LRUCache"
       size="4096"
       initialSize="1024"/>

*JoinValueSourceParserPlugin aka vjoin*

The second implementation is the JoinValueSourceParserPlugin aka "vjoin". This 
implements a ValueSource function query that can return values from a second 
core based on join keys. This allows relevance data to be stored in a separate 
core and then joined in the main query.

The vjoin is called using the "vjoin" function query. For example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows vjoin being called by the edismax boost function 
parameter. This example will return the "fromVal" from the "fromCore". The 
"fromKey" and "toKey" are used to link the records from the main query to the 
records in the "fromCore".

As with the pjoin, both the "fromKey" and "toKey" must be integers. Also like 
the pjoin, the "join" SolrCache is used to hold the join memory structures.

To configure the vjoin you must register the ValueSource plugin in the 
solrconfig.xml as follows:

<valueSourceParser name="vjoin" 
class="org.apache.solr.joins.JoinValueSourceParserPlugin" />







  was:
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*PostFilterJoinQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
to and from core.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
join to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
Query.

The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
plugin is referenced by the string pjoin rather then join.

fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the from 
field that will be used to filter main query. Only records from the main query, 
where the to field is present in the from list will be included in the 

[jira] [Updated] (SOLR-4787) Join Contrib

2013-05-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Description: 
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*PostFilterJoinQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
to and from core.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
join to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
plugin is referenced by the string pjoin rather then join.

fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the from 
field that will be used to filter the main query. Only records from the main 
query, where the to field is present in the from list will be included in 
the results.

The solrconfig.xml in the main query core must contain the reference to the 
pjoin.

queryParser name=pjoin 
class=org.apache.solr.joins.PostFilterJoinQParserPlugin/

And the join contrib jars must be registed in the solrconfig.xml.

lib dir=../../../dist/ regex=solr-joins-\d.*\.jar /

The solrconfig.xml in the fromcore must have the join SolrCache configured.

 cache name=join
  class=solr.LRUCache
  size=4096
  initialSize=1024
  /


*JoinValueSourceParserPlugin aka vjoin*

The second implementation is the JoinValueSourceParserPlugin aka vjoin. This 
implements a ValueSource function query that can return values from a second 
core based on join keys. This allows relevance data to be stored in a separate 
core and then joined in the main query.

The vjoin is called using the vjoin function query. For example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows vjoin being called by the edismax boost function 
parameter. This example will return the fromVal from the fromCore. The 
fromKey and toKey are used to link the records from the main query to the 
records in the fromCore.

As with the pjoin, both the fromKey and toKey must be integers. Also like the 
pjoin, the join SolrCache is used to hold the join memory structures.

To configure the vjoin you must register the ValueSource plugin in the 
solrconfig.xml as follows:

 valueSourceParser name=vjoin 
class=org.apache.solr.joins.JoinValueSourceParserPlugin /







  was:
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations.

*PostFilterJoinQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar to the join 
implementation in the JoinQParserPlugin but differs in a couple of important 
ways.

The first way is that the pjoin is designed to work with integer join keys 
only. So, in order to use pjoin, integer join keys must be included in both the 
to and from core.

The second difference is that the pjoin builds memory structures that can be 
used to quickly connect the join keys. It also uses a custom SolrCache named 
join to hold intermediate DocSets which are needed to build the join memory 
structures. So, the pjoin will need more memory then the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.

The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
plugin is referenced by the string pjoin rather then join.

fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the from 
field that will be used to filter main query. Only records from the main query, 
where the to field is present in the from list will be included in the 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-05-05 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649512#comment-13649512
 ] 

Joel Bernstein commented on SOLR-4787:
--

The integer keys are faster to join and take up less memory in the in-memory 
join structures, so string keys won't scale nearly as well. It may be possible 
to make them work, but they would likely scale about the same as the 
JoinQParserPlugin. Other high-performance string joins could possibly be 
contributed as well.
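
To make the memory argument concrete, here is a minimal sketch (mine, not the 
patch's; the names are illustrative) of a primitive-int join lookup. A sorted 
int[] costs four bytes per key, where String keys would add per-object headers, 
char[] payloads, and pointer indirection:

import java.util.Arrays;

// Sketch: a join-key membership structure over primitive ints.
public final class IntJoinSet {
  private final int[] sortedKeys;

  public IntJoinSet(int[] fromKeys) {
    // Copy and sort once so lookups can use binary search.
    this.sortedKeys = fromKeys.clone();
    Arrays.sort(this.sortedKeys);
  }

  // Returns true if a main-query document's join key matches a from-core key.
  public boolean contains(int toKey) {
    return Arrays.binarySearch(sortedKeys, toKey) >= 0;
  }
}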

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.2.1

 Attachments: SOLR-4787.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Updated] (SOLR-4787) Join Contrib

2013-05-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Description: 
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 2 join implementations. 
The initial patch was generated from the Solr 4.2.1 tag. Because of changes in 
the FieldCache API this patch will only build with Solr 4.2 or above.

*PostFilterJoinQParserPlugin aka pjoin*

The pjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar in functionality to 
the JoinQParserPlugin, but the implementation differs in a couple of important 
ways.

The first difference is that the pjoin is designed to work with integer join 
keys only. So, in order to use pjoin, integer join keys must be included in 
both the to and from cores.

The second difference is that the pjoin builds memory structures that are used 
to quickly connect the join keys. It also uses a custom SolrCache named "join" 
to hold the intermediate DocSets that are needed to build the join memory 
structures. So, the pjoin will need more memory than the JoinQParserPlugin to 
perform the join.

The main advantage of the pjoin is that it can scale to join millions of keys 
between cores.

Because it's a PostFilter, it only needs to join records that match the main 
query.
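
As a sketch of why that matters, here is roughly what the post-filtering step 
looks like in Solr 4.x terms (my illustration, not the patch's code; IntJoinSet 
is the hypothetical lookup structure sketched earlier, and the docToJoinKey 
array stands in for a FieldCache read):

import java.io.IOException;
import org.apache.solr.search.DelegatingCollector;

// Sketch: a post-filter join collector. It sits after the main query in the
// collector chain, so the join lookup runs only for documents that already
// matched -- the work scales with the result set, not the index size.
public class JoinCollector extends DelegatingCollector {
  private final IntJoinSet joinSet;
  private final int[] docToJoinKey; // stand-in; real code would use the FieldCache

  public JoinCollector(IntJoinSet joinSet, int[] docToJoinKey) {
    this.joinSet = joinSet;
    this.docToJoinKey = docToJoinKey;
  }

  @Override
  public void collect(int doc) throws IOException {
    if (joinSet.contains(docToJoinKey[doc])) {
      super.collect(doc); // only forward documents whose join key matched
    }
  }
}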

The syntax of the pjoin is the same as the JoinQParserPlugin except that the 
plugin is referenced by the string "pjoin" rather than "join".

fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1

The example filter query above will search the fromCore (collection2) for 
user:customer1. This query will generate a list of values from the from 
field that will be used to filter the main query. Only records from the main 
query where the to field is present in the from list will be included in 
the results.
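
For illustration, a complete request with the post filter attached might look 
like the following (the main core name collection1 is hypothetical; the rest 
mirrors the example above):

http://localhost:8983/solr/collection1/select?q=*:*&fq={!pjoin fromCore=collection2 from=id_i to=id_i}user:customer1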

The solrconfig.xml in the main query core must contain the reference to the 
pjoin:

<queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/>

And the join contrib jars must be registered in the solrconfig.xml:

<lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" />

The solrconfig.xml in the from core must have the join SolrCache configured:

<cache name="join"
       class="solr.LRUCache"
       size="4096"
       initialSize="1024"/>


*JoinValueSourceParserPlugin aka vjoin*

The second implementation is the JoinValueSourceParserPlugin aka vjoin. This 
implements a ValueSource function query that can return values from a second 
core based on join keys. This allows relevance data to be stored in a separate 
core and then joined in the main query.

The vjoin is called using the vjoin function query. For example:

bf=vjoin(fromCore, fromKey, fromVal, toKey)

This example shows vjoin being called by the edismax boost function parameter. 
It will return the fromVal from the fromCore; the fromKey and toKey are used 
to link the records from the main query to the records in the fromCore.

As with the pjoin, both the fromKey and toKey must be integers. Also like the 
pjoin, the "join" SolrCache is used to hold the join memory structures.

To configure the vjoin you must register the ValueSource plugin in the 
solrconfig.xml as follows:

<valueSourceParser name="vjoin" class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
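
To make the mechanics concrete, here is a rough sketch of the general shape 
such a ValueSource takes against the Lucene 4.x function-query API (my 
illustration, not the contrib's code; the fromValByToKey map and docToKey 
array are hypothetical stand-ins for the join memory structures):

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.IntDocValues;

// Sketch: per document, map its toKey to the fromVal stored for the matching
// fromCore record, and expose that as a function-query value.
public class JoinValueSource extends ValueSource {
  private final Map<Integer, Integer> fromValByToKey;
  private final int[] docToKey; // stand-in; real code would use the FieldCache

  public JoinValueSource(Map<Integer, Integer> fromValByToKey, int[] docToKey) {
    this.fromValByToKey = fromValByToKey;
    this.docToKey = docToKey;
  }

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
      throws IOException {
    return new IntDocValues(this) {
      @Override
      public int intVal(int doc) {
        Integer v = fromValByToKey.get(docToKey[doc]);
        return v == null ? 0 : v; // unmatched docs contribute a zero boost
      }
    };
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof JoinValueSource
        && ((JoinValueSource) o).fromValByToKey == fromValByToKey;
  }

  @Override
  public int hashCode() {
    return System.identityHashCode(fromValByToKey);
  }

  @Override
  public String description() {
    return "vjoin sketch";
  }
}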








Re: Including JTS in an Apache project

2013-05-05 Thread David Smiley (@MITRE.org)
Hi Ahmed,

I faced your conundrum with JTS early last year.  As you know, the Apache
Software Foundation doesn't like its projects depending on GPL or even
LGPL licensed libraries.  The ASF does not have clear, unambiguous language
on how its projects can depend on them in a limited sense, and different PMCs
(projects) have different standards.  I've heard of one project (CXF?) that
uses Java reflection to use an LGPL library.  I think another downloads the
LGPL library as part of the build, and then the code has a compile-time
dependency (I could be mistaken).  If memory serves, in both cases the
dependency fit an optional role and not a core purpose of the software.  The
Lucene PMC in particular didn't formally vote to my knowledge, but there was
a time when it was clear to me that such approaches were not acceptable.
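
For reference, the reflection approach can be as small as probing for a class
at runtime, leaving no compile-time dependency on the LGPL jar (a minimal
sketch; the JTS class name is real, the wrapper class is mine):

// Sketch: detect an optional LGPL dependency at runtime via reflection,
// so the Apache-licensed code never references it at compile time.
public final class OptionalJts {
  private OptionalJts() {}

  public static boolean isAvailable() {
    try {
      // JTS's GeometryFactory; only ever touched reflectively.
      Class.forName("com.vividsolutions.jts.geom.GeometryFactory");
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }
}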

The approach that the Lucene spatial developers took (me, Ryan, Chris) was
to create a non-ASF project called Spatial4j that is ASL licensed.
Spatial4j *optionally* depends on JTS -- it's only for advanced shapes
(namely polygons) and for WKT parsing:
https://github.com/spatial4j/spatial4j  BTW, WKT parsing will be handled by
Spatial4j itself in the near future, without JTS.  Spatial4j is not a subset
of JTS; it critically has things JTS doesn't, like a native circle (not a
polygon approximation) and the concept of the world being a sphere instead
of flat ;-)  That's right: JTS, as critical as it is in the world of
open-source spatial, doesn't have any geodetic calculations, just Euclidean.
Spatial4j adds dateline-wrap support to JTS shapes so you can represent Fiji,
for example, but not yet Antarctica (no pole wrap).  So I encourage the
Apache Pig project to take a look at using Spatial4j instead of directly
using JTS, for the same reasons that the Lucene project uses it.  If you
ultimately decide not to, then please let me know why, as I see Spatial4j
being an excellent fit for ASF projects in particular because of the
licensing issue.

So your statement that Apache Solr *uses* JTS is incorrect; it doesn't, and
nor does Lucene.  Instead, those projects use Spatial4j, which has an
abstraction (Shape), and it has an implementation of that abstraction that
depends on JTS.  It also has implementations that don't depend on JTS.

p.s. Last week I did a long presentation on spatial in Lucene/Solr/Spatial4j
and I'd be happy to share the slides with you.  The organizers will post them,
but they haven't yet.

~ David Smiley


Ahmed El-dawy wrote
 Hi all,
  I saw that Apache Solr uses JTS (Java Topology Suite) [
 http://www.vividsolutions.com/jts/JTSHome.htm] for supporting a spatial
 data type [http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4].
 Using JTS in an Apache project is not a straightforward thing, as JTS is
 licensed under the LGPL, which has some compatibility issues when included
 in an Apache project. Now, I need to do something very similar in another
 Apache project (Pig [http://pig.apache.org/]) and I'm faced with the same
 licensing issue. I'm asking for your advice on the best way to use JTS
 without violating the license. Does referring to JTS classes from the code
 of an Apache project, without actually including the classes, violate the
 license? Do we have to load the classes dynamically (using Class#forName),
 or is there another way to do it?
 Thanks in advance
 
 Best regards,
 Ahmed Eldawy






-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-05-05 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649530#comment-13649530
 ] 

David Smiley commented on SOLR-4787:


Nice Joel!  I've done a custom join query recently, but it's a bit different 
from either of yours.  I read your pjoin code in particular and it looks very 
good, mostly.  Your BSearch class is the only thing that made me frown.  
Instead of putting each name-value pair into its own key class (which isn't 
GC-friendly), I suggest you take a look at Lucene's SorterTemplate, which will 
allow you to collect your key and value integers directly into an array each, 
and then sort in place when done.  I like your idea of caching the join; I 
should do that with mine.
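
For anyone following along, the parallel-array idea looks roughly like this (a 
standalone illustration; Lucene's SorterTemplate provides the same pattern 
with better sort algorithms than the insertion sort used here):

// Sketch: keep join keys and values in two primitive int arrays and sort
// them together in place, instead of allocating one object per pair.
public final class ParallelIntSort {
  public static void sort(int[] keys, int[] values) {
    for (int i = 1; i < keys.length; i++) {
      int k = keys[i], v = values[i];
      int j = i - 1;
      // Shift larger keys (and their paired values) right to make room.
      while (j >= 0 && keys[j] > k) {
        keys[j + 1] = keys[j];
        values[j + 1] = values[j];
        j--;
      }
      keys[j + 1] = k;
      values[j + 1] = v;
    }
  }
}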


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org