RE: [Lucene.Net] I want to help! Also, where are we at?
Good catch Stefan! From: bode...@apache.org To: seansevilt...@gmail.com; sean.new...@grantadesign.com CC: lucene-net-...@incubator.apache.org Date: Sat, 4 Feb 2012 08:07:48 +0100 Subject: Re: [Lucene.Net] I want to help! Also, where are we at? Hi Sean, I just now realized the responses you received http://mail-archives.apache.org/mod_mbox/lucene-lucene-net-dev/201201.mbox/%3CCAFZAm_XwoDKkTK9AuJ=zeegvtqufdmebwbz89pd6lbbjguc...@mail.gmail.com%3E http://mail-archives.apache.org/mod_mbox/lucene-lucene-net-dev/201201.mbox/%3cca+p8kvobdicn-njpcua8obeqqzygdkpah5iu4j-6mr3796g...@mail.gmail.com%3E only went to the Lucene.Net list rather than to you, so you may never have received them. In short: You are more than welcome. Look around to see if you find anything you want to work on, and do what you enjoy doing. We don't assign work to people; people pick the stuff they want to work on. If you have any questions or need help, don't hesitate to ask. In order to join the mailing list, which is where all discussion and coordination happens, send an email to lucene-net-dev-subscr...@incubator.apache.org using the email address you intend to use when posting to the list. Everybody can join the list. Cheers Stefan
RE: [Lucene.Net] 3.0.3
So, Chris, if you did this as a direct port of the Java version (https://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3/), does that mean that all of the LUCENE JIRA issues (https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+LUCENE+AND+fixVersion+%3D+%223.0.3%22+AND+status+%3D+Closed+ORDER+BY+priority+DESC&mode=hide) are part of this code already? That would make 3.0.3 well on its way to release... ~P From: bode...@apache.org To: lucene-net-...@incubator.apache.org Date: Wed, 25 Jan 2012 12:35:25 +0100 Subject: Re: [Lucene.Net] 3.0.3 On 2012-01-25, Michael Herndon wrote: Do we have a standard of copy or tag of Java's version source that we're doing a compare against? I only see the 3_1 and above in the tags. Likely because the svn location has changed in between. I think it must be https://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3/ Stefan
Re: Changes to enable easy_install of packages using JCC
Hi Chris, On Wed, 1 Feb 2012, Andi Vajda wrote: No objections to these patches in principle, but it would be easier for me to integrate them if you could provide patches computed from the svn repository of JCC: http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/ Your patches seem to be small enough that I should be able to do without, but it would be nicer if I didn't have to guess... I think the patch that I attached was already based on trunk. The git repository includes the .svn directories, points to trunk, and I generated the patch using svn diff. Sorry, I missed that you indeed had attached a patch last time. (to be continued...) Also, please write small descriptions for these new command line flags to go into JCC's __main__.py file: http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/jcc/__main__.py Done, new patch attached. Thank you! I integrated your patches with rev 1240624. I moved a few changes around: parameters to their section in __main__.py and 'maxstack' hardcoding to where it used to be. Thank you for your contribution. Andi.. This mess of setuptools patching was meant to be *temporary* until setuptools' issue 43 was fixed. As you can see, I filed this bug 3 1/2 years ago, http://bugs.python.org/setuptools/issue43, and my patch for issue 43 still hasn't been accepted, rejected, integrated, anything'ed... Dormant. For over three years. Sorry about that. I've had similar experiences with bugs reported against Ubuntu, Hibernate, Rails... :( * Why does JCC use non-standard command line arguments like --build and --install? Can it be modified to make it easier to invoke from a setup.py-style environment, such as exporting a setup() function as setuptools does? What standard are you referring to? The Python extension module build/install/deploy story keeps evolving... Add Python 3.x support into the mix, and the mess is complete. 
Seriously, though, I think that the right thing to do to better integrate JCC with distutils/setuptools/distribute/pip/etc... is to make it into a distutils 'compiler'. This requires some work, though, and I haven't done it in all these years. Anyone with the itch to hack on distutils is welcome to take that on. I'm afraid I don't fully understand how distutils works, it seems to be sparsely documented, and I don't have a lot of time and energy to work on refactoring jcc. I am a bit surprised that we can't just generate a source distribution containing the jars, .cpp files and a setup.py which does the rest like any other Python extension. Same here. I don't know distutils too well, and whenever I tried to dig into it, I quickly gave up. I don't know what it means to just generate a source distribution. If they contain .class files, JAR files are not source files. My understanding could be wrong here, but I don't think they're even compatible between 32- and 64-bit VMs. Or is that incompatible between Java 5 and 6? I have very little itch to dabble in configure scripts either, so I've been dragging my feet. If someone were to step forward with a patch for that, I'd be delighted to rip out all this patching brittleness. How would a configure script solve the problem and what would it have to do? Generate the .cpp files? How does it integrate with Python extensions? A configure script for building libjcc.dylib (libjcc.so on Linux, jcc.dll on Windows, etc...) would take care of doing what setuptools + the issue43 patch is doing for us currently: invoking the C++ compiler and linker against the correct Python headers and libraries to produce a vanilla shared library. With such a configure script, there is no longer a need to patch setuptools. That is a whole different project. If I remember correctly, the JPype project is (or was) taking that approach: http://jpype.sourceforge.net OK, thanks. 
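For concreteness, the "vanilla shared library" route discussed above — what a configure script or a plain build script would have to accomplish instead of patching setuptools — might be sketched as an ordinary extension-module setup.py. Everything here (module names, source paths, the JDK include directory) is an assumption for illustration, not JCC's actual layout:

```python
# Hypothetical vanilla setup.py: compile JCC-generated .cpp sources into a
# normal Python extension, with no setuptools patching involved. Paths and
# names are placeholders, not JCC's real build configuration.
from setuptools import setup, Extension

setup(
    name="jcc_wrapped",
    version="0.1",
    ext_modules=[
        Extension(
            "_wrapped",
            sources=["build/_wrapped.cpp"],              # generated by JCC
            include_dirs=["/usr/lib/jvm/java/include"],  # assumed JDK path
            libraries=["jvm"],                           # link against the JVM
        )
    ],
)
```

The point of the sketch is that once the generated sources exist, the compile/link step is exactly what distutils already knows how to do for any C++ extension.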
* Could JCC generate a source distribution (sdist) that could be uploaded to PyPI? You mean a source distribution that includes the Java sources of all the libraries/classes wrapped? I was thinking more of the jars. Something like https://github.com/aptivate/python-tika that doesn't depend on jcc any more. * setup.py develop is still broken in the current implementation I'm not familiar with this 'develop' command, nor aware that it is broken. What is it supposed to be doing and how is it broken? http://packages.python.org/distribute/setuptools.html#development-mode It seems that when invoked this way, my setup.py (from python-tika), which calls jcc, ends up creating build/_tika as a file (not a directory). For example, this command: sudo pip install -e git+https://github.com/aptivate/python-tika#egg=tika (note the -e for editable mode) results in this: Running setup.py develop for tika ... Traceback (most recent call last): File "<string>", line 1, in <module> File "/tmp/src/tika/setup.py", line 108, in <module> cpp.jcc(jcc_args) File
Re: Changes to enable easy_install of packages using JCC
Hi Andi, On Sat, 4 Feb 2012, Andi Vajda wrote: I integrated your patches with rev 1240624. I moved a few changes around :parameters to their section in __main__.py and 'maxstack' hardcoding to where it used to be. Thank you for your contribution. Thanks :) Cheers, Chris. -- Aptivate | http://www.aptivate.org | Phone: +44 1223 760887 The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES Aptivate is a not-for-profit company registered in England and Wales with company number 04980791.
[jira] [Commented] (SOLR-3049) UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported
[ https://issues.apache.org/jira/browse/SOLR-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200362#comment-13200362 ] Tommaso Teofili commented on SOLR-3049: --- Hi Harsh, I think there should be a more general way of mapping typed parameters, just need to dig a little deeper to find it. However in the meantime I'll try and test your patch, thanks! UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported - Key: SOLR-3049 URL: https://issues.apache.org/jira/browse/SOLR-3049 Project: Solr Issue Type: Bug Components: update Reporter: Harsh P Priority: Minor Labels: uima, update_request_handler Attachments: SOLR-3049.patch The solrconfig.xml file has an option to override certain UIMA runtime parameters in the UpdateRequestProcessorChain section. There are certain UIMA annotators, like RegexAnnotator, which define a runtimeParameters value as an Array, which is not currently supported in the Solr-UIMA interface. In java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java, the private Object getRuntimeValue(AnalysisEngineDescription desc, String attributeName) function defines overrides for UIMA analysis engine runtimeParameters as they are passed to the UIMA Analysis Engine. The runtimeParameters currently supported in the Solr-UIMA interface are: String, Integer, Boolean, Float. I have made a hack to fix this issue by adding Array support. I would like to submit that as a patch if no one else is working on fixing this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
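A hedged sketch of what a more general, type-driven override mapping might look like. The function name and the "…Array" type-name convention are invented for illustration; the real fix would live in OverridingParamsAEProvider.getRuntimeValue and follow UIMA's own parameter-type metadata:

```python
def coerce_runtime_param(raw, declared_type):
    """Map a raw solrconfig.xml override value onto the parameter type
    declared in the UIMA descriptor. Scalars cover the four types the
    Solr-UIMA bridge supports today (String, Integer, Boolean, Float);
    list inputs handle Array-valued parameters such as RegexAnnotator's.
    The 'StringArray'-style names are a hypothetical convention."""
    scalar = {
        "String": str,
        "Integer": int,
        "Boolean": lambda v: str(v).lower() == "true",
        "Float": float,
    }
    if declared_type.endswith("Array"):
        elem_type = declared_type[: -len("Array")]
        return [scalar[elem_type](v) for v in raw]
    return scalar[declared_type](raw)
```

Driving the coercion from the declared type, rather than hardcoding four if-branches, is the "more general way" the comment hints at.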
[jira] [Commented] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration
[ https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200461#comment-13200461 ] Christian Moen commented on LUCENE-3745: I'll submit a patch for this tomorrow. Need stopwords and stoptags lists for default Japanese configuration Key: LUCENE-3745 URL: https://issues.apache.org/jira/browse/LUCENE-3745 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Christian Moen Attachments: filter_stoptags.py, top-10.txt, top-100-pos.txt, top-pos.txt Stopwords and stoptags lists for Japanese need to be developed, tested and integrated into Lucene. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200476#comment-13200476 ] Brian Carver commented on SOLR-2649: I'm new to Solr, so I have a tenuous grasp on some of these issues, but I've understood boolean logic for a couple of decades, and it seems to me like Solr's current behavior is thwarting the expectations of those who understand what they want and explicitly ask for it. Mike's example above is what troubles me. Principles: 1. The maintainer sets whitespace to be interpreted as AND or OR, and Solr should do nothing to change that in particular instances. 2. Where a user inputs an ambiguous query, a default rule about how operator scope will work is needed, and that also should not be changed in particular instances. So, Mike says he sets whitespace to AND, users know this, and then a user enters: Example 1: (A or B or C) D E Given the above assumptions, the only reasonable interpretation of this is: (A or B or C) AND D E, which is a conjunction with two conjuncts, both of which must be satisfied for a result to be produced, yet Mike/the user gets results that only satisfy one of the conjuncts. That shouldn't happen. I'd agree, though, that how to understand/apply mm in some of the examples above creates hard questions, but that is why many search engines provide two interfaces: one natural language interface and one that requires strict use of boolean syntax. Allowing people to enter some boolean operators (which they're going to expect will be respected-no-matter-what) and simultaneously interpreting their query using mm handlers intended for a more rough-and-ready approach is just going to lead to confused end users most of the time. So, in some ways, ignoring mm when operators are used is a feature, not a bug, but that seems orthogonal to the completely unacceptable outcome Mike described: whatever is causing THAT is a bug. 
MM ignored in edismax queries with operators Key: SOLR-2649 URL: https://issues.apache.org/jira/browse/SOLR-2649 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.3 Reporter: Magnus Bergmark Priority: Minor Hypothetical scenario: 1. User searches for stocks oil gold with MM set to 50% 2. User adds -stockings to the query: stocks oil gold -stockings 3. User gets no hits since MM was ignored and all terms were AND-ed together The behavior seems to be intentional, although the reason why is never explained: // For correct lucene queries, turn off mm processing if there // were explicit operators (except for AND). boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; (lines 232-234 taken from tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) This makes edismax unsuitable as a replacement for dismax; mm is one of the primary features of dismax. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
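The quoted Java condition is easy to restate; here is a Python sketch of when mm survives (token classification deliberately simplified, function name hypothetical — the real parser counts operators while walking the query clauses):

```python
def should_apply_mm(query_tokens):
    """Mirror of the edismax check quoted above: min-should-match ("mm")
    processing is applied only when the query contains no explicit
    operators other than AND (no OR, no NOT, no +term, no -term)."""
    num_or = sum(1 for t in query_tokens if t == "OR")
    num_not = sum(1 for t in query_tokens if t == "NOT")
    num_pluses = sum(1 for t in query_tokens if t.startswith("+"))
    num_minuses = sum(1 for t in query_tokens if t.startswith("-"))
    return (num_or + num_not + num_pluses + num_minuses) == 0
```

This reproduces the reported surprise: `["stocks", "oil", "gold"]` gets mm applied, but adding `"-stockings"` makes a single minus operator silently disable mm for the whole query.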
[JENKINS] Solr-3.x - Build # 589 - Failure
Build: https://builds.apache.org/job/Solr-3.x/589/ All tests passed Build Log (for compile errors): [...truncated 36740 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 12368 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/12368/ No tests ran. Build Log (for compile errors): [...truncated 3338 lines...] [javac] lst.add("errors", numErrors); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java:176: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList [javac] lst.add("timeouts", numTimeouts); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java:177: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList [javac] lst.add("totalTime", totalTime); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java:178: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList [javac] lst.add("avgTimePerRequest", (float) totalTime / (float) this.numRequests); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java:179: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList [javac] lst.add("avgRequestsPerSecond", (float) numRequests*1000 / (float)(System.currentTimeMillis()-handlerStart)); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java:216: warning: [unchecked] unchecked conversion [javac] found : org.apache.solr.util.RefCounted[] [javac] required: org.apache.solr.util.RefCounted<org.apache.solr.search.SolrIndexSearcher>[] [javac] searchers = new RefCounted[sourceCores.length]; [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/handler/component/ResponseBuilder.java:331: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList [javac] rsp.getResponseHeader().add( "partialResults", Boolean.TRUE ); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/search/FunctionQParser.java:254: warning: [unchecked] unchecked conversion [javac] found : java.util.HashMap [javac] required: java.util.Map<java.lang.String,java.lang.String> [javac] int end = QueryParsing.parseLocalParams(qs, start, nestedLocalParams, getParams()); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java:491: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList [javac] facet_queries.add(qf.getKey(), num(qf.count)); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/request/SimpleFacets.java:194: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList [javac] facetResponse.add("facet_queries", getFacetQueryCounts()); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/request/SimpleFacets.java:195: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList [javac] facetResponse.add("facet_fields", getFacetFieldCounts()); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/request/SimpleFacets.java:196: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList [javac] facetResponse.add("facet_dates", getFacetDateCounts()); [javac] ^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/core/src/java/org/apache/solr/request/SimpleFacets.java:197: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of
RE: svn commit: r1240035 - in /lucene/dev/branches/branch_3x/lucene/src: java/org/apache/lucene/analysis/TypeTokenFilter.java test/org/apache/lucene/analysis/TestTypeTokenFilter.java
Hi Tommaso, As you are a new committer, please take care of the following: - The branch 3.x of Lucene/Solr must still compile and test with Java 5, so after merging from trunk, run and compile all tests with Java 5. There is a bug/feature/whatever in Java 6's compiler such that it does not complain about @Override on -source 1.5 -target 1.5 when added to interface implementations (but it should, as @Override is not allowed there in Java 5). - You had a merge relict (x somewhere). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: tomm...@apache.org [mailto:tomm...@apache.org] Sent: Friday, February 03, 2012 10:14 AM To: comm...@lucene.apache.org Subject: svn commit: r1240035 - in /lucene/dev/branches/branch_3x/lucene/src: java/org/apache/lucene/analysis/TypeTokenFilter.java test/org/apache/lucene/analysis/TestTypeTokenFilter.java Author: tommaso Date: Fri Feb 3 09:14:08 2012 New Revision: 1240035 URL: http://svn.apache.org/viewvc?rev=1240035&view=rev Log: [LUCENE-3744] - applied patch for whiteList usage in TypeTokenFilter Modified: lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/analysis/TypeTokenFilter.java lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/analysis/TestTypeTokenFilter.java Modified: lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/analysis/TypeTokenFilter.java URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/analysis/TypeTokenFilter.java?rev=1240035&r1=1240034&r2=1240035&view=diff == --- lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/analysis/TypeTokenFilter.java (original) +++ lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/analysis/TypeTokenFilter.java Fri Feb 3 09:14:08 2012 @@ -29,17 +29,24 @@ public final class TypeTokenFilter exten private final Set<String> stopTypes; private final TypeAttribute typeAttribute = addAttribute(TypeAttribute.class); + private final boolean useWhiteList; - public TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes) { + public TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes, boolean useWhiteList) { super(enablePositionIncrements, input); this.stopTypes = stopTypes; + this.useWhiteList = useWhiteList; + } + + public TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes) { + this(enablePositionIncrements, input, stopTypes, false); } /** - * Returns the next input Token whose typeAttribute.type() is not a stop type. + * By default accept the token if its type is not a stop type. + * When the useWhiteList parameter is set to true then accept the token if its type is contained in the stopTypes */ @Override protected boolean accept() throws IOException { - return !stopTypes.contains(typeAttribute.type()); + return useWhiteList == stopTypes.contains(typeAttribute.type()); } } Modified: lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/analysis/TestTypeTokenFilter.java URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/analysis/TestTypeTokenFilter.java?rev=1240035&r1=1240034&r2=1240035&view=diff == --- lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/analysis/TestTypeTokenFilter.java (original) +++ lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/analysis/TestTypeTokenFilter.java Fri Feb 3 09:14:08 2012 @@ -23,9 +23,9 @@ import org.apache.lucene.analysis.tokena import org.apache.lucene.analysis.tokenattributes.TypeAttribute; import org.apache.lucene.util.English; +import java.util.Collections; import java.io.IOException; import java.io.StringReader; -import java.util.Collections; import java.util.Set; @@ -81,6 +81,13 @@ public class TestTypeTokenFilter extends stpf.close(); } + public void testTypeFilterWhitelist() throws IOException { + StringReader reader = new StringReader("121 is palindrome, while 123 is not"); + Set<String> stopTypes = Collections.singleton("<NUM>"); + TokenStream stream = new TypeTokenFilter(true, new StandardTokenizer(TEST_VERSION_CURRENT, reader), stopTypes, true); + assertTokenStreamContents(stream, new String[]{"121", "123"}); + } + // print debug info depending on VERBOSE private static void log(String s) { if (VERBOSE) { - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
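The patched accept() packs both blacklist and whitelist behavior into a single comparison. A Python sketch of the same logic, with token types chosen to match the new test (the function and variable names here are illustrative, not Lucene API):

```python
def accept(token_type, stop_types, use_white_list=False):
    """TypeTokenFilter.accept() logic from the patch above:
    - blacklist mode (default): keep the token if its type is NOT in stop_types
    - whitelist mode: keep the token only if its type IS in stop_types
    Both cases collapse to one expression:
    use_white_list == (token_type in stop_types)."""
    return use_white_list == (token_type in stop_types)

stop_types = {"<NUM>"}
tokens = [("121", "<NUM>"), ("is", "<ALPHANUM>"), ("palindrome", "<ALPHANUM>")]

kept_blacklist = [t for t, ty in tokens if accept(ty, stop_types)]        # drops the numbers
kept_whitelist = [t for t, ty in tokens if accept(ty, stop_types, True)]  # keeps only the numbers
```

The trick is that `use_white_list == contains(...)` equals `!contains(...)` when the flag is false and `contains(...)` when it is true, so one boolean replaces an if/else.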
RE: svn commit: r1240035 - in /lucene/dev/branches/branch_3x/lucene/src: java/org/apache/lucene/analysis/TypeTokenFilter.java test/org/apache/lucene/analysis/TestTypeTokenFilter.java
One more thing: Please merge changes from trunk to 3.x, not only apply patch twice. More info about the sometimes complicated merging (because of move to modules of some code parts): http://wiki.apache.org/lucene-java/SvnMerge I added the missing merge properties. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Saturday, February 04, 2012 6:51 PM To: dev@lucene.apache.org Cc: tommaso.teof...@gmail.com Subject: RE: svn commit: r1240035 - in /lucene/dev/branches/branch_3x/lucene/src: java/org/apache/lucene/analysis/TypeTokenFilter.java test/org/apache/lucene/analysis/TestTypeTokenFilter.java Hi Tommaso, As you are a new committer, please take care of the following: - The branch 3.x of Lucene/Solr must still compile and test with Java 5, so after merging from trunk, run and compile all tests with Java 5. There is a bug/feature/whatever in Java 6's compiler that it does not complain about @Override on -source 1.5 -target 1.5 when added to interface implementations (but it should, as @Override is not allowed there in Java 5). - You had a merge relict (x somewhere). 
[jira] [Created] (SOLR-3096) Add book information to the new website
Add book information to the new website --- Key: SOLR-3096 URL: https://issues.apache.org/jira/browse/SOLR-3096 Project: Solr Issue Type: Task Reporter: David Smiley Attachments: website_books.patch The attached patch modifies the new website design to incorporate the book information. It adds a header mantle slideshow entry with both book images (just the 2 current books), and it adds a book page with the 3 books published (this includes the 1st edition, which is out of date now). The image files referenced are the same actual binary images on the current website, but I chose a more consistent naming convention. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3096) Add book information to the new website
[ https://issues.apache.org/jira/browse/SOLR-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-3096: --- Attachment: website_books.patch Add book information to the new website --- Key: SOLR-3096 URL: https://issues.apache.org/jira/browse/SOLR-3096 Project: Solr Issue Type: Task Reporter: David Smiley Attachments: website_books.patch The attached patch modifies the new website design to incorporate the book information. It adds a header mantle slideshow entry with both book images (just the 2 current books), and it adds a book page with the 3 books published (this includes the 1st edition, which is out of date now). The image files referenced are the same actual binary images on the current website, but I chose a more consistent naming convention. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3750) Convert Versioned docs to Markdown/New CMS
Convert Versioned docs to Markdown/New CMS -- Key: LUCENE-3750 URL: https://issues.apache.org/jira/browse/LUCENE-3750 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Since we are moving our main site to the ASF CMS (LUCENE-2748), we should bring in any new versioned Lucene docs into the same format so that we don't have to deal w/ Forrest anymore. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3749) Similarity.java javadocs and simplifications for 4.0
[ https://issues.apache.org/jira/browse/LUCENE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3749: Attachment: LUCENE-3749_part2.patch Here's part 2: nuking SimilarityProvider (instead use PerFieldSimilarityWrapper if you want special per-field stuff). This really simplifies the APIs, especially for, say, a casual user who just wants to try out a new ranking model. Similarity.java javadocs and simplifications for 4.0 Key: LUCENE-3749 URL: https://issues.apache.org/jira/browse/LUCENE-3749 Project: Lucene - Java Issue Type: Task Affects Versions: 4.0 Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-3749.patch, LUCENE-3749_part2.patch As part of adding additional scoring systems to Lucene, we made a lower-level Similarity, and the existing stuff became e.g. TFIDFSimilarity, which extends it. However, I always feel bad about the complexity introduced here (though I do feel there are some excuses: it's a difficult challenge). In order to try to mitigate this, we also exposed an easier API (SimilarityBase) on top of it that makes some assumptions (and trades off some performance) to try to provide something consumable for e.g. experiments. Still, we can clean up a few things in the low-level API: fix outdated documentation and shoot for better/clearer naming etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
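The replacement pattern described here — one default Similarity plus optional per-field overrides, instead of a separate provider interface — can be sketched like this (Python for illustration; class and method names are hypothetical, not Lucene's Java API):

```python
class PerFieldSimilarityWrapper:
    """Sketch of the per-field wrapper idea: resolve a similarity per
    field, falling back to a single default. A casual user who just
    wants to try a new ranking model sets only the default and never
    touches per-field dispatch."""

    def __init__(self, default_sim, per_field=None):
        self.default_sim = default_sim
        self.per_field = per_field or {}

    def get(self, field):
        # Per-field override wins; otherwise everyone shares the default.
        return self.per_field.get(field, self.default_sim)

wrapper = PerFieldSimilarityWrapper("BM25", {"title": "DFR"})
```

The design point is that per-field behavior becomes an opt-in wrapper around the simple case, rather than an interface every user must implement.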
[jira] [Commented] (SOLR-2802) Toolkit of UpdateProcessors for modifying document values
[ https://issues.apache.org/jira/browse/SOLR-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200620#comment-13200620 ] Jan Høydahl commented on SOLR-2802: --- Sweet :) You got there before me Toolkit of UpdateProcessors for modifying document values - Key: SOLR-2802 URL: https://issues.apache.org/jira/browse/SOLR-2802 Project: Solr Issue Type: New Feature Reporter: Hoss Man Attachments: SOLR-2802_update_processor_toolkit.patch, SOLR-2802_update_processor_toolkit.patch, SOLR-2802_update_processor_toolkit.patch, SOLR-2802_update_processor_toolkit.patch Frequently users ask questions where the answer is "you could do it with an UpdateProcessor", but the number of out-of-the-box UpdateProcessors is generally lacking, and there aren't even very good base classes for the common case of manipulating field values when adding documents.
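As a rough illustration of the kind of base class being proposed, here is a self-contained sketch of an update processor that rewrites field values on document add. The FieldValueMutator name and the Map-based document are hypothetical stand-ins, not the actual Solr API:

```java
import java.util.*;
import java.util.function.Function;

// Sketch of a "field mutating" update processor: apply a function to every
// value of selected fields before the document is indexed.
class FieldValueMutator {
    private final Set<String> fields;
    private final Function<String, String> mutator;

    FieldValueMutator(Set<String> fields, Function<String, String> mutator) {
        this.fields = fields;
        this.mutator = mutator;
    }

    // Return a copy of the document with selected fields' values mutated;
    // all other fields pass through unchanged.
    Map<String, List<String>> processAdd(Map<String, List<String>> doc) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            if (fields.contains(e.getKey())) {
                List<String> mutated = new ArrayList<>();
                for (String v : e.getValue()) {
                    mutated.add(mutator.apply(v));
                }
                out.put(e.getKey(), mutated);
            } else {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

A trim-whitespace or lowercase processor is then a one-line configuration of the base class rather than a full custom UpdateProcessor implementation.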
Re: [jira] [Commented] (SOLR-3049) UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported
I will try to find a better way. I found this issue while using RegexAnnotator. On Sat, Feb 4, 2012 at 1:55 PM, Tommaso Teofili (Commented) (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200362#comment-13200362 ] Tommaso Teofili commented on SOLR-3049: --- Hi Harsh, I think there should be a more general way of mapping typed parameters; I just need to dig a little deeper to find it. In the meantime, however, I'll try and test your patch, thanks! UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported - Key: SOLR-3049 URL: https://issues.apache.org/jira/browse/SOLR-3049 Project: Solr Issue Type: Bug Components: update Reporter: Harsh P Priority: Minor Labels: uima, update_request_handler Attachments: SOLR-3049.patch The solrconfig.xml file has an option to override certain UIMA runtime parameters in the UpdateRequestProcessorChain section. Certain UIMA annotators, like RegexAnnotator, define a runtimeParameters value as an Array, which is not currently supported in the Solr-UIMA interface. In java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java, the private Object getRuntimeValue(AnalysisEngineDescription desc, String attributeName) function defines overrides for UIMA analysis engine runtimeParameters as they are passed to the UIMA Analysis Engine. The runtimeParameters currently supported in the Solr-UIMA interface are: String, Integer, Boolean, Float. I have made a hack to fix this issue and add Array support. I would like to submit that as a patch if no one else is working on fixing this issue.
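For illustration, here is a self-contained sketch of the sort of change described: extending the type dispatch in getRuntimeValue with an array case. The class name and the comma-splitting convention are hypothetical; consult the actual SOLR-3049 patch for the real mapping:

```java
import java.util.Map;

// Sketch of override lookup for UIMA runtimeParameters: besides String,
// Integer, Boolean and Float, an array-typed parameter is handled by
// splitting a comma-separated override value (the new case).
class RuntimeParamOverrides {
    private final Map<String, String> overrides;

    RuntimeParamOverrides(Map<String, String> overrides) {
        this.overrides = overrides;
    }

    // 'declaredType' stands in for the type declared for this parameter in
    // the UIMA analysis-engine descriptor.
    Object getRuntimeValue(String name, String declaredType) {
        String raw = overrides.get(name);
        if (raw == null) {
            return null;  // no override configured for this parameter
        }
        switch (declaredType) {
            case "Integer":     return Integer.valueOf(raw);
            case "Boolean":     return Boolean.valueOf(raw);
            case "Float":       return Float.valueOf(raw);
            case "StringArray": return raw.split(",");  // new: array support
            default:            return raw;             // plain String
        }
    }
}
```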
[jira] [Updated] (LUCENE-3726) Default KuromojiAnalyzer to use search mode
[ https://issues.apache.org/jira/browse/LUCENE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated LUCENE-3726: --- Attachment: LUCENE-3726.patch Default KuromojiAnalyzer to use search mode --- Key: LUCENE-3726 URL: https://issues.apache.org/jira/browse/LUCENE-3726 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.6, 4.0 Reporter: Robert Muir Attachments: LUCENE-3726.patch, kuromojieval.tar.gz Kuromoji supports an option to segment text in a way more suitable for search, by preventing long compound nouns from becoming indexing terms. In general, 'how you segment' can be important depending on the application (see http://nlp.stanford.edu/pubs/acl-wmt08-cws.pdf for some studies on this in Chinese). The current algorithm penalizes long runs of kanji by adjusting the cost based on some parameters (SEARCH_MODE_PENALTY, SEARCH_MODE_LENGTH, etc.). Some questions (these can be separate future issues if any useful ideas come out): * should these parameters continue to be static-final, or configurable? * should POS also play a role in the algorithm (can/should we refine exactly what we decompound)? * is the Tokenizer the best place to do this, or should we do it in a tokenfilter? or both? With a tokenfilter, one idea would be to also preserve the original indexing term, overlapping it: e.g. ABCD -> AB, CD, ABCD (posInc=0). From my understanding, this tends to help with noun compounds in other languages, because IDF of the original term boosts 'exact' compound matches. But does a tokenfilter provide the segmenter enough 'context' to do this properly? Either way, I think as a start we should turn on what we have by default: it's likely a very easy win.
[jira] [Updated] (LUCENE-3726) Default KuromojiAnalyzer to use search mode
[ https://issues.apache.org/jira/browse/LUCENE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated LUCENE-3726: --- Attachment: LUCENE-3726.patch
[jira] [Updated] (LUCENE-3726) Default KuromojiAnalyzer to use search mode
[ https://issues.apache.org/jira/browse/LUCENE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated LUCENE-3726: --- Attachment: LUCENE-3726.patch
[jira] [Commented] (LUCENE-3726) Default KuromojiAnalyzer to use search mode
[ https://issues.apache.org/jira/browse/LUCENE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200654#comment-13200654 ] Christian Moen commented on LUCENE-3726: The latest attached patch introduces a default mode in {{Segmenter}}, which is now {{Mode.SEARCH}}. This mode is used by {{KuromojiAnalyzer}} in Lucene without further code changes. The Solr factory duplicated the default mode, but now retrieves it from {{Segmenter}}. This way, we set the default mode for both Solr and Lucene in a single place (in {{Segmenter}}), which I find cleaner. I've also moved some constructors around in {{Segmenter}} and made some minor formatting/style changes.
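The arrangement described above, with Segmenter owning the default mode and the analyzer picking it up, can be sketched like this (simplified stand-ins, not the actual Kuromoji classes):

```java
// Single source of truth for the default segmentation mode: both the
// Lucene analyzer and (per the patch) the Solr factory read it from
// Segmenter rather than duplicating the constant.
class Segmenter {
    enum Mode { NORMAL, SEARCH, EXTENDED }

    static final Mode DEFAULT_MODE = Mode.SEARCH;

    final Mode mode;

    // No-arg constructor picks up the shared default.
    Segmenter() { this(DEFAULT_MODE); }

    Segmenter(Mode mode) { this.mode = mode; }
}

class KuromojiAnalyzer {
    // No mode argument needed: the analyzer inherits Segmenter's default,
    // so changing DEFAULT_MODE changes both Lucene and Solr behavior.
    final Segmenter segmenter = new Segmenter();
}
```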
[jira] [Created] (LUCENE-3751) Align default Japanese configurations for Lucene and Solr
Align default Japanese configurations for Lucene and Solr - Key: LUCENE-3751 URL: https://issues.apache.org/jira/browse/LUCENE-3751 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen The {{KuromojiAnalyzer}} in Lucene should have the same default configuration as the {{text_ja}} field type introduced in {{schema.xml}} by SOLR-3056.
[jira] [Updated] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration
[ https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated LUCENE-3745: --- Attachment: LUCENE-3745.patch Need stopwords and stoptags lists for default Japanese configuration Key: LUCENE-3745 URL: https://issues.apache.org/jira/browse/LUCENE-3745 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Christian Moen Attachments: LUCENE-3745.patch, filter_stoptags.py, top-10.txt, top-100-pos.txt, top-pos.txt Stopwords and stoptags lists for Japanese need to be developed, tested and integrated into Lucene.
[jira] [Commented] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration
[ https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200680#comment-13200680 ] Christian Moen commented on LUCENE-3745: Please find a patch attached. I've made {{stoptags.txt}} lighter by not stopping all prefixes and also allowing auxiliary verbs and interjections to pass. I didn't come across any occurrences of unclassified symbols (記号) in Wikipedia, but they are now stopped, as that seems to align better with our overall stop approach for symbols. Many of the most frequent terms that now pass have been re-introduced in {{stopwords.txt}} so they are stopped using a {{StopFilter}} instead of the {{KuromojiPartOfSpeechStopFilter}}. I believe this configuration is more balanced. Overall, I've used the attached term frequencies as a governing guideline for what to introduce into {{stopwords.txt}}. It mostly contains hiragana words and expressions; I've deliberately left out common kanji, as I'd like to keep the stopping fairly light. I'll create a separate JIRA for introducing stopwords and stoptags to Solr.
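The division of labor between the two lists can be sketched as follows: stopwords.txt stops exact surface forms, while stoptags.txt stops whole part-of-speech categories. Token and JapaneseStopping here are simplified stand-ins for the real attribute-based token stream and the StopFilter/KuromojiPartOfSpeechStopFilter pair:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// A token as it comes out of the tokenizer: surface form plus
// part-of-speech tag (IPADIC-style, e.g. "助詞-係助詞").
class Token {
    final String surface;
    final String pos;
    Token(String surface, String pos) { this.surface = surface; this.pos = pos; }
}

class JapaneseStopping {
    // First drop exact stopwords (StopFilter's job), then drop any token
    // whose POS tag is in the stoptags set (the POS stop filter's job).
    static List<Token> filter(List<Token> in, Set<String> stopwords, Set<String> stoptags) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            if (stopwords.contains(t.surface)) continue;
            if (stoptags.contains(t.pos)) continue;
            out.add(t);
        }
        return out;
    }
}
```

Keeping the POS list light and moving high-frequency hiragana forms into the word list, as the comment describes, means the decision is made per surface form where a whole POS category would be too blunt.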
[jira] [Created] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration
Introduce default Japanese stoptags and stopwords to Solr's example configuration - Key: SOLR-3097 URL: https://issues.apache.org/jira/browse/SOLR-3097 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen SOLR-3056 discusses introducing a default field type {{text_ja}} for Japanese in {{schema.xml}}. This configuration will be improved by also introducing default stopwords and stoptags configuration for the field type. I believe this configuration should be easily available and tunable for Solr users, and I'm proposing that we introduce the same stopwords and stoptags provided in LUCENE-3745 to the Solr example configuration. I'm proposing that the files live in {{solr/example/solr/conf}} as {{stopwords_ja.txt}} and {{stoptags_ja.txt}}, alongside {{stopwords_en.txt}} for English. (Longer term, I think we should reconsider our overall approach to this across all languages, but that's perhaps a separate discussion.)
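For context, the field type could be wired up in schema.xml roughly as below. The factory names follow the Kuromoji factories discussed in SOLR-3056, but the exact attributes and filter ordering should be checked against the actual patches; treat this as an illustrative sketch:

```xml
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KuromojiTokenizerFactory" mode="search"/>
    <!-- stoptags_ja.txt: stop whole part-of-speech categories -->
    <filter class="solr.KuromojiPartOfSpeechStopFilterFactory" tags="stoptags_ja.txt"/>
    <!-- stopwords_ja.txt: stop exact surface forms -->
    <filter class="solr.StopFilterFactory" words="stopwords_ja.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```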
[jira] [Updated] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration
[ https://issues.apache.org/jira/browse/SOLR-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated SOLR-3097: - Attachment: SOLR-3097.patch
[jira] [Commented] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration
[ https://issues.apache.org/jira/browse/SOLR-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200686#comment-13200686 ] Christian Moen commented on SOLR-3097: -- Patch for {{trunk}} and {{branch_3x}} attached.