[jira] [Created] (LUCENE-2988) trunk 'ant test' hangs
trunk 'ant test' hangs

Key: LUCENE-2988
URL: https://issues.apache.org/jira/browse/LUCENE-2988
Project: Lucene - Java
Issue Type: Bug
Components: Tests
Environment: inspected so far on XP within Cygwin using IBM JDK 6
Reporter: Doron Cohen
Assignee: Doron Cohen
Fix For: 4.0

Running 'ant test' from trunk on XP in a Cygwin shell hangs, taking 100% CPU. There was no progress in the console for a long time, so I stopped the program. Before stopping it, I created 5 consecutive thread dumps to see where the code was. It is not clear what is going on - it does not seem to be Lucene code, I think, but I am not sure.

Opening this issue to keep an eye on this - I will try with other JDKs to see if this is persistent.

Also, when I first saw this I had local changes for two issues: LUCENE-2986 and LUCENE-2977 - I think the changes in these issues are related but will repeat the tests without these changes.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2061) Generate jar containing test classes.
[ https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated SOLR-2061:

Attachment: SOLR-2061.patch

This version of the patch includes all of Robert's, and adds in Maven and IntelliJ support.

The Solr test-framework binary, source, and javadoc jars are produced by {{ant generate-maven-artifacts}} and signed, along with their {{.pom}} file, by {{ant sign-artifacts}}.

The Maven build works through the {{install}} phase, including the {{test}} phase, switching all modules to depend on the new Solr test framework jar instead of the jar produced from all Solr test sources.

The IntelliJ build works, and all modules' test suites run and succeed.

> Generate jar containing test classes.
>
> Key: SOLR-2061
> URL: https://issues.apache.org/jira/browse/SOLR-2061
> Project: Solr
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.1
> Reporter: Drew Farris
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.2, 4.0
> Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch
>
> Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate
> and deploy a jar containing the test classes so other projects could write
> unit tests using the framework in Solr.
> This may take care of SOLR-717 as well.
[jira] [Created] (LUCENE-2987) QueryParser throwing null pointer exception if input is invalid
QueryParser throwing null pointer exception if input is invalid

Key: LUCENE-2987
URL: https://issues.apache.org/jira/browse/LUCENE-2987
Project: Lucene - Java
Issue Type: Bug
Components: QueryParser
Affects Versions: 3.0.2
Reporter: Ramesh

I was using org.apache.lucene.queryParser.QueryParser for parsing the input.

My input:
Input query string: "category:(4 or 6 or 8)"
Analyzer: StandardAnalyzer

QueryParser's parse() method results in a NullPointerException.

If I give the input query string as "category:(4 OR 6 OR 8)", with uppercase 'OR', it works fine and I get the desired results.

I'm seeing the problem only with lowercase 'or'.
[jira] [Updated] (LUCENE-2987) QueryParser throwing null pointer exception if input is invalid
[ https://issues.apache.org/jira/browse/LUCENE-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramesh updated LUCENE-2987:

Description:
I was using org.apache.lucene.queryParser.QueryParser for parsing the input.

My input:
Input query string: "category: (4 or 6 or 8)"
Analyzer: StandardAnalyzer

QueryParser's parse() method results in a NullPointerException.

If I give the input query string as "category: (4 OR 6 OR 8)", with uppercase 'OR', it works fine and I get the desired results.

I'm seeing the problem only with lowercase 'or'.

was:
I was using org.apache.lucene.queryParser.QueryParser for parsing the input.

My input:
Input query string: "category:(4 or 6 or 8)"
Analyzer: StandardAnalyzer

QueryParser's parse() method results in a NullPointerException.

If I give the input query string as "category:(4 OR 6 OR 8)", with uppercase 'OR', it works fine and I get the desired results.

I'm seeing the problem only with lowercase 'or'.

> QueryParser throwing null pointer exception if input is invalid
>
> Key: LUCENE-2987
> URL: https://issues.apache.org/jira/browse/LUCENE-2987
> Project: Lucene - Java
> Issue Type: Bug
> Components: QueryParser
> Affects Versions: 3.0.2
> Reporter: Ramesh
>
> I was using org.apache.lucene.queryParser.QueryParser for parsing the input.
> My input:
> Input query string: "category: (4 or 6 or 8)"
> Analyzer: StandardAnalyzer
> QueryParser's parse() method results in a NullPointerException.
> If I give the input query string as "category: (4 OR 6 OR 8)", with uppercase
> 'OR', it works fine and I get the desired results.
> I'm seeing the problem only with lowercase 'or'.
[jira] [Commented] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity
[ https://issues.apache.org/jira/browse/LUCENE-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010570#comment-13010570 ]

Doron Cohen commented on LUCENE-2986:

+1 for this change (I did not remember discussing this, but other than remembering I am consistent :))

Patch looks very clean.

Minor technical comments - concerning just some tests:
- some of the DSP implementations are still named xyzSimilarity - I think it would be clearer to name them xyzSimilarityProvider:
-- o.a.l.search.payloads.TestPayloadNearQuery.BoostingSimilarity
-- o.a.l.search.payloads.TestPayloadTermQuery.BoostingSimilarity
-- o.a.solr.schema.MockConfigurableSimilarity
-- o.a.l.index.TestIndexWriterConfig.MySimilarity
-- o.a.l.index.TestIndexReaderCloneNorms.SimilarityOne
-- o.a.l.index.TestNorms.SimilarityOne
-- o.a.l.index.TestOmitTf.SimpleSimilarity
-- o.a.l.search.TestSimilarity.SimpleSimilarity
- for a few of the above it is not only the name - they are still doing both roles: {code}extends DefaultSimilarity implements SimilarityProvider{code}:
-- o.a.l.search.payloads.TestPayloadNearQuery.BoostingSimilarity
-- o.a.l.search.payloads.TestPayloadTermQuery.BoostingSimilarity
-- o.a.l.index.TestOmitTf.SimpleSimilarity
-- o.a.l.search.TestSimilarity.SimpleSimilarity

Other than that I think it is good to go in.

Also, tests from trunk/lucene and trunk/solr passed. (I am seeing problems in running all trunk tests, at least on Windows, but I'll send a separate mail to the list on that)

> divorce defaultsimilarityprovider from defaultsimilarity
>
> Key: LUCENE-2986
> URL: https://issues.apache.org/jira/browse/LUCENE-2986
> Project: Lucene - Java
> Issue Type: Task
> Reporter: Robert Muir
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 4.0
> Attachments: LUCENE-2986.patch
>
> In LUCENE-2236 as a start, we made DefaultSimilarity which implements the
> factory interface (SimilarityProvider), and also extends Similarity.
> Its factory interface just returns itself always by default.
> Doron mentioned it would be cleaner to split the two, and I thought it would
> be good to revisit it later.
> Today as I was looking at SOLR-2338, it became pretty clear that we should do
> this, it makes things a lot cleaner. I think currently it's confusing to users
> to see the two APIs mixed if they are trying to subclass.
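
The split discussed in this issue can be sketched with stand-in types. Everything below is a hypothetical simplification and not the Lucene 4.0 API: the two interfaces, the `tf` method shape, and the `PerFieldSimilarityProvider` class are assumptions made for illustration only. The point is that the scoring classes implement only the scoring interface, while a separate class plays the factory role.

```java
import java.util.HashMap;
import java.util.Map;

public class SimilaritySplitSketch {

    /** Hypothetical stand-in for per-field scoring; NOT Lucene's Similarity class. */
    interface Similarity {
        float tf(int freq);
    }

    /** Hypothetical stand-in for the factory role; NOT Lucene's SimilarityProvider. */
    interface SimilarityProvider {
        Similarity get(String field);
    }

    /** Scoring only: this class no longer doubles as its own factory. */
    static class DefaultSimilarity implements Similarity {
        @Override
        public float tf(int freq) {
            return (float) Math.sqrt(freq);
        }
    }

    /** Assumed test-style variant that ignores how often a term occurs. */
    static class SimpleSimilarity implements Similarity {
        @Override
        public float tf(int freq) {
            return freq > 0 ? 1.0f : 0.0f;
        }
    }

    /** Factory only: per-field overrides with a fallback for every other field. */
    static class PerFieldSimilarityProvider implements SimilarityProvider {
        private final Similarity fallback;
        private final Map<String, Similarity> perField = new HashMap<>();

        PerFieldSimilarityProvider(Similarity fallback) {
            this.fallback = fallback;
        }

        void set(String field, Similarity similarity) {
            perField.put(field, similarity);
        }

        @Override
        public Similarity get(String field) {
            return perField.getOrDefault(field, fallback);
        }
    }

    public static void main(String[] args) {
        PerFieldSimilarityProvider provider =
                new PerFieldSimilarityProvider(new DefaultSimilarity());
        provider.set("title", new SimpleSimilarity());
        System.out.println("title tf(4) = " + provider.get("title").tf(4));
        System.out.println("body  tf(4) = " + provider.get("body").tf(4));
    }
}
```

With the roles separated like this, a test class named xyzSimilarity would only override scoring, and anything that maps fields to scorers would carry the xyzSimilarityProvider name.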
Re: [VOTE] Release Lucene/Solr 3.1
> I don't think someone should have to deal with maven to get the lucene
> source release... I think lucene should have its own artifacts as in
> the past (the source code being the most important).

sorry, did not mean to muddy the water with maven discussion... ignore my comment

when you say "lucene should have its own artifacts" do you mean lucene w/o solr? or could a single source artifact include everything? (making the release process easier and apparently cleaner)
GSoC 2011
Hello, I am planning to submit a project proposal to GSoC 2011 and Lucene seems to have a lot of GSoC projects this year. Last year I did a GSoC project using Lucene for PhotArk project. This year, instead of just using Lucene, I am planning to contribute code to it. My experience with Lucene is just as a regular user, the only code I have changed/extended so far was token streams/analyzers and query parser, so I have more knowledge on this part of the code. Based on that, I'm planning to focus on query parser and analyzer/token stream projects. Does that sound reasonable? I will be studying the code and planning the proposal(s), so you should start seeing more posts from me in the next few days. -- Phillipe Ramalho
Re: [VOTE] Release Lucene/Solr 3.1
On Thu, Mar 24, 2011 at 12:18 AM, Ryan McKinley wrote:
>
> I don't want to suggest anything to slow down the release... but if
> the problems are with the source release, what about just doing a
> single source release for lucene+solr?
>
> We currently have:
>
> lucene-solr-3.1RC2/lucene/
> lucene-solr-3.1RC2/lucene/lucene-3.1.0-src.tar.gz
> lucene-solr-3.1RC2/lucene/...
> lucene-solr-3.1RC2/solr/
> lucene-solr-3.1RC2/solr/apache-solr-3.1.0-src.tgz
> lucene-solr-3.1RC2/solr/...
>
> Why not:
> lucene-solr-3.1RC2/lucene-3.1.0-src.tar.gz
> lucene-solr-3.1RC2/lucene/...
> lucene-solr-3.1RC2/solr/...
>
> and let the src release be as close to svn export as possible? This
> will make sure the result builds just as it does when we actually
> build it!
>
> With the maven artifacts, we have source for each jar:
> http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/solr/maven/org/apache/solr/solr-core/3.1.0/solr-core-3.1.0-sources.jar
>
> http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/lucene/maven/org/apache/lucene/lucene-queries/3.1.0/lucene-queries-3.1.0-sources.jar
>
> I'm not sure the exact ASF source requirements, but maybe the maven
> source.jar files are good enough?
>

I don't think someone should have to deal with maven to get the lucene source release... I think lucene should have its own artifacts as in the past (the source code being the most important).
[jira] [Updated] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2338:

Attachment: SOLR-2338.patch

Here's a first stab: I included LUCENE-2986's cleanup work for easy testing (this issue depends upon it).

Here is the syntax:
{noformat} is there an echo? {noformat}

Additionally, it's necessary to allow customization of the SimilarityProvider too, in order to customize the non-field specific stuff like coord()... this is done via:
{noformat} is there an echo? {noformat}

> improved per-field similarity integration into schema.xml
>
> Key: SOLR-2338
> URL: https://issues.apache.org/jira/browse/SOLR-2338
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Affects Versions: 4.0
> Reporter: Robert Muir
> Attachments: SOLR-2338.patch
>
> Currently since LUCENE-2236, we can enable Similarity per-field, but in
> schema.xml there is only a 'global' factory
> for the SimilarityProvider.
> In my opinion this is too low-level because to customize Similarity on a
> per-field basis, you have to set your own
> CustomSimilarityProvider with and manage the
> per-field mapping yourself in java code.
> Instead I think it would be better if you just specify the Similarity in the
> FieldType, like after .
> As far as the example, one idea from LUCENE-1360 was to make a "short_text"
> or "metadata_text" used by the
> various metadata fields in the example that has better norm quantization for
> its shortness...
Re: [VOTE] Release Lucene/Solr 3.1
> : Please vote to release the artifacts at
> : http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2
>
> -0
>
> I can't in good conscience vote for these artifacts.
>

I don't want to suggest anything to slow down the release... but if the problems are with the source release, what about just doing a single source release for lucene+solr?

We currently have:

lucene-solr-3.1RC2/lucene/
lucene-solr-3.1RC2/lucene/lucene-3.1.0-src.tar.gz
lucene-solr-3.1RC2/lucene/...
lucene-solr-3.1RC2/solr/
lucene-solr-3.1RC2/solr/apache-solr-3.1.0-src.tgz
lucene-solr-3.1RC2/solr/...

Why not:
lucene-solr-3.1RC2/lucene-3.1.0-src.tar.gz
lucene-solr-3.1RC2/lucene/...
lucene-solr-3.1RC2/solr/...

and let the src release be as close to svn export as possible? This will make sure the result builds just as it does when we actually build it!

With the maven artifacts, we have source for each jar:
http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/solr/maven/org/apache/solr/solr-core/3.1.0/solr-core-3.1.0-sources.jar
http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2/lucene/maven/org/apache/lucene/lucene-queries/3.1.0/lucene-queries-3.1.0-sources.jar

I'm not sure the exact ASF source requirements, but maybe the maven source.jar files are good enough?

Again, I don't think this should be a blocker, but it would be nice to have things simplified for the next release -- gasp.

ryan
[jira] [Commented] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010547#comment-13010547 ]

Shai Erera commented on LUCENE-2977:

Looks good to me.

> WriteLineDocTask should write gzip/bzip2/txt according to the extension of
> specified output file name
>
> Key: LUCENE-2977
> URL: https://issues.apache.org/jira/browse/LUCENE-2977
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/benchmark
> Reporter: Doron Cohen
> Assignee: Doron Cohen
> Priority: Minor
> Fix For: 3.2, 4.0
> Attachments: LUCENE-2977.patch, LUCENE-2977.patch
>
> Since the readers behave this way it would be nice and handy if also this
> line writer would.
Re: [VOTE] Release Lucene/Solr 3.1
: Please vote to release the artifacts at
: http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2

-0

I can't in good conscience vote for these artifacts.

For the most part, there are only a few minor hiccups -- but the big blocker (in my opinion) is that since RC1, dev-tools has been removed from the solr src packages and this causes the top level build.xml (and instructions for IDE users in the top level README.txt file) to be broken.

My detailed notes below...

## ### apache-solr-3.1.0-src.tgz

dev-tools isn't in here -- this totally boggles my mind, particularly since there was a deliberate and conscious switch to make the source releases match what you get when doing an "svn export"

because dev-tools is missing, 3 of the top level ant targets advertised using "ant -p" don't work; including 'ant idea' and 'ant eclipse' which are also explicitly mentioned in the top level README.txt as how people using those IDEs should get started developing the code.

This seems like a major issue to me. we're setting ourselves up to make the release look completely broken right out of the gate for anyone using one of those IDEs.

Asked about this on IRC. yonik & ryan indicated that a couple of folks had said they would veto any release with dev-tools in it because that stuff is supposed to be "unsupported" ... this makes no sense to me as we have lots of places in the code base where things are documented as being experimental, subject to change, and/or for developer use only. i don't really see how dev-tools should be any different.

if there is really such violent opposition to including dev-tools in src releases, then the top level build.xml should not depend on it, and the top level README.txt should not refer to it (except maybe with something like "people interested in hacking on the src should use svn which includes some unofficial 'dev-tools'")

---

Now that the src packages are driven by svn exports, more files exist than were in RC1 and some of the changes we made to the solr/README.txt based on the earlier release candidates are misleading. In particular a lot of things are listed as being in the "docs" directory of a binary distribution, but those files *do* exist in the src packages -- if you look in the "site" directory. This seems silly, but at no point is the README.txt factually incorrect, so I guess it's not a big enough deal to worry about.

---

running all tests, running the example, and building the javadocs all worked fine.

## ### apache-solr-3.1.0.tgz

docs look good, basic example usage works fine.

## ### apache-solr-3.1.0.zip

Diffing the contents of apache-solr-3.1.0.tgz with apache-solr-3.1.0.zip (using "diff --ignore-all-space --strip-trailing-cr -r") turned up quite a few instances where the CRLF fixing in build.xml seems to have corrupted some non-ascii characters in a few files

contrib/dataimporthandler/lib/activation-LICENSE.txt
contrib/dataimporthandler/lib/mail-LICENSE.txt
docs/skin/CommonMessages_de.xml
docs/skin/CommonMessages_es.xml
docs/skin/CommonMessages_fr.xml
example/solr/conf/velocity/facet_dates.vm

...but these changes don't seem to have substantively harmed the files.

## ### lucene-3.1.0-src.tar.gz

tests and javadocs worked fine.

## ### lucene-3.1.0.tar.gz

docs look good, demo runs fine.

## ### lucene-3.1.0.zip

no differences found with lucene-3.1.0.tar.gz

-Hoss
[jira] [Created] (SOLR-2439) change solr javadocs to link to local lucene javadocs w/relative links
change solr javadocs to link to local lucene javadocs w/relative links

Key: SOLR-2439
URL: https://issues.apache.org/jira/browse/SOLR-2439
Project: Solr
Issue Type: Task
Components: documentation
Reporter: Hoss Man
Fix For: 3.2

Now that solr/lucene are in lock step development, and solr releases include the entire lucene-java release, the solr ant targets for building javadocs should depend on the lucene (and module) targets for building javadocs and link directly to the local copies of those docs (using relative paths)

(currently, the links point to https://hudson.apache.org/hudson/job/Lucene-trunk/javadoc/all/)
[jira] [Issue Comment Edited] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010318#comment-13010318 ]

Stefan Matheis (steffkes) edited comment on SOLR-2399 at 3/23/11 8:15 PM:

Ryan: ty, will take your points on my list - pretty sure that it should be possible to integrate them
Mark: ty! :)

For today, it's about *Logging*. Talked about that with Hoss on #solr the last days, so already changed a few things .. on the way, but not finished: http://files.mathe.is/solr-admin/07_logging.png

Actually thinking about the following points:
* Is the tree structure a good way to solve it?
* Do we need the possibility to collapse/expand the tree/the children? The list could be longer (the screenshot is cropped, just for layout reasons), especially while using SolrCloud, which adds about 30 loggers
* In the current er .. "Interface" you are able to see that the row you're looking at has a level set and in the end (at the right) which is the effective level - for me, that does not matter. if a row/logger has level-x - that's enough to know. don't need to see if this level is set or inherited.
* just a quick idea: if you change e.g. {{org.apache.solr}} then the interface will automatically update all children in realtime; this affects all nested/sub loggers w/o an assigned level.

Thoughts on these points? anyone? :>

Short Note: i moved Logging to a global level, because it's not configurable on a per-core basis.

# Edit
What i forgot to mention .. actually it's based on a [static logging.json-file|https://github.com/steffkes/solr-admin/blob/master/logging.json] but will try to change the {{LogLevelSection}} Servlet so that it outputs the needed json-structure

was (Author: steffkes):
Ryan: ty, will take your points on my list - pretty sure that it should be possible to integrate them
Mark: ty! :)

For today, it's about *Logging*. Talked about that with Hoss on #solr the last days, so already changed a few things .. on the way, but not finished: http://files.mathe.is/solr-admin/07_logging.png

Actually thinking about the following points:
* Is the tree structure a good way to solve it?
* Do we need the possibility to collapse/expand the tree/the children? The list could be longer (the screenshot is cropped, just for layout reasons), especially while using SolrCloud, which adds about 30 loggers
* In the current er .. "Interface" you are able to see that the row you're looking at has a level set and in the end (at the right) which is the effective level - for me, that does not matter. if a row/logger has level-x - that's enough to know. don't need to see if this level is set or inherited.
* just a quick idea: if you change e.g. {{org.apache.solr}} then the interface will automatically update all children in realtime; this affects all nested/sub loggers w/o an assigned level.

Thoughts on these points? anyone? :>

Short Note: i moved Logging to a global level, because it's not configurable on a per-core basis.

> Solr Admin Interface, reworked
>
> Key: SOLR-2399
> URL: https://issues.apache.org/jira/browse/SOLR-2399
> Project: Solr
> Issue Type: Improvement
> Components: web gui
> Reporter: Stefan Matheis (steffkes)
> Priority: Minor
> Fix For: 4.0
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin
> Interface.* [Based on this
> [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
> I've quickly created a Github-Repository (Just for me, to keep track of the
> changes)
> » https://github.com/steffkes/solr-admin
> [This commit shows the
> differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d]
> between old/existing index.jsp and my new one (which is could
> copy-cut/paste'd from the existing one).
> Main Action takes place in
> [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js]
> which is actually neither clean nor pretty .. just work-in-progress.
> Actually it's Work in Progress, so ... give it a try. It's developed with
> Firefox as Browser, so, for a first impression .. please don't use _things_
> like Internet Explorer or so ;o
> Jan already suggested a bunch of good things, i'm sure there are more ideas
> over there :)
[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-2977:

Attachment: LUCENE-2977.patch

Thanks for reviewing Shai!

bq. In StreamUtils you have ".bz" -- it should be ".bz2"

Good catch! Fixed.

bq. +1 (you mean the bzip.compression property in WLDT right?).

Yes.

bq. I think that it's reasonable to request the user to specify an output file with .bz2 extension if he wants bzip compression.

Great, I removed it.

bq. I don't see how it will simplify StreamUtils though, but I trust you :) (perhaps you meant it will simplify WLDT?)

It allowed keeping just one of the two variations of StreamUtils.outputStream(). WLDT and the tests became simpler as well.

Attaching updated patch. (again first apply that svn mv...)

> WriteLineDocTask should write gzip/bzip2/txt according to the extension of
> specified output file name
>
> Key: LUCENE-2977
> URL: https://issues.apache.org/jira/browse/LUCENE-2977
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/benchmark
> Reporter: Doron Cohen
> Assignee: Doron Cohen
> Priority: Minor
> Fix For: 3.2, 4.0
> Attachments: LUCENE-2977.patch, LUCENE-2977.patch
>
> Since the readers behave this way it would be nice and handy if also this
> line writer would.
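
The idea of choosing the output format from the file extension can be sketched as follows. This is not the patch's StreamUtils code: the class name `ExtensionStreams`, the `Type` enum, and both method names are assumptions for illustration. Only gzip is wrapped for real, because the JDK ships no bzip2 codec and a third-party library would be needed for that branch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

public class ExtensionStreams {

    /** Output formats that can be requested through the file extension. */
    enum Type { GZIP, BZIP2, PLAIN }

    /** Maps a file name to a format; note ".bz2", not ".bz", selects bzip2. */
    static Type typeOf(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz")) {
            return Type.GZIP;
        }
        if (lower.endsWith(".bz2")) {
            return Type.BZIP2;
        }
        return Type.PLAIN;
    }

    /** Wraps the raw stream according to the extension of the target file name. */
    static OutputStream wrap(OutputStream raw, String fileName) throws IOException {
        switch (typeOf(fileName)) {
            case GZIP:
                return new GZIPOutputStream(raw);
            case BZIP2:
                // Assumption: a real implementation would plug in a bzip2 library here.
                throw new UnsupportedOperationException("bzip2 is not available in the JDK");
            default:
                return raw;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"docs.txt", "docs.txt.gz", "docs.txt.bz2", "docs.bz"}) {
            System.out.println(name + " -> " + typeOf(name));
        }
    }
}
```

Keeping the decision in one place means a caller passes only the file name, which matches the comment above about dropping a separate compression property.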
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010318#comment-13010318 ]

Stefan Matheis (steffkes) commented on SOLR-2399:

Ryan: ty, will take your points on my list - pretty sure that it should be possible to integrate them
Mark: ty! :)

For today, it's about *Logging*. Talked about that with Hoss on #solr the last days, so already changed a few things .. on the way, but not finished: http://files.mathe.is/solr-admin/07_logging.png

Actually thinking about the following points:
* Is the tree structure a good way to solve it?
* Do we need the possibility to collapse/expand the tree/the children? The list could be longer (the screenshot is cropped, just for layout reasons), especially while using SolrCloud, which adds about 30 loggers
* In the current er .. "Interface" you are able to see that the row you're looking at has a level set and in the end (at the right) which is the effective level - for me, that does not matter. if a row/logger has level-x - that's enough to know. don't need to see if this level is set or inherited.
* just a quick idea: if you change e.g. {{org.apache.solr}} then the interface will automatically update all children in realtime; this affects all nested/sub loggers w/o an assigned level.

Thoughts on these points? anyone? :>

Short Note: i moved Logging to a global level, because it's not configurable on a per-core basis.

> Solr Admin Interface, reworked
>
> Key: SOLR-2399
> URL: https://issues.apache.org/jira/browse/SOLR-2399
> Project: Solr
> Issue Type: Improvement
> Components: web gui
> Reporter: Stefan Matheis (steffkes)
> Priority: Minor
> Fix For: 4.0
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin
> Interface.* [Based on this
> [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
> I've quickly created a Github-Repository (Just for me, to keep track of the
> changes)
> » https://github.com/steffkes/solr-admin
> [This commit shows the
> differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d]
> between old/existing index.jsp and my new one (which is could
> copy-cut/paste'd from the existing one).
> Main Action takes place in
> [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js]
> which is actually neither clean nor pretty .. just work-in-progress.
> Actually it's Work in Progress, so ... give it a try. It's developed with
> Firefox as Browser, so, for a first impression .. please don't use _things_
> like Internet Explorer or so ;o
> Jan already suggested a bunch of good things, i'm sure there are more ideas
> over there :)
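
The "set versus effective level" question raised above can be sketched as a lookup that walks up the dotted logger name. This is a minimal illustration, not the admin interface's code: the class and method names are hypothetical, and levels are plain strings to keep the sketch independent of any logging framework.

```java
import java.util.HashMap;
import java.util.Map;

public class EffectiveLogLevels {

    /**
     * Returns the level that applies to a logger: its own explicit level if one
     * is set, otherwise the level of the nearest ancestor that has one,
     * otherwise the root level.
     */
    static String effectiveLevel(Map<String, String> explicit, String logger, String rootLevel) {
        String name = logger;
        while (true) {
            String level = explicit.get(name);
            if (level != null) {
                return level;
            }
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                return rootLevel;
            }
            // Move one step up the hierarchy, e.g. a.b.c -> a.b
            name = name.substring(0, dot);
        }
    }

    public static void main(String[] args) {
        Map<String, String> explicit = new HashMap<>();
        explicit.put("org.apache.solr", "WARNING");
        explicit.put("org.apache.solr.core", "FINE");

        String[] loggers = {
            "org.apache.solr.core.SolrCore",
            "org.apache.solr.update.UpdateHandler",
            "org.apache.lucene"
        };
        for (String logger : loggers) {
            System.out.println(logger + " -> " + effectiveLevel(explicit, logger, "INFO"));
        }
    }
}
```

Under this model, changing {{org.apache.solr}} only needs one map update; every nested logger without its own entry picks up the new value the next time its row is rendered.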
[jira] [Commented] (SOLR-2415) Change XMLWriter version parameter to "wt.xml.version"
[ https://issues.apache.org/jira/browse/SOLR-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010308#comment-13010308 ]

Hoss Man commented on SOLR-2415:

bq. how should we handle the desire to change the faceting format (to make it easier to add metadata like total number of constraints, etc)? "version" would be one way. "facet.format" would be another way.

i don't think the *structure* of the response (ie: the facet response section) should be driven by the same param as the *format* of the response, which is what "version" currently is. Something like facet.format seems more appropriate when dealing with a specific component like that ... but i don't think it should be a numeric "version"-esque property, i think it should be descriptive (ie: "flat" vs "nested" or something)

bq. perhaps we should add a getVersion() parameter on SolrQueryRequest and have that used across all components.

when i suggested we have a common wt.version param that all of the response writers could use, i didn't mean to suggest that it should have a singular id space. my suggestion was that the specific values specified for "version" or "wt.version" or whatever would only be meaningful to the specific response writer used -- just as the current values of the version param that the XMLResponseWriter uses are meaningless to the JSONResponseWriter. the overlap would only be in reusing the param name (in the same way that "q" is the common param name for the main query, regardless of what query parser is specified by "defType")

bq. Look at how long the existing response writers have hung around in their current format, independent of the version # changes (1.2, 1.3, 1.4, and now 3.1)

the version param of the XML response writer has never been in sync with the solr version, it was never intended to be. it's always been the version number of the xml format.

> Change XMLWriter version parameter to "wt.xml.version"
>
> Key: SOLR-2415
> URL: https://issues.apache.org/jira/browse/SOLR-2415
> Project: Solr
> Issue Type: Improvement
> Reporter: Ryan McKinley
> Priority: Trivial
> Fix For: 4.0
>
> The XMLWriter has a parameter called 'version'. This controls some specifics
> about how the XMLWriter works. Using the parameter name 'version' made sense
> back when the XMLWriter was the only option, but with all the various writers
> and different places where 'version' makes sense, I think we should change
> this parameter name to "wt.xml.version" so that it specifically refers to the
> XMLWriter.
Re: write byte[] directly to TokenStream
works great - thanks!

On Wed, Mar 23, 2011 at 1:04 AM, Robert Muir wrote:
>
> On Mar 22, 2011 11:38 PM, "Ryan McKinley" wrote:
>>
>> I'm messing with putting binary data directly in the index. I have a
>> field class with:
>>
>>   @Override
>>   public TokenStream tokenStreamValue() {
>>     byte[] value = (byte[])fieldsData;
>>
>>     Token token = new Token( 0, value.length, "geo" );
>>     token.resizeBuffer( value.length );
>>     BytesRef ref = token.getBytesRef();
>>     ref.bytes = value;
>>     ref.length = value.length;
>>     ref.offset = 0;
>>     token.setLength( ref.length );
>>     return new SingleTokenTokenStream( token );
>>   }
>>
>> but that is just writing an empty token. Is it possible to set the
>> Token value without converting to char[]?
>>
>
> check out Test2BTerms for an example...
>
[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010268#comment-13010268 ]

Peter Sturge commented on SOLR-2438:

If you're like me, you may have often wondered why MyTerm, myterm, myter* and MyTer* can return different, and sometimes empty results. This patch addresses this for wildcard queries by adding an attribute to relevant solr.TextField entries in schema.xml. The new attribute is called: {{ignoreCaseForWildcards}}

Example entry in schema.xml:
{code:title=schema.xml [excerpt]|borderStyle=solid}
{code}

It's worth noting that this will lower-case text for ALL terms that match the field type - including synonyms and stemmers.

For backward compatibility, the default behaviour is as before - i.e. a case sensitive wildcard search ({{ignoreCaseForWildcards=false}}).

The patch was created against the lucene_solr_3_1 branch. I've not applied it yet on trunk.

[caveat emptor] I freely admit I'm no schema expert, so committers and community members may see use cases where this approach could pose problems. I'm all for feedback to enhance the functionality...

The hope here is to re-ignite enthusiasm for case-insensitive wildcard searches in Solr - in line with the 'it just works' Solr philosophy.

Enjoy!

> Case Insensitive Search for Wildcard Queries
>
> Key: SOLR-2438
> URL: https://issues.apache.org/jira/browse/SOLR-2438
> Project: Solr
> Issue Type: Improvement
> Reporter: Peter Sturge
> Attachments: SOLR-2438.patch
>
> This patch adds support to allow case-insensitive queries on wildcard
> searches for configured TextField field types.
> This patch extends the excellent work done by Yonik and Michael in SOLR-219.
> The approach here is different enough (imho) to warrant a separate JIRA issue.
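As a rough illustration of why MyTer* and myter* can behave differently, here is a minimal sketch. It assumes the index-time analyzer lower-cases terms while the wildcard term is passed through unanalyzed. The class and method names are hypothetical and are not taken from the SOLR-2438 patch; only a single trailing '*' is handled.

```java
import java.util.Locale;

// Hypothetical illustration only; this is NOT code from the SOLR-2438 patch.
final class WildcardCaseSketch {

    /** Lower-cases the wildcard term when ignoreCase is set; '*' and '?' are unchanged by lower-casing. */
    static String normalize(String wildcardTerm, boolean ignoreCase) {
        return ignoreCase ? wildcardTerm.toLowerCase(Locale.ROOT) : wildcardTerm;
    }

    /** Matches an indexed term against a wildcard term with exactly one trailing '*'. */
    static boolean prefixMatches(String indexedTerm, String wildcardTerm) {
        if (!wildcardTerm.endsWith("*")) {
            throw new IllegalArgumentException("this sketch only supports a trailing '*'");
        }
        String prefix = wildcardTerm.substring(0, wildcardTerm.length() - 1);
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        String indexed = "myterm"; // assumed to have been lower-cased at index time
        System.out.println(prefixMatches(indexed, normalize("MyTer*", false)));
        System.out.println(prefixMatches(indexed, normalize("MyTer*", true)));
    }
}
```

With the flag off, the mixed-case prefix is compared as typed against the lower-cased indexed term; with it on, the prefix is lower-cased first.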
[jira] [Commented] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010263#comment-13010263 ] Shai Erera commented on LUCENE-2977: Patch looks good ! In StreamUtils you have ".bz" -- it should be ".bz2" bq. Any opinions on removing this "force-bzip" option? +1 (you mean the bzip.compression property in WLDT right?). I think that it's reasonable to request the user to specify an output file with .bz2 extension if he wants bzip compression. I don't see how it will simplify StreamUtils though, but I trust you :) (perhaps you meant it will simplify WLDT?) > WriteLineDocTask should write gzip/bzip2/txt according to the extension of > specified output file name > - > > Key: LUCENE-2977 > URL: https://issues.apache.org/jira/browse/LUCENE-2977 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-2977.patch > > > Since the readers behave this way it would be nice and handy if also this > line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2438) Case Insensitive Search for Wildcard Queries
[ https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-2438:
    Attachment: SOLR-2438.patch

Attached patch file

> Case Insensitive Search for Wildcard Queries
>
> Key: SOLR-2438
> URL: https://issues.apache.org/jira/browse/SOLR-2438
> Project: Solr
> Issue Type: Improvement
> Reporter: Peter Sturge
> Attachments: SOLR-2438.patch
>
> This patch adds support to allow case-insensitive queries on wildcard
> searches for configured TextField field types.
> This patch extends the excellent work done by Yonik and Michael in SOLR-219.
> The approach here is different enough (imho) to warrant a separate JIRA issue.
[jira] [Created] (SOLR-2438) Case Insensitive Search for Wildcard Queries
Case Insensitive Search for Wildcard Queries

Key: SOLR-2438
URL: https://issues.apache.org/jira/browse/SOLR-2438
Project: Solr
Issue Type: Improvement
Reporter: Peter Sturge

This patch adds support to allow case-insensitive queries on wildcard searches for configured TextField field types.
This patch extends the excellent work done by Yonik and Michael in SOLR-219.
The approach here is different enough (imho) to warrant a separate JIRA issue.
[jira] [Issue Comment Edited] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010218#comment-13010218 ] Paul Elschot edited comment on LUCENE-2945 at 3/23/11 5:01 PM: --- New -2945d patch that also has the changes to SpanNearClauseFactory. was (Author: paul.elsc...@xs4all.nl): Also has the changes to SpanNearClauseFactory. > Surround Query doesn't properly handle equals/hashcode > -- > > Key: LUCENE-2945 > URL: https://issues.apache.org/jira/browse/LUCENE-2945 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 3.1.1, 4.0 > > Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, > LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, > LUCENE-2945d.patch > > > In looking at using the surround queries with Solr, I am hitting issues > caused by collisions due to equals/hashcode not being implemented on the > anonymous inner classes that are created by things like DistanceQuery (branch > 3.x, near line 76) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode
[ https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-2945: - Attachment: LUCENE-2945d.patch Also has the changes to SpanNearClauseFactory. > Surround Query doesn't properly handle equals/hashcode > -- > > Key: LUCENE-2945 > URL: https://issues.apache.org/jira/browse/LUCENE-2945 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 3.0.3, 3.1, 4.0 >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 3.1.1, 4.0 > > Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, > LUCENE-2945.patch, LUCENE-2945.patch, LUCENE-2945c.patch, LUCENE-2945d.patch, > LUCENE-2945d.patch > > > In looking at using the surround queries with Solr, I am hitting issues > caused by collisions due to equals/hashcode not being implemented on the > anonymous inner classes that are created by things like DistanceQuery (branch > 3.x, near line 76) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
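For readers unfamiliar with the underlying problem, the following is a generic sketch of value-based equals/hashCode on an object used as a cache key. It is not the LUCENE-2945 patch, and the class name and fields are hypothetical stand-ins rather than the surround parser's actual classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a query object; NOT the surround parser's real DistanceQuery.
final class DistanceQuerySketch {
    private final String field;
    private final int distance;
    private final boolean ordered;

    DistanceQuerySketch(String field, int distance, boolean ordered) {
        this.field = Objects.requireNonNull(field);
        this.distance = distance;
        this.ordered = ordered;
    }

    // Two queries are equal when every parameter that affects matching is equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DistanceQuerySketch)) return false;
        DistanceQuerySketch other = (DistanceQuerySketch) o;
        return distance == other.distance && ordered == other.ordered && field.equals(other.field);
    }

    // hashCode must be derived from the same fields that equals compares.
    @Override
    public int hashCode() {
        return Objects.hash(field, distance, ordered);
    }

    public static void main(String[] args) {
        Map<DistanceQuerySketch, String> cache = new HashMap<>();
        cache.put(new DistanceQuerySketch("body", 3, true), "cached result");
        // A separately constructed but equivalent query finds the cached entry.
        System.out.println(cache.get(new DistanceQuerySketch("body", 3, true)));
        // A query with a different distance does not.
        System.out.println(cache.get(new DistanceQuerySketch("body", 4, true)));
    }
}
```

An anonymous inner class that relies on the inherited Object.equals/hashCode only ever equals itself, so query objects that wrap one cannot be compared by value when used as keys.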
[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2977: Attachment: LUCENE-2977.patch Patch for auto-detecting output compression mode of result line file: - getInputStream() moved from ContentSource to a new class StreamUtils under util. It is now named inputStream(File). - outputStream() method added to StreamUtils. Before applying this patch *svn mv modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/feeds/ContentSourceTest.java modules/benchmark/src/test/org/apache/lucene/benchmark/byTask/utils/StreamUtilsTest.java* I kept for now the "force-bzip" logic in WriteLineDocTask but I would like to remove it - it is strange, and in any case LineDocSource would only auto-detect bzip input format if WriteLineDocTask was able to auto-detect bzip output format. Removing it will also simplify StreamUtils. Any opinions on removing this "force-bzip" option? > WriteLineDocTask should write gzip/bzip2/txt according to the extension of > specified output file name > - > > Key: LUCENE-2977 > URL: https://issues.apache.org/jira/browse/LUCENE-2977 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-2977.patch > > > Since the readers behave this way it would be nice and handy if also this > line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
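The extension-based auto-detection described in this patch could look roughly like the sketch below. This is an assumption-laden illustration, not the StreamUtils class from the patch: the class name, the enum, and the method bodies are invented here, and bzip2 is only detected, not written, because the JDK ships no bzip2 codec.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; NOT the StreamUtils class from the LUCENE-2977 patch.
final class StreamUtilsSketch {

    enum Type { BZIP2, GZIP, PLAIN }

    /** Picks the compression type from the file name extension (case-insensitive). */
    static Type typeOf(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".bz2")) return Type.BZIP2;
        if (name.endsWith(".gz")) return Type.GZIP;
        return Type.PLAIN;
    }

    /** Opens an output stream whose compression follows the file extension. */
    static OutputStream outputStream(File file) throws IOException {
        Type type = typeOf(file.getName());
        if (type == Type.BZIP2) {
            // The JDK has no bzip2 codec; a real implementation would plug in an external one here.
            throw new UnsupportedOperationException("bzip2 needs an external codec");
        }
        OutputStream os = new BufferedOutputStream(new FileOutputStream(file));
        return type == Type.GZIP ? new GZIPOutputStream(os) : os;
    }

    public static void main(String[] args) {
        System.out.println(typeOf("lines.txt.bz2"));
        System.out.println(typeOf("lines.txt.gz"));
        System.out.println(typeOf("lines.txt"));
    }
}
```

Keeping the detection in one place lets a line-file writer and a line-file reader agree on the format from the file name alone.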
[jira] [Updated] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity
[ https://issues.apache.org/jira/browse/LUCENE-2986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2986:
    Attachment: LUCENE-2986.patch

Attached is a patch: adds DefaultSimilarityProvider, which has our default implementations of the non-field-specific methods (coord/queryNorm/etc), and always returns DefaultSimilarity.

> divorce defaultsimilarityprovider from defaultsimilarity
>
> Key: LUCENE-2986
> URL: https://issues.apache.org/jira/browse/LUCENE-2986
> Project: Lucene - Java
> Issue Type: Task
> Reporter: Robert Muir
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-2986.patch
>
> In LUCENE-2236 as a start, we made DefaultSimilarity which implements the
> factory interface (SimilarityProvider), and also extends Similarity.
> Its factory interface just returns itself always by default.
> Doron mentioned it would be cleaner to split the two, and I thought it would
> be good to revisit it later.
> Today as I was looking at SOLR-2338, it became pretty clear that we should do
> this, it makes things a lot cleaner. I think currently it's confusing to users
> to see the two apis mixed if they are trying to subclass.
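To make the proposed split concrete, here is a small sketch of a provider (factory) that is separate from the per-field implementation it hands out. All type names and signatures below are simplified stand-ins invented for illustration; they are not Lucene's actual Similarity or SimilarityProvider APIs.

```java
// Simplified stand-ins; these are NOT Lucene's real interfaces or signatures.
interface FieldSimilaritySketch {
    float tf(float freq); // a field-specific scoring factor
}

interface SimilarityProviderSketch {
    float coord(int overlap, int maxOverlap); // a non-field-specific scoring factor
    FieldSimilaritySketch get(String field);  // factory method: one similarity per field
}

final class DefaultFieldSimilaritySketch implements FieldSimilaritySketch {
    @Override
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// The provider no longer extends the per-field class; it only hands one out.
final class DefaultProviderSketch implements SimilarityProviderSketch {
    private final FieldSimilaritySketch defaultSimilarity = new DefaultFieldSimilaritySketch();

    @Override
    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    @Override
    public FieldSimilaritySketch get(String field) {
        return defaultSimilarity; // same default for every field
    }
}
```

Under this arrangement a user who wants different scoring for one field would override only `get`, and a user who wants a different `coord` would touch only the provider, which is the separation of concerns the issue argues for.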
[jira] [Created] (LUCENE-2986) divorce defaultsimilarityprovider from defaultsimilarity
divorce defaultsimilarityprovider from defaultsimilarity

Key: LUCENE-2986
URL: https://issues.apache.org/jira/browse/LUCENE-2986
Project: Lucene - Java
Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
Fix For: 4.0

In LUCENE-2236 as a start, we made DefaultSimilarity which implements the factory interface (SimilarityProvider), and also extends Similarity. Its factory interface just returns itself always by default.

Doron mentioned it would be cleaner to split the two, and I thought it would be good to revisit it later.

Today as I was looking at SOLR-2338, it became pretty clear that we should do this, it makes things a lot cleaner. I think currently it's confusing to users to see the two apis mixed if they are trying to subclass.
multifield search using dismax
Hi, is it possible, USING DISMAX SEARCH HANDLER, to make a search like: search value1 in field1 & value 2 in field 2 &?? it's like q=field1:value1 field2:value2 in standard search, but i want to do this in dismax Thanx -- Gastone Penzo *www.solr-italia.it* *The first italian blog about Apache Solr *
[jira] [Updated] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Here is my current state on this issue. I didn't add all the JDocs needed (by far), and I will wait until we have settled on the API for FlushPolicy.

* I removed the complex TieredFlushPolicy entirely and added one DefaultFlushPolicy that flushes at IWC.getRAMBufferSizeMB() / sets the biggest DWPT pending.
* DW will stall threads if we reach 2 x maxNetRam, which is retrieved from FlushPolicy so folks can lower it depending on their env.
* DWFlushControl checks if a single DWPT grows too large and sets it forcefully pending once its RAM consumption is > 1.9 GB. That should be enough buffer to not reach the 2048MB limit. We should consider making this configurable.
* FlushPolicy now has three methods, onInsert, onUpdate and onDelete, while DefaultFlushPolicy only implements onInsert and onDelete; the abstract base class just calls those on an update.
* I removed FlushControl from IW.
* Added documentation on IWC for FlushPolicy and removed the jdocs for the RAM limit. I think we should add some lines about how RAM is now used and that users should balance the RAM with the number of threads they are using. Will do that later on though.
* For testing I added a ThrottledIndexOutput that makes flushing slow so I can test if we are stalled and / or blocked. This is passed to MockDirectoryWrapper. It's currently under util but it should rather go under store, no?
* Byte consumption is now committed before FlushPolicy is called, since we don't have the multitier flush which required that to reliably proceed across tier boundaries (not required but it was easier to test really). So FP doesn't need to take care of the delta.
* FlushPolicy now also flushes on maxBufferedDeleteTerms, while the buffered delete terms is not yet connected to DW#getNumBufferedDeleteTerms(), which causes some failures though. I added //nocommit & @Ignore to those tests.
* This patch also contains an @Ignore on TestPersistentSnapshotDeletionPolicy; I couldn't figure out why it is failing, but it could be due to an old version of LUCENE-2881 on this branch. I will see if it still fails once we have merged.
* Healthiness now doesn't stall if we are not flushing on RAM consumption, to ensure we don't lock in threads.

Overall this seems much closer now. I will start writing jdocs. Flush on buffered delete terms might need some tests, and I should also write a more reliable test for Healthiness... currently it relies on the ThrottledIndexOutput slowing down indexing enough to block, which might not be true all the time. It hasn't failed yet.

> Tiered flushing of DWPTs by RAM with low/high water marks
>
> Key: LUCENE-2573
> URL: https://issues.apache.org/jira/browse/LUCENE-2573
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael Busch
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
> LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch,
> LUCENE-2573.patch
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a
> tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values
> explicitly using total values (e.g. low water mark at 120MB, high water mark
> at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB()
> config method and use something like 90% and 110% for the water marks?
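The linear water-mark steps from the LUCENE-2573 issue description (the tiered scheme, not the simpler DefaultFlushPolicy from the patch comment) can be worked through with a short sketch. The class and method names are hypothetical, and the 16 MB buffer in main is just an example value.

```java
// Hypothetical arithmetic sketch; NOT code from the LUCENE-2573 patch.
final class TieredWaterMarksSketch {

    /**
     * Returns one flush threshold (in MB) per DWPT, spaced linearly between
     * low * ramBufferMB and high * ramBufferMB.
     */
    static double[] thresholds(double ramBufferMB, double low, double high, int numDWPTs) {
        if (numDWPTs < 1) {
            throw new IllegalArgumentException("need at least one DWPT");
        }
        double[] result = new double[numDWPTs];
        for (int i = 0; i < numDWPTs; i++) {
            // With a single DWPT there are no steps; use the low water mark.
            double fraction = numDWPTs == 1 ? low : low + i * (high - low) / (numDWPTs - 1);
            result[i] = ramBufferMB * fraction;
        }
        return result;
    }

    public static void main(String[] args) {
        // 5 DWPTs with water marks at 90% and 110% of a 16 MB buffer.
        for (double mb : thresholds(16.0, 0.90, 1.10, 5)) {
            System.out.printf("%.1f MB%n", mb);
        }
    }
}
```

With 5 DWPTs the fractions come out as 90%, 95%, 100%, 105% and 110% of the configured buffer, matching the example in the issue text.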
Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
On Mar 23, 2011, at 9:20 AM, Dawid Weiss wrote: > Sure, I'll change it. Can I alter branch_3x too? That's fine to change 3_x, the 3.1 release is on lucene_solr_3_1 (or something similar). This way it will be on in 3.2. -Grant > Don't know what the > policy is after the RCs have been published. > > Dawid > > On Wed, Mar 23, 2011 at 2:07 PM, Grant Ingersoll wrote: >> Hey Dawid, >> >> Thanks for doing this. It would be good, too, if we no longer had to pass >> in -Dsolr.clustering.enabled=true as there is no reason why we can't just >> have it on like the other components. >> >> -Grant >> >> On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote: >> >>> Author: dweiss >>> Date: Tue Mar 22 20:44:21 2011 >>> New Revision: 1084345 >>> >>> URL: http://svn.apache.org/viewvc?rev=1084345&view=rev >>> Log: >>> Removing the note about excluded JARs (everything is included). >>> >>> Modified: >>>lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml >>> >>> Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml >>> URL: >>> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff >>> == >>> --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original) >>> +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 >>> 20:44:21 2011 >>> @@ -1183,12 +1183,10 @@ >>> >>>http://wiki.apache.org/solr/ClusteringComponent >>> >>> - This relies on third party jars which are notincluded in the >>> - release. To use this component (and the "/clustering" handler) >>> - Those jars will need to be downloaded, and you'll need to set >>> - the solr.cluster.enabled system property when running solr... 
>>> + You'll need to set the solr.cluster.enabled system property >>> + when running solr to run with clustering enabled: >>> >>> - java -Dsolr.clustering.enabled=true -jar start.jar >>> + java -Dsolr.clustering.enabled=true -jar start.jar >>> --> >>> >>enable="${solr.clustering.enabled:false}" >>> >>> >> >> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010112#comment-13010112 ] Tommaso Teofili edited comment on SOLR-2436 at 3/23/11 1:26 PM: Hello Koji, I've tested your patch, I needed to align it to latest patch applied (see SOLR-2387) to make tests work (see attached patch). In my opinion the solution you're proposing is better than the current one as it reflects the Solr way of specifying parameters in Handlers. However I think it should be good if it was possible to alternatively get rid of the uimaConfig file defining each parameter inside the Processor with Solr elements (str/lst/int etc.) as well. was (Author: teofili): Hello Koji, I've tested your patch, I needed to align it to latest patch applied (see SOLR-2387) to make tests work (see attached patch). In my opinion this solution is better than the current one as it reflects the Solr way of specifying parameters in Handlers. However I think it should be good if it was possible to alternatively get rid of the uimaConfig file defining each parameter inside the Processor with Solr elements (str/lst/int etc.) as well. > move uimaConfig to under the uima's update processor in solrconfig.xml > -- > > Key: SOLR-2436 > URL: https://issues.apache.org/jira/browse/SOLR-2436 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.1 >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch > > > Solr contrib UIMA has its config just beneath . I think it should > move to uima's update processor tag. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2436) move uimaConfig to under the uima's update processor in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili updated SOLR-2436: -- Attachment: SOLR-2436_2.patch Hello Koji, I've tested your patch, I needed to align it to latest patch applied (see SOLR-2387) to make tests work (see attached patch). In my opinion this solution is better than the current one as it reflects the Solr way of specifying parameters in Handlers. However I think it should be good if it was possible to alternatively get rid of the uimaConfig file defining each parameter inside the Processor with Solr elements (str/lst/int etc.) as well. > move uimaConfig to under the uima's update processor in solrconfig.xml > -- > > Key: SOLR-2436 > URL: https://issues.apache.org/jira/browse/SOLR-2436 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.1 >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: SOLR-2436.patch, SOLR-2436.patch, SOLR-2436_2.patch > > > Solr contrib UIMA has its config just beneath . I think it should > move to uima's update processor tag. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2454) Nested Document query support
[ https://issues.apache.org/jira/browse/LUCENE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010110#comment-13010110 ] Mark Harwood commented on LUCENE-2454: -- bq. I have not looked this patch so this comment may be off base. The slideshare deck gives a good overview: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene As a simple Lucene-focused addition I'd prefer not to explore all the possible implications for Solr adoption here. The affected areas in Solr are extensive and would include schema definitions, query syntax, facets/filter caching, result-fetching, DIH etc etc. Probably best discussed elsewhere. > Nested Document query support > - > > Key: LUCENE-2454 > URL: https://issues.apache.org/jira/browse/LUCENE-2454 > Project: Lucene - Java > Issue Type: New Feature > Components: Search >Affects Versions: 3.0.2 >Reporter: Mark Harwood >Assignee: Mark Harwood >Priority: Minor > Attachments: LuceneNestedDocumentSupport.zip > > > A facility for querying nested documents in a Lucene index as outlined in > http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
Sure, I'll change it. Can I alter branch_3x too? Don't know what the policy is after the RCs have been published. Dawid On Wed, Mar 23, 2011 at 2:07 PM, Grant Ingersoll wrote: > Hey Dawid, > > Thanks for doing this. It would be good, too, if we no longer had to pass in > -Dsolr.clustering.enabled=true as there is no reason why we can't just have > it on like the other components. > > -Grant > > On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote: > >> Author: dweiss >> Date: Tue Mar 22 20:44:21 2011 >> New Revision: 1084345 >> >> URL: http://svn.apache.org/viewvc?rev=1084345&view=rev >> Log: >> Removing the note about excluded JARs (everything is included). >> >> Modified: >> lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml >> >> Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml >> URL: >> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff >> == >> --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original) >> +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 >> 20:44:21 2011 >> @@ -1183,12 +1183,10 @@ >> >> http://wiki.apache.org/solr/ClusteringComponent >> >> - This relies on third party jars which are notincluded in the >> - release. To use this component (and the "/clustering" handler) >> - Those jars will need to be downloaded, and you'll need to set >> - the solr.cluster.enabled system property when running solr... 
>> + You'll need to set the solr.cluster.enabled system property >> + when running solr to run with clustering enabled: >> >> - java -Dsolr.clustering.enabled=true -jar start.jar >> + java -Dsolr.clustering.enabled=true -jar start.jar >> --> >> > enable="${solr.clustering.enabled:false}" >> >> > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 3.1
+1

* Ran Solr example
* Perused entire structure of both binary and source distros

Noticed the minor issues others have reported; to echo Ryan, none seem like blockers to me.

And, also to echo Ryan, huge thanks to everyone for their hard work on the 3.1 Lucene/Solr release(s). This is a big milestone for the technology and community.

Erik

On Mar 22, 2011, at 23:42 , Ryan McKinley wrote:

> +1
>
> * Walked through the solr example
> * Tested a simple maven project, worked well
>
> I don't think the minor issues listed so far are blockers
>
> Thanks to everyone who worked on this!
>
> ryan
>
>
> On Tue, Mar 22, 2011 at 10:21 AM, Yonik Seeley wrote:
>> Please vote to release the artifacts at
>> http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2
>> as Lucene 3.1 and Solr 3.1
>>
>> Thanks for everyone's help pulling all this together!
>>
>> -Yonik
>> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
>> 25-26, San Francisco
[jira] [Resolved] (LUCENE-2967) Use linear probing with an additional good bit avalanching function in FST's NodeHash.
[ https://issues.apache.org/jira/browse/LUCENE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-2967. - Resolution: Won't Fix Lucene Fields: (was: [New]) I spent some time on this. It's quite fascinating: the number of collisions for the default probing is smaller than: a) linear probing with murmurhash mix of the original hash b) linear probing without murmurhash mix (start from raw hash only). Curiously, the number of collisions for (b) is smaller than for (a) -- this could be explained if we assume bits are spread evenly throughout the entire 32-bit range after murmurhash, so after masking to table size there should be more collisions on lower bits compared to a raw hash (this would have more collisions on upper bits and fewer on lower bits because it is multiplicative... or at least I think so). Anyway, I tried many different versions and I don't see any significant difference in favor of linear probing here. Measured the GC overhead during my tests too, but it is not the primary factor contributing to the total cost of constructing the FST (about 3-5% of the total time, running in parallel, typically). > Use linear probing with an additional good bit avalanching function in FST's > NodeHash. > -- > > Key: LUCENE-2967 > URL: https://issues.apache.org/jira/browse/LUCENE-2967 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: 4.0 > > Attachments: LUCENE-2967.patch > > > I recently had an interesting discussion with Sebastiano Vigna (fastutil), > who suggested that linear probing, given a hash mixing function with good > avalanche properties, is a way better method of constructing lookups in > associative arrays compared to quadratic probing. Indeed, with linear probing > you can implement removals from a hash map without removed slot markers and > linear probing has nice properties with respect to modern CPUs (caches). 
I've > reimplemented HPPC's hash maps to use linear probing and we observed a nice > speedup (the same applies for fastutils of course). > This patch changes NodeHash's implementation to use linear probing. The code > is a bit simpler (I think :). I also moved the load factor to a constant -- > 0.5 seems like a generous load factor, especially if we allow large FSTs to > be built. I don't see any significant speedup in constructing large automata, > but there is no slowdown either (I checked on one machine only for now, but > will verify on other machines too). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
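For reference, linear probing combined with a bit-mixing step can be sketched as below. This is a generic open-addressing set of non-zero ints, written for illustration only. It is not Lucene's NodeHash and not the LUCENE-2967 patch; the class name is hypothetical, the mixer is the MurmurHash3 32-bit finalizer, and the 0.5 load factor follows the value mentioned in the issue text.

```java
// Generic illustration; NOT Lucene's NodeHash and NOT the LUCENE-2967 patch.
final class LinearProbingSetSketch {
    private int[] slots; // 0 marks an empty slot, so keys must be non-zero
    private int size;

    LinearProbingSetSketch(int capacity) {
        if (capacity < 1 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new int[capacity];
    }

    /** MurmurHash3 32-bit finalizer: spreads input bits across the whole int. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Adds a key; returns false if it was already present. */
    boolean add(int key) {
        if (key == 0) {
            throw new IllegalArgumentException("0 is reserved for empty slots");
        }
        if ((size + 1) * 2 > slots.length) {
            rehash(); // keep the load factor at or below 0.5
        }
        if (!insert(slots, key)) {
            return false;
        }
        size++;
        return true;
    }

    boolean contains(int key) {
        int mask = slots.length - 1;
        // Linear probing: step to the next slot until the key or an empty slot is found.
        for (int pos = mix(key) & mask; slots[pos] != 0; pos = (pos + 1) & mask) {
            if (slots[pos] == key) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return size;
    }

    private static boolean insert(int[] table, int key) {
        int mask = table.length - 1;
        int pos = mix(key) & mask;
        while (table[pos] != 0) {
            if (table[pos] == key) {
                return false;
            }
            pos = (pos + 1) & mask;
        }
        table[pos] = key;
        return true;
    }

    private void rehash() {
        int[] bigger = new int[slots.length * 2];
        for (int key : slots) {
            if (key != 0) {
                insert(bigger, key);
            }
        }
        slots = bigger;
    }
}
```

Because the table is always at most half full, every probe sequence is guaranteed to reach an empty slot, so the loops terminate. Counting probe steps in `insert` with and without `mix` would be one way to repeat the kind of collision comparison described in the resolution comment.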
Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
Hey Dawid, Thanks for doing this. It would be good, too, if we no longer had to pass in -Dsolr.clustering.enabled=true as there is no reason why we can't just have it on like the other components. -Grant On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote: > Author: dweiss > Date: Tue Mar 22 20:44:21 2011 > New Revision: 1084345 > > URL: http://svn.apache.org/viewvc?rev=1084345&view=rev > Log: > Removing the note about excluded JARs (everything is included). > > Modified: >lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml > > Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml > URL: > http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff > == > --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original) > +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 > 20:44:21 2011 > @@ -1183,12 +1183,10 @@ > >http://wiki.apache.org/solr/ClusteringComponent > > - This relies on third party jars which are notincluded in the > - release. To use this component (and the "/clustering" handler) > - Those jars will need to be downloaded, and you'll need to set > - the solr.cluster.enabled system property when running solr... > + You'll need to set the solr.cluster.enabled system property > + when running solr to run with clustering enabled: > > - java -Dsolr.clustering.enabled=true -jar start.jar > + java -Dsolr.clustering.enabled=true -jar start.jar > --> > enable="${solr.clustering.enabled:false}" > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #70: POMs out of sync
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-3.x/70/ No tests ran. Build Log (for compile errors): [...truncated 22 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing
[ https://issues.apache.org/jira/browse/LUCENE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2985: Attachment: LUCENE-2985.patch here is an initial patch that uses a SegmentCodecBuilder to assign codec IDs during indexing in DocFieldProcessorPerThread. > Build SegmentCodecs incrementally for consistent codecIDs during indexing > - > > Key: LUCENE-2985 > URL: https://issues.apache.org/jira/browse/LUCENE-2985 > Project: Lucene - Java > Issue Type: Improvement > Components: Codecs, Index >Affects Versions: CSF branch, 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Fix For: CSF branch, 4.0 > > Attachments: LUCENE-2985.patch > > > currently we build the SegmentCodecs during flush which is fine as long as > no codec needs to know which fields it should handle. This will change with > DocValues or when we expose StoredFields / TermVectors via Codec (see > LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a > consistent view of which codec belongs to which field during indexing and all > FieldInfo instances are unassigned (set to -1). Instead we should build the > SegmentCodecs incrementally as fields come in so no matter when a codec needs > to be selected to process a document / field we have the right codec ID > assigned. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]
On Wed, Mar 23, 2011 at 9:37 AM, David Nemeskey wrote: > Hey Simon and all, > > May we get an update on this? I understand that Google has published the list > of accepted organizations, which -- not surprisingly -- includes the ASF. Is > there any information on how many slots Apache got, and which issues will be > selected? > > The student application period opens on the 28th, so I'm just wondering if I > should go ahead and apply or wait for the decision. David, you should go ahead and apply via the GSoC website and reference the issue there this is how I understand it works. We will later rate the proposals from the GSoC website and decide which we choose. This is also when slots get assigned. simon > > Thanks, > David > > On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote: >> Hey folks, >> >> Google Summer of Code 2011 is very close and the Project Applications >> Period has started recently. Now it's time to get some excited students >> on board for this year's GSoC. >> >> I encourage students to submit an application to the Google Summer of Code >> web-application. Lucene & Solr are amazing projects and GSoC is an >> incredible opportunity to join the community and push the project >> forward. >> >> If you are a student and you are interested spending some time on a >> great open source project while getting paid for it, you should submit >> your application from March 28 - April 8, 2011. There are only 3 >> weeks until this process starts! >> >> Quote from the GSoC website: "We hear almost universally from our >> mentoring organizations that the best applications they receive are >> from students who took the time to interact and discuss their ideas >> before submitting an application, so make sure to check out each >> organization's Ideas list to get to know a particular open source >> organization better." 
>> >> So if you have any ideas what Lucene & Solr should have, or if you >> find any of the GSoC pre-selected projects [1] interesting, please >> join us on dev@lucene.apache.org [2]. Since you as a student must >> apply for a certain project via the GSoC website [3], it's a good idea >> to work on it ahead of time and include the community and possible >> mentors as soon as possible. >> >> Open source development here at the Apache Software >> Foundation happens almost exclusively in the public and I encourage you to >> follow this. Don't mail folks privately; please use the mailing list to >> get the best possible visibility and attract interested community >> members and push your idea forward. As always, it's the idea that >> counts not the person! >> >> That said, please do not underestimate the complexity of even small >> "GSoC - Projects". Don't try to rewrite Lucene or Solr! A project >> usually gains more from a smaller, well discussed and carefully >> crafted & tested feature than from a half baked monster change that's >> too large to work with. >> >> Once your proposal has been accepted and you begin work, you should >> give the community the opportunity to iterate with you. We prefer >> "progress over perfection" so don't hesitate to describe your overall >> vision, but when the rubber meets the road let's take it in small >> steps. A code patch of 20 KB is likely to be reviewed very quickly so >> get fast feedback, while a patch even 60kb in size can take very >> long. So try to break up your vision and the community will work with >> you to get things done! >> >> On behalf of the Lucene & Solr community, >> >> Go! 
join the mailing list and apply for GSoC 2011, >> >> Simon >> >> [1] >> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQu >> ery=labels+%3D+lucene-gsoc-11 [2] >> http://lucene.apache.org/java/docs/mailinglists.html >> [3] http://www.google-melange.com >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-2985) Build SegmentCodecs incrementally for consistent codecIDs during indexing
Build SegmentCodecs incrementally for consistent codecIDs during indexing - Key: LUCENE-2985 URL: https://issues.apache.org/jira/browse/LUCENE-2985 Project: Lucene - Java Issue Type: Improvement Components: Codecs, Index Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: CSF branch, 4.0 currently we build the SegmentCodecs during flush which is fine as long as no codec needs to know which fields it should handle. This will change with DocValues or when we expose StoredFields / TermVectors via Codec (see LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a consistent view of which codec belongs to which field during indexing and all FieldInfo instances are unassigned (set to -1). Instead we should build the SegmentCodecs incrementally as fields come in so no matter when a codec needs to be selected to process a document / field we have the right codec ID assigned. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
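The incremental assignment proposed in LUCENE-2985 can be sketched roughly as follows. This is only an illustration of the idea: it is not the SegmentCodecBuilder from the attached patch, and the class IncrementalCodecIds, its methods, and the use of plain strings for codec names are all assumptions. The one detail taken from the issue is that an unassigned field reports -1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that hands out codec IDs as fields are first seen. */
final class IncrementalCodecIds {
    static final int UNASSIGNED = -1;        // value the issue mentions for unassigned fields

    private final List<String> codecs = new ArrayList<>();              // codec ID -> codec name
    private final Map<String, Integer> codecToId = new HashMap<>();     // codec name -> codec ID
    private final Map<String, Integer> fieldToCodecId = new HashMap<>();

    /**
     * Assigns a codec ID to the field the first time it is seen.
     * Later calls for the same field return the ID chosen first,
     * so the view stays consistent for the whole indexing session.
     */
    int assign(String field, String codecName) {
        Integer existing = fieldToCodecId.get(field);
        if (existing != null) {
            return existing;
        }
        Integer id = codecToId.get(codecName);
        if (id == null) {
            id = codecs.size();              // next free ID, in order of first use
            codecs.add(codecName);
            codecToId.put(codecName, id);
        }
        fieldToCodecId.put(field, id);
        return id;
    }

    /** Returns the codec ID for a field, or UNASSIGNED if the field was never seen. */
    int codecId(String field) {
        return fieldToCodecId.getOrDefault(field, UNASSIGNED);
    }

    /** Returns the codec name registered under the given ID. */
    String codecName(int id) {
        return codecs.get(id);
    }
}
```

With this shape, a consumer that needs the codec for a field during indexing can look it up at any time instead of waiting for flush.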
[jira] [Commented] (LUCENE-2982) Get rid of ContenSource's workaround for closing b/gzip input stream once this is fixed in CommonCompress
[ https://issues.apache.org/jira/browse/LUCENE-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010086#comment-13010086 ] Doron Cohen commented on LUCENE-2982: - COMPRESS-127 was fixed, so whenever a new CommonsCompress release is available should be able to complete this one. I subscribed to annou...@apache.org to be notified when that happens... > Get rid of ContenSource's workaround for closing b/gzip input stream once > this is fixed in CommonCompress > - > > Key: LUCENE-2982 > URL: https://issues.apache.org/jira/browse/LUCENE-2982 > Project: Lucene - Java > Issue Type: Task > Components: contrib/benchmark >Reporter: Doron Cohen >Priority: Minor > > Once COMPRESS-127 is fixed get rid of the entire workaround method > ContentSource.closableCompressorInputStream(). It would simplify the code and > would perform better without that delegation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-2980. - Resolution: Fixed Lucene Fields: (was: [New]) Committed: - trunk: r1084544, r1084549 - 3x: r1084552 > Benchmark's ContentSource should not rely on file suffixes to be lower cased > when detecting file type (gzip/bzip2/text) > --- > > Key: LUCENE-2980 > URL: https://issues.apache.org/jira/browse/LUCENE-2980 > Project: Lucene - Java > Issue Type: Bug > Components: contrib/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-2980.patch, LUCENE-2980.patch, LUCENE-2980.patch > > > file.gz is correctly handled as gzip, but file.GZ handled as text which is > wrong. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
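The bug in LUCENE-2980 comes down to comparing suffixes case-sensitively. A minimal sketch of a case-insensitive check is shown below; it is not the committed ContentSource code. The class FileTypeBySuffix is hypothetical, and the `.bz2` suffix is an assumption (the issue itself only names `.gz` and `.GZ`).

```java
import java.util.Locale;

/** Hypothetical suffix-based file type detection; illustration only. */
final class FileTypeBySuffix {
    enum Type { GZIP, BZIP2, TEXT }

    static Type detect(String fileName) {
        // Lower-case with a fixed locale so the result does not depend on
        // the default locale of the machine running the benchmark.
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz")) {
            return Type.GZIP;
        }
        if (lower.endsWith(".bz2")) {        // assumed suffix for bzip2 files
            return Type.BZIP2;
        }
        return Type.TEXT;                    // anything else is treated as plain text
    }
}
```

Using `Locale.ROOT` avoids surprises such as the Turkish dotless i, where lower-casing `.GZIP`-style names under the default locale could produce a string that no longer matches an ASCII suffix.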
[jira] [Updated] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2980: Attachment: LUCENE-2980.patch Updated patch applies workaround only for GZIP format, as other types do close their wrapped stream (COMPRESS-127). > Benchmark's ContentSource should not rely on file suffixes to be lower cased > when detecting file type (gzip/bzip2/text) > --- > > Key: LUCENE-2980 > URL: https://issues.apache.org/jira/browse/LUCENE-2980 > Project: Lucene - Java > Issue Type: Bug > Components: contrib/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-2980.patch, LUCENE-2980.patch, LUCENE-2980.patch > > > file.gz is correctly handled as gzip, but file.GZ handled as text which is > wrong. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010064#comment-13010064 ] Shai Erera commented on LUCENE-2980: Agreed. > Benchmark's ContentSource should not rely on file suffixes to be lower cased > when detecting file type (gzip/bzip2/text) > --- > > Key: LUCENE-2980 > URL: https://issues.apache.org/jira/browse/LUCENE-2980 > Project: Lucene - Java > Issue Type: Bug > Components: contrib/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-2980.patch, LUCENE-2980.patch > > > file.gz is correctly handled as gzip, but file.GZ handled as text which is > wrong. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2977) WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name
[ https://issues.apache.org/jira/browse/LUCENE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-2977: Summary: WriteLineDocTask should write gzip/bzip2/txt according to the extension of specified output file name (was: WriteLineDocTask should write gzip/bzip2/txt according to the extension of specifie output file name) > WriteLineDocTask should write gzip/bzip2/txt according to the extension of > specified output file name > - > > Key: LUCENE-2977 > URL: https://issues.apache.org/jira/browse/LUCENE-2977 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > > Since the readers behave this way it would be nice and handy if also this > line writer would. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010043#comment-13010043 ] Doron Cohen commented on LUCENE-2980: - bq. Perhaps we should add a specific test in CSTest for this problem? I wouldn't use file.delete() as in indicator because on Linux it will pass Changed my mind about adding this test to ContentSourceTest - I think such a test fits more to the CommonCompress project, because it should directly call CompressorStreamFactory.createCompressorInputStream(in). In our test we invoke ContentSource.getInputStream(File) and so we cannot pass such a close-sensing stream. But this is a valid point, especially, the test case I provided to COMPRESS-127 will fail on Windows but will likely pass on Linux. I'll add a reference to your comment in COMPRESS-127. > Benchmark's ContentSource should not rely on file suffixes to be lower cased > when detecting file type (gzip/bzip2/text) > --- > > Key: LUCENE-2980 > URL: https://issues.apache.org/jira/browse/LUCENE-2980 > Project: Lucene - Java > Issue Type: Bug > Components: contrib/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-2980.patch, LUCENE-2980.patch > > > file.gz is correctly handled as gzip, but file.GZ handled as text which is > wrong. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
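A "close-sensing" stream of the kind mentioned above could look like the sketch below. It checks whether a decompressing wrapper closes the stream it wraps, without relying on file.delete(), so the result is the same on Windows and Linux. The class CloseSensingStream and its helper methods are hypothetical, and the JDK's own GZIPInputStream is used as a stand-in for the wrapper under test; a test for COMPRESS-127 would wrap the stream with the Commons Compress factory instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stream that remembers whether close() was called on it. */
final class CloseSensingStream extends FilterInputStream {
    private boolean closed;

    CloseSensingStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }

    boolean isClosed() {
        return closed;
    }

    /** Compresses the given bytes so the check has valid gzip input. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns true if closing the gzip wrapper also closed the wrapped stream. */
    static boolean wrapperClosesUnderlying(byte[] gzipped) {
        try {
            CloseSensingStream raw = new CloseSensingStream(new ByteArrayInputStream(gzipped));
            InputStream wrapper = new GZIPInputStream(raw);   // stand-in for the wrapper under test
            while (wrapper.read() != -1) {
                // drain the stream so the wrapper reaches end of input
            }
            wrapper.close();
            return raw.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```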
[jira] [Commented] (LUCENE-2980) Benchmark's ContentSource should not rely on file suffixes to be lower cased when detecting file type (gzip/bzip2/text)
[ https://issues.apache.org/jira/browse/LUCENE-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010039#comment-13010039 ] Doron Cohen commented on LUCENE-2980: - bq. Perhaps we should add a specific test in CSTest for this problem? I wouldn't use file.delete() as in indicator because on Linux it will pass Agree, I'll add one. > Benchmark's ContentSource should not rely on file suffixes to be lower cased > when detecting file type (gzip/bzip2/text) > --- > > Key: LUCENE-2980 > URL: https://issues.apache.org/jira/browse/LUCENE-2980 > Project: Lucene - Java > Issue Type: Bug > Components: contrib/benchmark >Reporter: Doron Cohen >Assignee: Doron Cohen >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-2980.patch, LUCENE-2980.patch > > > file.gz is correctly handled as gzip, but file.GZ handled as text which is > wrong. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2984) Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos
[ https://issues.apache.org/jira/browse/LUCENE-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2984: Description: Spin-off from LUCENE-2881 which had this change already but due to some random failures related to this change I remove this part of the patch to make it more isolated and easier to test. (was: Spin-off from LUCENe-2881 which had this change already but due to some random failures related to this change I remove this part of the patch to make it more isolated and easier to test. ) > Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos > -- > > Key: LUCENE-2984 > URL: https://issues.apache.org/jira/browse/LUCENE-2984 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 4.0 >Reporter: Simon Willnauer > Fix For: 4.0 > > > Spin-off from LUCENE-2881 which had this change already but due to some > random failures related to this change I remove this part of the patch to > make it more isolated and easier to test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010036#comment-13010036 ] Chris Male commented on LUCENE-2310: bq. So, what is the reason for doing this in 3.x at all, can't we simply drop stuff in 4.0 and let 3.x alone? Very good question. Certainly we are simplifying the codebase and I feel that Field is what most users use (not AbstractField). But I know some expert users do use AbstractField. But maybe they can handle the hard change? > Reduce Fieldable, AbstractField and Field complexity > > > Key: LUCENE-2310 > URL: https://issues.apache.org/jira/browse/LUCENE-2310 > Project: Lucene - Java > Issue Type: Sub-task > Components: Index >Reporter: Chris Male > Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-DocumentGetFields-core.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch > > > In order to move field type like functionality into its own class, we really > need to try to tackle the hierarchy of Fieldable, AbstractField and Field. > Currently AbstractField depends on Field, and does not provide much more > functionality that storing fields, most of which are being moved over to > FieldType. Therefore it seems ideal to try to deprecate AbstractField (and > possible Fieldable), moving much of the functionality into Field and > FieldType. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010035#comment-13010035 ] Simon Willnauer commented on LUCENE-2310: - {quote} Yeah but not in 3x unfortunately. As it stands people can retrieve the List of Fieldables via getFields() and add whatever implementation of Fieldable they like. Consequently we need to continue to support Fieldable in IW for example. Once this code has been committed I will create a new patch for trunk which moves all of Solr and Lucene over to the Field. I could do this in many places already of course, but that core classes like IW would have to remain as they are. {quote} So, what is the reason for doing this in 3.x at all, can't we simply drop stuff in 4.0 and let 3.x alone? Simon > Reduce Fieldable, AbstractField and Field complexity > > > Key: LUCENE-2310 > URL: https://issues.apache.org/jira/browse/LUCENE-2310 > Project: Lucene - Java > Issue Type: Sub-task > Components: Index >Reporter: Chris Male > Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-DocumentGetFields-core.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch > > > In order to move field type like functionality into its own class, we really > need to try to tackle the hierarchy of Fieldable, AbstractField and Field. > Currently AbstractField depends on Field, and does not provide much more > functionality that storing fields, most of which are being moved over to > FieldType. Therefore it seems ideal to try to deprecate AbstractField (and > possible Fieldable), moving much of the functionality into Field and > FieldType. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-2984) Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos
Move hasVectors() & hasProx() responsibility out of SegmentInfo to FieldInfos -- Key: LUCENE-2984 URL: https://issues.apache.org/jira/browse/LUCENE-2984 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Simon Willnauer Fix For: 4.0 Spin-off from LUCENe-2881 which had this change already but due to some random failures related to this change I remove this part of the patch to make it more isolated and easier to test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010032#comment-13010032 ] Chris Male commented on LUCENE-2310: Yes Field would still compile if you removed the extends. However if we empty AbstractField then any client code that also extends AbstractField would break. That's why I deprecate the whole class but leave its code in. We could empty it and change it to extend Field, I think that would still work. > Reduce Fieldable, AbstractField and Field complexity > > > Key: LUCENE-2310 > URL: https://issues.apache.org/jira/browse/LUCENE-2310 > Project: Lucene - Java > Issue Type: Sub-task > Components: Index >Reporter: Chris Male > Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-DocumentGetFields-core.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch > > > In order to move field type like functionality into its own class, we really > need to try to tackle the hierarchy of Fieldable, AbstractField and Field. > Currently AbstractField depends on Field, and does not provide much more > functionality that storing fields, most of which are being moved over to > FieldType. Therefore it seems ideal to try to deprecate AbstractField (and > possible Fieldable), moving much of the functionality into Field and > FieldType. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2983) FieldInfos should be read-only if loaded from disk
[ https://issues.apache.org/jira/browse/LUCENE-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2983: Attachment: LUCENE-2983.patch here is a patch with tests. All tests pass > FieldInfos should be read-only if loaded from disk > -- > > Key: LUCENE-2983 > URL: https://issues.apache.org/jira/browse/LUCENE-2983 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-2983.patch > > > Currently FieldInfos create a private FieldNumberBiMap when they are loaded > from a directory which is necessary due to some limitation we need to face > with IW#addIndexes(Dir). If we add an index via a directory to an existing > index field number can conflict with the global field numbers in the IW > receiving the directories. Those field number conflicts will remain until > those segments are merged and we stabilize again based on the IW global field > numbers. Yet, we are unnecessarily creating a BiMap here where we actually should > enforce read-only semantics since nobody should modify this FieldInfos > instance we loaded from the directory. If somebody needs to get a modifiable > copy they should simply create a new one and add all FieldInfo instances to > it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010030#comment-13010030 ] Simon Willnauer commented on LUCENE-2310: - bq. I don't really understand what you're suggesting here. In 3x where the deprecations will be occurring Field has to continue to extend AbstractField. Yes in 4.0 we can drop that extension but addressing the deprecations is not in the scope of 3x. What I mean here is that if I would simply remove the extends AbstractField from Field would it still compile or are there any dependencies from AbstractField? IMO AbstractField should just be empty now right? > Reduce Fieldable, AbstractField and Field complexity > > > Key: LUCENE-2310 > URL: https://issues.apache.org/jira/browse/LUCENE-2310 > Project: Lucene - Java > Issue Type: Sub-task > Components: Index >Reporter: Chris Male > Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-DocumentGetFields-core.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch > > > In order to move field type like functionality into its own class, we really > need to try to tackle the hierarchy of Fieldable, AbstractField and Field. > Currently AbstractField depends on Field, and does not provide much more > functionality that storing fields, most of which are being moved over to > FieldType. Therefore it seems ideal to try to deprecate AbstractField (and > possible Fieldable), moving much of the functionality into Field and > FieldType. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-2983) FieldInfos should be read-only if loaded from disk
FieldInfos should be read-only if loaded from disk -- Key: LUCENE-2983 URL: https://issues.apache.org/jira/browse/LUCENE-2983 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Minor Fix For: 4.0 Currently FieldInfos create a private FieldNumberBiMap when they are loaded from a directory which is necessary due to some limitation we need to face with IW#addIndexes(Dir). If we add an index via a directory to an existing index field number can conflict with the global field numbers in the IW receiving the directories. Those field number conflicts will remain until those segments are merged and we stabilize again based on the IW global field numbers. Yet, we are unnecessarily creating a BiMap here where we actually should enforce read-only semantics since nobody should modify this FieldInfos instance we loaded from the directory. If somebody needs to get a modifiable copy they should simply create a new one and add all FieldInfo instances to it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
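The read-only semantics asked for in LUCENE-2983 might be sketched as below. This is not Lucene's FieldInfos and not the attached patch: the class SimpleFieldInfos, its methods, and the name-to-number map are all hypothetical. It shows only the two behaviours the issue describes, namely that an instance loaded from disk rejects modification and that a caller who needs to modify it makes a fresh copy first.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, heavily simplified stand-in for a field-name-to-number registry. */
final class SimpleFieldInfos {
    private final Map<String, Integer> numberByName;
    private final boolean readOnly;

    /** Creates an empty, modifiable instance. */
    SimpleFieldInfos() {
        this(new LinkedHashMap<>(), false);
    }

    private SimpleFieldInfos(Map<String, Integer> numberByName, boolean readOnly) {
        this.numberByName = numberByName;
        this.readOnly = readOnly;
    }

    /** Wraps numbers read from an index; the result cannot be modified. */
    static SimpleFieldInfos loadedFromDisk(Map<String, Integer> stored) {
        Map<String, Integer> copy = new LinkedHashMap<>(stored);
        return new SimpleFieldInfos(Collections.unmodifiableMap(copy), true);
    }

    /** Adds a field and returns its number; fails on read-only instances. */
    int add(String name) {
        if (readOnly) {
            throw new IllegalStateException("FieldInfos loaded from disk are read-only");
        }
        Integer existing = numberByName.get(name);
        if (existing != null) {
            return existing;
        }
        int next = 0;
        for (int used : numberByName.values()) {
            next = Math.max(next, used + 1); // stored numbers need not be contiguous
        }
        numberByName.put(name, next);
        return next;
    }

    /** Returns a new modifiable instance holding the same fields. */
    SimpleFieldInfos modifiableCopy() {
        return new SimpleFieldInfos(new LinkedHashMap<>(numberByName), false);
    }

    /** Returns the number of a field, or -1 if it is unknown. */
    int number(String name) {
        return numberByName.getOrDefault(name, -1);
    }

    boolean isReadOnly() {
        return readOnly;
    }
}
```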
Re: [GSoC] Apache Lucene @ Google Summer of Code 2011 [STUDENTS READ THIS]
Hey Simon and all, May we get an update on this? I understand that Google has published the list of accepted organizations, which -- not surprisingly -- includes the ASF. Is there any information on how many slots Apache got, and which issues will be selected? The student application period opens on the 28th, so I'm just wondering if I should go ahead and apply or wait for the decision. Thanks, David On 2011 March 11, Friday 17:23:58 Simon Willnauer wrote: > Hey folks, > > Google Summer of Code 2011 is very close and the Project Applications > Period has started recently. Now it's time to get some excited students > on board for this year's GSoC. > > I encourage students to submit an application to the Google Summer of Code > web-application. Lucene & Solr are amazing projects and GSoC is an > incredible opportunity to join the community and push the project > forward. > > If you are a student and you are interested spending some time on a > great open source project while getting paid for it, you should submit > your application from March 28 - April 8, 2011. There are only 3 > weeks until this process starts! > > Quote from the GSoC website: "We hear almost universally from our > mentoring organizations that the best applications they receive are > from students who took the time to interact and discuss their ideas > before submitting an application, so make sure to check out each > organization's Ideas list to get to know a particular open source > organization better." > > So if you have any ideas what Lucene & Solr should have, or if you > find any of the GSoC pre-selected projects [1] interesting, please > join us on dev@lucene.apache.org [2]. Since you as a student must > apply for a certain project via the GSoC website [3], it's a good idea > to work on it ahead of time and include the community and possible > mentors as soon as possible. 
> > Open source development here at the Apache Software > Foundation happens almost exclusively in the public and I encourage you to > follow this. Don't mail folks privately; please use the mailing list to > get the best possible visibility and attract interested community > members and push your idea forward. As always, it's the idea that > counts not the person! > > That said, please do not underestimate the complexity of even small > "GSoC - Projects". Don't try to rewrite Lucene or Solr! A project > usually gains more from a smaller, well discussed and carefully > crafted & tested feature than from a half baked monster change that's > too large to work with. > > Once your proposal has been accepted and you begin work, you should > give the community the opportunity to iterate with you. We prefer > "progress over perfection" so don't hesitate to describe your overall > vision, but when the rubber meets the road let's take it in small > steps. A code patch of 20 KB is likely to be reviewed very quickly so > get fast feedback, while a patch even 60kb in size can take very > long. So try to break up your vision and the community will work with > you to get things done! > > On behalf of the Lucene & Solr community, > > Go! join the mailing list and apply for GSoC 2011, > > Simon > > [1] > https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=labels+%3D+lucene-gsoc-11 > [2] http://lucene.apache.org/java/docs/mailinglists.html > [3] http://www.google-melange.com
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010027#comment-13010027 ] Chris Male commented on LUCENE-2310: Thanks for taking a look at this, Simon. bq. Why do you reformat all the stuff in Field, is that necessary here at all? I mean it's needed eventually but for the deprecation of things it only bloats the patch really doesn't it? Because for me this issue is about reducing the complexity of these classes, and Field is a mess. Making it more readable reduces the complexity. If need be I will do this in two patches, but I don't feel this issue is resolved till the code in Field is readable. bq. When you deprecate AbstractField and Fieldable, Field should ideally be a standalone class. So I see that this still needs to subclass Fieldable / AbstractField but could it stand alone now so that we can simply remove the extends / implements on Field once we drop things in 4.0? I think it looks good from looking at the patch though. I don't really understand what you're suggesting here. In 3x, where the deprecations will occur, Field has to continue to extend AbstractField. Yes, in 4.0 we can drop that extension, but removing the deprecated classes is not in the scope of 3x. bq. I don't like the name getAllFields on Document since it implies that we have a getPartialFields or something. I see that you can not use getFields since it only differs in return type which doesn't belong to the signature though. Maybe we should implement Iterable here and offer an additional method getFieldsAsList or maybe getFields(List fields). Yeah, good call. I think implementing Iterable is best, but it will also require adding a count() method to Document, since people often retrieve the List just to get the number of fields. bq. once we have this in what are the next steps towards FieldType? Will we have only one class Field that is backed by a FieldType but still offers the methods it has now? 
Or do we have two totally new classes, FieldType and FieldValue? Once FieldType is in, all the various metadata properties (isIndexed, isStored etc.) will be moved to FieldType, leaving Field as what you suggest as FieldValue. Field will contain its type, boost, name and value. If we have Analyzers on FieldTypes, then we will be able to remove the TokenStream from Field. bq. I wonder if this patch raises tons of deprecation warnings all over lucene where Fieldable was used? In IW we use it all over the place though. We must fix that in this issue too otherwise Uwe will go mad I guess. Yeah, but not in 3x unfortunately. As it stands, people can retrieve the List of Fieldables via getFields() and add whatever implementation of Fieldable they like. Consequently we need to continue to support Fieldable in IW, for example. Once this code has been committed I will create a new patch for trunk which moves all of Solr and Lucene over to Field. I could do this in many places already of course, but core classes like IW would have to remain as they are. I will wait for your thoughts on the reformatting and then make a new patch. > Reduce Fieldable, AbstractField and Field complexity > > > Key: LUCENE-2310 > URL: https://issues.apache.org/jira/browse/LUCENE-2310 > Project: Lucene - Java > Issue Type: Sub-task > Components: Index >Reporter: Chris Male > Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-DocumentGetFields-core.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch > > > In order to move field type like functionality into its own class, we really > need to try to tackle the hierarchy of Fieldable, AbstractField and Field. 
> Currently AbstractField depends on Field, and does not provide much more > functionality than storing fields, most of which are being moved over to > FieldType. Therefore it seems ideal to try to deprecate AbstractField (and > possibly Fieldable), moving much of the functionality into Field and > FieldType.
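The Iterable-plus-count() idea discussed in this comment could look roughly like the following Java sketch. It is an illustration only, not the LUCENE-2310 patch: SketchField and SketchDocument are hypothetical names, and the real Document and Field carry far more state. The point is that callers who only fetched the List to iterate over it or to ask for its size no longer need the List at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical, heavily reduced stand-in for Field (assumption: name and value only).
final class SketchField {
    final String name;
    final String value;

    SketchField(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Hypothetical stand-in for Document that exposes its fields via Iterable.
final class SketchDocument implements Iterable<SketchField> {
    private final List<SketchField> fields = new ArrayList<SketchField>();

    void add(SketchField field) {
        fields.add(field);
    }

    // Replaces the common getFields().size() idiom.
    int count() {
        return fields.size();
    }

    // Iterate over an unmodifiable view so callers cannot remove fields
    // through the iterator.
    public Iterator<SketchField> iterator() {
        return Collections.unmodifiableList(fields).iterator();
    }
}
```

With this shape, a caller writes `for (SketchField f : doc) { ... }` and `doc.count()` instead of retrieving and holding the underlying List.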
[jira] [Commented] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010022#comment-13010022 ] Simon Willnauer commented on LUCENE-2310: - Hey Chris, good that you reactivated this issue! I was looking into similar stuff while working on docvalues since it really needs to add stuff to Field / Fieldable. With a cleanup and eventually FieldType this would be way less painful I guess. I have a couple of questions and comments on the current patch. Btw. I like the fact that the previous patch was uploaded March 21 2010 and the latest took 1 year to come up on March 23 2011 :) * Why do you reformat all the stuff in Field, is that necessary here at all? I mean it's needed eventually but for the deprecation of things it only bloats the patch really doesn't it? * When you deprecate AbstractField and Fieldable, Field should ideally be a standalone class. So I see that this still needs to subclass Fieldable / AbstractField but could it stand alone now so that we can simply remove the extends / implements on Field once we drop things in 4.0? I think it looks good from looking at the patch though * I don't like the name getAllFields on Document since it implies that we have a getPartialFields or something. I see that you can not use getFields since it only differs in return type which doesn't belong to the signature though. Maybe we should implement Iterable here and offer an additional method getFieldsAsList or maybe getFields(List fields) * once we have this in what are the next steps towards FieldType? Will we have only one class Field that is backed by a FieldType but still offers the methods it has now? Or do we have two totally new classes, FieldType and FieldValue, something like this: {code} class FieldValue { FieldType type; float boost; String name; Object value; } {code} * I wonder if this patch raises tons of deprecation warnings all over lucene where Fieldable was used? 
In IW we use it all over the place though. We must fix that in this issue too otherwise Uwe will go mad I guess :) thanks for bringing this up again! > Reduce Fieldable, AbstractField and Field complexity > > > Key: LUCENE-2310 > URL: https://issues.apache.org/jira/browse/LUCENE-2310 > Project: Lucene - Java > Issue Type: Sub-task > Components: Index >Reporter: Chris Male > Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-AbstractField.patch, > LUCENE-2310-Deprecate-DocumentGetFields-core.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, > LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch > > > In order to move field type like functionality into its own class, we really > need to try to tackle the hierarchy of Fieldable, AbstractField and Field. > Currently AbstractField depends on Field, and does not provide much more > functionality than storing fields, most of which are being moved over to > FieldType. Therefore it seems ideal to try to deprecate AbstractField (and > possibly Fieldable), moving much of the functionality into Field and > FieldType.