Re: [Lucene.Net] How to add document to more than one index (but only analyze once)?
How about indexing the new document(s) in memory using a RAMDirectory, then calling indexWriter.AddIndexesNoOptimize for both the NRT and the master index? DIGY On Fri, Sep 9, 2011 at 5:33 PM, Robert Stewart robert_stew...@epam.com wrote: Is it possible to add a document to more than one index at the same time, such that the document fields are only analyzed once? For instance, to add a document to both a master index and a smaller near-real-time index. I would like to avoid analyzing document fields more than once, but I don't see whether that is possible at all using the Lucene API. Thanks, Bob
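In Lucene.Net terms, the suggestion amounts to building a tiny index over a RAMDirectory once and then passing that directory to AddIndexesNoOptimize on each writer. The merge copies already-inverted data instead of running the analyzer again. The self-contained sketch below models only that idea with plain collections. `ToyAnalyzer` and `ToyIndex` are hypothetical stand-ins, not Lucene.Net types.

```csharp
using System;
using System.Collections.Generic;

// Toy model only, not the Lucene.Net API. A "segment" maps each term of one
// document to the positions where it occurs; it plays the role of the small
// in-memory (RAMDirectory) index that is built once per document.
public static class ToyAnalyzer
{
    // Counts how many times analysis actually ran.
    public static int Calls;

    public static Dictionary<string, List<int>> Analyze(string text)
    {
        Calls++;
        var segment = new Dictionary<string, List<int>>();
        string[] tokens = text.ToLowerInvariant()
            .Split(new[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries);
        for (int pos = 0; pos < tokens.Length; pos++)
        {
            List<int> positions;
            if (!segment.TryGetValue(tokens[pos], out positions))
            {
                positions = new List<int>();
                segment[tokens[pos]] = positions;
            }
            positions.Add(pos);
        }
        return segment;
    }
}

// Toy inverted index: term -> ids of the documents that contain it.
public class ToyIndex
{
    private readonly Dictionary<string, SortedSet<int>> postings =
        new Dictionary<string, SortedSet<int>>();

    // Merges an already-analyzed segment. No tokenization happens here,
    // which is the point of analyzing once and adding the result everywhere.
    public void AddSegment(int docId, Dictionary<string, List<int>> segment)
    {
        foreach (KeyValuePair<string, List<int>> entry in segment)
        {
            SortedSet<int> docs;
            if (!postings.TryGetValue(entry.Key, out docs))
            {
                docs = new SortedSet<int>();
                postings[entry.Key] = docs;
            }
            docs.Add(docId);
        }
    }

    public List<int> DocsWith(string term)
    {
        SortedSet<int> docs;
        return postings.TryGetValue(term, out docs) ? new List<int>(docs) : new List<int>();
    }
}
```

Both indexes receive the same `segment` object, so the analyzer runs once however many indexes are fed. A real implementation would also have to handle document ids, deletions and reader reopening, which this toy ignores.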
Re: [Lucene.Net] 2.9.4
Since it includes some level of divergence from Java, I committed it only to the 2.9.4g branch. https://issues.apache.org/jira/browse/LUCENE-1930 https://issues.apache.org/jira/browse/LUCENENET-431 DIGY On Wed, Sep 7, 2011 at 1:03 PM, Itamar Syn-Hershko ita...@code972.com wrote: Ok, core compiles, and all tests pass. We are now running long tests to measure memory usage, among other things. There is one showstopper, though. There was a patch sent by Matt Warren for Spatial.Net that doesn't seem to be in. See http://groups.google.com/group/ravendb/msg/7517f095810c48f3 Any chance you can get it into 2.9.4? On Wed, Sep 7, 2011 at 1:01 AM, Itamar Syn-Hershko ita...@code972.com wrote: Ok, great, we will run RavenDB on top of 2.9.4 in the next few days and will let you know how it went. On Tue, Sep 6, 2011 at 8:59 PM, Michael Herndon mhern...@wickedsoftware.net wrote: I can't tell if the Apache git mirror is updated via a scheduler or from commit hooks, but it generally stays close to being on par with svn. I'll check next time I push something to svn. But both of those items have made it to the mirror. - michael On Tue, Sep 6, 2011 at 1:44 PM, Digy digyd...@gmail.com wrote: I don't know how often the GitHub mirror is updated. These are the original locations: 2.9.4 https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/ 2.9.4g https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/ Both versions include the ThreadLocal fix + signing. Thanks, DIGY -----Original Message----- From: itamar.synhers...@gmail.com [mailto:itamar.synhers...@gmail.com] On Behalf Of Itamar Syn-Hershko Sent: Tuesday, September 06, 2011 2:34 AM To: lucene-net-dev@lucene.apache.org Subject: Re: [Lucene.Net] 2.9.4 Not a problem, we will test RavenDB on a separate branch, also for potential memory leaks. Digy, can you make sure the GitHub mirror contains an updated 2.9.4 tag I can pull from, which includes the latest ThreadLocal fix + the strong-name signing patch applied to it?
2011/9/6 Digy digyd...@gmail.com To avoid misunderstanding... Community==all Lucene.Net users DIGY -----Original Message----- From: Digy [mailto:digyd...@gmail.com] Sent: Monday, September 05, 2011 11:46 PM To: 'lucene-net-dev@lucene.apache.org' Subject: RE: [Lucene.Net] 2.9.4 Not a bad idea, but I would prefer the community's feedback over testing against every project that uses Lucene.Net. DIGY -----Original Message----- From: Matt Warren [mailto:mattd...@gmail.com] Sent: Monday, September 05, 2011 11:09 PM To: lucene-net-dev@lucene.apache.org Subject: Re: [Lucene.Net] 2.9.4 If you want to test it against a large project, you could take a look at how RavenDB uses it. At the moment it's using 2.9.2 ( https://github.com/ayende/ravendb/tree/master/SharedLibs/Sources/Lucene2.9.2 ), but if you were to recompile it against 2.9.4 and check that all its unit tests still run, that would give you quite a large test case. On 5 September 2011 19:22, Prescott Nasser geobmx...@hotmail.com wrote: Hey All, How do people feel about the 2.9.4 code base? I've been using it for some time, and for my use cases it's been excellent. Do we feel we are ready to package this up and make it an official release? Or do we have some tasks left to take care of? ~Prescott
Re: [Lucene.Net] Incubator Status Page
On Sun, Jul 10, 2011 at 6:24 PM, Stefan Bodewig bode...@apache.org wrote: Hi all, http://incubator.apache.org/projects/lucene.net.html contains quite a few blanks that I think we could easily fill. I intend to add either some N/A entries or real dates where I can during the coming week. On the IP issues part (copyright and distribution rights), I trust the Lucene PMC took care of this before Lucene.NET headed back to the Incubator, and that after that all contributions have come either directly from people with a CLA on file or as patches via JIRA where the "ASF may use this" checkbox has been checked - is this correct? Absolutely. For the project-specific tasks I'd ask all of you to fill in whatever you feel like adding. All Lucene.NET committers should be able to modify the status page. Stefan DIGY
Re: [Lucene.Net] 2.9.4g branch - test
I've never committed any code to the 2.9.4g branch before testing, so it should pass all the tests. DIGY On Mon, Jun 13, 2011 at 4:26 AM, Prescott Nasser geobmx...@hotmail.com wrote: Does anyone have the latest 2.9.4g branch they can run the tests on? I've done some WP7 stuff, and I'm coming up with 6 errors throughout all the tests. I didn't think to test beforehand, and at the moment I can't download a fresh copy of the branch. ~P
Re: [Lucene.Net] [jira] [Commented] (LUCENENET-412) Replacing ArrayLists, Hashtables etc. with appropriate Generics.
On Fri, May 20, 2011 at 12:34 PM, Andy Pook andy.p...@gmail.com wrote: It'd be useful if there was a StopAnalyzer ctor overload that took an IEnumerable<string>, and maybe the current one that takes List<string> should take ICollection<string> (the same as the internal stopWords member). It just gives a little flexibility on the types that can be used. I changed List<string> to ICollection<string>. Also there is a little confusion around the treatment of the various collection types, i.e. string[] gets converted to a CharArraySet. Why not just a List<string>? It is the same in Lucene.Java. Thoughts? Cheers, Andy DIGY On 18 May 2011 23:20, Digy (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENENET-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035795#comment-13035795 ] Digy commented on LUCENENET-412: Hi All, Lucene.Net 2.9.4g is almost ready for testing and feedback. While injecting generics & doing some clean-up in the code, I tried to stay as close to Lucene 3.0.3 as possible. Therefore its position is somewhere between Lucene.Java 2.9.4 & 3.0.3. DIGY PS: For those who might want to try this version: it probably won't be a drop-in replacement, since there are a few API changes like - StopAnalyzer(List<string> stopWords) - Query.ExtractTerms(ICollection<string>) - TopDocs.*TotalHits*, TopDocs.*ScoreDocs* and some removed methods/classes like - Filter.Bits - JustCompileSearch - Contrib/Similarity.Net Replacing ArrayLists, Hashtables etc. with appropriate Generics. Key: LUCENENET-412 URL: https://issues.apache.org/jira/browse/LUCENENET-412 Project: Lucene.Net Issue Type: Improvement Affects Versions: Lucene.Net 2.9.4 Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4 Attachments: IEquatable for QuerySubclasses.patch, LUCENENET-412.patch, lucene_2.9.4g_exceptions_fix This will move Lucene.Net.2.9.4 closer to lucene.3.0.3 and allow some performance gains.
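Andy's request for an IEnumerable<string> overload is mostly about call-site flexibility. Here is a minimal sketch of what such a constructor could look like. It uses a hypothetical `StopWordFilterSketch` class, not the real StopAnalyzer:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical filter, not the Lucene.Net StopAnalyzer API. Accepting
// IEnumerable<string> lets callers pass string[], List<string>,
// HashSet<string> or a LINQ query without converting first.
public class StopWordFilterSketch
{
    private readonly HashSet<string> stopWords;

    public StopWordFilterSketch(IEnumerable<string> stopWords)
    {
        if (stopWords == null) throw new ArgumentNullException(nameof(stopWords));
        // Copy into a private set: later changes to the caller's collection have
        // no effect, and lookups stay hash-based whatever type was passed in.
        this.stopWords = new HashSet<string>(stopWords);
    }

    public bool IsStopWord(string token)
    {
        return stopWords.Contains(token);
    }

    // Yields the tokens that are not stop words (tokens are assumed to be lowercased already).
    public IEnumerable<string> Filter(IEnumerable<string> tokens)
    {
        foreach (string token in tokens)
        {
            if (!stopWords.Contains(token))
                yield return token;
        }
    }
}
```

Copying into a private HashSet<string> is a design choice. The caller keeps ownership of its collection, and the filter's behaviour does not depend on which collection type was passed.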
Re: [Lucene.Net] How can we implement faceted search with lucene
http://www.devatwork.nl/articles/lucenenet/faceted-search-and-drill-down-lucenenet/ DIGY On Fri, May 13, 2011 at 10:40 AM, K a r n a v karunakerred...@gmail.com wrote: How can we implement faceted search with Lucene?
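The linked article describes one Lucene.Net approach. One common way to count facets works in three steps:

1. Keep one bitset per facet value.
2. Intersect each bitset with the bitset of documents that matched the query.
3. Count the set bits.

Drilling down then means using one value's bitset as an extra filter. The sketch below shows only that idea, using System.Collections.BitArray. `FacetIndexSketch` is hypothetical and does not come from the article or from Lucene.Net. Document ids are assumed to run from 0 to n-1.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class FacetIndexSketch
{
    private readonly int docCount;
    // One bitset per facet value; bit i is set when document i carries that value.
    private readonly Dictionary<string, BitArray> valueBits = new Dictionary<string, BitArray>();

    public FacetIndexSketch(IList<string> facetValuePerDoc)
    {
        docCount = facetValuePerDoc.Count;
        for (int doc = 0; doc < docCount; doc++)
        {
            BitArray bits;
            if (!valueBits.TryGetValue(facetValuePerDoc[doc], out bits))
            {
                bits = new BitArray(docCount);
                valueBits[facetValuePerDoc[doc]] = bits;
            }
            bits.Set(doc, true);
        }
    }

    // Counts, for each facet value, the hits that carry it. 'hits' needs one bit per document.
    public SortedDictionary<string, int> Count(BitArray hits)
    {
        if (hits.Length != docCount) throw new ArgumentException("One bit per document expected.", nameof(hits));
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (KeyValuePair<string, BitArray> entry in valueBits)
        {
            BitArray both = new BitArray(hits).And(entry.Value); // copy first: And() modifies in place
            int n = 0;
            for (int i = 0; i < both.Length; i++)
            {
                if (both[i]) n++;
            }
            if (n > 0) counts[entry.Key] = n;
        }
        return counts;
    }

    // Drill-down: keep only the hits that carry the chosen facet value.
    public BitArray DrillDown(BitArray hits, string value)
    {
        BitArray bits;
        if (!valueBits.TryGetValue(value, out bits)) return new BitArray(hits.Length);
        return new BitArray(hits).And(bits);
    }
}
```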
Re: [Lucene.Net] [jira] [Commented] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
Hi Vincent, My first goal was to replace ArrayLists, Hashtables, Enumerators etc. as quickly as possible. Applying best practices could wait until the code was cleaner. The purpose of Support.Set was to have a collection that can be accessed with an indexer and also implements the Contains method. It was a quick solution to the problem. Similarly, Support.Dictionary was there just to be able to return null (without an exception) when a collection didn't contain the item. Changing zillions of lines with if(coll.ContainsKey(...)) seemed too hard to me at that time (forgetting one results in weird effects at runtime, not at compile time). DIGY On Fri, May 13, 2011 at 4:22 PM, Van Den Berghe, Vincent (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033031#comment-13033031] Van Den Berghe, Vincent commented on LUCENENET-414: --- Hello Digy, Thanks for your response. I don't want to sound overly pedantic (but please tell me if I do), but this changed implementation solves only part of the problem. Now, CharArraySet derives from Set<T>, which itself derives from List<T>. Items are now stored both in this base class and in the private HashSet<string> _Set. However, because List<T> doesn't define its modifiers Add(T), Clear() and Remove(T) as virtual, the derived implementation defines them as new. This violates a variant of the Liskov substitution principle: an operation on the derived type does not have the same effect as the same operation on the base type. In this case, it means that the following code will cause the items in the List<T> base type and in the _Set to become desynchronized: CharArraySet set=... List<string> same=set; same.Add(whatever); // at this point, same.Contains(whatever)==true but set.Contains(whatever)==false even though it's the same instance.
You might rightfully retort that this never happens and I should mind my own business, but I know at least one poor soul who did just that: me :-(. On a completely unrelated matter, the new implementation has 2 methods: public void Add(System.Collections.Generic.IList<T> items) public void Add(Support.Set<T> items) .. which can be collapsed into one, since the only thing used in both cases is the enumerator: public void Add(IEnumerable<T> items) I don't recall the design rule, but it's something like "to increase reuse, make your function parameters as general as possible, but their return values as specific as possible". I am unable to get 2.9.4g to investigate further, but if you are moving towards the generic collections in Lucene, the following implementation should be a drop-in replacement, without suffering from the aforementioned quirks:
[Serializable]
public class Set<T> : ICollection<T>
{
    private readonly System.Collections.Generic.HashSet<T> _Set = new System.Collections.Generic.HashSet<T>();
    bool _ReadOnly = false;

    public Set()
    {
    }

    public Set(bool readOnly)
    {
        this._ReadOnly = readOnly;
    }

    public bool ReadOnly
    {
        set { _ReadOnly = value; }
        get { return _ReadOnly; }
    }

    public virtual void Add(T item)
    {
        if (_ReadOnly) throw new NotSupportedException();
        if (_Set.Contains(item)) return;
        _Set.Add(item);
    }

    public void Add(IEnumerable<T> items)
    {
        if (_ReadOnly) throw new NotSupportedException();
        foreach (T item in items)
        {
            if (_Set.Contains(item)) continue;
            _Set.Add(item);
        }
    }

    public void Clear()
    {
        if (_ReadOnly) throw new NotSupportedException();
        _Set.Clear();
    }

    public bool Contains(T item)
    {
        return
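Vincent's point can be reproduced in a few lines of self-contained C#. `HidingSet<T>` mimics the List<T>-derived design he describes; it is not the actual Support.Set or CharArraySet source. `HashBackedSet<T>` is a hypothetical composition-based variant in the spirit of his proposal. It was written independently and does not reconstruct his truncated code.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// 1) The problem: List<T>.Add and List<T>.Contains are not virtual, so a subclass can
//    only hide them with 'new'. Calls made through a List<T> reference bypass the
//    subclass, and the two copies of the data drift apart.
public class HidingSet<T> : List<T>
{
    private readonly HashSet<T> _set = new HashSet<T>();

    public new void Add(T item)
    {
        if (_set.Add(item)) // HashSet<T>.Add returns false for duplicates
            base.Add(item);
    }

    public new bool Contains(T item)
    {
        return _set.Contains(item);
    }
}

// 2) A composition-based alternative: one HashSet<T> holds the data and the class
//    implements ICollection<T> itself, so every call path reaches the same code.
public class HashBackedSet<T> : ICollection<T>
{
    private readonly HashSet<T> _set = new HashSet<T>();

    public int Count { get { return _set.Count; } }
    public bool IsReadOnly { get { return false; } }

    public void Add(T item) { _set.Add(item); } // duplicates are ignored

    // A single overload covers arrays, lists, other sets and LINQ queries.
    public void Add(IEnumerable<T> items)
    {
        foreach (T item in items) _set.Add(item);
    }

    public bool Remove(T item) { return _set.Remove(item); }
    public void Clear() { _set.Clear(); }
    public bool Contains(T item) { return _set.Contains(item); }
    public void CopyTo(T[] array, int arrayIndex) { _set.CopyTo(array, arrayIndex); }
    public IEnumerator<T> GetEnumerator() { return _set.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class SetDemo
{
    // Replays the scenario from the message.
    // Returns { same.Contains, hiding.Contains, backed.Contains }, i.e. { true, false, true }.
    public static bool[] Run()
    {
        var hiding = new HidingSet<string>();
        List<string> same = hiding;      // same instance, seen through the base type
        same.Add("whatever");            // binds to List<T>.Add, so _set is never updated

        var backed = new HashBackedSet<string>();
        ICollection<string> view = backed;
        view.Add("whatever");            // interface call reaches HashBackedSet<T>.Add

        return new[] { same.Contains("whatever"), hiding.Contains("whatever"), backed.Contains("whatever") };
    }
}
```

The composition version gives up the List<T> indexer that Support.Set was introduced for. Callers that need positional access would have to enumerate instead.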
Re: [Lucene.Net] release 2.9.4
Thanks, updated. DIGY On Tue, Apr 5, 2011 at 11:34 PM, Granroth, Neal V. neal.granr...@thermofisher.com wrote: I had no difficulty building it in Visual Studio 2005. The assembly copyright information appears to be out of date; shouldn't it read 2011, not 2009? - Neal -----Original Message----- From: Wyatt Barnett [mailto:wyatt.barn...@gmail.com] Sent: Tuesday, April 05, 2011 2:23 PM To: lucene-net-dev@lucene.apache.org Cc: Troy Howard Subject: Re: [Lucene.Net] release 2.9.4 Tag [+1] svn export and command line build successful; I'll keep you all posted . . . On Tue, Apr 5, 2011 at 3:07 PM, Troy Howard thowar...@gmail.com wrote: Yes. Once we're ready to call this revision an RC, it should be tagged as such. Wyatt: Thanks for helping to test! Looking forward to your results. Thanks, Troy On Tue, Apr 5, 2011 at 11:37 AM, Granroth, Neal V. neal.granr...@thermofisher.com wrote: No, the URL in DIGY's email appears correct and the SVN revision appears to be 1086410. Question: Should there be a tag for Lucene.Net_2_9_4, as there are for previous release candidates? - Neal -----Original Message----- From: Wyatt Barnett [mailto:wyatt.barn...@gmail.com] Sent: Tuesday, April 05, 2011 12:15 PM To: lucene-net-dev@lucene.apache.org Cc: digy digy Subject: Re: [Lucene.Net] release 2.9.4 Thanks. For anyone watching, the corrected clickable link is https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C%23/. Also, just to make sure we are looking at this right, the revision we should be using is 1089138 -- the main thing is I've been in and out of town, not caught up on anything, and I'd hate to start building stuff against the wrong version . . On Tue, Apr 5, 2011 at 1:10 PM, digy digy digyd...@gmail.com wrote: Sorry, no binaries.
You can download the source from https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C#/src/Lucene.Net DIGY On Tue, Apr 5, 2011 at 12:12 AM, Wyatt Barnett wyatt.barn...@gmail.com wrote: Actually about to dive into a big search-tweaking spike in a certain project here, happy to do it on 2.9.4. Got binaries? On Mon, Apr 4, 2011 at 12:27 PM, Troy Howard thowar...@gmail.com wrote: We don't have any sort of QA report on the latest build. DIGY called for testing, but I haven't seen anyone respond to that request indicating successful testing. So, how do we want to manage this? In the business world, we'd never think of making a release without extensive QA first. In my other open source projects, either we've managed QA ourselves by 'switching hats' for a couple of weeks prior to release, or we've just crossed our fingers because the user base was too small. Lucene.Net is a fairly high-profile project, with a large user base. I think it would not be responsible to make a release without a formal QA process. We do have extensive unit tests, but do you think those are sufficient to cover our QA needs? Should we try to find community members with a specialty in software testing who would be willing to fulfill this role on our project? Should we just swap hats? I didn't worry about this issue with the latest 2.9.2 release because it was QAed by the user base for a long time before it was an 'official release'. Maybe this is an effective tactic? Release first, and let the user base roll in bug reports, fixing them in later minor maintenance releases? This seems to be the method a lot of projects use (i.e. no specific QA process, but rather an organic process of 'try our best, then deal with bug reports later'). What do we think about this? Thanks, Troy On Sun, Apr 3, 2011 at 11:59 PM, Prescott Nasser geobmx...@hotmail.com wrote: Hey all, I know we have a number of outstanding JIRA issues, but I think most of them have been handled for the 2.9.4 release?
Do we have anything outstanding that is holding back a new release? ~P
Re: [Lucene.Net] [VOTE] New Directory Layout for Project
+1. No pending commits. A copy of the current trunk somewhere else (tag, branches etc.) would be good too. DIGY. On Tue, Mar 29, 2011 at 9:38 PM, Troy Howard thowar...@gmail.com wrote: Looks like we have a 'lazy consensus', in that no one has raised any significant objections, a few minor modifications have been suggested (which sound totally reasonable), and those who did vote were positive. Barring any objections, this vote passes. Since DIGY and Scott seem to have gotten the bulk of the work on 2.9.4 finished, I think now is a good time to start the directory layout changes, and it won't be too intrusive to any active commits. I'll start on that this week. If you have any pending commits that would be totally screwed up by this directory change, please finalize those as soon as possible! Otherwise I'll be moving things around and your patches/commits might not be able to find the appropriate files. Thanks, Troy On Sun, Mar 20, 2011 at 12:44 AM, Prescott Nasser geobmx...@hotmail.com wrote: Any more thoughts on the directory structure? Quick Recap: We have Troy's original proposal here: http://people.apache.org/~thoward/Lucene.Net/directory-structure-example/
bin/
build/ (various solution and project files)
    vs2008/
    vs2010/
doc/
lib/ - third party libraries to make it easy to pull down the source and go
src/
    contrib/
    core/
    demo/
test/
    contrib/
    core/
    demo/
From here, I further suggested cleaning up the contrib folder, because we have extra folders:
src/contrib/contrib.net/contrib.net/ -> src/contrib/contrib.net/
src/contrib/snowball/snowball.net/ -> src/contrib/Snowball.net/
Digy further suggested dropping the .net in all those folders above, and finding a better name for contrib.net. Date: Thu, 10 Mar 2011 09:41:17 +0200 From: digyd...@gmail.com To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] [VOTE] New Directory Layout for Project Well, not really core.
The code under Analyzer (by DIGY) can be moved to /src/contrib/analyzers (but it is not ported from Java). The others (by M.GARSKI) are extensions to the core (something like Lucene.Net.Core.Extensions). DIGY On Thu, Mar 10, 2011 at 1:36 AM, Troy Howard wrote: Yeah -- I also changed the Contrib.Net project folder name to ~/src/contrib/core ... IMO we should just roll these into the main library if they are solid, tested and useful. This is in keeping with our new philosophy of allowing .NET-specific changes, even if it means diverging from Java Lucene to do it. Thanks, Troy On Wed, Mar 9, 2011 at 12:56 PM, Prescott Nasser wrote: Actually, what IS contrib.net? It looks like it replaces certain files in Lucene.Net core - are they files better suited to .NET? What are they? If they are plugins / additional contributions like Snowball, etc. - why not just break it out and include the appropriate stuff in contrib? Do we need to specify that they are not available in the Java version? Date: Wed, 9 Mar 2011 22:18:22 +0200 From: digyd...@gmail.com To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] [VOTE] New Directory Layout for Project 0 .Nets seem to be redundant under /src/contrib/. It could be something like Analyzers, Highlighter, Similarity ... (Maybe we should find a different name for contrib.net. It contains contributions specific to Lucene.Net which are not available in Lucene.java.) DIGY On Wed, Mar 9, 2011 at 9:08 PM, Prescott Nasser wrote: Probably just a miss - but under the src/contrib folder you also have a number of tests in there... Also, is it necessary to have all the subfolders? For the most part the stuff in contrib.net is contrib.net - why the secondary folder? Unless NUnit requires the structure to be that way, it seems a bit cluttered.
I would think something like src/contrib/contrib.net/ and src/contrib/Snowball.net/ instead of src/contrib/contrib.net/contrib.net/ and src/contrib/snowball/snowball.net/. I don't know how people feel about that ~P Date: Wed, 9 Mar 2011 13:31:34 -0500 From: mhern...@wickedsoftware.net To: lucene-net-...@lucene.apache.org CC: thowar...@gmail.com Subject: Re: [Lucene.Net] [VOTE] New Directory Layout for Project +1, just a question though: for cmd/bat/sh files that let people execute the build, or just execute other tools from the command line, would those have a place
Re: [Lucene.Net] [VOTE] New Directory Layout for Project
After these directory layout changes, what about replacing ArrayLists, Hashtables etc. with the appropriate generics? This would bring us very close to Lucene 3.0.3 (and it's not hard to do with the help of VS). DIGY On Tue, Mar 29, 2011 at 10:02 PM, Troy Howard thowar...@gmail.com wrote: Sounds good. I'll make a tag prior to starting the directory changes, but I'll commit changes to trunk. Thanks, Troy On Tue, Mar 29, 2011 at 11:55 AM, digy digy digyd...@gmail.com wrote: +1. No pending commits. A copy of the current trunk somewhere else (tag, branches etc.) would be good too. DIGY.
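For readers unfamiliar with what replacing ArrayLists and Hashtables with generics involves, here is a minimal before/after sketch. The method names are hypothetical. The sketch shows only the run-time casts and boxing that the generic version avoids; it does not measure performance.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class GenericsSketch
{
    // Pre-generics style: values are stored as object, so every int is boxed
    // and every read needs a cast that is only checked at run time.
    public static Hashtable CountTermsUntyped(ArrayList terms)
    {
        var counts = new Hashtable();
        foreach (object o in terms)
        {
            string term = (string)o;
            counts[term] = counts.ContainsKey(term) ? (int)counts[term] + 1 : 1;
        }
        return counts;
    }

    // Generic style: the compiler checks the types, and no casts or boxing are needed.
    public static Dictionary<string, int> CountTerms(IEnumerable<string> terms)
    {
        var counts = new Dictionary<string, int>();
        foreach (string term in terms)
        {
            int n;
            counts.TryGetValue(term, out n); // n stays 0 when the term is new
            counts[term] = n + 1;
        }
        return counts;
    }
}
```

With the generic version, passing the wrong element type fails at compile time. In the untyped version it would only surface as an InvalidCastException when that code path runs, which is the same compile-time versus run-time trade-off DIGY mentions for Support.Dictionary.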
Re: [Lucene.Net] [VOTE] New Directory Layout for Project
We already have a release for .NET 2.0 (Lucene.Net 2.9.2), so jumping to 4.0 shouldn't be a problem for the Lucene.Net community. DIGY On Tue, Mar 29, 2011 at 10:31 PM, Troy Howard thowar...@gmail.com wrote: Sounds good to me. I have done this previously in a local branch and noticed massive performance improvements. Removing all the casting in the library makes for dramatic speedups. As a side note: Chris Currens is in the process of benchmarking Lucene.Net running under .NET 4.0 vs 3.5 vs 2.0... This benchmarking is to prove what we found in our production deployments... Compiling and deploying as a .NET 4.0 assembly results in major improvements in both speed and correct memory handling (memory leaks magically disappear). We want to prove this with benchmarks before publishing a definitive statement about it, however. If this is the case, there might be a very compelling reason to move forward to the 4.0 runtime for Lucene.Net. Thanks, Troy On Tue, Mar 29, 2011 at 12:23 PM, digy digy digyd...@gmail.com wrote: After these directory layout changes, what about replacing ArrayLists, Hashtables etc. with the appropriate generics? This would bring us very close to Lucene 3.0.3 (and it's not hard to do with the help of VS). DIGY
Re: [Lucene.Net] svn commit: r1080881 - in /incubator/lucene.net/trunk/C#/src/Lucene.Net: Index/DocumentsWriter.cs Index/StoredFieldsWriter.cs Index/TermVectorsTermsWriter.cs Index/TermVectorsTermsWri
It would be better to attach the patches to the issue before committing, so that others can track what is going on. DIGY On Sat, Mar 12, 2011 at 9:20 AM, slomb...@apache.org wrote:
Author: slombard
Date: Sat Mar 12 07:20:44 2011
New Revision: 1080881
URL: http://svn.apache.org/viewvc?rev=1080881&view=rev
Log:
[LUCENENET-399] (trunk) 2.9.3 - change LUCENE 2283: use shared byte[] pool to buffer pending stored fields & term vectors during indexing; fixes excessive memory usage for mixed tiny & big docs with many threads
Modified:
incubator/lucene.net/trunk/C#/src/Lucene.Net/Index/DocumentsWriter.cs
incubator/lucene.net/trunk/C#/src/Lucene.Net/Index/StoredFieldsWriter.cs
incubator/lucene.net/trunk/C#/src/Lucene.Net/Index/TermVectorsTermsWriter.cs
incubator/lucene.net/trunk/C#/src/Lucene.Net/Index/TermVectorsTermsWriterPerField.cs
incubator/lucene.net/trunk/C#/src/Lucene.Net/Store/RAMFile.cs
incubator/lucene.net/trunk/C#/src/Lucene.Net/Store/RAMOutputStream.cs
Modified: incubator/lucene.net/trunk/C#/src/Lucene.Net/Index/DocumentsWriter.cs
URL: http://svn.apache.org/viewvc/incubator/lucene.net/trunk/C%23/src/Lucene.Net/Index/DocumentsWriter.cs?rev=1080881&r1=1080880&r2=1080881&view=diff
==
--- incubator/lucene.net/trunk/C#/src/Lucene.Net/Index/DocumentsWriter.cs (original)
+++ incubator/lucene.net/trunk/C#/src/Lucene.Net/Index/DocumentsWriter.cs Sat Mar 12 07:20:44 2011
@@ -19,15 +19,16 @@
 using System;
 using Analyzer = Lucene.Net.Analysis.Analyzer;
 using Document = Lucene.Net.Documents.Document;
-using AlreadyClosedException = Lucene.Net.Store.AlreadyClosedException;
-using Directory = Lucene.Net.Store.Directory;
-using ArrayUtil = Lucene.Net.Util.ArrayUtil;
-using Constants = Lucene.Net.Util.Constants;
 using IndexSearcher = Lucene.Net.Search.IndexSearcher;
 using Query = Lucene.Net.Search.Query;
 using Scorer = Lucene.Net.Search.Scorer;
 using Similarity = Lucene.Net.Search.Similarity;
 using Weight = Lucene.Net.Search.Weight;
+using AlreadyClosedException = Lucene.Net.Store.AlreadyClosedException;
+using Directory = Lucene.Net.Store.Directory;
+using RAMFile = Lucene.Net.Store.RAMFile;
+using ArrayUtil = Lucene.Net.Util.ArrayUtil;
+using Constants = Lucene.Net.Util.Constants;
 
 namespace Lucene.Net.Index
 {
@@ -104,7 +105,7 @@ namespace Lucene.Net.Index
 {
 internal override DocConsumer GetChain(DocumentsWriter documentsWriter)
- {
+{
 /* This is the current indexing chain:
@@ -145,7 +146,8 @@ namespace Lucene.Net.Index
 freeLevel = (long) (IndexWriter.DEFAULT_RAM_BUFFER_SIZE_MB * 1024 * 1024 * 0.95);
 maxBufferedDocs = IndexWriter.DEFAULT_MAX_BUFFERED_DOCS;
 skipDocWriter = new SkipDocWriter();
- byteBlockAllocator = new ByteBlockAllocator(this);
+ byteBlockAllocator = new ByteBlockAllocator(this, BYTE_BLOCK_SIZE);
+perDocAllocator = new ByteBlockAllocator(this, PER_DOC_BLOCK_SIZE);
 waitQueue = new WaitQueue(this);
 }
@@ -220,6 +222,59 @@ namespace Lucene.Net.Index
 }
 }
+//Create and return a new DocWriterBuffer.
+internal PerDocBuffer newPerDocBuffer()
+{
+return new PerDocBuffer(perDocAllocator);
+}
+
+/// <summary>RAMFile buffer for DocWriters.</summary>
+internal class PerDocBuffer:RAMFile
+{
+public PerDocBuffer(ByteBlockAllocator perDocAllocator)
+ {
+ InitBlock(perDocAllocator);
+ }
+private void InitBlock(ByteBlockAllocator perDocAllocator)
+ {
+this.perDocAllocator = perDocAllocator;
+ }
+private ByteBlockAllocator perDocAllocator;
+
+/// <summary>
+/// Allocate bytes used from shared pool.
+/// </summary>
+/// <param name="size">Size of new buffer. Fixed at <see cref="PER_DOC_BLOCK_SIZE"/>.</param>
+/// <returns></returns>
+protected internal byte[] newBuffer(int size)
+{
+System.Diagnostics.Debug.Assert(size == PER_DOC_BLOCK_SIZE);
+return perDocAllocator.GetByteBlock(false);
+}
+
+//Recycle the bytes used.
+internal void recycle()
+{
+lock(this)
+{
+if (buffers.Count > 0)
+{
+SetLength(0);
+
+// Recycle the blocks
+int blockCount = buffers.Count;
+
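The commit message describes per-document buffers that draw fixed-size blocks from a shared pool and recycle them afterwards. Below is a rough sketch of that pooling pattern only. The class and member names are hypothetical; this is not the ByteBlockAllocator API from the diff.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-size block pool. Per-document buffers borrow blocks from one
// shared pool and hand them back when done, instead of each buffer allocating
// (and the GC later reclaiming) its own arrays.
public class BlockPoolSketch
{
    private readonly int blockSize;
    private readonly Stack<byte[]> freeBlocks = new Stack<byte[]>();
    private readonly object sync = new object(); // the pool may be shared by indexing threads

    public BlockPoolSketch(int blockSize)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        this.blockSize = blockSize;
    }

    public int BlockSize { get { return blockSize; } }

    public int FreeCount
    {
        get { lock (sync) { return freeBlocks.Count; } }
    }

    // Reuses a recycled block when one is available, otherwise allocates a new one.
    // Recycled blocks are not zeroed; callers overwrite them.
    public byte[] GetBlock()
    {
        lock (sync)
        {
            return freeBlocks.Count > 0 ? freeBlocks.Pop() : new byte[blockSize];
        }
    }

    // Returns blocks to the pool; callers must not touch them afterwards.
    public void Recycle(IList<byte[]> blocks)
    {
        lock (sync)
        {
            foreach (byte[] block in blocks)
            {
                if (block.Length != blockSize)
                    throw new ArgumentException("Block has the wrong size.", nameof(blocks));
                freeBlocks.Push(block);
            }
        }
    }
}
```

Fixed-size blocks are what make reuse simple: any recycled block fits any later request, so a mix of tiny and big documents draws on the same memory instead of each one allocating its own.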
Re: [Lucene.Net] [VOTE] New Directory Layout for Project
0

The ".Net" suffixes seem to be redundant under /src/contrib/. It could be something like Analyzers, Highlighter, Similarity ... (Maybe we should find a different name for contrib.net. It contains contributions specific to Lucene.Net which are not available in Lucene.java.)
DIGY

On Wed, Mar 9, 2011 at 9:08 PM, Prescott Nasser geobmx...@hotmail.com wrote:
Probably just a miss - but under the src/contrib folder you also have a number of tests in there...

Also, is it necessary to have all the sub folders? For the most part the stuff in contrib.net is contrib.net - why the secondary folder? Unless that is a requirement of NUnit to have the structure that way, it seems a bit cluttered. I would think something like

src/contrib/contrib.net/
src/contrib/Snowball.net/

instead of

src/contrib/contrib.net/contrib.net/
src/contrib/snowball/snowball.net/

I don't know how people feel about that.
~P

Date: Wed, 9 Mar 2011 13:31:34 -0500
From: mhern...@wickedsoftware.net
To: lucene-net-dev@lucene.apache.org
CC: thowar...@gmail.com
Subject: Re: [Lucene.Net] [VOTE] New Directory Layout for Project

+1

Just a question though: for cmd/bat/sh files that let people execute the build, or just execute other tools from the command line, would those have a place in /bin or somewhere else? This is so that someone can just export PATH = / SET PATH= to that one folder and then be able to execute those commands from one location.

On Sun, Mar 6, 2011 at 11:27 PM, Troy Howard wrote:
All,

We'd like to update the project directory structure/layout. See below for a proposed layout. I've also uploaded an example which you can navigate at: http://people.apache.org/~thoward/Lucene.Net/directory-structure-example

NOTE: This will not build!! I just put things in the appropriate places, without updating the solution/project files, to show how we might lay things out. Also, I included NUnit as an example of a third-party dependency that we might include in the repository under 'lib'.
We of course will *not* be distributing NUnit in this manner, due to licensing restrictions. Ok, disclaimer over...

Please vote on this layout, or suggest a modification or alternative layout. Voting will be open for 72 hours.

[ ] +1 Use this directory structure exactly as described, or with a minor modification
[ ] 0 Use a different structure (described in response)
[ ] -1 Do not change the directory structure at all

Text description of directory schema:

Build Files:
\build
\build\VS2008
\build\VS2010

Source Projects:
\src
\src\contrib
\src\core
\src\demo
\src\contrib\
\src\core\
\src\demo\

Test Projects:
\test
\test\contrib
\test\core
\test\demo
\test\contrib\
\test\core\
\test\demo\

Product Documentation:
\doc
\doc\contrib
\doc\core
\doc\demo
\doc\contrib\
\doc\core\
\doc\demo\

Third-Party Dependencies:
\lib
\lib\
\lib\\
\lib\\\

Binary Builds:
\bin
\bin\contrib
\bin\core
\bin\demo
\bin\contrib\
\bin\core\
\bin\demo\
Re: [Lucene.Net] [VOTE] New Directory Layout for Project
Well, not really core. The code under Analyzer (by DIGY) can be moved to /src/contrib/analyzers (but it is not a port from Java). The others (by M. GARSKI) are extensions to the core (something like Lucene.Net.Core.Extensions).
DIGY

On Thu, Mar 10, 2011 at 1:36 AM, Troy Howard thowar...@gmail.com wrote:
Yeah -- I also changed the Contrib.Net project folder name to ~/src/contrib/core ... IMO we should just roll these into the main library if they are solid, tested and useful. This is in keeping with our new philosophy about allowing .NET-specific changes, even if it means diverging from Java Lucene to do it.
Thanks, Troy

On Wed, Mar 9, 2011 at 12:56 PM, Prescott Nasser geobmx...@hotmail.com wrote:
Actually, what IS contrib.net? It looks like it replaces certain files in Lucene.Net core - are they files better suited to .NET? What are they? If they are plugins / additional contributions like Snowball, etc., why not just break them out and include the appropriate stuff in contrib? Do we need to specify that they are not available in the Java version?

Date: Wed, 9 Mar 2011 22:18:22 +0200
From: digyd...@gmail.com
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] [VOTE] New Directory Layout for Project

0

The ".Net" suffixes seem to be redundant under /src/contrib/. It could be something like Analyzers, Highlighter, Similarity ... (Maybe we should find a different name for contrib.net. It contains contributions specific to Lucene.Net which are not available in Lucene.java.)
DIGY
Re: [Lucene.Net] [VOTE] Release Apache Lucene.Net 2.9.2-incubating-RC2
+1
DIGY

On Mon, Feb 28, 2011 at 11:04 AM, Glyn Darkin g...@darkinsystems.com wrote:
+1

On 28 Feb 2011, at 08:39, Troy Howard wrote:
All,

A quick voting reminder... This [VOTE] thread will only be active for another 4 hours (72 hours total). So far, we have two +1 votes in. After this vote, the release will be proposed to the Incubator PMC, where it will have another 72-hour vote for acceptance. Assuming that passes, it will become an official ASF Incubator release.

Thanks,
Troy

On Fri, Feb 25, 2011 at 12:29 PM, Stefan Bodewig bode...@apache.org wrote:
On 2011-02-25, Troy Howard wrote:
I updated the .src zip and associated checksums/signatures at:

I have verified that the bin zip is still the same one I checked. All signatures and hashes are fine, and RAT is reasonably happy with the src zip (I've updated http://people.apache.org/~bodewig/Lucene.NET/src-rat.log).

+1 from me for the release. Of course I haven't performed any technical tests, just verified that the artifacts meet the Incubator requirements (at least all I know of).

Hopefully this one will pass the licensing validation! I think I got them all this time.

Thanks a lot.
Stefan

Glyn Darkin
Darkin Systems Ltd
Mob: 07961815649
Fax: 08717145065
Web: www.darkinsystems.com
Company No: 6173001
VAT No: 906350835
Re: [Lucene.Net] CI Task Update: Hudkins
+1
DIGY

On Mon, Feb 28, 2011 at 10:29 AM, Troy Howard thowar...@gmail.com wrote:
+1 to all suggestions. Hudkins is my new favourite word. ;)

One quick concern I have is how many of the things listed are already available on the Apache Hudson server. A lot of this is .NET-specific, so it is unlikely to be available already. We'll have to request that the ASF Infra team install these tools for us, and they may not agree, or there might be licensing issues, etc. Not sure. I'd start the conversation with them now to suss this out.

Specifically, some thoughts:
- FxCop/StyleCop/NCover/NDepend are basics which should be viewed as a necessity for any serious .NET project.
- I love SandCastle and was about to bring that up on the list. We could keep rolling with NDoc. Either way.
- I love Gallio/MbUnit/Moq... nunit not so much, but again, either would be fine.
- Mono is going to be a requirement moving forward.
- Project structure was being discussed on the LUCENENET-377 thread. For the binary release, I used a structure similar to what you were describing, and the topic of applying that structure to trunk came up. This is something that we should discuss in detail before applying it to trunk (my work was in a branch). We need to make sure the directory structure remains relatively static; however, the current structure needs a lot of improvement, so now is a good time to change.

Digy suggested:
\build
\contrib
\core
\core\Lucene.Net
\core\Test
\demo
...
and in the recent binary release, I used a root \bin and \doc in the same way you suggested.
As a combination of ideas, how about:

Build Files:
\build
\build\VS2008
\build\VS2010

Source Projects:
\src
\src\contrib
\src\core
\src\demo
\src\contrib\project-name
\src\core\project-name
\src\demo\project-name

Test Projects:
\test
\test\contrib
\test\core
\test\demo
\test\contrib\project-name
\test\core\project-name
\test\demo\project-name

Product Documentation:
\doc
\doc\contrib
\doc\core
\doc\demo
\doc\contrib\project-name
\doc\core\project-name
\doc\demo\project-name

Third-Party Dependencies:
\lib
\lib\vendor
\lib\vendor\product
\lib\vendor\product\version

Binary Builds:
\bin
\bin\contrib
\bin\core
\bin\demo
\bin\contrib\project-name
\bin\core\project-name
\bin\demo\project-name

Thanks, Troy

On Sun, Feb 27, 2011 at 11:10 PM, Michael Herndon mhern...@o19s.com wrote:
So the CI choices for Apache are the following:
- Buildbot http://ci.apache.org/#buildbot
- Continuum http://ci.apache.org/#continuum
- Gump http://ci.apache.org/#gump
- Hudson http://ci.apache.org/#hudson

There is a current discussion on the build list about moving with the name shift of Hudson to Jenkins.

My vote is to go with hudkins because it has been used successfully for .NET projects in the past and has plugins to help support that. Nothing personal against Python, but there seems to be more material on integrating .NET builds with Hudson. I've also used Hudson in the past, so I can vouch that it does a decent job. (This was a while ago, so I can only hope the number of plugins and integration points has increased.)

I'm going to do a bit more reading on the Apache mailing list for builds to see if there is an actual Windows slave for Hudson/Jenkins (which shall henceforth be called hudkins on this list, well at least by me).

Obviously the first priority will be getting a build set up and starting simple. However, to spark discussion and future planning, here is a list of other things to include or think about for the build process long term:
* fxcop - (this will probably need customized rules for the strict Java port version)
* stylecop - (same)
* sandcastle - (building xml comments into documentation)
* sandcastle help file builder - (SHFB)
* code coverage tool - possibly seeing if we can get a code coverage tool (possibly ncover, as they used to give a free license to open source projects)
* code metrics tool - (i.e. cyclomatic complexity; ndepend used to do the same thing as ncover, thus worth investigating)
* gallio test runner vs nunit (gallio is a test automation tool capable of running various testing frameworks and tools, including nunit)
* extended msbuild tasks
* mono build
* project structure for the build **
* insert any other suggestions here

I'll volunteer myself for the boring job of fixing up the xml comments so there is some meat and code examples inside them, and the documentation generates more than just text and method signatures with type information.

After some discussion on the list, and unless there is a show-stopper or killer reason not to go with hudkins, I'll note the decision in JIRA and start putting notes in the wiki about the CI.

notes:
- ** a common project structure/format or some variant thereof, which might be more intuitive for people that have worked on other
Re: IKVM (or rather OpenJDK) License Problem
Hi Stefan,

* Java's bytecode doesn't contain metadata about generics, and when Java is compiled, all info about generics gets lost. So IKVMed Lucene.Net will have to live without generics.
* IKVM is, in fact, the Java world inside the .NET runtime. If you want, for example, to write an analyzer, you have to override the TokenStream method, which accepts a java.io.Reader instead of a System.IO.TextReader. So .NET people have to learn Java namespaces/classes and develop their own Java-compatible libraries.
* Since IKVM is a different world, remoting (for example) between native .NET code and IKVMed code is problematic (one uses java.rmi.server.UnicastRemoteObject, the other System.MarshalByRefObject).
* It's not possible to make custom changes in IKVMed Lucene.NET unless you make your changes in the Java sources and compile them.

I think people can find more examples. Of course, none of them is a blocking issue, but the result is far from giving a .NET taste.
DIGY

On Fri, Jan 28, 2011 at 7:24 AM, Stefan Bodewig bode...@apache.org wrote:
On 2011-01-27, Granroth, Neal V. wrote:
Use of IKVM was discussed before.

I'm really sorry. Normally I wouldn't have brought it up without searching the archive - I did so in the context of "this is a question the people we hope to attract might ask". Please be patient with the new people we want to attract; they will not hunt down the mailing list archives for every idea they have. This is why putting things on the Wiki, like Scott has started, is a better approach. You can tell people "this was discussed before and URL-HERE is the outcome".

Adding this layer (or any other shim) on top of Lucene.NET is extremely unpalatable in the environment in which our products are deployed.

The license rules it out anyway (unless we ikvmc'ed Harmony, yet another can of worms), so this question is moot. But still, out of curiosity: is there any technical reason that turned it into a bad idea? The discussion from the other thread seemed to indicate that performance was not an issue.

Thanks
Stefan
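Strictly speaking, Java implements generics by type erasure. Type arguments are erased from the compiled code, so at runtime `ArrayList<Integer>` and `ArrayList<String>` share one class, and an IKVM-converted assembly ends up exposing raw, non-generic types. .NET keeps type arguments at runtime (reified generics), and that is what a native port preserves. The small C# sketch below (helper names are hypothetical) shows this property:

```csharp
using System;
using System.Collections.Generic;

// Illustrative helpers (hypothetical names). In .NET, type arguments are
// part of the runtime type, so they survive compilation and can be queried.
public static class ReifiedGenericsDemo
{
    // typeof(T) is legal here because T is known at runtime; the Java
    // equivalent (T.class) does not compile because of type erasure.
    public static Type ElementTypeOf<T>(IEnumerable<T> items) => typeof(T);

    // True only if both objects have exactly the same runtime type.
    public static bool SameRuntimeType(object a, object b) => a.GetType() == b.GetType();

    // Reads the generic type arguments back from an object's constructed type.
    public static Type[] TypeArgumentsOf(object o) =>
        o.GetType().IsGenericType ? o.GetType().GetGenericArguments() : Type.EmptyTypes;
}
```

Here `SameRuntimeType(new List<int>(), new List<string>())` is false. In Java, `new ArrayList<Integer>().getClass() == new ArrayList<String>().getClass()` evaluates to true.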
Re: IKVM (or rather OpenJDK) License Problem
* It's not possible to make custom changes in IKVMed Lucene.NET unless you make your changes in the Java sources and compile them.

Wouldn't a custom change contradict the goal of a line-by-line translation?

What I intended to say was customizations made by Lucene.Net users, not by the Lucene.Net project.
DIGY

On Fri, Jan 28, 2011 at 10:44 AM, Stefan Bodewig bode...@apache.org wrote:
Hi DIGY

On 2011-01-28, digy digy wrote:
* Java's bytecode doesn't contain metadata about generics, and when Java is compiled, all info about generics gets lost. So IKVMed Lucene.Net will have to live without generics.

Ah, yes, the joys of type erasure. I completely missed that.

* IKVM is, in fact, the Java world inside the .NET runtime. If you want, for example, to write an analyzer, you have to override the TokenStream method, which accepts a java.io.Reader instead of a System.IO.TextReader. So .NET people have to learn Java namespaces/classes and develop their own Java-compatible libraries.
* Since IKVM is a different world, remoting (for example) between native .NET code and IKVMed code is problematic (one uses java.rmi.server.UnicastRemoteObject, the other System.MarshalByRefObject).

Ugly, I agree. Although this could be meliorated by an additional .NET-centric library that took care of adapting the differences. This extra layer would add complexity and not help with performance, of course.

* It's not possible to make custom changes in IKVMed Lucene.NET unless you make your changes in the Java sources and compile them.

Wouldn't a custom change contradict the goal of a line-by-line translation?

Many thanks, I'll add your points to the wiki.
Stefan
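Stefan suggests that an additional .NET-centric library could adapt the differences. Below is a minimal sketch of one such adaptation. It assumes a hypothetical `JavaStyleReader` base class that mimics the `java.io.Reader` bulk-read contract and is not IKVM's real `java.io.Reader` type. The concrete mismatch it absorbs: `java.io.Reader.read(char[], int, int)` returns -1 at end of stream, while `System.IO.TextReader.Read(char[], int, int)` returns 0.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for the java.io.Reader bulk-read contract
// (returns -1 at end of stream). This is NOT IKVM's java.io.Reader type.
public abstract class JavaStyleReader
{
    public abstract int Read(char[] cbuf, int off, int len);
}

// Adapter exposing a .NET TextReader through the Java-style contract.
public sealed class TextReaderAdapter : JavaStyleReader
{
    private readonly TextReader inner;

    public TextReaderAdapter(TextReader inner)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override int Read(char[] cbuf, int off, int len)
    {
        if (len == 0) return 0;                 // zero-length request returns 0, not end of stream
        int n = inner.Read(cbuf, off, len);     // TextReader returns 0 at end of stream
        return n == 0 ? -1 : n;                 // Java-style callers expect -1 instead
    }
}

public static class ReaderDemo
{
    // Drains a reader the way Java code typically does: loop until -1.
    public static string ReadAll(JavaStyleReader reader)
    {
        var sb = new StringBuilder();
        var buf = new char[4];
        int n;
        while ((n = reader.Read(buf, 0, buf.Length)) != -1)
        {
            sb.Append(buf, 0, n);
        }
        return sb.ToString();
    }
}
```

Without the translation, a Java-style `while (... != -1)` loop run against the raw `TextReader` convention would never terminate, because it would keep receiving 0. This is the kind of small semantic gap such an extra layer has to cover, at some cost in complexity.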
Re: Build CI Considerations
No. It's Rune's work.
http://mail-archives.apache.org/mod_mbox/lucene-lucene-net-dev/200912.mbox/%3c4b1820f4.10...@gmail.com%3E
DIGY

On Thu, Jan 27, 2011 at 12:16 PM, Glyn Darkin g...@darkinsystems.com wrote:
The guys at CodeBetter run a TeamCity CI which has been building Lucene.Net for a while. I believe that DIGY set this up.
http://teamcity.codebetter.com/login.html
Glyn

On 27 Jan 2011, at 09:28, Stefan Bodewig wrote:
On 2011-01-26, Wyatt Barnett wrote:
2) CI: oh hells yeah. My vision would be to set up something where the automated conversion would be triggered by commits to the stable branch of the Java project. I think if we can construct this bit right, we can even really get down the road of automatically running all the conversion options until we get it right.

Sounds good. Back to the mundane, as you said later: the ASF runs a few options for CI (http://ci.apache.org/). One of them is Hudson (https://hudson.apache.org/hudson/), which has at least one Windows slave installation (Server 2008) and is supposed to support MSBuild. Buildbot might work as well.

I'm not up to speed with the state of xbuild, but adding support for it to Gump (which fills quite a different role from a traditional CI) wouldn't be too hard and would give us builds on Mono - albeit 2.4 right now; this could be changed by adding the Mono PPAs to the Ubuntu servers.

Stefan