from:"Grant Ingersoll"

Re: building Lucene from sources without Solr sources from svn ?

2011-03-28 Thread Grant Ingersoll

I would rather not rebuild for L-2996.  This issue has a known workaround.   As 
for the sources issue that Andi brought up, it never affected 3.1 b/c it 
doesn't have the validation stuff.

I'd like to stick w/ the artifacts we have.

On Mar 28, 2011, at 11:33 AM, Shai Erera wrote:

> If you're talking about LUCENE-2996, then note that I haven't checked in the 
> code yet. If you're going to rebuild the artifacts off of 
> branches/lucene_solr_3_1, I can check in the code there now.
> 
> Shai
> 
> On Mon, Mar 28, 2011 at 5:04 PM, Robert Muir  wrote:
> On Mon, Mar 28, 2011 at 10:54 AM, Uwe Schindler  wrote:
> > Hi,
> >
> > If we have to rebuild the artifacts, should we add Shai/Mike's
> > addIndexes() fix, too?
> >
> 
> 3.1 branch is fine with regard to this issue; that's why I raised my
> question... it seems only the 3.1 release branch was "fixed" for this
> but trunk and branch_3x are broken.
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
>

Re: building Lucene from sources without Solr sources from svn ?

2011-03-28 Thread Grant Ingersoll

I don't think it is lost; it's likely my mistake in adding a top-level 
common-build for the validation stuff.  We can change it out.

On Mar 28, 2011, at 8:27 AM, Robert Muir wrote:

> the real question is: why is this fixed only in the 3.1 branch?
> 
> how did our 3.1 branch and 3.x/trunk get so different? I don't like
> that any work done to get 3.1 out the door is going to be "lost" and
> have to be re-done.
> 
> 
> On Sun, Mar 27, 2011 at 5:26 PM, Andi Vajda  wrote:
>> 
>>  Hi,
>> 
>> It seems that at this time, the HEAD of branch_3x can no longer be
>> conveniently checked out with Lucene sources only.
>> 
>> If I check out
>> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/lucene
>> I'm not getting the common-build.xml file from the directory above, which
>> the Lucene one depends on. If I check out branch_3x, I get Solr sources as well.
>> 
>> Is this intentional or an oversight ?
>> 
>> Andi..
>> 
>> 
>> 
> 
> 




Re: Bug with addIndexes and deleteDocs in trunk and 3x?

2011-03-27 Thread Grant Ingersoll

So, I'd vote we document it in JIRA for 3.1.0 and then continue on with the 
current release artifacts.

-Grant

On Mar 27, 2011, at 9:48 AM, Michael McCandless wrote:

> I think the workaround is to subclass IW and insert your own flush...
> 
> Mike
> 
> http://blog.mikemccandless.com
> 
> On Sun, Mar 27, 2011 at 9:41 AM, Grant Ingersoll  wrote:
>> Is there a workaround?
>> 
>> 
>> On Mar 27, 2011, at 9:30 AM, Michael McCandless wrote:
>> 
>>> Indeed I think this is a real bug -- addIndexes(IR[]) should call
>>> flush(false, true), just like addIndexes(Dir[]) does.
>>> 
>>> Mike
>>> 
>>> http://blog.mikemccandless.com
>>> 
>>> On Sun, Mar 27, 2011 at 9:07 AM, Shai Erera  wrote:
>>>> Hi
>>>> 
>>>> One of our users stumbled upon what seems to be a bug in trunk (didn't
>>>> verify yet against 3x but I have a feeling it exists there as well). The
>>>> scenario is: you want to add an index into an existing index. Beforehand,
>>>> you want to delete all new docs from the existing index. These are the
>>>> operations that are performed:
>>>> 1) deleteDocuments(Term) for all the new documents
>>>> 2) addIndexes(IndexReader)
>>>> 3) commit
>>>> 
>>>> Strangely, it looks like the deleteDocs happens *after* addIndexes. Even
>>>> more strangely, if addIndexes(Directory) is called, the deletes are applied
>>>> *before* addIndexes. This user needs to use addIndexes(IndexReader) in order
>>>> to rewrite payloads using PayloadProcessorProvider. He reported this error
>>>> using a "3x" checkout which is before the RC branch (as he intends to use
>>>> 3.1). I wrote a short unit test that demonstrates this bug on trunk:
>>>> 
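>>>> {code}
>>>> import org.apache.lucene.analysis.MockAnalyzer;
>>>> import org.apache.lucene.document.Document;
>>>> import org.apache.lucene.document.Field;
>>>> import org.apache.lucene.document.Field.Index;
>>>> import org.apache.lucene.document.Field.Store;
>>>> import org.apache.lucene.index.IndexReader;
>>>> import org.apache.lucene.index.IndexWriter;
>>>> import org.apache.lucene.index.IndexWriterConfig;
>>>> import org.apache.lucene.index.Term;
>>>> import org.apache.lucene.store.Directory;
>>>> import org.apache.lucene.store.RAMDirectory;
>>>> import org.apache.lucene.util.Version;
>>>> 
>>>> public class AddIndexesDeletesTest { // illustrative class name
>>>> 
>>>>   // Creates an index holding one doc with id "myid"; returns the open writer.
>>>>   private static IndexWriter createIndex(Directory dir) throws Exception {
>>>>     IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_40,
>>>>         new MockAnalyzer());
>>>>     IndexWriter writer = new IndexWriter(dir, conf);
>>>>     Document doc = new Document();
>>>>     doc.add(new Field("id", "myid", Store.NO,
>>>>         Index.NOT_ANALYZED_NO_NORMS));
>>>>     writer.addDocument(doc);
>>>>     writer.commit();
>>>>     return writer;
>>>>   }
>>>> 
>>>>   public static void main(String[] args) throws Exception {
>>>>     // Create the first index
>>>>     Directory dir = new RAMDirectory();
>>>>     IndexWriter writer = createIndex(dir);
>>>> 
>>>>     // Create the second index (its writer is not needed afterwards)
>>>>     Directory dir1 = new RAMDirectory();
>>>>     createIndex(dir1).close();
>>>> 
>>>>     // Now delete the document, then add the second index
>>>>     writer.deleteDocuments(new Term("id", "myid"));
>>>>     IndexReader reader = IndexReader.open(dir1);
>>>>     writer.addIndexes(reader);
>>>>     // writer.addIndexes(dir1); // Directory variant: prints numDocs=1
>>>>     writer.commit();
>>>>     System.out.println("numDocs=" + writer.numDocs());
>>>>     reader.close();
>>>>     writer.close();
>>>>   }
>>>> }
>>>> {code}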
>>>> 
>>>> The test as it is prints "numDocs=0", while if you switch the addIndexes
>>>> calls, it prints 1 (which should be the correct answer).
>>>> 
>>>> Before I open an issue for this, I wanted to verify that it's indeed a bug
>>>> and I haven't missed anything in the expected behavior of these two
>>>> addIndexes. If indeed it's a bug, I think it should be a blocker for 3.1?
>>>> I'll also make a worthy junit test out of it.
>>>> 
>>>> BTW, the user, as an interim solution, extends IndexWriter and calls
>>>> flush() before the delete and addIndexes calls. It would be preferable if
>>>> this workaround could be avoided.
>>>> 
>>>> Shai
>>>> 
>>> 
>>> 
>> 
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com
>> 
>> 
>> 
>> 
> 
> 

--
Grant Ingersoll
http://www.lucidimagination.com
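
To make the ordering problem in this thread concrete, here is a toy model in plain Java. It is NOT Lucene's IndexWriter API: `ToyWriter`, `addIndexesFlushingFirst` and `addIndexesWithoutFlush` are hypothetical names, and the model assumes (a simplification) that buffered deletes are applied to every segment present at flush time. Under that assumption, adding a segment before the flush lets the buffered delete hit the newly added docs, reproducing the numDocs=0 vs numDocs=1 difference Shai reported; flushing first, as addIndexes(Dir[]) does and as the subclass-and-flush workaround does, avoids it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: illustrates why flushing before adding segments matters.
public class BufferedDeleteOrderDemo {

    static class ToyWriter {
        // Each "segment" is a list of document ids (hypothetical representation).
        private final List<List<String>> segments = new ArrayList<>();
        // Delete-by-id requests that have not been applied yet.
        private final Set<String> bufferedDeletes = new HashSet<>();

        void addSegment(List<String> ids) {
            segments.add(new ArrayList<>(ids));
        }

        void deleteDocuments(String id) {
            bufferedDeletes.add(id);
        }

        // Applies every buffered delete to every segment currently present.
        void flush() {
            for (List<String> segment : segments) {
                segment.removeAll(bufferedDeletes);
            }
            bufferedDeletes.clear();
        }

        // Models the reported addIndexes(Directory...) behaviour: flush first.
        void addIndexesFlushingFirst(List<String> ids) {
            flush();
            addSegment(ids);
        }

        // Models the reported addIndexes(IndexReader...) behaviour: no flush.
        void addIndexesWithoutFlush(List<String> ids) {
            addSegment(ids);
        }

        void commit() {
            flush();
        }

        int numDocs() {
            int n = 0;
            for (List<String> segment : segments) {
                n += segment.size();
            }
            return n;
        }
    }

    // Same sequence as the test in the thread: delete, addIndexes, commit.
    static int run(boolean flushFirst) {
        ToyWriter writer = new ToyWriter();
        writer.addSegment(List.of("myid"));   // existing index with one doc
        writer.deleteDocuments("myid");       // buffered, not applied yet
        if (flushFirst) {
            writer.addIndexesFlushingFirst(List.of("myid"));
        } else {
            writer.addIndexesWithoutFlush(List.of("myid"));
        }
        writer.commit();
        return writer.numDocs();
    }

    public static void main(String[] args) {
        System.out.println("without flush: numDocs=" + run(false));
        System.out.println("flush first:   numDocs=" + run(true));
    }
}
```

Lucene is meant to apply a buffered delete only to documents indexed before it, so treat this strictly as an illustration of why the missing flush changes the result, not as a description of Lucene internals.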



Re: Bug with addIndexes and deleteDocs in trunk and 3x?

2011-03-27 Thread Grant Ingersoll

Is there a workaround?  


On Mar 27, 2011, at 9:30 AM, Michael McCandless wrote:

> Indeed I think this is a real bug -- addIndexes(IR[]) should call
> flush(false, true), just like addIndexes(Dir[]) does.
> 
> Mike
> 
> http://blog.mikemccandless.com
> 
> On Sun, Mar 27, 2011 at 9:07 AM, Shai Erera  wrote:
>> Hi
>> 
>> One of our users stumbled upon what seems to be a bug in trunk (didn't
>> verify yet against 3x but I have a feeling it exists there as well). The
>> scenario is: you want to add an index into an existing index. Beforehand,
>> you want to delete all new docs from the existing index. These are the
>> operations that are performed:
>> 1) deleteDocuments(Term) for all the new documents
>> 2) addIndexes(IndexReader)
>> 3) commit
>> 
>> Strangely, it looks like the deleteDocs happens *after* addIndexes. Even
>> more strangely, if addIndexes(Directory) is called, the deletes are applied
>> *before* addIndexes. This user needs to use addIndexes(IndexReader) in order
>> to rewrite payloads using PayloadProcessorProvider. He reported this error
>> using a "3x" checkout which is before the RC branch (as he intends to use
>> 3.1). I wrote a short unit test that demonstrates this bug on trunk:
>> 
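>> {code}
>> import org.apache.lucene.analysis.MockAnalyzer;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.document.Field;
>> import org.apache.lucene.document.Field.Index;
>> import org.apache.lucene.document.Field.Store;
>> import org.apache.lucene.index.IndexReader;
>> import org.apache.lucene.index.IndexWriter;
>> import org.apache.lucene.index.IndexWriterConfig;
>> import org.apache.lucene.index.Term;
>> import org.apache.lucene.store.Directory;
>> import org.apache.lucene.store.RAMDirectory;
>> import org.apache.lucene.util.Version;
>> 
>> public class AddIndexesDeletesTest { // illustrative class name
>> 
>>   // Creates an index holding one doc with id "myid"; returns the open writer.
>>   private static IndexWriter createIndex(Directory dir) throws Exception {
>>     IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_40,
>>         new MockAnalyzer());
>>     IndexWriter writer = new IndexWriter(dir, conf);
>>     Document doc = new Document();
>>     doc.add(new Field("id", "myid", Store.NO,
>>         Index.NOT_ANALYZED_NO_NORMS));
>>     writer.addDocument(doc);
>>     writer.commit();
>>     return writer;
>>   }
>> 
>>   public static void main(String[] args) throws Exception {
>>     // Create the first index
>>     Directory dir = new RAMDirectory();
>>     IndexWriter writer = createIndex(dir);
>> 
>>     // Create the second index (its writer is not needed afterwards)
>>     Directory dir1 = new RAMDirectory();
>>     createIndex(dir1).close();
>> 
>>     // Now delete the document, then add the second index
>>     writer.deleteDocuments(new Term("id", "myid"));
>>     IndexReader reader = IndexReader.open(dir1);
>>     writer.addIndexes(reader);
>>     // writer.addIndexes(dir1); // Directory variant: prints numDocs=1
>>     writer.commit();
>>     System.out.println("numDocs=" + writer.numDocs());
>>     reader.close();
>>     writer.close();
>>   }
>> }
>> {code}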
>> 
>> The test as it is prints "numDocs=0", while if you switch the addIndexes
>> calls, it prints 1 (which should be the correct answer).
>> 
>> Before I open an issue for this, I wanted to verify that it's indeed a bug
>> and I haven't missed anything in the expected behavior of these two
>> addIndexes. If indeed it's a bug, I think it should be a blocker for 3.1?
>> I'll also make a worthy junit test out of it.
>> 
>> BTW, the user, as an interim solution, extends IndexWriter and calls
>> flush() before the delete and addIndexes calls. It would be preferable if
>> this workaround could be avoided.
>> 
>> Shai
>> 
> 
> 

--
Grant Ingersoll
http://www.lucidimagination.com



Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Grant Ingersoll


On Mar 26, 2011, at 9:03 PM, Chris Male wrote:

> Hi,
> 
> > It really should say: Added Geospatial Support, as it was non-existent in 
> > Solr before.
> > 
> > Most of the work for adding spatial to Solr consisted of improving things 
> > in Solr to make it easy to leverage the one spatial feature we really 
> > added: distance-based functions and parsing support.  Everything else 
> > consisted of generally useful things: sorting by function, poly fields, 
> > etc.  I started on tier support, but dropped it when I realized it was 
> > broken beyond repair.  The Solr stuff uses, IMO, the stuff in Lucene that 
> > works and ignores the rest.  I seem to recall Chris had said that once I 
> > got done w/ the Solr stuff he would do the modules work, but it hasn't 
> > happened yet.
> > 
> > I'd say that in 3.2, since it sounds like Chris did at least deprecate 
> > contrib/spatial, we work to get all of this resolved:  spatial -> 
> > modules, function queries -> modules.  Naturally we should do it on 
> > trunk, too.
> 
> Just note that I didn't not do it out of laziness.  Actually pushing stuff 
> into the module isn't easy, since there isn't much that can be saved from 
> contrib, Solr's spatial code is predominantly bound to function queries 
> (which are themselves very coupled to Solr), and there wasn't anything like 
> a consensus that they should be moved.

Agreed, it's not a small task.

3.1.0 Proposed Release Announcement(s)

2011-03-26 Thread Grant Ingersoll

Proposed Release Announcement (edits welcome).  Also note we can have ASF 
Marketing put out a press release if we want.


March 2011, Lucene 3.1 available
The Lucene PMC is pleased to announce the release of Apache Lucene 3.1 and 
Apache Solr 3.1. 

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The releases
are available for immediate download at
http://www.apache.org/dyn/closer.cgi/lucene/java (Lucene)
and http://www.apache.org/dyn/closer.cgi/lucene/solr (Solr).  See the
CHANGES.txt file included with each release for a full list of details.

Lucene 3.1 Release Highlights
* Improved Unicode support, including Unicode 4

* ReusableAnalyzerBase makes it easier to reuse TokenStreams correctly

* Protected words in stemming via KeywordAttribute

* ConstantScoreQuery now allows directly wrapping a Query

* Support for custom ExecutorService in ParallelMultiSearcher

* IndexWriterConfig.setMaxThreadStates to control how many threads may
  index documents concurrently in IndexWriter

* Numerous performance improvements: faster exact PhraseQuery;
  natural segment merging favors segments with deletions; primary
  key lookup is faster; IndexWriter.addIndexes(Directory[]) uses
  file copy instead of merging; BufferedIndexInput does fewer bounds
  checks; compound file is dynamically turned off for large
  segments; fully deleted segments are dropped on commit; faster
  snowball analyzers (in contrib); ConcurrentMergeScheduler is more
  careful about setting priority of merge threads.

* IndexWriter is now configured with a new separate builder API
  (IndexWriterConfig).

* IndexWriter.getReader is replaced by
  IndexReader.open(IndexWriter).  In addition you can now specify
  whether deletes should be resolved when you open an NRT reader.

* MultiSearcher is deprecated; ParallelMultiSearcher has been
  absorbed directly into IndexSearcher

* CharTermAttribute replaces TermAttribute in the Analysis process

* On 64-bit Windows and Solaris JVMs, MMapDirectory is now the
  default implementation (returned by FSDirectory.open).
  MMapDirectory also enables unmapping if the JVM supports it.

* New TotalHitCountCollector, which just counts the total number of hits

* ReaderFinishedListener API enables external caches to evict
  entries once a segment is finished

Solr 3.1 Release Highlights

* Added spatial filtering, boosting and sorting capabilities

* Added the extended dismax (edismax) query parser, which addresses some missing
features in the dismax query parser and adds some extensions

* Several more components now support distributed mode: TermsComponent, 
SpellCheckComponent

* Added an Auto Suggest component 

* Ability to sort by functions

* Support for adding documents using JSON format

* Leverages Lucene 3.1 and its inherent optimizations and bug fixes, as well 
as new analysis capabilities

* Numerous bug fixes and optimizations.



[VOTE] Lucene 3.1.0 RC3

2011-03-26 Thread Grant Ingersoll

Artifacts are at http://people.apache.org/~gsingers/staging_area/rc3/.  Please 
vote as you see appropriate.  Vote closes on March 29th.

I've also updated the Release To Do for both Lucene and Solr and it is 
hopefully a lot easier now to produce the artifacts as more of it is automated 
(including uploading to staging area).



Re: [solr] DataSource for HBase Tables?

2011-03-26 Thread Grant Ingersoll

Yes, if you are going to use the Data Import Handler, I would say that is the 
route to go.  You might also look at using an abstraction like Gora instead of 
having a dependency directly on HBase.


On Mar 25, 2011, at 4:32 PM, Sterk, Paul (Contractor) wrote:

> Hi,
>  
> I have a requirement to use Solr to import data from an HBase table and index 
> the contents – similar to importing data from an RDBMS.  It looks like I will 
> need to create an org.apache.solr.handler.dataimport.DataSource
> implementation for HBase to be used by the Data Import Handler.
>  
> Is this the correct approach?  If it is, has someone created a DataSource 
> implementation for HBase?
>  
> Paul
>  
>  

--
Grant Ingersoll
http://www.lucidimagination.com

Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Grant Ingersoll


On Mar 26, 2011, at 9:48 AM, Yonik Seeley wrote:

> On Sat, Mar 26, 2011 at 7:32 AM, Robert Muir (JIRA)  wrote:
>> I don't really think things like this (queries etc) should go into just Solr
> 
> I disagree strongly with the sentiment that queries don't belong in Solr.
> Everything developed in/for lucene need not be exported to Solr immediately.
> Everything developed in/for solr need not be exported to Lucene immediately.
> 
> If the work has been done, and the patch works for Solr, that should
> be enough.  Period.
> 

I agree it's enough for the contributor to do that, but as committers we need 
to look at the bigger picture in this particular case, which is the move of 
spatial to modules.



Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Grant Ingersoll


On Mar 26, 2011, at 8:24 AM, Robert Muir wrote:

> On Sat, Mar 26, 2011 at 8:06 AM, Grant Ingersoll  wrote:
>> Not really related to this issue, so moving to dev@...
>> 
>> On Mar 26, 2011, at 7:52 AM, Robert Muir (JIRA) wrote:
>> 
>>> 
>>>[ 
>>> https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011616#comment-13011616
>>>  ]
>>> 
>>> Robert Muir commented on SOLR-2155:
>>> ---
>>> 
>>> well what would the deprecation have suggested as an alternative?
>> 
>> It's a good question.  The tier stuff is, IMO (and as confirmed by
>> others), broken for most of the world.  I sunk a good week into fixing it
>> and was so entangled in the spaghetti that I gave up.  What we laid out on
>> another issue (I forget the number, but I think C Male owns it and says he
>> has a rewrite) is to move to modules, keep what we can (geohash and some of
>> the utils) and gut the rest.  That, combined w/ moving function queries to
>> modules, would make all of spatial a good solution for the large majority
>> of users.  The only thing that would remain to get back to our current
>> state (at least in terms of features) would be to implement a tier
>> approach.  I've proposed the Military Grid Reference System (there is an
>> open JIRA issue for it) as something that looks like a good candidate.
>> It's well documented on the web, uses metric units for all distances, and
>> has the benefit that all of NATO uses it, albeit for different purposes.
>> It also handles the poles and the meridians as first-class citizens.  It
>> just needs an implementer.  Having said that, I'm not 100% certain.  I also
>> don't know that the tier stuff is absolutely necessary.  The combination of
>> what we have in function queries plus trie fields makes for a very fast
>> spatial lookup at this point.
>> 
>> I'm totally open to other suggestions, however.
>> 
>> Longer term, I've got a lot of ideas for spatial, but that's a different 
>> thread.
>> 
> 
> I guess the reason I asked my question is more high-level: on one hand
> there are suggestions that lucene's spatial package should have been
> deprecated in 3.1, but on the other hand the very first feature on
> solr 3.1's new feature list is 'improved geospatial support'.
> 

It really should say: Added Geospatial Support, as it was non-existent in Solr 
before.

Most of the work for adding spatial to Solr consisted of improving things in 
Solr to make it easy to leverage the one spatial feature we really added: 
distance-based functions and parsing support.  Everything else consisted of 
generally useful things: sorting by function, poly fields, etc.  I started on 
tier support, but dropped it when I realized it was broken beyond repair.  The 
Solr stuff uses, IMO, the stuff in Lucene that works and ignores the rest.  I 
seem to recall Chris had said that once I got done w/ the Solr stuff he would 
do the modules work, but it hasn't happened yet.

I'd say that in 3.2, since it sounds like Chris did at least deprecate 
contrib/spatial, we work to get all of this resolved:  spatial -> modules, 
function queries -> modules.  Naturally we should do it on trunk, too.



Re: [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Grant Ingersoll

Not really related to this issue, so moving to dev@...

On Mar 26, 2011, at 7:52 AM, Robert Muir (JIRA) wrote:

> 
>[ 
> https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011616#comment-13011616
>  ] 
> 
> Robert Muir commented on SOLR-2155:
> ---
> 
> well what would the deprecation have suggested as an alternative?

It's a good question.  The tier stuff is, IMO (and as confirmed by others), 
broken for most of the world.  I sunk a good week into fixing it and was so 
entangled in the spaghetti that I gave up.  What we laid out on another issue 
(I forget the number, but I think C Male owns it and says he has a rewrite) is 
to move to modules, keep what we can (geohash and some of the utils) and gut 
the rest.  That, combined w/ moving function queries to modules, would make all 
of spatial a good solution for the large majority of users.  The only thing 
that would remain to get back to our current state (at least in terms of 
features) would be to implement a tier approach.  I've proposed the Military 
Grid Reference System (there is an open JIRA issue for it) as something that 
looks like a good candidate.  It's well documented on the web, uses metric 
units for all distances, and has the benefit that all of NATO uses it, albeit 
for different purposes.  It also handles the poles and the meridians as 
first-class citizens.  It just needs an implementer.  Having said that, I'm 
not 100% certain.  I also don't know that the tier stuff is absolutely 
necessary.  The combination of what we have in function queries plus trie 
fields makes for a very fast spatial lookup at this point.

I'm totally open to other suggestions, however.

Longer term, I've got a lot of ideas for spatial, but that's a different thread.

-Grant

[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-26 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011615#comment-13011615
 ] 

Grant Ingersoll commented on SOLR-2155:
---

Yeah, I agree.  I haven't looked at the patch yet.  It was my understanding 
that Chris Male was going to move lucene/contrib/spatial to modules and gut the 
broken stuff in it.  I think there is a separate issue open for that one.  
Presumably, once spatial and function queries are moved to modules, we 
will have a properly working spatial package.

I obviously can move it, but I don't have time to do the gutting (we really 
should have deprecated the tier stuff for this release).

> Geospatial search using geohash prefixes
> 
>
> Key: SOLR-2155
> URL: https://issues.apache.org/jira/browse/SOLR-2155
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: Grant Ingersoll
> Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
> GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on 
> documents that have a variable number of points.  This scenario occurs when 
> there is location extraction (e.g. via a "gazetteer") occurring on free text.  
> None, one, or many geospatial locations might be extracted from any given 
> document and users want to limit their search results to those occurring in a 
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr 
> with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
> earth.  Each successive character added further subdivides the box into a 4x8 
> (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
> step in this scheme is figuring out which geohash grid squares cover the 
> user's search query.  I've added various extra methods to GeoHashUtils (and 
> added tests) to assist in this purpose.  The next step is an actual Lucene 
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
> TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
> matching geohash grid is found, the points therein are compared against the 
> user's query to see if they match.  I created an abstraction GeoShape 
> extended by subclasses named PointDistance... and CartesianBox to support 
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
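
The geohash structure described in the issue can be illustrated with a small, self-contained encoder. This is a generic sketch of the standard base-32 geohash (longitude and latitude bits interleaved, longitude first, 5 bits per character), not Lucene's GeoHashUtils or the GeoHashPrefixFilter patch; the class and method names are hypothetical. `cellSize` shows why each extra character splits a cell into an 8x4 or 4x8 grid, and `inCell` is the plain prefix test that a prefix-based filter relies on.

```java
// Hypothetical sketch of standard geohash encoding; not Lucene's GeoHashUtils.
public class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encodes lat/lon into a geohash of `precision` characters by repeatedly
    // bisecting the longitude and latitude ranges (longitude first).
    static String encode(double lat, double lon, int precision) {
        double latMin = -90.0, latMax = 90.0;
        double lonMin = -180.0, lonMax = 180.0;
        StringBuilder hash = new StringBuilder(precision);
        boolean isLonBit = true; // bits alternate: lon, lat, lon, lat, ...
        int bitCount = 0;
        int ch = 0;
        while (hash.length() < precision) {
            ch <<= 1;
            if (isLonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch |= 1; lonMin = mid; } else { lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch |= 1; latMin = mid; } else { latMax = mid; }
            }
            isLonBit = !isLonBit;
            if (++bitCount == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bitCount = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    // Returns {latSpanDegrees, lonSpanDegrees} of a cell with `length` characters.
    static double[] cellSize(int length) {
        int totalBits = 5 * length;
        int lonBits = (totalBits + 1) / 2; // longitude gets the extra bit when odd
        int latBits = totalBits / 2;
        return new double[] {180.0 / (1L << latBits), 360.0 / (1L << lonBits)};
    }

    // A point lies in a grid cell iff its (at least as long) hash starts with the cell's hash.
    static boolean inCell(String pointHash, String cellHash) {
        return pointHash.startsWith(cellHash);
    }

    public static void main(String[] args) {
        String h = encode(57.64911, 10.40744, 11);
        System.out.println(h);
        for (int len = 1; len <= 3; len++) {
            double[] s = cellSize(len);
            System.out.println(len + " chars: " + s[0] + " deg lat x " + s[1] + " deg lon");
        }
        System.out.println(inCell(h, encode(57.64911, 10.40744, 5)));
    }
}
```

Going from one to two characters adds 2 longitude bits and 3 latitude bits (a 4x8 split); going from two to three adds 3 and 2 (an 8x4 split), matching the even/odd alternation described in the issue.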


RC Status

2011-03-26 Thread Grant Ingersoll

I ran into a few kinks w/ signing artifacts (it wasn't finding the Maven 
artifacts) in Solr and am fixing them.  Once that goes through, I will upload 
an RC.

[jira] [Commented] (SOLR-236) Field collapsing

2011-03-26 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011603#comment-13011603
 ] 

Grant Ingersoll commented on SOLR-236:
--

Keep in mind that an alternative approach, which scales but loses some 
attributes of this patch (total groups, for instance), is committed on trunk 
and will likely be backported to 3.2.

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
>Assignee: Shalin Shekhar Mangar
> Fix For: Next
>
> Attachments: DocSetScoreCollector.java, 
> NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, 
> SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, 
> SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, 
> SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, 
> SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
> SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, 
> collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, 
> collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-5.patch, field-collapse-5.patch, 
> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> quasidistributed.additional.patch, solr-236.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)
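
The two collapse.type modes described above can be sketched over an already-ranked result list. This is a hypothetical illustration, not the SOLR-236 patch: the `Hit` record and the method names are invented here, and the sketch assumes that "normal" keeps at most collapse.max hits per field value across the whole list, while "adjacent" only limits runs of consecutive hits that share a value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of field collapsing over a ranked list; not the SOLR-236 code.
public class FieldCollapseSketch {
    // A ranked hit: document id plus the value of the collapse field.
    record Hit(String id, String fieldValue) {}

    // "normal": keep at most `max` hits per field value over the whole list.
    static List<Hit> collapseNormal(List<Hit> ranked, int max) {
        Map<String, Integer> counts = new HashMap<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : ranked) {
            int seen = counts.merge(hit.fieldValue(), 1, Integer::sum);
            if (seen <= max) {
                kept.add(hit);
            }
        }
        return kept;
    }

    // "adjacent": keep at most `max` hits from each run of consecutive hits
    // sharing a field value; a later run with the same value starts over.
    static List<Hit> collapseAdjacent(List<Hit> ranked, int max) {
        List<Hit> kept = new ArrayList<>();
        String previous = null;
        int runLength = 0;
        for (Hit hit : ranked) {
            if (hit.fieldValue().equals(previous)) {
                runLength++;
            } else {
                previous = hit.fieldValue();
                runLength = 1;
            }
            if (runLength <= max) {
                kept.add(hit);
            }
        }
        return kept;
    }

    static List<String> ids(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit hit : hits) {
            out.add(hit.id());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> ranked = List.of(
            new Hit("a", "site1"), new Hit("b", "site1"), new Hit("c", "site2"),
            new Hit("d", "site1"), new Hit("e", "site1"));
        System.out.println("normal:   " + ids(collapseNormal(ranked, 1)));   // [a, c]
        System.out.println("adjacent: " + ids(collapseAdjacent(ranked, 1))); // [a, c, d]
    }
}
```

Paging and group totals (the attributes Grant mentions above) are outside the scope of this sketch.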


[jira] [Resolved] (LUCENE-2992) Changes.html is not generated for an svn export of docs

2011-03-25 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved LUCENE-2992.
-

   Resolution: Fixed
Fix Version/s: 4.0
               3.2
               3.1

> Changes.html is not generated for an svn export of docs
> ---
>
> Key: LUCENE-2992
> URL: https://issues.apache.org/jira/browse/LUCENE-2992
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.1, 3.2, 4.0
>    Reporter: Grant Ingersoll
>    Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1, 3.2, 4.0
>
> Attachments: LUCENE-2992.patch
>
>
> When we svn-export for release, the index.html at the top level expects 
> Changes.html in the docs, which is generated, so we should create it.


Re: [VOTE] Release Lucene/Solr 3.1

2011-03-25 Thread Grant Ingersoll

On Mar 25, 2011, at 11:15 AM, Robert Muir wrote:

> On Fri, Mar 25, 2011 at 11:11 AM, Grant Ingersoll  wrote:
> 
>> No, as Hoss pointed out, it's broken now w/o the ide configurator!
> 
> Right, but my original suggestion (include dev-tools in the solr
> release, because it's the whole trunk) will fix that.
> Alternatively we could remove the mention of dev-tools from the
> README.txt file anyway; it's duplicated from HowToContribute, which the
> README.txt links to already.
> 

OK.

> Lucene wouldn't have any refs to dev-tools, so how is it broken by not
> including dev-tools?
> 

You are correct, it is not.  I was just commenting on all the artifacts.

[jira] [Updated] (LUCENE-2992) Changes.html is not generated for an svn export of docs

2011-03-25 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2992:


Attachment: LUCENE-2992.patch

generates the CHANGES.html as part of svn-export.

> Changes.html is not generated for an svn export of docs
> ---
>
> Key: LUCENE-2992
> URL: https://issues.apache.org/jira/browse/LUCENE-2992
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.1, 3.2, 4.0
>    Reporter: Grant Ingersoll
>    Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2992.patch
>
>
> When we svn-export for release, the index.html at the top level expects 
> Changes.html in the docs, which is generated, so we should create it.

[jira] [Assigned] (LUCENE-2992) Changes.html is not generated for an svn export of docs

2011-03-25 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned LUCENE-2992:
---

Assignee: Grant Ingersoll

> Changes.html is not generated for an svn export of docs
> ---
>
> Key: LUCENE-2992
> URL: https://issues.apache.org/jira/browse/LUCENE-2992
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.1, 3.2, 4.0
>    Reporter: Grant Ingersoll
>    Assignee: Grant Ingersoll
>Priority: Minor
>
> When we svn-export for release, the index.html at the top level expects 
> Changes.html in the docs, which is generated, so we should create it.

Re: [VOTE] Release Lucene/Solr 3.1

2011-03-25 Thread Grant Ingersoll

On Mar 25, 2011, at 11:07 AM, Robert Muir wrote:

> On Fri, Mar 25, 2011 at 11:04 AM, Grant Ingersoll  wrote:
> 
>>> 
>>> So I don't think this is useful: dev-tools is for developers,
>> 
> 
> So now it's a broken build system if it DOESN'T include a working
> ide-configurator? This is what I meant by slippery slope.

No, as Hoss pointed out, it's broken now w/o the ide configurator!

Re: [VOTE] Release Lucene/Solr 3.1

2011-03-25 Thread Grant Ingersoll


On Mar 25, 2011, at 10:57 AM, Robert Muir wrote:
>> 
> 
> This is becoming a slippery slope fast... Uwe's perspective is
> starting to become much more attractive.

And what is that?  I've yet to see it written down.

Re: [VOTE] Release Lucene/Solr 3.1

2011-03-25 Thread Grant Ingersoll


On Mar 25, 2011, at 10:57 AM, Robert Muir wrote:

> On Fri, Mar 25, 2011 at 10:45 AM, Grant Ingersoll  wrote:
> 
>> I do think we need standalone artifacts.  So, I suppose if we do that, then 
>> we can't just svn export, b/c we would need to separate dev tools per 
>> project.  But, then again, why can't we have:
>> /dev-tools/
>> /lucene/dev-tools
>> /solr/dev-tools
>> 
>> The top level just creates an IDE configuration that includes the lower
>> ones, but the lower ones can each be standalone. (This goes for the Maven stuff too.)
>> 
>> I realize, of course, this is work, so my suggestion would be we do 3.1 w/ 
>> it included as is and then fix in the next release.
>> 
> 
> I would be against this. Currently, to fix eclipse I just copy the
> .classpath file to /dev-tools/eclipse/dot.classpath and commit. This
> makes it significantly harder.
> Additionally I don't see how this could possibly work: a "standalone"
> solr would use lucene jar files since it doesn't include the lucene
> source.
> Because of this, a "top-level" dev-tools eclipse configuration would
> not be the composition of lucene+solr, instead it would be a totally
> different thing.

Solr would just include the whole tree.  Lucene could then just deliver Lucene.

> 
> So I don't think this is useful: dev-tools is for developers,

Right.  People who take the source are developers, no?  As it is now, we ship 
them a broken build system.

> and
> developers are all using the big /trunk checkout, so we don't need
> dev-tools at a lower level, for no good reason.
> 
> Honestly I could care less about making it easy for someone to
> configure lucene or solr by itself in their IDE. I did the eclipse
> work (for example) to make it easier for people to contribute to
> lucene/solr, I could care less about making it easier for people to
> configure their "own private copies" of lucene or solr easier, and I'm
> definitely not going to let it make it *harder* on us to support
> contributions (the top-level /dev-tools).
> 

Yes, but isn't that how people first start making contributions: by taking the
source from a release and working on it?  Isn't that the point of the src
release (other than the fact that the ASF requires it)?

> This is becoming a slippery slope fast... Uwe's perspective is
> starting to become much more attractive.
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search


[jira] [Created] (LUCENE-2992) Changes.html is not generated for an svn export of docs

2011-03-25 Thread Grant Ingersoll (JIRA)

Changes.html is not generated for an svn export of docs
---

 Key: LUCENE-2992
 URL: https://issues.apache.org/jira/browse/LUCENE-2992
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.1, 3.2, 4.0
Reporter: Grant Ingersoll
Priority: Minor


When we svn-export for release, the index.html at the top level expects 
Changes.html in the docs, which is generated, so we should create it.

Re: [VOTE] Release Lucene/Solr 3.1

2011-03-25 Thread Grant Ingersoll

On Mar 25, 2011, at 10:27 AM, Robert Muir wrote:

> On Fri, Mar 25, 2011 at 10:18 AM, Mark Miller  wrote:
>> Well, actually I think we should just make it completely unsupported. These 
>> are our dev tools - don't count on them for crap. No reason to exclude them 
>> from the src IMO.
>> 
> 
> For the solr release, I think I could be ok with that (my concerns are
> more that later someone will say, how did this eclipse stuff etc slip
> into the release?). I know some people hesitated to add support for
> IDEs for this reason, I was for it as I want to make contributions
> easier, but I don't want us to look at it as making releasing harder.
> 

+1

> For the lucene release, I'm definitely against it: nothing in there
> will work at all because the lucene release doesn't include the solr
> bits. I know it's been mentioned in this thread that maybe we should
> look at a single source artifact for everything, I don't think we
> should do this either.

I do think we need standalone artifacts.  So, I suppose if we do that, then we 
can't just svn export, b/c we would need to separate dev tools per project.  
But, then again, why can't we have:
/dev-tools/
/lucene/dev-tools
/solr/dev-tools

The top level just creates an IDE configuration that includes the lower ones, but
the lower ones can each be standalone. (This goes for the Maven stuff too.)

I realize, of course, this is work, so my suggestion would be we do 3.1 w/ it 
included as is and then fix in the next release.

> 
> I think it's important that lucene stays a standalone search engine
> library from the artifact point of view, even if our development is in
> sync with solr.
> 

I agree.

Re: [VOTE] Release Lucene/Solr 3.1

2011-03-25 Thread Grant Ingersoll


On Mar 24, 2011, at 5:27 PM, Uwe Schindler wrote:

> OK, let's vote. My vote: -1

Care to say why?  Standard practice for a -1 is to say why you don't want it so 
that it might be possible to address the concerns you have.


[jira] [Assigned] (SOLR-2155) Geospatial search using geohash prefixes

2011-03-25 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-2155:
-

Assignee: Grant Ingersoll

> Geospatial search using geohash prefixes
> 
>
> Key: SOLR-2155
> URL: https://issues.apache.org/jira/browse/SOLR-2155
> Project: Solr
>  Issue Type: Improvement
>Reporter: David Smiley
>    Assignee: Grant Ingersoll
> Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, 
> GeoHashPrefixFilter.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on 
> documents that have a variable number of points.  This scenario occurs when 
> there is location extraction (e.g. via a "gazetteer") occurring on free text.  
> None, one, or many geospatial locations might be extracted from any given 
> document and users want to limit their search results to those occurring in a 
> user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr 
> with a geohash prefix based filter.  A geohash refers to a lat-lon box on the 
> earth.  Each successive character added further subdivides the box into a 4x8 
> (or 8x4 depending on the even/odd length of the geohash) grid.  The first 
> step in this scheme is figuring out which geohash grid squares cover the 
> user's search query.  I've added various extra methods to GeoHashUtils (and 
> added tests) to assist in this purpose.  The next step is an actual Lucene 
> Filter, GeoHashPrefixFilter, that uses these geohash prefixes in 
> TermsEnum.seek() to skip to relevant grid squares in the index.  Once a 
> matching geohash grid is found, the points therein are compared against the 
> user's query to see if they match.  I created an abstraction GeoShape 
> extended by subclasses named PointDistance... and CartesianBox to support 
> different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
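To make the geohash scheme described above concrete, here is a minimal Java sketch of geohash encoding plus the prefix test. It is not Lucene's GeoHashUtils or the GeoHashPrefixFilter from the patch: the class and method names are hypothetical, a linear prefix scan stands in for the TermsEnum seeking the real filter does, and it stops before the final per-point comparison against the query shape. The `subdivision` helper shows why each added character alternates between an 8x4 and a 4x8 split (longitude columns by latitude rows).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of geohash encoding and prefix matching (not Lucene's GeoHashUtils). */
public class GeohashPrefixSketch {
    // Standard geohash base32 alphabet (omits a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Even bits halve the longitude range, odd bits the latitude range; 5 bits per character. */
    static String encode(double lat, double lon, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true;
        int bitsInChar = 0;
        int charValue = 0;
        while (hash.length() < precision) {
            double[] range = lonBit ? lonRange : latRange;
            double value = lonBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            charValue <<= 1;
            if (value >= mid) {
                charValue |= 1;   // point is in the upper half
                range[0] = mid;
            } else {
                range[1] = mid;   // point is in the lower half
            }
            lonBit = !lonBit;
            if (++bitsInChar == 5) {
                hash.append(BASE32.charAt(charValue));
                bitsInChar = 0;
                charValue = 0;
            }
        }
        return hash.toString();
    }

    /** Grid (columns x rows) that the n-th character, 1-based, splits its parent cell into. */
    static String subdivision(int n) {
        int lonBits = (n % 2 == 1) ? 3 : 2; // odd positions start on a longitude bit
        int latBits = 5 - lonBits;
        return (1 << lonBits) + "x" + (1 << latBits);
    }

    /** Candidate test: does the point's hash fall under any covering grid-square prefix? */
    static boolean matchesAnyPrefix(String pointHash, List<String> coveringPrefixes) {
        for (String prefix : coveringPrefixes) {
            if (pointHash.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String hash = encode(57.64911, 10.40744, 5);
        System.out.println(hash);                                        // u4pru
        System.out.println(subdivision(1) + " then " + subdivision(2));  // 8x4 then 4x8
        System.out.println(matchesAnyPrefix(hash, Arrays.asList("u4p", "u4r"))); // true
    }
}
```

Points that pass the prefix test would still need the exact shape check the description mentions; the prefixes only narrow the candidates.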

Re: [VOTE] Release Lucene/Solr 3.1

2011-03-25 Thread Grant Ingersoll



On Mar 24, 2011, at 5:27 PM, Uwe Schindler wrote:

> OK, let's vote. My vote: -1

Mine is +1 as long as we mark it as experimental.

> 
> If we respin, can we commit the last changes from branch 3.x that are bug fixes?

I would think so.

> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
>> -Original Message-
>> From: Grant Ingersoll [mailto:gsing...@apache.org]
>> Sent: Thursday, March 24, 2011 10:09 PM
>> To: dev@lucene.apache.org
>> Subject: Re: [VOTE] Release Lucene/Solr 3.1
>> 
>> So, my sense is here that we should fix these minor documentation issues
>> and decide on dev-tools and spin a new RC and get this sucker out the
>> door.
>> I think I have some time tomorrow, I can generate the artifacts.
>> 
>> Shall we vote on inclusion of dev-tools?
>> 
>> -Grant
>> 
>> 

Re: [VOTE] Release Lucene/Solr 3.1

2011-03-24 Thread Grant Ingersoll

So, my sense is here that we should fix these minor documentation issues and 
decide on dev-tools and spin a new RC and get this sucker out the door.  I 
think I have some time tomorrow, I can generate the artifacts.  

Shall we vote on inclusion of dev-tools?

-Grant


[jira] [Commented] (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-24 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010866#comment-13010866
 ] 

Grant Ingersoll commented on LUCENE-2952:
-

I'll fix it, Doron.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>    Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.

Re: [VOTE] Release Lucene/Solr 3.1

2011-03-24 Thread Grant Ingersoll


On Mar 23, 2011, at 6:14 PM, Chris Hostetter wrote:

> 
> : Please vote to release the artifacts at
> : http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2
> 
> -0
> 
> I can't in good conscience vote for these artifacts.
> 
> For the most part, there are only a few minor hiccups -- but the big 
> blocker (in my opinion) is that since RC1, dev-tools has been removed from 
> the solr src packages and this causes the top level build.xml (and 
> instructions for IDE users in the top level README.txt file) to be broken.
> 
> My detailed notes below...
> 
> ##
> ### apache-solr-3.1.0-src.tgz
> 
> dev-tools isn't in here -- this totally boggles my mind, particularly 
> since there was a deliberate and conscious switch to make the source 
> releases match what you get when doing an "svn export"
> 
> because dev-tools is missing, 3 of the top level ant targets advertised 
> using "ant -p" don't work; including 'ant idea' and 'ant eclipse' which 
> are also explicitly mentioned in the top level README.txt as how people 
> using those IDEs should get started developing the code.
> 
> This seems like a major issue to me.   

Yeah, I really don't get why we can't include them in the source release
either.

> 
> we're setting ourselves up to make the release look completely broken 
> right out of the gate for anyone using one of those IDEs.
> 
> I asked about this on IRC.  Yonik & Ryan indicated that a couple of folks had 
> said they would veto any release with dev-tools in it because that stuff 
> is supposed to be "unsupported" ... this makes no sense to me as we have 
> lots of places in the code base where things are documented as being 
> experimental, subject to change, and/or for developer use only.  I don't 
> really see how dev-tools should be any different.
> 
> If there is really such violent opposition to including dev-tools in src 
> releases, then the top level build.xml should not depend on it, and the 
> top level README.txt should not refer to it (except maybe with something 
> like "people interested in hacking on the src should use svn, which 
> includes some unofficial 'dev-tools'").
> ---
> 
> Now that the src packages are driven by svn exports, more files exist than 
> were in RC1, and some of the changes we made to the solr/README.txt based 
> on the earlier release candidates are misleading.  
> 
> In particular a lot of things are listed as being in the "docs" directory 
> of a binary distribution, but those files *do* exist in the src packages 
> -- if you look in the "site" directory.  This seems silly, but at no point 
> is the README.txt factually incorrect, so I guess it's not a big enough 
> deal to worry about.
> 
> ---
> 
> running all tests, running the example, and building the javadocs all 
> worked fine.
> 
> ##
> ### apache-solr-3.1.0.tgz
> 
> docs look good, basic example usage works fine.
> 
> ##
> ### apache-solr-3.1.0.zip
> 
> Diffing the contents of apache-solr-3.1.0.tgz with apache-solr-3.1.0.zip 
> (using "diff --ignore-all-space --strip-trailing-cr -r") turned up quite 
> a few instances where the CRLF fixing in build.xml seems to have 
> corrupted some non-ASCII characters in a few files:
> 
> contrib/dataimporthandler/lib/activation-LICENSE.txt 
> contrib/dataimporthandler/lib/mail-LICENSE.txt
> docs/skin/CommonMessages_de.xml
> docs/skin/CommonMessages_es.xml
> docs/skin/CommonMessages_fr.xml
> example/solr/conf/velocity/facet_dates.vm
> 
> ...but these changes don't seem to have substantively harmed the files.
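The check above boils down to this: once line endings are normalized, the tgz and zip copies of a file should be byte-identical, so any remaining difference means the CRLF fixing changed something else, such as the bytes of non-ASCII characters. A minimal Java sketch of that comparison, assuming both copies are already loaded as byte arrays; the class and method names are hypothetical, and unlike `diff --ignore-all-space --strip-trailing-cr` it drops every carriage return and does not ignore other whitespace.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: do two copies of a file differ in anything but CR characters? */
public class LineEndingDiffSketch {

    /** Removes every carriage return byte, roughly what --strip-trailing-cr normalizes away. */
    static byte[] stripCarriageReturns(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (byte b : data) {
            if (b != '\r') {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    /** True if the copies still differ once CRs are gone (e.g. re-encoded non-ASCII bytes). */
    static boolean differsBeyondLineEndings(byte[] a, byte[] b) {
        return !Arrays.equals(stripCarriageReturns(a), stripCarriageReturns(b));
    }

    /** True if the data contains any byte outside 7-bit ASCII. */
    static boolean hasNonAscii(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] tgzCopy = "caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        byte[] zipCopyOk = "caf\u00e9\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] zipCopyBad = "caf\u00e9\r\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(differsBeyondLineEndings(tgzCopy, zipCopyOk));  // false
        System.out.println(differsBeyondLineEndings(tgzCopy, zipCopyBad)); // true
    }
}
```

Restricting the report to files where `hasNonAscii` is true would narrow it to likely encoding damage rather than other edits.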
> 
> ##
> ### lucene-3.1.0-src.tar.gz
> 
> tests and javadocs worked fine.
> 
> ##
> ### lucene-3.1.0.tar.gz
> 
> docs look good, demo runs fine.
> 
> ##
> ### lucene-3.1.0.zip
> 
> no differences found with lucene-3.1.0.tar.gz
> 
> 
> 
> 
> 
> -Hoss
> 

Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

2011-03-23 Thread Grant Ingersoll



On Mar 23, 2011, at 9:20 AM, Dawid Weiss wrote:

> Sure, I'll change it. Can I alter branch_3x too?

It's fine to change 3_x; the 3.1 release is on lucene_solr_3_1 (or something 
similar).  This way it will be on by default in 3.2.

-Grant

> Don't know what the
> policy is after the RCs have been published.
> 
> Dawid
> 
> On Wed, Mar 23, 2011 at 2:07 PM, Grant Ingersoll  wrote:
>> Hey Dawid,
>> 
>> Thanks for doing this.  It would be good, too, if we no longer had to pass 
>> in -Dsolr.clustering.enabled=true as there is no reason why we can't just 
>> have it on like the other components.
>> 
>> -Grant
>> 
>> On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote:
>> 
>>> Author: dweiss
>>> Date: Tue Mar 22 20:44:21 2011
>>> New Revision: 1084345
>>> 
>>> URL: http://svn.apache.org/viewvc?rev=1084345&view=rev
>>> Log:
>>> Removing the note about excluded JARs (everything is included).
>>> 
>>> Modified:
>>>lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
>>> 
>>> Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
>>> URL: 
>>> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff
>>> ==
>>> --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original)
>>> +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 20:44:21 2011
>>> @@ -1183,12 +1183,10 @@
>>> 
>>>http://wiki.apache.org/solr/ClusteringComponent
>>> 
>>> -   This relies on third party jars which are notincluded in the
>>> -   release.  To use this component (and the "/clustering" handler)
>>> -   Those jars will need to be downloaded, and you'll need to set
>>> -   the solr.cluster.enabled system property when running solr...
>>> +   You'll need to set the solr.cluster.enabled system property
>>> +   when running solr to run with clustering enabled:
>>> 
>>> -  java -Dsolr.clustering.enabled=true -jar start.jar
>>> +   java -Dsolr.clustering.enabled=true -jar start.jar
>>> -->
>>>   enable="${solr.clustering.enabled:false}"
>>> 
>>> 
>> 
>> 
>> 

Re: svn commit: r1084345 - /lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

2011-03-23 Thread Grant Ingersoll

Hey Dawid,

Thanks for doing this.  It would be good, too, if we no longer had to pass in 
-Dsolr.clustering.enabled=true as there is no reason why we can't just have it 
on like the other components.

-Grant

On Mar 22, 2011, at 4:44 PM, dwe...@apache.org wrote:

> Author: dweiss
> Date: Tue Mar 22 20:44:21 2011
> New Revision: 1084345
> 
> URL: http://svn.apache.org/viewvc?rev=1084345&view=rev
> Log:
> Removing the note about excluded JARs (everything is included).
> 
> Modified:
>lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
> 
> Modified: lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml
> URL: 
> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml?rev=1084345&r1=1084344&r2=1084345&view=diff
> ==
> --- lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml (original)
> +++ lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml Tue Mar 22 20:44:21 2011
> @@ -1183,12 +1183,10 @@
> 
>http://wiki.apache.org/solr/ClusteringComponent
> 
> -   This relies on third party jars which are notincluded in the
> -   release.  To use this component (and the "/clustering" handler)
> -   Those jars will need to be downloaded, and you'll need to set
> -   the solr.cluster.enabled system property when running solr...
> +   You'll need to set the solr.cluster.enabled system property 
> +   when running solr to run with clustering enabled:
> 
> -  java -Dsolr.clustering.enabled=true -jar start.jar
> +   java -Dsolr.clustering.enabled=true -jar start.jar
> -->
>   enable="${solr.clustering.enabled:false}"
> 
> 
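The `enable="${solr.clustering.enabled:false}"` attribute in the diff relies on Solr substituting `${name:default}` placeholders from system properties, using the text after the colon when the property is unset. The attribute reads `solr.clustering.enabled`, which is exactly what `-Dsolr.clustering.enabled=true` sets. A minimal Java sketch of that style of substitution, offered only as an illustration of the `${name:default}` syntax; the class and method names are hypothetical, and this is not Solr's own implementation.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of ${name:default} substitution; not Solr's own implementation. */
public class PropertySubstitutionSketch {
    // ${name} or ${name:default}; the name may not contain '}' or ':'.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String substitute(String value, Properties props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2); // null when there is no ":default" part
            String resolved = props.getProperty(name, fallback);
            if (resolved == null) {
                throw new IllegalArgumentException("No value for property: " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String attr = "${solr.clustering.enabled:false}";
        Properties props = new Properties();                   // a real JVM would use System.getProperties()
        System.out.println(substitute(attr, props));           // false
        props.setProperty("solr.clustering.enabled", "true");  // like -Dsolr.clustering.enabled=true
        System.out.println(substitute(attr, props));           // true
    }
}
```

Making clustering on by default, as suggested above, would amount to changing the default after the colon rather than requiring the `-D` flag.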



Re: [VOTE] Release Lucene/Solr 3.1

2011-03-22 Thread Grant Ingersoll

Overall, things look good to me. 

As discussed on IRC, one minor nit:
1. In the source bundle, the Changes.html is missing and so index.html has dead 
links.  I know Changes.html is generated.  We could just hook this into the svn 
export target and then I think the docs would be whole.
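A rough way to catch dead links like this before a vote is to scan index.html for relative hrefs and check that each target exists in the exported docs. The sketch below is hypothetical and not part of the Lucene build; it takes the page text and the set of exported file paths so it can run without a filesystem, and the `changes/Changes.html` path in the example is an assumed location.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical dead-link check for a docs page; not part of the Lucene build. */
public class DeadLinkSketch {
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Returns relative link targets in html that are missing from existingFiles. */
    static List<String> findDeadLinks(String html, Set<String> existingFiles) {
        List<String> dead = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String target = m.group(1);
            // Skip absolute URLs, mail links and in-page anchors.
            if (target.contains("://") || target.startsWith("mailto:") || target.startsWith("#")) {
                continue;
            }
            int anchor = target.indexOf('#');
            if (anchor >= 0) {
                target = target.substring(0, anchor); // drop the fragment
            }
            if (!existingFiles.contains(target)) {
                dead.add(target);
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        String indexHtml = "<a href=\"changes/Changes.html\">Changes</a>"
                + " <a href=\"api/index.html\">API</a>"
                + " <a href=\"http://lucene.apache.org/\">Site</a>";
        Set<String> exported = new HashSet<>(Arrays.asList("index.html", "api/index.html"));
        System.out.println(findDeadLinks(indexHtml, exported)); // [changes/Changes.html]
    }
}
```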

I guess I'd say +1 at this point.  Sigs look good, examples look good for both 
Solr and Lucene.  Maven artifacts look reasonable at a glance.

-Grant

On Mar 22, 2011, at 10:21 AM, Yonik Seeley wrote:

> Please vote to release the artifacts at
> http://people.apache.org/~yonik/staging_area/lucene-solr-3.1RC2
> as Lucene 3.1 and Solr 3.1
> 
> Thanks for everyone's help pulling all this together!
> 
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
> 

[jira] [Created] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs

2011-03-22 Thread Grant Ingersoll (JIRA)

Review and potentially remove unused/unsupported Contribs
-

 Key: LUCENE-2981
 URL: https://issues.apache.org/jira/browse/LUCENE-2981
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
 Fix For: 3.2, 4.0


Some of our contribs appear to lack development/support or are missing
tests.  We should review whether they are even pertinent these days and 
potentially deprecate and remove them.

One of the things we did in Mahout when bringing in Colt code was to mark all 
code that didn't have tests as @deprecated, and then we removed the deprecation 
once tests were added.  Code that didn't get tests over a period of about six
months was removed.

I would suggest taking a hard look at:
ant
db
lucli
swing

(spatial should be gutted to some extent and moved to modules)

[jira] [Commented] (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-21 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009169#comment-13009169
 ] 

Grant Ingersoll commented on LUCENE-2952:
-

Third time is the charm.  I don't really care where it lives and it sounds like 
tools makes sense.  Not sure why I didn't notice that sooner.  I'll take care 
of it later today.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.

Re: [jira] Resolved: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-19 Thread Grant Ingersoll


On Mar 18, 2011, at 9:55 PM, Chris Hostetter wrote:

> 
> Just out of curiosity, I notice that it seems to be warning if the number 
> of NOTICE files doesn't match the number of jars, but not failing (since 
> not every jar requires a NOTICE file).
> 
> Isn't this something that could be simplified by requiring that every jar 
> have a NOTICE file, and if the jar's license doesn't require a NOTICE, 
> then that file can just be blank?  (Or make the code verify that every jar 
> has either a NOTICE file or an empty NO_NOTICE_NEEDED file?)

I could drop that warning, as I do have explicit NOTICE checking in place.  I 
kept it b/c I am not 100% sure yet on what needs a NOTICE file and what 
doesn't, so I wanted people to keep an eye out for it.

I don't think we need to clutter the dirs with NO_NOTICE_NEEDED files.  
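The two approaches differ only in how a missing NOTICE is treated. A minimal Java sketch of the check as described here, where a missing NOTICE only produces a warning, under two stated assumptions: a missing LICENSE counts as an error, and a hypothetical naming convention in which `foo.jar` sits next to `foo-LICENSE*.txt` and optionally `foo-NOTICE.txt`. The class, method, and naming scheme are illustrative, not the actual LUCENE-2952 code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical license/NOTICE check over a lib directory listing; not the LUCENE-2952 code. */
public class LicenseCheckSketch {
    final List<String> errors = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();

    /** Assumed convention: foo.jar needs foo-LICENSE*.txt; foo-NOTICE.txt is optional. */
    static LicenseCheckSketch check(List<String> fileNames) {
        LicenseCheckSketch result = new LicenseCheckSketch();
        for (String name : fileNames) {
            if (!name.endsWith(".jar")) {
                continue;
            }
            String base = name.substring(0, name.length() - ".jar".length());
            boolean hasLicense = false;
            boolean hasNotice = false;
            for (String other : fileNames) {
                if (other.startsWith(base + "-LICENSE") && other.endsWith(".txt")) {
                    hasLicense = true;
                }
                if (other.equals(base + "-NOTICE.txt")) {
                    hasNotice = true;
                }
            }
            if (!hasLicense) {
                result.errors.add(name + ": no LICENSE file");   // assumed to fail the build
            }
            if (!hasNotice) {
                result.warnings.add(name + ": no NOTICE file");  // may be legitimate, so only warn
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LicenseCheckSketch r = check(Arrays.asList(
                "activation.jar", "activation-LICENSE.txt", "mail.jar"));
        System.out.println("errors: " + r.errors);
        System.out.println("warnings: " + r.warnings);
    }
}
```

Under the stricter alternative, the `warnings.add` branch would become an error, and an empty NOTICE file would satisfy it.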


> 
> : Date: Fri, 18 Mar 2011 21:33:29 + (UTC)
> : From: "Grant Ingersoll (JIRA)" 
> : Reply-To: dev@lucene.apache.org
> : To: dev@lucene.apache.org
> : Subject: [jira] Resolved: (LUCENE-2952) Make license checking/maintenance
> : easier/automated
> : 
> : 
> :  [ 
> https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>  ]
> : 
> : Grant Ingersoll resolved LUCENE-2952.
> : -
> : 
> :Resolution: Fixed
> : Fix Version/s: 4.0
> :3.2
> :  Assignee: Grant Ingersoll
> : 
> : > Make license checking/maintenance easier/automated
> : > --
> : >
> : > Key: LUCENE-2952
> : > URL: https://issues.apache.org/jira/browse/LUCENE-2952
> : > Project: Lucene - Java
> : >  Issue Type: Improvement
> : >Reporter: Grant Ingersoll
> : >Assignee: Grant Ingersoll
> : >Priority: Minor
> : > Fix For: 3.2, 4.0
> : >
> : > Attachments: LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch
> : >
> : >
> : > Instead of waiting until release to check licenses are valid, we should 
> make it a part of our build process to ensure that all dependencies have 
> proper licenses, etc.
> : 
> : 
> 
> -Hoss
> 

Re: Lucene Solr 3.1 RC1

2011-03-18 Thread Grant Ingersoll

FYI, the placeholder for the CMS site is: 
http://svn.apache.org/repos/asf/lucene/cms/

You can simply check in there and you will see updates in the staging area.

On Mar 18, 2011, at 5:47 AM, Upayavira wrote:

> 
> 
> On Thu, 17 Mar 2011 23:36 -0400, "Mark Miller" 
> wrote:
>> 
>> On Mar 17, 2011, at 11:13 PM, Chris Hostetter wrote:
>> 
>>> patches welcome!
>> 
>> Sometimes you have to slash and burn to clean out the old underbrush.
>> My patch would simply excise forrest and the website. Then, reveling in
>> the success of that great improvement, I'd sit back, take stock and see
>> what came along.
>> 
>> But it looks like others are winding along on another track - so I'm
>> happy to let them go about it and see where we land.
>> 
>> Apache's home brew CMS scares me until proven otherwise.
> 
> I hope to get to use the homebrew CMS soon. My impression from watching
> folks use it is that, while the (internal) interface isn't pretty, it
> simply does the job. That is, once people start using it, I don't hear
> any complaints. And for a relatively new piece of software, that strikes
> me as a success.
> 
> Upayavira
> --- 
> Enterprise Search Consultant at Sourcesense UK, 
> Making Sense of Open Source
> 
> 

[jira] Closed: (SOLR-484) Solr Website changes

2011-03-18 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll closed SOLR-484.


Resolution: Won't Fix

> Solr Website changes
> 
>
> Key: SOLR-484
> URL: https://issues.apache.org/jira/browse/SOLR-484
> Project: Solr
>  Issue Type: Bug
>  Components: documentation
>    Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-484.patch
>
>
> In looking at the Solr website it has many of the same issues that Lucene 
> Java did when it comes to ASF policies about nightly builds, etc. concerning 
> the Javadocs  
> See 
> http://lucene.markmail.org/message/a7k7kujxkhwjwfy6?q=nightly+developer+releases+list:org%2Eapache%2Elucene%2Ejava-dev+from:%22Doug+Cutting+(JIRA)%22&page=1
> and 
> http://lucene.markmail.org/message/vaks6omed4l6buth?q=nightly+developer+releases+list:org%2Eapache%2Elucene%2Ejava-dev+from:%22Doug+Cutting+(JIRA)%22&page=1
> This would suggest a change like Hadoop and Lucene Java did to separate out 
> the main site, release docs (javadocs, any other?) and developer resources.  
> Currently the javadocs on the main page are the nightly and should be made 
> less prominent.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Resolved: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-18 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved LUCENE-2952.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.2
 Assignee: Grant Ingersoll

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>    Reporter: Grant Ingersoll
>    Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


[jira] Commented: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-18 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008580#comment-13008580
 ] 

Grant Ingersoll commented on LUCENE-2952:
-

OK, I shuffled some things around, putting the code in test-framework and made 
the appropriate changes to the builds.  Will now backport to 3_x (but not 3.1)

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>    Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


[jira] Commented: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-18 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008537#comment-13008537
 ] 

Grant Ingersoll commented on LUCENE-2952:
-

I'm just going to move to the test-framework.  As Robert points out, if in the 
future we get more sophisticated about checking the classpath/libs, it will fit 
well there.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


[jira] Commented: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-18 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008513#comment-13008513
 ] 

Grant Ingersoll commented on LUCENE-2952:
-

Actually, the more I think about it, it doesn't belong in modules either.

I'm inclined to say a new top level dir called committer-tools (slightly 
different from dev-tools, which are redistributed; committer-tools are not)

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


[jira] Issue Comment Edited: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-18 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008473#comment-13008473
 ] 

Grant Ingersoll edited comment on LUCENE-2952 at 3/18/11 3:31 PM:
--

I'm fine w/ moving it out of dev-tools.  I'm not sure about test-framework, 
which I see more as something people building applications on Lucene/Solr use 
to test their applications on.

How about we put it in modules?  As in modules/validation?  It is, after all, 
pertinent to both L & S.

  was (Author: gsingers):
I'm fine w/ moving it out of dev-tools.  I'm not sure about test-framework, 
which I see more as something people building applications on Lucene/Solr use 
to test their applications on.

How about we put it in modules?  As in modules/validation?
  
> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


[jira] Commented: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-18 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008473#comment-13008473
 ] 

Grant Ingersoll commented on LUCENE-2952:
-

I'm fine w/ moving it out of dev-tools.  I'm not sure about test-framework, 
which I see more as something people building applications on Lucene/Solr use 
to test their applications on.

How about we put it in modules?  As in modules/validation?

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


Re: Lucene Solr 3.1 RC1

2011-03-18 Thread Grant Ingersoll

On Mar 17, 2011, at 11:36 PM, Mark Miller wrote:

> 
> On Mar 17, 2011, at 11:13 PM, Chris Hostetter wrote:
> 
>> patches welcome!
> 
> Sometimes you have to slash and burn to clean out the old underbrush.
> My patch would simply excise Forrest and the website. Then, reveling in the 
> success of that great improvement, I'd sit back, take stock and see what came 
> along.
> 
> But it looks like others are winding along on another track - so I'm happy to 
> let them go about it and see where we land.
> 
> Apache's home brew CMS scares me until proven otherwise.

Nah, it's actually pretty nice.  Dead simple: Markdown + SVN + some web-based 
management.  We use it for OpenNLP.   I have started the process of converting 
us, but need time to get Forrest's XDOC to Markdown figured out.  From there, I 
want to move all the sites (Lucene, Solr, Open Rel and PyLucene) onto it and 
out into a single SVN tree structure separate from the dev trees.  I've already 
scoped that part out.

Re: Lucene Solr 3.1 RC1

2011-03-17 Thread Grant Ingersoll


On Mar 17, 2011, at 3:53 PM, Chris Hostetter wrote:

> 
> * CHANGES.txt says we are using Tika 0.8-SNAPSHOT and UIMA 2.3.1-SNAPSHOT, 
> but when i look at the actual jars there is no indication that these are 
> snapshots...


It should be TIKA-0.8.


[jira] Assigned: (SOLR-1942) Ability to select codec per field

2011-03-17 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-1942:
-

Assignee: Grant Ingersoll

> Ability to select codec per field
> -
>
> Key: SOLR-1942
> URL: https://issues.apache.org/jira/browse/SOLR-1942
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.0
>Reporter: Yonik Seeley
>    Assignee: Grant Ingersoll
> Fix For: 4.0
>
> Attachments: SOLR-1942.patch, SOLR-1942.patch, SOLR-1942.patch, 
> SOLR-1942.patch, SOLR-1942.patch, SOLR-1942.patch, SOLR-1942.patch
>
>
> We should use PerFieldCodecWrapper to allow users to select the codec 
> per-field.


[jira] Commented: (LUCENE-2971) Auto Generate our LICENSE.txt and NOTICE.txt files

2011-03-17 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007987#comment-13007987
 ] 

Grant Ingersoll commented on LUCENE-2971:
-

Thanks for the pointers, that should definitely be helpful if and when we add 
this.

> Auto Generate our LICENSE.txt and NOTICE.txt files
> --
>
> Key: LUCENE-2971
> URL: https://issues.apache.org/jira/browse/LUCENE-2971
> Project: Lucene - Java
>  Issue Type: Improvement
>    Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: 3.2, 4.0
>
>
> Once LUCENE-2952 is in place, we should be able to automatically generate 
> Lucene and Solr's LICENSE.txt and NOTICE.txt files (without massive 
> duplication)


Re: svn commit: r1082520 - in /lucene/dev/trunk/solr/lib: servlet-api-LICENSE-ASL.txt servlet-api-LICENSE-SUN.txt servlet-api-NOTICE.txt

2011-03-17 Thread Grant Ingersoll

If it's ASL, it requires a NOTICE, as I understand it.  Some of these are 
admittedly redundant/overkill since we have a ton of ASF code in here, but I'd 
like to keep a 1-1 for every jar, that way we know what every JAR is and 
whether it requires one (see for instance all the commons jars).  From that, we 
can then make a decision about what belongs in the official LICENSE and NOTICE 
files.   Also, the dependency checker code just does 1-1 checks on each jar, so 
if it is ASL it expects a NOTICE to be alongside it.
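To illustrate the 1-1 rule described above (an ASL license file implies a NOTICE file next to it), here is a rough Java sketch. It is not the actual dependency checker; the class and method names are hypothetical, and it assumes ASL-licensed jars are marked by a base-LICENSE-ASL.txt file, as with servlet-api-LICENSE-ASL.txt in the commit quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the "ASL implies NOTICE" rule; not the real dependency checker.
final class NoticeRuleSketch {
  private static final String ASL_SUFFIX = "-LICENSE-ASL.txt";

  // Returns, in sorted order, every ASL license file that has no "<base>-NOTICE.txt" beside it.
  static List<String> aslLicensesWithoutNotice(Set<String> fileNames) {
    List<String> problems = new ArrayList<>();
    for (String name : new TreeSet<>(fileNames)) {
      if (name.endsWith(ASL_SUFFIX)) {
        String base = name.substring(0, name.length() - ASL_SUFFIX.length());
        if (!fileNames.contains(base + "-NOTICE.txt")) {
          problems.add(name);
        }
      }
    }
    return problems;
  }
}
```

Because the check is purely name-based, a redundant NOTICE for ASF code is harmless to it; deciding what ends up in the top-level NOTICE is a separate step.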

As for auto-generating, I think we will be able to de-dup before generating so 
it shouldn't be a problem.  I haven't written any code to auto-gen it yet, so 
it is still a manual process.
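One way the de-dup step could work, sketched in Java under stated assumptions: the class name is hypothetical, notices that differ only in whitespace count as duplicates, and a notice identical to the project's own header is dropped. No such generator exists yet; this only shows the idea.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse per-jar NOTICE texts into one NOTICE body.
final class NoticeAggregatorSketch {
  // Assumption: notices that differ only in whitespace are the same notice.
  static String normalize(String text) {
    return text.trim().replaceAll("\\s+", " ");
  }

  // Starts with the project's own header, then appends each distinct notice
  // in first-seen order, separated by blank lines.
  static String aggregate(String header, List<String> perJarNotices) {
    Map<String, String> unique = new LinkedHashMap<>();
    unique.put(normalize(header), header.trim());
    for (String notice : perJarNotices) {
      String key = normalize(notice);
      if (!key.isEmpty()) {
        unique.putIfAbsent(key, notice.trim());
      }
    }
    return String.join("\n\n", unique.values());
  }
}
```

A LinkedHashMap keyed on the normalized text keeps the original formatting of the first copy while still catching near-identical repeats.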

-Grant

On Mar 17, 2011, at 11:21 AM, Yonik Seeley wrote:

> On Thu, Mar 17, 2011 at 11:05 AM,   wrote:
>> Added: lucene/dev/trunk/solr/lib/servlet-api-NOTICE.txt
>> URL: 
>> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/lib/servlet-api-NOTICE.txt?rev=1082520&view=auto
>> ==
>> --- lucene/dev/trunk/solr/lib/servlet-api-NOTICE.txt (added)
>> +++ lucene/dev/trunk/solr/lib/servlet-api-NOTICE.txt Thu Mar 17 15:05:44 2011
>> @@ -0,0 +1,5 @@
>> +Apache Tomcat
>> +Copyright 1999-2007 The Apache Software Foundation
>> +
>> +This product includes software developed by
>> +The Apache Software Foundation (http://www.apache.org/).
> 
> Why the NOTICE file here?
> Hopefully this won't be used to automatically generate the NOTICE file for 
> Solr?
> I guess it's fine to keep just to document that there were no
> additional required notices due to the jar, but we shouldn't throw it
> all in our NOTICE file.
> 
> -Yonik
> http://lucidimagination.com


[jira] Created: (LUCENE-2971) Auto Generate our LICENSE.txt and NOTICE.txt files

2011-03-17 Thread Grant Ingersoll (JIRA)

Auto Generate our LICENSE.txt and NOTICE.txt files
--

 Key: LUCENE-2971
 URL: https://issues.apache.org/jira/browse/LUCENE-2971
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor


Once LUCENE-2952 is in place, we should be able to automatically generate 
Lucene and Solr's LICENSE.txt and NOTICE.txt files (without massive duplication)


[jira] Updated: (LUCENE-2971) Auto Generate our LICENSE.txt and NOTICE.txt files

2011-03-17 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2971:


Fix Version/s: 4.0
   3.2

> Auto Generate our LICENSE.txt and NOTICE.txt files
> --
>
> Key: LUCENE-2971
> URL: https://issues.apache.org/jira/browse/LUCENE-2971
> Project: Lucene - Java
>  Issue Type: Improvement
>    Reporter: Grant Ingersoll
>Priority: Minor
> Fix For: 3.2, 4.0
>
>
> Once LUCENE-2952 is in place, we should be able to automatically generate 
> Lucene and Solr's LICENSE.txt and NOTICE.txt files (without massive 
> duplication)


[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-17 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

I think this is ready to go.  It checks licenses, it checks notices.  It leaves 
room for other validation tasks (version conflicts, etc.)  It is fast.  It is 
only called for each top dir: lucene, modules, solr (there is one extra call 
when modules/benchmark gets called, but I can live with it).

I believe all LICENSE, NOTICE files are properly set now.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>    Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-17 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

latest patch

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>    Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


Re: Licenses files, Notice files and LUCENE-2952

2011-03-17 Thread Grant Ingersoll

I will probably submit a patch to RAT at some point for this, but wanted to 
prove it out here first as we have an immediate need and I don't want to wait 
for a RAT release, nor do I have the time to learn the RAT code at this point.  
Others are welcome to port it.

-Grant

On Mar 17, 2011, at 1:58 AM, Mark Miller wrote:

> We do use Apache RAT and it does not do these kinds of license checks.
> 
> On Mar 16, 2011, at 9:19 PM, Mattmann, Chris A (388J) wrote:
> 
>> Have you guys thought about using Apache RAT [1]?
>> 
>> It's not perfect but it implements a lot of license checks, and as far as I 
>> know, integrates nicely into Ant and Maven.
>> 
>> Cheers,
>> Chris
>> 
>> [1] http://incubator.apache.org/rat/
>> 
>> On Mar 16, 2011, at 5:54 PM, Robert Muir wrote:
>> 
>>> On Wed, Mar 16, 2011 at 3:57 PM, Grant Ingersoll  
>>> wrote:
>>>> As Robert can no doubt attest, we often scramble to make sure i's are 
>>>> dotted and t's are crossed when it comes to filling out LICENSE.txt and 
>>>> NOTICE.txt right before releases, thereby burdening the RM with way too 
>>>> much work in validating what dependency has which license.  Thus, we've 
>>>> been working to resolve this.
>>>> 
>>>> In prep for the landing of LUCENE-2952 and to make life easier on release 
>>>> managers going forward, we've adopted the following conventions for 
>>>> dealing with licenses:
>>>> 
>>>> 1. For every dependency (i.e. jar file), there needs to be a corresponding 
>>>> file-LICENSE-.txt file, as in: foo-2.3.1.jar has the 
>>>> corresponding foo-LICENSE-BSD.txt file (assuming foo is BSD licensed) in 
>>>> the same directory as the jar file.
>>>> 
>>>> 2.  _IF_ the license requires a NOTICE entry, then there must be a file of 
>>>> the name file-NOTICE.txt, as in foo-NOTICE.txt.
>>>> 
>>>> Failing to meet either one will break the build once L-2952 is committed 
>>>> (which should be soon for trunk and will be backported to 3.2).
>>>> 
>>>> Consider yourself notified.
>>> 
>>> +1
>>> 
>>> I think we can all agree, we want our licensing to be "rock-solid" and
>>> we should strive to raise the standards here for our project. It's
>>> actually more important than if our code even compiles.
>>> 
>>> Automated checks go a long way, thank you Grant for working on this,
>>> because we have a lot of third-party dependencies and it's difficult to
>>> verify that everything is in proper order.
>>> 
>>> 
>> 
>> 
>> ++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++
>> 
>> 
>> 
> 
> - Mark Miller
> lucidimagination.com
> 
> Lucene/Solr User Conference
> May 25-26, San Francisco
> www.lucenerevolution.org
> 
> 


[jira] Commented: (LUCENE-2968) SurroundQuery doesn't support SpanNot

2011-03-17 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007901#comment-13007901
 ] 

Grant Ingersoll commented on LUCENE-2968:
-

spn works for me, or simply ! maybe.

bq. This could also be an opportunity to port Surround to the new query parser 
in Lucene.

That's up to you.

> SurroundQuery doesn't support SpanNot
> -
>
> Key: LUCENE-2968
> URL: https://issues.apache.org/jira/browse/LUCENE-2968
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
>
> It would be nice if we could do span not in the surround query, as they are 
> quite useful for keeping searches within a boundary (say a sentence)


Licenses files, Notice files and LUCENE-2952

2011-03-16 Thread Grant Ingersoll

As Robert can no doubt attest, we often scramble to make sure i's are dotted 
and t's are crossed when it comes to filling out LICENSE.txt and NOTICE.txt 
right before releases, thereby burdening the RM with way too much work in 
validating what dependency has which license.  Thus, we've been working to 
resolve this.

In prep for the landing of LUCENE-2952 and to make life easier on release 
managers going forward, we've adopted the following conventions for dealing 
with licenses:

1. For every dependency (i.e. jar file), there needs to be a corresponding 
file-LICENSE-.txt file, as in: foo-2.3.1.jar has the 
corresponding foo-LICENSE-BSD.txt file (assuming foo is BSD licensed) in the 
same directory as the jar file.

2.  _IF_ the license requires a NOTICE entry, then there must be a file of the 
name file-NOTICE.txt, as in foo-NOTICE.txt.

Failing to meet either one will break the build once L-2952 is committed (which 
should be soon for trunk and will be backported to 3.2).
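A minimal Java sketch of how convention 1 could be checked against the file names in a lib directory. This is not the LUCENE-2952 code; the class and method names are hypothetical, and it assumes a jar's base name is everything before the first "-digit" (so foo-2.3.1.jar maps to foo). It covers only rule 1, since whether a license requires a NOTICE cannot be decided from file names alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the "every jar has a LICENSE file" convention.
final class LicenseConventionSketch {
  // Assumption: the version starts at the first "-<digit>", e.g. foo-2.3.1.jar -> foo.
  private static final Pattern VERSIONED_JAR = Pattern.compile("^(.+?)-\\d.*\\.jar$");

  // Strips the version and ".jar" to get the prefix the license file must start with.
  static String baseName(String jarName) {
    Matcher m = VERSIONED_JAR.matcher(jarName);
    if (m.matches()) {
      return m.group(1);
    }
    // Unversioned jar: just drop the ".jar" extension.
    return jarName.endsWith(".jar")
        ? jarName.substring(0, jarName.length() - ".jar".length())
        : jarName;
  }

  // Returns, in sorted order, every jar without a matching "<base>-LICENSE-<TYPE>.txt".
  static List<String> jarsMissingLicense(Set<String> fileNames) {
    List<String> missing = new ArrayList<>();
    for (String name : new TreeSet<>(fileNames)) {
      if (!name.endsWith(".jar")) {
        continue;
      }
      String prefix = baseName(name) + "-LICENSE-";
      boolean found = false;
      for (String other : fileNames) {
        if (other.startsWith(prefix) && other.endsWith(".txt")) {
          found = true;
          break;
        }
      }
      if (!found) {
        missing.add(name);
      }
    }
    return missing;
  }
}
```

For example, given foo-2.3.1.jar, foo-LICENSE-BSD.txt and bar-1.0.jar, jarsMissingLicense reports only bar-1.0.jar, which is the kind of result that would fail the build.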

Consider yourself notified.  

-Grant

[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory

2011-03-16 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007627#comment-13007627
 ] 

Grant Ingersoll commented on SOLR-1725:
---

bq. As time passes, the case for moving to Java 6 increases.

Solr trunk is on 1.6.

> Script based UpdateRequestProcessorFactory
> --
>
> Key: SOLR-1725
> URL: https://issues.apache.org/jira/browse/SOLR-1725
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.4
>Reporter: Uri Boness
> Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
> SOLR-1725.patch, SOLR-1725.patch
>
>
> A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
> support). The main goal of this plugin is to be able to configure/write 
> update processors without the need to write and package Java code.
> The update request processor factory enables writing update processors in 
> scripts located in the {{solr.solr.home}} directory. The factory accepts one 
> (mandatory) configuration parameter named {{scripts}} which accepts a 
> comma-separated list of file names. It will look for these files under the 
> {{conf}} directory in solr home. When multiple scripts are defined, their 
> execution order is defined by the lexicographical order of the script file 
> name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
> The script language is resolved based on the script file extension (that is, 
> *.js files will be treated as JavaScript scripts), therefore an extension 
> is mandatory.
> Each script file is expected to have one or more methods with the same 
> signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
> *not* required to define all methods, only those that are required by the 
> processing logic.
> The following variables are defined as global variables for each script:
>  * {{req}} - The SolrQueryRequest
>  * {{rsp}} - The SolrQueryResponse
>  * {{logger}} - A logger that can be used for logging purposes in the script
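The ordering and language-resolution rules in this description can be sketched in plain Java. The class and method names are hypothetical, not the patch's API. A real implementation would presumably pass the extension to something like javax.script.ScriptEngineManager#getEngineByExtension; that call is left out here because which engines are installed depends on the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the SOLR-1725 ordering and extension rules.
final class ScriptOrderSketch {
  // Splits the comma-separated "scripts" parameter and sorts the file names
  // lexicographically (plain String order, so uppercase sorts before lowercase).
  static List<String> executionOrder(String scriptsParam) {
    List<String> names = new ArrayList<>();
    for (String part : scriptsParam.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    Collections.sort(names);
    return names;
  }

  // Returns the extension that selects the script language; a missing extension is an error.
  static String extensionOf(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot <= 0 || dot == fileName.length() - 1) {
      throw new IllegalArgumentException("script file needs an extension: " + fileName);
    }
    return fileName.substring(dot + 1);
  }
}
```

With scripts set to "scriptB.js, scriptA.js", executionOrder yields scriptA.js then scriptB.js, matching the ordering described above.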


[jira] Created: (LUCENE-2968) SurroundQuery doesn't support SpanNot

2011-03-15 Thread Grant Ingersoll (JIRA)

SurroundQuery doesn't support SpanNot
-

 Key: LUCENE-2968
 URL: https://issues.apache.org/jira/browse/LUCENE-2968
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor


It would be nice if we could do span not in the surround query, as they are 
quite useful for keeping searches within a boundary (say a sentence)


[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-15 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

This minimizes the number of calls to validate (there is still one extra call 
via the benchmark module since it invokes the common lucene compile target).  
Also splits it out into Lucene, Solr and Modules.

I'd consider it close to good enough at this point.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-15 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

This hooks it into compile-core, but has the unfortunate side-effect of being 
called a whole bunch of times, which is not good.  Need to read up on how to 
avoid that in ant (or if anyone has suggestions, that would be great).

Otherwise, I think the baseline functionality is ready to go.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>    Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch, 
> LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-15 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

Pretty close to standalone completion.  Next step is to hook it in.  I'm going to 
commit the license naming normalization now but not the validation code yet.

Also, renamed LicenseChecker to DependencyChecker as it might be useful for 
checking other things like that all jars have version numbers.
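A version-number check like the one mentioned could be as small as a file-name pattern. This Java sketch assumes a versioned jar is named "name-digit...jar"; the class name and the unversioned example name uima-core.jar are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: flag jars whose file names carry no version number.
final class JarVersionCheckSketch {
  // Assumption: a versioned jar looks like "<name>-<digit>...jar", e.g. foo-2.3.1.jar.
  private static final Pattern VERSIONED = Pattern.compile("^.+-\\d.*\\.jar$");

  // Returns the jars (in input order) that do not match the versioned pattern;
  // non-jar files such as LICENSE/NOTICE texts are ignored.
  static List<String> unversionedJars(List<String> fileNames) {
    List<String> result = new ArrayList<>();
    for (String name : fileNames) {
      if (name.endsWith(".jar") && !VERSIONED.matcher(name).matches()) {
        result.add(name);
      }
    }
    return result;
  }
}
```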

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.


[jira] Created: (SOLR-2427) UIMA jars are missing version numbers

2011-03-15 Thread Grant Ingersoll (JIRA)

UIMA jars are missing version numbers
-

 Key: SOLR-2427
 URL: https://issues.apache.org/jira/browse/SOLR-2427
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Priority: Trivial


We should have version numbers on the UIMA jar files.

Re: Lucene & Solr a one way street?

2011-03-13 Thread Grant Ingersoll

On Mar 13, 2011, at 11:47 AM, Robert Muir wrote:

> On Sun, Mar 13, 2011 at 11:47 AM, Grant Ingersoll  wrote:
>> I guess the question people w/ Solr only hats on have (if there are such 
>> people), is which way is that street going?  It seems like most people want 
>> to pull stuff out of Solr, but they don't seem to want to put into it.  
>> That's probably where some of the resistance comes from.  If you want to 
>> modularize everything so that you can consume it outside of Solr, it usually 
>> means you don't use Solr, which sometimes comes across that you don't care 
>> if the modularization actually has a negative effect on Solr.  I'm all for 
>> modularization and enabling everyone, but not at the cost of loss of 
>> performance in Solr.  As tightly coupled as Solr is, it's pretty damn fast 
>> and resilient.  Show me that you keep that whole and I'll be +1 on 
>> everything.
> 
> Do you have any facts to back up these baseless accusations?

I apologize.  I didn't intend to accuse anyone, if it was read that way.  If you 
read what I wrote earlier, I actually think the whole merge is going well and that there 
is some pretty good cross-fertilization going on.  If I didn't properly convey 
it here, the accusations are actually aimed at those who have only Solr hats on. 
Hint: I ain't one of them.  It is a concern I've heard from people in the 
"don't poach Solr" camp.  I don't think it's the right attitude, but I do think 
the concern is worth mentioning.  I really see Lucene/Solr as a broad 
continuum of enabling technologies, and there really isn't one or the other in 
my mind.

> 
> Because I'll tell you how its "seems" to me: lucene committers are
> going well beyond whats required (fixing solr) to commit changes to
> lucene.

I totally agree.  The sum of the parts is really awesome now.

> 
> Take a look at the commits list, we are the ones doing Solr's dirty work:
> * Like Uwe Schindler fixing up tons of XML related bugs in Solr,
> fixing analysis.jsp and the related request handlers.
> * Like Simon Willnauer doing the necessary improvements to IndexReader
> such that SolrIndexReader need not exist, and trying to add good codec
> support to Solr so it can take advantage of flexible indexing.

Yep, and he should commit those when he is ready.

I heartily agree this is great work.

> 
> And I guess i didnt "put any effort into solr" when i spent a huge
> chunk of this weekend tracking down jre crashes and test bugs in a
> Solr cloud test?!

I never said you didn't.  I am totally in awe of the work you are doing.  I 
wish I had half the energy and focus of some of the people who commit on a 
regular basis.

> 
> As far as modularization having a negative performance effect on Solr,
> how is this the case? Again do you have any concrete examples, or is
> this just more baseless accusations?

No, I don't.  I just said those are the concerns.  I tend to agree that they 
are unfounded.

> 
> Do you have specific benchmarks to where solr's analysis is now
> somehow slower due to the refactoring (since this is the only
> modularization thats happened from solr)?!
> Doesn't look slower to me:
> http://www.lucidimagination.com/search/document/46a8351089a98aec/protwords_txt_support_in_stemmers#46a8351089a98aec

Dude, I think the analysis modularization is awesome.  I'm about to begin 
porting it to OpenNLP, for instance.  I wish it were more decoupled so I wouldn't 
have to bring over all of Lucene core and could bring just the analysis code.  
Likewise for Mahout.

Re: Lucene & Solr a one way street?

2011-03-13 Thread Grant Ingersoll

On Mar 13, 2011, at 10:23 AM, Simon Willnauer wrote:

> Hey folks,
> 
> I have recently tried to push some refactorings towards moving stuff
> from Solr to modules land to enable users of Lucene to benefit from
> the developments that have been made in Solr land during the past with
> very little success. Actually, it was a really disappointing
> experience whenever  I tried to kick off issues towards this
> direction. On LUCENE-2308 David asked a good question why FieldType is
> not ported to Lucene rather than a new development.
> I replied with:
> 
> {quote}
> Moving stuff from Solr to Lucene involves lots of politics. It is way
> easier to let Solr adopt eventually than fight your way through the
> politics (this is my opinion though.)
> {quote}

I hadn't looked at 2308, but to me, if there are well-written patches, then 
they should be considered.  More modules make a lot of sense to me, as long as 
everyone is kept whole and there are no performance losses.  Moving FieldTypes to 
Lucene seems like a lot of work for little benefit, but maybe that is just me.  
It seems like most Lucene users like to roll their own for that kind of thing or use Spring.

> 
> While the answer to his question doesn't matter in this context,
> it raised another question from Robert's side:
> 
> {quote}
> Then why do we still have merged codebases?
> If this is the way things are, then we should un-merge the two projects.
> 
> because as a lucene developer, i spend a lot of time trying to do my
> part to fix various things in Solr... if its a one-way-street then we
> need to un-merge.
> {quote}
> 
> The discussions on LUCENE-2883 changed my personal perception of how
> things work here quite dramatically. I lost almost all enthusiasm to
> even try to push developments towards moving things out of Solr and
> into modules since literally every movement in this direction starts a
> lot of politics (at least this is my understanding drawn from the
> rather non-technical disagreements I have seen folks mentioning on
> this issue). I don't care where those politics come from but in my
> opinion we need to find an agreement on how we deal with "stealing"
> functionality from Solr and making it available to lucene users. My
> personal opinion is that any refactoring, consolidation of APIs etc.
> should NOT be influenced by the fact that they have been Solr private
> and "might" influence further development on solr with regards to
> backwards compatibility etc.
> 

I actually thought 2883 was a pretty good discussion.  My overall takeaway from 
it was "go for it".  One person was hesitant about it.  I think 
sometimes you need to just put up patches instead of having lengthy discussions.

> Moving features to modules should be first priority and please correct
> me if I am wrong this was one of the major reason why we merged the
> code base.

I don't think it is a first priority, but it is a benefit.  I also don't think 
it was the main reason for the merge.  I think the main reason was that 
most of the Solr committers were also Lucene committers, and there was a fair 
amount of duplicated work and a desire to be on the same version.  
Modularization was/is also a benefit.

FWIW, I think the merge has, for the most part, been successful.  
We have better-tested code, faster code, etc.

> All users should benefit from the nice features which are
> currently hidden in the solr code base. FunctionQuery is a tiny one
> IMO and the frustration it caused on my side was immense. I don't even
> wanna try to suggest to make replication, faceting or even the cloud
> feature decoupled from Solr (I don't want to argue about the
> feasibility or the amount of work this would be here! Just lemme say
> one thing we can always branch and there is a large workforce out
> there that is willing to work on stuff like that).
> 
> I can only agree with robert that if this is a one way street that the
> merge makes no sense anymore.

I guess the question people w/ Solr-only hats on have (if there are such 
people) is which way that street is going.  It seems like most people want to 
pull stuff out of Solr, but they don't seem to want to put anything into it.  That's 
probably where some of the resistance comes from.  If you want to modularize 
everything so that you can consume it outside of Solr, it usually means you 
don't use Solr, which sometimes comes across as not caring whether the 
modularization actually has a negative effect on Solr.  I'm all for 
modularization and enabling everyone, but not at the cost of 
performance in Solr.  As tightly coupled as Solr is, it's pretty damn fast and 
resilient.  Show me that you keep that whole and I'll be +1 on everything.

You also have to keep in mind that some of these things, replication for 
instance, rely on Solr-specific pieces.  Are you really going to move the HTTP protocols 
to just Lucene?  What does that even mean?  Lucene is a Java API.  It doesn't 
assume containers, etc.  Solr is th

[jira] Commented: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-11 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005666#comment-13005666
 ] 

Grant Ingersoll commented on LUCENE-2952:
-

Should note, I've only hooked it up for lucene/lib and solr/lib and not any of 
the modules or contrib.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.

[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-11 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

Here's some real progress on this.  Works in standalone mode, but is not hooked 
into the build process yet.

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch, LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.

[jira] Updated: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-10 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2952:


Attachment: LUCENE-2952.patch

Nowhere near ready, but I'm putting up something to flesh this out a little 
bit.  I don't think it even compiles yet. 

Idea:  Add dev-tools/validation and put code in it that validates our 
systems for things like licenses, etc.  It will then be hooked in 
at compile time for both Lucene and Solr.

In this particular case, it will look for a license file for each jar file and 
fail if one is missing.  This requires that, for every JAR file, there be a file 
with the same name plus the license name and .txt appended, as in 
foo.jar.BSD.txt or something like that (still being worked out).
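A minimal sketch of the check described above, assuming the foo.jar.<LICENSE>.txt convention (which the comment notes is still being worked out). It works on a plain list of file names so it runs standalone; a real build hook would presumably list a lib directory (for example with java.io.File.list()) and fail the build rather than print. The class name LicenseFileCheck is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class LicenseFileCheck {

  /**
   * Returns the jars in a directory listing that have no companion license file.
   * Assumed convention: "foo.jar" needs a file named "foo.jar.<LICENSE>.txt",
   * e.g. "foo.jar.BSD.txt".
   */
  public static List<String> jarsMissingLicense(List<String> fileNames) {
    List<String> missing = new ArrayList<>();
    for (String name : fileNames) {
      if (!name.endsWith(".jar")) {
        continue; // only jars need license files
      }
      String prefix = name + ".";
      boolean found = false;
      for (String other : fileNames) {
        // Must look like "<jar>.<non-empty license name>.txt"
        if (other.startsWith(prefix) && other.endsWith(".txt")
            && other.length() > prefix.length() + ".txt".length()) {
          found = true;
          break;
        }
      }
      if (!found) {
        missing.add(name);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    List<String> lib = List.of("foo.jar", "foo.jar.BSD.txt", "bar.jar", "README.txt");
    List<String> missing = jarsMissingLicense(lib);
    if (!missing.isEmpty()) {
      // A build hook would fail here; this sketch only reports.
      System.out.println("Missing license files for: " + missing); // prints [bar.jar]
    }
  }
}
```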

> Make license checking/maintenance easier/automated
> --
>
> Key: LUCENE-2952
> URL: https://issues.apache.org/jira/browse/LUCENE-2952
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2952.patch
>
>
> Instead of waiting until release to check licenses are valid, we should make 
> it a part of our build process to ensure that all dependencies have proper 
> licenses, etc.

[jira] Commented: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-09 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004655#comment-13004655
 ] 

Grant Ingersoll commented on LUCENE-2945:
-

The Query class is already cloneable, so it needs to support what QueryUtils 
is doing.  I think it is the anonymous inner class (or in my case, just the 
inner class) that matters for all of this.  It is an instance 
of Query and thus needs a proper equals/hashcode.  I don't really care about 
the outer containing classes, other than that I think it is a misnomer to call them 
Query classes when they are really factory classes for creating Lucene Queries.
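To illustrate why the inner class needs its own equals/hashCode, here is a minimal sketch using a hypothetical stand-in for Query (not Lucene's actual class): a cloneable query created as an anonymous inner class inherits Object's identity-based equals, so a "clone must equal the original" check fails. The check below only mimics the spirit of QueryUtils; it is not its API:

```java
/**
 * Hypothetical stand-in for a Lucene Query; not org.apache.lucene.search.Query.
 */
public class CloneEqualsDemo {

  static abstract class SimpleQuery implements Cloneable {
    @Override
    public SimpleQuery clone() {
      try {
        return (SimpleQuery) super.clone();
      } catch (CloneNotSupportedException e) {
        throw new AssertionError(e); // cannot happen: we implement Cloneable
      }
    }
  }

  /** Builds a query via an anonymous inner class, inheriting Object.equals/hashCode. */
  static SimpleQuery makeAnonymous() {
    return new SimpleQuery() { };
  }

  /** QueryUtils-style check (illustrative): a clone must equal the original. */
  static boolean cloneEqualsOriginal(SimpleQuery q) {
    SimpleQuery copy = q.clone();
    return copy.equals(q) && copy.hashCode() == q.hashCode();
  }

  public static void main(String[] args) {
    System.out.println(cloneEqualsOriginal(makeAnonymous())); // prints false
  }
}
```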

> Surround Query doesn't properly handle equals/hashcode
> --
>
> Key: LUCENE-2945
> URL: https://issues.apache.org/jira/browse/LUCENE-2945
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.0.3, 3.1, 4.0
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1.1, 4.0
>
> Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
> LUCENE-2945.patch, LUCENE-2945.patch
>
>
> In looking at using the surround queries with Solr, I am hitting issues 
> caused by collisions due to equals/hashcode not being implemented on the 
> anonymous inner classes that are created by things like DistanceQuery (branch 
> 3.x, near line 76)

Re: GSoC

2011-03-09 Thread Grant Ingersoll

 for legal reasons or similar
>>>>> things. Lets stick to the mailing list for all communication except
>>>>> you have something that should clearly not be public. This also give
>>>>> other contributors a chance to help and get interested in your work!!
>>>>> 
>>>>> simon
>>>>> 
>>>>>> David
>>>>>> 
>>>>>>> Hi David, honestly this sounds fantastic.
>>>>>>> 
>>>>>>> It would be great to have someone to work with us on this issue!
>>>>>>> 
>>>>>>> To date, progress is pretty slow-going (minor improvements, cleanups,
>>>>>>> additional stats here and there)... but we really need all the help
>>>>>>> we can get, especially from people who have a really good
>>>>>>> understanding of the various models.
>>>>>>> 
>>>>>>> In case you are interested, here are some references to discussions
>>>>>>> about adding more flexibility (with some prototypes etc):
>>>>>>> http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible
>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2392
>>>>>>> 
>>>>>>> On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey wrote:
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I have already sent this mail to Simon Willnauer, and he suggested
>>>>>>>> me to post it here for discussion.
>>>>>>>> 
>>>>>>>> I am David Nemeskey, a PhD student at the Eotvos Lorand University,
>>>>>>>> Budapest, Hungary. I am doing an IR-related research, and we have
>>>>>>>> considered using Lucene as our search engine. We were quite
>>>>>>>> satisfied with the speed and ease of use. However, we would like
>>>>>>>> to experiment with different ranking algorithms, and this is where
>>>>>>>> problems arise. Lucene only supports the VSM, and unfortunately
>>>>>>>> the ranking architecture seems to be tailored specifically to its
>>>>>>>> needs.
>>>>>>>> 
>>>>>>>> I would be very much interested in revamping the ranking component
>>>>>>>> as a GSoC project. The following modifications should be doable in
>>>>>>>> the allocated time frame:
>>>>>>>> - a new ranking class hierarchy, which is generic enough to allow
>>>>>>>>   easy implementation of new weighting schemes (at least
>>>>>>>>   bag-of-words ones),
>>>>>>>> - addition of state-of-the-art ranking methods, such as Okapi BM25,
>>>>>>>>   proximity and DFR models,
>>>>>>>> - configuration for ranking selection, with the old method as
>>>>>>>>   default.
>>>>>>>> 
>>>>>>>> I believe all users of Lucene would profit from such a project. It
>>>>>>>> would provide the scientific community with an even more useful
>>>>>>>> research aid, while regular users could benefit from superior
>>>>>>>> ranking results.
>>>>>>>> 
>>>>>>>> Please let me know your opinion about this proposal.
>>>>>> 
> 
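David's proposal asks for a ranking hierarchy generic enough that models like Okapi BM25 are easy to plug in. As a rough, hypothetical illustration (the TermScorer interface and its parameter names are invented here and are not a Lucene API), a per-term BM25 score with the usual k1 and b parameters and a common non-negative IDF variant might look like this:

```java
/**
 * Sketch of a pluggable per-term scoring hook with Okapi BM25 as one implementation.
 * TermScorer is hypothetical, not a Lucene API.
 */
public class Bm25Sketch {

  /** Hypothetical interface a generic ranking hierarchy might expose. */
  interface TermScorer {
    double score(int termFreq, int docLength, double avgDocLength, long docCount, long docFreq);
  }

  /** Classic Okapi BM25; k1 controls tf saturation, b controls length normalization. */
  static final class BM25 implements TermScorer {
    private final double k1;
    private final double b;

    BM25(double k1, double b) {
      this.k1 = k1;
      this.b = b;
    }

    @Override
    public double score(int tf, int dl, double avgdl, long n, long df) {
      // Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5))
      double idf = Math.log(1.0 + (n - df + 0.5) / (df + 0.5));
      // Document-length-normalized saturation term.
      double norm = k1 * (1.0 - b + b * dl / avgdl);
      return idf * (tf * (k1 + 1.0)) / (tf + norm);
    }
  }

  public static void main(String[] args) {
    TermScorer bm25 = new BM25(1.2, 0.75);
    // A term occurring twice in an average-length document, present in 10 of 1000 docs.
    System.out.println(bm25.score(2, 100, 100.0, 1000, 10));
  }
}
```

k1 = 1.2 and b = 0.75 are commonly used defaults; a full hierarchy would also have to gather the collection statistics and combine per-term scores.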

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search


Re: [VOTE] Lucene and Solr 3.1 release candidate

2011-03-09 Thread Grant Ingersoll

/browse/SOLR-2405 didn't make it in
>>>>> yesterday (apparently it didn't)? :-(  Darn... maybe I shouldn't have waited
>>>>> for a committer to agree with the issue. I would have had it in Saturday.
>>>>> 
>>>>> ~ David Smiley
>>>>> 
>>>>> On Mar 7, 2011, at 1:32 AM, Robert Muir wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> I have posted a release candidate for both Lucene 3.1 and Solr 3.1,
>>>>>> both from revision 1078688 of
>>>>>> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/
>>>>>> Thanks for all your help! Please test them and give your votes, the
>>>>>> tentative release date for both versions is Sunday, March 13th, 2011.
>>>>>> Only votes from Lucene PMC are binding, but everyone is welcome to
>>>>>> check the release candidates and voice their approval or disapproval.
>>>>>> The vote passes if at least three binding +1 votes are cast.
>>>>>> 
>>>>>> The release candidates are produced in parallel because in 2010 we
>>>>>> merged the development of Lucene and Solr in order to produce higher
>>>>>> quality releases. While we voted to reserve the right to release
>>>>>> Lucene by itself, in my opinion we should definitely try to avoid this
>>>>>> unless absolutely necessary, as it would ultimately cause more work
>>>>>> and complication: instead it would be far easier to just fix whatever
>>>>>> issues are discovered and respin both releases again.
>>>>>> 
>>>>>> Because of this, I ask that you cast a single vote to cover both
>>>>>> releases. If the vote succeeds, both sets of artifacts can go their
>>>>>> separate ways to the different websites.
>>>>>> 
>>>>>> Artifacts are located here: http://s.apache.org/solrcene31rc0
>>>>>> 

Re: [VOTE] Lucene and Solr 3.1 release candidate

2011-03-08 Thread Grant Ingersoll

+1.  I downloaded them all, checked sigs, compiled and tested both Lucene and 
Solr, and ran the Solr demo.

I also just want to thank Robert for the work he did on the licenses, etc.   We 
need to make the stuff that goes into releases more testable and verifiable.


On Mar 7, 2011, at 1:32 AM, Robert Muir wrote:

> Hi all,
> 
> I have posted a release candidate for both Lucene 3.1 and Solr 3.1,
> both from revision 1078688 of
> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/
> Thanks for all your help! Please test them and give your votes, the
> tentative release date for both versions is Sunday, March 13th, 2011.
> Only votes from Lucene PMC are binding, but everyone is welcome to
> check the release candidates and voice their approval or disapproval.
> The vote passes if at least three binding +1 votes are cast.
> 
> The release candidates are produced in parallel because in 2010 we
> merged the development of Lucene and Solr in order to produce higher
> quality releases. While we voted to reserve the right to release
> Lucene by itself, in my opinion we should definitely try to avoid this
> unless absolutely necessary, as it would ultimately cause more work
> and complication: instead it would be far easier to just fix whatever
> issues are discovered and respin both releases again.
> 
> Because of this, I ask that you cast a single vote to cover both
> releases. If the vote succeeds, both sets of artifacts can go their
> separate ways to the different websites.
> 
> Artifacts are located here: http://s.apache.org/solrcene31rc0
> 

Re: [VOTE] Lucene and Solr 3.1 release candidate

2011-03-07 Thread Grant Ingersoll

The Solr WAR file (apache-solr-3.1.war) isn't signed.  We can probably sign it by 
hand.


On Mar 7, 2011, at 1:32 AM, Robert Muir wrote:

> Hi all,
> 
> I have posted a release candidate for both Lucene 3.1 and Solr 3.1,
> both from revision 1078688 of
> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/
> Thanks for all your help! Please test them and give your votes, the
> tentative release date for both versions is Sunday, March 13th, 2011.
> Only votes from Lucene PMC are binding, but everyone is welcome to
> check the release candidates and voice their approval or disapproval.
> The vote passes if at least three binding +1 votes are cast.
> 
> The release candidates are produced in parallel because in 2010 we
> merged the development of Lucene and Solr in order to produce higher
> quality releases. While we voted to reserve the right to release
> Lucene by itself, in my opinion we should definitely try to avoid this
> unless absolutely necessary, as it would ultimately cause more work
> and complication: instead it would be far easier to just fix whatever
> issues are discovered and respin both releases again.
> 
> Because of this, I ask that you cast a single vote to cover both
> releases. If the vote succeeds, both sets of artifacts can go their
> separate ways to the different websites.
> 
> Artifacts are located here: http://s.apache.org/solrcene31rc0
> 

[jira] Created: (LUCENE-2952) Make license checking/maintenance easier/automated

2011-03-07 Thread Grant Ingersoll (JIRA)

Make license checking/maintenance easier/automated
--

 Key: LUCENE-2952
 URL: https://issues.apache.org/jira/browse/LUCENE-2952
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor


Instead of waiting until release to check licenses are valid, we should make it 
a part of our build process to ensure that all dependencies have proper 
licenses, etc.

[jira] Updated: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-07 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2945:


Affects Version/s: 3.1
Fix Version/s: (was: 3.1)
   3.1.1

> Surround Query doesn't properly handle equals/hashcode
> --
>
> Key: LUCENE-2945
> URL: https://issues.apache.org/jira/browse/LUCENE-2945
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.0.3, 3.1, 4.0
>    Reporter: Grant Ingersoll
>    Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1.1, 4.0
>
> Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
> LUCENE-2945.patch, LUCENE-2945.patch
>
>
> In looking at using the surround queries with Solr, I am hitting issues 
> caused by collisions due to equals/hashcode not being implemented on the 
> anonymous inner classes that are created by things like DistanceQuery (branch 
> 3.x, near line 76)

Re: [VOTE] Lucene and Solr 3.1 release candidate

2011-03-07 Thread Grant Ingersoll

On Mar 7, 2011, at 8:02 AM, Robert Muir wrote:

> On Mon, Mar 7, 2011 at 7:56 AM, Grant Ingersoll  wrote:
>> How do we have a release candidate if we still have issues open?  Or is this 
>> just a test run?
>> 
> 
> Anything in JIRA can make it in 3.2 instead. I said already, that
> yesterday was the time I had available to produce this RC build.
> 

I'm fine w/ it being pushed (I was going to suggest it actually), but I guess I 
missed the mail saying the RC would be built yesterday and thought I still might have time to 
fix it.  What thread was that on?

-Grant

Re: [VOTE] Lucene and Solr 3.1 release candidate

2011-03-07 Thread Grant Ingersoll

How do we have a release candidate if we still have issues open?  Or is this 
just a test run?

On Mar 7, 2011, at 1:32 AM, Robert Muir wrote:

> Hi all,
> 
> I have posted a release candidate for both Lucene 3.1 and Solr 3.1,
> both from revision 1078688 of
> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_1/
> Thanks for all your help! Please test them and give your votes, the
> tentative release date for both versions is Sunday, March 13th, 2011.
> Only votes from Lucene PMC are binding, but everyone is welcome to
> check the release candidates and voice their approval or disapproval.
> The vote passes if at least three binding +1 votes are cast.
> 
> The release candidates are produced in parallel because in 2010 we
> merged the development of Lucene and Solr in order to produce higher
> quality releases. While we voted to reserve the right to release
> Lucene by itself, in my opinion we should definitely try to avoid this
> unless absolutely necessary, as it would ultimately cause more work
> and complication: instead it would be far easier to just fix whatever
> issues are discovered and respin both releases again.
> 
> Because of this, I ask that you cast a single vote to cover both
> releases. If the vote succeeds, both sets of artifacts can go their
> separate ways to the different websites.
> 
> Artifacts are located here: http://s.apache.org/solrcene31rc0
> 

Re: [jira] Updated: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-05 Thread Grant Ingersoll


On Mar 5, 2011, at 3:25 PM, Paul Elschot wrote:

> On Mar 5, 2011, at 9:40 AM, Paul Elschot wrote:
> 
>> What happens is that you actually end up comparing two different objects,
>> one which is the DistanceQuery and one which is the inner class, so it
>> doesn't work.
> 
> Iirc the one with the inner class is a (Lucene) Query, so why compare
> it to a (Surround) DistanceQuery from which it may have been generated?

I wasn't, that's what your patch did ;-)

It's the Lucene Query that matters.  I'm just not sure what the best basis is for 
generating the equals/hash.

> Never mind, I don't have the source code here...



Re: [jira] Updated: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-05 Thread Grant Ingersoll

On Mar 5, 2011, at 9:40 AM, Paul Elschot wrote:

> Grant,
> 
> I'm having a file system and/or hardware problem here, so I can only
> comment by mail at the moment.
> 
> The toString() implementations in (subclasses of) SrndQuery are supposed
> to provide a mostly reparsable string, so it should be possible to use
> that instead of passing in the original syntax string as in your patch.
> I would run the existing tests with an extra println at a strategic point
> to compare the parsed input to the toString() result, but I cannot do that
> now...
> 
> About the existing toString() implementations in the inner classes: as I
> understand java's "qualified this" these should not need to be redirected
> to the enclosing object for this issue. These existing toString()s were
> only used for development, so I expect no problem in reimplementing them
> in case this turns out to be necessary.

Right, I used the outer one, although I suspect it isn't correct yet, so I will 
keep working on it.  It's probably safest to look at the underlying structures 
that are used to create the query.
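As a rough sketch of the "qualified this" Paul mentions: inside a non-static inner class, plain this refers to the inner instance, while Outer.this reaches the enclosing object. The class names below are hypothetical stand-ins for SrndQuery and its generated query, not the surround code itself:

```java
/**
 * Hypothetical sketch: an inner "generated query" whose toString() delegates to
 * its enclosing object through a qualified this. Names are illustrative only.
 */
public class QualifiedThisDemo {

  static class SrndLikeQuery {
    private final String syntax;

    SrndLikeQuery(String syntax) {
      this.syntax = syntax;
    }

    @Override
    public String toString() {
      return syntax; // assumed to be a (mostly) reparsable form of the query
    }

    /** Non-static inner class: every instance is bound to an enclosing SrndLikeQuery. */
    class Generated {
      @Override
      public String toString() {
        // "this" is the Generated instance; SrndLikeQuery.this is the enclosing query.
        return "Generated(" + SrndLikeQuery.this.toString() + ")";
      }
    }

    Generated generate() {
      return new Generated();
    }
  }

  public static void main(String[] args) {
    System.out.println(new SrndLikeQuery("example").generate()); // prints Generated(example)
  }
}
```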

> 
> Class objects should be unique, so I would expect the hashCode() and
> equals() in my patch to work with them, but I could not yet find a
> definite conclusion on inner classes in the java documentation. It could
> be that the explicit "inner" classes in your patch work around that.

What happens is that you actually end up comparing two different objects, one 
which is the DistanceQuery and one which is the inner class, so it doesn't work.
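The following is a minimal plain-Java sketch of this failure mode. It uses no Lucene classes; `TextMatcher`, `anonymousFor`, and `NamedMatcher` are hypothetical stand-ins for the generated query objects. An anonymous class inherits `Object.equals`, which is identity, so two logically identical queries built from the same input never compare equal. Every instance created by one anonymous-class expression also shares a single `Class` object, so a `getClass()` check alone cannot tell them apart. A named nested class can define `equals`/`hashCode` over its fields.

```java
import java.util.Objects;

/** Hypothetical stand-in for a query object produced by a factory method. */
interface TextMatcher {
  boolean matches(String text);
}

class InnerClassEquality {

  // Anonymous class: inherits Object.equals/hashCode, i.e. identity semantics.
  static TextMatcher anonymousFor(final String term) {
    return new TextMatcher() {
      @Override
      public boolean matches(String text) {
        return text.contains(term);
      }
    };
  }

  // Named nested class: equality is defined over the state that determines behavior.
  static final class NamedMatcher implements TextMatcher {
    private final String term;

    NamedMatcher(String term) {
      this.term = Objects.requireNonNull(term);
    }

    @Override
    public boolean matches(String text) {
      return text.contains(term);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (o == null || getClass() != o.getClass()) return false;
      return term.equals(((NamedMatcher) o).term);
    }

    @Override
    public int hashCode() {
      return Objects.hash(term);
    }
  }

  public static void main(String[] args) {
    // Same input, two distinct anonymous instances: identity equality says "not equal".
    System.out.println(anonymousFor("foo").equals(anonymousFor("foo"))); // false
    // Every call uses the same anonymous class, so Class identity cannot separate "foo" from "bar".
    System.out.println(anonymousFor("foo").getClass() == anonymousFor("bar").getClass()); // true
    // Named class with value equality: equal, with matching hash codes.
    System.out.println(new NamedMatcher("foo").equals(new NamedMatcher("foo"))); // true
  }
}
```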

[jira] Updated: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-05 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2945:


Attachment: LUCENE-2945.patch

OK, here's a patch with a test that passes.  I'm not entirely thrilled about 
the implementation of equals/hash on the two inner classes (used to be 
anonymous) but I do think it works.  Namely, I use the syntax of the original 
query as a string, per Paul's original suggestion as part of the hash/equals.  
It just seems awkward to have to pass that in solely for this purpose, but I 
didn't see what other information I had around that would make the object 
unique from an equals/hash standpoint.  I suppose the underlying queries list 
on the ComposedQuery might work and I can try that if others think it makes 
more sense.
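If the sub-query list were used instead of the syntax string, equality might look roughly like the following sketch. This is an assumption, not the attached patch. `DistanceQueryKey` is a hypothetical value class, and plain strings stand in for the sub-queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical value object standing in for the Query built from a distance
 * (near) query: equality comes from the sub-queries and the proximity
 * parameters rather than from the original query syntax string.
 */
final class DistanceQueryKey {
  private final List<String> subQueries; // stand-in for the ComposedQuery sub-query list
  private final int distance;
  private final boolean ordered;

  DistanceQueryKey(List<String> subQueries, int distance, boolean ordered) {
    // Unmodifiable copy, so the hash code cannot drift after construction.
    this.subQueries = Collections.unmodifiableList(new ArrayList<String>(subQueries));
    this.distance = distance;
    this.ordered = ordered;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof DistanceQueryKey)) return false; // safe because the class is final
    DistanceQueryKey other = (DistanceQueryKey) o;
    return distance == other.distance
        && ordered == other.ordered
        && subQueries.equals(other.subQueries); // List.equals is order-sensitive
  }

  @Override
  public int hashCode() {
    int h = subQueries.hashCode();
    h = 31 * h + distance;
    h = 31 * h + (ordered ? 1 : 0);
    return h;
  }

  public static void main(String[] args) {
    DistanceQueryKey a = new DistanceQueryKey(Arrays.asList("foo", "bar"), 3, true);
    DistanceQueryKey b = new DistanceQueryKey(Arrays.asList("foo", "bar"), 3, true);
    System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
  }
}
```

Treating reordered sub-queries as unequal is conservative. For an unordered proximity query it can cost a cache miss, but it never produces a false collision.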

> Surround Query doesn't properly handle equals/hashcode
> --
>
> Key: LUCENE-2945
> URL: https://issues.apache.org/jira/browse/LUCENE-2945
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.0.3, 4.0
>    Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
> LUCENE-2945.patch, LUCENE-2945.patch
>
>
> In looking at using the surround queries with Solr, I am hitting issues 
> caused by collisions due to equals/hashcode not being implemented on the 
> anonymous inner classes that are created by things like DistanceQuery (branch 
> 3.x, near line 76)

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-04 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2945:


Attachment: LUCENE-2945.patch

Here's a patch that has a test using QueryUtils that fails.  I don't think the 
getClass() approach is quite right for the base class equals.  
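One pitfall to keep in mind with a `getClass()`-based base-class `equals` is that a subclass which adds state but does not override `equals` inherits a comparison that ignores that state. The following plain-Java sketch uses a hypothetical `BaseQuery` to show two instances of one anonymous subclass that capture different terms yet still compare equal. Whether this is exactly the problem with the Surround classes is not established here.

```java
/** Hypothetical base class whose equals() compares getClass() plus its own field. */
abstract class BaseQuery {
  private final float boost;

  BaseQuery(float boost) {
    this.boost = boost;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    return Float.floatToIntBits(boost) == Float.floatToIntBits(((BaseQuery) o).boost);
  }

  @Override
  public int hashCode() {
    return 31 * getClass().hashCode() + Float.floatToIntBits(boost);
  }
}

class GetClassEqualsDemo {
  // Every call instantiates the SAME anonymous subclass. It captures a different
  // term but does not override equals(), so the term is ignored by equality.
  static BaseQuery termQuery(final String term) {
    return new BaseQuery(1.0f) {
      @Override
      public String toString() {
        return "term:" + term;
      }
    };
  }

  public static void main(String[] args) {
    BaseQuery foo = termQuery("foo");
    BaseQuery bar = termQuery("bar");
    System.out.println(foo + " equals " + bar + "? " + foo.equals(bar)); // true: a collision
  }
}
```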

> Surround Query doesn't properly handle equals/hashcode
> --
>
> Key: LUCENE-2945
> URL: https://issues.apache.org/jira/browse/LUCENE-2945
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.0.3, 4.0
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch, 
> LUCENE-2945.patch
>
>
> In looking at using the surround queries with Solr, I am hitting issues 
> caused by collisions due to equals/hashcode not being implemented on the 
> anonymous inner classes that are created by things like DistanceQuery (branch 
> 3.x, near line 76)


[jira] Updated: (SOLR-2390) Performance of usePhraseHighlighter is terrible on very large Documents, regardless of hl.maxDocCharsToAnalyze

2011-03-04 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-2390:
--

Fix Version/s: 3.1.1

> Performance of usePhraseHighlighter is terrible on very large Documents, 
> regardless of hl.maxDocCharsToAnalyze
> --
>
> Key: SOLR-2390
> URL: https://issues.apache.org/jira/browse/SOLR-2390
> Project: Solr
>  Issue Type: Bug
>  Components: highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 3.1.1, 3.2, 4.0
>
>
> There is a large performance bug here.


[jira] Updated: (LUCENE-2939) Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

2011-03-04 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2939:


Fix Version/s: 3.1.1

> Highlighter should try and use maxDocCharsToAnalyze in 
> WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as 
> when using CachingTokenStream
> 
>
> Key: LUCENE-2939
> URL: https://issues.apache.org/jira/browse/LUCENE-2939
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 3.1.1, 3.2, 4.0
>
> Attachments: LUCENE-2939.patch, LUCENE-2939.patch, LUCENE-2939.patch
>
>
> Huge documents can be drastically slower than they need to be because the 
> entire field is added to the memory index. This cost can be greatly reduced 
> in many cases if we try to respect maxDocCharsToAnalyze. Things can be 
> improved even further by respecting this setting with CachingTokenStream.
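The following is a minimal sketch of the first idea, which caps the text before it is added to the memory index. `AnalyzeLimit.truncate` is a hypothetical helper, not the code in the attached patches. It measures the limit in UTF-16 chars, as `String.length()` does, and backs off by one char rather than split a surrogate pair.

```java
/** Hypothetical helper: cap the text handed to analysis / the memory index. */
final class AnalyzeLimit {
  private AnalyzeLimit() {}

  /**
   * Returns at most maxChars UTF-16 code units of text, backing off by one
   * if the cut would separate a high surrogate from its low surrogate.
   */
  static String truncate(String text, int maxChars) {
    if (maxChars < 0) {
      throw new IllegalArgumentException("maxChars must be >= 0");
    }
    if (text.length() <= maxChars) {
      return text; // short fields are passed through untouched
    }
    int end = maxChars;
    // A high surrogate as the last kept char means its pair would be cut off.
    if (end > 0 && Character.isHighSurrogate(text.charAt(end - 1))) {
      end--;
    }
    return text.substring(0, end);
  }
}
```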


[jira] Updated: (LUCENE-2939) Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

2011-03-04 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2939:


Fix Version/s: (was: 3.1)
   3.2

> Highlighter should try and use maxDocCharsToAnalyze in 
> WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as 
> when using CachingTokenStream
> 
>
> Key: LUCENE-2939
> URL: https://issues.apache.org/jira/browse/LUCENE-2939
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2939.patch, LUCENE-2939.patch, LUCENE-2939.patch
>
>
> Huge documents can be drastically slower than they need to be because the 
> entire field is added to the memory index. This cost can be greatly reduced 
> in many cases if we try to respect maxDocCharsToAnalyze. Things can be 
> improved even further by respecting this setting with CachingTokenStream.
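For the second idea, respecting the limit while caching tokens, the following plain-Java sketch uses a list of hypothetical `Tok` records as a stand-in for the cached token stream. Tokenization stops as soon as the next token would start at or beyond `maxChars`, so the tail of a huge field is never tokenized or stored. The whitespace tokenizer is an assumption made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token with character offsets into the original text. */
final class Tok {
  final String term;
  final int start; // inclusive char offset
  final int end;   // exclusive char offset

  Tok(String term, int start, int end) {
    this.term = term;
    this.start = start;
    this.end = end;
  }
}

final class LimitedTokenCache {
  private LimitedTokenCache() {}

  /** Whitespace-tokenizes text, caching tokens only while they start before maxChars. */
  static List<Tok> cache(String text, int maxChars) {
    List<Tok> cached = new ArrayList<Tok>();
    int i = 0;
    int n = text.length();
    while (i < n) {
      // Skip the whitespace between tokens.
      while (i < n && Character.isWhitespace(text.charAt(i))) {
        i++;
      }
      // Stop consuming input once the next token would start at or past the limit.
      if (i >= n || i >= maxChars) {
        break;
      }
      int start = i;
      while (i < n && !Character.isWhitespace(text.charAt(i))) {
        i++;
      }
      cached.add(new Tok(text.substring(start, i), start, i));
    }
    return cached;
  }
}
```

By design, a token that starts before the limit is kept whole even if it ends past it. Dropping such a token instead would be an equally reasonable choice.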


[jira] Updated: (SOLR-2390) Performance of usePhraseHighlighter is terrible on very large Documents, regardless of hl.maxDocCharsToAnalyze

2011-03-04 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-2390:
--

Fix Version/s: (was: 3.1)
   3.2

> Performance of usePhraseHighlighter is terrible on very large Documents, 
> regardless of hl.maxDocCharsToAnalyze
> --
>
> Key: SOLR-2390
> URL: https://issues.apache.org/jira/browse/SOLR-2390
> Project: Solr
>  Issue Type: Bug
>  Components: highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 3.2, 4.0
>
>
> There is a large performance bug here.


[jira] Commented: (LUCENE-2939) Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

2011-03-04 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002604#comment-13002604
 ] 

Grant Ingersoll commented on LUCENE-2939:
-

I think Robert's right, we should not have shoved this in at the last minute, 
even though it is a pretty big issue for those doing highlighting of larger 
documents.  I'd say we just mark it as 3.1.1 or 3.2.

> Highlighter should try and use maxDocCharsToAnalyze in 
> WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as 
> when using CachingTokenStream
> 
>
> Key: LUCENE-2939
> URL: https://issues.apache.org/jira/browse/LUCENE-2939
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2939.patch, LUCENE-2939.patch, LUCENE-2939.patch
>
>
> Huge documents can be drastically slower than they need to be because the 
> entire field is added to the memory index. This cost can be greatly reduced 
> in many cases if we try to respect maxDocCharsToAnalyze. Things can be 
> improved even further by respecting this setting with CachingTokenStream.


[jira] Commented: (LUCENE-2939) Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

2011-03-04 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002579#comment-13002579
 ] 

Grant Ingersoll commented on LUCENE-2939:
-

I'm OK either way, but it does seem like a pretty big performance bug.

> Highlighter should try and use maxDocCharsToAnalyze in 
> WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as 
> when using CachingTokenStream
> 
>
> Key: LUCENE-2939
> URL: https://issues.apache.org/jira/browse/LUCENE-2939
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2939.patch, LUCENE-2939.patch, LUCENE-2939.patch
>
>
> Huge documents can be drastically slower than they need to be because the 
> entire field is added to the memory index. This cost can be greatly reduced 
> in many cases if we try to respect maxDocCharsToAnalyze. Things can be 
> improved even further by respecting this setting with CachingTokenStream.


[jira] Updated: (SOLR-2390) Performance of usePhraseHighlighter is terrible on very large Documents, regardless of hl.maxDocCharsToAnalyze

2011-03-04 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-2390:
--

Fix Version/s: (was: 3.2)
   3.1

> Performance of usePhraseHighlighter is terrible on very large Documents, 
> regardless of hl.maxDocCharsToAnalyze
> --
>
> Key: SOLR-2390
> URL: https://issues.apache.org/jira/browse/SOLR-2390
> Project: Solr
>  Issue Type: Bug
>  Components: highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 3.1, 4.0
>
>
> There is a large performance bug here.


3.1 help

2011-03-04 Thread Grant Ingersoll

I see a few issues left, is there anything else we need help on besides the 
usual testing?

-Grant

[jira] Updated: (LUCENE-2939) Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

2011-03-04 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2939:


Lucene Fields:   (was: [New])
Fix Version/s: (was: 3.2)
   3.1

> Highlighter should try and use maxDocCharsToAnalyze in 
> WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as 
> when using CachingTokenStream
> 
>
> Key: LUCENE-2939
> URL: https://issues.apache.org/jira/browse/LUCENE-2939
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2939.patch, LUCENE-2939.patch, LUCENE-2939.patch
>
>
> Huge documents can be drastically slower than they need to be because the 
> entire field is added to the memory index. This cost can be greatly reduced 
> in many cases if we try to respect maxDocCharsToAnalyze. Things can be 
> improved even further by respecting this setting with CachingTokenStream.


[jira] Commented: (LUCENE-2939) Highlighter should try and use maxDocCharsToAnalyze in WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as when using CachingTokenStream

2011-03-03 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002394#comment-13002394
 ] 

Grant Ingersoll commented on LUCENE-2939:
-

I can backport if you want.

> Highlighter should try and use maxDocCharsToAnalyze in 
> WeightedSpanTermExtractor when adding a new field to MemoryIndex as well as 
> when using CachingTokenStream
> 
>
> Key: LUCENE-2939
> URL: https://issues.apache.org/jira/browse/LUCENE-2939
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-2939.patch, LUCENE-2939.patch
>
>
> Huge documents can be drastically slower than they need to be because the 
> entire field is added to the memory index. This cost can be greatly reduced 
> in many cases if we try to respect maxDocCharsToAnalyze. Things can be 
> improved even further by respecting this setting with CachingTokenStream.


[jira] Created: (LUCENE-2949) FastVectorHighlighter FieldTermStack could likely benefit from using TermVectorMapper

2011-03-03 Thread Grant Ingersoll (JIRA)

FastVectorHighlighter FieldTermStack could likely benefit from using 
TermVectorMapper
-

 Key: LUCENE-2949
 URL: https://issues.apache.org/jira/browse/LUCENE-2949
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.0.3, 4.0
Reporter: Grant Ingersoll
Priority: Minor
 Fix For: 3.2, 4.0


Based on my reading of the FieldTermStack constructor that loads the vector 
from disk, we could probably save a bunch of time and memory by using the 
TermVectorMapper callback mechanism instead of materializing the full array of 
terms into memory and then throwing most of them out.
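The following plain-Java sketch shows the callback pattern being proposed. `TermVisitor` and `StoredTermVector` are hypothetical stand-ins, not the `TermVectorMapper` API. The stored vector is an in-memory array only so that the example runs. The point is that the consumer receives terms one at a time and keeps only the ones it needs, instead of first materializing its own full array of every term.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical callback, loosely modeled on the mapper idea: one call per stored term. */
interface TermVisitor {
  void visit(String term, int freq);
}

/** Hypothetical stand-in for a stored term vector (terms plus frequencies). */
final class StoredTermVector {
  private final String[] terms;
  private final int[] freqs;

  StoredTermVector(String[] terms, int[] freqs) {
    if (terms.length != freqs.length) {
      throw new IllegalArgumentException("terms and freqs must have the same length");
    }
    this.terms = terms.clone();
    this.freqs = freqs.clone();
  }

  /** Pushes each term to the visitor instead of handing back a full array copy. */
  void accept(TermVisitor visitor) {
    for (int i = 0; i < terms.length; i++) {
      visitor.visit(terms[i], freqs[i]);
    }
  }

  /** Keeps only the wanted terms; everything else is dropped as it streams by. */
  static Map<String, Integer> collect(StoredTermVector tv, final Set<String> wanted) {
    final Map<String, Integer> kept = new LinkedHashMap<String, Integer>();
    tv.accept(new TermVisitor() {
      @Override
      public void visit(String term, int freq) {
        if (wanted.contains(term)) {
          kept.put(term, freq);
        }
      }
    });
    return kept;
  }
}
```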


Re: Unintuitive NGramTokenizer behavior

2011-03-03 Thread Grant Ingersoll


On Mar 3, 2011, at 1:10 PM, Robert Muir wrote:

> On Thu, Mar 3, 2011 at 1:00 PM, Grant Ingersoll  wrote:
>> 
>> Unfortunately, I'm not following your reasons for doing it.  I won't say I'm
>> against it at this point, but I don't see a compelling reason to change it
>> either so if you could clarify that would be great.  It's been around for
>> quite some time in its current form and I think fits most people's
>> expectations of ngrams.
> 
> Grant, I'm sorry, but I couldn't disagree more.
> 
> There are many variations on ngram tokenization (word-internal,
> word-spanning, skipgrams), besides allowing flexibility for what
> should be a "word character" and what should not be (e.g.
> punctuation), and how to handle the specifics of these.
> 
> But our n-gram tokenizer is *UNARGUABLY* completely broken for these reasons:
> 1. it discards anything after the first 1024 code units of the document.
> 2. it uses partial characters (UTF-16 code units) as its fundamental
> measure, potentially creating lots of invalid unicode.
> 3. it forms n-grams in the wrong order, contributing to #1. I
> explained this in LUCENE-1224.

Sure, but those are ancillary to the whitespace question that was asked about.

> 
> It's for these reasons that I suggested we completely rewrite it... people
> that are just indexing English documents with < 1024 chars per
> document and don't care about these things can use
> ClassicNGramTokenizer.


Fair enough.  Always open to improvements.
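To illustrate points 1 and 2 above, the following plain-Java sketch uses a hypothetical `CodePointNGrams` class, not Lucene's tokenizer. It measures grams in code points rather than UTF-16 code units and processes the entire input, with no 1024-unit cap. It emits every gram size at one start position before moving to the next position. That position-by-position ordering would let a tokenizer work incrementally, but whether it matches the ordering LUCENE-1224 proposes is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointNGrams {
  private CodePointNGrams() {}

  /** All grams of minGram..maxGram code points, taken over the entire input. */
  static List<String> ngrams(String text, int minGram, int maxGram) {
    if (minGram < 1 || maxGram < minGram) {
      throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
    }
    int[] cps = text.codePoints().toArray(); // whole characters, never half a surrogate pair
    List<String> grams = new ArrayList<>();
    for (int start = 0; start < cps.length; start++) {
      // All gram sizes at this position before moving on to the next position.
      for (int len = minGram; len <= maxGram && start + len <= cps.length; len++) {
        grams.add(new String(cps, start, len));
      }
    }
    return grams;
  }
}
```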

Re: Unintuitive NGramTokenizer behavior

2011-03-03 Thread Grant Ingersoll

On Mar 3, 2011, at 9:36 AM, David Byrne wrote:

> I have a minor quibble about Lucene's NGramTokenizer.
> 
> Before I tokenize my strings, I am padding them with white space:
> 
> String foobar = " " + foo + " " + bar + " ";
> 
> When constructing term vectors from ngrams, this strategy has a couple 
> benefits.  First, it places special emphasis on the start and end of a 
> word.  Second, it improves the similarity between phrases with swapped words. 
>  " foo bar " matches " bar foo " more closely than "foo bar" matches "bar 
> foo".
> 
> 

I'm not following this argument.  What does the extra whitespace give you here? 
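To make the quoted claim concrete, the following sketch compares trigram-set overlap with and without padding. The choice of trigrams and of Jaccard similarity (size of the intersection divided by size of the union) is an assumption; the original message names neither. Grams are built from the raw string, so the padding spaces are kept rather than trimmed.

```java
import java.util.HashSet;
import java.util.Set;

final class PaddedNGramSimilarity {
  private PaddedNGramSimilarity() {}

  /** Set of character n-grams of s; whitespace is kept, not trimmed. */
  static Set<String> ngrams(String s, int n) {
    Set<String> grams = new HashSet<>();
    for (int i = 0; i + n <= s.length(); i++) {
      grams.add(s.substring(i, i + n));
    }
    return grams;
  }

  /** Jaccard similarity: size of the intersection divided by size of the union. */
  static double jaccard(Set<String> a, Set<String> b) {
    Set<String> intersection = new HashSet<>(a);
    intersection.retainAll(b);
    Set<String> union = new HashSet<>(a);
    union.addAll(b);
    return union.isEmpty() ? 1.0 : (double) intersection.size() / union.size();
  }

  public static void main(String[] args) {
    System.out.println(jaccard(ngrams("foo bar", 3), ngrams("bar foo", 3)));
    System.out.println(jaccard(ngrams(" foo bar ", 3), ngrams(" bar foo ", 3)));
  }
}
```

Running `main` prints 0.25 for the unpadded pair and 0.75 for the padded pair. Padding turns boundary grams such as " fo", "oo ", " ba", and "ar " into grams shared by both phrases regardless of word order.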

> The problem is that Lucene's NGramTokenizer trims whitespace.  This forces me 
> to do some preprocessing on my strings before I can tokenize them:
> 
> foobar = foobar.replace(" ", "$"); //arbitrary char not in my data
> 
> 

I'm confused.  If you are padding them up front, then why don't you just do the 
arbitrary char trick then?  Where is the extra processing?

> This is undocumented, so users won't realize their strings are being 
> trim()'ed, unless they look through the source, or examine the tokens 
> manually.
> 
> 

It may be undocumented, but I think it is pretty standard behavior that users 
expect from a tokenizer.

> I am proposing NGramTokenizer should be changed to respect whitespace.  Is 
> there a compelling reason against this?
> 
> 

Unfortunately, I'm not following your reasons for doing it.  I won't say I'm 
against it at this point, but I don't see a compelling reason to change it 
either so if you could clarify that would be great.  It's been around for quite 
some time in its current form and I think fits most people's expectations of 
ngrams.

-Grant

[jira] Commented: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-02 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001596#comment-13001596
 ] 

Grant Ingersoll commented on LUCENE-2945:
-

bq. but there is no test added for the added hashCode() and equals().

Note, QueryUtils has methods for that.

I will review soon.
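The following is a minimal sketch of the kind of contract check such a test makes. `EqualsContract.check` is a hypothetical helper written for illustration. It is not the `QueryUtils` API, whose method names and signatures are not reproduced here.

```java
final class EqualsContract {
  private EqualsContract() {}

  /**
   * Hypothetical test helper: a and b must be equal but distinct copies,
   * and different must be unequal to both.
   */
  static void check(Object a, Object b, Object different) {
    require(a.equals(a), "reflexive");
    require(a.equals(b) && b.equals(a), "symmetric");
    require(a.hashCode() == b.hashCode(), "equal objects need equal hash codes");
    require(!a.equals(null), "never equal to null");
    require(!a.equals(different) && !different.equals(a), "distinct objects compare unequal");
  }

  private static void require(boolean ok, String rule) {
    if (!ok) {
      throw new AssertionError("equals/hashCode contract violated: " + rule);
    }
  }
}
```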

> Surround Query doesn't properly handle equals/hashcode
> --
>
> Key: LUCENE-2945
> URL: https://issues.apache.org/jira/browse/LUCENE-2945
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.0.3, 4.0
>    Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2945-partial1.patch, LUCENE-2945.patch
>
>
> In looking at using the surround queries with Solr, I am hitting issues 
> caused by collisions due to equals/hashcode not being implemented on the 
> anonymous inner classes that are created by things like DistanceQuery (branch 
> 3.x, near line 76)


[jira] Updated: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-02 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-2945:


Fix Version/s: 4.0
   3.1
 Assignee: Grant Ingersoll

> Surround Query doesn't properly handle equals/hashcode
> --
>
> Key: LUCENE-2945
> URL: https://issues.apache.org/jira/browse/LUCENE-2945
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.0.3, 4.0
>    Reporter: Grant Ingersoll
>    Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-2945-partial1.patch
>
>
> In looking at using the surround queries with Solr, I am hitting issues 
> caused by collisions due to equals/hashcode not being implemented on the 
> anonymous inner classes that are created by things like DistanceQuery (branch 
> 3.x, near line 76)


[jira] Resolved: (SOLR-2385) Backport latest /browse improvements to branch_3x

2011-03-02 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-2385.
---

Resolution: Fixed

> Backport latest /browse improvements to branch_3x
> -
>
> Key: SOLR-2385
> URL: https://issues.apache.org/jira/browse/SOLR-2385
> Project: Solr
>  Issue Type: Improvement
>  Components: Response Writers
>Affects Versions: 3.1
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>  Labels: velocity
> Fix For: 3.1
>
> Attachments: SOLR-2385.patch, SOLR-2385.patch
>
>
> There are a lot of improvements in the trunk Velocity GUI that would also 
> work well for 3.1.


[jira] Commented: (LUCENE-2945) Surround Query doesn't properly handle equals/hashcode

2011-03-02 Thread Grant Ingersoll (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001485#comment-13001485
 ] 

Grant Ingersoll commented on LUCENE-2945:
-

What about the anonymous inner classes that actually construct the Query?  I 
think those are the primary cause of the problem.

> Surround Query doesn't properly handle equals/hashcode
> --
>
> Key: LUCENE-2945
> URL: https://issues.apache.org/jira/browse/LUCENE-2945
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 3.0.3, 4.0
>    Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-2945-partial1.patch
>
>
> In looking at using the surround queries with Solr, I am hitting issues 
> caused by collisions due to equals/hashcode not being implemented on the 
> anonymous inner classes that are created by things like DistanceQuery (branch 
> 3.x, near line 76)


[jira] Reopened: (SOLR-2385) Backport latest /browse improvements to branch_3x

2011-03-02 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reopened SOLR-2385:
---


> Backport latest /browse improvements to branch_3x
> -
>
> Key: SOLR-2385
> URL: https://issues.apache.org/jira/browse/SOLR-2385
> Project: Solr
>  Issue Type: Improvement
>  Components: Response Writers
>Affects Versions: 3.1
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>  Labels: velocity
> Fix For: 3.1
>
> Attachments: SOLR-2385.patch, SOLR-2385.patch
>
>
> There are a lot of improvements in the trunk Velocity GUI that would also 
> work well for 3.1.


[jira] Resolved: (SOLR-2398) velocity / Solritas is throwing NumberFormatException if using Range Facets for Price

2011-03-02 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-2398.
---

   Resolution: Fixed
Fix Version/s: 4.0
   3.1

> velocity / Solritas is throwing NumberFormatException if using Range Facets 
> for Price
> -
>
> Key: SOLR-2398
> URL: https://issues.apache.org/jira/browse/SOLR-2398
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.1, 4.0
>Reporter: Bernd Fehling
>Assignee: Grant Ingersoll
>Priority: Trivial
> Fix For: 3.1, 4.0
>
> Attachments: VM_global_library.vm.patch
>
>
> velocity / Solritas is throwing NumberFormatException if using Range Facets 
> for Price
> of solr-trunk/apache-solr-4.0-2011-03-01_08-08-52/example/solr/conf/velocity/.
> This is due to a wrong format of range query in VM_global_library.vm.


[jira] Resolved: (SOLR-2385) Backport latest /browse improvements to branch_3x

2011-03-02 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-2385.
---

Resolution: Fixed

> Backport latest /browse improvements to branch_3x
> -
>
> Key: SOLR-2385
> URL: https://issues.apache.org/jira/browse/SOLR-2385
> Project: Solr
>  Issue Type: Improvement
>  Components: Response Writers
>Affects Versions: 3.1
>Reporter: Jan Høydahl
>Assignee: Grant Ingersoll
>  Labels: velocity
> Fix For: 3.1
>
> Attachments: SOLR-2385.patch, SOLR-2385.patch
>
>
> There are a lot of improvements in the trunk Velocity GUI that would also 
> work well for 3.1.


[jira] Updated: (SOLR-2178) Use the Velocity UI as the default tutorial example

2011-03-02 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-2178:
--

Fix Version/s: 4.0

> Use the Velocity UI as the default tutorial example
> ---
>
> Key: SOLR-2178
> URL: https://issues.apache.org/jira/browse/SOLR-2178
> Project: Solr
>  Issue Type: Improvement
>    Reporter: Grant Ingersoll
>    Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 4.0
>
>
> The /browse example in solr/example is much nicer to look at and work with, 
> we should convert the tutorial over to use it so as to present a nicer view 
> of Solr's capabilities.

