[jira] Issue Comment Edited: (LUCENE-1567) New flexible query parser

2009-06-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720379#action_12720379
 ] 

Grant Ingersoll edited comment on LUCENE-1567 at 6/16/09 3:16 PM:
--

I need an MD5/SHA1 hash 
(http://incubator.apache.org/ip-clearance/ip-clearance-template.html) for the 
exact code listed in the software grant.  Also include the version number of 
the software used to create the hash. 

Please also upload that code as a tarball on this issue.  No need to worry 
about the patches for now. 

See https://issues.apache.org/jira/browse/INCUBATOR-77 for an example.

  was (Author: gsingers):
I need an MD5/SHA1 hash 
(http://incubator.apache.org/ip-clearance/ip-clearance-template.html) for the 
exact code listed in the software grant.  Also include the version number of 
the software used to create the hash.  

See https://issues.apache.org/jira/browse/INCUBATOR-77 for example.
  
 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Grant Ingersoll
 Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch, 
 QueryParser_restructure_meetup_june2009_v2.pdf


 From the New flexible query parser thread by Michael Busch:
 In my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is an Apache committer and PMC member
 on the Tuscany project and getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was to separate the syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
    AND
   /   \
  a     b
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. It is in fact a
 configurable chain of processors. Each processor can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 3. The third layer is also a configurable chain of builders, which
 transform the QueryNodeTree into Lucene Query objects.
 Furthermore the query parser uses flexible configuration objects, which
 are based on AttributeSource/Attribute. It also uses message classes that
 allow attaching resource bundles. This makes it possible to translate
 messages, which is an important feature of a query parser.
 This design allows us to develop different query syntaxes very quickly.
 Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
 underlying processors and builders in a few days. We now have a 100%
 compatible Lucene query parser, which means the syntax is identical and
 all query parser test cases pass on the new one too using a wrapper.
 Recent posts show that there is demand for query syntax improvements,
 e.g. improved range query syntax or operator precedence. There are
 already different QP implementations in Lucene+contrib; however, I think
 we did not keep them all up to date and in sync. This is not too
 surprising, because usually when fixes and changes are made to the main
 query parser, people don't make the corresponding changes in the contrib
 parsers. (I'm guilty here too.)
 With this new architecture it will be much easier to maintain different
 query syntaxes, as the actual code for the first layer is not very much.
 All syntaxes would benefit from patches and improvements we make to the
 underlying layers, which will make supporting different syntaxes much
 more manageable.
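
As an illustration of the three-layer flow described above, here is a minimal, hypothetical Java sketch; the interface and class names are assumptions, not the actual contributed API:

{code}
// Hypothetical sketch of the three layers; real names in the patch may differ.
interface QueryNode { }                       // node in the QueryNodeTree

interface SyntaxParser {                      // layer 1: query text -> tree
  QueryNode parse(String queryText);
}

interface QueryNodeProcessor {                // layer 2: tree -> (rewritten) tree
  QueryNode process(QueryNode tree);
}

interface QueryBuilder {                      // layer 3: tree -> Lucene Query
  org.apache.lucene.search.Query build(QueryNode tree);
}

class FlexibleQueryParserSketch {
  private final SyntaxParser parser;
  private final java.util.List<QueryNodeProcessor> processors;
  private final QueryBuilder builder;

  FlexibleQueryParserSketch(SyntaxParser parser,
                            java.util.List<QueryNodeProcessor> processors,
                            QueryBuilder builder) {
    this.parser = parser;
    this.processors = processors;
    this.builder = builder;
  }

  org.apache.lucene.search.Query parse(String queryText) {
    QueryNode tree = parser.parse(queryText);   // layer 1: syntax only
    for (QueryNodeProcessor p : processors) {   // layer 2: configurable chain
      tree = p.process(tree);
    }
    return builder.build(tree);                 // layer 3: build Lucene Query
  }
}
{code}

A new syntax then only requires a new layer-1 parser; the processor chain and builders are reused unchanged.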

[jira] Commented: (LUCENE-1567) New flexible query parser

2009-06-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720386#action_12720386
 ] 

Grant Ingersoll commented on LUCENE-1567:
-

OK, the only outstanding items for clearance are:
1. tarball and hash
2. Vote on Incubator for clearance.





[jira] Commented: (LUCENE-1567) New flexible query parser

2009-06-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720417#action_12720417
 ] 

Grant Ingersoll commented on LUCENE-1567:
-

Commit is separate from IP Clearance and you can't commit until the clearance 
is accepted.

 I just need the tarball for the code that was referenced in the software grant, 
along with a hash of it.  In the grant, you have a file directory listing 
describing the code.  Take that file listing, tar it up and run md5 on it.
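
For illustration, a minimal Java sketch of computing the MD5 hex digest of the tarball; the file name is a placeholder, and running an md5/md5sum command-line tool on the file gives the same result:

{code}
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public class TarballHash {
  public static void main(String[] args) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    // args[0] is the tarball, e.g. flex-queryparser.tar.gz (placeholder name)
    InputStream in = new FileInputStream(args[0]);
    try {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md5.update(buf, 0, n);     // feed the file through the digest
      }
    } finally {
      in.close();
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b));
    }
    System.out.println(hex);       // the hash to record on the clearance page
  }
}
{code}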




[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720037#action_12720037
 ] 

Grant Ingersoll commented on MAHOUT-65:
---

Hey Jeff,

Minor request: it seems like you have some sort of reformatting going on that 
causes the patch to contain all kinds of formatting changes, which makes it a 
lot harder to see the actual changes.

In thinking about this a little bit more, is there a way to just name a vector 
and a row in a Matrix?  All I really want right now is to be able to track 
which Vector is associated with which document, and I could do this by setting 
a unique name on the Vector and having that serialized.  The name itself could 
be stored in the first entry (for SparseVector, it would have to coincide with 
the sCardinality stuff).

I'm fine with all the other label stuff, too.

Also, the patch doesn't apply because of the JSONVectorAdapter.

 Add Element Labels to Vectors and Matrices
 --

 Key: MAHOUT-65
 URL: https://issues.apache.org/jira/browse/MAHOUT-65
 Project: Mahout
  Issue Type: New Feature
  Components: Matrix
Affects Versions: 0.1
Reporter: Jeff Eastman
Assignee: Jeff Eastman
 Attachments: MAHOUT-65.patch, MAHOUT-65b.patch, MAHOUT-65c.patch


 Many applications can benefit by accessing elements in vectors and matrices 
 using String labels in addition to numeric indices. Investigate adding such a 
 capability.




[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720045#action_12720045
 ] 

Grant Ingersoll commented on MAHOUT-65:
---

Is the only way to add bindings by setting the map?  Seems like
{code}
set(String label, int index, double value) 
{code}
would be useful.  Also, if bindings is null, it could create a new map.

Also, I'll see if I can work up the name thing.
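
A minimal sketch of what such a method might look like, assuming a lazily created label-to-index map; the names here are illustrative, not the actual Mahout API:

{code}
import java.util.HashMap;
import java.util.Map;

public abstract class LabeledVectorSketch {
  private Map<String, Integer> bindings;  // label -> index, created lazily

  protected abstract void setQuick(int index, double value);
  protected abstract double getQuick(int index);

  // Bind the label and set the value in one call, creating the map on
  // demand so unlabeled vectors pay no overhead.
  public void set(String label, int index, double value) {
    if (bindings == null) {
      bindings = new HashMap<String, Integer>();
    }
    bindings.put(label, index);
    setQuick(index, value);
  }

  public double get(String label) {
    Integer index = (bindings == null) ? null : bindings.get(label);
    if (index == null) {
      throw new IllegalArgumentException("no binding for label: " + label);
    }
    return getQuick(index.intValue());
  }
}
{code}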





[jira] Resolved: (MAHOUT-134) [PATCH] Cluster decode error handling

2009-06-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved MAHOUT-134.


   Resolution: Fixed
Fix Version/s: 0.2

Committed revision 785197.

 [PATCH] Cluster decode error handling
 -

 Key: MAHOUT-134
 URL: https://issues.apache.org/jira/browse/MAHOUT-134
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.2
Reporter: Robert Burrell Donkin
 Fix For: 0.2

 Attachments: mahout-cluster-format-error.patch


 ATM the javadocs are unclear as to whether null is an acceptable return value, 
 and callers do not null-check the return value. However, the implementation 
 may return null or throw runtime exceptions when the format is not correct. 
 This makes it hard to diagnose when there's a problem with the format.




[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720161#action_12720161
 ] 

Grant Ingersoll commented on MAHOUT-65:
---

OK, I will work up a patch for the name thing on a vector, unless you think 
that can be handled through the bindings thing.  Basically, I think we need a 
way to name a vector and have it carried through.




[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720199#action_12720199
 ] 

Grant Ingersoll commented on MAHOUT-65:
---

That works for Matrix.  For Vector, I was thinking, probably naively, we simply 
need to be able to add a name attribute.  For MAHOUT-130, I just dumped the 
column/cell labels out separately.  Like you said, I'm not sure we want all of 
that serialized. 




[jira] Updated: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-65:
--

Attachment: MAHOUT-65-name.patch

Adds a name attribute.  Also adds some docs on equals, plus a strict 
equivalence notion that can be useful if one cares about the implementation.






[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720302#action_12720302
 ] 

Grant Ingersoll commented on MAHOUT-65:
---

Jeff,

One comment on the GSON serialization stuff.  It can get pretty verbose storing 
the class name repeatedly, although I do realize it's a drop in the bucket 
compared to the vector itself.  Perhaps we could do like Solr does: if some 
abbreviated form is present where a class name is required (maybe 'DV' or 
'SV'), it could know to use those forms; otherwise it can do the full class 
lookup.  Might just save a little bit on the size of a serialized file, which 
I imagine can add up.
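
A hypothetical sketch of that abbreviation idea; the 'DV'/'SV' keys and the class names are assumptions, not anything in the patch:

{code}
import java.util.HashMap;
import java.util.Map;

public class VectorTypeResolver {
  private static final Map<String, String> ABBREVIATIONS =
      new HashMap<String, String>();
  static {
    // assumed short keys and class names, for illustration only
    ABBREVIATIONS.put("DV", "org.apache.mahout.matrix.DenseVector");
    ABBREVIATIONS.put("SV", "org.apache.mahout.matrix.SparseVector");
  }

  // Resolve an abbreviated type key if present, else fall back to treating
  // the key as a fully qualified class name.
  public static Class<?> resolve(String typeKey) throws ClassNotFoundException {
    String className = ABBREVIATIONS.get(typeKey);
    return Class.forName(className != null ? className : typeKey);
  }
}
{code}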




[jira] Updated: (MAHOUT-131) Vector improvements

2009-06-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-131:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Vector improvements
 ---

 Key: MAHOUT-131
 URL: https://issues.apache.org/jira/browse/MAHOUT-131
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 0.2

 Attachments: MAHOUT-131.patch, MAHOUT-131.patch


 Vector and its implementations could use a few things:
 1. DenseVector should implement equals and hashCode similar to SparseVector
 2. The VectorView asFormatString() is not compatible with actually recreating 
 any type of vector.  
 3. Add tests to VectorTest that assert that decodeFormat/asFormatString is 
 able to do a round trip.
 4. Add static AbstractVector.equivalent(Vector, Vector) that takes in two 
 vectors and compares them for equality, regardless of their implementation.




[jira] Updated: (MAHOUT-126) Prepare document vectors from the text

2009-06-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-126:
---

Fix Version/s: 0.2
Affects Version/s: 0.2
   Status: Patch Available  (was: Open)

 Prepare document vectors from the text
 --

 Key: MAHOUT-126
 URL: https://issues.apache.org/jira/browse/MAHOUT-126
 Project: Mahout
  Issue Type: New Feature
Affects Versions: 0.2
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Fix For: 0.2

 Attachments: mahout-126-benson.patch, MAHOUT-126.patch, 
 MAHOUT-126.patch, MAHOUT-126.patch


 Clustering algorithms presently take the document vectors as input.  
 Generating these document vectors from the text can be broken into two tasks. 
 1. Create a Lucene index of the input plain-text documents. 
 2. From the index, generate the (sparse) document vectors with weights set to 
 the TF-IDF values of the terms. With the Lucene index, these values can be 
 calculated very easily. 
 Presently, I have created two separate utilities, which could possibly be 
 invoked from another class.
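
A rough Java sketch of step 2, assuming the field was indexed with term vectors enabled; this is illustrative, not the patch's actual utility:

{code}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermFreqVector;

public class TfIdfSketch {
  // Compute TF-IDF weights for one document's terms from its term vector.
  public static double[] weights(IndexReader reader, int docId, String field)
      throws IOException {
    TermFreqVector tfv = reader.getTermFreqVector(docId, field);
    if (tfv == null) {
      throw new IllegalStateException("field was not indexed with term vectors");
    }
    String[] terms = tfv.getTerms();
    int[] freqs = tfv.getTermFrequencies();
    int numDocs = reader.numDocs();
    double[] weights = new double[terms.length];
    for (int i = 0; i < terms.length; i++) {
      int df = reader.docFreq(new Term(field, terms[i]));        // doc frequency
      double idf = Math.log(numDocs / (double) (df + 1)) + 1.0;  // smoothed idf
      weights[i] = freqs[i] * idf;                               // tf * idf
    }
    return weights;
  }
}
{code}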




[jira] Updated: (MAHOUT-126) Prepare document vectors from the text

2009-06-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-126:
---

Attachment: MAHOUT-126.patch

Here's a version that is brought up to trunk and adds in MAHOUT-65-name.patch 
to allow for labeling the vectors.

Next, I'm going to run the output through some clustering.




[jira] Updated: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-65:
--

Attachment: MAHOUT-65-name.patch

Implement hashCode better; require equals and hashCode as part of the 
interface, same as java.util.List.




[jira] Updated: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-65:
--

Attachment: MAHOUT-65-name.patch

How about a version where the tests actually pass?  Will commit shortly.




[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720345#action_12720345
 ] 

Grant Ingersoll commented on MAHOUT-65:
---

Committed the name stuff: 
Committed revision 785386.




[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-06-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719524#action_12719524
 ] 

Grant Ingersoll commented on MAHOUT-126:


Yeah, still needs the labeling stuff.

As for weights, you should be able to pass in a Weight object.  See the TFIDF 
implementation.  Likely still needs some work.

As for the Lucene error, I thought I had updated the Lucene version to be 
2.9-dev, which I believe makes this all right.




[jira] Assigned: (MAHOUT-132) [PATCH] Push magic names into public constants

2009-06-14 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned MAHOUT-132:
--

Assignee: Grant Ingersoll

 [PATCH] Push magic names into public constants
 --

 Key: MAHOUT-132
 URL: https://issues.apache.org/jira/browse/MAHOUT-132
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Affects Versions: 0.2
Reporter: Robert Burrell Donkin
Assignee: Grant Ingersoll
 Attachments: mahout-constants.patch


 ATM the examples (and any similar code) need to hard-code magic strings for 
 directories. This makes the code more fragile and more difficult to 
 understand.




[jira] Resolved: (MAHOUT-132) [PATCH] Push magic names into public constants

2009-06-14 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved MAHOUT-132.


   Resolution: Fixed
Fix Version/s: 0.2

Committed revision 784640.




[jira] Updated: (MAHOUT-131) Vector improvements

2009-06-13 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-131:
---

Attachment: MAHOUT-131.patch

Updated patch implements equals/hashCode for all the Vectors and puts in 
various tests related to these issues.  It should now be the case that 
Vector.equals() acts just like List.equals(), namely that two vectors 
containing the same elements are equal regardless of the implementation.
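
A minimal sketch of that List-style contract, with assumed method names: two vectors are equal iff they have the same cardinality and the same value at every index, whatever the concrete class:

{code}
public abstract class AbstractVectorSketch {
  protected abstract int cardinality();
  protected abstract double getQuick(int index);

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof AbstractVectorSketch)) return false;
    AbstractVectorSketch that = (AbstractVectorSketch) o;
    if (cardinality() != that.cardinality()) return false;
    for (int i = 0; i < cardinality(); i++) {
      // element-wise comparison, independent of dense/sparse representation
      if (getQuick(i) != that.getQuick(i)) return false;
    }
    return true;
  }

  @Override
  public int hashCode() {
    // like java.util.List.hashCode(): order-dependent combination of elements
    int hash = 1;
    for (int i = 0; i < cardinality(); i++) {
      long bits = Double.doubleToLongBits(getQuick(i));
      hash = 31 * hash + (int) (bits ^ (bits >>> 32));
    }
    return hash;
  }
}
{code}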




[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-13 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719117#action_12719117
 ] 

Grant Ingersoll commented on MAHOUT-121:


bq. Would it be useful to take a shot at rewriting SparseVector to use this?

You could do that, or an alternate implementation.  Is there any case where one 
wouldn't want this?  Also, I wouldn't mind a little better name than 
FastIntDouble.  ;-)

 Speed up distance calculations for sparse vectors
 -

 Key: MAHOUT-121
 URL: https://issues.apache.org/jira/browse/MAHOUT-121
 Project: Mahout
  Issue Type: Improvement
  Components: Matrix
Reporter: Shashikant Kore
 Attachments: mahout-121.patch


 From my mail to the Mahout mailing list.
 I am working on clustering a dataset which has thousands of sparse vectors. 
 The complete dataset has a few tens of thousands of feature items, but each 
 vector has only a couple of hundred feature items. For this, there is an 
 optimization in distance calculation, a link to which I found in the archives 
 of the Mahout mailing list.
 http://lingpipe-blog.com/2009/03/12/speeding-up-k-means-clustering-algebra-sparse-vectors/
 I tried out this optimization.  The test setup had 2000 document vectors 
 with a few hundred items each.  I ran canopy generation with Euclidean 
 distance and t1, t2 values of 250 and 200.
  
 Current Canopy Generation: 28 min 15 sec.
 Canopy Generation with distance optimization: 1 min 38 sec.
 I know from experience that using Integer and Double objects instead of 
 primitives is computationally expensive. I changed the sparse vector 
 implementation to use the primitive collections from Trove [
 http://trove4j.sourceforge.net/ ].
 Distance optimization with Trove: 59 sec
 Current canopy generation with Trove: 21 min 55 sec
 To sum up, these two optimizations reduced cluster generation time by 97%.
 Currently, I have made the changes for Euclidean Distance, Canopy and KMeans. 
  
 Licensing of Trove seems to be an issue which needs to be addressed.
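
A minimal sketch of the distance trick from the linked post, using plain java.util maps rather than Trove: expand ||x - y||^2 into ||x||^2 - 2*x.y + ||y||^2, cache the squared norms once, and only iterate non-zero entries for the dot product. Names here are illustrative, not the actual Mahout classes:

{code}
import java.util.Map;

public class SparseEuclideanSketch {
  // squared L2 norm, iterating only the non-zero entries
  public static double normSquared(Map<Integer, Double> v) {
    double sum = 0.0;
    for (double x : v.values()) {
      sum += x * x;
    }
    return sum;
  }

  public static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
    // iterate the smaller map and probe the larger one
    if (a.size() > b.size()) {
      Map<Integer, Double> t = a; a = b; b = t;
    }
    double sum = 0.0;
    for (Map.Entry<Integer, Double> e : a.entrySet()) {
      Double y = b.get(e.getKey());
      if (y != null) {
        sum += e.getValue() * y;
      }
    }
    return sum;
  }

  // distance using cached squared norms: no scan over the union of dimensions
  public static double distance(Map<Integer, Double> a, double normA2,
                                Map<Integer, Double> b, double normB2) {
    // clamp at zero to guard against tiny negative values from rounding
    return Math.sqrt(Math.max(0.0, normA2 - 2.0 * dot(a, b) + normB2));
  }
}
{code}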




[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-13 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12719118#action_12719118
 ] 

Grant Ingersoll commented on MAHOUT-121:


Also, seems like we could split out the original two issues that Shashikant 
brought up, right?




[jira] Created: (LUCENE-1687) Remove ExtendedFieldCache by rolling functionality into FieldCache

2009-06-12 Thread Grant Ingersoll (JIRA)
Remove ExtendedFieldCache by rolling functionality into FieldCache
--

 Key: LUCENE-1687
 URL: https://issues.apache.org/jira/browse/LUCENE-1687
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 2.9


It is silly that we have ExtendedFieldCache.  It is a workaround for our 
supposed back-compatibility problem.  This patch will merge the 
ExtendedFieldCache interface into FieldCache, thereby breaking back 
compatibility but creating a much simpler API for FieldCache.




[jira] Commented: (LUCENE-1676) New Token filter for adding payloads in-stream

2009-06-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718815#action_12718815
 ] 

Grant Ingersoll commented on LUCENE-1676:
-

OK, I moved to contrib/CHANGES.  I'm going to commit this today.


 New Token filter for adding payloads in-stream
 

 Key: LUCENE-1676
 URL: https://issues.apache.org/jira/browse/LUCENE-1676
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1676.patch


 This TokenFilter is able to split a token based on a delimiter and use one 
 part as the token and the other part as a payload.  This allows someone to 
 include payloads inline with tokens (presumably set up by a pipeline ahead of 
 time).  An example is apropos.  Given a | delimiter, we could have a stream 
 that looks like:
 {quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ 
 dogs|NN{quote}
 In this case, this would produce tokens and payloads (assuming whitespace 
 tokenization):
 Token: the
 Payload: null
 Token: quick
 Payload: JJ
 Token: red
 Payload: JJ
 and so on.
 This patch will also support pluggable encoders for the payloads, so it can 
 convert from the character array to byte arrays as appropriate.
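
A hedged sketch of the splitting idea, using the older Token-based TokenStream API; the actual filter in the attached patch may differ:

{code}
import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Payload;

class DelimitedPayloadFilterSketch extends TokenFilter {
  private final char delimiter;

  DelimitedPayloadFilterSketch(TokenStream input, char delimiter) {
    super(input);
    this.delimiter = delimiter;
  }

  @Override
  public Token next(Token reusableToken) throws IOException {
    Token token = input.next(reusableToken);
    if (token == null) return null;
    char[] buf = token.termBuffer();
    int len = token.termLength();
    for (int i = 0; i < len; i++) {
      if (buf[i] == delimiter) {
        // everything after the delimiter becomes the payload; this sketch
        // goes through a String (a real encoder would be pluggable)
        String payload = new String(buf, i + 1, len - i - 1);
        token.setPayload(new Payload(payload.getBytes()));
        // ...and the token is truncated to the part before the delimiter
        token.setTermLength(i);
        break;
      }
    }
    return token;
  }
}
{code}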




[jira] Commented: (LUCENE-1676) New Token filter for adding payloads in-stream

2009-06-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718817#action_12718817
 ] 

Grant Ingersoll commented on LUCENE-1676:
-

BTW, I'm curious if people have a better way to convert from char[] to byte[] 
for encoding the payloads (see FloatEncoder), other than going through Strings.
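
One possibility, sketched here as a suggestion rather than anything in the patch: encode a char[] slice with java.nio, avoiding the intermediate String:

{code}
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

class CharsToBytes {
  // Encode a char[] slice to byte[] via a Charset, with no String created.
  static byte[] encode(char[] chars, int offset, int length) {
    ByteBuffer bytes = Charset.forName("UTF-8")
        .encode(CharBuffer.wrap(chars, offset, length));
    byte[] out = new byte[bytes.remaining()];
    bytes.get(out);
    return out;
  }
}
{code}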




[jira] Commented: (LUCENE-1687) Remove ExtendedFieldCache by rolling functionality into FieldCache

2009-06-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718845#action_12718845
 ] 

Grant Ingersoll commented on LUCENE-1687:
-

True, but you know how we are about adding methods to an interface!

 Remove ExtendedFieldCache by rolling functionality into FieldCache
 --

 Key: LUCENE-1687
 URL: https://issues.apache.org/jira/browse/LUCENE-1687
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 2.9


 It is silly that we have ExtendedFieldCache.  It is a workaround to our 
 supposed back compatibility problem.  This patch will merge the 
 ExtendedFieldCache interface into FieldCache, thereby breaking back 
 compatibility, but creating a much simpler API for FieldCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1676) New Token filter for adding payloads in-stream

2009-06-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718943#action_12718943
 ] 

Grant Ingersoll commented on LUCENE-1676:
-

I grabbed Apache Harmony's Integer.parseInt() code and converted it to take in 
a char array, which should speed up the IntegerEncoder.  However, the 
Float.parseFloat implementation relies on some constructs that are not 
available in JDK 1.4, so that one is going to have to stay as it is.

The main problem lies in the reliance on HexStringParser 
(https://svn.apache.org/repos/asf/harmony/enhanced/classlib/archive/java6/modules/luni/src/main/java/org/apache/harmony/luni/util/HexStringParser.java), 
which needs some Long-specific attributes that are either post-JDK 1.4 or 
Harmony-specific (I didn't take the time to investigate).

At any rate, I added the Integer stuff to ArrayUtils and also added some tests.

For reference, see: 
https://svn.apache.org/repos/asf/harmony/enhanced/classlib/archive/java6/modules/luni/src/main/java/org/apache/harmony/luni/util/FloatingPointParser.java

https://svn.apache.org/repos/asf/harmony/enhanced/classlib/archive/java6/modules/luni/src/main/java/java/lang/Integer.java
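
For flavor, a minimal sketch (not the Harmony code) of parsing an int straight from a char[] slice, which is the kind of String-free parsing described above; overflow handling is omitted:

{code}
class CharArrayInt {
  static int parseInt(char[] chars, int offset, int length) {
    if (length == 0) throw new NumberFormatException("empty input");
    int i = offset;
    int end = offset + length;
    boolean negative = chars[i] == '-';
    if (negative) i++;
    int result = 0;
    for (; i < end; i++) {
      int digit = chars[i] - '0';
      if (digit < 0 || digit > 9) {
        throw new NumberFormatException(new String(chars, offset, length));
      }
      result = result * 10 + digit;  // overflow check omitted in this sketch
    }
    return negative ? -result : result;
  }
}
{code}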






[jira] Resolved: (LUCENE-1676) New Token filter for adding payloads in-stream

2009-06-12 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved LUCENE-1676.
-

   Resolution: Fixed
Lucene Fields:   (was: [New])

Committed revision 784297.




[jira] Commented: (LUCENE-1676) New Token filter for adding payloads in-stream

2009-06-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718447#action_12718447
 ] 

Grant Ingersoll commented on LUCENE-1676:
-

bq. Shouldn't the CHANGES entry in this patch go into contrib/CHANGES?

It can, I've never quite been sure.  I think more people read the top-level 
CHANGES, thus it is more likely to be noticed, but I'm fine either way.




[jira] Commented: (LUCENE-979) Remove Deprecated Benchmarking Utilities from contrib/benchmark

2009-06-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718450#action_12718450
 ] 

Grant Ingersoll commented on LUCENE-979:


bq. What are the old benchmark utilities? 

It's basically one class from before Doron's task-oriented approach.  I 
believe it's called Benchmark.java, and it was only able to do a few 
benchmarking tasks.

 Remove Deprecated Benchmarking Utilities from contrib/benchmark
 ---

 Key: LUCENE-979
 URL: https://issues.apache.org/jira/browse/LUCENE-979
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Grant Ingersoll
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9


 The old Benchmark utilities in contrib/benchmark have been deprecated and 
 should be removed in 2.9 of Lucene.




[jira] Commented: (LUCENE-979) Remove Deprecated Benchmarking Utilities from contrib/benchmark

2009-06-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718463#action_12718463
 ] 

Grant Ingersoll commented on LUCENE-979:


Yes.




[jira] Updated: (MAHOUT-131) Vector improvements

2009-06-11 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-131:
---

Attachment: MAHOUT-131.patch

Some minor changes to Vector, etc.




[jira] Commented: (SOLR-1209) Site search powered by Lucene/Solr

2009-06-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12718507#action_12718507
 ] 

Grant Ingersoll commented on SOLR-1209:
---

bq. Just a small doubt - I assume Google shares revenue generated from clicks 
on the site search page with ASF. Are we sure this is not affecting ASF 
money-wise?

They don't share the revenue.  All the Google box is right now is a Forrest 
auto-generated default plugin.


 Site search powered by Lucene/Solr
 --

 Key: SOLR-1209
 URL: https://issues.apache.org/jira/browse/SOLR-1209
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1209.patch


 For a number of years now, the Lucene community has been criticized for not 
 eating our own dog food when it comes to search. My company has built and 
 hosts a site search (http://www.lucidimagination.com/search) that is powered 
 by Apache Solr and Lucene and we'd like to donate its use to the Lucene 
 community. Additionally, it allows one to search all of the Solr content from 
 a single place, including web, wiki, JIRA and mail archives. See also 
 http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org
 A preview of the site (for Mahout) is available at 
 http://people.apache.org/~gsingers/mahout/site/publish/
 Lucid has a fault tolerant setup with replication and fail over as well as 
 monitoring services in place. We are committed to maintaining and expanding 
 the search capabilities on the site.
 The following patch adds a skin to the Forrest site that enables the Solr 
 site to search Solr only content using Lucene/Solr. When a search is 
 submitted, it automatically selects the Solr facet such that only Solr 
 content is searched. From there, users can then narrow/broaden their search 
 criteria.
 I'm submitting this patch to Solr first, as we'd like to roll out our 
 capabilities to some of the smaller communities first and then broaden to the 
 rest of the Lucene ecosystem.
 I plan on committing in 3 or 4 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAHOUT-131) Vector improvements

2009-06-10 Thread Grant Ingersoll (JIRA)
Vector improvements
---

 Key: MAHOUT-131
 URL: https://issues.apache.org/jira/browse/MAHOUT-131
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 0.2


Vector and its implementations could use a few things:

1. DenseVector should implement equals and hashCode similar to SparseVector
2. The VectorView asFormatString() is not compatible with actually recreating 
any type of vector.  
3. Add tests to VectorTest that assert that decodeFormat/asFormatString is able 
to do a round trip.
4. Add static AbstractVector.equivalent(Vector, Vector) that takes in two 
vectors and compares them for equality, regardless of their implementation.
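
As a rough illustration of item 4, here is a minimal sketch of an 
implementation-agnostic equality check.  This is not the actual patch; the 
accessor names size() and get(int) are assumptions for illustration only.

{code}
// Hypothetical sketch: compare two vectors element by element, regardless
// of whether they are dense or sparse. size() and get(int) are assumed
// accessor names, not necessarily Mahout's actual Vector API.
public static boolean equivalent(Vector left, Vector right) {
  if (left == right) {
    return true;
  }
  if (left.size() != right.size()) {
    return false;
  }
  for (int i = 0; i < left.size(); i++) {
    if (left.get(i) != right.get(i)) {
      return false;
    }
  }
  return true;
}
{code}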

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717819#action_12717819
 ] 

Grant Ingersoll commented on LUCENE-1678:
-

I frankly don't like renaming something like this.  This is, once again, a case 
of back compatibility biting us.  If, instead of working around back compat, we 
had just made Analyzer.tokenStream be reusable, we wouldn't have to do this.  
Now, instead, we are going to have a convoluted name for something (reusableTS).

In my mind, it's better to just make tokenStream do the right thing and get rid 
of reusableTokenStream.
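
To make the breakage concrete, here is a hedged sketch of the kind of subclass 
the linked thread describes.  After upgrading, indexing invokes 
reusableTokenStream, which StandardAnalyzer implements without consulting 
tokenStream, so an override like this is silently bypassed:

{code}
import java.io.Reader;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Illustrative only: a pre-2.9-style subclass that customizes analysis by
// overriding tokenStream(). Because indexing now calls reusableTokenStream(),
// this override is never invoked.
public class MyAnalyzer extends StandardAnalyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new LowerCaseFilter(super.tokenStream(fieldName, reader));
  }
}
{code}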

 Deprecate Analyzer.tokenStream
 --

 Key: LUCENE-1678
 URL: https://issues.apache.org/jira/browse/LUCENE-1678
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9


 The addition of reusableTokenStream to the core analyzers unfortunately broke 
 back compat of external subclasses:
 
 http://www.nabble.com/Extending-StandardAnalyzer-considered-harmful-td23863822.html
 On upgrading, such subclasses would silently not be used anymore, since 
 Lucene's indexing invokes reusableTokenStream.
 I think we should at least deprecate Analyzer.tokenStream, today, so 
 that users see deprecation warnings if their classes override this method.  
 But going forward, when we want to change the API of core classes that are 
 extended, I think we have to introduce entirely new classes to keep back 
 compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-06-09 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717888#action_12717888
 ] 

Grant Ingersoll commented on LUCENE-1678:
-

bq. If there are sane/smart ways to change our back compat policy, I think you 
have seen that no one would object.

The sane/smart way is to do it on a case-by-case basis.  Here is a specific 
case.  Generalizing it a bit, the places where it should be more easily 
relaxed are the cases where we know very few people make customizations, as 
in implementing Fieldable or FieldCache.

As for this specific case, the original change was the thing that broke back 
compat.  So, given it is already broken, why not fix it the right way?

 Deprecate Analyzer.tokenStream
 --

 Key: LUCENE-1678
 URL: https://issues.apache.org/jira/browse/LUCENE-1678
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9


 The addition of reusableTokenStream to the core analyzers unfortunately broke 
 back compat of external subclasses:
 
 http://www.nabble.com/Extending-StandardAnalyzer-considered-harmful-td23863822.html
 On upgrading, such subclasses would silently not be used anymore, since 
 Lucene's indexing invokes reusableTokenStream.
 I think we should at least deprecate Analyzer.tokenStream, today, so 
 that users see deprecation warnings if their classes override this method.  
 But going forward, when we want to change the API of core classes that are 
 extended, I think we have to introduce entirely new classes to keep back 
 compatibility.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (MAHOUT-130) Vector should allow for other normalize powers than the L-2 norm

2009-06-09 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved MAHOUT-130.


Resolution: Fixed

Committed Ted's patch

 Vector should allow for other normalize powers than the L-2 norm
 

 Key: MAHOUT-130
 URL: https://issues.apache.org/jira/browse/MAHOUT-130
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: MAHOUT-130-both-ways.patch, 
 MAHOUT-130-slight-tweaks.patch, MAHOUT-130.patch, MAHOUT-130.patch


 Modify Vector to allow other normalize functions for the Vector
 See 
 http://www.lucidimagination.com/search/document/bf3a7a7a004d4191/norm_calculations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAHOUT-126) Prepare document vectors from the text

2009-06-09 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-126:
---

Attachment: MAHOUT-126.patch

Here's a first attempt at my thoughts based on the two previous patches, plus 
some other ideas.

The main gist of the idea centers around the VectorIterable interface and is 
driven by the o.a.mahout.utils.vectors.Driver class.

Note, I dropped the Lucene indexing part, as I don't think we need to be in the 
game of creating Lucene indexes.  That is a well-known and well-documented 
process that is available elsewhere.  In fact, for this particular piece, I 
indexed Wikipedia in Solr and then pointed the Driver class at the Lucene index.

See 
http://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text 
for details on usage.

 Prepare document vectors from the text
 --

 Key: MAHOUT-126
 URL: https://issues.apache.org/jira/browse/MAHOUT-126
 Project: Mahout
  Issue Type: New Feature
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Attachments: mahout-126-benson.patch, MAHOUT-126.patch, 
 MAHOUT-126.patch


 Clustering algorithms presently take the document vectors as input.  
 Generating these document vectors from the text can be broken into two tasks. 
 1. Create a Lucene index of the input plain-text documents. 
 2. From the index, generate the (sparse) document vectors with weights as 
 TF-IDF values of the terms. With a Lucene index, this value can be calculated 
 very easily. 
 Presently, I have created two separate utilities, which could possibly be 
 invoked from another class. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-108) Implementation of Association Rules learning by Apriori algorithm

2009-06-08 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717245#action_12717245
 ] 

Grant Ingersoll commented on MAHOUT-108:


http://cwiki.apache.org/MAHOUT/howtocontribute.html

 Implementation of Association Rules learning by Apriori algorithm
 -

 Key: MAHOUT-108
 URL: https://issues.apache.org/jira/browse/MAHOUT-108
 Project: Mahout
  Issue Type: Task
 Environment: Linux, Hadoop-0.17.1
Reporter: chao deng
 Fix For: 0.2

   Original Estimate: 504h
  Remaining Estimate: 504h

 Target: Association Rules learning is a popular method for discovering 
 interesting relations between variables in large databases. Here, we would 
 implement the Apriori algorithm using Hadoop MapReduce parallel techniques.
 Applications: Typically, association rules learning is used to discover 
 regularities between products in large-scale transaction data in 
 supermarkets. For example, the rule {onions, potatoes} -> beef found in the 
 sales data would indicate that if a customer buys onions and potatoes 
 together, he or she is likely to also buy beef. Such information can be used 
 as the basis for decisions about marketing activities. In addition to 
 market basket analysis, association rules are employed today in many 
 application areas including Web usage mining, intrusion detection and 
 bioinformatics.
 Apriori algorithm: Apriori is the best-known algorithm for mining association 
 rules. It uses a breadth-first search strategy to count the support of 
 itemsets and uses a candidate generation function which exploits the downward 
 closure property of support.
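
For intuition, here is a minimal single-machine sketch of Apriori's level-wise 
loop; frequentSingletons, joinAndPrune, and countSupport are hypothetical 
helpers, and the proposed MapReduce version would distribute the support 
counting step:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Apriori's level-wise search; the helper methods
// named below are illustrative, not existing Mahout code.
List<Set<Set<String>>> apriori(List<Set<String>> transactions, int minSupport) {
  List<Set<Set<String>>> levels = new ArrayList<Set<Set<String>>>();
  Set<Set<String>> frequent = frequentSingletons(transactions, minSupport);
  while (!frequent.isEmpty()) {
    levels.add(frequent);
    // Downward closure: every subset of a frequent itemset is frequent, so
    // candidates at level k+1 are built only from frequent level-k itemsets.
    Set<Set<String>> candidates = joinAndPrune(frequent);
    frequent = countSupport(candidates, transactions, minSupport);
  }
  return levels;
}
{code}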

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAHOUT-130) Vector should allow for other normalize powers than the L-2 norm

2009-06-08 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-130:
---

Attachment: MAHOUT-130.patch

Draft.  Not sure if the optimizations make sense or not, but I think they do.

The patch applies in the core directory, not at the top level.

 Vector should allow for other normalize powers than the L-2 norm
 

 Key: MAHOUT-130
 URL: https://issues.apache.org/jira/browse/MAHOUT-130
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: MAHOUT-130.patch


 Modify Vector to allow other normalize functions for the Vector
 See 
 http://www.lucidimagination.com/search/document/bf3a7a7a004d4191/norm_calculations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAHOUT-130) Vector should allow for other normalize powers than the L-2 norm

2009-06-08 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-130:
---

Attachment: (was: MAHOUT-130.patch)

 Vector should allow for other normalize powers than the L-2 norm
 

 Key: MAHOUT-130
 URL: https://issues.apache.org/jira/browse/MAHOUT-130
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: MAHOUT-130.patch


 Modify Vector to allow other normalize functions for the Vector
 See 
 http://www.lucidimagination.com/search/document/bf3a7a7a004d4191/norm_calculations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAHOUT-130) Vector should allow for other normalize powers than the L-2 norm

2009-06-08 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-130:
---

Attachment: MAHOUT-130.patch

No reason the power needs to be int only, I suppose

 Vector should allow for other normalize powers than the L-2 norm
 

 Key: MAHOUT-130
 URL: https://issues.apache.org/jira/browse/MAHOUT-130
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: MAHOUT-130.patch


 Modify Vector to allow other normalize functions for the Vector
 See 
 http://www.lucidimagination.com/search/document/bf3a7a7a004d4191/norm_calculations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-130) Vector should allow for other normalize powers than the L-2 norm

2009-06-08 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717511#action_12717511
 ] 

Grant Ingersoll commented on MAHOUT-130:


D'oh.  My bad.  I had initialized val = 1 instead of val = 0;

All looks good now.

 Vector should allow for other normalize powers than the L-2 norm
 

 Key: MAHOUT-130
 URL: https://issues.apache.org/jira/browse/MAHOUT-130
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: MAHOUT-130.patch


 Modify Vector to allow other normalize functions for the Vector
 See 
 http://www.lucidimagination.com/search/document/bf3a7a7a004d4191/norm_calculations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAHOUT-130) Vector should allow for other normalize powers than the L-2 norm

2009-06-08 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-130:
---

Attachment: MAHOUT-130.patch

Adds the 0 norm and the infinity norm, incorporates Ted's and David's 
suggestions, and adds maxValue and maxValueIndex methods.
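
For reference, a hedged sketch of how a generalized Lp norm with the special 
cases might look; this is not the committed patch, and the array-based 
signature is just for illustration:

{code}
// Hypothetical sketch: generalized Lp norm. power == 0 counts non-zero
// elements and power == infinity takes the max |x|. Note that the
// accumulator starts at 0, not 1 (the initialization bug noted earlier).
public static double norm(double[] elements, double power) {
  if (power < 0.0) {
    throw new IllegalArgumentException("power must be >= 0");
  }
  double val = 0.0;
  if (power == 0.0) {
    for (double x : elements) {
      if (x != 0.0) val++;
    }
    return val;
  }
  if (Double.isInfinite(power)) {
    for (double x : elements) {
      val = Math.max(val, Math.abs(x));
    }
    return val;
  }
  for (double x : elements) {
    val += Math.pow(Math.abs(x), power);
  }
  return Math.pow(val, 1.0 / power);
}
{code}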

 Vector should allow for other normalize powers than the L-2 norm
 

 Key: MAHOUT-130
 URL: https://issues.apache.org/jira/browse/MAHOUT-130
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: MAHOUT-130-both-ways.patch, MAHOUT-130.patch, 
 MAHOUT-130.patch


 Modify Vector to allow other normalize functions for the Vector
 See 
 http://www.lucidimagination.com/search/document/bf3a7a7a004d4191/norm_calculations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1208) The Default SearchComponents (QueryComponent, etc.) cannot currently support SolrCoreAware or ResourceLoaderAware

2009-06-08 Thread Grant Ingersoll (JIRA)
The Default SearchComponents (QueryComponent, etc.) cannot currently support 
SolrCoreAware or ResourceLoaderAware
-

 Key: SOLR-1208
 URL: https://issues.apache.org/jira/browse/SOLR-1208
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4


The Default SearchComponents are not instantiated via the SolrResourceLoader 
and are thus not put in the waiting lists for SolrCoreAware and 
ResourceLoaderAware.  Thus, they are not constructed in the same way that 
other SearchComponents might be constructed.

See 
http://www.lucidimagination.com/search/document/ef69fdc7dfb17428/default_searchcomponents_and_solrcoreaware
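
For context, here is the callback in question; only instances created through 
the SolrResourceLoader end up on the waiting list that triggers it.  A minimal 
sketch with an illustrative class name:

{code}
import org.apache.solr.core.SolrCore;
import org.apache.solr.util.plugin.SolrCoreAware;

// Illustrative only: inform(SolrCore) is invoked once the core is ready,
// but only for instances the resource loader created and registered.
public class MyCoreAwarePlugin implements SolrCoreAware {
  private SolrCore core;

  public void inform(SolrCore core) {
    this.core = core; // safe point to look up other core resources
  }
}
{code}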



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1209) Site search powered by Lucene/Solr

2009-06-08 Thread Grant Ingersoll (JIRA)
Site search powered by Lucene/Solr
--

 Key: SOLR-1209
 URL: https://issues.apache.org/jira/browse/SOLR-1209
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor


For a number of years now, the Lucene community has been criticized for not 
eating our own dog food when it comes to search. My company has built and 
hosts a site search (http://www.lucidimagination.com/search) that is powered by 
Apache Solr and Lucene and we'd like to donate its use to the Lucene 
community. Additionally, it allows one to search all of the Solr content from a 
single place, including web, wiki, JIRA and mail archives. See also 
http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org

A preview of the site (for Mahout) is available at 
http://people.apache.org/~gsingers/mahout/site/publish/

Lucid has a fault tolerant setup with replication and fail over as well as 
monitoring services in place. We are committed to maintaining and expanding the 
search capabilities on the site.

The following patch adds a skin to the Forrest site that enables the Solr site 
to search Solr only content using Lucene/Solr. When a search is submitted, it 
automatically selects the Solr facet such that only Solr content is searched. 
From there, users can then narrow/broaden their search criteria.

I'm submitting this patch to Solr first, as we'd like to roll out our 
capabilities to some of the smaller communities first and then broaden to the 
rest of the Lucene ecosystem.

I plan on committing in 3 or 4 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1209) Site search powered by Lucene/Solr

2009-06-08 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1209:
--

Attachment: SOLR-1209.patch

Patch changing the skin to use Solr powered search

 Site search powered by Lucene/Solr
 --

 Key: SOLR-1209
 URL: https://issues.apache.org/jira/browse/SOLR-1209
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1209.patch


 For a number of years now, the Lucene community has been criticized for not 
 eating our own dog food when it comes to search. My company has built and 
 hosts a site search (http://www.lucidimagination.com/search) that is powered 
 by Apache Solr and Lucene and we'd like to donate its use to the Lucene 
 community. Additionally, it allows one to search all of the Solr content from 
 a single place, including web, wiki, JIRA and mail archives. See also 
 http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org
 A preview of the site (for Mahout) is available at 
 http://people.apache.org/~gsingers/mahout/site/publish/
 Lucid has a fault tolerant setup with replication and fail over as well as 
 monitoring services in place. We are committed to maintaining and expanding 
 the search capabilities on the site.
 The following patch adds a skin to the Forrest site that enables the Solr 
 site to search Solr only content using Lucene/Solr. When a search is 
 submitted, it automatically selects the Solr facet such that only Solr 
 content is searched. From there, users can then narrow/broaden their search 
 criteria.
 I'm submitting this patch to Solr first, as we'd like to roll out our 
 capabilities to some of the smaller communities first and then broaden to the 
 rest of the Lucene ecosystem.
 I plan on committing in 3 or 4 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (LUCENE-1567) New flexible query parser

2009-06-04 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716311#action_12716311
 ] 

Grant Ingersoll commented on LUCENE-1567:
-

The software grant has been received and filed.  I will update the paperwork 
and work to finish this out next week, so that we can then work to commit it.

 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Grant Ingersoll
 Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch


 From the New flexible query parser thread by Michael Busch
 in my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is Apache committer and PMC member
 on the Tuscany project and getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was to separate the syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
   AND
  /   \
 A B
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. It is in fact a
 configurable chain of processors. Each processors can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 3. The third layer is also a configurable chain of builders, which
 transform the QueryNodeTree into Lucene Query objects.
 Furthermore, the query parser uses flexible configuration objects, which
 are based on AttributeSource/Attribute. It also uses message classes that
 allow attaching resource bundles. This makes it possible to translate
 messages, which is an important feature of a query parser.
 This design allows us to develop different query syntaxes very quickly.
 Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
 underlying processors and builders in a few days. We now have a 100%
 compatible Lucene query parser, which means the syntax is identical and
 all query parser test cases pass on the new one too using a wrapper.
 Recent posts show that there is demand for query syntax improvements,
 e.g. improved range query syntax or operator precedence. There are
 already different QP implementations in Lucene+contrib, however I think
 we did not keep them all up to date and in sync. This is not too
 surprising, because usually when fixes and changes are made to the main
 query parser, people don't make the corresponding changes in the contrib
 parsers. (I'm guilty here too)
 With this new architecture it will be much easier to maintain different
 query syntaxes, as the amount of code in the first layer is quite small.
 All syntaxes would benefit from patches and improvements we make to the
 underlying layers, which will make supporting different syntaxes much
 more manageable.
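
Schematically, the three layers compose roughly like this; the variable and 
class names below are illustrative, not the contributed API:

{code}
// Hypothetical sketch of the three-layer flow described above.
QueryNode root = syntaxParser.parse("a AND b");  // 1. text -> QueryNodeTree
root = processorChain.process(root);             // 2. rewrite/optimize nodes
Query query = queryBuilder.build(root);          // 3. tree -> Lucene Query object
{code}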

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAHOUT-128) maven parent not included in build

2009-06-04 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved MAHOUT-128.


Resolution: Fixed

committed

 maven parent not included in build
 --

 Key: MAHOUT-128
 URL: https://issues.apache.org/jira/browse/MAHOUT-128
 Project: Mahout
  Issue Type: Bug
Reporter: Benson Margulies
 Attachments: pom.diff


 The Maven parent isn't included as a module, so its pom isn't installed, and 
 building another project that depends on mahout-core fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (LUCENE-1676) New Token filter for adding payloads in-stream

2009-06-02 Thread Grant Ingersoll (JIRA)
New Token filter for adding payloads in-stream


 Key: LUCENE-1676
 URL: https://issues.apache.org/jira/browse/LUCENE-1676
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 2.9


This TokenFilter is able to split a token based on a delimiter and use one part 
as the token and the other part as a payload.  This allows someone to include 
payloads inline with tokens (presumably set up by a pipeline ahead of time).  An 
example is apropos.  Given a | delimiter, we could have a stream that looks 
like:
{quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ 
dogs|NN{quote}

In this case, this would produce tokens and payloads (assuming whitespace 
tokenization):
Token: the
Payload: null

Token: quick
Payload: JJ

Token: red
Payload: JJ

and so on.

This patch will also support pluggable encoders for the payloads, so it can 
convert from the character array to byte arrays as appropriate.
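
To make the idea concrete, here is a rough sketch against the pre-attribute 
(2.4-style) TokenStream API; the class name is made up, and the details differ 
from the attached patch:

{code}
import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Payload;

// Hypothetical sketch: split each token at the delimiter; the leading part
// stays the term and the trailing part becomes the payload.
public class DelimitedPayloadFilterSketch extends TokenFilter {
  private final char delimiter;

  public DelimitedPayloadFilterSketch(TokenStream input, char delimiter) {
    super(input);
    this.delimiter = delimiter;
  }

  public Token next(final Token reusableToken) throws IOException {
    Token token = input.next(reusableToken);
    if (token == null) {
      return null;
    }
    String term = token.term();
    int idx = term.indexOf(delimiter);
    if (idx >= 0) {
      // A pluggable encoder would plug in here; raw UTF-8 bytes keep the
      // sketch simple.
      token.setPayload(new Payload(term.substring(idx + 1).getBytes("UTF-8")));
      token.setTermBuffer(term.substring(0, idx));
    }
    return token;
  }
}
{code}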

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1676) New Token filter for adding payloads in-stream

2009-06-02 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-1676:


Attachment: LUCENE-1676.patch

Here's a first draft of this.  See the test case for an example.

 New Token filter for adding payloads in-stream
 

 Key: LUCENE-1676
 URL: https://issues.apache.org/jira/browse/LUCENE-1676
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1676.patch


 This TokenFilter is able to split a token based on a delimiter and use one 
 part as the token and the other part as a payload.  This allows someone to 
 include payloads inline with tokens (presumably set up by a pipeline ahead of 
 time).  An example is apropos.  Given a | delimiter, we could have a stream 
 that looks like:
 {quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ 
 dogs|NN{quote}
 In this case, this would produce tokens and payloads (assuming whitespace 
 tokenization):
 Token: the
 Payload: null
 Token: quick
 Payload: JJ
 Token: red
 Payload: JJ
 and so on.
 This patch will also support pluggable encoders for the payloads, so it can 
 convert from the character array to byte arrays as appropriate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Assigned: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned MAHOUT-126:
--

Assignee: Grant Ingersoll

 Prepare document vectors from the text
 --

 Key: MAHOUT-126
 URL: https://issues.apache.org/jira/browse/MAHOUT-126
 Project: Mahout
  Issue Type: New Feature
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Attachments: MAHOUT-126.patch


 Clustering algorithms presently take the document vectors as input.  
 Generating these document vectors from the text can be broken into two tasks. 
 1. Create a Lucene index of the input plain-text documents. 
 2. From the index, generate the (sparse) document vectors with weights as 
 TF-IDF values of the terms. With a Lucene index, this value can be calculated 
 very easily. 
 Presently, I have created two separate utilities, which could possibly be 
 invoked from another class. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12714414#action_12714414
 ] 

Grant Ingersoll commented on MAHOUT-126:


See SOLR-1193.  

 Prepare document vectors from the text
 --

 Key: MAHOUT-126
 URL: https://issues.apache.org/jira/browse/MAHOUT-126
 Project: Mahout
  Issue Type: New Feature
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Attachments: MAHOUT-126.patch


 Clustering algorithms presently take the document vectors as input.  
 Generating these document vectors from the text can be broken into two tasks. 
 1. Create a Lucene index of the input plain-text documents. 
 2. From the index, generate the (sparse) document vectors with weights as 
 TF-IDF values of the terms. With a Lucene index, this value can be calculated 
 very easily. 
 Presently, I have created two separate utilities, which could possibly be 
 invoked from another class. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12714509#action_12714509
 ] 

Grant Ingersoll commented on MAHOUT-126:


So, just kind of brainstorming here, but I think we should create a separate 
module for this kind of stuff, to keep it out of core and give us some more 
flexibility in regard to dependencies, etc.

Also (and I realize this is just a starting patch), I think we should assume a 
Lucene index already exists instead of maintaining code to actually create an 
index.  There are a lot of ways to do that, and people will likely have 
different fields, etc.  For instance, Solr can provide all of the capabilities 
here, and it has distributed support, so it can scale.  Moreover, people may 
have the info in a DB or in other places.  I realize we need baby steps, 
but...

I'll try to post a patch this afternoon that takes this effort and melds it 
with some of my ideas for demo purposes.

 Prepare document vectors from the text
 --

 Key: MAHOUT-126
 URL: https://issues.apache.org/jira/browse/MAHOUT-126
 Project: Mahout
  Issue Type: New Feature
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Attachments: mahout-126-benson.patch, MAHOUT-126.patch


 Clustering algorithms presently take the document vectors as input.  
 Generating these document vectors from the text can be broken into two tasks. 
 1. Create a Lucene index of the input plain-text documents. 
 2. From the index, generate the (sparse) document vectors with weights as 
 TF-IDF values of the terms. With a Lucene index, this value can be calculated 
 very easily. 
 Presently, I have created two separate utilities, which could possibly be 
 invoked from another class. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-05-29 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12714515#action_12714515
 ] 

Grant Ingersoll commented on MAHOUT-126:


Shashikant,

A couple of comments on the Lucene-specific stuff, though, so that you can 
speed up what you have.

First off, have a look at Lucene's support for TermVectorMapper.  Much like SAX, 
it gives you a callback mechanism so that you don't have to construct two 
different data structures (i.e. many people incorrectly use the DOM to parse 
XML and then extract out of the DOM into their own data structure when they 
should use SAX instead).

You might also have a look at the TermVectorComponent in Solr, as it pretty 
much does what you are looking to do in this patch, and I believe it to be 
more efficient.

It seems like we should be able to avoid caching the whole term list in 
memory.  At a minimum, if you are going to, allTerms should be a Map<String, 
Integer> that stores the term and its DF (doc freq.), as you are currently 
doing the DF lookup twice, AFAICT.  DF lookup is expensive in Lucene.  If we 
don't cache the whole list, we should at least have an LRU cache for DF.
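
To sketch both suggestions together (the callback style plus a DF cache), 
something along these lines; the class is hypothetical and the TF-IDF formula 
shown is just one common variant:

{code}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermVectorMapper;
import org.apache.lucene.index.TermVectorOffsetInfo;

// Hypothetical sketch: weights are computed as terms stream by, and each
// DF is looked up at most once thanks to the cache.
public class TfIdfMapperSketch extends TermVectorMapper {
  private final IndexReader reader;
  private final String field;
  private final Map<String, Integer> dfCache = new HashMap<String, Integer>();
  private final Map<String, Double> weights = new HashMap<String, Double>();

  public TfIdfMapperSketch(IndexReader reader, String field) {
    this.reader = reader;
    this.field = field;
  }

  public boolean isIgnoringPositions() { return true; } // only term + freq needed
  public boolean isIgnoringOffsets() { return true; }

  public void setExpectations(String field, int numTerms,
                              boolean storeOffsets, boolean storePositions) {
    weights.clear();
  }

  public void map(String term, int frequency,
                  TermVectorOffsetInfo[] offsets, int[] positions) {
    Integer df = dfCache.get(term);
    if (df == null) {
      try {
        df = reader.docFreq(new Term(field, term)); // expensive; do it once
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
      dfCache.put(term, df);
    }
    double idf = Math.log(reader.numDocs() / (double) (df + 1));
    weights.put(term, frequency * idf);
  }

  public Map<String, Double> getWeights() { return weights; }
}
{code}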

 Prepare document vectors from the text
 --

 Key: MAHOUT-126
 URL: https://issues.apache.org/jira/browse/MAHOUT-126
 Project: Mahout
  Issue Type: New Feature
Reporter: Shashikant Kore
Assignee: Grant Ingersoll
 Attachments: mahout-126-benson.patch, MAHOUT-126.patch


 Clustering algorithms presently take the document vectors as input.  
 Generating these document vectors from the text can be broken into two tasks. 
 1. Create a Lucene index of the input plain-text documents. 
 2. From the index, generate the (sparse) document vectors with weights as 
 TF-IDF values of the terms. With a Lucene index, this value can be calculated 
 very easily. 
 Presently, I have created two separate utilities, which could possibly be 
 invoked from another class. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAHOUT-120) Site search powered by Solr

2009-05-28 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved MAHOUT-120.


Resolution: Fixed

Committed revision 779594.

 Site search powered by Solr
 ---

 Key: MAHOUT-120
 URL: https://issues.apache.org/jira/browse/MAHOUT-120
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: MAHOUT-120.patch, MAHOUT-120.patch


 For a number of years now, the Lucene community has been criticized for not 
 eating our own dog food when it comes to search.  My company has built and 
 hosts a site search (http://www.lucidimagination.com/search) that is powered 
 by Apache Solr and Lucene and we'd like to donate its use to the Lucene 
 community.   Additionally, it allows one to search all of the Mahout content 
 from a single place, including web, wiki, JIRA and mail archives.   See also 
 http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org
 A preview of the site is available at 
 http://people.apache.org/~gsingers/mahout/site/publish/
 Lucid has a fault tolerant setup with replication and fail over as well as 
 monitoring services in place.  We are committed to maintaining and expanding 
 the search capabilities on the site.
 The following patch adds a skin to the Forrest site that enables the Mahout 
 site to search Mahout only content using Lucene/Solr.  When a search is 
 submitted, it automatically selects the Mahout facet such that only Mahout 
 content is searched.  From there, users can then narrow/broaden their search 
 criteria.
 I'm submitting this patch to Mahout first, as we'd like to roll out our 
 capabilities to some of the smaller communities first and then broaden to the 
 rest of the Lucene ecosystem.
 I plan on committing in 3 or 4 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1193) Provide option for TermVectorComponent to provide a way of retrieving TV info around a position instead of the whole vector

2009-05-28 Thread Grant Ingersoll (JIRA)
Provide option for TermVectorComponent to provide a way of retrieving TV info 
around a position instead of the whole vector
---

 Key: SOLR-1193
 URL: https://issues.apache.org/jira/browse/SOLR-1193
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor


It's often useful to retrieve TermVector information around (within some 
user-specified window) a specific position or offset.  The TermVectorComponent 
can easily be modified to use a TermVectorMapper that is aware of 
position/offset information and only returns term info within the window.
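
A minimal sketch of the windowed mapper idea; the class and field names are 
illustrative, not an actual patch:

{code}
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.index.TermVectorMapper;
import org.apache.lucene.index.TermVectorOffsetInfo;

// Hypothetical sketch: keep only terms that have at least one position
// inside [center - window, center + window].
public class WindowedMapperSketch extends TermVectorMapper {
  private final int center;
  private final int window;
  private final Set<String> kept = new HashSet<String>();

  public WindowedMapperSketch(int center, int window) {
    this.center = center;
    this.window = window;
  }

  public void setExpectations(String field, int numTerms,
                              boolean storeOffsets, boolean storePositions) {
    kept.clear();
  }

  public void map(String term, int frequency,
                  TermVectorOffsetInfo[] offsets, int[] positions) {
    if (positions == null) {
      return; // positions were not stored for this field
    }
    for (int p : positions) {
      if (Math.abs(p - center) <= window) {
        kept.add(term);
        break;
      }
    }
  }

  public Set<String> getTerms() { return kept; }
}
{code}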

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (TIKA-235) Site search powered by Lucene/Solr

2009-05-28 Thread Grant Ingersoll (JIRA)
Site search powered by Lucene/Solr
--

 Key: TIKA-235
 URL: https://issues.apache.org/jira/browse/TIKA-235
 Project: Tika
  Issue Type: New Feature
Reporter: Grant Ingersoll
Priority: Minor


For a number of years now, the Lucene community has been criticized for not 
eating our own dog food when it comes to search. My company has built and 
hosts a site search (http://search.lucidimagination.com/) that is powered by 
Apache Solr and Lucene and we'd like to donate its use to the Lucene 
community. Additionally, it allows one to search all of the Tika content from a 
single place, including web, wiki, JIRA and mail archives. See also 
http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org

A sample of what it _might_ look like is at 
http://people.apache.org/~gsingers/tika/.  Note, however, I am not entirely 
sure how Tika deploys just yet, so there are a few issues w/ the display.

Lucid has a fault tolerant setup with replication and fail over as well as 
monitoring services in place. We are committed to maintaining and expanding the 
search capabilities on the site.

The following patch adds the basics to Tika to support the search, but isn't 
entirely done yet b/c I'm not sure what look and feel Tika wants.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (TIKA-235) Site search powered by Lucene/Solr

2009-05-28 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated TIKA-235:
-

Attachment: TIKA-235.patch

First draft of a patch.  See also MAHOUT-120

 Site search powered by Lucene/Solr
 --

 Key: TIKA-235
 URL: https://issues.apache.org/jira/browse/TIKA-235
 Project: Tika
  Issue Type: New Feature
Reporter: Grant Ingersoll
Priority: Minor
 Attachments: TIKA-235.patch


 For a number of years now, the Lucene community has been criticized for not 
 eating our own dog food when it comes to search. My company has built and 
 hosts a site search (http://search.lucidimagination.com/) that is powered by 
 Apache Solr and Lucene and we'd like to donate its use to the Lucene 
 community. Additionally, it allows one to search all of the Tika content from 
 a single place, including web, wiki, JIRA and mail archives. See also 
 http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org
 A sample of what it _might_ look like is at 
 http://people.apache.org/~gsingers/tika/.  Note, however, I am not entirely 
 sure how Tika deploys just yet, so there are a few issues w/ the display.
 Lucid has a fault tolerant setup with replication and fail over as well as 
 monitoring services in place. We are committed to maintaining and expanding 
 the search capabilities on the site.
 The following patch adds the basics to Tika to support the search, but isn't 
 entirely done yet b/c I'm not sure what look and feel Tika wants.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAHOUT-63) Self Organizing Map

2009-05-27 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-63?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved MAHOUT-63.
---

Resolution: Duplicate

See MAHOUT-64

 Self Organizing Map
 ---

 Key: MAHOUT-63
 URL: https://issues.apache.org/jira/browse/MAHOUT-63
 Project: Mahout
  Issue Type: New Feature
  Components: Classification
Reporter: Farid Bourennani
Priority: Minor
   Original Estimate: 120h
  Remaining Estimate: 120h

 Implementation of Kohonen's Self-Organizing Map algorithm.
 Execution: run the SOMViewer; it takes 300 iterations.
 - The algorithm is too slow because of:
   GUI: the current one is a temporary one, but should be replaced by the 
 Prefuse library, as suggested by Ted.
   Self-Organizing Maps: the batch algorithm is faster than the sequential one 
 that I am currently using.
 - Documentation needs to be completed. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-66) EuclideanDistanceMeasure and ManhattanDistanceMeasure classes are not optimized for Sparse Vectors

2009-05-27 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12713660#action_12713660
 ] 

Grant Ingersoll commented on MAHOUT-66:
---

Can this be closed?

 EuclideanDistanceMeasure and ManhattanDistanceMeasure classes are not 
 optimized for Sparse Vectors
 --

 Key: MAHOUT-66
 URL: https://issues.apache.org/jira/browse/MAHOUT-66
 Project: Mahout
  Issue Type: Improvement
  Components: Clustering
Reporter: Pallavi Palleti
Priority: Minor
 Attachments: MAHOUT-66.patch, MAHOUT-66.patch, MAHOUT-66.patch, 
 MAHOUT-66.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAHOUT-125) Remove Deprecated Ant builds

2009-05-27 Thread Grant Ingersoll (JIRA)
Remove Deprecated Ant builds


 Key: MAHOUT-125
 URL: https://issues.apache.org/jira/browse/MAHOUT-125
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor


Finish transferring functionality from build-deprecated.xml files to Maven.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1188) TermVectorComponent Efficiency improvements

2009-05-27 Thread Grant Ingersoll (JIRA)
TermVectorComponent Efficiency improvements
---

 Key: SOLR-1188
 URL: https://issues.apache.org/jira/browse/SOLR-1188
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4


The TermVectorComponent currently uses a TermVectorMapper that does not 
indicate to Lucene whether positions and offsets are of interest by overriding 
isIgnoringOffsets and isIgnoringPositions.  



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1188) TermVectorComponent Efficiency improvements

2009-05-27 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-1188.
---

Resolution: Fixed

Committed simple patch to override the two methods.
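
The committed change isn't reproduced here, but the shape of such overrides on 
the component's mapper is roughly as follows; the guard flags are illustrative:

{code}
// Illustrative sketch: tell Lucene up front what the mapper will ignore so
// that positions and offsets aren't read from the index unnecessarily.
public boolean isIgnoringPositions() {
  return !positionsRequested; // illustrative flag from the request params
}

public boolean isIgnoringOffsets() {
  return !offsetsRequested;   // illustrative flag from the request params
}
{code}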

 TermVectorComponent Efficiency improvements
 ---

 Key: SOLR-1188
 URL: https://issues.apache.org/jira/browse/SOLR-1188
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4


 The TermVectorComponent currently uses a TermVectorMapper that does not 
 indicate to Lucene whether positions and offsets are of interest by 
 overriding isIgnoringOffsets and isIgnoringPositions.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-1177) Distributed TermsComponent

2009-05-22 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-1177:
-

Assignee: Grant Ingersoll

 Distributed TermsComponent
 --

 Key: SOLR-1177
 URL: https://issues.apache.org/jira/browse/SOLR-1177
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Matt Weber
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1177.patch, TermsComponent.java, 
 TermsComponent.patch


 TermsComponent should be distributed

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (LUCENE-1550) Add N-Gram String Matching for Spell Checking

2009-05-20 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711150#action_12711150
 ] 

Grant Ingersoll commented on LUCENE-1550:
-

Committed revision 776704.

Thanks Tom!

 Add N-Gram String Matching for Spell Checking
 -

 Key: LUCENE-1550
 URL: https://issues.apache.org/jira/browse/LUCENE-1550
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/spellchecker
Affects Versions: 2.9
Reporter: Thomas Morton
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1550.patch, LUCENE-1550.patch, LUCENE-1550.patch


 N-Gram version of edit distance based on paper by Grzegorz Kondrak, N-gram 
 similarity and distance. Proceedings of the Twelfth International Conference 
 on String Processing and Information Retrieval (SPIRE 2005), pp. 115-126,  
 Buenos Aires, Argentina, November 2005. 
 http://www.cs.ualberta.ca/~kondrak/papers/spire05.pdf
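
The committed code follows Kondrak's n-gram distance, which is not reproduced 
here; for intuition only, here is a much simpler bigram-overlap similarity 
(Dice coefficient) over character bigrams:

{code}
import java.util.HashSet;
import java.util.Set;

// For intuition only: Dice coefficient over character bigrams. This is a
// much simpler relative of Kondrak's n-gram distance, not the committed
// algorithm.
public final class BigramDice {
  public static float similarity(String a, String b) {
    Set<String> ga = bigrams(a);
    Set<String> gb = bigrams(b);
    if (ga.isEmpty() && gb.isEmpty()) {
      return 1f; // both strings shorter than one bigram
    }
    int overlap = 0;
    for (String gram : ga) {
      if (gb.contains(gram)) overlap++;
    }
    return 2f * overlap / (ga.size() + gb.size());
  }

  private static Set<String> bigrams(String s) {
    Set<String> grams = new HashSet<String>();
    for (int i = 0; i + 2 <= s.length(); i++) {
      grams.add(s.substring(i, i + 2));
    }
    return grams;
  }
}
{code}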

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (SOLR-769) Support Document and Search Result clustering

2009-05-20 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711137#action_12711137
 ] 

Grant Ingersoll commented on SOLR-769:
--

Committed revision 776692.

 Support Document and Search Result clustering
 -

 Key: SOLR-769
 URL: https://issues.apache.org/jira/browse/SOLR-769
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: clustering-libs.tar, clustering-libs.tar, 
 SOLR-769-lib.zip, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.tar, 
 SOLR-769.zip


 Clustering is a useful tool for working with documents and search results, 
 similar to the notion of dynamic faceting.  Carrot2 
 (http://project.carrot2.org/) is a nice, BSD-licensed library for doing 
 search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
 suited for whole-corpus clustering.  
 The patch lays out a contrib module that starts off w/ an integration of a 
 SearchComponent for doing clustering and an implementation using Carrot.  In 
 search results mode, it will use the DocList as the input for the cluster.   
 While Carrot2 comes w/ a Solr input component, it is not the same as the 
 SearchComponent that I have in that the Carrot example actually submits a 
 query to Solr, whereas my SearchComponent is just chained into the Component 
 list and uses the ResponseBuilder to add in the cluster results.
 While not fully fleshed out yet, the collection based mode will take in a 
 list of ids or just use the whole collection and will produce clusters.  
 Since this is a longer, typically offline task, there will need to be some 
 type of storage mechanism (and replication??) for the clusters.  I _may_ 
 push this off to a separate JIRA issue, but I at least want to present the 
 use case as part of the design of this component/contrib.  It may even make 
 sense that we split this out, such that the building piece is something like 
 an UpdateProcessor and then the SearchComponent just acts as a lookup 
 mechanism.
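
Schematically, the chained-component approach might look like this inside 
process(); the clusterEngine field is hypothetical:

{code}
import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.search.DocList;

// Hypothetical sketch: the component reads the DocList that QueryComponent
// already produced and appends the clusters to the response.
public void process(ResponseBuilder rb) throws IOException {
  DocList results = rb.getResults().docList; // search results to cluster
  Object clusters = clusterEngine.cluster(results, rb.req); // illustrative engine
  rb.rsp.add("clusters", clusters);
}
{code}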

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-769) Support Document and Search Result clustering

2009-05-20 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711137#action_12711137
 ] 

Grant Ingersoll edited comment on SOLR-769 at 5/20/09 6:42 AM:
---

Committed revision 776692.

Thanks to everyone who helped out, especially Carrot2 creators Dawid and 
Stanislaw.

  was (Author: gsingers):
Committed revision 776692.
  
 Support Document and Search Result clustering
 -

 Key: SOLR-769
 URL: https://issues.apache.org/jira/browse/SOLR-769
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: clustering-libs.tar, clustering-libs.tar, 
 SOLR-769-lib.zip, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.tar, 
 SOLR-769.zip


 Clustering is a useful tool for working with documents and search results, 
 similar to the notion of dynamic faceting.  Carrot2 
 (http://project.carrot2.org/) is a nice, BSD-licensed library for doing 
 search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
 suited for whole-corpus clustering.  
 The patch lays out a contrib module that starts off w/ an integration of a 
 SearchComponent for doing clustering and an implementation using Carrot.  In 
 search results mode, it will use the DocList as the input for the cluster.   
 While Carrot2 comes w/ a Solr input component, it is not the same as the 
 SearchComponent that I have in that the Carrot example actually submits a 
 query to Solr, whereas my SearchComponent is just chained into the Component 
 list and uses the ResponseBuilder to add in the cluster results.
 While not fully fleshed out yet, the collection based mode will take in a 
 list of ids or just use the whole collection and will produce clusters.  
 Since this is a longer, typically offline task, there will need to be some 
 type of storage mechanism (and replication??) for the clusters.  I _may_ 
 push this off to a separate JIRA issue, but I at least want to present the 
 use case as part of the design of this component/contrib.  It may even make 
 sense that we split this out, such that the building piece is something like 
 an UpdateProcessor and then the SearchComponent just acts as a lookup 
 mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAHOUT-119) Create an uber jar for use on Amazon Elastic M/R, etc.

2009-05-19 Thread Grant Ingersoll (JIRA)
Create an uber jar for use on Amazon Elastic M/R, etc.
--

 Key: MAHOUT-119
 URL: https://issues.apache.org/jira/browse/MAHOUT-119
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor


Some cloud resources have problems loading classes across JARs nested in the 
job JAR.  See 
http://www.lucidimagination.com/search/document/3a5680dfe567d812/running_dirichlet_example_on_aemr

This can be fixed by adding a new build target that creates a single JAR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAHOUT-120) Site search powered by Solr

2009-05-19 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated MAHOUT-120:
---

Attachment: MAHOUT-120.patch

Patch to change the site skin.  This was created by slightly modifying the 
default Forrest skin.

 Site search powered by Solr
 ---

 Key: MAHOUT-120
 URL: https://issues.apache.org/jira/browse/MAHOUT-120
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: MAHOUT-120.patch


 For a number of years now, the Lucene community has been criticized for not 
 eating our own dog food when it comes to search.  My company has built and 
 hosts a site search (http://www.lucidimagination.com/search) that is powered 
 by Apache Solr and Lucene and we'd like to donate its use to the Lucene 
 community.   Additionally, it allows one to search all of the Mahout content 
 from a single place, including web, wiki, JIRA and mail archives.   See also 
 http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org
 A preview of the site is available at 
 http://people.apache.org/~gsingers/mahout/site/publish/
 Lucid has a fault tolerant setup with replication and fail over as well as 
 monitoring services in place.  We are committed to maintaining and expanding 
 the search capabilities on the site.
 The following patch adds a skin to the Forrest site that enables the Mahout 
 site to search Mahout only content using Lucene/Solr.  When a search is 
 submitted, it automatically selects the Mahout facet such that only Mahout 
 content is searched.  From there, users can then narrow/broaden their search 
 criteria.
 I'm submitting this patch to Mahout first, as we'd like to roll out our 
 capabilities to some of the smaller communities first and then broaden to the 
 rest of the Lucene ecosystem.
 I plan on committing in 3 or 4 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAHOUT-120) Site search powered by Solr

2009-05-19 Thread Grant Ingersoll (JIRA)
Site search powered by Solr
---

 Key: MAHOUT-120
 URL: https://issues.apache.org/jira/browse/MAHOUT-120
 Project: Mahout
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: MAHOUT-120.patch

For a number of years now, the Lucene community has been criticized for not 
eating our own dog food when it comes to search.  My company has built and 
hosts a site search (http://www.lucidimagination.com/search) that is powered by 
Apache Solr and Lucene, and we'd like to donate its use to the Lucene 
community.   Additionally, it allows one to search all of the Mahout content 
from a single place, including web, wiki, JIRA and mail archives.   See also 
http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org

A preview of the site is available at 
http://people.apache.org/~gsingers/mahout/site/publish/

Lucid has a fault-tolerant setup with replication and failover, as well as 
monitoring services in place.  We are committed to maintaining and expanding 
the search capabilities on the site.

The following patch adds a skin to the Forrest site that enables the Mahout 
site to search Mahout-only content using Lucene/Solr.  When a search is 
submitted, it automatically selects the Mahout facet such that only Mahout 
content is searched.  From there, users can then narrow/broaden their search 
criteria.

I'm submitting this patch to Mahout first, as we'd like to roll out our 
capabilities to some of the smaller communities first and then broaden to the 
rest of the Lucene ecosystem.

I plan on committing in 3 or 4 days.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-761) Fix Flare license headers

2009-05-19 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710870#action_12710870
 ] 

Grant Ingersoll commented on SOLR-761:
--

FYI: ant rat-sources is helpful for easily identifying which files are missing 
license headers.

 Fix Flare license headers
 -

 Key: SOLR-761
 URL: https://issues.apache.org/jira/browse/SOLR-761
 Project: Solr
  Issue Type: Task
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 1.4


 Solr Flare has inconsistent use of the Apache Software License header in its 
 files. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-769) Support Document and Search Result clustering

2009-05-13 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-769:
-

Attachment: SOLR-769.patch

OK, I think all the ducks are in a row.  

I intend to commit on Friday.

 Support Document and Search Result clustering
 -

 Key: SOLR-769
 URL: https://issues.apache.org/jira/browse/SOLR-769
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: clustering-libs.tar, clustering-libs.tar, 
 SOLR-769-lib.zip, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.tar, 
 SOLR-769.zip


 Clustering is a useful tool for working with documents and search results, 
 similar to the notion of dynamic faceting.  Carrot2 
 (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing 
 search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
 suited for whole-corpus clustering.  
 The patch lays out a contrib module that starts off w/ an integration of a 
 SearchComponent for doing clustering and an implementation using Carrot.  In 
 search results mode, it will use the DocList as the input for the cluster.   
 While Carrot2 comes w/ a Solr input component, it is not the same as the 
 SearchComponent that I have in that the Carrot example actually submits a 
 query to Solr, whereas my SearchComponent is just chained into the Component 
 list and uses the ResponseBuilder to add in the cluster results.
 While not fully fleshed out yet, the collection based mode will take in a 
 list of ids or just use the whole collection and will produce clusters.  
 Since this is a longer, typically offline task, there will need to be some 
 type of storage mechanism (and replication??) for the clusters.  I _may_ 
 push this off to a separate JIRA issue, but I at least want to present the 
 use case as part of the design of this component/contrib.  It may even make 
 sense that we split this out, such that the building piece is something like 
 an UpdateProcessor and then the SearchComponent just acts as a lookup 
 mechanism.
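
 As a rough illustration of the chaining described above (a hypothetical 
 sketch, not the attached patch; class and method names are made up), a 
 component registered after the QueryComponent can read the DocList off the 
 ResponseBuilder and attach clusters to the response:

 {code}
 // Hypothetical sketch only -- not the SOLR-769 patch.
 import java.io.IOException;

 import org.apache.solr.handler.component.ResponseBuilder;
 import org.apache.solr.handler.component.SearchComponent;
 import org.apache.solr.search.DocList;

 public class ClusteringSketchComponent extends SearchComponent {

   public void prepare(ResponseBuilder rb) throws IOException {
     // Nothing to prepare; the QueryComponent parses the query.
   }

   public void process(ResponseBuilder rb) throws IOException {
     // Registered after QueryComponent in the chain, so the search
     // results (the DocList) are already available on the builder.
     DocList docs = rb.getResults().docList;
     // Hand the result docs to a clustering engine (e.g. Carrot2)
     // and add whatever it returns to the response.
     rb.rsp.add("clusters", clusterLabelsFor(docs));
   }

   // Stand-in for the real call into the clustering library.
   private Object clusterLabelsFor(DocList docs) {
     return "clusters over " + docs.matches() + " matches";
   }

   public String getDescription() { return "clustering sketch"; }
   public String getSource() { return "$URL$"; }
   public String getSourceId() { return "$Id$"; }
   public String getVersion() { return "1.0"; }
 }
 {code}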

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-05-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708534#action_12708534
 ] 

Grant Ingersoll commented on SOLR-773:
--

I think, and correct me if I'm wrong, that one of the things that often happens 
with geo stuff is that there are a lot of unique values.  This often has memory 
ramifications when used with FunctionQueries, since most ValueSources uninvert 
the field.
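
A minimal sketch of that memory cost, assuming the FieldCache-backed 
ValueSources of this era (the field names are illustrative): the cache 
materializes one value per document, whether or not the values are unique.

{code}
// Sketch of the uninversion cost; not code from any patch here.
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class UninvertCostSketch {
  // FieldCache fills a float[maxDoc] per field, so two geo fields on a
  // 10M-doc index hold roughly 2 * 4 bytes * 10M = ~80 MB of heap, and
  // mostly-unique values leave nothing for the cache to share.
  public static long approxBytesHeld(IndexReader reader) throws IOException {
    float[] lats = FieldCache.DEFAULT.getFloats(reader, "lat");
    float[] lngs = FieldCache.DEFAULT.getFloats(reader, "lng");
    return 4L * (lats.length + lngs.length);
  }
}
{code}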

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch, 
 SOLR-773.patch, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-773) Incorporate Local Lucene/Solr

2009-05-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708534#action_12708534
 ] 

Grant Ingersoll edited comment on SOLR-773 at 5/12/09 11:02 AM:


I think, and correct me if I'm wrong, that one of the things that often happens 
with geo stuff is that there are a lot of unique values.  This often has memory 
ramifications when used with FunctionQueries, since most ValueSources uninvert 
the field.

Otherwise, I like the sounds of Yonik's proposal as well.

  was (Author: gsingers):
I think, and correct me if I'm wrong, that one of the things that often 
happens with geo stuff is that there are a lot of unique values.  This often 
has memory ramifications when used with FunctionQueries, since most 
ValueSources uninvert the field.
  
 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch, 
 SOLR-773.patch, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-05-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708541#action_12708541
 ] 

Grant Ingersoll commented on SOLR-773:
--

Also, how does the TrieRange stuff factor into this?

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch, 
 SOLR-773.patch, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-05-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708680#action_12708680
 ] 

Grant Ingersoll commented on SOLR-773:
--

{quote}
1) What is the goal we want to achieve?

* Provide a first iteration of a geographical search entity to SOLR
* Bring an external, popular plugin in out of the cold into ASF and SOLR; 
it helps Solr users out and increases developers from 1 to many.
{quote}

Agreed on the first, not 100% certain on the second.  On the second, this issue 
is the gatekeeper.  If people reviewing the patch feel there are better ways 
to do things, then we should work through them before committing.  What you are 
effectively seeing is an increase in the developers working on it from 1 to many; 
it's just not on committed code.

{quote}
2) What is the level of commitment, and road map of spatial solutions in lucene 
and solr?

* The primary goal of SOLR is as a text search engine, not GIS search; 
there are other and better ways to do that
  without reinventing the wheel and shoehorning it into Lucene
  (e.g. persistent doc id mappings that can be referenced outside of 
Lucene, so things like PostGIS and other tools can be used).
* We can never fully solve everyone's needs at once; let's start with what 
we have, and iterate upon it.
* I'm happy for any improvements as long as they keep to two goals: A. don't 
make it stupid; B. don't make it complex.
{quote}

On the first point, I don't follow.  Aren't LocalLucene and LocalSolr 
exactly a GIS search capability for Lucene/Solr?  I'm not sure I would 
categorize it as shoehorning.  There are many things that Lucene/Solr can 
power, and GIS search is one of them.  By committing this patch (or some 
variation), we are saying Solr is going to support GIS search.  Of course, 
there are other ways to do it, but that doesn't preclude it from L/S.  The 
combination of text search plus GIS search is very powerful, as you know.  

Still, I think Yonik's main point is: why reinvent the wheel when it comes to 
things like distributed search and the need for custom code for indexing, etc., 
when they can likely be handled through function queries and field types, so that 
all of Solr's current functionality would just work?  The other 
capabilities (like sorting by a FunctionQuery) are icing on the cake that helps 
solve other problems as well.

Totally agree on the other points.  Also very cool to see the benchmarking info.


 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch, 
 SOLR-773.patch, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-773) Incorporate Local Lucene/Solr

2009-05-12 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708680#action_12708680
 ] 

Grant Ingersoll edited comment on SOLR-773 at 5/12/09 4:21 PM:
---

{quote}
1) What is the goal we want to achieve?

* Provide a first iteration of a geographical search entity to SOLR
* Bring an external, popular plugin in out of the cold into ASF and SOLR; 
it helps Solr users out and increases developers from 1 to many.
{quote}

Agreed on the first, not 100% certain on the second.  On the second, this issue 
is the gatekeeper.  If people reviewing the patch feel there are better ways 
to do things, then we should work through them before committing.  What you are 
effectively seeing is an increase in the developers working on it from 1 to many; 
it's just not on committed code.

{quote}
2) What is the level of commitment, and road map of spatial solutions in lucene 
and solr?

* The primary goal of SOLR is as a text search engine, not GIS search; 
there are other and better ways to do that
  without reinventing the wheel and shoehorning it into Lucene
  (e.g. persistent doc id mappings that can be referenced outside of 
Lucene, so things like PostGIS and other tools can be used).
* We can never fully solve everyone's needs at once; let's start with what 
we have, and iterate upon it.
* I'm happy for any improvements as long as they keep to two goals: A. don't 
make it stupid; B. don't make it complex.
{quote}

On the first point, I don't follow.  Aren't LocalLucene and LocalSolr 
exactly a GIS search capability for Lucene/Solr?  I'm not sure I would 
categorize it as shoehorning.  There are many things that Lucene/Solr can 
power, and GIS search with text is one of them.  By committing this patch (or some 
variation), we are saying Solr is going to support it.  Of course, there are 
other ways to do it, but that doesn't preclude it from L/S.  The combination of 
text search plus GIS search is very powerful, as you know.  

Still, I think Yonik's main point is: why reinvent the wheel when it comes to 
things like distributed search and the need for custom code for indexing, etc., 
when they can likely be handled through function queries and field types, so that 
all of Solr's current functionality would just work?  The other 
capabilities (like sorting by a FunctionQuery) are icing on the cake that helps 
solve other problems as well.

Totally agree on the other points.  Also very cool to see the benchmarking info.


  was (Author: gsingers):
{quote}
1) What is the goal we want to achieve?

* Provide a first iteration of a geographical search entity to SOLR
* Bring an external, popular plugin in out of the cold into ASF and SOLR; 
it helps Solr users out and increases developers from 1 to many.
{quote}

Agreed on the first, not 100% certain on the second.  On the second, this issue 
is the gatekeeper.  If people reviewing the patch feel there are better ways 
to do things, then we should work through them before committing.  What you are 
effectively seeing is an increase in the developers working on it from 1 to many; 
it's just not on committed code.

{quote}
2) What is the level of commitment, and road map of spatial solutions in lucene 
and solr?

* The primary goal of SOLR is as a text search engine, not GIS search; 
there are other and better ways to do that
  without reinventing the wheel and shoehorning it into Lucene
  (e.g. persistent doc id mappings that can be referenced outside of 
Lucene, so things like PostGIS and other tools can be used).
* We can never fully solve everyone's needs at once; let's start with what 
we have, and iterate upon it.
* I'm happy for any improvements as long as they keep to two goals: A. don't 
make it stupid; B. don't make it complex.
{quote}

On the first point, I don't follow.  Aren't LocalLucene and LocalSolr 
exactly a GIS search capability for Lucene/Solr?  I'm not sure I would 
categorize it as shoehorning.  There are many things that Lucene/Solr can 
power, and GIS search is one of them.  By committing this patch (or some 
variation), we are saying Solr is going to support GIS search.  Of course, 
there are other ways to do it, but that doesn't preclude it from L/S.  The 
combination of text search plus GIS search is very powerful, as you know.  

Still, I think Yonik's main point is: why reinvent the wheel when it comes to 
things like distributed search and the need for custom code for indexing, etc., 
when they can likely be handled through function queries and field types, so that 
all of Solr's current functionality would just work?  The other 
capabilities (like sorting by a FunctionQuery) are icing on the cake that helps 
solve other problems as well.

Totally agree on the other points.  Also very cool to see the benchmarking info.

  
 Incorporate Local 

[jira] Commented: (LUCENE-1387) Add LocalLucene

2009-05-11 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708178#action_12708178
 ] 

Grant Ingersoll commented on LUCENE-1387:
-

FWIW, you might find the discussion on SOLR-773 interesting.

 Add LocalLucene
 ---

 Key: LUCENE-1387
 URL: https://issues.apache.org/jira/browse/LUCENE-1387
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/spatial
Reporter: Grant Ingersoll
Assignee: Ryan McKinley
Priority: Minor
 Fix For: 2.9

 Attachments: spatial-lucene.zip, spatial.tar.gz, spatial.zip


 Local Lucene (Geo-search) has been donated to the Lucene project, per 
 https://issues.apache.org/jira/browse/INCUBATOR-77.  This issue is to handle 
 the Lucene portion of integration.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1138) Query Elevation Component should gracefully handle empty queries

2009-05-04 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705559#action_12705559
 ] 

Grant Ingersoll commented on SOLR-1138:
---

Committed revision 771268.

 Query Elevation Component should gracefully handle empty queries
 

 Key: SOLR-1138
 URL: https://issues.apache.org/jira/browse/SOLR-1138
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1138.patch


 From http://www.lucidimagination.com/search/document/3b50cd3506952f7 :
 {quote}
 In the QueryElevComponent (QEC) it currently throws an exception if  
 the input Query is null (line 329).  Additionally, I've seen cases  
 where it's possible that the Query is not null (q is not set, but  
 q.alt is *:*), but the rb.getQueryString() is null, which causes an  
 NPE on line 300 or so.
 I'd like to suggest that if the Query is empty/null, the QEC should  
 just go on its merry way as if there is nothing to do.  I don't think  
 a lack of query means that the QEC is improperly configured, as the  
 exception message implies:
   The QueryElevationComponent needs to be registered 'after' the query  
 component
 We should be making sure the QEC is properly registered at  
 initialization time.
 Thoughts?
 -Grant{quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1138) Query Elevation Component should gracefully handle empty queries

2009-05-03 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1138:
--

Attachment: SOLR-1138.patch

Here's a patch that fixes this.  I plan on committing today.
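
For readers of the thread, the graceful behavior asked for amounts to an 
early return instead of an exception when there is nothing to elevate. A 
hedged sketch (not the attached patch itself):

{code}
// Hedged sketch only -- not the actual SOLR-1138 patch.
import org.apache.solr.handler.component.ResponseBuilder;

public class ElevationGuardSketch {
  // Checked at the top of the component: a null Query or null query
  // string means "nothing to elevate", not a misconfigured component,
  // so elevation should simply be skipped.
  public static boolean nothingToElevate(ResponseBuilder rb) {
    return rb.getQuery() == null || rb.getQueryString() == null;
  }
}
{code}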

 Query Elevation Component should gracefully handle empty queries
 

 Key: SOLR-1138
 URL: https://issues.apache.org/jira/browse/SOLR-1138
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-1138.patch


 From http://www.lucidimagination.com/search/document/3b50cd3506952f7 :
 {quote}
 In the QueryElevComponent (QEC) it currently throws an exception if  
 the input Query is null (line 329).  Additionally, I've seen cases  
 where it's possible that the Query is not null (q is not set, but  
 q.alt is *:*), but the rb.getQueryString() is null, which causes an  
 NPE on line 300 or so.
 I'd like to suggest that if the Query is empty/null, the QEC should  
 just go on its merry way as if there is nothing to do.  I don't think  
 a lack of query means that the QEC is improperly configured, as the  
 exception message implies:
   The QueryElevationComponent needs to be registered 'after' the query  
 component
 We should be making sure the QEC is properly registered at  
 initialization time.
 Thoughts?
 -Grant{quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1138) Query Elevation Component should gracefully handle empty queries

2009-04-30 Thread Grant Ingersoll (JIRA)
Query Elevation Component should gracefully handle empty queries


 Key: SOLR-1138
 URL: https://issues.apache.org/jira/browse/SOLR-1138
 Project: Solr
  Issue Type: Bug
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor


From http://www.lucidimagination.com/search/document/3b50cd3506952f7 :
{quote}
In the QueryElevComponent (QEC) it currently throws an exception if  
the input Query is null (line 329).  Additionally, I've seen cases  
where it's possible that the Query is not null (q is not set, but  
q.alt is *:*), but the rb.getQueryString() is null, which causes an  
NPE on line 300 or so.

I'd like to suggest that if the Query is empty/null, the QEC should  
just go on its merry way as if there is nothing to do.  I don't think  
a lack of query means that the QEC is improperly configured, as the  
exception message implies:
The QueryElevationComponent needs to be registered 'after' the query  
component

We should be making sure the QEC is properly registered at  
initialization time.

Thoughts?

-Grant{quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1128) Solr Cell Extract Only should also return Metadata too

2009-04-24 Thread Grant Ingersoll (JIRA)
Solr Cell Extract Only should also return Metadata too
--

 Key: SOLR-1128
 URL: https://issues.apache.org/jira/browse/SOLR-1128
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4


Just as the title says.  When using extract.only, we should also include the 
Metadata in the response

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1128) Solr Cell Extract Only should also return Metadata too

2009-04-24 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-1128.
---

Resolution: Fixed

Committed revision 768281.

 Solr Cell Extract Only should also return Metadata too
 --

 Key: SOLR-1128
 URL: https://issues.apache.org/jira/browse/SOLR-1128
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4


 Just as the title says.  When using extract.only, we should also include the 
 Metadata in the response

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-20 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700782#action_12700782
 ] 

Grant Ingersoll commented on SOLR-1099:
---

So, why not just fold all of this into the ARH?  Seems like all of these 
features work just as well as input parameters and there is no need for 
deprecation, etc.  

 FieldAnalysisRequestHandler
 ---

 Key: SOLR-1099
 URL: https://issues.apache.org/jira/browse/SOLR-1099
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 1.3
Reporter: Uri Boness
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: AnalisysRequestHandler_refactored.patch, 
 analysis_request_handlers_incl_solrj.patch, 
 AnalysisRequestHandler_refactored1.patch, 
 FieldAnalysisRequestHandler_incl_test.patch


 The FieldAnalysisRequestHandler provides the analysis functionality of the 
 web admin page as a service. This handler accepts a fieldtype/fieldname 
 parameter and a value, and as a response returns a breakdown of the analysis 
 process. It is also possible to send a query value, which will use the 
 configured query analyzer, as well as a showmatch parameter, which will then 
 mark every matched token as a match.
 If this handler is added to the code base, I also recommend renaming the 
 current AnalysisRequestHandler to DocumentAnalysisRequestHandler and having 
 them both inherit from one AnalysisRequestHandlerBase class, which provides 
 the common functionality of the analysis breakdown and its translation to 
 named lists. This will also enhance the current AnalysisRequestHandler, which 
 right now is fairly simplistic.
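
 To make the service concrete, a request along the following lines would 
 return the token-by-token breakdown for one value against one field, with 
 matches against the query value marked (the parameter names and the 
 registered path are assumptions, not confirmed by this issue):

 {code}
 http://localhost:8983/solr/analysis/field?analysis.fieldname=text&analysis.fieldvalue=The+Quick+Red+Fox&analysis.query=fox&analysis.showmatch=true
 {code}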

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-19 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700622#action_12700622
 ] 

Grant Ingersoll edited comment on SOLR-1099 at 4/19/09 6:02 PM:


Sorry for being a bit late...  
Am I understanding that the main thing this does is allow you to specify one 
input and get back analysis for each field you specify?  Well, that and the 
GET, right?

  was (Author: gsingers):
Sorry for being a bit late...  
Am I understand that the main thing this does is allow you to specify one input 
and get back analysis for each field you specify?
  
 FieldAnalysisRequestHandler
 ---

 Key: SOLR-1099
 URL: https://issues.apache.org/jira/browse/SOLR-1099
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 1.3
Reporter: Uri Boness
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: AnalisysRequestHandler_refactored.patch, 
 AnalysisRequestHandler_refactored1.patch, 
 FieldAnalysisRequestHandler_incl_test.patch


 The FieldAnalysisRequestHandler provides the analysis functionality of the 
 web admin page as a service. This handler accepts a fieldtype/fieldname 
 parameter and a value, and as a response returns a breakdown of the analysis 
 process. It is also possible to send a query value, which will use the 
 configured query analyzer, as well as a showmatch parameter, which will then 
 mark every matched token as a match.
 If this handler is added to the code base, I also recommend renaming the 
 current AnalysisRequestHandler to DocumentAnalysisRequestHandler and having 
 them both inherit from one AnalysisRequestHandlerBase class, which provides 
 the common functionality of the analysis breakdown and its translation to 
 named lists. This will also enhance the current AnalysisRequestHandler, which 
 right now is fairly simplistic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1099) FieldAnalysisRequestHandler

2009-04-19 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700622#action_12700622
 ] 

Grant Ingersoll commented on SOLR-1099:
---

Sorry for being a bit late...  
Am I understand that the main thing this does is allow you to specify one input 
and get back analysis for each field you specify?

 FieldAnalysisRequestHandler
 ---

 Key: SOLR-1099
 URL: https://issues.apache.org/jira/browse/SOLR-1099
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 1.3
Reporter: Uri Boness
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: AnalisysRequestHandler_refactored.patch, 
 AnalysisRequestHandler_refactored1.patch, 
 FieldAnalysisRequestHandler_incl_test.patch


 The FieldAnalysisRequestHandler provides the analysis functionality of the 
 web admin page as a service. This handler accepts a fieldtype/fieldname 
 parameter and a value, and as a response returns a breakdown of the analysis 
 process. It is also possible to send a query value, which will use the 
 configured query analyzer, as well as a showmatch parameter, which will then 
 mark every matched token as a match.
 If this handler is added to the code base, I also recommend renaming the 
 current AnalysisRequestHandler to DocumentAnalysisRequestHandler and having 
 them both inherit from one AnalysisRequestHandlerBase class, which provides 
 the common functionality of the analysis breakdown and its translation to 
 named lists. This will also enhance the current AnalysisRequestHandler, which 
 right now is fairly simplistic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-769) Support Document and Search Result clustering

2009-04-19 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700628#action_12700628
 ] 

Grant Ingersoll commented on SOLR-769:
--

Where can we download nni.jar from?  

Seems like if you only need two classes it would be easy enough to replace them 
with your own code.

 Support Document and Search Result clustering
 -

 Key: SOLR-769
 URL: https://issues.apache.org/jira/browse/SOLR-769
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: clustering-libs.tar, clustering-libs.tar, 
 SOLR-769-lib.zip, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.zip


 Clustering is a useful tool for working with documents and search results, 
 similar to the notion of dynamic faceting.  Carrot2 
 (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing 
 search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
 suited for whole-corpus clustering.  
 The patch lays out a contrib module that starts off w/ an integration of a 
 SearchComponent for doing clustering and an implementation using Carrot.  In 
 search results mode, it will use the DocList as the input for the cluster.   
 While Carrot2 comes w/ a Solr input component, it is not the same as the 
 SearchComponent that I have in that the Carrot example actually submits a 
 query to Solr, whereas my SearchComponent is just chained into the Component 
 list and uses the ResponseBuilder to add in the cluster results.
 While not fully fleshed out yet, the collection based mode will take in a 
 list of ids or just use the whole collection and will produce clusters.  
 Since this is a longer, typically offline task, there will need to be some 
 type of storage mechanism (and replication??) for the clusters.  I _may_ 
 push this off to a separate JIRA issue, but I at least want to present the 
 use case as part of the design of this component/contrib.  It may even make 
 sense that we split this out, such that the building piece is something like 
 an UpdateProcessor and then the SearchComponent just acts as a lookup 
 mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-769) Support Document and Search Result clustering

2009-04-19 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-769:
-

Comment: was deleted

(was: Where can we download nni.jar from?  

Seems like if you only need two classes it would be easy enough to replace them 
with your own code.)

 Support Document and Search Result clustering
 -

 Key: SOLR-769
 URL: https://issues.apache.org/jira/browse/SOLR-769
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: clustering-libs.tar, clustering-libs.tar, 
 SOLR-769-lib.zip, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.zip


 Clustering is a useful tool for working with documents and search results, 
 similar to the notion of dynamic faceting.  Carrot2 
 (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing 
 search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
 suited for whole-corpus clustering.  
 The patch lays out a contrib module that starts off w/ an integration of a 
 SearchComponent for doing clustering and an implementation using Carrot.  In 
 search results mode, it will use the DocList as the input for the cluster.   
 While Carrot2 comes w/ a Solr input component, it is not the same as the 
 SearchComponent that I have in that the Carrot example actually submits a 
 query to Solr, whereas my SearchComponent is just chained into the Component 
 list and uses the ResponseBuilder to add in the cluster results.
 While not fully fleshed out yet, the collection based mode will take in a 
 list of ids or just use the whole collection and will produce clusters.  
 Since this is a longer, typically offline task, there will need to be some 
 type of storage mechanism (and replication??) for the clusters.  I _may_ 
 push this off to a separate JIRA issue, but I at least want to present the 
 use case as part of the design of this component/contrib.  It may even make 
 sense that we split this out, such that the building piece is something like 
 an UpdateProcessor and then the SearchComponent just acts as a lookup 
 mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-769) Support Document and Search Result clustering

2009-04-19 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-769:
-

Attachment: SOLR-769.tar
SOLR-769.patch

OK, I think this is ready to go, except I still need to double-check how it 
works with the release.   Since we can't distribute LGPL code, this is going 
to have to be a source-only release artifact and thus can never be in the 
WAR, unfortunately.

The tarball contains the JAR files that one needs, with the exception of the 
LGPL deps, which are downloaded from the appropriate places.

 Support Document and Search Result clustering
 -

 Key: SOLR-769
 URL: https://issues.apache.org/jira/browse/SOLR-769
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: clustering-libs.tar, clustering-libs.tar, 
 SOLR-769-lib.zip, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.tar, SOLR-769.zip


 Clustering is a useful tool for working with documents and search results, 
 similar to the notion of dynamic faceting.  Carrot2 
 (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing 
 search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
 suited for whole-corpus clustering.  
 The patch lays out a contrib module that starts off w/ an integration of a 
 SearchComponent for doing clustering and an implementation using Carrot.  In 
 search results mode, it will use the DocList as the input for the cluster.   
 While Carrot2 comes w/ a Solr input component, it is not the same as the 
 SearchComponent that I have in that the Carrot example actually submits a 
 query to Solr, whereas my SearchComponent is just chained into the Component 
 list and uses the ResponseBuilder to add in the cluster results.
 While not fully fleshed out yet, the collection based mode will take in a 
 list of ids or just use the whole collection and will produce clusters.  
 Since this is a longer, typically offline task, there will need to be some 
 type of storage mechanism (and replication??) for the clusters.  I _may_ 
 push this off to a separate JIRA issue, but I at least want to present the 
 use case as part of the design of this component/contrib.  It may even make 
 sense that we split this out, such that the building piece is something like 
 an UpdateProcessor and then the SearchComponent just acts as a lookup 
 mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-769) Support Document and Search Result clustering

2009-04-16 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699908#action_12699908
 ] 

Grant Ingersoll commented on SOLR-769:
--

Looks like we need to make the NNI JAR a download, too, right?  It appears 
to be LGPL.  Where does that library come from, anyway?  I don't see it on 
Carrot trunk, but it is in the zip.  And a search for it doesn't reveal much.

-Grant

 Support Document and Search Result clustering
 -

 Key: SOLR-769
 URL: https://issues.apache.org/jira/browse/SOLR-769
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: clustering-libs.tar, clustering-libs.tar, 
 SOLR-769-lib.zip, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.zip


 Clustering is a useful tool for working with documents and search results, 
 similar to the notion of dynamic faceting.  Carrot2 
 (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing 
 search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
 suited for whole-corpus clustering.  
 The patch lays out a contrib module that starts off w/ an integration of a 
 SearchComponent for doing clustering and an implementation using Carrot.  In 
 search results mode, it will use the DocList as the input for the cluster.   
 While Carrot2 comes w/ a Solr input component, it is not the same as the 
 SearchComponent that I have in that the Carrot example actually submits a 
 query to Solr, whereas my SearchComponent is just chained into the Component 
 list and uses the ResponseBuilder to add in the cluster results.
 While not fully fleshed out yet, the collection based mode will take in a 
 list of ids or just use the whole collection and will produce clusters.  
 Since this is a longer, typically offline task, there will need to be some 
 type of storage mechanism (and replication??) for the clusters.  I _may_ 
 push this off to a separate JIRA issue, but I at least want to present the 
 use case as part of the design of this component/contrib.  It may even make 
 sense that we split this out, such that the building piece is something like 
 an UpdateProcessor and then the SearchComponent just acts as a lookup 
 mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (LUCENE-1588) Update Spatial Lucene sort to use FieldComparatorSource

2009-04-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved LUCENE-1588.
-

Resolution: Fixed

This was committed.

 Update Spatial Lucene sort to use FieldComparatorSource
 ---

 Key: LUCENE-1588
 URL: https://issues.apache.org/jira/browse/LUCENE-1588
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/spatial
Affects Versions: 2.9
Reporter: patrick o'leary
Assignee: patrick o'leary
Priority: Trivial
 Fix For: 2.9

 Attachments: LUCENE-1588.patch


 Update distance sorting to use FieldComparator sorting as opposed to 
 SortComparator

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Assigned: (SOLR-773) Incorporate Local Lucene/Solr

2009-04-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-773:


Assignee: Grant Ingersoll

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-04-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699329#action_12699329
 ] 

Grant Ingersoll commented on SOLR-773:
--

We should be able to incorporate the GeoHash stuff in Lucene now, right?  I'm 
no spatial expert, but this means we could have an update processor that only 
uses one field, right?

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-04-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699364#action_12699364
 ] 

Grant Ingersoll commented on SOLR-773:
--

OK, so color me a total geo newbie, but...

So, if I index the spatial.xml in the patch I just submitted and I execute:
{code}
http://localhost:8983/solr/select?q=name:five
{code}

I get one result, which is expected.

If I then do a geo search:
{code}
http://localhost:8983/solr/select?q=name:five&qt=geo&long=-74.0093994140625&lat=40.75141843299745&radius=100&debugQuery=true
{code}

I get two results.   The second result is the other theater in the spatial.xml 
file.  Yet, it does not contain the value five in the name field even though 
it meets the spatial search criteria.

Shouldn't there just be one result?  What am I not understanding?

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch, 
 spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-04-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699403#action_12699403
 ] 

Grant Ingersoll commented on SOLR-773:
--

OK, I think I understand why it does this, but it seems a little odd to me.  
The reason is that the geo handler uses the geo QParser, which 
ignores the query parameter and produces a query solely based on the lat/lon 
information.  

Like I said, I'm a newbie to geo search, but it seems like the QParser should 
delegate the parsing of the q param to some other parser and then it would only 
do distance calculations on the docset returned from the QueryComponent.  Of 
course, I guess one could ask what the semantics are of combining a text query 
with a spatial query, but I would suppose we could combine them with either AND 
or OR, right, such that if I OR'd them together, I would get all docs matching 
the query term OR'd with all docs in the bounding box.  Similarly, AND would 
yield all docs with the term in the bounding box, right?

Again, I am likely missing something, so bear with me.
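
To make the AND/OR semantics above concrete, here is a minimal sketch 
(illustrative only, not LocalSolr code) of combining the parsed text query 
with the spatial query as Lucene clauses:

{code}
// Sketch of the suggested AND/OR combination; illustrative only.
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

public class GeoTextCombineSketch {
  // AND: docs must match the term *and* fall inside the bounding box.
  public static Query and(Query text, Query spatial) {
    BooleanQuery q = new BooleanQuery();
    q.add(text, Occur.MUST);
    q.add(spatial, Occur.MUST);
    return q;
  }

  // OR: docs matching the term unioned with docs in the bounding box.
  public static Query or(Query text, Query spatial) {
    BooleanQuery q = new BooleanQuery();
    q.add(text, Occur.SHOULD);
    q.add(spatial, Occur.SHOULD);
    return q;
  }
}
{code}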

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch, 
 spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

2009-04-13 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698429#action_12698429
 ] 

Grant Ingersoll commented on SOLR-773:
--

This latest patch doesn't compile b/c it is missing the SpatialParams class.

 Incorporate Local Lucene/Solr
 -

 Key: SOLR-773
 URL: https://issues.apache.org/jira/browse/SOLR-773
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
 SOLR-773-local-lucene.patch, spatial-solr.tar.gz


 Local Lucene has been donated to the Lucene project.  It has some Solr 
 components, but we should evaluate how best to incorporate it into Solr.
 See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-804) include lucene misc jar in solr distro

2009-04-13 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-804.
--

Resolution: Fixed

Committed revision 764580.

Added lucene-misc-2.9-dev.jar from rev 764281 which should match the Lucene 
version on trunk.

 include lucene misc jar in solr distro
 --

 Key: SOLR-804
 URL: https://issues.apache.org/jira/browse/SOLR-804
 Project: Solr
  Issue Type: Wish
Affects Versions: 1.3
 Environment: all
Reporter: solrize
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4


 It would be useful to have the lucene misc jar file included with solr.  My 
 immediate goal is to build several solr indexes in parallel on separate 
 servers, then run the index merge utility at the end to combine them into a 
 single index.  Erik H suggested I post an issue requesting that the misc 
 jar be included with Solr.  Thanks
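
 That merge step, sketched with plain Lucene calls of this era (a hedged 
 example with made-up paths, rather than the misc IndexMergeTool itself):

 {code}
 // Sketch of merging independently built indexes; paths are made up.
 import org.apache.lucene.analysis.standard.StandardAnalyzer;
 import org.apache.lucene.index.IndexWriter;
 import org.apache.lucene.store.Directory;
 import org.apache.lucene.store.FSDirectory;

 public class MergeShardsSketch {
   public static void main(String[] args) throws Exception {
     IndexWriter writer = new IndexWriter(
         FSDirectory.getDirectory("/data/merged"),
         new StandardAnalyzer(), true,
         IndexWriter.MaxFieldLength.UNLIMITED);
     // Each shard index was built in parallel on its own server.
     writer.addIndexesNoOptimize(new Directory[] {
         FSDirectory.getDirectory("/data/shard1"),
         FSDirectory.getDirectory("/data/shard2") });
     writer.optimize();  // collapse into a single, searchable index
     writer.close();
   }
 }
 {code}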

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-804) include lucene misc jar in solr distro

2009-04-13 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-804:
-

Fix Version/s: (was: 1.5)
   1.4

 include lucene misc jar in solr distro
 --

 Key: SOLR-804
 URL: https://issues.apache.org/jira/browse/SOLR-804
 Project: Solr
  Issue Type: Wish
Affects Versions: 1.3
 Environment: all
Reporter: solrize
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4


 It would be useful to have the lucene misc jar file included with solr.  My 
 immediate goal is to build several solr indexes in parallel on separate 
 servers, then run the index merge utility at the end to combine them into a 
 single index.  Erik H suggested I post an issue requesting that the misc 
 jar be included with Solr.  Thanks

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (LUCENE-1567) New flexible query parser

2009-04-10 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned LUCENE-1567:
---

Assignee: Grant Ingersoll  (was: Michael Busch)

 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Grant Ingersoll
 Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch


 From New flexible query parser thread by Micheal Busch
 in my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is Apache committer and PMC member
 on the Tuscany project and getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was it to separate syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
   AND
  /   \
 A B
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. It is in fact a
 configurable chain of processors. Each processors can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 3. The third layer is also a configurable chain of builders, which
 transform the QueryNodeTree into Lucene Query objects.
 Furthermore the query parser uses flexible configuration objects, which
 are based on AttributeSource/Attribute. It also uses message classes that
 allow to attach resource bundles. This makes it possible to translate
 messages, which is an important feature of a query parser.
 This design allows us to develop different query syntaxes very quickly.
 Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
 underlying processors and builders in a few days. We now have a 100%
 compatible Lucene query parser, which means the syntax is identical and
 all query parser test cases pass on the new one too using a wrapper.
 Recent posts show that there is demand for query syntax improvements,
 e.g improved range query syntax or operator precedence. There are
 already different QP implementations in Lucene+contrib, however I think
 we did not keep them all up to date and in sync. This is not too
 surprising, because usually when fixes and changes are made to the main
 query parser, people don't make the corresponding changes in the contrib
 parsers. (I'm guilty here too)
 With this new architecture it will be much easier to maintain different
 query syntaxes, as the actual code for the first layer is not very much.
 All syntaxes would benefit from patches and improvements we make to the
 underlying layers, which will make supporting different syntaxes much
 more manageable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
