[jira] Updated: (SOLR-1116) Add a Binary FieldType

2009-05-24 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1116:
-

Attachment: SOLR-1116.patch

The text format is standard base64 encoding
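
For readers unfamiliar with the text format, here is a minimal sketch of a base64 round trip in Java. It assumes a Java 8+ runtime with java.util.Base64; the class name BinaryFieldText and its methods are hypothetical and are not taken from the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of SOLR-1116.patch.
class BinaryFieldText {
    /** Encodes raw bytes as standard base64 text. */
    static String toText(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Decodes standard base64 text back into the original bytes. */
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);
        String text = toText(raw);
        System.out.println(text);
        System.out.println(new String(fromText(text), StandardCharsets.UTF_8));
    }
}
```

A client would send the encoded string as the field value and decode it again when reading the stored field back.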

 Add a Binary FieldType
 --

 Key: SOLR-1116
 URL: https://issues.apache.org/jira/browse/SOLR-1116
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.4

 Attachments: SOLR-1116.patch, SOLR-1116.patch, SOLR-1116.patch


 Lucene supports binary data for fields, but Solr has no corresponding field 
 type. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Hudson build is back to normal: Solr-trunk #811

2009-05-24 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/811/changes




[jira] Commented: (SOLR-769) Support Document and Search Result clustering

2009-05-24 Thread Stanislaw Osinski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712534#action_12712534
 ] 

Stanislaw Osinski commented on SOLR-769:


In fact, you can set Carrot2 attributes (both init- and request-time) in the 
Solr config file; this should also work without the patch. Just add:

{{<str name="Tokenizer.analyzer">fully.qualified.class.Name</str>}}

to the search component element. See 
http://wiki.apache.org/solr/ClusteringComponent for some examples. You'll find 
a list of Carrot2 attributes, their ids and descriptions at: 
http://download.carrot2.org/stable/manual/#chapter.components.

 Support Document and Search Result clustering
 -

 Key: SOLR-769
 URL: https://issues.apache.org/jira/browse/SOLR-769
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.4

 Attachments: clustering-componet-shard.patch, clustering-libs.tar, 
 clustering-libs.tar, SOLR-769-analyzerClass.patch, SOLR-769-lib.zip, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, 
 SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.tar, SOLR-769.zip


 Clustering is a useful tool for working with documents and search results, 
 similar to the notion of dynamic faceting.  Carrot2 
 (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing 
 search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
 suited for whole-corpus clustering.  
 The patch lays out a contrib module that starts off w/ an integration of a 
 SearchComponent for doing clustering and an implementation using Carrot.  In 
 search results mode, it will use the DocList as the input for the cluster.   
 While Carrot2 comes w/ a Solr input component, it is not the same as the 
 SearchComponent that I have in that the Carrot example actually submits a 
 query to Solr, whereas my SearchComponent is just chained into the Component 
 list and uses the ResponseBuilder to add in the cluster results.
 While not fully fleshed out yet, the collection based mode will take in a 
 list of ids or just use the whole collection and will produce clusters.  
 Since this is a longer, typically offline task, there will need to be some 
 type of storage mechanism (and replication??) for the clusters.  I _may_ 
 push this off to a separate JIRA issue, but I at least want to present the 
 use case as part of the design of this component/contrib.  It may even make 
 sense that we split this out, such that the building piece is something like 
 an UpdateProcessor and then the SearchComponent just acts as a lookup 
 mechanism.




[jira] Commented: (SOLR-769) Support Document and Search Result clustering

2009-05-24 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712544#action_12712544
 ] 

Koji Sekiguchi commented on SOLR-769:
-

{quote}
In fact, you can set Carrot2 attributes (both init- and request-time) in the 
solr config file, this should work also without the patch. Just add:

<str name="Tokenizer.analyzer">fully.qualified.class.Name</str>
{quote}

Hmm, I thought I needed to assign a Class<?> type (other than String) for the 
second argument of the attribute. I'll try it.




[jira] Commented: (SOLR-769) Support Document and Search Result clustering

2009-05-24 Thread Stanislaw Osinski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712545#action_12712545
 ] 

Stanislaw Osinski commented on SOLR-769:


Ah, I should have mentioned that up front -- Carrot2 will try to convert the 
string into the type accepted by the attribute. In the case of class-typed 
attributes, it will try to load the class using the current thread's context 
classloader. Conversions are also available for numeric, boolean and enum 
attributes (see: 
http://download.carrot2.org/head/javadoc/org/carrot2/util/attribute/AttributeBinder.AttributeTransformerFromString.html).
Please let me know if that way works for you.
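
To make the conversion idea concrete, here is a rough sketch of string-to-type conversion driven by the target type. It is an illustration only and is not Carrot2's AttributeBinder code; the class StringAttributeConverter, its convert method and the Mode enum are all hypothetical names.

```java
import java.util.Locale;

// Illustration only: NOT the Carrot2 implementation.
class StringAttributeConverter {
    enum Mode { FAST, ACCURATE } // hypothetical enum used for the demo

    /** Converts a config string into the type the attribute expects. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object convert(String value, Class<?> target) {
        if (target == String.class) return value;
        if (target == Integer.class) return Integer.valueOf(value);
        if (target == Double.class) return Double.valueOf(value);
        if (target == Boolean.class) return Boolean.valueOf(value);
        if (target.isEnum()) {
            // Enum constants are matched by name, case-insensitively here.
            return Enum.valueOf((Class<Enum>) target, value.toUpperCase(Locale.ROOT));
        }
        if (target == Class.class) {
            // Class-typed attributes: load via the thread's context classloader.
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            try {
                return Class.forName(value, true, loader);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Cannot load class " + value, e);
            }
        }
        throw new IllegalArgumentException("No conversion for " + target);
    }

    public static void main(String[] args) {
        System.out.println(convert("42", Integer.class));
        System.out.println(convert("java.lang.String", Class.class));
        System.out.println(convert("fast", Mode.class));
    }
}
```

Under this scheme a plain string value in the config file is enough for a class-typed attribute, as long as the named class is visible to the context classloader.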




[jira] Commented: (SOLR-769) Support Document and Search Result clustering

2009-05-24 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712557#action_12712557
 ] 

Koji Sekiguchi commented on SOLR-769:
-

{code}
<str name="Tokenizer.analyzer">fully.qualified.class.Name</str>
{code}

This works as expected w/o my patch. Thank you, Stanislaw!





[jira] Commented: (SOLR-785) Distributed SpellCheckComponent

2009-05-24 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712562#action_12712562
 ] 

Shalin Shekhar Mangar commented on SOLR-785:


Matthew, thanks for the patch. Can you please include a unit test?

Also, I'm thinking that we could refactor the Lucene spell checker to fetch the 
suggestions without the edit distance and find the top 'n' suggestions 
after computing the edit distance on the aggregator. What do you think?
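
As a rough sketch of that idea (not the attached patch and not the Lucene SpellChecker API), the aggregator could merge the raw suggestions from each shard, compute the edit distance itself, and keep the closest 'n'. The class SuggestionAggregator and its methods are hypothetical, and plain Levenshtein distance is assumed as the metric.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical aggregator-side ranking; names are illustrative only.
class SuggestionAggregator {
    /** Levenshtein edit distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Merges per-shard suggestions, de-duplicates them, and returns the n closest to word. */
    static List<String> topN(String word, List<List<String>> perShard, int n) {
        Set<String> unique = new LinkedHashSet<>();
        for (List<String> shard : perShard) unique.addAll(shard);
        List<String> ranked = new ArrayList<>(unique);
        ranked.sort(Comparator.comparingInt((String s) -> editDistance(word, s)));
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }
}
```

The trade-off is that each shard ships unranked candidates, so the aggregator does more work but ranks all candidates with one consistent metric.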

 Distributed SpellCheckComponent
 ---

 Key: SOLR-785
 URL: https://issues.apache.org/jira/browse/SOLR-785
 Project: Solr
  Issue Type: Improvement
  Components: spellchecker
Reporter: Shalin Shekhar Mangar
Assignee: Shalin Shekhar Mangar
 Attachments: spelling-shard.patch


 Enhance the SpellCheckComponent to run in a distributed (sharded) environment.




[jira] Commented: (SOLR-1116) Add a Binary FieldType

2009-05-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712575#action_12712575
 ] 

Noble Paul commented on SOLR-1116:
--

I plan to commit this in a day or two. Please let me know if there is any 
feedback.




[jira] Commented: (SOLR-914) Presence of finalize() in the codebase

2009-05-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712576#action_12712576
 ] 

Noble Paul commented on SOLR-914:
-

What do we plan to do with this?

 Presence of finalize() in the codebase 
 ---

 Key: SOLR-914
 URL: https://issues.apache.org/jira/browse/SOLR-914
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 1.3
 Environment: Tomcat 6, JRE 6
Reporter: Kay Kay
Priority: Minor
 Fix For: 1.4

   Original Estimate: 480h
  Remaining Estimate: 480h

 There seem to be a number of classes that implement the finalize() method. 
 Given that it is perfectly OK for a Java VM not to call it, there may have 
 to be some other way to destroy these objects (for example, try .. finally 
 where they are created). Depending on the implementation, the presence of a 
 finalize() method might not do what we want and in some cases can end up 
 delaying the GC process, depending on the algorithms.
 $ find . -name '*.java' | xargs grep finalize
 ./contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/JdbcDataSource.java:  protected void finalize() {
 ./src/java/org/apache/solr/update/SolrIndexWriter.java:  protected void finalize() {
 ./src/java/org/apache/solr/core/CoreContainer.java:  protected void finalize() {
 ./src/java/org/apache/solr/core/SolrCore.java:  protected void finalize() {
 ./src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:  protected void finalize() throws Throwable {
 Maybe we need to revisit these occurrences from a design perspective to see 
 if they are necessary, or if there is an alternate way of managing guaranteed 
 destruction of resources.
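
 One possible shape for the try .. finally alternative mentioned above is 
 sketched here. TrackedResource is a hypothetical class, not one of the Solr 
 classes listed; the point is only that cleanup runs deterministically in 
 close() rather than whenever (or if) the VM decides to call finalize().

```java
import java.util.function.Consumer;

// Hypothetical resource class for illustration; not taken from the Solr codebase.
class TrackedResource implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    /** Releases the resource deterministically; safe to call more than once. */
    @Override
    public void close() {
        open = false;
    }

    /** Creates a resource, hands it to the caller's work, and always closes it. */
    static void use(Consumer<TrackedResource> work) {
        TrackedResource resource = new TrackedResource();
        try {
            work.accept(resource);
        } finally {
            resource.close(); // runs on normal return and on exception
        }
    }

    public static void main(String[] args) {
        use(r -> System.out.println("open while in use: " + r.isOpen()));
    }
}
```

 Because the sketch implements AutoCloseable, callers on Java 7 or later 
 could use try-with-resources instead of writing the finally block by hand.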




Re: Solr nightly build failure

2009-05-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
The same test fails on my box consistently.

On Sat, May 23, 2009 at 4:58 PM, Mark Miller markrmil...@gmail.com wrote:
   [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
   [junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 20.341 sec





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


[jira] Updated: (SOLR-1183) Example script not update for new analysis path from SOLR-1099

2009-05-24 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1183:


Attachment: SOLR-1183.patch

 Example script not update for new analysis path from SOLR-1099
 --

 Key: SOLR-1183
 URL: https://issues.apache.org/jira/browse/SOLR-1183
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1183.patch


 The example script example/exampleAnalysis/post.sh attempts to post to the 
 path http://localhost:8983/solr/analysis. However, SOLR-1099 changed 
 solrconfig.xml so that this path is disabled by default as of r767412.
 A simple fix is to change the script to post to 
 http://localhost:8983/solr/analysis/document instead.




[jira] Updated: (SOLR-1183) Example script not updated for new analysis path from SOLR-1099

2009-05-24 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1183:


Description: 

The example script example/exampleAnalysis/post.sh attempts to post to the path 
http://localhost:8983/solr/analysis
 however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by 
default as of r767412

A simple fix is to change to http://localhost:8983/solr/analysis/document

  was:


The example script example/exampleAnalysis/post.sh attempts to post to the path 
http://localhost:8983/solr/analysis
 however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by 
default as of r767412

A simple fix is to change to http://localhost:8983/solr/analysis/document

Summary: Example script not updated for new analysis path from 
SOLR-1099  (was: Example script not update for new analysis path from SOLR-1099)
