[jira] Updated: (SOLR-1116) Add a Binary FieldType
[ https://issues.apache.org/jira/browse/SOLR-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1116:
-----------------------------
    Attachment: SOLR-1116.patch

The text format is standard base64 encoding.

> Add a Binary FieldType
> ----------------------
>
>                 Key: SOLR-1116
>                 URL: https://issues.apache.org/jira/browse/SOLR-1116
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>             Fix For: 1.4
>         Attachments: SOLR-1116.patch, SOLR-1116.patch, SOLR-1116.patch
>
> Lucene supports binary data for fields, but Solr has no corresponding field type.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
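The "standard base64 encoding" text format mentioned for the patch can be sketched as follows. This is a minimal illustration using Python's standard library, not the patch's actual Java code:

```python
import base64

# Binary field values are transported as standard base64 text
# (illustrative only; the SOLR-1116 patch itself is written in Java).
raw = bytes([0x00, 0xFF, 0x10, 0x20])            # arbitrary binary payload
encoded = base64.b64encode(raw).decode("ascii")  # text-safe representation
decoded = base64.b64decode(encoded)              # round-trips to the original bytes

print(encoded)   # AP8QIA==
print(decoded == raw)  # True
```

The point of a standard encoding is exactly this round-trip: any client can reconstruct the original bytes from the text form without knowing anything Solr-specific.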
Hudson build is back to normal: Solr-trunk #811
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/811/changes
[jira] Commented: (SOLR-769) Support Document and Search Result clustering
[ https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712534#action_12712534 ]

Stanislaw Osinski commented on SOLR-769:
----------------------------------------

In fact, you can set Carrot2 attributes (both init- and request-time) in the Solr config file; this should also work without the patch. Just add:

{{<str name="Tokenizer.analyzer">fully.qualified.class.Name</str>}}

to the search component element. See http://wiki.apache.org/solr/ClusteringComponent for an example. You'll find a list of Carrot2 attributes, their ids, and descriptions at http://download.carrot2.org/stable/manual/#chapter.components.

> Support Document and Search Result clustering
> ---------------------------------------------
>
>                 Key: SOLR-769
>                 URL: https://issues.apache.org/jira/browse/SOLR-769
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.4
>         Attachments: clustering-componet-shard.patch, clustering-libs.tar, clustering-libs.tar, SOLR-769-analyzerClass.patch, SOLR-769-lib.zip, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.tar, SOLR-769.zip
>
> Clustering is a useful tool for working with documents and search results, similar to the notion of dynamic faceting. Carrot2 (http://project.carrot2.org/) is a nice, BSD-licensed library for doing search results clustering. Mahout (http://lucene.apache.org/mahout) is well suited for whole-corpus clustering. The patch lays out a contrib module that starts off with an integration of a SearchComponent for doing clustering and an implementation using Carrot2. In search results mode, it will use the DocList as the input for the cluster.
>
> While Carrot2 comes with a Solr input component, it is not the same as the SearchComponent I have here: the Carrot2 example actually submits a query to Solr, whereas my SearchComponent is just chained into the component list and uses the ResponseBuilder to add in the cluster results. While not fully fleshed out yet, the collection-based mode will take in a list of ids, or just use the whole collection, and will produce clusters. Since this is a longer, typically offline task, there will need to be some type of storage mechanism (and replication?) for the clusters. I _may_ push this off to a separate JIRA issue, but I at least want to present the use case as part of the design of this component/contrib. It may even make sense to split this out, such that the building piece is something like an UpdateProcessor and the SearchComponent just acts as a lookup mechanism.
[jira] Commented: (SOLR-769) Support Document and Search Result clustering
[ https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712544#action_12712544 ]

Koji Sekiguchi commented on SOLR-769:
-------------------------------------

{quote}
In fact, you can set Carrot2 attributes (both init- and request-time) in the Solr config file; this should also work without the patch. Just add:
<str name="Tokenizer.analyzer">fully.qualified.class.Name</str>
{quote}

Hmm, I thought I needed to assign a Class-typed value (rather than a String) for the second argument of the attribute. I'll try it.
[jira] Commented: (SOLR-769) Support Document and Search Result clustering
[ https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712545#action_12712545 ]

Stanislaw Osinski commented on SOLR-769:
----------------------------------------

Ah, I should have mentioned that up front: Carrot2 will try to convert the string into the type accepted by the attribute. In the case of class-typed attributes, it will try to load the class using the current thread's context classloader. Conversions are also available for numeric, boolean, and enum attributes (see http://download.carrot2.org/head/javadoc/org/carrot2/util/attribute/AttributeBinder.AttributeTransformerFromString.html). Please let me know if that approach works for you.
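The conversion described above, turning a fully qualified class-name string into a loaded class, can be sketched as follows. This is a language-neutral analogy in Python; Carrot2 itself does this in Java via the thread context classloader:

```python
import importlib

def load_class(fully_qualified_name):
    """Resolve a dotted class-name string to the class object itself
    (analogous to Carrot2 resolving a class-typed attribute value)."""
    module_name, _, class_name = fully_qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

This is why a plain `<str>` element suffices in the config: the framework, not the config syntax, is responsible for interpreting the string as a class.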
[jira] Commented: (SOLR-769) Support Document and Search Result clustering
[ https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712557#action_12712557 ]

Koji Sekiguchi commented on SOLR-769:
-------------------------------------

{code}
<str name="Tokenizer.analyzer">fully.qualified.class.Name</str>
{code}

This works as expected without my patch. Thank you, Stanislaw!
[jira] Commented: (SOLR-785) Distributed SpellCheckComponent
[ https://issues.apache.org/jira/browse/SOLR-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712562#action_12712562 ]

Shalin Shekhar Mangar commented on SOLR-785:
--------------------------------------------

Matthew, thanks for the patch. Can you please include a unit test? Also, I'm thinking that we could refactor the Lucene spell checker to fetch the suggestions without computing the edit distance, and then find the top 'n' suggestions by computing the edit distance on the aggregator. What do you think?

> Distributed SpellCheckComponent
> -------------------------------
>
>                 Key: SOLR-785
>                 URL: https://issues.apache.org/jira/browse/SOLR-785
>             Project: Solr
>          Issue Type: Improvement
>          Components: spellchecker
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>         Attachments: spelling-shard.patch
>
> Enhance the SpellCheckComponent to run in a distributed (sharded) environment.
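The refactoring suggested above hinges on ranking merged shard suggestions by edit distance on the aggregating node. A minimal Levenshtein edit-distance sketch (illustrative Python, not Solr's actual spell-checker code; the suggestion lists and query term are made up):

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# The aggregator could re-rank suggestions gathered from all shards
# by their distance to the original (possibly misspelled) query term:
suggestions = ["solar", "solr", "sole"]       # hypothetical merged shard output
ranked = sorted(suggestions, key=lambda s: edit_distance("solrx", s))
print(ranked[0])  # solr
```

Deferring the distance computation to the aggregator means each shard only returns raw candidates, and the final top-n ordering is consistent across the whole collection rather than per shard.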
[jira] Commented: (SOLR-1116) Add a Binary FieldType
[ https://issues.apache.org/jira/browse/SOLR-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712575#action_12712575 ]

Noble Paul commented on SOLR-1116:
----------------------------------

I plan to commit this in a day or two. Please let me know if there is any feedback.
[jira] Commented: (SOLR-914) Presence of finalize() in the codebase
[ https://issues.apache.org/jira/browse/SOLR-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712576#action_12712576 ]

Noble Paul commented on SOLR-914:
---------------------------------

What do we plan to do with this?

> Presence of finalize() in the codebase
> --------------------------------------
>
>                 Key: SOLR-914
>                 URL: https://issues.apache.org/jira/browse/SOLR-914
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - java
>    Affects Versions: 1.3
>        Environment: Tomcat 6, JRE 6
>            Reporter: Kay Kay
>            Priority: Minor
>             Fix For: 1.4
>   Original Estimate: 480h
>  Remaining Estimate: 480h
>
> A number of classes implement the finalize() method. Given that it is perfectly legal for a Java VM never to call it, there probably has to be some other way (e.g. try ... finally at the point where the resources are created) to destroy them. Depending on the implementation, the presence of a finalize() method might not do what we want, and in some cases can end up delaying the GC process.
>
> $ find . -name '*.java' | xargs grep finalize
> ./contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/JdbcDataSource.java:  protected void finalize() {
> ./src/java/org/apache/solr/update/SolrIndexWriter.java:  protected void finalize() {
> ./src/java/org/apache/solr/core/CoreContainer.java:  protected void finalize() {
> ./src/java/org/apache/solr/core/SolrCore.java:  protected void finalize() {
> ./src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:  protected void finalize() throws Throwable {
>
> Maybe we need to revisit these occurrences from a design perspective to see if they are necessary, or if there is an alternative way of managing guaranteed destruction of resources.
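The try/finally alternative the issue suggests can be sketched like this. The Solr codebase is Java, but the pattern is identical in Python, used here for a compact runnable example; `Resource` is a hypothetical stand-in for something like a JDBC connection or an index writer:

```python
class Resource:
    """Hypothetical closable resource (stand-in for a JDBC connection,
    SolrIndexWriter, etc.)."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

# Deterministic cleanup: close() runs even if the work in the try body
# raises, unlike finalize(), which the VM may delay or never call.
res = Resource()
try:
    pass  # ... use the resource ...
finally:
    res.close()

print(res.closed)  # True
```

The key property is that cleanup is tied to control flow rather than to garbage collection, so resource release is guaranteed and immediate; finalize() then becomes at best a redundant safety net.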
Re: Solr nightly build failure
The same test fails on my box consistently.

On Sat, May 23, 2009 at 4:58 PM, Mark Miller markrmil...@gmail.com wrote:
    [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
    [junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 20.341 sec

--
Noble Paul | Principal Engineer | AOL | http://aol.com
[jira] Updated: (SOLR-1183) Example script not update for new analysis path from SOLR-1099
[ https://issues.apache.org/jira/browse/SOLR-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-1183:
--------------------------------
    Attachment: SOLR-1183.patch

> Example script not update for new analysis path from SOLR-1099
> --------------------------------------------------------------
>
>                 Key: SOLR-1183
>                 URL: https://issues.apache.org/jira/browse/SOLR-1183
>             Project: Solr
>          Issue Type: Bug
>          Components: Analysis
>            Reporter: Peter Wolanin
>            Priority: Minor
>             Fix For: 1.4
>         Attachments: SOLR-1183.patch
>
> The example script example/exampleAnalysis/post.sh attempts to post to the path http://localhost:8983/solr/analysis; however, SOLR-1099 changed solrconfig.xml, so that path is disabled by default as of r767412. A simple fix is to change the path to http://localhost:8983/solr/analysis/document
[jira] Updated: (SOLR-1183) Example script not updated for new analysis path from SOLR-1099
[ https://issues.apache.org/jira/browse/SOLR-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-1183:
--------------------------------
    Summary: Example script not updated for new analysis path from SOLR-1099
        (was: Example script not update for new analysis path from SOLR-1099)
    Description: The example script example/exampleAnalysis/post.sh attempts to post to the path http://localhost:8983/solr/analysis; however, SOLR-1099 changed solrconfig.xml, so that path is disabled by default as of r767412. A simple fix is to change the path to http://localhost:8983/solr/analysis/document