[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-342:
    Attachment: SOLR-342.patch

Updated patch to account for the fact that mergeFactor applies only to log-based merges. I left the mergeFactor tag as-is, but added an instanceof check in the init method of SolrIndexWriter to see whether mergeFactor is settable on the configured merge policy.

Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

Key: SOLR-342
URL: https://issues.apache.org/jira/browse/SOLR-342
Project: Solr
Issue Type: Improvement
Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, SOLR-342.patch, SOLR-342.tar.gz

LUCENE-843 adds support for new indexing capabilities via the setRAMBufferSizeMB() method that should significantly speed up indexing for many applications. To get this, we will need the trunk version of Lucene (or wait for its next official release). A side effect is that Lucene's new, faster StandardTokenizer will also be incorporated. We also need to think about how to incorporate the new merge scheduling functionality (the new default in Lucene is to do merges in a background thread).

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
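The init-time check described above might look roughly like the following. This is a minimal, self-contained sketch: the stand-in MergePolicy types below exist only so it compiles on its own (the real patch works against Lucene's MergePolicy/LogMergePolicy inside SolrIndexWriter), and applyMergeFactor is an illustrative name, not the patch's actual code.

```java
// Self-contained sketch of the instanceof guard described above. The tiny
// stand-in types exist only so this compiles without Lucene on the classpath.
public class MergeFactorInit {
    interface MergePolicy {}

    static class LogMergePolicy implements MergePolicy {
        int mergeFactor = 10;                       // Lucene's default
        void setMergeFactor(int mf) { mergeFactor = mf; }
    }

    static class OtherMergePolicy implements MergePolicy {}  // no mergeFactor knob

    /** Set mergeFactor only when the policy is log-based; report whether it applied. */
    static boolean applyMergeFactor(MergePolicy policy, int mergeFactor) {
        if (policy instanceof LogMergePolicy) {
            ((LogMergePolicy) policy).setMergeFactor(mergeFactor);
            return true;
        }
        return false;  // policy has no notion of a merge factor; ignore the tag
    }

    public static void main(String[] args) {
        LogMergePolicy log = new LogMergePolicy();
        System.out.println(applyMergeFactor(log, 25) + ", mergeFactor=" + log.mergeFactor);
        System.out.println(applyMergeFactor(new OtherMergePolicy(), 25));
    }
}
```

Keeping the mergeFactor tag but silently skipping it for non-log policies preserves existing solrconfig.xml files while tolerating other merge policies.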
[jira] Commented: (SOLR-330) Use new Lucene Token APIs (reuse and char[] buff)
[ https://issues.apache.org/jira/browse/SOLR-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566610#action_12566610 ]

Grant Ingersoll commented on SOLR-330:

Note: this patch also includes SOLR-468.

Use new Lucene Token APIs (reuse and char[] buff)

Key: SOLR-330
URL: https://issues.apache.org/jira/browse/SOLR-330
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Assignee: Grant Ingersoll
Priority: Minor
Attachments: SOLR-330.patch

Lucene is getting new Token APIs for better performance:
- token reuse
- char[] offset + len instead of String
Requires a new version of Lucene.
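A rough sketch of what the two API changes amount to: one Token instance reused across calls, with the term carried as a char[] plus offset/length instead of a fresh String per token. The Token class and tokenizer below are simplified stand-ins, not Lucene's real API.

```java
// Self-contained sketch of token reuse + char[] term buffers: the caller
// supplies one Token that the tokenizer refills, so a tight indexing loop
// allocates no per-token Strings.
public class TokenReuse {
    static class Token {
        char[] buf = new char[16];
        int len;
        void setTermBuffer(char[] src, int off, int n) {
            if (buf.length < n) buf = new char[n];   // grow only when needed
            System.arraycopy(src, off, buf, 0, n);
            len = n;
        }
        String term() { return new String(buf, 0, len); }  // only for display
    }

    /** Whitespace tokenizer that fills the caller-supplied Token (no per-token allocation). */
    static boolean next(char[] text, int[] pos, Token reusable) {
        int i = pos[0], n = text.length;
        while (i < n && text[i] == ' ') i++;         // skip delimiters
        if (i >= n) return false;                    // exhausted
        int start = i;
        while (i < n && text[i] != ' ') i++;
        reusable.setTermBuffer(text, start, i - start);
        pos[0] = i;
        return true;
    }

    public static void main(String[] args) {
        char[] text = "fast token reuse".toCharArray();
        Token t = new Token();                       // allocated once, reused per token
        int[] pos = {0};
        while (next(text, pos, t)) System.out.println(t.term());
    }
}
```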
[jira] Commented: (SOLR-471) Distributed Solr Client
[ https://issues.apache.org/jira/browse/SOLR-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1256#action_1256 ]

Nguyen Kien Trung commented on SOLR-471:

Thanks Yonik. I did have a glance at SOLR-303. I'm working on a Java project that requires interaction with multiple customized Solr instances, and it turned out that the requirement was not met by the solution SOLR-303 offers, so I made this workaround in the hope that the patch may help others in the same situation. I'm quite new to Solr, but very excited about the promising features Solr is going to achieve.

Distributed Solr Client

Key: SOLR-471
URL: https://issues.apache.org/jira/browse/SOLR-471
Project: Solr
Issue Type: New Feature
Components: clients - java
Affects Versions: 1.3
Reporter: Nguyen Kien Trung
Priority: Minor
Attachments: distributedclient.patch

Inspired by memcached Java clients: the ability to update/search/delete among many Solr instances.
Client parameters:
- List of Solr servers
- Number of replicas
Client functions:
- Update: use consistent hashing to determine which documents are stored on which servers. Get the list of servers (equal to the number of replicas) and issue a parallel UPDATE.
- Search: search all servers in parallel and aggregate distinct results.
- Delete: delete from all servers in parallel.
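The consistent-hashing selection step described above could be sketched as follows. Everything here (class name, virtual-node count, the toy hash) is illustrative, not the patch's implementation; memcached-style clients typically use a stronger hash such as MD5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the update-routing step: servers are placed on a hash ring
// (with virtual nodes for balance) and we walk the ring clockwise from the
// document id's hash to pick `replicas` distinct servers.
public class ConsistentHash {
    private static final int POINTS = 100;          // virtual nodes per server
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    public ConsistentHash(List<String> servers) {
        for (String s : servers)
            for (int i = 0; i < POINTS; i++)
                ring.put(hash(s + "#" + i), s);
    }

    private static int hash(String key) {
        return key.hashCode() & 0x7fffffff;         // non-negative toy hash
    }

    /** Pick `replicas` distinct servers responsible for this document id. */
    public List<String> serversFor(String docId, int replicas) {
        List<String> picked = new ArrayList<>();
        // Walk from the doc's position to the end of the ring, then wrap around.
        for (String s : ring.tailMap(hash(docId)).values()) {
            if (!picked.contains(s)) picked.add(s);
            if (picked.size() == replicas) return picked;
        }
        for (String s : ring.values()) {
            if (!picked.contains(s)) picked.add(s);
            if (picked.size() == replicas) return picked;
        }
        return picked;                               // fewer servers than replicas
    }

    public static void main(String[] args) {
        ConsistentHash ch = new ConsistentHash(
                List.of("solr1:8983", "solr2:8983", "solr3:8983"));
        System.out.println(ch.serversFor("doc-42", 2));
    }
}
```

An UPDATE would then be issued in parallel to every server in the returned list; adding or removing a server moves only the keys adjacent to it on the ring, which is the property that motivates consistent hashing over simple modulo sharding.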
[jira] Commented: (SOLR-469) DB Import RequestHandler
[ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566599#action_12566599 ]

Noble Paul commented on SOLR-469:

We are planning to eliminate the schema creation step, so we may not need to put in details which are already present in schema.xml; we can simplify the data-config and eliminate the copyField as well. We must then introduce a verifier which ensures that the data-config is in sync with schema.xml.

DB Import RequestHandler

Key: SOLR-469
URL: https://issues.apache.org/jira/browse/SOLR-469
Project: Solr
Issue Type: New Feature
Components: update
Affects Versions: 1.3
Reporter: Noble Paul
Priority: Minor
Fix For: 1.3
Attachments: SOLR-469.patch

We need a RequestHandler which can import data from a DB or other data sources into the Solr index. Think of it as an advanced form of the SqlUpload plugin (SOLR-103). It works as follows:
* Provide a configuration file (XML) to the handler containing the necessary SQL queries and mappings to the Solr schema. It also takes a properties file for the data source configuration.
* Given the configuration, it can also generate the Solr schema.xml.
* It is registered as a RequestHandler which can take two commands, do-full-import and do-delta-import:
  - do-full-import dumps all the data from the database into the index (based on the SQL query in the configuration).
  - do-delta-import dumps all the data that has changed since the last import (we assume a modified-timestamp column in the tables).
* It provides an admin page where we can schedule it to run automatically at regular intervals, and which shows the handler's status (idle, full-import, delta-import).
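The full/delta distinction above essentially boils down to whether a predicate on the assumed modified-timestamp column is appended to the configured query. A hypothetical sketch (table, column, and method names are made up, not from the patch):

```java
import java.sql.Timestamp;

// Hypothetical sketch of how the two commands could differ: full import runs
// the configured query as-is, delta import restricts it to rows touched since
// the last successful import.
public class DeltaImport {
    static String fullQuery(String table) {
        return "SELECT * FROM " + table;            // do-full-import
    }

    /** Only rows modified since the last successful import (do-delta-import). */
    static String deltaQuery(String table, String tsColumn, Timestamp lastImport) {
        return "SELECT * FROM " + table + " WHERE " + tsColumn + " > '" + lastImport + "'";
    }

    public static void main(String[] args) {
        Timestamp last = Timestamp.valueOf("2008-02-07 00:00:00");
        System.out.println(fullQuery("books"));
        System.out.println(deltaQuery("books", "last_modified", last));
    }
}
```

A real handler would bind the timestamp as a prepared-statement parameter rather than concatenating it, and would record the new import time only after the delta run commits.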
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566643#action_12566643 ]

Grant Ingersoll commented on SOLR-342:

I did some benchmarking of the autocommit functionality in Lucene (as opposed to in Solr, which is different). Currently, autocommit is true by default in Lucene, meaning that every time there is a flush, it is also committed; Solr adds its own layer on top of this with its commit semantics. There is a noticeable difference in memory used and speed between autocommit = false and autocommit = true. Some rough numbers using the autocommit.alg in Lucene's benchmark contrib (from trunk):

Operation    round  ac     ram   runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem  avgTotalMem
MAddDocs_20      0  true   2.00       1          20  400.1      499.90  61,322,608   68,780,032
MAddDocs_20      1  false  2.00       1          20  499.9      400.08  49,373,632   75,018,240
MAddDocs_20      2  true   2.00       1          20  383.7      521.27  70,716,096   75,018,240
MAddDocs_20      3  false  2.00       1          20  552.7      361.89  68,069,464   75,018,240

The rows alternate between autocommit = true and autocommit = false. The key value is rec/s:
1. ac = true: 400.1
2. ac = false: 499.9
3. ac = true: 383.7
4. ac = false: 552.7

Notice also the difference in avgUsedMem. Adding this functionality may be more important to Solr's performance than the flush-by-RAM capability.
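A quick sanity check on the rec/s figures above: averaging the two rounds for each setting gives the approximate throughput gain from autocommit = false.

```java
// Quick arithmetic on the rec/s column above: average the two rounds for
// each autocommit setting and compute the relative speedup.
public class AutocommitSpeedup {
    static double avg(double a, double b) { return (a + b) / 2; }

    /** Relative rec/s gain of autocommit=false over autocommit=true. */
    static double speedup() {
        double acTrue  = avg(400.1, 383.7);   // rounds 0 and 2
        double acFalse = avg(499.9, 552.7);   // rounds 1 and 3
        return (acFalse - acTrue) / acTrue;
    }

    public static void main(String[] args) {
        System.out.printf("autocommit=false is ~%.0f%% faster%n", 100 * speedup());
        // roughly a 34% throughput gain on these four rounds
    }
}
```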
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566864#action_12566864 ]

oleg_gnatovskiy edited comment on SOLR-236 at 2/7/08 4:15 PM:

Hello everyone. I am planning to implement chain collapsing in a high-traffic production environment, so I'd like to use a stable version of Solr. There doesn't seem to be a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 patch. It seems to collapse fine, but how do I get a count of the documents other than the one being displayed? As a result I see:

<lst name="collapse_counts">
  <int name="Restaurant">2414</int>
  <int name="Bar/Club">9</int>
  <int name="Directory Services">37</int>
</lst>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more Directory Services? If so, that's great. However, when I collapse on some integer fields I get an empty list for collapse_counts. Do counts only work for text fields? Thanks in advance for any help you can provide!
Field collapsing

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch

This patch includes a new feature called field collapsing, used to collapse a group of results with a similar value for a given field into a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also duplicate detection: http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field: the field used to group results
- collapse.type: normal (default) or adjacent
- collapse.max: how many contiguous results are allowed before collapsing

TODO (in progress):
- More documentation (in source code)
- Test cases

Two patches:
- field_collapsing.patch for the current development version
- field_collapsing_1.1.0.patch for Solr 1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
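For illustration, a request using the three parameters could be assembled as below. The path and query values are made up, and this is plain query-string building, not code from the patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Assemble a hypothetical collapse request from the three parameters listed
// above; insertion order is preserved so the output is deterministic.
public class CollapseQuery {
    static String buildQuery(String q, String field, String type, int max) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("collapse.field", field);   // field to group on
        params.put("collapse.type", type);     // "normal" (default) or "adjacent"
        params.put("collapse.max", String.valueOf(max)); // contiguous hits before collapsing
        StringBuilder sb = new StringBuilder("/select?");
        for (Map.Entry<String, String> e : params.entrySet())
            sb.append(e.getKey()).append('=').append(e.getValue()).append('&');
        sb.setLength(sb.length() - 1);         // drop trailing '&'
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("restaurant", "site", "normal", 1));
        // -> /select?q=restaurant&collapse.field=site&collapse.type=normal&collapse.max=1
    }
}
```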
[jira] Issue Comment Edited: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566864#action_12566864 ]

oleg_gnatovskiy edited comment on SOLR-236 at 2/7/08 4:18 PM:

Hello everyone. I am planning to implement chain collapsing in a high-traffic production environment, so I'd like to use a stable version of Solr. There doesn't seem to be a chain collapse patch for Solr 1.2, so I tried the Solr 1.1 patch. It seems to collapse fine, but how do I get a count of the documents other than the one being displayed? As a result I see:

<lst name="collapse_counts">
  <int name="Restaurant">2414</int>
  <int name="Bar/Club">9</int>
  <int name="Directory Services">37</int>
</lst>

Does that mean that there are 2414 more Restaurants, 9 more Bars and 37 more Directory Services? If so, that's great. However, when I collapse on some fields I get an empty collapse_counts list. It could be that those fields have a large number of distinct values to collapse on. Is there a limit to the number of values that collapse_counts displays? Thanks in advance for any help you can provide!
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566865#action_12566865 ]

Fuad Efendi commented on SOLR-127:

This is an alternative to the initially proposed HTTP caching, and it is extremely easy to implement: simply add a request parameter such as http.header=If-Modified-Since: Tue, 05 Feb 2008 03:50:00 GMT (better to use other names; do not use an http.header parameter, see below). Let SOLR respond via a standard XML message "Not Modified" and avoid using the 304 response code. What do you think?

We could even encapsulate MAX-AGE, EXPIRES, and other useful values (such as an additional UPDATE-FREQUENCY: 30 days) in the XML, and all of this could depend on internal Lucene statistics rather than on hard-coded values in the Solr config. We should not use HTTP response codes such as 304/400/500 to describe SOLR's external API.

Sample: Apache HTTPD front-end, Tomcat (Struts-based middleware), and SOLR (backend). With your initial proposal, different users will get different data. Why? Multithreading at Apache HTTPD. At the least there are possible fluctuations, the cache is not shared in some configurations, etc. Each thread may get its own copy of Last-Modified, and different users will see different data. It won't work for most business cases. Without HTTP we can ask "is it modified? when is the next update of the BOOKS category?", and all caches around the world have the same timestamp for the BOOKS category.
Make Solr more friendly to external HTTP caches

Key: SOLR-127
URL: https://issues.apache.org/jira/browse/SOLR-127
Project: Solr
Issue Type: Wish
Reporter: Hoss Man
Assignee: Hoss Man
Fix For: 1.3
Attachments: CacheUnitTest.patch, CacheUnitTest.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch

an offhand comment I saw recently reminded me of something that really bugged me about the search solution I used *before* Solr: it didn't play nicely with HTTP caches that might be sitting in front of it. At the moment, Solr doesn't put particularly useful info in the HTTP response headers to aid in caching (i.e. Last-Modified), responds to all HEAD requests with a 400, and doesn't do anything special with If-Modified-Since. At the very least, we can set a Last-Modified based on when the current IndexReader was opened (if not the Date on the IndexReader) and use the same info to determine how to respond to If-Modified-Since requests. (For the record, I think the reason this hasn't occurred to me in the 2+ years I've been using Solr is that, with the internal caching, I've yet to need to put a proxy cache in front of Solr.)
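The conditional-GET logic suggested above can be sketched as follows. Method names are illustrative, not Solr's code; the truncation reflects the one-second granularity of HTTP date headers.

```java
// Sketch of conditional-GET handling: Last-Modified comes from when the
// current IndexReader was opened; an If-Modified-Since at or after that
// time yields 304 Not Modified.
public class ConditionalGet {
    // HTTP dates only have one-second resolution, so compare at that granularity.
    static long truncateToSecond(long millis) {
        return (millis / 1000) * 1000;
    }

    /** @return 304 when the client's cached copy is still current, else 200. */
    static int respond(long readerOpenedMillis, Long ifModifiedSinceMillis) {
        long lastModified = truncateToSecond(readerOpenedMillis);
        if (ifModifiedSinceMillis != null && ifModifiedSinceMillis >= lastModified) {
            return 304;   // Not Modified; send no body
        }
        return 200;       // full response, with Last-Modified set to lastModified
    }

    public static void main(String[] args) {
        long opened = 10_500L;                        // reader opened at t=10.5s
        System.out.println(respond(opened, 10_000L)); // cache current -> 304
        System.out.println(respond(opened, 9_000L));  // stale copy    -> 200
        System.out.println(respond(opened, null));    // no header     -> 200
    }
}
```

Because a new IndexReader is only opened on commit, this timestamp is stable across requests, which addresses the per-thread Last-Modified fluctuation raised in the comment above.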
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566869#action_12566869 ]

Fuad Efendi commented on SOLR-127:

Of course, ETag etc. will synchronize caches; but why do we need such features of the HTTP spec anyway? HTTP caching is widely used to cache responses from HTTP servers; content (HTML, PDF, JPG, EXE) can be cached at a corporate proxy and locally in Internet Explorer's internal cache. That is the main idea. *Are SOLR XML responses roving the world and reaching the internal cache of Mozilla Firefox, or corporate caching proxies?* No. The clients of SOLR are middleware. Do they need to act as caching proxies? Maybe. Just another use case: middleware publishes the current weather together with a response from SOLR; the middleware wants to cache responses from SOLR and not rely on requests coming from end users, because of frequent weather changes ;) It depends on the implementation of such middleware; most likely it will try to cache SolrDocument objects instead of raw XML, and that kind of caching is not HTTP-related.
/example/solr/bin is empty in trunk
Is that correct? I want to try distribution/replication in v.2.3.