[jira] Resolved: (SOLR-750) DateField.parseMath doesn't handle non-existent Z
[ https://issues.apache.org/jira/browse/SOLR-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-750. --- Resolution: Invalid The exception is correct -- that is an invalid date string (as far as being input to parseMath, toInternal, or DateField.getAnalyzer().tokenStream is concerned) The SOLR-540 patch is doing something it shouldn't be (which seems likely since it makes absolutely no sense to try and highlight a DateField) *and/or* the Highlighter has a bug (why is getBestTextFragments passing an indexed token to an Analyzer?) Either way: parseMath is doing the right thing. DateField.parseMath doesn't handle non-existent Z - Key: SOLR-750 URL: https://issues.apache.org/jira/browse/SOLR-750 Project: Solr Issue Type: Bug Affects Versions: 1.3 Reporter: David Smiley Priority: Minor Attachments: SOLR-750_DateField_no_Z.patch Original Estimate: 0.25h Remaining Estimate: 0.25h I've run into situations when trying to use SOLR-540 (wildcard highlight spec) such that if it attempts to highlight a date field, I get a stack trace from DateField.parseMath puking because there isn't a Z at the end of an otherwise good date-time string. It was very easy to fix the code to make it react gracefully to no Z. Attached is the patch. This bug isn't really related to SOLR-540 so please apply it without waiting for 540. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
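The behavior being debated can be illustrated with a small standalone sketch. This is not Solr's actual DateField code: the format string models the canonical 1995-12-31T23:59:59Z form, and parseLenient is a hypothetical helper showing the "react gracefully to no Z" strategy the patch describes (appending the missing Z rather than throwing).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateZExample {
    // Canonical Solr date form: 1995-12-31T23:59:59Z (always UTC)
    private static final SimpleDateFormat FMT =
        new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    static { FMT.setTimeZone(TimeZone.getTimeZone("UTC")); }

    // Hypothetical lenient variant: tolerate a missing trailing 'Z'
    // instead of letting the parser throw on an otherwise good string.
    static Date parseLenient(String val) {
        if (!val.endsWith("Z")) {
            val = val + "Z";
        }
        try {
            synchronized (FMT) {  // SimpleDateFormat is not thread-safe
                return FMT.parse(val);
            }
        } catch (ParseException e) {
            throw new RuntimeException("Invalid date string: " + val, e);
        }
    }

    public static void main(String[] args) {
        Date strict = parseLenient("2008-09-03T12:00:00Z");
        Date noZ = parseLenient("2008-09-03T12:00:00");
        if (!strict.equals(noZ)) throw new AssertionError();
        System.out.println("both forms parse to the same instant");
    }
}
```

Hoss's resolution above is that parseMath should keep rejecting the Z-less form; this sketch only shows what the rejected patch would have changed.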
[jira] Commented: (SOLR-749) QParser and ValueSourceParser init bug
[ https://issues.apache.org/jira/browse/SOLR-749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627984#action_12627984 ] Grant Ingersoll commented on SOLR-749: -- I think we need a test case for this. QParser and ValueSourceParser init bug -- Key: SOLR-749 URL: https://issues.apache.org/jira/browse/SOLR-749 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Hoss Man Fix For: 1.3 Attachments: SOLR-749.patch As noticed by Maximilian Hütter in this email thread... http://www.nabble.com/SOLR-218-problem-to19266169.html#a19266169 ...when a person tries to register a QParser (or ValueSourceParser) with the same name as a standard implementation it gets blown away by the initialization code for the standard impls. we need to allow people to override these standard names the same way they can with responseWriters, etc... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-540) Add support for hl.fl=*
[ https://issues.apache.org/jira/browse/SOLR-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627985#action_12627985 ] David Smiley commented on SOLR-540: --- Hey SOLR-540 people, please see the comment thread on SOLR-750 which is apparently a bug with this patch. Add support for hl.fl=* --- Key: SOLR-540 URL: https://issues.apache.org/jira/browse/SOLR-540 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Priority: Minor Attachments: SOLR-540-highlight-all.patch, SOLR-540-highlight-all.patch, SOLR-540-highlight-all.patch Adds support for the star value for the hl.fl parameter, i.e. highlighting will be done on all fields (static and dynamic). Particularly useful in conjunction with hl.requireFieldMatch=true, this way one can specify generic highlighting parameters independent of the query/searched fields. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-751) WordDelimiterFilter doesn't adjust startOffset
[ https://issues.apache.org/jira/browse/SOLR-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Oestreicher updated SOLR-751: Attachment: SOLR-751.patch WordDelimiterFilter doesn't adjust startOffset -- Key: SOLR-751 URL: https://issues.apache.org/jira/browse/SOLR-751 Project: Solr Issue Type: Bug Affects Versions: 1.3, 1.4 Reporter: Stefan Oestreicher Attachments: SOLR-751.patch If the first character of a token gets stripped, the startOffset of that token is not adjusted. With the last character it behaves as expected. I'll attach a patch for the TestWordDelimiterFilter testcase which reproduces that issue shortly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-751) WordDelimiterFilter doesn't adjust startOffset
WordDelimiterFilter doesn't adjust startOffset -- Key: SOLR-751 URL: https://issues.apache.org/jira/browse/SOLR-751 Project: Solr Issue Type: Bug Affects Versions: 1.3, 1.4 Reporter: Stefan Oestreicher Attachments: SOLR-751.patch If the first character of a token gets stripped, the startOffset of that token is not adjusted. With the last character it behaves as expected. I'll attach a patch for the TestWordDelimiterFilter testcase which reproduces that issue shortly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
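The offset invariant at issue can be sketched in isolation. This is not the actual WordDelimiterFilter code; adjustOffsets is a hypothetical helper showing what the reporter expects: stripping N leading delimiter characters must advance startOffset by N, just as stripping trailing characters shrinks endOffset.

```java
public class OffsetAdjustExample {
    // Returns {newStart, newEnd} for a token after stripping leading and
    // trailing non-alphanumeric (delimiter) characters, given the token's
    // original start/end offsets into the source text.
    static int[] adjustOffsets(String token, int start, int end) {
        int s = 0, e = token.length();
        while (s < e && !Character.isLetterOrDigit(token.charAt(s))) s++;
        while (e > s && !Character.isLetterOrDigit(token.charAt(e - 1))) e--;
        return new int[] { start + s, start + e };
    }

    public static void main(String[] args) {
        // "-foo" at offsets [0,4): stripping the leading '-' must move
        // startOffset from 0 to 1 (the bug: this adjustment was missing)
        int[] r = adjustOffsets("-foo", 0, 4);
        if (r[0] != 1 || r[1] != 4) throw new AssertionError();

        // "foo-" at offsets [10,14): trailing strip already worked,
        // endOffset shrinks from 14 to 13
        r = adjustOffsets("foo-", 10, 14);
        if (r[0] != 10 || r[1] != 13) throw new AssertionError();
        System.out.println("offsets adjusted symmetrically");
    }
}
```

Highlighters rely on these offsets to place markup, which is why an unadjusted startOffset shifts highlighted fragments by one character.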
[jira] Assigned: (SOLR-749) QParser and ValueSourceParser init bug
[ https://issues.apache.org/jira/browse/SOLR-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned SOLR-749: Assignee: Grant Ingersoll QParser and ValueSourceParser init bug -- Key: SOLR-749 URL: https://issues.apache.org/jira/browse/SOLR-749 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Hoss Man Assignee: Grant Ingersoll Fix For: 1.3 Attachments: SOLR-749.patch As noticed by Maximilian Hütter in this email thread... http://www.nabble.com/SOLR-218-problem-to19266169.html#a19266169 ...when a person tries to register a QParser (or ValueSourceParser) with the same name as a standard implementation it gets blown away by the initialization code for the standard impls. we need to allow people to override these standard names the same way they can with responseWriters, etc... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-540) Add support for hl.fl=*
[ https://issues.apache.org/jira/browse/SOLR-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Kotthoff updated SOLR-540: --- Attachment: SOLR-540-highlight-all.patch Attaching new patch which only highlights on stored text/string fields. Added test case to verify that. Add support for hl.fl=* --- Key: SOLR-540 URL: https://issues.apache.org/jira/browse/SOLR-540 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Priority: Minor Attachments: SOLR-540-highlight-all.patch, SOLR-540-highlight-all.patch, SOLR-540-highlight-all.patch, SOLR-540-highlight-all.patch Adds support for the star value for the hl.fl parameter, i.e. highlighting will be done on all fields (static and dynamic). Particularly useful in conjunction with hl.requireFieldMatch=true, this way one can specify generic highlighting parameters independent of the query/searched fields. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-749) QParser and ValueSourceParser init bug
[ https://issues.apache.org/jira/browse/SOLR-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-749: - Attachment: SOLR-749.patch Hoss's patch plus unit tests QParser and ValueSourceParser init bug -- Key: SOLR-749 URL: https://issues.apache.org/jira/browse/SOLR-749 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Hoss Man Assignee: Grant Ingersoll Fix For: 1.3 Attachments: SOLR-749.patch, SOLR-749.patch As noticed by Maximilian Hütter in this email thread... http://www.nabble.com/SOLR-218-problem-to19266169.html#a19266169 ...when a person tries to register a QParser (or ValueSourceParser) with the same name as a standard implementation it gets blown away by the initialization code for the standard impls. we need to allow people to override these standard names the same way they can with responseWriters, etc... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Solr's use of Lucene's Compression field
Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception , it occurred to me that we probably should refactor Solr's offering of compression. Currently, we rely on Field.COMPRESS from Lucene, but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 , because it only offers the highest level of compression, which is also the slowest. Obviously, Solr needs to handle the compression on the server side. I think we should have Solr do the compression, allowing users to set the level of compression (maybe even make it pluggable to put in your own compression techniques) and then just use Lucene's binary field capability. Granted, this is lower priority since I doubt many people use compression to begin with, but, still it would be useful. -Grant
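The proposal above can be sketched with java.util.zip, which already exposes the 0-9 compression levels that Field.COMPRESS hides. This is an assumption-laden illustration, not a proposed Solr API: the Lucene binary-field wiring is left out, and only the configurable-level codec idea is shown.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ConfigurableCompression {
    // Compress with a caller-chosen level (0 = store ... 9 = best/slowest)
    static byte[] compress(byte[] data, int level) {
        Deflater d = new Deflater(level);
        d.setInput(data);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] data) {
        Inflater inf = new Inflater();
        inf.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inf.finished()) {
                out.write(buf, 0, inf.inflate(buf));
            }
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
        inf.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] doc = "a stored field value, repeated repeated repeated"
                .getBytes(StandardCharsets.UTF_8);
        // the fast level Field.COMPRESS never offered...
        byte[] fast = compress(doc, Deflater.BEST_SPEED);
        // ...versus the only level it used
        byte[] small = compress(doc, Deflater.BEST_COMPRESSION);
        if (!new String(decompress(fast), StandardCharsets.UTF_8)
                .equals(new String(decompress(small), StandardCharsets.UTF_8))) {
            throw new AssertionError();
        }
        // either byte[] could then be stored via Lucene's binary field support
        System.out.println("fast=" + fast.length + " bytes, small=" + small.length + " bytes");
    }
}
```

The pluggability Grant mentions would amount to swapping the compress/decompress pair behind an interface while the index always sees opaque bytes.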
[jira] Updated: (SOLR-341) PHP Solr Client
[ https://issues.apache.org/jira/browse/SOLR-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated SOLR-341: -- Fix Version/s: 1.4 PHP Solr Client --- Key: SOLR-341 URL: https://issues.apache.org/jira/browse/SOLR-341 Project: Solr Issue Type: New Feature Components: clients - php Affects Versions: 1.2 Environment: PHP >= 5.2.0 (or older with JSON PECL extension or other json_decode function implementation). Solr >= 1.2 Reporter: Donovan Jimenez Priority: Trivial Fix For: 1.4 Attachments: SolrPhpClient.2008-09-02.zip, SolrPhpClient.zip Developed this client when the example PHP source didn't meet our needs. The company I work for agreed to release it under the terms of the Apache License. This version is slightly different from what I originally linked to on the dev mailing list. I've incorporated feedback from Yonik and hossman to simplify the client and only accept one response format (JSON currently). When Solr 1.3 is released the client can be updated to use the PHP or Serialized PHP response writer.
example usage from my original mailing list post:

<?php
require_once('Solr/Service.php');

$start = microtime(true);

$solr = new Solr_Service(); // Or explicitly new Solr_Service('localhost', 8180, '/solr');

try {
  $response = $solr->search('solr', 0, 10, array(/* you can include other parameters here */));

  echo 'search returned with status = ', $response->responseHeader->status,
       ' and took ', microtime(true) - $start, ' seconds', "\n";

  // here's how you would access results
  // Notice that I've mapped the values by name into a tree of stdClass objects
  // and arrays (actually, most of this is done by json_decode)
  if ($response->response->numFound > 0) {
    $doc_number = $response->response->start;

    foreach ($response->response->docs as $doc) {
      $doc_number++;
      echo $doc_number, ': ', $doc->text, "\n";
    }
  }

  // for the purposes of seeing the available structure of the response
  // NOTE: Solr_Response::_parsedData is lazy loaded, so a print_r on the response before
  // any values are accessed may result in different behavior (in case
  // anyone has some troubles debugging)
  // print_r($response);
} catch (Exception $e) {
  echo $e->getMessage(), "\n";
}
?>

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-747) improve solr example config
[ https://issues.apache.org/jira/browse/SOLR-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628034#action_12628034 ] Otis Gospodnetic commented on SOLR-747: --- Hoss, I think so, yes. improve solr example config --- Key: SOLR-747 URL: https://issues.apache.org/jira/browse/SOLR-747 Project: Solr Issue Type: Improvement Affects Versions: 1.3 Reporter: Yonik Seeley Priority: Minor Fix For: 1.3 Attachments: SOLR-747.patch Improve the solr example solrconfig.xml and schema.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-748) FacetComponent helper classes are package restricted
[ https://issues.apache.org/jira/browse/SOLR-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628036#action_12628036 ] Wojtek Piaseczny commented on SOLR-748: --- A more detailed list of changes: ShardFacetCount, DistribFieldFacet, and FacetInfo classes become final public. Their member variables become private, and are accessible (get & set) through public accessors. FieldFacet becomes a public class. Its member variables become protected, and are accessible (get & set) through public accessors. ResponseBuilder's private member variable _facetInfo renamed to facetInfo and made public. FacetComponent uses public accessors to access class members. FacetComponent helper classes are package restricted Key: SOLR-748 URL: https://issues.apache.org/jira/browse/SOLR-748 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Wojtek Piaseczny Fix For: 1.4 Attachments: 748.patch Original discussion: http://www.nabble.com/Package-Access-Issues---Extending-FacetComponent-to19148122.html The FacetComponent class uses several helper classes that currently have package-restricted access. This makes it impossible to extend the FacetComponent without rewriting most of its functionality. A proposed solution is to make those classes public and make their public member variables accessible only through get and set functions (i.e. make them private). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
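The refactoring described above follows the standard accessor pattern; a hedged sketch of the "after" shape (ShardFacetCount is a real class name from the issue, but the fields shown here are illustrative, not Solr's actual members):

```java
public class AccessorExample {
    // After the proposed change: a public final helper class with private
    // state, reachable only through public getters/setters, so subclasses
    // of FacetComponent in other packages can use it.
    public static final class ShardFacetCount {
        private String name;
        private long count;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public long getCount() { return count; }
        public void setCount(long count) { this.count = count; }
    }

    public static void main(String[] args) {
        ShardFacetCount c = new ShardFacetCount();
        c.setName("category");
        c.setCount(42);
        if (c.getCount() != 42 || !c.getName().equals("category")) {
            throw new AssertionError();
        }
        System.out.println(c.getName() + "=" + c.getCount());
    }
}
```

The point of the change is visibility across packages: package-private fields are invisible to an extension living outside org.apache.solr.handler.component, while public accessors are not.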
[jira] Commented: (SOLR-538) CopyField maxLength property
[ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628045#action_12628045 ] Chris Harris commented on SOLR-538: --- Thanks, Lars; that was fast. I think this patch is going to be handy. I'm wondering what people thought about an alternative approach to keeping stored fields from being too large, which would require mucking around with Lucene. In particular, the idea would be to allow field definitions like this: <field name="body" type="text" indexed="true" stored="true" omitNorms="false" compressed="true" maxFieldLength="2000" storeOnlyAnalyzedText="true" /> Here we've made the normal Lucene maxFieldLength (i.e. # tokens to analyze) configurable on a field-by-field basis. And in this declaration we've also made it so that what is stored is a function of what is analyzed. (Here if the first 2,000 tokens correspond to the first, say, 8,000 characters, then those 8,000 characters are what's going to be actually stored in the stored field.) This seems a little more natural than lopping off the text after a fixed number of characters. If I could do the above, I'm thinking I would use that single field for both searching and highlighting.
But if you wanted a separate field for highlighting (and were willing to have things run slower than with the current patch), then you could do this: <field name="body" type="text" indexed="true" stored="false" omitNorms="false" /> <field name="highlighting" type="text" indexed="false" stored="true" compressed="true" maxFieldLength="2000" storeOnlyAnalyzedText="true" /> <copyField src="body" dest="highlighting" /> CopyField maxLength property Key: SOLR-538 URL: https://issues.apache.org/jira/browse/SOLR-538 Project: Solr Issue Type: Improvement Components: update Reporter: Nicolas Dessaigne Priority: Minor Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch As discussed shortly on the mailing list (http://www.mail-archive.com/[EMAIL PROTECTED]/msg09807.html), the objective of this task is to add a maxLength property to the CopyField command. This property simply limits the number of characters that are copied. This is particularly useful to avoid very slow highlighting when the index contains big documents. Example : <copyField source="text" dest="highlight" maxLength="3" /> This approach also has the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-538) CopyField maxLength property
[ https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628054#action_12628054 ] Lars Kotthoff commented on SOLR-538: Interesting idea, but this should probably be a separate issue. It would require more significant changes, for example the update handler should probably warn when the value for a field is truncated etc. CopyField maxLength property Key: SOLR-538 URL: https://issues.apache.org/jira/browse/SOLR-538 Project: Solr Issue Type: Improvement Components: update Reporter: Nicolas Dessaigne Priority: Minor Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, SOLR-538.patch, SOLR-538.patch, SOLR-538.patch As discussed shortly on the mailing list (http://www.mail-archive.com/[EMAIL PROTECTED]/msg09807.html), the objective of this task is to add a maxLength property to the CopyField command. This property simply limits the number of characters that are copied. This is particularly useful to avoid very slow highlighting when the index contains big documents. Example : <copyField source="text" dest="highlight" maxLength="3" /> This approach also has the advantage of limiting the index size for large documents (the original text field does not need to be stored and to have term vectors). However, the index is bigger for small documents... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
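The maxLength semantics under discussion amount to a character-count truncation at copy time. A minimal sketch (not the patch's actual code; the method name is illustrative):

```java
public class CopyFieldMaxLength {
    // Copy at most maxLength characters of the source value, as the
    // copyField maxLength property proposes; shorter values pass through.
    static String copyWithMaxLength(String source, int maxLength) {
        return source.length() <= maxLength ? source : source.substring(0, maxLength);
    }

    public static void main(String[] args) {
        String body = "a very long document body that would be slow to highlight";
        String highlight = copyWithMaxLength(body, 10);
        if (!highlight.equals("a very lon")) throw new AssertionError();
        System.out.println(highlight);
    }
}
```

Chris's alternative above differs in that the cutoff would be measured in analyzed tokens rather than raw characters, so the truncation point always falls on a token boundary.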
[jira] Resolved: (SOLR-749) QParser and ValueSourceParser init bug
[ https://issues.apache.org/jira/browse/SOLR-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-749. -- Resolution: Fixed Committed on trunk and on branch-1.3 QParser and ValueSourceParser init bug -- Key: SOLR-749 URL: https://issues.apache.org/jira/browse/SOLR-749 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Hoss Man Assignee: Grant Ingersoll Fix For: 1.3 Attachments: SOLR-749.patch, SOLR-749.patch As noticed by Maximilian Hütter in this email thread... http://www.nabble.com/SOLR-218-problem-to19266169.html#a19266169 ...when a person tries to register a QParser (or ValueSourceParser) with the same name as a standard implementation it gets blown away by the initialization code for the standard impls. we need to allow people to override these standard names the same way they can with responseWriters, etc... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-747) improve solr example config
[ https://issues.apache.org/jira/browse/SOLR-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-747. --- Resolution: Fixed made suggested changes and committed. improve solr example config --- Key: SOLR-747 URL: https://issues.apache.org/jira/browse/SOLR-747 Project: Solr Issue Type: Improvement Affects Versions: 1.3 Reporter: Yonik Seeley Priority: Minor Fix For: 1.3 Attachments: SOLR-747.patch Improve the solr example solrconfig.xml and schema.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Solr's use of Lucene's Compression field
Agreed. It was the simplest thing to do at the time, but it would definitely be preferable to offer the much faster lesser levels of compression. -Mike On 3-Sep-08, at 8:57 AM, Grant Ingersoll wrote: Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception , it occurred to me that we probably should refactor Solr's offering of compression. Currently, we rely on Field.COMPRESS from Lucene, but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 , because it only offers the highest level of compression, which is also the slowest. Obviously, Solr needs to handle the compression on the server side. I think we should have Solr do the compression, allowing users to set the level of compression (maybe even make it pluggable to put in your own compression techniques) and then just use Lucene's binary field capability. Granted, this is lower priority since I doubt many people use compression to begin with, but, still it would be useful. -Grant
[jira] Created: (SOLR-752) Allow better Field Compression options
Allow better Field Compression options -- Key: SOLR-752 URL: https://issues.apache.org/jira/browse/SOLR-752 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression It would be good if Solr handled field compression outside of Lucene's Field.COMPRESS capabilities, since those capabilities are less than ideal when it comes to control over compression. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: prototype Solr 1.3 RC 1
Seems like all issues have been closed. What is the plan for the release now? On Fri, Aug 29, 2008 at 8:46 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: I created a Hudson task to do the building/archival tasks for the release candidates. It is an on-demand task (i.e. not scheduled). See http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/ for the job in general. The artifacts (including Maven) are at: http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/ The web site (including javadocs): http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/release-candidate/build/docs/index.html I haven't gone through it with a fine-tooth comb yet, hence the prototype in the subject line, but my preliminary skimming suggests it is on track. I will cover it more later today. In the meantime, feedback is appreciated. Cheers, Grant -- Regards, Shalin Shekhar Mangar.
Re: prototype Solr 1.3 RC 1
On Wed, Sep 3, 2008 at 2:57 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Seems like all issues have been closed. What is the plan for the release now? I need to update the lucene libs again first... MikeM found+fixed a lucene bug today. -Yonik
Re: Solr's use of Lucene's Compression field
Also I see that another Lucene bug (LUCENE-1374) was found relating to compressed fields in lucene (when we first added compressed field support to solr a lucene bug involving lazy-loaded fields and compression was uncovered, too). It would be good to change the implementation simply to avoid relying on a deprecated lucene feature that isn't well exercised in development. -Mike On 3-Sep-08, at 11:36 AM, Mike Klaas wrote: Agreed. It was the simplest thing to do at the time, but it would definitely be preferable to offer the much faster lesser levels of compression. -Mike On 3-Sep-08, at 8:57 AM, Grant Ingersoll wrote: Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception , it occurred to me that we probably should refactor Solr's offering of compression. Currently, we rely on Field.COMPRESS from Lucene, but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 , because it only offers the highest level of compression, which is also the slowest. Obviously, Solr needs to handle the compression on the server side. I think we should have Solr do the compression, allowing users to set the level of compression (maybe even make it pluggable to put in your own compression techniques) and then just use Lucene's binary field capability. Granted, this is lower priority since I doubt many people use compression to begin with, but, still it would be useful. -Grant
[jira] Commented: (SOLR-742) Unable to create dynamic fields with custom DataImportHandler transformer
[ https://issues.apache.org/jira/browse/SOLR-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628138#action_12628138 ] Wojtek Piaseczny commented on SOLR-742: --- That fixed my issue, thank you! Unable to create dynamic fields with custom DataImportHandler transformer - Key: SOLR-742 URL: https://issues.apache.org/jira/browse/SOLR-742 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 1.3 Reporter: Wojtek Piaseczny Fix For: 1.4 Attachments: SOLR-742.patch, SOLR-742.patch Discussion at: http://www.nabble.com/Creating-dynamic-fields-with-DataImportHandler-to19226532.html Dynamic fields aren't created when specified in a DataImportHandler's transformer. Reproducing the issue: I have defined a dynamic field (of type sdouble) in my schema called _dynamic*. Inside the transformer's transformRow method, I am adding the name-value pair _dynamicTest and '1.0'. No errors are observed, but the data does not appear in the index after importing is complete. Interestingly, I can specify that same name-value pair combination in the DataImportHandler's config file, and it does appear in the index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Realtime Search for Social Networks Collaboration
Hi Yonik, The SOLR 2 list looks good. The question is, who is going to do the work? I tried to simplify the scope of Ocean as much as possible to make it possible (and slowly at that over time) for me to eventually finish what is mentioned on the wiki. I think SOLR is very cool and was a major step forward when it came out. I also think it's got a lot of things now which make integration difficult to do properly. I did try to integrate and received a lukewarm response and so decided to just move ahead separately until folks have time to collaborate. We probably should try to integrate SOLR and Ocean somehow; however we may want to simply reduce the scope a bit and figure out what is needed most, with the main use case being social networks. I think the problem with integration with SOLR is it was designed with a different problem set in mind than Ocean, originally the CNET shopping application. Facets were important, realtime was not needed because pricing doesn't change very often. I designed Ocean for social networks and actually further into the future realtime messaging based mobile applications. SOLR needs to be backward compatible and support its existing user base. How do you plan on doing this for a SOLR 2 if the architecture is changed dramatically? SOLR solves a problem set that is very common making SOLR very useful in many situations. However I wanted Ocean to be like GData. So I wanted the scalability of Google which SOLR doesn't quite have yet, and the realtime, and then I figured the other stuff could be added later, stuff people seem to spend a lot of time on in the SOLR community currently (spellchecker, db imports, many others). I did use some of the SOLR terminology in building Ocean, like snapshots! But most of it is a digression. I tried to use schemas, but they just make the system harder to use. For distributed search I prefer serialized objects as this enables things like SpanQueries and payloads without writing request handlers and such.
Also there is no need to write new request handlers and deploy them (an expensive operation for systems that are in the 100s of servers), as any new classes are simply dynamically loaded by the server from the client. A lot is now outlined on the wiki site http://wiki.apache.org/lucene-java/OceanRealtimeSearch and there will be a lot more javadocs in the forthcoming patch. The latest code is also available all the time at http://oceansearch.googlecode.com/svn/trunk/trunk/oceanlucene I do welcome more discussion and if there are Solr developers who wish to work on Ocean feel free to drop me a line. Most of all though I think it would be useful for social networks interested in realtime search to get involved, as it may be something that is difficult for one company to have enough resources to implement to a production level. I think this is where open source collaboration is particularly useful. Cheers, Jason Rutherglen [EMAIL PROTECTED] On Wed, Sep 3, 2008 at 4:56 PM, Yonik Seeley [EMAIL PROTECTED] wrote: On Wed, Sep 3, 2008 at 3:20 PM, Jason Rutherglen [EMAIL PROTECTED] wrote: I am wondering if there are social networks (or anyone else) out there who would be interested in collaborating with Apache on realtime search to get it to the point it can be used in production. Good timing Jason, I think you'll find some other people right here at Apache (solr-dev) that want to collaborate in this area: http://www.nabble.com/solr2%3A-Onward-and-Upward-td19224805.html I've looked at your wiki briefly, and all the high level goals/features seem to really be synergistic with where we are going with Solr2. -Yonik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: solr2: Onward and Upward
I think Hoss has a good point here. Solr has not shipped 1.3 yet and really needs to. A lot of the functionality mentioned would probably break any backward compatibility and/or require large rewrites of code. For Ocean I guess I should just state more clearly that it's really supposed to be a replacement for SQL databases like what Google has done with GData and not just realtime search using Lucene. There may be some issues with doing this, however they can and should be addressed. This article by Adam Bosworth explains well how a massively scalable search database has many benefits over scaling SQL database systems http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=337 I see this as the clear future for most companies, even if it takes a long time for even a few companies to implement outside of Google. There are too many cost and feature advantages in using search based databases, rather than using a mix of SQL and then doing batch based updates later. I doubt most companies would try to do it at this point, however one would say the same thing about SQL databases in the 1970s. In any case, SOLR is very cool and it would be great to see some of the analyzers, NumberUtils and other things go back into core Lucene at some point. Jason On Fri, Aug 29, 2008 at 3:13 PM, Chris Hostetter [EMAIL PROTECTED] wrote: You guys are all nuts. I'm barely hanging on by a thread keeping up with all of the 1.3 stuff, and you're already talking about 1.4, 1.X, and 2.0 ... madness i tell you, madness! PS: seriously, I'm going to hold off on actually reading this thread until 1.3 is shipped. it doesn't mean i'm not interested, it just means i'm interested later. -Hoss
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Harris updated SOLR-284: -- Attachment: rich.patch This update is just to make a tiny refactoring, bringing all the handler's parsing classes under src\java\org\apache\solr\handler\rich and all the testing classes under src\test\org\apache\solr\handler\rich All tests pass. Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Reporter: Eric Pugh Fix For: 1.4 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, source.zip, test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.