[jira] Resolved: (SOLR-750) DateField.parseMath doesn't handle non-existent Z

2008-09-03 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-750.
---

Resolution: Invalid

The exception is correct -- that is an invalid date string (as far as being 
input to parseMath, toInternal, or DateField.getAnalyzer().tokenStream is 
concerned)

The SOLR-540 patch is doing something it shouldn't be (which seems likely since 
it makes absolutely no sense to try and highlight a DateField) *and/or* the 
Highlighter has a bug (why is getBestTextFragments passing an indexed token to 
an Analyzer?)

Either way: parseMath is doing the right thing.
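For reference, a simplified sketch of the strictness in question (illustrative only; the real logic lives in org.apache.solr.schema.DateField and does considerably more): the internal date format is canonical ISO-8601 UTC, so a strict parser must reject a string missing the trailing Z rather than guess at a timezone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Simplified illustration of DateField-style strict parsing;
// not Solr's actual implementation.
public class StrictDateParse {
    public static Date parse(String s) throws ParseException {
        // The trailing 'Z' is a required literal in the canonical form.
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        f.setLenient(false);
        return f.parse(s);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parse("2008-09-03T12:00:00Z")); // parses fine
        try {
            parse("2008-09-03T12:00:00"); // missing Z
        } catch (ParseException expected) {
            System.out.println("rejected: missing Z");
        }
    }
}
```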

 DateField.parseMath doesn't handle non-existent Z
 -

 Key: SOLR-750
 URL: https://issues.apache.org/jira/browse/SOLR-750
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.3
Reporter: David Smiley
Priority: Minor
 Attachments: SOLR-750_DateField_no_Z.patch

   Original Estimate: 0.25h
  Remaining Estimate: 0.25h

 I've run into situations when trying to use SOLR-540 (wildcard highlight 
 spec) such that if it attempts to highlight a date field, I get a stack trace 
 from DateField.parseMath puking because there isn't a Z at the end of an 
 otherwise good date-time string.  It was very easy to fix the code to make it 
 react gracefully to no Z.  Attached is the patch.  This bug isn't really 
 related to SOLR-540 so please apply it without waiting for 540.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-749) QParser and ValueSourceParser init bug

2008-09-03 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627984#action_12627984
 ] 

Grant Ingersoll commented on SOLR-749:
--

I think we need a test case for this.

 QParser and ValueSourceParser init bug
 --

 Key: SOLR-749
 URL: https://issues.apache.org/jira/browse/SOLR-749
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Hoss Man
 Fix For: 1.3

 Attachments: SOLR-749.patch


 As noticed by Maximilian Hütter in this email thread...
 http://www.nabble.com/SOLR-218-problem-to19266169.html#a19266169
 ...when a person tries to register a QParser (or ValueSourceParser) with the 
 same name as a standard implementation it gets blown away by the 
 initialization code for the standard impls.
 we need to allow people to override these standard names the same way they 
 can with responseWriters, etc...
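The failure mode is easy to sketch (names here are illustrative, not the actual init code): if the standard implementations are registered after user plugins, a user entry under a standard name gets silently clobbered; registering a default only when the name is still unbound restores overridability.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the init-order bug described above.
public class ParserRegistry {
    private final Map<String, String> parsers = new HashMap<String, String>();

    public void registerUserPlugin(String name, String impl) {
        parsers.put(name, impl);
    }

    // Buggy: unconditionally overwrites whatever the user registered.
    public void initStandardBuggy(String name, String impl) {
        parsers.put(name, impl);
    }

    // Fixed: a standard impl only fills a gap, mirroring how users can
    // already override responseWriters by name.
    public void initStandardFixed(String name, String impl) {
        if (!parsers.containsKey(name)) {
            parsers.put(name, impl);
        }
    }

    public String lookup(String name) {
        return parsers.get(name);
    }
}
```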

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-540) Add support for hl.fl=*

2008-09-03 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627985#action_12627985
 ] 

David Smiley commented on SOLR-540:
---

Hey SOLR-540 people, please see the comment thread on SOLR-750 which is 
apparently a bug with this patch.

 Add support for hl.fl=*
 ---

 Key: SOLR-540
 URL: https://issues.apache.org/jira/browse/SOLR-540
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Priority: Minor
 Attachments: SOLR-540-highlight-all.patch, 
 SOLR-540-highlight-all.patch, SOLR-540-highlight-all.patch


 Adds support for the star value for the hl.fl parameter, i.e. highlighting 
 will be done on all fields (static and dynamic). Particularly useful in 
 conjunction with hl.requireFieldMatch=true, this way one can specify 
 generic highlighting parameters independent of the query/searched fields.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-751) WordDelimiterFilter doesn't adjust startOffset

2008-09-03 Thread Stefan Oestreicher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Oestreicher updated SOLR-751:


Attachment: SOLR-751.patch

 WordDelimiterFilter doesn't adjust startOffset
 --

 Key: SOLR-751
 URL: https://issues.apache.org/jira/browse/SOLR-751
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.3, 1.4
Reporter: Stefan Oestreicher
 Attachments: SOLR-751.patch


 If the first character of a token gets stripped the startOffset of that token 
 is not adjusted. With the last character it behaves as expected. I'll attach 
 a patch for the TestWordDelimiterFilter testcase which reproduces that issue 
 shortly.
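A sketch of the expected bookkeeping (simplified; the real code operates on Lucene Tokens inside WordDelimiterFilter): stripping leading delimiter characters should push startOffset forward by the number of characters removed, just as stripping trailing characters pulls endOffset back.

```java
// Simplified model of the offset adjustment the issue describes;
// not the actual WordDelimiterFilter code.
public class OffsetTrim {
    public static int[] trimOffsets(String text, int start, int end) {
        int s = start, e = end;
        int i = 0, j = text.length();
        // Strip leading delimiter chars, advancing startOffset (the buggy case).
        while (i < j && !Character.isLetterOrDigit(text.charAt(i))) { i++; s++; }
        // Strip trailing delimiter chars, pulling endOffset back (the case
        // that already behaved as expected).
        while (j > i && !Character.isLetterOrDigit(text.charAt(j - 1))) { j--; e--; }
        return new int[] { s, e };
    }
}
```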

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-751) WordDelimiterFilter doesn't adjust startOffset

2008-09-03 Thread Stefan Oestreicher (JIRA)
WordDelimiterFilter doesn't adjust startOffset
--

 Key: SOLR-751
 URL: https://issues.apache.org/jira/browse/SOLR-751
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.3, 1.4
Reporter: Stefan Oestreicher
 Attachments: SOLR-751.patch

If the first character of a token gets stripped the startOffset of that token 
is not adjusted. With the last character it behaves as expected. I'll attach a 
patch for the TestWordDelimiterFilter testcase which reproduces that issue 
shortly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-749) QParser and ValueSourceParser init bug

2008-09-03 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-749:


Assignee: Grant Ingersoll

 QParser and ValueSourceParser init bug
 --

 Key: SOLR-749
 URL: https://issues.apache.org/jira/browse/SOLR-749
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Hoss Man
Assignee: Grant Ingersoll
 Fix For: 1.3

 Attachments: SOLR-749.patch


 As noticed by Maximilian Hütter in this email thread...
 http://www.nabble.com/SOLR-218-problem-to19266169.html#a19266169
 ...when a person tries to register a QParser (or ValueSourceParser) with the 
 same name as a standard implementation it gets blown away by the 
 initialization code for the standard impls.
 we need to allow people to override these standard names the same way they 
 can with responseWriters, etc...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-540) Add support for hl.fl=*

2008-09-03 Thread Lars Kotthoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Kotthoff updated SOLR-540:
---

Attachment: SOLR-540-highlight-all.patch

Attaching new patch which only highlights on stored text/string fields. Added 
test case to verify that.

 Add support for hl.fl=*
 ---

 Key: SOLR-540
 URL: https://issues.apache.org/jira/browse/SOLR-540
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Priority: Minor
 Attachments: SOLR-540-highlight-all.patch, 
 SOLR-540-highlight-all.patch, SOLR-540-highlight-all.patch, 
 SOLR-540-highlight-all.patch


 Adds support for the star value for the hl.fl parameter, i.e. highlighting 
 will be done on all fields (static and dynamic). Particularly useful in 
 conjunction with hl.requireFieldMatch=true, this way one can specify 
 generic highlighting parameters independent of the query/searched fields.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-749) QParser and ValueSourceParser init bug

2008-09-03 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-749:
-

Attachment: SOLR-749.patch

Hoss's patch plus unit tests

 QParser and ValueSourceParser init bug
 --

 Key: SOLR-749
 URL: https://issues.apache.org/jira/browse/SOLR-749
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Hoss Man
Assignee: Grant Ingersoll
 Fix For: 1.3

 Attachments: SOLR-749.patch, SOLR-749.patch


 As noticed by Maximilian Hütter in this email thread...
 http://www.nabble.com/SOLR-218-problem-to19266169.html#a19266169
 ...when a person tries to register a QParser (or ValueSourceParser) with the 
 same name as a standard implementation it gets blown away by the 
 initialization code for the standard impls.
 we need to allow people to override these standard names the same way they 
 can with responseWriters, etc...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Solr's use of Lucene's Compression field

2008-09-03 Thread Grant Ingersoll
Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception 
, it occurred to me that we probably should refactor Solr's offering  
of compression.  Currently, we rely on Field.COMPRESS from Lucene, but  
this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 
, because it only offers the highest level of compression, which is  
also the slowest.


Obviously, Solr needs to handle the compression on the server side.  I  
think we should have Solr do the compression, allowing users to set  
the level of compression (maybe even make it pluggable to put in your  
own compression techniques) and then just use Lucene's binary field  
capability.  Granted, this is lower priority since I doubt many people  
use compression to begin with, but, still it would be useful.


-Grant
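A sketch of what Solr-side compression could look like with the JDK's Deflater (an assumed design, not an existing Solr API): the compression level becomes a knob, and the resulting bytes would be stored through Lucene's plain binary field support instead of Field.COMPRESS.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch: configurable-level compression done by Solr,
// rather than Lucene's Field.COMPRESS (always the slowest/highest level).
public class FieldCompressor {
    public static byte[] compress(byte[] raw, int level) {
        try {
            Deflater deflater = new Deflater(level); // 1 = fastest .. 9 = best
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DeflaterOutputStream out = new DeflaterOutputStream(bos, deflater);
            out.write(raw);
            out.close();
            deflater.end();
            // These bytes would go into a Lucene binary field.
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A pluggable variant would just put an interface in front of compress() so users can drop in their own codec.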


[jira] Updated: (SOLR-341) PHP Solr Client

2008-09-03 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-341:
--

Fix Version/s: 1.4

 PHP Solr Client
 ---

 Key: SOLR-341
 URL: https://issues.apache.org/jira/browse/SOLR-341
 Project: Solr
  Issue Type: New Feature
  Components: clients - php
Affects Versions: 1.2
 Environment: PHP >= 5.2.0 (or older with JSON PECL extension or other 
 json_decode function implementation). Solr >= 1.2
Reporter: Donovan Jimenez
Priority: Trivial
 Fix For: 1.4

 Attachments: SolrPhpClient.2008-09-02.zip, SolrPhpClient.zip


 Developed this client when the example PHP source didn't meet our needs.  The 
 company I work for agreed to release it under the terms of the Apache License.
 This version is slightly different from what I originally linked to on the 
 dev mailing list.  I've incorporated feedback from Yonik and hossman to 
 simplify the client and only accept one response format (JSON currently).
 When Solr 1.3 is released the client can be updated to use the PHP or 
 Serialized PHP response writer.
 example usage from my original mailing list post:
 <?php
 require_once('Solr/Service.php');
 $start = microtime(true);
 $solr = new Solr_Service(); // Or explicitly new Solr_Service('localhost', 8180, '/solr');
 try
 {
     $response = $solr->search('solr', 0, 10,
         array(/* you can include other parameters here */));
     echo 'search returned with status = ',
         $response->responseHeader->status,
         ' and took ', microtime(true) - $start, ' seconds', "\n";
     // here's how you would access results
     // Notice that I've mapped the values by name into a tree of stdClass objects
     // and arrays (actually, most of this is done by json_decode)
     if ($response->response->numFound > 0)
     {
         $doc_number = $response->response->start;
         foreach ($response->response->docs as $doc)
         {
             $doc_number++;
             echo $doc_number, ': ', $doc->text, "\n";
         }
     }
     // for the purposes of seeing the available structure of the response
     // NOTE: Solr_Response::_parsedData is lazy loaded, so a print_r on the
     // response before any values are accessed may result in different
     // behavior (in case anyone has some troubles debugging)
     //print_r($response);
 }
 catch (Exception $e)
 {
     echo $e->getMessage(), "\n";
 }
 ?>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-747) improve solr example config

2008-09-03 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628034#action_12628034
 ] 

Otis Gospodnetic commented on SOLR-747:
---

Hoss, I think so, yes.


 improve solr example config
 ---

 Key: SOLR-747
 URL: https://issues.apache.org/jira/browse/SOLR-747
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Yonik Seeley
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-747.patch


 Improve the solr example solrconfig.xml and schema.xml

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-748) FacetComponent helper classes are package restricted

2008-09-03 Thread Wojtek Piaseczny (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628036#action_12628036
 ] 

Wojtek Piaseczny commented on SOLR-748:
---

A more detailed list of changes:

ShardFacetCount, DistribFieldFacet, and FacetInfo classes become final public. 
Their member variables become private, and are accessible (get & set) through 
public accessors.

FieldFacet becomes a public class. Its member variables become protected, and 
are accessible (get & set) through public accessors.

ResponseBuilder's private member variable _facetInfo renamed to facetInfo and 
made public.

FacetComponent uses public accessors to access class members.
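The pattern applied throughout is the standard one (the field name below is made up for illustration; the real classes are FacetComponent's helpers):

```java
// Illustration of the encapsulation change described above:
// public mutable fields become private with accessors.
public final class ShardFacetCountSketch {
    private long count;

    public long getCount() { return count; }
    public void setCount(long count) { this.count = count; }
}
```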


 FacetComponent helper classes are package restricted
 

 Key: SOLR-748
 URL: https://issues.apache.org/jira/browse/SOLR-748
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Wojtek Piaseczny
 Fix For: 1.4

 Attachments: 748.patch


 Original discussion:
 http://www.nabble.com/Package-Access-Issues---Extending-FacetComponent-to19148122.html
 The FacetComponent class uses several helper classes that currently have 
 package-restricted access. This makes it impossible to extend the 
 FacetComponent without rewriting most of its functionality.
 A proposed solution is to make those classes public and make their public 
 member variables accessible only through get and set functions (i.e. make 
 them private).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-538) CopyField maxLength property

2008-09-03 Thread Chris Harris (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628045#action_12628045
 ] 

Chris Harris commented on SOLR-538:
---

Thanks, Lars; that was fast. I think this patch is going to be handy.

I'm wondering what people thought about an alternative approach to keeping 
stored fields from being too large, which would require mucking around with 
Lucene. In particular, the idea would be to allow field definitions like this:

<field name="body" type="text" indexed="true" stored="true"
       omitNorms="false" compressed="true"
       maxFieldLength="2000" storeOnlyAnalyzedText="true"
/>

Here we've made the normal Lucene maxFieldLength (i.e. # tokens to analyze) 
configurable on a field-by-field basis. And in this declaration we've also made it 
so that what is stored is a function of what is analyzed. (Here if the first 
2,000 tokens correspond to the first, say, 8,000 characters, then those 8,000 
characters are what's going to be actually stored in the stored field.) This 
seems a little more natural than lopping off the text after a fixed number of 
characters.
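The storeOnlyAnalyzedText idea can be sketched like this (attribute names are from the hypothetical declaration above; a whitespace "analyzer" stands in for a real Lucene analyzer chain): keep exactly the prefix of the raw text that ends where the maxFieldLength-th token ends.

```java
// Illustrates storing only the text that was actually analyzed:
// find where the N-th token ends and keep just that prefix.
public class AnalyzedPrefix {
    public static String storedValue(String text, int maxTokens) {
        int tokens = 0;
        int i = 0;
        int n = text.length();
        while (i < n && tokens < maxTokens) {
            // Skip whitespace, then consume one token.
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            if (i >= n) break;
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            tokens++;
        }
        return text.substring(0, i);
    }
}
```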

If I could do the above, I'm thinking I would use that single field for both 
searching and highlighting. But if you wanted a separate field for highlighting 
(and were willing to have things run slower than with the current patch), then 
you could do this:

<field name="body" type="text" indexed="true" stored="false" omitNorms="false"
/>
<field name="highlighting" type="text" indexed="false" stored="true"
       compressed="true" maxFieldLength="2000" storeOnlyAnalyzedText="true" />
<copyField src="body" dest="highlighting" />


 CopyField maxLength property
 

 Key: SOLR-538
 URL: https://issues.apache.org/jira/browse/SOLR-538
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Nicolas Dessaigne
Priority: Minor
 Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, 
 SOLR-538.patch, SOLR-538.patch, SOLR-538.patch


 As discussed shortly on the mailing list (http://www.mail-archive.com/[EMAIL 
 PROTECTED]/msg09807.html), the objective of this task is to add a maxLength 
 property to the CopyField command. This property simply limits the number 
 of characters that are copied.
 This is particularly useful to avoid very slow highlighting when the index 
 contains big documents.
 Example:
 <copyField source="text" dest="highlight" maxLength="3" />
 This approach also has the advantage of limiting the index size for large 
 documents (the original text field does not need to be stored and to have 
 term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-538) CopyField maxLength property

2008-09-03 Thread Lars Kotthoff (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628054#action_12628054
 ] 

Lars Kotthoff commented on SOLR-538:


Interesting idea, but this should probably be a separate issue. It would 
require more significant changes, for example the update handler should 
probably warn when the value for a field is truncated etc.

 CopyField maxLength property
 

 Key: SOLR-538
 URL: https://issues.apache.org/jira/browse/SOLR-538
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Nicolas Dessaigne
Priority: Minor
 Attachments: CopyFieldMaxLength.patch, CopyFieldMaxLength.patch, 
 SOLR-538.patch, SOLR-538.patch, SOLR-538.patch


 As discussed shortly on the mailing list (http://www.mail-archive.com/[EMAIL 
 PROTECTED]/msg09807.html), the objective of this task is to add a maxLength 
 property to the CopyField command. This property simply limits the number 
 of characters that are copied.
 This is particularly useful to avoid very slow highlighting when the index 
 contains big documents.
 Example:
 <copyField source="text" dest="highlight" maxLength="3" />
 This approach also has the advantage of limiting the index size for large 
 documents (the original text field does not need to be stored and to have 
 term vectors). However, the index is bigger for small documents...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-749) QParser and ValueSourceParser init bug

2008-09-03 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-749.
--

Resolution: Fixed

Committed on trunk and on branch-1.3

 QParser and ValueSourceParser init bug
 --

 Key: SOLR-749
 URL: https://issues.apache.org/jira/browse/SOLR-749
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Hoss Man
Assignee: Grant Ingersoll
 Fix For: 1.3

 Attachments: SOLR-749.patch, SOLR-749.patch


 As noticed by Maximilian Hütter in this email thread...
 http://www.nabble.com/SOLR-218-problem-to19266169.html#a19266169
 ...when a person tries to register a QParser (or ValueSourceParser) with the 
 same name as a standard implementation it gets blown away by the 
 initialization code for the standard impls.
 we need to allow people to override these standard names the same way they 
 can with responseWriters, etc...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-747) improve solr example config

2008-09-03 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-747.
---

Resolution: Fixed

made suggested changes and committed.

 improve solr example config
 ---

 Key: SOLR-747
 URL: https://issues.apache.org/jira/browse/SOLR-747
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Yonik Seeley
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-747.patch


 Improve the solr example solrconfig.xml and schema.xml

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr's use of Lucene's Compression field

2008-09-03 Thread Mike Klaas
Agreed.  It was the simplest thing to do at the time, but it would  
definitely be preferable to offer the much faster lesser levels of  
compression.


-Mike

On 3-Sep-08, at 8:57 AM, Grant Ingersoll wrote:

Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception 
, it occurred to me that we probably should refactor Solr's offering  
of compression.  Currently, we rely on Field.COMPRESS from Lucene,  
but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 
, because it only offers the highest level of compression, which is  
also the slowest.


Obviously, Solr needs to handle the compression on the server side.   
I think we should have Solr do the compression, allowing users to  
set the level of compression (maybe even make it pluggable to put in  
your own compression techniques) and then just use Lucene's binary  
field capability.  Granted, this is lower priority since I doubt  
many people use compression to begin with, but, still it would be  
useful.


-Grant




[jira] Created: (SOLR-752) Allow better Field Compression options

2008-09-03 Thread Grant Ingersoll (JIRA)
Allow better Field Compression options
--

 Key: SOLR-752
 URL: https://issues.apache.org/jira/browse/SOLR-752
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor


See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression

It would be good if Solr handled field compression outside of Lucene's 
Field.COMPRESS capabilities, since those capabilities are less than ideal when 
it comes to control over compression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: prototype Solr 1.3 RC 1

2008-09-03 Thread Shalin Shekhar Mangar
Seems like all issues have been closed. What is the plan for the release
now?

On Fri, Aug 29, 2008 at 8:46 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:

 I created a Hudson task to do the building/archival tasks for the release
 candidates. It is an on-demand task (i.e. not scheduled)

 See
 http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/ for
 the job in general.

 The artifacts (including Maven) are at:

 http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/

 The web site (including javadocs):

 http://hudson.zones.apache.org/hudson/job/Solr%20Release%20Candidate/lastSuccessfulBuild/artifact/release-candidate/build/docs/index.html

 I haven't gone through with a fine tooth comb yet, hence the prototype in
 the subject line, but my preliminary skimming of it seems like it is on
 track.   I will cover it more later today.  In the meantime, feedback is
 appreciated.

 Cheers,
 Grant




-- 
Regards,
Shalin Shekhar Mangar.


Re: prototype Solr 1.3 RC 1

2008-09-03 Thread Yonik Seeley
On Wed, Sep 3, 2008 at 2:57 PM, Shalin Shekhar Mangar
[EMAIL PROTECTED] wrote:
 Seems like all issues have been closed. What is the plan for the release
 now?

I need to update the lucene libs again first... MikeM found+fixed a
lucene bug today.

-Yonik


Re: Solr's use of Lucene's Compression field

2008-09-03 Thread Mike Klaas
Also I see that another Lucene bug (LUCENE-1374) was found relating to  
compressed fields in lucene (when we first added compressed field  
support to solr a lucene bug involving lazy-loaded fields and  
compression was uncovered, too).


It would be good to change the implementation simply to avoid relying  
on a deprecated lucene feature that isn't well exercised in development.


-Mike

On 3-Sep-08, at 11:36 AM, Mike Klaas wrote:

Agreed.  It was the simplest thing to do at the time, but it would  
definitely be preferable to offer the much faster lesser levels of  
compression.


-Mike

On 3-Sep-08, at 8:57 AM, Grant Ingersoll wrote:

Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception 
, it occurred to me that we probably should refactor Solr's  
offering of compression.  Currently, we rely on Field.COMPRESS from  
Lucene, but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 
, because it only offers the highest level of compression, which is  
also the slowest.


Obviously, Solr needs to handle the compression on the server  
side.  I think we should have Solr do the compression, allowing  
users to set the level of compression (maybe even make it pluggable  
to put in your own compression techniques) and then just use  
Lucene's binary field capability.  Granted, this is lower priority  
since I doubt many people use compression to begin with, but, still  
it would be useful.


-Grant






[jira] Commented: (SOLR-742) Unable to create dynamic fields with custom DataImportHandler transformer

2008-09-03 Thread Wojtek Piaseczny (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628138#action_12628138
 ] 

Wojtek Piaseczny commented on SOLR-742:
---

That fixed my issue, thank you!

 Unable to create dynamic fields with custom DataImportHandler transformer
 -

 Key: SOLR-742
 URL: https://issues.apache.org/jira/browse/SOLR-742
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Wojtek Piaseczny
 Fix For: 1.4

 Attachments: SOLR-742.patch, SOLR-742.patch


 Discussion at: 
 http://www.nabble.com/Creating-dynamic-fields-with-DataImportHandler-to19226532.html
 Dynamic fields aren't created when specified in a DataImportHandler's 
 transformer. 
 Reproducing the issue:
 I have defined a dynamic field (of type sdouble) in my schema called 
 _dynamic*. Inside the transformer's transformRow method, I am adding the 
 name-value pair _dynamicTest and '1.0'. No errors are observed, but the 
 data does not appear in the index after importing is complete.
 Interestingly, I can specify that same name-value pair combination in the 
 DataImportHandler's config file, and it does appear in the index.
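For context, the transformer hook in question looks roughly like this (simplified; the real contract is DataImportHandler's Transformer class, and the field name is the one from the report above):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a custom DIH transformer's transformRow:
// add a name/value pair to the row and return it. Whether the key
// matches a dynamicField pattern like "_dynamic*" determines if it
// should be indexable as a dynamic field.
public class DynamicFieldTransformer {
    public Map<String, Object> transformRow(Map<String, Object> row) {
        row.put("_dynamicTest", 1.0);
        return row;
    }
}
```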

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Realtime Search for Social Networks Collaboration

2008-09-03 Thread Jason Rutherglen
Hi Yonik,

The SOLR 2 list looks good.  The question is, who is going to do the
work?  I tried to simplify the scope of Ocean as much as possible to
make it possible (and slowly at that over time) for me to eventually
finish what is mentioned on the wiki.  I think SOLR is very cool and
was a major step forward when it came out.  I also think it's got a
lot of things now which make integration difficult to do properly.  I
did try to integrate and received a lukewarm response and so decided
to just move ahead separately until folks have time to collaborate.
We probably should try to integrate SOLR and Ocean somehow however we
may want to simply reduce the scope a bit and figure what is needed
most, with the main use case being social networks.

I think the problem with integration with SOLR is it was designed with
a different problem set in mind than Ocean, originally the CNET
shopping application.  Facets were important, realtime was not needed
because pricing doesn't change very often.  I designed Ocean for
social networks and actually further into the future realtime
messaging based mobile applications.

SOLR needs to be backward compatible and support its existing user
base.  How do you plan on doing this for a SOLR 2 if the architecture
is changed dramatically?  SOLR solves a problem set that is very
common making SOLR very useful in many situations.  However I wanted
Ocean to be like GData.  So I wanted the scalability of Google which
SOLR doesn't quite have yet, and the realtime, and then I figured the
other stuff could be added later, stuff people seem to spend a lot of
time on in the SOLR community currently (spellchecker, db imports,
many others).  I did use some of the SOLR terminology in building
Ocean, like snapshots!  But most of it is a digression.  I tried to
use schemas, but they just make the system harder to use.  For
distributed search I prefer serialized objects as this enables things
like SpanQueries and payloads without writing request handlers and
such.  Also there is no need to write new request handlers and deploy
(an expensive operation for systems that are in the 100s of servers)
them as any new classes are simply dynamically loaded by the server
from the client.

A lot is now outlined on the wiki site
http://wiki.apache.org/lucene-java/OceanRealtimeSearch now and there
will be a lot more javadocs in the forthcoming patch.  The latest code
is also available all the time at
http://oceansearch.googlecode.com/svn/trunk/trunk/oceanlucene

I do welcome more discussion and if there are Solr developers who wish
to work on Ocean feel free to drop me a line.  Most of all though I
think it would be useful for social networks interested in realtime
search to get involved as it may be something that is difficult for
one company to have enough resources to implement to a production
level.  I think this is where open source collaboration is
particularly useful.

Cheers,

Jason Rutherglen
[EMAIL PROTECTED]

On Wed, Sep 3, 2008 at 4:56 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 On Wed, Sep 3, 2008 at 3:20 PM, Jason Rutherglen
 [EMAIL PROTECTED] wrote:
 I am wondering
 if there are social networks (or anyone else) out there who would be
 interested in collaborating with Apache on realtime search to get it
 to the point it can be used in production.

 Good timing Jason, I think you'll find some other people right here
 at Apache (solr-dev) that want to collaborate in this area:

 http://www.nabble.com/solr2%3A-Onward-and-Upward-td19224805.html

 I've looked at your wiki briefly, and all the high level goals/features seem
 to really be synergistic with where we are going with Solr2.

 -Yonik

 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]




Re: solr2: Onward and Upward

2008-09-03 Thread Jason Rutherglen
I think Hoss has a good point here.  Solr has not shipped 1.3 yet and
really needs to.  A lot of the functionality mentioned would probably
break any backward compatibility and/or require large rewrites of
code.

For Ocean I guess I should just state more clearly that it's really
supposed to be a replacement for SQL databases like what Google has
done with GData and not just realtime search using Lucene.  There may
be some issues with doing this, however they can and should be
addressed.  This article by Adam Bosworth explains well how a
massively scalable search database has many benefits over scaling SQL
database systems
http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=337  I
see this as the clear future for most companies, even if it takes a
long time for even a few companies to implement outside of Google.
There are too many cost and feature advantages in using search based
databases, rather than using a mix of SQL and then doing batch based
updates later.  I doubt most companies would try to do it at this
point, however one would say the same thing about SQL databases in the
1970s.

In any case, SOLR is very cool and it would be great to see some of
the analyzers, NumberUtils and other things go back into core Lucene
at some point.

Jason

On Fri, Aug 29, 2008 at 3:13 PM, Chris Hostetter
[EMAIL PROTECTED] wrote:

 You guys are all nuts.  I'm barely hanging on by a thread keeping up with
 all of the 1.3 stuff, and you're already talking about 1.4, 1.X, and 2.0
 ... madness i tell you, madness!

 PS: seriously, I'm going to hold off on actually reading this thread
 until 1.3 is shipped.  it doesn't mean i'm not interested, it just means
 i'm interested later.


 -Hoss




[jira] Updated: (SOLR-284) Parsing Rich Document Types

2008-09-03 Thread Chris Harris (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Harris updated SOLR-284:
--

Attachment: rich.patch

This update is just to make a tiny refactoring, bringing all the handler's 
parsing classes under 

src\java\org\apache\solr\handler\rich

and all the testing classes under 

src\test\org\apache\solr\handler\rich

All tests pass.

 Parsing Rich Document Types
 ---

 Key: SOLR-284
 URL: https://issues.apache.org/jira/browse/SOLR-284
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Eric Pugh
 Fix For: 1.4

 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
 rich.patch, rich.patch, rich.patch, source.zip, test-files.zip, 
 test-files.zip, test.zip, un-hardcode-id.diff


 I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
 that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into 
 Solr.
 There is a wiki page with information here: 
 http://wiki.apache.org/solr/UpdateRichDocuments
  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.