[jira] Commented: (SOLR-193) General SolrDocument interface to manage field values.

2007-06-12 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504132
 ] 

Hoss Man commented on SOLR-193:
---

these comments are very haphazard, and in the best order i can think of (not 
the order i wrote them)

> Perhaps it would be better to leave out the edge cases and just focus on the 
> SolrDocument 

...i don't mind big patches that have a lot of things ... it's just weird when 
there is a big patch with a lot of stuff and it's not clear what it's for :) 
... i was mainly looking for someplace where an UpdateHandler was making a 
SolrDocument and then calling build on it.

> Is the only difference between the input Document and output Document that it 
> has boosts?

there is some more complexity in Lucene docs because of things like the 
Fieldable options but i don't think those really impact a SolrDocument API 
since that info is abstracted into the schema and can't be set on a per 
document basis.

> Should we have:
>  SolrDocument
>   + BoostedSolrDocument

BoostedSolrDocument seems too specific to the methods added, and not to the 
purpose of the class ... i would call it a "SolrInputDocument" 
(IndexSolrDocument is too vague since the term "index" is used so much in the 
code base)

The basic structure in the new patch looks fine by the way, no real concerns 
from me once the comments are cleaned up (one question: should SolrDocument 
implement Map> ??)
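
The split discussed above can be sketched roughly as follows. This is a minimal illustration, not the API from the attached patch: only the class names SolrDocument and SolrInputDocument come from the thread, while the method names (addField, getFieldValues, getBoost) and the default boost of 1.0 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: method names are assumptions, not the API from the patch.
// The base type carries field values and nothing index-specific, so it can
// also describe documents returned from a search.
class SolrDocument {
    private final Map<String, Collection<Object>> fields = new LinkedHashMap<>();

    void addField(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns null when the field has never been added.
    Collection<Object> getFieldValues(String name) {
        return fields.get(name);
    }
}

// Index-side subclass: boosts live here because they only exist for input.
class SolrInputDocument extends SolrDocument {
    private final Map<String, Float> fieldBoosts = new HashMap<>();

    void addField(String name, Object value, float boost) {
        addField(name, value);
        fieldBoosts.put(name, boost);
    }

    // Fields without an explicit boost fall back to the neutral value 1.0.
    float getBoost(String name) {
        return fieldBoosts.getOrDefault(name, 1.0f);
    }
}

class SolrDocumentSketchDemo {
    public static void main(String[] args) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("name", "solr", 2.0f);
        doc.addField("cat", "search");
        System.out.println(doc.getFieldValues("name") + " boost=" + doc.getBoost("name"));
        System.out.println(doc.getFieldValues("cat") + " boost=" + doc.getBoost("cat"));
    }
}
```

Code that only reads field values can accept the base type, so it works for both search results and input documents, while anything that needs boosts has to ask for the subclass explicitly.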

> This is for SOLR-139. to 'modify' a document, you load the existing Document 
> - change it - 
> then store it back.
> 
> These two functions can happily live in a new class, and could be attached to 
> SOLR-139.

...oh, right, i forgot about the "update in place" patch ... yeah, i don't 
think those methods should live in DocumentBuilder (am i alone in thinking 
DocumentBuilder should probably be deprecated completely once this stuff is 
committed? ... or ... hmmm ... it could probably be subclassed by one that 
supports adding a whole SolrInputDocument at once, or one that can start with 
an older Document and update it with a new SolrInputDocument ... but we can 
worry about that later)

"updating" is a direct example of the type of thing i referred to in LUCENE-778 
about why a single Lucene Document class is bad.  to support updating you 
should have an explicit means of composing the output class into the input 
class ... but in that case you're dealing directly with Lucene Documents -- i 
can understand why we would need to convert a Lucene document into a 
SolrInputDocument ... but i don't think we really need to worry about the 
SolrDocument => SolrInputDocument case, right?



> General SolrDocument interface to manage field values.
> --
>
> Key: SOLR-193
> URL: https://issues.apache.org/jira/browse/SOLR-193
> Project: Solr
> Issue Type: New Feature
> Reporter: Ryan McKinley
> Attachments: SOLR-193-SimpleSolrDocument.patch, 
> SOLR-193-SolrDocument.patch, SOLR-193-SolrDocument.patch, 
> SOLR-193-SolrDocument.patch, SOLR-193-SolrDocument.patch, 
> SOLR-193-SolrDocument.patch, SOLR-193-SolrDocument.patch
>
>
> In an effort to make SOLR-139 (the "modify" command) more manageable, i 
> extracted out a large chunk.  This patch adds a general SolrDocument 
> interface and includes a concrete implementation (SimpleSolrDoc)
> SOLR-139 needs some way to transport document values independent of the 
> lucene Document.  This is required for the INCREMENT command and useful for 
> modifying documents.  SolrDocument is also generally useful for SOLR-20
> - - - - - -
> The one (potentially) controversial part is that I added a function to 
> FieldType:
>  public Object toExternalValue(Fieldable f);
> This asks each field type to convert its Fieldable into its real type, for 
> example IntField.java has:
>  public Integer toExternalValue(Fieldable f) {
>    return Integer.valueOf( toExternal(f) );
>  }
> By default, it returns a string value.  If this addition is too much, there 
> are other (less clean) ways to handle the INCREMENT command.  My real 
> motivation for this addition is that it makes it possible to implement an 
> embeddable SOLR-20 client that does not need an HTTP connection. 
> - - - -
> The SimpleSolrDoc implementation was written for SOLR-20.  It needs to play 
> nice with EL, so it implements a few extra map functions that may not seem 
> necessary:
>  ${doc.values['name']} gets a collection
>  ${doc.valueMap['name']} gets a single value for the field
> - - - -
> The tests cover all "toExternalValue" changes in schema.*  
> SimpleSolrDoc and DocumentBuilder have 100% test coverage.
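
The toExternalValue idea from the issue description can be sketched in isolation. StoredField below is a hypothetical stand-in for Lucene's Fieldable, and FieldTypeSketch and IntFieldSketch are illustrative names rather than the real Solr classes; the sketch only shows a default that returns a string and a subclass that narrows the return type.

```java
// Hypothetical stand-in for Lucene's Fieldable; only what the sketch needs.
interface StoredField {
    String stringValue();
}

// Stand-in for a schema field type: by default the external value is a String.
class FieldTypeSketch {
    String toExternal(StoredField f) {
        return f.stringValue();
    }

    Object toExternalValue(StoredField f) {
        return toExternal(f);
    }
}

// Stand-in for an integer field type: the override narrows the return type
// (a covariant return) and parses the stored string into an Integer.
class IntFieldSketch extends FieldTypeSketch {
    @Override
    Integer toExternalValue(StoredField f) {
        return Integer.valueOf(toExternal(f));
    }
}

class ToExternalValueDemo {
    public static void main(String[] args) {
        StoredField stored = () -> "42";
        Object asString = new FieldTypeSketch().toExternalValue(stored);
        Object asInt = new IntFieldSketch().toExternalValue(stored);
        System.out.println(asString.getClass().getSimpleName() + " " + asString);
        System.out.println(asInt.getClass().getSimpleName() + " " + asInt);
    }
}
```

With typed values available, something like the INCREMENT command can do arithmetic on the field value without knowing how each field type encodes its stored string.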

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-193) General SolrDocument interface to manage field values.

2007-06-12 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-193:
---

Attachment: SOLR-193-SimpleSolrDocument.patch

Here is a much much smaller patch that only adds the SolrDocument *class* and 
BoostableSolrDocument subclass.

We can work through the other bits later, but this would be sufficient for 
SOLR-20

It is a quick Eclipse refactoring, so the comments may not make sense.  I'll 
check that over in better detail after you all get a chance to look at it...




[jira] Commented: (SOLR-193) General SolrDocument interface to manage field values.

2007-06-12 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504100
 ] 

Yonik Seeley commented on SOLR-193:
---

> This sounds fine. We should *definitely* solve any known problems with the 
> Lucene document interface.
>  Just using an interface (rather than a concrete class) will be a huge help. 

I know this runs contrary to common java OO wisdom, but interfaces can really 
suck.
They don't hurt the *consumer* of a class, but cause major headaches for the 
*provider*, trying to evolve an interface and still provide backward 
compatibility (it's pretty much impossible).

In Lucene, where we have had a class (like Analyzer), it was trivial adding new 
functionality like getPositionIncrementGap().  If it had been an interface, it 
would have been impossible without breaking all the custom analyzers out there. 
 Where we have had interfaces, and added a new method, we simply broke some 
people's code.

So if it's something that a customer might possibly subclass, a class used as 
an interface is a much better option.
If it's internal, or package protected, or something where you *really* need 
multiple inheritance, then an interface is fine.
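
A minimal sketch of the compatibility argument above. BaseAnalyzer and LegacyAnalyzer are hypothetical names used only for illustration, not Lucene classes; the point is that a method added to an abstract base class with a default body does not break subclasses written before it existed.

```java
// Hypothetical stand-in for a library base class; not the real Lucene Analyzer.
abstract class BaseAnalyzer {
    // Original contract that every subclass had to implement.
    abstract String analyze(String text);

    // Added in a later release. Because it ships with a default body,
    // subclasses written against the old version keep working unchanged.
    int positionIncrementGap(String fieldName) {
        return 0;
    }
}

// A "custom analyzer" written before positionIncrementGap existed.
class LegacyAnalyzer extends BaseAnalyzer {
    @Override
    String analyze(String text) {
        return text.toLowerCase();
    }
}

class InterfaceEvolutionDemo {
    public static void main(String[] args) {
        BaseAnalyzer analyzer = new LegacyAnalyzer();
        System.out.println(analyzer.analyze("Hello Solr"));
        System.out.println(analyzer.positionIncrementGap("title"));
    }
}
```

Had BaseAnalyzer been a plain interface at the time of this thread, adding the second method would have forced every implementor to change. Java 8 later added default methods on interfaces, which softens this trade-off.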




[jira] Commented: (SOLR-193) General SolrDocument interface to manage field values.

2007-06-12 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504095
 ] 

Ryan McKinley commented on SOLR-193:



For background.  This class has functionality used for other issues including 
SOLR-104, SOLR-139.  For a while i tried keeping the functionality in different 
patches, but it became too much of a nightmare to maintain.  Perhaps it would 
be better to leave out the edge cases and just focus on the SolrDocument 
interface now...


> what is setDistinctByDefault, or setDistinctOrderMatters ?
> 

These options let you say if the field values should be backed by a Map 
or a List, the DistinctOrderMatters says if it should be Map or 
LinkedHashMap

These were useful for SOLR-104 when you SQL join a table and may get duplicate 
rows, but only want the distinct values to keep fields.
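
The type names in the comment above look truncated, so the following is only a sketch under an assumption: that the options choose between keeping every value (a list) and keeping distinct values in first-seen order (an insertion-ordered set). FieldValueBacking and collect are hypothetical names, not part of the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;

class FieldValueBacking {
    // distinct = false keeps every value, duplicates included;
    // distinct = true drops duplicates but preserves first-seen order.
    static Collection<Object> collect(Collection<?> rows, boolean distinct) {
        Collection<Object> values =
                distinct ? new LinkedHashSet<Object>() : new ArrayList<Object>();
        values.addAll(rows);
        return values;
    }

    public static void main(String[] args) {
        // Values as they might arrive from a SQL join that repeats rows.
        Collection<String> joined = Arrays.asList("red", "blue", "red");
        System.out.println(collect(joined, false));
        System.out.println(collect(joined, true));
    }
}
```

If order does not matter, a plain HashSet would do the same deduplication without preserving insertion order.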

Now that you point it out, (and there is a good chance it will be in trunk 
soon) It would make more sense to implement these features as different 
subclasses of SimpleSolrDocument.


> Also, what is the purpose/use of DocumentBuilder.build and 
> DocumentBuilder.loadStoredFields 

This is for SOLR-139.  to 'modify' a document, you load the existing Document - 
change it - then store it back.

These two functions can happily live in a new class, and could be attached to 
SOLR-139.


>   2) I thought the SolrDocument API was for incoming documents ... 

I hope it is also useful for modifying existing Documents and transforming 
incoming/outgoing documents (but I'll raise that issue later ;)


> I think it's a mistake to try and have one single Interface for all three. 
> ... At the very least there should be a separate API for the indexing side 
> and the query side (because of the boost issue) which can be  
> subclass/superclass relationships.
> 

This sounds fine.  We should *definitely* solve any known problems with the 
Lucene document interface.  Just using an interface (rather than a concrete 
class) will be a huge help.

Is the only difference between the input Document and output Document that it 
has boosts?

Should we have:
 SolrDocument
   + BoostedSolrDocument

 or

 SolrDocument
   + IndexSolrDocument

Any thoughts on the common use case where I want to pull a document out of the 
index (no boosts) change it, then put it back?  Do i need to make a new class 
and copy all the fields?  Should SOLR-20 be able to index a SolrDocument (no 
boosts) as well as a BoostedSolrDocument?  I think so...


Thanks for looking at this!  





[jira] Commented: (SOLR-135) Restructure / Refactor codebase for shared libraries

2007-06-12 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504068
 ] 

Hoss Man commented on SOLR-135:
---

>> yes, I like separate package names better but i'm worried about the 
>> impact on dependent code.
>> ...
>> Are you suggesting it's ok to move XML.java and SolrException.java to 
>> o.a.s.common? That seems kinda extreme 
>> for anyone using the classes...

I'm not sure.  I think we've been talking for a long time about refactoring 
some of the classes into different packages, which really only affects their 
organization when developers look at them -- if we are now also looking at 
reorganizing them into jars, and ensuring that certain subsets can be compiled 
into their own jar with no dependencies on files not in that jar -- then i think 
we might as well do both at once.  As I said, I could probably be convinced that 
this isn't that important, and we should continue using the same package names 
in a new src/common directory -- so perhaps a better question to ask is: "do we 
want to rework the packages too?"

Most of the classes you listed seem like perfect candidates for a new "common" 
package (or at the very least o.a.s.common.util, o.a.s.common.params), but i 
have to admit i hadn't really considered SolrException ... on one hand it's 
used so pervasively it should be considered "common" (not including it would 
mean changing a *lot* of APIs of things we want to be able to include in the 
common jar) on the other hand it does have very HTTP specific error codes in it.


Just spit balling here... what if o.a.s.common.SolrException was a base class 
with no error codes (it looks like all of the "Common" classes just use 
"BAD_REQUEST" at this point so refactoring it out would be clean, and the http 
codes don't make sense in a 'common' context anyway) and 
o.a.s.util.SolrException was a real (non deprecated) subclass that adds the 
ErrorCodes ... anyone catching util.SolrException is golden, anyone catching 
common.SolrException can either infer an ErrorCode from context, or assume 
BAD_REQUEST (a static utility in util.SolrException could make this easy by 
wrapping the common.SolrException in a util.SolrException).

ugh.
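
The exception split floated above could look roughly like this. Because two classes named SolrException cannot share one file, CommonSolrException stands in for the proposed o.a.s.common.SolrException and HttpSolrException for o.a.s.util.SolrException; both names, the wrap helper, the numeric code 400, and the choice of an unchecked exception are assumptions for illustration.

```java
// Stand-in for the proposed o.a.s.common.SolrException: carries no error code.
class CommonSolrException extends RuntimeException {
    CommonSolrException(String message) {
        super(message);
    }
}

// Stand-in for the proposed o.a.s.util.SolrException: adds HTTP-style codes.
class HttpSolrException extends CommonSolrException {
    static final int BAD_REQUEST = 400;

    private final int code;

    HttpSolrException(int code, String message) {
        super(message);
        this.code = code;
    }

    int code() {
        return code;
    }

    // The static utility from the comment: an exception that already has a
    // code passes through, anything else is wrapped and assumed BAD_REQUEST.
    static HttpSolrException wrap(CommonSolrException e) {
        if (e instanceof HttpSolrException) {
            return (HttpSolrException) e;
        }
        HttpSolrException wrapped = new HttpSolrException(BAD_REQUEST, e.getMessage());
        wrapped.initCause(e);
        return wrapped;
    }
}

class SolrExceptionSketchDemo {
    public static void main(String[] args) {
        try {
            throw new CommonSolrException("missing required field: id");
        } catch (CommonSolrException e) {
            HttpSolrException http = HttpSolrException.wrap(e);
            System.out.println(http.code() + " " + http.getMessage());
        }
    }
}
```

Code in the common jar would only ever see the code-less base type, and the servlet layer would call the wrap helper at the point where it needs a status code.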




> Restructure / Refactor codebase for shared libraries
> 
>
> Key: SOLR-135
> URL: https://issues.apache.org/jira/browse/SOLR-135
> Project: Solr
> Issue Type: Wish
> Reporter: Ryan McKinley
> Priority: Minor
> Attachments: SOLR-135-RestructureForCommonJar.patch, 
> SOLR-135-RestructureForCommonJar.patch, SOLR-135-RestructureForCommonJar.patch
>
>
> For SOLR-20 and other java projects, it would be nice to have common code 
> share a codebase that does not require lucene or junit to compile.




[jira] Commented: (SOLR-193) General SolrDocument interface to manage field values.

2007-06-12 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504052
 ] 

Hoss Man commented on SOLR-193:
---

i'm not sure that i understand a lot of what's going on here ... the basic API 
for SolrDocument makes sense to me, but i'm not sure that i understand some of 
the methods in SimpleSolrDoc ... what is setDistinctByDefault, or 
setDistinctOrderMatters ?

Also, what is the purpose/use of DocumentBuilder.build and 
DocumentBuilder.loadStoredFields ... neither seems to be used anywhere ... if 
they are not intended for use by existing clients of DocumentBuilder, but new 
client code not yet written that won't care about any of the existing stateful 
methods in DocumentBuilder,  perhaps they (the two new methods) should live in 
a separate class?

The spirit of DocumentBuilder.build makes sense to me in the context of the 
issue title -- but loadStoredFields on the other hand really doesn't make sense 
to me at all...
  1) DocumentBuilder is only involved in building Lucene Document objects to 
index ... so why have a method in it for converting from a Lucene Document 
object to something else?
  2) I thought the SolrDocument API was for incoming documents ... why a method 
for adding values to it from an existing Lucene Document, or special logic for 
looking at stored fields?
  3) if the goal is for SolrDocument to be general enough to handle 
pre-indexing or post-searching Document representation, then we should not 
attempt to model boosts in it ... those should only live in a subclass used for 
indexing purposes (Lucene made this mistake early on, and it's caused countless 
amounts of confusion to this date) ... the loadStoredFields seems to suffer 
from this confusion by trying to access the field boosts of a Lucene Document 
that (appears to be) the result of a search --- they don't exist in those 
instances of Lucene Documents.

If these methods are not intended for use by existing clients of 
DocumentBuilder, but new client code not yet written that doesn't care about 
any of the existing stateful methods in DocumentBuilder,  perhaps they (the two 
new methods) should live in a separate class?

Hmmm... rereading the issue summary and the comments about playing nice with EL 
i see the goal is for a generic representation both in a java client sending 
docs to and reading docs back from Solr, as well as internally within Solr (or 
embedded Solr contexts) ... I think it's a mistake to try and have one single 
Interface for all three.  At the very least there should be a separate API for 
the indexing side and the query side (because of the boost issue) which can be 
subclass/superclass relationships.

I ranted about this in a related Lucene Jira issue (note also the email 
discussion linked to from one of my comments in that issue) ...

https://issues.apache.org/jira/browse/LUCENE-778


[jira] Commented: (SOLR-135) Restructure / Refactor codebase for shared libraries

2007-06-12 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504039
 ] 

Ryan McKinley commented on SOLR-135:


As a note to anyone not looking at the patch...  this would not break API 
compatibility, but it would add a lot of empty classes that look like:

@Deprecated
public class XML extends org.apache.solr.common.XML {
  // don't use this class!
}




[jira] Commented: (SOLR-135) Restructure / Refactor codebase for shared libraries

2007-06-12 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504038
 ] 

Ryan McKinley commented on SOLR-135:


> 
>it would be easy to move this package to live in src/common if people think 
>there is a need, my main concern is just that we shouldn't have 
>"org.apache.solr.util" living in two places (src/java and src/common)
> 

yes, I like separate package names better but i'm worried about the impact 
on dependent code.  

The classes needed for SOLR-20 are:
 http://solrstuff.org/svn/solrj/src/org/apache/solr/util/
 http://solrstuff.org/svn/solrj/src/org/apache/solr/request/
 http://solrstuff.org/svn/solrj/src/org/apache/solr/core/

Are you suggesting it's ok to move XML.java and SolrException.java to 
o.a.s.common?  That seems kinda extreme for anyone using the classes...

If it is ok, i'm all for it...  if not, I think we should make the 'common' 
package and put anything new in there, adding comments to the classes that 
should be moved in the future.  





[jira] Updated: (SOLR-135) Restructure / Refactor codebase for shared libraries

2007-06-12 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-135:
--

Attachment: SOLR-135-RestructureForCommonJar.patch

as far as i can tell the manifest merging that the ant docs describe for the 
 task just flat out doesn't work, so we just won't use the new macro for the 
post.jar


NOTE: with this patch, the intent is to svn copy XML.java to the new common 
dir, then patch the existing file to purge its body and add the deprecated 
messages.

as i mentioned before, this approach doesn't use src/common at all ... it 
assumes a new "org.apache.solr.common" package in src/java and uses 
include/exclude rules to make sure things in that package live in the common 
jar (and compile first)

it would be easy to move this package to live in src/common if people think 
there is a need, my main concern is just that we shouldn't have 
"org.apache.solr.util" living in two places (src/java and src/common)




[jira] Updated: (SOLR-135) Restructure / Refactor codebase for shared libraries

2007-06-12 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-135:
--

Attachment: SOLR-135-RestructureForCommonJar.patch

here's what i've got so far ... it occurred to me that if we use separate package 
names, we don't actually need to separate the code out, we can do it all with 
exclude/include directives.

this has one small glitch at the moment, post.jar isn't getting its main-class 
set properly ... might be a mistake i made, or it might be a defect in the 
manifest merging ant is supposed to do ... i'll check it out later

(this isn't a big deal though, post.jar has never really had a good manifest 
file, i was just trying to clean that up when i added the macro)




Re: [jira] Commented: (SOLR-236) Field collapsing

2007-06-12 Thread Mike Klaas

On 12-Jun-07, at 2:36 PM, Yonik Seeley wrote:


On 6/12/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

The way I do field collapsing is simply gathering documents and
collapsing them until I've gathered X groups for user display (which
usually involves looking at a few tens of documents more, rather than
the entire 3,000,000+ result set).


Isn't this then dependent on the order of the documents in the index?
Or it sounds like you don't "promote" lower scoring documents into a
higher scoring group unless they both happen to be in the top docs
requested?


Precisely.  I don't care how many docs are in a group, just avoiding  
displaying two documents in the same group.  That way you can process  
the docs in score order for essentially zero cost.


-Mike
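
The approach described in the message above can be sketched as a single pass over score-ordered results that stops as soon as enough groups have been seen. ScoredDoc, LazyCollapser, and the groupKey field are hypothetical names for illustration; this is not the SOLR-236 patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical result record: a doc id, its score, and the collapse field value.
class ScoredDoc {
    final int id;
    final String groupKey;
    final float score;

    ScoredDoc(int id, String groupKey, float score) {
        this.id = id;
        this.groupKey = groupKey;
        this.score = score;
    }
}

class LazyCollapser {
    // docsByScore must already be in descending score order. The loop keeps the
    // first (highest scoring) doc of each group and stops once maxGroups groups
    // have been gathered, so it never looks at the rest of the result set.
    static List<ScoredDoc> collapse(Iterable<ScoredDoc> docsByScore, int maxGroups) {
        Set<String> seenGroups = new HashSet<>();
        List<ScoredDoc> representatives = new ArrayList<>();
        for (ScoredDoc doc : docsByScore) {
            if (representatives.size() >= maxGroups) {
                break;
            }
            if (seenGroups.add(doc.groupKey)) {
                representatives.add(doc);
            }
        }
        return representatives;
    }

    public static void main(String[] args) {
        List<ScoredDoc> hits = Arrays.asList(
                new ScoredDoc(1, "siteA", 0.9f),
                new ScoredDoc(2, "siteA", 0.8f),
                new ScoredDoc(3, "siteB", 0.7f),
                new ScoredDoc(4, "siteC", 0.6f));
        for (ScoredDoc doc : collapse(hits, 2)) {
            System.out.println(doc.id + " " + doc.groupKey);
        }
    }
}
```

The trade-off is the one raised in the thread: group sizes are unknown, and lower scoring documents are never promoted into a group, because the scan ends before they are reached.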


Re: [jira] Commented: (SOLR-236) Field collapsing

2007-06-12 Thread Yonik Seeley

On 6/12/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

The way I do field collapsing is simply gathering documents and
collapsing them until I've gathered X groups for user display (which
usually involves looking at a few tens of documents more, rather than
the entire 3,000,000+ result set).


Isn't this then dependent on the order of the documents in the index?
Or it sounds like you don't "promote" lower scoring documents into a
higher scoring group unless they both happen to be in the top docs
requested?

-Yonik


Re: [jira] Commented: (SOLR-236) Field collapsing

2007-06-12 Thread Mike Klaas

On 11-Jun-07, at 5:48 PM, Chris Hostetter wrote:



: Yes, the current JIRA patch uses the FieldCache.

I just meant in contrast with Mike's comment about iterating over all the
stored fields to support the "post-faceting" situation (but frankly i'm
not sure that i understand what the "post-faceting" situation is, so feel
free to ignore me)


I'm not sure either--I assume that it means facet on a DocSet that is  
limited to the representative doc in each collapsed group.  Or is  
it faceting within each group?


If so, then all documents in the result set need to be collapsed to  
determine this list of docs (which perhaps is not too inefficient?).   
The way I do field collapsing is simply gathering documents and  
collapsing them until I've gathered X groups for user display (which  
usually involves looking at a few tens of documents more, rather than  
the entire 3,000,000+ result set).


I'm going to bow out now, as I don't think I understand exactly what
we're talking about.


-Mike


[jira] Commented: (SOLR-135) Restructure / Refactor codebase for shared libraries

2007-06-12 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504009
 ] 

Hoss Man commented on SOLR-135:
---

on the topic of adding a src/common directory ... i think in the long run 
we'll be happier if there is no overlap between the java package names that 
live in this directory and the ones that live in src/java (much the way the 
only java packages in src/webapp are o.a.s.servlet) ... so using 
src/common/org/apache/solr/common/XML.java may be a better way to go (even 
though it means we would need to leave a deprecated 
src/java/org/apache/solr/util/XML.java subclassing it in src/java).  I could 
probably be convinced that this isn't that important, but i've definitely found 
it confusing for people that some of the lucene-java contribs reuse the same 
package names as the core classes in some cases.

on the subject of the build.xml ... now that we've got three instances of 
 and two of  we probably want to make some macros for them to 
reduce redundancy.

Gimme 30 minutes to see if i can whip up a derivative patch ... if i don't 
attach one it means i got sidetracked with something else.

> Restructure / Refactor codebase for shared libraries
> 
>
> Key: SOLR-135
> URL: https://issues.apache.org/jira/browse/SOLR-135
> Project: Solr
>  Issue Type: Wish
>Reporter: Ryan McKinley
>Priority: Minor
> Attachments: SOLR-135-RestructureForCommonJar.patch
>
>
> For SOLR-20 and other java projects, it would be nice to have common code 
> share a codebase that does not require lucene or junit to compile.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-243) Create a hook to allow custome code to create custome index readers

2007-06-12 Thread John Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Wang updated SOLR-243:
---

Attachment: indexReaderFactory.patch

My apologies for not being patient with this process.
I have made the requested changes and submitted another patch.

Please let me know if these are the correct things to do.

Thanks

-John

> Create a hook to allow custome code to create custome index readers
> ---
>
> Key: SOLR-243
> URL: https://issues.apache.org/jira/browse/SOLR-243
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
> Environment: Solr core
>Reporter: John Wang
> Fix For: 1.3
>
> Attachments: indexReaderFactory.patch, indexReaderFactory.patch, 
> indexReaderFactory.patch
>
>
> I have a customized IndexReader and I want to write a Solr plugin to use my 
> derived IndexReader implementation. Currently IndexReader instantiation is 
> hard coded to be: 
> IndexReader.open(path)
> It would be really useful if this is done thru a plugable factory that can be 
> configured, e.g. IndexReaderFactory
> interface IndexReaderFactory{
>  IndexReader newReader(String name,String path);
> }
> the default implementation would just return: IndexReader.open(path)
> And in the newSearcher and getSearcher methods in SolrCore class can call the 
> current factory implementation to get the IndexReader instance and then build 
> the SolrIndexSearcher by passing in the reader.
> It would be really nice to add this improvement soon (This seems to be a 
> trivial addition) as our project really depends on this.
> Thanks
> -John




Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

2007-06-12 Thread Chris Hostetter

: In this case, relative to solr.home makes the most sense.  I like
: -Dsolr.data.dir=XXX out of the box, but enabling it explicitly isn't hard...

Yeah, I'm probably just being overly paranoid.  Making dataDir be relative
to Solr Home is probably the way it should have worked all along ... so as
long as it's heavily documented in CHANGES.txt i think we'll be fine.

i suspect if anyone was specifying a solrhome *and* specifying a dataDir
we would have gotten a question about dataDir not being relative to solrhome.
(although maybe we will now that 1.2 has the  block uncommented)

We could just add an optional rel="cwd" (vs rel="solr") attribute to the
 tag and make it really explicit.



-Hoss



Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

2007-06-12 Thread Ryan McKinley


I'm ambivalent though - I'm happy to reverse the change to the example 
solrconfig.xml too, though I like that one can fire up the example 
configuration with a different data directory easily.




I am happy either way - relative to solr home or just commented out.

In this case, relative to solr.home makes the most sense.  I like 
-Dsolr.data.dir=XXX out of the box, but enabling it explicitly isn't hard...








Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

2007-06-12 Thread Erik Hatcher


On Jun 12, 2007, at 1:57 PM, Chris Hostetter wrote:



: Instead, maybe a relative path should be made relative to Solr's home
: directory instead of to the current working directory?

I'm not sure how i feel about that ... it would be one thing if Solr could
generate an error if the dataDir didn't exist, but since we create the
directory on the fly as needed, changing this behavior  is
relative working directory vs solr.home) could really confuse people.

Does the system property substitution stuff in the solrconfig deal with
substitutions that contain other substitutions?  if so then maybe we
should set the solr.solr.home system property if we see that solr home has
been specified with JNDI, and make the example solrconfig something
like...

${solr.data.dir:${solr.solr.home}/solr/data}


No, it does not support that type of substitution.
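
To illustrate why a nested default is a problem for a simple substitution scheme, here is a minimal sketch of single-pass ${name:default} expansion. It is an assumption-laden illustration, not Solr's actual substitution code: the PropertySubstitution class, the regular expression, and the rule that an unresolved placeholder is left untouched are all hypothetical.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PropertySubstitution {
    // Matches ${name} or ${name:default}. The default stops at the first '}',
    // which is exactly why a default containing another ${...} is cut short.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    /** Expands each placeholder once, left to right, without re-scanning the result. */
    static String substitute(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // With no property and no default, keep the placeholder as written.
            String fallback = m.group(2) != null ? m.group(2) : m.group(0);
            String value = props.getProperty(m.group(1), fallback);
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties none = new Properties();
        System.out.println(substitute("${solr.data.dir:./solr/data}", none));
        // The nested form is truncated at the inner '}' and comes out garbled.
        System.out.println(substitute("${solr.data.dir:${solr.solr.home}/solr/data}", none));
    }
}
```

Supporting the nested form would need either a recursive parser that tracks brace depth or repeated passes over the output, both of which go beyond this kind of single regular expression.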


...but people can still choose to use something like...

data

...and have it mean "the data directory in my current working directory"



Of course, i may just be paranoid about breaking esoteric use cases.


It's a good point to consider.  Personally I'd never run with things
specified out of the current directory that way, so it's hard for me
to identify with the troubles this change would cause.  If folks were
using the example application as-is this change wouldn't affect them.


I'm ambivalent though - I'm happy to reverse the change to the  
example solrconfig.xml too, though I like that one can fire up the  
example configuration with a different data directory easily.


Erik



[jira] Commented: (SOLR-243) Create a hook to allow custome code to create custome index readers

2007-06-12 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503959
 ] 

Hoss Man commented on SOLR-243:
---

1) i'm sorry, i transposed the lines in my mind when i was reading the patch 
(you've made a private constructor public, not the other way around -- my 
mistake)

2) yes, you're using Config.findClass ... what yonik asked was if there was a 
particular reason not to use Config.newInstance(name) in the 
loadIndexReaderFactory ... there is a lot of duplicate code in that method 
(mainly exception handling) that Config.newInstance takes care of for you.

3) I think you're missing my point about indexDefaults and mainIndex ... it's 
not a matter of just picking one, it's making it work with both so that a 
factory can be specified in the defaults for use anytime an IndexReader is 
opened, or from mainIndex when the "main index" is opened.  I just poked 
around and found that the relevant class is "SolrIndexConfig"  ... my 
suggestion was that this be where the IndexReaderFactory hook lives so that it 
works the same way.
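
A rough sketch of the behaviour described in points 2 and 3: look for a factory class name in the section-specific settings first, fall back to the defaults, and instantiate it reflectively in one place. Everything here is hypothetical and for illustration only: the Reader stand-in (used so the sketch compiles without Lucene), the "indexReaderFactory" key, and the load method are assumptions, and this is not how Config.newInstance or SolrIndexConfig are implemented.

```java
import java.util.HashMap;
import java.util.Map;

final class ReaderFactoryLookup {
    /** Stand-in for Lucene's IndexReader so the sketch has no dependencies. */
    interface Reader { String describe(); }

    /** Same shape as the interface proposed in the issue description. */
    interface IndexReaderFactory { Reader newReader(String name, String path); }

    static final class DefaultFactory implements IndexReaderFactory {
        public Reader newReader(String name, String path) {
            return () -> "default reader for " + path;
        }
    }

    static final class CustomFactory implements IndexReaderFactory {
        public Reader newReader(String name, String path) {
            return () -> "custom reader for " + path;
        }
    }

    static final String KEY = "indexReaderFactory"; // hypothetical config key

    /** Section setting wins, then the defaults, then the built-in factory. */
    static IndexReaderFactory load(Map<String, String> section, Map<String, String> defaults)
            throws ReflectiveOperationException {
        String className = section.get(KEY);
        if (className == null) {
            className = defaults.get(KEY);
        }
        if (className == null) {
            return new DefaultFactory();
        }
        // All reflective error handling funnels through this single call site.
        return Class.forName(className)
                .asSubclass(IndexReaderFactory.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> defaults = new HashMap<>();
        defaults.put(KEY, DefaultFactory.class.getName());
        Map<String, String> mainIndex = new HashMap<>();
        mainIndex.put(KEY, CustomFactory.class.getName());
        System.out.println(load(mainIndex, defaults).newReader("main", "index").describe());
        System.out.println(load(new HashMap<>(), defaults).newReader("tmp", "index").describe());
    }
}
```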



I'm sorry if you feel like you are jumping through a lot of hoops ... it's not 
my intention to be difficult, i'm just making comments on the patch and asking 
general questions (not specifically directed at your patch) about how Solr as a 
project can best support the topic of this issue (hooks to allow custom code to 
create custom index readers).

If the patch you have works well for you that's great, but that doesn't mean it 
will work well for everyone, which is something committers have to keep in 
mind ... making public API changes (including new config syntax and especially 
new plugin hooks) is a serious change to the project and has to be considered 
very carefully because we have to be able to support it for a very very long 
time.



> Create a hook to allow custome code to create custome index readers
> ---
>
> Key: SOLR-243
> URL: https://issues.apache.org/jira/browse/SOLR-243
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
> Environment: Solr core
>Reporter: John Wang
> Fix For: 1.3
>
> Attachments: indexReaderFactory.patch, indexReaderFactory.patch
>
>
> I have a customized IndexReader and I want to write a Solr plugin to use my 
> derived IndexReader implementation. Currently IndexReader instantiation is 
> hard coded to be: 
> IndexReader.open(path)
> It would be really useful if this is done thru a plugable factory that can be 
> configured, e.g. IndexReaderFactory
> interface IndexReaderFactory{
>  IndexReader newReader(String name,String path);
> }
> the default implementation would just return: IndexReader.open(path)
> And in the newSearcher and getSearcher methods in SolrCore class can call the 
> current factory implementation to get the IndexReader instance and then build 
> the SolrIndexSearcher by passing in the reader.
> It would be really nice to add this improvement soon (This seems to be a 
> trivial addition) as our project really depends on this.
> Thanks
> -John




Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

2007-06-12 Thread Chris Hostetter

: Instead, maybe a relative path should be made relative to Solr's home
: directory instead of to the current working directory?

I'm not sure how i feel about that ... it would be one thing if Solr could
generate an error if the dataDir didn't exist, but since we create the
directory on the fly as needed, changing this behavior  is
relative working directory vs solr.home) could really confuse people.

Does the system property substitution stuff in the solrconfig deal with
substitutions that contain other substitutions?  if so then maybe we
should set the solr.solr.home system property if we see that solr home has
been specified with JNDI, and make the example solrconfig something
like...

${solr.data.dir:${solr.solr.home}/solr/data}

...but people can still choose to use something like...

data

...and have it mean "the data directory in my current working directory"


Of course, i may just be paranoid about breaking esoteric use cases.

-Hoss



Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

2007-06-12 Thread Erik Hatcher


On Jun 12, 2007, at 12:37 PM, Ryan McKinley wrote:


Erik Hatcher wrote:

On Jun 12, 2007, at 1:34 AM, Ryan McKinley wrote:

-  
+  ${solr.data.dir:./solr/data}
+



I just ran into something weird with this...

If you set the solr home with JNDI or solr.solr.home, the dataDir  
still defaults to "./solr/data" -- not the data directory  
relative to solr home.  This requires you to *also* set:  
"solr.data.dir" if you want to use a different data directory.


I think we should comment it out so that by default, setting
solr home moves everything.
Instead, maybe a relative path should be made relative to Solr's  
home directory instead of to the current working directory?


that sounds good.

It is nice to be able to set the data directory with a property...


Does this patch work for you, Ryan?


Index: src/java/org/apache/solr/core/SolrCore.java
===
--- src/java/org/apache/solr/core/SolrCore.java (revision 546568)
+++ src/java/org/apache/solr/core/SolrCore.java (working copy)
@@ -17,6 +17,8 @@
package org.apache.solr.core;
+import static org.apache.solr.core.Config.getInstanceDir;
+
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
@@ -73,8 +75,8 @@
   public static Logger log = Logger.getLogger(SolrCore.class.getName());

   private final IndexSchema schema;
-  private final String dataDir;
-  private final String index_path;
+  private final File dataDir;
+  private final File index_path;
   private final UpdateHandler updateHandler;
   private static final long startTime = System.currentTimeMillis();
   private final RequestHandlers reqHandlers = new RequestHandlers();
@@ -114,8 +116,8 @@
   }
   public IndexSchema getSchema() { return schema; }
-  public String getDataDir() { return dataDir; }
-  public String getIndexDir() { return index_path; }
+  public String getDataDir() { return dataDir.getAbsolutePath(); }
+  public String getIndexDir() { return index_path.getAbsolutePath(); }
   // gets a non-caching searcher
   public SolrIndexSearcher newSearcher(String name) throws IOException {

@@ -187,18 +189,19 @@
   core = this;   // set singleton
   if (dataDir ==null) {
-dataDir = SolrConfig.config.get("dataDir",Config.getInstanceDir()+"data");
+dataDir = SolrConfig.config.get("dataDir", getInstanceDir()+"data");

   }
-  log.info("Opening new SolrCore at " + Config.getInstanceDir() + ", dataDir="+dataDir);
+  log.info("Opening new SolrCore at " + getInstanceDir() + ", dataDir="+dataDir);

   if (schema==null) {
 schema = new IndexSchema("schema.xml");
   }
   this.schema = schema;
-  this.dataDir = dataDir;
-  this.index_path = dataDir + "/" + "index";
+  File dataDirTemp = new File(dataDir);
+  this.dataDir = dataDirTemp;
+  this.index_path = new File(this.dataDir,"index");
   this.maxWarmingSearchers = SolrConfig.config.getInt("query/maxWarmingSearchers",Integer.MAX_VALUE);

@@ -421,7 +424,7 @@
 // if this fails, we need to decrement onDeckSearchers again.
 SolrIndexSearcher tmp;
 try {
-  tmp = new SolrIndexSearcher(schema, "main", index_path, true);
+  tmp = new SolrIndexSearcher(schema, "main", getIndexDir(), true);
 } catch (Throwable th) {
   synchronized(searcherLock) {
 onDeckSearchers--;
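
The idea raised earlier in this thread, treating a relative dataDir as relative to Solr's home directory rather than the current working directory, could be sketched roughly as below. This is a hypothetical helper for illustration, not part of the patch above; the DataDirResolver class and resolve method are assumed names. It relies only on java.io.File, where a relative File with no parent is resolved against the JVM's working directory.

```java
import java.io.File;

final class DataDirResolver {
    /**
     * Returns dataDir unchanged when it is absolute; otherwise resolves it
     * against solrHome instead of the JVM's current working directory.
     */
    static File resolve(File solrHome, String dataDir) {
        File candidate = new File(dataDir);
        return candidate.isAbsolute() ? candidate : new File(solrHome, dataDir);
    }

    public static void main(String[] args) {
        File home = new File("/opt/solr"); // example location, not a Solr default
        System.out.println(resolve(home, "data"));
        System.out.println(resolve(home, new File("data").getAbsolutePath()));
    }
}
```

With a helper like this, someone who still wants a directory under the current working directory would have to spell out an absolute path, which is the compatibility concern raised in the replies.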


Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

2007-06-12 Thread Ryan McKinley

Erik Hatcher wrote:


On Jun 12, 2007, at 1:34 AM, Ryan McKinley wrote:

-  
+  ${solr.data.dir:./solr/data}
+



I just ran into something weird with this...

If you set the solr home with JNDI or solr.solr.home, the dataDir 
still defaults to "./solr/data" -- not the data directory relative to 
solr home.  This requires you to *also* set: "solr.data.dir" if you 
want to use a different data directory.


I think we should comment it out so that by default, setting solr 
home moves everything.


Instead, maybe a relative path should be made relative to Solr's home 
directory instead of to the current working directory?




that sounds good.

It is nice to be able to set the data directory with a property...



Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

2007-06-12 Thread Erik Hatcher


On Jun 12, 2007, at 1:34 AM, Ryan McKinley wrote:

-  
+  ${solr.data.dir:./solr/data}
+



I just ran into something weird with this...

If you set the solr home with JNDI or solr.solr.home, the dataDir  
still defaults to "./solr/data" -- not the data directory relative  
to solr home.  This requires you to *also* set: "solr.data.dir" if  
you want to use a different data directory.


I think we should comment it out so that by default, setting
solr home moves everything.


Instead, maybe a relative path should be made relative to Solr's home  
directory instead of to the current working directory?


Erik