[jira] Commented: (SOLR-193) General SolrDocument interface to manage field values.
[ https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504132 ]

Hoss Man commented on SOLR-193:
---

these comments are very haphazard, and in the best order i can think of (not the order i wrote them)

> Perhaps it would be better to leave out the edge cases and just focus on the
> SolrDocument

...i don't mind big patches that have a lot of things ... it's just weird when there is a big patch with a lot of stuff and it's not clear what it's for :) ... i was mainly looking for someplace where an UpdateHandler was making a SolrDocument and then calling build on it.

> Is the only difference between the input Document and output Document that it
> has boosts?

there is some more complexity in Lucene docs because of things like the Fieldable options, but i don't think those really impact a SolrDocument API since that info is abstracted into the schema and can't be set on a per document basis.

> Should we have:
> SolrDocument
> + BoostedSolrDocument

BoostedSolrDocument seems too specific to the methods added, and not to the purpose of the class ... i would call it a "SolrInputDocument" (IndexSolrDocument is too vague since the term "index" is used so much in the code base)

The basic structure in the new patch looks fine by the way, no real concerns from me once the comments are cleaned up (one question: should SolrDocument implement Map> ??)

> This is for SOLR-139. to 'modify' a document, you load the existing Document
> - change it - then store it back.
>
> These two functions can happily live in a new class, and could be attached to
> SOLR-139.

...oh, right, i forgot about the "update in place" patch. yeah, i don't think those methods should live in DocumentBuilder (am i alone in thinking DocumentBuilder should probably be deprecated completely once this stuff is committed? ... or ... hmmm ... it could probably be subclassed by one that supports adding a whole SolrInputDocument at once, or one that can start with an older Document and update it with a new SolrInputDocument ... but we can worry about that later)

"updating" is a direct example of the type of thing i referred to in LUCENE-778 about why a single Lucene Document class is bad. to support updating you should have an explicit means of composing the output class into the input class ... but in that case you're dealing directly with Lucene Documents -- i can understand why we would need to turn a Lucene document into a SolrInputDocument ... but i don't think we really need to worry about the SolrDocument => SolrInputDocument case, right?

> General SolrDocument interface to manage field values.
> ------------------------------------------------------
>
>         Key: SOLR-193
>         URL: https://issues.apache.org/jira/browse/SOLR-193
>     Project: Solr
>  Issue Type: New Feature
>    Reporter: Ryan McKinley
> Attachments: SOLR-193-SimpleSolrDocument.patch,
>              SOLR-193-SolrDocument.patch, SOLR-193-SolrDocument.patch,
>              SOLR-193-SolrDocument.patch, SOLR-193-SolrDocument.patch,
>              SOLR-193-SolrDocument.patch, SOLR-193-SolrDocument.patch
>
> In an effort to make SOLR-139 (the "modify" command) more manageable, i
> extracted out a large chunk. This patch adds a general SolrDocument
> interface and includes a concrete implementation (SimpleSolrDoc)
>
> SOLR-139 needs some way to transport document values independent of the
> lucene Document. This is required for the INCREMENT command and useful for
> modifying documents. SolrDocument is also generally useful for SOLR-20
> - - - - - -
> The one (potentially) controversial part is that I added a function to
> FieldType:
>
>     public Object toExternalValue(Fieldable f);
>
> This asks each field type to convert its Fieldable into its real type, for
> example IntField.java has:
>
>     public Integer toExternalValue(Fieldable f) {
>       return Integer.valueOf( toExternal(f) );
>     }
>
> By default, it returns a string value. If this addition is too much, there
> are other (less clean) ways to handle the INCREMENT command. My real
> motivation for this addition is that it makes it possible to implement an
> embeddable SOLR-20 client that does not need an HTTP connection.
> - - - -
> The SimpleSolrDoc implementation was written for SOLR-20. It needs to play
> nice with EL, so it implements a few extra map functions that may not seem
> necessary:
>
>     ${doc.values['name']} gets a collection
>     ${doc.valueMap['name']} gets a single value for the field
> - - - -
> The tests cover all "toExternalValue" changes in schema.*
> SimpleSolrDoc and DocumentBuilder have 100% test coverage.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
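The split discussed in this comment, a plain document type for search results plus an indexing-side subclass that alone carries boosts, could look roughly like the following. This is only a sketch under assumptions: the class names `SketchSolrDocument` and `SketchSolrInputDocument` and every method on them are hypothetical and are not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the classes from the SOLR-193 patches.
// Base type: field names mapped to values, with no notion of boosts.
class SketchSolrDocument {
    // Field name -> values, kept in insertion order.
    protected final Map<String, List<Object>> fields = new LinkedHashMap<>();

    void addField(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // All values for a field, or an empty collection if the field is absent.
    Collection<Object> getFieldValues(String name) {
        List<Object> values = fields.get(name);
        if (values == null) {
            return Collections.<Object>emptyList();
        }
        return Collections.unmodifiableList(values);
    }

    // First value for a field, or null if the field is absent.
    Object getFirstValue(String name) {
        List<Object> values = fields.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }
}

// Indexing-side subclass: the only place boosts are modeled.
class SketchSolrInputDocument extends SketchSolrDocument {
    private final Map<String, Float> fieldBoosts = new LinkedHashMap<>();

    void setBoost(String name, float boost) {
        fieldBoosts.put(name, boost);
    }

    // Fields without an explicit boost report the neutral value 1.0.
    float getBoost(String name) {
        return fieldBoosts.getOrDefault(name, 1.0f);
    }
}
```

With this shape, code that reads search results only ever sees `SketchSolrDocument` and cannot ask for a boost that does not exist, which is the confusion the comment attributes to the single Lucene Document class.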
[jira] Updated: (SOLR-193) General SolrDocument interface to manage field values.
[ https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-193:
---
Attachment: SOLR-193-SimpleSolrDocument.patch

Here is a much, much smaller patch that only adds the SolrDocument *class* and the BoostableSolrDocument subclass. We can work through the other bits later, but this would be sufficient for SOLR-20.

It is a quick Eclipse refactoring, so the comments may not make sense. I'll check that over in better detail after you all get a chance to look at it...
[jira] Commented: (SOLR-193) General SolrDocument interface to manage field values.
[ https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504100 ]

Yonik Seeley commented on SOLR-193:
---

> This sounds fine. We should *definitely* solve any known problems with the
> Lucene document interface.
> Just using an interface (rather than a concrete class) will be a huge help.

I know this runs contrary to common java OO wisdom, but interfaces can really suck. They don't hurt the *consumer* of a class, but cause major headaches for the *provider*, trying to evolve an interface and still provide backward compatibility (it's pretty much impossible).

In Lucene, where we have had a class (like Analyzer), it was trivial adding new functionality like getPositionIncrement(). If it had been an interface, it would have been impossible without breaking all the custom analyzers out there. Where we have had interfaces, and added a new method, we simply broke some people's code.

So if it's something that a customer might possibly subclass, a class used as an interface is a much better option. If it's internal, or package protected, or something where you *really* need multiple inheritance, then an interface is fine.
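The backward-compatibility argument above can be illustrated with a small sketch. The names here (`SketchAnalyzer`, `positionGap`, `WhitespaceSketchAnalyzer`) are hypothetical stand-ins chosen for illustration, not Lucene classes or methods: the provider adds a new method with a default body to an abstract class, and a subclass written before that change keeps compiling and working.

```java
// Hypothetical stand-in for a provider-owned extension point.
// It is NOT Lucene's Analyzer; the method names are assumptions.
abstract class SketchAnalyzer {
    // Part of the original contract that every subclass implements.
    abstract String[] tokenize(String text);

    // Added in a "later release". Because this is a class and the method has
    // a body, subclasses written earlier inherit it without any change.
    int positionGap(String fieldName) {
        return 0;
    }
}

// A "custom" subclass written before positionGap existed.
class WhitespaceSketchAnalyzer extends SketchAnalyzer {
    @Override
    String[] tokenize(String text) {
        // Split on runs of whitespace after trimming the ends.
        return text.trim().split("\\s+");
    }
}
```

Had `SketchAnalyzer` been an interface in the Java of that era, adding `positionGap` would have forced every implementor to add the method before it compiled again.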
[jira] Commented: (SOLR-193) General SolrDocument interface to manage field values.
[ https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504095 ]

Ryan McKinley commented on SOLR-193:
---

For background: this class has functionality used for other issues, including SOLR-104 and SOLR-139. For a while i tried keeping the functionality in different patches, but it became too much of a nightmare to maintain.

Perhaps it would be better to leave out the edge cases and just focus on the SolrDocument interface now...

> what is setDistinctByDefault, or setDistinctOrderMatters ?

These options let you say if the field values should be backed by a Map or a List; DistinctOrderMatters says if it should be a Map or a LinkedHashMap.

These were useful for SOLR-104, when you SQL join a table and may get duplicate rows but only want to keep the distinct values in the fields.

Now that you point it out (and there is a good chance it will be in trunk soon), it would make more sense to implement these features as different subclasses of SimpleSolrDocument.

> Also, what is the purpose/use of DocumentBuilder.build and
> DocumentBuilder.loadStoredFields

This is for SOLR-139. To 'modify' a document, you load the existing Document, change it, then store it back.

These two functions can happily live in a new class, and could be attached to SOLR-139.

> 2) I thought the SolrDocument API was for incoming documents ...

I hope it is also useful for modifying existing Documents and transforming incoming/outgoing documents (but I'll raise that issue later ;)

> I think it's a mistake to try and have one single Interface for all three.
> ... At the very least there should be a separate API for the indexing side
> and the query side (because of the boost issue) which can be
> subclass/superclass relationships.

This sounds fine. We should *definitely* solve any known problems with the Lucene document interface. Just using an interface (rather than a concrete class) will be a huge help.

Is the only difference between the input Document and output Document that it has boosts?

Should we have:
  SolrDocument
  + BoostedSolrDocument
or
  SolrDocument
  + IndexSolrDocument

Any thoughts on the common use case where I want to pull a document out of the index (no boosts), change it, then put it back? Do i need to make a new class and copy all the fields?

Should SOLR-20 be able to index a SolrDocument (no boosts) as well as a BoostedSolrDocument? I think so...

Thanks for looking at this!
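The two options described above, whether a field keeps only distinct values and whether insertion order matters, amount to choosing the collection that backs each field. The sketch below is an assumption about how that choice could be expressed. It uses `ArrayList`, `HashSet`, and `LinkedHashSet` where the comment speaks of a List, Map, and LinkedHashMap, and the helper name `FieldValueCollections.newValues` is hypothetical rather than something from the patch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;

// Hypothetical helper, not part of the SOLR-193 patch.
class FieldValueCollections {
    // Picks the backing collection for one field's values.
    static Collection<Object> newValues(boolean distinct, boolean orderMatters) {
        if (!distinct) {
            // Duplicates allowed; insertion order is always kept.
            return new ArrayList<Object>();
        }
        if (orderMatters) {
            // Distinct values in first-seen order.
            return new LinkedHashSet<Object>();
        }
        // Distinct values, iteration order unspecified.
        return new HashSet<Object>();
    }
}
```

For the SQL join case mentioned above, a field created with `newValues(true, true)` would silently drop the repeated values that duplicate rows produce while keeping the first-seen order.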
[jira] Commented: (SOLR-135) Restructure / Refactor codebase for shared libraries
[ https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504068 ]

Hoss Man commented on SOLR-135:
---

>> yes, I like separate package names better but i'm worried about the
>> impact on dependent code.
>> ...
>> Are you suggesting its ok to move XML.java and SolrException.java to
>> o.a.s.common? That seems kinda extreme
>> for anyone using the classes...

I'm not sure. I think we've been talking for a long time about refactoring some of the classes into different packages, which really only affects their organization when developers look at them -- if we are now also looking at reorganizing them into jars, and ensuring that certain subsets can be compiled into their own jar with no dependencies on files not in that jar -- then i think we might as well do both at once.

Like I said, I could probably be convinced that this isn't that important, and we should continue using the same package names in a new src/common directory -- so perhaps a better question to ask is: "do we want to rework the packages too?"

Most of the classes you listed seem like perfect candidates for a new "common" package (or at the very least o.a.s.common.util, o.a.s.common.params), but i have to admit i hadn't really considered SolrException ... on one hand it's used so pervasively it should be considered "common" (not including it would mean changing a *lot* of APIs of things we want to be able to include in the common jar); on the other hand it does have very HTTP specific error codes in it.

Just spit balling here... what if o.a.s.common.SolrException was a base class with no error codes (it looks like all of the "Common" classes just use "BAD_REQUEST" at this point, so refactoring it out would be clean, and the http codes don't make sense in a 'common' context anyway) and o.a.s.util.SolrException a real (non deprecated) subclass that adds the ErrorCodes ... anyone catching util.SolrException is golden, anyone catching common.SolrException can either infer an ErrorCode from context, or assume BAD_REQUEST (a static utility in util.SolrException could make this easy by wrapping the common.SolrException in a util.SolrException).

ugh.

> Restructure / Refactor codebase for shared libraries
> ----------------------------------------------------
>
>         Key: SOLR-135
>         URL: https://issues.apache.org/jira/browse/SOLR-135
>     Project: Solr
>  Issue Type: Wish
>    Reporter: Ryan McKinley
>    Priority: Minor
> Attachments: SOLR-135-RestructureForCommonJar.patch,
>              SOLR-135-RestructureForCommonJar.patch,
>              SOLR-135-RestructureForCommonJar.patch
>
> For SOLR-20 and other java projects, it would be nice to have common code
> share a codebase that does not require lucene or junit to compile.
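The exception split floated above could be sketched as follows. Everything here is an assumption made for illustration: `CommonSolrException` stands in for the proposed o.a.s.common.SolrException, `UtilSolrException` for the proposed o.a.s.util.SolrException, and the `wrap` helper and the two error codes shown are hypothetical rather than copied from Solr.

```java
// Stand-in for the proposed o.a.s.common.SolrException: no HTTP error codes.
class CommonSolrException extends RuntimeException {
    CommonSolrException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Stand-in for the proposed o.a.s.util.SolrException: adds the error codes.
class UtilSolrException extends CommonSolrException {
    // Only two illustrative codes; the numeric values are HTTP status codes.
    enum ErrorCode {
        BAD_REQUEST(400),
        SERVER_ERROR(500);

        final int code;

        ErrorCode(int code) {
            this.code = code;
        }
    }

    private final ErrorCode errorCode;

    UtilSolrException(ErrorCode errorCode, String message, Throwable cause) {
        super(message, cause);
        this.errorCode = errorCode;
    }

    ErrorCode getErrorCode() {
        return errorCode;
    }

    // Static utility: gives any common exception an error code.
    // Exceptions that already carry a code are returned unchanged;
    // anything else is assumed to be a BAD_REQUEST.
    static UtilSolrException wrap(CommonSolrException e) {
        if (e instanceof UtilSolrException) {
            return (UtilSolrException) e;
        }
        return new UtilSolrException(ErrorCode.BAD_REQUEST, e.getMessage(), e);
    }
}
```

A request handler that needs a status code could then catch `CommonSolrException`, call `UtilSolrException.wrap(e)`, and read `getErrorCode()` without caring which of the two types was originally thrown.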
[jira] Commented: (SOLR-193) General SolrDocument interface to manage field values.
[ https://issues.apache.org/jira/browse/SOLR-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504052 ]

Hoss Man commented on SOLR-193:
---

i'm not sure that i understand a lot of what's going on here ... the basic API for SolrDocument makes sense to me, but i'm not sure that i understand some of the methods in SimpleSolrDoc ... what is setDistinctByDefault, or setDistinctOrderMatters ?

Also, what is the purpose/use of DocumentBuilder.build and DocumentBuilder.loadStoredFields ... neither seems to be used anywhere ... if they are not intended for use by existing clients of DocumentBuilder, but by new client code not yet written that won't care about any of the existing stateful methods in DocumentBuilder, perhaps they (the two new methods) should live in a separate class?

The spirit of DocumentBuilder.build makes sense to me in the context of the issue title -- but loadStoredFields on the other hand really doesn't make sense to me at all...

1) DocumentBuilder is only involved in building Lucene Document objects to index ... so why have a method in it for converting from a Lucene Document object to something else?

2) I thought the SolrDocument API was for incoming documents ... why a method for adding values to it from an existing Lucene Document, or special logic for looking at stored fields?

3) if the goal is for SolrDocument to be general enough to handle pre-indexing or post-searching Document representation, then we should not attempt to model boosts in it ... those should only live in a subclass used for indexing purposes (Lucene made this mistake early on, and it's caused countless amounts of confusion to this date) ... loadStoredFields seems to suffer from this confusion by trying to access the field boosts of a Lucene Document that (appears to be) the result of a search -- they don't exist in those instances of Lucene Documents.

Hmmm... rereading the issue summary and the comments about playing nice with EL, i see the goal is for a generic representation both in a java client sending docs to and reading docs back from Solr, as well as internally within Solr (or embedded Solr contexts) ...

I think it's a mistake to try and have one single Interface for all three. At the very least there should be a separate API for the indexing side and the query side (because of the boost issue) which can be subclass/superclass relationships.

I ranted about this in a related Lucene Jira issue (note also the email discussion linked to from one of my comments in that issue) ...

https://issues.apache.org/jira/browse/LUCENE-778
[jira] Commented: (SOLR-135) Restructure / Refactor codebase for shared libraries
[ https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504039 ]

Ryan McKinley commented on SOLR-135:
---

As a note to anyone not looking at the patch... this would not break API compatibility, but it would add a lot of empty classes that look like:

    @Deprecated
    public class XML extends org.apache.solr.common.XML {
      // don't use this class!
    }
[jira] Commented: (SOLR-135) Restructure / Refactor codebase for shared libraries
[ https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504038 ]

Ryan McKinley commented on SOLR-135:
---

> it would be easy to move this package to live in src/common if people think
> there is a need, my main concern is just that we shouldn't have
> "org.apache.solr.util" living in two places (src/java and src/common)

yes, I like separate package names better but i'm worried about the impact on dependent code.

The classes needed for SOLR-20 are:
http://solrstuff.org/svn/solrj/src/org/apache/solr/util/
http://solrstuff.org/svn/solrj/src/org/apache/solr/request/
http://solrstuff.org/svn/solrj/src/org/apache/solr/core/

Are you suggesting it's ok to move XML.java and SolrException.java to o.a.s.common? That seems kinda extreme for anyone using the classes... If it is ok, i'm all for it... if not, I think we should make the 'common' package and put anything new in there, adding comments to the classes that should be moved in the future.
[jira] Updated: (SOLR-135) Restructure / Refactor codebase for shared libraries
[ https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-135:
---
Attachment: SOLR-135-RestructureForCommonJar.patch

as far as i can tell the manifest merging that the ant docs describe for the task just flat out doesn't work, so we just won't use the new macro for the post.jar

NOTE: with this patch, the intent is to svn copy XML.java to the new common dir, then patch the existing file to purge its body and add the deprecated messages.

as i mentioned before, this approach doesn't use src/common at all ... it assumes a new "org.apache.solr.common" package in src/java and uses include/exclude rules to make sure things in that package live in the common jar (and compile first)

it would be easy to move this package to live in src/common if people think there is a need, my main concern is just that we shouldn't have "org.apache.solr.util" living in two places (src/java and src/common)
[jira] Updated: (SOLR-135) Restructure / Refactor codebase for shared libraries
[ https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-135:
---
Attachment: SOLR-135-RestructureForCommonJar.patch

here's what i've got so far ... it occurred to me that if we use separate package names, we don't actually need to separate the code out, we can do it all with exclude/include directives.

this has one small glitch at the moment: post.jar isn't getting its main-class set properly ... might be a mistake i made, or it might be a defect in the manifest merging ant is supposed to do ... i'll check it out later (this isn't a big deal though, post.jar has never really had a good manifest file, i was just trying to clean that up when i added the macro)
Re: [jira] Commented: (SOLR-236) Field collapsing
On 12-Jun-07, at 2:36 PM, Yonik Seeley wrote:

> On 6/12/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>> The way I do field collapsing is simply gathering documents and
>> collapsing them until I've gathered X groups for user display (which
>> usually involves looking at a few tens of documents more, rather than
>> the entire 3,000,000+ result set).
>
> Isn't this then dependent on the order of the documents in the index?
> Or it sounds like you don't "promote" lower scoring documents into a
> higher scoring group unless they both happen to be in the top docs
> requested?

Precisely. I don't care how many docs are in a group, just avoiding displaying two documents in the same group. That way you can process the docs in score order for essentially zero cost.

-Mike
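The approach Mike describes, walking the hits in score order and keeping the first document of each group until enough groups are collected, can be sketched in a few lines. The class and method names are hypothetical, and the input is simplified to a list of group keys that is already sorted by descending score. This is not the SOLR-236 patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "collapse while gathering"; not the SOLR-236 patch.
class ScoreOrderCollapser {
    // groupKeysInScoreOrder.get(i) is the collapse-field value of the i-th
    // best-scoring hit. Returns the positions of the hits to display: the
    // first (highest scoring) hit of each group, up to maxGroups groups.
    static List<Integer> collapse(List<String> groupKeysInScoreOrder, int maxGroups) {
        Set<String> seenGroups = new HashSet<>();
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < groupKeysInScoreOrder.size() && picked.size() < maxGroups; i++) {
            // Set.add returns false when the group was already displayed.
            if (seenGroups.add(groupKeysInScoreOrder.get(i))) {
                picked.add(i);
            }
        }
        return picked;
    }
}
```

The loop stops as soon as `maxGroups` groups are found, so it typically looks at only a few more hits than it displays. The trade-off is the one Yonik raises: group sizes are unknown, and a group is represented only by whichever of its documents happens to score highest.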
Re: [jira] Commented: (SOLR-236) Field collapsing
On 6/12/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

> The way I do field collapsing is simply gathering documents and
> collapsing them until I've gathered X groups for user display (which
> usually involves looking at a few tens of documents more, rather than
> the entire 3,000,000+ result set).

Isn't this then dependent on the order of the documents in the index? Or it sounds like you don't "promote" lower scoring documents into a higher scoring group unless they both happen to be in the top docs requested?

-Yonik
Re: [jira] Commented: (SOLR-236) Field collapsing
On 11-Jun-07, at 5:48 PM, Chris Hostetter wrote:

> : Yes, the current JIRA patch uses the FieldCache.
>
> I just meant in contrast with Mike's comment about iterating over all
> the stored fields to support the "post-faceting" situation (but frankly
> i'm not sure that i understand what the "post-faceting" situation is,
> so feel free to ignore me)

I'm not sure either--I assume that it means facet on a DocSet that is limited to the representative doc in each collapsed group. Or is it faceting within each group? If so, then all documents in the result set need to be collapsed to determine this list of docs (which perhaps is not too inefficient?).

The way I do field collapsing is simply gathering documents and collapsing them until I've gathered X groups for user display (which usually involves looking at a few tens of documents more, rather than the entire 3,000,000+ result set).

I'm going to bow out now, as I don't think I understand what exactly we're talking about

-Mike
[jira] Commented: (SOLR-135) Restructure / Refactor codebase for shared libraries
[ https://issues.apache.org/jira/browse/SOLR-135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504009 ]

Hoss Man commented on SOLR-135:
---

on the topic of adding a src/common directory ... i think in the long run we'll be happier if there is no overlap in the java package names that live in this directory and the ones that live in src/java (much the way the only java packages in src/webapp are o.a.s.servlet) ... so using src/common/org/apache/solr/common/XML.java may be a better way to go (even though it means we would need to leave a deprecated src/java/org/apache/solr/util/XML.java subclassing it in src/java)

I could probably be convinced that this isn't that important, but i've definitely found it confusing for people that some of the lucene-java contribs reuse the same package names as the core classes in some cases.

on the subject of the build.xml ... now that we've got three instances of and two of we probably want to make some macros for them to reduce redundancy.

Gimme 30 minutes to see if i can whip up a derivative patch ... if i don't attach one it means i got sidetracked with something else.
[jira] Updated: (SOLR-243) Create a hook to allow custome code to create custome index readers
[ https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated SOLR-243: --- Attachment: indexReaderFactory.patch My apologies for not being patient with this process. I have made the requested changes and submitted another patch. Please let me know if these are the correct things to do. Thanks -John > Create a hook to allow custome code to create custome index readers > --- > > Key: SOLR-243 > URL: https://issues.apache.org/jira/browse/SOLR-243 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.3 > Environment: Solr core >Reporter: John Wang > Fix For: 1.3 > > Attachments: indexReaderFactory.patch, indexReaderFactory.patch, > indexReaderFactory.patch > > > I have a customized IndexReader and I want to write a Solr plugin to use my > derived IndexReader implementation. Currently IndexReader instantiation is > hard coded to be: > IndexReader.open(path) > It would be really useful if this is done thru a plugable factory that can be > configured, e.g. IndexReaderFactory > interface IndexReaderFactory{ > IndexReader newReader(String name,String path); > } > the default implementation would just return: IndexReader.open(path) > And in the newSearcher and getSearcher methods in SolrCore class can call the > current factory implementation to get the IndexReader instance and then build > the SolrIndexSearcher by passing in the reader. > It would be really nice to add this improvement soon (This seems to be a > trivial addition) as our project really depends on this. > Thanks > -John
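For readers unfamiliar with the pattern the issue asks for, here is a minimal sketch of a pluggable factory that falls back to a default when nothing is configured. It is an assumption-laden illustration, not the attached patch: `ReaderFactory`, `ReaderFactoryLoader`, and `PathEchoFactory` are hypothetical names, and the type parameter `R` stands in for Lucene's IndexReader so the example compiles without Lucene on the classpath.

```java
/** Hypothetical sketch; R stands in for Lucene's IndexReader. */
interface ReaderFactory<R> {
    R newReader(String name, String path);
}

/** Hypothetical sample plugin that only echoes its arguments. */
final class PathEchoFactory implements ReaderFactory<String> {
    @Override
    public String newReader(String name, String path) {
        return "custom:" + name + ":" + path;
    }
}

/** Hypothetical loader: picks the configured factory class, else the default. */
final class ReaderFactoryLoader {

    static <R> ReaderFactory<R> load(String className, ReaderFactory<R> defaultFactory) {
        if (className == null || className.isEmpty()) {
            return defaultFactory; // nothing configured: keep the built-in behavior
        }
        try {
            Object candidate = Class.forName(className).getDeclaredConstructor().newInstance();
            if (!(candidate instanceof ReaderFactory)) {
                throw new IllegalStateException(className + " is not a ReaderFactory");
            }
            @SuppressWarnings("unchecked")
            ReaderFactory<R> factory = (ReaderFactory<R>) candidate;
            return factory;
        } catch (ReflectiveOperationException e) {
            // One place for all reflection failures, so callers do not repeat this handling.
            throw new IllegalStateException("Cannot instantiate reader factory " + className, e);
        }
    }
}
```

In the proposal, the default factory would simply wrap IndexReader.open(path); the loader shape above is just one way the "configured class name, else default" decision could be kept in a single method.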
Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
: In this case, relative to solr.home makes the most sense. I like : -Dsolr.data.dir=XXX out of the box, but enabling it explicitly isn't hard... Yeah, I'm probably just being overly paranoid. Making dataDir be relative to Solr Home is probably the way it should have worked all along ... so as long as it's heavily documented in CHANGES.txt i think we'll be fine. i suspect if anyone was specifying a solrhome *and* specifying a dataDir we would have gotten a question about dataDir not being relative to solrhome. (although maybe we will now that 1.2 has the block uncommented) We could just add an optional rel="cwd" (vs rel="solr") attribute to the tag and make it really explicit. -Hoss
Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
I'm ambivalent though - I'm happy to reverse the change to the example solrconfig.xml too, though I like that one can fire up the example configuration with a different data directory easily. I am happy either way - relative to solr home or just commented out. In this case, relative to solr.home makes the most sense. I like -Dsolr.data.dir=XXX out of the box, but enabling it explicitly isn't hard...
Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
On Jun 12, 2007, at 1:57 PM, Chris Hostetter wrote: : Instead, maybe a relative path should be made relative to Solr's home : directory instead of to the current working directory? I'm not sure how i feel about that ... it would be one thing if Solr could generate an error if the dataDir didn't exist, but since we create the directory on the fly as needed, changing this behavior (relative to the working directory vs solr.home) could really confuse people. Does the system property substitution stuff in the solrconfig deal with substitutions that contain other substitutions? if so then maybe we should set the solr.solr.home system property if we see that solr home has been specified with JNDI, and make the example solrconfig something like... ${solr.data.dir:${solr.solr.home}/solr/data} No, it does not support that type of substitution. ...but people can still choose to use something like... data ...and have it mean "the data directory in my current working directory" Of course, i may just be paranoid about breaking esoteric use cases. It's a good point to consider. Personally I'd never run with things specified out of the current directory that way, so it's hard for me to identify with the troubles this change would make. If folks were using the example application as-is this change wouldn't affect them. I'm ambivalent though - I'm happy to reverse the change to the example solrconfig.xml too, though I like that one can fire up the example configuration with a different data directory easily. Erik
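Since the substitution code at the time did not handle a default that itself contains a substitution, the sketch below shows what supporting ${solr.data.dir:${solr.solr.home}/solr/data} could involve. It is a hypothetical resolver written for illustration (`NestedProps` and `expand` are made-up names, not Solr's substitution code), and it assumes property names never contain a colon or a nested expression.

```java
import java.util.Map;

/** Hypothetical resolver; per the thread, Solr's own code did not support nesting. */
final class NestedProps {

    /** Expands every ${name} or ${name:default} in text, recursing into defaults. */
    static String expand(String text, Map<String, String> props) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int start = text.indexOf("${", i);
            if (start < 0) {
                out.append(text, i, text.length());
                break;
            }
            out.append(text, i, start);
            int end = matchingBrace(text, start + 2);
            if (end < 0) {
                out.append(text, start, text.length()); // unterminated: copy through unchanged
                break;
            }
            String body = text.substring(start + 2, end);
            int colon = body.indexOf(':'); // first colon separates the name from its default
            String name = colon < 0 ? body : body.substring(0, colon);
            String value = props.get(name);
            if (value != null) {
                out.append(value);
            } else if (colon >= 0) {
                out.append(expand(body.substring(colon + 1), props)); // default may nest
            } else {
                throw new IllegalArgumentException("No value or default for property: " + name);
            }
            i = end + 1;
        }
        return out.toString();
    }

    /** Index of the '}' closing the "${" whose body starts at from, or -1 if none. */
    private static int matchingBrace(String s, int from) {
        int depth = 1;
        for (int i = from; i < s.length(); i++) {
            if (s.startsWith("${", i)) {
                depth++;
                i++; // skip the '{' of the nested opener
            } else if (s.charAt(i) == '}' && --depth == 0) {
                return i;
            }
        }
        return -1;
    }
}
```

The brace-depth counter is the piece a flat "find the next }" scanner lacks; without it the outer expression would be cut off at the first closing brace of the nested one.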
[jira] Commented: (SOLR-243) Create a hook to allow custome code to create custome index readers
[ https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503959 ] Hoss Man commented on SOLR-243: --- 1) i'm sorry, i transposed the lines in my mind when i was reading the patch (you've made a private constructor public, not the other way around -- my mistake) 2) yes, you're using Config.findClass ... what yonik asked was if there was a particular reason not to use Config.newInstance(name) in the loadIndexReaderFactory ... there is a lot of duplicate code in that method (mainly exception handling) that Config.newInstance takes care of for you. 3) I think you're missing my point about indexDefaults and mainIndex ... it's not a matter of just picking one, it's making it work with both so that a factory can be specified in the defaults for use anytime an IndexReader is opened, or from mainIndex when the "main index" is opened. I just poked around and found that the relevant class is "SolrIndexConfig" ... my suggestion was that this be where the IndexReaderFactory hook be so that it works the same way. I'm sorry if you feel like you are jumping through a lot of hoops ... it's not my intention to be difficult, i'm just making comments on the patch and asking general questions (not specifically directed at your patch) about how Solr as a project can best support the topic of this issue (hooks to allow custom code to create custom index readers). If the patch you have works well for you that's great, but that doesn't mean it will work well for everyone, which is something committers have to keep in mind ... making public API changes (including new config syntax and especially new plugin hooks) is a serious change to the project and has to be considered very carefully because we have to be able to support it for a very very long time. 
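Point 3 amounts to a precedence rule: a value set under mainIndex wins when the main index is opened, otherwise the value from indexDefaults applies. A minimal sketch of that rule follows, with the two config sections modeled as plain maps. `IndexSettingLookup`, `lookup`, and the map-based representation are assumptions made for illustration; this is not how SolrIndexConfig is actually written.

```java
import java.util.Map;

/** Hypothetical lookup; the maps stand in for the mainIndex and indexDefaults config sections. */
final class IndexSettingLookup {

    /** Returns the mainIndex value if present, else the indexDefaults value, else the fallback. */
    static String lookup(String key,
                         Map<String, String> mainIndex,
                         Map<String, String> indexDefaults,
                         String fallback) {
        String value = mainIndex.get(key);
        if (value == null) {
            value = indexDefaults.get(key); // not overridden for the main index
        }
        return value != null ? value : fallback;
    }
}
```

A reader opened for anything other than the main index would consult only the defaults section, which is why the hook needs to work from both places rather than from one chosen arbitrarily.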
> Create a hook to allow custome code to create custome index readers > --- > > Key: SOLR-243 > URL: https://issues.apache.org/jira/browse/SOLR-243 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 1.3 > Environment: Solr core >Reporter: John Wang > Fix For: 1.3 > > Attachments: indexReaderFactory.patch, indexReaderFactory.patch > > > I have a customized IndexReader and I want to write a Solr plugin to use my > derived IndexReader implementation. Currently IndexReader instantiation is > hard coded to be: > IndexReader.open(path) > It would be really useful if this is done thru a plugable factory that can be > configured, e.g. IndexReaderFactory > interface IndexReaderFactory{ > IndexReader newReader(String name,String path); > } > the default implementation would just return: IndexReader.open(path) > And in the newSearcher and getSearcher methods in SolrCore class can call the > current factory implementation to get the IndexReader instance and then build > the SolrIndexSearcher by passing in the reader. > It would be really nice to add this improvement soon (This seems to be a > trivial addition) as our project really depends on this. > Thanks > -John
Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
: Instead, maybe a relative path should be made relative to Solr's home : directory instead of to the current working directory? I'm not sure how i feel about that ... it would be one thing if Solr could generate an error if the dataDir didn't exist, but since we create the directory on the fly as needed, changing this behavior (relative to the working directory vs solr.home) could really confuse people. Does the system property substitution stuff in the solrconfig deal with substitutions that contain other substitutions? if so then maybe we should set the solr.solr.home system property if we see that solr home has been specified with JNDI, and make the example solrconfig something like... ${solr.data.dir:${solr.solr.home}/solr/data} ...but people can still choose to use something like... data ...and have it mean "the data directory in my current working directory" Of course, i may just be paranoid about breaking esoteric use cases. -Hoss
Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
On Jun 12, 2007, at 12:37 PM, Ryan McKinley wrote:
Erik Hatcher wrote:
On Jun 12, 2007, at 1:34 AM, Ryan McKinley wrote:
- + ${solr.data.dir:./solr/data} +
I just ran into something weird with this... If you set the solr home with JNDI or solr.solr.home, the dataDir still defaults to "./solr/data" -- not the data directory relative to solr home. This requires you to *also* set: "solr.data.dir" if you want to use a different data directory. I think we should comment it out so that, by default, setting solr home moves everything.
Instead, maybe a relative path should be made relative to Solr's home directory instead of to the current working directory?
that sounds good. It is nice to be able to set the data directory with a property...
Does this patch work for you, Ryan?
Index: src/java/org/apache/solr/core/SolrCore.java
===
--- src/java/org/apache/solr/core/SolrCore.java (revision 546568)
+++ src/java/org/apache/solr/core/SolrCore.java (working copy)
@@ -17,6 +17,8 @@
 package org.apache.solr.core;
+import static org.apache.solr.core.Config.getInstanceDir;
+
 import java.io.File;
 import java.io.IOException;
 import java.util.ArrayList;
@@ -73,8 +75,8 @@
   public static Logger log = Logger.getLogger(SolrCore.class.getName());
   private final IndexSchema schema;
-  private final String dataDir;
-  private final String index_path;
+  private final File dataDir;
+  private final File index_path;
   private final UpdateHandler updateHandler;
   private static final long startTime = System.currentTimeMillis();
   private final RequestHandlers reqHandlers = new RequestHandlers();
@@ -114,8 +116,8 @@
   }
   public IndexSchema getSchema() { return schema; }
-  public String getDataDir() { return dataDir; }
-  public String getIndexDir() { return index_path; }
+  public String getDataDir() { return dataDir.getAbsolutePath(); }
+  public String getIndexDir() { return index_path.getAbsolutePath(); }
   // gets a non-caching searcher
   public SolrIndexSearcher newSearcher(String name) throws IOException {
@@ -187,18 +189,19 @@
       core = this; // set singleton
       if (dataDir ==null) {
-        dataDir = SolrConfig.config.get("dataDir",Config.getInstanceDir()+"data");
+        dataDir = SolrConfig.config.get("dataDir", getInstanceDir() +"data");
       }
-      log.info("Opening new SolrCore at " + Config.getInstanceDir() + ", dataDir="+dataDir);
+      log.info("Opening new SolrCore at " + getInstanceDir() + ", dataDir="+dataDir);
       if (schema==null) {
         schema = new IndexSchema("schema.xml");
       }
       this.schema = schema;
-      this.dataDir = dataDir;
-      this.index_path = dataDir + "/" + "index";
+      File dataDirTemp = new File(dataDir);
+      this.dataDir = dataDirTemp;
+      this.index_path = new File(this.dataDir,"index");
       this.maxWarmingSearchers = SolrConfig.config.getInt("query/maxWarmingSearchers",Integer.MAX_VALUE);
@@ -421,7 +424,7 @@
       // if this fails, we need to decrement onDeckSearchers again.
       SolrIndexSearcher tmp;
       try {
-        tmp = new SolrIndexSearcher(schema, "main", index_path, true);
+        tmp = new SolrIndexSearcher(schema, "main", getIndexDir(), true);
       } catch (Throwable th) {
         synchronized(searcherLock) {
           onDeckSearchers--;
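The rule under discussion in this thread (treat a relative dataDir as relative to Solr's home rather than the current working directory) can be sketched with java.io.File alone. This is an illustration of the idea, not part of the patch in this message: `DataDirResolver` and `resolve` are hypothetical names, and the home directory is passed in as a plain string instead of coming from Config.getInstanceDir().

```java
import java.io.File;

/** Hypothetical helper; not part of the patch in this thread. */
final class DataDirResolver {

    /** Absolute paths are returned unchanged; relative ones are resolved under solrHome. */
    static File resolve(String solrHome, String dataDir) {
        File dir = new File(dataDir);
        return dir.isAbsolute() ? dir : new File(solrHome, dataDir);
    }
}
```

With a rule like this, a configured value of "data" would land under the Solr home, while anyone who really wants the current working directory could still pass an absolute path through solr.data.dir.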
Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
Erik Hatcher wrote: On Jun 12, 2007, at 1:34 AM, Ryan McKinley wrote: - + ${solr.data.dir:./solr/data} + I just ran into something weird with this... If you set the solr home with JNDI or solr.solr.home, the dataDir still defaults to "./solr/data" -- not the data directory relative to solr home. This requires you to *also* set: "solr.data.dir" if you want to use a different data directory. I think we should comment it out so that, by default, setting solr home moves everything. Instead, maybe a relative path should be made relative to Solr's home directory instead of to the current working directory? that sounds good. It is nice to be able to set the data directory with a property...
Re: svn commit: r544356 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
On Jun 12, 2007, at 1:34 AM, Ryan McKinley wrote: - + ${solr.data.dir:./solr/data} + I just ran into something weird with this... If you set the solr home with JNDI or solr.solr.home, the dataDir still defaults to "./solr/data" -- not the data directory relative to solr home. This requires you to *also* set: "solr.data.dir" if you want to use a different data directory. I think we should comment it out so that, by default, setting solr home moves everything. Instead, maybe a relative path should be made relative to Solr's home directory instead of to the current working directory? Erik