[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12655944#action_12655944 ] Wolf Siberski commented on LUCENE-1473: ---

Thanks to Doug and Jason for your constructive feedback. Let me first clarify the purpose and scope of the patch. IMHO, the discussion about Serialization in Lucene is not clear-cut at all. My opinion is that moving all distribution-related code out of the core leads to a cleaner separation of concerns and is thus a better design. On the other hand, by removing Serializable we limit the Lucene application space at least a bit (e.g., no support for dynamic class loading) and give up the advantages that default Java serialization offers. Therefore the patch is to be taken as a contribution to explore the design space (as Michael's patch on custom readers explored the Serializable option), not as a full-fledged solution proposal.

[Doug] The removal of Serializable will break compatibility, so must be well-advertised.

Sure. I removed Serializable to catch all related errors; this was not meant as a proposal for a final patch.

[Doug] The Searchable API was designed for remote use and does not include HitCollector-based access.

Currently Searchable does include a HitCollector-based search method, although the comment says that 'HitCollector-based access to remote indexes is discouraged'. The only reason I provided an implementation is that I wanted to keep the Searchable contract. Is remote access the only purpose of Searchable/MultiSearcher? Is it OK to break compatibility with respect to these classes? IMHO a significant fraction of the current clumsiness in the remote package stems from my attempt to fully preserve the Searchable API.

[Doug] Weighting, and hence ranking, does not appear to be implemented correctly by this patch.

True, I was a bit too fast here.
We could either solve it along the line you propose, or revert to passing the Weight again instead of the Query. IMHO the issue is orthogonal to the Serializable discussion and more related to the question of what a good remote search interface and protocol should look like.

[Jason] Restricting people to XML will probably not be suitable though.

The patch does not limit serialization to XML. It only requires that encoding to and decoding from String is implemented, no matter how. I used XML/XStream as a proof-of-concept implementation, but I don't propose to make XML mandatory. The main reason for introducing the Serializer interface was to emphasize that XML/XStream is just one implementation option. Actually, the current approach feels like it has at least one more indirection than required; for a final solution I would try to come up with a better design.

[Jason] It seems the alternative solutions to serialization simply shift the problem around but do not really solve the underlying issues (speed, versioning, writing custom serialization code, and perhaps dynamic classloading).

In a sense, the problem is indeed 'only' shifted around and not yet solved. The good thing about this shift is that the Lucene core becomes decoupled from these issues. The only real limitation I see is that dynamic classloading can no longer be realized. With respect to speed, I don't think that encoding/decoding is a significant performance factor in distributed search, but this would need to be benchmarked. With respect to versioning, my patch still keeps all options open. More importantly, Lucene users can now decide whether they need compatibility between different versions, and roll their own encoding/decoding if they do. Of course, if they are willing to contribute and maintain custom serializers that preserve back-compatibility, they can do so in contrib just as they could have done in the core.
Custom serialization is still possible although the standard Java serialization framework can't be used anymore for that purpose, and I admit that this is a disadvantage. Implement standard Serialization across Lucene versions --- Key: LUCENE-1473 URL: https://issues.apache.org/jira/browse/LUCENE-1473 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.4 Reporter: Jason Rutherglen Priority: Minor Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, lucene-contrib-remote.patch Original Estimate: 8h Remaining Estimate: 8h To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable. java.io.Externalizable may be implemented in classes for faster performance. -- This message is automatically generated by JIRA. - You
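Wolf's point is that the remote layer only needs some reversible mapping between objects and strings, with XML/XStream as just one option. A minimal sketch of that contract follows. It assumes a hypothetical StringCodec interface and a hypothetical TermSpec value class; neither is the org.apache.lucene.remote.Serializer from the patch, whose actual signature may differ. The non-XML implementation is built on java.util.Base64:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Objects;

// Hypothetical pluggable contract: any encoding works as long as
// decode(encode(x)) returns an object equal to x.
interface StringCodec<T> {
    String encode(T value);
    T decode(String encoded);
}

// Hypothetical stand-in for a simple term query: a field name plus term text.
final class TermSpec {
    final String field;
    final String text;

    TermSpec(String field, String text) {
        this.field = Objects.requireNonNull(field);
        this.text = Objects.requireNonNull(text);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof TermSpec)) return false;
        TermSpec other = (TermSpec) o;
        return field.equals(other.field) && text.equals(other.text);
    }

    @Override public int hashCode() { return Objects.hash(field, text); }
}

// One possible non-XML codec: Base64-encode each part and join them with ':'.
// ':' never occurs in standard Base64 output, so splitting on it is unambiguous.
final class Base64TermCodec implements StringCodec<TermSpec> {
    private static final Base64.Encoder ENC = Base64.getEncoder();
    private static final Base64.Decoder DEC = Base64.getDecoder();

    public String encode(TermSpec t) {
        return ENC.encodeToString(t.field.getBytes(StandardCharsets.UTF_8))
            + ":" + ENC.encodeToString(t.text.getBytes(StandardCharsets.UTF_8));
    }

    public TermSpec decode(String s) {
        String[] parts = s.split(":", -1); // -1 keeps empty parts
        if (parts.length != 2) throw new IllegalArgumentException("malformed: " + s);
        return new TermSpec(
            new String(DEC.decode(parts[0]), StandardCharsets.UTF_8),
            new String(DEC.decode(parts[1]), StandardCharsets.UTF_8));
    }
}
```

Swapping in JSON, XML, or a binary format would mean writing another StringCodec implementation; code that only sees the interface does not change.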
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12656071#action_12656071 ] Doug Cutting commented on LUCENE-1473: --

[Wolf] Therefore the patch is to be taken as a contribution to explore the design space [ ... ]

Yes, and it is much appreciated for that. Thanks again!

[Wolf] Currently Searchable does include a HitCollector-based search method [ ... ]

You're right. I misremembered. This dates back to the origin of Searchable: http://svn.apache.org/viewvc?view=revrevision=149813

Personally, I think it would be reasonable for a distributed implementation to throw an exception if one tries to use a HitCollector.

[Wolf] We could either solve it along the line you propose, or revert to passing the Weight again instead of the Query.

Without an introspection-based serialization like Java serialization, it would be difficult to pass a Weight over the wire using public APIs, since most implementations are not public. But since Weights are constructed via a standard protocol, the method I outlined could work.
- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
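Doug's suggestion that a distributed implementation simply reject HitCollector-based calls could look roughly like the following. The interfaces are simplified, hypothetical stand-ins: DocCollector loosely mirrors HitCollector's collect(int doc, float score) callback, SimpleSearchable is not Lucene's Searchable, and a canned hit list replaces the real network call:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-hit callback, modeled loosely on Lucene's HitCollector.
interface DocCollector {
    void collect(int doc, float score);
}

// Hypothetical simplified search contract (not Lucene's Searchable).
interface SimpleSearchable {
    // Bulk result: the best n doc ids, suitable for a single round trip.
    int[] topDocs(String query, int n);

    // Streaming callback: one call per hit, impractical over a network.
    void search(String query, DocCollector collector);
}

// A remote-backed implementation that supports only the bulk method.
final class RemoteOnlySearchable implements SimpleSearchable {
    // Canned hits standing in for what a remote node would return.
    private final List<Integer> fakeRemoteHits;

    RemoteOnlySearchable(List<Integer> fakeRemoteHits) {
        this.fakeRemoteHits = new ArrayList<>(fakeRemoteHits);
    }

    @Override public int[] topDocs(String query, int n) {
        int size = Math.min(n, fakeRemoteHits.size());
        int[] result = new int[size];
        for (int i = 0; i < size; i++) result[i] = fakeRemoteHits.get(i);
        return result;
    }

    @Override public void search(String query, DocCollector collector) {
        // Fail fast instead of making one remote callback per matching document.
        throw new UnsupportedOperationException(
            "HitCollector-style search is not supported for remote indexes");
    }
}
```

Throwing UnsupportedOperationException surfaces the limitation at the call site instead of silently degrading into one network round trip per hit.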
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12655676#action_12655676 ] Wolf Siberski commented on LUCENE-1473: ---

This seems to be the right way to go. The attached patch removes all dependencies on Serializable and Remote from the core and moves them to contrib/remote. I introduced a new interface, RemoteSearcher (not RemoteSearchable, because I didn't want to pass Weights around), implemented by DefaultRemoteSearcher. An adapter implementing Searchable and delegating to RemoteSearcher is also included (RemoteSearcherAdapter). Encoding/decoding of Lucene objects is delegated to org.apache.lucene.remote.Serializer. For a sample serialization, I employed XStream, which offers XML serialization (nearly) out of the box. Everything is rather undocumented and would need a lot of cleanup, but as a proof of concept it should be OK. Core and remote tests pass, with one exception: it is no longer possible to serialize a RAMDirectory.

What I don't like about the current patch is that a lot of different objects are passed around to keep the Searchable interface alive. Would it be possible to refactor so that Searchable represents a higher-level interface (or to introduce a new alternative abstraction)?
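The adapter arrangement Wolf describes (a local interface implemented by delegating to a remote one that only sees encoded strings) can be sketched with simplified, hypothetical types. WireSearcher, QuerySearchable and WireSearcherAdapter are illustrative names, not the classes in the patch, and the real Searchable methods also take Weight and Filter arguments that are left out here:

```java
import java.util.function.Function;

// Hypothetical wire-level interface: only strings cross the boundary.
interface WireSearcher {
    int[] search(String encodedQuery, int n);
}

// Hypothetical local interface that callers already program against.
interface QuerySearchable<Q> {
    int[] search(Q query, int n);
}

// Adapter: exposes the local interface, encodes the query, and delegates.
final class WireSearcherAdapter<Q> implements QuerySearchable<Q> {
    private final WireSearcher remote;
    private final Function<Q, String> encoder; // any pluggable encoding

    WireSearcherAdapter(WireSearcher remote, Function<Q, String> encoder) {
        this.remote = remote;
        this.encoder = encoder;
    }

    @Override public int[] search(Q query, int n) {
        return remote.search(encoder.apply(query), n);
    }
}
```

Callers keep using the familiar interface while only strings cross the wire; the drawback Wolf mentions appears once every argument of the old interface has to be threaded through such an adapter.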
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12655689#action_12655689 ] Mark Miller commented on LUCENE-1473: -

Thanks Wolf, +1 on the change. This issue proposes to do the same thing: LUCENE-1407
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12655806#action_12655806 ] Doug Cutting commented on LUCENE-1473: --

Thanks, Wolf, this looks like a promising approach. Jason, John: would this sort of thing meet your needs?

I'm not sure we can remove everything from trunk immediately. Rather, we should deprecate things and remove them in 3.0. The removal of Serializable will break compatibility, so must be well-advertised.

HitCollector-based search should simply not be supported in distributed search. The Searchable API was designed for remote use and does not include HitCollector-based access.

Weighting, and hence ranking, does not appear to be implemented correctly by this patch. An approach that might work would be to:
- extend MultiSearcher
- pass its CachedDfSource to remote searchers along with queries
- construct a Weight on the search node using the CachedDfSource

Does that make sense?
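The reason for shipping CachedDfSource to the search nodes is that idf has to be computed from document frequencies aggregated over all indexes, not from each node's local counts. A small arithmetic sketch follows. It assumes the classic DefaultSimilarity formula idf = 1 + ln(numDocs / (docFreq + 1)); ShardStats and GlobalIdf are hypothetical helpers, not Lucene classes:

```java
import java.util.List;

// Hypothetical per-shard statistics for a single term.
final class ShardStats {
    final int numDocs;
    final int docFreq;

    ShardStats(int numDocs, int docFreq) {
        this.numDocs = numDocs;
        this.docFreq = docFreq;
    }
}

final class GlobalIdf {
    // Mirrors DefaultSimilarity.idf in Lucene 2.x (which returns a float):
    // 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // What the coordinating searcher would send to every node:
    // idf computed from totals summed over all shards.
    static double globalIdf(List<ShardStats> shards) {
        int totalDocs = 0;
        int totalDf = 0;
        for (ShardStats s : shards) {
            totalDocs += s.numDocs;
            totalDf += s.docFreq;
        }
        return idf(totalDf, totalDocs);
    }
}
```

With shard A at docFreq 10 of 100 documents and shard B at docFreq 1 of 100, the local idfs are about 3.21 and 4.91, so the same term would be weighted differently on each node. The global value, 1 + ln(200/12), is about 3.81 and is what every node should use for its Weight.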
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12655845#action_12655845 ] Jason Rutherglen commented on LUCENE-1473: --

To Wolf: Your patch looks like it was quite a bit of work, nice job! Restricting people to XML will probably not be suitable, though. Some may want JSON or something that encodes the objects more directly.

General: It seems the alternative solutions to serialization simply shift the problem around but do not really solve the underlying issues (speed, versioning, writing custom serialization code, and perhaps dynamic classloading). The Externalizable code will not be too lengthy and should be more convenient to implement than the alternatives (the code required is roughly equivalent to an equals method). For example, Protocol Buffers requires maintaining files, reminiscent of CORBA IDL files, that describe the objects.

Deprecating serialization entirely needs to be taken to the java-user mailing list, as there are quite a number of installations relying on it.

If this overlaps with SOLR, it would be good for the SOLR folks to separate it out as a serialization library that could be used outside of the SOLR server. This would be a good idea for most of the SOLR functionality; otherwise there would seem to be redundant development.

I'll finish up the Externalizable patch once LUCENE-1314 (IndexReader.clone) is completed, as that issue needs feedback and testing to ensure it's workable for 2.9, whereas Externalizable is somewhat easier.
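Jason's estimate that Externalizable code is about as long as an equals method can be checked against a small example. This is a sketch on a hypothetical TermQueryData class (not a Lucene class): it pins serialVersionUID, as the issue description asks, and writes an explicit format-version byte so that readExternal can detect data written in an incompatible format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical value class used only to illustrate the pattern.
final class TermQueryData implements Externalizable {
    // Pinned so the JVM-computed UID cannot drift between releases.
    private static final long serialVersionUID = 1L;
    // Explicit wire-format version, checked on read.
    private static final byte FORMAT_VERSION = 1;

    String field;
    String text;
    float boost = 1.0f;

    public TermQueryData() { } // Externalizable requires a public no-arg constructor

    TermQueryData(String field, String text, float boost) {
        this.field = field;
        this.text = text;
        this.boost = boost;
    }

    @Override public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(FORMAT_VERSION);
        out.writeUTF(field);
        out.writeUTF(text);
        out.writeFloat(boost);
    }

    @Override public void readExternal(ObjectInput in) throws IOException {
        byte version = in.readByte();
        if (version != FORMAT_VERSION) {
            throw new InvalidObjectException("unsupported format version " + version);
        }
        // Fields must be read in exactly the order they were written.
        field = in.readUTF();
        text = in.readUTF();
        boost = in.readFloat();
    }

    static byte[] toBytes(TermQueryData q) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(q);
        }
        return bytes.toByteArray();
    }

    static TermQueryData fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (TermQueryData) in.readObject();
        }
    }
}
```

A later release that adds a field could bump FORMAT_VERSION and branch on the byte it reads, which is where the cross-version control comes from; the cost is that every field has to be written and read by hand, in the same order.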
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12655894#action_12655894 ] Doug Cutting commented on LUCENE-1473: --

[Jason] shift the problem around but do not really solve the underlying issues

That's the idea, actually: to shift it out of the core into contrib. We could use Externalizable there, with no XML.

[Jason] Deprecating serialization entirely needs to be taken to the java-user mailing list as there are quite a number of installations relying on it.

No, we make decisions on the java-dev mailing list. Also, it won't go away; folks might just have to update their code to use different APIs if and when they upgrade to 3.0.
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
See http://lucene.markmail.org/message/fu34tuomnqejchfj?q=RemoteSearchable for just such a proposal.

On Dec 8, 2008, at 1:52 PM, Doug Cutting (JIRA) wrote: Would it take any more lines of code to remove Serializable from the core classes and re-implement RemoteSearchable in a separate layer on top of the core APIs? [ ... ]
-- Grant Ingersoll
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12654513#action_12654513 ] Doug Cutting commented on LUCENE-1473: --

Would it take any more lines of code to remove Serializable from the core classes and re-implement RemoteSearchable in a separate layer on top of the core APIs? That layer could be a contrib module and could get all the Externalizable love it needs. It could support a specific popular subset of query and filter classes, rather than arbitrary Query implementations. It would be extensible, so that if folks wanted to support new kinds of queries, they easily could. This other approach seems like a slippery slope, complicating already complex code with new concerns. It would be better to encapsulate these concerns in a layer atop APIs whose back-compatibility we already make promises about, no?
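One way to read the "specific popular subset, but extensible" proposal is a registry that maps a type tag to an encoder/decoder pair: the contrib layer would register codecs for the common query types, and users would register their own. Everything below (QueryCodec, QueryCodecRegistry, PrefixSpec, PrefixCodec) is hypothetical and uses a deliberately naive text format:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical codec for one query type; the payload excludes the type tag.
interface QueryCodec<T> {
    Class<T> type();
    String tag();
    String encode(T query);
    T decode(String payload);
}

// Hypothetical registry: ships with a few codecs, users can register more.
final class QueryCodecRegistry {
    private final Map<String, QueryCodec<?>> byTag = new HashMap<>();
    private final Map<Class<?>, QueryCodec<?>> byType = new HashMap<>();

    <T> void register(QueryCodec<T> codec) {
        byTag.put(codec.tag(), codec);
        byType.put(codec.type(), codec);
    }

    // Only explicitly registered types can be encoded; nothing is reflected.
    <T> String encode(T query) {
        @SuppressWarnings("unchecked")
        QueryCodec<T> codec = (QueryCodec<T>) byType.get(query.getClass());
        if (codec == null) {
            throw new IllegalArgumentException("no codec for " + query.getClass().getName());
        }
        return codec.tag() + "|" + codec.encode(query);
    }

    Object decode(String wire) {
        int bar = wire.indexOf('|');
        if (bar < 0) throw new IllegalArgumentException("missing tag: " + wire);
        String tag = wire.substring(0, bar);
        QueryCodec<?> codec = byTag.get(tag);
        if (codec == null) throw new IllegalArgumentException("unknown tag: " + tag);
        return codec.decode(wire.substring(bar + 1));
    }
}

// Example query type and codec (both hypothetical).
final class PrefixSpec {
    final String field;
    final String prefix;
    PrefixSpec(String field, String prefix) { this.field = field; this.prefix = prefix; }
}

final class PrefixCodec implements QueryCodec<PrefixSpec> {
    public Class<PrefixSpec> type() { return PrefixSpec.class; }
    public String tag() { return "prefix"; }
    // Assumes the field name contains no ':'; a real codec would escape it.
    public String encode(PrefixSpec q) { return q.field + ":" + q.prefix; }
    public PrefixSpec decode(String payload) {
        int colon = payload.indexOf(':');
        if (colon < 0) throw new IllegalArgumentException("malformed payload: " + payload);
        return new PrefixSpec(payload.substring(0, colon), payload.substring(colon + 1));
    }
}
```

Unregistered types are rejected at encode time rather than serialized by reflection, so only classes someone has explicitly committed to supporting ever go over the wire.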
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
That sounds much better. Trying to distribute Lucene itself (my reason for finding all this interesting) is just not going to work for far too many applications, and it will put a burden on API extensions. My point is, I do not want to distribute a Lucene index; I need to distribute my application that uses Lucene. Think of it like a distributed Luke: useful by itself, but not really useful for slightly more complex use cases. My Hit class is a specialized Lucene Hit object, and my Query has totally different features and aggregates a Lucene Query. This is what I can control, what I need to send over the wire, and the place where I define my version/API. If Lucene API classes change but all existing features remain, I have no problem keeping my serialized objects compatible. So versioning comes under my control, and Lucene provides only the features, as a library.

Having a light, easily extensible layer on top of the core API would be just great. As far as I am concerned, Java serialization is not my world; something light and extensible in the Etch/Thrift/Hadoop IPC/Protocol Buffers direction is much more thrilling. That is exactly the road Hadoop, Nutch, Katta and probably many others are taking, so having a common base that supports such cases may be a good idea. Why not build RemoteSearchable on Hadoop IPC, or Etch/Thrift? Maybe there are other reasons to support Java serialization; I do not know.
Just painting one view on this idea.

----- Original Message -----
From: Doug Cutting (JIRA) [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Monday, 8 December, 2008 19:52:46
Subject: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ ... ]
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
I think an important piece to make this work is the query parser/syntax. We already have a system similar to what is outlined below. We made changes to the query syntax to support our various query extensions. The nice thing is that a persisted query is just a simple string. It also makes it very easy for external systems to submit queries. We also have XML definitions for a result set.

I think the only way to make this work, though, is probably a more detailed query syntax (similar to SQL), so that it can easily be extended with new clauses/functions without breaking existing code. I would also suggest that every core query class have a representation in it. I would also like to see a way for proprietary clauses to be supported (like calls in SQL).

On Dec 8, 2008, at 3:37 PM, eks dev wrote: That sounds much better. Trying to distribute lucene (my reason why all this would be interesting) itself is just not going to work for far too many applications and will put burden on API extensions. [ ... ]
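The SQL-like extensibility robert asks for, where new or proprietary functions can be added without changing how existing clauses parse, can be approximated by dispatching every name(args) clause through a table of registered handlers. This is a minimal sketch with a hypothetical ClauseParser and a made-up near function; a real grammar would also need nesting, quoting and boolean operators:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical extension point: clause names are registered, not hard-coded,
// so adding one does not change how existing clauses are parsed.
final class ClauseParser<R> {
    private final Map<String, Function<List<String>, R>> handlers = new HashMap<>();

    void register(String name, Function<List<String>, R> handler) {
        handlers.put(name, handler);
    }

    // Parses a single "name(arg1, arg2)" clause (no nesting, no quoting)
    // and hands the trimmed arguments to the registered handler.
    R parse(String text) {
        String s = text.trim();
        int open = s.indexOf('(');
        if (open <= 0 || !s.endsWith(")")) {
            throw new IllegalArgumentException("not a function clause: " + text);
        }
        String name = s.substring(0, open).trim();
        Function<List<String>, R> handler = handlers.get(name);
        if (handler == null) throw new IllegalArgumentException("unknown function: " + name);
        String inner = s.substring(open + 1, s.length() - 1).trim();
        List<String> args = new ArrayList<>();
        if (!inner.isEmpty()) {
            for (String a : inner.split(",")) args.add(a.trim());
        }
        return handler.apply(args);
    }
}
```

Because unknown names fail loudly, a server that receives a clause it does not understand reports an error instead of misinterpreting the query.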
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
Well, there's the pretty sophisticated and extensible XML query parser in contrib. I've still only scratched the surface of it, but it meets the specs you mentioned.

Erik

On Dec 8, 2008, at 4:51 PM, robert engels wrote: I think an important piece to make this work is the query parser/syntax. [ ... ]
Implement standard Serialization across Lucene versions --- Key: LUCENE-1473 URL: https://issues.apache.org/jira/browse/LUCENE-1473 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.4 Reporter: Jason Rutherglen Priority: Minor Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch Original Estimate: 8h Remaining Estimate: 8h To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable. java.io.Externalizable may be implemented in classes for faster performance. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail
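Doug's alternative is a separate layer on top of the core APIs. It would support a chosen subset of query types and stay extensible, and could look roughly like the sketch below. This is a hypothetical illustration only. `QueryCodec`, `TermQuerySpec`, `Encoder`, and `Decoder` are invented names, and `TermQuerySpec` stands in for a real Lucene query so the example compiles without Lucene. The wire format (a type tag followed by the fields, via `DataOutput`) is also an assumption, not something from any of the attached patches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a query type the remote layer chooses to support. */
final class TermQuerySpec {
  final String field;
  final String text;

  TermQuerySpec(String field, String text) {
    this.field = field;
    this.text = text;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof TermQuerySpec)) return false;
    TermQuerySpec other = (TermQuerySpec) o;
    return field.equals(other.field) && text.equals(other.text);
  }

  @Override public int hashCode() { return 31 * field.hashCode() + text.hashCode(); }
}

interface Encoder<T> { void write(T value, DataOutput out) throws IOException; }

interface Decoder<T> { T read(DataInput in) throws IOException; }

/** Maps each supported query class to a stable tag plus an encoder/decoder pair. */
final class QueryCodec {
  private final Map<Class<?>, String> tagsByClass = new HashMap<>();
  private final Map<String, Encoder<?>> encoders = new HashMap<>();
  private final Map<String, Decoder<?>> decoders = new HashMap<>();

  /** Extension point: applications register their own query types here. */
  <T> void register(String tag, Class<T> type, Encoder<T> encoder, Decoder<T> decoder) {
    tagsByClass.put(type, tag);
    encoders.put(tag, encoder);
    decoders.put(tag, decoder);
  }

  /** A codec preloaded with a deliberately small default subset. */
  static QueryCodec withDefaultQueries() {
    QueryCodec codec = new QueryCodec();
    codec.register("term", TermQuerySpec.class,
        (q, out) -> { out.writeUTF(q.field); out.writeUTF(q.text); },
        in -> new TermQuerySpec(in.readUTF(), in.readUTF()));
    return codec;
  }

  @SuppressWarnings("unchecked")
  void encode(Object query, DataOutput out) throws IOException {
    String tag = tagsByClass.get(query.getClass());
    if (tag == null) throw new IOException("unsupported query type: " + query.getClass().getName());
    out.writeUTF(tag);  // the tag, not the Java class name, identifies the type on the wire
    ((Encoder<Object>) encoders.get(tag)).write(query, out);
  }

  Object decode(DataInput in) throws IOException {
    String tag = in.readUTF();
    Decoder<?> decoder = decoders.get(tag);
    if (decoder == null) throw new IOException("unknown query tag: " + tag);
    return decoder.read(in);
  }

  /** Encodes and decodes through an in-memory buffer, e.g. for tests. */
  Object roundTrip(Object query) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      encode(query, new DataOutputStream(bytes));
      return decode(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Unregistered types are rejected with an `IOException` instead of being serialized blindly. That is the "specific popular subset" trade-off, and the compatibility of the tag/field format becomes a promise of this layer alone rather than of the core classes.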
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
The problem with that is that in most cases you still need a string based syntax that people can enter... I guess you can always have an advanced search page that builds and submits the XML query behind the scenes.

On Dec 8, 2008, at 4:40 PM, Erik Hatcher wrote:

Well, there's the pretty sophisticated and extensible XML query parser in contrib. I've still only scratched the surface of it, but it meets the specs you mentioned.

Erik
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
Building your own parser with Antlr is really easy. Using Ragel is harder, but yields insane parsing performance. Is there any reason to worry about library-bundled parsers if you're making something more complex than a college project?

On Tue, Dec 9, 2008 at 01:49, robert engels [EMAIL PROTECTED] wrote:

The problem with that is that in most cases you still need a string based syntax that people can enter... I guess you can always have an advanced search page that builds and submits the XML query behind the scenes.
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
I only meant it from a persistence standpoint: if you need a full human-enterable query syntax anyway, why not just use that as the persistence format?

On Dec 8, 2008, at 4:53 PM, Earwin Burrfoot wrote:

Building your own parser with Antlr is really easy. Using Ragel is harder, but yields insane parsing performance. Is there any reason to worry about library-bundled parsers if you're making something more complex than a college project?
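robert's suggestion is to use the human-enterable syntax itself as the persistence format. That amounts to a format/parse pair that round-trips. The sketch below uses a made-up toy syntax: whitespace-separated `field:term` clauses, with a leading `+` marking a required clause. It is loosely modeled on Lucene's query syntax but is not its QueryParser. `Clause` and `QueryString` are hypothetical names, and there is no escaping, so terms may not contain whitespace.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical clause model: a field:term pair, optionally required. */
final class Clause {
  final String field;
  final String term;
  final boolean required;

  Clause(String field, String term, boolean required) {
    this.field = field;
    this.term = term;
    this.required = required;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof Clause)) return false;
    Clause c = (Clause) o;
    return field.equals(c.field) && term.equals(c.term) && required == c.required;
  }

  @Override public int hashCode() {
    return (31 * field.hashCode() + term.hashCode()) * 2 + (required ? 1 : 0);
  }
}

/** Toy syntax: whitespace-separated clauses; "+field:term" marks a required clause. */
final class QueryString {
  static String format(List<Clause> clauses) {
    StringBuilder sb = new StringBuilder();
    for (Clause c : clauses) {
      if (sb.length() > 0) sb.append(' ');
      if (c.required) sb.append('+');
      sb.append(c.field).append(':').append(c.term);
    }
    return sb.toString();
  }

  static List<Clause> parse(String query) {
    List<Clause> clauses = new ArrayList<>();
    for (String token : query.trim().split("\\s+")) {
      if (token.isEmpty()) continue;  // only happens for empty input
      boolean required = token.startsWith("+");
      if (required) token = token.substring(1);
      int colon = token.indexOf(':');
      if (colon <= 0 || colon == token.length() - 1)
        throw new IllegalArgumentException("expected field:term but got '" + token + "'");
      clauses.add(new Clause(token.substring(0, colon), token.substring(colon + 1), required));
    }
    return clauses;
  }
}
```

The round-trip property (`parse(format(q))` equals `q`) is what makes the string usable as a stored or wire format. A real syntax would also need escaping and its own versioning story.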
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[robert engels] The problem with that is that in most cases you still need a string based syntax that people can enter...

The XML syntax includes a UserQuery tag for embedding user input of this type.

[robert engels] I guess you can always have an advanced search page that builds and submits the XML query behind the scenes.

Contrib now includes a worked demo web app showing how a very typical search form is converted into XML using XSL. User input is a mixture of edit boxes for classic QueryParser syntax, used on free-text fields, plus drop-downs, checkboxes, etc. that map to other non-free-text fields.

Cheers,
Mark
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12654108#action_12654108 ]

Michael McCandless commented on LUCENE-1473:

{quote}
In the past, ensuring backwards-compatibility was often the part of writing patches that took the longest and involved the most discussion.
{quote}

It very much still is, as I'm learning with LUCENE-1458!

Your first example is missing the read/writeExternal methods.

I think the proposed approach is rather heavy-weight. We will have implemented readExternal, writeExternal, this new CustomExternalizableReader and package-private init methods, and made private inner classes package-private. On top of that we need to javadoc the current externalization format written by each of our classes, and in the future help users understand how they can achieve back compatibility by subclassing CustomExternalizableReader, etc. I guess my feeling is that all of that is a good amount more work than just deciding to directly implement back compatibility ourselves.

E.g., to do your example in a future world where we do support back compat of serialized classes (NOTE: none of the code below is compiled/tested):

First a util class for managing versions:

{code}
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class Versions {
  // Id of the newest registered version; -1 until the first add().
  private int current = -1;

  int add(String desc) {
    // TODO: do something more interesting with desc
    return ++current;
  }

  void write(ObjectOutput out) throws IOException {
    // TODO: writeVInt
    out.writeByte((byte) current);
  }

  int read(ObjectInput in) throws IOException {
    // TODO: readVInt
    final byte version = in.readByte();
    if (version > current)
      throw new IOException("this object was serialized by a newer version of Lucene (got "
          + version + " but expected <= " + current + ")");
    return version;
  }
}
{code}

Then, someone creates SomeClass:

{code}
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class SomeClass implements Externalizable {
  private int one;
  private int two;

  // One Versions instance per class; each add() registers the next format id.
  private static final Versions versions = new Versions();
  private static final int VERSION0 = versions.add("start");

  public SomeClass() {} // Externalizable requires a public no-arg constructor

  public void writeExternal(ObjectOutput out) throws IOException {
    versions.write(out);
    out.writeInt(one);
    out.writeInt(two);
  }

  public void readExternal(ObjectInput in) throws IOException {
    versions.read(in);
    one = in.readInt();
    two = in.readInt();
  }

  // ...
}
{code}

Then on adding field three:

{code}
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class SomeClass implements Externalizable {
  private int one;
  private int two;
  private int three;

  private static final Versions versions = new Versions();
  private static final int VERSION0 = versions.add("start");
  private static final int VERSION1 = versions.add("the new field three");

  public SomeClass() {}

  public void writeExternal(ObjectOutput out) throws IOException {
    versions.write(out);
    out.writeInt(one);
    out.writeInt(two);
    out.writeInt(three); // present in VERSION1 and later
  }

  public void readExternal(ObjectInput in) throws IOException {
    final int version = versions.read(in);
    one = in.readInt();
    two = in.readInt();
    if (version >= VERSION1)
      three = in.readInt();
    else
      three = -3; // default for streams written before field three existed
  }

  // ...
}
{code}

In fact I think we should switch to a Versions utils class for writing/reading our index files...
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12654147#action_12654147 ]

Michael Busch commented on LUCENE-1473:

{quote}
Your first example is missing the read/writeExternal methods.
{quote}

Oops, I forgot to copy and paste it. It's in the attached patch file, though.

{quote}
I think the proposed approach is rather heavy-weight
{quote}

Really? If we go the Externalizable way anyway, then I think this approach doesn't add too much overhead. You only need to add init() and move the deserialization code from readExternal() to the reader's readExternal(). It's really not much more code. And the code changes are straightforward when the class changes: there is no need to worry about how to initialize newly added variables when an old version is read, for example.

What I think will be the most work is documenting and explaining this. But this would be an expert API, so the people who really need it are most likely looking into the sources anyway.

But for the record: I'm totally fine with using Serializable and just adding the serialVersionUID. It's just that if we use Externalizable, we might want to consider something like this to avoid new backwards-compatibility requirements.
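Michael Busch's patch is not reproduced in this thread, so the following is only a guess at the shape of the idea he describes. `readExternal` delegates to a replaceable reader object, and that reader populates the instance through a package-private `init` method. `RangeSpec`, `DEFAULT_READER`, and the static `reader` field are hypothetical names, and the attached custom-externalizable-reader.patch may be structured differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical strategy that knows how to decode one class's external form. */
interface CustomExternalizableReader<T> {
  void readExternal(T target, ObjectInput in) throws IOException, ClassNotFoundException;
}

final class RangeSpec implements Externalizable {
  private String field;
  private int lower;
  private int upper;

  /** Understands exactly the format written by this version's writeExternal. */
  static final CustomExternalizableReader<RangeSpec> DEFAULT_READER =
      (target, in) -> target.init(in.readUTF(), in.readInt(), in.readInt());

  /** Expert hook: swap in a reader that understands some other (e.g. older) format. */
  static volatile CustomExternalizableReader<RangeSpec> reader = DEFAULT_READER;

  public RangeSpec() {}  // Externalizable requires a public no-arg constructor

  RangeSpec(String field, int lower, int upper) { init(field, lower, upper); }

  /** Package-private initializer shared by the constructor and readers. */
  void init(String field, int lower, int upper) {
    this.field = field;
    this.lower = lower;
    this.upper = upper;
  }

  @Override public void writeExternal(ObjectOutput out) throws IOException {
    out.writeUTF(field);
    out.writeInt(lower);
    out.writeInt(upper);
  }

  @Override public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
    reader.readExternal(this, in);  // all decoding logic lives in the pluggable reader
  }

  byte[] toBytes() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      writeExternal(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static RangeSpec fromBytes(byte[] data) {
    RangeSpec spec = new RangeSpec();
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      spec.readExternal(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
    return spec;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof RangeSpec)) return false;
    RangeSpec r = (RangeSpec) o;
    return field.equals(r.field) && lower == r.lower && upper == r.upper;
  }

  @Override public int hashCode() { return (field.hashCode() * 31 + lower) * 31 + upper; }
}
```

The cost McCandless points at is visible here: the `init` method, the reader hook, and the need to document the byte layout that any custom reader must understand.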
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653735#action_12653735 ]

Michael McCandless commented on LUCENE-1473:

SerializeUtils is missing from the patch.
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653869#action_12653869 ]

Doug Cutting commented on LUCENE-1473:

How to write a unit test for multiple versions?

We can save, in files, serialized instances of each query type from the oldest release we intend to support. Then read each of these queries and check that it is equal to a current query that's meant to be equivalent (assuming all queries implement equals well). Something similar would need to be done for each class that is meant to be transmitted cross-version.

This tests that older queries may be processed by newer code. It does not test that newer queries can be processed by older code. Documentation is a big part of this effort, and it should be completed first. What guarantees do we intend to provide? Once we've documented these, then we can begin writing tests. For example, we may only guarantee that older queries work with newer code, and that newer hits work with older code. To test that we'd need to have an old jar around that we could test against. This will be a trickier test to configure.
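Doug's test plan can be sketched as a small helper. It reads bytes saved by an older release and checks that they deserialize to something `equals()` to the current equivalent object. This is a minimal sketch that assumes fixtures are plain Java-serialized bytes. `TermSpec` and `CrossVersionCheck` are hypothetical names. In the usage, the fixture is generated in-process only as a stand-in for a file produced by an old jar. As Doug notes, this covers only the older-writer, newer-reader direction.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical query stand-in with a well-behaved equals(). */
final class TermSpec implements Serializable {
  private static final long serialVersionUID = 1L;
  final String field;
  final String text;

  TermSpec(String field, String text) {
    this.field = field;
    this.text = text;
  }

  @Override public boolean equals(Object o) {
    return o instanceof TermSpec
        && field.equals(((TermSpec) o).field)
        && text.equals(((TermSpec) o).text);
  }

  @Override public int hashCode() { return 31 * field.hashCode() + text.hashCode(); }
}

final class CrossVersionCheck {
  /** True if bytes written by an older release deserialize to an object equal to the current one. */
  static boolean readsAsEquivalent(byte[] savedByOlderRelease, Object currentEquivalent) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(savedByOlderRelease))) {
      return currentEquivalent.equals(in.readObject());
    } catch (IOException | ClassNotFoundException e) {
      return false;  // an unreadable stream counts as a compatibility failure
    }
  }

  /** Same check against a fixture file kept in the test tree. */
  static boolean fixtureReadsAsEquivalent(Path fixture, Object currentEquivalent) throws IOException {
    return readsAsEquivalent(Files.readAllBytes(fixture), currentEquivalent);
  }

  /** How a fixture would be produced (run once against the old release's jar). */
  static byte[] serialize(Serializable obj) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}
```

Testing the reverse direction (new writer, old reader) would need the old jar on a separate classpath, which is the trickier configuration Doug mentions.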
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653955#action_12653955 ]

Jason Rutherglen commented on LUCENE-1473:

Doug wrote:

{quote}
We can save, in files, serialized instances of each query type from the oldest release we intend to support. Then read each of these queries and check that it is equal to a current query that's meant to be equivalent (assuming all queries implement equals well). Something similar would need to be done for each class that is meant to be transmitted cross-version.

This tests that older queries may be processed by newer code. It does not test that newer queries can be processed by older code. Documentation is a big part of this effort, and it should be completed first. What guarantees do we intend to provide? Once we've documented these, then we can begin writing tests. For example, we may only guarantee that older queries work with newer code, and that newer hits work with older code. To test that we'd need to have an old jar around that we could test against. This will be a trickier test to configure.
{quote}

Makes sense. I guarantee 2.9 and above classes will be backward compatible with the previous classes. I think that for 3.0 we'll start to create new replacement classes that will not conflict with the old classes. I'd really like to redesign the query, similarity, and scoring code to work with flexible indexing and allow new algorithms. This new code will not create changes in the existing query, similarity, and scoring code, which will remain serialization compatible with 2.9. The 2.9 query, similarity, and scoring code should leverage the new query, similarity, and scoring code to be backwards compatible.
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653972#action_12653972 ]

Doug Cutting commented on LUCENE-1473:

bq. I guarantee 2.9 and above classes will be backward compatible with the previous classes.

It sounds like you are personally guaranteeing that all serializable classes will be forever compatible. That's not what we'd need. We'd need a proposed policy for the project to consider in terms of major and minor releases, specifying forward and/or backward compatibility guarantees. For example, we might say, "within a major release cycle, serialized queries from older releases will work with newer releases; however, serialized queries from newer releases will not generally work with older releases, since we might add new kinds of queries in the course of a major release cycle." Similarly detailed statements would need to be made for each Externalizable class, no?
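Doug asks for a written project policy rather than a personal guarantee. Writing his example policy down as a predicate makes it concrete and testable. This is a sketch of his example only, not an adopted Lucene policy, and `Release` and `SerializationPolicy` are hypothetical names. Each Externalizable class would need its own predicate of this kind. For example, his earlier comment suggests hits might need the opposite direction.

```java
/** Hypothetical major.minor release number. */
final class Release {
  final int major;
  final int minor;

  Release(int major, int minor) {
    this.major = major;
    this.minor = minor;
  }

  static Release parse(String s) {
    String[] parts = s.split("\\.");
    if (parts.length < 2) throw new IllegalArgumentException("expected major.minor: " + s);
    return new Release(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
  }
}

final class SerializationPolicy {
  /**
   * Doug's example policy for serialized queries: within one major release cycle,
   * a query written by an older (or the same) release is readable by a newer one.
   * Newer-to-older and cross-major reads carry no guarantee.
   */
  static boolean queryReadable(Release writer, Release reader) {
    return writer.major == reader.major && writer.minor <= reader.minor;
  }
}
```

A predicate like this can also drive the fixture tests directly. Only the (writer, reader) pairs for which it returns true need to pass.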
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653297#action_12653297 ]

Michael McCandless commented on LUCENE-1473:

bq. It seems best to remove Serialization from Lucene so that users are not confused and create a better solution.

I don't think that's the case. If we choose to only support live serialization, then we should add implements Serializable but spell out clearly in the javadocs that there is no guarantee of cross-version compatibility (long-term persistence), and that, in fact, there often are incompatibilities. I think live serialization is still a useful feature.
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653321#action_12653321 ]

Michael McCandless commented on LUCENE-1473:

bq. For classes that no one submits an Externalizable patch for, the serialVersionUID needs to be added.

The serialVersionUID approach would be too simplistic, because we can't simply bump it up whenever we make a change since that then breaks back compatibility. We would have to override write/readObject or write/readExternal, and serialVersionUID would not be used.
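Michael's point is that a serialVersionUID on its own only keeps the JVM from rejecting a stream. Someone still has to decide how each format change is read. With plain `Serializable`, that means private `writeObject`/`readObject` hooks, sketched below with an explicit format number. The UID is pinned once and never bumped, since bumping it on each change would break back compatibility. `BoostedTerm`, `FORMAT`, and the field layout are hypothetical. The same pattern applies to `writeExternal`/`readExternal`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical query stand-in whose stream format is versioned by hand. */
final class BoostedTerm implements Serializable {
  // Pinned once and never bumped; format changes are tracked by FORMAT instead.
  private static final long serialVersionUID = 1L;
  // Format history: 0 = field + text; 1 = adds boost.
  private static final int FORMAT = 1;

  private transient String field;
  private transient String text;
  private transient float boost;

  BoostedTerm(String field, String text, float boost) {
    this.field = field;
    this.text = text;
    this.boost = boost;
  }

  String field() { return field; }
  String text() { return text; }
  float boost() { return boost; }

  private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();  // no non-transient fields, but the spec asks for this call first
    out.writeInt(FORMAT);
    out.writeUTF(field);
    out.writeUTF(text);
    out.writeFloat(boost);
  }

  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    int format = in.readInt();
    if (format > FORMAT)
      throw new InvalidObjectException("written by a newer format: " + format);
    field = in.readUTF();
    text = in.readUTF();
    boost = format >= 1 ? in.readFloat() : 1.0f;  // format-0 streams get the default boost
  }

  static BoostedTerm roundTrip(BoostedTerm term) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(term);
      }
      try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (BoostedTerm) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Field initializers and constructors do not run for deserialized `Serializable` objects, so every transient field is assigned explicitly in `readObject`.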
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653378#action_12653378 ] John Wang commented on LUCENE-1473: --- Mike: Suppose class A implements Serializable with a defined suid, say 1. Let A2 be a newer version of class A whose suid is unchanged (still 1), and say A2 has a new field. Imagine A is running in VM1 and A2 is running in VM2. Serialization of class A between VM1 and VM2 works; A simply will not get the new field, which is fine since VM1 does not use it. You can argue that A2 will not get the needed field from a serialized A, but isn't that better than crashing? Either way, I think that behavior is better than the current one (maybe that's why Eclipse and FindBugs both report the missing suid definition in Lucene code as a warning). I agree that adding an Externalizable implementation is more work, but it would make the serialization story correct. -John
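To make the scenario concrete, here is a hedged sketch of the newer class A2, assuming a hypothetical `BoostedTermQueryV2` whose older release lacked a `boost` field. With an unchanged serialVersionUID, default serialization leaves a field that is missing from the stream at its JVM default (0.0f here), and an older class reading a newer stream simply ignores the extra field. A custom readObject using ObjectInputStream.GetField can supply a meaningful default instead. A single-JVM test can only exercise the round trip; the missing-field path runs only when reading a stream written by the older release.

```java
import java.io.*;

/**
 * Hypothetical "A2": a newer release of a class that added the boost field
 * but kept serialVersionUID = 1L, so streams from the older release
 * (which carry no boost) still deserialize.
 */
class BoostedTermQueryV2 implements Serializable {
    private static final long serialVersionUID = 1L; // unchanged from the older release

    private String text;
    private float boost; // new in this release

    BoostedTermQueryV2(String text, float boost) {
        this.text = text;
        this.boost = boost;
    }

    String text() { return text; }
    float boost() { return boost; }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        text = (String) fields.get("text", null);
        // Without this method, a stream from the older release would leave boost at
        // 0.0f and silently zero out scores. get(name, default) returns the supplied
        // default when the field is absent from the stream.
        boost = fields.get("boost", 1.0f);
    }
}
```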
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653413#action_12653413 ] Doug Cutting commented on LUCENE-1473: -- bq. Serialization between VM1 and VM2 of class A is ok, just that A will not get the new fields. Which is fine since VM1 does not make use of it. But VM1 might require an older field that the new field replaced, and VM1 may then crash in an unpredictable way. Not defining explicit suids is more conservative: you get a well-defined exception when things might not work. Defining suids but doing nothing else about compatibility is playing fast and loose: it might work in many cases, but it might also cause strange, hard-to-diagnose problems in others. If we want Lucene to work reliably across versions, then we need to commit to that goal as a project, define the limits of the compatibility, implement Externalizable, add tests, etc. Just adding suids doesn't achieve that, so far as I can see.
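Doug's alternative (Externalizable plus an explicitly defined compatibility window) might look roughly like the following sketch. `ExternalTerm` is a hypothetical class, not Lucene's Term; the point is that a reader given a format version outside its supported range fails with a well-defined InvalidClassException rather than misreading the bytes.

```java
import java.io.*;

/**
 * Hypothetical Externalizable term with a documented compatibility contract:
 * readers accept format versions 1..CURRENT_VERSION and reject anything else
 * with a well-defined exception.
 */
class ExternalTerm implements Externalizable {
    static final byte CURRENT_VERSION = 1;

    private String field;
    private String text;

    public ExternalTerm() {} // public no-arg constructor required by Externalizable

    ExternalTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    String field() { return field; }
    String text() { return text; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(CURRENT_VERSION); // every stream starts with its format version
        out.writeUTF(field);
        out.writeUTF(text);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        byte version = in.readByte();
        if (version < 1 || version > CURRENT_VERSION) {
            // Fail loudly instead of guessing at an unknown layout.
            throw new InvalidClassException(ExternalTerm.class.getName(),
                "unsupported format version " + version);
        }
        field = in.readUTF();
        text = in.readUTF();
    }
}
```

Tests that feed each reader streams from older and newer releases would be part of the project-level commitment Doug describes.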
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653421#action_12653421 ] robert engels commented on LUCENE-1473: --- Even if you changed SUIDs based on version changes, there is the very real possibility that the new code CAN'T be instantiated in any meaningful way from the old data. Then what would you do? Even if you had all of the old classes and their dependencies available through dynamic classloading, it still wouldn't work UNLESS every new feature were designed for backwards compatibility with previous versions, a burden that is just too great to impose on all Lucene code. Given that, as has been discussed, there are other formats that can be used where isolated backwards persistence is desired (like XML-based query descriptions). Even these won't work if the XML description references explicit classes, which is why designing such a format for a near-limitless query structure (given user-defined query classes) is probably impossible. So strive for a decent solution that covers most cases and fails gracefully when it can't work. Using standard serialization (with proper transient fields) seems to fit this bill, since in a stable API most core classes should remain fairly constant, and those that are bound to change can take explicit steps in their serialization (if deemed necessary).
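A sketch of "standard serialization with proper transient fields", assuming a hypothetical `CachedTerm` class: only the persistent state (field and text) is written, while derived data such as a cached hash code is marked transient and rebuilt lazily after deserialization. Changing how the cache works therefore never touches the stream format.

```java
import java.io.*;

/**
 * Hypothetical term-like class using plain default serialization. Only the
 * persistent state is written; the cached hash is derived data and is transient.
 */
class CachedTerm implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String field;
    private final String text;
    private transient int hash; // 0 means "not computed yet"; never written to the stream

    CachedTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) { // recomputed lazily, including after deserialization
            h = 31 * field.hashCode() + text.hashCode();
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CachedTerm)) return false;
        CachedTerm other = (CachedTerm) o;
        return field.equals(other.field) && text.equals(other.text);
    }
}
```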
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653545#action_12653545 ] John Wang commented on LUCENE-1473: --- The discussion here is whether it is better to fail 100% of the time or 10% of the time (these are just made-up numbers to make a point). I do buy Doug's comment about getting into a weird state due to data serialization, but this is something Externalizable would solve. This discussion has digressed into general Java serialization design, whereas it was originally scoped to only several Lucene classes. If it is documented that Lucene only supports serialization between classes from the same jar, is that really enough? Doesn't it also depend on the compiler, if someone were to build their own jar? Furthermore, in a distributed environment with lots of machines, it is always ideal to upgrade bit by bit. Is taking this functionality away by imposing this restriction a good trade-off compared to just implementing Externalizable for a few classes, if Serializable is deemed dangerous (which I am not so sure of, given the Lucene classes we are talking about)?
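The compiler dependence John raises comes from how the default serialVersionUID is derived. When a class does not declare one, the runtime computes a hash over the class's structure, and the java.io.Serializable javadoc warns that this computation is sensitive to details that can vary between compiler implementations. The following sketch (the class names are hypothetical) uses java.io.ObjectStreamClass to show which UID the runtime will actually use.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical class relying on the computed default UID.
class ImplicitUidQuery implements Serializable {
    String field;
}

// Hypothetical class declaring its UID explicitly.
class ExplicitUidQuery implements Serializable {
    private static final long serialVersionUID = 42L;
    String field;
}

class UidInspector {
    /**
     * Returns the serialVersionUID the serialization runtime will use for cls:
     * the declared value if present, otherwise a hash computed from the class's
     * structure (name, modifiers, interfaces, fields, constructors, methods).
     */
    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }
}
```

Running the same inspection against a class built by two different compilers is one way to check whether their default UIDs agree; declaring the field explicitly removes that question but, as Doug notes, not the compatibility question itself.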
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653553#action_12653553 ] Doug Cutting commented on LUCENE-1473: -- bq. This discussion has digressed to general Java serialization design, where it originally scoped only to several lucene classes. Which classes? The existing patch applies to one class. Jason said, "If it looks ok, I will implement Externalizable in other classes," but never said which. It would be good to know how wide the impact of the proposed change would be.
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653563#action_12653563 ] John Wang commented on LUCENE-1473: --- For our problem, it is Query and all its derived and encapsulated classes. I guess the title of the bug is too generic. As for my comment about other Lucene classes, one can just go to the Lucene javadoc, click on Tree, and look for Serializable. If you want me to, I can go and fetch the complete list, but here are some examples from the top-level API: 1) Document (Field etc.) 2) OpenBitSet, Filter ... 3) Sort, SortField 4) Term 5) TopDocs, Hits etc.
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653057#action_12653057 ] Mark Harwood commented on LUCENE-1473: -- The contrib section of Lucene contains an XML-based query parser which aims to provide full coverage of Lucene queries/filters and to provide extensibility to support 3rd-party classes. I use this regularly in distributed deployments, and it allows both non-Java clients and long-term persistence of queries with good stability across Lucene versions. Although I have not conducted formal benchmarks, XML parsing has never stood out to me as a bottleneck; search execution and/or document retrieval are normally the main bottlenecks. Maintaining XML parsing code is an overhead but ultimately helps decouple requests from the logic that executes them. In serializing Lucene Query/Filter objects we are dealing with classes which combine both the representation of the request criteria (what needs to be done) and the implementation (how things are done). We are forever finessing the "how" bit of this equation, e.g. moving from RangeQuery to RangeFilters to TrieRangeFilter. The criteria, however, remain relatively static ("I just want to search on a range"), and so it is dangerous to build clients that refer directly to query implementation classes. The XML parser provides a language-independent abstraction for clients to define what they want to be done without being too tied to how it is implemented.
Cheers Mark
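A hedged sketch of the "what" versus "how" split Mark describes: the client sends an implementation-free XML description, the server parses it into a plain criteria object, and only the server decides which query or filter class executes it. The element and attribute names (`RangeFilter`, `fieldName`, `lowerTerm`, `upperTerm`) and the `RangeCriteria`, `RangeExecutor` and `CriteriaParser` types are assumptions for illustration. This is not the contrib XML query parser's actual API, and it uses only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

/** Hypothetical "what": an implementation-free description of a range request. */
final class RangeCriteria {
    final String field;
    final String lower;
    final String upper;

    RangeCriteria(String field, String lower, String upper) {
        this.field = field;
        this.lower = lower;
        this.upper = upper;
    }
}

/** Hypothetical "how": the server plugs the current implementation in behind this interface. */
interface RangeExecutor {
    String execute(RangeCriteria criteria);
}

final class CriteriaParser {
    /** Parses a hypothetical {@code <RangeFilter fieldName="..." lowerTerm="..." upperTerm="..."/>} element. */
    static RangeCriteria parse(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("malformed request: " + e.getMessage(), e);
        }
        if (!"RangeFilter".equals(root.getTagName())) {
            throw new IllegalArgumentException("unsupported element: " + root.getTagName());
        }
        // The client names only the criteria; no Lucene class appears in the request.
        return new RangeCriteria(
            root.getAttribute("fieldName"),
            root.getAttribute("lowerTerm"),
            root.getAttribute("upperTerm"));
    }
}
```

Because clients only ever produce the XML and the server only consumes `RangeCriteria`, the server could swap its `RangeExecutor` from a RangeQuery-based implementation to a filter-based one without any client change.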
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653058#action_12653058 ] robert engels commented on LUCENE-1473: --- Even better. Thanks Mark.
[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653109#action_12653109 ] Yonik Seeley commented on LUCENE-1473: -- bq. The contrib section of Lucene contains an XML-based query parser which aims to provide full-coverage of Lucene queries Thanks for the reminder... Solr has pluggable query parsers now, and I've been meaning to check this out as a way to provide a more programmatic query specification.