Hi Angelo, I agree with your answers to 1 and 2.
I have (minor) doubts about your proposed solutions for 3, 4 and 5. In a big data store like Cassandra, rows and columns are continuously added, updated and removed, and applications should be designed to deal with that. So as a developer I should be able to handle the case that a row or column I query no longer exists, and act on it. It should not cause a warning to be logged in some logfile, since it is something I expect to happen every now and then. Note that a stored value cannot be null, so both returning null and throwing a checked exception are valid options here. I would prefer returning null, as it avoids the need for try/catch and is slightly easier for developers to handle.

Regarding 6: making the API 'smarter' is not an option. Values stored in Cassandra are untyped (everything is stored as byte[]). Hector supports types, but only when type information is provided in advance. You cannot reproduce the original type, as this information is not persisted (e.g. '3' could be either a string or an integer). In my opinion an unchecked exception should be thrown, as the invoker clearly doesn't expect this error to occur and has no clue how to recover from it.

Regarding 7: I am not sure I understand what you mean. What should happen? What should our API do with unchecked exceptions thrown internally by the Hector API we invoke?

Digging into the Hector API, it seems that unchecked exceptions are thrown when the specified column family does not exist. When retrieved rows or columns do not exist, null is simply returned.

This brings me to these conclusions so far:

* ColumnFamilies are not changed often and should be considered 'fixed' (note that before 0.7 they could not be created/dropped at runtime). This means that methods like exists(columnFamily) can remain available for ColumnFamilies, and that it is reasonable to assume that API methods are only invoked on existing ColumnFamilies. When an API method is invoked on a ColumnFamily that does not exist, the invoker probably didn't expect this to happen and is unlikely to be capable of recovering from the error. So an unchecked exception should be thrown (as the Hector API already does internally).

* Rows, super columns and columns change very often and may come and go at any time. An invoker is therefore expected to be able to deal with these situations and recover from them. Therefore either a checked exception should be thrown or null should be returned. If the API method cannot possibly return null in normal situations, returning null is preferred.

* Because of the volatile behavior of rows and columns, the methods exists(columnFamilyName, rowKey) and exists(columnFamilyName, rowKey, superColumnName, columnName) should not be used, so they should be deprecated. As Marcel stated, constructions like if(!exists(cf,row)) setValue(cf, row, value) should be discouraged.

* If the provided input arguments are invalid, the invoker could have known this in advance. As this is unexpected and the invoker cannot recover from it (it should never invoke methods with invalid arguments in the first place), an unchecked exception should be thrown.

Hence:

1. Throw an unchecked exception (NullPointerException)
2. Throw an unchecked exception (IllegalArgumentException)
3. Throw an unchecked exception (e.g. IllegalArgumentException or the 'HInvalidRequestException' thrown internally by Hector)
4. Return null
5. Return null
6. Throw an unchecked exception (IllegalArgumentException)
7. Do nothing special; let Hector throw the unchecked exceptions

Do you agree with this conclusion?
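
To make the seven outcomes above concrete, here is a minimal sketch of how getValue could behave. It runs against a hypothetical in-memory map rather than Hector, so PersistenceSketch, Family, createColumnFamily, put and the "/" path separator are all illustrative assumptions, as is the UTF-8 / 4-byte integer encoding. The null check on clazz is also an assumption; only column family and row key were listed as required.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in; NOT the Amdatu or Hector API. */
final class PersistenceSketch {

    /** One column family: its type plus rowKey -> column path -> raw bytes. */
    private static final class Family {
        final boolean isSuper;
        final Map<String, Map<String, byte[]>> rows = new ConcurrentHashMap<>();
        Family(boolean isSuper) { this.isSuper = isSuper; }
    }

    private final Map<String, Family> families = new ConcurrentHashMap<>();

    void createColumnFamily(String name, boolean isSuper) {
        families.put(name, new Family(isSuper));
    }

    /** Helper for the example only: stores raw bytes in an existing family. */
    void put(String columnFamily, String rowKey, String superColumn, String column, byte[] value) {
        Family family = families.get(columnFamily);
        family.rows.computeIfAbsent(rowKey, k -> new ConcurrentHashMap<>())
                   .put(path(family, superColumn, column), value);
    }

    <T> T getValue(String columnFamily, String rowKey, String superColumn,
                   String column, Class<T> clazz) {
        // Case 1: null required arguments -> NullPointerException.
        Objects.requireNonNull(columnFamily, "columnFamily");
        Objects.requireNonNull(rowKey, "rowKey");
        Objects.requireNonNull(clazz, "clazz"); // assumption: clazz is required too

        // Case 3: unknown column family -> unchecked exception.
        Family family = families.get(columnFamily);
        if (family == null) {
            throw new IllegalArgumentException("Unknown column family: " + columnFamily);
        }
        // Case 2: arguments that do not match the column family type.
        if (column == null) {
            throw new IllegalArgumentException("column must not be null");
        }
        if (family.isSuper && superColumn == null) {
            throw new IllegalArgumentException("superColumn is required for a super column family");
        }
        if (!family.isSuper && superColumn != null) {
            throw new IllegalArgumentException("superColumn must be null for a standard column family");
        }
        // Case 4: the row is gone -> null, no logging.
        Map<String, byte[]> row = family.rows.get(rowKey);
        if (row == null) {
            return null;
        }
        // Case 5: the column is gone -> null.
        byte[] raw = row.get(path(family, superColumn, column));
        if (raw == null) {
            return null;
        }
        // Case 6: bytes cannot be read as the requested type -> unchecked exception.
        // Case 7 needs no code: exceptions from the underlying client would propagate.
        return decode(raw, clazz);
    }

    private static String path(Family family, String superColumn, String column) {
        return family.isSuper ? superColumn + "/" + column : column;
    }

    private static <T> T decode(byte[] raw, Class<T> clazz) {
        if (byte[].class.equals(clazz)) {
            return clazz.cast(raw);
        }
        if (String.class.equals(clazz)) {
            return clazz.cast(new String(raw, StandardCharsets.UTF_8));
        }
        if (Integer.class.equals(clazz)) {
            // The only mismatch this sketch can detect is a wrong length.
            if (raw.length != Integer.BYTES) {
                throw new IllegalArgumentException("Value is not a 4-byte integer");
            }
            return clazz.cast(ByteBuffer.wrap(raw).getInt());
        }
        throw new IllegalArgumentException("Unsupported type: " + clazz.getName());
    }
}
```

With this shape the caller needs no try/catch for missing rows or columns; a plain null check covers cases 4 and 5, and everything else signals a programming error.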
Regards,
Ivo

From: [email protected] [mailto:[email protected]] On Behalf Of Angelo van der Sijpt
Sent: Monday, 2 May 2011 12:54
To: [email protected]
Subject: Re: [Amdatu-developers] Cassandra Persistence Manager API

Hi,

I took a quick look at the rest of the PersistenceManager API, and the first thing I noticed is that it contains Hector API class references.

However, getting to your question: in general I feel you should only throw a checked exception if something has really gone wrong that can be usefully recovered from, and throw unchecked exceptions for (a) violated preconditions, since those won't occur for well-behaved code, and (b) real disasters.

Applying those rules, we get to

1. Throw a NullPointerException if any of the required parameters is null when the Javadoc states that it cannot be. This is a public API, so you should be defensive here, and fail quickly.
2. Ditto.
3, 4 and 5 are roughly in the same ballpark: I think they are part of the normal flow of your application, and you should be prepared to handle the absence of certain data. I would return null, and not throw an exception, since there is no reasonable way to recover from this, because we don't know exactly what is going on. I would, however, log this as a warning.
6. This is a hard one, and depends on how 'smart' you want your API to be. If you state 'if it's not the correct type, you're out of luck', I think an exception would be in order; perhaps even let the ClassCastException bubble up. (You can of course make the API smarter, trying to coerce the value you get into the requested type, perhaps even by parsing strings to numbers, but that is outside the scope of this question. Also, I think Cassandra would disapprove of this.)
7. Apparently, Hector uses the exception mechanism a bit differently than we do. I would not let myself be constrained by that, but rather use the mechanism I want.
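
On the coercion idea in point 6, the following sketch shows why raw bytes alone do not say which type was stored. It assumes UTF-8 strings and 4-byte big-endian integers, which is an illustrative encoding and not necessarily what Hector's serializers use; UntypedBytesSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing that untyped bytes are ambiguous. */
final class UntypedBytesSketch {

    static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] encode(int value) {
        // ByteBuffer writes big-endian by default.
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    static String asString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    static int asInt(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }
}
```

Under these assumptions the four bytes written for the string "ABCD" are identical to the bytes written for the integer 0x41424344, so both asString and asInt succeed on them and a 'smart' API would have to guess.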
As for the consistency: that is the way Cassandra works, and it represents the tradeoffs that have been made to get the scalability they require. When using the Amdatu Cassandra service, you should be very well aware of the 'eventual consistency' that Cassandra has. Therefore, I don't think our API should try to hide that and give a 'consistent feel': we would be lying to our users that way.

As for the usefulness of 'exists', I have no opinion either way.

My $0.0297 (that should come down to 2 euro-cents),

Angelo

On May 2, 2011, at 10:28 AM, Ivo Ladage-van Doorn wrote:

Hi All,

With the individual releases of subprojects, we are trying to 'finalize' some of the APIs exposed by these subprojects. One example of such an API is the Cassandra Persistence Manager, which provides a persistence API for Cassandra. While improving and extending the Javadoc describing this API, I realized that it still lacks a consistent approach to handling all kinds of errors. So I want to come up with some guidelines on when a method should throw a checked exception, when it should throw an unchecked exception and when it should return null. As an example I would like to discuss this for the following method, as it covers most use cases:

    <T> T getValue(String columnFamily, String rowKey, String superColumn, String column, Class<T> clazz);

This method returns the value from a column and/or super column for the specified row key in the specified column family. Now there are many reasons why this value cannot be returned:

1. Null input arguments. The specified column family and row key must not be null.
2. Invalid input arguments. If the specified column family is of type 'super', both superColumn and column must not be null. If the specified column family is of type 'standard', however, superColumn must be null and column must not be null.
3. Nonexistent columnFamily. The specified column family might not exist (e.g. deleted by another service/thread just before this method was invoked).
4. Nonexistent row key. The specified row key might not exist (anymore).
5. Nonexistent column. The specified column and/or super column might not exist.
6. The value stored in the column to be retrieved is not of type T (e.g. the value is stored as byte[] but retrieved as String).
7. The Hector API used internally by the Cassandra Persistence Manager throws an unchecked exception, for whatever reason. Note that the Hector API always throws unchecked exceptions, even in case of a query on a column family or row key that does not exist.

In this case, it may be worthwhile to look at how this is handled by the Hector API and the Thrift API (the Hector API invokes the Thrift API). But they made different choices: where the Thrift API mostly throws checked exceptions, the Hector API catches these exceptions and rewraps and rethrows them as unchecked exceptions (extending HectorException).

Also important to note is that because of Cassandra's consistency mechanism, API calls may fail at any point in time because column families, rows or columns have just been dropped or modified by another thread (as there is no locking mechanism in Cassandra). So one could question whether methods like exists() are very useful. On the other hand, exists(columnFamily) might still be useful, as column families are not created/dropped on a daily basis (as opposed to rows and columns).

So WDYT? Please provide an answer to what you would expect in the cases 1-7 as described above.
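
The doubt about exists() for rows can be illustrated with a single-threaded simulation of the read-side race. This is only a sketch: CheckThenActSketch is a hypothetical in-memory store, and the Runnable parameter stands in for another thread or service acting at the worst possible moment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical store used to simulate a check-then-act race deterministically. */
final class CheckThenActSketch {
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    boolean exists(String rowKey) { return rows.containsKey(rowKey); }
    String getValue(String rowKey) { return rows.get(rowKey); } // null when absent
    void setValue(String rowKey, String value) { rows.put(rowKey, value); }
    void remove(String rowKey) { rows.remove(rowKey); }

    /** Fragile: trusts the result of exists() after other writers may have run. */
    int fragileLength(String rowKey, Runnable otherWriter) {
        if (exists(rowKey)) {
            otherWriter.run();                 // simulated concurrent modification
            return getValue(rowKey).length();  // NullPointerException if the row vanished
        }
        return 0;
    }

    /** Robust: reads once and treats null as a normal outcome. */
    int robustLength(String rowKey, Runnable otherWriter) {
        otherWriter.run();                     // same simulated interference
        String value = getValue(rowKey);
        return value == null ? 0 : value.length();
    }
}
```

The fragile variant fails whenever the row is removed between the check and the read, while the robust variant does not need exists() at all.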
Regards,
Ivo

GX Software | Ivo Ladage-van Doorn | Product Architect | Wijchenseweg 111 | 6538 SW Nijmegen | The Netherlands | T +31(0)24 - 388 82 61 | F +31(0)24 - 388 86 21 | [email protected] | www.gxsoftware.com | twitter.com/GXSoftware

_______________________________________________
Amdatu-developers mailing list
[email protected]
http://lists.amdatu.org/mailman/listinfo/amdatu-developers

