Re: [Amdatu-developers] Cassandra Persistence Manager API

Ivo Ladage-van Doorn Mon, 02 May 2011 05:57:24 -0700

Hi Angelo,

I agree with your answers to 1 and 2.


I have (minor) doubts about your proposed solutions for 3, 4 and 5. Especially 
in a big data store like Cassandra, rows and columns are continuously added, 
updated and removed, and applications should be designed to deal with those 
situations. So as a developer I should be able to handle the case that a row or 
column I query no longer exists, and act upon it. It should not cause a 
warning message to be logged in some logfile, since it is something I expect to 
happen every now and then. Note that a stored value cannot be null, so both 
returning null and throwing a checked exception are valid options here. I would 
prefer returning null, as this avoids the need for try/catch and is slightly 
easier for developers to handle.
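A minimal Java sketch contrasting the two options for a missing row or column. `InMemoryColumnFamily`, `getValueOrNull`, `getValueOrThrow` and `ColumnNotFoundException` are hypothetical names, not part of the Persistence Manager API, and a `HashMap` stands in for Cassandra.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical checked exception, shown only for comparison with the null-returning style.
class ColumnNotFoundException extends Exception {
    ColumnNotFoundException(String message) { super(message); }
}

// Hypothetical in-memory stand-in for one column family: rowKey -> (column -> value).
class InMemoryColumnFamily {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>()).put(column, value);
    }

    // Option 1: a missing row or column yields null. Stored values are never null,
    // so null unambiguously means "not there (anymore)".
    String getValueOrNull(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Option 2: a missing row or column yields a checked exception.
    String getValueOrThrow(String rowKey, String column) throws ColumnNotFoundException {
        String value = getValueOrNull(rowKey, column);
        if (value == null) {
            throw new ColumnNotFoundException(rowKey + "/" + column);
        }
        return value;
    }
}

class Callers {
    // With null: an ordinary if-check, no try/catch needed.
    static String displayNameNull(InMemoryColumnFamily users, String userId) {
        String name = users.getValueOrNull(userId, "name");
        return name != null ? name : "<unknown>";
    }

    // With a checked exception: the same fallback requires a try/catch.
    static String displayNameChecked(InMemoryColumnFamily users, String userId) {
        try {
            return users.getValueOrThrow(userId, "name");
        } catch (ColumnNotFoundException e) {
            return "<unknown>";
        }
    }
}
```

Both callers behave the same; the null variant is simply shorter at every call site.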

Regarding 6: making the API 'smarter' is not an option. Values stored in 
Cassandra are untyped (all stored as byte[]). Hector supports types, but only 
if type information is provided in advance. You cannot reproduce the original 
type, as this information is not persisted (e.g. '3' could be either a string or 
an integer). In my opinion an unchecked exception should be thrown, as the 
invoker clearly doesn't expect this error to occur and has no clue how to 
recover from it.
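A small sketch of why the original type cannot be recovered. The caller's `Class<T>` is the only type information available, so the same bytes decode differently depending on what is requested, and only some mismatches can be detected. `ValueDecoder` and its 4-byte big-endian int encoding are assumptions for illustration, not Hector's serializers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder: Cassandra stores only byte[], so the requested
// Class<T> is the sole source of type information.
final class ValueDecoder {
    private ValueDecoder() {}

    static <T> T decode(byte[] raw, Class<T> clazz) {
        if (clazz == byte[].class) {
            return clazz.cast(raw.clone());
        }
        if (clazz == String.class) {
            // Any byte sequence decodes as UTF-8 (malformed input is replaced),
            // so a String request never fails, even for bytes written as an int.
            return clazz.cast(new String(raw, StandardCharsets.UTF_8));
        }
        if (clazz == Integer.class) {
            if (raw.length != Integer.BYTES) {
                // These bytes cannot be an int: the caller asked for the wrong type.
                throw new IllegalArgumentException(
                    "Expected " + Integer.BYTES + " bytes for Integer, got " + raw.length);
            }
            return clazz.cast(ByteBuffer.wrap(raw).getInt());
        }
        throw new IllegalArgumentException("Unsupported type: " + clazz.getName());
    }
}
```

The detectable mismatch (wrong length for an int) surfaces as an unchecked `IllegalArgumentException`; the undetectable one (any bytes read as a String) silently yields a value, which is exactly why the API cannot be made 'smarter'.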

Regarding 7: I'm not sure I understand what you mean; what should happen? What 
should our API do with unchecked exceptions thrown internally by the Hector API 
we invoke? Digging into the Hector API, it seems that unchecked exceptions are 
thrown when the specified column family does not exist. When retrieved 
rows or columns do not exist, null is simply returned.

This brings me to these conclusions so far:

*         Column families are rarely changed and should be considered 
'fixed' (note that before 0.7 they could not be created/dropped at runtime). 
This means that methods like exists(columnfamily) make sense for column 
families, and that it is reasonable to assume that API methods are only 
invoked on existing column families. When an API method is invoked on a 
column family that does not exist, the invoker probably didn't expect this to 
happen and is unlikely to be able to recover from this error. So an 
unchecked exception should be thrown (as the Hector API already does 
internally).

*         Rows, super columns and columns change very often and may come 
and go at any time. An invoker is therefore expected to be able to deal with 
these situations and recover from them. So either a checked exception 
should be thrown or null should be returned. If the API method cannot possibly 
return null in normal situations, returning null is preferred.

*         Because of the volatile behavior of rows and columns, the methods 
exists(columnFamilyName, rowKey) and exists(columnFamilyName, rowKey, 
superColumnName, columnName) cannot be relied upon and should therefore be 
deprecated. As Marcel stated, constructions like if(!exists(cf,row)) 
setValue(cf, row, value) should be discouraged.

*         If the provided input arguments are invalid, the invoker could have 
known this in advance. As this is unexpected and the invoker cannot recover 
from it (it should never invoke methods with invalid arguments), an 
unchecked exception should be thrown.

Hence:

1.       Throw an unchecked exception (NullPointerException)

2.       Throw an unchecked exception (IllegalArgumentException)

3.       Throw an unchecked exception (e.g. IllegalArgumentException or 
'HInvalidRequestException' thrown internally by Hector)

4.       Return null

5.       Return null

6.       Throw an unchecked exception (IllegalArgumentException)

7.       Do nothing special; let Hector throw the unchecked exceptions
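Taken together, the seven outcomes above could look roughly like the following sketch of `getValue`. It is a minimal in-memory model under stated assumptions: `SketchPersistenceManager`, `CfType` and the `setValue` helper are hypothetical, nested `HashMap`s stand in for Cassandra, only `String` and `byte[]` are supported as target types, and a null `clazz` is treated like the other null arguments.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical column family kinds, mirroring Cassandra's 'standard' and 'super'.
enum CfType { STANDARD, SUPER }

// Hypothetical in-memory stand-in for the Cassandra Persistence Manager.
// It makes no Hector calls, so case 7 (unchecked Hector exceptions simply
// propagating) has no counterpart here.
class SketchPersistenceManager {
    private final Map<String, CfType> types = new HashMap<>();
    // columnFamily -> rowKey -> column key -> raw bytes (values are untyped byte[])
    private final Map<String, Map<String, Map<String, byte[]>>> data = new HashMap<>();

    void createColumnFamily(String name, CfType type) {
        types.put(name, type);
        data.put(name, new HashMap<>());
    }

    // Test helper only; it skips the argument checks that getValue performs.
    void setValue(String cf, String rowKey, String superColumn, String column, byte[] value) {
        data.get(cf).computeIfAbsent(rowKey, k -> new HashMap<>())
            .put(columnKey(superColumn, column), value.clone());
    }

    <T> T getValue(String columnFamily, String rowKey, String superColumn,
                   String column, Class<T> clazz) {
        // 1. Null required arguments: NullPointerException.
        Objects.requireNonNull(columnFamily, "columnFamily");
        Objects.requireNonNull(rowKey, "rowKey");
        Objects.requireNonNull(clazz, "clazz");

        // 3. Nonexistent column family: unchecked exception.
        CfType type = types.get(columnFamily);
        if (type == null) {
            throw new IllegalArgumentException("Column family does not exist: " + columnFamily);
        }

        // 2. Invalid combination for this column family type: IllegalArgumentException.
        if (column == null
                || (type == CfType.SUPER && superColumn == null)
                || (type == CfType.STANDARD && superColumn != null)) {
            throw new IllegalArgumentException("Invalid (super)column for " + type + " column family");
        }

        // 4. Nonexistent row: null.
        Map<String, byte[]> row = data.get(columnFamily).get(rowKey);
        if (row == null) {
            return null;
        }

        // 5. Nonexistent (super)column: null.
        byte[] raw = row.get(columnKey(superColumn, column));
        if (raw == null) {
            return null;
        }

        // 6. Value cannot be read as T: IllegalArgumentException.
        return decode(raw, clazz);
    }

    private static String columnKey(String superColumn, String column) {
        // A NUL separator keeps super column and column names apart in this sketch.
        return superColumn == null ? column : superColumn + "\0" + column;
    }

    private static <T> T decode(byte[] raw, Class<T> clazz) {
        if (clazz == byte[].class) {
            return clazz.cast(raw.clone());
        }
        if (clazz == String.class) {
            return clazz.cast(new String(raw, StandardCharsets.UTF_8));
        }
        // A real implementation would need a serializer per supported type.
        throw new IllegalArgumentException("Cannot read stored bytes as " + clazz.getName());
    }
}
```

The order of the checks matters: the column family type must be known before the argument combination in case 2 can be validated.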

Do you agree with this conclusion?

Regards, Ivo

From: [email protected] 
[mailto:[email protected]] On Behalf Of Angelo van der Sijpt
Sent: maandag 2 mei 2011 12:54
To: [email protected]
Subject: Re: [Amdatu-developers] Cassandra Persistence Manager API

Hi,

I took a quick look at the rest of the PersistenceManager API, and the first 
thing I noticed is that it contains Hector API class references.

However, getting to your question: in general I feel you should only throw a 
checked exception if something has really gone wrong that can be usefully 
recovered from, and throw unchecked exceptions for (a) violated preconditions, 
since those won't occur for well-behaved code, and (b) real disasters. Applying 
those rules, we get to
1. throw a NullPointerException if any of the required parameters is null, 
while you state in Javadoc that it cannot be. This is a public API, so you 
should be defensive here, and fail quickly.
2. idem.
3, 4 and 5 are roughly in the same ballpark: I think they are part of the 
normal flow of your application, and you should be prepared to handle the 
absence of certain data; I would return null, and not throw an exception, since 
there is no reasonable way to recover from this, because we don't know exactly 
what is going on. I would, however, log this as a warning.
6. is a hard one, and depends on how 'smart' you want your API to be. If you 
state 'if it's not the correct type, you're out of luck', I think an exception 
would be in order; perhaps even let the ClassCastException bubble up. (You can 
of course make the API smarter, trying to coerce the value you get into the 
requested type, perhaps even by parsing strings to numbers, but that is outside 
the scope of this question. Also, I think Cassandra would disapprove of this.)
7. Apparently, Hector uses the exception mechanism a bit differently than we 
do. I would not let myself be constrained by that, but rather use the mechanism 
I want.

As for the consistency: that is the way Cassandra works, and represents the 
tradeoffs that have been made to get the scalability they require. When using 
the Amdatu Cassandra service, you should be very well aware of the 'eventual 
consistency' that Cassandra has. Therefore, I don't think our API should try to 
hide that and give a 'consistent feel': we would be lying to our users that 
way. As for the usefulness of 'exists', I have no opinion either way.
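A hypothetical sketch of why an exists() check gives no guarantee once other clients can modify data concurrently. `RacyStore`, `Readers` and the `otherClient` hook are invented for illustration; the hook deterministically simulates another client deleting the row between the two calls. The write variant Marcel mentioned, `if(!exists(cf,row)) setValue(cf, row, value)`, has the same window.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store used only to illustrate the race; not the Amdatu API.
class RacyStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();
    // Simulates another client acting right after exists() has answered.
    Runnable otherClient = () -> {};

    void set(String rowKey, String value) { rows.put(rowKey, value); }
    void delete(String rowKey) { rows.remove(rowKey); }

    boolean exists(String rowKey) {
        boolean present = rows.containsKey(rowKey);
        otherClient.run(); // the answer may already be stale when the caller sees it
        return present;
    }

    String getValue(String rowKey) { return rows.get(rowKey); }
}

class Readers {
    // Discouraged: exists() then getValue(); the row can vanish in between.
    static int lengthCheckThenRead(RacyStore store, String rowKey) {
        if (store.exists(rowKey)) {
            return store.getValue(rowKey).length(); // may throw NullPointerException
        }
        return 0;
    }

    // Preferred: a single read, then handle the null that means "not there (anymore)".
    static int lengthSingleRead(RacyStore store, String rowKey) {
        String value = store.getValue(rowKey);
        return value == null ? 0 : value.length();
    }
}
```

The single-read version has to handle null anyway, so the preceding exists() call adds a round trip without adding any certainty.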

My $0.0297 (that should come down to 2 euro-cents),

Angelo


On May 2, 2011, at 10:28 AM, Ivo Ladage-van Doorn wrote:


Hi All,

With the individual releases of subprojects, we are trying to 'finalize' some 
of the APIs exposed by these subprojects. One example of such API is the 
Cassandra Persistence Manager, which provides a persistence API for Cassandra. 
While improving and extending the javadoc describing this API, I realized that 
it still lacks a consistent approach to handling all kinds of errors. So I want 
to come up with some guidelines on when a method should throw a checked 
exception, when it should throw an unchecked exception and when it should 
return null.
As an example, I would like to discuss this for the following method, as it 
covers most use cases:

<T> T getValue(String columnFamily, String rowKey, String superColumn, String column, Class<T> clazz);

This method returns the value of a column and/or super column for the 
specified row key in the specified column family. There are many reasons 
why this value may not be returned, namely:

-          1. Null input arguments. The specified column family and row key 
must not be null.
-          2. Invalid input arguments. If the specified column family is of 
type 'super', both superColumn and column must not be null. If the specified 
column family is of type 'standard', however, superColumn must be null 
and column must not be null.
-          3. Nonexistent column family. The specified column family might not 
exist (e.g. because another service/thread deleted it just before this method 
was invoked).
-          4. Nonexistent row key. The specified row key might not exist 
(anymore).
-          5. Nonexistent column. The specified column and/or super column might 
not exist.
-          6. The value stored in the column to be retrieved is not of type T 
(e.g. the value is stored as byte[] but retrieved as String).
-          7. The Hector API used internally by the Cassandra Persistence 
Manager throws an unchecked exception, for whatever reason. Note that the 
Hector API always throws unchecked exceptions, even in case of a query on a 
column family or row key that does not exist.

In this case, it may be worthwhile to look at how this is handled by the Hector 
API and the Thrift API (the Hector API invokes the Thrift API). However, they 
made different choices: whereas the Thrift API mostly throws checked exceptions, 
the Hector API catches these exceptions and wraps and rethrows them as unchecked 
exceptions (extending HectorException). Also important to note is that, because 
of Cassandra's consistency mechanism, API calls may fail at any point in time 
because column families, rows or columns have just been dropped or 
modified by another thread (as there is no locking mechanism in Cassandra). So 
one could question whether methods like exists() are very useful. On the other 
hand, exists(columnFamily) might still be useful, as column families are not 
created/dropped on a daily basis (as opposed to rows and columns).
So WDYT? Please describe what you would expect in cases 1-7 as 
described above.
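For reference, the checked-to-unchecked translation described above follows a common Java pattern, sketched here with hypothetical types: `LowLevelClient` and `LowLevelRequestException` stand in for a Thrift-style client with checked exceptions, and `StoreException` plays the role HectorException plays in Hector. None of these are the real Thrift or Hector classes.

```java
// Hypothetical checked exception, standing in for a Thrift-style checked error.
class LowLevelRequestException extends Exception {
    LowLevelRequestException(String message) { super(message); }
}

// Hypothetical unchecked base exception, analogous in spirit to HectorException.
class StoreException extends RuntimeException {
    StoreException(String message, Throwable cause) { super(message, cause); }
}

// Hypothetical low-level client whose signature declares a checked exception.
interface LowLevelClient {
    byte[] get(String columnFamily, String rowKey, String column) throws LowLevelRequestException;
}

// Wrapper that rethrows checked exceptions as unchecked ones, keeping the cause.
class UncheckedClient {
    private final LowLevelClient delegate;

    UncheckedClient(LowLevelClient delegate) { this.delegate = delegate; }

    byte[] get(String columnFamily, String rowKey, String column) {
        try {
            return delegate.get(columnFamily, rowKey, column);
        } catch (LowLevelRequestException e) {
            throw new StoreException("Request failed for " + columnFamily + "/" + rowKey, e);
        }
    }
}
```

Keeping the original exception as the cause preserves the diagnostic detail while freeing callers from mandatory try/catch blocks.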

Regards, Ivo


GX Software | Ivo Ladage-van Doorn | Product Architect | Wijchenseweg 111 | 
6538 SW Nijmegen | The Netherlands | T +31(0)24 - 388 82 61 | F +31(0)24 - 388 
86 21 
|[email protected]<mailto:[email protected]> 
| www.gxsoftware.com<http://www.gxsoftware.com> | 
twitter.com/GXSoftware<http://twitter.com/GXSoftware>

_______________________________________________
Amdatu-developers mailing list
[email protected]<mailto:[email protected]>
http://lists.amdatu.org/mailman/listinfo/amdatu-developers
