Re: CQL injection attacks?

2011-07-03 Thread dnallsopp
Quoting Eric Evans:

> On Sat, 2011-07-02 at 19:17 +0100, dnalls...@taz.qinetiq.com wrote:
> > Just to illustrate; the typical injection pattern is:
> > select * from users where KEY='jsmith'; DROP COLUMNFAMILY 'users';
>
> No, each CQL query must contain exactly one statement, so this sort of
> attack would not work.

Excellent, that changes the picture enormously! I guess it might be worth adding
this fact to the preamble of the documentation?

[...]

> TTBMK, there are currently no drivers with bugs that egregious, so make
> use of the driver's parameter substitution, sanitize your input, and you
> shouldn't have anything to worry about (there is almost certainly less
> risk of an injection attack than with SQL).
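
As a rough illustration of the "sanitize your input" advice, here is a minimal Python sketch of whitelist validation applied before a value goes anywhere near a query string. The function name require_safe_key and the allowed character set are assumptions made for this example; they do not come from any CQL driver.

```python
import re

# Hypothetical policy for this sketch: keys may contain only letters,
# digits, '_', '.' and '-', and must be 1 to 64 characters long.
_SAFE_KEY = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def require_safe_key(value: str) -> str:
    """Return value unchanged if it matches the whitelist, else raise."""
    if not _SAFE_KEY.fullmatch(value):
        raise ValueError("rejected unsafe key: %r" % (value,))
    return value


if __name__ == "__main__":
    print(require_safe_key("jsmith"))
    try:
        require_safe_key("jsmith'; DROP COLUMNFAMILY 'users")
    except ValueError as exc:
        print(exc)
```

Rejecting anything outside a known-good alphabet is usually easier to reason about than trying to escape every dangerous character, but it only suits fields whose legal values are that restricted.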

Thanks very much,

David.




This message was sent using IMP, the Internet Messaging Program.

This email and any attachments to it may be confidential and are
intended solely for the use of the individual to whom it is addressed.
If you are not the intended recipient of this email, you must neither
take any action based upon its contents, nor copy or show it to anyone.
Please contact the sender if you believe you have received this email in
error. QinetiQ may monitor email traffic data and also the content of
email for the purposes of security. QinetiQ Limited (Registered in
England & Wales: Company Number: 3796233) Registered office: Cody Technology 
Park, Ively Road, Farnborough, Hampshire, GU14 0LX http://www.qinetiq.com.


Re: CQL injection attacks?

2011-07-02 Thread dnallsopp
Quoting Stephen Connolly:

All,

As Stephen said, regardless of the transfer protocol, if the content is parsed,
then there is the potential for attacks.

Just to illustrate; the typical injection pattern is:

// Vulnerable: untrusted input is concatenated straight into the query text.
String user = getUserName();
String cql = "select * from users where KEY='" + user + "';";
execute_cql(cql);

Now, if the user string is obtained from an external source (e.g. web form or
other UI), then the attacker may enter a username of:

jsmith'; DROP COLUMNFAMILY 'users

which results in a CQL query of:

select * from users where KEY='jsmith'; DROP COLUMNFAMILY 'users';

Ouch.

See also the obligatory XKCD cartoon: http://xkcd.com/327/

I guess one way to protect against this would be to pre-encode 'tainted' inputs
as hex bytes, e.g. (using the examples from
https://github.com/rantav/hector/wiki/Using-CQL)

update Standard1 set '626972746879656172' = '31393736' WHERE KEY =
'6d796b657931'

instead of

update StandardLong1 set 'birthyear' = '1976' WHERE KEY = 'mykey1'

which ensures that there aren't any single quotes or other dangerous characters
in those inputs - though I'm not sure if this works if you've set
validators/comparators other than BytesType?
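
A minimal Python sketch of the hex pre-encoding idea, in case it helps to see it spelled out. The helper names hex_literal and build_update are made up for this example, and it assumes the column family name comes from trusted code rather than from user input.

```python
import binascii


def hex_literal(text: str) -> str:
    """Return text as a single-quoted hex string of its UTF-8 bytes."""
    return "'" + binascii.hexlify(text.encode("utf-8")).decode("ascii") + "'"


def build_update(column_family: str, name: str, value: str, key: str) -> str:
    # Only name, value and key are treated as tainted here; column_family
    # is assumed to be a constant chosen by the application.
    return "update %s set %s = %s WHERE KEY = %s" % (
        column_family, hex_literal(name), hex_literal(value), hex_literal(key))


if __name__ == "__main__":
    print(build_update("Standard1", "birthyear", "1976", "mykey1"))
    # The encoded form contains only the characters 0-9 and a-f.
    print(hex_literal("jsmith'; DROP COLUMNFAMILY 'users"))
```

Because the encoded value can only contain hex digits, a quote in the input can never terminate the literal early.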

> nate,
>
> that is not relevant. cql is a text query that gets parsed. without
> parameters you have to build the query by string concatenation. if i give
> you a string which contains a single quote, unless you have written your app
> to escape that quote, i can force a corrupted query on you that does
> something else. .. cql injection attacks
>
> - Stephen
> On 30 Jun 2011 20:20, "Nate McCall" wrote:
> > The CQL drivers are all still sitting on top of the execute_cql_query
> > Thrift API method for now.
> >
> > On Wed, Jun 29, 2011 at 2:12 PM,  wrote:
> >>
> >> Someone asked a while ago whether Cassandra was vulnerable to injection
> >> attacks:
> >>
> >> http://stackoverflow.com/questions/5998838/nosql-injection-php-phpcassa-cassandra
> >>
> >> With Thrift, the answer was 'no'.
> >>
> >> With CQL, presumably the situation is different, at least until prepared
> >> statements are possible (CASSANDRA-2475)?
> >>
> >> Has there been any discussion on this already that someone could point me
> >> to, please? I couldn't see anything on JIRA (searching for CQL AND
> >> injection, CQL AND security, etc).








Re: faster ByteBuffer comparison

2011-07-02 Thread dnallsopp
Quoting Yang:

I'd guess that getLong() is not faster because get() probably already benefits
from processor cache etc.

There are two concrete subclasses of ByteBuffer that implement get() -
HeapByteBuffer and DirectByteBuffer (for mapped memory).

It might be possible to optimise the comparison a little for the case where two
HeapByteBuffers are being compared - you could create a subclass with a
compareUnsigned method that directly accesses the byte[] in ByteBuffer, rather
than calling get().

However, you'd need to check how often this case occurs, and then benchmark
whether it really is any faster - Hotspot may optimise the current code well
enough that it makes little difference.

Optimising DirectByteBuffer comparisons, or a mixture of HeapByteBuffer and
DirectByteBuffer, looks unpromising, as DirectByteBuffer uses non-public
classes like sun.misc.Unsafe.
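
For reference, here is a sketch of the two comparison strategies being discussed: byte by byte, and eight bytes at a time. It is written in Python purely to show that both give the same ordering when each chunk is read as an unsigned big-endian integer (big-endian being ByteBuffer's default byte order). It says nothing about JVM performance, and the function names are made up for this example.

```python
def compare_unsigned_bytewise(a: bytes, b: bytes) -> int:
    """Lexicographic comparison treating each byte as unsigned (0..255)."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared bytes are equal, so the shorter buffer sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_unsigned_chunked(a: bytes, b: bytes, width: int = 8) -> int:
    """Same ordering, comparing width bytes at a time as big-endian ints."""
    common = min(len(a), len(b))
    full = common - common % width  # bytes covered by whole chunks
    for i in range(0, full, width):
        x = int.from_bytes(a[i:i + width], "big")
        y = int.from_bytes(b[i:i + width], "big")
        if x != y:
            return -1 if x < y else 1
    # Compare whatever is left over one byte at a time.
    return compare_unsigned_bytewise(a[full:], b[full:])


if __name__ == "__main__":
    left = b"\x00" * 8 + b"\x80"
    right = b"\x00" * 8 + b"\x7f"
    print(compare_unsigned_bytewise(left, right),
          compare_unsigned_chunked(left, right))
```

In Java the chunked variant needs extra care, because getLong() returns a signed long and the comparison has to be made unsigned; that overhead is one possible reason it did not turn out faster, though only a benchmark would tell.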

> I can see from profiling that a lot of the time in both reading and writing
> is spent on ByteBuffer comparison of the column names (for long rows with many
> columns)
>
> I looked at the ByteBufferUtil.unsignedCompareByteBuffer() , it's basically
> the same structure as standard JVM ByteBuffer.compare()
> looping over each byte doing a ByteBuffer.get()
>
> is there a faster (probably hardware-based) compare ? I tried doing 8 bytes
> at a time by doing getLong() and it actually seems slower
>
> thanks
> Yang
>







Re: InvalidRequestException when inserting - why?

2011-07-01 Thread dnallsopp
Quoting Jonathan Ellis:

> On Fri, Jul 1, 2011 at 7:12 AM,  wrote:
> > I assume there's something wrong with the data (the column has
> > validation_class: UTF8Type, so is it because I'm inserting non-UTF8 bytes?)
> > but the exception doesn't explain.
>
> That would do it, but it looks like you've cut off the rest of the
> exception message so it's hard to say.






InvalidRequestException when inserting - why?

2011-07-01 Thread dnallsopp

When attempting to insert a column, I get the following exception:

InvalidRequestException(why="[Keyspace][ColumnFamily][9cc58234708d] =
[6a53ac0452f67acd71b35463d475762b7f69cc0ea7f9e0cb0ca24f0e45170d48dafae04bf7b966fa75c7fb2bad0eace0ff23b265e8b0e35c7b0bbc2a516bb75b2007eb35ab1308b8c646428e049184024464eacb481b6168dc166f57ba2a66375bb10501051383cfdd558e8cc897aeba7e1a732d7c30c9bb73fddf0b8b0b22a3b95e26c3a85008347d9058c0634762d18a7110599b23579be4b84e878ce99009df2d78f83d2232eeb59d1a89ddaf764cdc2098d0e42ab54c73a3eea70210ba54171210d48588b5b302ce7deb99e414c00ab2a5b72b14a3256884059bc6961b4dab7472606a655795e807fd78bba10a5e24d1aa6899e0ee5713b6ff99e034c0ecc7f7bb7b60aa6fd6875ee57067eed89a43e3b06bdc63dcd7bdd5d6fa056f5ae9a597c7beece8bef27c6c7dd32664f2b

I assume there's something wrong with the data (the column has validation_class:
UTF8Type, so is it because I'm inserting non-UTF8 bytes?) but the exception
doesn't explain.
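
One way to test that guess on the client side is to check whether the bytes decode as UTF-8 before sending them. This is only a sketch in Python 3 syntax; the helper name is_valid_utf8 is made up, and it is not part of Pycassa.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_valid_utf8("status".encode("utf-8")))
    # First few bytes of the value in the exception above: 0xac is a
    # continuation byte with no lead byte before it, so decoding fails.
    print(is_valid_utf8(bytes.fromhex("6a53ac0452f6")))
```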

Could someone tell me the reason please? (and could this exception be tweaked to
provide the reason?)

Thanks,

David.

Cassandra 0.7.6, Pycassa 1.0.8. The hex values above are random junk, not real
data.




CQL injection attacks?

2011-06-29 Thread dnallsopp

Someone asked a while ago whether Cassandra was vulnerable to injection attacks:

http://stackoverflow.com/questions/5998838/nosql-injection-php-phpcassa-cassandra

With Thrift, the answer was 'no'.

With CQL, presumably the situation is different, at least until prepared
statements are possible (CASSANDRA-2475)?

Has there been any discussion on this already that someone could point me to,
please? I couldn't see anything on JIRA (searching for CQL AND injection, CQL
AND security, etc).

Thanks.




Cannot set column value to zero

2011-06-29 Thread dnallsopp
I had a strange problem recently where I was unable to set the value of a column
to '0' (it always returned '1') but setting it to other values worked fine:

[default@Test] set Urls['rowkey']['status']='1';
Value inserted.
[default@Test] get Urls['rowkey'];
=> (column=status, value=1, timestamp=1309189541891000)
Returned 1 results.

[default@Test] set Urls['rowkey']['status']='0';
Value inserted.
[default@Test] get Urls['rowkey'];
=> (column=status, value=1, timestamp=1309189551407616)
Returned 1 results.

This was on a one-node test cluster (v0.7.6) with no other clients; setting
other values (e.g. '9') worked fine. However, attempting to set the value back
to '0' always resulted in a value of '1'.

I noticed this shortly after truncating the CF.

The column family definition is shown below. One thing that looks odd is that
on other test clusters the Column Name is followed by a reference to
the index, e.g. "Column Name: status (737461747573)" - but here it isn't.

I was wondering if there was some interaction between truncating the CF and the
use of a KEYS index? (Presumably it would be safer to delete all data
directories in order to wipe the cluster during experimentation, rather than
truncating?)

Unfortunately I'm not sure how to recreate the situation as this was a test
machine on which I played around with various configurations - but maybe
someone has seen a similar problem elsewhere? In the end I had to wipe the data
and start again, and all seemed fine, although the index reference is still
absent as mentioned above.

[default@Test] describe keyspace;
Keyspace: Test:
...
ColumnFamily: Foo
  default_validation_class: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 0.0/14400
  Memtable thresholds: 0.5/128/60 (millions of ops/minutes/MB)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: [Foo.737461747573]
  Column Metadata:
    Column Name: status
      Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Index Type: KEYS
...




Priority queue in a single row - performance falls over time

2011-05-25 Thread dnallsopp


Hi all,

I'm trying to implement a priority queue for holding a large number (millions)
of items that need to be processed in time order. My solution works - but gets
slower and slower until performance is unacceptable - even with a small number
of items.

Each item essentially needs to be popped off the queue (some arbitrary work is
then done) and then the item is returned to the queue with a new timestamp
indicating when it should be processed again. We thus cycle through all work
items eventually, but some may come around more frequently than others.

I am implementing this as a single Cassandra row, in a CF with a TimeUUID
comparator.

Each column name is a TimeUUID, with an arbitrary column value describing the
work item; the columns are thus sorted in time order.

To pop items, I do a get() such as:

 cf.get(row_key, column_finish=now, column_start=yesterday, column_count=1000)

to get all the items at the head of the queue (if any) whose scheduled time has
already passed, i.e. falls between yesterday and the current system time.

For each item retrieved, I do a delete to remove the old column, then an insert
with a fresh TimeUUID column name (system time + arbitrary increment), thus
putting the item back somewhere in the queue (currently, the back of the queue).
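
To make "system time + arbitrary increment" concrete, here is a sketch of building a version 1 (time-based) UUID for an arbitrary Unix timestamp using only the Python standard library. The helper name time_uuid and its fixed clock_seq and node defaults are assumptions for this example; a real client would normally use whatever helper its driver provides and would vary the node or clock sequence to avoid collisions.

```python
import time
import uuid

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def time_uuid(unix_seconds: float, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Build a version 1 UUID whose timestamp is unix_seconds."""
    ts = int(unix_seconds * 1e7) + UUID_EPOCH_OFFSET
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000    # version 1
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


if __name__ == "__main__":
    now = time.time()
    requeue_at = time_uuid(now + 60)  # due again in one minute
    print(requeue_at, requeue_at.version)
```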

I do a batch_mutate for all these deletes and inserts, with a queue size of
2000. These are currently interleaved i.e. delete1-insert1-delete2-insert2...
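
The pop-and-requeue cycle described above can be modelled in memory as follows. This is only a stand-in for the logic, using Python's heapq in place of the Cassandra row; it does not talk to a cluster, it says nothing about the performance problem, and the name run_cycle and its parameters are made up for this example.

```python
import heapq


def run_cycle(queue, now, delay, batch_size=1000):
    """Pop every item due at or before now (up to batch_size), then requeue
    each one with a new due time of now + delay. Returns the popped items."""
    processed = []
    while queue and queue[0][0] <= now and len(processed) < batch_size:
        _due, item = heapq.heappop(queue)   # the "delete" of the old column
        processed.append(item)
    for item in processed:
        heapq.heappush(queue, (now + delay, item))  # the "insert" of the new one
    return processed


if __name__ == "__main__":
    queue = [(1, "a"), (2, "b"), (10, "c")]
    heapq.heapify(queue)
    print(run_cycle(queue, now=5, delay=100))
    print(sorted(queue))
```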

This all appears to work correctly, but the performance starts at around 8000
cycles/sec, falls to around 1800/sec over the first 250K cycles, and continues
to fall over time, down to about 150/sec, after a few million cycles. This
happens regardless of the overall size of the row (I have tried sizes from 1000
to 100,000 items). My target performance is 1000 cycles/sec (but my data store
will need to handle other work concurrently).

I am currently using just a single node running on localhost, using a pycassa
client. 4 core, 4GB machine, Fedora 14.

Is this expected behaviour (is there just too much churn for a single row to
perform well), or am I doing something wrong?

Would https://issues.apache.org/jira/browse/CASSANDRA-2583 in version 0.8.1 fix
this problem (I am using version 0.7.6)?

Thanks!

David.

