Re: Storing values of mixed types in a list

2014-06-24 Thread Tuukka Mustonen
Unfortunately, I need to query by list items. That's why I'm running
Cassandra 2.1rc1 (it offers secondary indexes for collections).

I'm also studying Dynamo; it seems to be somewhat more dynamic by nature
and allows mixed-type lists. As I understand it, Cassandra also supports
dynamic schemas, but only through the Thrift protocol. Still, I don't think
that changes the fact that collections need to be strongly typed in
Cassandra, no matter which protocol is used?

Tuukka



On Tue, Jun 24, 2014 at 9:41 PM, DuyHai Doan  wrote:

> "Jeremy, with blob field (ByteBuffer), I can query exact matches (just
> encode the value in query), but greater/less than queries would not work.
> Any sort of serialization kills "native" ways to query data" --> Not
> necessarily. You still use "normal" types (uuid, string, timestamp,...) for
> clustering columns and use them for querying. For the cells where you store
> values, use blob type.
>
>
>
>
> On Tue, Jun 24, 2014 at 8:21 PM, Tuukka Mustonen <
> tuukka.musto...@gmail.com> wrote:
>
>> What if I need to query by list items?
>>
>> 1. Jeremy, with blob field (ByteBuffer), I can query exact matches (just
>> encode the value in query), but greater/less than queries would not work.
>> Any sort of serialization kills "native" ways to query data
>> 2. Even with user defined types, I would need to define separate fields
>> for each value. Running queries would be cumbersome (something like WHERE
>> items CONTAINS {'text_value': 'foobar'} or WHERE items CONTAINS
>> {'int_value': 3}). Pavel, did you mean like this?
>>
>> I'm running 2.1rc1 with python driver 2.0.2.
>>
>> Tuukka
>>
>>
>> On Tue, Jun 24, 2014 at 4:39 PM, Pavel Kogan 
>> wrote:
>>
>>> 1) You can use a list of strings which are serialized JSONs, or use a
>>> ByteBuffer with your own serialization, as Jeremy suggested.
>>> 2) Use Cassandra 2.1 (not officially released yet) where there is a new
>>> feature: user-defined types.
>>>
>>> Pavel
>>>
>>>
>>>
>>>
>>> On Tue, Jun 24, 2014 at 9:18 AM, Jeremy Jongsma 
>>> wrote:
>>>
 Use a ByteBuffer value type with your own serialization (we use
 protobuf for complex value structures)
  On Jun 24, 2014 5:30 AM, "Tuukka Mustonen" 
 wrote:

> Hello,
>
> I need to store a list of mixed types in Cassandra. The list may
> contain numbers, strings and booleans. So I would need something like
> list<int | text | boolean>.
>
> Is this possible in Cassandra and if not, what workaround would you
> suggest for storing a list of mixed type items? I sketched a few (using a
> list per type, using list of user types in Cassandra 2.1, etc.), but I get
> a bad feeling about each.
>
> Couldn't find an "exact" answer to this through searches...
> Regards,
> Tuukka
>
> P.S. I first asked this at SO before realizing the traffic there is
> very low:
> http://stackoverflow.com/questions/24380158/storing-a-list-of-mixed-types-in-cassandra
>
>
>>>
>>
>


Re: EC2 cassandra cluster node address problem

2014-06-24 Thread Andrey Ilinykh
You can set rpc_address to 0.0.0.0; then it will listen on all interfaces.
You also have to modify the security group settings to allow incoming
connections on port 9160. But it is a really bad idea: this way you open
your cluster to the whole world. An SSH tunnel is the best way.


On Tue, Jun 24, 2014 at 10:01 PM, Huiliang Zhang  wrote:

> Thanks. Is there a way to configure Cassandra to use the elastic IPs instead
> of the private IPs?
>
>
> On Tue, Jun 24, 2014 at 9:29 PM, Andrey Ilinykh 
> wrote:
>
>> Cassandra knows nothing about elastic IPs. You have to use an SSH tunnel or
>> run your client on an EC2 instance.
>>
>> Thank you,
>>   Andrey
>>
>>
>> On Tue, Jun 24, 2014 at 8:55 PM, Huiliang Zhang  wrote:
>>
>>> Hi,
>>>
>>> I am using Cassandra on EC2 instances. My Cassandra nodes always return
>>> the private IPs of the instances to the Thrift program, and the program
>>> cannot connect to those private IPs.
>>>
>>> I already changed the
>>> rpc_address: elastic ip
>>> rpc_address: elastic ip
>>>
>>> Then I restarted the Cassandra cluster. But system.peers still stores
>>> the private IPs as the peer addresses.
>>>
>>> How to fix this?
>>>
>>> Thanks,
>>> Huiliang
>>>
>>>
>>
>


Re: Use Cassandra thrift API with collection type

2014-06-24 Thread Huiliang Zhang
Yes, I realized that CQL is the way to do it.
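
For reference, the CQL route looks roughly like this with the Java driver (a
sketch only; the table and column names are invented, mirroring my
ascii/decimal types):

// uses java.util.*, java.math.BigDecimal and an open Session
session.execute("CREATE TABLE prices (id uuid PRIMARY KEY, " +
        "quotes map<ascii, decimal>)");

Map<String, BigDecimal> quotes = new HashMap<String, BigDecimal>();
quotes.put("open", new BigDecimal("12.34"));
// the driver maps CQL ascii <-> String and decimal <-> BigDecimal
session.execute("INSERT INTO prices (id, quotes) VALUES (?, ?)",
        UUID.randomUUID(), quotes);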

I checked how map data is represented using cassandra-cli. For each element
in the map, it uses the map key as part of the column name and the map value
as the column value. I just cannot insert this through the Thrift API
because I already defined a CompositeType column comparator. Is there a way
to run a second program that inserts the map data with a different
comparator?

Thanks.


On Mon, Jun 23, 2014 at 10:21 AM, Sylvain Lebresne 
wrote:

> On Mon, Jun 23, 2014 at 6:19 PM, James Campbell <
> ja...@breachintelligence.com> wrote:
>
>>  Huiliang,
>>
>>
>>  Since there hasn't been another reply yet, I'll throw out an idea that
>> worked for us as part of a test, though it does not seem exactly like a
>> "preferred" way since it crosses code-bases.  We built the value using a
>> straight Java type, then used the DataStax v2 driver's DataType class
>> serializer.
>>
>>
>>  Concretely, it would look like the following (adapting your code):
>>
>> Column column = new Column();
>> column.name = columnSerializer.toByteBuffer(colname); // the column name of
>> // the map type; it works with other kinds of data type
>>
>> column.value = DataType.map(DataType.ascii,
>> DataType.decimal).serialize(yourMapGoesHere);
>> column.timestamp = new Date().getTime();
>>
>> ...
>>
>
> This is exactly equivalent to what Huiliang posted and will thus not work
> any better.
>
> Collections are internally not stored as one "thrift column" per
> collection. Each element of the collection is a separate "thrift column",
> and the exact encoding depends on the collection type. Updating CQL
> collections from Thrift is technically possible, but it is not recommended
> in any way. I strongly advise you to stick to CQL if you want to use CQL
> collections.
>
>  --
> Sylvain
>
>>
>>
>>  --
>> *From:* Huiliang Zhang 
>> *Sent:* Friday, June 20, 2014 10:10 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Use Cassandra thrift API with collection type
>>
>> Hi,
>>
>>  I have a problem inserting data of the map type into a Cassandra
>> table. I tried all kinds of MapSerializers to serialize the Map data and
>> did not succeed.
>>
>>  My code is like this:
>> Column column = new Column();
>> column.name=columnSerializer.toByteBuffer(colname); // the
>> column name of the map type, it works with other kinds of data type
>> column.value =
>> MapSerializer.getInstance(AsciiSerializer.instance,
>> DecimalSerializer.instance).serialize(someMapData);
>> column.timestamp = new Date().getTime();
>>
>> Mutation mutation = new Mutation();
>> mutation.column_or_supercolumn = new ColumnOrSuperColumn();
>> mutation.column_or_supercolumn.column = column;
>> mutationList.add(mutation);
>>
>>  The data was written into the Cassandra DB; however, it cannot be
>> retrieved by CQL3, which fails with the following error:
>> ERROR 14:32:48,192 Exception in thread Thread[Thrift:4,5,main]
>> java.lang.AssertionError
>> at
>> org.apache.cassandra.cql3.statements.ColumnGroupMap.getCollection(ColumnGroupMap.java:88)
>> at
>> org.apache.cassandra.cql3.statements.SelectStatement.getCollectionValue(SelectStatement.java:1185)
>> at
>> org.apache.cassandra.cql3.statements.SelectStatement.handleGroup(SelectStatement.java:1169)
>> at
>> org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:1076)
>> ...
>>
>>  So the question is how to write map data into Cassandra via the Thrift
>> API. Any help is appreciated.
>>
>>  Thanks,
>>  Huiliang
>>
>>
>>
>>
>


Re: EC2 cassandra cluster node address problem

2014-06-24 Thread Huiliang Zhang
Thanks. Is there a way to configure Cassandra to use the elastic IPs instead
of the private IPs?


On Tue, Jun 24, 2014 at 9:29 PM, Andrey Ilinykh  wrote:

> Cassandra knows nothing about elastic IPs. You have to use an SSH tunnel or
> run your client on an EC2 instance.
>
> Thank you,
>   Andrey
>
>
> On Tue, Jun 24, 2014 at 8:55 PM, Huiliang Zhang  wrote:
>
>> Hi,
>>
>> I am using Cassandra on EC2 instances. My Cassandra nodes always return
>> the private IPs of the instances to the Thrift program, and the program
>> cannot connect to those private IPs.
>>
>> I already changed the
>> rpc_address: elastic ip
>> rpc_address: elastic ip
>>
>> Then I restarted the Cassandra cluster. But system.peers still stores
>> the private IPs as the peer addresses.
>>
>> How to fix this?
>>
>> Thanks,
>> Huiliang
>>
>>
>


Re: EC2 cassandra cluster node address problem

2014-06-24 Thread Andrey Ilinykh
Cassandra knows nothing about elastic IPs. You have to use an SSH tunnel or
run your client on an EC2 instance.

Thank you,
  Andrey


On Tue, Jun 24, 2014 at 8:55 PM, Huiliang Zhang  wrote:

> Hi,
>
> I am using Cassandra on EC2 instances. My Cassandra nodes always return
> the private IPs of the instances to the Thrift program, and the program
> cannot connect to those private IPs.
>
> I already changed the
> rpc_address: elastic ip
> rpc_address: elastic ip
>
> Then I restarted the Cassandra cluster. But system.peers still stores
> the private IPs as the peer addresses.
>
> How to fix this?
>
> Thanks,
> Huiliang
>
>


EC2 cassandra cluster node address problem

2014-06-24 Thread Huiliang Zhang
Hi,

I am using Cassandra on EC2 instances. My Cassandra nodes always return the
private IPs of the instances to the Thrift program, and the program cannot
connect to those private IPs.

I already changed the
rpc_address: elastic ip
rpc_address: elastic ip

Then I restarted the Cassandra cluster. But system.peers still stores the
private IPs as the peer addresses.

How to fix this?

Thanks,
Huiliang


Re: Does the default LIMIT apply to automatic paging?

2014-06-24 Thread DuyHai Doan
Yes. And I advise setting fetchSize to a value smaller than 10,000; 1,000 is
a good start. As long as there are still results, the iterator will fetch
data for you in batches of fetchSize.
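
With the Java driver that looks roughly like this (a minimal sketch assuming
an open Session; the table and column names are made up):

Statement stmt = new SimpleStatement(
        "SELECT * FROM wide_rows WHERE partition_key = 'p1'");
stmt.setFetchSize(1000);                 // page size, not a LIMIT
for (Row row : session.execute(stmt)) { // the ResultSet iterator fetches
    // process the row                  // the next page transparently
}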


On Tue, Jun 24, 2014 at 9:03 PM, ziju feng  wrote:

> Does that mean the iterator will give me all the data instead of 10,000
> rows?
>
>
> On Mon, Jun 23, 2014 at 10:20 PM, DuyHai Doan 
> wrote:
>
>> With the Java Driver,  set the fetchSize and use ResultSet.iterator
>> On Jun 24, 2014 at 01:04, "ziju feng" wrote:
>>
>> Hi All,
>>>
>>> I have a wide row table that I want to iterate through all rows under a
>>> specific partition key. The table may contain around one million rows per
>>> partition.
>>>
>>> I was wondering if the default 10,000-row LIMIT applies to automatic
>>> pagination in C* 2.0 (I'm using the DataStax driver). If so, what is the
>>> best way to retrieve all rows of a given partition? Should I use a super
>>> large LIMIT value or should I manually page through the table?
>>>
>>> Thanks,
>>>
>>> Ziju
>>>
>>
>


Re: Does the default LIMIT apply to automatic paging?

2014-06-24 Thread ziju feng
Does that mean the iterator will give me all the data instead of 10,000 rows?


On Mon, Jun 23, 2014 at 10:20 PM, DuyHai Doan  wrote:

> With the Java Driver,  set the fetchSize and use ResultSet.iterator
> On Jun 24, 2014 at 01:04, "ziju feng" wrote:
>
> Hi All,
>>
>> I have a wide row table that I want to iterate through all rows under a
>> specific partition key. The table may contain around one million rows per
>> partition.
>>
>> I was wondering if the default 10,000-row LIMIT applies to automatic
>> pagination in C* 2.0 (I'm using the DataStax driver). If so, what is the
>> best way to retrieve all rows of a given partition? Should I use a super
>> large LIMIT value or should I manually page through the table?
>>
>> Thanks,
>>
>> Ziju
>>
>


Re: Storing values of mixed types in a list

2014-06-24 Thread DuyHai Doan
"Jeremy, with blob field (ByteBuffer), I can query exact matches (just
encode the value in query), but greater/less than queries would not work.
Any sort of serialization kills "native" ways to query data" --> Not
necessarily. You still use "normal" types (uuid, string, timestamp,...) for
clustering columns and use them for querying. For the cells where you store
values, use blob type.
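
A rough sketch of what I mean (Java driver syntax; someId, someDate and
someByteBuffer are placeholders, all names invented):

// "normal" type for the clustering column, blob for the opaque value
session.execute("CREATE TABLE items (" +
        "id uuid, created_at timestamp, value blob, " +
        "PRIMARY KEY (id, created_at))");

// range queries still work against the typed clustering column...
session.execute("SELECT value FROM items WHERE id = ? AND created_at > ?",
        someId, someDate);

// ...while the value cell itself is written and read as raw bytes
session.execute("INSERT INTO items (id, created_at, value) VALUES (?, ?, ?)",
        someId, someDate, someByteBuffer);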




On Tue, Jun 24, 2014 at 8:21 PM, Tuukka Mustonen 
wrote:

> What if I need to query by list items?
>
> 1. Jeremy, with blob field (ByteBuffer), I can query exact matches (just
> encode the value in query), but greater/less than queries would not work.
> Any sort of serialization kills "native" ways to query data
> 2. Even with user defined types, I would need to define separate fields
> for each value. Running queries would be cumbersome (something like WHERE
> items CONTAINS {'text_value': 'foobar'} or WHERE items CONTAINS
> {'int_value': 3}). Pavel, did you mean like this?
>
> I'm running 2.1rc1 with python driver 2.0.2.
>
> Tuukka
>
>
> On Tue, Jun 24, 2014 at 4:39 PM, Pavel Kogan 
> wrote:
>
>> 1) You can use a list of strings which are serialized JSONs, or use a
>> ByteBuffer with your own serialization, as Jeremy suggested.
>> 2) Use Cassandra 2.1 (not officially released yet) where there is a new
>> feature: user-defined types.
>>
>> Pavel
>>
>>
>>
>>
>> On Tue, Jun 24, 2014 at 9:18 AM, Jeremy Jongsma 
>> wrote:
>>
>>> Use a ByteBuffer value type with your own serialization (we use protobuf
>>> for complex value structures)
>>>  On Jun 24, 2014 5:30 AM, "Tuukka Mustonen" 
>>> wrote:
>>>
 Hello,

 I need to store a list of mixed types in Cassandra. The list may
 contain numbers, strings and booleans. So I would need something like
 list<int | text | boolean>.

 Is this possible in Cassandra and if not, what workaround would you
 suggest for storing a list of mixed type items? I sketched a few (using a
 list per type, using list of user types in Cassandra 2.1, etc.), but I get
 a bad feeling about each.

 Couldn't find an "exact" answer to this through searches...
 Regards,
 Tuukka

 P.S. I first asked this at SO before realizing the traffic there is
 very low:
 http://stackoverflow.com/questions/24380158/storing-a-list-of-mixed-types-in-cassandra


>>
>


Re: Storing values of mixed types in a list

2014-06-24 Thread Tuukka Mustonen
What if I need to query by list items?

1. Jeremy, with blob field (ByteBuffer), I can query exact matches (just
encode the value in query), but greater/less than queries would not work.
Any sort of serialization kills "native" ways to query data
2. Even with user defined types, I would need to define separate fields for
each value. Running queries would be cumbersome (something like WHERE items
CONTAINS {'text_value': 'foobar'} or WHERE items CONTAINS {'int_value': 3}).
Pavel, did you mean like this?
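
Something like this untested sketch is what I have in mind (written against
the Java driver for concreteness; all names invented):

session.execute("CREATE TYPE item_value " +
        "(text_value text, int_value int, bool_value boolean)");
session.execute("CREATE TABLE docs " +
        "(id uuid PRIMARY KEY, items list<frozen<item_value>>)");
session.execute("CREATE INDEX ON docs (items)");

// CONTAINS compares whole elements, so this matches only list items whose
// other fields are null, which is what makes it feel cumbersome:
session.execute("SELECT * FROM docs WHERE items CONTAINS " +
        "{text_value: 'foobar'}");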

I'm running 2.1rc1 with python driver 2.0.2.

Tuukka


On Tue, Jun 24, 2014 at 4:39 PM, Pavel Kogan 
wrote:

> 1) You can use a list of strings which are serialized JSONs, or use a
> ByteBuffer with your own serialization, as Jeremy suggested.
> 2) Use Cassandra 2.1 (not officially released yet) where there is a new
> feature: user-defined types.
>
> Pavel
>
>
>
>
> On Tue, Jun 24, 2014 at 9:18 AM, Jeremy Jongsma 
> wrote:
>
>> Use a ByteBuffer value type with your own serialization (we use protobuf
>> for complex value structures)
>>  On Jun 24, 2014 5:30 AM, "Tuukka Mustonen" 
>> wrote:
>>
>>> Hello,
>>>
>>> I need to store a list of mixed types in Cassandra. The list may contain
>>> numbers, strings and booleans. So I would need something like
>>> list<int | text | boolean>.
>>>
>>> Is this possible in Cassandra and if not, what workaround would you
>>> suggest for storing a list of mixed type items? I sketched a few (using a
>>> list per type, using list of user types in Cassandra 2.1, etc.), but I get
>>> a bad feeling about each.
>>>
>>> Couldn't find an "exact" answer to this through searches...
>>> Regards,
>>> Tuukka
>>>
>>> P.S. I first asked this at SO before realizing the traffic there is very
>>> low:
>>> http://stackoverflow.com/questions/24380158/storing-a-list-of-mixed-types-in-cassandra
>>>
>>>
>


Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)

2014-06-24 Thread Kevin Burton
Yes… I confirmed that getBytesUnsafe works…

I also have a unit test for it, so if Cassandra ever changes anything we'll
pick it up.

One point about your code above: I still think charset lookups
(Charset.forName) are behind a synchronized code block.

So your code above wouldn't be super fast on multi-core machines. I usually
use Guava's Charsets class, since it has static references to all of them.
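
i.e. something like this sketch:

import com.google.common.base.Charsets;          // static Charset references

ByteBuffer b = row.getBytesUnsafe("aTextColumn");
String s = Charsets.UTF_8.decode(b).toString();  // no Charset.forName() call
// (on JDK 7+, java.nio.charset.StandardCharsets.UTF_8 works the same way)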

… just wanted to point that out since it could bite someone :-P …




On Tue, Jun 24, 2014 at 12:13 AM, Olivier Michallat <
olivier.michal...@datastax.com> wrote:

> Assuming we're talking about the DataStax Java driver:
>
> getBytes will throw an exception, because it validates that the column is
> of type BLOB. But you can use getBytesUnsafe:
>
> ByteBuffer b = row.getBytesUnsafe("aTextColumn");
> // if you want to check it:
> Charset.forName("UTF-8").decode(b);
>
> Regarding whether this will continue working in the future: from the
> driver's perspective, the fact that the native protocol uses UTF-8 is an
> implementation detail, but I doubt this will change any time soon.
>
>
>
>
> On Tue, Jun 24, 2014 at 7:23 AM, DuyHai Doan  wrote:
>
>> Good idea: the bytes are merely passed through by the server, so you're
>> saving a lot of CPU. AFAIK getBytes should work fine.
>> On Jun 24, 2014 at 05:50, "Kevin Burton" wrote:
>>
>> I'm building a webservice whereby I read the data from cassandra, then
>>> write it over the wire.
>>>
>>> It's going to push LOTS of content, and encoding/decoding performance
>>> has really bitten us in the past.  So I try to avoid transparent
>>> encoding/decoding if I can avoid it.
>>>
>>> So right now, I have a huge blob of text that's a 'text' column.
>>>
>>> Logically it *should* be text, because that's what it is...
>>>
>>> Can I just keep it as text so our normal tools work on it, but get it as
>>> raw UTF8 if I call getBytes?
>>>
>>> This way I can call getBytes and then send it right over the wire as
>>> pre-encoded UTF8 data.
>>>
>>> ... and of course the question is whether it will continue working in
>>> the future :-P
>>>
>>> I'll write a test of it of course but I wanted to see what you guys
>>> thought of this idea.
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> Skype: *burtonator*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>> 
>>> War is peace. Freedom is slavery. Ignorance is strength. Corporations
>>> are people.
>>>
>>>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile


War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.


Consistency level used when applying atomic batches

2014-06-24 Thread John Sumsion
The atomic batches feature is all about moving the application of multiple 
statements to the server side, to avoid having to worry about retry logic on 
the client side.  I'm glad that the client doesn't have to worry about it.

An earlier thread about consistency level and atomic batches broke the 
execution of atomic batches into two steps (with an assumed final 3rd step):
1) writing the batch to the batch log
2) applying the individual statements in the batch
3) deleting the batch from the batch log

The question for me is what #2 does, and what guarantees are in place if the 
batch execution returns successfully.  In addition, I wondered how the 
consistency level used when executing the batch affects what is done in step #2.

From what I can tell from the Cassandra server code, there is an additional 
step #0, and there are more guarantees than I thought.

http://fossies.org/linux/misc/apache-cassandra-2.0.8-src.tar.gz/apache-cassandra-2.0.8-src/src/java/org/apache/cassandra/service/StorageProxy.java#l_537

Here is a revised and enhanced order of operations:
a) pre-check replicas for all statements in the batch, assuring that the 
correct number of nodes are available for the consistency level passed from the 
client with the batch
b) write the batch to 2 nodes in the local datacenter (unless you have a 
single-node datacenter and use CL.ANY), and wait until those writes are 
complete -- where the batch is written, and how many copies of the batch are 
written, has little to do with the consistency level passed from the client 
with the batch -- an attempt is made to get two nodes from two separate racks 
to write the batch
c) for each statement in the batch, execute that statement with the consistency 
level passed from the client with the batch, and wait until the consistency 
level is satisfied on those writes
d) remove the batch from where it had been written, but don't wait

This is interesting, because there are a few guarantees here:
- step a) implies that no writes (including the batch) will occur if any 
consistency-level-required replica is unavailable right before starting to 
submit the writes
- step c) implies that all writes will be complete using the consistency level 
passed with the batch
- steps a), b), and c) will be complete BEFORE the batch execution is 
acknowledged to the client
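
If that reading is right, a logged batch submitted like this (DataStax Java 
driver; the statements and names are just illustrative) is acknowledged only 
after steps a), b) and c) have completed:

BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
batch.add(new SimpleStatement("INSERT INTO t1 (k, v) VALUES ('a', 1)"));
batch.add(new SimpleStatement("INSERT INTO t2 (k, v) VALUES ('a', 1)"));
batch.setConsistencyLevel(ConsistencyLevel.QUORUM); // the CL used in step c)
session.execute(batch); // returns after a), b), c); step d) is fire-and-forget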

If I've understood this correctly, please let me know.  If I've come to the 
wrong conclusions, please help me understand.

John...





Re: Storing values of mixed types in a list

2014-06-24 Thread Pavel Kogan
1) You can use a list of strings which are serialized JSONs, or use a
ByteBuffer with your own serialization, as Jeremy suggested.
2) Use Cassandra 2.1 (not officially released yet) where there is a new
feature: user-defined types.
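
For option 1, a minimal sketch (Jackson for the JSON part; the table and
variable names are invented):

ObjectMapper mapper = new ObjectMapper();   // com.fasterxml.jackson.databind
List<String> items = Arrays.asList(         // throws JsonProcessingException
        mapper.writeValueAsString(42),          // "42"
        mapper.writeValueAsString("foobar"),    // "\"foobar\""
        mapper.writeValueAsString(true));       // "true"
// bound to a list<text> column; each element self-describes its type
session.execute("INSERT INTO docs (id, items) VALUES (?, ?)", someId, items);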

Pavel




On Tue, Jun 24, 2014 at 9:18 AM, Jeremy Jongsma  wrote:

> Use a ByteBuffer value type with your own serialization (we use protobuf
> for complex value structures)
> On Jun 24, 2014 5:30 AM, "Tuukka Mustonen" 
> wrote:
>
>> Hello,
>>
>> I need to store a list of mixed types in Cassandra. The list may contain
>> numbers, strings and booleans. So I would need something like
>> list<int | text | boolean>.
>>
>> Is this possible in Cassandra and if not, what workaround would you
>> suggest for storing a list of mixed type items? I sketched a few (using a
>> list per type, using list of user types in Cassandra 2.1, etc.), but I get
>> a bad feeling about each.
>>
>> Couldn't find an "exact" answer to this through searches...
>> Regards,
>> Tuukka
>>
>> P.S. I first asked this at SO before realizing the traffic there is very
>> low:
>> http://stackoverflow.com/questions/24380158/storing-a-list-of-mixed-types-in-cassandra
>>
>>


Re: How to perform Range Queries in Cassandra

2014-06-24 Thread Jeremy Jongsma
You'd be better off using external indexing (Elasticsearch or Solr);
Cassandra isn't really designed for this sort of querying.
On Jun 24, 2014 3:09 AM, "Mike Carter"  wrote:

> Hello!
>
>
> I'm a beginner in C* and I'm struggling with it quite a bit.
>
> I'd like to measure the performance of some Cassandra range queries. The
> idea is to execute multidimensional range queries on Cassandra. E.g. there
> is a given table of 1 million rows with 10 columns, and I'd like to execute
> queries like “select count(*) from testable where d=1 and v1<10 and v2>20
> and v3<45 and v4>70 … allow filtering”.  Queries of this kind are very slow
> in C*, and as soon as the tables get bigger, I get a read timeout, probably
> caused by long scan operations.
>
> In further tests I'd like to extend the dimensions to more than 200 and the
> rows to 100 million, but at the moment I can't even handle this small table.
> Should I reorganize the data, or is it impossible to perform such highly
> multi-dimensional queries on Cassandra?
>
>
>
>
>
> The setup:
>
> Cassandra is installed on a single node with 2 TB disk space and 180GB Ram.
>
> Connected to Test Cluster at localhost:9160.
>
> [cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
>
>
>
> Keyspace:
>
> CREATE KEYSPACE test WITH replication = {
>
>   'class': 'SimpleStrategy',
>
>   'replication_factor': '1'
>
> };
>
>
>
>
>
> Table:
>
> CREATE TABLE testc21 (
>
>   key int,
>
>   d int,
>
>   v1 int,
>
>   v10 int,
>
>   v2 int,
>
>   v3 int,
>
>   v4 int,
>
>   v5 int,
>
>   v6 int,
>
>   v7 int,
>
>   v8 int,
>
>   v9 int,
>
>   PRIMARY KEY (key)
>
> ) WITH
>
>   bloom_filter_fp_chance=0.01 AND
>
>   caching='ROWS_ONLY' AND
>
>   comment='' AND
>
>   dclocal_read_repair_chance=0.00 AND
>
>   gc_grace_seconds=864000 AND
>
>   index_interval=128 AND
>
>   read_repair_chance=0.10 AND
>
>   replicate_on_write='true' AND
>
>   populate_io_cache_on_flush='false' AND
>
>   default_time_to_live=0 AND
>
>   speculative_retry='99.0PERCENTILE' AND
>
>   memtable_flush_period_in_ms=0 AND
>
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>
>   compression={'sstable_compression': 'LZ4Compressor'};
>
>
>
> CREATE INDEX testc21_d_idx ON testc21 (d);
>
>
>
>  select * from testc21 limit 10;
>
> key| d | v1 | v10 | v2 | v3 | v4  | v5 | v6 | v7 | v8 | v9
>
> --------+---+----+-----+----+----+-----+----+----+----+----+-----
>
>  302602 | 1 | 56 |  55 | 26 | 45 |  67 | 75 | 25 | 50 | 26 |  54
>
>  531141 | 1 | 90 |  77 | 86 | 42 |  76 | 91 | 47 | 31 | 77 |  27
>
>  693077 | 1 | 67 |  71 | 14 | 59 | 100 | 90 | 11 | 15 |  6 |  19
>
>    4317 | 1 | 70 |  77 | 44 | 77 |  41 | 68 | 33 |  0 | 99 |  14
>
>  927961 | 1 | 15 |  97 | 95 | 80 |  35 | 36 | 45 |  8 | 11 | 100
>
>  313395 | 1 | 68 |  62 | 56 | 85 |  14 | 96 | 43 |  6 | 32 |   7
>
>  368168 | 1 |  3 |  63 | 55 | 32 |  18 | 95 | 67 | 78 | 83 |  52
>
>  671830 | 1 | 14 |  29 | 28 | 17 |  42 | 42 |  4 |  6 | 61 |  93
>
>   62693 | 1 | 26 |  48 | 15 | 22 |  73 | 94 | 86 |  4 | 66 |  63
>
>  488360 | 1 |  8 |  57 | 86 | 31 |  51 |  9 | 40 | 52 | 91 |  45
>
> Mike
>


Re: Storing values of mixed types in a list

2014-06-24 Thread Jeremy Jongsma
Use a ByteBuffer value type with your own serialization (we use protobuf
for complex value structures)
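
E.g. a rough sketch, where MyValue stands for a hypothetical
protobuf-generated message class with made-up fields:

MyValue value = MyValue.newBuilder()   // hypothetical generated builder
        .setText("foobar")
        .setCount(3)
        .build();
ByteBuffer buf = ByteBuffer.wrap(value.toByteArray());
// bind buf to a blob column; read back with MyValue.parseFrom(bytes)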
On Jun 24, 2014 5:30 AM, "Tuukka Mustonen" 
wrote:

> Hello,
>
> I need to store a list of mixed types in Cassandra. The list may contain
> numbers, strings and booleans. So I would need something like
> list<int | text | boolean>.
>
> Is this possible in Cassandra and if not, what workaround would you
> suggest for storing a list of mixed type items? I sketched a few (using a
> list per type, using list of user types in Cassandra 2.1, etc.), but I get
> a bad feeling about each.
>
> Couldn't find an "exact" answer to this through searches...
> Regards,
> Tuukka
>
> P.S. I first asked this at SO before realizing the traffic there is very
> low:
> http://stackoverflow.com/questions/24380158/storing-a-list-of-mixed-types-in-cassandra
>
>


Re: Adding large text blob causes read timeout...

2014-06-24 Thread Jonathan Haddad
Can you run your query in the CLI after setting "tracing on"?


On Mon, Jun 23, 2014 at 11:32 PM, DuyHai Doan  wrote:

> Yes, but adding the extra one still ends up multiplied by 1000. The limit
> in CQL3 specifies the number of logical rows, not the number of physical
> columns in the storage engine.
> On Jun 24, 2014 at 08:30, "Kevin Burton" wrote:
>
>> oh.. the difference between the ONE field and the remaining 29 is
>> massive.
>>
>> It's like 200ms for just the 29 columns.. adding the extra one causes it
>> to time out .. > 5000ms...
>>
>>
>> On Mon, Jun 23, 2014 at 10:30 PM, DuyHai Doan 
>> wrote:
>>
>>> Don't forget that when you do the Select with limit set to 1000,
>>> Cassandra is actually fetching 1000 * 29 physical columns (29 fields per
>>> logical row).
>>>
>>> Adding one extra big html column may be too much and cause timeout. Try
>>> to:
>>>
>>> 1. Select only the big html column
>>> 2. Or reduce the limit incrementally until there is no timeout
>>> On Jun 24, 2014 at 06:22, "Kevin Burton" wrote:
>>>
>>> I have a table with a schema mostly of small fields.  About 30 of them.

 The primary key is:

 primary key( bucket, sequence )

 … I have 100 buckets and the idea is that sequence is ever-increasing.
  This way I can read, from bucket zero, everything after sequence N and
 get all the writes ordered by time.

 I'm running

 SELECT ... FROM content WHERE bucket=0 AND sequence>0 ORDER BY sequence
 ASC LIMIT 1000;

 … using the Java driver.

 If I add ALL the fields, except one, so 29 fields, the query is fast.
  Only 129ms….

 However, if I add the 'html' field, which is obviously a snapshot of HTML,
 the query times out…

 I'm going to add tracing and try to track it down further, but I
 suspect I'm doing something stupid.

 Is it going to burn me that the data is UTF8 encoded? I can't imagine
 decoding UTF8 is going to be THAT slow, but perhaps Cassandra is doing
 something silly under the covers?

 cqlsh doesn't time out … it actually works fine, but it uses 100% CPU
 while writing out the data, so it's not a good comparison, unfortunately.


 Exception in thread "main"
 com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
 tried for query failed (tried: ...:9042
 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
  at
 com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
 at
 com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
  at
 com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:172)
 at
 com.datastax.driver.core.SessionManager.execute(SessionManager.java:92)
  at
 com.spinn3r.artemis.robot.console.BenchmarkContentStream.main(BenchmarkContentStream.java:100)
 Caused by:
 com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
 tried for query failed (tried: dev4.wdc.sl.spinn3r.com/10.24.23.94:9042
 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
  at
 com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
 at
 com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
  at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:724)


 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 Skype: *burtonator*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 
 War is peace. Freedom is slavery. Ignorance is strength. Corporations
 are people.


>>
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> Skype: *burtonator*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>> War is peace. Freedom is slavery. Ignorance is strength. Corporations are
>> people.
>>
>>


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: Does the default LIMIT apply to automatic paging?

2014-06-24 Thread Laing, Michael
And with Python, use future.has_more_pages and
future.start_fetching_next_page().


On Tue, Jun 24, 2014 at 1:20 AM, DuyHai Doan  wrote:

> With the Java Driver,  set the fetchSize and use ResultSet.iterator
> On Jun 24, 2014 at 01:04, "ziju feng" wrote:
>
> Hi All,
>>
>> I have a wide row table that I want to iterate through all rows under a
>> specific partition key. The table may contain around one million rows per
>> partition.
>>
>> I was wondering if the default 10,000-row LIMIT applies to automatic
>> pagination in C* 2.0 (I'm using the DataStax driver). If so, what is the
>> best way to retrieve all rows of a given partition? Should I use a super
>> large LIMIT value or should I manually page through the table?
>>
>> Thanks,
>>
>> Ziju
>>
>


Storing values of mixed types in a list

2014-06-24 Thread Tuukka Mustonen
Hello,

I need to store a list of mixed types in Cassandra. The list may contain
numbers, strings and booleans. So I would need something like
list<int | text | boolean>.

Is this possible in Cassandra and if not, what workaround would you suggest
for storing a list of mixed type items? I sketched a few (using a list per
type, using list of user types in Cassandra 2.1, etc.), but I get a bad
feeling about each.

Couldn't find an "exact" answer to this through searches...
Regards,
Tuukka

P.S. I first asked this at SO before realizing the traffic there is very
low:
http://stackoverflow.com/questions/24380158/storing-a-list-of-mixed-types-in-cassandra


How to perform Range Queries in Cassandra

2014-06-24 Thread Mike Carter
Hello!


I'm a beginner in C* and I'm struggling with it quite a bit.

I'd like to measure the performance of some Cassandra range queries. The
idea is to execute multidimensional range queries on Cassandra. E.g. there
is a given table of 1 million rows with 10 columns, and I'd like to execute
queries like “select count(*) from testable where d=1 and v1<10 and v2>20
and v3<45 and v4>70 … allow filtering”.  Queries of this kind are very slow
in C*, and as soon as the tables get bigger, I get a read timeout, probably
caused by long scan operations.

In further tests I'd like to extend the dimensions to more than 200 and the
rows to 100 million, but at the moment I can't even handle this small table.
Should I reorganize the data, or is it impossible to perform such highly
multi-dimensional queries on Cassandra?





The setup:

Cassandra is installed on a single node with 2 TB disk space and 180GB Ram.

Connected to Test Cluster at localhost:9160.

[cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]



Keyspace:

CREATE KEYSPACE test WITH replication = {

  'class': 'SimpleStrategy',

  'replication_factor': '1'

};





Table:

CREATE TABLE testc21 (

  key int,

  d int,

  v1 int,

  v10 int,

  v2 int,

  v3 int,

  v4 int,

  v5 int,

  v6 int,

  v7 int,

  v8 int,

  v9 int,

  PRIMARY KEY (key)

) WITH

  bloom_filter_fp_chance=0.01 AND

  caching='ROWS_ONLY' AND

  comment='' AND

  dclocal_read_repair_chance=0.00 AND

  gc_grace_seconds=864000 AND

  index_interval=128 AND

  read_repair_chance=0.10 AND

  replicate_on_write='true' AND

  populate_io_cache_on_flush='false' AND

  default_time_to_live=0 AND

  speculative_retry='99.0PERCENTILE' AND

  memtable_flush_period_in_ms=0 AND

  compaction={'class': 'SizeTieredCompactionStrategy'} AND

  compression={'sstable_compression': 'LZ4Compressor'};



CREATE INDEX testc21_d_idx ON testc21 (d);



select * from testc21 limit 10;

key| d | v1 | v10 | v2 | v3 | v4  | v5 | v6 | v7 | v8 | v9

--------+---+----+-----+----+----+-----+----+----+----+----+-----

 302602 | 1 | 56 |  55 | 26 | 45 |  67 | 75 | 25 | 50 | 26 |  54

 531141 | 1 | 90 |  77 | 86 | 42 |  76 | 91 | 47 | 31 | 77 |  27

 693077 | 1 | 67 |  71 | 14 | 59 | 100 | 90 | 11 | 15 |  6 |  19

   4317 | 1 | 70 |  77 | 44 | 77 |  41 | 68 | 33 |  0 | 99 |  14

 927961 | 1 | 15 |  97 | 95 | 80 |  35 | 36 | 45 |  8 | 11 | 100

 313395 | 1 | 68 |  62 | 56 | 85 |  14 | 96 | 43 |  6 | 32 |   7

 368168 | 1 |  3 |  63 | 55 | 32 |  18 | 95 | 67 | 78 | 83 |  52

 671830 | 1 | 14 |  29 | 28 | 17 |  42 | 42 |  4 |  6 | 61 |  93

  62693 | 1 | 26 |  48 | 15 | 22 |  73 | 94 | 86 |  4 | 66 |  63

 488360 | 1 |  8 |  57 | 86 | 31 |  51 |  9 | 40 | 52 | 91 |  45

Mike


Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)

2014-06-24 Thread Robert Stupp
You can use getBytesUnsafe on the UTF8 column

--
Sent from my iPhone 

> On 24.06.2014 at 09:13, Olivier Michallat wrote:
> 
> Assuming we're talking about the DataStax Java driver:
> 
> getBytes will throw an exception, because it validates that the column is of 
> type BLOB. But you can use getBytesUnsafe:
> 
> ByteBuffer b = row.getBytesUnsafe("aTextColumn");
> // if you want to check it:
> Charset.forName("UTF-8").decode(b);
> 
> Regarding whether this will continue working in the future: from the driver's 
> perspective, the fact that the native protocol uses UTF-8 is an 
> implementation detail, but I doubt this will change any time soon.
> 
> 
> 
> 
>> On Tue, Jun 24, 2014 at 7:23 AM, DuyHai Doan  wrote:
>> Good idea: the bytes are merely passed through by the server, so you're
>> saving a lot of CPU. AFAIK getBytes should work fine.
>> 
>> On Jun 24, 2014 at 05:50, "Kevin Burton" wrote:
>> 
>>> I'm building a webservice whereby I read the data from cassandra, then 
>>> write it over the wire.
>>> 
>>> It's going to push LOTS of content, and encoding/decoding performance has 
>>> really bitten us in the past.  So I try to avoid transparent 
>>> encoding/decoding if I can avoid it.
>>> 
>>> So right now, I have a huge blob of text that's a 'text' column.
>>> 
>>> Logically it *should* be text, because that's what it is...
>>> 
>>> Can I just keep it as text so our normal tools work on it, but get it as 
>>> raw UTF8 if I call getBytes?
>>> 
>>> This way I can call getBytes and then send it right over the wire as 
>>> pre-encoded UTF8 data.
>>> 
>>> ... and of course the question is whether it will continue working in the 
>>> future :-P
>>> 
>>> I'll write a test of it of course but I wanted to see what you guys thought 
>>> of this idea.
>>> 
>>> -- 
>>> Founder/CEO Spinn3r.com
>>> Location: San Francisco, CA
>>> Skype: burtonator
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>> War is peace. Freedom is slavery. Ignorance is strength. Corporations are 
>>> people.
> 


Re: Can I call getBytes on a text column to get the raw (already encoded UTF8)

2014-06-24 Thread Olivier Michallat
Assuming we're talking about the DataStax Java driver:

getBytes will throw an exception, because it validates that the column is
of type BLOB. But you can use getBytesUnsafe:

ByteBuffer b = row.getBytesUnsafe("aTextColumn");
// if you want to check it:
Charset.forName("UTF-8").decode(b);

Regarding whether this will continue working in the future: from the
driver's perspective, the fact that the native protocol uses UTF-8 is an
implementation detail, but I doubt this will change any time soon.




On Tue, Jun 24, 2014 at 7:23 AM, DuyHai Doan  wrote:

> Good idea: the bytes are merely passed through by the server, so you're
> saving a lot of CPU. AFAIK getBytes should work fine.
> On Jun 24, 2014 at 05:50, "Kevin Burton" wrote:
>
> I'm building a webservice whereby I read the data from cassandra, then
>> write it over the wire.
>>
>> It's going to push LOTS of content, and encoding/decoding performance has
>> really bitten us in the past.  So I try to avoid transparent
>> encoding/decoding if I can avoid it.
>>
>> So right now, I have a huge blob of text that's a 'text' column.
>>
>> Logically it *should* be text, because that's what it is...
>>
>> Can I just keep it as text so our normal tools work on it, but get it as
>> raw UTF8 if I call getBytes?
>>
>> This way I can call getBytes and then send it right over the wire as
>> pre-encoded UTF8 data.
>>
>> ... and of course the question is whether it will continue working in the
>> future :-P
>>
>> I'll write a test of it of course but I wanted to see what you guys
>> thought of this idea.
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> Skype: *burtonator*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>> War is peace. Freedom is slavery. Ignorance is strength. Corporations are
>> people.
>>
>>