Re: Cassandra counters replication uses more traffic than client increments?

2013-01-08 Thread Sylvain Lebresne
Since you're asking about counters, I'll note too that the internal
representation of counters is pretty fat. In your RF=2 case, each counter is
probably about 64 bytes internally, while on the client side you send only
an 8-byte value for each increment. So I don't think there is anything
unexpected in having more traffic server to server than client to client.
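As a very rough back-of-the-envelope (the exact numbers depend on the schema and
on how many replicas each counter carries): at the ~10k increments/second described
later in this thread, the increment values themselves come to only about
10,000 x 8 bytes = ~80KB/s, while shipping ~64-byte counter contexts for the same
increments is already about 10,000 x 64 bytes = ~640KB/s, before any message
headers are added. So inter-node traffic that is a multiple of the client traffic
is entirely plausible.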

--
Sylvain


On Wed, Jan 9, 2013 at 3:11 AM, aaron morton wrote:

> Can you measure the incoming client traffic on the nodes in DC 1 on port
> 9160 ? That would be more of an Apples to Apples comparison.
>
> I've taken a look at some of the captured packets and it looks like
> there's much more service information in DC-to-DC traffic compared to
>
> client-to-server traffic -- although I am by no means certain here.
>
> In addition to writes, the potential sources of cross-DC traffic are
> Gossip and Repair. Gossip is pretty lightweight (for a 4 node cluster) and
> repair only happens if you ask for it. There could also be hints delivered
> from DC 1 to DC 2; these would show up in the logs on DC 1.
>
> Off the top of my head, the internal RowMutation serialisation is not too
> different from the Thrift mutation messages.
>
> There is also a message header, it includes: Source IP, an int for the
> verb, some overhead for the key/values, the string FORWARD and the
> forwarding IP address.
>
> Compare this to a mutation message: keyspace name, row key, column family
> ID (int), column name, value + list/hash overhead.
>
> So for small single column updates the ratio of overhead to payload is
> kind of high.
>
> - Is it indeed the case that server-to-server replication traffic can be
> significantly more bloated than client-to-server traffic? Or do I need to
> review my testing methodology?
>
> The metadata on the inter-node messages is pretty static: the bigger the
> payload, the lower the ratio of overhead to payload. This is the same for
> messages that go between nodes within the same DC.
>
> - Is there anything that can be done to reduce cross-DC replication
> traffic? Perhaps some compression scheme?
>
> fixed in 1.2
> https://issues.apache.org/jira/browse/CASSANDRA-3127?attachmentOrder=desc
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8/01/2013, at 11:36 PM, Sergey Olefir  wrote:
>
> So with the holidays hopefully being over, I thought I'd ask again :)
>
> Could someone please help with answers to the two questions:
> - Is it reasonable to expect that cross-datacenter node-to-node replication
> traffic is greater than actual client-to-server traffic that generates this
> activity? Specifically talking about counter increments.
> - Is there anything that can be done to lower the amount of
> cross-datacenter
> replication traffic while keeping actual replication going (i.e. we can't
> afford to not replicate data, but we can afford e.g. delays in
> replication)?
>
> Best regards,
> Sergey
>
>
> Sergey Olefir wrote
>
> Hi,
>
> as part of our ongoing tests with Cassandra, we've tried to evaluate the
> amount of traffic generated in client-to-server and server-to-server
> (replication scenarios).
>
> The results we are getting are surprising.
>
> Our setup:
> - Cassandra 1.1.7.
> - 3 DC with 2 nodes each.
> - NetworkTopology replication strategy with 2 replicas per DC (so
> basically each node contains full data set).
> - 100 clients concurrently incrementing counters at a rate of roughly
> 100 / second each (i.e. about 10k increments per second in total). Clients
> perform writes to DC:1 only. server-to-server traffic measurement was done
> in DC:2.
> - Clients use batches to write to the server (up to 100 increments per
> batch, overall each client writes 1 or 2 batches per second).
>
> Clients are Java-based accessing Cassandra via hector. Run on Windows box.
>
> Traffic measurement for clients (on Windows) was done via Resource Monitor
> and packet capture via Network Monitor. The overall traffic appears to be
> roughly 700KB/sec (kilobytes) for ~1 increments).
>
> Traffic measurement for server-to-server was done on DC:2 via packet
> capture. This capture specifically included only nodes in other
> datacenters (so no internal DC traffic was captured).
>
> The vast majority of traffic was directed to one node DC:2-1. DC2-2
> received like 1/30 of the traffic. I think I've read somewhere that
> Cassandra directs DC-to-DC traffic to one node, so this makes sense.
>
> What is surprising though -- is the amount of traffic. It looks to be
> roughly twice the amount of the total traffic generated by clients, i.e.
> something like 1.5MB/sec (megabytes). Note -- this only counts incoming
> traffic.
>
> I've taken a look at some of the captured packets and it looks like
> there's much more service information in DC-to-DC traffic compared to
> client-to-server traffic -- although I am by no means certain here.
>
>
> Overall I have a couple of questions:
> -

Re: Date Index?

2013-01-08 Thread aaron morton
There has to be one equality clause in there, and that's the thing Cassandra
uses to select off disk. The others are in-memory filters.

So if you have one on the year+month you can have a simple select clause and it
limits the amount of data that has to be read.

If you have many tens to hundreds of millions of things in the same month you
may want to do some performance testing. There can still be times when you want
to support common read paths by using custom / hand-rolled indexes.
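
As a minimal sketch of the hand-rolled approach (CQL3 syntax, table and column
names invented here for illustration), bucket by month and keep the rows
time-ordered inside the bucket:

    CREATE TABLE records_by_month (
        month      int,        -- e.g. 201301
        ts         timestamp,
        record_id  uuid,
        PRIMARY KEY (month, ts, record_id)
    );

    -- one equality clause on the month bucket, a range on the clustering column
    SELECT record_id FROM records_by_month
     WHERE month = 201301 AND ts >= '2013-01-01' AND ts < '2013-02-01';

The equality on month limits what is read off disk, and the ts range is served
in order from within that partition.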

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/01/2013, at 6:05 AM, stephen.m.thomp...@wellsfargo.com wrote:

> Hi folks –
>  
> Question about secondary indexes.  How are people doing date indexes?  I 
> have a date column in my tables in RDBMS that we use frequently, such as look 
> at all records recorded in the last month.  What is the best practice for 
> being able to do such a query?  It seems like there could be an advantage to 
> adding a couple of columns like this:
>  
> {timestamp=2013/01/08 12:32:01 -0500}
> {month=201301}
> {day=08}
>  
> And then I could do secondary index on the month and day columns?  Would that 
> be the best way to do something like this?  Is there any accepted “best 
> practice” on this yet?
>  
> Thanks!
> Steve



Re: How long does it take for a write to actually happen?

2013-01-08 Thread aaron morton
> EC2 m1.large node
You will have a much happier time if you use an m1.xlarge. 

> We set MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="400M"  
That's a pretty low new heap size.

> checks for new entries (in "Entries" CF, with indexed column status=1), 
> processes them, and sets the status to 2, when done
This is not the best data model. 
You may be better off having one CF for the unprocessed entries and one for the
processed ones, or, if you really need a queue, using something like Kafka. 
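Just as a sketch of what that could look like (CQL3 syntax, hypothetical names):

    CREATE TABLE unprocessed_entries (
        entry_id uuid PRIMARY KEY,
        payload  text
    );

    CREATE TABLE processed_entries (
        entry_id uuid PRIMARY KEY,
        payload  text
    );

    -- the processing server reads from unprocessed_entries, writes the result to
    -- processed_entries and deletes the original row, so the online server only
    -- ever polls the small processed_entries table instead of an indexed status column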

> I will appreciate any advice on how to speed the writes up,
Writes are instantly available for reading. 
The first thing I would do is see where the delay is. Use nodetool cfstats 
to see the local write latency, or track the write latency from the client 
perspective. 

If you are looking for near real-time / continuous computation style processing, 
take a look at http://storm-project.net/ and register for this talk from 
Brian O'Neill, one of my fellow DataStax MVPs: 
http://learn.datastax.com/WebinarCEPDistributedProcessingonCassandrawithStorm_Registration.html

Cheers
  
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/01/2013, at 5:48 AM, Vitaly Sourikov  wrote:

> Hi,
> we are currently at an early stage of our project and have only one Cassandra 
> 1.1.7 node hosted on EC2 m1.large node, where the data is written to the 
> ephemeral disk, and /var/lib/cassandra/data is just a soft link to it. Commit 
> logs and caches are still on /var/lib/cassandra/. We set MAX_HEAP_SIZE="6G" 
> and HEAP_NEWSIZE="400M"  
> 
> On the client-side, we use Astyanax 1.56.18 to access the data.  We have a 
> processing server that writes to Cassandra, and an online server that reads 
> from it. The former wakes up every 0.5-5sec., checks for new entries (in 
> "Entries" CF, with indexed column status=1), processes them, and sets the 
> status to 2, when done. The online server checks once a second if an entry 
> that should be processed got the status 2 and sends it to its client side for 
> display. Processing takes 5-10 seconds and updates various columns in the 
> "Entries" CF few times on the way. One of these columns may contain ~12KB of 
> textual data, others are just short strings or numbers.
> 
> Now, our problem is that it takes 20-40 seconds before the online server 
> actually sees the change - and it is way too long, this process is supposed 
> to be nearly real-time. Moreover, in sqlsh, if I perform a similar update, it 
> is immediately seen in the following select results, but the updates from the 
> back-end server also do not appear for 20-40 seconds. 
> 
> I tried switching the row caches for that table and in the yaml on and off. I 
> tried commitlog_sync: batch with commitlog_sync_batch_window_in_ms: 50. 
> Nothing helped. 
> 
> I will appreciate any advice on how to speed the writes up, or at least an 
> explanation why this happens.
> 
> thanks,
> Vitaly



Re: Helenos 1.3 released

2013-01-08 Thread aaron morton
I was going to say update the Wiki, but it's already there. 

Thanks for contributing. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/01/2013, at 1:02 AM, Tomek Kuprowski  wrote:

> Dear all,
> 
> I'd like to announce new release of Helenos.
> 
> Project site: https://github.com/tomekkup/helenos
> 
> Changelog: https://github.com/tomekkup/helenos/wiki/Changelog
> 
> Download: https://sourceforge.net/projects/helenos-gui/files/
> 
> Preview: http://www.youtube.com/watch?v=gOWHN6bCybQ&hd=1&autoplay=1
> 
> Regards
> 
> Tom 



Re: Cassandra counters replication uses more traffic than client increments?

2013-01-08 Thread aaron morton
Can you measure the incoming client traffic on the nodes in DC 1 on port 9160 ? 
That would be more of an Apples to Apples comparison. 

>> I've taken a look at some of the captured packets and it looks like
>> there's much more service information in DC-to-DC traffic compared to
>> client-to-server traffic -- although I am by no means certain here.

In addition to writes, the potential sources of cross-DC traffic are Gossip 
and Repair. Gossip is pretty lightweight (for a 4 node cluster) and repair 
only happens if you ask for it. There could also be hints delivered from DC 1 to 
DC 2; these would show up in the logs on DC 1.

Off the top of my head, the internal RowMutation serialisation is not too 
different from the Thrift mutation messages.

There is also a message header, it includes: Source IP, an int for the verb, 
some overhead for the key/values, the string FORWARD and the forwarding IP 
address. 

Compare this to a mutation message: keyspace name, row key, column family ID 
(int), column name, value + list/hash overhead.

So for small single column updates the ratio of overhead to payload is kind of 
high. 

>> - Is it indeed the case that server-to-server replication traffic can be
>> significantly more bloated than client-to-server traffic? Or do I need to
>> review my testing methodology?

The metadata on the inter-node messages is pretty static: the bigger the 
payload, the lower the ratio of overhead to payload. This is the same for 
messages that go between nodes within the same DC. 

>> - Is there anything that can be done to reduce cross-DC replication
>> traffic? Perhaps some compression scheme?
fixed in 1.2
https://issues.apache.org/jira/browse/CASSANDRA-3127?attachmentOrder=desc
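If I'm reading that ticket right, in 1.2 this surfaces as the internode_compression
option in cassandra.yaml (all / dc / none), where dc compresses only the traffic
that crosses datacenters.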

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/01/2013, at 11:36 PM, Sergey Olefir  wrote:

> So with the holidays hopefully being over, I thought I'd ask again :)
> 
> Could someone please help with answers to the two questions:
> - Is it reasonable to expect that cross-datacenter node-to-node replication
> traffic is greater than actual client-to-server traffic that generates this
> activity? Specifically talking about counter increments.
> - Is there anything that can be done to lower the amount of cross-datacenter
> replication traffic while keeping actual replication going (i.e. we can't
> afford to not replicate data, but we can afford e.g. delays in replication)?
> 
> Best regards,
> Sergey
> 
> 
> Sergey Olefir wrote
>> Hi,
>> 
>> as part of our ongoing tests with Cassandra, we've tried to evaluate the
>> amount of traffic generated in client-to-server and server-to-server
>> (replication scenarios).
>> 
>> The results we are getting are surprising.
>> 
>> Our setup:
>> - Cassandra 1.1.7.
>> - 3 DC with 2 nodes each.
>> - NetworkTopology replication strategy with 2 replicas per DC (so
>> basically each node contains full data set).
>> - 100 clients concurrently incrementing counters at a rate of roughly
>> 100 / second each (i.e. about 10k increments per second in total). Clients
>> perform writes to DC:1 only. server-to-server traffic measurement was done
>> in DC:2.
>> - Clients use batches to write to the server (up to 100 increments per
>> batch, overall each client writes 1 or 2 batches per second).
>> 
>> Clients are Java-based accessing Cassandra via hector. Run on Windows box.
>> 
>> Traffic measurement for clients (on Windows) was done via Resource Monitor
>> and packet capture via Network Monitor. The overall traffic appears to be
>> roughly 700KB/sec (kilobytes) for ~1 increments).
>> 
>> Traffic measurement for server-to-server was done on DC:2 via packet
>> capture. This capture specifically included only nodes in other
>> datacenters (so no internal DC traffic was captured).
>> 
>> The vast majority of traffic was directed to one node DC:2-1. DC2-2
>> received like 1/30 of the traffic. I think I've read somewhere that
>> Cassandra directs DC-to-DC traffic to one node, so this makes sense.
>> 
>> What is surprising though -- is the amount of traffic. It looks to be
>> roughly twice the amount of the total traffic generated by clients, i.e.
>> something like 1.5MB/sec (megabytes). Note -- this only counts incoming
>> traffic.
>> 
>> I've taken a look at some of the captured packets and it looks like
>> there's much more service information in DC-to-DC traffic compared to
>> client-to-server traffic -- although I am by no means certain here.
>> 
>> 
>> Overall I have a couple of questions:
>> - Is it indeed the case that server-to-server replication traffic can be
>> significantly more bloated than client-to-server traffic? Or do I need to
>> review my testing methodology?
>> - Is there anything that can be done to reduce cross-DC replication
>> traffic? Perhaps some compression scheme? Or some delay before replication
>> allowing for possibly more increments to be merged together?
>> 
>> 
>> Best regards,
>> S

Re: The solution of the write-in load of Cassandra?

2013-01-08 Thread aaron morton
> - Why does this log keep being output? (What is the presumed cause?)  
Were you noticing file times changing?
The log files are recycled so it may have been that or from the 10 second 
commit log fsync.

Can you provide more details on what you saw?

>  
> A large amount of the SSTable data was lost after the reboot. 
> (Especially the data written during the first part of the load run.)  
Did you put the SSTables back?
Once data is flushed to an SSTable, the relevant parts of the commit log are 
marked as no longer necessary, and once the commit log segment is recycled that 
data is gone from the log. So if the SSTables were archived away and not 
restored before the restart, that data cannot be rebuilt from the commit log.

> If the disk where Cassandra writes the commit log runs short of space, 
> a core dump would occur and the Cassandra process would go down

I don't think Cassandra would shut down in that case, though it may have 
changed. It would probably block the writes.

>  Although the design of a resource is important, isn't there any method of 
>   perceiving a process down beforehand? 
Not sure what you mean here.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/01/2013, at 23:59, hiroshi.kise...@hitachi.com wrote:

> 
> Dear everyone 
> 
> I am having trouble with the write load on Cassandra. 
> Please advise if there is a good solution. 
> Cassandra is running on only one server; the version is Cassandra 1.1.7. 
> Cassandra 1.1.8 behaved the same. 
> 
>  
> 1. Apply a certain amount of write load from a Thrift client. 
> 2. The disk fills and exceptions begin to happen. 
>   (However, the Cassandra process keeps running.)  
> 3. Stop Cassandra in this state. 
> 4. Archive the files stored in the SSTable directory with tar, and extend the 
> disk space. 
> 5. Reboot Cassandra. (As part of the reboot, data is restored to the SSTables 
>   from the commit log.)  
> 
> The storage location of the commit log was different from the SSTables, 
> and its disk had sufficient free space. 
> 
> 6. Check the contents of the data restored to the SSTables. 
> 
> 
>  
>  
> After the disk filled, even after writing from the Thrift client was stopped, 
> log messages about memtables being flushed to disk continued to be output at 
> intervals of about once per second. 
> (Although no exception appears, it is unknown whether the writes were 
> successful.)  
> - Why does this log keep being output? (What is the presumed cause?)  
> 
>  
> A large amount of the SSTable data was lost after the reboot. 
> (Especially the data written during the first part of the load run.)  
> 
> - Is this an unavoidable phenomenon when running on a single server? 
> - What is the cause, and what can be done about it? 
> 
> 
>  (changing a viewpoint)
> If the disk where Cassandra writes the commit log runs short of space, 
> a core dump would occur and the Cassandra process would go down. 
> - Although the design of a resource is important, isn't there any method of 
>   perceiving a process down beforehand? 
> --
> Hiroshi KIse
> Hitachi, Ltd., Information & Telecommunication System Company


JIRA for native IAuthorizer and IAuthenticator ?

2013-01-08 Thread Frank Hsueh
I am very interested in the native IAuthorizer and IAuthenticator
implementation. However, I can't find a JIRA entry to follow in the 1.2.1
[1] or 1.2.2 [2] issues page.

does anybody know about it ?

thanks !

[1]
https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%221.2.1%22%20AND%20project%20%3D%20CASSANDRA
[2]
https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%221.2.2%22%20AND%20project%20%3D%20CASSANDRA




On Wed, Jan 2, 2013 at 7:00 AM, Sylvain Lebresne wrote:

> The Cassandra team wishes you a very happy new year 2013, and is very
> pleased
> to announce the release of Apache Cassandra version 1.2.0. Cassandra 1.2.0
> is a
> new major release for the Apache Cassandra distributed database. This
> version
> adds numerous improvements[1,2] including (but not restricted to):
> - Virtual nodes[4]
> - The final version of CQL3 (featuring many improvements)
> - Atomic batches[5]
> - Request tracing[6]
> - Numerous performance improvements[7]
> - A new binary protocol for CQL3[8]
> - Improved configuration options[9]
> - And much more...
>
> Please make sure to carefully read the release notes[2] before upgrading.
>
> Both source and binary distributions of Cassandra 1.2.0 can be downloaded
> at:
>
>  http://cassandra.apache.org/download/
>
> Or you can use the debian package available from the project APT
> repository[3]
> (you will need to use the 12x series).
>
> The Cassandra Team
>
> [1]: http://goo.gl/JmKp3 (CHANGES.txt)
> [2]: http://goo.gl/47bFz (NEWS.txt)
> [3]: http://wiki.apache.org/cassandra/DebianPackaging
> [4]: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
> [5]: http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
> [6]: http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
> [7]:
> http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
> [8]: http://www.datastax.com/dev/blog/binary-protocol
> [9]:
> http://www.datastax.com/dev/blog/configuration-changes-in-cassandra-1-2
>



-- 
Frank Hsueh | frank.hs...@gmail.com


remote datacentre consistency

2013-01-08 Thread Jabbar
I'm a bit confused about how a two datacentre apache cassandra cluster
keeps the data consistent.

From what I understand a client application in datacentre1 contacts a
coordinator node which sends the data to the local replicas and it also
sends the updates to the remote coordinator in the remote data centre.

Does the local coordinator send the updates asynchronously to the local
replicas and the remote coordinator node?

What happens if the bandwidth is severely restricted to the remote
datacentre? Do the updates for the remote coordinator keep getting buffered
up in the local coordinator?

What happens if the connection to the remote coordinator is down? Would
hinted hand off be used to recover from this scenario?  What options are
there to synchronise the remote datacentre if the connectivity comes back
after a couple of days?


-- 
Thanks

 A Jabbar Azam


Re: inconsistent hadoop/cassandra results

2013-01-08 Thread aaron morton
Assuming there were no further writes, running repair or using CL ALL should 
have fixed it. 

Can you describe the inconsistency between runs? 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/01/2013, at 2:16 AM, Brian Jeltema  wrote:

> I need some help understanding unexpected behavior I saw in some recent 
> experiments with Cassandra 1.1.5 and Hadoop 1.0.3:
> 
> I've written a small map/reduce job that simply counts the number of columns 
> in each row of a static CF (call it Foo) 
> and generates a list of every row and column count. A relatively small 
> fraction of the rows have a large number
> of columns; worst case is approximately 36 million. So when I set up the job, 
> I used wide-row support:
> 
> ConfigHelper.setInputColumnFamily(job.getConfiguration(), "fooKS", "Foo", 
> WIDE_ROWS); // where WIDE_ROWS == true
> 
> When I ran this job using the default CL (1) I noticed that the results 
> varied from run to run, which I attributed to inconsistent
> replicas, since Foo was generated with CL == 1 and the RF == 3. 
> 
> So I ran repair for that CF on every node. The cassandra log on every node 
> contains lines similar to:
> 
>   INFO [AntiEntropyStage:1] 2013-01-05 20:38:48,605 AntiEntropyService.java 
> (line 778) [repair #e4a1d7f0-579d-11e2--d64e0a75e6df] Foo is fully synced
> 
> However, repeated runs were still inconsistent. Then I set CL to ALL, which I 
> presumed would always result in identical
> output, but repeated runs initially continued to be inconsistent. However, I 
> noticed that the results seemed to
> be converging, and after several runs (somewhere between 4 and 6) I finally 
> was producing identical results on every run.
> Then I set CL to QUORUM, and again generated inconsistent results.
> 
> Does this behavior make sense?
> 
> Brian



Re: Script to load sstables from v1.0.x to v 1.1.x

2013-01-08 Thread Rob Coli
On Tue, Jan 8, 2013 at 11:56 AM, Todd Nine  wrote:
> Our current production
> cluster is still on 1.0.x, so we can either fire up a 1.0.x cluster, then
> upgrade every node to accomplish this, or just use the script.

No 1.0 cluster is required to restore 1.0 directory structure to a 1.1
cluster and have the tables be migrated by Cassandra. The 1.1 node
should look at the 1.0 directory structure you just restored and
migrate it automagically.

> We also have
> a different number of nodes in stage vs production, so we'd still need to
> run a repair if we did a straight sstable copy.

This is a compelling reason to bulk load. My commentary merely points
out that if you *aren't* changing cluster size/topology, Cassandra 1.1
should be migrating the sstables for you. :)

=Rob

-- 
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Script to load sstables from v1.0.x to v 1.1.x

2013-01-08 Thread Todd Nine
Our use case is for testing migrations in our data, as well as stress testing 
outside our production environment.  To do this, we load our backups into a 
fresh cluster, then perform our testing.  Our current production cluster is 
still on 1.0.x, so we can either fire up a 1.0.x cluster, then upgrade every 
node to accomplish this, or just use the script. We also have a different 
number of nodes in stage vs production, so we'd still need to run a repair if 
we did a straight sstable copy.   The script is a lot faster and easier for us 
than going through the upgrade process, then running repair to ensure the data 
is distributed correctly in the ring.



--  
Todd Nine


On Tuesday, January 8, 2013 at 12:32 PM, Michael Kjellman wrote:

> I thought this was to load between separate clusters not to upgrade within 
> the same cluster. No?
>  
> On Jan 8, 2013, at 11:29 AM, "Rob Coli" <rc...@palominodb.com> wrote:
>  
> > On Tue, Jan 8, 2013 at 8:41 AM, Todd Nine <todd.n...@gmail.com> wrote:
> > > I have recently been trying to restore backups from a v1.0.x cluster we
> > > have into a 1.1.7 cluster. This has not been as trivial as I expected, and
> > > I've had a lot of help from the IRC channel in tackling this problem. As a
> > > way of saying thanks, I'd like to contribute the updated ruby script I was
> > > originally given for accomplishing this task. Here it is.
> > >  
> >  
> >  
> > While I laud your contribution, I am still not fully understanding why
> > this is not working "automagically", as it should :
> >  
> > http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement
> > "
> > What about upgrading?
> >  
> > Do you need to manually move all pre-1.1 data files to the new
> > directory structure before upgrading to 1.1? No. Immediately after
> > Cassandra 1.1 starts, it checks to see whether it has old directory
> > structure and migrates all data files (including backups and
> > snapshots) to the new directory structure if needed. So, just upgrade
> > as you always do (don’t forget to read NEWS.txt first), and you will
> > get more control over data files for free.
> > "
> >  
> > Is it possible that, for example, the installation of the debian
> > package results in your 1.1.x node starting up before you intend it
> > to.. and then when you start it again with the 1.0 paths, it doesn't
> > try to change the paths?
> >  
> > " * To check if sstables needs migration, we look at the System
> > directory. If it contains a directory for the status cf, we'll attempt
> > a sstable migrating. "
> >  
> > This quote from Directories.java (thx driftx!) suggests that any
> > starting of a 1.1 node, which would result in a "Status" columnfamily
> > being created, would make sstablesNeedsMigration return false.
> >  
> > If this is your case due to the use of the debian package or similar
> > which auto-starts, your input is welcomed at :
> >  
> > https://issues.apache.org/jira/browse/CASSANDRA-2356
> >  
> > =Rob
> >  
> > --  
> > =Robert Coli
> > AIM>ALK - rc...@palominodb.com
> > YAHOO - rcoli.palominob
> > SKYPE - rcoli_palominodb
> >  
>  
>  



Re: Script to load sstables from v1.0.x to v 1.1.x

2013-01-08 Thread Michael Kjellman
I thought this was to load between separate clusters not to upgrade within the 
same cluster. No?

On Jan 8, 2013, at 11:29 AM, "Rob Coli"  wrote:

> On Tue, Jan 8, 2013 at 8:41 AM, Todd Nine  wrote:
>>  I have recently been trying to restore backups from a v1.0.x cluster we
>> have into a 1.1.7 cluster.  This has not been as trivial as I expected, and
>> I've had a lot of help from the IRC channel in tackling this problem.  As a
>> way of saying thanks, I'd like to contribute the updated ruby script I was
>> originally given for accomplishing this task.  Here it is.
> 
> While I laud your contribution, I am still not fully understanding why
> this is not working "automagically", as it should :
> 
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement
> "
> What about upgrading?
> 
> Do you need to manually move all pre-1.1 data files to the new
> directory structure before upgrading to 1.1? No. Immediately after
> Cassandra 1.1 starts, it checks to see whether it has old directory
> structure and migrates all data files (including backups and
> snapshots) to the new directory structure if needed. So, just upgrade
> as you always do (don’t forget to read NEWS.txt first), and you will
> get more control over data files for free.
> "
> 
> Is it possible that, for example, the installation of the debian
> package results in your 1.1.x node starting up before you intend it
> to.. and then when you start it again with the 1.0 paths, it doesn't
> try to change the paths?
> 
> " * To check if sstables needs migration, we look at the System
> directory. If it contains a directory for the status cf, we'll attempt
> a sstable migrating. "
> 
> This quote from Directories.java (thx driftx!) suggests that any
> starting of a 1.1 node, which would result in a "Status" columnfamily
> being created, would make sstablesNeedsMigration return false.
> 
> If this is your case due to the use of the debian package or similar
> which auto-starts, your input is welcomed at :
> 
> https://issues.apache.org/jira/browse/CASSANDRA-2356
> 
> =Rob
> 
> -- 
> =Robert Coli
> AIM>ALK - rc...@palominodb.com
> YAHOO - rcoli.palominob
> SKYPE - rcoli_palominodb



Re: Script to load sstables from v1.0.x to v 1.1.x

2013-01-08 Thread Rob Coli
On Tue, Jan 8, 2013 at 8:41 AM, Todd Nine  wrote:
>   I have recently been trying to restore backups from a v1.0.x cluster we
> have into a 1.1.7 cluster.  This has not been as trivial as I expected, and
> I've had a lot of help from the IRC channel in tackling this problem.  As a
> way of saying thanks, I'd like to contribute the updated ruby script I was
> originally given for accomplishing this task.  Here it is.

While I laud your contribution, I am still not fully understanding why
this is not working "automagically", as it should :

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement
"
What about upgrading?

Do you need to manually move all pre-1.1 data files to the new
directory structure before upgrading to 1.1? No. Immediately after
Cassandra 1.1 starts, it checks to see whether it has old directory
structure and migrates all data files (including backups and
snapshots) to the new directory structure if needed. So, just upgrade
as you always do (don’t forget to read NEWS.txt first), and you will
get more control over data files for free.
"

Is it possible that, for example, the installation of the debian
package results in your 1.1.x node starting up before you intend it
to.. and then when you start it again with the 1.0 paths, it doesn't
try to change the paths?

" * To check if sstables needs migration, we look at the System
directory. If it contains a directory for the status cf, we'll attempt
a sstable migrating. "

This quote from Directories.java (thx driftx!) suggests that any
starting of a 1.1 node, which would result in a "Status" columnfamily
being created, would make sstablesNeedsMigration return false.

If this is your case due to the use of the debian package or similar
which auto-starts, your input is welcomed at :

https://issues.apache.org/jira/browse/CASSANDRA-2356

=Rob

-- 
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: help turning compaction..hours of run to get 0% compaction....

2013-01-08 Thread B. Todd Burruss
I'll second Edward's comment.  Cassandra is designed to scale horizontally,
so if disk I/O is slowing you down then you must scale out.


On Tue, Jan 8, 2013 at 7:10 AM, Jim Cistaro  wrote:

>  One metric to watch is pending compactions (via nodetool
> compactionstats).  This count will give you some idea of whether you are
> falling behind with compactions.  The other measure is how long you are
> compacting after your inserts have stopped.
>
>  If I understand correctly, since you never update the data, that would
> explain why the compaction logging shows 100% of orig.  With size-tiered,
> you are flushing small files, compacting when you get 4 of like size, etc.
>  Since you have no updates, the compaction will not shrink the data.
>
>  As Aaron said, use iostat -x (or dstat) to see if you are taxing the
> disks.  If so, then leveled compaction may be your option (for reasons
> already stated).  If not taxing the disks, then you might want to increase
> your compaction throughput, as you suggested.
>
>  Depending on what version you are using, another thing to possibly tune
> is the size of sstables when flushed to disk.  In your case of insert only,
> the smaller the flush size, the more times that row is going to be
> rewritten during a compaction (hence increase I/O).
>
>  jc
>
>   From: Edward Capriolo 
> Reply-To: "user@cassandra.apache.org" 
> Date: Monday, January 7, 2013 2:33 PM
>
> To: "user@cassandra.apache.org" 
> Subject: Re: help turning compaction..hours of run to get 0%
> compaction
>
>  There is some point where you simply need more machines.
>
> On Mon, Jan 7, 2013 at 5:02 PM, Michael Kjellman 
> wrote:
>
>>  Right, I guess I'm saying that you should try loading your data with
>> leveled compaction and see how your compaction load is.
>>
>>  Your work load sounds like leveled will fit much better than size
>> tiered.
>>
>>   From: Brian Tarbox 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Monday, January 7, 2013 1:58 PM
>> To: "user@cassandra.apache.org" 
>> Subject: Re: help turning compaction..hours of run to get 0%
>> compaction
>>
>>  The problem I see is that it already takes me more than 24 hours just
>> to load my data...during which time the logs say I'm spending tons of time
>> doing compaction.  For example in the last 72 hours I've consumed 20
>> hours per machine on compaction.
>>
>>  Can I conclude from that that I should be (perhaps drastically)
>> increasing my compaction_mb_per_sec on the theory that I'm getting behind?
>>
>>  The fact that it takes me 3 days or more to run a test means it's hard
>> to just play with values and see what works best, so I'm trying to
>> understand the behavior in detail.
>>
>>  Thanks.
>>
>>  Brian
>>
>>
>> On Mon, Jan 7, 2013 at 4:13 PM, Michael Kjellman > > wrote:
>>
>>>  http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
>>>
>>>  "If you perform at least twice as many reads as you do writes, leveled
>>> compaction may actually save you disk I/O, despite consuming more I/O for
>>> compaction. This is especially true if your reads are fairly random and
>>> don’t focus on a single, hot dataset."
>>>
>>>   From: Brian Tarbox 
>>> Reply-To: "user@cassandra.apache.org" 
>>>  Date: Monday, January 7, 2013 12:56 PM
>>> To: "user@cassandra.apache.org" 
>>> Subject: Re: help turning compaction..hours of run to get 0%
>>> compaction
>>>
>>>  I have not specified leveled compaction so I guess I'm defaulting to
>>> size tiered?  My data (in the column family causing the trouble) is insert
>>> once, read many, update never.
>>>
>>>  Brian
>>>
>>>
>>> On Mon, Jan 7, 2013 at 3:13 PM, Michael Kjellman <
>>> mkjell...@barracuda.com> wrote:
>>>
  Size tiered or leveled compaction?

   From: Brian Tarbox 
 Reply-To: "user@cassandra.apache.org" 
 Date: Monday, January 7, 2013 12:03 PM
 To: "user@cassandra.apache.org" 
 Subject: help turning compaction..hours of run to get 0% compaction

  I have a column family where I'm doing 500 inserts/sec for 12 hours
 or so at a time.  At some point my performance falls off a cliff due to time
 spent doing compactions.

  I'm seeing row after row of logs saying that after 1 or 2 hours of
 compacting it reduced to 100% or 99% of the original.

  I'm trying to understand what direction this data points me to in
 term of configuration change.

 a) increase my compaction_throughput_mb_per_sec because I'm
 falling behind (am I falling behind?)

 b) enable multi-threaded compaction?

  Any help is appreciated.

  Brian


>>>
>>>

Date Index?

2013-01-08 Thread Stephen.M.Thompson
Hi folks -

Question about secondary indexes.  How are people doing date indexes?  I have 
a date column in my tables in RDBMS that we use frequently, such as look at all 
records recorded in the last month.  What is the best practice for being able 
to do such a query?  It seems like there could be an advantage to adding a 
couple of columns like this:

{timestamp=2013/01/08 12:32:01 -0500}
{month=201301}
{day=08}

And then I could do secondary index on the month and day columns?  Would that 
be the best way to do something like this?  Is there any accepted "best 
practice" on this yet?

Thanks!
Steve


Script to load sstables from v1.0.x to v 1.1.x

2013-01-08 Thread Todd Nine
Hi all,
  I have recently been trying to restore backups from a v1.0.x cluster we
have into a 1.1.7 cluster.  This has not been as trivial as I expected, and
I've had a lot of help from the IRC channel in tackling this problem.  As a
way of saying thanks, I'd like to contribute the updated ruby script I was
originally given for accomplishing this task.  Here it is.

https://gist.github.com/1c161edab88a4e4aea06


It takes a keyspace directory as the input, then creates symlinks in the
output directory with the 1.1 structure pointing to the 1.0 sstables.  If
you've specified a host, it will then invoke the sstableloader for each of
the Keyspaces and CFs it discovers in the output directory.  I hope this is
helpful to someone else.  I'll keep the gist updated as I update the script.

Todd


Re: about validity of recipe "A node join using external data copy methods"

2013-01-08 Thread Edward Capriolo
It has been true since about 0.8. In the old days ANTI-COMPACTION stunk and
many weird errors would cause node joins to have to be retried N times.

Now node moves/joins seem to work near 100% of the time (in 1.0.7), and they are
also very fast and efficient.

If you want to move a node to new hardware you can do it with rsync, but I
would not use the technique for growing the cluster. It is error prone, and
ends up being more work.

On Tue, Jan 8, 2013 at 10:57 AM, DE VITO Dominique <
dominique.dev...@thalesgroup.com> wrote:

>   " Now streaming is very efficient rarely fails and there is no need to
> do it this way anymore"
>
>
>
> I guess it's true in v1.2.
>
> Is it true also in v1.1 ?
>
>
>
> Thanks.
>
>
>
> Dominique
>
>
>
>
>
> *De :* Edward Capriolo [mailto:edlinuxg...@gmail.com]
> *Envoyé :* mardi 8 janvier 2013 16:01
> *À :* user@cassandra.apache.org
> *Objet :* Re: about validity of recipe "A node join using external data
> copy methods"
>
>
>
> Basically this recipe is from the old days when we had anti-compaction.
> Now streaming is very efficient rarely fails and there is no need to do it
> this way anymore. This recipe will be abolished from the second edition. It
> still likely works except when using counters.
>
>
>
> Edward
>
>
>
> On Tue, Jan 8, 2013 at 7:27 AM, DE VITO Dominique <
> dominique.dev...@thalesgroup.com> wrote:
>
> Hi,
>
>
>
> Edward Capriolo described in his Cassandra book a faster way [1] to start
> new nodes if the cluster size doubles, from N to 2 *N.
>
>
>
> It's about splitting in 2 parts each token range taken in charge, after
> the split, with 2 nodes: the existing one, and a new one. And for starting
> a new node, one needs to:
>
> - copy the data records from the corresponding node (without the "system"
> records)
>
> - start the new node with "auto_bootstrap: false"
>
>
>
> This raises 2 questions:
>
>
>
> A) is this recipe still valid with v1.1 and v1.2 ?
>
>
>
> B) do we still need to start the new node with "auto_bootstrap: false" ?
>
> My guess is "yes" as the happening of the bootstrap phase is not recorded
> into the data records.
>
>
>
> Thanks.
>
>
>
> Dominique
>
>
>
> [1] see recipe "A node join using external data copy methods", page 165
>
>
>


Re: Astyanax

2013-01-08 Thread Brian O'Neill
Not sure where you are on the learning curve, but I've put a couple "getting
started" projects out on github:
https://github.com/boneill42/astyanax-quickstart

And the latest from the webinar is here:
https://github.com/boneill42/naughty-or-nice
http://brianoneill.blogspot.com/2013/01/creating-your-frist-java-application-w.html

-brian

---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 •
healthmarketscience.com




From:  Radek Gruchalski 
Reply-To:  
Date:  Tuesday, January 8, 2013 10:17 AM
To:  "user@cassandra.apache.org" 
Cc:  "user@cassandra.apache.org" 
Subject:  Re: Astyanax

Hi,

We are using astyanax and we found out that github wiki with stackoverflow
is the most comprehensive set of documentation.

Do you have any specific questions?

Kind regards,
Radek Gruchalski

On 8 Jan 2013, at 15:46, Everton Lima  wrote:

> I was studying from there, but I would like to know if anyone knows other sources.
> 
> 2013/1/8 Markus Klems 
>> The wiki? https://github.com/Netflix/astyanax/wiki
>> 
>> 
>> On Tue, Jan 8, 2013 at 2:44 PM, Everton Lima  wrote:
>>> Hi,
>>> Someone has or could indicate some good tutorial or book to learn Astyanax?
>>> 
>>> Thanks
>>> 
>>> -- 
>>> Everton Lima Aleixo
>>> Mestrando em Ciência da Computação pela UFG
>>> Programador no LUPA
>>> 
>> 
> 
> 
> 
> -- 
> Everton Lima Aleixo
> Bacharel em Ciência da Computação pela UFG
> Mestrando em Ciência da Computação pela UFG
> Programador no LUPA
> 




Re: CQL3 Frame Length

2013-01-08 Thread Ben Hood
Hey Sylvain,

Thanks for explaining the rationale. When you look at from the perspective
of the use cases you mention, it makes sense to be able to supply the
reader with the frame size up front.

I've opted to go for serializing the frame into a buffer. Although this
could materialize an arbitrarily large amount of memory, ultimately the
driving application has control of the degree to which this can occur, so
in the grander scheme of things, you can still maintain streaming semantics.
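
For anyone else implementing this, the write path then reduces to: encode the
body into a byte buffer, then emit the fixed 8-byte header (version, flags,
stream id, opcode, and the body length as a 32-bit int) followed by the buffer,
so a reader can always pull a whole frame off the wire before decoding it.
(That header layout is my reading of the v1 spec, so double-check it against
the protocol document.)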

Thanks for the heads up.

Cheers,

Ben


On Tue, Jan 8, 2013 at 4:08 PM, Sylvain Lebresne wrote:

> Mostly this is because the frame length is convenient to have in
> practice.
>
> Without pretending that there is only one way to write a server, it is
> common
> to separate the phase "read a frame from the network" from the phase
> "decode
> the frame" which is often simpler if you can read the frame upfront. Also,
> if
> you don't have the frame size, it means you need to decode the whole frame
> before being able to decode the next one, and so you can't parallelize the
> decoding.
>
> It is true however that it means that on the write side you need to be
> able to either pre-compute the frame body size or to serialize it in memory
> first. That's a trade-off for making it easier on the read side. But if you
> want my opinion, on the write side too it's probably worth parallelizing the
> message encoding (which requires you to encode it in memory first) since it's
> an asynchronous protocol and so there will likely be multiple writers
> simultaneously.
>
> --
> Sylvain
>
>
>
> On Tue, Jan 8, 2013 at 12:48 PM, Ben Hood <0x6e6...@gmail.com> wrote:
>
>> Hi,
>>
>> I've read the CQL wire specification and naively, I can't see how the
>> frame length header is used.
>>
>> To me, it looks like on the read side, you know which type of structures
>> to expect based on the opcode and each structure is TLV encoded.
>>
>> On the write side, you need to encode TLV structures as well, but you
>> don't know the overall frame length until you've encoded it. So it would
>> seem that you either need to pre-calculate the cumulative TLV size before
>> you serialize the frame body, or you serialize the frame body to a buffer
>> which you can then get the size of and then write to the socket, after
>> having first written the count out.
>>
>> Is there potentially an implicit assumption that the reader will want to
>> pre-buffer the entire frame before decoding it?
>>
>> Cheers,
>>
>> Ben
>>
>
>


Re: CQL3 Frame Length

2013-01-08 Thread Sylvain Lebresne
Mostly this is because the frame length is convenient to have in
practice.

Without pretending that there is only one way to write a server, it is
common
to separate the phase "read a frame from the network" from the phase "decode
the frame" which is often simpler if you can read the frame upfront. Also,
if
you don't have the frame size, it means you need to decode the whole frame
before being able to decode the next one, and so you can't parallelize the
decoding.

It is true however that it means that on the write side you need to be
able to either pre-compute the frame body size or to serialize it in memory
first. That's a trade-off for making it easier on the read side. But if you want
my opinion, on the write side too it's probably worth parallelizing the message
encoding (which requires you to encode it in memory first) since it's an
asynchronous protocol and so there will likely be multiple writers
simultaneously.

--
Sylvain



On Tue, Jan 8, 2013 at 12:48 PM, Ben Hood <0x6e6...@gmail.com> wrote:

> Hi,
>
> I've read the CQL wire specification and naively, I can't see how the
> frame length header is used.
>
> To me, it looks like on the read side, you know which type of structures
> to expect based on the opcode and each structure is TLV encoded.
>
> On the write side, you need to encode TLV structures as well, but you
> don't know the overall frame length until you've encoded it. So it would
> seem that you either need to pre-calculate the cumulative TLV size before
> you serialize the frame body, or you serialize the frame body to a buffer
> which you can then get the size of and then write to the socket, after
> having first written the count out.
>
> Is there potentially an implicit assumption that the reader will want to
> pre-buffer the entire frame before decoding it?
>
> Cheers,
>
> Ben
>


RE: about validity of recipe "A node join using external data copy methods"

2013-01-08 Thread DE VITO Dominique
" Now streaming is very efficient rarely fails and there is no need to do it 
this way anymore"

I guess it's true in v1.2.
Is it true also in v1.1 ?

Thanks.

Dominique


De : Edward Capriolo [mailto:edlinuxg...@gmail.com]
Envoyé : mardi 8 janvier 2013 16:01
À : user@cassandra.apache.org
Objet : Re: about validity of recipe "A node join using external data copy 
methods"

Basically this recipe is from the old days when we had anti-compaction. Now 
streaming is very efficient rarely fails and there is no need to do it this way 
anymore. This recipe will be abolished from the second edition. It still likely 
works except when using counters.

Edward

On Tue, Jan 8, 2013 at 7:27 AM, DE VITO Dominique 
<dominique.dev...@thalesgroup.com> wrote:
Hi,

Edward Capriolo described in his Cassandra book a faster way [1] to start new 
nodes if the cluster size doubles, from N to 2 *N.

It's about splitting in 2 parts each token range taken in charge, after the 
split, with 2 nodes: the existing one, and a new one. And for starting a new 
node, one needs to:
- copy the data records from the corresponding node (without the "system" 
records)
- start the new node with "auto_bootstrap: false"

This raises 2 questions:

A) is this recipe still valid with v1.1 and v1.2 ?

B) do we still need to start the new node with "auto_bootstrap: false" ?
My guess is "yes" as the happening of the bootstrap phase is not recorded into 
the data records.

Thanks.

Dominique

[1] see recipe "A node join using external data copy methods", page 165



Re: Astyanax

2013-01-08 Thread Radek Gruchalski
Hi,

We are using astyanax and we found out that github wiki with stackoverflow is 
the most comprehensive set of documentation.

Do you have any specific questions?

Kind regards,
Radek Gruchalski

On 8 Jan 2013, at 15:46, Everton Lima  wrote:

> I was studying from there, but I would like to know if anyone knows other sources.
> 
> 2013/1/8 Markus Klems 
>> The wiki? https://github.com/Netflix/astyanax/wiki
>> 
>> 
>> On Tue, Jan 8, 2013 at 2:44 PM, Everton Lima  wrote:
>>> Hi,
>>> Someone has or could indicate some good tutorial or book to learn Astyanax?
>>> 
>>> Thanks
>>> 
>>> -- 
>>> Everton Lima Aleixo
>>> Mestrando em Ciência da Computação pela UFG
>>> Programador no LUPA
> 
> 
> 
> -- 
> Everton Lima Aleixo
> Bacharel em Ciência da Computação pela UFG
> Mestrando em Ciência da Computação pela UFG
> Programador no LUPA
> 


Re: help turning compaction..hours of run to get 0% compaction....

2013-01-08 Thread Jim Cistaro
One metric to watch is pending compactions (via nodetool compactionstats).  
This count will give you some idea of whether you are falling behind with 
compactions.  The other measure is how long you are compacting after your 
inserts have stopped.

If I understand correctly, since you never update the data, that would explain 
why the compaction logging shows 100% of orig.  With size-tiered, you are 
flushing small files, compacting when you get 4 of like size, etc.  Since you 
have no updates, the compaction will not shrink the data.

As Aaron said, use iostat -x (or dstat) to see if you are taxing the disks.  If 
so, then leveled compaction may be your option (for reasons already stated).  
If not taxing the disks, then you might want to increase your compaction 
throughput, as you suggested.

Depending on what version you are using, another thing to possibly tune is the 
size of sstables when flushed to disk.  In your case of insert only, the 
smaller the flush size, the more times that row is going to be rewritten during 
a compaction (hence increase I/O).
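
If leveled compaction does turn out to be the better fit, the switch is just a
schema change. A sketch, assuming Cassandra 1.2 / CQL3 syntax and a made-up table
name (on 1.1 the equivalent is done through cassandra-cli or the Thrift schema
calls):

    ALTER TABLE my_table
      WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

Compaction throughput can be raised in cassandra.yaml
(compaction_throughput_mb_per_sec) or on a live node with
nodetool setcompactionthroughput.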

jc

From: Edward Capriolo <edlinuxg...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 7, 2013 2:33 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: help turning compaction..hours of run to get 0% compaction

There is some point where you simply need more machines.

On Mon, Jan 7, 2013 at 5:02 PM, Michael Kjellman 
<mkjell...@barracuda.com> wrote:
Right, I guess I'm saying that you should try loading your data with leveled 
compaction and see how your compaction load is.

Your work load sounds like leveled will fit much better than size tiered.

From: Brian Tarbox <tar...@cabotresearch.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 7, 2013 1:58 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: help turning compaction..hours of run to get 0% compaction

The problem I see is that it already takes me more than 24 hours just to load 
my data...during which time the logs say I'm spending tons of time doing 
compaction.  For example in the last 72 hours I've consumed 20 hours per machine 
on compaction.

Can I conclude from that that I should be (perhaps drastically) increasing my 
compaction_mb_per_sec on the theory that I'm getting behind?

The fact that it takes me 3 days or more to run a test means it's hard to just 
play with values and see what works best, so I'm trying to understand the 
behavior in detail.

Thanks.

Brian


On Mon, Jan 7, 2013 at 4:13 PM, Michael Kjellman 
<mkjell...@barracuda.com> wrote:
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

"If you perform at least twice as many reads as you do writes, leveled 
compaction may actually save you disk I/O, despite consuming more I/O for 
compaction. This is especially true if your reads are fairly random and don’t 
focus on a single, hot dataset."

From: Brian Tarbox <tar...@cabotresearch.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 7, 2013 12:56 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: help turning compaction..hours of run to get 0% compaction

I have not specified leveled compaction so I guess I'm defaulting to size 
tiered?  My data (in the column family causing the trouble) is insert once, read 
many, update never.

Brian


On Mon, Jan 7, 2013 at 3:13 PM, Michael Kjellman 
<mkjell...@barracuda.com> wrote:
Size tiered or leveled compaction?

From: Brian Tarbox <tar...@cabotresearch.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 7, 2013 12:03 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: help turning compaction..hours of run to get 0% compaction

I have a column family where I'm doing 500 inserts/sec for 12 hours or so at a 
time.  At some point my performance falls off a cliff due to time spent doing 
compactions.

I'm seeing row after row of logs saying that after 1 or 2 hours of compacting 
it reduced to 100% or 99% of the original.

I'm trying to understand what direction this data points me to in term of 
configuration change.

   a) increase my compaction_throughput_mb_per_sec because I'm falling behind 
(am I falling behind?)

   b) enable multi-threaded compaction?

Any help is appreciated.

Brian




Re: about validity of recipe "A node join using external data copy methods"

2013-01-08 Thread Edward Capriolo
Basically this recipe is from the old days when we had anti-compaction. Now
streaming is very efficient rarely fails and there is no need to do it this
way anymore. This recipe will be abolished from the second edition. It
still likely works except when using counters.

Edward

On Tue, Jan 8, 2013 at 7:27 AM, DE VITO Dominique <
dominique.dev...@thalesgroup.com> wrote:

>   Hi,
>
>
>
> Edward Capriolo described in his Cassandra book a faster way [1] to start
> new nodes if the cluster size doubles, from N to 2 *N.
>
>
>
> It's about splitting in 2 parts each token range taken in charge, after
> the split, with 2 nodes: the existing one, and a new one. And for starting
> a new node, one needs to:
>
> - copy the data records from the corresponding node (without the "system"
> records)
>
> - start the new node with "auto_bootstrap: false"
>
>
>
> This raises 2 questions:
>
>
>
> A) is this recipe still valid with v1.1 and v1.2 ?
>
>
>
> B) do we still need to start the new node with "auto_bootstrap: false" ?
>
> My guess is "yes" as the happening of the bootstrap phase is not recorded
> into the data records.
>
>
>
> Thanks.
>
>
>
> Dominique
>
>
>
> [1] see recipe "A node join using external data copy methods", page 165
>


Re: Astyanax

2013-01-08 Thread Everton Lima
I was studying from there, but I would like to know if anyone knows other sources.

2013/1/8 Markus Klems 

> The wiki? https://github.com/Netflix/astyanax/wiki
>
>
> On Tue, Jan 8, 2013 at 2:44 PM, Everton Lima wrote:
>
>> Hi,
>> Does anyone have, or could anyone point me to, a good tutorial or book to
>> learn Astyanax?
>>
>> Thanks
>>
>> --
>> Everton Lima Aleixo
>> Master's student in Computer Science at UFG
>> Programmer at LUPA
>>
>>
>


-- 
Everton Lima Aleixo
Bachelor of Computer Science from UFG
Master's student in Computer Science at UFG
Programmer at LUPA


Re: Astyanax

2013-01-08 Thread Markus Klems
The wiki? https://github.com/Netflix/astyanax/wiki
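
The wiki's getting-started page boils down to something like the sketch below.
This is written from memory, so the class and method names (and the example
cluster/keyspace/column family names) should be checked against the Astyanax
version you actually use:

    import com.netflix.astyanax.AstyanaxContext;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
    import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.serializers.StringSerializer;
    import com.netflix.astyanax.thrift.ThriftFamilyFactory;

    public class AstyanaxQuickStart {
        public static void main(String[] args) throws ConnectionException {
            // Build a context for the target cluster/keyspace (names are examples).
            AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                    .forCluster("TestCluster")
                    .forKeyspace("MyKeyspace")
                    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                            .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
                    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyPool")
                            .setPort(9160)
                            .setMaxConnsPerHost(1)
                            .setSeeds("127.0.0.1:9160"))
                    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                    .buildKeyspace(ThriftFamilyFactory.getInstance());
            context.start();
            Keyspace keyspace = context.getClient(); // getEntity() in some older releases

            // Write one column via a mutation batch.
            ColumnFamily<String, String> cf = new ColumnFamily<String, String>(
                    "Standard1", StringSerializer.get(), StringSerializer.get());
            MutationBatch m = keyspace.prepareMutationBatch();
            m.withRow(cf, "key1").putColumn("column1", "value1", null);
            m.execute();

            context.shutdown();
        }
    }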


On Tue, Jan 8, 2013 at 2:44 PM, Everton Lima wrote:

> Hi,
> Does anyone have, or could anyone point me to, a good tutorial or book to learn Astyanax?
>
> Thanks
>
> --
> Everton Lima Aleixo
> Master's student in Computer Science at UFG
> Programmer at LUPA
>
>


about validity of recipe "A node join using external data copy methods"

2013-01-08 Thread DE VITO Dominique
Hi,

Edward Capriolo described in his Cassandra book a faster way [1] to start new
nodes if the cluster size doubles, from N to 2*N.

The idea is to split each token range into two parts, so that after the split
each range is handled by two nodes: the existing one and a new one. To start a
new node, one needs to:
- copy the data records from the corresponding node (without the "system"
records)
- start the new node with "auto_bootstrap: false"

This raises 2 questions:

A) is this recipe still valid with v1.1 and v1.2?

B) do we still need to start the new node with "auto_bootstrap: false"?
My guess is "yes", since the fact that the bootstrap phase has happened is not
recorded in the data records.

Thanks.

Dominique

[1] see recipe "A node join using external data copy methods", page 165


CQL3 Frame Length

2013-01-08 Thread Ben Hood
Hi,

I've read the CQL wire specification and, naively, I can't see how the frame
length header is used.

To me, it looks like on the read side, you know which type of structures to
expect based on the opcode and each structure is TLV encoded.

On the write side, you need to encode TLV structures as well, but you don't
know the overall frame length until you've encoded them. So it would seem that
you either need to pre-calculate the cumulative TLV size before you serialize
the frame body, or you serialize the frame body into a buffer, take its size,
write the header (including the length) to the socket, and then write out the
buffered body.
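
A minimal sketch of that second, buffer-then-write approach in Java might look
like this; the header layout (version, flags, stream id, opcode, 4-byte length)
and the QUERY opcode/body constants are from my reading of the v1 spec, so
treat them as illustrative rather than authoritative:

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    public class FrameWriter {
        // Serialize the body into a temporary buffer first, so the frame length
        // is known before the header is written to the socket.
        public static void writeQueryFrame(OutputStream socketOut, String cql)
                throws IOException {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            DataOutputStream bodyOut = new DataOutputStream(body);
            byte[] query = cql.getBytes("UTF-8");
            bodyOut.writeInt(query.length);   // [long string]: 4-byte length prefix
            bodyOut.write(query);             // query text
            bodyOut.writeShort(0x0001);       // [consistency]: ONE (assumed value)

            DataOutputStream out = new DataOutputStream(socketOut);
            out.writeByte(0x01);              // version: request, protocol v1
            out.writeByte(0x00);              // flags: none
            out.writeByte(0x00);              // stream id
            out.writeByte(0x07);              // opcode: QUERY (assumed value)
            out.writeInt(body.size());        // frame length, known only now
            body.writeTo(out);                // frame body
            out.flush();
        }
    }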

Is there potentially an implicit assumption that the reader will want to
pre-buffer the entire frame before decoding it?

Cheers,

Ben


The solution of the write-in load of Cassandra?

2013-01-08 Thread hiroshi.kise.rk

Dear everyone,

I am having trouble dealing with write load in Cassandra, and I would
appreciate any advice on a good solution.
Cassandra is running on a single server; the version is 1.1.7
(1.1.8 behaved the same way).

What I did:
1. Write to Cassandra from a Thrift client under a certain amount of load.
2. The disk fills up and exceptions start to occur
   (however, the Cassandra process keeps running).
3. Stop Cassandra in this state.
4. Archive the files stored in the SSTable area with tar and free up disk
   space.
5. Restart Cassandra. (As part of the restart, data is replayed from the
   commit log back into SSTables.)

   The commit log was stored in a different location from the SSTables, and
   that disk had plenty of free capacity.

6. Check the contents of the data restored to the SSTables.

What I observed:

After the disk filled up, even with writes from the Thrift client stopped,
log messages about Memtables being flushed to disk kept being output roughly
once per second.
(No exception appears, but it is unclear whether the writes succeeded.)
- Why do these log messages keep being output? (Or what is the presumed
  cause?)

After the restart, a large amount of the data in the SSTables had been lost
(especially the data written during the first part of the load).
- Is this an unavoidable phenomenon when operating on a single server?
- What is the cause? (Or what do you presume it to be?)
- What countermeasures are there?

(Changing the point of view)
If the disk area where Cassandra writes the commit log runs out, a core dump
occurs and the Cassandra process goes down.
- Capacity planning is important, of course, but is there any way to detect
  such a process crash in advance?
--
Hiroshi KIse
Hitachi, Ltd., Information & Telecommunication System Company


Re: Cassandra counters replication uses more traffic than client increments?

2013-01-08 Thread Sergey Olefir
So with the holidays hopefully being over, I thought I'd ask again :)

Could someone please help with answers to the two questions:
- Is it reasonable to expect that cross-datacenter node-to-node replication
traffic is greater than actual client-to-server traffic that generates this
activity? Specifically talking about counter increments.
- Is there anything that can be done to lower the amount of cross-datacenter
replication traffic while keeping actual replication going (i.e. we can't
afford to not replicate data, but we can afford e.g. delays in replication)?

Best regards,
Sergey


Sergey Olefir wrote
> Hi,
> 
> as part of our ongoing tests with Cassandra, we've tried to evaluate the
> amount of traffic generated in client-to-server and server-to-server
> (replication scenarios).
> 
> The results we are getting are surprising.
> 
> Our setup:
> - Cassandra 1.1.7.
> - 3 DC with 2 nodes each.
> - NetworkTopology replication strategy with 2 replicas per DC (so
> basically each node contains full data set).
> - 100 clients concurrently incrementing counters at the rate of the
> roughly 100 / second (i.e. about 10k increments per second). Clients
> perform writes to DC:1 only. server-to-server traffic measurement was done
> in DC:2.
> - Clients use batches to write to the server (up to 100 increments per
> batch, overall each client writes 1 or 2 batches per second).
> 
> Clients are Java-based accessing Cassandra via hector. Run on Windows box.
> 
> Traffic measurement for clients (on Windows) was done via Resource Monitor
> and packet capture via Network Monitor. The overall traffic appears to be
> roughly 700KB/sec (kilobytes) for ~10k increments/sec.
> 
> Traffic measurement for server-to-server was done on DC:2 via packet
> capture. This capture specifically included only nodes in other
> datacenters (so no internal DC traffic was captured).
> 
> The vast majority of traffic was directed to one node DC:2-1. DC2-2
> received like 1/30 of the traffic. I think I've read somewhere that
> Cassandra directs DC-to-DC traffic to one node, so this makes sense.
> 
> What is surprising though -- is the amount of traffic. It looks to be
> roughly twice the amount of the total traffic generated by clients, i.e.
> something like 1.5MB/sec (megabytes). Note -- this only counts incoming
> traffic.
> 
> I've taken a look at some of the captured packets and it looks like
> there's much more service information in DC-to-DC traffic compared to
> client-to-server traffic -- although I am by no means certain here.
> 
> 
> Overall I have a couple of questions:
> - Is it indeed the case that server-to-server replication traffic can be
> significantly more bloated than client-to-server traffic? Or do I need to
> review my testing methodology?
> - Is there anything that can be done to reduce cross-DC replication
> traffic? Perhaps some compression scheme? Or some delay before replication
> allowing for possibly more increments to be merged together?
> 
> 
> Best regards,
> Sergey




