Recommended way of data migration

2013-09-07 Thread Renat Gilfanov
 Hello,

Let's say we have a simple CQL3 table 

CREATE TABLE example (
    id UUID PRIMARY KEY,
    timestamp TIMESTAMP,
    data ASCII
);

And I need to mutate (for example, encrypt) the values in the data column
for all rows.

What's the recommended approach to perform such a migration programmatically?

To me, the general approach is:

1. Create another column family.
2. Extract a batch of records.
3. For each extracted record, perform the mutation, insert the result into
the new column family, and delete the original from the old one.
4. Repeat until the source column family is empty.

Is this the correct approach, and if so, how would I implement some kind of
paging for step 2?
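
For step 1, a hypothetical target table could simply mirror the source
schema (if the encrypted values end up binary rather than printable, BLOB
would be a better fit than ASCII):

CREATE TABLE example_encrypted (
    id UUID PRIMARY KEY,
    timestamp TIMESTAMP,
    data ASCII
);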


Re: Recommended way of data migration

2013-09-07 Thread Edward Capriolo
I would do something like you are suggesting, except I would not do the
deletes until all the rows are moved. Since writes in Cassandra are
idempotent, you can even run the migration process multiple times without
harm.


On Sat, Sep 7, 2013 at 5:31 PM, Renat Gilfanov gren...@mail.ru wrote:

 Let's say we have a simple CQL3 table [...] Is this the correct approach,
 and if so, how would I implement some kind of paging for step 2?
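
For steps 2-4, a common pattern is token-based paging: select a page of rows
in token(id) order, remember the last token seen, and start the next page
just after it. Here is a minimal sketch using the DataStax Java driver
(2.0-era API). The keyspace name, the example_encrypted target table, and
the encrypt() helper are all hypothetical, and Murmur3Partitioner is assumed
so that token(id) is a signed 64-bit integer.

import java.util.UUID;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ExampleMigration {

    // Placeholder for the real transformation (e.g. encryption).
    static String encrypt(String plaintext) {
        return plaintext;
    }

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("mykeyspace"); // hypothetical keyspace

        // Page through the source table in token order. With Murmur3Partitioner,
        // token(id) is a bigint, so we can resume from the last token seen
        // instead of re-reading from the start.
        PreparedStatement select = session.prepare(
            "SELECT id, timestamp, data, token(id) AS t"
            + " FROM example WHERE token(id) > ? LIMIT 1000");
        PreparedStatement insert = session.prepare(
            "INSERT INTO example_encrypted (id, timestamp, data) VALUES (?, ?, ?)");

        long lastToken = Long.MIN_VALUE;
        while (true) {
            ResultSet page = session.execute(select.bind(lastToken));
            boolean sawRows = false;
            for (Row row : page) {
                sawRows = true;
                lastToken = row.getLong("t");
                UUID id = row.getUUID("id");
                // Idempotent: re-running after a crash just overwrites rows
                // that were already copied.
                session.execute(insert.bind(id, row.getDate("timestamp"),
                                            encrypt(row.getString("data"))));
            }
            if (!sawRows) {
                break; // no rows past the last token: the copy is complete
            }
        }
        cluster.close();
    }
}

Per the advice above, the old table would only be truncated or dropped once
the copy has been verified.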



[ANN] Cassaforte 1.2.0 is released

2013-09-07 Thread Oleksandr Petrov
Cassaforte [1] is a Clojure client for Apache Cassandra 1.2+. It is built
around CQL 3
and focuses on ease of use. You will likely find that using Cassandra from
Clojure has
never been so easy.

1.2.0 is a minor release that introduces one minor feature, fixes a couple
of bugs, and
makes Cassaforte compatible with Cassandra 2.0.

Release notes:
http://blog.clojurewerkz.org/blog/2013/09/07/cassaforte-1-dot-2-0-is-released/

1. http://clojurecassandra.info/

--
Alex P

https://github.com/ifesdjeen
https://twitter.com/ifesdjeen


Re: row cache

2013-09-07 Thread Edward Capriolo
I have found the row cache to be more trouble than benefit.

The term "fool's gold" comes to mind.

Using the key cache and leaving more free main memory seems stable and does
not have as many complications.
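
For example, on Cassandra 1.2/2.0 that advice boils down to caching keys
only per table and giving the row cache no memory (table name hypothetical):

ALTER TABLE example WITH caching = 'keys_only';

and in cassandra.yaml:

row_cache_size_in_mb: 0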
On Wednesday, September 4, 2013, S C as...@outlook.com wrote:
 Thank you all for your valuable comments and information.

 -SC


 Date: Tue, 3 Sep 2013 12:01:59 -0400
 From: chris.burrou...@gmail.com
 To: user@cassandra.apache.org
 CC: fsareshw...@quantcast.com
 Subject: Re: row cache

 On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:
  Yes, that is correct.
 
  The SerializingCacheProvider stores row cache contents off heap. I
  believe you need JNA enabled for this though. Someone please correct me
  if I am wrong here.
 
  The ConcurrentLinkedHashCacheProvider stores row cache contents on the
  java heap itself.
 

 Naming things is hard. Both caches are in memory and are backed by a
 ConcurrentLinkedHashMap. In the case of the SerializingCacheProvider
 the *values* are stored in off-heap buffers. Both must store a half
 dozen or so objects (on heap) per entry
 (org.apache.cassandra.cache.RowCacheKey,
 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue,
 java.util.concurrent.ConcurrentHashMap$HashEntry, etc.). It would
 probably be better to call this a mixed-heap rather than an off-heap
 cache. You may find the number of entries you can hold without GC
 problems to be surprisingly low (relative to, say, memcached, or
 physical memory on modern hardware).

 Invalidating a column with SerializingCacheProvider invalidates the
 entire row, while with ConcurrentLinkedHashCacheProvider it does not.
 SerializingCacheProvider does not require JNA.

 Both also use memory estimation of the size (of the values only) to
 determine the total number of entries retained. Estimating the size of
 the totally on-heap ConcurrentLinkedHashCacheProvider has historically
 been dicey since we switched from sizing in entries, and it has been
 removed in 2.0.0.

 As said elsewhere in this thread, the utility of the row cache varies
 from "absolutely essential" to "a source of numerous problems" depending
 on the specifics of the data model and request distribution.





w00tw00t.at.ISC.SANS.DFind not found

2013-09-07 Thread Tim Dunphy
Hey all,

 I'm seeing this exception in my Cassandra logs:

Exception during http request
mx4j.tools.adaptor.http.HttpException: file
mx4j/tools/adaptor/http/xsl/w00tw00t.at.ISC.SANS.DFind:) not found
at
mx4j.tools.adaptor.http.XSLTProcessor.notFoundElement(XSLTProcessor.java:314)
at
mx4j.tools.adaptor.http.HttpAdaptor.findUnknownElement(HttpAdaptor.java:800)
at
mx4j.tools.adaptor.http.HttpAdaptor$HttpClient.run(HttpAdaptor.java:976)

Do I need to be concerned about the security of this server? How can I
correct/eliminate this error message? I've just upgraded to Cassandra 2.0,
and this is the first time I've seen this error.

Thanks!
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: row cache

2013-09-07 Thread Mohit Anchlia
I agree. We've had a similar experience.

Sent from my iPhone

On Sep 7, 2013, at 6:05 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 I have found the row cache to be more trouble than benefit. [...]