Re: Thrift vs CQL3 performance

2014-07-29 Thread Kumar Ranjan
Stick with CQL3 going forward. Last I checked, there is no further development on 
Thrift. I worked with the Thrift-based C* API for two years through pycassa in 
Python, and the experience was not very satisfactory. I have not done comparisons 
between pycassa and CQL, so I can't speak to relative speed. CQL is very simple anyway.
Sent from Mailbox

On Mon, Jul 28, 2014 at 11:51 AM, bi kro hlqvu...@gmail.com wrote:

 Hi everyone,
 I'm a newcomer to Cassandra, so I would like to know about the performance 
 difference between Thrift (Hector) and CQL3, especially the speed (Thrift is 
 based on RPC, CQL3 on the native binary protocol).
 I'm currently using Cassandra 1.2; which version of the DataStax Java Driver 
 (CQL3) is stable with it?
 Thanks very much

Re: 750Gb compaction task

2014-03-13 Thread Kumar Ranjan
M —
Sent from Mailbox for iPhone

On Thu, Mar 13, 2014 at 1:28 AM, Plotnik, Alexey aplot...@rhonda.ru
wrote:

 After rebalance and cleanup I have a leveled CF (SSTable size = 100MB) and a 
 compaction task that is going to process ~750GB:

 root@da1-node1:~# nodetool compactionstats
 pending tasks: 10556
   compaction type   keyspace      column family   completed     total          unit    progress
   Compaction        cafs_chunks   chunks          41015024065   808740269082   bytes   5.07%

 I have no space for this operation; I only have 300 GB free. Is it possible to 
 resolve this situation?

Re:

2014-02-28 Thread Kumar Ranjan
Yes, filter out based on time range. Currently I do this in Python. Just 
curious to see if this can be done using pycassa somehow?
Sent from Mailbox for iPhone

On Fri, Feb 28, 2014 at 2:13 PM, Tyler Hobbs ty...@datastax.com wrote:

 Can you clarify exactly what you need help with?  It seems like you already
 know how to fetch the timestamps.  Are you just looking for python code to
 filter data that's not in a time range?
 By the way, there's a pycassa-specific mailing list here:
 https://groups.google.com/forum/#!forum/pycassa-discuss
 On Thu, Feb 27, 2014 at 2:48 PM, Kumar Ranjan winnerd...@gmail.com wrote:
 Hey folks,

 I am dealing with a legacy CF where super columns are used and the Python
 client pycassa is in use. An example is given below. My question is: can I use
 include_timestamp to select data between two returned timestamps, e.g. between
 1393516744591751 and 1393516772131811? This is not exactly a time series; I just
 want to select between those two. Please help with this.


 Data is inserted like this

 TEST_CF.insert('test_r_key',{'1234': {'key_name_1': 'taf_test_1'}})


 Data Fetch:

 TEST_CF.get('test_r_key', include_timestamp=True)


 OrderedDict([('1234', OrderedDict([('key_name_1', (u'taf_test_1',
 1393451990902345))])),

  ('1235', OrderedDict([('key_name_2', (u'taf_test_2',
 1393516744591751))])),

  ('1236', OrderedDict([('key_name_3', (u'taf_test_3',
 1393516772131782))]))

  ('1237', OrderedDict([('key_name_4', (u'taf_test_4',
 1393516772131799))]))

  ('1238', OrderedDict([('key_name_5', (u'taf_test_5',
 1393516772131811))]))

  ('1239', OrderedDict([('key_name_6', (u'taf_test_6',
 1393516772131854))]))

  ('1240', OrderedDict([('key_name_7', (u'taf_test_7',
 1393516772131899))]))

 ])

 -- 
 Tyler Hobbs
 DataStax http://datastax.com/

Re:

2014-02-28 Thread Kumar Ranjan
Thanks Tyler. Yes, I scanned through the pycassaShell code a couple of times but
did not find anything like that.


On Fri, Feb 28, 2014 at 3:24 PM, Tyler Hobbs ty...@datastax.com wrote:

 No, pycassa won't do anything fancy with timestamps automatically; you'll
 have to keep doing it yourself.


 On Fri, Feb 28, 2014 at 1:28 PM, Kumar Ranjan winnerd...@gmail.com wrote:

 Yes, filter out based on time range. Currently I do this in Python. Just
 curious to see if this can be done using pycassa somehow?
 --
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


 On Fri, Feb 28, 2014 at 2:13 PM, Tyler Hobbs ty...@datastax.com wrote:

 Can you clarify exactly what you need help with?  It seems like you
 already know how to fetch the timestamps.  Are you just looking for python
 code to filter data that's not in a time range?

 By the way, there's a pycassa-specific mailing list here:
 https://groups.google.com/forum/#!forum/pycassa-discuss


 On Thu, Feb 27, 2014 at 2:48 PM, Kumar Ranjan winnerd...@gmail.com wrote:

  Hey folks,

  I am dealing with a legacy CF where super columns are used and the Python
 client pycassa is in use. An example is given below. My question is: can I use
 include_timestamp to select data between two returned timestamps, e.g. between
 1393516744591751 and 1393516772131811? This is not exactly a time series; I just
 want to select between those two. Please help with this.


 Data is inserted like this

 TEST_CF.insert('test_r_key',{'1234': {'key_name_1': 'taf_test_1'}})


 Data Fetch:

 TEST_CF.get('test_r_key', include_timestamp=True)


 OrderedDict([('1234', OrderedDict([('key_name_1', (u'taf_test_1',
 1393451990902345))])),

  ('1235', OrderedDict([('key_name_2', (u'taf_test_2',
 1393516744591751))])),

  ('1236', OrderedDict([('key_name_3', (u'taf_test_3',
 1393516772131782))]))

  ('1237', OrderedDict([('key_name_4', (u'taf_test_4',
 1393516772131799))]))

  ('1238', OrderedDict([('key_name_5', (u'taf_test_5',
 1393516772131811))]))

  ('1239', OrderedDict([('key_name_6', (u'taf_test_6',
 1393516772131854))]))

  ('1240', OrderedDict([('key_name_7', (u'taf_test_7',
 1393516772131899))]))

  ])




 --
 Tyler Hobbs
 DataStax http://datastax.com/





 --
 Tyler Hobbs
 DataStax http://datastax.com/



[no subject]

2014-02-27 Thread Kumar Ranjan
Hey folks,

I am dealing with a legacy CF where super columns are used and the Python
client pycassa is in use. An example is given below. My question is: can I use
include_timestamp to select data between two returned timestamps, e.g. between
1393516744591751 and 1393516772131811? This is not exactly a time series; I just
want to select between those two. Please help with this.


Data is inserted like this

TEST_CF.insert('test_r_key',{'1234': {'key_name_1': 'taf_test_1'}})


Data Fetch:

TEST_CF.get('test_r_key', include_timestamp=True)


OrderedDict([('1234', OrderedDict([('key_name_1', (u'taf_test_1',
1393451990902345))])),

 ('1235', OrderedDict([('key_name_2', (u'taf_test_2',
1393516744591751))])),

 ('1236', OrderedDict([('key_name_3', (u'taf_test_3',
1393516772131782))]))

 ('1237', OrderedDict([('key_name_4', (u'taf_test_4',
1393516772131799))]))

 ('1238', OrderedDict([('key_name_5', (u'taf_test_5',
1393516772131811))]))

 ('1239', OrderedDict([('key_name_6', (u'taf_test_6',
1393516772131854))]))

 ('1240', OrderedDict([('key_name_7', (u'taf_test_7',
1393516772131899))]))

])
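
For anyone landing on this thread later: the Thrift API pycassa sits on has no
server-side predicate on write timestamps, so the selection has to happen client
side. A minimal sketch, assuming `result` is the OrderedDict returned by the
TEST_CF.get('test_r_key', include_timestamp=True) call shown above:

def filter_by_timestamp(result, start_ts, end_ts):
    # Keep only sub-columns whose write timestamp falls in [start_ts, end_ts].
    filtered = {}
    for super_col, sub_cols in result.items():
        kept = dict((name, (value, ts))
                    for name, (value, ts) in sub_cols.items()
                    if start_ts <= ts <= end_ts)
        if kept:
            filtered[super_col] = kept
    return filtered

# e.g. the two bounds from the question:
# subset = filter_by_timestamp(result, 1393516744591751, 1393516772131811)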


pycassa get column_start and column_finish with less than or greater than

2014-02-05 Thread Kumar Ranjan
Hey Folks,

Does pycassa's get() column_start take a greater-than-or-equal-to option? What I
know so far is that you have to pass an exact column or super_column value for
column_start and column_finish to work. In my case, the column name is an epoch
timestamp.
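
As far as I can tell from the Thrift slice semantics pycassa is built on,
column_start and column_finish are inclusive bounds in comparator order, so the
values do not need to be existing column names; any value acts as a
"greater than or equal to" / "less than or equal to" bound. A hedged sketch,
with a made-up CF handle `events_cf` and epoch-seconds column names:

start = 1385424000    # inclusive lower bound, need not be an existing column
finish = 1385510400   # inclusive upper bound

cols = events_cf.get('some_row_key',
                     column_start=start,
                     column_finish=finish,
                     column_count=1000)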


OpenJDK is not recommended? Why

2014-01-28 Thread Kumar Ranjan
I am in the process of setting up a 2-node cluster with C* version 2.0.4. When I
started the nodes, they failed to communicate, so each is running separately and
they are not in the same ring. I started looking at the log files and saw the
message below:

WARN [main] 2014-01-28 06:02:17,861 CassandraDaemon.java (line 155) OpenJDK
is not recommended. Please upgrade to the newest Oracle Java release

Is this message informational only, or can it be a real issue? Is this why the
two nodes are not in a ring?

-- Kumar


Issues with seeding on EC2 for C* 2.0.4 - help needed

2014-01-28 Thread Kumar Ranjan
Hey Folks - I am burning the midnight oil but can't figure out what I am doing
wrong. The log files show the output below. I have also listed partial
configurations for both the seed node and node 2.


 INFO [main] 2014-01-29 05:15:11,515 CommitLog.java (line 127) Log replay
complete, 46 replayed mutations

 INFO [main] 2014-01-29 05:15:12,734 StorageService.java (line 490)
Cassandra version: 2.0.4

 INFO [main] 2014-01-29 05:15:12,743 StorageService.java (line 491) Thrift
API version: 19.39.0

 INFO [main] 2014-01-29 05:15:12,755 StorageService.java (line 492) CQL
supported versions: 2.0.0,3.1.3 (default: 3.1.3)

 INFO [main] 2014-01-29 05:15:12,821 StorageService.java (line 515) Loading
persisted ring state

 INFO [main] 2014-01-29 05:15:12,864 MessagingService.java (line 458)
Starting Messaging Service on port 7000

ERROR [main] 2014-01-29 05:15:43,890 CassandraDaemon.java (line 478)
Exception encountered during startup

java.lang.RuntimeException: Unable to gossip with any seeds


Seed node 1:

(cassandra.yaml: I just have a 2-node cluster and this is the seed node)

seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring.  You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: ip1,ip2,ip3
          - seeds: 127.0.0.1

storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.xxx.xxx.xxx   ( private IP of this node )
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 0.0.0.0
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync


Node 2:

seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Cassandra nodes use this list of hosts to find each other and learn
    # the topology of the ring.  You must change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: ip1,ip2,ip3
          - seeds: 10.xxx.xxx.xxx   ( private IP of the seed node listed above )

storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.xxx.xxx.xxx   ( private IP of this node )
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 0.0.0.0
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync


Re: OpenJDK is not recommended? Why

2014-01-28 Thread Kumar Ranjan
Yes, I got rid of OpenJDK and installed the Oracle version, and the warning went
away. Happy happy... Thank you, folks.


On Tue, Jan 28, 2014 at 11:59 PM, Michael Shuler mich...@pbandjelly.org wrote:

 On 01/28/2014 09:55 PM, Kumar Ranjan wrote:

  I am in the process of setting up a 2-node cluster with C* version 2.0.4. When I
  started the nodes, they failed to communicate, so each is running
  separately and they are not in the same ring. I started looking at the log
  files and saw the message below:


 This is probably just a configuration issue and not likely to be the fault
 of OpenJDK.  OpenJDK is ok for testing the waters and light dev work; it is
 the reference implementation for Oracle Java SE 7.


  WARN [main] 2014-01-28 06:02:17,861 CassandraDaemon.java (line 155)
 OpenJDK is not recommended. Please upgrade to the newest Oracle Java
 release

 Is this message informational only or can it be real issue?


 Source of the above warning has some comments (attached, so they don't
 wrap so badly, I hope).

 https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/service/CassandraDaemon.java;h=424dbfa58ec72ea812362e2b428d0c4534626307;hb=HEAD#l106

 --
 Kind regards,
 Michael




Re: Issues with seeding on EC2 for C* 2.0.4 - help needed

2014-01-28 Thread Kumar Ranjan
Hi Michael - Yes, ports 7000, 7001, 9042, and 9160 are all open on EC2.

The issue was that the seeds address and listen_address did not match: seeds was
127.0.0.1 while listen_address was the private IP.

This should help anyone who hits the same problem:

http://stackoverflow.com/questions/20690987/apache-cassandra-unable-to-gossip-with-any-seeds
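
In other words (an untested sketch, keeping the 10.xxx placeholders from the
configs above): on EC2 both nodes gossip over the private IPs, so the seed
node's own cassandra.yaml should list its private IP in seeds rather than
127.0.0.1, matching its listen_address:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.xxx.xxx.xxx"   # private IP of the seed node, on both nodes

listen_address: 10.xxx.xxx.xxx        # each node's own private IP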


On Wed, Jan 29, 2014 at 1:12 AM, Michael Shuler mich...@pbandjelly.org wrote:

 Did you open up the ports so they can talk to each other?

 http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/install/installAMISecurityGroup.html

 --
 Michael



Centralized tool to install and manage C* from one machine

2014-01-27 Thread Kumar Ranjan
I am used to working with CCM for testing. For production, I depend on
installing cassandra manually. Is there a proven tool to install and manage
multinode cassandra cluster? If you have any experience, please let me know.


Re: Centralized tool to install and manage C* from one machine

2014-01-27 Thread Kumar Ranjan
Thank you, Michael. I am trying out Priam as we speak and will post an update
on my experience with the different tools. Again, thank you. -- K


On Tue, Jan 28, 2014 at 12:29 AM, Michael Shuler mich...@pbandjelly.org wrote:

 On 01/27/2014 10:34 PM, Kumar Ranjan wrote:

 I am used to working with CCM for testing. For production, I depend on
 installing cassandra manually. Is there a proven tool to install and
 manage multinode cassandra cluster? If you have any experience, please
 let me know.


 A good answer will depend on where you will be deploying your cluster, how
 many nodes you plan, etc.  For bare metal installations, for instance with
 Debian or Ubuntu, you may want to automate with a preseed [0] of your OS
 installations to set up the JVM, install your desired cassandra version
 package, etc.  RedHat variants use similar OS automation with kickstart
 [1]. Or, perhaps you want to use AWS and a preinstalled AMI [2] or create
 your own golden image on AWS and save it as an AMI for booting your other
 machines.

 I've done preseeds and kickstarts extensively and have set up a few custom
 AMIs - you may want to consider that most production clusters are likely
 running Debian or Ubuntu.

 Once you have machines installed, you may want some configuration
 management, or you could use a config manager to aid in OS installation and
 setup at install time - it seems that chef [3] has gained some traction
 over the once-hot puppet [4], but salt [5] is also quite mature, if you
 like python better than ruby.

 I've used all of these and they are great.  I've also been burned by them
 all.  There's nothing quite like the complete control over configurations
 carefully checked into a VCS and parallel ssh to pull them out, along with
 a few scripts to set up things just right - just my experience  ;)

 There is also priam [6].  I keep wanting to find some time to play with
 it, so I have no insight, but it looks very interesting.

 If you'd like cassandra cluster management beyond
 installation/configuration of the machine, have a look at opscenter [7].

 I'm sure there are a lot of other projects / cookbooks that others might
 be working on - chime in!

 Michael

 [0] https://wiki.debian.org/DebianInstaller/Preseed
 [1] https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Installation_Guide/ch-kickstart2.html
 [2] https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2
 [3] http://community.opscode.com/cookbooks/cassandra
 [4] https://forge.puppetlabs.com/tags/cassandra
 [5] http://docs.saltstack.com/ref/modules/all/salt.modules.cassandra.html
 [6] https://github.com/Netflix/Priam
 [7] http://www.datastax.com/what-we-offer/products-services/datastax-opscenter



alter_column_family (thrift based pycassa) drop a column

2014-01-14 Thread Kumar Ranjan
Hey folks,

I used create_column_family to create a CF but made a typo in one of the column
names. Can I use alter_column_family to drop that column definition and
re-create it with the correct name? Can you help with the syntax?

Here is what I currently use for alter_column_family:

SYSTEM_MANAGER.alter_column_family('Narrative', 'Instagram_Tags',
    default_validation_class='UTF8Type', super=True, comparator='UTF8Type',
    key_validation_class='UTF8Type', column_validation_classes=validators)

where validators = {'longitude': 'DoubleType', 'latitude': 'DoubleType'}
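
A hedged sketch of one piece of this, assuming pycassa's SystemManager and using
'latitude' as a stand-in for the correctly spelled name (the real mistyped name
isn't shown in the question): alter_column can (re)define a single column's
validator without touching the rest of the metadata. I am not aware of a direct
pycassa call to drop the mistyped definition itself; that may need cassandra-cli,
and the stray metadata entry is otherwise harmless.

from pycassa.system_manager import SystemManager

sys_mgr = SystemManager('localhost:9160')
# Define (or redefine) the correctly named column's validator:
sys_mgr.alter_column('Narrative', 'Instagram_Tags', 'latitude', 'DoubleType')
sys_mgr.close()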


Re: Cassandra python pagination

2013-12-19 Thread Kumar Ranjan
Rob - I have a question following your advice. This is how I define my column
family:

validators = {
    'approved':     'UTF8Type',
    'tid':          'UTF8Type',
    'iid':          'UTF8Type',
    'score':        'IntegerType',
    'likes':        'IntegerType',
    'retweet':      'IntegerType',
    'favorite':     'IntegerType',
    'screen_name':  'UTF8Type',
    'created_date': 'UTF8Type',
    'expanded_url': 'UTF8Type',
    'embedly_data': 'BytesType',
}

SYSTEM_MANAGER.create_column_family('KeySpaceNNN', 'Twitter_Instagram',
    default_validation_class='UTF8Type', super=True, comparator='UTF8Type',
    key_validation_class='UTF8Type', column_validation_classes=validators)

Actual data representation:

'row_key': {'1234555665_53323232': {'approved': 'false', 'tid': 123, 'iid': 34,
                                    'score': 2, 'likes': 50, 'retweets': 45,
                                    'favorite': 34, 'screen_name': 'goodname'},
            '2344555665_53323232': {'approved': 'false', 'tid': 134, 'iid': 34,
                                    'score': 2, 'likes': 50, 'retweets': 45,
                                    'favorite': 34, 'screen_name': 'newname'},
            ...
           }

Is there something wrong with it? Here 1234555665_53323232 and
2344555665_53323232 are super columns. Also, if I have to represent this data
with a new composite comparator instead, how would I accomplish that?


Please let me know.


Regards.
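
In case it helps: a rough, untested sketch of the composite alternative, using
pycassa's CompositeType (the names below mirror the example above; the new CF
name is made up). The former super column name becomes the first component of
each composite column name and the former sub-column name becomes the second,
with all values checked by the default validation class:

from pycassa import ConnectionPool, ColumnFamily
from pycassa.system_manager import SystemManager
from pycassa.types import CompositeType, UTF8Type

sys_mgr = SystemManager('localhost:9160')
sys_mgr.create_column_family(
    'KeySpaceNNN', 'Twitter_Instagram_comp',
    comparator_type=CompositeType(UTF8Type(), UTF8Type()),  # (entry_id, field)
    default_validation_class='UTF8Type',
    key_validation_class='UTF8Type')

pool = ConnectionPool('KeySpaceNNN', ['localhost:9160'])
cf = ColumnFamily(pool, 'Twitter_Instagram_comp')

# One former "super column" worth of data, stored as tuple-named columns:
cf.insert('row_key', {
    ('1234555665_53323232', 'approved'): 'false',
    ('1234555665_53323232', 'screen_name'): 'goodname',
})

cols = cf.get('row_key')   # column names come back as tuples

Slicing on just the first component (everything for one entry) is covered in the
pycassa composite-types docs Aaron linked elsewhere on this list.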


On Wed, Dec 18, 2013 at 5:32 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Dec 18, 2013 at 1:28 PM, Kumar Ranjan winnerd...@gmail.com wrote:

 Second approach ( I used in production ):
 - fetch all super columns for a row key


 Stock response mentioning that super columns are anti-advised for use,
 especially in brand new code.

 =Rob




Re: Cassandra python pagination

2013-12-18 Thread Kumar Ranjan
I am using pycassa. So, here is how I solved this issue. Will discuss 2
approaches. First approach didn't work out for me. Thanks Aaron for your
attention.

First approach:
- Say if column_count = 10
- collect first 11 rows, sort first 10, send it to user (front end) as JSON
object and last=11th_column
- User then calls for page 2, with prev = 1st_column_id, column_start =
11th_column and column_count = 10
- This way, I can traverse, next page and previous page.
- The only issue with this approach is that I don't have all the columns in the
super column sorted, so this did not work.

Second approach ( I used in production ):
- fetch all super columns for a row key
- Sort this in python using sorted and lambda function based on column
values.
- Once sorted, I prepare buckets, each of page size (column count), and also
filter out any rogue data if needed.
- Store page by page results in Redis with keys such as
'row_key|page_1|super_column' and keep refreshing redis periodically.

I am sure there must be a better and brighter approach, but for now the 2nd
approach is working. Thoughts?
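
A minimal sketch of that second approach (client-side sort plus fixed-size
pages); the CF handle, the row key, and sorting on a numeric 'score' sub-column
are assumptions standing in for the real code:

def build_pages(cf, row_key, page_size=50):
    # Pull every super column for the row (pycassa's default column_count is only 100).
    sup_cols = cf.get(row_key, column_count=100000)
    # Sort by a value inside each super column, highest score first.
    ordered = sorted(sup_cols.items(),
                     key=lambda item: int(item[1].get('score', 0)),
                     reverse=True)
    # Cut into page-sized buckets: pages[0] is page 1, and so on.
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]

Caching each bucket in Redis, as described above, is left out of the sketch.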



On Tue, Dec 17, 2013 at 9:19 PM, Aaron Morton aa...@thelastpickle.com wrote:

 CQL3 and thrift do not support an offset clause, so you can only really
 support next / prev page calls to the database.

 I am trying to use xget with the column_count and buffer_size parameters. Can
 someone explain to me how they work? From the docs, my understanding is that I
 can do something like,

 What client are you using ?
 xget is not a standard cassandra function.

 Cheers

 -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder  Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 13/12/2013, at 4:56 am, Kumar Ranjan winnerd...@gmail.com wrote:

 Hey Folks,

 I need some ideas about implementing pagination support for the browser,
 driven from the backend. So the Python code (backend) gets a request from the
 frontend with page=1,2,3,4 and so on and count_per_page=50.

 I am trying to use xget with the column_count and buffer_size parameters. Can
 someone explain to me how they work? From the docs, my understanding is that I
 can do something like this:


 total_cols is the total number of columns for that key.
 count is what the user sends me.

 .xget('Twitter_search', hh, column_count=total_cols, buffer_size=count):

 Is my understanding correct? Because it is not working for page 2 and beyond.
 Please enlighten me with suggestions.

 Thanks.





Issues while fetching data with pycassa get for super columns

2013-12-13 Thread Kumar Ranjan
Hi Folks - I am having an issue fetching data using the pycassa get() function. I
have copied the CF schema, and my code is below. The query returns me just
this:

Results: {u'narrativebuddieswin': ['609548930995445799_752368319',
'609549303525138481_752368319', '610162034020180814_752368319',
'610162805856002905_752368319', '610163571417146213_752368319',
'610165900312830861_752368319']}

None of the sub-columns are returned for the above super column. Please help.


CODE:

if start:
    res_rows = col_fam.get(key, column_count=count, column_start=start,
                           include_timestamp=True, include_ttl=True)
else:
    res_rows = col_fam.get(key, column_count=count,
                           include_timestamp=True, include_ttl=True)

return res_rows
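
A hedged guess at something worth trying here (I can't reproduce the schema, so
this is only a sketch): for a super CF, pycassa's get() can be pointed at one
super column explicitly, which returns that super column's sub-columns. The
super column name below is taken from the Results output above; key and count
are the same variables as in the code:

sub_cols = col_fam.get(key,
                       super_column='narrativebuddieswin',
                       column_count=count,
                       include_timestamp=True,
                       include_ttl=True)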




CF Schema: 

'Twitter_Instagram':

CfDef(comment='',

  key_validation_class='org.apache.cassandra.db.marshal.UTF8Type',

  min_compaction_threshold=4,

  key_cache_save_period_in_seconds=None,

  gc_grace_seconds=864000,

  default_validation_class='org.apache.cassandra.db.marshal.UTF8Type',

  max_compaction_threshold=32,

  read_repair_chance=0.10001,

  compression_options={'sstable_compression':
'org.apache.cassandra.io.compress.SnappyCompressor'},

  bloom_filter_fp_chance=None,

  id=None,

  keyspace='Narrative',

  key_cache_size=None,

  replicate_on_write=True,

  subcomparator_type='org.apache.cassandra.db.marshal.BytesType',

  merge_shards_chance=None,

  row_cache_provider=None,

  row_cache_save_period_in_seconds=None,

  column_type='Super',

  memtable_throughput_in_mb=None,

  memtable_flush_after_mins=None,


column_metadata={

'expanded_url': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.UTF8Type',
name='expanded_url', index_options=None),

'favorite': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.IntegerType',
name='favorite', index_options=None),

'retweet': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.IntegerType',
name='retweet', index_options=None),

'iid': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.UTF8Type', name='iid',
index_options=None),

'screen_name': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.UTF8Type',
name='screen_name', index_options=None),

'embedly_data': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.BytesType',
name='embedly_data', index_options=None),

'created_date': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.UTF8Type',
name='created_date', index_options=None),

'tid': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.UTF8Type', name='tid',
index_options=None),

'score': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.IntegerType',
name='score', index_options=None),

'approved': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.UTF8Type',
name='approved', index_options=None),

'likes': ColumnDef(index_type=None, index_name=None,
validation_class='org.apache.cassandra.db.marshal.IntegerType',
name='likes', index_options=None)},


key_alias=None,

dclocal_read_repair_chance=0.0,

name='Twitter_Instagram',

compaction_strategy_options={},

row_cache_keys_to_save=None,

compaction_strategy='org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',

memtable_operations_in_millions=None,

caching='KEYS_ONLY',

comparator_type='org.apache.cassandra.db.marshal.BytesType',

row_cache_size=None),


Cassandra python pagination

2013-12-12 Thread Kumar Ranjan
Hey Folks,

I need some ideas about implementing pagination support for the browser, driven
from the backend. So the Python code (backend) gets a request from the frontend
with page=1,2,3,4 and so on and count_per_page=50.

I am trying to use xget with the column_count and buffer_size parameters. Can
someone explain to me how they work? From the docs, my understanding is that I
can do something like this:


total_cols is the total number of columns for that key.
count is what the user sends me.

.xget('Twitter_search', hh, column_count=total_cols, buffer_size=count):

Is my understanding correct? Because it is not working for page 2 and beyond.
Please enlighten me with suggestions.

Thanks.
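
A sketch of page-by-page reads built on xget (the CF handle and row key are
placeholders; xget streams columns lazily, buffer_size at a time). The trick is
to remember the last column name of the previous page and restart from it,
skipping that boundary column because column_start is inclusive:

def get_page(cf, row_key, page_size, start_after=None):
    kwargs = {'buffer_size': page_size + 1}
    if start_after is not None:
        kwargs['column_start'] = start_after    # inclusive, so fetch one extra
    cols = []
    for name, value in cf.xget(row_key, **kwargs):
        if name == start_after:
            continue                            # skip the boundary column
        cols.append((name, value))
        if len(cols) == page_size:
            break
    return cols   # feed the last name back in as start_after for the next page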


Cassandra data update for a row

2013-12-12 Thread Kumar Ranjan
Hey Folks,

I have a row like this: 'twitter_row_key' is the row key and 411186035495010304
is the column name; the rest is the value stored under that column. See below.

'twitter_row_key': OrderedDict([('411186035495010304', u'{score: 0,
tid: 411186035495010304, created_at: Thu Dec 12 17:29:24 + 2013,
favorite: 0, retweet: 0, approved: true}'),])

How can I set approved to 'false' ??


When I try an insert for row key 'twitter_row_key' and column
411186035495010304, it overwrites the whole value, and the new row becomes
this:

'twitter_row_key': OrderedDict([('411186035495010304', u'{approved:
true}'),])


Any thoughts guys?
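
Since the whole tweet is stored as one JSON string under that column, the column
value can only be replaced wholesale; a read-modify-write is the usual pattern.
A minimal sketch, assuming `cf` is the pycassa ColumnFamily handle and that the
stored value really is valid JSON (the paste above has lost its quotes):

import json

row_key = 'twitter_row_key'
col = '411186035495010304'

current = cf.get(row_key, columns=[col])[col]   # fetch the JSON string
doc = json.loads(current)
doc['approved'] = False
cf.insert(row_key, {col: json.dumps(doc)})      # write the full value back

If individual fields need independent updates, storing each field as its own
(sub)column instead of one JSON blob avoids the overwrite problem entirely.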


Re:

2013-12-12 Thread Kumar Ranjan
Thanks Aaron.


On Wed, Dec 11, 2013 at 10:45 PM, Aaron Morton aa...@thelastpickle.com wrote:

  SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test',
 comparator_type='CompositeType', default_validation_class='UTF8Type',
 key_validation_class='UTF8Type', column_validation_classes=validators)

 CompositeType is a type composed of other types, see


 http://pycassa.github.io/pycassa/assorted/composite_types.html?highlight=compositetype

 Cheers

 -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder  Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 12/12/2013, at 6:15 am, Kumar Ranjan winnerd...@gmail.com wrote:

  Hey Folks,
 
  So I am creating, column family using pycassaShell. See below:
 
  validators = {
 
  'approved':  'BooleanType',
 
  'text':  'UTF8Type',
 
  'favorite_count':'IntegerType',
 
  'retweet_count': 'IntegerType',
 
  'expanded_url':  'UTF8Type',
 
  'tuid':  'LongType',
 
  'screen_name':   'UTF8Type',
 
  'profile_image': 'UTF8Type',
 
  'embedly_data':  'CompositeType',
 
  'created_at':'UTF8Type',
 
 
  }
 
  SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test',
 comparator_type='CompositeType', default_validation_class='UTF8Type',
 key_validation_class='UTF8Type', column_validation_classes=validators)
 
  I am getting this error:
 
  InvalidRequestException: InvalidRequestException(why='Invalid definition
 for comparator org.apache.cassandra.db.marshal.CompositeType.'
 
 
 
  My data will look like this:
 
  'row_key' : { 'tid' :
 
  {
 
  'expanded_url': u'http://instagram.com/p/hwDj2BJeBy/',
 
  'text': '#snowinginNYC Makes me so
 happy\xe2\x9d\x840brittles0 \xe2\x9b\x84 @ Grumman Studios
 http://t.co/rlOvaYSfKa',
 
  'profile_image': u'
 https://pbs.twimg.com/profile_images/3262070059/1e82f895559b904945d28cd3ab3947e5_normal.jpeg
 ',
 
  'tuid': 339322611,
 
  'approved': 'true',
 
  'favorite_count': 0,
 
  'screen_name': u'LonaVigi',
 
  'created_at': u'Wed Dec 11 01:10:05 + 2013',
 
  'embedly_data': {u'provider_url': u'http://instagram.com/',
 u'description': ulonavigi's photo on Instagram, u'title':
 u'#snwinginNYC Makes me so happy\u2744@0brittles0 \u26c4', u'url': u'
 http://distilleryimage7.ak.instagram.com/5b880dec61c711e3a50b129314edd3b_8.jpg',
 u'thumbnail_width': 640, u'height': 640, u'width': 640, u'thumbnail_url': u'
 http://distilleryimage7.ak.instagram.com/b880dec61c711e3a50b1293d14edd3b_8.jpg',
 u'author_name': u'lonavigi', u'version': u'1.0', u'provider_name':
 u'Instagram', u'type': u'poto', u'thumbnail_height': 640, u'author_url': u'
 http://instagram.com/lonavigi'},
 
  'tid': 410577192746500096,
 
  'retweet_count': 0
 
  }
 
  }
 




Re: How to create counter column family via Pycassa?

2013-12-11 Thread Kumar Ranjan
What are all the possible values for cf_kwargs?

SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test',
comparator_type=UTF8Type,  )

 - Here I want to specify the column data types and the row key type. How can
I do that?


On Thu, Aug 15, 2013 at 12:30 PM, Tyler Hobbs ty...@datastax.com wrote:

 The column_validation_classes arg is just for defining individual column
 types.  Glad you got it figured out, though.


 On Thu, Aug 15, 2013 at 11:23 AM, Pinak Pani 
 nishant.has.a.quest...@gmail.com wrote:

 Thanks for the quick reply. Apparently, I was trying to get this working:

 cf_kwargs = {'default_validation_class':COUNTER_COLUMN_TYPE}
 sys.create_column_family('my_ks', 'vote_count',
 column_validation_classes=cf_kwargs)  #1

 But this works:

 sys.create_column_family('my_ks', 'vote_count', **cf_kwargs)  #2

 I thought #1 should work.



 On Thu, Aug 15, 2013 at 9:15 PM, Tyler Hobbs ty...@datastax.com wrote:

 The only thing that makes a CF a counter CF is that the default
 validation class is CounterColumnType, which you can set through
 SystemManager.create_column_family().


 On Thu, Aug 15, 2013 at 10:38 AM, Pinak Pani 
 nishant.has.a.quest...@gmail.com wrote:

 I do not find a way to create a counter column family in Pycassa.
 This[1] does not help.

 Appreciate if someone can help me.

 Thanks

  1.
 http://pycassa.github.io/pycassa/api/pycassa/system_manager.html#pycassa.system_manager.SystemManager.create_column_family




 --
 Tyler Hobbs
 DataStax http://datastax.com/





 --
 Tyler Hobbs
 DataStax http://datastax.com/



[no subject]

2013-12-11 Thread Kumar Ranjan
Hey Folks,

So I am creating, column family using pycassaShell. See below:

validators = {
    'approved':       'BooleanType',
    'text':           'UTF8Type',
    'favorite_count': 'IntegerType',
    'retweet_count':  'IntegerType',
    'expanded_url':   'UTF8Type',
    'tuid':           'LongType',
    'screen_name':    'UTF8Type',
    'profile_image':  'UTF8Type',
    'embedly_data':   'CompositeType',
    'created_at':     'UTF8Type',
}

SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test',
comparator_type='CompositeType', default_validation_class='UTF8Type',
key_validation_class='UTF8Type', column_validation_classes=validators)


I am getting this error:

InvalidRequestException: InvalidRequestException(why='Invalid definition
for comparator org.apache.cassandra.db.marshal.CompositeType.'


My data will look like this:

'row_key' : { 'tid' :

{

'expanded_url': u'http://instagram.com/p/hwDj2BJeBy/',

'text': '#snowinginNYC Makes me so happy\xe2\x9d\x840brittles0
\xe2\x9b\x84 @ Grumman Studios http://t.co/rlOvaYSfKa',

'profile_image': u'
https://pbs.twimg.com/profile_images/3262070059/1e82f895559b904945d28cd3ab3947e5_normal.jpeg
',

'tuid': 339322611,

'approved': 'true',

'favorite_count': 0,

'screen_name': u'LonaVigi',

'created_at': u'Wed Dec 11 01:10:05 + 2013',

'embedly_data': {u'provider_url': u'http://instagram.com/',
u'description': ulonavigi's photo on Instagram, u'title':
u'#snwinginNYC Makes me so happy\u2744@0brittles0 \u26c4', u'url': u'
http://distilleryimage7.ak.instagram.com/5b880dec61c711e3a50b129314edd3b_8.jpg',
u'thumbnail_width': 640, u'height': 640, u'width': 640, u'thumbnail_url': u'
http://distilleryimage7.ak.instagram.com/b880dec61c711e3a50b1293d14edd3b_8.jpg',
u'author_name': u'lonavigi', u'version': u'1.0', u'provider_name':
u'Instagram', u'type': u'poto', u'thumbnail_height': 640, u'author_url': u'
http://instagram.com/lonavigi'},

'tid': 410577192746500096,

'retweet_count': 0

}

}
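
What the error is complaining about (as Aaron points out in the other thread):
CompositeType has to be parameterized with its component types, so passing the
bare string 'CompositeType' as a comparator (or as the 'embedly_data' validator)
is rejected. A hedged sketch of the parameterized form, with component types
chosen only as an example:

from pycassa.types import CompositeType, UTF8Type, LongType

SYSTEM_MANAGER.create_column_family(
    'Narrative', 'Twitter_search_test',
    comparator_type=CompositeType(UTF8Type(), LongType()),
    default_validation_class='UTF8Type',
    key_validation_class='UTF8Type')

Dropping the comparator and switching embedly_data to BytesType, as in the later
"This works" message, avoids the error for the same reason: no unparameterized
CompositeType is left in the definition.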


Re: How to create counter column family via Pycassa?

2013-12-11 Thread Kumar Ranjan
validators = {
    'approved':       'BooleanType',
    'text':           'UTF8Type',
    'favorite_count': 'IntegerType',
    'retweet_count':  'IntegerType',
    'expanded_url':   'UTF8Type',
    'tuid':           'LongType',
    'screen_name':    'UTF8Type',
    'profile_image':  'UTF8Type',
    'embedly_data':   'CompositeType',
    'created_at':     'UTF8Type',
}

SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test',
comparator_type='CompositeType', default_validation_class='UTF8Type',
key_validation_class='UTF8Type', column_validation_classes=validators)


throws:

InvalidRequestException: InvalidRequestException(why='Invalid definition
for comparator org.apache.cassandra.db.marshal.CompositeType.

 Can you please explain why?


On Wed, Dec 11, 2013 at 12:08 PM, Tyler Hobbs ty...@datastax.com wrote:

 What options are available depends on what version of Cassandra you're
 using.

 You can specify the row key type with 'key_validation_class'.

 For column types, use 'column_validation_classes', which is a dict mapping
 column names to types.  For example:

 sys.create_column_family('mykeyspace', 'users',
 column_validation_classes={'username': UTF8Type, 'age': IntegerType})


 On Wed, Dec 11, 2013 at 10:32 AM, Kumar Ranjan winnerd...@gmail.com wrote:

 What are the all possible values for cf_kwargs ??

 SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test',
 comparator_type=UTF8Type,  )

  - Here I want to specify, Column data types and row key type. How
 can I do that ?


 On Thu, Aug 15, 2013 at 12:30 PM, Tyler Hobbs ty...@datastax.com wrote:

 The column_validation_classes arg is just for defining individual column
 types.  Glad you got it figured out, though.


 On Thu, Aug 15, 2013 at 11:23 AM, Pinak Pani 
 nishant.has.a.quest...@gmail.com wrote:

 Thanks for the quick reply. Apparently, I was trying to get this working:

 cf_kwargs = {'default_validation_class':COUNTER_COLUMN_TYPE}
 sys.create_column_family('my_ks', 'vote_count',
 column_validation_classes=cf_kwargs)  #1

 But this works:

 sys.create_column_family('my_ks', 'vote_count', **cf_kwargs)  #2

 I thought #1 should work.



 On Thu, Aug 15, 2013 at 9:15 PM, Tyler Hobbs ty...@datastax.com wrote:

 The only thing that makes a CF a counter CF is that the default
 validation class is CounterColumnType, which you can set through
 SystemManager.create_column_family().


 On Thu, Aug 15, 2013 at 10:38 AM, Pinak Pani 
 nishant.has.a.quest...@gmail.com wrote:

 I do not find a way to create a counter column family in Pycassa.
 This[1] does not help.

 Appreciate if someone can help me.

 Thanks

  1.
 http://pycassa.github.io/pycassa/api/pycassa/system_manager.html#pycassa.system_manager.SystemManager.create_column_family




 --
 Tyler Hobbs
 DataStax http://datastax.com/





 --
 Tyler Hobbs
 DataStax http://datastax.com/





 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: How to create counter column family via Pycassa?

2013-12-11 Thread Kumar Ranjan
This works when I remove the comparator_type:

validators = {
    'tid':            'IntegerType',
    'approved':       'BooleanType',
    'text':           'UTF8Type',
    'favorite_count': 'IntegerType',
    'retweet_count':  'IntegerType',
    'expanded_url':   'UTF8Type',
    'tuid':           'LongType',
    'screen_name':    'UTF8Type',
    'profile_image':  'UTF8Type',
    'embedly_data':   'BytesType',
    'created_at':     'UTF8Type',
}


SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search',
default_validation_class='UTF8Type', key_validation_class='UTF8Type',
column_validation_classes=validators)




On Wed, Dec 11, 2013 at 12:23 PM, Kumar Ranjan winnerd...@gmail.com wrote:

 I am using ccm cassandra version

 1.2.11


 On Wed, Dec 11, 2013 at 12:19 PM, Kumar Ranjan winnerd...@gmail.com wrote:

 validators = {

 'approved':  'BooleanType',

 'text':  'UTF8Type',

 'favorite_count':'IntegerType',

 'retweet_count': 'IntegerType',

 'expanded_url':  'UTF8Type',

 'tuid':  'LongType',

 'screen_name':   'UTF8Type',

 'profile_image': 'UTF8Type',

 'embedly_data':  'CompositeType',

 'created_at':'UTF8Type',

 }

 SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test',
 comparator_type='CompositeType', default_validation_class='UTF8Type',
 key_validation_class='UTF8Type', column_validation_classes=validators)


 throws:

 InvalidRequestException: InvalidRequestException(why='Invalid
 definition for comparator org.apache.cassandra.db.marshal.CompositeType.

  Can you please explain why?


 On Wed, Dec 11, 2013 at 12:08 PM, Tyler Hobbs ty...@datastax.com wrote:

 What options are available depends on what version of Cassandra you're
 using.

 You can specify the row key type with 'key_validation_class'.

 For column types, use 'column_validation_classes', which is a dict
 mapping column names to types.  For example:

 sys.create_column_family('mykeyspace', 'users',
 column_validation_classes={'username': UTF8Type, 'age': IntegerType})


 On Wed, Dec 11, 2013 at 10:32 AM, Kumar Ranjan winnerd...@gmail.com wrote:

 What are the all possible values for cf_kwargs ??

 SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test',
 comparator_type=UTF8Type,  )

  - Here I want to specify, Column data types and row key type. How
 can I do that ?


 On Thu, Aug 15, 2013 at 12:30 PM, Tyler Hobbs ty...@datastax.com wrote:

 The column_validation_classes arg is just for defining individual
 column types.  Glad you got it figured out, though.


 On Thu, Aug 15, 2013 at 11:23 AM, Pinak Pani 
 nishant.has.a.quest...@gmail.com wrote:

 Thanks for the quick reply. Apparently, I was trying to get this working:

 cf_kwargs = {'default_validation_class':COUNTER_COLUMN_TYPE}
 sys.create_column_family('my_ks', 'vote_count',
 column_validation_classes=cf_kwargs)  #1

 But this works:

 sys.create_column_family('my_ks', 'vote_count', **cf_kwargs)  #2

 I thought #1 should work.



 On Thu, Aug 15, 2013 at 9:15 PM, Tyler Hobbs ty...@datastax.com wrote:

 The only thing that makes a CF a counter CF is that the default
 validation class is CounterColumnType, which you can set through
 SystemManager.create_column_family().


 On Thu, Aug 15, 2013 at 10:38 AM, Pinak Pani 
 nishant.has.a.quest...@gmail.com wrote:

 I do not find a way to create a counter column family in Pycassa.
 This[1] does not help.

 Appreciate if someone can help me.

 Thanks

  1.
 http://pycassa.github.io/pycassa/api/pycassa/system_manager.html#pycassa.system_manager.SystemManager.create_column_family




 --
 Tyler Hobbs
 DataStax http://datastax.com/





 --
 Tyler Hobbs
 DataStax http://datastax.com/





 --
 Tyler Hobbs
 DataStax http://datastax.com/






Re: Reply: How to configure linux service for Cassandra?

2013-12-02 Thread Kumar Ranjan
Hey Folks,

I have been using ccm for some time and it's a pretty awesome tool for testing
out admin stuff. Now I really want to test data modeling by accessing a
ccm-managed Cassandra cluster with the Thrift-based pycassaShell client from
remote hosts (not locally). My setup is like this:

Lets say, private IP of this machine is: 10.11.12.13 (Just an example)

loLink encap:Local Loopback

  inet addr:127.0.0.1  Mask:255.0.0.0

  inet6 addr: ::1/128 Scope:Host

  UP LOOPBACK RUNNING  MTU:16436  Metric:1

  RX packets:67392708 errors:0 dropped:0 overruns:0 frame:0

  TX packets:67392708 errors:0 dropped:0 overruns:0 carrier:0

  collisions:0 txqueuelen:0

  RX bytes:7398829042 (6.8 GiB)  TX bytes:7398829042 (6.8 GiB)


lo:1  Link encap:Local Loopback

  inet addr:127.0.0.2  Mask:255.255.255.0

  UP LOOPBACK RUNNING  MTU:16436  Metric:1


lo:2  Link encap:Local Loopback

  inet addr:127.0.0.3  Mask:255.0.0.0

  UP LOOPBACK RUNNING  MTU:16436  Metric:1


lo:3  Link encap:Local Loopback

  inet addr:127.0.0.4  Mask:255.0.0.0

  UP LOOPBACK RUNNING  MTU:16436  Metric:1


and 127.0.0.1 (node1), 127.0.0.2 (node2), 127.0.0.3 (node3), 127.0.0.4
(node4)


$ ccm status

node1: UP

node3: UP

node2: UP

node4: UP


How do I connect to any of the instances from non-local hosts? When I do:

pycassaShell --host 10.11.12.13 --port 9160

it throws an exception:

thrift.transport.TTransport.TTransportException: Could not connect to
10.11.12.13:9160

Is there a way to make it work?



On Tue, Nov 12, 2013 at 4:19 AM, Boole.Z.Guo (mis.cnsh04.Newegg) 41442 
boole.z@newegg.com wrote:

  Thanks very much. I will try.



  The goal of ccm and ccmlib is to make it easy to create, manage and destroy a
  small cluster on a local box. It is meant for testing of a Cassandra cluster.

 Best Regards,

 Boole Guo

 Software Engineer, NESC-SH.MIS

 +86-021-51530666*41442

 Floor 19, KaiKai Plaza, 888, Wanhangdu Rd, Shanghai (200042)



 From: Christopher Wirt [mailto:chris.w...@struq.com]
 Sent: 12 November 2013 16:53
 To: user@cassandra.apache.org
 Subject: RE: How to configure linux service for Cassandra?



 Starting multiple Cassandra nodes on the same machine involves setting
 loop back aliases and some configuration fiddling.



 Lucky for you Sylvain Lebresne made this handy tool in python which does
 the job for you.

 https://github.com/pcmanus/ccm



 to run as a service you need a script like this
 http://www.bajb.net/2012/01/cassandra-service-script/

 I haven’t tried this, I just run Cassandra in the foreground of a screen
 session.





 From: Boole.Z.Guo (mis.cnsh04.Newegg) 41442 [mailto:boole.z@newegg.com]
 Sent: 12 November 2013 05:17
 To: user@cassandra.apache.org
 Subject: How to configure linux service for Cassandra?



 How to configure linux service for Cassandra or start multiple Cassandra
 nodes from a single node?



 Thanks very much!



 Best Regards,

 *Boole Guo*



Re: Choosing python client lib for Cassandra

2013-11-26 Thread Kumar Ranjan
Michael - thanks. Have you tried batching and thread pooling in python-driver?
For now, I would avoid the cqlengine object mapper, just because of my deadlines.

—
Sent from Mailbox for iPhone

On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael michael.la...@nytimes.com
wrote:

 We use the python-driver and have contributed some to its development.
 I have been careful to not push too fast on features until we need them.
 For example, we have just started using prepared statements - working well
 BTW.
 Next we will employ futures and start to exploit the async nature of the new
 interface to C*.
 We are very familiar with libev in both C and python, and are happy to dig
 into the code to add features and fix bugs as needed, so the rewards of
 bypassing the old and focusing on the new seem worth the risks to us.
 ml
 On Tue, Nov 26, 2013 at 1:16 PM, Jonathan Haddad j...@jonhaddad.com wrote:
 So, for cqlengine (https://github.com/cqlengine/cqlengine), we're
 currently using the thrift api to execute CQL until the native driver is
 out of beta.  I'm a little biased in recommending it, since I'm one of the
 primary authors.  If you've got cqlengine specific questions, head to the
 mailing list: https://groups.google.com/forum/#!forum/cqlengine-users

 If you want to roll your own solution, it might make sense to take an
 approach like we did and throw a layer on top of thrift so you don't have
 to do a massive rewrite of your entire app once you want to go native.

 Jon


 On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan winnerd...@gmail.com wrote:

 I have worked with Pycassa before and wrote a wrapper to use batch
 mutation  connection pooling etc. But
 http://wiki.apache.org/cassandra/ClientOptions recommends now to use CQL
 3 based api because Thrift based api (Pycassa) will be supported for
 backward compatibility only. Apache site recommends to use Python api
 written by DataStax which is still in Beta (As per their documentation).
 See warnings from their python-driver/README.rst file

 *Warning*

 This driver is currently under heavy development, so the API and layout
 of packages,modules, classes, and functions are subject to change. There
 may also be serious bugs, so usage in a production environment is *not* 
 recommended
 at this time.

 DataStax site http://www.datastax.com/download/clientdrivers recommends
 using DB-API 2.0 plus legacy api's. Is there more? Has any one compared
 between CQL 3 based apis? Which stands out on top? Answers based on facts
 will help the community so please refrain from opinions.

 Please help ??




 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade


Re: Choosing python client lib for Cassandra

2013-11-26 Thread Kumar Ranjan
Jon - Thanks. As I understand it, cqlengine is an object mapper and must be using
CQL prepared statements. What are you wrapping it with, as an alternative to
python-driver?

—
Sent from Mailbox for iPhone

On Tue, Nov 26, 2013 at 1:19 PM, Jonathan Haddad j...@jonhaddad.com
wrote:

 So, for cqlengine (https://github.com/cqlengine/cqlengine), we're currently
 using the thrift api to execute CQL until the native driver is out of beta.
  I'm a little biased in recommending it, since I'm one of the primary
 authors.  If you've got cqlengine specific questions, head to the mailing
 list: https://groups.google.com/forum/#!forum/cqlengine-users
 If you want to roll your own solution, it might make sense to take an
 approach like we did and throw a layer on top of thrift so you don't have
 to do a massive rewrite of your entire app once you want to go native.
 Jon
 On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan winnerd...@gmail.com wrote:
 I have worked with Pycassa before and wrote a wrapper to use batch
 mutation  connection pooling etc. But
 http://wiki.apache.org/cassandra/ClientOptions recommends now to use CQL
 3 based api because Thrift based api (Pycassa) will be supported for
 backward compatibility only. Apache site recommends to use Python api
 written by DataStax which is still in Beta (As per their documentation).
 See warnings from their python-driver/README.rst file

 *Warning*

 This driver is currently under heavy development, so the API and layout of
 packages,modules, classes, and functions are subject to change. There may
 also be serious bugs, so usage in a production environment is *not* 
 recommended
 at this time.

 DataStax site http://www.datastax.com/download/clientdrivers recommends
 using DB-API 2.0 plus legacy api's. Is there more? Has any one compared
 between CQL 3 based apis? Which stands out on top? Answers based on facts
 will help the community so please refrain from opinions.

 Please help ??

 -- 
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Kumar Ranjan
Hi Jon - you are right. It's just that I know other ORMs like Python's SQLAlchemy
or Perl's DBIx by heart, so I can write CQL faster than I can pick up cqlengine. I
will give python-driver a shot based on Michael's recommendation.

—
Sent from Mailbox for iPhone

On Tue, Nov 26, 2013 at 2:21 PM, Jonathan Haddad j...@jonhaddad.com
wrote:

 We're currently using the cql package, which is really a wrapper around
 thrift.
 To your concern about deadlines, I'm not sure how writing raw CQL is going
 to be any faster than using a mapper library for anything other than the
 most trivial of project.
 On Tue, Nov 26, 2013 at 11:09 AM, Kumar Ranjan winnerd...@gmail.com wrote:
 Jon - Thanks. As I understand, cqlengine is an object mapper and must be
 using for cql prepare statements. What are you wrapping it with, in
 alternative to python-driver?
 —
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


 On Tue, Nov 26, 2013 at 1:19 PM, Jonathan Haddad j...@jonhaddad.com wrote:

  So, for cqlengine (https://github.com/cqlengine/cqlengine), we're
 currently using the thrift api to execute CQL until the native driver is
 out of beta.  I'm a little biased in recommending it, since I'm one of the
 primary authors.  If you've got cqlengine specific questions, head to the
 mailing list: https://groups.google.com/forum/#!forum/cqlengine-users

 If you want to roll your own solution, it might make sense to take an
 approach like we did and throw a layer on top of thrift so you don't have
 to do a massive rewrite of your entire app once you want to go native.

 Jon


 On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan winnerd...@gmail.com wrote:

  I have worked with Pycassa before and wrote a wrapper to use batch
 mutation  connection pooling etc. But
 http://wiki.apache.org/cassandra/ClientOptions recommends now to use
 CQL 3 based api because Thrift based api (Pycassa) will be supported for
 backward compatibility only. Apache site recommends to use Python api
 written by DataStax which is still in Beta (As per their documentation).
 See warnings from their python-driver/README.rst file

 *Warning*

 This driver is currently under heavy development, so the API and layout
 of packages,modules, classes, and functions are subject to change. There
 may also be serious bugs, so usage in a production environment is *not* 
 recommended
 at this time.

 DataStax site http://www.datastax.com/download/clientdrivers recommends
 using DB-API 2.0 plus legacy api's. Is there more? Has any one compared
 between CQL 3 based apis? Which stands out on top? Answers based on facts
 will help the community so please refrain from opinions.

 Please help ??




 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade



 -- 
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Kumar Ranjan
Jon - Any comment on batching?

—
Sent from Mailbox for iPhone

On Tue, Nov 26, 2013 at 2:52 PM, Laing, Michael michael.la...@nytimes.com
wrote:

 That's not a problem we have faced yet.
 On Tue, Nov 26, 2013 at 2:46 PM, Kumar Ranjan winnerd...@gmail.com wrote:
 How do you insert huge amount of data?
 —
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


 On Tue, Nov 26, 2013 at 2:31 PM, Laing, Michael michael.la...@nytimes.com
  wrote:

 I think thread pooling is always in operation - and we haven't seen any
 problems in that regard going to the 6 local nodes each client connects to.
 We haven't tried batching yet.


 On Tue, Nov 26, 2013 at 2:05 PM, Kumar Ranjan winnerd...@gmail.com wrote:

 Michael - thanks. Have you tried batching and thread pooling in
 python-driver? For now, i would avoid object mapper cqlengine, just because
 of my deadlines.
 —
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


 On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 We use the python-driver and have contributed some to its development.

 I have been careful to not push too fast on features until we need
 them. For example, we have just started using prepared statements - 
 working
 well BTW.

 Next we will employ futures and start to exploit the async nature of
 new interface to C*.

 We are very familiar with libev in both C and python, and are happy to
 dig into the code to add features and fix bugs as needed, so the rewards 
 of
 bypassing the old and focusing on the new seem worth the risks to us.

 ml


 On Tue, Nov 26, 2013 at 1:16 PM, Jonathan Haddad 
 j...@jonhaddad.comwrote:

  So, for cqlengine (https://github.com/cqlengine/cqlengine), we're
 currently using the thrift api to execute CQL until the native driver is
 out of beta.  I'm a little biased in recommending it, since I'm one of 
 the
 primary authors.  If you've got cqlengine specific questions, head to the
 mailing list: https://groups.google.com/forum/#!forum/cqlengine-users

 If you want to roll your own solution, it might make sense to take an
 approach like we did and throw a layer on top of thrift so you don't have
 to do a massive rewrite of your entire app once you want to go native.

 Jon


 On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan 
 winnerd...@gmail.com wrote:

  I have worked with Pycassa before and wrote a wrapper to use batch
 mutation  connection pooling etc. But
 http://wiki.apache.org/cassandra/ClientOptions recommends now to use
 CQL 3 based api because Thrift based api (Pycassa) will be supported for
 backward compatibility only. Apache site recommends to use Python api
 written by DataStax which is still in Beta (As per their documentation).
 See warnings from their python-driver/README.rst file

 *Warning*

 This driver is currently under heavy development, so the API and
 layout of packages,modules, classes, and functions are subject to 
 change.
 There may also be serious bugs, so usage in a production environment is
 *not* recommended at this time.

 DataStax site http://www.datastax.com/download/clientdrivers recommends
 using DB-API 2.0 plus legacy api's. Is there more? Has any one compared
 between CQL 3 based apis? Which stands out on top? Answers based on 
 facts
 will help the community so please refrain from opinions.

 Please help ??




  --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade







Re: Choosing python client lib for Cassandra

2013-11-26 Thread Kumar Ranjan
Hi Jonathan - Does cqlengine have support for python 2.6 ?


On Tue, Nov 26, 2013 at 4:17 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 cqlengine supports batch queries, see the docs here:
 http://cqlengine.readthedocs.org/en/latest/topics/queryset.html#batch-queries
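
 A rough sketch of what those docs describe (untested here; the model is made
 up, connection setup and sync_table are assumed to have run already, and the
 exact imports vary a bit between cqlengine versions):

 from cqlengine import columns, BatchQuery
 from cqlengine.models import Model

 class ExampleModel(Model):
     row_key = columns.Text(primary_key=True)
     score = columns.Integer()

 with BatchQuery() as b:
     ExampleModel.batch(b).create(row_key='abc', score=1)
     ExampleModel.batch(b).create(row_key='def', score=2)
 # all statements go out as a single batch when the context manager exits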


 On Tue, Nov 26, 2013 at 11:53 AM, Kumar Ranjan winnerd...@gmail.com wrote:

 Jon - Any comment on batching?
 —
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


 On Tue, Nov 26, 2013 at 2:52 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 That's not a problem we have faced yet.


 On Tue, Nov 26, 2013 at 2:46 PM, Kumar Ranjan winnerd...@gmail.com wrote:

 How do you insert huge amount of data?
  —
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


  On Tue, Nov 26, 2013 at 2:31 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 I think thread pooling is always in operation - and we haven't seen
 any problems in that regard going to the 6 local nodes each client 
 connects
 to. We haven't tried batching yet.


 On Tue, Nov 26, 2013 at 2:05 PM, Kumar Ranjan winnerd...@gmail.com wrote:

 Michael - thanks. Have you tried batching and thread pooling in
 python-driver? For now, i would avoid object mapper cqlengine, just 
 because
 of my deadlines.
 —
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


 On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 We use the python-driver and have contributed some to its
 development.

 I have been careful to not push too fast on features until we need
 them. For example, we have just started using prepared statements - 
 working
 well BTW.

 Next we will employ futures and start to exploit the async nature of
 new interface to C*.

 We are very familiar with libev in both C and python, and are happy
 to dig into the code to add features and fix bugs as needed, so the 
 rewards
 of bypassing the old and focusing on the new seem worth the risks to us.

 ml


 On Tue, Nov 26, 2013 at 1:16 PM, Jonathan Haddad 
 j...@jonhaddad.com wrote:

  So, for cqlengine (https://github.com/cqlengine/cqlengine), we're
 currently using the thrift api to execute CQL until the native driver 
 is
 out of beta.  I'm a little biased in recommending it, since I'm one of 
 the
 primary authors.  If you've got cqlengine specific questions, head to 
 the
 mailing list:
 https://groups.google.com/forum/#!forum/cqlengine-users

 If you want to roll your own solution, it might make sense to take
 an approach like we did and throw a layer on top of thrift so you don't
 have to do a massive rewrite of your entire app once you want to go 
 native.

 Jon


 On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan winnerd...@gmail.com
  wrote:

  I have worked with Pycassa before and wrote a wrapper to use
 batch mutation  connection pooling etc. But
 http://wiki.apache.org/cassandra/ClientOptions recommends now to
 use CQL 3 based api because Thrift based api (Pycassa) will be 
 supported
 for backward compatibility only. Apache site recommends to use Python 
 api
 written by DataStax which is still in Beta (As per their 
 documentation).
 See warnings from their python-driver/README.rst file

 *Warning*

 This driver is currently under heavy development, so the API and
 layout of packages,modules, classes, and functions are subject to 
 change.
 There may also be serious bugs, so usage in a production environment 
 is
 *not* recommended at this time.

 DataStax site http://www.datastax.com/download/clientdrivers 
 recommends
 using DB-API 2.0 plus legacy api's. Is there more? Has any one 
 compared
 between CQL 3 based apis? Which stands out on top? Answers based on 
 facts
 will help the community so please refrain from opinions.

 Please help ??




  --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade










 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade



Re: Choosing python client lib for Cassandra

2013-11-26 Thread Kumar Ranjan
Thanks Jonathan for the help.


On Tue, Nov 26, 2013 at 6:14 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 No, 2.7 only.


 On Tue, Nov 26, 2013 at 3:04 PM, Kumar Ranjan winnerd...@gmail.com wrote:

 Hi Jonathan - Does cqlengine have support for python 2.6 ?


 On Tue, Nov 26, 2013 at 4:17 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 cqlengine supports batch queries, see the docs here:
 http://cqlengine.readthedocs.org/en/latest/topics/queryset.html#batch-queries


 On Tue, Nov 26, 2013 at 11:53 AM, Kumar Ranjan winnerd...@gmail.com wrote:

 Jon - Any comment on batching?
 —
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


 On Tue, Nov 26, 2013 at 2:52 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 That's not a problem we have faced yet.


 On Tue, Nov 26, 2013 at 2:46 PM, Kumar Ranjan winnerd...@gmail.com wrote:

 How do you insert huge amount of data?
  —
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


  On Tue, Nov 26, 2013 at 2:31 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 I think thread pooling is always in operation - and we haven't seen
 any problems in that regard going to the 6 local nodes each client 
 connects
 to. We haven't tried batching yet.


 On Tue, Nov 26, 2013 at 2:05 PM, Kumar Ranjan 
 winnerd...@gmail.com wrote:

 Michael - thanks. Have you tried batching and thread pooling in
 python-driver? For now, i would avoid object mapper cqlengine, just 
 because
 of my deadlines.
 —
 Sent from Mailbox https://www.dropbox.com/mailbox for iPhone


 On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael 
 michael.la...@nytimes.com wrote:

 We use the python-driver and have contributed some to its
 development.

 I have been careful to not push too fast on features until we need
 them. For example, we have just started using prepared statements - 
 working
 well BTW.

 Next we will employ futures and start to exploit the async nature
 of new interface to C*.

 We are very familiar with libev in both C and python, and are
 happy to dig into the code to add features and fix bugs as needed, so 
 the
 rewards of bypassing the old and focusing on the new seem worth the 
 risks
 to us.

 ml


 On Tue, Nov 26, 2013 at 1:16 PM, Jonathan Haddad 
 j...@jonhaddad.com wrote:

  So, for cqlengine (https://github.com/cqlengine/cqlengine),
 we're currently using the thrift api to execute CQL until the native 
 driver
 is out of beta.  I'm a little biased in recommending it, since I'm 
 one of
 the primary authors.  If you've got cqlengine specific questions, 
 head to
 the mailing list:
 https://groups.google.com/forum/#!forum/cqlengine-users

 If you want to roll your own solution, it might make sense to
 take an approach like we did and throw a layer on top of thrift so 
 you
 don't have to do a massive rewrite of your entire app once you want 
 to go
 native.

 Jon


 On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan 
 winnerd...@gmail.com wrote:

  I have worked with Pycassa before and wrote a wrapper to use
 batch mutation  connection pooling etc. But
 http://wiki.apache.org/cassandra/ClientOptions recommends now
 to use CQL 3 based api because Thrift based api (Pycassa) will be 
 supported
 for backward compatibility only. Apache site recommends to use 
 Python api
 written by DataStax which is still in Beta (As per their 
 documentation).
 See warnings from their python-driver/README.rst file

 *Warning*

 This driver is currently under heavy development, so the API and
 layout of packages,modules, classes, and functions are subject to 
 change.
 There may also be serious bugs, so usage in a production 
 environment is
 *not* recommended at this time.

 DataStax site http://www.datastax.com/download/clientdrivers 
 recommends
 using DB-API 2.0 plus legacy api's. Is there more? Has any one 
 compared
 between CQL 3 based apis? Which stands out on top? Answers based on 
 facts
 will help the community so please refrain from opinions.

 Please help ??




  --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade










 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade





 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade



Re: Exporting all data within a keyspace

2013-04-30 Thread Kumar Ranjan
Try sstable2json and json2sstable. They work per column family, so you can fetch
the list of column families, iterate over it, and use the sstable2json tool to
extract the data for each one. Remember this only reads on-disk data, so anything
still sitting in a memtable that has not been flushed will be missed; run a flush
(nodetool flush) first and then run the script.
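
A rough sketch of that loop (untested; the data directory path and layout are
assumptions for a typical package install, adjust to your version and setup):

import glob
import subprocess

keyspace_dir = '/var/lib/cassandra/data/MyKeyspace'   # hypothetical keyspace

# One *-Data.db per SSTable, grouped under per-CF directories in recent versions.
for data_file in glob.glob(keyspace_dir + '/*/*-Data.db'):
    with open(data_file + '.json', 'w') as out:
        subprocess.check_call(['sstable2json', data_file], stdout=out)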

On Tuesday, April 30, 2013, Chidambaran Subramanian wrote:

 Is there any easy way of exporting all the data for a keyspace (and,
 conversely, importing it)?

 Regards
 Chiddu