Re: Thrift vs CQL3 performance
Stick with CQL3 going forward. Last I checked, there is no further dev on Thrift. I worked with the Thrift-based C* API for 2 years via pycassa in Python and the experience was not very satisfactory. I have not done comparisons between pycassa and CQL so I can't say. CQL is very simple anyway. — Sent from Mailbox On Mon, Jul 28, 2014 at 11:51 AM, bi kro hlqvu...@gmail.com wrote: Hi everyone, I'm a newcomer to Cassandra, so I would like to know about the performance of Thrift (Hector) vs CQL3, especially the speed (Thrift is based on RPC, CQL3 on the binary protocol). Currently I'm using Cassandra 1.2; which version of the DataStax Java driver's CQL3 support is stable for it? Thanks very much
Re: 750Gb compaction task
— Sent from Mailbox for iPhone On Thu, Mar 13, 2014 at 1:28 AM, Plotnik, Alexey aplot...@rhonda.ru wrote: After rebalance and cleanup I have a leveled CF (SSTable size = 100MB) and a compaction task that is going to process ~750GB: root@da1-node1:~# nodetool compactionstats pending tasks: 10556 compaction type: Compaction, keyspace: cafs_chunks, column family: chunks, completed: 41015024065, total: 808740269082 bytes, progress: 5.07% I have no space for this operation, I have 300 GB only. Is it possible to resolve this situation?
Re:
Yes, filter out based on time range. Currently I do this in Python. Just curious to see if this can be done using pycassa somehow? — Sent from Mailbox On Fri, Feb 28, 2014 at 2:13 PM, Tyler Hobbs ty...@datastax.com wrote: Can you clarify exactly what you need help with? It seems like you already know how to fetch the timestamps. Are you just looking for Python code to filter data that's not in a time range? By the way, there's a pycassa-specific mailing list here: https://groups.google.com/forum/#!forum/pycassa-discuss On Thu, Feb 27, 2014 at 2:48 PM, Kumar Ranjan winnerd...@gmail.com wrote: Hey folks, I am dealing with a legacy CF where super columns are used and the Python client pycassa is in use. An example is given below. My question is: can I make use of include_timestamp to select data between two returned timestamps, e.g. between 1393516744591751 and 1393516772131811? This is not exactly time series, just a selection between two timestamps. Please help with this. Data is inserted like this: TEST_CF.insert('test_r_key', {'1234': {'key_name_1': 'taf_test_1'}}) Data fetch: TEST_CF.get('test_r_key', include_timestamp=True) OrderedDict([('1234', OrderedDict([('key_name_1', (u'taf_test_1', 1393451990902345))])), ('1235', OrderedDict([('key_name_2', (u'taf_test_2', 1393516744591751))])), ('1236', OrderedDict([('key_name_3', (u'taf_test_3', 1393516772131782))])), ('1237', OrderedDict([('key_name_4', (u'taf_test_4', 1393516772131799))])), ('1238', OrderedDict([('key_name_5', (u'taf_test_5', 1393516772131811))])), ('1239', OrderedDict([('key_name_6', (u'taf_test_6', 1393516772131854))])), ('1240', OrderedDict([('key_name_7', (u'taf_test_7', 1393516772131899))]))]) -- Tyler Hobbs DataStax http://datastax.com/
Re:
Thanks Tyler. Yes, I scanned through the pycassaShell code a couple of times but did not find anything like that. On Fri, Feb 28, 2014 at 3:24 PM, Tyler Hobbs ty...@datastax.com wrote: No, pycassa won't do anything fancy with timestamps automatically; you'll have to keep doing it yourself. On Fri, Feb 28, 2014 at 1:28 PM, Kumar Ranjan winnerd...@gmail.com wrote: Yes, filter out based on time range. Currently I do this in Python. Just curious to see if this can be done using pycassa somehow? -- Sent from Mailbox https://www.dropbox.com/mailbox for iPhone -- Tyler Hobbs DataStax http://datastax.com/
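Since pycassa won't filter on timestamps for you, the client-side filter Tyler describes can be sketched in plain Python (no pycassa needed; the sample rows mirror the data in the question, and `filter_by_timestamp` is a hypothetical helper name, not part of pycassa):

```python
from collections import OrderedDict

def filter_by_timestamp(row, start_ts, end_ts):
    """Keep only super columns whose sub-column timestamps fall in [start_ts, end_ts]."""
    result = OrderedDict()
    for sc_name, subcols in row.items():
        kept = OrderedDict(
            (name, (value, ts))
            for name, (value, ts) in subcols.items()
            if start_ts <= ts <= end_ts
        )
        if kept:
            result[sc_name] = kept
    return result

# Shaped like what TEST_CF.get('test_r_key', include_timestamp=True) returns
row = OrderedDict([
    ('1234', OrderedDict([('key_name_1', (u'taf_test_1', 1393451990902345))])),
    ('1235', OrderedDict([('key_name_2', (u'taf_test_2', 1393516744591751))])),
    ('1238', OrderedDict([('key_name_5', (u'taf_test_5', 1393516772131811))])),
])
subset = filter_by_timestamp(row, 1393516744591751, 1393516772131811)
# '1234' is dropped (its timestamp is before the start of the window)
```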
[no subject]
Hey folks, I am dealing with a legacy CF where super columns are used and the Python client pycassa is in use. An example is given below. My question is: can I make use of include_timestamp to select data between two returned timestamps, e.g. between 1393516744591751 and 1393516772131811? This is not exactly time series, just a selection between two timestamps. Please help with this. Data is inserted like this: TEST_CF.insert('test_r_key', {'1234': {'key_name_1': 'taf_test_1'}}) Data fetch: TEST_CF.get('test_r_key', include_timestamp=True) OrderedDict([('1234', OrderedDict([('key_name_1', (u'taf_test_1', 1393451990902345))])), ('1235', OrderedDict([('key_name_2', (u'taf_test_2', 1393516744591751))])), ('1236', OrderedDict([('key_name_3', (u'taf_test_3', 1393516772131782))])), ('1237', OrderedDict([('key_name_4', (u'taf_test_4', 1393516772131799))])), ('1238', OrderedDict([('key_name_5', (u'taf_test_5', 1393516772131811))])), ('1239', OrderedDict([('key_name_6', (u'taf_test_6', 1393516772131854))])), ('1240', OrderedDict([('key_name_7', (u'taf_test_7', 1393516772131899))]))])
pycassa get column_start and column_finish with less than or greater than
Hey Folks, does pycassa's get column_start take a greater-than-or-equal option? What I know so far is that you have to pass the exact column or super column value for column_start and column_finish to work. In my case, the column name is an epoch time value.
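If the column names are epoch times, Thrift slice bounds should already behave like >=/<=: to my understanding, column_start and column_finish are inclusive bounds on the sorted column names, not exact matches. A plain-Python sketch of that inclusive-slice semantics (`slice_columns` is a hypothetical stand-in, not a pycassa function):

```python
def slice_columns(column_names, column_start=None, column_finish=None):
    """Mimic Thrift slice semantics: both bounds are inclusive,
    and an omitted bound means 'unbounded' on that side."""
    out = []
    for name in sorted(column_names):
        if column_start is not None and name < column_start:
            continue
        if column_finish is not None and name > column_finish:
            continue
        out.append(name)
    return out

cols = [1393516744591751, 1393516772131782, 1393516772131811, 1393516772131854]
window = slice_columns(cols,
                       column_start=1393516772131782,
                       column_finish=1393516772131811)
# both endpoint columns are included in the slice
```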
OpenJDK is not recommended? Why
I am in the process of setting up a 2-node cluster with C* version 2.0.4. When I started each node, they failed to communicate; thus each is running separately and not in the same ring. So I started looking at the log files and saw the message below: WARN [main] 2014-01-28 06:02:17,861 CassandraDaemon.java (line 155) OpenJDK is not recommended. Please upgrade to the newest Oracle Java release Is this message informational only, or can it be a real issue? Is this why the two nodes are not in a ring? -- Kumar
Issues with seeding on EC2 for C* 2.0.4 - help needed
Hey Folks - I am burning the midnight oil fast but can't figure out what I am doing wrong. The log file has this; I have also listed partial configurations for both the seed node and node 2.
INFO [main] 2014-01-29 05:15:11,515 CommitLog.java (line 127) Log replay complete, 46 replayed mutations
INFO [main] 2014-01-29 05:15:12,734 StorageService.java (line 490) Cassandra version: 2.0.4
INFO [main] 2014-01-29 05:15:12,743 StorageService.java (line 491) Thrift API version: 19.39.0
INFO [main] 2014-01-29 05:15:12,755 StorageService.java (line 492) CQL supported versions: 2.0.0,3.1.3 (default: 3.1.3)
INFO [main] 2014-01-29 05:15:12,821 StorageService.java (line 515) Loading persisted ring state
INFO [main] 2014-01-29 05:15:12,864 MessagingService.java (line 458) Starting Messaging Service on port 7000
ERROR [main] 2014-01-29 05:15:43,890 CassandraDaemon.java (line 478) Exception encountered during startup java.lang.RuntimeException: Unable to gossip with any seeds
Seed node 1 (cassandra.yaml; I just have a 2-node cluster and this is the seed node):
seed_provider:
  # Addresses of hosts that are deemed contact points. Cassandra nodes use this
  # list of hosts to find each other and learn the topology of the ring.
  # You must change this if you are running multiple nodes!
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # seeds is actually a comma-delimited list of addresses. Ex: ip1,ip2,ip3
      - seeds: 127.0.0.1
storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.xxx.xxx.xxx (private IP of this node)
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 0.0.0.0
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
Node 2:
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: 10.xxx.xxx.xxx (private IP of the seed node listed above)
storage_port: 7000
ssl_storage_port: 7001
listen_address: 10.xxx.xxx.xxx (private IP of this node)
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: 0.0.0.0
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
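For reference, the mismatch that bites here (per the resolution later in the thread) is the seed list naming 127.0.0.1 while the node listens on its private IP, so gossip never finds the seed. A minimal sketch of a consistent pair of cassandra.yaml fragments, with placeholder addresses (10.0.0.1/10.0.0.2 are illustrative, not from the thread):

```yaml
# Seed node: the seed entry must be an address this node actually listens on
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1"   # this node's own private IP, not 127.0.0.1
listen_address: 10.0.0.1

# Second node: points at the seed node's private IP
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1"   # private IP of the seed node
listen_address: 10.0.0.2
```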
Re: OpenJDK is not recommended? Why
Yes, I got rid of OpenJDK and installed the Oracle version and the warning went away. Happy happy... Thank you folks. On Tue, Jan 28, 2014 at 11:59 PM, Michael Shuler mich...@pbandjelly.org wrote: On 01/28/2014 09:55 PM, Kumar Ranjan wrote: I am in the process of setting up a 2-node cluster with C* version 2.0.4. When I started each node, they failed to communicate; thus each is running separately and not in the same ring. So I started looking at the log files and saw the message below: This is probably just a configuration issue and not likely to be the fault of OpenJDK. OpenJDK is OK for testing the waters and light dev work; it is the reference implementation for Oracle Java SE 7. WARN [main] 2014-01-28 06:02:17,861 CassandraDaemon.java (line 155) OpenJDK is not recommended. Please upgrade to the newest Oracle Java release Is this message informational only or can it be a real issue? The source of the above warning has some comments (attached, so they don't wrap so badly, I hope). https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/service/CassandraDaemon.java;h=424dbfa58ec72ea812362e2b428d0c4534626307;hb=HEAD#l106 -- Kind regards, Michael
Re: Issues with seeding on EC2 for C* 2.0.4 - help needed
Hi Michael - Yes, 7000, 7001, 9042, and 9160 are all open on EC2. The issue was that seeds pointed at 127.0.0.1 while listen_address was the private IP, so the nodes could not gossip. This will help anyone: http://stackoverflow.com/questions/20690987/apache-cassandra-unable-to-gossip-with-any-seeds On Wed, Jan 29, 2014 at 1:12 AM, Michael Shuler mich...@pbandjelly.org wrote: Did you open up the ports so they can talk to each other? http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/install/installAMISecurityGroup.html -- Michael
Centralized tool to install and manage C* from one machine
I am used to working with CCM for testing. For production, I depend on installing Cassandra manually. Is there a proven tool to install and manage a multi-node Cassandra cluster? If you have any experience, please let me know.
Re: Centralized tool to install and manage C* from one machine
Thank you Michael. I am trying out Priam as we speak and will post an update of my experience with different tools. Again, thank you. -- K On Tue, Jan 28, 2014 at 12:29 AM, Michael Shuler mich...@pbandjelly.org wrote: On 01/27/2014 10:34 PM, Kumar Ranjan wrote: I am used to working with CCM for testing. For production, I depend on installing Cassandra manually. Is there a proven tool to install and manage a multi-node Cassandra cluster? If you have any experience, please let me know. A good answer will depend on where you will be deploying your cluster, how many nodes you plan, etc. For bare metal installations, for instance with Debian or Ubuntu, you may want to automate with a preseed [0] of your OS installations to set up the JVM, install your desired cassandra version package, etc. RedHat variants use similar OS automation with kickstart [1]. Or, perhaps you want to use AWS and a preinstalled AMI [2] or create your own golden image on AWS and save it as an AMI for booting your other machines. I've done preseeds and kickstarts extensively and have set up a few custom AMIs - you may want to consider that most production clusters are likely running Debian or Ubuntu. Once you have machines installed, you may want some configuration management, or you could use a config manager to aid in OS installation and setup at install time - it seems that chef [3] has gained some traction over the once-hot puppet [4], but salt [5] is also quite mature, if you like python better than ruby. I've used all of these and they are great. I've also been burned by them all. There's nothing quite like the complete control over configurations carefully checked into a VCS and parallel ssh to pull them out, along with a few scripts to set up things just right - just my experience ;) There is also priam [6]. I keep wanting to find some time to play with it, so I have no insight, but it looks very interesting. 
If you'd like cassandra cluster management beyond installation/configuration of the machine, have a look at opscenter [7]. I'm sure there are a lot of other projects / cookbooks that others might be working on - chime in! Michael [0] https://wiki.debian.org/DebianInstaller/Preseed [1] https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Installation_Guide/ch-kickstart2.html [2] https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 [3] http://community.opscode.com/cookbooks/cassandra [4] https://forge.puppetlabs.com/tags/cassandra [5] http://docs.saltstack.com/ref/modules/all/salt.modules.cassandra.html [6] https://github.com/Netflix/Priam [7] http://www.datastax.com/what-we-offer/products-services/datastax-opscenter
alter_column_family (thrift based pycassa) drop a column
Hey folks, I used create_column_family to create a CF but made a typo, and now I need to use alter_column_family to drop that column and re-create it with the correct name. Can you help with the syntax? Here is what I use for alter_column_family: SYSTEM_MANAGER.alter_column_family('Narrative', 'Instagram_Tags', default_validation_class='UTF8Type', super=True, comparator='UTF8Type', key_validation_class='UTF8Type', column_validation_classes=validators) where validators = {'longitude': 'DoubleType', 'latitude': 'DoubleType'}
Re: Cassandra python pagination
Rob - I got a question following your advice. This is how I define my column family: validators = { 'approved':'UTF8Type', 'tid': 'UTF8Type', 'iid': 'UTF8Type', 'score': 'IntegerType', 'likes': 'IntegerType', 'retweet': 'IntegerType', 'favorite':'IntegerType', 'screen_name': 'UTF8Type', 'created_date':'UTF8Type', 'expanded_url':'UTF8Type', 'embedly_data':'BytesType', } SYSTEM_MANAGER.create_column_family('KeySpaceNNN', 'Twitter_Instagram', default_validation_class='UTF8Type', super=True, comparator='UTF8Type', key_validation_class='UTF8Type', column_validation_classes=validators) Actual data representation: 'row_key': {'1234555665_53323232': {'approved': 'false', 'tid': 123, 'iid': 34, 'score': 2, 'likes': 50, 'retweets': 45, 'favorite': 34, 'screen_name': 'goodname'}, '2344555665_53323232': {'approved': 'false', 'tid': 134, 'iid': 34, 'score': 2, 'likes': 50, 'retweets': 45, 'favorite': 34, 'screen_name': 'newname'}, ... } Is there something wrong with it? Here 1234555665_53323232 and 2344555665_53323232 are super columns. Also, if I have to represent this data with a new composite comparator, how would I accomplish that? Please let me know. Regards. On Wed, Dec 18, 2013 at 5:32 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Dec 18, 2013 at 1:28 PM, Kumar Ranjan winnerd...@gmail.com wrote: Second approach ( I used in production ): - fetch all super columns for a row key Stock response mentioning that super columns are anti-advised for use, especially in brand new code. =Rob
Re: Cassandra python pagination
I am using pycassa. So, here is how I solved this issue. I will discuss 2 approaches; the first approach didn't work out for me. Thanks Aaron for your attention. First approach: - Say column_count = 10 - Collect the first 11 columns, sort the first 10, send them to the user (front end) as a JSON object, and set last = the 11th column - The user then calls for page 2, with prev = 1st_column_id, column_start = 11th_column and column_count = 10 - This way, I can traverse next page and previous page. - The only issue with this approach is that I don't have all columns in the super column sorted, so this did not work. Second approach (I used in production): - Fetch all super columns for a row key - Sort them in Python using sorted and a lambda function based on column values - Once sorted, prepare buckets, where each bucket size is the page size/column count; also filter out any rogue data if needed - Store page-by-page results in Redis with keys such as 'row_key|page_1|super_column' and keep refreshing Redis periodically. I am sure there must be a better and brighter approach, but for now the 2nd approach is working. Thoughts? On Tue, Dec 17, 2013 at 9:19 PM, Aaron Morton aa...@thelastpickle.com wrote: CQL3 and thrift do not support an offset clause, so you can only really support next / prev page calls to the database. I am trying to use xget with column_count and buffer_size parameters. Can someone explain me, how does it work? From doc, my understanding is that, I can do something like, What client are you using ? xget is not a standard cassandra function. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 13/12/2013, at 4:56 am, Kumar Ranjan winnerd...@gmail.com wrote: Hey Folks, I need some ideas about support implementing of pagination on the browser, from the backend. So python code (backend) gets request from frontend with page=1,2,3,4 and so on and count_per_page=50. 
I am trying to use xget with the column_count and buffer_size parameters. Can someone explain to me how it works? From the docs, my understanding is that I can do something like the following, where total_cols is the total number of columns for that key and count is what the user sends me: .xget('Twitter_search', hh, column_count=total_cols, buffer_size=count) Is my understanding correct? Because it's not working for page 2 and onward. Please enlighten me with suggestions. Thanks.
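The bucketed second approach above can be sketched in plain Python (illustrative data and a hypothetical `build_pages` helper; the Redis caching step is omitted):

```python
def build_pages(supercolumns, page_size, sort_key='score'):
    """Sort super columns by one of their values, then chunk into fixed-size pages."""
    ordered = sorted(
        supercolumns.items(),
        key=lambda item: item[1][sort_key],
        reverse=True,  # highest score first
    )
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]

# Illustrative data: super column name -> sub-columns
data = {
    'a': {'score': 5},
    'b': {'score': 9},
    'c': {'score': 1},
    'd': {'score': 7},
}
pages = build_pages(data, page_size=2)
# pages[0] holds the two highest-scoring super columns: 'b' (9) and 'd' (7)
```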
Issues while fetching data with pycassa get for super columns
Hi Folks - I am having issues fetching data using the pycassa get() function. I have copied the CF schema, and my code is below. The query returns me just this Results: {u'narrativebuddieswin': ['609548930995445799_752368319', '609549303525138481_752368319', '610162034020180814_752368319', '610162805856002905_752368319', '610163571417146213_752368319', '610165900312830861_752368319']} None of the subcolumns are returned for the above super columns. Please help. CODE:
if start:
    res_rows = col_fam.get(key, column_count=count, column_start=start, include_timestamp=True, include_ttl=True)
else:
    res_rows = col_fam.get(key, column_count=count, include_timestamp=True, include_ttl=True)
return res_rows
CF Schema: 'Twitter_Instagram': CfDef(comment='', key_validation_class='org.apache.cassandra.db.marshal.UTF8Type', min_compaction_threshold=4, key_cache_save_period_in_seconds=None, gc_grace_seconds=864000, default_validation_class='org.apache.cassandra.db.marshal.UTF8Type', max_compaction_threshold=32, read_repair_chance=0.10001, compression_options={'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor'}, bloom_filter_fp_chance=None, id=None, keyspace='Narrative', key_cache_size=None, replicate_on_write=True, subcomparator_type='org.apache.cassandra.db.marshal.BytesType', merge_shards_chance=None, row_cache_provider=None, row_cache_save_period_in_seconds=None, column_type='Super', memtable_throughput_in_mb=None, memtable_flush_after_mins=None, column_metadata={'expanded_url': ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.UTF8Type', name='expanded_url', index_options=None), 'favorite': ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.IntegerType', name='favorite', index_options=None), 'retweet': ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.IntegerType', name='retweet', index_options=None), 'iid': 
ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.UTF8Type', name='iid', index_options=None), 'screen_name': ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.UTF8Type', name='screen_name', index_options=None), 'embedly_data': ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.BytesType', name='embedly_data', index_options=None), 'created_date': ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.UTF8Type', name='created_date', index_options=None), 'tid': ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.UTF8Type', name='tid', index_options=None), 'score': ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.IntegerType', name='score', index_options=None), 'approved': ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.UTF8Type', name='approved', index_options=None), 'likes': ColumnDef(index_type=None, index_name=None, validation_class='org.apache.cassandra.db.marshal.IntegerType', name='likes', index_options=None)}, key_alias=None, dclocal_read_repair_chance=0.0, name='Twitter_Instagram', compaction_strategy_options={}, row_cache_keys_to_save=None, compaction_strategy='org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', memtable_operations_in_millions=None, caching='KEYS_ONLY', comparator_type='org.apache.cassandra.db.marshal.BytesType', row_cache_size=None),
Cassandra python pagination
Hey Folks, I need some ideas about supporting the implementation of pagination on the browser from the backend. The Python code (backend) gets a request from the frontend with page=1,2,3,4 and so on, and count_per_page=50. I am trying to use xget with the column_count and buffer_size parameters. Can someone explain to me how it works? From the docs, my understanding is that I can do something like the following, where total_cols is the total number of columns for that key and count is what the user sends me: .xget('Twitter_search', hh, column_count=total_cols, buffer_size=count) Is my understanding correct? Because it's not working for page 2 and onward. Please enlighten me with suggestions. Thanks.
Cassandra data update for a row
Hey Folks, I have a row like this: 'twitter_row_key' is the row key and 411186035495010304 is the column; the rest is the value of that column. See below. 'twitter_row_key': OrderedDict([('411186035495010304', u'{score: 0, tid: 411186035495010304, created_at: Thu Dec 12 17:29:24 +0000 2013, favorite: 0, retweet: 0, approved: true}')]) How can I set approved to 'false'? When I try insert for row key 'twitter_row_key' and column 411186035495010304, it overwrites the whole data and the new row becomes: 'twitter_row_key': OrderedDict([('411186035495010304', u'{approved: true}')]) Any thoughts, guys?
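Since the column value here is one opaque JSON string, Cassandra can only replace it wholesale; the usual pattern is read-modify-write on the client. A sketch in plain Python (`set_approved` is a hypothetical helper; only the JSON handling is shown, not the pycassa calls):

```python
import json

def set_approved(column_value, approved):
    """Read-modify-write: decode the stored JSON, flip one field, re-encode."""
    doc = json.loads(column_value)
    doc['approved'] = approved
    return json.dumps(doc)

# Illustrative stored value (keys quoted so it is valid JSON)
stored = '{"score": 0, "tid": "411186035495010304", "favorite": 0, "retweet": 0, "approved": "true"}'
updated = set_approved(stored, 'false')
# the other fields survive; only 'approved' changes
```

The returned string would then be written back with something like `cf.insert('twitter_row_key', {'411186035495010304': updated})`, which overwrites just that one column rather than the whole row.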
Re:
Thanks Aaron. On Wed, Dec 11, 2013 at 10:45 PM, Aaron Morton aa...@thelastpickle.com wrote: SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test', comparator_type='CompositeType', default_validation_class='UTF8Type', key_validation_class='UTF8Type', column_validation_classes=validators) CompositeType is a type composed of other types, see http://pycassa.github.io/pycassa/assorted/composite_types.html?highlight=compositetype Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 6:15 am, Kumar Ranjan winnerd...@gmail.com wrote: Hey Folks, So I am creating a column family using pycassaShell. See below: validators = { 'approved': 'BooleanType', 'text': 'UTF8Type', 'favorite_count':'IntegerType', 'retweet_count': 'IntegerType', 'expanded_url': 'UTF8Type', 'tuid': 'LongType', 'screen_name': 'UTF8Type', 'profile_image': 'UTF8Type', 'embedly_data': 'CompositeType', 'created_at':'UTF8Type', } SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test', comparator_type='CompositeType', default_validation_class='UTF8Type', key_validation_class='UTF8Type', column_validation_classes=validators) I am getting this error: InvalidRequestException: InvalidRequestException(why='Invalid definition for comparator org.apache.cassandra.db.marshal.CompositeType.')
My data will look like this: 'row_key' : { 'tid' : { 'expanded_url': u'http://instagram.com/p/hwDj2BJeBy/', 'text': '#snowinginNYC Makes me so happy\xe2\x9d\x840brittles0 \xe2\x9b\x84 @ Grumman Studios http://t.co/rlOvaYSfKa', 'profile_image': u'https://pbs.twimg.com/profile_images/3262070059/1e82f895559b904945d28cd3ab3947e5_normal.jpeg', 'tuid': 339322611, 'approved': 'true', 'favorite_count': 0, 'screen_name': u'LonaVigi', 'created_at': u'Wed Dec 11 01:10:05 +0000 2013', 'embedly_data': {u'provider_url': u'http://instagram.com/', u'description': u"lonavigi's photo on Instagram", u'title': u'#snwinginNYC Makes me so happy\u2744@0brittles0 \u26c4', u'url': u'http://distilleryimage7.ak.instagram.com/5b880dec61c711e3a50b129314edd3b_8.jpg', u'thumbnail_width': 640, u'height': 640, u'width': 640, u'thumbnail_url': u'http://distilleryimage7.ak.instagram.com/b880dec61c711e3a50b1293d14edd3b_8.jpg', u'author_name': u'lonavigi', u'version': u'1.0', u'provider_name': u'Instagram', u'type': u'poto', u'thumbnail_height': 640, u'author_url': u'http://instagram.com/lonavigi'}, 'tid': 410577192746500096, 'retweet_count': 0 } }
Re: How to create counter column family via Pycassa?
What are all the possible values for cf_kwargs? SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test', comparator_type=UTF8Type) - Here I want to specify the column data types and the row key type. How can I do that? On Thu, Aug 15, 2013 at 12:30 PM, Tyler Hobbs ty...@datastax.com wrote: The column_validation_classes arg is just for defining individual column types. Glad you got it figured out, though. On Thu, Aug 15, 2013 at 11:23 AM, Pinak Pani nishant.has.a.quest...@gmail.com wrote: Thanks for the quick reply. Apparently, I was trying this to get it working: cf_kwargs = {'default_validation_class':COUNTER_COLUMN_TYPE} sys.create_column_family('my_ks', 'vote_count', column_validation_classes=cf_kwargs) #1 But this works: sys.create_column_family('my_ks', 'vote_count', **cf_kwargs) #2 I thought #1 should work. On Thu, Aug 15, 2013 at 9:15 PM, Tyler Hobbs ty...@datastax.com wrote: The only thing that makes a CF a counter CF is that the default validation class is CounterColumnType, which you can set through SystemManager.create_column_family(). On Thu, Aug 15, 2013 at 10:38 AM, Pinak Pani nishant.has.a.quest...@gmail.com wrote: I do not find a way to create a counter column family in Pycassa. This[1] does not help. Appreciate if someone can help me. Thanks 1. http://pycassa.github.io/pycassa/api/pycassa/system_manager.html#pycassa.system_manager.SystemManager.create_column_family -- Tyler Hobbs DataStax http://datastax.com/ -- Tyler Hobbs DataStax http://datastax.com/
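The #1-vs-#2 distinction in the quoted thread is plain keyword-argument unpacking, nothing pycassa-specific: passing the dict as `column_validation_classes=cf_kwargs` assigns the whole dict to that one parameter, while `**cf_kwargs` turns each key into its own keyword argument. A stripped-down illustration with a stand-in function (not the real SystemManager.create_column_family):

```python
def create_column_family(keyspace, name, default_validation_class=None,
                         column_validation_classes=None):
    """Stand-in with a similar keyword-argument shape; just echoes what it received."""
    return {'default_validation_class': default_validation_class,
            'column_validation_classes': column_validation_classes}

cf_kwargs = {'default_validation_class': 'CounterColumnType'}

# 1: the whole dict lands in the (wrong) column_validation_classes parameter
wrong = create_column_family('my_ks', 'vote_count', column_validation_classes=cf_kwargs)

# 2: ** unpacks the dict, so default_validation_class is actually set
right = create_column_family('my_ks', 'vote_count', **cf_kwargs)
```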
[no subject]
Hey Folks, So I am creating a column family using pycassaShell. See below: validators = { 'approved': 'BooleanType', 'text': 'UTF8Type', 'favorite_count':'IntegerType', 'retweet_count': 'IntegerType', 'expanded_url': 'UTF8Type', 'tuid': 'LongType', 'screen_name': 'UTF8Type', 'profile_image': 'UTF8Type', 'embedly_data': 'CompositeType', 'created_at':'UTF8Type', } SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test', comparator_type='CompositeType', default_validation_class='UTF8Type', key_validation_class='UTF8Type', column_validation_classes=validators) I am getting this error: InvalidRequestException: InvalidRequestException(why='Invalid definition for comparator org.apache.cassandra.db.marshal.CompositeType.') My data will look like this: 'row_key' : { 'tid' : { 'expanded_url': u'http://instagram.com/p/hwDj2BJeBy/', 'text': '#snowinginNYC Makes me so happy\xe2\x9d\x840brittles0 \xe2\x9b\x84 @ Grumman Studios http://t.co/rlOvaYSfKa', 'profile_image': u'https://pbs.twimg.com/profile_images/3262070059/1e82f895559b904945d28cd3ab3947e5_normal.jpeg', 'tuid': 339322611, 'approved': 'true', 'favorite_count': 0, 'screen_name': u'LonaVigi', 'created_at': u'Wed Dec 11 01:10:05 +0000 2013', 'embedly_data': {u'provider_url': u'http://instagram.com/', u'description': u"lonavigi's photo on Instagram", u'title': u'#snwinginNYC Makes me so happy\u2744@0brittles0 \u26c4', u'url': u'http://distilleryimage7.ak.instagram.com/5b880dec61c711e3a50b129314edd3b_8.jpg', u'thumbnail_width': 640, u'height': 640, u'width': 640, u'thumbnail_url': u'http://distilleryimage7.ak.instagram.com/b880dec61c711e3a50b1293d14edd3b_8.jpg', u'author_name': u'lonavigi', u'version': u'1.0', u'provider_name': u'Instagram', u'type': u'poto', u'thumbnail_height': 640, u'author_url': u'http://instagram.com/lonavigi'}, 'tid': 410577192746500096, 'retweet_count': 0 } }
Re: How to create counter column family via Pycassa?
validators = { 'approved': 'BooleanType', 'text': 'UTF8Type', 'favorite_count':'IntegerType', 'retweet_count': 'IntegerType', 'expanded_url': 'UTF8Type', 'tuid': 'LongType', 'screen_name': 'UTF8Type', 'profile_image': 'UTF8Type', 'embedly_data': 'CompositeType', 'created_at':'UTF8Type', } SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test', comparator_type='CompositeType', default_validation_class='UTF8Type', key_validation_class='UTF8Type', column_validation_classes=validators) throws: InvalidRequestException: InvalidRequestException(why='Invalid definition for comparator org.apache.cassandra.db.marshal.CompositeType.') Can you please explain why? On Wed, Dec 11, 2013 at 12:08 PM, Tyler Hobbs ty...@datastax.com wrote: What options are available depends on what version of Cassandra you're using. You can specify the row key type with 'key_validation_class'. For column types, use 'column_validation_classes', which is a dict mapping column names to types. For example: sys.create_column_family('mykeyspace', 'users', column_validation_classes={'username': UTF8Type, 'age': IntegerType}) On Wed, Dec 11, 2013 at 10:32 AM, Kumar Ranjan winnerd...@gmail.com wrote: What are all the possible values for cf_kwargs? SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test', comparator_type=UTF8Type) - Here I want to specify the column data types and the row key type. How can I do that? On Thu, Aug 15, 2013 at 12:30 PM, Tyler Hobbs ty...@datastax.com wrote: The column_validation_classes arg is just for defining individual column types. Glad you got it figured out, though. On Thu, Aug 15, 2013 at 11:23 AM, Pinak Pani nishant.has.a.quest...@gmail.com wrote: Thanks for the quick reply. 
Apparently, I was trying this to get it working: cf_kwargs = {'default_validation_class':COUNTER_COLUMN_TYPE} sys.create_column_family('my_ks', 'vote_count', column_validation_classes=cf_kwargs) #1 But this works: sys.create_column_family('my_ks', 'vote_count', **cf_kwargs) #2 I thought #1 should work. On Thu, Aug 15, 2013 at 9:15 PM, Tyler Hobbs ty...@datastax.com wrote: The only thing that makes a CF a counter CF is that the default validation class is CounterColumnType, which you can set through SystemManager.create_column_family(). On Thu, Aug 15, 2013 at 10:38 AM, Pinak Pani nishant.has.a.quest...@gmail.com wrote: I do not find a way to create a counter column family in Pycassa. This[1] does not help. Appreciate if someone can help me. Thanks 1. http://pycassa.github.io/pycassa/api/pycassa/system_manager.html#pycassa.system_manager.SystemManager.create_column_family -- Tyler Hobbs DataStax http://datastax.com/ -- Tyler Hobbs DataStax http://datastax.com/ -- Tyler Hobbs DataStax http://datastax.com/
Re: How to create counter column family via Pycassa?
This works when I remove the comparator_type:

validators = {
    'tid': 'IntegerType',
    'approved': 'BooleanType',
    'text': 'UTF8Type',
    'favorite_count': 'IntegerType',
    'retweet_count': 'IntegerType',
    'expanded_url': 'UTF8Type',
    'tuid': 'LongType',
    'screen_name': 'UTF8Type',
    'profile_image': 'UTF8Type',
    'embedly_data': 'BytesType',
    'created_at': 'UTF8Type',
}
SYSTEM_MANAGER.create_column_family('Narrative', 'Twitter_search',
    default_validation_class='UTF8Type',
    key_validation_class='UTF8Type',
    column_validation_classes=validators)

On Wed, Dec 11, 2013 at 12:23 PM, Kumar Ranjan winnerd...@gmail.com wrote:

I am using ccm cassandra version 1.2.11
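For what it's worth, the likely reason bare 'CompositeType' is rejected in the earlier attempt: a composite comparator must declare its component subtypes, e.g. 'CompositeType(UTF8Type, IntegerType)'. A small sketch that builds such a parameterized type string; the helper name is mine, not part of pycassa:

```python
def composite_type(*subtypes):
    """Build a parameterized CompositeType definition string, suitable
    (per the discussion above) for use as a comparator_type value,
    e.g. composite_type("UTF8Type", "IntegerType")."""
    if not subtypes:
        # A bare, unparameterized CompositeType is what Cassandra rejects.
        raise ValueError("CompositeType requires at least one subtype")
    return "CompositeType(%s)" % ", ".join(subtypes)

comparator = composite_type("UTF8Type", "IntegerType")
print(comparator)  # CompositeType(UTF8Type, IntegerType)
```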
Re: Re: How to configure linux service for Cassandra?
Hey Folks,

I have been using ccm for some time and it's a pretty awesome tool for testing admin stuff. Now I want to test data modeling by accessing a ccm-run Cassandra cluster with the Thrift-based pycassaShell client from remote hosts (not locally). My setup is like this. Let's say the private IP of this machine is 10.11.12.13 (just an example):

lo    Link encap:Local Loopback
      inet addr:127.0.0.1 Mask:255.0.0.0
      inet6 addr: ::1/128 Scope:Host
      UP LOOPBACK RUNNING MTU:16436 Metric:1
      RX packets:67392708 errors:0 dropped:0 overruns:0 frame:0
      TX packets:67392708 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:0
      RX bytes:7398829042 (6.8 GiB) TX bytes:7398829042 (6.8 GiB)
lo:1  Link encap:Local Loopback
      inet addr:127.0.0.2 Mask:255.255.255.0
      UP LOOPBACK RUNNING MTU:16436 Metric:1
lo:2  Link encap:Local Loopback
      inet addr:127.0.0.3 Mask:255.0.0.0
      UP LOOPBACK RUNNING MTU:16436 Metric:1
lo:3  Link encap:Local Loopback
      inet addr:127.0.0.4 Mask:255.0.0.0
      UP LOOPBACK RUNNING MTU:16436 Metric:1

with 127.0.0.1 (node1), 127.0.0.2 (node2), 127.0.0.3 (node3), 127.0.0.4 (node4):

$ ccm status
node1: UP
node3: UP
node2: UP
node4: UP

How do I connect to any of the instances from non-local hosts? When I do:

pycassaShell --host 10.11.12.13 --port 9160

it throws an exception:

thrift.transport.TTransport.TTransportException: Could not connect to 10.11.12.13:9160

Is there a way to make it work?

On Tue, Nov 12, 2013 at 4:19 AM, Boole.Z.Guo (mis.cnsh04.Newegg) 41442 boole.z@newegg.com wrote:

Thanks very much. I will try.

"The goal of ccm and ccmlib is to make it easy to create, manage and destroy a small cluster on a local box. It is meant for testing of a Cassandra cluster."
Best Regards,
Boole Guo
Software Engineer, NESC-SH.MIS
+86-021-51530666 x41442
Floor 19, KaiKai Plaza, 888 Wanhangdu Rd, Shanghai (200042)

From: Christopher Wirt [mailto:chris.w...@struq.com]
Sent: 12 November 2013 16:53
To: user@cassandra.apache.org
Subject: RE: How to configure linux service for Cassandra?

Starting multiple Cassandra nodes on the same machine involves setting loopback aliases and some configuration fiddling. Luckily for you, Sylvain Lebresne made a handy tool in Python which does the job for you: https://github.com/pcmanus/ccm

To run Cassandra as a service, you need a script like this: http://www.bajb.net/2012/01/cassandra-service-script/ (I haven't tried this; I just run Cassandra in the foreground of a screen session.)

From: Boole.Z.Guo (mis.cnsh04.Newegg) 41442 [mailto:boole.z@newegg.com]
Sent: 12 November 2013 05:17
To: user@cassandra.apache.org
Subject: How to configure linux service for Cassandra?

How do I configure a linux service for Cassandra, or start multiple Cassandra nodes on a single machine? Thanks very much!

Best Regards,
Boole Guo
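A likely cause of the TTransportException above: ccm binds each node to a loopback alias (127.0.0.1 through 127.0.0.4 here), so nothing is listening on the machine's private IP at all. One hedged workaround, assuming SSH access to the ccm host, is to forward a local port to a node's loopback address; the sketch below only builds the command strings, it does not run ssh:

```python
def ssh_forward_command(ssh_host, node_ip, local_port, thrift_port=9160):
    """Build an 'ssh -L' local-port-forward command so a remote client
    can reach a ccm node that listens only on a loopback alias."""
    return ("ssh -N -L %d:%s:%d %s"
            % (local_port, node_ip, thrift_port, ssh_host))

# One tunnel per ccm node; pycassaShell would then connect to
# localhost:9161 .. localhost:9164 instead of 10.11.12.13:9160.
for i, node_ip in enumerate(["127.0.0.1", "127.0.0.2",
                             "127.0.0.3", "127.0.0.4"], start=1):
    print(ssh_forward_command("10.11.12.13", node_ip, 9160 + i))
```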
Re: Choosing python client lib for Cassandra
Michael - thanks. Have you tried batching and thread pooling in python-driver? For now, I would avoid the cqlengine object mapper, just because of my deadlines.

— Sent from Mailbox for iPhone

On Tue, Nov 26, 2013 at 1:52 PM, Laing, Michael michael.la...@nytimes.com wrote:

We use the python-driver and have contributed some to its development. I have been careful not to push too fast on features until we need them. For example, we have just started using prepared statements - working well BTW. Next we will employ futures and start to exploit the async nature of the new interface to C*. We are very familiar with libev in both C and Python, and are happy to dig into the code to add features and fix bugs as needed, so the rewards of bypassing the old and focusing on the new seem worth the risks to us.

ml

On Tue, Nov 26, 2013 at 1:16 PM, Jonathan Haddad j...@jonhaddad.com wrote:

So, for cqlengine (https://github.com/cqlengine/cqlengine), we're currently using the thrift api to execute CQL until the native driver is out of beta. I'm a little biased in recommending it, since I'm one of the primary authors. If you've got cqlengine-specific questions, head to the mailing list: https://groups.google.com/forum/#!forum/cqlengine-users

If you want to roll your own solution, it might make sense to take an approach like we did and throw a layer on top of thrift, so you don't have to do a massive rewrite of your entire app once you want to go native.

Jon

On Tue, Nov 26, 2013 at 9:46 AM, Kumar Ranjan winnerd...@gmail.com wrote:

I have worked with pycassa before and wrote a wrapper to use batch mutation, connection pooling, etc. But http://wiki.apache.org/cassandra/ClientOptions now recommends using a CQL 3 based API, because the Thrift-based API (pycassa) will be supported for backward compatibility only. The Apache site recommends the Python API written by DataStax, which is still in beta (as per their documentation). See the warning from their python-driver/README.rst file:

Warning: This driver is currently under heavy development, so the API and layout of packages, modules, classes, and functions are subject to change. There may also be serious bugs, so usage in a production environment is not recommended at this time.

The DataStax site http://www.datastax.com/download/clientdrivers recommends using DB-API 2.0 plus legacy APIs. Is there more? Has anyone compared the CQL 3 based APIs? Which stands out on top? Answers based on facts will help the community, so please refrain from opinions. Please help.

-- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Choosing python client lib for Cassandra
Jon - Thanks. As I understand it, cqlengine is an object mapper and must be using CQL prepared statements. What are you wrapping it with, as an alternative to python-driver?

— Sent from Mailbox for iPhone

On Tue, Nov 26, 2013 at 1:19 PM, Jonathan Haddad j...@jonhaddad.com wrote:

So, for cqlengine (https://github.com/cqlengine/cqlengine), we're currently using the thrift api to execute CQL until the native driver is out of beta. I'm a little biased in recommending it, since I'm one of the primary authors. If you've got cqlengine-specific questions, head to the mailing list: https://groups.google.com/forum/#!forum/cqlengine-users

If you want to roll your own solution, it might make sense to take an approach like we did and throw a layer on top of thrift, so you don't have to do a massive rewrite of your entire app once you want to go native.

Jon
Re: Choosing python client lib for Cassandra
Hi Jon - you are right. It's just that I know other ORMs like Python's SQLAlchemy or Perl's DBIx by heart, so I can write CQL faster than I can use cqlengine. I will give python-driver a shot based on Michael's recommendation.

— Sent from Mailbox for iPhone

On Tue, Nov 26, 2013 at 2:21 PM, Jonathan Haddad j...@jonhaddad.com wrote:

We're currently using the cql package, which is really a wrapper around thrift. To your concern about deadlines, I'm not sure how writing raw CQL is going to be any faster than using a mapper library for anything other than the most trivial of projects.

On Tue, Nov 26, 2013 at 11:09 AM, Kumar Ranjan winnerd...@gmail.com wrote:

Jon - Thanks. As I understand it, cqlengine is an object mapper and must be using CQL prepared statements. What are you wrapping it with, as an alternative to python-driver?
Re: Choosing python client lib for Cassandra
Jon - Any comment on batching?

— Sent from Mailbox for iPhone

On Tue, Nov 26, 2013 at 2:52 PM, Laing, Michael michael.la...@nytimes.com wrote:

That's not a problem we have faced yet.

On Tue, Nov 26, 2013 at 2:46 PM, Kumar Ranjan winnerd...@gmail.com wrote:

How do you insert huge amounts of data?

On Tue, Nov 26, 2013 at 2:31 PM, Laing, Michael michael.la...@nytimes.com wrote:

I think thread pooling is always in operation - and we haven't seen any problems in that regard going to the 6 local nodes each client connects to. We haven't tried batching yet.
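On the "how do you insert huge amounts of data" question above: batching large inserts usually means chunking rows client-side and issuing one batch per chunk, whatever driver you use. A minimal, driver-agnostic sketch; execute_batch is a hypothetical stand-in for whatever your client's batch call actually is:

```python
def chunked(rows, batch_size):
    """Yield successive batch_size-sized slices of rows."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def insert_in_batches(rows, execute_batch, batch_size=100):
    """Feed rows to a client-provided batch executor, one chunk at a
    time, and return the number of rows submitted."""
    sent = 0
    for batch in chunked(rows, batch_size):
        execute_batch(batch)   # e.g. build and run one CQL BATCH here
        sent += len(batch)
    return sent

# Demo with a list collector standing in for a real batch executor.
batches = []
total = insert_in_batches(list(range(250)), batches.append, batch_size=100)
print(total, [len(b) for b in batches])  # 250 [100, 100, 50]
```

Keeping batches small and bounded like this also avoids shipping one enormous request to the coordinator.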
Re: Choosing python client lib for Cassandra
Hi Jonathan - Does cqlengine have support for Python 2.6?

On Tue, Nov 26, 2013 at 4:17 PM, Jonathan Haddad j...@jonhaddad.com wrote:

cqlengine supports batch queries; see the docs here: http://cqlengine.readthedocs.org/en/latest/topics/queryset.html#batch-queries

On Tue, Nov 26, 2013 at 11:53 AM, Kumar Ranjan winnerd...@gmail.com wrote:

Jon - Any comment on batching?
Re: Choosing python client lib for Cassandra
Thanks Jonathan for the help.

On Tue, Nov 26, 2013 at 6:14 PM, Jonathan Haddad j...@jonhaddad.com wrote:

No, 2.7 only.

On Tue, Nov 26, 2013 at 3:04 PM, Kumar Ranjan winnerd...@gmail.com wrote:

Hi Jonathan - Does cqlengine have support for Python 2.6?
Re: Exporting all data within a keyspace
Try sstable2json and json2sstable. They work per column family, so you can list all the column families in the keyspace, iterate over the list, and use the sstable2json tool to extract the data for each. Remember this only reads on-disk data; anything still in a memtable that has not been flushed will be missed, so flush (nodetool flush) and compact first, and then run your export script.

On Tuesday, April 30, 2013, Chidambaran Subramanian wrote:

Is there any easy way of exporting all data for a keyspace (and, conversely, importing it)?

Regards
Chiddu
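The iterate-and-export step above can be sketched as follows. This is an assumption-laden sketch, not a definitive exporter: it assumes a flat Cassandra 1.x-style data directory layout where SSTable data files end in "-Data.db", and it only builds the sstable2json command lines rather than executing them, so you can inspect them first:

```python
import os
import tempfile

def export_commands(data_dir, keyspace, out_dir):
    """Build one 'sstable2json <sstable> > <json>' command per SSTable
    data file found under <data_dir>/<keyspace> (flat 1.x-style layout
    assumed; adjust the filename match to your on-disk layout)."""
    cmds = []
    ks_dir = os.path.join(data_dir, keyspace)
    for name in sorted(os.listdir(ks_dir)):
        if name.endswith("-Data.db"):
            json_name = name[:-len("-Data.db")] + ".json"
            cmds.append("sstable2json %s > %s" % (
                os.path.join(ks_dir, name),
                os.path.join(out_dir, json_name)))
    return cmds

# Demo against a throwaway directory with fake SSTable filenames.
data_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(data_dir, "my_ks"))
for fake in ("my_ks-cf1-ic-1-Data.db", "my_ks-cf1-ic-1-Index.db",
             "my_ks-cf2-ic-3-Data.db"):
    open(os.path.join(data_dir, "my_ks", fake), "w").close()

cmds = export_commands(data_dir, "my_ks", "/tmp/export")
for c in cmds:
    print(c)
```

Only the two -Data.db files produce commands; the index file is skipped. Running the commands (e.g. via subprocess) would then require the sstable2json binary on PATH, after the flush/compaction step noted above.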