Re: CRUD test
Have you checked that the timestamp you're using for the subsequent inserts is higher than the one used in the delete?

On Thu, Jul 22, 2010 at 2:29 AM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:

Hi there, I'm trying to implement a simple CRUD service based on Cassandra, using the Hector client. While writing tests, I found that if I create a few columns through the API, delete them from cassandra-cli, and then re-create them using the same code (same key, etc.), I can never get these new columns back using cassandra-cli. I tried setting different consistency levels, but it did not change anything. I am never able to insert into these columns again from my code, although cassandra-cli can insert them. I thought it might have something to do with eventual consistency, but even after waiting hours, nothing changes. I have only one node (one Cassandra server) running on 64-bit Ubuntu, if it matters. I added my keyspace and a couple of column families pretty much following the defaults in storage-conf.xml. Thanks, Oleg

-- Maybe she awoke to see the roommate's boyfriend swinging from the chandelier wearing a boar's head. Something which you, I, and everyone else would call Tuesday, of course.
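To illustrate why the timestamp matters: a delete in Cassandra writes a tombstone, and any insert whose timestamp is not strictly greater than the tombstone's stays invisible, no matter how much later in wall-clock time it arrives. Below is a minimal sketch of that last-write-wins rule using plain Java, not the Cassandra API; class and method names are purely illustrative.

```java
// Hypothetical model of tombstone reconciliation for a single column:
// an insert only becomes visible if its timestamp beats the tombstone's.
public class TombstoneSketch {
    private long tombstoneTs = Long.MIN_VALUE; // timestamp of the last delete
    private Long valueTs = null;               // timestamp of the live value
    private String value = null;

    public void insert(String v, long ts) {
        // Last write wins among inserts.
        if (valueTs == null || ts > valueTs) { valueTs = ts; value = v; }
    }

    public void delete(long ts) {
        tombstoneTs = Math.max(tombstoneTs, ts);
    }

    // A read sees the value only if its timestamp beats the tombstone's.
    public String get() {
        return (valueTs != null && valueTs > tombstoneTs) ? value : null;
    }

    public static void main(String[] args) {
        TombstoneSketch col = new TombstoneSketch();
        col.insert("a", 100);
        col.delete(200);      // tombstone at ts=200
        col.insert("b", 150); // ts <= 200: still shadowed by the tombstone
        System.out.println(col.get()); // null
        col.insert("c", 250); // ts > 200: visible again
        System.out.println(col.get()); // c
    }
}
```

This is why deleting via cassandra-cli (which stamps the tombstone with the current time in microseconds or milliseconds, depending on version) and re-inserting from code with a lower-resolution or lagging clock can make the re-insert silently disappear.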
Re: goods search with cassandra
Thanks for your suggestion. Does it work if insertion goes through the Thrift client and reading goes through Cassandra directly, as in ClientOnlyExample?

2010/7/21 Santal Li santal...@gmail.com:

I think building a ColumnValueFilter isn't a good idea; what you really need is a self-defined index, otherwise the filter will cause too many scans and too much disk IO. We hit almost the same problem in our own webapp: store data under one field, then fetch it by searching on another field. Our solution was to create a new keyspace for the index, then maintain the index according to the query conditions at the application level. I suggest you read this document to get the basic idea: http://code.google.com/intl/zh-CN/appengine/articles/index_building.html. If you use this solution, you may need to consider the following issues: 1. multiple clients accessing concurrently; 2. index and object data may become inconsistent after an error. Some kind of lock service may help, like ZooKeeper. Regards -Santal

2010/7/19 Chen Xinli chen.d...@gmail.com:

Hi, I want to implement goods search with Cassandra, and I have some confusion. Can someone help me out? The case is this: there are about 1 million shops, each shop with about 10,000 goods, and each item of goods has properties like title, price, etc. A search looks like: give me 10 goods in a specific shop where the price of the goods is less than $10. For the data model, I use the shop name as the key and the goods id as the column name, with title, price, etc. specially encoded as the column value. There are too many goods in one shop; filtering the data in the Thrift client is impossible for network-transfer reasons. I want to implement a special ColumnValueFilter extending QueryFilter to get the result locally. Is this the best way? Insertion of goods is about 100/second for the whole cluster, so a Thrift client for insertion is OK. For reads, latency and QPS are important, and I must provide an HTTP service for user searching.

Embedding a Thrift client in such a service would involve another network hop, so I want to build the service on top of Cassandra directly. I reviewed the code of ClientOnlyExample.java. What confuses me is this: with insertion through a Thrift client and reading through Cassandra directly, is data consistency guaranteed, and how? Any help is appreciated. Thanks! -- Best Regards, Chen Xinli
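The application-maintained index Santal describes can be sketched in plain Java, with maps standing in for the data and index column families; all names here are illustrative, not Cassandra API. Every write updates both structures, and the "price less than X" query becomes a range scan over the sorted index instead of a server-side filter over every column.

```java
import java.util.*;

// Sketch of an application-maintained secondary index:
// data "CF":  shop -> (goodsId -> price)
// index "CF": shop -> sorted (price -> goodsIds)
public class PriceIndexSketch {
    private final Map<String, Map<String, Integer>> data = new HashMap<>();
    private final Map<String, TreeMap<Integer, Set<String>>> index = new HashMap<>();

    // Every write touches both the data row and the index row; in Cassandra
    // these would be two inserts the application must keep in sync (issue 2
    // in Santal's list: they can diverge if one write fails).
    public void addGoods(String shop, String goodsId, int price) {
        data.computeIfAbsent(shop, s -> new HashMap<>()).put(goodsId, price);
        index.computeIfAbsent(shop, s -> new TreeMap<>())
             .computeIfAbsent(price, p -> new TreeSet<>()).add(goodsId);
    }

    // "Give me up to `limit` goods in `shop` with price < maxPrice":
    // a bounded scan over the price-sorted index, cheapest first.
    public List<String> cheaperThan(String shop, int maxPrice, int limit) {
        List<String> result = new ArrayList<>();
        TreeMap<Integer, Set<String>> shopIndex =
                index.getOrDefault(shop, new TreeMap<>());
        for (Set<String> ids : shopIndex.headMap(maxPrice, false).values()) {
            for (String id : ids) {
                if (result.size() == limit) return result;
                result.add(id);
            }
        }
        return result;
    }
}
```

In a real deployment the index row would live in the separate index keyspace, with the price encoded into the column name so Cassandra's column ordering does the sorting.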
Re: Cassandra committing massive virtual memory
We are seeing Cassandra using very high virtual memory. One server in the cluster shows 90GB and the other shows about 70GB of committed virtual memory. The real memory used is less than 1GB. The Xmx is 4GB. The physical memory on the machine is 32GB and the swap space on the machine is around 20GB. I understand that the virtual memory is not really used by the process and is just reserved. My question is why Cassandra needs to reserve such huge memory. What is the logic that is used internally? How is it related to usage pattern or data storage pattern? Should we be concerned?

Virtual size in and of itself is not a concern, because it is mmap():ed address space. AFAIK you may see sizes up to the entire database size. What people have had issues with is swapping, resulting from the operating system considering the mmap():ed area as contributing to memory pressure and swapping out the Java heap. But unless you have issues like that, I don't believe the virtual address space size in and of itself should be a concern. -- / Peter Schuller
Re: Bootstrap question
On Wed, Jul 21, 2010 at 14:14, Anthony Molinaro antho...@alumni.caltech.edu wrote: Sure, looks like that's in 0.6.4, so I'll probably just rebuild my server based on the 0.6 branch, unless you want me to test just the patch for 1221? Most likely won't get a chance to try until tomorrow, so let me know. Either way works for me.
Re: Cassandra committing massive virtual memory
The DiskAccessMode is set to auto. I am going to try with standard and see, but I am concerned how much that would affect performance negatively. thx Amit

On Wed, Jul 21, 2010 at 3:59 PM, Aaron Morton aa...@thelastpickle.com wrote:

Are you using mmap or auto DiskAccessMode? There is a known issue with memory-mapped file access taking up a lot of memory. See CASSANDRA-1214 https://issues.apache.org/jira/browse/CASSANDRA-1214; there is also some discussion in the mailing list. Try setting the DiskAccessMode to standard. Aaron

On 22 Jul 2010, at 10:37 AM, Amit Sinha amitabhs2...@gmail.com wrote:

We are seeing Cassandra using very high virtual memory. One server in the cluster shows 90GB and the other shows about 70GB of committed virtual memory. The real memory used is less than 1GB. The Xmx is 4GB. The physical memory on the machine is 32GB and the swap space on the machine is around 20GB. I understand that the virtual memory is not really used by the process and is just reserved. My question is why Cassandra needs to reserve such huge memory. What is the logic that is used internally? How is it related to usage pattern or data storage pattern? Should we be concerned? thanks Amit

-- thanks Amitabh
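For reference, in the 0.6 configuration this is a single element in storage-conf.xml; a minimal sketch of the change being discussed (values as I recall them from the 0.6 config, so verify against your own file):

```xml
<!-- storage-conf.xml (0.6): "auto" picks mmap on 64-bit JVMs, which is what
     produces the huge virtual size; "standard" uses buffered I/O instead,
     shrinking the address space at some potential cost in read performance.
     If memory serves, "mmap_index_only" is a middle ground that maps only
     the index files. -->
<DiskAccessMode>standard</DiskAccessMode>
```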
Lucene CassandraDirectory Implementation
Hi All, I was browsing through the Lucene JIRA and came across the issue named A Column-Oriented Cassandra-Based Lucene Directory at https://issues.apache.org/jira/browse/LUCENE-2456 Has anyone had a chance to test it? If so, do you think it's an efficient implementation as a replacement for the FSDirectory? Best Regards, Utku
Re: Is it possible to read from one row, several SuperColumns with all their Columns in one call?
Hi Aaron, the problem I have is that those UUIDs are random numbers. 2, 3, 4 are not sequential, unfortunately. I don't think there is an API like multiget_slice for keys but for super column names. Is there any other way to specify a list of super column names to read where those names are not sequential? thanks

On Wed, Jul 21, 2010 at 5:58 PM, Aaron Morton aa...@thelastpickle.com wrote:

Take a look at the get_slice function http://wiki.apache.org/cassandra/API You could send one with a ColumnParent that only specifies the ColumnFamily and a SlicePredicate with a SliceRange where the start and finish values are empty strings. Set the count to an appropriate level to get them all (e.g. 1000) or make multiple calls. Aaron

On 22 Jul 2010, at 12:05 PM, Patricio Echagüe patric...@gmail.com wrote:

Hi all, apologies beforehand if this question sounds a bit dumb, but I don't see what API I should use (if there is any). This is my model:

row key A:
  SC_UUID_1: Col1, Col2, Col3
  SC_UUID_2: Col1, Col2, Col3
  SC_UUID_3: Col1, Col2, Col3
  SC_UUID_4: Col1, Col2, Col3
  SC_UUID_5: Col1, Col2, Col3

SC_UUID(i) are random UUIDs. Is there any way to read in one call from row A the SC 2, 3, 4 with all their columns? The amount of columns in every SuperColumn is not big at all, normally less than 20. Thanks -- Patricio.-
Requesting data model suggestions
Hello, Although I've done a bit of reading about Cassandra's data model and I've set up a Cassandra pair, I'm still unsure as to what might be best for my purposes. Briefly, I've got a set of strings A, B, and C. If needed, A could be represented as an integer. Each A is associated with exactly one B or C (but not both). A also has a number of parameters associated with it which change over time. These changes, however, are reported with the B or C identifier. Currently, in MySQL, I have three tables, A', B', and C', each using A, B, and C as keys. When an update arrives, the code searches for the key (B or C) in the appropriate table, determines the associated A (using a foreign key), and then updates the values in the row of A' with key=A. Anyone have ideas about how to model this in Cassandra? Thanks! -- -Scott
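One common way to carry this over is to denormalize instead of using foreign keys: one column family maps each B or C identifier straight to its A, and a second holds A's mutable parameters. A minimal sketch of that update path in plain Java, with maps standing in for the two column families; all names are illustrative, not from the question:

```java
import java.util.*;

// Sketch of the two-CF model: resolve A from the B/C identifier, then write.
public class LookupSketch {
    // "Lookup" CF: row key = B or C identifier, single column -> A
    private final Map<String, String> lookup = new HashMap<>();
    // "Params" CF: row key = A, columns = parameter name -> value
    private final Map<String, Map<String, String>> params = new HashMap<>();

    public void associate(String bOrC, String a) {
        lookup.put(bOrC, a);
    }

    // An update arrives keyed by B or C: one read to resolve A, one write.
    public void update(String bOrC, String param, String value) {
        String a = lookup.get(bOrC);
        if (a == null) throw new NoSuchElementException("unknown id: " + bOrC);
        params.computeIfAbsent(a, k -> new HashMap<>()).put(param, value);
    }

    public String get(String a, String param) {
        return params.getOrDefault(a, Collections.emptyMap()).get(param);
    }
}
```

Since B and C never collide with both mapping to one A, they can share the lookup CF, which replaces the "search two tables" step from the MySQL version with a single get.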
Re: Script 'hangs' when i stop 1 cassandra node (of 4 nodes)
I think you need to narrow down your problem before we can help. :)

On Tue, Jul 20, 2010 at 7:03 AM, Pieter Maes maesc...@gmail.com wrote:

Hi, I'm currently using Cassandra 0.6.3 with PHP Thrift (svn r959516) via the phpcassa wrapper (latest git plus a fix of mine for some strange timeouts). (Yeah, I use PHP, don't shoot me for it.) (I also mailed that mailing list, but no answer yet from there.) I was running my migration script, a PHP script that fetches data from MySQL and adds it to Cassandra (it checks whether the data exists before each insert). When I stop 1 of the 4 nodes (replication factor 3), the script just hangs; I don't get any exceptions or timeouts. Anyone have any idea, or how I can debug this? (The bug is very easy to replicate.) Best regards, Pieter Maes

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Cassandra Chef recipe and EC2 snitch
Hi all, I'm setting up a new cluster on EC2 for the first time and looking at the wiki cloud setup page (http://wiki.apache.org/cassandra/CloudConfig). There's a chef recipe linked there that mentions an ec2snitch. The link doesn't seem to go where it says it does. Does anyone know where those resources have gone or are they no longer available? Thanks -Allan
Re: Cassandra Chef recipe and EC2 snitch
You don't need the ec2snitch necessarily. AFAIK, it's meant to be a better way of detecting where your EC2 instances are. But unless you're popping instances all the time, I don't think it's worth it. Check out the step-by-step guide on that same page; it's pure EC2 API calls to set up your cluster. You can also use rack-awareness in EC2: just add in the PropertyFile endpoint and put your rack file in /etc/cassandra/rack.properties. Dave Viner

On Thu, Jul 22, 2010 at 10:08 AM, Allan Carroll alla...@gmail.com wrote:

Hi all, I'm setting up a new cluster on EC2 for the first time and looking at the wiki cloud setup page (http://wiki.apache.org/cassandra/CloudConfig). There's a chef recipe linked there that mentions an ec2snitch. The link doesn't seem to go where it says it does. Does anyone know where those resources have gone, or are they no longer available? Thanks -Allan
Re: Is it possible to read from one row, several SuperColumns with all their Columns in one call?
Hey thanks Aaron. It finally worked. The API reference looked a bit confusing to me. I used (as you suggested):

ColumnParent parent = new ColumnParent(<ColumnFamily name>);
SlicePredicate sp = new SlicePredicate();
sp.setColumn_names(<list of super column names>);

After calling get_slice(), as the result I got the super columns I passed in the list, with ALL their columns, in one call. So now, just if I wanted to: would it be possible to get just a subset of the columns contained in those supercolumns instead of all of them, assuming the number of columns is large and for performance reasons I want to avoid moving a lot of data over the network?

2010/7/22 Patricio Echagüe patric...@gmail.com:

Hi Aaron, the problem I have is that those UUIDs are random numbers. 2, 3, 4 are not sequential, unfortunately. I don't think there is an API like multiget_slice for keys but for super column names. Is there any other way to specify a list of super column names to read where those names are not sequential? thanks

On Wed, Jul 21, 2010 at 5:58 PM, Aaron Morton aa...@thelastpickle.com wrote:

Take a look at the get_slice function http://wiki.apache.org/cassandra/API You could send one with a ColumnParent that only specifies the ColumnFamily and a SlicePredicate with a SliceRange where the start and finish values are empty strings. Set the count to an appropriate level to get them all (e.g. 1000) or make multiple calls. Aaron

On 22 Jul 2010, at 12:05 PM, Patricio Echagüe patric...@gmail.com wrote:

Hi all, apologies beforehand if this question sounds a bit dumb, but I don't see what API I should use (if there is any). This is my model:

row key A:
  SC_UUID_1: Col1, Col2, Col3
  SC_UUID_2: Col1, Col2, Col3
  SC_UUID_3: Col1, Col2, Col3
  SC_UUID_4: Col1, Col2, Col3
  SC_UUID_5: Col1, Col2, Col3

SC_UUID(i) are random UUIDs. Is there any way to read in one call from row A the SC 2, 3, 4 with all their columns? The amount of columns in every SuperColumn is not big at all, normally less than 20.
Thanks -- Patricio.-
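On the follow-up question: as far as I know, in the 0.6 Thrift API a SlicePredicate applies at one level at a time, so to also restrict the sub-columns you would set ColumnParent.super_column and issue one get_slice per super column. Here is a small sketch of that two-level filtering in plain Java (maps standing in for a super column family; not the Thrift API, all names illustrative):

```java
import java.util.*;

// Models "named super columns, then named sub-columns within each":
// one pass per super column name, keeping only the requested columns.
public class SubColumnSliceSketch {
    // row: superColumnName -> (columnName -> value)
    private final Map<String, Map<String, String>> row = new LinkedHashMap<>();

    public void put(String sc, String col, String value) {
        row.computeIfAbsent(sc, k -> new LinkedHashMap<>()).put(col, value);
    }

    // Equivalent of one get_slice per named super column, with a
    // column-name predicate applied to the sub-columns of each.
    public Map<String, Map<String, String>> sliceByNames(List<String> scNames,
                                                         List<String> colNames) {
        Map<String, Map<String, String>> out = new LinkedHashMap<>();
        for (String sc : scNames) {
            Map<String, String> cols = row.get(sc);
            if (cols == null) continue;
            Map<String, String> kept = new LinkedHashMap<>();
            for (String c : colNames) {
                if (cols.containsKey(c)) kept.put(c, cols.get(c));
            }
            out.put(sc, kept);
        }
        return out;
    }
}
```

With fewer than 20 columns per super column, as in the model above, the saving from sub-column filtering is likely small compared to the extra round trips.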
Re: CRUD test
Yes, and that was the issue. But now, after I delete a row from cassandra-cli, I cannot insert anything back with my code. The insert code does not throw any exceptions, but when I read the just-inserted columns I get a NotFoundException at the last line:

client = borrowClient();
Keyspace keyspace = client.getKeyspace(KEYSPACE, CONSISTENCY_LEVEL);
ColumnPath cp = new ColumnPath(application);
cp.setSuper_column(uuid.getBytes());
SuperColumn sc = keyspace.getSuperColumn(category, cp);

It makes me think that once I remove a supercolumn, it cannot be created again.

On Jul 22, 2010 1:13 AM, Colin Vipurs zodiac...@gmail.com wrote: Have you checked the timestamp you're using for the subsequent inserts is higher than that used in the delete? On Thu, Jul 22, 2010 at 2:29 AM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote: Hi there, I'm try...
RE: CRUD test
I am able to reproduce his problem. If you take the default storage-conf.xml file and use the Super2 ColumnFamily with the code below, you will see that the data is not getting created once you run the delete. It seems not to allow you to create data via Thrift. HOWEVER, data can be created via the command-line tool.

import java.io.UnsupportedEncodingException;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.InvalidRequestException;
import org.apache.cassandra.thrift.NotFoundException;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.thrift.SuperColumn;
import org.apache.cassandra.thrift.TimedOutException;
import org.apache.cassandra.thrift.UnavailableException;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class CrudTest {
    private static final String KEYSPACE = "Keyspace1";

    public static void main(String[] args) {
        CrudTest client = new CrudTest();
        try {
            client.run();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void run() throws TException, InvalidRequestException, UnavailableException,
            UnsupportedEncodingException, NotFoundException, TimedOutException {
        TTransport tr = new TSocket("localhost", 9160);
        TProtocol proto = new TBinaryProtocol(tr);
        Cassandra.Client client = new Cassandra.Client(proto);
        tr.open();

        System.out.println("***** CREATING DATA *****");
        createData(client);
        getData(client);
        System.out.println();

        System.out.println("***** DELETING DATA *****");
        deleteData(client);
        getData(client);
        System.out.println();

        System.out.println("***** CREATING DATA *****");
        createData(client);
        getData(client);

        tr.close();
    }

    private void createData(Cassandra.Client client) throws InvalidRequestException,
            UnavailableException, TimedOutException, TException {
        ColumnPath cp1 = new ColumnPath("Super2");
        cp1.setSuper_column("hotel".getBytes());
        cp1.setColumn("Best Western".getBytes());
        client.insert(KEYSPACE, "name", cp1, "Best Western of SF".getBytes(),
                System.currentTimeMillis(), ConsistencyLevel.ALL);

        ColumnPath cp2 = new ColumnPath("Super2");
        cp2.setSuper_column("hotel".getBytes());
        cp2.setColumn("Econolodge".getBytes());
        client.insert(KEYSPACE, "name", cp2, "Econolodge of SF".getBytes(),
                System.currentTimeMillis(), ConsistencyLevel.ALL);
    }

    private void deleteData(Cassandra.Client client) throws InvalidRequestException,
            UnavailableException, TimedOutException, TException {
        client.remove(KEYSPACE, "hotel", new ColumnPath("Super2"),
                System.currentTimeMillis(), ConsistencyLevel.ONE);
    }

    private void getData(Cassandra.Client client) throws InvalidRequestException,
            UnavailableException, TimedOutException, TException {
        SliceRange sliceRange = new SliceRange();
        sliceRange.setStart(new byte[] {});
        sliceRange.setFinish(new byte[] {});
        SlicePredicate slicePredicate = new SlicePredicate();
        slicePredicate.setSlice_range(sliceRange);
        getData(client, slicePredicate);
    }

    private void getData(Cassandra.Client client, SlicePredicate slicePredicate)
            throws InvalidRequestException, UnavailableException, TimedOutException, TException {
Re: Re: Re: What is consuming the heap?
The version we are using is 0.6.1

2010-07-23
From: 王一锋
Sent: 2010-07-23 09:38:15
To: user
Subject: Re: Re: Re: What is consuming the heap?

Yes, we are doing a lot of inserts. But how can CASSANDRA-1042 cause an OutOfMemory? And we are using multigetSlice(). We are not doing any get_range_slice() at all.

2010-07-23
From: Jonathan Ellis
Sent: 2010-07-21 21:17:21
To: user
Subject: Re: Re: What is consuming the heap?

On Tue, Jul 20, 2010 at 11:33 PM, Peter Schuller peter.schul...@infidyne.com wrote:

INFO [GC inspection] 2010-07-21 01:01:49,661 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 11748 ms, 413673472 reclaimed leaving 9779542600 used; max is 10873667584
ERROR [Thread-35] 2010-07-21 01:02:10,941 CassandraDaemon.java (line 78) Fatal exception in thread Thread[Thread-35,5,main] java.lang.OutOfMemoryError: Java heap space
INFO [GC inspection] 2010-07-21 01:02:10,958 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 10043 ms, 259576 reclaimed leaving 10172790816 used; max is 10873667584

So that confirms a legitimate out-of-memory condition, in the sense that CMS is reclaiming extremely little and the live set after a concurrent mark/sweep is indeed around 10 GB.

Are you doing a lot of inserts? You might be hitting https://issues.apache.org/jira/browse/CASSANDRA-1042

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
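If heavy inserts turn out to be the cause, one mitigation commonly suggested for the 0.6 series is lowering the memtable thresholds so flushes happen before the heap fills. A sketch of the relevant storage-conf.xml knobs (element names and defaults as I recall them for 0.6, so check against your own config file):

```xml
<!-- storage-conf.xml (0.6): smaller memtables flush sooner, trading more
     SSTables and compaction work for lower peak heap usage under
     insert-heavy load. Halved from the usual defaults as an example. -->
<MemtableThroughputInMB>32</MemtableThroughputInMB>
<MemtableOperationsInMillions>0.15</MemtableOperationsInMillions>
<MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
```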