Re: CRUD test

2010-07-22 Thread Colin Vipurs
Have you checked the timestamp you're using for the subsequent inserts
is higher than that used in the delete?
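
To illustrate (a rough sketch against the 0.6 Thrift API; the keyspace,
column family, and key names here are made up, and "client" is a connected
Cassandra.Client):

    // Deletes and writes are reconciled purely by timestamp, and on a tie
    // the delete wins, so a later insert must carry a strictly higher
    // timestamp than the tombstone in order to become visible again.
    long ts = System.currentTimeMillis();
    ColumnPath path = new ColumnPath("Standard1");
    path.setColumn("col".getBytes());
    client.remove("Keyspace1", "key1", new ColumnPath("Standard1"), ts, ConsistencyLevel.ONE);
    client.insert("Keyspace1", "key1", path, "v".getBytes(), ts,
                  ConsistencyLevel.ONE);      // never visible: a tie loses to the tombstone
    client.insert("Keyspace1", "key1", path, "v".getBytes(), ts + 1,
                  ConsistencyLevel.ONE);      // visible: strictly newer than the delete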

On Thu, Jul 22, 2010 at 2:29 AM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
 Hi there,
 I'm trying to implement a simple CRUD service based on Cassandra. I use the
 Hector client.
 While writing tests, I found out that if I create a few columns using the API,
 then delete them from cassandra-cli and re-create them using the same
 code (same key, etc.), I can never get these new columns back using
 cassandra-cli. I tried setting different consistency levels, but it did not
 change anything. I am never able to insert to these columns again from my
 code, although cassandra-cli can insert them.
 I thought it might have something to do with eventual consistency, but even
 after waiting hours, nothing changes.
 I have only one node (one Cassandra server) running on 64-bit Ubuntu, if it
 matters. I added my keyspace and a couple of column families pretty much
 following the defaults in storage-conf.xml.
 Thanks,
   Oleg




-- 
Maybe she awoke to see the roommate's boyfriend swinging from the
chandelier wearing a boar's head.

Something which you, I, and everyone else would call Tuesday, of course.


Re: goods search with cassandra

2010-07-22 Thread Chen Xinli
Thanks for your suggestion.

Does it work if I insert through the Thrift client and read through
Cassandra directly, as in ClientOnlyExample?

2010/7/21 Santal Li santal...@gmail.com

 I think building a ColumnValueFilter isn't a good idea; what you really need
 is a self-defined index, otherwise the filter will cause too many scans and too much disk IO.

 We have met almost the same problem as yours in our own webapp: store data in
 one field, then get data by searching on another field. Our solution is to
 create a new keyspace for the index, then maintain the index by query
 conditions in the application. I suggest you read this document to get the
 basic idea:
 http://code.google.com/intl/zh-CN/appengine/articles/index_building.html .

 If you use this solution, you may need to consider the following issues:
 1. multi-client concurrent access
 2. the index and object data may become inconsistent after an error.

 Some kind of lock service may help, like ZooKeeper.
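
 As a rough illustration (a sketch only, against the 0.6 Thrift API; the
 keyspace and column family names are invented, and "client" is a connected
 Cassandra.Client):

     // Write the object, then write an index row keyed by the searched-on
     // field, with the object key as the column name. If the second write
     // fails, the index and the data diverge; hence the lock-service idea.
     String goodsKey = "goods-123";
     String title = "some title";
     long ts = System.currentTimeMillis();

     ColumnPath data = new ColumnPath("Goods");
     data.setColumn("title".getBytes());
     client.insert("Shop", goodsKey, data, title.getBytes(), ts, ConsistencyLevel.QUORUM);

     ColumnPath idx = new ColumnPath("GoodsByTitle");
     idx.setColumn(goodsKey.getBytes());
     client.insert("Shop", title, idx, new byte[0], ts, ConsistencyLevel.QUORUM);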

 Regards
 -Santal



 2010/7/19 Chen Xinli chen.d...@gmail.com

 Hi,

 I want to implement goods search with Cassandra, and I have some
 questions. Can someone help me out?

 The case is that:
 There are about 1 million shops, each shop with about 10,000 goods, and each
 good has properties like title, price, etc.
 A search looks like: give me 10 goods in a specific shop where the price of
 the goods is less than $10.

 For the data model, I use the shop name as the key, the goods id as the column
 name, and the title and price specially encoded as the column value.
 There are too many goods in one shop; filtering the data in the Thrift client
 is impossible because of the network transfer involved.
 I want to implement a special ColumnValueFilter extending QueryFilter to get
 the result locally.
 Is this the best way?


 Insertion of goods is about 100/second for the whole cluster, so a Thrift
 client for insertion is OK.
 For reads, latency and QPS are important, and I must provide an HTTP service
 for user searching.
 Embedding a Thrift client in such a service would involve another network
 transfer, so I want to build the service on top of Cassandra directly.
 I reviewed the code of ClientOnlyExample.java.
 What confuses me is this: with insertion through the Thrift client and
 reading through Cassandra directly, is data consistency guaranteed, and
 how?

 Any help is appreciated. Thanks!

 --
 Best Regards,
 Chen Xinli





-- 
Best Regards,
Chen Xinli


Re: Cassandra committing massive virtual memory

2010-07-22 Thread Peter Schuller
 We are seeing cassandra using very high virtual memory. One server in the
 cluster shows 90GB and the other shows about 70GB of committed virtual
 memory.
 The real memory used is less than 1GB. The Xmx is 4GB. The physical memory
 on the machine is 32GB and the swap space on the machine is around 20GB.

 I understand that the virtual memory is not really used by the process and
 is just reserved. My question is: why does Cassandra need to reserve so much
 memory? What is the logic used internally? How is it related to the usage
 pattern or data storage pattern? Should we be concerned?

Virtual size in and of itself is not a concern because it is mmap():ed
address space. AFAIK you may see sizes up to the entire database size.

What people have had issues with is swapping resulting from the
operating system considering mmap():ed area contributing to memory
pressure and swapping out the Java heap. But unless you have issues
like that, I don't believe the virtual address space size in and of
itself should be a concern.
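
If you want to confirm this, something like pmap on Linux should show that
the big mappings are file-backed SSTables rather than anonymous heap. A rough
example (assuming a single Cassandra process on the host):

    pmap -x $(pgrep -f CassandraDaemon) | sort -n -k2 | tail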

-- 
/ Peter Schuller


Re: Bootstrap question

2010-07-22 Thread Gary Dusbabek
On Wed, Jul 21, 2010 at 14:14, Anthony Molinaro
antho...@alumni.caltech.edu wrote:
 Sure, looks like that's in 0.6.4, so I'll probably just rebuild my server
 based on the 0.6 branch, unless you want me to test just the patch for
 1221?  Most likely won't get a chance to try until tomorrow, so let me
 know.


Either way works for me.


Re: Cassandra committing massive virtual memory

2010-07-22 Thread Amit Sinha
The DiskAccessMode is set to auto.
I am going to try standard and see, but I am concerned about how much that
would hurt performance.
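
For reference, I believe this is controlled by a top-level element in the
0.6-era storage-conf.xml (valid values, as far as I know: auto, mmap,
mmap_index_only, standard):

    <!-- assumed location/format; check your own storage-conf.xml -->
    <DiskAccessMode>standard</DiskAccessMode>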

thx
Amit

On Wed, Jul 21, 2010 at 3:59 PM, Aaron Morton aa...@thelastpickle.com wrote:

 Are you using mmap or auto DiskAccessMode ?

 There is a known issue with memory-mapped file access taking up a lot of
 memory. See CASSANDRA-1214
 (https://issues.apache.org/jira/browse/CASSANDRA-1214); there is also some
 discussion on the mailing list.

 Try setting the DiskAccessMode to standard.

 Aaron


 On 22 Jul, 2010, at 10:37 AM, Amit Sinha amitabhs2...@gmail.com wrote:

 We are seeing cassandra using very high virtual memory. One server in the
 cluster shows 90GB and the other shows about 70GB of committed virtual
 memory.
 The real memory used is less than 1GB. The Xmx is 4GB. The physical memory
 on the machine is 32GB and the swap space on the machine is around 20GB.

 I understand that the virtual memory is not really used by the process and
 is just reserved. My question is: why does Cassandra need to reserve so much
 memory? What is the logic used internally? How is it related to the usage
 pattern or data storage pattern? Should we be concerned?

 thanks
 Amit






-- 
thanks
Amitabh


Lucene CassandraDirectory Implementation

2010-07-22 Thread Utku Can Topçu
Hi All,

I was browsing through the Lucene JIRA and came across the issue named "A
Column-Oriented Cassandra-Based Lucene Directory" at
https://issues.apache.org/jira/browse/LUCENE-2456.

Has anyone had a chance to test it? If so, do you think it's an efficient
implementation as a replacement for the FSDirectory?

Best Regards,

Utku


Re: Is it possible to read from one row, several SuperColumns with all their Columns in one call?

2010-07-22 Thread Patricio Echagüe
Hi Aaron, the problem I have is that those UUIDs are random numbers.
2, 3, 4 are not sequential, unfortunately. I don't think there is an API
like multiget_slice, but keyed on super column names instead of row keys.

Is there any other way to specify a list of super column names to read
where those names are not sequential?

thanks


On Wed, Jul 21, 2010 at 5:58 PM, Aaron Morton aa...@thelastpickle.com wrote:
 Take a look at the get_slice function: http://wiki.apache.org/cassandra/API

 You could send one with a ColumnParent that only specifies the ColumnFamily
 and a SlicePredicate with a SliceRange where the start and finish values are
 empty strings. Set the count to an appropriate level to get them all (e.g.
 1000) or make multiple calls.


 Aaron


 On 22 Jul, 2010, at 12:05 PM, Patricio Echagüe patric...@gmail.com wrote:

 Hi all, apologies beforehand if this question sounds a bit dumb, but I
 don't see which API I should use (if there is any).

 this is my model:

 row key A:
 SC_UUID_1:
 Col1, Col2, Col3
 SC_UUID_2:
 Col1, Col2, Col3
 SC_UUID_3:
 Col1, Col2, Col3
 SC_UUID_4:
 Col1, Col2, Col3
 SC_UUID_5:
 Col1, Col2, Col3

 SC_UUID(i) are random UUIDs.

 Is there any way to read, in one call, the SCs 2, 3, and 4 from row A with all
 their columns? The number of columns in each SuperColumn is not big
 at all, normally less than 20.

 Thanks
 --
 Patricio-




-- 
Patricio.-


Requesting data model suggestions

2010-07-22 Thread Scott Mann
Hello,

Although I've done a bit of reading about Cassandra's data model and
have set up a Cassandra pair, I'm still unsure what might be
best for my purposes.

Briefly, I've got a set of strings A, B, and C. If needed, A could be
represented as an integer. Each A is associated with exactly one B or
C (but not both). A also has a number of parameters associated with it
which change over time. These changes, however, are reported with the
B or C identifier.

Currently, in MySQL, I have three tables, A', B', and C', each using A,
B, and C as keys. When an update arrives, the code searches for the
key (B or C) in the appropriate table, determines the associated A
(using a foreign key), and then updates the values in the row of
A' with key=A.
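
The most literal translation I can imagine (just a sketch against the 0.6
Thrift API; the keyspace and column family names are invented, bOrC,
paramName, and newValue are placeholders, and "client" is a connected
Cassandra.Client) would be a lookup column family plus a parameters column
family keyed by A:

    // Resolve A from the reporting identifier (B or C) via a lookup row,
    // then update the changed parameter in A's row.
    ColumnPath ref = new ColumnPath("Lookup");      // row key = B or C
    ref.setColumn("a".getBytes());                  // single column holding A
    ColumnOrSuperColumn hit = client.get("Keyspace1", bOrC, ref, ConsistencyLevel.QUORUM);
    String a = new String(hit.getColumn().getValue());

    ColumnPath param = new ColumnPath("AParams");   // row key = A, one column per parameter
    param.setColumn(paramName.getBytes());
    client.insert("Keyspace1", a, param, newValue.getBytes(),
                  System.currentTimeMillis(), ConsistencyLevel.QUORUM);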

Anyone have ideas about how to model this in Cassandra?

Thanks!

-- 
-Scott


Re: Script 'hangs' when i stop 1 cassandra node (of 4 nodes)

2010-07-22 Thread Jonathan Ellis
I think you need to narrow down your problem before we can help. :)

On Tue, Jul 20, 2010 at 7:03 AM, Pieter Maes maesc...@gmail.com wrote:
  Hi,

 I'm currently using Cassandra 0.6.3 with PHP Thrift (svn r959516) via the
 phpcassa wrapper (latest git, plus a fix of mine for some strange
 timeouts). (Yeah, I use PHP, don't shoot me for it.)
 (I also mailed that mailing list, but no answer yet from there.)

 I was running my migration script: one PHP script that fetches data
 from MySQL and adds it to Cassandra (it first checks whether the data
 exists before each insert).

 When I stop 1 of the 4 nodes (replication factor 3),
 the script just hangs.
 I don't get any exceptions or timeouts.

 Anyone have any idea?
 Or how can I debug this? (The bug is very easy to replicate.)

 Best regards
 Pieter Maes




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Cassandra Chef recipe and EC2 snitch

2010-07-22 Thread Allan Carroll
Hi all, 

I'm setting up a new cluster on EC2 for the first time and looking at the wiki 
cloud setup page (http://wiki.apache.org/cassandra/CloudConfig). There's a chef 
recipe linked there that mentions an ec2snitch. The link doesn't seem to go 
where it says it does. Does anyone know where those resources have gone or are 
they no longer available?

Thanks
-Allan

Re: Cassandra Chef recipe and EC2 snitch

2010-07-22 Thread Dave Viner
You don't necessarily need the ec2snitch.  AFAIK, it's meant to be a
better way of detecting where your EC2 instances are.  But unless you're
popping instances all the time, I don't think it's worth it.

Check out the step-by-step guide on that same page.  Pure EC2 API calls to
set up your cluster.

You can also use rack-awareness in EC2.  Just add the PropertyFile
endpoint and put your rack file in /etc/cassandra/rack.properties.
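
As a rough example, I believe the property file snitch expects one
ip=datacenter:rack line per node, along these lines (format assumed, so
double-check against the snitch's own docs):

    # /etc/cassandra/rack.properties -- assumed format
    10.0.1.10=DC1:RAC1
    10.0.1.11=DC1:RAC2
    default=DC1:RAC1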

Dave Viner

On Thu, Jul 22, 2010 at 10:08 AM, Allan Carroll alla...@gmail.com wrote:

 Hi all,

 I'm setting up a new cluster on EC2 for the first time and looking at the
 wiki cloud setup page (http://wiki.apache.org/cassandra/CloudConfig).
 There's a chef recipe linked there that mentions an ec2snitch. The link
 doesn't seem to go where it says it does. Does anyone know where those
 resources have gone or are they no longer available?

 Thanks
 -Allan


Re: Is it possible to read from one row, several SuperColumns with all their Columns in one call?

2010-07-22 Thread Patricio Echagüe
Hey, thanks Aaron. It finally worked. The API reference looked a bit
confusing to me.

I used (as you suggested):

ColumnParent parent = new ColumnParent(columnFamilyName);
SlicePredicate sp = new SlicePredicate();
sp.setColumn_names(superColumnNames);  // the list of super column names

After calling get_slice(), I got as the result the super columns I
passed in the list, with ALL their columns, in one call.
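
Spelled out end to end, it looks roughly like this (raw 0.6 Thrift rather
than Hector, with placeholder names; "client" is a connected
Cassandra.Client):

    ColumnParent parent = new ColumnParent("Super1");   // the super column family
    SlicePredicate sp = new SlicePredicate();
    sp.setColumn_names(superColumnNames);               // List<byte[]> of SC names; random UUIDs are fine
    List<ColumnOrSuperColumn> result =
        client.get_slice("Keyspace1", rowKeyA, parent, sp, ConsistencyLevel.QUORUM);
    for (ColumnOrSuperColumn cosc : result) {
        SuperColumn sc = cosc.getSuper_column();        // each SC comes back with all its columns
    }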

So now, just if I wanted to: would it be possible to get only a subset
of the columns contained in those super columns, instead of all of them?
Assuming the number of columns is large, for performance reasons I would
want to avoid moving a lot of data over the network.

2010/7/22 Patricio Echagüe patric...@gmail.com:
 Hi Aaron, the problem I have is that those UUIDs are random numbers.
 2, 3, 4 are not sequential, unfortunately. I don't think there is an API
 like multiget_slice, but keyed on super column names instead of row keys.

 Is there any other way to specify a list of super column names to read
 where those names are not sequential?

 thanks


 On Wed, Jul 21, 2010 at 5:58 PM, Aaron Morton aa...@thelastpickle.com wrote:
 Take a look at the get_slice function: http://wiki.apache.org/cassandra/API

 You could send one with a ColumnParent that only specifies the ColumnFamily
 and a SlicePredicate with a SliceRange where the start and finish values are
 empty strings. Set the count to an appropriate level to get them all (e.g.
 1000) or make multiple calls.


 Aaron


 On 22 Jul, 2010, at 12:05 PM, Patricio Echagüe patric...@gmail.com wrote:

 Hi all, apologies beforehand if this question sounds a bit dumb, but I
 don't see which API I should use (if there is any).

 this is my model:

 row key A:
 SC_UUID_1:
 Col1, Col2, Col3
 SC_UUID_2:
 Col1, Col2, Col3
 SC_UUID_3:
 Col1, Col2, Col3
 SC_UUID_4:
 Col1, Col2, Col3
 SC_UUID_5:
 Col1, Col2, Col3

 SC_UUID(i) are random UUIDs.

 Is there any way to read, in one call, the SCs 2, 3, and 4 from row A with all
 their columns? The number of columns in each SuperColumn is not big
 at all, normally less than 20.

 Thanks
 --
 Patricio-




 --
 Patricio.-




-- 
Patricio.-


Re: CRUD test

2010-07-22 Thread Oleg Tsvinev
Yes, and that was the issue. But now, after I delete a row from
cassandra-cli, I cannot insert anything back with my code. The insert code
does not throw any exceptions, but when I read the just-inserted columns I
get a NotFoundException at the last line:

client = borrowClient();
Keyspace keyspace = client.getKeyspace(KEYSPACE, CONSISTENCY_LEVEL);
ColumnPath cp = new ColumnPath("application");             // the column family
cp.setSuper_column(uuid.getBytes());                       // uuid is the super column name
SuperColumn sc = keyspace.getSuperColumn("category", cp);  // row key; throws NotFoundException

It makes me think that once I remove a super column, it cannot be created again.


On Jul 22, 2010 1:13 AM, Colin Vipurs zodiac...@gmail.com wrote:

Have you checked the timestamp you're using for the subsequent inserts
is higher than that used in the delete?


On Thu, Jul 22, 2010 at 2:29 AM, Oleg Tsvinev oleg.tsvi...@gmail.com
wrote:
 Hi there,
 I'm try...
--
Maybe she awoke to see the roommate's boyfriend swinging from the
chandelier wearing a boar's head.

Something which you, I, and everyone else would call Tuesday, of course.


RE: CRUD test

2010-07-22 Thread Peter Minearo
I am able to reproduce his problem. Take the default storage-conf.xml
file and use the Super2 ColumnFamily with the code below. You will see
that the data is not getting created once you run the delete. It seems to
stop you from creating the data again via Thrift. HOWEVER, the data can be
created via the command-line tool.

import java.io.UnsupportedEncodingException;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.InvalidRequestException;
import org.apache.cassandra.thrift.NotFoundException;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.thrift.SuperColumn;
import org.apache.cassandra.thrift.TimedOutException;
import org.apache.cassandra.thrift.UnavailableException;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;


public class CrudTest {

    private static final String KEYSPACE = "Keyspace1";

    public static void main(String[] args) {
        CrudTest client = new CrudTest();

        try {
            client.run();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void run() throws TException, InvalidRequestException,
            UnavailableException, UnsupportedEncodingException, NotFoundException,
            TimedOutException {
        TTransport tr = new TSocket("localhost", 9160);
        TProtocol proto = new TBinaryProtocol(tr);
        Cassandra.Client client = new Cassandra.Client(proto);
        tr.open();

        System.out.println("***** CREATING DATA *****");
        createData(client);
        getData(client);
        System.out.println();
        System.out.println("***** DELETING DATA *****");
        deleteData(client);
        getData(client);
        System.out.println();
        System.out.println("***** CREATING DATA *****");
        createData(client);
        getData(client);

        tr.close();
    }

    private void createData(Cassandra.Client client) throws
            InvalidRequestException, UnavailableException, TimedOutException, TException {
        ColumnPath cp1 = new ColumnPath("Super2");
        cp1.setSuper_column("hotel".getBytes());
        cp1.setColumn("Best Western".getBytes());

        client.insert(KEYSPACE,
                      "name",                 // row key
                      cp1,
                      "Best Western of SF".getBytes(),
                      System.currentTimeMillis(),
                      ConsistencyLevel.ALL);

        ColumnPath cp2 = new ColumnPath("Super2");
        cp2.setSuper_column("hotel".getBytes());
        cp2.setColumn("Econolodge".getBytes());

        client.insert(KEYSPACE,
                      "name",                 // row key
                      cp2,
                      "Econolodge of SF".getBytes(),
                      System.currentTimeMillis(),
                      ConsistencyLevel.ALL);
    }

    private void deleteData(Cassandra.Client client) throws
            InvalidRequestException, UnavailableException, TimedOutException, TException {
        client.remove(KEYSPACE,
                      "hotel",                // row key
                      new ColumnPath("Super2"),
                      System.currentTimeMillis(),
                      ConsistencyLevel.ONE);
    }

    private void getData(Cassandra.Client client) throws
            InvalidRequestException, UnavailableException, TimedOutException, TException {
        SliceRange sliceRange = new SliceRange();
        sliceRange.setStart(new byte[] {});
        sliceRange.setFinish(new byte[] {});

        SlicePredicate slicePredicate = new SlicePredicate();
        slicePredicate.setSlice_range(sliceRange);

        getData(client, slicePredicate);
    }


    private void getData(Cassandra.Client client, SlicePredicate
            slicePredicate) throws InvalidRequestException, UnavailableException,
            TimedOutException, TException {
        // Assumed reconstruction: the rest of this method was truncated in
        // the archive; a minimal body that slices the row and prints it.
        List<ColumnOrSuperColumn> results = client.get_slice(KEYSPACE, "name",
                new ColumnParent("Super2"), slicePredicate, ConsistencyLevel.ONE);
        System.out.println(results);
    }
}

Re: Re: Re: What is consuming the heap?

2010-07-22 Thread 王一锋
The version we are using is 0.6.1

2010-07-23 







From: 王一锋
Sent: 2010-07-23 09:38:15
To: user
Cc:
Subject: Re: Re: Re: What is consuming the heap?
 
Yes, we are doing a lot of inserts.

But how can CASSANDRA-1042 cause an OutOfMemory?
And we are using multigetSlice(). We are not doing any get_range_slice() at all.

2010-07-23 







From: Jonathan Ellis
Sent: 2010-07-21 21:17:21
To: user
Cc:
Subject: Re: Re: What is consuming the heap?
On Tue, Jul 20, 2010 at 11:33 PM, Peter Schuller
peter.schul...@infidyne.com wrote:
  INFO [GC inspection] 2010-07-21 01:01:49,661 GCInspector.java (line 110) GC 
 for ConcurrentMarkSweep: 11748 ms, 413673472 reclaimed leaving 9779542600 
 used; max is 10873667584
 ERROR [Thread-35] 2010-07-21 01:02:10,941 CassandraDaemon.java (line 78) 
 Fatal exception in thread Thread[Thread-35,5,main]
 java.lang.OutOfMemoryError: Java heap space
  INFO [GC inspection] 2010-07-21 01:02:10,958 GCInspector.java (line 110) GC 
 for ConcurrentMarkSweep: 10043 ms, 259576 reclaimed leaving 10172790816 
 used; max is 10873667584

 So that confirms a legitimate out-of-memory condition in the sense
 that CMS is reclaiming extremely little and the live set after a
 concurrent mark/sweep is indeed around the 10 gig.
Are you doing a lot of inserts?  You might be hitting
https://issues.apache.org/jira/browse/CASSANDRA-1042
-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com