Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread Ananth Gundabattula
Thanks a lot Aaron and Edward. The mail thread clarifies some things for me. To let others know on this thread: running an upgradesstables did decrease our bloom filter false positive ratios a lot. (upgradesstables was run not to upgrade from a cassandra version to a higher cassandra versio

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread Edward Capriolo
On Tue, Nov 20, 2012 at 5:23 PM, aaron morton wrote: > My understanding of the compaction process was that since data files keep > continuously merging we should not have data files with very old last > modified timestamps > > It is perfectly OK to have very old SSTables. > > But performing an upg

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread aaron morton
> My understanding of the compaction process was that since data files keep > continuously merging we should not have data files with very old last > modified timestamps It is perfectly OK to have very old SSTables. > But performing an upgradesstables did decrease the number of data files and

Re: Query regarding SSTable timestamps and counts

2012-11-19 Thread Rob Coli
On Sun, Nov 18, 2012 at 7:57 PM, Ananth Gundabattula wrote: > As per the above url, " After running a major compaction, automatic minor > compactions are no longer triggered, frequently requiring you to manually > run major compactions on a routine basis." ( Just before the heading Tuning > Column

Re: Collections, query for "contains"?

2012-11-19 Thread Sylvain Lebresne
ateral view syntax that is similar to > transposed. > > > On Monday, November 19, 2012, Timmy Turner wrote: > > Is there no option to query for the contents of a collection? > > Something like > > select * from cf where c_list contains('some_value') > >
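
For later readers: at the time of this thread no such operator existed; a collection-contains predicate of the shape Timmy asks about was only added later (Cassandra 2.1, and it needs an index on the collection). A hedged sketch, reusing cf and c_list from the question:

    CREATE INDEX ON cf (c_list);
    SELECT * FROM cf WHERE c_list CONTAINS 'some_value';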

Re: Collections, query for "contains"?

2012-11-19 Thread Edward Capriolo
This was my first question after I got the inserts working. Hive has udfs like array contains. It also has lateral view syntax that is similar to transposed. On Monday, November 19, 2012, Timmy Turner wrote: > Is there no option to query for the contents of a collection? > Somethin

Re: Query regarding SSTable timestamps and counts

2012-11-18 Thread Ananth Gundabattula
Hello Aaron, Thanks a lot for the reply. Looks like the documentation is confusing. Here is the link I am referring to: http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction > It does not disable compaction. As per the above url, " After running a major compaction, automatic min

Re: Query regarding SSTable timestamps and counts

2012-11-18 Thread aaron morton
> As per datastax documentation, a manual compaction forces the admin to start > compaction manually and disables the automated compaction (at least for major > compactions but not minor compactions) It does not disable compaction. It creates one big file, which will not be compacted until there

Re: Strange delay in query

2012-11-13 Thread aaron morton
Minor compactions will still be triggered whenever a size tier gets 4+ sstables (for the default compaction strategy). So it does not affect new data. It just takes longer for the biggest size tier to get to 4 files. So it takes longer to compact the big output from the major compaction. Assu

Re: Strange delay in query

2012-11-13 Thread J. D. Jordan
Correct On Nov 13, 2012, at 5:21 AM, André Cruz wrote: > On Nov 13, 2012, at 8:54 AM, aaron morton wrote: > >>> I don't think that statement is accurate. >> Which part ? > > Probably this part: > "After running a major compaction, automatic minor compactions are no longer > triggered, freque

Re: Strange delay in query

2012-11-13 Thread André Cruz
On Nov 13, 2012, at 8:54 AM, aaron morton wrote: >> I don't think that statement is accurate. > Which part ? Probably this part: "After running a major compaction, automatic minor compactions are no longer triggered, frequently requiring you to manually run major compactions on a routine basis

Re: Strange delay in query

2012-11-13 Thread aaron morton
> I don't think that statement is accurate. Which part ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 13/11/2012, at 6:31 AM, Binh Nguyen wrote: > I don't think that statement is accurate. The minor compaction is still > triggered for

Re: Strange delay in query

2012-11-12 Thread Binh Nguyen
I don't think that statement is accurate. The minor compaction is still triggered for small sstables but for the big sstables it may or may not. By default Cassandra will wait until it finds 4 sstables of the same size to trigger the compaction so if the sstables are big then it may take a while to

Re: Strange delay in query

2012-11-11 Thread aaron morton
If you have a long-lived row with a lot of tombstones or overwrites, it's often more efficient to select a known list of columns. There are short circuits in the read path that can avoid reading older, tombstone-filled fragments of the row. (Obviously this is hard to do if you don't know the

Re: Strange delay in query

2012-11-11 Thread André Cruz
On Nov 11, 2012, at 12:01 AM, Binh Nguyen wrote: > FYI: Repair does not remove tombstones. To remove tombstones you need to run > compaction. > If you have a lot of data then make sure you run compaction on all nodes > before running repair. We had a big trouble with our system regarding > tom

Re: Strange delay in query

2012-11-10 Thread Binh Nguyen
FYI: Repair does not remove tombstones. To remove tombstones you need to run compaction. If you have a lot of data then make sure you run compaction on all nodes before running repair. We had big trouble with our system regarding tombstones and it took us a long time to figure out the reason. It tur

Re: Strange delay in query

2012-11-09 Thread André Cruz
That must be it. I dumped the sstables to json and there are lots of records, including ones that are returned to my application, that have the deletedAt attribute. I think this is because the regular repair job was not running for some time, surely more than the grace period, and lots of tombst

Re: Strange delay in query

2012-11-08 Thread Josep Blanquer
Can it be that you have tons and tons of tombstoned columns in the middle of these two? I've seen plenty of performance issues with wide rows littered with column tombstones (you could check with dumping the sstables...) Just a thought... Josep M. On Thu, Nov 8, 2012 at 12:23 PM, André Cruz wro

Re: Strange delay in query

2012-11-08 Thread André Cruz
These are the two columns in question: => (super_column=13957152-234b-11e2-92bc-e0db550199f4, (column=attributes, value=, timestamp=1351681613263657) (column=blocks, value=A4edo5MhHvojv3Ihx_JkFMsF3ypthtBvAZkoRHsjulw06pez86OHch3K3OpmISnDjHODPoCf69bKcuAZSJj-4Q, timestamp=1351681613263657

Re: Strange delay in query

2012-11-08 Thread Andrey Ilinykh
What is the size of columns? Probably those two are huge. On Thu, Nov 8, 2012 at 4:01 AM, André Cruz wrote: > On Nov 7, 2012, at 12:15 PM, André Cruz wrote: > > > This error also happens on my application that uses pycassa, so I don't > think this is the same bug. > > I have narrowed it down t

Re: Strange delay in query

2012-11-08 Thread André Cruz
On Nov 7, 2012, at 12:15 PM, André Cruz wrote: > This error also happens on my application that uses pycassa, so I don't think > this is the same bug. I have narrowed it down to a slice between two consecutive columns. Observe this behaviour using pycassa: >>> DISCO_CASS.col_fam_nsrev.get(uui

Re: Strange delay in query

2012-11-07 Thread André Cruz
his issue on all 3 nodes. Also, I have a replication factor of 3. > 2. What's the result when query without limit? This row has 600k columns. I issued a count, and after some 10s: [disco@Disco] count NamespaceRevision[3cd88d97-ffde-44ca-8ae9-5336caaebc4e]; 609054 columns > 3. What'

Re: Strange delay in query

2012-11-06 Thread Chuan-Heng Hsiao
ult when query without limit? 3. What's the result after doing nodetool repair -pr on that particular column family and that node? btw, there seems to be some minor bug in the 1.1.5 cassandra-cli (but not in 1.1.6). I got error msg after creating an empty keyspace and updating the replicatio

Re: Strange delay in query

2012-11-06 Thread André Cruz
think that fetching the first 34 columns would be fast, and just a little bit slower than 33 columns, but this is a big difference. Thank you and best regards, André Cruz On Nov 6, 2012, at 2:43 PM, André Cruz wrote: > Hello. > > I have a SCF that is acting strange. See these 2 query

Strange delay in query

2012-11-06 Thread André Cruz
Hello. I have a SCF that is acting strange. See these 2 query times: get NamespaceRevision[3cd88d97-ffde-44ca-8ae9-5336caaebc4e] limit 33; ... Returned 33 results. Elapsed time: 41 msec(s). get NamespaceRevision[3cd88d97-ffde-44ca-8ae9-5336caaebc4e] limit 34; ... Returned 34 results. Elapsed

Re: How does Cassandra optimize this query?

2012-11-05 Thread Sylvain Lebresne
> "misleading". > > > > Yes. Bingo. > > > > It is misleading because it is not useful in any other context besides > > someone playing around with a ten row table in cqlsh. CQL stops me > > from executing some queries that are not efficient, yet it allows t

Re: How does Cassandra optimize this query?

2012-11-05 Thread Edward Capriolo
"misleading". > > Yes. Bingo. > > It is misleading because it is not useful in any other context besides > someone playing around with a ten row table in cqlsh. CQL stops me > from executing some queries that are not efficient, yet it allows this > one. If I am new to Cas

Re: How does Cassandra optimize this query?

2012-11-05 Thread Edward Capriolo
d with a ten row table in cqlsh. CQL stops me from executing some queries that are not efficient, yet it allows this one. If I am new to Cassandra and developing, this query works and produces a result then once my database gets real data produces a different result (likely an empty one). When I fir

Re: How does Cassandra optimize this query?

2012-11-05 Thread Sylvain Lebresne
On Mon, Nov 5, 2012 at 6:55 PM, Edward Capriolo wrote: > I see. It is fairly misleading because it is a query that does not > work at scale. This syntax is only helpful if you have less than a few > thousand rows in Cassandra. Just for the sake of argument, how is that misleading? If

Re: How does Cassandra optimize this query?

2012-11-05 Thread Edward Capriolo
I see. It is fairly misleading because it is a query that does not work at scale. This syntax is only helpful if you have less than a few thousand rows in Cassandra. On Mon, Nov 5, 2012 at 12:24 PM, Sylvain Lebresne wrote: > On Mon, Nov 5, 2012 at 4:12 PM, Edward Capriolo > wrote: >>

Re: How does Cassandra optimize this query?

2012-11-05 Thread Sylvain Lebresne
On Mon, Nov 5, 2012 at 4:12 PM, Edward Capriolo wrote: > Is this query the equivalent of a full table scan? Without a starting > point get_range_slice is just starting at token 0? > It is, but that's what you asked for after all. If you want to start at a given token you can do:

Re: How does Cassandra optimize this query?

2012-11-05 Thread Edward Capriolo
Interesting. Is this query the equivalent of a full table scan? Without a starting point get_range_slice is just starting at token 0? Edward On Mon, Nov 5, 2012 at 2:18 AM, Sylvain Lebresne wrote: > On Sun, Nov 4, 2012 at 7:49 PM, Edward Capriolo > wrote: >> >> CQL3 A

Re: How does Cassandra optimize this query?

2012-11-04 Thread Sylvain Lebresne
On Sun, Nov 4, 2012 at 7:49 PM, Edward Capriolo wrote: > CQL3 Allows me to search the second component of a primary key. Which > really just seems to be component 1 of a composite column. > > So what thrift operation does this correspond to? This looks like a > column slice without specifying a ke

How does Cassandra optimize this query?

2012-11-04 Thread Edward Capriolo
If we create a column family: CREATE TABLE videos ( videoid uuid, videoname varchar, username varchar, description varchar, tags varchar, upload_date timestamp, PRIMARY KEY (videoid,videoname) ); The CLI views this column like so: create column family videos with column_type = 'S
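
To make the subject line concrete, a hedged CQL 3 sketch of the two query shapes the thread compares (the videos table is from the message above; the literal values are invented). A predicate on the partition key is a single-row lookup, while a predicate on only the second key component becomes a range scan over every row (later Cassandra versions refuse it without ALLOW FILTERING):

    -- single-partition lookup on the row key:
    SELECT * FROM videos WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
    -- restricting only the clustering column scans the whole table:
    SELECT * FROM videos WHERE videoname = 'cassandra demo';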

Re: Query slowly using enterprise Cassandra

2012-10-31 Thread aaron morton
You will have better luck with DataStax Enterprise questions on their support forums: http://www.datastax.com/support-forums/ Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 31/10/2012, at 4:17 PM, dong.yajun wrote: > hi list, > > Jus

Re: DELETE query failing in CQL 3.0

2012-10-22 Thread Tyler Hobbs
wrote: > I figured out the problem. The DELETE query only works if the column > used in the WHERE clause is also the first column used to define the > PRIMARY KEY. > > -Thomas > > *From:* wang liang [mailto:wla...@gmail.com] > *Sent:* Mond

RE: DELETE query failing in CQL 3.0

2012-10-22 Thread Ryabin, Thomas
I figured out the problem. The DELETE query only works if the column used in the WHERE clause is also the first column used to define the PRIMARY KEY. -Thomas From: wang liang [mailto:wla...@gmail.com] Sent: Monday, October 22, 2012 1:31 AM To: user@cassandra.apache.org Subject: Re: DELETE
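
A hedged sketch of the failure and the fix described above (the thread never shows the books schema, so the layouts here are assumed, and the exact error text may differ):

    -- fails: title is a regular column, not the partition key
    CREATE TABLE books (isbn text PRIMARY KEY, title text);
    DELETE FROM books WHERE title = 'hatchet';
    -- works: the WHERE column is the first PRIMARY KEY column
    CREATE TABLE books2 (title text PRIMARY KEY, author text);
    DELETE FROM books2 WHERE title = 'hatchet';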

Re: DELETE query failing in CQL 3.0

2012-10-21 Thread wang liang
astpickle.com > > On 20/10/2012, at 5:53 AM, "Ryabin, Thomas" > wrote: > > I have a column family called “books”, and am trying to delete all rows > where the “title” column is equal to “hatchet”. This is the query I am > using: > DELETE FROM books WHERE title =

Re: DELETE query failing in CQL 3.0

2012-10-21 Thread aaron morton
title” column is equal to “hatchet”. This is the query I am using: > DELETE FROM books WHERE title = ‘hatchet’; > > This query is failing with this error: > Bad Request: PRIMARY KEY part title found in SET part > > I am using Cassandra 1.1 and CQL 3.0. What could be the problem? > > -Thomas

DELETE query failing in CQL 3.0

2012-10-19 Thread Ryabin, Thomas
I have a column family called "books", and am trying to delete all rows where the "title" column is equal to "hatchet". This is the query I am using: DELETE FROM books WHERE title = 'hatchet'; This query is failing with this error: Bad Request: PRI

Re: Query over secondary indexes

2012-10-09 Thread Hiller, Dean
or B-tree (whichever way you prefer to think about it). > Obviously, you don't want partitions with billions of rows as the > B-tree starts to get a bit large. In both, you can have as many > partitions as you like…billions, trillions. > > PlayOrm is just doing a range scan

Re: Query over secondary indexes

2012-10-09 Thread Hiller, Dean
as you like…billions, trillions. PlayOrm is just doing a range scan on your behalf. If you do a complex query like left join trade.account where account.isActive=true and trade.numShares>50, it is doing a range scan on a few indices but it does so in batches and eventually will do lookahead a

Re: Query over secondary indexes

2012-10-09 Thread Vivek Mishra
wide row indexing/compound primary key approach. -Vivek On Tue, Oct 9, 2012 at 6:20 PM, Hiller, Dean wrote: > Another option may be PlayOrm for you and its scalable SQL. We queried > one million rows for 100 results in just 60ms (and it does joins). Query > CL = QUORUM. >

Re: Query over secondary indexes

2012-10-08 Thread Vivek Mishra
@impetus.co.in> wrote: >> >>> Try making *user_name* a primary key in combination with some other >>> unique column and see if results are improving. >>> >>> -Rishabh >>> >>> *From:* Vivek Mishra [mailto:mishra.v...@gmail.com] >>>

Re: Query over secondary indexes

2012-10-08 Thread Vivek Mishra
ng *user_name* a primary key in combination with some other >> unique column and see if results are improving. >> >> -Rishabh >> >> *From:* Vivek Mishra [mailto:mishra.v...@gmail.com] >> *Sent:* Friday, October 05, 2012 2:35 PM >> *To:* user@cassandra.apache.org

Re: Query over secondary indexes

2012-10-08 Thread aaron morton
2:45 PM, Rishabh Agrawal > wrote: > Try making user_name a primary key in combination with some other unique > column and see if results are improving. > > -Rishabh > > From: Vivek Mishra [mailto:mishra.v...@gmail.com] > Sent: Friday, October 05, 2012 2:35 PM >

Re: Query over secondary indexes

2012-10-05 Thread Vivek Mishra
> > -Rishabh > > *From:* Vivek Mishra [mailto:mishra.v...@gmail.com] > *Sent:* Friday, October 05, 2012 2:35 PM > *To:* user@cassandra.apache.org > *Subject:* Query over secondary indexes > > > > I have a column family "User" which is having a indexed colum

RE: Query over secondary indexes

2012-10-05 Thread Rishabh Agrawal
Try making user_name a primary key in combination with some other unique column and see if results are improving. -Rishabh From: Vivek Mishra [mailto:mishra.v...@gmail.com] Sent: Friday, October 05, 2012 2:35 PM To: user@cassandra.apache.org Subject: Query over secondary indexes I have a column

Re: Simple data model for 1 simple range query?

2012-10-04 Thread T Akhayo
Hi Dean, Thank you for your reply, I appreciate the help. I managed to get my data model into Cassandra and already inserted data and ran the query, but I don't yet have enough data to do proper benchmarking. I'm now trying to load a huge amount of data using SSTableSimpleUnsortedWriter c

Re: Simple data model for 1 simple range query?

2012-10-03 Thread Hiller, Dean
PM To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" mailto:user@cassandra.apache.org>> Subject: Simple data model for 1 simple range query? Good evening, I have a quite simple data model. Pseudo CQL code: create table bars( timeframe int, date Date, info1 double, i

Simple data model for 1 simple range query?

2012-10-03 Thread T Akhayo
Good evening, I have a quite simple data model. Pseudo CQL code: create table bars( timeframe int, date Date, info1 double, info2 double, .. primary key( timeframe, date ) ) My most important query is (which might be the only one actually): select * from bars where timeframe=X and date>Y
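
The pseudo-CQL above, rendered as a runnable CQL 3 sketch (types guessed where the pseudocode is loose; Date becomes timestamp, and the literal values are invented):

    CREATE TABLE bars (
      timeframe int,
      date timestamp,
      info1 double,
      info2 double,
      PRIMARY KEY (timeframe, date)
    );
    -- the important query: one partition, an ordered slice on date
    SELECT * FROM bars WHERE timeframe = 1 AND date > '2012-10-01';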

Re: Query advice to prevent node overload

2012-09-19 Thread aaron morton
> Wouldn't that return files from directories '/tmp1', '/tmp2', for example? I believe so. > I thought the goal was to return files and subdirectories recursively inside > '/tmp'. I'm not sure what the purpose of the query was. The query w

Re: Query advice to prevent node overload

2012-09-18 Thread André Cruz
On Sep 18, 2012, at 3:06 AM, aaron morton wrote: >> select filename from inode where filename > ‘/tmp’ and filename < ‘/tmq’ and >> sentinel = ‘x’; Wouldn't that return files from directories '/tmp1', '/tmp2', for example? I thought the goal was to return files and subdirectories recursively i

Re: Query advice to prevent node overload

2012-09-17 Thread aaron morton
> Could you explain the usage of the "sentinel"? Queries that use a secondary index must include an equality clause. That's what the sentinel is there for… > select filename from inode where filename > ‘/tmp’ and filename < ‘/tmq’ and > sentinel = ‘x’; Cheers - Aaron Morton Freelanc
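
Spelling the trick out: because a secondary-index query needs at least one equality predicate, the schema carries an indexed dummy column whose value is always 'x'; the equality on it satisfies the index, and the filename bounds are applied as filters on the matching rows. A hedged sketch using the names quoted in the thread:

    -- assumes inode has an indexed sentinel column holding a constant 'x'
    SELECT filename FROM inode
     WHERE sentinel = 'x'
       AND filename > '/tmp' AND filename < '/tmq';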

Re: Query advice to prevent node overload

2012-09-17 Thread André Cruz
On Sep 17, 2012, at 3:04 AM, aaron morton wrote: >> I have a schema that represents a filesystem and one example of a Super CF >> is: > This may help with some ideas > http://www.datastax.com/dev/blog/cassandra-file-system-design Could you explain the usage of the "sentinel"? Which nodes have i

Re: Query advice to prevent node overload

2012-09-17 Thread André Cruz
s (columns) the dataset is 1000x1. > This is the way the query works internally. Multiget is simply a collections > of independent gets. > > >> The multiget() is more efficient, but I'm having trouble trying to limit the >> size of the data returned in order to n

Re: Composite Column Query Modeling

2012-09-16 Thread aaron morton
> I may be missing something, but it looks like you pass multiple keys but > only a singular SlicePredicate My bad. I was probably thinking "multiple gets" but wrote multigets. If Collections don't help maybe you need to support both query types using separate CF's

Re: Query advice to prevent node overload

2012-09-16 Thread aaron morton
xed. Meaning all the sub columns have to be read into memory. > So if I set column_count = 1, as I have now, but fetch 1000 dirs (rows) > and each one happens to have 1 files (columns) the dataset is 1000x1. This is the way the query works internally. Multiget is simply a colle

Re: Composite Column Query Modeling

2012-09-14 Thread Hiller, Dean
There is another trick here. On the playOrm open source project, we need to do a sparse query for a join, so we send out 100 async requests, cache up the java "Future" objects, and return the first needed result without waiting for the others. With the S-SQL in playOrm, w

Re: Composite Column Query Modeling

2012-09-14 Thread Adam Holmberg
--- > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 14/09/2012, at 8:31 AM, Adam Holmberg > wrote: > > I'm modeling a new application and considering the use of SuperColumn vs. > Composite Column paradigms. I understan

Query advice to prevent node overload

2012-09-14 Thread André Cruz
sandra query per dir, or a multiget for all needed dirs. The multiget() is more efficient, but I'm having trouble trying to limit the size of the data returned in order to not crash the cassandra node. I'm using the pycassa client lib, and until now I have been doing per-directory get(

Re: Composite Column Query Modeling

2012-09-14 Thread aaron morton
aradigms. I understand that SuperColumns are discouraged in > new development, but I'm pondering a query where it seems like SuperColumns > might be better suited. > > Consider a CF with SuperColumn layout as follows > > t = { > k1: { > s1: { c1:v1, c2:v2 }, >

Composite Column Query Modeling

2012-09-13 Thread Adam Holmberg
I'm modeling a new application and considering the use of SuperColumn vs. Composite Column paradigms. I understand that SuperColumns are discouraged in new development, but I'm pondering a query where it seems like SuperColumns might be better suited. Consider a CF with SuperColumn

Re: wild card on query

2012-08-17 Thread Swathi Vikas
Thank you very much Aaron. The information you provided is very helpful. Have a great weekend!!! swat.vikas From: aaron morton To: user@cassandra.apache.org Sent: Thursday, August 16, 2012 6:29 PM Subject: Re: wild card on query > I want to retrieve all

Re: wild card on query

2012-08-16 Thread aaron morton
> I want to retrieve all the photos from all the users of certain project. My > sql like query will be "select projectid * photos from Users". How can i run > this kind of row key predicate while executing query on cassandra? You cannot / should not do that using the data m
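
Aaron's reply is cut off above; one common remodel for this kind of lookup (a hedged sketch, not necessarily what the truncated reply goes on to recommend) is to promote the pieces of the compound row key to real key components, so "all photos from all users of a project" becomes an ordinary slice instead of a row-key wildcard:

    CREATE TABLE user_photos (
      projectid text,
      userid text,
      photoid text,
      PRIMARY KEY (projectid, userid, photoid)
    );
    SELECT * FROM user_photos WHERE projectid = 'proj42';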

wild card on query

2012-08-16 Thread Swathi Vikas
Hi, I am trying to run query on cassandra cluster with predicate on row key. I have column family called "Users" and rows with row key like "projectid_userid_photos". Each user within a project can have rows like projectid_userid_blog, projectid_userid_status and so on. 

Re: Composite Column Slice query, wildcard first component?

2012-08-15 Thread aaron morton
> Is there a way to create a slice query that returns all columns where the > _second_ component is A? No. You can only get a contiguous slice of columns. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/08/2012, at 7:21 AM, Mik
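
The same restriction in CQL 3 terms, as a hedged sketch (the table mirrors the row "key1" => (A:A:C), (A:A:B), ...): clustering columns can only be constrained left to right, so a prefix slice works but the first component cannot be wildcarded:

    CREATE TABLE cf (key text, c1 text, c2 text, c3 text,
                     PRIMARY KEY (key, c1, c2, c3));
    SELECT * FROM cf WHERE key = 'key1' AND c1 = 'A';  -- OK: contiguous slice
    SELECT * FROM cf WHERE key = 'key1' AND c2 = 'A';  -- rejected: c1 skipped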

Composite Column Slice query, wildcard first component?

2012-08-15 Thread Mike Hugo
Hello, Given a row like this "key1" => (A:A:C), (A:A:B), (B:A:C), (B:C:D) Is there a way to create a slice query that returns all columns where the _second_ component is A? That is, I would like to get back the following columns by asking for columns where component[0] = * and com

Query for last (composite) columns

2012-08-11 Thread Ersin Er
composite columns in the following page: https://github.com/Netflix/astyanax/wiki/Examples What I would like to query for is the last session events of each user. (So it's like a group-by query.) Can I get this information in a single query and would it be an efficient way to do it (regarding th

Re: Schema question : Query to support "Find which all of these 500 email ids have been registered"

2012-07-27 Thread Aklin_81
What if I spread these columns across 20 rows? Then I have to query each of these 20 rows for 500 columns, but this still seems a better solution than either one row for all cols or a separate row for each email id!? On Fri, Jul 27, 2012 at 11:36 AM, Aklin_81 wrote: > Sorry for
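
A hedged sketch of the 20-row spread proposed in this message (names and the bucket count are illustrative): hash each email id to a bucket client-side, then check membership with one slice-by-name query per bucket:

    CREATE TABLE registered_emails (
      bucket int,       -- e.g. hash(email_id) % 20, computed client-side
      email_id bigint,  -- the 8-byte column name from the thread
      user_id int,      -- the 4-byte column value
      PRIMARY KEY (bucket, email_id)
    );
    -- which of these ids (all hashing to bucket 7) are registered?
    SELECT email_id FROM registered_emails
     WHERE bucket = 7 AND email_id IN (101, 205, 307);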

Re: Schema question : Query to support "Find which all of these 500 email ids have been registered"

2012-07-26 Thread Aklin_81
Sorry for the confusion created. I need to store emails registered just for a single application, so my data model would fit into just a single row. But is storing a hundred million columns (col name size = 8 bytes; col value size = 4 bytes) in a single row a good idea? I am very much tempted

Re: Schema question : Query to support "Find which all of these 500 email ids have been registered"

2012-07-26 Thread Roshni Rajagopal
In general I believe wide rows (many cols) are preferable to skinny rows (many rows) so that you can get all the information in 1 go; one can store 2 billion cols in a row. However, on what basis would you store the 500 email ids in 1 row? What would the row key be? E.g., if the query you want

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Mohit Anchlia
On Mon, Jul 23, 2012 at 11:16 AM, Ertio Lew wrote: > I want to read columns for a randomly selected list of userIds (completely > random). I fetch the data using userIds (which would be used as column names > in case of a single row, or as row keys in case of 1 row for each user) for a > selected list o

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Ertio Lew
I want to read columns for a randomly selected list of userIds (completely random). I fetch the data using userIds (which would be used as column names in case of a single row, or as row keys in case of 1 row for each user) for a selected list of users. Assume that the application knows the list of userId

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Mohit Anchlia
ns such that can efficiently read columns for at least 300-500 users > in a single read query. Is the query time-based or userid-based? How do you determine which users to read first? Do you read all of them or a few of them? What's the query criteria? It would be helpful to understand exac

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Ertio Lew
in a single read query.

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Mohit Anchlia
On Mon, Jul 23, 2012 at 10:53 AM, Ertio Lew wrote: > Actually these columns are 1 for each entity in my application & I need to > query at any time columns for a list of 300-500 entities in one go. Can you describe your situation with small example?

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Ertio Lew
Actually these columns are 1 for each entity in my application & I need to query at any time columns for a list of 300-500 entities in one go.

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Mohit Anchlia
On Mon, Jul 23, 2012 at 10:07 AM, Ertio Lew wrote: > My major concern is that is it too bad retrieving 300-500 rows (each for a > single column) in a single read query that I should store all these(around > a hundred million) columns in a single row? You could create multiple rows and

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Eldad Yamin
n Mon, Jul 23, 2012 at 3:40 PM, rohit bhatia wrote: > You should probably try to break the one-row scheme into a > 2*Number_of_nodes rows scheme. This should ensure proper distribution > of rows and still allow you to query from a small, fixed number of rows. > How you do it depends on how are

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread rohit bhatia
You should probably try to break the one-row scheme into a 2*Number_of_nodes rows scheme. This should ensure proper distribution of rows and still allow you to query from a small, fixed number of rows. How you do it depends on how you are going to choose your 200-500 columns during reading (try having them in the

Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-22 Thread Ertio Lew
I want to store hundreds of millions of columns (containing id1 to id2 mappings) in the DB & at any single time retrieve a set of about 200-500 columns, based on the column names (id1) if they are in a single row, or using row keys if each column is stored in a unique row. If I put them in a single row:
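
A hedged CQL 3 sketch of the two layouts being weighed (all names invented): a sharded wide row answers a 500-column read with slice-by-name queries, while one-row-per-mapping needs a 500-key multiget:

    -- layout A: wide rows, columns named by id1, spread over a few shards
    CREATE TABLE map_wide (shard int, id1 bigint, id2 int,
                           PRIMARY KEY (shard, id1));
    SELECT id1, id2 FROM map_wide WHERE shard = 0 AND id1 IN (101, 205, 307);
    -- layout B: one skinny row per mapping
    CREATE TABLE map_skinny (id1 bigint PRIMARY KEY, id2 int);
    SELECT id1, id2 FROM map_skinny WHERE id1 IN (101, 205, 307);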

Re: Why is our range query failing in Cassandra 0.8.10 Client

2012-07-12 Thread Sylvain Lebresne
When executing a query like: get events WHERE Firm=434550 AND ds_timestamp>=1341955958200 AND ds_timestamp<=1341955958200; what the 2ndary index implementation will do is: 1) it queries the index for Firm for the row with key 434550 (because that's the only one restricted by an equal

Why is our range query failing in Cassandra 0.8.10 Client

2012-07-11 Thread JohnB
651018) => (column=ds_timestamp, value=1341955958200, timestamp=1341955980651020) If I run the following query: get events WHERE Firm=434550 AND ds_timestamp>=1341955958200 AND ds_timestamp<=1341955958200; (which in theory would should return the same 1 row result) It runs for aroun

Re: Composite Slice Query returning non-sliced data

2012-07-10 Thread Tyler Hobbs
On Tue, Jul 10, 2012 at 2:20 PM, Sunit Randhawa wrote: > I have tested this extensively and EOC has huge issue in terms of > usability of CompositeTypes in Cassandra. > > As an example: If you have 2 Composite Columns such as A:B:C and A:D:C. > > And if you do search on A:B as start and end Compos

Re: Composite Slice Query returning non-sliced data

2012-07-10 Thread Sunit Randhawa
I have tested this extensively and EOC has a huge issue in terms of usability of CompositeTypes in Cassandra. As an example: if you have 2 composite columns such as A:B:C and A:D:C, and you search on A:B as the start and end composite components, it will return D as well, because it returns all t

Re: Composite Slice Query returning non-sliced data

2012-07-10 Thread Tyler Hobbs
I think in this case that's just Hector's way of setting the EOC byte for a component. My guess is that the composite isn't being structured correctly through Hector, as well. On Tue, Jul 10, 2012 at 4:40 AM, aaron morton wrote: > > The first thing that stands out is that (in cassandra) comparis

Re: Composite Slice Query returning non-sliced data

2012-07-10 Thread aaron morton
Ah, it's a Hector query question. You may have better luck on the Hector email list. Or if you can turn on debug logging on the server and grab the query, that would be handy. The first thing that stands out is that (in cassandra) comparison operations are not used in a slice range. C

Re: Composite Slice Query returning non-sliced data

2012-07-09 Thread Sunit Randhawa
orton > > wrote: > > > #2 has the Composite Column and #1 does not. > > > > They are both strings. > > > > All column names *must* be of the same type. What was your CF definition ? > > > > Cheers > > > > - >

Re: Composite Slice Query returning non-sliced data

2012-07-08 Thread aaron morton
t; >> All column names *must* be of the same type. What was your CF definition ? >> >> >> Cheers >> >> >> - >> >> Aaron Morton >> >> Freelance Developer >> >> @aaronmorton >> >> ht

Re: Composite Slice Query returning non-sliced data

2012-07-06 Thread Sunit Randhawa
rton > > Freelance Developer > > @aaronmorton > > http://www.thelastpickle.com > > > On 6/07/2012, at 7:26 AM, Sunit Randhawa wrote: > > > Hello, > > > I have 2 Columns for a 'RowKey' as below: > > > #1 : set CF['RowKey'][

Re: Composite Slice Query returning non-sliced data

2012-07-06 Thread aaron morton
>> Hello, >> >> I have 2 Columns for a 'RowKey' as below: >> >> #1 : set CF['RowKey']['1000']='A=1,B=2'; >> #2: set CF['RowKey']['1000:C1']='A=2,B=3''; >>

Re: Composite Slice Query returning non-sliced data

2012-07-05 Thread Sunit Randhawa
#1 : set CF['RowKey']['1000']='A=1,B=2'; > #2: set CF['RowKey']['1000:C1']='A=2,B=3''; > > #2 has the Composite Column and #1 does not. > > Now when I execute the Composite Slice query by 1000 and C1, I do get > bot

Re: Composite Slice Query returning non-sliced data

2012-07-05 Thread aaron morton
ote: > Hello, > > I have 2 Columns for a 'RowKey' as below: > > #1 : set CF['RowKey']['1000']='A=1,B=2'; > #2: set CF['RowKey']['1000:C1']='A=2,B=3''; > > #2 has the Composite Column and #1 does no

Composite Slice Query returning non-sliced data

2012-07-05 Thread Sunit Randhawa
Hello, I have 2 Columns for a 'RowKey' as below: #1 : set CF['RowKey']['1000']='A=1,B=2'; #2: set CF['RowKey']['1000:C1']='A=2,B=3''; #2 has the Composite Column and #1 does not. Now when I execute the Composite Slice

Re: Cassandra 1.0.6 data flush query

2012-06-24 Thread aaron morton
> > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-1-0-6-data-flush-query-tp7580733.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com.

Cassandra 1.0.6 data flush query

2012-06-21 Thread Roshan
_mb: 200 in_memory_compaction_limit_in_mb: 16 (from 64MB) Key cache = 1 Row cache = 0 Could someone please help me on this. Thanks /Roshan -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-1-0-6-data-flush-query-tp7580733.html S

Re: When will CQL BATCH support binding variable (Query substitution use named parameters)?

2012-06-21 Thread Data Craftsman
Hi Sylvain, Thanks for the quick response. Yes, I don't know the difference between bind variables and "query substitution". I'm a little confused; I was just trying to use your language. :) In the Oracle world, we call it a binding variable. Can you give me a concise example for bound va

Re: When will CQL BATCH support binding variable (Query substitution use named parameters)?

2012-06-20 Thread Sylvain Lebresne
ture? > > e.g. > http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/ > > Query substitution > Use named parameters and a dictionary of names and values. > >>> cursor.execute("SELECT column FROM CF WHERE name=:name", dict(name="Foo")) That may be a prob

When will CQL BATCH support binding variable (Query substitution use named parameters)?

2012-06-20 Thread Data Craftsman
Hello, CQL BATCH is good for INSERT/UPDATE performance. But it cannot bind variables, which leaves it exposed to SQL injection. Is there a plan to make CQL BATCH support binding variables in the near future? e.g. http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/ Query substitution Use named
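
To make the limitation concrete: single statements can use the named-parameter substitution shown in the replies above, but inside a BATCH every value has to be spliced into the statement text as a literal, hence the injection concern. A hedged sketch (cf and its columns are invented):

    BEGIN BATCH
      INSERT INTO cf (name, value) VALUES ('Foo', '1');
      INSERT INTO cf (name, value) VALUES ('Bar', '2');
    APPLY BATCH;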
