Re: Change of behaviour in multiget_slice query for unknown keys between 0.7 and 1.1?

2012-06-19 Thread aaron morton
Nothing has changed in the server; try the Hector user group.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
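(Side note on the Thrift-level contract, which has not changed here: multiget_slice returns an entry for every requested key, with an empty column list when the row does not exist, so client wrappers generally hand back an empty slice rather than null. A plain-Python stand-in for that contract — the store contents and names below are illustrative, not from any client library:)

```python
# Plain-Python stand-in for the Thrift multiget_slice contract: the result
# map contains an entry for EVERY requested key, with an empty column list
# (not None) when the row does not exist.
STORE = {
    "existing-row": [("bytes", b"\x01"), ("int", b"\x2a"), ("val", b"x")],
}

def multiget_slice(keys):
    """Return {key: [columns]}, with an empty list for unknown keys."""
    return {k: STORE.get(k, []) for k in keys}

result = multiget_slice(["existing-row", "no-such-row"])

# The missing key is present in the result, just with no columns, so a
# robust existence check is "is the slice empty?", not "is it None?".
assert "no-such-row" in result
assert result["no-such-row"] == []          # empty, not None
assert len(result["existing-row"]) == 3
```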

On 19/06/2012, at 12:02 PM, Edward Sargisson wrote:

> Hi all,
> Was there a change of behaviour in multiget_slice query in Cassandra or 
> Hector between 0.7 and 1.1 when dealing with a key that doesn't exist?
> 
> We've just upgraded and our in memory unit test is failing (although just on 
> my machine). The test code is looking for a key that doesn't exist and 
> expects to get null. Instead it gets a ColumnSlice with a single column 
> called val. If there were something there then we'd expect columns with names 
> like bytes, int or string. Other rows in the column family have those columns 
> as well as val.
> 
> Is there a reason for this behaviour?
> I'd like to see if there was an explanation before I change the unit test for 
> it.
> 
> Many thanks in advance,
> Edward
> 
> -- 
> Edward Sargisson
> senior java developer
> Global Relay
> 
> edward.sargis...@globalrelay.net
> 
> 
> 866.484.6630 
> New York | Chicago | Vancouver  |  London  (+44.0800.032.9829)  |  Singapore  
> (+65.3158.1301)
> 
> Global Relay Archive supports email, instant messaging, BlackBerry, 
> Bloomberg, Thomson Reuters, Pivot, YellowJacket, LinkedIn, Twitter, Facebook 
> and more.   
> 
> Ask about Global Relay Message — The Future of Collaboration in the Financial 
> Services World
> 
> All email sent to or from this address will be retained by Global Relay’s 
> email archiving system. This message is intended only for the use of the 
> individual or entity to which it is addressed, and may contain information 
> that is privileged, confidential, and exempt from disclosure under applicable 
> law.  Global Relay will not be liable for any compliance or technical 
> information provided herein.  All trademarks are the property of their 
> respective owners.



Change of behaviour in multiget_slice query for unknown keys between 0.7 and 1.1?

2012-06-18 Thread Edward Sargisson

Hi all,
Was there a change of behaviour in multiget_slice query in Cassandra or 
Hector between 0.7 and 1.1 when dealing with a key that doesn't exist?


We've just upgraded and our in memory unit test is failing (although 
just on my machine). The test code is looking for a key that doesn't 
exist and expects to get null. Instead it gets a ColumnSlice with a 
single column called val. If there were something there then we'd expect 
columns with names like bytes, int or string. Other rows in the column 
family have those columns as well as val.


Is there a reason for this behaviour?
I'd like to see if there was an explanation before I change the unit 
test for it.


Many thanks in advance,
Edward

--

Edward Sargisson

senior java developer
Global Relay

edward.sargis...@globalrelay.net






RE: Cassandra upgrade to 1.1.1 resulted in slow query issue

2012-06-14 Thread Ganza, Ivan
Greetings,

Thank you - issue is created here:  
https://issues.apache.org/jira/browse/CASSANDRA-4340

-Ivan/

---
Ivan Ganza | Senior Developer | Information Technology
c: 647.701.6084 | e:  iga...@globeandmail.com

From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Thursday, June 14, 2012 8:20 AM
To: user@cassandra.apache.org
Cc: cassandra-u...@incubator.apache.org; Schlueter, Kevin
Subject: Re: Cassandra upgrade to 1.1.1 resulted in slow query issue

That does look fishy.
Would you mind opening a ticket on JIRA
(https://issues.apache.org/jira/browse/CASSANDRA) directly for that? It's
easier for us to track it there.

Thanks,
Sylvain

On Wed, Jun 13, 2012 at 8:05 PM, Ganza, Ivan <iga...@globeandmail.com> wrote:
Greetings,

We have recently introduced Cassandra at the Globe and Mail here in Toronto, 
Canada.  We are processing and storing the North American stock-market feed.  
We have found it to work very quickly and things have been looking very good.

Recently we upgraded to version 1.1.1, and since then we have noticed some
issues.

I will try to describe it for you here.  Basically, one operation that we
perform very often, and which is very critical, is the ability to 'get the
latest quote'.  This returns the latest Quote adjusted against exchange delay
rules.  With Cassandra version 1.0.3 we could get a Quote in around 2ms.  After
the update we are looking at times of at least 2-3 seconds.

The way we query the quote is using a REVERSED SuperSliceQuery with start=now,
end=00:00:00.000 (beginning of day), LIMITed to 1.

Our investigation leads us to suspect that, since the upgrade, Cassandra seems
to be reading the sstable from disk even when we request a small range of the
day, only 5 seconds back.  If you look at the output below you can see that the
query does NOT get slower as the lookback increases from 5 sec to 60 sec, 15
min, 60 min, and 24 hours.

We also noticed that the query was very fast for the first five minutes of 
trading, apparently until the first sstable was flushed to disk.  After that we 
go into query times of 1-2 seconds or so.

Query time[lookback=5]:[1711ms]
Query time[lookback=60]:[1592ms]
Query time[lookback=900]:[1520ms]
Query time[lookback=3600]:[1294ms]
Query time[lookback=86400]:[1391ms]

We would really appreciate input or help on this.

Cassandra version: 1.1.1
Hector version: 1.0-1

---
public void testCassandraIssue() {
    try {
        int[] seconds = new int[]{ 5, 60, 60 * 15, 60 * 60, 60 * 60 * 24 };
        for (int sec : seconds) {
            DateTime start = new DateTime();
            // Generic parameters were stripped by the mail archive; all four
            // serializers are StringSerializer, so the types are String.
            SuperSliceQuery<String, String, String, String> superSliceQuery =
                    HFactory.createSuperSliceQuery(keyspaceOperator,
                            StringSerializer.get(), StringSerializer.get(),
                            StringSerializer.get(), StringSerializer.get());
            superSliceQuery.setKey("101390" + "." + testFormatter.print(start));
            superSliceQuery.setColumnFamily("Quotes");
            superSliceQuery.setRange(superKeyFormatter.print(start),
                    superKeyFormatter.print(start.minusSeconds(sec)),
                    true,   // reversed
                    1);

            long theStart = System.currentTimeMillis();
            QueryResult<SuperSlice<String, String, String>> result =
                    superSliceQuery.execute();
            long end = System.currentTimeMillis();
            System.out.println("Query time[lookback=" + sec + "]:["
                    + (end - theStart) + "ms]");
        }
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}

---
create column family Quotes
  with column_type = Super
  and comparator = BytesType
  and subcomparator = BytesType
  and keys_cached = 7000
  and rows_cached = 0
  and row_cache_save_period = 0
  and key_cache_save_period = 3600
  and memtable_throughput = 255
  and memtable_operations = 0.29
  and compression_options = {sstable_compression:SnappyCompressor, chunk_length_kb:64};



-Ivan/

---
Ivan Ganza | Senior Developer | Information Technology
c: 647.701.6084 | e: iga...@globeandmail.com



Re: Cassandra upgrade to 1.1.1 resulted in slow query issue

2012-06-14 Thread Sylvain Lebresne
That does look fishy.
Would you mind opening a ticket on JIRA (
https://issues.apache.org/jira/browse/CASSANDRA) directly for that? It's
easier for us to track it there.

Thanks,
Sylvain


Cassandra upgrade to 1.1.1 resulted in slow query issue

2012-06-13 Thread Ganza, Ivan
Greetings,

We have recently introduced Cassandra at the Globe and Mail here in Toronto, 
Canada.  We are processing and storing the North American stock-market feed.  
We have found it to work very quickly and things have been looking very good.

Recently we upgraded to version 1.1.1, and since then we have noticed some
issues.

I will try to describe it for you here.  Basically, one operation that we
perform very often, and which is very critical, is the ability to 'get the
latest quote'.  This returns the latest Quote adjusted against exchange delay
rules.  With Cassandra version 1.0.3 we could get a Quote in around 2ms.  After
the update we are looking at times of at least 2-3 seconds.

The way we query the quote is using a REVERSED SuperSliceQuery with start=now,
end=00:00:00.000 (beginning of day), LIMITed to 1.

Our investigation leads us to suspect that, since the upgrade, Cassandra seems
to be reading the sstable from disk even when we request a small range of the
day, only 5 seconds back.  If you look at the output below you can see that the
query does NOT get slower as the lookback increases from 5 sec to 60 sec, 15
min, 60 min, and 24 hours.

We also noticed that the query was very fast for the first five minutes of 
trading, apparently until the first sstable was flushed to disk.  After that we 
go into query times of 1-2 seconds or so.

Query time[lookback=5]:[1711ms]
Query time[lookback=60]:[1592ms]
Query time[lookback=900]:[1520ms]
Query time[lookback=3600]:[1294ms]
Query time[lookback=86400]:[1391ms]

We would really appreciate input or help on this.

Cassandra version: 1.1.1
Hector version: 1.0-1

---
public void testCassandraIssue() {
    try {
        int[] seconds = new int[]{ 5, 60, 60 * 15, 60 * 60, 60 * 60 * 24 };
        for (int sec : seconds) {
            DateTime start = new DateTime();
            // Generic parameters were stripped by the mail archive; all four
            // serializers are StringSerializer, so the types are String.
            SuperSliceQuery<String, String, String, String> superSliceQuery =
                    HFactory.createSuperSliceQuery(keyspaceOperator,
                            StringSerializer.get(), StringSerializer.get(),
                            StringSerializer.get(), StringSerializer.get());
            superSliceQuery.setKey("101390" + "." + testFormatter.print(start));
            superSliceQuery.setColumnFamily("Quotes");
            superSliceQuery.setRange(superKeyFormatter.print(start),
                    superKeyFormatter.print(start.minusSeconds(sec)),
                    true,   // reversed
                    1);

            long theStart = System.currentTimeMillis();
            QueryResult<SuperSlice<String, String, String>> result =
                    superSliceQuery.execute();
            long end = System.currentTimeMillis();
            System.out.println("Query time[lookback=" + sec + "]:["
                    + (end - theStart) + "ms]");
        }
    } catch (Exception e) {
        e.printStackTrace();
        fail(e.getMessage());
    }
}

---
create column family Quotes
  with column_type = Super
  and comparator = BytesType
  and subcomparator = BytesType
  and keys_cached = 7000
  and rows_cached = 0
  and row_cache_save_period = 0
  and key_cache_save_period = 3600
  and memtable_throughput = 255
  and memtable_operations = 0.29
  and compression_options = {sstable_compression:SnappyCompressor, chunk_length_kb:64};



-Ivan/

---
Ivan Ganza | Senior Developer | Information Technology
c: 647.701.6084 | e:  iga...@globeandmail.com


Re: Query

2012-06-06 Thread shelan Perera
Hi,

You can find detailed info here [1]

[1] https://github.com/hector-client/hector/wiki/User-Guide

regards

On Wed, Jun 6, 2012 at 3:38 PM, MOHD ARSHAD SALEEM <
marshadsal...@tataelxsi.co.in> wrote:

>  Hi,
> After creating the keyspace successfully, I now want to know how to read and
> write data using the APIs.
>
> Regards
> Arshad
>  --
> *From:* Filippo Diotalevi [fili...@ntoklo.com]
> *Sent:* Wednesday, June 06, 2012 2:27 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Query
>
>   Hi,
> the Javadoc (or source code) of
> the me.prettyprint.hector.api.factory.HFactory class contains all the
> examples to create keyspaces and column families.
>
>  To create a keyspace:
>
>  String testKeyspace = "testKeyspace";
> KeyspaceDefinition newKeyspace
> = HFactory.createKeyspaceDefinition(testKeyspace);
> cluster.addKeyspace(newKeyspace);
>
>
>  To create a column family and a keyspace:
>
>  String keyspace = "testKeyspace";
> String column1 = "testcolumn";
> ColumnFamilyDefinition columnFamily1
> = HFactory.createColumnFamilyDefinition(keyspace, column1);
> List<ColumnFamilyDefinition> columns = new ArrayList<ColumnFamilyDefinition>();
> columns.add(columnFamily1);
>
>  KeyspaceDefinition testKeyspace =
> HFactory.createKeyspaceDefinition(keyspace, 
> org.apache.cassandra.locator.SimpleStrategy.class.getName(),
> 1, columns);
> cluster.addKeyspace(testKeyspace);
>
>  --
> Filippo Diotalevi
>
>
>  On Wednesday, 6 June 2012 at 07:05, MOHD ARSHAD SALEEM wrote:
>
>   Hi All,
>
> I am using the Hector client for Cassandra. I want to know how to create a
> keyspace and column family using the APIs to read and write data, or do I
> have to create them manually using the command-line interface?
>
> Regards
> Arshad
>
>
>


-- 
Shelan Perera

Home: http://www.shelan.org
Blog   : http://www.shelanlk.com
Twitter: shelan
skype  :shelan.perera
gtalk   :shelanrc

 I am the master of my fate:
 I am the captain of my soul.
 *invictus*


RE: Query

2012-06-06 Thread MOHD ARSHAD SALEEM
Hi,
After creating the keyspace successfully, I now want to know how to read and
write data using the APIs.

Regards
Arshad

From: Filippo Diotalevi [fili...@ntoklo.com]
Sent: Wednesday, June 06, 2012 2:27 PM
To: user@cassandra.apache.org
Subject: Re: Query

Hi,
the Javadoc (or source code) of the me.prettyprint.hector.api.factory.HFactory 
class contains all the examples to create keyspaces and column families.

To create a keyspace:

String testKeyspace = "testKeyspace";
KeyspaceDefinition newKeyspace = 
HFactory.createKeyspaceDefinition(testKeyspace);
cluster.addKeyspace(newKeyspace);


To create a column family and a keyspace:

String keyspace = "testKeyspace";
String column1 = "testcolumn";
ColumnFamilyDefinition columnFamily1 = 
HFactory.createColumnFamilyDefinition(keyspace, column1);
List<ColumnFamilyDefinition> columns = new ArrayList<ColumnFamilyDefinition>();
columns.add(columnFamily1);

KeyspaceDefinition testKeyspace =
HFactory.createKeyspaceDefinition(keyspace, 
org.apache.cassandra.locator.SimpleStrategy.class.getName(), 1, columns);
cluster.addKeyspace(testKeyspace);

--
Filippo Diotalevi



On Wednesday, 6 June 2012 at 07:05, MOHD ARSHAD SALEEM wrote:

Hi All,

I am using the Hector client for Cassandra. I want to know how to create a
keyspace and column family using the APIs to read and write data, or do I have
to create them manually using the command-line interface?

Regards
Arshad



Re: Query

2012-06-06 Thread Filippo Diotalevi
Hi,  
the Javadoc (or source code) of the me.prettyprint.hector.api.factory.HFactory 
class contains all the examples to create keyspaces and column families.

To create a keyspace:

String testKeyspace = "testKeyspace"; 
KeyspaceDefinition newKeyspace = 
HFactory.createKeyspaceDefinition(testKeyspace);
cluster.addKeyspace(newKeyspace);



To create a column family and a keyspace:

String keyspace = "testKeyspace"; 
String column1 = "testcolumn";
ColumnFamilyDefinition columnFamily1 = 
HFactory.createColumnFamilyDefinition(keyspace, column1);
List<ColumnFamilyDefinition> columns = new ArrayList<ColumnFamilyDefinition>();
columns.add(columnFamily1);

KeyspaceDefinition testKeyspace =
HFactory.createKeyspaceDefinition(keyspace, 
org.apache.cassandra.locator.SimpleStrategy.class.getName(), 1, columns);
cluster.addKeyspace(testKeyspace);


-- 
Filippo Diotalevi



On Wednesday, 6 June 2012 at 07:05, MOHD ARSHAD SALEEM wrote:

> Hi All,
> 
> I am using the Hector client for Cassandra. I want to know how to create a
> keyspace and column family using the APIs to read and write data, or do I
> have to create them manually using the command-line interface?
> 
> Regards
> Arshad



Query

2012-06-05 Thread MOHD ARSHAD SALEEM
Hi All,

I am using the Hector client for Cassandra. I want to know how to create a
keyspace and column family using the APIs to read and write data, or do I have
to create them manually using the command-line interface?

Regards
Arshad


Re: Query

2012-06-04 Thread Franc Carter
On Mon, Jun 4, 2012 at 7:36 PM, MOHD ARSHAD SALEEM <
marshadsal...@tataelxsi.co.in> wrote:

>  Hi all,
>
> I want to know how to read and write data using the Cassandra APIs. Is
> there a link to a sample program?
>

I did a proof of concept using a Python client, Pycassa (
https://github.com/pycassa/pycassa), which works well

cheers


> Regards
> Arshad
>



-- 

*Franc Carter* | Systems architect | Sirca Ltd
 

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


Re: Query

2012-06-04 Thread Amresh Singh
Here is a link that will help you out if you use Kundera as high level
client for Cassandra:

https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes

Regards,
Amresh

On Mon, Jun 4, 2012 at 3:09 PM, Rishabh Agrawal <
rishabh.agra...@impetus.co.in> wrote:

>  If you are using Java try out Kundera or Hector, both are good and have
> good documentation available.
>
>
>
> *From:* MOHD ARSHAD SALEEM [mailto:marshadsal...@tataelxsi.co.in]
> *Sent:* Monday, June 04, 2012 2:37 AM
> *To:* user@cassandra.apache.org
> *Subject:* Query
>
>
>
> Hi all,
>
> I want to know how to read and write data using the Cassandra APIs. Is
> there a link to a sample program?
>
> Regards
> Arshad
>
> --
>
> Register for Impetus webinar ‘User Experience Design for iPad
> Applications’ June 8(10:00am PT). http://lf1.me/f9/
>
> Impetus’ Head of Labs to present on ‘Integrating Big Data technologies in
> your IT portfolio’ at Cloud Expo, NY (June 11-14). Contact us for a
> complimentary pass.Impetus also sponsoring the Yahoo Summit 2012.
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>


RE: Query

2012-06-04 Thread Rishabh Agrawal
If you are using Java try out Kundera or Hector, both are good and have good 
documentation available.

From: MOHD ARSHAD SALEEM [mailto:marshadsal...@tataelxsi.co.in]
Sent: Monday, June 04, 2012 2:37 AM
To: user@cassandra.apache.org
Subject: Query

Hi all,

I want to know how to read and write data using the Cassandra APIs. Is there a
link to a sample program?

Regards
Arshad





Query

2012-06-04 Thread MOHD ARSHAD SALEEM
Hi all,

I want to know how to read and write data using the Cassandra APIs. Is there a
link to a sample program?

Regards
Arshad


Re: Query on how to count the total number of rowkeys and columns in them

2012-05-24 Thread Віталій Тимчишин
You should read multiple "batches", specifying the last key received from the
previous batch as the first key for the next one.
For large databases I'd recommend a statistical approach (if it's feasible);
with the random partitioner it works well.
Don't read the whole DB. Knowing the whole keyspace (token range), you can read
a part of it, count the records in that part, then multiply by the ratio of the
whole keyspace to the part to get your total.
You can even implement an algorithm that runs until the required precision is
obtained (simply compare your previous and current estimate after each batch).
For me it's enough to read ~1% of the DB to get a good result.

Best regards, Vitalii Tymchyshyn
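
The statistical estimate described above can be sketched without a cluster. In this self-contained Python sketch, an md5-based hash stands in for the RandomPartitioner's token ring; the key count and sampling fraction are illustrative:

```python
import hashlib

RING = 2 ** 127  # RandomPartitioner's token space is 0..2**127

def token(key):
    """Map a key onto the ring the way an md5-based partitioner would."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING

# Simulated database whose true size we pretend not to know.
true_keys = ["row-%d" % i for i in range(50_000)]

# Read only one slice of the ring (here 10%; the post suggests ~1% can be
# enough on a large DB) and count the keys that fall inside it.
fraction = 0.10
hi = int(RING * fraction)
sampled = sum(1 for k in true_keys if token(k) < hi)

# Scale the sampled count by the inverse of the sampled fraction.
estimate = sampled / fraction
error = abs(estimate - len(true_keys)) / len(true_keys)
assert error < 0.05  # uniform hashing keeps the estimate within a few percent
```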

2012/5/24 Prakrati Agrawal 

>  Hi
>
>
> I am trying to learn Cassandra and I have one doubt. I am using the Thrift
> API, to count the number of row keys I am using KeyRange to specify the row
> keys. To count all of them, I specify the start and end as “new byte[0]”.
> But the count is set to 100 by default. How do I use this method to count
> the keys if I don’t know the actual number of keys in my Cassandra
> database? Please help me
>
>
-- 
Best regards,
 Vitalii Tymchyshyn


Re: Query on how to count the total number of rowkeys and columns in them

2012-05-23 Thread samal
The default count is 100; you can set it to some max value, but that won't
guarantee an actual count.

Something like paging can help with counting: use the last key received as the
start of the next query, end as null, count as some value. But this ports the
data to the client, whereas we only need the count.

Another solution (if the count is really necessary) may be to keep a separate
counter CF and increment it whenever a key is inserted into the other CF.

I would not use the raw Thrift API; the client libraries are very mature [1]
and CQL is also very good.

[1]
http://pycassa.github.com/pycassa/api/pycassa/columnfamily.html#pycassa.columnfamily.ColumnFamily.get_range

/Samal
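
The paging idea above (the last key of one batch seeds the next) can be sketched with an in-memory stand-in for a range query; the key names and batch sizes are illustrative:

```python
# In-memory stand-in for get_range_slices over an order-preserving layout:
# returns up to `count` keys >= `start`, in key order. As in the Thrift API,
# the start key is inclusive.
DB = sorted("key-%04d" % i for i in range(1000))

def get_range(start, count):
    i = 0
    if start is not None:
        while i < len(DB) and DB[i] < start:
            i += 1
    return DB[i:i + count]

def count_keys(batch=100):
    """Page through all keys, seeding each batch with the previous last key.
    `batch` must be >= 2 because the seed key reappears in the next page."""
    total, start = 0, None
    while True:
        page = get_range(start, batch)
        if start is not None:
            page = page[1:]  # drop the seed key, already counted
        if not page:
            return total
        total += len(page)
        start = page[-1]

assert count_keys() == 1000
assert count_keys(batch=7) == 1000
```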

On Thu, May 24, 2012 at 11:52 AM, Prakrati Agrawal <
prakrati.agra...@mu-sigma.com> wrote:

>  Hi
>
>
> I am trying to learn Cassandra and I have one doubt. I am using the Thrift
> API, to count the number of row keys I am using KeyRange to specify the row
> keys. To count all of them, I specify the start and end as “new byte[0]”.
> But the count is set to 100 by default. How do I use this method to count
> the keys if I don’t know the actual number of keys in my Cassandra
> database? Please help me
>
>
> Prakrati Agrawal | Developer - Big Data(I&D)| 9731648376 |
> www.mu-sigma.com 
>
>
> --
> This email message may contain proprietary, private and confidential
> information. The information transmitted is intended only for the person(s)
> or entities to which it is addressed. Any review, retransmission,
> dissemination or other use of, or taking of any action in reliance upon,
> this information by persons or entities other than the intended recipient
> is prohibited and may be illegal. If you received this in error, please
> contact the sender and delete the message from your system.
>
> Mu Sigma takes all reasonable steps to ensure that its electronic
> communications are free from viruses. However, given Internet
> accessibility, the Company cannot accept liability for any virus introduced
> by this e-mail or any attachment and you are advised to use up-to-date
> virus checking software.
>


Query on how to count the total number of rowkeys and columns in them

2012-05-23 Thread Prakrati Agrawal
Hi

I am trying to learn Cassandra and I have one doubt. I am using the Thrift API, 
to count the number of row keys I am using KeyRange to specify the row keys. To 
count all of them, I specify the start and end as "new byte[0]". But the count 
is set to 100 by default. How do I use this method to count the keys if I don't 
know the actual number of keys in my Cassandra database? Please help me

Prakrati Agrawal | Developer - Big Data(I&D)| 9731648376 | www.mu-sigma.com





Re: Does Cassandra support parallel query processing?

2012-05-22 Thread aaron morton
In general read queries run on multiple nodes. But each node computes the 
complete result to the query. 

There is no support for aggregate queries. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/05/2012, at 6:49 PM, Majid Azimi wrote:

> hi guys,
> 
> I'm going to build a warehouse with Cassandra. There are a lot of range and 
> aggregate queries. 
> Does Cassandra support parallel query processing?(both on single box and 
> cluster)
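
Since the server computes no aggregates, a client typically scatters the reads across the cluster and reduces the partial results itself. A toy sketch of that scatter-gather pattern (the node names and data are made up; a real client would issue range reads over disjoint token ranges):

```python
from concurrent.futures import ThreadPoolExecutor

# Simulated per-node data: each "node" owns a disjoint slice of the rows.
NODES = {
    "node-a": {"r1": 10, "r2": 20},
    "node-b": {"r3": 5, "r4": 7},
    "node-c": {"r5": 1},
}

def range_read(node):
    """Stand-in for a range/slice read against one replica."""
    return NODES[node]

# Scatter: issue the reads in parallel. Gather/reduce: aggregate on the
# client, since the server will not compute the aggregate for us.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(range_read, NODES))

total = sum(v for part in partials for v in part.values())
assert total == 43
```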



Does Cassandra support parallel query processing?

2012-05-19 Thread Majid Azimi
hi guys,

I'm going to build a warehouse with Cassandra. There are a lot of range and 
aggregate queries. 

Does Cassandra support parallel query processing?(both on single box and 
cluster)


Re: primary keys query

2012-05-16 Thread Cyril Auburtin
Thanks, I was looking at http://code.google.com/p/javageomodel/ too

2012/5/14 aaron morton 

> So it seems it's not a good idea, to use Cassandra like that?
>
> Right. It's basically a table scan.
>
> Here is some background on the approach simple geo took to using
> Cassandra...
> http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php
>
> Also PostGis for Postgress seems popular http://postgis.refractions.net/
>
> Hope that helps.
>
>
>   -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 12/05/2012, at 4:23 AM, cyril auburtin wrote:
>
> I was thinking of a CF with many many rows with id, type, latitude and
> longitude (indexed), and do geolocation queries: type=all and lat < 43 and
> lat >42.9 and lon < 7.3 and lon > 7.2
>
> where all rows have type=all
> (at least try how Cassandra deals with that)
> So it seems it's not a good idea, to use Cassandra like that?
>
> There's also the possibility of having, in parallel, another CF with latitude
> in rows, which will be sorted, so an indexed query can give us the right
> latitude range, and then just query with longitude < and >
>
> What do you think of that
>
> thanks
>
> 2012/5/11 Dave Brosius 
>
>> Inequalities on secondary indices are always done in memory, so without
>> at least one EQ on another secondary index you will be loading every row in
>> the database, which with a massive database isn't a good idea. So by
>> requiring at least one EQ on an index, you hopefully limit the set of rows
>> that need to be read into memory to a manageable size. Although obviously
>> you can still get into trouble with that as well.
>>
>>
>>
>>
>> On 05/11/2012 09:39 AM, cyril auburtin wrote:
>>
>>> Sorry for asking that
>>> but Why is it necessary to always have at least one EQ comparison
>>>
>>> [default@Keyspace1] get test where birth_year>1985;
>>>No indexed columns present in index clause with operator EQ
>>>
>>> It obliges you to have one dummy indexed column to do this query
>>>
>>> [default@Keyspace1] get test where tag=sea and birth_year>1985;
>>> ---
>>> RowKey: sam
>>> => (column=birth_year, value=1988, timestamp=1336742346059000)
>>>
>>>
>>>
>>
>
>


Re: primary keys query

2012-05-14 Thread aaron morton
> So it seems it's not a good idea, to use Cassandra like that?
Right. It's basically a table scan. 

Here is some background on the approach simple geo took to using Cassandra...
http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php

Also PostGis for Postgress seems popular http://postgis.refractions.net/

Hope that helps. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/05/2012, at 4:23 AM, cyril auburtin wrote:

> I was thinking of a CF with many many rows with id, type, latitude and 
> longitude (indexed), and do geolocation queries: type=all and lat < 43 and 
> lat >42.9 and lon < 7.3 and lon > 7.2
> 
> where all rows have type=all
> (at least to try how Cassandra deals with that)
> So it seems it's not a good idea to use Cassandra like that?
> 
> There's also the possibility of doing, in parallel, another CF, with latitude in 
> rows, which will be sorted, so an indexed query can give us the right latitude 
> range, and then just query with longitude < and >
> 
> What do you think of that?
> 
> thanks
> 
> 2012/5/11 Dave Brosius 
> Inequalities on secondary indices are always done in memory, so without at 
> least one EQ on another secondary index you will be loading every row in the 
> database, which with a massive database isn't a good idea. So by requiring at 
> least one EQ on an index, you hopefully limit the set of rows that need to be 
> read into memory to a manageable size. Although obviously you can still get 
> into trouble with that as well.
> 
> 
> 
> 
> On 05/11/2012 09:39 AM, cyril auburtin wrote:
> Sorry for asking this,
> but why is it necessary to always have at least one EQ comparison?
> 
> [default@Keyspace1] get test where birth_year>1985;
>No indexed columns present in index clause with operator EQ
> 
> It obliges us to have one dummy indexed column to do this query:
> 
> [default@Keyspace1] get test where tag=sea and birth_year>1985;
> ---
> RowKey: sam
> => (column=birth_year, value=1988, timestamp=1336742346059000)
> 
> 
> 
> 



Re: primary keys query

2012-05-11 Thread cyril auburtin
I was thinking of a CF with many many rows with id, type, latitude and
longitude (indexed), and do geolocation queries: type=all and lat < 43 and
lat >42.9 and lon < 7.3 and lon > 7.2

where all rows have type=all
(at least to try how Cassandra deals with that)
So it seems it's not a good idea to use Cassandra like that?

There's also the possibility of doing, in parallel, another CF, with
latitude in rows, which will be sorted, so an indexed query can give us
the right latitude range, and then just query with longitude < and >

What do you think of that?

thanks

2012/5/11 Dave Brosius 

> Inequalities on secondary indices are always done in memory, so without at
> least one EQ on another secondary index you will be loading every row in
> the database, which with a massive database isn't a good idea. So by
> requiring at least one EQ on an index, you hopefully limit the set of rows
> that need to be read into memory to a manageable size. Although obviously
> you can still get into trouble with that as well.
>
>
>
>
> On 05/11/2012 09:39 AM, cyril auburtin wrote:
>
>> Sorry for asking this,
>> but why is it necessary to always have at least one EQ comparison?
>>
>> [default@Keyspace1] get test where birth_year>1985;
>>No indexed columns present in index clause with operator EQ
>>
>> It obliges us to have one dummy indexed column to do this query:
>>
>> [default@Keyspace1] get test where tag=sea and birth_year>1985;
>> ---
>> RowKey: sam
>> => (column=birth_year, value=1988, timestamp=1336742346059000)
>>
>>
>>
>


Re: primary keys query

2012-05-11 Thread Dave Brosius
Inequalities on secondary indices are always done in memory, so without 
at least one EQ on another secondary index you will be loading every row 
in the database, which with a massive database isn't a good idea. So by 
requiring at least one EQ on an index, you hopefully limit the set of 
rows that need to be read into memory to a manageable size. Although 
obviously you can still get into trouble with that as well.
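
Dave's point can be put in miniature (plain Python, not Cassandra internals;
the data, the `tag_index` map, and the `query` helper are all made up for
illustration):

```python
# Miniature sketch of why the EQ clause matters: the KEYS index bounds the
# candidate set, and only that small set is then filtered in memory.

rows = {
    "sam": {"tag": "sea", "birth_year": 1988},
    "kim": {"tag": "sea", "birth_year": 1979},
    "joe": {"tag": "sky", "birth_year": 1990},
}

# A KEYS secondary index is roughly a reverse map: column value -> row keys.
tag_index = {}
for key, cols in rows.items():
    tag_index.setdefault(cols["tag"], set()).add(key)

def query(tag, min_birth_year):
    # The EQ term narrows the scan to one index bucket...
    candidates = tag_index.get(tag, set())
    # ...and the inequality is applied in memory to that small set only.
    return sorted(k for k in candidates
                  if rows[k]["birth_year"] > min_birth_year)

print(query("sea", 1985))  # ['sam']
```

Without the `tag` lookup, the inequality alone would have to visit every row.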




On 05/11/2012 09:39 AM, cyril auburtin wrote:

Sorry for asking this,
but why is it necessary to always have at least one EQ comparison?

[default@Keyspace1] get test where birth_year>1985;
No indexed columns present in index clause with operator EQ

It obliges us to have one dummy indexed column to do this query:

[default@Keyspace1] get test where tag=sea and birth_year>1985;
---
RowKey: sam
=> (column=birth_year, value=1988, timestamp=1336742346059000)






primary keys query

2012-05-11 Thread cyril auburtin
Sorry for asking this,
but why is it necessary to always have at least one EQ comparison?

[default@Keyspace1] get test where birth_year>1985;
No indexed columns present in index clause with operator EQ

It obliges us to have one dummy indexed column to do this query:

[default@Keyspace1] get test where tag=sea and birth_year>1985;
---
RowKey: sam
=> (column=birth_year, value=1988, timestamp=1336742346059000)


Re: solr query for string match in CQL

2012-04-12 Thread A J
Never mind.
Double quotes within the single quotes worked:
select * from solr where solr_query='body:"sixty eight million nine
hundred forty three thousand four hundred twenty four"';
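
The quoting rule generalises: the Solr phrase needs double quotes *inside*
the CQL single-quoted string, and any single quote inside the phrase itself
must be doubled for CQL. A small helper sketch (the `solr_phrase_query`
name is hypothetical, not part of any driver):

```python
# Build a CQL literal for a Solr phrase query.

def solr_phrase_query(field, phrase):
    solr = '%s:"%s"' % (field, phrase)            # Solr phrase syntax
    return "'" + solr.replace("'", "''") + "'"    # CQL string escaping

q = solr_phrase_query("body", "sixty eight million")
print("select * from solr where solr_query=" + q)
# select * from solr where solr_query='body:"sixty eight million"'
```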


On Thu, Apr 12, 2012 at 11:42 AM, A J  wrote:
> What is the syntax for a string match in CQL for solr_query ?
>
> cqlsh:wiki> select * from solr where solr_query='body:sixty eight
> million nine hundred forty three thousand four hundred twenty four';
> Request did not complete within rpc_timeout.
>
> URL encoding just returns without retrieving the row that is present:
> cqlsh:wiki> select count(*) from solr where
> solr_query='body:%22sixty%20eight%20million%20nine%20hundred%20forty%20three%20thousand%20four%20hundred%20twenty%20four%22'
> ;
>  count
> ---
>     0
>
> I have exactly one row matching this string that I can retrieve
> through direct solr query.
>
>
> Thanks.


Re: composite query performance depends on component ordering

2012-04-03 Thread Alexandru Sicoe
Hi Sylvain and Aaron,

Thanks for the comment, Sylvain. What you say makes sense: I have
microsecond-precision timestamps, and looking at some row printouts I see
everything is happening at a different timestamp, which means it won't
compare the second, 100-byte component.

As for the methodology, it's not so thorough. I used Cassandra 0.8.5.

What I did: I had acquired a large data set, about 300 hrs' worth of data,
in Schema 1 (details below), which I found was easily hitting thousands of
rows for some queries, thus giving me very poor performance. I converted
this data to Schema 2 (details below), grouping the data together in the
same row and increasing the time bucket for the row (with two versions,
"Timestamp:ID" and "ID:Timestamp", for the column names). So I obtained a CF
with 66 rows, 11 rows for 3 different types of data sources which are
dominant in the rates of info they give me (each row is a 24 hr time
bucket).

These are the results I got using the CompositeQueryIterator (with a
modified max of 100,000 cols returned per slice) taken from the composite
query tutorial at
http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1
(code is at https://github.com/zznate/cassandra-tutorial). So basically I
used null for start and end in order to read entire rows at a time. I timed
my code. The actual values are doubles for all 3 types. The size is the
file size after dumping the results to a text file.

Ok, in my previous email I just looked at the rows with the max size, which
gave me a 20% difference. In truth it's less.


   Type 1

   No. cols returned  File size  ID:Timestamp (s)  Timestamp:ID (s)  Diff %
   387174             25M        12.59             8.60              31.68
   1005113            66M        31.83             21.84             31.38
   579633             38M        18.07             12.46             31.03
   1217634            81M        33.77             24.65             26.99
   376303             24M        12.32             10.36             15.94
   2493007            169M       68.68             59.93             12.74
   6298275            428M       183.28            147.57            19.48
   2777962            189M       83.16             73.30             11.86
   6138047            416M       170.88            155.83            8.81
   3193450            216M       93.26             82.84             11.18
   2302928            155M       69.91             61.62             11.85

   Avg = 19.3 %


   Type 2

   No. cols returned  File size  ID:Timestamp (s)  Timestamp:ID (s)  Diff %
   350468             40M        12.92             13.12             -1.59
   1303797            148M       43.33             38.98             10.04
   697763             79M        26.78             22.05             17.66
   825414             94M        33.50             26.69             20.31
   55075              6.2M       2.97              2.13              28.15
   1873775            213M       72.37             51.12             29.37
   3982433            453M       147.04            110.71            24.71
   1546491            176M       54.86             42.13             23.21
   4117491            468M       143.10            114.62            19.90
   1747506            199M       63.23             63.05             0.28
   2720160            308M       96.06             82.47             14.14

   Avg = 16.9 %

   Type 3

   No. cols returned  File size  ID:Timestamp (s)  Timestamp:ID (s)  Diff %
   192667             7.2M       5.88              6.50              -10.49
   210593             7.9M       6.33              5.57              12.06
   144677             5.4M       3.78              3.74              1.22
   207706             7.7M       6.33              5.74              9.28
   235937             8.7M       6.34              6.11              3.64
   159985             6.0M       4.23              3.93              7.07
   134859             5.5M       3.91              3.38              13.46
   70545              2.9M       2.96              2.08              29.84
   98487              3.9M       4.04              2.62              35.22
   205979             8.2M       7.35              5.67              22.87
   166045             6.2M       5.12              3.99              22.10

   Avg = 13.3 %

Just so you understand why I did the tests:

Data set:
I have ~300,000 data sources. Each data source has several variables it can
output values for. There are ~12 variables / data source. This gives ~4
million independent time series (let's call them streams) that need to go
into Cassandra. The streams give me (timestamp,value) pairs at higly
different rates, depending on the data source it comes from and operating
conditions. This translates into very different row lengths if a unique
time bucket is used across all streams.

The data sources can be further grouped in types (several data sources can
share the same type). There are ~100 types.

Use case:
The system
- will serve a web dashboard.
- should allow queries at highest granularity for short periods of time (up
to between 4-8hrs) on any individual stream or grouping of streams
- should allow a method of obtaining on demand (offline) analytics over
long periods of time (up to 1 year) and then (real-time) querying on the
analytics data

Cassandra schemes used so far:
Schema 1: 1 row for each of the 3 million streams. Each row is a 4hr time
bucket.
Schema 2: 1 row for each of the 100 types. Each row is a 24hr time bucket.

Now I'm planning to use Schema 2 only with an 8hr time bucket to better
reconcile between rows that get very long and ones that don't.
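
For illustration, the row-key scheme described above (one row per source
type per time bucket, here with the 8 hr bucket) might be computed like
this; the `row_key` helper and the key format are hypothetical, not the
poster's actual code:

```python
# One row per (source type, time bucket); the bucket is the start of the
# 8-hour window containing the sample's timestamp.

BUCKET_SECONDS = 8 * 3600

def row_key(type_id, ts_seconds):
    bucket = ts_seconds - ts_seconds % BUCKET_SECONDS
    return "%s:%d" % (type_id, bucket)

print(row_key("type42", 1000000))             # type42:979200
print(row_key("type42", 1000300))             # same bucket: type42:979200
print(row_key("type42", 1000000 + 9 * 3600))  # next bucket: type42:1008000
```

Samples close in time land in the same row; the bucket width trades row
length against the number of rows a time-range query must touch.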

Cheers,
Alex


On Sat, Mar 31, 2012 at 9:35 PM, aaron morton wrote:

> Can you post the details of the queries you are running, including the
> methodology of the tests ?
>
> (Here is the methodology I used to time queries previously
> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/)
>
> Cheers
>
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 31/03/2012, at 1:29 AM, Alexandru Sicoe wrote:
>
> Hi guys,
>  I am consistently seeing a 20% improvement in query retrieval times if I
> use the composite comparator &

Re: composite query performance depends on component ordering

2012-03-31 Thread aaron morton
Can you post the details of the queries you are running, including the 
methodology of the tests ? 

(Here is the methodology I used to time queries previously 
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/)

Cheers



-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 31/03/2012, at 1:29 AM, Alexandru Sicoe wrote:

> Hi guys,
>  I am consistently seeing a 20% improvement in query retrieval times if I use 
> the composite comparator "Timestamp:ID" instead of "ID:Timestamp" where 
> Timestamp=Long and ID=~100 character strings. I am retrieving all columns  
> (~1 million) from a single row. Why is this happening?
> 
> Cheers,
> Alex



Re: composite query performance depends on component ordering

2012-03-30 Thread Sylvain Lebresne
When you do a query, there's a lot of comparison happening between
what's queried and the column names. But the composite comparator is
lazy: when it compares two names, if the first components are not
equal, it doesn't have to compare the second ones. So what's likely
happening is that in the first case you do 1 million comparisons of 8
bytes, and in the latter 1 million comparisons of 100 bytes.
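
The short-circuit can be demonstrated with a small counter (plain Python,
just to illustrate lazy component-wise comparison; nothing here is
Cassandra code):

```python
# Composite names compare component by component and stop at the first
# unequal component, so a unique first component shields the second.

class Counted(str):
    """A string that counts how many times it is compared."""
    compares = 0
    def __eq__(self, other):
        Counted.compares += 1
        return str.__eq__(self, other)
    def __lt__(self, other):
        Counted.compares += 1
        return str.__lt__(self, other)
    def __hash__(self):
        return str.__hash__(self)

long_id = Counted("x" * 100)   # stands in for the ~100-char ID component

# Timestamp first: the (unique) first components decide the ordering,
# so the 100-byte ID is never examined.
_ = (1, long_id) < (2, long_id)
print(Counted.compares)  # 0

# ID first: the equal 100-byte IDs must be compared before the timestamps.
_ = (long_id, 1) < (Counted("x" * 100), 2)
print(Counted.compares)  # 1
```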

--
Sylvain

On Fri, Mar 30, 2012 at 2:30 PM, Alexandru Sicoe  wrote:
> -- Forwarded message --
> From: Alexandru Sicoe 
> To: 
> Cc:
> Date: Fri, 30 Mar 2012 14:29:47 +0200
> Subject: composite query performance depends on component ordering
> Hi guys,
>  I am consistently seeing a 20% improvement in query retrieval times if I use 
> the composite comparator "Timestamp:ID" instead of "ID:Timestamp" where 
> Timestamp=Long and ID=~100 character strings. I am retrieving all columns  
> (~1 million) from a single row. Why is this happening?
>
> Cheers,
> Alex
>




composite query performance depends on component ordering

2012-03-30 Thread Alexandru Sicoe
Hi guys,
 I am consistently seeing a 20% improvement in query retrieval times if I
use the composite comparator "Timestamp:ID" instead of "ID:Timestamp" where
Timestamp=Long and ID=~100 character strings. I am retrieving all columns
(~1 million) from a single row. Why is this happening?

Cheers,
Alex


Re: Composite Key Query in CLI

2012-03-17 Thread Tamar Fraenkel
I think you are doing ok,
I have a CF with the following schema
 ColumnFamily: tk_counters
  Key Validation Class:
org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UUIDType)
  Default column value validator:
org.apache.cassandra.db.marshal.CounterColumnType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds / keys to save : 0.0/0/all
  Row Cache Provider:
org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
  Key cache size / save period in seconds: 20.0/14400
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy:
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

*and the following cli query:*
get tk_counters['d:9eff24f7-949f-487b-a566-0dedd07656ce'];
*returns:*
=> (counter=no, value=1)
=> (counter=yes, value=2)
Returned 2 results.



*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Tue, Mar 13, 2012 at 11:28 PM, Ali Basiri  wrote:

> Hey,
>
> I have a set of composite keys with data and am trying to query them
> through the CLI. However, the result set returned is always empty.
>
> The schema is like this:
>
>ColumnFamily: Routes
>   Key Validation Class:
> org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.TimeUUIDType,org.apache.cassandra.db.marshal.IntegerType)
>   Default column value validator:
> org.apache.cassandra.db.marshal.UTF8Type
>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>   Row Cache Provider:
> org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
>   ...
>
> The Data:
> ---
> RowKey: fd24a000-6d51-11e1-a260-109addb27473:4
> => (column=enabled, value=true, timestamp=1331673484419000)
> => (column=providerId, value=0575af10-6d52-11e1-a260-109addb27473,
> timestamp=1331673484419001)
> ---
> RowKey: fd24a000-6d51-11e1-a260-109addb27473:5
> => (column=enabled, value=true, timestamp=1331673476181000)
> => (column=providerId, value=0086b6c0-6d52-11e1-a260-109addb27473,
> timestamp=1331673476181001)
> ---
>
>
> The Query:
> >  get Routes['fd24a000-6d51-11e1-a260-109addb27473:4'];
> Returned 0 results.
> Elapsed time: 4 msec(s).
>
> The cli correctly identifies the composite key types if I type them wrong.
> For example, an 'a' instead of the '4'.
>
> What am I doing wrong?
>
> Thanks.
>

Composite Key Query in CLI

2012-03-13 Thread Ali Basiri
Hey,

I have a set of composite keys with data and am trying to query them through
the CLI. However, the result set returned is always empty.

The schema is like this:

   ColumnFamily: Routes
  Key Validation Class:
org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.TimeUUIDType,org.apache.cassandra.db.marshal.IntegerType)
  Default column value validator:
org.apache.cassandra.db.marshal.UTF8Type
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row Cache Provider:
org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
  ...

The Data:
---
RowKey: fd24a000-6d51-11e1-a260-109addb27473:4
=> (column=enabled, value=true, timestamp=1331673484419000)
=> (column=providerId, value=0575af10-6d52-11e1-a260-109addb27473,
timestamp=1331673484419001)
---
RowKey: fd24a000-6d51-11e1-a260-109addb27473:5
=> (column=enabled, value=true, timestamp=1331673476181000)
=> (column=providerId, value=0086b6c0-6d52-11e1-a260-109addb27473,
timestamp=1331673476181001)
---


The Query:
>  get Routes['fd24a000-6d51-11e1-a260-109addb27473:4'];
Returned 0 results.
Elapsed time: 4 msec(s).

The cli correctly identifies the composite key types if I type them wrong.
For example, an 'a' instead of the '4'.

What am I doing wrong?

Thanks.


Re: Cassndra 1.0.6 GC query

2012-02-27 Thread Ben Coverston
Heap dump is really the gold standard for analysis, but if you don't want
to take a heap dump for some reason:

1. Decrease the cache sizes
2. Increase the index interval size

These in combination may reduce pressure on the heap enough so you do not
see these warnings in the log.
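
A hedged sketch of where those two knobs lived in Cassandra 1.0.x (the
values shown are illustrative, not recommendations):

```yaml
# cassandra.yaml -- sample every 512th row-index entry instead of the
# default 128; fewer index samples held on heap, at the cost of slightly
# slower reads.
index_interval: 512
```

Cache sizes in 1.0.x were per column family, e.g. from cassandra-cli:
`update column family WStandard with rows_cached = 0 and keys_cached = 10000;`.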

On Mon, Feb 27, 2012 at 4:12 PM, Roshan  wrote:

> Due to a configuration issue, I haven't enabled the heap dump directory.
>
> Is there another way to find the cause of this and identify possible
> configuration changes?
>
> Thanks.
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tp7323457p7323690.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>



-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


Re: Cassndra 1.0.6 GC query

2012-02-27 Thread Roshan
Due to a configuration issue, I haven't enabled the heap dump directory. 

Is there another way to find the cause of this and identify possible
configuration changes?

Thanks.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tp7323457p7323690.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Cassndra 1.0.6 GC query

2012-02-27 Thread Jonathan Ellis
Take a heap dump (there should be one from when you OOMed) and see
what is consuming your memory.

On Mon, Feb 27, 2012 at 3:45 PM, Roshan  wrote:
> Hi Experts
>
> After getting an OOM error in production, I reduced
> -XX:CMSInitiatingOccupancyFraction to .45 (from .75) and
> flush_largest_memtables_at to .45 (from .75). But I am still getting a
> warning message in production for the same Cassandra node regarding OOM. I
> also reduced the concurrent compactions to one.
>
> 2012-02-27 08:01:12,913 WARN  [GCInspector] Heap is 0.45604122065696395
> full.  You may need to reduce memtable and/or cache sizes.  Cassandra will
> now flush up to the two largest memtables to free up memory.  Adjust
> flush_largest_memtables_at threshold in cassandra.yaml if you don't want
> Cassandra to do this automatically
> 2012-02-27 08:01:12,913 WARN  [StorageService] Flushing
> CFS(Keyspace='WCache', ColumnFamily='WStandard') to relieve memory pressure
>
> Could someone please explain why I am still getting GC warnings like the above.
> Many thanks.
>
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tp7323457p7323457.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Cassndra 1.0.6 GC query

2012-02-27 Thread Roshan
Hi Experts

After getting an OOM error in production, I reduced
-XX:CMSInitiatingOccupancyFraction to .45 (from .75) and
flush_largest_memtables_at to .45 (from .75). But I am still getting a
warning message in production for the same Cassandra node regarding OOM. I
also reduced the concurrent compactions to one.

2012-02-27 08:01:12,913 WARN  [GCInspector] Heap is 0.45604122065696395
full.  You may need to reduce memtable and/or cache sizes.  Cassandra will
now flush up to the two largest memtables to free up memory.  Adjust
flush_largest_memtables_at threshold in cassandra.yaml if you don't want
Cassandra to do this automatically
2012-02-27 08:01:12,913 WARN  [StorageService] Flushing
CFS(Keyspace='WCache', ColumnFamily='WStandard') to relieve memory pressure

Could someone please explain why I am still getting GC warnings like the above.
Many thanks.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tp7323457p7323457.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Geohash nearby query implementation in Cassandra.

2012-02-17 Thread Mike Malone
2012/2/17 Raúl Raja Martínez 

>  Hello everyone,
>
> I'm working on an application that uses Cassandra and has a geolocation
> component.
> I was wondering beside the slides and video at
> http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php that
> simplegeo published regarding their strategy if anyone has implemented
> geohash storage and search in cassandra.
> The basic usage is to allow a user to find things close to a geo location
> based on distance radius.
>
> I thought about a couple of approaches.
>
> 1. Have the geohashes be the keys using the Ordered partitioner and get a
> group of rows between keys then store the items as columns in what it would
> end up looking like wide rows since each column would point to another row
> in a different column family representing the item nearby.
>

That's what we did early on at SimpleGeo.


> 2. Simply store the geohash prefixes as columns and use secondary indexes
> to do queries such as >= and <=.
>

This seems like a reasonable approach now that secondary indexes are
available. It might even address some of the hotspot problems we had with
the order preserving partitioner since the indices are distributed across
all hosts. Of course there are tradeoffs there too. Seems like a viable
option for sure.


> The problem I'm facing in both cases is ordering by distance and searching
> neighbors.
>

This will always be a problem with dimensionality reduction techniques like
geohashes. A brief bit of pedantry: it is mathematically impossible to do
dimensionality reduction without losing information. You can't embed a 2
dimensional space in a 1 dimensional space and preserve the 2D
topology. This manifests itself all sorts of ways, but when it comes to
doing kNN queries it's particularly obvious. Things that are near in 2D
space can be far apart in 1D space and vice versa. Doing a 1D embedding
like this will always result in suboptimal performance for at least some
queries. You'll have to over-fetch and post-process to get the correct
results.

That said, a 1D embedding is certainly easier to code since
multidimensional indexes are not available in Cassandra. And there are
plenty of data sets that don't hit any degenerate cases. Moreover, if
you're mostly doing bounding-radius queries the geohash approach isn't
nearly as bad (the only trouble comes when you want to limit the results,
in which case you often want things ordered by distance from centroid and
the query is no longer a bounding radius query - rather, it's a kNN with a
radius constraint). In any case, geohash is a reasonable starting point, at
least.

The neighbors problem is clearly explained here:
> https://github.com/davetroy/geohash-js
>
> Once the neighbors are calculated an item can be fetched with SQL similar
> to this.
>
> SELECT * FROM table WHERE LEFT(geohash,6) IN ('dqcjqc',
> 'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8')
>
> Since Cassandra does not currently support OR or a IN statement with
> elements that are not keys I'm not sure what the best way to implement
> geohashes may be.
>

Can't you use the thrift interface and use multiget_slice? If I recall
correctly, we implemented a special version of multiget_slice that stopped
when we got a certain number of columns across all rows. I don't have that
code handy but we did that work early in our Cassandra careers and,
starting from the thrift interface and following control flow for the
multiget_slice command, it wasn't terribly difficult to add.

Mike


Geohash nearby query implementation in Cassandra.

2012-02-17 Thread Raúl Raja Martínez
Hello everyone, 

I'm working on an application that uses Cassandra and has a geolocation 
component.
I was wondering beside the slides and video at 
http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php that 
simplegeo published regarding their strategy if anyone has implemented geohash 
storage and search in cassandra.
The basic usage is to allow a user to find things close to a geo location based 
on distance radius.

I thought about a couple of approaches.

1. Have the geohashes be the keys using the Ordered partitioner and get a group 
of rows between keys then store the items as columns in what it would end up 
looking like wide rows since each column would point to another row in a 
different column family representing the item nearby.

2. Simply store the geohash prefixes as columns and use secondary indexes to do 
queries such as >= and <=. 

The problem I'm facing in both cases is ordering by distance and searching 
neighbors. 

The neighbors problem is clearly explained here: 
https://github.com/davetroy/geohash-js

Once the neighbors are calculated an item can be fetched with SQL similar to 
this.

SELECT * FROM table WHERE LEFT(geohash,6) IN ('dqcjqc', 
'dqcjqf','dqcjqb','dqcjr1','dqcjq9','dqcjqd','dqcjr4','dqcjr0','dqcjq8')

Since Cassandra does not currently support OR or a IN statement with elements 
that are not keys I'm not sure what the best way to implement geohashes may be.

Thanks in advance for any tips.

Re: CQL query issue when fetching data from Cassandra

2012-02-16 Thread aaron morton
> 1). The "IN" operator is not working
> SELECT * FROM TestCF WHERE status IN ('Failed', 'Success')
IN is only valid for filtering on the row KEY
http://www.datastax.com/docs/1.0/references/cql/SELECT

e.g. it generates this error using cqlsh
cqlsh:dev> SELECT * FROM TestCF WHERE status IN ('Failed', 'Success');
Bad Request: Expected key 'KEY' to be present in WHERE clause for 'TestCF'

> 2) The "OR" operator is not fetching data.
>SELECT * FROM TestCF WHERE status='Failed' OR status='Success'
Not a valid statement 
cqlsh:dev> SELECT * FROM TestCF WHERE status='Failed' OR status='Success';
Bad Request: line 1:45 mismatched input 'OR' expecting EOF

> 3) If I use the "AND" operator, it also does not return data. Query doesn't have
> issues, but the result set is null.
>SELECT * FROM TestCF WHERE status='Failed' AND status='Success'

cqlsh:dev> SELECT * FROM TestCF WHERE status='Failed' AND status='Success';
Bad Request: cannot parse 'Failed' as hex bytes

Because the CF definition says the status column should contain bytes not 
ascii. 

These restrictions on secondary indexes still hold
http://www.datastax.com/docs/0.7/data_model/secondary_indexes

it looks like you are implementing a relational model. You may get some value 
in trying an approach that uses fewer secondary CFs. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/02/2012, at 6:58 PM, Roshan wrote:

> Hi
> 
> I am using Cassandra 1.0.6 version and having one column family in my
> keyspace.
> 
> create column family TestCF
>with comparator = UTF8Type
>and column_metadata = [
>   {column_name : userid,
>validation_class : BytesType,
>index_name : userid_idx,
>index_type : KEYS},
>   {column_name : workspace,
>validation_class : BytesType,
>index_name : wp_idx,
>index_type : KEYS},
>   {column_name : module,
>validation_class : BytesType,
>index_name : module_idx,
>index_type : KEYS},
>   {column_name : action,
>validation_class : BytesType,
>index_name : action_idx,
>index_type : KEYS},
>   {column_name : description,
>validation_class : BytesType},
>   {column_name : status,
>validation_class : BytesType,
>index_name : status_idx,
>index_type : KEYS},
>   {column_name : createdtime,
>validation_class : BytesType},
>   {column_name : created,
>validation_class : BytesType,
>index_name : created_idx,
>index_type : KEYS},
>   {column_name : logdetail,
>validation_class : BytesType}]
>and keys_cached = 1
>and rows_cached = 1000
>and row_cache_save_period = 0
>and key_cache_save_period = 3600
>and memtable_throughput = 255
>and memtable_operations = 0.29;
> 
> 1). The "IN" operator is not working
> SELECT * FROM TestCF WHERE status IN ('Failed', 'Success')
> 2) The "OR" operator is not fetching data.
>SELECT * FROM TestCF WHERE status='Failed' OR status='Success'
> 3) If I use the "AND" operator, it also does not return data. Query doesn't have
> issues, but the result set is null.
>SELECT * FROM TestCF WHERE status='Failed' AND status='Success'
> 4) Is there anything similar to "LIKE" in CQL? I want to search data based
> on part of a string.
> 
> Could someone please help me to solve the above issues? Thanks.
> 
> 
> 
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/CQL-query-issue-when-fetching-data-from-Cassandra-tp7290072p7290072.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.



Re: CQL query issue when fetching data from Cassandra

2012-02-16 Thread R. Verlangen
I'm not sure about your first 2 questions. The third might be an exception:
check your Cassandra logs.

About the "like"-thing: there's no such query possibility in Cassandra / CQL.

You can take a look at Hadoop / Hive to tackle those problems.

2012/2/16 Roshan 

> Hi
>
> I am using Cassandra 1.0.6 version and having one column family in my
> keyspace.
>
> create column family TestCF
>with comparator = UTF8Type
>and column_metadata = [
>{column_name : userid,
>validation_class : BytesType,
>index_name : userid_idx,
>index_type : KEYS},
>{column_name : workspace,
>validation_class : BytesType,
>index_name : wp_idx,
>index_type : KEYS},
>{column_name : module,
>validation_class : BytesType,
>index_name : module_idx,
>index_type : KEYS},
>{column_name : action,
>validation_class : BytesType,
>index_name : action_idx,
>index_type : KEYS},
>{column_name : description,
>validation_class : BytesType},
>{column_name : status,
>validation_class : BytesType,
>index_name : status_idx,
>index_type : KEYS},
>{column_name : createdtime,
>validation_class : BytesType},
>{column_name : created,
>validation_class : BytesType,
>index_name : created_idx,
>index_type : KEYS},
>{column_name : logdetail,
>validation_class : BytesType}]
>and keys_cached = 1
>and rows_cached = 1000
>and row_cache_save_period = 0
>and key_cache_save_period = 3600
>and memtable_throughput = 255
>and memtable_operations = 0.29;
>
> 1). The "IN" operator is not working
> SELECT * FROM TestCF WHERE status IN ('Failed', 'Success')
> 2) The "OR" operator is not fetching data.
>SELECT * FROM TestCF WHERE status='Failed' OR status='Success'
> 3) If I use the "AND" operator, it also does not return data. The query doesn't
> have issues, but the result set is null.
>SELECT * FROM TestCF WHERE status='Failed' AND status='Success'
> 4) Is there anything similar to "LIKE" in CQL? I want to search data based
> on part of a string.
>
> Could someone please help me to solve the above issues? Thanks.
>
>
>


CQL query issue when fetching data from Cassandra

2012-02-15 Thread Roshan
Hi

I am using Cassandra 1.0.6 version and having one column family in my
keyspace.

create column family TestCF
with comparator = UTF8Type
and column_metadata = [
{column_name : userid,
validation_class : BytesType,
index_name : userid_idx,
index_type : KEYS},
{column_name : workspace,
validation_class : BytesType,
index_name : wp_idx,
index_type : KEYS},
{column_name : module,
validation_class : BytesType,
index_name : module_idx,
index_type : KEYS},
{column_name : action,
validation_class : BytesType,
index_name : action_idx,
index_type : KEYS},
{column_name : description,
validation_class : BytesType},
{column_name : status,
validation_class : BytesType,
index_name : status_idx,
index_type : KEYS},
{column_name : createdtime,
validation_class : BytesType},
{column_name : created,
validation_class : BytesType,
index_name : created_idx,
index_type : KEYS},
{column_name : logdetail,
validation_class : BytesType}]
and keys_cached = 1
and rows_cached = 1000
and row_cache_save_period = 0
and key_cache_save_period = 3600
and memtable_throughput = 255
and memtable_operations = 0.29;

1) The "IN" operator is not working.
 SELECT * FROM TestCF WHERE status IN ('Failed', 'Success')
2) The "OR" operator is not fetching data.
SELECT * FROM TestCF WHERE status='Failed' OR status='Success'
3) If I use the "AND" operator, it also does not return data. The query doesn't
have issues, but the result set is null.
SELECT * FROM TestCF WHERE status='Failed' AND status='Success'
4) Is there anything similar to "LIKE" in CQL? I want to search data based
on part of a string.

Could someone please help me to solve the above issues? Thanks.





Re: Can you query Cassandra while it's doing major compaction

2012-02-03 Thread Adrian Cockcroft
At Netflix we rotate the major compactions around the cluster rather than
running them all at once. We also either take that node out of client
traffic so it doesn't get used as a coordinator or use the Astyanax
client that is latency and token aware to steer traffic to the other
replicas.

We are running on EC2 with lots of CPU and RAM but only two internal
disk spindles, so if you have lots of IOPS available in your config,
you may find that it doesn't affect read times much.

Adrian

On Thu, Feb 2, 2012 at 11:44 PM, Peter Schuller
 wrote:
>> If every node in the cluster is running major compaction, would it be able to
>> answer any read request?  And is it wise to write anything to a cluster
>> while it's doing major compaction?
>
> Compaction is something that is supposed to be continuously running in
> the background. As noted, it will have a performance impact in that it
> (1) generates I/O, (2) leads to cache eviction, and (if you're CPU
> bound rather than disk bound) (3) adds CPU load.
>
> But there is no intention that clients should have to care about
> compaction; it's to be viewed as a background operation continuously
> happening. A good rule of thumb is that an individual node should be
> able to handle your traffic while doing compaction; you don't want to
> be in the position where you're only just keeping up with the traffic
> and a node doing compaction can no longer handle it.
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Can you query Cassandra while it's doing major compaction

2012-02-02 Thread Peter Schuller
> If every node in the cluster is running major compaction, would it be able to
> answer any read request?  And is it wise to write anything to a cluster
> while it's doing major compaction?

Compaction is something that is supposed to be continuously running in
the background. As noted, it will have a performance impact in that it
(1) generates I/O, (2) leads to cache eviction, and (if you're CPU
bound rather than disk bound) (3) adds CPU load.

But there is no intention that clients should have to care about
compaction; it's to be viewed as a background operation continuously
happening. A good rule of thumb is that an individual node should be
able to handle your traffic while doing compaction; you don't want to
be in the position where you're only just keeping up with the traffic
and a node doing compaction can no longer handle it.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Can you query Cassandra while it's doing major compaction

2012-02-02 Thread R. Verlangen
It will have a performance penalty, so it would be better to spread the
compactions over a period of time. But Cassandra will still take care of
any reads/writes (within the given timeout).

2012/2/3 myreasoner 

> If every node in the cluster is running major compaction, would it be able
> to
> answer any read request?  And is it wise to write anything to a cluster
> while it's doing major compaction?
>
>
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-you-query-Cassandra-while-it-s-doing-major-compaction-tp7249985p7249985.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Re: Hector + Range query problem

2012-01-18 Thread Philippe
Hi aaron

Nope: I'm using BOP... I forgot to mention it in my original message.

I changed it to a multiget and it works, but I think the range would be more
efficient, so I'd really like to solve this.
Thanks
Le 18 janv. 2012 09:18, "aaron morton"  a écrit :

> Does this help ?
> http://wiki.apache.org/cassandra/FAQ#range_rp
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/01/2012, at 10:58 AM, Philippe wrote:
>
> Hello,
> I've been trying to retrieve rows based on key range but every single time
> I test, Hector retrieves ALL the rows, no matter the range I give it.
> What can I possibly be doing wrong? Thanks.
>
> I'm doing a test on a single-node RF=1 cluster (c* 1.0.5) with one column
> family (I've added & truncated the CF quite a few times during my tests).
> Each row has a single column whose name is the byte value "2". The keys
> are 0,1,2,3 (shifted by a number of bits). The values are 0,1,2,3.
> list in the CLI gives me
>
> Using default limit of 100
> ---
> RowKey: 02
> => (column=02, value=00, timestamp=1326750723079000)
> ---
> RowKey: 010002
> => (column=02, value=01, timestamp=1326750723239000)
> ---
> RowKey: 020002
> => (column=02, value=02, timestamp=1326750723329000)
> ---
> RowKey: 030002
> => (column=02, value=03, timestamp=1326750723416000)
>
> 4 Rows Returned.
>
>
>
> Hector code:
>
>> RangeSlicesQuery query =
>> HFactory.createRangeSlicesQuery(keyspace, keySerializer,
>> columnNameSerializer, BytesArraySerializer
>> .get());
>> query.setColumnFamily(overlay).setKeys(keyStart, keyEnd).setColumnNames((
>> byte)2);
>
> query.execute();
>
>
> The execution log shows
>
> 1359 [main] INFO  com.sensorly.heatmap.drawing.cassandra.CassandraTileDao
>>  - Range query from TileKey [overlayName=UNSET, tilex=0, tiley=0, zoom=2]
>> to TileKey [overlayName=UNSET, tilex=1, tiley=0, zoom=2] => morton codes =
>> [02,010002]
>> getFiles() query returned TileKey [overlayName=UNSET, tilex=0, tiley=0,
>> zoom=2] with 1 columns, morton = 000002
>> getFiles() query returned TileKey [overlayName=UNSET, tilex=1, tiley=0,
>> zoom=2] with 1 columns, morton = 010002
>> getFiles() query returned TileKey [overlayName=UNSET, tilex=0, tiley=1,
>> zoom=2] with 1 columns, morton = 020002
>> getFiles() query returned TileKey [overlayName=UNSET, tilex=1, tiley=1,
>> zoom=2] with 1 columns, morton = 030002
>
> => ALL rows are returned when I really expect it to only return the 1st
> one.
>
>
>
>
>
>


Re: Hector + Range query problem

2012-01-18 Thread aaron morton
Does this help ? 
http://wiki.apache.org/cassandra/FAQ#range_rp

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
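The gist of that FAQ entry: under RandomPartitioner, rows are stored in MD5 token order rather than key order, so a key-range query returns whatever lies between the two keys' tokens. A small sketch of the token mapping (a simplification of how RandomPartitioner derives tokens; plain Java, no Cassandra dependency):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.*;

// Rows under RandomPartitioner are laid out by MD5 token, not raw key order,
// so a key-range query covers a token range. Sorting keys by token shows the
// on-ring order, which rarely matches lexicographic key order. This is a
// simplification of the real token derivation.
public class TokenOrder {

    static BigInteger token(String key) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // Interpret the 16-byte digest as a nonnegative 128-bit integer.
        return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
    }

    // The order in which a range scan would walk these keys on the ring.
    static List<String> storageOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> {
            try {
                return token(a).compareTo(token(b));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        return sorted;
    }
}
```

With ByteOrderedPartitioner, by contrast, storage order is the raw key order, which is why key-range queries are only well-defined there.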

On 17/01/2012, at 10:58 AM, Philippe wrote:

> Hello,
> I've been trying to retrieve rows based on key range but every single time I 
> test, Hector retrieves ALL the rows, no matter the range I give it.
> What can I possibly be doing wrong? Thanks.
> 
> I'm doing a test on a single-node RF=1 cluster (c* 1.0.5) with one column 
> family (I've added & truncated the CF quite a few times during my tests).
> Each row has a single column whose name is the byte value "2". The keys are 
> 0,1,2,3 (shifted by a number of bits). The values are 0,1,2,3.
> list in the CLI gives me
> 
> Using default limit of 100
> ---
> RowKey: 02
> => (column=02, value=00, timestamp=1326750723079000)
> ---
> RowKey: 010002
> => (column=02, value=01, timestamp=1326750723239000)
> ---
> RowKey: 020002
> => (column=02, value=02, timestamp=1326750723329000)
> ---
> RowKey: 030002
> => (column=02, value=03, timestamp=1326750723416000)
> 
> 4 Rows Returned.
> 
> 
> 
> Hector code:
> RangeSlicesQuery query = 
> HFactory.createRangeSlicesQuery(keyspace, keySerializer, 
> columnNameSerializer, BytesArraySerializer
> .get());
> query.setColumnFamily(overlay).setKeys(keyStart, 
> keyEnd).setColumnNames((byte)2);
> query.execute();  
> 
> 
> The execution log shows
> 
> 
> 1359 [main] INFO  com.sensorly.heatmap.drawing.cassandra.CassandraTileDao  - 
> Range query from TileKey [overlayName=UNSET, tilex=0, tiley=0, zoom=2] to 
> TileKey [overlayName=UNSET, tilex=1, tiley=0, zoom=2] => morton codes = 
> [02,010002]
> getFiles() query returned TileKey [overlayName=UNSET, tilex=0, tiley=0, 
> zoom=2] with 1 columns, morton = 02
> getFiles() query returned TileKey [overlayName=UNSET, tilex=1, tiley=0, 
> zoom=2] with 1 columns, morton = 010002
> getFiles() query returned TileKey [overlayName=UNSET, tilex=0, tiley=1, 
> zoom=2] with 1 columns, morton = 020002
> getFiles() query returned TileKey [overlayName=UNSET, tilex=1, tiley=1, 
> zoom=2] with 1 columns, morton = 030002
> => ALL rows are returned when I really expect it to only return the 1st one.
> 
> 
> 
> 
> 



Hector + Range query problem

2012-01-16 Thread Philippe
Hello,
I've been trying to retrieve rows based on key range but every single time
I test, Hector retrieves ALL the rows, no matter the range I give it.
What can I possibly be doing wrong? Thanks.

I'm doing a test on a single-node RF=1 cluster (c* 1.0.5) with one column
family (I've added & truncated the CF quite a few times during my tests).
Each row has a single column whose name is the byte value "2". The keys are
0,1,2,3 (shifted by a number of bits). The values are 0,1,2,3.
list in the CLI gives me

Using default limit of 100
---
RowKey: 02
=> (column=02, value=00, timestamp=1326750723079000)
---
RowKey: 010002
=> (column=02, value=01, timestamp=1326750723239000)
---
RowKey: 020002
=> (column=02, value=02, timestamp=1326750723329000)
---
RowKey: 030002
=> (column=02, value=03, timestamp=1326750723416000)

4 Rows Returned.



Hector code:

> RangeSlicesQuery query =
> HFactory.createRangeSlicesQuery(keyspace, keySerializer,
> columnNameSerializer, BytesArraySerializer
> .get());
> query.setColumnFamily(overlay).setKeys(keyStart, keyEnd).setColumnNames((
> byte)2);

query.execute();


The execution log shows

1359 [main] INFO  com.sensorly.heatmap.drawing.cassandra.CassandraTileDao
>  - Range query from TileKey [overlayName=UNSET, tilex=0, tiley=0, zoom=2]
> to TileKey [overlayName=UNSET, tilex=1, tiley=0, zoom=2] => morton codes =
> [02,010002]
> getFiles() query returned TileKey [overlayName=UNSET, tilex=0, tiley=0,
> zoom=2] with 1 columns, morton = 02
> getFiles() query returned TileKey [overlayName=UNSET, tilex=1, tiley=0,
> zoom=2] with 1 columns, morton = 010002
> getFiles() query returned TileKey [overlayName=UNSET, tilex=0, tiley=1,
> zoom=2] with 1 columns, morton = 020002
> getFiles() query returned TileKey [overlayName=UNSET, tilex=1, tiley=1,
> zoom=2] with 1 columns, morton = 030002

=> ALL rows are returned when I really expect it to only return the 1st one.


simplest example of a query by date range

2011-12-27 Thread Michael Cetrulo
I want to store an ID and a date and I want to retrieve all entries from
dateA up to dateB, what exactly do I need to be able to perform:
select * from my_column_family where date >= dateA and date < dateB;

@so: http://stackoverflow.com/q/8638646/226201
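The usual pattern at the time: secondary indexes can't serve a pure range predicate, so entries are stored as date-ordered columns in a wide row and a column slice is taken from dateA (inclusive) to dateB (exclusive). Conceptually that slice is a subrange over a sorted map; the sketch below models it with a TreeMap (the names and ISO-date keys are assumptions of this sketch, not a Cassandra API):

```java
import java.util.*;

// The wide-row pattern for date ranges, modeled with a sorted map: column
// names in a row are kept sorted, and a slice query is a subrange over that
// order. ISO-8601 date strings are used so that lexicographic order equals
// chronological order -- an assumption of this sketch.
public class DateRangeSlice {

    // Entries with dateA <= date < dateB, mirroring "date >= dateA and date < dateB".
    static SortedMap<String, String> slice(NavigableMap<String, String> row,
                                           String dateA, String dateB) {
        return row.subMap(dateA, true, dateB, false);
    }
}
```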


Re: a query that's killing cassandra

2011-12-12 Thread Philippe
You've got at least one row over 1GB, compacted!
Have you checked whether you are running out of heap?

2011/12/12 Wojtek Kulik 

> Hello everyone!
>
> I have a strange problem with Cassandra (v1.0.5, one node, 8GB, 2xCPU): a
> query asking for each key from a certain (super) CF results in timeout and
> almost dead cassandra after that (it's somewhat alive, but does not return
> any data - has to be restarted).
>
> CF details:
>Column Family: 
>SSTable count: 11
>Space used (live): 4254606435
>Space used (total): 4254606435
>Number of Keys (estimate): 1408
>Memtable Columns Count: 26923
>Memtable Data Size: 25420941
>Memtable Switch Count: 16
>Read Count: 0
>Read Latency: NaN ms.
>Write Count: 39649
>Write Latency: 0.032 ms.
>Pending Tasks: 0
>Bloom Filter False Postives: 0
>Bloom Filter False Ratio: 0.0
>Bloom Filter Space Used: 30416
>Key cache capacity: 24
>Key cache size: 24
>Key cache hit rate: NaN
>Row cache: disabled
>Compacted row minimum size: 125
>Compacted row maximum size: 1155149911
>Compacted row mean size: 38855255
>
> Nothing suspicious in the logs. The problem is fully replicable.
> I've spent some time searching for similar issues - haven't found any.
>
> Are there any debug options I could turn on to find out more? Other
> suggestions/thoughts?
>
> Thanks in advance,
> Wojtek
>


a query that's killing cassandra

2011-12-12 Thread Wojtek Kulik

Hello everyone!

I have a strange problem with Cassandra (v1.0.5, one node, 8GB, 2xCPU): 
a query asking for each key from a certain (super) CF results in timeout 
and almost dead cassandra after that (it's somewhat alive, but does not 
return any data - has to be restarted).


CF details:
Column Family: 
SSTable count: 11
Space used (live): 4254606435
Space used (total): 4254606435
Number of Keys (estimate): 1408
Memtable Columns Count: 26923
Memtable Data Size: 25420941
Memtable Switch Count: 16
Read Count: 0
Read Latency: NaN ms.
Write Count: 39649
Write Latency: 0.032 ms.
Pending Tasks: 0
Bloom Filter False Postives: 0
Bloom Filter False Ratio: 0.0
Bloom Filter Space Used: 30416
Key cache capacity: 24
Key cache size: 24
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 125
Compacted row maximum size: 1155149911
Compacted row mean size: 38855255

Nothing suspicious in the logs. The problem is fully replicable.
I've spent some time searching for similar issues - haven't found any.

Are there any debug options I could turn on to find out more? Other 
suggestions/thoughts?


Thanks in advance,
Wojtek


Varying number of rows coming from same query on same database

2011-11-17 Thread Maxim Potekhin

Hello,

I'm running the same query repeatedly. It's a secondary index query,
done from a Pycassa client. I see that when I iterate the "result" object,
I get a slightly different number of entries when running the test serially.
There are no deletions in the database and no writes; it's static for now.

Any comments will be appreciated.

Maxim



Re: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate McCall
I think you wanted to use Int32Type instead of IntegerType for
creating the indexes. IntegerType is actually representative of
java.math.BigInteger.
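The difference is visible in the raw bytes: Int32Type is a fixed 4-byte big-endian int, while IntegerType is a variable-length java.math.BigInteger encoding, so if the writer and the index disagree on which encoding is in play, lookups miss. A standalone illustration (plain Java, no Cassandra dependency):

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Int32Type stores a fixed 4-byte big-endian int; IntegerType stores a
// variable-length two's-complement java.math.BigInteger. The same logical
// value therefore has different byte representations under the two types,
// so bytes indexed under one encoding never match lookups sent in the other.
public class IntEncodings {

    // Int32Type layout: always 4 bytes, big-endian.
    static byte[] int32Bytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // IntegerType layout: minimal two's-complement (1 byte for small values).
    static byte[] varintBytes(int v) {
        return BigInteger.valueOf(v).toByteArray();
    }
}
```

For the value 49 (the `minute` being queried), the two encodings don't even have the same length, which is consistent with the "fetched null" index scans in the debug log.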

On Tue, Nov 8, 2011 at 12:28 PM, Nate Sammons  wrote:
> Interesting…  if I switch the columns to be UTF8 instead of integers, like
> this:
>
>
>
> create column family IndexTest with
>
>   key_validation_class = UTF8Type
>
>   and comparator = UTF8Type
>
>   and column_metadata = [
>
>   {column_name:year, validation_class:UTF8Type, index_type: KEYS},
>
>   {column_name:month, validation_class:UTF8Type, index_type: KEYS},
>
>   {column_name:day, validation_class:UTF8Type, index_type: KEYS},
>
>   {column_name:hour, validation_class:UTF8Type, index_type: KEYS},
>
>   {column_name:minute, validation_class:UTF8Type, index_type: KEYS},
>
>   {column_name:data, validation_class:UTF8Type}
>
>   ];
>
>
>
>
>
> And change the hector code to use setString(…) instead of setInteger(…).
>
>
>
> Then everything works fine.   Is there a CQL bug with respect to non-string
> columns?
>
>
>
>
>
> Thanks,
>
>
>
> -nate
>
>
>
>
>
>
>
> From: Nate Sammons [mailto:nsamm...@ften.com]
> Sent: Tuesday, November 08, 2011 11:14 AM
>
> To: user@cassandra.apache.org
> Subject: RE: Secondary index issue, unable to query for records that should
> be there
>
>
>
> Note that I had identical behavior using a fresh download of Cassandra 1.0.2
> as of today.
>
>
>
> Thanks,
>
>
>
> -nate
>
>
>
>
>
> From: Nate Sammons [mailto:nsamm...@ften.com]
> Sent: Tuesday, November 08, 2011 10:20 AM
> To: user@cassandra.apache.org
> Subject: RE: Secondary index issue, unable to query for records that should
> be there
>
>
>
> I restarted with logging turned up to DEBUG, and after quite a bit of
> logging during startup, I re-ran a query:
>
>
>
>
>
> get IndexTest where year=2011 and month=1 and day=14 and hour=18 and
> minute=49;
>
>
>
>
>
> produced the following in the logs:
>
>
>
> DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,823 CassandraServer.java (line
> 728) scan
>
> DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line
> 1017) restricted ranges for query [-1,-1] are
> [[-1,160425280223280959086247334056682279392],
> (160425280223280959086247334056682279392,-1]]
>
> DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line
> 1104) scan ranges are
> [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
>
> DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,825 ReadCallback.java (line 77)
> Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
>
> DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,826 StorageProxy.java (line
> 1131) reading org.apache.cassandra.db.IndexScanCommand@7bc203c from
> natebookpro/127.0.1.1
>
> DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 96)
> Primary scan clause is minute
>
> DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 109)
> Expanding slice filter to entire row to cover additional expressions
>
> DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 151)
> Scanning index 'IndexTest.minute EQ 49' starting with
>
> DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 CollationController.java (line
> 189) collectAllData
>
> DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 KeysSearcher.java (line 163)
> fetched null
>
> DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 IndexScanVerbHandler.java (line
> 46) Sending RangeSliceReply{rows=} to 808@natebookpro/127.0.1.1
>
> DEBUG [RequestResponseStage:21] 2011-11-08 10:19:21,829
> ResponseVerbHandler.java (line 44) Processing response on a callback from
> 808@natebookpro/127.0.1.1
>
> DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,829 ReadCallback.java (line 77)
> Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
>
> DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,830 StorageProxy.java (line
> 1131) reading org.apache.cassandra.db.IndexScanCommand@6a25a21d from
> natebookpro/127.0.1.1
>
> DEBUG [ReadStage:47] 2011-11-08 10:19:21,831 KeysSearcher.java (line 96)
> Primary scan clause is minute
>
> DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 109)
> Expanding slice filter to entire row to cover additional expressions
>
> DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 151)
> Scanning index 'IndexTest.minute EQ 49' starting with
>
> DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 CollationController.java (line
> 189) collectAllDa

RE: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate Sammons
Interesting...  if I switch the columns to be UTF8 instead of integers, like 
this:

create column family IndexTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
  {column_name:year, validation_class:UTF8Type, index_type: KEYS},
  {column_name:month, validation_class:UTF8Type, index_type: KEYS},
  {column_name:day, validation_class:UTF8Type, index_type: KEYS},
  {column_name:hour, validation_class:UTF8Type, index_type: KEYS},
  {column_name:minute, validation_class:UTF8Type, index_type: KEYS},
  {column_name:data, validation_class:UTF8Type}
  ];


And change the hector code to use setString(...) instead of setInteger(...).

Then everything works fine.   Is there a CQL bug with respect to non-string 
columns?


Thanks,

-nate



From: Nate Sammons [mailto:nsamm...@ften.com]
Sent: Tuesday, November 08, 2011 11:14 AM
To: user@cassandra.apache.org
Subject: RE: Secondary index issue, unable to query for records that should be 
there

Note that I had identical behavior using a fresh download of Cassandra 1.0.2 as 
of today.

Thanks,

-nate


From: Nate Sammons [mailto:nsamm...@ften.com]
Sent: Tuesday, November 08, 2011 10:20 AM
To: user@cassandra.apache.org
Subject: RE: Secondary index issue, unable to query for records that should be 
there

I restarted with logging turned up to DEBUG, and after quite a bit of logging 
during startup, I re-ran a query:


get IndexTest where year=2011 and month=1 and day=14 and hour=18 and minute=49;


produced the following in the logs:

DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,823 CassandraServer.java (line 728) 
scan
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1017) 
restricted ranges for query [-1,-1] are 
[[-1,160425280223280959086247334056682279392], 
(160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1104) 
scan ranges are 
[-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,825 ReadCallback.java (line 77) 
Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,826 StorageProxy.java (line 1131) 
reading 
org.apache.cassandra.db.IndexScanCommand@7bc203c from natebookpro/127.0.1.1
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 96) 
Primary scan clause is minute
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 109) 
Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 151) 
Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 CollationController.java (line 
189) collectAllData
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 KeysSearcher.java (line 163) 
fetched null
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 IndexScanVerbHandler.java (line 
46) Sending RangeSliceReply{rows=} to 
808@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:21] 2011-11-08 10:19:21,829 
ResponseVerbHandler.java (line 44) Processing response on a callback from 
808@natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,829 ReadCallback.java (line 77) 
Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,830 StorageProxy.java (line 1131) 
reading 
org.apache.cassandra.db.IndexScanCommand@6a25a21d from natebookpro/127.0.1.1
DEBUG [ReadStage:47] 2011-11-08 10:19:21,831 KeysSearcher.java (line 96) 
Primary scan clause is minute
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 109) 
Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 151) 
Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 CollationController.java (line 
189) collectAllData
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 163) 
fetched null
DEBUG [ReadStage:47] 2011-11-08 10:19:21,833 IndexScanVerbHandler.java (line 
46) Sending RangeSliceReply{rows=} to 
809@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:22] 2011-11-08 10:19:21,834 
ResponseVerbHandler.java (line 44) Processing response on a callback from 
809@natebookpro/127.0.1.1



Whereas a direct read of a key using "get IndexTest[2011-1-14-18-49--1];" 
produced a result, and the following in the logs:

DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,153 CassandraSer

RE: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate Sammons
Note that I had identical behavior using a fresh download of Cassandra 1.0.2 as 
of today.

Thanks,

-nate


From: Nate Sammons [mailto:nsamm...@ften.com]
Sent: Tuesday, November 08, 2011 10:20 AM
To: user@cassandra.apache.org
Subject: RE: Secondary index issue, unable to query for records that should be 
there

I restarted with logging turned up to DEBUG, and after quite a bit of logging 
during startup, I re-ran a query:


get IndexTest where year=2011 and month=1 and day=14 and hour=18 and minute=49;


produced the following in the logs:

DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,823 CassandraServer.java (line 728) 
scan
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1017) 
restricted ranges for query [-1,-1] are 
[[-1,160425280223280959086247334056682279392], 
(160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1104) 
scan ranges are 
[-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,825 ReadCallback.java (line 77) 
Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,826 StorageProxy.java (line 1131) 
reading 
org.apache.cassandra.db.IndexScanCommand@7bc203c from natebookpro/127.0.1.1
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 96) 
Primary scan clause is minute
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 109) 
Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 151) 
Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 CollationController.java (line 
189) collectAllData
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 KeysSearcher.java (line 163) 
fetched null
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 IndexScanVerbHandler.java (line 
46) Sending RangeSliceReply{rows=} to 
808@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:21] 2011-11-08 10:19:21,829 
ResponseVerbHandler.java (line 44) Processing response on a callback from 
808@natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,829 ReadCallback.java (line 77) 
Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,830 StorageProxy.java (line 1131) 
reading 
org.apache.cassandra.db.IndexScanCommand@6a25a21d from natebookpro/127.0.1.1
DEBUG [ReadStage:47] 2011-11-08 10:19:21,831 KeysSearcher.java (line 96) 
Primary scan clause is minute
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 109) 
Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 151) 
Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 CollationController.java (line 
189) collectAllData
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 163) 
fetched null
DEBUG [ReadStage:47] 2011-11-08 10:19:21,833 IndexScanVerbHandler.java (line 
46) Sending RangeSliceReply{rows=} to 
809@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:22] 2011-11-08 10:19:21,834 
ResponseVerbHandler.java (line 44) Processing response on a callback from 
809@natebookpro/127.0.1.1



Whereas a direct read of a key using "get IndexTest[2011-1-14-18-49--1];" 
produced a result, and the following in the logs:

DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,153 CassandraServer.java (line 323) 
get_slice
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 623) 
Command/ConsistencyLevel is SliceFromReadCommand(table='Test', 
key='323031312d312d31342d31382d34392d2d31', 
column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', 
columnName='null')', start='', finish='', reversed=false, count=100)/ONE
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 ReadCallback.java (line 77) 
Blockfor/repair is 1/true; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 639) 
reading data locally
DEBUG [ReadStage:37] 2011-11-08 10:11:20,160 StorageProxy.java (line 792) 
LocalReadRunnable reading SliceFromReadCommand(table='Test', 
key='323031312d312d31342d31382d34392d2d31', 
column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', 
columnName='null')', start='', finish='', reve

RE: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate Sammons
I restarted with logging turned up to DEBUG, and after quite a bit of logging 
during startup, I re-ran a query:


get IndexTest where year=2011 and month=1 and day=14 and hour=18 and minute=49;


produced the following in the logs:

DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,823 CassandraServer.java (line 728) 
scan
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1017) 
restricted ranges for query [-1,-1] are 
[[-1,160425280223280959086247334056682279392], 
(160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1104) 
scan ranges are 
[-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,825 ReadCallback.java (line 77) 
Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,826 StorageProxy.java (line 1131) 
reading org.apache.cassandra.db.IndexScanCommand@7bc203c from 
natebookpro/127.0.1.1
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 96) 
Primary scan clause is minute
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 109) 
Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 151) 
Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 CollationController.java (line 
189) collectAllData
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 KeysSearcher.java (line 163) 
fetched null
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 IndexScanVerbHandler.java (line 
46) Sending RangeSliceReply{rows=} to 808@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:21] 2011-11-08 10:19:21,829 
ResponseVerbHandler.java (line 44) Processing response on a callback from 
808@natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,829 ReadCallback.java (line 77) 
Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,830 StorageProxy.java (line 1131) 
reading org.apache.cassandra.db.IndexScanCommand@6a25a21d from 
natebookpro/127.0.1.1
DEBUG [ReadStage:47] 2011-11-08 10:19:21,831 KeysSearcher.java (line 96) 
Primary scan clause is minute
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 109) 
Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 151) 
Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 CollationController.java (line 
189) collectAllData
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 163) 
fetched null
DEBUG [ReadStage:47] 2011-11-08 10:19:21,833 IndexScanVerbHandler.java (line 
46) Sending RangeSliceReply{rows=} to 809@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:22] 2011-11-08 10:19:21,834 
ResponseVerbHandler.java (line 44) Processing response on a callback from 
809@natebookpro/127.0.1.1



Whereas a direct read of a key using "get IndexTest[2011-1-14-18-49--1];" 
produced a result, and the following in the logs:

DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,153 CassandraServer.java (line 323) 
get_slice
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 623) 
Command/ConsistencyLevel is SliceFromReadCommand(table='Test', 
key='323031312d312d31342d31382d34392d2d31', 
column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', 
columnName='null')', start='', finish='', reversed=false, count=100)/ONE
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 ReadCallback.java (line 77) 
Blockfor/repair is 1/true; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 639) 
reading data locally
DEBUG [ReadStage:37] 2011-11-08 10:11:20,160 StorageProxy.java (line 792) 
LocalReadRunnable reading SliceFromReadCommand(table='Test', 
key='323031312d312d31342d31382d34392d2d31', 
column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', 
columnName='null')', start='', finish='', reversed=false, count=100)
DEBUG [ReadStage:37] 2011-11-08 10:11:20,161 CollationController.java (line 
189) collectAllData
DEBUG [ReadStage:37] 2011-11-08 10:11:20,162 SliceQueryFilter.java (line 123) 
collecting 0 of 100: data:false:512@1320769510502017
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) 
collecting 1 of 100: day:false:4@1320769510502014
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) 
collecting 2 of 100: hour:false:4@1320769510502015
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) 
c

RE: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate Sammons
Here is a simple test that shows the problem.  My setup is:


-  DSE 1.0.3 on Ubuntu 11.04, JDK 1.6.0_29 on x86_64, installed from 
the DataStax debian repo (yesterday)

-  Hector 1.0-1 (from maven)

Attached is a CLI file to create the keyspace and CF, and a java file to insert 
data and do some queries.


This creates the following CF:

create column family IndexTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
  {column_name:year, validation_class:IntegerType, index_type: KEYS},
  {column_name:month, validation_class:IntegerType, index_type: KEYS},
  {column_name:day, validation_class:IntegerType, index_type: KEYS},
  {column_name:hour, validation_class:IntegerType, index_type: KEYS},
  {column_name:minute, validation_class:IntegerType, index_type: KEYS},
  {column_name:data, validation_class:UTF8Type}
  ];


Then inserts 5 rows per minute value, with the following values for 
year/month/day/hour/minute:

Year: 2011
Month: 1, 2
Day: 1-15
Hour: 1-23
Minute: 1-59

For a total of 203,550 rows.  For queries it just picks some known values for 
year/month/day/hour/minute at random and looks for rows; there should be 5 rows 
per combination.

Row keys are of the form YEAR-MONTH-DAY-HOUR-MINUTE-NUM (where NUM is 1-5).
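The stated total follows directly from those value ranges; a quick sanity check (plain Python, numbers taken from the description above):

```python
# 5 rows are inserted per (year, month, day, hour, minute) combination.
years, months, days, hours, minutes = 1, 2, 15, 23, 59
rows_per_combination = 5

total = years * months * days * hours * minutes * rows_per_combination
assert total == 203_550  # matches the 203,550 rows quoted above
```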


Now once that data is inserted, using the CLI I can find records such as the 
following:


[default@Test] get IndexTest[2011-1-8-18-30--1];
=> (column=data, 
value=xvktwirapi0qs0ta29w9rchbdc2omsuv0k2chjqp9pmaodlj9ngecllaa8eq3nnx66p591b2a06mry4rpsvkd54ji5pbxikpc6mxj4czi4nuuxgoasibjd5yk65hdtqe8a0uq3yxnw81dgq6hkx8wnbs177rwo51xtkwuhwizoc0gul92pvo6tfivjgdschd9fjzfu4v1d1uxhih3argr1mp4i1h6fqybfv2utlzdzzqczq3ruu90647prrnqwdw1zqmd46ia175a929ltx2hoz8sv6rs817zm2myhp3wekfk3flnuniqgtpth7g5fns8q3oc8qde5btivt1j99gc1h2kxjbek1p448t1hs91lh9r6yrg1douj53sn7d81bnwp4nnbmz01dbr46fae1b9ter0zljet2nl1x751no6pdt64k2mdh0un01gerfihak6vn0wdvgzuv9soji3pwgnffkw2zvm5q0jlp1uf9nmy7gzswydpxwtvc35c6jw64d,
 timestamp=1320769482652005)
=> (column=day, value=8, timestamp=1320769482652002)
=> (column=hour, value=18, timestamp=1320769482652003)
=> (column=minute, value=30, timestamp=1320769482652004)
=> (column=month, value=1, timestamp=1320769482652001)
=> (column=year, value=2011, timestamp=1320769482652000)
Returned 6 results.


However, an indexed query via the CLI for that same record fails:

[default@Test] get IndexTest where year=2011 and month=1 and day=8 and hour=18 
and minute=30;

0 Row Returned.
[default@Test] get IndexTest where year=2011 and month=1 and day=8 and hour=18;

0 Row Returned.
[default@Test] get IndexTest where year=2011 and month=1 and day=8;

0 Row Returned.
[default@Test] get IndexTest where year=2011 and month=1;


Similar results using CQLSH:

cqlsh> select * from IndexTest where year=2011 and month=1 and day=8 and 
hour=18 and minute=30;
cqlsh> select * from IndexTest where year=2011 and month=1 and day=8 and 
hour=18;
cqlsh> select * from IndexTest where year=2011 and month=1 and day=8;

(no results in any of those cases).




However, some data does show up through CQL (I omitted the column data for 
brevity):

[default@Test] get IndexTest where year=2011 and month=2 and day=8 and hour=18 
and minute=30;
---
RowKey: 2011-2-8-18-30--1
---
RowKey: 2011-2-8-18-30--4
---
RowKey: 2011-2-8-18-30--5
---
RowKey: 2011-2-8-18-30--2
---
RowKey: 2011-2-8-18-30--3

5 Rows Returned.


So it seems like (in this case) month=1 is not working, but month=2 does work 
(along with the other parts of the expression).  I haven't tried this a bunch of 
times to see if this is always the case, but it seems to be.


When running those queries using Hector, the QueryResult's get() method 
(inspected in the debugger) returns null when it should contain rows.



Thanks,

-nate



From: Jake Luciani [mailto:jak...@gmail.com]
Sent: Tuesday, November 08, 2011 8:56 AM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be 
there

Hi Nate,

Could you try running it with DEBUG logging enabled? It will give more 
insight into what's going on.

-Jake

On Tue, Nov 8, 2011 at 3:45 PM, Nate Sammons <nsamm...@ften.com> wrote:
This is against a single server, not a cluster.  Replication factor for the 
keyspace is set to 1; CL is the default for Hector, which I think is QUORUM.

I'm trying to get a simple test together that shows this.  Does anyone know if 
multiple indexes like this are efficient?

Thanks,

-nate


From: Riyad Kalla [mailto:rka...@gmail.com]
Sent: Monday, November 07, 2011 4:31 PM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be 
there

Nate, is this all against a

Re: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Jake Luciani
Hi Nate,

Could you try running it with DEBUG logging enabled? It will give more
insight into what's going on.

-Jake


On Tue, Nov 8, 2011 at 3:45 PM, Nate Sammons  wrote:

> This is against a single server, not a cluster.  Replication factor for
> the keyspace is set to 1; CL is the default for Hector, which I think is
> QUORUM.
>
> I'm trying to get a simple test together that shows this.  Does anyone
> know if multiple indexes like this are efficient?
>
> Thanks,
>
> -nate
>
> From: Riyad Kalla [mailto:rka...@gmail.com]
> Sent: Monday, November 07, 2011 4:31 PM
> To: user@cassandra.apache.org
> Subject: Re: Secondary index issue, unable to query for records that
> should be there
>
> Nate, is this all against a single Cassandra server, or do you have a ring
> setup? If you do have a ring setup, what is your replication factor set to?
> Also what ConsistencyLevel are you writing with when storing the values?
>
> -R
>
> On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons  wrote:
>
> Hello,
>
> I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got
> a CF with several secondary indexes to try out some options.  Right now I
> have the following to create my CF using the CLI:
>
> create column family MyTest with
>   key_validation_class = UTF8Type
>   and comparator = UTF8Type
>   and column_metadata = [
>   -- absolute timestamp for this message, also indexed
>      year/month/day/hour/minute
>   -- index these as they are low cardinality
>   {column_name:messageTimestamp, validation_class:LongType},
>   {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
>   {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
>   {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
>   {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
>   {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},
>
>   ... other non-indexed columns defined
>
>   ];
>
> So when I insert data, I calculate a year/month/day/hour/minute and set
> these values on a Hector ColumnFamilyUpdater instance and update that way.
> Then later I can query from the command line with CQL such as:
>
> get MyTest where messageYear=2011 and messageMonth=6 and
> messageDay=1 and messageHour=13 and messageMinute=44;
>
> etc.  This generally works, however at some point queries that I know
> should return data no longer return any rows.
>
> So for instance, part way through my test (inserting 250K rows), I can
> query for what should be there and get data back such as the above query,
> but later that same query returns 0 rows.  Similarly, with fewer clauses in
> the expression, like this:
>
> get MyTest where messageYear=2011 and messageMonth=6;
>
> will also return 0 rows.
>
> ???
>
> Any idea what could be going wrong?  I'm not getting any exceptions in my
> client during the write, and I don't see anything in the logs (no errors
> anyway).
>
> A second question: is what I'm doing insane?  I'm not sure that
> performance on CQL queries with multiple indexed columns is good (does
> Cassandra intelligently use all available indexes on these queries?)
>
> Thanks,
>
> -nate



-- 
http://twitter.com/tjake


RE: Secondary index issue, unable to query for records that should be there

2011-11-08 Thread Nate Sammons
This is against a single server, not a cluster.  Replication factor for the 
keyspace is set to 1; CL is the default for Hector, which I think is QUORUM.

I'm trying to get a simple test together that shows this.  Does anyone know if 
multiple indexes like this are efficient?

Thanks,

-nate


From: Riyad Kalla [mailto:rka...@gmail.com]
Sent: Monday, November 07, 2011 4:31 PM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be 
there

Nate, is this all against a single Cassandra server, or do you have a ring 
setup? If you do have a ring setup, what is your replication factor set to? Also 
what ConsistencyLevel are you writing with when storing the values?

-R
On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons <nsamm...@ften.com> wrote:
Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF 
with several secondary indexes to try out some options.  Right now I have the 
following to create my CF using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
  -- absolute timestamp for this message, also indexed 
year/month/day/hour/minute
  -- index these as they are low cardinality
  {column_name:messageTimestamp, validation_class:LongType},
  {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageMonth, validation_class:IntegerType, index_type: 
KEYS},
  {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageMinute, validation_class:IntegerType, index_type: 
KEYS},

... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these 
values on a Hector ColumnFamilyUpdater instance and update that way.  Then 
later I can query from the command line with CQL such as:

get MyTest where messageYear=2011 and messageMonth=6 and 
messageDay=1 and messageHour=13 and messageMinute=44;

etc.  This generally works, however at some point queries that I know should 
return data no longer return any rows.

So for instance, part way through my test (inserting 250K rows), I can query 
for what should be there and get data back such as the above query, but later 
that same query returns 0 rows.  Similarly, with fewer clauses in the 
expression, like this:

get MyTest where messageYear=2011 and messageMonth=6;

Will also return 0 rows.


???
Any idea what could be going wrong?  I'm not getting any exceptions in my 
client during the write, and I don't see anything in the logs (no errors 
anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on 
CQL queries with multiple indexed columns is good (does Cassandra intelligently 
use all available indexes on these queries?)



Thanks,

-nate



Re: Secondary index issue, unable to query for records that should be there

2011-11-07 Thread Riyad Kalla
Nate, is this all against a single Cassandra server, or do you have a ring
setup? If you do have a ring setup, what is your replication factor set to?
Also what ConsistencyLevel are you writing with when storing the values?

-R

On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons  wrote:

> Hello,
>
> I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got
> a CF with several secondary indexes to try out some options.  Right now I
> have the following to create my CF using the CLI:
>
> create column family MyTest with
>   key_validation_class = UTF8Type
>   and comparator = UTF8Type
>   and column_metadata = [
>   -- absolute timestamp for this message, also indexed
>      year/month/day/hour/minute
>   -- index these as they are low cardinality
>   {column_name:messageTimestamp, validation_class:LongType},
>   {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
>   {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
>   {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
>   {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
>   {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},
>
>   ... other non-indexed columns defined
>
>   ];
>
> So when I insert data, I calculate a year/month/day/hour/minute and set
> these values on a Hector ColumnFamilyUpdater instance and update that way.
> Then later I can query from the command line with CQL such as:
>
> get MyTest where messageYear=2011 and messageMonth=6 and
> messageDay=1 and messageHour=13 and messageMinute=44;
>
> etc.  This generally works, however at some point queries that I know
> should return data no longer return any rows.
>
> So for instance, part way through my test (inserting 250K rows), I can
> query for what should be there and get data back such as the above query,
> but later that same query returns 0 rows.  Similarly, with fewer clauses in
> the expression, like this:
>
> get MyTest where messageYear=2011 and messageMonth=6;
>
> will also return 0 rows.
>
> ???
>
> Any idea what could be going wrong?  I'm not getting any exceptions in my
> client during the write, and I don't see anything in the logs (no errors
> anyway).
>
> A second question: is what I'm doing insane?  I'm not sure that
> performance on CQL queries with multiple indexed columns is good (does
> Cassandra intelligently use all available indexes on these queries?)
>
> Thanks,
>
> -nate
>


Secondary index issue, unable to query for records that should be there

2011-11-07 Thread Nate Sammons
Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF 
with several secondary indexes to try out some options.  Right now I have the 
following to create my CF using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
  -- absolute timestamp for this message, also indexed 
year/month/day/hour/minute
  -- index these as they are low cardinality
  {column_name:messageTimestamp, validation_class:LongType},
  {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageMonth, validation_class:IntegerType, index_type: 
KEYS},
  {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageMinute, validation_class:IntegerType, index_type: 
KEYS},

... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these 
values on a Hector ColumnFamilyUpdater instance and update that way.  Then 
later I can query from the command line with CQL such as:

get MyTest where messageYear=2011 and messageMonth=6 and 
messageDay=1 and messageHour=13 and messageMinute=44;

etc.  This generally works, however at some point queries that I know should 
return data no longer return any rows.

So for instance, part way through my test (inserting 250K rows), I can query 
for what should be there and get data back such as the above query, but later 
that same query returns 0 rows.  Similarly, with fewer clauses in the 
expression, like this:

get MyTest where messageYear=2011 and messageMonth=6;

Will also return 0 rows.


???
Any idea what could be going wrong?  I'm not getting any exceptions in my 
client during the write, and I don't see anything in the logs (no errors 
anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on 
CQL queries with multiple indexed columns is good (does Cassandra intelligently 
use all available indexes on these queries?)



Thanks,

-nate


Node OOM, Slice query - missing data?

2011-11-02 Thread Thomas Richter

Hi there,

We run a 3-node cluster on 0.7.8 with replication factor 3 for all 
keyspaces.


We store external->internal key mappings in a column family with one row 
for each customer. The largest row contains about 200k columns.
When we import external data we load the whole row and map external to 
internal keys. Loading is done like this:

SliceQuery<String, Key, Mapping> q =
    createSliceQuery(
        keyspace,
        getNewStringSerializer(),
        KeySerializer.get(),
        MappingSerializer.get());
q.setColumnFamily(CF_MAPPING);
q.setKey(key);
final int chunkSize = 1000;
Key start = null;
do {
    // start is inclusive, so every chunk after the first begins with
    // the column that ended the previous chunk
    q.setRange(start, null, false, chunkSize);
    QueryResult<ColumnSlice<Key, Mapping>> r = q.execute();
    final List<HColumn<Key, Mapping>> columns = r.get().getColumns();
    for (final HColumn<Key, Mapping> c : columns) {
        ... (add to list)
    }
    if (columns.size() == chunkSize) {
        start = columns.get(columns.size() - 1).getName();
    } else {
        start = null;
    }
} while (start != null);
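The loop above pages through a row by restarting each slice at the last column seen. One thing to watch: the Thrift slice start is inclusive, so every chunk after the first begins with the column that ended the previous chunk. A plain-Python sketch of the same paging logic (a sorted list stands in for the row; this is an illustration, not Hector code) shows where that duplicate has to be skipped:

```python
import bisect

def page_columns(columns, chunk_size=1000):
    """columns: sorted list of column names; yield each name exactly once."""
    start = None
    first_chunk = True
    while True:
        lo = 0 if start is None else bisect.bisect_left(columns, start)
        chunk = columns[lo:lo + chunk_size]
        if not chunk:
            return
        # The start column is inclusive, so every chunk after the first
        # repeats the previous chunk's last column: drop that duplicate.
        yield from (chunk if first_chunk else chunk[1:])
        if len(chunk) < chunk_size:
            return
        start = chunk[-1]
        first_chunk = False

names = ["col%06d" % i for i in range(2500)]
assert list(page_columns(names, 1000)) == names
```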

The code ran fine for several months. Some days ago the code above 
returned far fewer columns than expected (e.g. 1010 instead of 198k, or 
14k instead of 44k).

Is there something wrong with the code?
As a result we created and stored new mappings and now everything is 
fine again.


We realized that we had trouble with one node right before that 
behaviour appeared, so we think that's the cause.


The node went down because of OOM, and during restart another OOM killed 
the node again. One or two OOMs later the node started without any 
trouble and all seemed fine. Some hours later the next import process 
ran and then we could not read all the expected data.


As this happened two days ago, at least a minor compaction has taken 
place, so all sstables written after the node crash have been merged.


Is this a known issue, or can somebody imagine what the cause is? If we 
are lucky we have a backup from after the crash and before the "repair", 
but if not, I don't have any ideas left for how to figure out what happened.


So any idea about how to dig deeper into this is very welcome.

Best,

Thomas


Re: super sub slice query?

2011-10-27 Thread Caleb Rackliffe
I had the same question you did, I think.  Below is as far as I got with Hector…

I have a column family of super-columns with long names.  The columns in each 
super-column also have long names.  I'm using Hector, and what I want to do is 
get the last column in each super-column, for a range of super-columns.  I was 
able to get the last column in a column family  like this…


Cluster cluster = HFactory.getOrCreateCluster("Cortex", config);

Keyspace keyspace = HFactory.createKeyspace("Products", cluster);

RangeSlicesQuery<String, String, String> rangeSlicesQuery =
    HFactory.createRangeSlicesQuery(keyspace, StringSerializer.get(),
        StringSerializer.get(), StringSerializer.get());

rangeSlicesQuery.setColumnFamily("Attributes");

rangeSlicesQuery.setKeys("id0", "id0");

rangeSlicesQuery.setRange("", "", true, 1);  // reversed, count=1: last column

QueryResult<OrderedRows<String, String, String>> result =
    rangeSlicesQuery.execute();


…but no luck with the additional dimension.
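Conceptually, what's wanted is a reversed sub-slice with count 1, applied per super-column across a range of super-columns; as far as I can tell the Thrift API of that era had no single call for that extra dimension, which is why the query stops one level short. A plain-Python model of the desired result (a dict of dicts stands in for the super-column family; hypothetical names, not Hector code):

```python
def last_subcolumn_per_supercolumn(row, sc_start, sc_finish):
    """row: {super_column_name: {sub_column_name: value}} with sortable names.
    Returns the last (highest-named) sub-column of each super-column in
    [sc_start, sc_finish] -- i.e. a reversed sub-slice with count=1."""
    out = {}
    for sc_name in sorted(row):
        if sc_start <= sc_name <= sc_finish:
            last = max(row[sc_name])  # first element in reversed order
            out[sc_name] = (last, row[sc_name][last])
    return out

row = {10: {1: "a", 2: "b"}, 20: {5: "x", 9: "y"}, 30: {7: "z"}}
assert last_subcolumn_per_supercolumn(row, 10, 20) == {10: (2, "b"),
                                                       20: (9, "y")}
```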

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Guy Incognito <dnd1...@gmail.com>
Reply-To: user@cassandra.apache.org
Date: Thu, 27 Oct 2011 06:34:08 -0400
To: user@cassandra.apache.org
Subject: super sub slice query?

is there such a thing?  a query that runs against a SC family and returns a 
subset of subcolumns from a set of super-columns?

is there a way to have eg a slice query (or super slice query) only return the 
column names, rather than the value as well?


super sub slice query?

2011-10-27 Thread Guy Incognito
is there such a thing?  a query that runs against a SC family and 
returns a subset of subcolumns from a set of super-columns?


is there a way to have eg a slice query (or super slice query) only 
return the column names, rather than the value as well?


Re: reverse range query performance

2011-09-26 Thread aaron morton
Does not matter too much, but are you looking to get all the columns for some 
known keys (get_slice, multiget_slice)? Or are you getting the columns for keys 
within a range (get_range_slices)? 

If you do a reversed query, the server will skip to the "end" of the 
column range.  Here is some info I wrote about how the different slice 
predicates work: http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/


Hope that helps. 
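The "skip to the end" point can be modelled in a few lines of plain Python (a sorted list stands in for one row's columns; the parameter names mirror Thrift's SliceRange, but this is a sketch of the semantics, not Cassandra code):

```python
import bisect

def slice_columns(row, start, finish, reversed=False, count=100):
    """row: sorted list of column names; '' means unbounded."""
    if not reversed:
        lo = bisect.bisect_left(row, start) if start else 0
        hi = bisect.bisect_right(row, finish) if finish else len(row)
        return row[lo:hi][:count]
    # Reversed: start >= finish; position at 'start' and walk backwards,
    # so columns before 'finish' are never touched.
    hi = bisect.bisect_right(row, start) if start else len(row)
    lo = bisect.bisect_left(row, finish) if finish else 0
    return row[lo:hi][::-1][:count]

row = ["%04d" % i for i in range(1, 1501)]        # columns 0001..1500
result = slice_columns(row, "1500", "1400", reversed=True, count=100)
assert result[0] == "1500" and result[-1] == "1401"
```

In this model the reversed read touches only the tail of the row, which is the optimization Ramesh asks about in the quoted question.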

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 27/09/2011, at 5:51 AM, Ramesh Natarajan wrote:

> Hi,
> 
>  I am trying to use the range query to retrieve a bunch of columns in reverse 
> order. The API documentation has a parameter bool reversed which should 
> return the results when queried using keys in a reverse order.  
> 
> Lets say my row has about 1500 columns with column names 1 to 1500, and I 
> query asking for columns  1500 (start ) - 1400 (end ) with reverse set to 
> true.
> 
> Does cassandra read the entire row  1 - 1500 columns and then return the 
> result 1400 - 1500 or it is optimized to look directly into the 1400 - 1500 
> columns?
> 
> thanks
> Ramesh
> 
> 
> SliceRange
> A SliceRange is a structure that stores basic range, ordering and limit 
> information for a query that will return multiple columns. It could be 
> thought of as Cassandra's version of LIMIT and ORDER BY.
> 
> start (binary, default: n/a, required: Y)
>   The column name to start the slice with. This attribute is not required, 
>   though there is no default value, and can be safely set to '', i.e., an 
>   empty byte array, to start with the first column name. Otherwise, it must 
>   be a valid value under the rules of the Comparator defined for the given 
>   ColumnFamily.
> finish (binary, default: n/a, required: Y)
>   The column name to stop the slice at. This attribute is not required, 
>   though there is no default value, and can be safely set to an empty byte 
>   array to not stop until count results are seen. Otherwise, it must also 
>   be a valid value to the ColumnFamily Comparator.
> reversed (bool, default: false, required: Y)
>   Whether the results should be ordered in reversed order. Similar to 
>   ORDER BY blah DESC in SQL.
> count (integer, default: 100, required: Y)
>   How many columns to return. Similar to LIMIT 100 in SQL. May be 
>   arbitrarily large, but Thrift will materialize the whole result into 
>   memory before returning it to the client, so be aware that you may be 
>   better served by iterating through slices by passing the last value of 
>   one call in as the start of the next instead of increasing count 
>   arbitrarily large.



reverse range query performance

2011-09-26 Thread Ramesh Natarajan
Hi,

I am trying to use the range query to retrieve a bunch of columns in
reverse order. The API documentation has a parameter, bool reversed, which
should return the results in reverse order.

Lets say my row has about 1500 columns with column names 1 to 1500, and I
query asking for columns  1500 (start ) - 1400 (end ) with reverse set to
true.

Does cassandra read the entire row  1 - 1500 columns and then return the
result 1400 - 1500 or it is optimized to look directly into the 1400 - 1500
columns?

thanks
Ramesh


SliceRange

A SliceRange is a structure that stores basic range, ordering and limit
information for a query that will return multiple columns. It could be
thought of as Cassandra's version of LIMIT and ORDER BY.

start (binary, default: n/a, required: Y)
  The column name to start the slice with. This attribute is not required,
  though there is no default value, and can be safely set to '', i.e., an
  empty byte array, to start with the first column name. Otherwise, it must
  be a valid value under the rules of the Comparator defined for the given
  ColumnFamily.

finish (binary, default: n/a, required: Y)
  The column name to stop the slice at. This attribute is not required,
  though there is no default value, and can be safely set to an empty byte
  array to not stop until count results are seen. Otherwise, it must also
  be a valid value to the ColumnFamily Comparator.

reversed (bool, default: false, required: Y)
  Whether the results should be ordered in reversed order. Similar to
  ORDER BY blah DESC in SQL.

count (integer, default: 100, required: Y)
  How many columns to return. Similar to LIMIT 100 in SQL. May be
  arbitrarily large, but Thrift will materialize the whole result into
  memory before returning it to the client, so be aware that you may be
  better served by iterating through slices by passing the last value of
  one call in as the start of the next instead of increasing count
  arbitrarily large.
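The four fields can be collected into a small stand-in struct to make the defaults concrete (a hedged sketch: field names and defaults mirror the table above, but this is not the real Thrift-generated class):

```python
from dataclasses import dataclass

@dataclass
class SliceRange:
    start: bytes = b""      # b"" = start at the first column
    finish: bytes = b""     # b"" = stop only when count columns are seen
    reversed: bool = False  # like ORDER BY ... DESC
    count: int = 100        # like LIMIT 100

# First page of up to 500 columns, newest-first:
r = SliceRange(reversed=True, count=500)
assert r.start == b"" and r.reversed and r.count == 500
```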


Re: Why no need to query all nodes on secondary index lookup?

2011-09-06 Thread Kaj Magnus Lindberg
Hi Jonathan

Thanks for the explanation

Thanks, KajMagnus

On Mon, Sep 5, 2011 at 11:05 PM, Jonathan Ellis  wrote:
> The first node can answer the question as long as you've requested
> fewer rows than the first node has on it.  Hence the "low cardinality"
> point in what you quoted.
>
> On Sat, Sep 3, 2011 at 5:00 AM, Kaj Magnus Lindberg
>  wrote:
>> Hello Anyone
>>
>> I have a follow up question on a question from February 2011. In
>> short, I wonder why one won't have to query all Cassandra nodes when
>> doing a secondary index lookup -- although each node only indexes data
>> that it holds locally.
>>
>> The question and answer was:
>>  ( http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html  )
>> === Question ===
>> As far as I understand automatic secondary indexes are generated for
>> node local data.
>>   In this case query by secondary index involve all nodes storing part of
>> column family to get results (?) so (if i am right) if data is spread across
>> 50 nodes then 50 nodes are involved in single query?
>> [...]
>> === Answer ===
>> In practice, local secondary indexes scale to {RF * the limit of a single
>> machine} for -low cardinality- values (ex: users living in a certain state)
>> since the first node is likely to be able to answer your question. This also
>> means they are good for performing filtering for analytics.
>> [...]
>>
>> === Now I wonder ===
>> Why would the first node be likely to be able to answer the question?
>> It stores only index entries for users on that particular machine,
>>     (says http://wiki.apache.org/cassandra/SecondaryIndexes:
>>     "Each node only indexes data that it holds locally" )
>> but users might be stored by user name? And would thus be stored on
>> many different machines? Even if they happen to live in the same
>> state?
>>
>> Why won't the client need to query the indexes of [all servers that
>> store info on users] to find all relevant users, when doing a user
>> property lookup?
>>
>>
>> Best regards, KajMagnus
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
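The "low cardinality" argument can be made concrete with a small plain-Python model (a sketch, not Cassandra internals: rows are partitioned across nodes by key, each node indexes only its local rows, and the coordinator asks nodes one at a time):

```python
from collections import defaultdict

NODES = 4

def node_for(key):
    return hash(key) % NODES                 # stand-in for the partitioner

# node -> (indexed value -> keys of rows stored locally on that node)
local_index = defaultdict(lambda: defaultdict(list))
for user_id in range(10_000):
    state = "CA" if user_id % 2 == 0 else "NY"   # low-cardinality column
    local_index[node_for(user_id)][state].append(user_id)

def query(value, limit):
    """Ask nodes one by one until 'limit' matching rows are collected."""
    rows, nodes_contacted = [], 0
    for node in range(NODES):
        nodes_contacted += 1
        rows.extend(local_index[node][value])
        if len(rows) >= limit:
            break
    return rows[:limit], nodes_contacted

rows, contacted = query("CA", limit=100)
assert len(rows) == 100 and contacted == 1   # first node alone sufficed
```

For a common value every node holds many matching rows, so a small LIMIT rarely needs more than the first node; for a rare value the loop degrades toward contacting every node, which is the caveat in the quoted answer.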


Re: Why no need to query all nodes on secondary index lookup?

2011-09-06 Thread Kaj Magnus Lindberg
Hi Martin

Yes that was helpful, thanks

(I had no idea you were reading the Cassandra users list!  :-)  )

Thanks, (Kaj) Magnus L


On Mon, Sep 5, 2011 at 10:57 PM, Martin von Zweigbergk wrote:
> Hi Magnus,
>
> I think the answer might be on
> https://issues.apache.org/jira/browse/CASSANDRA-749. For example,
> Jonathan writes:
>
> 
>> Is it worth creating a secondary index that only contains local data, versus 
>> a distributed secondary index (a normal ColumnFamily?)
>
> I think my initial reasoning was wrong here. I was anti-local-indexes
> because "we have to query the full cluster for any index lookup, since
> we are throwing away our usual partitioning scheme."
>
> Which is true, but it ignores the fact that, in most cases, you will
> have to "query the full cluster" to get the actual matching rows, b/c
> the indexed rows will be spread across all machines. So, having local
> indexes is better in the common case, since it actually saves a round
> trip from querying the index to querying the rows.
>
> Also, having each node index the rows it has locally means you don't
> have to worry about sharding a very large index since it happens
> automatically.
>
> Finally, it lets us use the local commitlog to keep index + data in sync.
> 
>
> Hope that helps,
> Martin
>
> On Mon, Sep 5, 2011 at 1:52 AM, Kaj Magnus Lindberg
>  wrote:
>> Hi,
>>
>> (This is the 2nd time I'm sending this message. I sent it the first
>> time a few days ago but it does not appear in the archives.)
>>
>> I have a follow up question on a question from February 2011. In
>> short, I wonder why one won't have to query all Cassandra nodes when
>> doing a secondary index lookup -- although each node only indexes data
>> that it holds locally.
>>
>> The question and answer was:
>>  ( http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html  )
>> === Question ===
>> As far as I understand automatic secondary indexes are generated for
>> node local data.
>>   In this case query by secondary index involve all nodes storing part of
>> column family to get results (?) so (if i am right) if data is spread across
>> 50 nodes then 50 nodes are involved in single query?
>> [...]
>> === Answer ===
>> In practice, local secondary indexes scale to {RF * the limit of a single
>> machine} for -low cardinality- values (ex: users living in a certain state)
>> since the first node is likely to be able to answer your question. This also
>> means they are good for performing filtering for analytics.
>> [...]
>>
>> === Now I wonder ===
>> Why would the first node be likely to be able to answer the question?
>> It stores only index entries for users on that particular machine,
>>     (says http://wiki.apache.org/cassandra/SecondaryIndexes:
>>     "Each node only indexes data that it holds locally" )
>> but users might be stored by user name? And would thus be stored on
>> many different machines? Even if they happen to live in the same
>> state?
>>
>> Why won't the client need to query the indexes of [all servers that
>> store info on users] to find all relevant users, when doing a user
>> property lookup?
>>
>>
>> Best regards, KajMagnus
>>
>


Re: Why no need to query all nodes on secondary index lookup?

2011-09-05 Thread Jonathan Ellis
The first node can answer the question as long as you've requested
less rows than the first node has on it.  Hence the "low cardinality"
point in what you quoted.
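[Editor's note: the following is an illustrative sketch, not from the original thread. It shows in plain Python why, for a low-cardinality value, the first node contacted can often satisfy the whole query: the coordinator walks the ring in order and stops as soon as it has the requested number of rows. All names here are made up, not real Cassandra APIs.]

```python
def indexed_query(nodes, state, limit):
    """Each node holds a local index: state -> list of matching row keys."""
    results = []
    nodes_contacted = 0
    for node in nodes:  # walk the ring in token order
        nodes_contacted += 1
        results.extend(node.get(state, []))
        if len(results) >= limit:
            break  # the remaining nodes never see the query
    return results[:limit], nodes_contacted

# Three nodes, each indexing only its local rows; 'CA' is a low-cardinality
# value, so every node has plenty of matches and the first node suffices.
ring = [
    {'CA': ['alice', 'bob'], 'NY': ['carol']},
    {'CA': ['dave', 'erin'], 'NY': []},
    {'CA': ['frank'], 'NY': ['grace']},
]
rows, contacted = indexed_query(ring, 'CA', limit=2)
print(rows, contacted)  # ['alice', 'bob'] 1
```

With a high-cardinality value the matches would be scattered thinly, and the loop would have to visit many (or all) nodes before reaching the limit.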

On Sat, Sep 3, 2011 at 5:00 AM, Kaj Magnus Lindberg
 wrote:
> Hello Anyone
>
> I have a follow up question on a question from February 2011. In
> short, I wonder why one won't have to query all Cassandra nodes when
> doing a secondary index lookup -- although each node only indexes data
> that it holds locally.
>
> The question and answer was:
>  ( http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html  )
> === Question ===
> As far as I understand automatic secondary indexes are generated for
> node local data.
>   In this case query by secondary index involve all nodes storing part of
> column family to get results (?) so (if i am right) if data is spread across
> 50 nodes then 50 nodes are involved in single query?
> [...]
> === Answer ===
> In practice, local secondary indexes scale to {RF * the limit of a single
> machine} for -low cardinality- values (ex: users living in a certain state)
> since the first node is likely to be able to answer your question. This also
> means they are good for performing filtering for analytics.
> [...]
>
> === Now I wonder ===
> Why would the first node be likely to be able to answer the question?
> It stores only index entries for users on that particular machine,
>     (says http://wiki.apache.org/cassandra/SecondaryIndexes:
>     "Each node only indexes data that it holds locally" )
> but users might be stored by user name? And would thus be stored on
> many different machines? Even if they happen to live in the same
> state?
>
> Why won't the client need to query the indexes of [all servers that
> store info on users] to find all relevant users, when doing a user
> property lookup?
>
>
> Best regards, KajMagnus
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Why no need to query all nodes on secondary index lookup?

2011-09-05 Thread Martin von Zweigbergk
Hi Magnus,

I think the answer might be on
https://issues.apache.org/jira/browse/CASSANDRA-749. For example,
Jonathan writes:


> Is it worth creating a secondary index that only contains local data, versus 
> a distributed secondary index (a normal ColumnFamily?)

I think my initial reasoning was wrong here. I was anti-local-indexes
because "we have to query the full cluster for any index lookup, since
we are throwing away our usual partitioning scheme."

Which is true, but it ignores the fact that, in most cases, you will
have to "query the full cluster" to get the actual matching rows, b/c
the indexed rows will be spread across all machines. So, having local
indexes is better in the common case, since it actually saves a round
trip from querying the index to querying the rows.

Also, having each node index the rows it has locally means you don't
have to worry about sharding a very large index since it happens
automatically.

Finally, it lets us use the local commitlog to keep index + data in sync.
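[Editor's note: a small sketch, not from the original thread, of the round-trip accounting Martin describes. With a distributed index you pay an extra hop to the index partition before fanning out to the data nodes; with local indexes the index read happens on each data node while it serves the rows. Both functions are illustrative, not real Cassandra code.]

```python
def round_trips_distributed(data_nodes_hit):
    # 1 hop to read the index partition, then one per node holding matching rows
    return 1 + data_nodes_hit

def round_trips_local(data_nodes_hit):
    # each node consults its own index while serving its rows; no extra hop
    return data_nodes_hit

for n in (1, 3, 50):
    print(n, round_trips_distributed(n), round_trips_local(n))
```

The saving is one round trip regardless of cluster size, which matters most for small, latency-sensitive queries.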


Hope that helps,
Martin

On Mon, Sep 5, 2011 at 1:52 AM, Kaj Magnus Lindberg
 wrote:
> Hi,
>
> (This is the 2nd time I'm sending this message. I sent it the first
> time a few days ago but it does not appear in the archives.)
>
> I have a follow up question on a question from February 2011. In
> short, I wonder why one won't have to query all Cassandra nodes when
> doing a secondary index lookup -- although each node only indexes data
> that it holds locally.
>
> The question and answer was:
>  ( http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html  )
> === Question ===
> As far as I understand automatic secondary indexes are generated for
> node local data.
>   In this case query by secondary index involve all nodes storing part of
> column family to get results (?) so (if i am right) if data is spread across
> 50 nodes then 50 nodes are involved in single query?
> [...]
> === Answer ===
> In practice, local secondary indexes scale to {RF * the limit of a single
> machine} for -low cardinality- values (ex: users living in a certain state)
> since the first node is likely to be able to answer your question. This also
> means they are good for performing filtering for analytics.
> [...]
>
> === Now I wonder ===
> Why would the first node be likely to be able to answer the question?
> It stores only index entries for users on that particular machine,
>     (says http://wiki.apache.org/cassandra/SecondaryIndexes:
>     "Each node only indexes data that it holds locally" )
> but users might be stored by user name? And would thus be stored on
> many different machines? Even if they happen to live in the same
> state?
>
> Why won't the client need to query the indexes of [all servers that
> store info on users] to find all relevant users, when doing a user
> property lookup?
>
>
> Best regards, KajMagnus
>


Why no need to query all nodes on secondary index lookup?

2011-09-04 Thread Kaj Magnus Lindberg
Hi,

(This is the 2nd time I'm sending this message. I sent it the first
time a few days ago but it does not appear in the archives.)

I have a follow up question on a question from February 2011. In
short, I wonder why one won't have to query all Cassandra nodes when
doing a secondary index lookup -- although each node only indexes data
that it holds locally.

The question and answer was:
 ( http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html  )
=== Question ===
As far as I understand automatic secondary indexes are generated for
node local data.
  In this case query by secondary index involve all nodes storing part of
column family to get results (?) so (if i am right) if data is spread across
50 nodes then 50 nodes are involved in single query?
[...]
=== Answer ===
In practice, local secondary indexes scale to {RF * the limit of a single
machine} for -low cardinality- values (ex: users living in a certain state)
since the first node is likely to be able to answer your question. This also
means they are good for performing filtering for analytics.
[...]

=== Now I wonder ===
Why would the first node be likely to be able to answer the question?
It stores only index entries for users on that particular machine,
    (says http://wiki.apache.org/cassandra/SecondaryIndexes:
    "Each node only indexes data that it holds locally" )
but users might be stored by user name? And would thus be stored on
many different machines? Even if they happen to live in the same
state?

Why won't the client need to query the indexes of [all servers that
store info on users] to find all relevant users, when doing a user
property lookup?


Best regards, KajMagnus


Re: The way to query a CF with "start > 10 and end < 100"

2011-08-29 Thread Benoit Perroud
Queries like "start > 10 AND end < 100" are not straightforward to model; you
should use the value of start as the column name, and check the second
condition on the client side.

By comparison, modeling "10 < value < 100" is much easier if you use your
values as column names, or use CompositeType if you have duplicate values.
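[Editor's note: a plain-Python sketch, not from the original thread, of the suggested modeling. Each row is stored under a column whose *name* is its `start` value, so the server can slice on start > 10 in sorted-column order, and the `end < 100` condition is applied client-side. The data and names are illustrative.]

```python
from bisect import bisect_right

# Column name = start value, column value = (row key, end value), kept sorted
# by name just as a column family keeps its columns sorted.
columns = sorted([(5, ('a', 50)), (20, ('b', 150)), (30, ('c', 90)), (40, ('d', 99))])

def slice_start_greater(cols, start):
    names = [name for name, _ in cols]
    return cols[bisect_right(names, start):]   # "server-side" slice: start > 10

candidates = slice_start_greater(columns, 10)
matches = [key for _, (key, end) in candidates if end < 100]  # client-side: end < 100
print(matches)  # ['c', 'd']
```

The slice bounds the work on the server; only the second predicate costs client-side filtering.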





2011/8/29 Guofeng Zhang :
> Hi,
>
>
>
> I have a standard CF that has columns “start” and “end”. I need to query its
> rows using the condition “start > 10 and end < 100”. Is there a better way to
> do it? Using a native secondary index or creating a specific CF for the
> search? I do not know which one is better. If the latter is preferred, what
> should the CF look like? Your advice is appreciated.
>
>
>
> Thanks
>
>


The way to query a CF with "start > 10 and end < 100"

2011-08-29 Thread Guofeng Zhang
Hi,

I have a standard CF that has columns "start" and "end". I need to query its 
rows using the condition "start > 10 and end < 100". Is there a better way to do 
it? Using a native secondary index or creating a specific CF for the search? I do 
not know which one is better. If the latter is preferred, what should the CF look 
like? Your advice is appreciated.

Thanks



RE: CQL query using 'OR' in WHERE clause

2011-08-16 Thread Deeter, Derek
Thanks, Jonathan!

-Derek

--
Derek Deeter, Sr. Software Engineer
Intuit Financial Services
(818) 597-5932 (x76932)
5601 Lindero Canyon Rd.
derek.dee...@digitalinsight.com
Westlake, CA 91362
 

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Monday, August 15, 2011 7:37 PM
To: user@cassandra.apache.org
Subject: Re: CQL query using 'OR' in WHERE clause

Disjunctions are not yet supported and probably will not be until after 1.0.

On Mon, Aug 15, 2011 at 6:45 PM, Deeter, Derek
 wrote:
> Hi,
>
> We are using CQL to obtain data from Cassandra 0.8.1 using Hector and
> getting an error when using 'OR' on a secondary index.  I get the same error
> when using CQL 1.0.3.  All the items in the WHERE clause are secondary
> indices and they are all UTF8Type validation.  The query works when leaving
> out everything from 'OR' onwards.   Example:
>
> cqlsh> SELECT '.id' , '.ipAddress' , '.userProduct' , '.offeringId' ,
> '.appId', 'timeStamp', '.logType', '.tzOffset'  , '.id' , 'member' ,
> 'mfaEnrolled' , 'sessionId' , 'startPage' , 'timeStamp' FROM Audit_Log USING
> CONSISTENCY ONE WHERE '.bcId' =  '01112' AND '.userProduct' =  'IB' AND
> 'timeStamp' >=  131218200 AND '.logType' = 'login' OR '.logType' =
> 'badLogin';
>
> Bad Request: line 1:336 mismatched input 'OR' expecting EOF
>
> I also tried to use the 'IN' keyword to no avail:
>
> cqlsh> SELECT '.id' , '.ipAddress' , '.userProduct' , '.offeringId' ,
> '.appId', 'timeStamp', '.logType', '.tzOffset'  , '.id' , 'member' ,
> 'mfaEnrolled' , 'sessionId' , 'startPage' , 'timeStamp' FROM Audit_Log USING
> CONSISTENCY ONE WHERE '.bcId' =  '01112' AND '.userProduct' =  'IB' AND
> 'timeStamp' >=  131218200 AND '.logType' IN ( 'login', 'badLogin');
>
> Bad Request: line 1:326 mismatched input 'IN' expecting set null
>
> I also tried simplifying the query WHERE clause to only "WHERE '.logType' =
> 'login' OR '.logType' = 'badLogin';"  but get the same 'mismatched input'
> error.  Is there any way to set up a query on a set of values such as the
> above?  Or do I have the syntax wrong?
>
>     Thanks in advance,
>
>     -Derek
>
> Derek Deeter
> Software Engineer, Sr
>
> o: 818-597-5932  |  m: 661-645-7842  |  f: 818-878-7555
>
> This email may contain confidential and privileged material for the sole use
> of the intended recipient. Any review or distribution by others is strictly
> prohibited. If you are not the intended recipient, please contact the sender
> and delete all copies.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: CQL query using 'OR' in WHERE clause

2011-08-15 Thread Jonathan Ellis
Disjunctions are not yet supported and probably will not be until after 1.0.
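[Editor's note: until disjunctions land, the usual workaround is to run one equality query per OR branch and union the results client-side. The sketch below is plain Python, not from the thread; `query` is a stand-in for whatever client call you use (Hector, cqlsh, etc.), and the table contents are made up.]

```python
table = [
    {'id': 1, 'logType': 'login'},
    {'id': 2, 'logType': 'badLogin'},
    {'id': 3, 'logType': 'logout'},
]

def query(log_type):
    # stand-in for: SELECT ... WHERE 'logType' = <value>
    return [row for row in table if row['logType'] == log_type]

seen, merged = set(), []
for value in ('login', 'badLogin'):      # one query per OR branch
    for row in query(value):
        if row['id'] not in seen:        # de-duplicate across branches
            seen.add(row['id'])
            merged.append(row)
print([r['id'] for r in merged])  # [1, 2]
```

The de-duplication step matters if the same row can match more than one branch.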

On Mon, Aug 15, 2011 at 6:45 PM, Deeter, Derek
 wrote:
> Hi,
>
> We are using CQL to obtain data from Cassandra 0.8.1 using Hector and
> getting an error when using ‘OR’ on a secondary index.  I get the same error
> when using CQL 1.0.3.  All the items in the WHERE clause are secondary
> indices and they are all UTF8Type validation.  The query works when leaving
> out everything from ‘OR’ onwards.   Example:
>
> cqlsh> SELECT '.id' , '.ipAddress' , '.userProduct' , '.offeringId' ,
> '.appId', 'timeStamp', '.logType', '.tzOffset'  , '.id' , 'member' ,
> 'mfaEnrolled' , 'sessionId' , 'startPage' , 'timeStamp' FROM Audit_Log USING
> CONSISTENCY ONE WHERE '.bcId' =  '01112' AND '.userProduct' =  'IB' AND
> 'timeStamp' >=  131218200 AND '.logType' = 'login' OR '.logType' =
> 'badLogin';
>
> Bad Request: line 1:336 mismatched input 'OR' expecting EOF
>
> I also tried to use the ‘IN’ keyword to no avail:
>
> cqlsh> SELECT '.id' , '.ipAddress' , '.userProduct' , '.offeringId' ,
> '.appId', 'timeStamp', '.logType', '.tzOffset'  , '.id' , 'member' ,
> 'mfaEnrolled' , 'sessionId' , 'startPage' , 'timeStamp' FROM Audit_Log USING
> CONSISTENCY ONE WHERE '.bcId' =  '01112' AND '.userProduct' =  'IB' AND
> 'timeStamp' >=  131218200 AND '.logType' IN ( 'login', 'badLogin');
>
> Bad Request: line 1:326 mismatched input 'IN' expecting set null
>
> I also tried simplifying the query WHERE clause to only “WHERE '.logType' =
> 'login' OR '.logType' = 'badLogin';”  but get the same ‘mismatched input’
> error.  Is there any way to set up a query on a set of values such as the
> above?  Or do I have the syntax wrong?
>
>     Thanks in advance,
>
>     -Derek
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


CQL query using 'OR' in WHERE clause

2011-08-15 Thread Deeter, Derek
Hi,

We are using CQL to obtain data from Cassandra 0.8.1 using Hector and
getting an error when using 'OR' on a secondary index.  I get the same
error when using CQL 1.0.3.  All the items in the WHERE clause are
secondary indices and they are all UTF8Type validation.  The query works
when leaving out everything from 'OR' onwards.   Example:

cqlsh> SELECT '.id' , '.ipAddress' , '.userProduct' , '.offeringId' ,
'.appId', 'timeStamp', '.logType', '.tzOffset'  , '.id' , 'member' ,
'mfaEnrolled' , 'sessionId' , 'startPage' , 'timeStamp' FROM Audit_Log
USING CONSISTENCY ONE WHERE '.bcId' =  '01112' AND '.userProduct' =
'IB' AND 'timeStamp' >=  131218200 AND '.logType' = 'login' OR
'.logType' = 'badLogin';
Bad Request: line 1:336 mismatched input 'OR' expecting EOF

I also tried to use the 'IN' keyword to no avail:

cqlsh> SELECT '.id' , '.ipAddress' , '.userProduct' , '.offeringId' ,
'.appId', 'timeStamp', '.logType', '.tzOffset'  , '.id' , 'member' ,
'mfaEnrolled' , 'sessionId' , 'startPage' , 'timeStamp' FROM Audit_Log
USING CONSISTENCY ONE WHERE '.bcId' =  '01112' AND '.userProduct' =
'IB' AND 'timeStamp' >=  131218200 AND '.logType' IN ( 'login',
'badLogin');
Bad Request: line 1:326 mismatched input 'IN' expecting set null

I also tried simplifying the query WHERE clause to only "WHERE
'.logType' = 'login' OR '.logType' = 'badLogin';"  but get the same
'mismatched input' error.  Is there any way to set up a query on a set
of values such as the above?  Or do I have the syntax wrong?

Thanks in advance,
-Derek






RE: results of index slice query

2011-07-29 Thread Roland Gude
Hi,

I have so far not been able to reproduce this bug on any cluster other than our 
production cluster, which started showing this behavior only after the upgrade 
from 0.7.5 to 0.7.7. I have attached logs to the issue, but I have absolutely no 
clue how to move forward. Any ideas, anybody?

-Original Message-
From: Roland Gude [mailto:roland.g...@yoochoose.com] 
Sent: Thursday, 28 July 2011 11:22
To: user@cassandra.apache.org
Subject: RE: results of index slice query

Created https://issues.apache.org/jira/browse/CASSANDRA-2964

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Wednesday, 27 July 2011 17:35
To: user@cassandra.apache.org
Subject: Re: results of index slice query

Sounds like a Cassandra bug to me.

On Wed, Jul 27, 2011 at 6:44 AM, Roland Gude  wrote:
> Hi,
>
> I was just experiencing that when i do an IndexSliceQuery with the index
> column not in the slicerange the index column will be returned anyways. Is
> this behavior intended or is it a bug (if so - is it a Cassandra bug or a
> hector bug)?
>
> I am using Cassandra 0.7.7 and hector 0.7-26
>
>
>
> Greetings,
>
> roland
>
>
>
> --
>
> YOOCHOOSE GmbH
>
>
>
> Roland Gude
>
> Software Engineer
>
>
>
> Im Mediapark 8, 50670 Köln
>
>
>
> +49 221 4544151 (Tel)
>
> +49 221 4544159 (Fax)
>
> +49 171 7894057 (Mobil)
>
>
>
>
>
> Email: roland.g...@yoochoose.com
>
> WWW: www.yoochoose.com
>
>
>
> YOOCHOOSE GmbH
>
> Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann
>
> Handelsregister: Amtsgericht Köln HRB 65275
>
> Ust-Ident-Nr: DE 264 773 520
>
> Sitz der Gesellschaft: Köln
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com






RE: results of index slice query

2011-07-28 Thread Roland Gude
Created https://issues.apache.org/jira/browse/CASSANDRA-2964

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Wednesday, 27 July 2011 17:35
To: user@cassandra.apache.org
Subject: Re: results of index slice query

Sounds like a Cassandra bug to me.

On Wed, Jul 27, 2011 at 6:44 AM, Roland Gude  wrote:
> Hi,
>
> I was just experiencing that when i do an IndexSliceQuery with the index
> column not in the slicerange the index column will be returned anyways. Is
> this behavior intended or is it a bug (if so - is it a Cassandra bug or a
> hector bug)?
>
> I am using Cassandra 0.7.7 and hector 0.7-26
>
>
>
> Greetings,
>
> roland
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com




Re: results of index slice query

2011-07-27 Thread Jonathan Ellis
Sounds like a Cassandra bug to me.

On Wed, Jul 27, 2011 at 6:44 AM, Roland Gude  wrote:
> Hi,
>
> I was just experiencing that when i do an IndexSliceQuery with the index
> column not in the slicerange the index column will be returned anyways. Is
> this behavior intended or is it a bug (if so – is it a Cassandra bug or a
> hector bug)?
>
> I am using Cassandra 0.7.7 and hector 0.7-26
>
>
>
> Greetings,
>
> roland
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


results of index slice query

2011-07-27 Thread Roland Gude
Hi,
I was just experiencing that when I do an IndexSliceQuery with the index column 
not in the slice range, the index column is returned anyway. Is this behavior 
intended or is it a bug (and if so, is it a Cassandra bug or a Hector bug)?
I am using Cassandra 0.7.7 and hector 0.7-26

Greetings,
roland




Re: Range query ordering with CQL JDBC

2011-07-18 Thread samal
I haven't used the CQL functionality much, only the Thrift client.

I think what I encounter is exactly this problem!
>
If you want to query over keys, you can index the keys in another CF, get the
column names (which are the keys of the actual CF), and then query the actual
CF with those keys.

switch away from the random partitioner.
>
Switching away is not a good choice; RP is very good for load distribution.


Re: Range query ordering with CQL JDBC

2011-07-17 Thread Matthieu Nahoum
Aaron, thanks for the reply.

I think what I encounter is exactly this problem!

I'll try the suggestions, or switch away from the random partitioner.

Cordially,

Matthieu Nahoum

On Sun, Jul 17, 2011 at 5:50 PM, aaron morton wrote:

> You are probably seeing this http://wiki.apache.org/cassandra/FAQ#range_rp
>
> Row keys are not ordered by their key, they are ordered by the token
> created by the partitioner.
>
> If you still think there is a problem, provide an example of the data you
> are seeing and what you expected to see.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16 Jul 2011, at 06:09, Matthieu Nahoum wrote:
>
> Hi Eric,
>
> I am using the default partitioner, which is the RandomPartitioner I guess.
> The key type is String. Are Strings ordered by lexicographic rules?
>
> Thanks
>
> On Fri, Jul 15, 2011 at 12:04 PM, Eric Evans  wrote:
>
>> On Thu, 2011-07-14 at 11:07 -0500, Matthieu Nahoum wrote:
>> > I am trying to range-query a column family on which the keys are
>> > epochs (similar to the output of System.currentTimeMillis() in Java).
>> > In CQL (Cassandra 0.8.1 with JDBC driver):
>> >
>> > SELECT * FROM columnFamily WHERE KEY > '130920500';
>> >
>> > I can't get a result that makes sense; it always returns wrong
>> > timestamps. So I must make an error somewhere in the way I input the
>> > querying value. I tried in clear (like above), in hexadecimal, etc.
>> >
>> > What is the correct way of doing this? Is it possible that my key is
>> > too long?
>>
>> What partitioner are you using?  What is the key type?
>>
>> --
>> Eric Evans
>> eev...@rackspace.com
>>
>>
>
>
>


Re: Range query ordering with CQL JDBC

2011-07-17 Thread aaron morton
You are probably seeing this http://wiki.apache.org/cassandra/FAQ#range_rp

Row keys are not ordered by their key, they are ordered by the token created by 
the partitioner.
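[Editor's note: an illustrative sketch, not from the original thread, of why key-range results look "unordered" under RandomPartitioner. The `token` function below is a stand-in for the partitioner's MD5-based token; the real implementation differs in detail, but the point is that hash order is not key order.]

```python
import hashlib

def token(key):
    # Illustrative MD5-derived token (not RandomPartitioner's exact formula)
    return int.from_bytes(hashlib.md5(key.encode()).digest(), 'big')

keys = ['1309200000', '1309205000', '1309210000']
print(sorted(keys))              # lexicographic (key) order
print(sorted(keys, key=token))   # ring (token) order, typically a different permutation
```

This is why `WHERE KEY > '...'` over RandomPartitioner iterates the ring, not the key space; an order-preserving partitioner (or secondary structures) is needed for meaningful key ranges.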

If you still think there is a problem, provide an example of the data you are 
seeing and what you expected to see. 

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 16 Jul 2011, at 06:09, Matthieu Nahoum wrote:

> Hi Eric,
> 
> I am using the default partitioner, which is the RandomPartitioner I guess.
> The key type is String. Are Strings ordered by lexicographic rules?
> 
> Thanks 
> 
> On Fri, Jul 15, 2011 at 12:04 PM, Eric Evans  wrote:
> On Thu, 2011-07-14 at 11:07 -0500, Matthieu Nahoum wrote:
> > I am trying to range-query a column family on which the keys are
> > epochs (similar to the output of System.currentTimeMillis() in Java).
> > In CQL (Cassandra 0.8.1 with JDBC driver):
> >
> > SELECT * FROM columnFamily WHERE KEY > '130920500';
> >
> > I can't get a result that makes sense; it always returns wrong
> > timestamps. So I must make an error somewhere in the way I input the
> > querying value. I tried in clear (like above), in hexadecimal, etc.
> >
> > What is the correct way of doing this? Is it possible that my key is
> > too long?
> 
> What partitioner are you using?  What is the key type?
> 
> --
> Eric Evans
> eev...@rackspace.com
> 
> 
> 
> 
> -- 
> ---
> Engineer at NAVTEQ
> Berkeley Systems Engineer '10
> ENAC Engineer '09
> 
> 151 N. Michigan Ave.
> Appt. 3716
> Chicago, IL, 60601
> USA
> Cell: +1 (510) 423-1835
> 
> http://www.linkedin.com/in/matthieunahoum
> 



Re: Range query ordering with CQL JDBC

2011-07-15 Thread Matthieu Nahoum
Hi Eric,

I am using the default partitioner, which is the RandomPartitioner I guess.
The key type is String. Are Strings ordered by lexicographic rules?

Thanks

On Fri, Jul 15, 2011 at 12:04 PM, Eric Evans  wrote:

> On Thu, 2011-07-14 at 11:07 -0500, Matthieu Nahoum wrote:
> > I am trying to range-query a column family on which the keys are
> > epochs (similar to the output of System.currentTimeMillis() in Java).
> > In CQL (Cassandra 0.8.1 with JDBC driver):
> >
> > SELECT * FROM columnFamily WHERE KEY > '130920500';
> >
> > I can't get a result that makes sense; it always returns wrong
> > timestamps. So I must make an error somewhere in the way I input the
> > querying value. I tried in clear (like above), in hexadecimal, etc.
> >
> > What is the correct way of doing this? Is it possible that my key is
> > too long?
>
> What partitioner are you using?  What is the key type?
>
> --
> Eric Evans
> eev...@rackspace.com
>
>




Re: Range query ordering with CQL JDBC

2011-07-15 Thread Eric Evans
On Thu, 2011-07-14 at 11:07 -0500, Matthieu Nahoum wrote:
> I am trying to range-query a column family on which the keys are
> epochs (similar to the output of System.currentTimeMillis() in Java).
> In CQL (Cassandra 0.8.1 with JDBC driver): 
> 
> SELECT * FROM columnFamily WHERE KEY > '130920500'; 
> 
> I can't get a result that makes sense; it always returns wrong
> timestamps. So I must make an error somewhere in the way I input the
> querying value. I tried in clear (like above), in hexadecimal, etc. 
> 
> What is the correct way of doing this? Is it possible that my key is
> too long? 

What partitioner are you using?  What is the key type?

-- 
Eric Evans
eev...@rackspace.com



Range query ordering with CQL JDBC

2011-07-14 Thread Matthieu Nahoum
Hi,

I am trying to range-query a column family on which the keys are epochs
(similar to the output of System.currentTimeMillis() in Java).
In CQL (Cassandra 0.8.1 with JDBC driver):

SELECT * FROM columnFamily WHERE KEY > '130920500';

I can't get a result that makes sense; it always returns wrong timestamps. So I
must be making an error somewhere in the way I input the query value.
I tried it in the clear (like above), in hexadecimal, etc.

What is the correct way of doing this? Is it possible that my key is too
long?

Thanks,

Matthieu Nahoum


Re: Query indexed column with key filter‏

2011-06-28 Thread aaron morton
Currently these are two different types of query: using a key range is 
equivalent to the get_range_slices() API function, and column clauses are a 
get_indexed_slices() call. So you would be asking for a potentially painful 
join between the two.

Creating a column with the same value as the key sounds reasonable. 
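[Editor's note: a plain-Python sketch, not from the original thread, of the key-as-column workaround. The row key is stored again as an ordinary column (`key_copy` here, an illustrative name), so a key predicate can be evaluated alongside the indexed-column clause instead of needing a key-range query.]

```python
rows = {
    'k1': {'state': 'CA', 'key_copy': 'k1'},
    'k5': {'state': 'CA', 'key_copy': 'k5'},
    'k9': {'state': 'NY', 'key_copy': 'k9'},
}

def indexed_then_key_filter(state, key_min, key_max):
    hits = [r for r in rows.values() if r['state'] == state]     # index lookup
    return sorted(r['key_copy'] for r in hits
                  if key_min < r['key_copy'] < key_max)          # "key" filter via the copy

print(indexed_then_key_filter('CA', 'k0', 'k6'))  # ['k1', 'k5']
```

The cost is the duplicated value per row, plus keeping the copy in sync on writes.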

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29 Jun 2011, at 05:31, Daning wrote:

> I found this code
> 
>    // Start and finish keys, *and* column relations (KEY > foo AND KEY < bar AND name1 = value1).
>    if (select.isKeyRange() && (select.getKeyFinish() != null) && (select.getColumnRelations().size() > 0))
>        throw new InvalidRequestException("You cannot combine key range and by-column clauses in a SELECT");
> 
> in
> 
> http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/cql/QueryProcessor.java
> 
> 
> This operation is exactly what I want - query by column, then filter by key. I 
> want to know why this query is not supported, and what's a good workaround 
> for it? At the moment my workaround is to create a column which is exactly 
> the same as the key.
> 
> Thanks,
> 
> Daning



Query indexed column with key filter‏

2011-06-28 Thread Daning

I found this code

// Start and finish keys, *and* column relations (KEY > foo AND KEY < bar AND name1 = value1).
if (select.isKeyRange() && (select.getKeyFinish() != null) && (select.getColumnRelations().size() > 0))
    throw new InvalidRequestException("You cannot combine key range and by-column clauses in a SELECT");

in

http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/cql/QueryProcessor.java


This operation is exactly what I want - query by column, then filter by 
key. I want to know why this query is not supported, and what's a good 
workaround for it? At the moment my workaround is to create a column 
which is exactly the same as the key.


Thanks,

Daning


Re: Keys-only query

2011-06-21 Thread Jeremy Hanna
Also - there is an open ticket to create a .NET CQL driver - may be worth 
watching or if you'd like to help out with it somehow:
https://issues.apache.org/jira/browse/CASSANDRA-2634

On Jun 21, 2011, at 9:31 AM, Stephen Pope wrote:

> We just recently switched to 0.8 (from 0.7.4), and it looks like key-only 
> queries are broken (number of columns = 0). The same query works if we switch 
> the number of columns to 1. Is there a new mechanism for getting key-only? We 
> can’t use CQL yet since we’re using .NET for our development.
>  
> Cheers,
> Steve



Re: Keys-only query

2011-06-21 Thread Nate McCall
This is a known issue and is being tracked on the following:
https://issues.apache.org/jira/browse/CASSANDRA-2653

On Tue, Jun 21, 2011 at 9:31 AM, Stephen Pope  wrote:
> We just recently switched to 0.8 (from 0.7.4), and it looks like key-only
> queries are broken (number of columns = 0). The same query works if we
> switch the number of columns to 1. Is there a new mechanism for getting
> key-only? We can’t use CQL yet since we’re using .NET for our development.
>
>
>
> Cheers,
>
> Steve


Keys-only query

2011-06-21 Thread Stephen Pope
We just recently switched to 0.8 (from 0.7.4), and it looks like key-only 
queries are broken (number of columns = 0). The same query works if we switch 
the number of columns to 1. Is there a new mechanism for getting key-only? We 
can't use CQL yet since we're using .NET for our development.

Cheers,
Steve


Re: Can I get all the query data back into memory?

2011-06-09 Thread Mark Kerzner
Thanks a bunch.
Mark

On Thu, Jun 9, 2011 at 10:26 PM, Jonathan Ellis  wrote:

> On Thu, Jun 9, 2011 at 9:50 PM, Mark Kerzner 
> wrote:
> > Hi,
> > when I am issuing some query, that returns a HashMap, does the whole
> HashMap
> > have to be in memory?
>
> Yes.
>
> > If so, it can easily use up all memory? Is there some
> > cursor or paging provisions?
>
> Yes, that is what all the start_key parameters are for.
>
> But we strongly recommend using a high-level client like Hector
> instead of raw Thrift.  Also see the CQL drivers:
>
> http://www.datastax.com/dev/blog/what%E2%80%99s-new-in-cassandra-0-8-part-1-cql-the-cassandra-query-language
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Can I get all the query data back into memory?

2011-06-09 Thread Jonathan Ellis
On Thu, Jun 9, 2011 at 9:50 PM, Mark Kerzner  wrote:
> Hi,
> when I am issuing some query, that returns a HashMap, does the whole HashMap
> have to be in memory?

Yes.

> If so, it can easily use up all memory? Is there some
> cursor or paging provisions?

Yes, that is what all the start_key parameters are for.

But we strongly recommend using a high-level client like Hector
instead of raw Thrift.  Also see the CQL drivers:
http://www.datastax.com/dev/blog/what%E2%80%99s-new-in-cassandra-0-8-part-1-cql-the-cassandra-query-language

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
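The start_key-based paging Jonathan describes can be sketched as follows (a pure-Python simulation; `fetch_page` is a hypothetical stand-in for a `get_range_slices` call, and the MD5 sort mimics RandomPartitioner's token order). The last key of each page seeds the next request, and the duplicate first row of each subsequent page is dropped.

```python
import hashlib

# Rows keyed as the cluster would see them.
data = {f"row{i}": {"c": i} for i in range(10)}

def token(key):
    # RandomPartitioner orders rows by MD5 token, not by key.
    return hashlib.md5(key.encode()).hexdigest()

ordered = sorted(data, key=token)

def fetch_page(start_key, count):
    """Hypothetical stand-in for get_range_slices(start_key, "", count)."""
    start = 0 if start_key is None else ordered.index(start_key)
    return ordered[start:start + count]

# Page through everything: re-request starting at the last key seen.
seen, start = [], None
while True:
    page = fetch_page(start, 4)
    # The start row is returned again on every page after the first.
    seen.extend(page if start is None else page[1:])
    if len(page) < 4:
        break
    start = page[-1]
```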


Can I get all the query data back into memory?

2011-06-09 Thread Mark Kerzner
Hi,

when I am issuing some query, that returns a HashMap, does the whole HashMap
have to be in memory? If so, it can easily use up all memory? Is there some
cursor or paging provisions?

Thank you very much.

Mark


Re: "range query" vs "slice range query"

2011-05-25 Thread david lee
so, that was actually simpler than i thought ay?
cheers guys~

On 26 May 2011 05:38, Roland Gude  wrote:

> That is correct. Random partitioner orders rows according to the MD5 sum.
>
> On 25.05.2011 at 16:11, "Robert Jackson" wrote:
>
> Also, it is my understanding that if you are not using
> OrderPreservingPartitioner a get_range_slices may not return what you would
> expect.
>
> With the RandomPartitioner you can iterate over the complete list by using
> the last row key as the start for subsequent requests, but if you are using
> a single query you will be returned all the rows where the returned row
> key's md5 is between the md5 of the start row key and stop row key.
>
> Reference:
> http://wiki.apache.org/cassandra/FAQ - "Why aren't range slices/sequential
> scans giving me the expected results?"
>
> Robert Jackson
>
> --
> *From: *"Jonathan Ellis" 
> *To: *user@cassandra.apache.org
> *Sent: *Wednesday, May 25, 2011 8:54:34 AM
> *Subject: *Re: "range query" vs "slice range query"
>
> get_range_slices is the api to get a slice (of columns) from each of a
> range (of rows)
>
> On Wed, May 25, 2011 at 3:42 AM, david lee  wrote:
> > hi guys,
> > i'm reading up on the book "Cassandra - Definitive guide"
> > and i don't seem to understand what it says about "ranges and slices"
> > my understanding is
> > a range as in "a mathematical range to define a subset from an ordered
> set
> > of elements",
> > in cassandra typically means a range of rows whereas
> > a slice means a range of columns.
> > a range query refers to a query to retrieve a range of rows whereas
> a slice range query refers to a query to retrieve a range of columns within
> a
> > row.
> > i may be talking about total nonsense but i really am more confused after
> > reading this portion of the book
> >
> http://books.google.com/books?id=MKGSbCbEdg0C&pg=PA134&lpg=PA134&dq=cassandra+%22range+query%22+%22range+slice%22&source=bl&ots=XoPB4uA60u&sig=uDDoQe0FRkQobHnr-vPvvQ3B8TQ&hl=en&ei=ub3cTcvGLZLevQOuxs3CDw&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCwQ6AEwAw#v=onepage&q=cassandra%20%22range%20query%22%20%22range%20slice%22&f=false
> > many thanx in advance
> > david
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
>


Re: "range query" vs "slice range query"

2011-05-25 Thread Roland Gude
That is correct. Random partitioner orders rows according to the MD5 sum.

On 25.05.2011 at 16:11, "Robert Jackson" <robe...@promedicalinc.com> wrote:

Also, it is my understanding that if you are not using 
OrderPreservingPartitioner a get_range_slices may not return what you would 
expect.

With the RandomPartitioner you can iterate over the complete list by using the 
last row key as the start for subsequent requests, but if you are using a 
single query you will be returned all the rows where the returned row key's md5 
is between the md5 of the start row key and stop row key.

Reference:
http://wiki.apache.org/cassandra/FAQ - "Why aren't range slices/sequential 
scans giving me the expected results?"

Robert Jackson


From: "Jonathan Ellis" <jbel...@gmail.com>
To: user@cassandra.apache.org
Sent: Wednesday, May 25, 2011 8:54:34 AM
Subject: Re: "range query" vs "slice range query"

get_range_slices is the api to get a slice (of columns) from each of a
range (of rows)

On Wed, May 25, 2011 at 3:42 AM, david lee  wrote:
> hi guys,
> i'm reading up on the book "Cassandra - Definitive guide"
> and i don't seem to understand what it says about "ranges and slices"
> my understanding is
> a range as in "a mathematical range to define a subset from an ordered set
> of elements",
> in cassandra typically means a range of rows whereas
> a slice means a range of columns.
> a range query refers to a query to retrieve a range of rows whereas
> a slice range query refers to a query to retrieve a range of columns within a
> row.
> i may be talking about total nonsense but i really am more confused after
> reading this portion of the book
> http://books.google.com/books?id=MKGSbCbEdg0C&pg=PA134&lpg=PA134&dq=cassandra+%22range+query%22+%22range+slice%22&source=bl&ots=XoPB4uA60u&sig=uDDoQe0FRkQobHnr-vPvvQ3B8TQ&hl=en&ei=ub3cTcvGLZLevQOuxs3CDw&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCwQ6AEwAw#v=onepage&q=cassandra%20%22range%20query%22%20%22range%20slice%22&f=false
> many thanx in advance
> david
>



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com



Re: "range query" vs "slice range query"

2011-05-25 Thread Robert Jackson
Also, it is my understanding that if you are not using 
OrderPreservingPartitioner a get_range_slices may not return what you would 
expect. 


With the RandomPartitioner you can iterate over the complete list by using the 
last row key as the start for subsequent requests, but if you are using a 
single query you will be returned all the rows where the returned row key's md5 
is between the md5 of the start row key and stop row key. 


Reference: 
http://wiki.apache.org/cassandra/FAQ - "Why aren't range slices/sequential 
scans giving me the expected results?" 

Robert Jackson 

- Original Message -

From: "Jonathan Ellis"  
To: user@cassandra.apache.org 
Sent: Wednesday, May 25, 2011 8:54:34 AM 
Subject: Re: "range query" vs "slice range query" 

get_range_slices is the api to get a slice (of columns) from each of a 
range (of rows) 

On Wed, May 25, 2011 at 3:42 AM, david lee  wrote: 
> hi guys, 
> i'm reading up on the book "Cassandra - Definitive guide" 
> and i don't seem to understand what it says about "ranges and slices" 
> my understanding is 
> a range as in "a mathematical range to define a subset from an ordered set 
> of elements", 
> in cassandra typically means a range of rows whereas 
> a slice means a range of columns. 
> a range query refers to a query to retrieve a range of rows whereas 
> a slice range query refers to a query to retrieve a range of columns within a 
> row. 
> i may be talking about total nonsense but i really am more confused after 
> reading this portion of the book 
> http://books.google.com/books?id=MKGSbCbEdg0C&pg=PA134&lpg=PA134&dq=cassandra+%22range+query%22+%22range+slice%22&source=bl&ots=XoPB4uA60u&sig=uDDoQe0FRkQobHnr-vPvvQ3B8TQ&hl=en&ei=ub3cTcvGLZLevQOuxs3CDw&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCwQ6AEwAw#v=onepage&q=cassandra%20%22range%20query%22%20%22range%20slice%22&f=false
>  
> many thanx in advance 
> david 
> 



-- 
Jonathan Ellis 
Project Chair, Apache Cassandra 
co-founder of DataStax, the source for professional Cassandra support 
http://www.datastax.com 
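Robert's point about md5-bounded results can be illustrated directly (a pure-Python sketch; the integer MD5 values stand in for RandomPartitioner tokens, and the lo/hi swap is a simplification of how a real token range would be expressed):

```python
import hashlib

def token(key):
    # RandomPartitioner positions a row by the MD5 of its key.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = ["apple", "banana", "cherry", "date", "elderberry"]

def range_slice(start_key, stop_key):
    """What a single get_range_slices returns under RandomPartitioner:
    rows whose MD5 token lies between the tokens of the bounds --
    NOT the rows whose keys sort lexically between the bounds."""
    lo, hi = token(start_key), token(stop_key)
    if lo > hi:
        lo, hi = hi, lo  # simplification: ignore token-range wrapping
    return sorted(k for k in keys if lo <= token(k) <= hi)

lexical = [k for k in keys if "apple" <= k <= "cherry"]
by_token = range_slice("apple", "cherry")
# The two result sets generally differ, which is exactly the FAQ's point.
```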



Re: "range query" vs "slice range query"

2011-05-25 Thread Roland Gude
I cannot display the book page you are referring to, but your general 
understanding is correct. A range refers to several rows, a slice refers to 
several columns. A range slice is a combination of both: from all rows in a 
range, get a specific slice of columns.

On 25.05.2011 at 10:43, "david lee" <iecan...@gmail.com> wrote:

hi guys,

i'm reading up on the book "Cassandra - Definitive guide"
and i don't seem to understand what it says about "ranges and slices"

my understanding is
a range as in "a mathematical range to define a subset from an ordered set of 
elements",
in cassandra typically means a range of rows whereas
a slice means a range of columns.

a range query refers to a query to retrieve a range of rows whereas
a slice range query refers to a query to retrieve a range of columns within a row.

i may be talking about total nonsense but i really am more confused after 
reading this portion of the book
http://books.google.com/books?id=MKGSbCbEdg0C&pg=PA134&lpg=PA134&dq=cassandra+%22range+query%22+%22range+slice%22&source=bl&ots=XoPB4uA60u&sig=uDDoQe0FRkQobHnr-vPvvQ3B8TQ&hl=en&ei=ub3cTcvGLZLevQOuxs3CDw&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCwQ6AEwAw#v=onepage&q=cassandra%20%22range%20query%22%20%22range%20slice%22&f=false

many thanx in advance
david



Re: "range query" vs "slice range query"

2011-05-25 Thread Jonathan Ellis
get_range_slices is the api to get a slice (of columns) from each of a
range (of rows)

On Wed, May 25, 2011 at 3:42 AM, david lee  wrote:
> hi guys,
> i'm reading up on the book "Cassandra - Definitive guide"
> and i don't seem to understand what it says about "ranges and slices"
> my understanding is
> a range as in "a mathematical range to define a subset from an ordered set
> of elements",
> in cassandra typically means a range of rows whereas
> a slice means a range of columns.
> a range query refers to a query to retrieve a range of rows whereas
> a slice range query refers to a query to retrieve a range of columns within a
> row.
> i may be talking about total nonsense but i really am more confused after
> reading this portion of the book
> http://books.google.com/books?id=MKGSbCbEdg0C&pg=PA134&lpg=PA134&dq=cassandra+%22range+query%22+%22range+slice%22&source=bl&ots=XoPB4uA60u&sig=uDDoQe0FRkQobHnr-vPvvQ3B8TQ&hl=en&ei=ub3cTcvGLZLevQOuxs3CDw&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCwQ6AEwAw#v=onepage&q=cassandra%20%22range%20query%22%20%22range%20slice%22&f=false
> many thanx in advance
> david
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


"range query" vs "slice range query"

2011-05-25 Thread david lee
hi guys,

i'm reading up on the book "Cassandra - Definitive guide"
and i don't seem to understand what it says about "ranges and slices"

my understanding is
a range as in "a mathematical range to define a subset from an ordered set
of elements",
in cassandra typically means a range of rows whereas
a slice means a range of columns.

a range query refers to a query to retrieve a range of rows whereas
a slice range query refers to a query to retrieve a range of columns within a
row.

i may be talking about total nonsense but i really am more confused after
reading this portion of the book
http://books.google.com/books?id=MKGSbCbEdg0C&pg=PA134&lpg=PA134&dq=cassandra+%22range+query%22+%22range+slice%22&source=bl&ots=XoPB4uA60u&sig=uDDoQe0FRkQobHnr-vPvvQ3B8TQ&hl=en&ei=ub3cTcvGLZLevQOuxs3CDw&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCwQ6AEwAw#v=onepage&q=cassandra%20%22range%20query%22%20%22range%20slice%22&f=false

many thanx in advance
david


Re: A query in deletion of a row

2011-05-09 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/FAQ#range_ghosts

On Mon, May 9, 2011 at 9:24 PM, anuya joshi  wrote:
> Hello,
>
> I am unclear on Why deleting a row in Cassandra does not delete a row key?
> Is an empty row never deleted from Column Family?
>
> It would be of great help if someone can elaborate on this.
>
> Thanks,
> Anuya
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
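The "range ghosts" behaviour behind Jonathan's FAQ link can be sketched with a tiny tombstone model (pure Python, hypothetical keys): deletion is itself a write, so the row is marked rather than physically removed, and its key keeps showing up in range scans until compaction purges the tombstone (after gc_grace_seconds in a real cluster).

```python
TOMBSTONE = object()  # marker written in place of the row's data

store = {"k1": {"col": "v1"}, "k2": {"col": "v2"}}

def delete_row(key):
    # Deletion is a write: every replica must learn about it,
    # so the row is tombstoned, not removed.
    store[key] = TOMBSTONE

def range_scan():
    # A scan sees the tombstoned row's key with no columns --
    # a "range ghost" -- until compaction drops it.
    return [(k, None if v is TOMBSTONE else v) for k, v in sorted(store.items())]

def compact():
    # After gc_grace_seconds, compaction removes tombstone and key together.
    for k in [k for k, v in store.items() if v is TOMBSTONE]:
        del store[k]

delete_row("k1")
ghosts = range_scan()   # k1 still appears, with no data
compact()
after = range_scan()    # k1 is gone for good
```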


A query in deletion of a row

2011-05-09 Thread anuya joshi
  

Hello,
>
> I am unclear on Why deleting a row in Cassandra does not delete a row key?
> Is an empty row never deleted from Column Family?
>
> It would be of great help if someone can elaborate on this.
>
> Thanks,
> Anuya
>

