Dynamic Column Families in CQLSH v3

2012-08-23 Thread Erik Onnen
Hello All,

I am attempting to create what the DataStax 1.1 documentation calls a
Dynamic Column Family
(http://www.datastax.com/docs/1.1/ddl/column_family#dynamic-column-families)
via CQLSH.

This works in v2 of the shell:

"create table data ( key varchar PRIMARY KEY) WITH comparator=LongType;"

When defined this way via v2 shell, I can successfully switch to v3
shell and query the CF fine.

The same syntax in v3 yields:

"Bad Request: comparator is not a valid keyword argument for CREATE TABLE"

The 1.1 documentation indicates that comparator is a valid option for
at least ALTER TABLE:

http://www.datastax.com/docs/1.1/configuration/storage_configuration#comparator

This leads me to believe that the correct way to create a dynamic
column family is to create a table with no named columns and alter the
table later but that also does not work:

"create table data (key varchar PRIMARY KEY);"

yields:

"Bad Request: No definition found that is not part of the PRIMARY KEY"

So, my question is: how do I create a Dynamic Column Family via CQLSH v3?

Thanks!
-erik


Secondary index partially created

2012-08-23 Thread Richard Crowley
I have a three-node cluster running Cassandra 1.0.10.  In this cluster
is a keyspace with RF=3.  I *updated* a column family via Astyanax to
add a column definition with an index on that column.  Then I ran a
backfill to populate the column in every row.  Then I tried to query
the index from Java and it failed but so did cassandra-cli:

get my_column_family where my_column = 'my_value';

Two out of the three nodes are unable to query the new index and throw
this error:

InvalidRequestException(why:No indexed columns present in index
clause with operator EQ)

The third is able to query the new index happily but doesn't find any
results, even when I expect it to.

`describe cluster;` in cassandra-cli confirms that all three nodes
have the same schema and `show schema;` confirms that schema includes
the new column definition and its index.

The my_column_family.my_index-hd-* files only exist on that one node
that can query the index.

I ran `nodetool repair` on each node and waited for `nodetool
compactionstats` to report zero pending tasks.  Ditto for `nodetool
compact`.  The nodes that failed still fail.  The node that succeeded
still succeeds.

Can anyone shed some light?  How do I convince it to let me query the
index from any node?  How do I get it to find results?

Thanks,

Richard


Re: Cassandra API Library.

2012-08-23 Thread Michael Morris
Agreed, +1 for Hector. It's feature-rich, has an active development
community, and is documented well enough to get you started.  I also agree
with the comments on avoiding raw Thrift: I'm working on a more up-to-date
client for Perl, and the code generated by the Thrift compiler is pretty
nasty.

- Mike

On Thu, Aug 23, 2012 at 11:27 AM, Aaron Turner  wrote:

> +1 vote for Hector.
>
> That said, don't use SuperColumns unless you really really know what
> you're doing.
>
> On Thu, Aug 23, 2012 at 4:25 AM, Amit Handa  wrote:
> > hi,
> >
> > kindly let me know which java client api is more matured, and easy to use
> > with all features(Super Columns, caching, pooling, etc) of Cassandra 1.X.
> > Right now i come to know that following client exists:
> >
> > 1) Hector(Java)
> > 2) Thrift (Java)
> > 3) Kundera (Java)
> >
> >
> > With Regards,
> > Amit


Steps to Manually Resolve Stuck Schema!! CASSANDRA-4561

2012-08-23 Thread Arya Goudarzi
Just had a good conversation with rcoli in chat. Wanted to clarify the
steps for resolving this issue and see if there are any pitfalls I am
missing.

Issue: I upgraded from 1.1.2 to 1.1.3 a while ago and today I realized I
cannot make any schema changes since the fix in
https://issues.apache.org/jira/browse/CASSANDRA-4432.

Solution: Somehow I have to make Cassandra's system column families forget
about those old schemas with nanosecond timestamps. I have to do this
either live or with a brief downtime. Please advise of any pitfalls or
incorrectness in my steps. I am planning to automate them, so please advise.

Within a short downtime, I have to do this:

1. Take all nodes out of service;
2. Run nodetool flush on each;
3. Stop cassandra on each node;
4. Remove /var/lib/cassandra/data/system
5. Remove /var/lib/cassandra/saved_caches/system-*
6. Start all nodes;
7. cassandra-cli < schema_definition_file on one node only. (includes
create keyspace and create column family entries)
8. put the nodes back in service.
9. Done.
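
The steps above could be scripted roughly as follows. This is only a dry-run sketch in Python: it builds the ordered plan and prints it, and it runs nothing. The node names, the build_plan helper, and the wording of the site-specific steps (taking nodes out of service, stopping and starting the daemon) are assumptions for illustration. The thread has not confirmed that the procedure itself is safe.

```python
from typing import List, Tuple

# Paths taken from steps 4 and 5 above.
DATA_DIR = "/var/lib/cassandra/data/system"
CACHE_GLOB = "/var/lib/cassandra/saved_caches/system-*"


def build_plan(nodes: List[str], schema_file: str) -> List[Tuple[str, str]]:
    """Return (node, action) pairs in the order of steps 1-8 above.

    Each step is completed on every node before the next step begins.
    """
    per_node_steps = [
        "take out of service (site-specific, not described in the thread)",
        "nodetool flush",
        "stop cassandra (how depends on the install; assumed)",
        f"remove {DATA_DIR}",
        f"remove {CACHE_GLOB}",
        "start cassandra (how depends on the install; assumed)",
    ]
    plan = [(node, step) for step in per_node_steps for node in nodes]
    # Step 7: load the schema from one node only.
    plan.append((nodes[0], f"cassandra-cli < {schema_file}"))
    # Step 8: put every node back in service.
    plan.extend((node, "put back in service (site-specific)") for node in nodes)
    return plan


if __name__ == "__main__":
    # Hypothetical node names; print the plan for review before acting on it.
    for node, action in build_plan(["node1", "node2", "node3"],
                                   "schema_definition_file"):
        print(f"{node}: {action}")
```

Printing the plan first makes it easy to review the ordering (for example, that no system directory is removed before every node has been stopped) before anything is automated for real.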

Please advise whether I have the steps right or am missing
something.

Thanks in advance for your help.

Cheers,
-Arya


Re: nodetool repair - when is it not needed ?

2012-08-23 Thread aaron morton
> Also when hints are replayed they are sent off as mutations, which may still
> be dropped by the target if they are not serviced before rpc_timeout. Sending
> nodes throttle their requests so it's unlikely but possible.

My bad there. I thought the mutations were sent one way.

When a node is sending hints it waits the normal rpc_timeout. If there is a
timeout, hint delivery for that endpoint is aborted. It will be retried in the
next HH round, which is every 10 minutes.
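
As a rough illustration of the behaviour just described, here is a toy Python model, not Cassandra's code. It assumes hints for one endpoint are sent in order, that the first timeout aborts the round, and that the remaining hints wait for the next round 10 minutes later. The names deliver_round, minutes_until_drained and the send callback are made up for the sketch.

```python
from collections import deque
from typing import Callable, Deque

HH_ROUND_MINUTES = 10  # interval between hinted-handoff rounds, per the post


def deliver_round(hints: Deque[str], send: Callable[[str], bool]) -> int:
    """Send hints in order; abort the round at the first timeout.

    `send` returns True on success and False on a timeout.
    Returns the number of hints delivered in this round.
    """
    delivered = 0
    while hints:
        if not send(hints[0]):
            break  # timeout: delivery for this endpoint is aborted
        hints.popleft()
        delivered += 1
    return delivered


def minutes_until_drained(hints: Deque[str], send: Callable[[str], bool],
                          max_rounds: int = 100) -> int:
    """Minutes spent waiting for later rounds until all hints are delivered."""
    waited_rounds = 0
    while hints and waited_rounds < max_rounds:
        deliver_round(hints, send)
        if hints:
            waited_rounds += 1  # leftovers wait for the next HH round
    return waited_rounds * HH_ROUND_MINUTES
```

With three hints and a sender that times out once on the second call, the first round delivers one hint and aborts, and the second round delivers the rest, so the model reports a 10 minute delay.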

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2012, at 9:36 PM, aaron morton  wrote:

> HH works to a point. Specifically, it only collects hints for the first hour
> the node is down and it has a safety valve to avoid the node collecting hints
> getting overwhelmed. Looking at the code it takes a bit for that to trip and
> you would get a TimeoutException coming back.
>
> Also when hints are replayed they are sent off as mutations, which may still
> be dropped by the target if they are not serviced before rpc_timeout. Sending
> nodes throttle their requests so it's unlikely but possible.
>
> HH is much more robust, but AFAIK repair is still _the_ way to ensure on
> disk consistency.
> 
> Cheers
> 
> On 23/08/2012, at 6:59 AM, Rob Coli  wrote:
> 
>> On Wed, Aug 22, 2012 at 8:37 AM, Senthilvel Rangaswamy
>>  wrote:
>>> We are running Cassandra 1.1.2 on EC2. Our database is primarily all
>>> counters and we don't do any
>>> deletes.
>>> 
>>> Does nodetool repair do anything for such a database? All the docs I read
>>> for nodetool repair suggest
>>> that nodetool repair is needed only if there are deletes.
>> 
>> Since 1.0, repair is only needed if a node crashes. If a node crashes,
>> my understanding is that a cluster-wide repair (with -pr on each node)
>> is required, because the crashed node could have lost a hint for any
>> other node.
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-2034
>> 
>> =Rob
>> 



commit log to disk with periodic mode

2012-08-23 Thread rubbish me
Hi all

First off, please let me introduce the setup.


- a balanced ring of 6 x C* 1.1.2 in the active DC (DC1), 6 in another (DC2);
- keyspace's RF=3 in each DC;
- client talks only to DC1 unless DC1 can't serve the request, in which case
it talks only to DC2;
- commit log is synced periodically with the default setting of 10s;
- consistency policy = LOCAL_QUORUM for both read and write;
- we are running on production Linux VMs (not ideal but this is out of our
hands).

As part of a DR exercise, we brutally killed all 6 nodes in DC1, and the client
started talking to DC2. All data survived and everything continued to work
perfectly.

Then we brought all nodes in DC1 up, one by one. Each logged a message saying
its commit logs were all replayed. No errors were reported.  We didn't run
repair at this time.

However, DC1 lost data that was written an hour before the DR exercise.  It
seemed everything after the last memtable flush was gone.

If we understand correctly, writes go to the commit log first, and the log is
synced to disk every 10s. At worst we would have lost the last 10s of data.
But it seemed as if the periodic sync didn't happen.  What could be the cause of
this behaviour?

With the blessing of C* we could recover all this data from DC2. But we
would like to understand the possible cause.
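
To make the expectation above concrete, here is a small Python sketch of the mental model in this post. It is a simplification and not how Cassandra implements its commit log: it assumes the sync fires at exact multiples of the period and that any write made at or before the last sync is safe on disk. The function name and the sample timestamps are made up.

```python
from typing import List

SYNC_PERIOD_S = 10.0  # periodic commit log sync interval quoted in the post


def surviving_writes(write_times: List[float], crash_time: float,
                     sync_period: float = SYNC_PERIOD_S) -> List[float]:
    """Return the writes that were synced before the crash.

    Toy model: a sync happens at t = 0, sync_period, 2 * sync_period, ...
    and a write survives if it happened at or before the last sync.
    """
    last_sync = (crash_time // sync_period) * sync_period
    return [t for t in write_times if t <= last_sync]


if __name__ == "__main__":
    writes = [1.0, 9.5, 10.0, 14.0, 19.9]  # seconds, hypothetical
    crash = 19.95
    kept = surviving_writes(writes, crash)
    lost = [t for t in writes if t not in kept]
    print("kept:", kept)
    print("lost:", lost)
```

Under this model every lost write falls within one sync period of the crash, which is why losing an hour of data is surprising and suggests something other than the sync period is involved.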

Many thanks in advance.

- A

Re: Node forgets about most of its column families

2012-08-23 Thread Edward Sargisson

Ah, yes, I forgot that bit, thanks!

1.1.2 running on CentOS.

Running nodetool resetlocalschema and then nodetool repair fixed the problem,
but not understanding what happened is a concern.


Cheers,
Edward


On 12-08-23 12:40 PM, Rob Coli wrote:

> On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson wrote:
>
>> I was wondering if anybody had seen the following behaviour before and how
>> we might detect it and keep the application running.
>
> I don't know the answer to your problem, but anyone who does will want
> to know in what version of Cassandra you are encountering this issue.
> :)
>
> =Rob



--

Edward Sargisson

senior java developer
Global Relay

edward.sargis...@globalrelay.net 


*866.484.6630*
New York | Chicago | Vancouver | London (+44.0800.032.9829) | Singapore 
(+65.3158.1301)






Re: Node forgets about most of its column families

2012-08-23 Thread Rob Coli
On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson
 wrote:
> I was wondering if anybody had seen the following behaviour before and how
> we might detect it and keep the application running.

I don't know the answer to your problem, but anyone who does will want
to know in what version of Cassandra you are encountering this issue.
:)

=Rob

-- 
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Node forgets about most of its column families

2012-08-23 Thread Edward Sargisson

Hi all,
I was wondering if anybody had seen the following behaviour before and 
how we might detect it and keep the application running.


We have a 6-node cluster. It seems that one of these nodes forgot about
all but one of the application column families, possibly after a
restart. Then, when our application connects using Hector, it can't find
any data, so it throws an exception.


I'm currently running nodetool repair on one of the *other* nodes, which
is taking a very long time to complete (35 minutes and counting for a load
of <9MB).


The logs from the failing node say:
 INFO [MemoryMeter:1] 2012-08-23 14:59:14,807 Memtable.java (line 213) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.1219167666485013 (just-counted was 1.0).  calculation took 28ms for 1252 columns
 INFO [main] 2012-08-23 14:59:14,949 CommitLogReplayer.java (line 272) Finished reading /var/lib/cassandra/commitlog/CommitLog-2265496912225824.log
 INFO [main] 2012-08-23 14:59:14,950 CommitLogReplayer.java (line 103) Skipped 8216 mutations from unknown (probably removed) CF with id 1016
 INFO [main] 2012-08-23 14:59:14,950 CommitLogReplayer.java (line 103) Skipped 3013 mutations from unknown (probably removed) CF with id 1017


... and so on.

Hector is saying:
InvalidRequestException(why:unconfigured columnfamily user_conversations)
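
One possible way to detect this condition is to scan the startup log for the "Skipped ... mutations from unknown" lines shown above. The Python sketch below is only an illustration: the function name is made up, the pattern is based solely on the two log lines in this message, and other Cassandra versions may word the line differently.

```python
import re
from typing import Dict

# Pattern derived from the CommitLogReplayer lines quoted in this message.
SKIPPED = re.compile(
    r"Skipped (\d+) mutations from unknown \(probably removed\) CF with id (\d+)"
)


def skipped_mutations(log_text: str) -> Dict[int, int]:
    """Map CF id -> total mutations skipped during commit log replay."""
    totals: Dict[int, int] = {}
    for count, cf_id in SKIPPED.findall(log_text):
        totals[int(cf_id)] = totals.get(int(cf_id), 0) + int(count)
    return totals


if __name__ == "__main__":
    sample = (
        " INFO [main] 2012-08-23 14:59:14,950 CommitLogReplayer.java (line 103) "
        "Skipped 8216 mutations from unknown (probably removed) CF with id 1016\n"
        " INFO [main] 2012-08-23 14:59:14,950 CommitLogReplayer.java (line 103) "
        "Skipped 3013 mutations from unknown (probably removed) CF with id 1017\n"
    )
    print(skipped_mutations(sample))
```

A non-empty result right after a restart, for column families that were never dropped, would be a hint that the node has lost part of its schema and should be checked before clients are pointed at it.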


Thanks for any comments or advice,
Edward





Re: Data Modelling Suggestions

2012-08-23 Thread Guillermo Winkler
I think you need another CF as an index.

user_itemid -> timestamped column_name

Otherwise you can't know which timestamp to use in the column name.

Anyway, I would prefer storing the item IDs as column names in the main
column family and having a second CF, used only for the order-by-date query,
with the pair timestamp_itemid. That way you can add other query
strategies later without changing how you store the item information.

Maybe you can solve it with a secondary index on the timestamp too.
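
The two-CF layout suggested above can be sketched in plain Python, with dictionaries standing in for column families. This is an in-memory illustration of the idea only, not Hector code: the class and method names are made up, integer timestamps stand in for TimeUUIDs, and re-stamping an item on update is an assumption based on the original question.

```python
from typing import Dict, List, Tuple


class UserItems:
    """Toy model: one 'CF' keyed by item id, one 'CF' ordered by time."""

    def __init__(self) -> None:
        # Main CF: row key = user id, column name = item id, value = fields.
        self.items: Dict[str, Dict[str, dict]] = {}
        # Index CF: row key = user id, column name = (timestamp, item id).
        self.by_time: Dict[str, List[Tuple[int, str]]] = {}

    def put(self, user: str, item_id: str, fields: dict, ts: int) -> None:
        """Add an item, or update it and move it to the new timestamp."""
        row = self.items.setdefault(user, {})
        index = self.by_time.setdefault(user, [])
        old = row.get(item_id)
        if old is not None:
            # The stored timestamp tells us which index column to drop.
            index.remove((old["added"], item_id))
        row[item_id] = dict(fields, added=ts)
        index.append((ts, item_id))
        index.sort()

    def delete(self, user: str, item_id: str) -> None:
        old = self.items.get(user, {}).pop(item_id, None)
        if old is not None:
            self.by_time[user].remove((old["added"], item_id))

    def item_ids_by_time(self, user: str) -> List[str]:
        return [item_id for _, item_id in self.by_time.get(user, [])]
```

Because the main row is keyed by item ID, updates and deletes never need to search inside a composite column name. The stored timestamp plays the role of the user_itemid -> timestamped column_name mapping.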

Guille


On Thu, Aug 23, 2012 at 7:26 AM, Roshni Rajagopal <
roshni.rajago...@wal-mart.com> wrote:

> Hi,
>
> Need some help on a data modelling question. We're using Hector & Datastax
> Enterprise 2.1.
>
>
> I want to associate a list of items for a user. It should be sorted on the
> time added. And items can be updated (quantity of the item can be changed),
> and items can be deleted.
> I can model it like this so that it's denormalized and I get all my
> information in one go from one row, sorted by time added. I can use
> composite columns.
>
> Row key: User Id
> Column Name: TimeUUID:item ID: Item Name: Item Description: Item Price:
> Item Qty
> Column Value : Null
>
> Now, how do I handle manipulations?
>
>  1.  Add new item: Easy, just a new column.
>  2.  Add existing item or modify qty: I want to get to the correct column
> to update. Can I search by the second component of the composite column
> (equals condition) and update the column name itself to reflect the new
> TimeUUID and qty? Or would it be better to just add it as a new column,
> always use the latest column for an item in the application code, and delete
> duplicates in the background?
>  3.  Delete item: Can I search by the second component of the composite
> column to find the correct column to delete?
>
> I was trying to find Hector examples that search on the second component of
> a composite column, but I couldn't find any good ones. I'm not sure if it's
> possible. If you have any examples, please share.
>
> Regards,
> Roshni


RE: Expanding cluster to include a new DR datacenter

2012-08-23 Thread Bryce Godfrey
Thanks for the information!  Answers my questions.

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, August 22, 2012 7:10 PM
To: user@cassandra.apache.org
Subject: Re: Expanding cluster to include a new DR datacenter

If you didn't see this particular section, you may find it useful: 
http://www.datastax.com/docs/1.1/operations/cluster_management#adding-a-data-center-to-a-cluster

Some comments inline:
On Wed, Aug 22, 2012 at 3:43 PM, Bryce Godfrey <bryce.godf...@azaleos.com> wrote:

> We are in the process of building out a new DR system in another data center,
> and we want to mirror our Cassandra environment to that DR.  I have a couple of
> questions on the best way to do this after reading the documentation on the
> DataStax website.  We didn't initially plan for this to be a DR setup when
> we first deployed a while ago due to budgeting, but now we need to.  So I'm
> just trying to nail down the order of doing this as well as any potential
> issues.
>
> For the nodes, we don't plan on querying the servers in this DR until we fail
> over to this data center.  We are going to have 5 similar nodes in the DR;
> should I join them into the ring at token+1?

Join them at token+10 just to leave a little space.  Make sure you're using
LOCAL_QUORUM for your queries instead of regular QUORUM.

> All keyspaces are set to the replication strategy of SimpleStrategy.  Can I
> change the replication strategy after joining the new nodes in the DR to
> NetworkTopologyStrategy with the updated replication factor for each DC?

Switch your keyspaces over to NetworkTopologyStrategy before adding the new
nodes.  For the strategy options, just list the first DC until the second is up
(e.g. {main_dc: 3}).

> Lastly, is changing the snitch from the default of SimpleSnitch to
> RackInferringSnitch going to cause any issues?  Since it's in the
> cassandra.yaml file I assume a rolling restart to pick up the value would
> be OK?

This is the first thing you'll want to do.  Unless your node IPs would
naturally put all nodes in a DC in the same rack, I recommend using
PropertyFileSnitch, explicitly using the same rack.  (I tend to prefer PFSnitch
regardless; it's harder to accidentally mess up.)  A rolling restart is
required to pick up the change.  Make sure to fill out
cassandra-topology.properties first if using PFSnitch.

> This is all on Cassandra 1.1.4. Thanks for any help!
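
The token and snitch advice above can be worked through with a short Python sketch. Several things here are assumptions not stated in the thread: that the cluster uses RandomPartitioner (token space 0 to 2**127), that the main data center also has 5 evenly spaced nodes, the IP addresses, the dr_dc name, and the RAC1 rack name. Only main_dc and the +10 offset come from the reply.

```python
from typing import Dict, List

RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumed partitioner)


def balanced_tokens(node_count: int, offset: int = 0) -> List[int]:
    """Evenly spaced tokens for one data center, shifted by `offset`."""
    return [i * RING_SIZE // node_count + offset for i in range(node_count)]


def topology_lines(nodes_by_dc: Dict[str, List[str]],
                   rack: str = "RAC1") -> List[str]:
    """Lines in the ip=DC:rack form used by cassandra-topology.properties.

    Every node in a data center is placed in the same rack.
    """
    return [f"{ip}={dc}:{rack}"
            for dc, ips in nodes_by_dc.items()
            for ip in ips]


main_tokens = balanced_tokens(5)           # existing data center
dr_tokens = balanced_tokens(5, offset=10)  # DR nodes join at token+10

if __name__ == "__main__":
    for m, d in zip(main_tokens, dr_tokens):
        print(m, d)
    # Hypothetical addresses, for illustration only.
    print("\n".join(topology_lines({
        "main_dc": ["10.0.0.1", "10.0.0.2"],
        "dr_dc": ["10.1.0.1", "10.1.0.2"],
    })))
```

The offset keeps every DR token distinct from its main-DC neighbour while leaving each data center balanced on its own.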





--
Tyler Hobbs
DataStax


Commit log periodic sync?

2012-08-23 Thread rubbish me
Hi all

First off, let's introduce the setup.

- 6 x C* 1.1.2 in the active DC (DC1), another 6 in another (DC2)
- keyspace's RF=3 in each DC
- Hector as client.
- client talks only to DC1 unless DC1 can't serve the request, in which case
it talks only to DC2
- commit log is synced periodically with the default setting of 10s.
- consistency policy = LOCAL_QUORUM for both read and write.
- we are running on production Linux VMs (not ideal but this is out of our
hands)

As part of a DR exercise, we killed all 6 nodes in DC1. Hector started talking
to DC2, all the data was still there, and everything continued to work
perfectly.

Then we brought all nodes in DC1 up, one by one. We saw a message saying all
the commit logs were replayed. No errors were reported.  We didn't run repair
at this time.

We noticed that data that was written an hour before the exercise, around the
time the last memtables were flushed, was not found in DC1.

If we understand correctly, writes go to the commit log first, and the log is
then synced to disk every 10s. At worst we should have lost the last 10s of
data. What could be the cause of this behaviour?

With the blessing of C* we could recover all this data from DC2. But we
would like to understand why.

Many thanks in advance.

Amy




Re: Cassandra API Library.

2012-08-23 Thread Aaron Turner
+1 vote for Hector.

That said, don't use SuperColumns unless you really really know what
you're doing.

On Thu, Aug 23, 2012 at 4:25 AM, Amit Handa  wrote:
> hi,
>
> kindly let me know which java client api is more matured, and easy to use
> with all features(Super Columns, caching, pooling, etc) of Cassandra 1.X.
> Right now i come to know that following client exists:
>
> 1) Hector(Java)
> 2) Thrift (Java)
> 3) Kundera (Java)
>
>
> With Regards,
> Amit



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill
Ha… how could I forget? =)
Adding it now.

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42    •
healthmarketscience.com




From:  Robin Verlangen 
Reply-To:  
Date:  Thursday, August 23, 2012 9:56 AM
To:  
Subject:  Re: Cassandra API Library.

@Brian: You're missing PhpCassa (PHP library)

With kind regards,

Robin Verlangen
Software engineer

W http://www.robinverlangen.nl
E ro...@us2.nl


Re: Cassandra API Library.

2012-08-23 Thread Robin Verlangen
@Brian: You're missing PhpCassa (PHP library)

With kind regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl


Re: Cassandra API Library.

2012-08-23 Thread Hiller, Dean
No problem. If you like SQL at all and don't mind adding a PARTITIONS
clause, we have a raw ad-hoc layer (provided you have properly added metadata,
which the ORM objects do for you but which can also be done manually). You get
a query like this:

PARTITIONS p('account56') SELECT tr FROM Trades as tr WHERE tr.price > 70;

So it queries just the partition of the Trades table.  We are still
investigating how large partitions can be, but we know it is quite large
from previous NoSQL projects.

Dean


On 8/23/12 7:51 AM, "Brian O'Neill"  wrote:

>
>Thanks Dean… I hadn't played with that one.  I wonder if that would better
>fit the bill for the Spring Data Cassandra module I'm hacking on.
>https://github.com/boneill42/spring-data-cassandra
>
>I'll poke around.
>
>-brian
>
>On 8/23/12 9:19 AM, "Hiller, Dean"  wrote:
>
>>playOrm has a raw layer for when your columns are not defined ahead of
>>time, and SQL with no limitations on <, =, <=, etc., with joins being
>>added shortly. BUT joins are for joining partitions, so that your system
>>can still scale to infinity.  It also has a built-in in-memory database
>>for unit testing, so you can do TDD.
>>
>>So if you like JQL but want infinite scale JQL, try playOrm.
>>
>>All 45 tests are passing.  We expect 100 unit tests to be in place by the
>>end of the year.
>>
>>Dean
>>
>>On 8/23/12 6:46 AM, "Brian O'Neill"  wrote:
>>
>>>
>>>
>>>We've used 'em all andŠ (IMHO)
>>>
>>>1) I would avoid Thrift directly.
>>>2) Hector is a sure bet.
>>>3) Astyanax is the up and comer.
>>>4) Kundera is good, but works like an ORM -- so not so good if your
>>>columns aren't defined ahead of time.
>>>
>>>-brian
>>>
>>>---
>>>Brian O'Neill
>>>Lead Architect, Software Development
>>> 
>>>Health Market Science
>>>The Science of Better Results
>>>2700 Horizon Drive € King of Prussia, PA € 19406
>>>M: 215.588.6024 € @boneill42   €
>>>healthmarketscience.com
>>>
>>>This information transmitted in this email message is for the intended
>>>recipient only and may contain confidential and/or privileged material.
>>>If
>>>you received this email in error and are not the intended recipient, or
>>>the person responsible to deliver it to the intended recipient, please
>>>contact the sender at the email above and delete this email and any
>>>attachments and destroy any copies thereof. Any review, retransmission,
>>>dissemination, copying or other use of, or taking any action in reliance
>>>upon, this information by persons or entities other than the intended
>>>recipient is strictly prohibited.
>>> 
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 8/23/12 7:40 AM, "Thomas Spengler" 
>>>wrote:
>>>
4) pelops (Thrift,Java)

On 08/23/2012 01:28 PM, Baskar Sikkayan wrote:
> I would vote for Hector :)
> 
> On Thu, Aug 23, 2012 at 4:55 PM, Amit Handa 
>wrote:
> 
>> hi,
>>
>> kindly let me know which java client api is more matured, and easy
>>to
>>use
>> with all features(Super Columns, caching, pooling, etc) of Cassandra
>>1.X.
>> Right now i come to know that following client exists:
>>
>> 1) Hector(Java)
>> 2) Thrift (Java)
>> 3) Kundera (Java)
>>
>>
>> With Regards,
>> Amit
>>
> 


-- 
Thomas Spengler
Chief Technology Officer
---
-

TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
thomas.speng...@toptarif.de | www.toptarif.de

Amtsgericht Charlottenburg, HRB 113287 B
Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
---
-
-
>>>
>>>
>>
>
>



Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill
FWIW.. I just threw this together...
http://brianoneill.blogspot.com/2012/08/cassandra-apis-laundry-list.html

Let me know if I missed any others. (I didn't have playorm on there)

-brian

On Thu, Aug 23, 2012 at 9:51 AM, Brian O'Neill  wrote:
>
> Thanks Dean… I hadn't played with that one.  I wonder if that would better
> fit the bill for the Spring Data Cassandra module I'm hacking on.
> https://github.com/boneill42/spring-data-cassandra
>
> I'll poke around.
>
> -brian



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill

Thanks Dean… I hadn't played with that one.  I wonder if that would better
fit the bill for the Spring Data Cassandra module I'm hacking on.
https://github.com/boneill42/spring-data-cassandra

I'll poke around.

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42   •
healthmarketscience.com

This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or
the person responsible to deliver it to the intended recipient, please
contact the sender at the email above and delete this email and any
attachments and destroy any copies thereof. Any review, retransmission,
dissemination, copying or other use of, or taking any action in reliance
upon, this information by persons or entities other than the intended
recipient is strictly prohibited.
 






On 8/23/12 9:19 AM, "Hiller, Dean"  wrote:

>playOrm has a raw layer that if your columns are not defined ahead of time
>and SQL with no limitations on <, =, <=, etc. etc. as well as joins being
>added shortly BUT joins are for joining partitions so that your system can
>still scale to infinity.  Also has an in-memory database as well for unit
>testing that you can do TDD with built in.
>
>So if you like JQL but want infinite scale JQL, try playOrm.
>
>All 45 tests are passing.  We expect 100 unit tests to be in place by the
>end of the year.
>
>Dean




new type of join just discovered on cassandra

2012-08-23 Thread Hiller, Dean
With playOrm we have been researching partitioning and joining partitions for
OLTP applications, which you typically partition per client anyway so that
you can have infinite clients.  Naturally, we have been looking at a lot of
join algorithms: nested loop join, block nested loop join, sort merge join,
and hash join.

We just discovered a new one that is truly nice in the NoSQL world.  We call it
the lookahead nested loop join.  It is one step better than the block nested
loop join because data arrives before you end up at the top of the loop again:
while you are looping over one batch, the next batch is being fetched (hence
"lookahead").
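
A minimal Python sketch of the idea as described above, not playOrm's
implementation. The fetch_batch function stands in for a remote read, and
all names, the batch size and the sample rows are assumptions made for
illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_batch(rows, start, size):
    """Stand-in for a remote read of one batch of outer rows (hypothetical)."""
    return rows[start:start + size]


def lookahead_nested_loop_join(outer, inner, key, batch_size=2):
    """Join outer rows to inner rows on `key`, requesting the next outer
    batch in a background thread while the current batch is processed."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the first fetch before entering the loop.
        pending = pool.submit(fetch_batch, outer, 0, batch_size)
        start = 0
        while True:
            batch = pending.result()  # wait for the batch requested earlier
            if not batch:
                break
            start += batch_size
            # Lookahead: request the next batch before looping over this one.
            pending = pool.submit(fetch_batch, outer, start, batch_size)
            for row in batch:
                for inner_row in inner:
                    if row[key] == inner_row[key]:
                        results.append((row, inner_row))
    return results


outer = [{"id": 1, "acct": "a"}, {"id": 2, "acct": "b"}, {"id": 3, "acct": "a"}]
inner = [{"acct": "a", "name": "Alice"}, {"acct": "c", "name": "Carol"}]
joined = lookahead_nested_loop_join(outer, inner, "acct", batch_size=2)
print(joined)
```

With a real remote store, the fetch latency for batch N+1 overlaps with the
CPU work on batch N, which is where the gain over a plain block nested loop
join would come from.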

We plan on incorporating that optimization into playOrm and testing joins in
playOrm vs. hibernate joins on an RDBMS with 100k's of rows in a PARTITION
(100k's of rows in a table for the hibernate test) to see how it pans out.  We
may scale the test up to a join of 1,000,000 rows with 500k rows as well (not
sure how far we will push it yet).

If you are interested, let me know and I can send you results of our join tests.

Dean


Re: Cassandra API Library.

2012-08-23 Thread Hiller, Dean
playOrm has a raw layer for when your columns are not defined ahead of time,
and SQL with no limitations on <, =, <=, etc., with joins being added
shortly.  The joins are for joining partitions, so that your system can
still scale to infinity.  It also has a built-in in-memory database for unit
testing, so you can do TDD with it.

So if you like JQL but want infinite scale JQL, try playOrm.

All 45 tests are passing.  We expect 100 unit tests to be in place by the
end of the year.

Dean

On 8/23/12 6:46 AM, "Brian O'Neill"  wrote:

>
>
>We've used 'em all and… (IMHO)
>
>1) I would avoid Thrift directly.
>2) Hector is a sure bet.
>3) Astyanax is the up and comer.
>4) Kundera is good, but works like an ORM -- so not so good if your
>columns aren't defined ahead of time.
>
>-brian



Re: Cassandra API Library.

2012-08-23 Thread Brian O'Neill


We've used 'em all and… (IMHO)

1) I would avoid Thrift directly.
2) Hector is a sure bet.
3) Astyanax is the up and comer.
4) Kundera is good, but works like an ORM -- so not so good if your
columns aren't defined ahead of time.

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42   •
healthmarketscience.com


On 8/23/12 7:40 AM, "Thomas Spengler"  wrote:

>4) pelops (Thrift,Java)




Re: Cassandra API Library.

2012-08-23 Thread Thomas Spengler
4) pelops (Thrift,Java)

On 08/23/2012 01:28 PM, Baskar Sikkayan wrote:
> I would vote for Hector :)
> 
> On Thu, Aug 23, 2012 at 4:55 PM, Amit Handa  wrote:
> 
>> hi,
>>
>> kindly let me know which java client api is more matured, and easy to use
>> with all features(Super Columns, caching, pooling, etc) of Cassandra 1.X.
>> Right now i come to know that following client exists:
>>
>> 1) Hector(Java)
>> 2) Thrift (Java)
>> 3) Kundera (Java)
>>
>>
>> With Regards,
>> Amit
>>
> 


-- 
Thomas Spengler
Chief Technology Officer


TopTarif Internet GmbH, Pappelallee 78-79, D-10437 Berlin
Tel.: (030) 2000912 0 | Fax: (030) 2000912 100
thomas.speng...@toptarif.de | www.toptarif.de

Amtsgericht Charlottenburg, HRB 113287 B
Geschäftsführer: Dr. Rainer Brosch, Dr. Carolin Gabor
-


Re: Cassandra API Library.

2012-08-23 Thread Baskar Sikkayan
I would vote for Hector :)

On Thu, Aug 23, 2012 at 4:55 PM, Amit Handa  wrote:

> hi,
>
> kindly let me know which java client api is more matured, and easy to use
> with all features(Super Columns, caching, pooling, etc) of Cassandra 1.X.
> Right now i come to know that following client exists:
>
> 1) Hector(Java)
> 2) Thrift (Java)
> 3) Kundera (Java)
>
>
> With Regards,
> Amit
>


Data Modelling Suggestions

2012-08-23 Thread Roshni Rajagopal
Hi,

Need some help on a data modelling question. We're using Hector & Datastax 
Enterprise 2.1.


I want to associate a list of items for a user. It should be sorted on the time 
added. And items can be updated (quantity of the item can be changed), and 
items can be deleted.
I can model it like this so that it's denormalized and I get all my information 
in one go from one row, sorted by time added. I can use composite columns.

Row key: User Id
Column Name: TimeUUID:item ID: Item Name: Item Description: Item Price: Item Qty
Column Value : Null

Now, how do I handle manipulations?

 1.  Add new item: Easy, just a new column.
 2.  Add existing item or modify qty: I want to get to the correct column to 
update. Can I search by the second component of the composite column (equals 
condition) and update the column name itself to reflect the new TimeUUID and 
qty? Or would it be better to just add it as a new column, always use the 
latest column for an item in the application code, and delete duplicates in 
the background?
 3.  Delete item: Can I search by the second component of the composite column 
to find the correct column to delete?

I was trying to find Hector examples where we search by the second component 
of a composite column, but I couldn't find any good ones. I'm not sure if it's 
possible. If you have any examples, please share.
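
To make the two options in point 2 concrete, here is a small Python model of
one user's row. It is not Hector code and makes no claim about Hector's API.
Integers stand in for TimeUUIDs, the composite is cut down to (time added,
item ID, qty), and a qty of 0 is used as a delete marker; all of these are
assumptions for the sketch.

```python
import bisect


def add_item(row, time_added, item_id, qty):
    """Insert a composite column (time_added, item_id, qty), keeping the row
    sorted by time first, the way a composite comparator would order it."""
    bisect.insort(row, (time_added, item_id, qty))


def find_by_item(row, item_id):
    """Finding by the second component means scanning every column: the sort
    order is led by time_added, so matching item IDs are not contiguous."""
    return [col for col in row if col[1] == item_id]


def latest_per_item(row):
    """Append-only alternative: treat the newest column per item as current.
    A qty of 0 marks the item as deleted (an assumption of this sketch)."""
    latest = {}
    for time_added, item_id, qty in row:  # row is in ascending time order
        latest[item_id] = (time_added, item_id, qty)
    return sorted(col for col in latest.values() if col[2] > 0)


row = []
add_item(row, 100, "apple", 1)
add_item(row, 101, "pear", 2)
add_item(row, 102, "apple", 5)  # qty change written as a new column
add_item(row, 103, "pear", 0)   # delete marker
matches = find_by_item(row, "apple")
current = latest_per_item(row)
print(matches)
print(current)
```

The append-only variant avoids the scan on every write and leaves the older
columns to be cleaned up in the background, at the price of a slightly more
complex read in the application.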

Regards,
Roshni


This email and any files transmitted with it are confidential and intended 
solely for the individual or entity to whom they are addressed. If you have 
received this email in error destroy it immediately. *** Walmart Confidential 
***


Re: Cassandra 1.1.4 RPM required

2012-08-23 Thread Adeel Akbar
Dear Aaron, it requires a username and password, which I do not have. Can you 
share a direct link?



Thanks & Regards

*Adeel**Akbar*

On 8/23/2012 3:02 PM, aaron morton wrote:

See step 1 here http://wiki.apache.org/cassandra/GettingStarted



Re: Cassandra 1.1.4 RPM required

2012-08-23 Thread aaron morton
See step 1 here http://wiki.apache.org/cassandra/GettingStarted

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2012, at 7:40 PM, Adeel Akbar  wrote:

> Hi,
> 
> I would like to install Apache Cassandra 1.1.4 from RPM.  Please share a link 
> to download rpm for CentOS (x86_64) and (i386).
> 
> -- 
> 
> Thanks & Regards
> 
> Adeel Akbar
> 



Re: Deleting a row from a counter CF

2012-08-23 Thread aaron morton
I would guess that Pelops has called remove() on the Thrift API rather than 
remove_counter(). 

Check the code in Pelops. If you turn server side logging up to DEBUG it will 
log "remove" for the non counter call and "remove_counter" for the counter one. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2012, at 7:36 AM, Oleg Dulin  wrote:

> I get this:
> 
> InvalidRequestException(why:invalid operation for commutative columnfamily
> 
> Any thoughts ?
> 
> We use Pelops…



Re: nodetool repair - when is it not needed ?

2012-08-23 Thread aaron morton
Hinted handoff (HH) works to a point. Specifically, it only collects hints for 
the first hour the node is down, and it has a safety valve to stop the node 
collecting the hints from getting overwhelmed. Looking at the code, it takes a 
bit for that to trip, and you would get a TimeoutException coming back. 
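
A toy Python model of the one-hour hint window, to show why writes late in a
long outage still need repair. The 3600-second constant, the function name
and the sample timestamps are assumptions for this sketch; it ignores the
safety valve and is not Cassandra's implementation.

```python
MAX_HINT_WINDOW_SECONDS = 3600  # "the first hour"; value assumed for this sketch


def hinted_writes(down_at, up_at, write_times, window=MAX_HINT_WINDOW_SECONDS):
    """Split writes that arrived while a replica was down into those that
    would have had a hint stored and those that would need repair."""
    hinted, missed = [], []
    for t in write_times:
        if not (down_at <= t < up_at):
            continue  # replica was up, so no hint is needed
        if t - down_at < window:
            hinted.append(t)
        else:
            missed.append(t)
    return hinted, missed


hinted, missed = hinted_writes(
    down_at=0, up_at=7200, write_times=[10, 3599, 3600, 7000, 8000]
)
print(hinted, missed)
```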

Also, when hints are replayed they are sent off as mutations, which may still 
be dropped by the target if they are not serviced before rpc_timeout. Sending 
nodes throttle their requests, so it's unlikely but possible. 

HH is much more robust, but AFAIK repair is still _the_ way to ensure on-disk 
consistency. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2012, at 6:59 AM, Rob Coli  wrote:

> On Wed, Aug 22, 2012 at 8:37 AM, Senthilvel Rangaswamy
>  wrote:
>> We are running Cassandra 1.1.2 on EC2. Our database is primarily all
>> counters and we don't do any
>> deletes.
>> 
>> Does nodetool repair do anything for such a database. All the docs I read
>> for nodetool repair suggests
>> that nodetool repair is needed only if there is deletes.
> 
> Since 1.0, repair is only needed if a node crashes. If a node crashes,
> my understanding is that a cluster-wide repair (with -pr on each node)
> is required, because the crashed node could have lost a hint for any
> other node.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-2034
> 
> =Rob
> 
> -- 
> =Robert Coli
> AIM>ALK - rc...@palominodb.com
> YAHOO - rcoli.palominob
> SKYPE - rcoli_palominodb



Re: Facing problem while configuring key and row cache

2012-08-23 Thread aaron morton
Answered in the other email as well: cache stats are in nodetool info. 

But you also have to set the "caching" property of the CF. By default it's 
keys_only. It should be documented in the cassandra-cli and the CQL docs: 
http://www.datastax.com/docs/1.1/references/cql/index

cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2012, at 2:18 AM, Amit Handa  wrote:

> hi all,
> 
> I am exploring apache cassandra 1.1.3. I am facing problem with how to 
> improve performance of cassandra using caching configurations.
> 
> I modified key_cache_size_in_mb and row_cache_size_in_mb values inside 
> cassandra.yaml. 
> key_cache_size_in_mb: 20008
>  row_cache_size_in_mb: 14000
> 
> 
> 
> Please let me know how to verify that the setting for key_cache_size and 
> row_chache_size has taken place. 
> 
> 
> When i am checking that this particular configuration are really been 
> configured using command:  
> ./nodetool -h 107.108.189.212 cfstats
> 
> it's showing following results for keySpace DemoUser and column Family Users:
> Keyspace: DemoUser 
> Read Count: 21914 
> Read Latency: 0.08268495026010769 ms. 
> Write Count: 87656 
> Write Latency: 0.06009481381765082 ms. 
> Pending Tasks: 0 
> Column Family: Users 
> SSTable count: 1 
> Space used (live): 1573335 
> Space used (total): 1573335 
> Number of Keys (estimate): 22016 
> Memtable Columns Count: 0 
> Memtable Data Size: 0 
> Memtable Switch Count: 1 
> Read Count: 21914 
> Read Latency: 0.083 ms. 
> Write Count: 87656 
> Write Latency: 0.060 ms. 
> Pending Tasks: 0 
> Bloom Filter False Postives: 0 
> Bloom Filter False Ratio: 0.0 
> Bloom Filter Space Used: 41104 
> Compacted row minimum size: 150 
> Compacted row maximum size: 179 
> Compacted row mean size: 179 
> 
> I am unable to see the effect of key_cache_size_in_mb and 
> row_cache_size_in_mb.
> 
> With Regards,
> Amit
> 



Cassandra 1.1.4 RPM required

2012-08-23 Thread Adeel Akbar

Hi,

I would like to install Apache Cassandra 1.1.4 from RPM.  Please share a 
link to download rpm for CentOS (x86_64) and (i386).


--


Thanks & Regards

*Adeel**Akbar*