Re: State Of: CQL - driver devs

2011-03-21 Thread Ted Zlatanov
On Sun, 20 Mar 2011 19:56:39 -0500 Eric Evans  wrote: 

EE> (Hopefully) for the next version, we'll replace Thrift with a dedicated
EE> protocol, one that eliminates the Thrift dependency, and more
EE> importantly, implements streaming.  This should be transparent to
EE> applications for the most part though.

That would be wonderful.  I hope you'll consider HTTP as the transport
protocol.  But regardless, CQL (or whatever it's called in the end) is
going to be a great feature for Cassandra.  Thank you for working on it.

Ted



Re: Reducing confusion around client libraries

2010-12-13 Thread Ted Zlatanov
On Sun, 12 Dec 2010 01:56:17 +0100 Bjorn Borud  wrote: 

BB> (users ought to be named, because an anonymous "upvote" or "downvote"
BB> conveys next to no meaningful information to me)

Alternatively the votes could be kept as two separate sets for
authenticated vs. anonymous users.

Ted


Re: NoSQL, YesCQL?

2010-10-29 Thread Ted Zlatanov
On Fri, 29 Oct 2010 10:07:43 -0500 (CDT) "Stu Hood"  wrote: 

SH> Most reasonable languages these days have a way to define what looks
SH> like a DSL: giving people a text DSL which is subject to injection
SH> attacks and can't be type checked without support from a client
SH> driver anyway is brain dead. 

I don't think SQL-like query languages are DSLs in the classic sense.

Injection attacks are a red herring: they are a client issue, not a
library or a server problem.  Type checking is a valid complaint and I
think it's balanced out by the flexibility of a text protocol.

SH> Regarding performance: assuming optimized RPC libraries (which we do
SH> not yet have in Avro, and which Thrift is getting better at),
SH> serializing to a string and back will never be as performant as
SH> using a pre-parsed representation of the statement on both
SH> sides. "Oh but we can add prepared statements!" Poppycock.

Consider that Cassandra's statements will never be as complicated as a
regular RDBMS, so parsing them efficiently is not so hard.  The
parameters can be attached to the query, not necessarily inlined.  A
native JDBC level 4 driver could be a very efficient answer to this
problem, too.
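
To make "attached, not inlined" concrete, here is a minimal Python
sketch.  It uses the standard sqlite3 module purely as a stand-in for a
server that accepts query text plus separately supplied parameters; it
is not a Cassandra client, and the table and value are made up for the
illustration.

```python
import sqlite3

# sqlite3 is only a stand-in for "query text plus attached parameters";
# nothing here talks to Cassandra.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# A value that would break a query built by string concatenation.
hostile = "Robert'); DROP TABLE users;--"

# The statement text is fixed; the value travels next to it, not inside it,
# so the server never parses the value as part of the statement.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))

rows = conn.execute("SELECT name FROM users").fetchall()
```

The hostile string is stored as plain data and the table survives,
which is why I consider injection a client-side concern.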

SH> The stated problem is that backwards compatibility is hard to
SH> provide: if that is the core complaint, then changing to a text
SH> based serialization format with a sexy name in order to add
SH> backwards compatibility is a severe overreaction to the
SH> problem. Instead, I would propose evolving the API in a manner that
SH> simplifies it.

I think it would be great to allow multiple APIs in Cassandra (when I
proposed it in the past, it was not allowed, and AFAIK still isn't
beyond Avro and Thrift).  Then this wouldn't be a yes-or-no choice and
the Thrift API would still be available to those who need it.

Ted



Re: NoSQL, YesCQL?

2010-10-29 Thread Ted Zlatanov
On Fri, 29 Oct 2010 09:29:43 -0500 Gary Dusbabek  wrote: 

GD> 2010/10/29 Ted Zlatanov :
>> On Thu, 28 Oct 2010 14:46:15 -0700 Chip Salzenberg  wrote:
>> 
CS> Short answer: "YES Please, but we will still want a side channel for
CS> minimum overhead."
>> 
>> 100% agreed on both counts.  But IIRC the fastest side channel is to
>> become a Cassandra node.  Is that an option?

GD> Yes.  We call it the fat client.  It's a non-storage
GD> gossip-participating cassandra node that speaks directly in terms of
GD> RowMutations, etc.

Sorry, I meant to ask Chip "is that an option for you, as opposed to a
general side channel usable from any language?"

Ted



Re: NoSQL, YesCQL?

2010-10-29 Thread Ted Zlatanov
On Thu, 28 Oct 2010 14:46:15 -0700 Chip Salzenberg  wrote: 

CS> Short answer: "YES Please, but we will still want a side channel for
CS> minimum overhead."

100% agreed on both counts.  But IIRC the fastest side channel is to
become a Cassandra node.  Is that an option?

CS> Long answer: Query languages only work reliably when you have data
CS> binding assistance (insert "Bobby Tables" xkcd here).  However, they do
CS> have the wonderful property of evolving aggressively without requiring
CS> upgrades of the driver plumbing.  This is, of course, emphatically *not*
CS> true of anything like the current Thrift and Avro interfaces.  So that's
CS> why I say "Yes."  On the other hand, a very simple interface for very
CS> simple queries has a lot of value, too; see, for example,
CS> http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html

CS> So that's why I think we will still want to bypass the full language for
CS> minimum latency in some circumstances.

I think the sane, reasonable, simple path is to make the query language
as similar to SQL as possible (which EricQL seems to aim for).  Just
making the queries pure text would be terrific, in any case.  Then a
JDBC driver or a Perl DBD driver (and their parallels in Ruby, Python,
etc.) would be so much easier to write and Cassandra clients wouldn't
have to be so damn complicated.  So I'd rather see specialized tools for
minimum latency and overhead, especially for inserts and dumps (like
MySQL provides mysqlimport and mysqldump).

Ted



Re: admin web UI

2010-05-07 Thread Ted Zlatanov
On Fri, 7 May 2010 09:24:40 +0200 gabriele renzi  wrote: 

gr> On Thu, May 6, 2010 at 7:00 PM, Nathan McCall  wrote:
>> FYI - I asked a similar question in #cassandra-dev yesterday (based on
>> this message thread actually) and was directed to this issue:
>> https://issues.apache.org/jira/browse/CASSANDRA-754

gr> Interesting, but it seems the objection is more on the general idea of
gr> "multiple APIs are a pain to maintain" than on the idea of having a
gr> simple way for plugging external components with a lifecycle.
gr> Maybe there is still space for such a minor patch?

gr> (Also, though I understand the reasoning "one API should be enough for
gr> everyone and two is too much for us" I don't see why such an option
gr> should be ruled out for third parties, but I guess other people have
gr> already put more thought than me in this)

I'm sure if Cassandra maintainers heard from other people besides myself
in favor of multiple APIs they would be more likely to listen.  I've
argued enough on that ticket.

FWIW I would again contribute in that direction if there were a chance
of acceptance.  I am concerned about the operational risks of the Thrift
API and about the state of the Avro API.

Ted



Re: loading schema in trunk

2010-04-19 Thread Ted Zlatanov
On Tue, 13 Apr 2010 10:19:23 -0500 Ted Zlatanov  wrote: 

TZ> I think everyone agrees loadSchemaFromXML can go away after 0.7 but just
TZ> to be clear, you don't think Cassandra after 0.7 should come bundled
TZ> with a tool that can dump, clear, and restore the schema?  It's trivial
TZ> to implement some very basic support for that without trying to provide
TZ> a full management tool.

TZ> I think it would be a big help for new users, troubleshooting (because
TZ> you don't depend on toolkit X or language Y to know the true schema from
TZ> the server's POV), those who want to share schema definitions and tests
TZ> without external dependencies, and sysadmins who don't want to install
TZ> another language to do a schema backup.

I have had no reply to this so I'll just do it from the Perl side; I
opened https://issues.apache.org/jira/browse/CASSANDRA-979 which is
necessary to implement this tool from the client side.  cassidy.pl,
which is bundled with Net::Cassandra::Easy, already supports keyspace
and family define/rename/delete operations so if this ticket is done
then it (and other Thrift clients) can do schema introspection.

At least in cassidy.pl I've implemented these commands:

(with active keyspace "system")
kdefine testks org.apache.cassandra.locator.RackUnawareStrategy 1 org.apache.cassandra.locator.EndPointSnitch
krename testks testks2
kdelete testks2

(with active keyspace "testks")
fdefine testcf Super LongType BytesType comment=statuschanges,row_cache_size=0,key_cache_size=2
frename testcf testcf2
fdelete testcf2

but as I said, it would be nice if there was a neutral format to express
this schema.  YAML would be best.
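
As a rough idea of what a language-neutral dump of the definitions above
could look like, here is a Python sketch.  It uses JSON from the
standard library only because Python ships no YAML module; the same
structure maps directly onto YAML.  The field names are my own
invention, not a Cassandra format, and I am assuming that the "1" in
kdefine is the replication factor and that the two types in fdefine are
the comparator and subcomparator.

```python
import json

# Field names are illustrative guesses, not an existing Cassandra format.
schema = {
    "keyspaces": [
        {
            "name": "testks",
            "strategy_class": "org.apache.cassandra.locator.RackUnawareStrategy",
            "replication_factor": 1,  # assumed meaning of the "1" argument
            "snitch_class": "org.apache.cassandra.locator.EndPointSnitch",
            "column_families": [
                {
                    "name": "testcf",
                    "type": "Super",
                    # assumed order: comparator, then subcomparator
                    "comparators": ["LongType", "BytesType"],
                    "options": {
                        "comment": "statuschanges",
                        "row_cache_size": 0,
                        "key_cache_size": 2,
                    },
                }
            ],
        }
    ]
}

# Dump to text, then restore, as a schema backup tool would.
dumped = json.dumps(schema, indent=2, sort_keys=True)
restored = json.loads(dumped)
```

Any client in any language could read or write such a file without
depending on another client's toolkit.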

Ted



Re: need help regarding Cassandra Setup in eclipse

2010-04-16 Thread Ted Zlatanov
On Fri, 16 Apr 2010 18:17:48 +0500 bilal ahmed  wrote: 

ba> hi
ba>   i have started playing with Cassandra from couple of  days. i downloaded
ba> its binary and configured it successfully now i want to contribute in this
ba> project but i m unable to configure

ba>   its source code in eclipse. i followed same steps which were written on
ba> this page  "http://wiki.apache.org/cassandra/RunningCassandraInEclipse" but
ba> i m facing some issues.

ba>   issue is...

ba>   when i check out the code its directory structure looks like this...

ba>   project-name
ba> |
ba>  src -> java ->org -> apache ->cassandra

ba>   but when i open any class its package statement looks like this...

ba>   package org.apache.cassandra.auth; or any other package

ba>   so here eclipse gives me error "*org.apache.cassandra.auth" does not match
ba> the expected package "java.org.apache.cassandra.auth*" i tried a lot but i m
ba> unable to resolve it.

Your source folder should not be "src" but "src/java"

You also need to add the "interface/thrift/gen-java" and "test/unit"
source folders (only the former is necessary but the latter is good to
have for code searches and to run the tests).
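
The error follows from how the expected package is derived: it is the
file's directory relative to the source folder.  A small Python sketch
of that rule (the helper and the file name Example.java are
hypothetical, only there to show the arithmetic; this is not Eclipse
code):

```python
from pathlib import PurePosixPath


def expected_package(source_folder, java_file):
    """Package implied by a file's location relative to its source folder."""
    relative = PurePosixPath(java_file).relative_to(source_folder)
    return ".".join(relative.parent.parts)


# Hypothetical file name; the directory layout is the one from the question.
java_file = "src/java/org/apache/cassandra/auth/Example.java"

wrong = expected_package("src", java_file)       # "java.org.apache.cassandra.auth"
right = expected_package("src/java", java_file)  # "org.apache.cassandra.auth"
```

With "src" as the source folder the leading "java" directory becomes
part of the package name, which is exactly the mismatch reported.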

Ted



dropped keyspace directories

2010-04-13 Thread Ted Zlatanov
Should the dropped keyspace directories, being empty, get rmdir()ed?
When I run tests against a server the directory gets polluted but it's
not really a bug so I wasn't sure if it's worth a ticket.
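
For what it's worth, the cleanup I have in mind is tiny.  A Python
sketch of the idea follows (not Cassandra code; the directory names are
invented for the demo).  It relies on rmdir refusing to remove a
non-empty directory, so live data can't be deleted by mistake.

```python
import os
import tempfile


def remove_empty_subdirs(data_root):
    """Remove empty immediate subdirectories of data_root; return their names."""
    removed = []
    for name in sorted(os.listdir(data_root)):
        path = os.path.join(data_root, name)
        if os.path.isdir(path) and not os.listdir(path):
            os.rmdir(path)  # raises OSError if the directory is not empty
            removed.append(name)
    return removed


# Demo with made-up keyspace directory names.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "dropped_ks"))
    os.mkdir(os.path.join(root, "live_ks"))
    open(os.path.join(root, "live_ks", "data.db"), "w").close()
    removed = remove_empty_subdirs(root)
```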

Ted



Re: loading schema in trunk

2010-04-13 Thread Ted Zlatanov
On Tue, 13 Apr 2010 09:08:44 -0500 Eric Evans  wrote: 

EE> On Tue, 2010-04-13 at 08:44 -0500, Gary Dusbabek wrote:
>> 2010/4/13 Ted Zlatanov :
>> > Should the functionality be exposed only through JMX, through nodetool,
>> > or through cassandra-cli?  I'll create the ticket if you like and then I
>> > or whoever wants to can work on it.

>> I prefer thrift (and nodetool), but I'd like to hear thoughts from the
>> community.

EE> If we're going to do this, I suggest a separate utility scoped at
EE> migrating existing definitions from an 0.6 configuration (and nothing
EE> else), and then deprecate it right off the bat (read: for removal in
EE> 0.8).

EE> The point here is to be as clear as possible that it is transient, and
EE> shouldn't be adopted as a management tool.

I think everyone agrees loadSchemaFromXML can go away after 0.7 but just
to be clear, you don't think Cassandra after 0.7 should come bundled
with a tool that can dump, clear, and restore the schema?  It's trivial
to implement some very basic support for that without trying to provide
a full management tool.

I think it would be a big help for new users, troubleshooting (because
you don't depend on toolkit X or language Y to know the true schema from
the server's POV), those who want to share schema definitions and tests
without external dependencies, and sysadmins who don't want to install
another language to do a schema backup.

Ted



Re: loading schema in trunk

2010-04-13 Thread Ted Zlatanov
On Tue, 13 Apr 2010 07:57:42 -0500 Gary Dusbabek  wrote: 

GD> 2010/4/13 Ted Zlatanov :
>> I agree loadSchemaFromXML should go away, although IMHO there should be
>> an easy way through something bundled with Cassandra (nodetool or
>> cassandra-cli) to dump, wipe, and restore the schema even though the
>> general schema support is punted to external tools.  Are you against
>> providing even that rudimentary support?

GD> Not at all.

Should the functionality be exposed only through JMX, through nodetool,
or through cassandra-cli?  I'll create the ticket if you like and then I
or whoever wants to can work on it.

Ted



Re: loading schema in trunk

2010-04-13 Thread Ted Zlatanov
On Mon, 12 Apr 2010 20:25:15 -0500 Gary Dusbabek  wrote: 

GD> 2010/4/12 Ted Zlatanov :
>> 
>> OK, I was hoping nodetool would support that operation.  I wanted to use
>> something on the same machine as the Cassandra instance so I can
>> automate a complete install in QA, and jconsole won't work unattended
>> AFAIK.  I don't know JMX well so I'll look for something suitable;
>> recommendations are welcome.

GD> This was deliberate.  I fully intend to deprecate loadSchemaFromXML in
GD> 0.7+1 and remove it completely in 0.7+2.  Hopefully by then the tool
GD> support (provided by high-level clients) will be such that updating
GD> the schema using thrift is a no-brainer.

I already started work on it in Net::Cassandra::Easy but needed to keep
things going with our current setup and jconsole wasn't working.

I agree loadSchemaFromXML should go away, although IMHO there should be
an easy way through something bundled with Cassandra (nodetool or
cassandra-cli) to dump, wipe, and restore the schema even though the
general schema support is punted to external tools.  Are you against
providing even that rudimentary support?

Ted



Re: loading schema in trunk

2010-04-12 Thread Ted Zlatanov
On Tue, 13 Apr 2010 00:15:32 +0100 Ryan Daum  wrote: 

RD> jmxterm is a nice cli jmx tool

Sweet.  And it supports tab-completion to boot.  Thanks!

Ted



Re: loading schema in trunk

2010-04-12 Thread Ted Zlatanov
On Mon, 12 Apr 2010 17:28:12 -0500 Eric Evans  wrote: 

EE> On Mon, 2010-04-12 at 17:16 -0500, Ted Zlatanov wrote:
>> In my checkout, nodetool can't load the schemas as the wiki suggests
>> for 0.6 upgrades.  Is that coming or planned?  Or is the user supposed
>> to put together their own JMX invocation manually? 

EE> I can't see where the wiki suggests that. This is the first I'd heard of
EE> using nodetool.

Those were two separate thoughts that got merged, sorry:

1) nodetool can't load the schemas

2) the wiki suggests to load the schemas for 0.6 upgrades

EE> Any general purpose JMX client should work; I used jconsole.

OK, I was hoping nodetool would support that operation.  I wanted to use
something on the same machine as the Cassandra instance so I can
automate a complete install in QA, and jconsole won't work unattended
AFAIK.  I don't know JMX well so I'll look for something suitable;
recommendations are welcome.

Thanks
Ted



Re: loading schema in trunk

2010-04-12 Thread Ted Zlatanov
On Mon, 12 Apr 2010 13:08:29 -0500 Ted Zlatanov  wrote: 

>>> In trunk, the schema is loaded through Thrift.  Is there a way to load
>>> it from the storage-conf.xml AKA cassandra.xml file without writing
>>> custom code?

In my checkout, nodetool can't load the schemas as the wiki suggests for
0.6 upgrades.  Is that coming or planned?  Or is the user supposed to
put together their own JMX invocation manually?

Thanks
Ted



Re: loading schema in trunk

2010-04-12 Thread Ted Zlatanov
On Mon, 12 Apr 2010 12:39:49 -0500 Eric Evans  wrote: 

EE> On Mon, 2010-04-12 at 11:50 -0500, Ted Zlatanov wrote:
>> In trunk, the schema is loaded through Thrift.  Is there a way to load
>> it from the storage-conf.xml AKA cassandra.xml file without writing
>> custom code?

EE> http://wiki.apache.org/cassandra/LiveSchemaUpdates

Thanks, Eric.  Thanks to Gary as well for doing all that work and
documenting it.  I didn't know about this page, though; I'll subscribe
to the wiki's Recent Changes feed.

Also, the StorageConfiguration page should probably point to
LiveSchemaUpdates as well.

Ted




loading schema in trunk

2010-04-12 Thread Ted Zlatanov
In trunk, the schema is loaded through Thrift.  Is there a way to load
it from the storage-conf.xml AKA cassandra.xml file without writing
custom code?

Thanks
Ted



Re: Thrift out of memory crashes

2010-03-26 Thread Ted Zlatanov
On Fri, 26 Mar 2010 09:44:23 -0500 Jonathan Ellis  wrote: 

JE> The workarounds we can apply at the Cassandra level have too high a
JE> cost:benefit ratio.  The long term fix is to move to Avro.

Can you list the workarounds you've considered?

Is TBinaryProtocol.setReadLength completely useless?

Can we at least do a minimal sanity check of incoming messages?

When the benefit is "you won't crash because someone telnetted to the
wrong port" I'm willing to pay a pretty high cost.
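
By "minimal sanity check" I mean something along these lines: look at
the declared size before allocating anything for it.  The Python sketch
below is a generic length-prefixed reader, not the Thrift wire format
and not the TBinaryProtocol API; the limit and the names are
hypothetical.

```python
import io
import struct

MAX_MESSAGE_BYTES = 16 * 1024 * 1024  # hypothetical limit, tune per deployment


class MessageTooLarge(Exception):
    """Raised when a declared length is negative or above the limit."""


def read_length_prefixed(stream, limit=MAX_MESSAGE_BYTES):
    """Read a 4-byte big-endian length, validate it, then read the body."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("short length header")
    (length,) = struct.unpack(">i", header)  # signed 32-bit, big-endian
    # Reject before allocating: this is the whole point of the check.
    if length < 0 or length > limit:
        raise MessageTooLarge(f"declared length {length} rejected")
    body = stream.read(length)
    if len(body) != length:
        raise EOFError("truncated body")
    return body


# Someone talking HTTP to the wrong port: "GET " read as a length is
# about 1.1 GB, so the message is refused instead of buffered.
stray = io.BytesIO(b"GET / HTTP/1.0\r\n")
try:
    read_length_prefixed(stray)
    rejected = False
except MessageTooLarge:
    rejected = True
```

The check costs one integer comparison per message, which seems cheap
next to a crashed node.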

Ted



Re: Thrift out of memory crashes

2010-03-26 Thread Ted Zlatanov
On Fri, 26 Mar 2010 07:48:43 -0500 Jonathan Ellis  wrote: 

JE> 2010/3/26 Ted Zlatanov :
>> I know this has been discussed in tickets and here previously.  I just
>> wanted to comment on it because of the upcoming 0.6 release.
>> 
>> In my environment I patch Cassandra to prevent the OOM errors from
>> malformed incoming Thrift data, which as everyone knows let anyone crash
>> the servers hard with a netcat invocation.  For those who don't know the
>> story, see https://issues.apache.org/jira/browse/THRIFT-601
>> 
>> I think the OOM guard should be in the Cassandra releases, at least as
>> an option.  Just because Thrift doesn't give us airbags doesn't mean we
>> don't need brakes.

JE> Catching OOME is a bug, not a fix.  OOME is the JVM saying "I give up;
JE> you're screwed."  The JVM isn't stable anymore.

I didn't know that, thanks for explaining.  I thought the JVM could
recover.  

Can we at least patch the Thrift-generated Java code, set the read
length, or do something else?  I hate to give up on this just because
Thrift is broken (as we've discussed, there's no viable Thrift
replacement yet, and we won't allow users to replace the Thrift API with
their own implementation as I proposed with IPluggableAPI).

Thanks
Ted



Thrift out of memory crashes

2010-03-26 Thread Ted Zlatanov
I know this has been discussed in tickets and here previously.  I just
wanted to comment on it because of the upcoming 0.6 release.

In my environment I patch Cassandra to prevent the OOM errors from
malformed incoming Thrift data, which as everyone knows let anyone crash
the servers hard with a netcat invocation.  For those who don't know the
story, see https://issues.apache.org/jira/browse/THRIFT-601

I think the OOM guard should be in the Cassandra releases, at least as
an option.  Just because Thrift doesn't give us airbags doesn't mean we
don't need brakes.

Ted



Re: Standardizing Timestamps Across Clients

2010-03-19 Thread Ted Zlatanov
On Thu, 18 Mar 2010 13:20:34 -0700 Michael Malone  wrote: 

MM> A standard default would be nice, but while we're making
MM> recommendations I'd also suggest that client libs should make this
MM> parameter easy to override. Client apps can do lots of interesting
MM> things by setting timestamps explicitly. You can get a sort of
MM> quasi-transaction by using the same timestamp for a set of operations, for
MM> example.

That's a good idea.  I made the change in Net::Cassandra::Easy 0.05 (you
just pass a subroutine reference to the constructor if you don't want
the default microseconds).
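
The shape of the hook, sketched in Python rather than Perl (the Client
class here is hypothetical and only shows the timestamp override; it is
not the Net::Cassandra::Easy interface):

```python
import time


def default_timestamp():
    """Microseconds since the UTC epoch."""
    return time.time_ns() // 1000


class Client:
    """Hypothetical client; only the timestamp hook is shown."""

    def __init__(self, timestamp_fn=default_timestamp):
        self.timestamp_fn = timestamp_fn

    def mutation(self, key, value):
        # Every mutation asks the hook for its timestamp.
        return {"key": key, "value": value, "timestamp": self.timestamp_fn()}


# Quasi-transaction: every operation in the batch shares one timestamp.
batch_ts = default_timestamp()
client = Client(timestamp_fn=lambda: batch_ts)
batch = [client.mutation("a", 1), client.mutation("b", 2)]
```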

Thanks
Ted



Re: Standardizing Timestamps Across Clients

2010-03-18 Thread Ted Zlatanov
On Thu, 18 Mar 2010 02:36:35 -0500 Jonathan Hseu  wrote: 

JH> Jonathan Ellis suggested that I bring this issue to the dev mailing list:
JH> Cassandra should recommend a default timestamp across all client
JH> libraries.
...
JH> Here's what different clients are using:

JH> 1. Cassandra CLI: Milliseconds since UTC epoch.
JH> 2. lazyboy: Seconds since UTC epoch.  It used to be seconds since local time
JH> epoch.  Now it's changing again to microseconds since UTC epoch.
JH> 3. driftx's client: Milliseconds since UTC epoch.
JH> 4. The example app, Twissandra: Microseconds since UTC epoch.
JH> 5. pycassa: Microseconds since UTC epoch.  It used to be seconds since local
JH> time epoch.
JH> 6. The most popular Cassandra Ruby client: Microseconds since UTC epoch.

It's good to standardize :)

In Perl land, Net::Cassandra::Easy is using seconds but should be using
microseconds.  I'll change it for 0.4 (the underlying Thrift code will
DTRT for the 64-bit encoding using Bit::Vector).  Net::Cassandra uses
seconds and should also be changed; CC-d to that module's maintainer.
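
To show why mixed units matter: Cassandra keeps the write with the
highest timestamp, so a client writing in seconds loses to one writing
in microseconds no matter which write is actually newer.  A Python
sketch of that comparison (the resolve function and the sample values
are illustrative, not Cassandra code):

```python
def resolve(a, b):
    """Return the write with the higher timestamp (last write wins)."""
    return a if a["timestamp"] >= b["timestamp"] else b


instant = 1_268_900_000  # seconds since the epoch, roughly March 2010; arbitrary

# Written an hour earlier, by a client that uses microseconds.
older_us = {"value": "old", "timestamp": (instant - 3600) * 1_000_000}
# Written later, by a client that uses plain seconds.
newer_s = {"value": "new", "timestamp": instant}

# The older write wins because its number is a million times larger.
winner = resolve(older_us, newer_s)

# Once both clients use microseconds, the newer write wins as expected.
newer_us = {"value": "new", "timestamp": instant * 1_000_000}
fixed_winner = resolve(older_us, newer_us)
```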

Ted



Re: GMane groups updated with new mailing list addresses

2010-03-17 Thread Ted Zlatanov
On Tue, 16 Mar 2010 14:26:10 -0500 Jonathan Ellis  wrote: 

JE> 2010/3/16 Ted Zlatanov :
>> I requested this yesterday and it's done: you can read the mailing lists
>> through GMane again, they are changed to the new addresses.  The
>> addresses are (NNTP protocol)
>> 
>> news.gmane.org:gmane.comp.db.cassandra.devel
>> news.gmane.org:gmane.comp.db.cassandra.user
>> 
>> This is very convenient if you don't want to subscribe to the mailing
>> lists.

JE> Thanks for getting that done!

No problem.  The dev group is showing cross posts from the ActiveMQ
group due to a misconfiguration and I've already told the GMane admins,
but Cassandra dev articles are flowing so this is a temporary
inconvenience.  The user list/group gateway is working great.

Ted



GMane groups updated with new mailing list addresses

2010-03-16 Thread Ted Zlatanov
I requested this yesterday and it's done: you can read the mailing lists
through GMane again, they are changed to the new addresses.  The
addresses are (NNTP protocol)

news.gmane.org:gmane.comp.db.cassandra.devel
news.gmane.org:gmane.comp.db.cassandra.user

This is very convenient if you don't want to subscribe to the mailing
lists.

HTH
Ted



Re: thinking about dropping hinted handoff

2010-03-15 Thread Ted Zlatanov
On Wed, 10 Mar 2010 15:59:55 -0600 Jonathan Ellis  wrote: 

JE> Read-only for a specific client is completely different from trying to
JE> read-only the entire node / cluster.  So no, nothing wrong with that.

Cool, thanks.  See CASSANDRA-900 for my proposal.

Ted