Re: user / password authentication advice

2013-12-11 Thread onlinespending
OK, thanks for getting me going in the right direction. I imagine most people would store password and tokenized authentication information in a single table, using the username (e.g. email address) as the key? On Dec 11, 2013, at 10:44 PM, Janne Jalkanen wrote: > > Hi! > > You're right, th

Re: user / password authentication advice

2013-12-11 Thread Janne Jalkanen
Hi! You're right, this isn't really Cassandra-specific. Most languages/web frameworks have their own way of doing user authentication, and then you typically just write a plugin that stores whatever data the system needs in Cassandra. For example, if you're using Java (or Scala or Groovy

Re: user / password authentication advice

2013-12-11 Thread Aaron Morton
Not sure if you are asking about authentication & authorisation in Cassandra or how to implement the same using Cassandra. Info on Cassandra authentication and authorisation is here: http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/security/securityTOC.h

Re: Write performance with 1.2.12

2013-12-11 Thread Aaron Morton
> It is the write latency, read latency is ok. Interestingly the latency is low > when there is one node. When I join other nodes the latency drops about 1/3. > To be specific, when I start sending traffic to the other nodes the latency > for all the nodes increases, if I stop traffic to other n

Re: efficient way to store 8-bit or 16-bit value?

2013-12-11 Thread Aaron Morton
> What do people recommend I do to store a small binary value in a column? I’d > rather not simply use a 32-bit int for a single byte value. blob is a byte array or you could use the varint, a variable length integer, but you probably want the blob. cheers - Aaron Morton New Z

Re: Bulkoutputformat

2013-12-11 Thread Aaron Morton
If you don’t need to use Hadoop then try the SSTableSimpleWriter and sstableloader; this post is a little old but still relevant: http://www.datastax.com/dev/blog/bulk-loading Otherwise, AFAIK, BulkOutputFormat is what you want from Hadoop: http://www.datastax.com/docs/1.1/cluster_architecture/had

Re: CLUSTERING ORDER CQL3

2013-12-11 Thread Aaron Morton
You need to specify all the clustering key components in the CLUSTERING ORDER BY clause create table demo(oid int,cid int,ts timeuuid,PRIMARY KEY (oid,cid,ts)) WITH CLUSTERING ORDER BY (cid ASC, ts DESC); cheers - Aaron Morton New Zealand @aaronmorton Co-Founder & Principal C

Re: Cyclop - CQL3 web based editor

2013-12-11 Thread Aaron Morton
thanks, looks handy. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder & Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 12/12/2013, at 6:16 am, Parth Patil wrote: > Hi Maciej, > This looks great! Thanks for building this. > > > On W

Re:

2013-12-11 Thread Aaron Morton
> SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test', > comparator_type='CompositeType', default_validation_class='UTF8Type', > key_validation_class='UTF8Type', column_validation_classes=validators) > CompositeType is a type composed of other types, see http://

Re: 2 nodes cassandra cluster raid10 or JBOD

2013-12-11 Thread Aaron Morton
If you have two nodes, and RF 2, you will only be able to use eventual consistency. If you want to have stronger consistency and some redundancy 3 nodes is the minimum requirement. In the current setup, with only 2 nodes, I would use RAID 10 as it requires less operator intervention and there
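The consistency point above follows from quorum arithmetic. A minimal sketch, assuming the usual definition of a quorum as a majority of replicas (floor(RF / 2) + 1); the function names are made up for illustration:

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

With RF 2 the quorum is both replicas, so QUORUM requests fail as soon as one node is down; RF 3 is the smallest setting that keeps a quorum with one node lost.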

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2013-12-11 Thread Robert Coli
On Wed, Dec 11, 2013 at 6:27 AM, Mathijs Vogelzang wrote: > When I use sstable2json on the sstable on the destination cluster, it has > "metadata": {"deletionInfo": > {"markedForDeleteAt":1796952039620607,"localDeletionTime":0}}, whereas > it doesn't have that in the source sstable. > (Yes, this i

user / password authentication advice

2013-12-11 Thread onlinespending
Hi, I’m using Cassandra in an environment where many users can log in to use an application I’m developing. I’m curious if anyone has any advice or links to documentation / blogs that discuss common implementations or best practices for user and password authentication. My cursory search o

Re: Exactly one wide row per node for a given CF?

2013-12-11 Thread Aaron Morton
> Querying the table was fast. What I didn’t do was test the table under load, > nor did I try this in a multi-node cluster. As the number of columns in a row increases so does the size of the column index which is read as part of the read path. For background and comparisons of latency see h

Re: setting PIG_INPUT_INITIAL_ADDRESS environment . variable in Oozie for cassandra ...¿?

2013-12-11 Thread Aaron Morton
> Caused by: java.io.IOException: PIG_INPUT_INITIAL_ADDRESS or > PIG_INITIAL_ADDRESS environment variable not set > at > org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(CassandraStorage.java:314) > at > org.apache.cassandra.hadoop.pig.CassandraStorage.getSchema(Cassandra

Re: Nodetool repair exceptions in Cassandra 2.0.2

2013-12-11 Thread Aaron Morton
> [2013-12-08 11:04:02,047] Repair session ff16c510-5ff7-11e3-97c0-5973cc397f8f > for range (1246984843639507027,1266616572749926276] failed with error > org.apache.cassandra.exceptions.RepairException: [repair > #ff16c510-5ff7-11e3-97c0-5973cc397f8f on keyspace_name/col_family1, > (12469848436

Re: Write performance with 1.2.12

2013-12-11 Thread srmore
Thanks Aaron On Wed, Dec 11, 2013 at 8:15 PM, Aaron Morton wrote: > Changed memtable_total_space_in_mb to 1024 still no luck. > > Reducing memtable_total_space_in_mb will increase the frequency of > flushing to disk, which will create more for compaction to do and result in > increased IO. > > Y

Re: Data Modelling Information

2013-12-11 Thread Aaron Morton
> create table messages( > body text, > username text, > tags set > PRIMARY keys(username,tags) > ) This statement is syntactically invalid, also you cannot use a collection type in the primary key. > 1) I should be able to query by username and get all the messages for

Re: OOMs during high (read?) load in Cassandra 1.2.11

2013-12-11 Thread Aaron Morton
Do you have the back trace from the heap dump so we can see what the array was and what was using it? Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder & Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 10/12/2013, at 4:41 am, Klaus

Re: Write performance with 1.2.12

2013-12-11 Thread Aaron Morton
> Changed memtable_total_space_in_mb to 1024 still no luck. Reducing memtable_total_space_in_mb will increase the frequency of flushing to disk, which will create more for compaction to do and result in increased IO. You should return it to the default. > when I send traffic to one node its per

Re: nodetool repair keeping an empty cluster busy

2013-12-11 Thread Sven Stark
Thanks Rob. Well, I ran the repair only against the empty keyspace. C* version is 1.2.8. I guess I'll try to recreate it some time next week on two or three test hosts and if the same behaviour occurs I'll file a bug report. Cheers, Sven On Thu, Dec 12, 2013 at 12:58 PM, Robert Coli wrote: >

Re: AddContractPoint /VIP

2013-12-11 Thread Aaron Morton
> What is the good practice to put in the code as addContactPoint ie.,how many > servers ? I use the same nodes as the seed list nodes for that DC. The idea of the seed list is that it’s a list of well known nodes, and it’s easier operationally to say we have one list of well known nodes that i

Re: nodetool repair keeping an empty cluster busy

2013-12-11 Thread Robert Coli
On Wed, Dec 11, 2013 at 1:35 AM, Sven Stark wrote: > thanks for replying. Could you please be a bit more specific, though. Eg > what exactly is being compacted - there is/was no data at all in the > cluster save for a few hundred kB in the system CF (see the nodetool status > output). Or - how can

Re: efficient way to store 8-bit or 16-bit value?

2013-12-11 Thread Andrey Ilinykh
Column metadata is about 20 bytes. So, there is no big difference if you save 1 or 4 bytes. Thank you, Andrey On Wed, Dec 11, 2013 at 2:42 PM, onlinespending wrote: > What do people recommend I do to store a small binary value in a column? > I’d rather not simply use a 32-bit int for a single
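The arithmetic behind this reply can be spelled out. A rough sketch, taking the "about 20 bytes" per-column figure from the message at face value (the exact overhead depends on the Cassandra version and column name length, so the constant is an assumption):

```python
# Approximate per-column overhead quoted in the thread; an assumption, not a measured value.
COLUMN_OVERHEAD_BYTES = 20


def stored_size(value_bytes: int) -> int:
    """Approximate on-disk size of one column: overhead plus the value itself."""
    return COLUMN_OVERHEAD_BYTES + value_bytes


def relative_saving(small_value_bytes: int, large_value_bytes: int) -> float:
    """Fraction of space saved by storing the smaller value instead of the larger one."""
    large = stored_size(large_value_bytes)
    return (large - stored_size(small_value_bytes)) / large


print(stored_size(1), stored_size(4), relative_saving(1, 4))
```

Under that assumption a 1-byte value costs roughly 21 bytes and a 4-byte int roughly 24, so the narrower type saves only about an eighth of the column's footprint.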

efficient way to store 8-bit or 16-bit value?

2013-12-11 Thread onlinespending
What do people recommend I do to store a small binary value in a column? I’d rather not simply use a 32-bit int for a single byte value. Can I have a one byte blob? Or should I store it as a single character ASCII string? I imagine each is going to have the overhead of storing the length (or nul

Bulkoutputformat

2013-12-11 Thread varun allampalli
Hi All, I want to bulk insert data into Cassandra. I was wondering about using BulkOutputFormat in Hadoop. Is it the best way, or is using the driver and doing batch inserts better? Are there any disadvantages to using BulkOutputFormat? Thanks for helping, Varun

CLUSTERING ORDER CQL3

2013-12-11 Thread Shrikar archak
Hi All, My use case: I want query results ordered by timestamp DESC. But I don't want the timestamp to be the second column in the primary key, as that would take away my querying capability. For example: create table demo(oid int,cid int,ts timeuuid,PRIMARY KEY (oid,cid,ts)) WITH CLUSTERING ORDER BY (ts

Re: How to create counter column family via Pycassa?

2013-12-11 Thread Kumar Ranjan
This works when I remove the comparator_type: validators = { 'tid': 'IntegerType', 'approved': 'BooleanType', 'text': 'UTF8Type', 'favorite_count':'IntegerType', 'retweet_count': 'IntegerType', 'expanded_url': 'UTF8Type

Re: How to create counter column family via Pycassa?

2013-12-11 Thread Kumar Ranjan
I am using ccm cassandra version *1.2.11* On Wed, Dec 11, 2013 at 12:19 PM, Kumar Ranjan wrote: > validators = { > > 'approved': 'BooleanType', > > 'text': 'UTF8Type', > > 'favorite_count':'IntegerType', > > 'retweet_count': 'IntegerType', > > 'expanded_url':

Re: How to create counter column family via Pycassa?

2013-12-11 Thread Kumar Ranjan
validators = { 'approved': 'BooleanType', 'text': 'UTF8Type', 'favorite_count':'IntegerType', 'retweet_count': 'IntegerType', 'expanded_url': 'UTF8Type', 'tuid': 'LongType', 'screen_name': 'UTF8Type', 'profile_image': 'UTF8Type',

Re: Cyclop - CQL3 web based editor

2013-12-11 Thread Parth Patil
Hi Maciej, This looks great! Thanks for building this. On Wed, Dec 11, 2013 at 12:45 AM, Murali wrote: > Hi Maciej, > Thanks for sharing it. > > > > > On Wed, Dec 11, 2013 at 2:09 PM, Maciej Miklas wrote: > >> Hi all, >> >> This is the Cassandra mailing list, but I've developed something that i

[no subject]

2013-12-11 Thread Kumar Ranjan
Hey Folks, So I am creating a column family using pycassaShell. See below: validators = { 'approved': 'BooleanType', 'text': 'UTF8Type', 'favorite_count':'IntegerType', 'retweet_count': 'IntegerType', 'expanded_url': 'UTF8Type', 'tuid': 'LongTy

Re: How to create counter column family via Pycassa?

2013-12-11 Thread Tyler Hobbs
What options are available depends on what version of Cassandra you're using. You can specify the row key type with 'key_validation_class'. For column types, use 'column_validation_classes', which is a dict mapping column names to types. For example: sys.create_column_family('mykeyspace', 'user

Re: How to create counter column family via Pycassa?

2013-12-11 Thread Kumar Ranjan
What are all the possible values for cf_kwargs? SYSTEM_MANAGER.create_column_family('Narrative','Twitter_search_test', comparator_type=UTF8Type, ) - Here I want to specify the column data types and the row key type. How can I do that? On Thu, Aug 15, 2013 at 12:30 PM, Tyler Hobbs wrote:

Re: Try to configure commitlog_archiving.properties

2013-12-11 Thread Bonnet Jonathan .
Thanks Artur, You're right, I must comment out the restore directory too. Now I'll try to practice the restore. Regards, Bonnet Jonathan.

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Robert Wille
Very good point. I've written code to do a very large number of inserts, but I've only ever run it on a single-node cluster. I may very well find out when I run it against a multinode cluster that the performance benefits of large unlogged batches mostly go away. From: Sylvain Lebresne Reply-To:

Re: Try to configure commitlog_archiving.properties

2013-12-11 Thread Artur Kronenberg
So, looking at the code: public void maybeRestoreArchive() { if (Strings.isNullOrEmpty(restoreDirectories)) return; for (String dir : restoreDirectories.split(",")) { File[] files = new File(dir).listFiles(); if (files == null)

Data tombstoned during bulk loading 1.2.10 -> 2.0.3

2013-12-11 Thread Mathijs Vogelzang
Hi all, We're running into a weird problem trying to migrate our data from a 1.2.10 cluster to a 2.0.3 one. I've taken a snapshot on the old cluster, and for each host there, I'm running sstableloader -d KEYSPACE/COLUMNFAMILY (the sstableloader process from the 2.0.3 distribution, the one from 1

Re: Try to configure commitlog_archiving.properties

2013-12-11 Thread Bonnet Jonathan .
Artur Kronenberg openmarket.com> writes: > > > hi Bonnet, > that doesn't seem to be a problem with your archiving, rather with > the restoring. What is your restore command? > -- artur > On 11/12/13 13:47, Bonnet Jonathan. wrote: > > > Thanks to answear

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread David Tinker
I didn't do any warming up etc. I am new to Cassandra and was just poking around with some scripts to try to find the fastest way to do things. That said all the mini-tests ran under the same conditions. In our case the batches will have a variable number of different inserts/updates in them so do

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Sylvain Lebresne
On Wed, Dec 11, 2013 at 1:52 PM, Robert Wille wrote: > Network latency is the reason why the batched query is fastest. One trip > to Cassandra versus 1000. If you execute the inserts in parallel, then that > eliminates the latency issue. > While it is true a batch will mean only one client-serv

Re: Try to configure commitlog_archiving.properties

2013-12-11 Thread Artur Kronenberg
hi Bonnet, that doesn't seem to be a problem with your archiving, rather with the restoring. What is your restore command? -- artur On 11/12/13 13:47, Bonnet Jonathan. wrote: Bonnet Jonathan externe.bnpparibas.com> writes: > >Thanks a lot, > >It Works, i see commit log bein archived.

Re: Try to configure commitlog_archiving.properties

2013-12-11 Thread Bonnet Jonathan .
Bonnet Jonathan externe.bnpparibas.com> writes: > > Thanks a lot, > >It Works, i see commit log bein archived. I'll try tomorrow the restore > command. Thanks again. > > Bonnet Jonathan. > > Hello, I restarted a node today, and I have an error which seems to be related to c

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Robert Wille
Network latency is the reason why the batched query is fastest. One trip to Cassandra versus 1000. If you execute the inserts in parallel, then that eliminates the latency issue. From: Sylvain Lebresne Reply-To: Date: Wednesday, December 11, 2013 at 5:40 AM To: "user@cassandra.apache.org" S
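The latency argument in this message can be put into a toy model. This is only a sketch: it counts client-server round trips and ignores all server-side work (which other replies in this thread suggest matters on a multi-node cluster). The round-trip time and the number of in-flight requests are made-up parameters, not measurements from the thread.

```python
import math


def sequential_ms(num_inserts: int, round_trip_ms: float) -> float:
    """One insert at a time: every insert pays a full round trip."""
    return num_inserts * round_trip_ms


def batched_ms(round_trip_ms: float) -> float:
    """All inserts in one batch: a single round trip."""
    return round_trip_ms


def parallel_ms(num_inserts: int, round_trip_ms: float, in_flight: int) -> float:
    """Inserts issued concurrently, in_flight at a time: one round trip per wave."""
    return math.ceil(num_inserts / in_flight) * round_trip_ms


print(sequential_ms(1000, 2.0), batched_ms(2.0), parallel_ms(1000, 2.0, 100))
```

In this model the parallel case approaches the batched case as the number of in-flight requests grows, which is the sense in which parallel execution removes the latency issue.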

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Robert Wille
I use hand-rolled batches a lot. You can get a *lot* of performance improvement. Just make sure to sanitize your strings. I've been wondering, what's the limit, practical or hard, on the length of a query? Robert On 12/11/13, 3:37 AM, "David Tinker" wrote: >Yes thats what I found. > >This is f

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Sylvain Lebresne
Then I suspect that this is an artifact of your test methodology. Prepared statements *are* faster than non-prepared ones in general. They save some parsing and some bytes on the wire. The savings will tend to be bigger for bigger queries, and it's possible that for very small queries (like the one yo

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread David Tinker
Yes, that's what I found. This is faster: for (int i = 0; i < 1000; i++) session.execute("INSERT INTO test.wibble (id, info) VALUES ('${"" + i}', '${"aa" + i}')") Than this: def ps = session.prepare("INSERT INTO test.wibble (id, info) VALUES (?, ?)") for (int i = 0; i < 1000; i++) session.execute

Re: nodetool repair keeping an empty cluster busy

2013-12-11 Thread Sven Stark
Hi Rahul, thanks for replying. Could you please be a bit more specific, though. Eg what exactly is being compacted - there is/was no data at all in the cluster save for a few hundred kB in the system CF (see the nodetool status output). Or - how can those few hundred kB in data generate Gb of netw

Re: nodetool repair keeping an empty cluster busy

2013-12-11 Thread Rahul Menon
Sven, basically, when you run a repair you are telling your cluster to run a validation compaction, which generates a Merkle tree on each node. These trees are used to identify inconsistencies, so there is quite a bit of streaming, which you see as your network traffic. Rahul
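To illustrate how Merkle trees pinpoint inconsistencies, here is a small, self-contained sketch. It is not Cassandra's implementation (which builds trees over token ranges during validation compaction); the helper names and the use of SHA-256 are assumptions made for the example, and hashes are recomputed on every call for brevity rather than cached.

```python
import hashlib


def leaf_hash(value: bytes) -> bytes:
    """Hash of one leaf (standing in for one range of rows on a replica)."""
    return hashlib.sha256(value).digest()


def root_hash(hashes: list) -> bytes:
    """Hash of a subtree: a single leaf hash, or the hash of its two halves."""
    if len(hashes) == 1:
        return hashes[0]
    mid = len(hashes) // 2
    return hashlib.sha256(root_hash(hashes[:mid]) + root_hash(hashes[mid:])).digest()


def mismatched_leaves(a: list, b: list, offset: int = 0) -> list:
    """Indexes of differing leaves, descending only into subtrees whose hashes differ.

    Assumes both replicas have the same, non-zero number of leaves.
    """
    if root_hash(a) == root_hash(b):
        return []  # identical subtree: nothing to stream
    if len(a) == 1:
        return [offset]
    mid = len(a) // 2
    return (mismatched_leaves(a[:mid], b[:mid], offset)
            + mismatched_leaves(a[mid:], b[mid:], offset + mid))


replica_a = [leaf_hash(v) for v in (b"r0", b"r1", b"r2", b"r3")]
replica_b = [leaf_hash(v) for v in (b"r0", b"r1", b"changed", b"r3")]
print(mismatched_leaves(replica_a, replica_b))
```

Only the ranges reported as mismatched would need to be exchanged between replicas; when the root hashes match, nothing is out of sync.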

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Sylvain Lebresne
> This loop takes 2500ms or so on my test cluster: > > PreparedStatement ps = session.prepare("INSERT INTO perf_test.wibble > (id, info) VALUES (?, ?)") > for (int i = 0; i < 1000; i++) session.execute(ps.bind("" + i, "aa" + i)); > > The same loop with the parameters inline is about 1300ms. It gets

Re: Cyclop - CQL3 web based editor

2013-12-11 Thread Murali
Hi Maciej, Thanks for sharing it. On Wed, Dec 11, 2013 at 2:09 PM, Maciej Miklas wrote: > Hi all, > > This is the Cassandra mailing list, but I've developed something that is > strictly related to Cassandra, and some of you might find it useful, so > I've decided to send email to this group.

Cyclop - CQL3 web based editor

2013-12-11 Thread Maciej Miklas
Hi all, This is the Cassandra mailing list, but I've developed something that is strictly related to Cassandra, and some of you might find it useful, so I've decided to send email to this group. This is a web-based CQL3 editor. The idea is to deploy it once and have a simple and comfortable CQL3 int

Re: 2 nodes cassandra cluster raid10 or JBOD

2013-12-11 Thread Veysel Taşçıoğlu
Hi, What about using JBOD and replication factor 2? Regards. On 11 Dec 2013 02:03, "cem" wrote: > Hi all, > > I need to setup 2 nodes Cassandra cluster. I know that Datastax > recommends using JBOD as a disk configuration and have replication for the > redundancy. I was planning to use RAID 10