Re: MeteredFlusher in system.log entries

2012-07-07 Thread rohit bhatia
@boris 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/MeteredFlusher.java#L51
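
For anyone else chasing this: if I'm reading that line of MeteredFlusher.java correctly, the per-CF count is one live memtable, plus one memtable whose size may still be being measured, plus the flush queue, plus the flush writers. A back-of-the-envelope sketch with the defaults from the example quoted in this thread (illustrative arithmetic only, not the actual code):

```python
# Assumed reading of the MeteredFlusher accounting; the constants mirror
# the example quoted in this thread, not a live cluster.
memtable_total_space_mb = 100
memtable_flush_queue_size = 4    # default
memtable_flush_writers = 1       # default, with one data directory
secondary_indexes = 0            # the CF in the example has none

max_in_flight = (1                            # live (active) memtable
                 + 1                          # memtable potentially still being measured
                 + memtable_flush_queue_size  # memtables waiting to be flushed
                 + memtable_flush_writers     # memtables actively flushing
                 + secondary_indexes)

per_cf_cap_mb = memtable_total_space_mb / max_in_flight
print(max_in_flight)         # 7, the factor of 7
print(round(per_cf_cap_mb))  # 14 (MB), matching the quoted example
```

If that reading is right, the extra "+1" for the memtable still being measured is why a count of 6 (active + queue + flushing) comes up one short of the 7 used in the docs.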

On Sun, Jul 8, 2012 at 8:44 AM, Boris Yen  wrote:
> I am not sure, but I think there should be only 6 memtables (max) based on
> the example. 1 is active, 4 are in the queue, 1 is being flushed.
>
> Is this correct?
>
>
> On Wed, Jun 6, 2012 at 9:08 PM, rohit bhatia  wrote:
>>
>> Also, could someone please explain how the factor of 7 comes into the
>> picture in this sentence:
>>
>> "For example if memtable_total_space_in_mb is 100MB, and
>> memtable_flush_writers is the default 1 (with one data directory), and
>> memtable_flush_queue_size is the default 4, and a Column Family has no
>> secondary indexes. The CF will not be allowed to get above one seventh
>> of 100MB or 14MB, as if the CF filled the flush pipeline with 7
>> memtables of this size it would take 98MB. "
>>
>> On Wed, Jun 6, 2012 at 6:22 PM, rohit bhatia  wrote:
>> > Hi..
>> >
>> > the link http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
>> > mentions that "From version 0.7 onwards the worse case scenario is up
>> > to CF Count + Secondary Index Count + memtable_flush_queue_size
>> > (defaults to 4) + memtable_flush_writers (defaults to 1 per data
>> > directory) memtables in memory the JVM at once.".
>> >
>> > So it implies that, for flushing, Cassandra copies the memtable's contents.
>> > Does this imply that writes to a column family are not stopped even
>> > while it is being flushed?
>> >
>> > Thanks
>> > Rohit
>> >
>> > On Wed, Jun 6, 2012 at 9:42 AM, rohit bhatia 
>> > wrote:
>> >> Hi Aaron
>> >>
>> >> Thanks for the link, I have gone through it. But this doesn't explain why
>> >> nodes with exactly the same config/specs differ in their flushing
>> >> frequency.
>> >> The traffic on all nodes is the same, as we are using RandomPartitioner.
>> >>
>> >> Thanks
>> >> Rohit
>> >>
>> >> On Wed, Jun 6, 2012 at 12:24 AM, aaron morton 
>> >> wrote:
>> >>> See the section on memtable_total_space_in_mb here
>> >>>  http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/
>> >>>
>> >>> Cheers
>> >>> -
>> >>> Aaron Morton
>> >>> Freelance Developer
>> >>> @aaronmorton
>> >>> http://www.thelastpickle.com
>> >>>
>> >>> On 6/06/2012, at 2:27 AM, rohit bhatia wrote:
>> >>>
>> >>> I am trying to understand the variance in flush frequency in an
>> >>> 8-node Cassandra cluster.
>> >>> All the flushes are of the same type and initiated by
>> >>> MeteredFlusher.java =>
>> >>>
>> >>> "INFO [OptionalTasks:1] 2012-06-05 06:32:05,873 MeteredFlusher.java
>> >>> (line 62) flushing high-traffic column family CFS(Keyspace='Stats',
>> >>> ColumnFamily='Minutewise_Channel_Stats') (estimated 501695882 bytes)"
>> >>> [taken from system.log]
>> >>>
>> >>> The number of flushes for one column family varies from 6 flushes per
>> >>> day to 24 flushes per day among nodes with the same configuration and
>> >>> hardware.
>> >>> Could you please shed light on what conditions MeteredFlusher
>> >>> uses to trigger memtable flushes?
>> >>> Also, how accurate is the estimated size in the above log entry?
>> >>>
>> >>> Regards
>> >>> Rohit Bhatia
>> >>> Software Engineer, Media.net
>> >>>
>> >>>
>
>


Re: Not getting all data from a 2 node cluster

2012-07-07 Thread Boris Yen
My guess is that your RF is 1. When the new node joins the cluster, only part
of the data (depending on the tokens) goes to the new node.
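
A toy illustration of why that roughly halves the visible data: with RF=1, each key's token is owned by exactly one node, so each node holds only its share of the ring. This mimics the idea of RandomPartitioner (md5-based tokens); it is not Cassandra's actual partitioner code, and the names are made up:

```python
import hashlib

# Toy model of RandomPartitioner-style ownership with RF=1: each key hashes
# to a 128-bit token, and with two balanced nodes each owns half the token
# range.  Illustrative only.
RING_MAX = 2 ** 128

def token(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def owner(key: str) -> str:
    # RF=1: exactly one node owns each key's token range
    return "node1" if token(key) < RING_MAX // 2 else "node2"

rows = [f"row-{i}" for i in range(396)]   # 396 rows, as in the question below
per_node = {"node1": 0, "node2": 0}
for key in rows:
    per_node[owner(key)] += 1

print(per_node)  # roughly a 50/50 split: each node holds only about half
```

Seeing 183 of 396 rows is roughly consistent with such a split; raising the replication factor (and repairing) would put a full copy on each node.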

On Fri, Jun 8, 2012 at 2:49 PM, Prakrati Agrawal <
prakrati.agra...@mu-sigma.com> wrote:

>  Dear all
>
>
> I am using Cassandra to retrieve a number of rows and columns stored in it.
> 
>
> Initially I had a 1-node cluster and I flooded it with data. I ran some
> Hector code to retrieve data from it and got the following output:
>
> Total number of rows in the database are 396
>
> Total number of columns in the database are 16316426
>
> Now I added one more node to it by doing the following steps:
>
> 1. I added both nodes' IP addresses to the seeds property in the
> cassandra.yaml file.
>
> 2. I also changed the rpc_address to 0.0.0.0 in both nodes'
> config files.
>
> 3. I changed the listen_address to their respective IP
> addresses.
>
> 4. I specified the initial token in the new node's config file.
>
> 5. I did not specify the auto_bootstrap option anywhere because
> there is no such option available in Cassandra 1.1.0.
>
> 6. Then I restarted the first node and the new node.
>
> Now after adding the second node when I run the same Hector code, I am
> getting the following result:
>
> Total number of rows in the database are 183
>
> Total number of columns in the database are 7903753
>
> I am using consistency level 1, and I did not specify any replication
> factor while creating the keyspace. I used the following link for
> reference:
>
> http://www.datastax.com/docs/0.7/getting_started/configuring
>
> Please tell me which step I am doing wrong, since I am not getting the
> entire data set on a 2-node cluster.
>
> Thanks and Regards
>
> Prakrati
>
>


Re: Java heap space on Cassandra start up version 1.0.10

2012-07-07 Thread Tyler Hobbs
The heap dump is only 47mb, so something strange is going on.  Is there
anything interesting in the heap dump?

On Fri, Jul 6, 2012 at 6:00 PM, Jason Hill  wrote:

> Hello friends,
>
> I'm getting a:
>
> ERROR 22:50:29,695 Fatal exception in thread
> Thread[SSTableBatchOpen:2,5,main]
> java.lang.OutOfMemoryError: Java heap space
>
> error when I start Cassandra. This node was running fine and after
> some server work/upgrades it started throwing this error when I start
> the Cassandra service. I was on 0.8.? and have upgraded to 1.0.10 to
> see if it would help, but I get the same error. I've removed some of
> the column families from my keyspace directory to see if I can get it
> to start without the heap space error and with some combinations it
> will run. However, I'd like to get it running with all my colFams and
> wonder if someone could give me some advice on what might be causing
> my error. It doesn't seem to be related to compaction, if I am reading
> the log correctly, and most of the help I've found on this topic deals
> with compaction. I'm thinking that my 2 column families should not be
> enough to fill my heap, but I am at a loss as to what I should try
> next?
>
> Thanks for your consideration.
>
> output.log:
>
>  INFO 22:50:26,319 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
> VM/1.6.0_26
>  INFO 22:50:26,322 Heap size: 5905580032/5905580032
>  INFO 22:50:26,322 Classpath:
>
> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.6.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/apache-cassandra-1.0.10.jar:/usr/share/cassandra/apache-cassandra-thrift-1.0.10.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar
>  INFO 22:50:28,586 JNA mlockall successful
>  INFO 22:50:28,593 Loading settings from file:/etc/cassandra/cassandra.yaml
> DEBUG 22:50:28,677 Syncing log with a period of 1
>  INFO 22:50:28,677 DiskAccessMode 'auto' determined to be mmap,
> indexAccessMode is mmap
>  INFO 22:50:28,686 Global memtable threshold is enabled at 1877MB
> DEBUG 22:50:28,761 setting auto_bootstrap to true
> 
> DEBUG 22:50:28,797 Checking directory /var/lib/cassandra/data
> DEBUG 22:50:28,798 Checking directory /var/lib/cassandra/commitlog
> DEBUG 22:50:28,798 Checking directory /var/lib/cassandra/saved_caches
> DEBUG 22:50:28,806 Removing compacted SSTable files from NodeIdInfo
> (see http://wiki.apache.org/cassandra/MemtableSSTable)
> DEBUG 22:50:28,808 Removing compacted SSTable files from Versions (see
> http://wiki.apache.org/cassandra/MemtableSSTable)
> DEBUG 22:50:28,818 Removing compacted SSTable files from
> Versions.76657273696f6e (see
> http://wiki.apache.org/cassandra/MemtableSSTable)
> DEBUG 22:50:28,819 Removing compacted SSTable files from IndexInfo
> (see http://wiki.apache.org/cassandra/MemtableSSTable)
> DEBUG 22:50:28,821 Removing compacted SSTable files from Schema (see
> http://wiki.apache.org/cassandra/MemtableSSTable)
> DEBUG 22:50:28,823 Removing compacted SSTable files from Migrations
> (see http://wiki.apache.org/cassandra/MemtableSSTable)
> DEBUG 22:50:28,825 Removing compacted SSTable files from LocationInfo
> (see http://wiki.apache.org/cassandra/MemtableSSTable)
> DEBUG 22:50:28,827 Removing compacted SSTable files from
> HintsColumnFamily (see
> http://wiki.apache.org/cassandra/MemtableSSTable)
> DEBUG 22:50:28,833 Initializing system.NodeIdInfo
> DEBUG 22:50:28,839 Starting CFS NodeIdInfo
> DEBUG 22:50:28,868 Creating IntervalNode from []
> DEBUG 22:50:28,869 KeyCache capacity for NodeIdInfo is 1
> DEBUG 22:50:28,871 Initializing system.Versions
> DEBUG 22:50:28,873 Starting CFS Versions
>  INFO 22:50:28,877 Opening
> /var/lib/cassandra/data/system/Versions-hd-5 (248 bytes)
> DEBUG 22:50:28,879 Load metadata for
> /var/lib/cassandra/data/system/Versions-hd-5
>  INFO 22:50:28,880 Opening
> /var/lib/cassandra/data/system/Versions-hd-6 (248 bytes)
> DEBUG 22:50:28,880 Load metadata for
> /var/lib/cassandra/data/system/Versions-hd-

Re: MeteredFlusher in system.log entries

2012-07-07 Thread Boris Yen
I am not sure, but I think there should be only 6 memtables (max) based on
the example. 1 is active, 4 are in the queue, 1 is being flushed.

Is this correct?



Re: how big can you slice

2012-07-07 Thread Tyler Hobbs
On Fri, Jul 6, 2012 at 8:37 PM, Deno Vichas  wrote:

> all,
>
> are there any guidelines for how much you can slice?  How does total
> payload size vs. number of columns affect performance?
>
> thanks,
> deno
>

The data size matters most.  I recommend keeping each slice under 10 MB.
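
One way to stay under a bound like that is to page the slice rather than fetch it in one call. The sketch below is client-agnostic Python; `get_slice` is a stand-in for whatever your client exposes (the names are hypothetical, not a real API):

```python
# Paging a wide row in fixed-size pages instead of one huge slice.
# `get_slice` simulates a server-side slice: columns >= start, in order,
# limited to `count` results.
def get_slice(row, start, count):
    cols = [c for c in sorted(row) if c >= start]
    return cols[:count]

row = {f"col{i:04d}": b"x" * 100 for i in range(1000)}  # a "wide" row

page_size = 100   # sized so each page stays well under the limit
start = ""
seen = []
while True:
    page = get_slice(row, start, page_size + 1)  # fetch one extra as a cursor
    seen.extend(page[:page_size])
    if len(page) <= page_size:
        break                     # short page: we've reached the end
    start = page[page_size]       # the extra column starts the next page

print(len(seen))  # 1000: every column fetched, 100 at a time
```

Fetching `page_size + 1` and treating the extra column as the next inclusive start avoids both duplicates and a separate "last seen" comparison.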

-- 
Tyler Hobbs
DataStax 


Re: sliced_buffer_size_in_kb

2012-07-07 Thread Tyler Hobbs
This option was removed in 1.1, so probably not :)

On Sat, Jul 7, 2012 at 8:50 PM, Deno Vichas  wrote:

> all,
>
> is it advisable to mess with sliced_buffer_size_in_kb?  I normally take
> slices of a couple hundred columns that are 50-100K each.
>
>
> thanks,
> deno
>



-- 
Tyler Hobbs
DataStax 


Re: Random errors using phpcassa

2012-07-07 Thread Tyler Hobbs
phpcassa doesn't actually support CQL at all yet; it just doesn't stop you
from grabbing the connection and trying to run a CQL query anyway.  So, I
would expect quite a few things to randomly break.

Additionally, you need to set the CQL API version to 3.0.0 (I believe)
using set_cql_version() on the connection in order to use cql3-specific
behavior.

On Fri, Jul 6, 2012 at 5:37 PM, Marco Matarazzo  wrote:

> Greetings.
>
> I am experiencing problems using a Cassandra DB with phpcassa, but I am
> unable to tell whether the error is in the phpcassa client itself or in
> Cassandra… As far as I understand, phpcassa just "passes" it to the thrift
> layer, and the errors I am seeing are coming back from Cassandra itself, so I
> guess it's some sort of Cassandra problem (or, more likely, I am doing
> something wrong but I don't know what). I hope I'm not too far from the truth.
>
>
> I am using CQL3 from php pages, and I wrote a class that basically wraps
> this behaviour:
>
> $this->pool=new
> \phpcassa\Connection\ConnectionPool($keyspace,$servers);
> […yadda yadda…]
> $client=$this->pool->get()->client;
>
> $result=$client->prepare_cql_query($query,\Cassandra\Compression::NONE);
> $itemid=$result->itemId;
> return $client->execute_prepared_cql_query($itemid,$args);
>
> This works ALMOST always. Sometimes, however, I get "random" errors on
> queries that work. When I say "query that works", I mean that on error I
> dump the values of $query and $args to the screen, and it is always a valid
> query that works, both on the same page if I simply reload it, and in cqlsh
> -3, with the very same parameters.
>
>
> Sample errors, and relative queries and CF schemas are:
>
> ===
>
> Executing [SELECT qt FROM cargobays USING CONSISTENCY QUORUM WHERE
> corporation_id = ? and station_id = ? and item_id = ?] with
> (edd051f0-44aa-4bc3-ad08-3174abcd1a0d,1110129,10025)
> error: No indexed columns present in by-columns clause with "equals"
> operator
>
> cqlsh:goh_release> describe columnfamily cargobays
>
> CREATE TABLE cargobays (
>   corporation_id ascii,
>   station_id ascii,
>   item_id ascii,
>   qt ascii,
>   PRIMARY KEY (corporation_id, station_id, item_id)
> ) WITH COMPACT STORAGE AND
>   comment='' AND
>   caching='KEYS_ONLY' AND
>   read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   min_compaction_threshold=4 AND
>   max_compaction_threshold=32 AND
>   replicate_on_write='true' AND
>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>   compression_parameters:sstable_compression='SnappyCompressor';
>
> ===
>
> Executing [UPDATE agents_skill USING CONSISTENCY QUORUM SET value = ?
> WHERE agent_id = ? and skill = ?] with
> (3,b716738b-95e6-4e22-9924-5334ee7f2f5d,pilot)
> error: line 1:78 mismatched input 'and' expecting EOF
>
> cqlsh:goh_release> describe columnfamily agents_skill ;
>
> CREATE TABLE agents_skill (
>   agent_id ascii,
>   skill ascii,
>   value ascii,
>   PRIMARY KEY (agent_id, skill)
> ) WITH COMPACT STORAGE AND
>   comment='' AND
>   caching='KEYS_ONLY' AND
>   read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   min_compaction_threshold=4 AND
>   max_compaction_threshold=32 AND
>   replicate_on_write='true' AND
>   compaction_strategy_class='SizeTieredCompactionStrategy' AND
>   compression_parameters:sstable_compression='SnappyCompressor';
>
>
> We have a cluster of 3 nodes, and the keyspace is defined as follow:
>
> CREATE KEYSPACE v_release WITH strategy_class = 'SimpleStrategy'
>   AND strategy_options:replication_factor = '3';
>
>
> We're using (packaged) Cassandra 1.1.2 on an Ubuntu LTS 12.04.
>
>
> I really hope it's something that can be sorted out, because we're pretty
> lost here.
>
> Thank you.
>
> --
> Marco Matarazzo
>
>
>
>
>


-- 
Tyler Hobbs
DataStax 


Re: node vs node latency

2012-07-07 Thread Tyler Hobbs
Those latencies look like the difference between a couple of disk seeks and
reading something that's already in the os cache.

The dynamic snitch will favor nodes with lower latencies.  Once a node has
served enough reads, it might not have to hit disk very often, which
produces lower latencies.  So, if you have a hot dataset that fits into
memory, the dynamic snitch starts a positive feedback loop where most reads
will be served from one replica.

I'm guessing the node with the low latencies is serving most of your reads.
You can look at how quickly the total read count is increasing for each of
the replicas to confirm this.  It's not easy to do with only nodetool
cfstats, but something like OpsCenter would help.
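
The feedback loop above can be sketched with a toy model: score each replica by a moving average of observed latency, always read from the best score, and let a replica speed up once its cache is warm. This is illustrative only, not the real dynamic snitch algorithm:

```python
# Toy positive-feedback loop: the "snitch" always reads from the
# lowest-scored replica, and a replica that has served enough reads
# starts answering from cache.
COLD_MS, WARM_MS = 12.0, 0.1              # latencies borrowed from the question
scores = {"a": 5.0, "b": 5.1, "c": 5.2}   # initial latency estimates (ms)
reads = {"a": 0, "b": 0, "c": 0}

def observed_latency(node: str) -> float:
    # after enough reads, assume the node's hot data sits in the OS cache
    return WARM_MS if reads[node] > 50 else COLD_MS

for _ in range(1000):
    best = min(scores, key=scores.get)    # pick the lowest-latency replica
    reads[best] += 1
    scores[best] = 0.9 * scores[best] + 0.1 * observed_latency(best)  # EWMA

print(reads)  # one replica ends up serving nearly all of the reads
```

Once any replica warms up, its score only falls, so it keeps winning the selection: the same lopsided latency split described in the question.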

On Fri, Jul 6, 2012 at 9:11 PM, Deno Vichas  wrote:

> all,
>
> what would explain a huge difference (12ms vs 0.1ms) in read latency from
> node to node?  I've got a 4-node cluster w/ a replication factor of 3, using
> Hector.  I'm seeing these numbers with nodetool cfstats.
>
>
> thx,
> deno
>



-- 
Tyler Hobbs
DataStax 


cannot build 1.1.2 from source

2012-07-07 Thread Arya Goudarzi
Hi Fellows,

I used to be able to build Cassandra 1.1 up to 1.1.1 with the same set
of procedures by running ant on the same machine, but now the stuff
associated with gen-cli-grammar breaks the build. Any advice would be
greatly appreciated.

-Arya

Source:
source tarball for 1.1.2 downloaded from one of the mirrors in
cassandra.apache.org
OS:
Ubuntu 10.04 Precise 64bit
Ant:
Apache Ant(TM) version 1.8.2 compiled on December 3 2011
Maven:
Apache Maven 3.0.3 (r1075438; 2011-02-28 17:31:09+)
Java:
java version "1.6.0_32"
Java(TM) SE Runtime Environment (build 1.6.0_32-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)



Buildfile: /home/arya/workspace/cassandra-1.1.2/build.xml

maven-ant-tasks-localrepo:

maven-ant-tasks-download:

maven-ant-tasks-init:

maven-declare-dependencies:

maven-ant-tasks-retrieve-build:

init-dependencies:
 [echo] Loading dependency paths from file:
/home/arya/workspace/cassandra-1.1.2/build/build-dependencies.xml

init:
[mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/build/classes/main
[mkdir] Created dir:
/home/arya/workspace/cassandra-1.1.2/build/classes/thrift
[mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/build/test/lib
[mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/build/test/classes
[mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/src/gen-java

check-avro-generate:

avro-interface-generate-internode:
 [echo] Generating Avro internode code...

avro-generate:

build-subprojects:

check-gen-cli-grammar:

gen-cli-grammar:
 [echo] Building Grammar
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g
 
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:697:1:
Multiple token rules can match input such as "'-'":
IntegerNegativeLiteral, COMMENT
 [java]
 [java] As a result, token(s) COMMENT were disabled for that input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
Multiple token rules can match input such as "'I'": INCR, INDEX,
Identifier
 [java]
 [java] As a result, token(s) INDEX,Identifier were disabled for that input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
Multiple token rules can match input such as "'0'..'9'": IP_ADDRESS,
IntegerPositiveLiteral, DoubleLiteral, Identifier
 [java]
 [java] As a result, token(s)
IntegerPositiveLiteral,DoubleLiteral,Identifier were disabled for that
input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
Multiple token rules can match input such as "'T'": TRUNCATE, TTL,
Identifier
 [java]
 [java] As a result, token(s) TTL,Identifier were disabled for that input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
Multiple token rules can match input such as "'A'": T__109,
API_VERSION, AND, ASSUME, Identifier
 [java]
 [java] As a result, token(s) API_VERSION,AND,ASSUME,Identifier
were disabled for that input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
Multiple token rules can match input such as "'E'": EXIT, Identifier
 [java]
 [java] As a result, token(s) Identifier were disabled for that input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
Multiple token rules can match input such as "'L'": LIST, LIMIT,
Identifier
 [java]
 [java] As a result, token(s) LIMIT,Identifier were disabled for that input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
Multiple token rules can match input such as "'B'": BY, Identifier
 [java]
 [java] As a result, token(s) Identifier were disabled for that input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
Multiple token rules can match input such as "'O'": ON, Identifier
 [java]
 [java] As a result, token(s) Identifier were disabled for that input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
Multiple token rules can match input such as "'K'": KEYSPACE,
KEYSPACES, Identifier
 [java]
 [java] As a result, token(s) KEYSPACES,Identifier were disabled
for that input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:38:1:
Multiple token rules can match input such as "'<'": T__113, T__115
 [java]
 [java] As a result, token(s) T__115 were disabled for that input
 [java] warning(209):
/home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:693:1:
Multiple token rules can match input such as "' '": DoubleLiteral, WS
 [java]
 [java] 

sliced_buffer_size_in_kb

2012-07-07 Thread Deno Vichas

all,

is it advisable to mess with sliced_buffer_size_in_kb?  I normally take
slices of a couple hundred columns that are 50-100K each.



thanks,
deno


Re: unable to rename commitlog, cassandra can't accept writes

2012-07-07 Thread Frank Hsueh
bug already reported:

https://issues.apache.org/jira/browse/CASSANDRA-4337



On Sat, Jul 7, 2012 at 6:26 PM, Frank Hsueh  wrote:

> Hi,
>
> I'm running Cassandra 1.1.2 on Java 7 x64 on Win7 SP1 x64 (all latest
> versions).  If it matters, I'm using a recent version of Astyanax as my
> client.
>
> I'm using 4 threads to write a lot of data into a single CF.
>
> After several minutes of load (~ 30m at last incident), Cassandra stops
> accepting writes (client reports an OperationTimeoutException).  I looked
> at the logs and I see on the Cassandra server:
>
> 
> ERROR 18:00:42,807 Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
> java.io.IOError: java.io.IOException: Rename from
> \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to
> 703272597990002 failed
> at
> org.apache.cassandra.db.commitlog.CommitLogSegment.&lt;init&gt;(CommitLogSegment.java:127)
> at
> org.apache.cassandra.db.commitlog.CommitLogSegment.recycle(CommitLogSegment.java:204)
> at
> org.apache.cassandra.db.commitlog.CommitLogAllocator$2.run(CommitLogAllocator.java:166)
> at
> org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.IOException: Rename from
> \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to
> 703272597990002 failed
> at
> org.apache.cassandra.db.commitlog.CommitLogSegment.&lt;init&gt;(CommitLogSegment.java:105)
> ... 5 more
> 
>
> Anybody else seen this before ?
>
>
> --
> Frank Hsueh | frank.hs...@gmail.com
>



-- 
Frank Hsueh | frank.hs...@gmail.com


unable to rename commitlog, cassandra can't accept writes

2012-07-07 Thread Frank Hsueh
Hi,

I'm running Cassandra 1.1.2 on Java 7 x64 on Win7 SP1 x64 (all latest
versions).  If it matters, I'm using a recent version of Astyanax as my
client.

I'm using 4 threads to write a lot of data into a single CF.

After several minutes of load (~ 30m at last incident), Cassandra stops
accepting writes (client reports an OperationTimeoutException).  I looked
at the logs and I see on the Cassandra server:


ERROR 18:00:42,807 Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
java.io.IOError: java.io.IOException: Rename from
\var\lib\cassandra\commitlog\CommitLog-701533048437587.log to
703272597990002 failed
at
org.apache.cassandra.db.commitlog.CommitLogSegment.&lt;init&gt;(CommitLogSegment.java:127)
at
org.apache.cassandra.db.commitlog.CommitLogSegment.recycle(CommitLogSegment.java:204)
at
org.apache.cassandra.db.commitlog.CommitLogAllocator$2.run(CommitLogAllocator.java:166)
at
org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Rename from
\var\lib\cassandra\commitlog\CommitLog-701533048437587.log to
703272597990002 failed
at
org.apache.cassandra.db.commitlog.CommitLogSegment.&lt;init&gt;(CommitLogSegment.java:105)
... 5 more


Anybody else seen this before ?


-- 
Frank Hsueh | frank.hs...@gmail.com


Re: Effect of RangeQuery with RandomPartitioner

2012-07-07 Thread Edward Capriolo
On Sat, Jul 7, 2012 at 11:17 AM, prasenjit mukherjee
 wrote:
> I have 2 questions:
>
> 1. In RP, on a given node, are the rows ordered by hash(key) or by key?
> If the rows on a node are ordered by hash(key), then a key-range query
> essentially has to be implemented by a full scan on that node.
>
> 2. In RP, how does a Cassandra node route a client's range-query
> request? The range is distributed across the ring, so essentially
> it either has to send the request to all nodes in the ring or
> just do local processing.
>
> On Sat, Jul 7, 2012 at 7:47 PM, Edward Capriolo  wrote:
>> On Sat, Jul 7, 2012 at 9:26 AM, prasenjit mukherjee
>>  wrote:
>>> Wondering how a rangequery request is handled if RP is used.  Will the
>>> receiving node do a fan-out to all the nodes in the ring or it will
>>> just execute the rangequery on its own local partition ?
>>>
>>> -Prasenjit
>>
>> With RP the data is still ordered. It is ordered pseudo randomly. Like
>> all ranging scanning you can start with the null start row key for
>> your first range scan. Then for the next range scan use the last row
>> key from your results from the first scan.
1)
http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/dht/RandomPartitioner.java?view=markup

http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/dht/AbstractPartitioner.java?revision=1208993&view=markup

2) A single range slice is not handled by all nodes in the cluster.
The request is routed to one or more of the natural endpoints for the
range. An exception would be a range slice that crosses a token
boundary of a node.

RandomPartitioner is not actually random: the data is ordered by the
hash of the key. Thus data is in a predictable location, and repeated
range scans return the same order. However, because md5 generates
drastically different hashes for similar keys, like data will not clump
together.

To put it another way, if you have a 10-node cluster with RP and you
wish to range scan the entire dataset, 0 to 2^128 (or whatever that
big number is), you will notice that the range scans first make three
of the nodes busy, then a fourth node starts taking requests as the
first node starts getting fewer requests; finally the first node gets
no more requests, and so on.

Another option is that row keys can now be composite, and Cassandra
will use the first part of the composite to locate the node and the
second part of the composite to order the data. Sweet!
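
A sketch of that paging pattern under RP (toy code with hypothetical helper names, not a real client API): rows are ordered by md5(key), and each scan resumes from the last key of the previous page.

```python
import hashlib

# Under RandomPartitioner, rows are ordered by md5(key) ("pseudo random"),
# so a full-dataset scan pages through token order, not key order.
def md5_token(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

rows = {f"user{i}": {"col": i} for i in range(25)}
by_token = sorted(rows, key=md5_token)   # the storage order RP gives you

def range_slice(start_key, count):
    # toy server: rows whose token is after start_key's token, in token order
    start_tok = md5_token(start_key) if start_key else -1
    return [k for k in by_token if md5_token(k) > start_tok][:count]

seen, last = [], None
while True:
    page = range_slice(last, 10)   # null start key on the first scan
    if not page:
        break
    seen.extend(page)
    last = page[-1]                # resume from the last row key returned

print(len(seen))         # 25: every row visited exactly once
print(seen == by_token)  # True: scan order is hash order, not key order
```

One caveat: real range slices are typically start-inclusive, so in practice you request the last key again and drop the first row of each subsequent page.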


Re: Effect of RangeQuery with RandomPartitioner

2012-07-07 Thread prasenjit mukherjee
I have 2 questions:

1. In RP, on a given node, are the rows ordered by hash(key) or by key?
If the rows on a node are ordered by hash(key), then a key-range query
essentially has to be implemented by a full scan on that node.

2. In RP, how does a Cassandra node route a client's range-query
request? The range is distributed across the ring, so essentially
it either has to send the request to all nodes in the ring or
just do local processing.

On Sat, Jul 7, 2012 at 7:47 PM, Edward Capriolo  wrote:
> On Sat, Jul 7, 2012 at 9:26 AM, prasenjit mukherjee
>  wrote:
>> Wondering how a rangequery request is handled if RP is used.  Will the
>> receiving node do a fan-out to all the nodes in the ring or it will
>> just execute the rangequery on its own local partition ?
>>
>> -Prasenjit
>
> With RP the data is still ordered. It is ordered pseudo randomly. Like
> all ranging scanning you can start with the null start row key for
> your first range scan. Then for the next range scan use the last row
> key from your results from the first scan.


Re: Effect of RangeQuery with RandomPartitioner

2012-07-07 Thread Edward Capriolo
On Sat, Jul 7, 2012 at 9:26 AM, prasenjit mukherjee
 wrote:
> Wondering how a rangequery request is handled if RP is used.  Will the
> receiving node do a fan-out to all the nodes in the ring or it will
> just execute the rangequery on its own local partition ?
>
> -Prasenjit

With RP the data is still ordered; it is just ordered pseudo-randomly. Like
all range scanning, you can start with the null start row key for
your first range scan. Then, for the next range scan, use the last row
key from the results of the first scan.


Effect of RangeQuery with RandomPartitioner

2012-07-07 Thread prasenjit mukherjee
Wondering how a range-query request is handled if RP is used.  Will the
receiving node do a fan-out to all the nodes in the ring, or will it
just execute the range query on its own local partition?

-Prasenjit

