Re: Config changes to leverage new hardware

2013-11-06 Thread Aaron Morton
> Running Cassandra 1.1.5 currently, but evaluating an upgrade to 1.2.11 soon.
You will make more use of the extra memory moving to 1.2 as it moves bloom 
filters and compression data off heap. 

Also grab the TLAB setting from cassandra-env.sh in v1.2.
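For reference, the flag it enables is set in cassandra-env.sh with something like:

JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"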

> As of now, our performance tests (application-specific as well as 
> cassandra-stress) are not showing any significant difference between the two 
> hardware configurations, which is a little disheartening, since the new 
> hardware has a lot more RAM and CPU.
For reads or writes or both ? 

Writes tend to scale with cores as long as the commit log can keep up. 
Reads improve with disk IO and page cache size when the hot set is in memory. 

> Old Hardware: 8 cores (2 quad core), 32 GB RAM, four 1-TB disks ( 1 disk used 
> for commitlog and 3 disks RAID 0 for data)
> New Hardware: 32 cores (2 8-core with hyperthreading), 128 GB RAM, eight 1-TB 
> disks ( 1 disk used for commitlog and 7 disks RAID 0 for data)
Is the disk IO on the commit log volume keeping up?
You cranked up the concurrent writers, so the single commit log disk may now be 
the bottleneck. You could put the commit log on the same RAID volume to see if 
that improves writes. 

> The config we tried modifying so far was concurrent_reads to (16 * number of 
> drives) and concurrent_writes to (8 * number of cores) as per recommendation 
> in cassandra.yaml, but that didn't make much difference.
256 write threads is a lot; make sure the commit log can keep up. I would put 
it back to 32, maybe try 64. I'm not sure the concurrent list used by the 
commit log will work well with that many threads. 

You may want to turn the reads down as well. 
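A sketch of those settings in cassandra.yaml form (the numbers are only the starting points suggested above, not a tested recommendation):

concurrent_writes: 32      # try 64 if the commit log volume keeps up
concurrent_reads: 32       # the default; raise gradually rather than jumping to 16 * drives
# commitlog_directory: /path/on/the/data/raid   # illustrative path, for the experiment mentioned above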

It’s easier to tune the system if you can provide some info on the workload. 

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 7/11/2013, at 12:35 pm, Arindam Barua  wrote:

>  
> We want to upgrade our Cassandra cluster to have newer hardware, and were 
> wondering if anyone has suggestions on Cassandra or linux config changes that 
> will prove to be beneficial.
> As of now, our performance tests (application-specific as well as 
> cassandra-stress) are not showing any significant difference between the two 
> hardware configurations, which is a little disheartening, since the new 
> hardware has a lot more RAM and CPU.
>  
> Old Hardware: 8 cores (2 quad core), 32 GB RAM, four 1-TB disks ( 1 disk used 
> for commitlog and 3 disks RAID 0 for data)
> New Hardware: 32 cores (2 8-core with hyperthreading), 128 GB RAM, eight 1-TB 
> disks ( 1 disk used for commitlog and 7 disks RAID 0 for data)
>  
> Most of the cassandra config currently is the default, and we are using 
> LeveledCompaction strategy. Default key cache, row cache turned off.
> The config we tried modifying so far was concurrent_reads to (16 * number of 
> drives) and concurrent_writes to (8 * number of cores) as per recommendation 
> in cassandra.yaml, but that didn’t make much difference.
> We were hoping that at least the extra RAM in the new hardware will be used 
> for Linux file caching and hence an improvement in performance will be 
> observed.
>  
> Running Cassandra 1.1.5 currently, but evaluating an upgrade to 1.2.11 soon.
>  
> Thanks,
> Arindam



Re: CQL 'IN' predicate

2013-11-06 Thread Aaron Morton
> If one big query doesn't cause problems

Every row you read becomes (roughly) RF tasks in the cluster. If you ask for 100 
rows in one query it will generate 300 tasks (at RF 3) that are processed by the 
read thread pool, which has a default of 32 threads. If you ask for a lot of rows 
and the number of nodes is low, there is a chance the client will starve others 
while they wait for all the tasks to be completed. So I tend to prefer asking for 
fewer rows. 
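In CQL terms that just means several smaller statements instead of one big one, e.g. (reusing the table and placeholder uuids from the question below):

-- one statement with hundreds of keys fans out into RF * (rows requested) read tasks at once
SELECT fields FROM table WHERE id IN (uuid1, uuid2, .. uuid500);

-- chunks of ~100 keys spread the load and let other clients get a look in
SELECT fields FROM table WHERE id IN (uuid1, .. uuid100);
SELECT fields FROM table WHERE id IN (uuid101, .. uuid200);
-- ... and so on, merging the results client side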

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 7/11/2013, at 12:19 pm, Dan Gould  wrote:

> Thanks Nate,
> 
> I assume 10k is the return limit.  I don't think I'll ever get close to 10k 
> matches to the IN query.  That said, you're right: to be safe I'll increase 
> the limit to match the number of items on the IN.
> 
> I didn't know CQL supported stored procedures, but I'll take a look.  I 
> suppose my question was asking about parsing overhead, however.  If one big 
> query doesn't cause problems--which I assume it wouldn't since there can be 
> multiple threads parsing and I assume C* is smart about memory when 
> accumulating results--I'd much rather do that.
> 
> Dan
> 
> On 11/6/13 3:05 PM, Nate McCall wrote:
>> Unless you explicitly set a page size (I'm pretty sure the query is 
>> converted to a paging query automatically under the hood) you will get 
>> capped at the default of 10k which might get a little weird semantically. 
>> That said, you should experiment with explicit page sizes and see where it 
>> gets you (I've not tried this yet with an IN clause - would be real curious 
>> to hear how it worked). 
>> 
>> Another thing to consider is that it's a pretty big statement to parse every 
>> time. You might want to go the (much) smaller batch route so these can be 
>> stored procedures? (another thing I haven't tried with IN clause - don't see 
>> why it would not work though).
>> 
>> 
>> 
>> 
>> On Wed, Nov 6, 2013 at 4:08 PM, Dan Gould  wrote:
>> I was wondering if anyone had a sense of performance/best practices
>> around the 'IN' predicate.
>> 
>> I have a list of up to potentially ~30k keys that I want to look up in a
>> table (typically queries will have <500, but I worry about the long tail).  
>> Most
>> of them will not exist in the table, but, say, about 10-20% will.
>> 
>> Would it be best to do:
>> 
>> 1) SELECT fields FROM table WHERE id in (uuid1, uuid2, .. uuid3);
>> 
>> 2) Split into smaller batches--
>> for group_of_100 in all_3:
>>// ** Issue in parallel or block after each one??
>>SELECT fields FROM table WHERE id in (group_of_100 uuids);
>> 
>> 3) Something else?
>> 
>> My guess is that (1) is fine and that the only worry is too much data 
>> returned (which won't be a problem in this case), but I wanted to check that 
>> it's not a C* anti-pattern before.
>> 
>> [Conversely, is a batch insert with up to 30k items ok?]
>> 
>> Thanks,
>> Dan
>> 
>> 
>> 
>> 
>> -- 
>> -
>> Nate McCall
>> Austin, TX
>> @zznate
>> 
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
> 



Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled

2013-11-06 Thread Aaron Morton
> Class Name                                                                                        | Shallow Heap | Retained Heap
> --------------------------------------------------------------------------------------------------------------------------------
> java.nio.HeapByteBuffer @ 0x7806a0848                                                             |           48 |            80
> '- name org.apache.cassandra.db.Column @ 0x7806424e8                                              |           32 |           112
>    |- [338530] java.lang.Object[540217] @ 0x57d62f560 Unreachable                                 |    2,160,888 |     2,160,888
>    |- [338530] java.lang.Object[810325] @ 0x591546540                                             |    3,241,320 |     7,820,328
>    |  '- elementData java.util.ArrayList @ 0x75e8424c0                                            |           24 |     7,820,352
>    |     |- list org.apache.cassandra.db.ArrayBackedSortedColumns$SlicesIterator @ 0x5940e0b18    |           48 |           128
>    |     |  '- val$filteredIter org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48   |           32 |     7,820,568
>    |     |     '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable |           24 |     7,820,592
>    |     |- this$0, parent java.util.ArrayList$SubList @ 0x5940e0bb8                              |           40 |            40
>    |     |  '- this$1 java.util.ArrayList$SubList$1 @ 0x5940e0be0                                 |           40 |            80
>    |     |     '- currentSlice org.apache.cassandra.db.ArrayBackedSortedColumns$SlicesIterator @ 0x5940e0b18 | 48 |          128
>    |     |        '- val$filteredIter org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48 |      32 |     7,820,568
>    |     |           '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable |    24 |     7,820,592
>    |     |- columns org.apache.cassandra.db.ArrayBackedSortedColumns @ 0x5b0a33488                |           32 |            56
>    |     |  '- val$cf org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48             |           32 |     7,820,568
>    |     |     '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable |           24 |     7,820,592
>    |     '- Total: 3 entries                                                                      |              |
>    |- [338530] java.lang.Object[360145] @ 0x7736ce2f0 Unreachable                                 |    1,440,600 |     1,440,600
>    '- Total: 3 entries                                                                            |              |

Are you doing large slices, or could you have a lot of tombstones on the rows? 

> We have disabled row cache on one node to see  the  difference. Please
> see attached plots from visual VM, I think that the effect is quite
> visible.
The default row cache provider stores values off the JVM heap; have you changed 
to the ConcurrentLinkedHashCacheProvider ? 

One way the SerializingCacheProvider could impact GC is if the CF takes a lot 
of writes. The SerializingCacheProvider invalidates the row when it is written 
to, and has to read the entire row and serialise it on a cache miss.
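The provider is selected in cassandra.yaml; a minimal sketch (size taken from your config):

row_cache_size_in_mb: 1024
row_cache_provider: SerializingCacheProvider             # the default, values off heap
# row_cache_provider: ConcurrentLinkedHashCacheProvider  # on-heap alternative mentioned above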

>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
>> -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError
You probably want the heap to be 4G to 8G in size; at 10G you will see longer 
GC pauses. 
Also the size of the new generation (-Xmn) may be too big depending on the 
number of cores. I would recommend trying 800M.
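In cassandra-env.sh that would look something like (sizes as suggested above, tune for your machine):

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"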


> prg01.visual.vm.png
Shows the heap growing very quickly. This could be due to wide reads or a high 
write throughput. 

Hope that helps. 

 

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 7/11/2013, at 6:29 am, Chris Burroughs  wrote:

> Both caches involve several objects per entry (What do we want?  Packed 
> objects.  When do we want them? Now!).  The "size" is an estimate of the off 
> heap values only and not the total size nor number of entries.
> 
> An acceptable size will depend on your data and access patterns.  In one case 
> we had a cluster that at 512mb would go into a GC death spiral despite plenty 
> of free heap (presumably just due to the number of objects) while empirically 
> the cluster runs smoothly at 384mb.
> 
> Your caches appear on the larger side, I suggest trying smaller values and 
> only increase when it produces measurable sustained gains.

Config changes to leverage new hardware

2013-11-06 Thread Arindam Barua

We want to upgrade our Cassandra cluster to have newer hardware, and were 
wondering if anyone has suggestions on Cassandra or linux config changes that 
will prove to be beneficial.
As of now, our performance tests (application-specific as well as 
cassandra-stress) are not showing any significant difference between the two 
hardware configurations, which is a little disheartening, since the new hardware 
has a lot more RAM and CPU.

Old Hardware: 8 cores (2 quad core), 32 GB RAM, four 1-TB disks ( 1 disk used 
for commitlog and 3 disks RAID 0 for data)
New Hardware: 32 cores (2 8-core with hyperthreading), 128 GB RAM, eight 1-TB 
disks ( 1 disk used for commitlog and 7 disks RAID 0 for data)

Most of the cassandra config currently is the default, and we are using 
LeveledCompaction strategy. Default key cache, row cache turned off.
The config we tried modifying so far was concurrent_reads to (16 * number of 
drives) and concurrent_writes to (8 * number of cores) as per recommendation in 
cassandra.yaml, but that didn't make much difference.
We were hoping that at least the extra RAM in the new hardware will be used for 
Linux file caching and hence an improvement in performance will be observed.

Running Cassandra 1.1.5 currently, but evaluating an upgrade to 1.2.11 soon.

Thanks,
Arindam


Re: CQL 'IN' predicate

2013-11-06 Thread Dan Gould

Thanks Nate,

I assume 10k is the return limit.  I don't think I'll ever get close to 
10k matches to the IN query.  That said, you're right: to be safe I'll 
increase the limit to match the number of items on the IN.


I didn't know CQL supported stored procedures, but I'll take a look.  I 
suppose my question was asking about parsing overhead, however.  If one 
big query doesn't cause problems--which I assume it wouldn't since there 
can be multiple threads parsing and I assume C* is smart about memory 
when accumulating results--I'd much rather do that.


Dan

On 11/6/13 3:05 PM, Nate McCall wrote:
Unless you explicitly set a page size (I'm pretty sure the query is 
converted to a paging query automatically under the hood) you will get 
capped at the default of 10k which might get a little weird 
semantically. That said, you should experiment with explicit page 
sizes and see where it gets you (I've not tried this yet with an IN 
clause - would be real curious to hear how it worked).


Another thing to consider is that it's a pretty big statement to parse 
every time. You might want to go the (much) smaller batch route so 
these can be stored procedures? (another thing I haven't tried with IN 
clause - don't see why it would not work though).





On Wed, Nov 6, 2013 at 4:08 PM, Dan Gould  wrote:


I was wondering if anyone had a sense of performance/best practices
around the 'IN' predicate.

I have a list of up to potentially ~30k keys that I want to look
up in a
table (typically queries will have <500, but I worry about the
long tail).  Most
of them will not exist in the table, but, say, about 10-20% will.

Would it be best to do:

1) SELECT fields FROM table WHERE id in (uuid1, uuid2, ..
uuid3);

2) Split into smaller batches--
for group_of_100 in all_3:
   // ** Issue in parallel or block after each one??
   SELECT fields FROM table WHERE id in (group_of_100 uuids);

3) Something else?

My guess is that (1) is fine and that the only worry is too much
data returned (which won't be a problem in this case), but I
wanted to check that it's not a C* anti-pattern before.

[Conversely, is a batch insert with up to 30k items ok?]

Thanks,
Dan




--
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com




Re: CQL 'IN' predicate

2013-11-06 Thread Nate McCall
Unless you explicitly set a page size (I'm pretty sure the query is
converted to a paging query automatically under the hood) you will get
capped at the default of 10k which might get a little weird semantically.
That said, you should experiment with explicit page sizes and see where it
gets you (I've not tried this yet with an IN clause - would be real curious
to hear how it worked).

Another thing to consider is that it's a pretty big statement to parse
every time. You might want to go the (much) smaller batch route so these
can be stored procedures? (another thing I haven't tried with IN clause -
don't see why it would not work though).




On Wed, Nov 6, 2013 at 4:08 PM, Dan Gould  wrote:

> I was wondering if anyone had a sense of performance/best practices
> around the 'IN' predicate.
>
> I have a list of up to potentially ~30k keys that I want to look up in a
> table (typically queries will have <500, but I worry about the long tail).
>  Most
> of them will not exist in the table, but, say, about 10-20% will.
>
> Would it be best to do:
>
> 1) SELECT fields FROM table WHERE id in (uuid1, uuid2, .. uuid3);
>
> 2) Split into smaller batches--
> for group_of_100 in all_3:
>// ** Issue in parallel or block after each one??
>SELECT fields FROM table WHERE id in (group_of_100 uuids);
>
> 3) Something else?
>
> My guess is that (1) is fine and that the only worry is too much data
> returned (which won't be a problem in this case), but I wanted to check
> that it's not a C* anti-pattern before.
>
> [Conversely, is a batch insert with up to 30k items ok?]
>
> Thanks,
> Dan
>
>


-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


CQL 'IN' predicate

2013-11-06 Thread Dan Gould

I was wondering if anyone had a sense of performance/best practices
around the 'IN' predicate.

I have a list of up to potentially ~30k keys that I want to look up in a
table (typically queries will have <500, but I worry about the long tail).  Most
of them will not exist in the table, but, say, about 10-20% will.

Would it be best to do:

1) SELECT fields FROM table WHERE id in (uuid1, uuid2, .. uuid3);

2) Split into smaller batches--
for group_of_100 in all_3:
   // ** Issue in parallel or block after each one??
   SELECT fields FROM table WHERE id in (group_of_100 uuids);

3) Something else?

My guess is that (1) is fine and that the only worry is too much data returned 
(which won't be a problem in this case), but I wanted to check that it's not a 
C* anti-pattern before.

[Conversely, is a batch insert with up to 30k items ok?]

Thanks,
Dan



Re: CQL Datatype in Cassandra

2013-11-06 Thread Jabbar Azam
Forgot to add: the text value can be up to 2GB in size, but in practice it will
be less.

Thanks

Jabbar Azam


On 6 November 2013 21:12, Jabbar Azam  wrote:

> Hello Techy Teck,
>
> Couldn't find any evidence on the datastax website but found this
> http://wiki.apache.org/cassandra/CassandraLimitations
>
> which I believe is correct.
>
>
> Thanks
>
> Jabbar Azam
>
>
> On 6 November 2013 20:19, Techy Teck  wrote:
>
>> We are using CQL table like this -
>>
>> CREATE TABLE testing (
>>   description text,
>>   last_modified_date timeuuid,
>>   employee_id text,
>>   value text,
>>   PRIMARY KEY (employee_id, last_modified_date)
>> )
>>
>>
>> We have made description a text column in the above table. I am wondering
>> whether there are any limitations on the text data type in CQL, such as a
>> maximum number of bytes after which the value will be truncated?
>>
>> Any other limitations that I should be knowing? Should I use blob there?
>>
>>
>


Re: CQL Datatype in Cassandra

2013-11-06 Thread Jabbar Azam
Hello Techy Teck,

Couldn't find any evidence on the datastax website but found this
http://wiki.apache.org/cassandra/CassandraLimitations

which I believe is correct.


Thanks

Jabbar Azam


On 6 November 2013 20:19, Techy Teck  wrote:

> We are using CQL table like this -
>
> CREATE TABLE testing (
>   description text,
>   last_modified_date timeuuid,
>   employee_id text,
>   value text,
>   PRIMARY KEY (employee_id, last_modified_date)
> )
>
>
> We have made description a text column in the above table. I am wondering
> whether there are any limitations on the text data type in CQL, such as a
> maximum number of bytes after which the value will be truncated?
>
> Any other limitations that I should be knowing? Should I use blob there?
>
>


CQL Datatype in Cassandra

2013-11-06 Thread Techy Teck
We are using CQL table like this -

CREATE TABLE testing (
  description text,
  last_modified_date timeuuid,
  employee_id text,
  value text,
  PRIMARY KEY (employee_id, last_modified_date)
)


We have made description a text column in the above table. I am wondering whether
there are any limitations on the text data type in CQL, such as a maximum number
of bytes after which the value will be truncated?

Any other limitations that I should be knowing? Should I use blob there?


Re: cleanup failure; FileNotFoundException deleting (wrong?) db file

2013-11-06 Thread Elias Ross
On Wed, Nov 6, 2013 at 9:10 AM, Keith Freeman <8fo...@gmail.com> wrote:

> Is it possible that the keyspace was dropped then re-created (
> https://issues.apache.org/jira/browse/CASSANDRA-4857)? I've seen similar
> stack traces in that case.
>
>
Thanks for the pointer.

There's a program (RHQ) that's managing my server and may have done the
create-drop-create sequence by mistake.

I also wonder if adding additional data directories after re-starting the
server may cause issues. What I mean is adding more dirs to
'data_file_directories' in cassandra.yaml, then restarting.
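For what it's worth, that change is just appending to the list in cassandra.yaml, e.g. (paths illustrative):

data_file_directories:
    - /data01/rhq/data
    - /data05/rhq/data    # added later, before the restart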


Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled

2013-11-06 Thread Chris Burroughs
Both caches involve several objects per entry (What do we want?  Packed 
objects.  When do we want them? Now!).  The "size" is an estimate of the 
off heap values only and not the total size nor number of entries.


An acceptable size will depend on your data and access patterns.  In one 
case we had a cluster that at 512mb would go into a GC death spiral 
despite plenty of free heap (presumably just due to the number of 
objects) while empirically the cluster runs smoothly at 384mb.


Your caches appear on the larger side; I suggest trying smaller values 
and only increasing them when that produces measurable, sustained gains.
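In cassandra.yaml terms that is just dialling the numbers down and re-measuring, e.g. (384 is the value that happened to work in the case above, not a general recommendation):

row_cache_size_in_mb: 384
key_cache_size_in_mb: 150    # likewise, halve and re-measure (illustrative number)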


On 11/05/2013 04:04 AM, Jiri Horky wrote:

Hi there,

we are seeing extensive memory allocation leading to quite long and
frequent GC pauses when using row cache. This is on cassandra 2.0.0
cluster with JNA 4.0 library with following settings:

key_cache_size_in_mb: 300
key_cache_save_period: 14400
row_cache_size_in_mb: 1024
row_cache_save_period: 14400
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
commitlog_segment_size_in_mb: 32

-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
-Xmn1024M -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/data2/cassandra-work/instance-1/cassandra-1383566283-pid1893.hprof
-Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark

We have disabled row cache on one node to see the difference. Please
see attached plots from visual VM, I think that the effect is quite
visible. I have also taken 10x "jmap -histo" after 5s on an affected
server and plotted the result, attached as well.

I have taken a dump of the application when the heap size was 10GB, most
of the memory was unreachable, which was expected. The majority was used
by 55-59M objects of HeapByteBuffer, byte[] and
org.apache.cassandra.db.Column classes. I also include a list of inbound
references to the HeapByteBuffer objects from which it should be visible
where they are being allocated. This was acquired using Eclipse MAT.

Here is the comparison of GC times when row cache enabled and disabled:

prg01 - row cache enabled
   - uptime 20h45m
   - ConcurrentMarkSweep - 11494686ms
   - ParNew - 14690885 ms
   - time spent in GC: 35%
prg02 - row cache disabled
   - uptime 23h45m
   - ConcurrentMarkSweep - 251ms
   - ParNew - 230791 ms
   - time spent in GC: 0.27%

I would be grateful for any hints. Please let me know if you need any
further information. For now, we are going to disable the row cache.

Regards
Jiri Horky





Re: cleanup failure; FileNotFoundException deleting (wrong?) db file

2013-11-06 Thread Keith Freeman
Is it possible that the keyspace was dropped then re-created ( 
https://issues.apache.org/jira/browse/CASSANDRA-4857)? I've seen similar 
stack traces in that case.


On 11/05/2013 10:47 PM, Elias Ross wrote:


I'm seeing the following:

Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
/data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-1-Data.db (No such file or directory)
at org.apache.cassandra.io.util.ThrottledReader.open(ThrottledReader.java:53)
at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1212)
at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:54)
at org.apache.cassandra.io.sstable.SSTableReader.getDirectScanner(SSTableReader.java:1032)
at org.apache.cassandra.db.compaction.CompactionManager.doCleanupCompaction(CompactionManager.java:594)
at org.apache.cassandra.db.compaction.CompactionManager.access$500(CompactionManager.java:73)
at org.apache.cassandra.db.compaction.CompactionManager$5.perform(CompactionManager.java:327)
at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:253)


This is on an install with multiple data directories. The actual 
directory contains files named something else:


[rhq@st11p01ad-rhq006 ~]$ ls -l /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-*
-rw-r--r-- 1 rhq rhq 849924573 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Data.db
-rw-r--r-- 1 rhq rhq        75 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Digest.sha1
-rw-r--r-- 1 rhq rhq    151696 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Filter.db
-rw-r--r-- 1 rhq rhq   2186766 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Index.db
-rw-r--r-- 1 rhq rhq      5957 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Statistics.db
-rw-r--r-- 1 rhq rhq     15276 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-Summary.db
-rw-r--r-- 1 rhq rhq        72 Nov  1 14:24 /data05/rhq/data/rhq/six_hour_metrics/rhq-six_hour_metrics-ic-6-TOC.txt



It seems like it's missing the files it needs to hit? Is there 
something I can do here?




Re: Running cqlsh throws "No appropriate python interpreter found."

2013-11-06 Thread Erwin Karbasi
Hi Romain,

I've managed to get it working.
I installed zlib and then recompiled Python with "--with-zlib=/usr/lib".
I didn't run python -c "import zlib".

Thanks a lot for the fast turnaround,
Erwin Karbasi
AT&T, Senior Software Architect


On Wed, Nov 6, 2013 at 4:21 PM, Romain HARDOUIN
wrote:

> Since you're on RHEL 5, you have compiled Python (no package available,
> right?).
> Have you configured Python to be built with zlib support:
> "--with-zlib=/usr/lib"?
> If not, compile it with zlib and then run:
> python -c "import zlib"
> No error should appear.
>
> Romain
>
> erwin.karb...@gmail.com a écrit sur 06/11/2013 11:42:35 :
>
> > De : Erwin Karbasi 
> > A : "user@cassandra.apache.org" ,
> > Date : 06/11/2013 11:43
> > Objet : Re: Running cqlsh throws "No appropriate python interpreter
> found."
> > Envoyé par : erwin.karb...@gmail.com
> >
> > Now I'm experiencing this:
> >
> > [iwuser@erwin-lab2 bin]$ ./cqlsh
> >
> > Python CQL driver not installed, or not on PYTHONPATH.
> > You might try "easy_install cql".
> >
> > Python: /usr/local/bin/python2.6
> > Module load path: ['./../lib/thrift-python-internal-only-0.9.1.zip',
> > './../lib/cql-internal-only-1.4.1.zip/cql-1.4.1', '/home/iwuser/dsc-
> > cassandra-2.0.2/bin', '/usr/local/bin/python2.6', '/usr/local/lib/
> > python26.zip', '/usr/local/lib/python2.6', '/usr/local/lib/python2.
> > 6/plat-linux2', '/usr/local/lib/python2.6/lib-tk', '/usr/local/lib/
> > python2.6/lib-old', '/usr/local/lib/python2.6/lib-dynload', '/usr/
> > local/lib/python2.6/site-packages']
> >
> > Error: can't decompress data; zlib not available
>
> > Any idea?
> >
> > Thanks,
> > Erwin Karbasi
> > AT&T, Senior Software Architect
> >
>
> > On Wed, Nov 6, 2013 at 9:27 AM, Erwin Karbasi 
> wrote:
> > Hello All,
>
> > I have installed Cassandra Datastax community edition 2.2 and python
> > 2.6 but when I run ./cqlsh I faced with the following error:
> > "No appropriate python interpreter found."
>
> > I know that the error indicates that I have incompatible Python
> > version but I've already installed Python 2.6 and as it seems cqlsh
> > still uses my old version and not the new 2.6 version.
>
> > Any suggestion that shed some light would highly appreciated?
>
> > My OS is Linux RHEL 5.
> >
> > Thanks in advance,
> > Erwin Karbasi
> > AT&T, Senior Software Architect
>


Re: Running cqlsh throws "No appropriate python interpreter found."

2013-11-06 Thread Romain HARDOUIN
Since you're on RHEL 5, you have compiled Python (no package available, 
right?).
Have you configured Python to be built with zlib support: 
"--with-zlib=/usr/lib"?
If not, compile it with zlib and then run:
python -c "import zlib" 
No error should appear.
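A rough sketch of the rebuild, using the flag and check from this thread (source directory name illustrative):

cd Python-2.6.x
./configure --with-zlib=/usr/lib
make && make install          # the install step may need root
python2.6 -c "import zlib"    # should print nothing if zlib support was built in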

Romain

erwin.karb...@gmail.com a écrit sur 06/11/2013 11:42:35 :

> De : Erwin Karbasi 
> A : "user@cassandra.apache.org" , 
> Date : 06/11/2013 11:43
> Objet : Re: Running cqlsh throws "No appropriate python interpreter 
found."
> Envoyé par : erwin.karb...@gmail.com
> 
> Now I'm experiencing this:
> 
> [iwuser@erwin-lab2 bin]$ ./cqlsh
> 
> Python CQL driver not installed, or not on PYTHONPATH.
> You might try "easy_install cql".
> 
> Python: /usr/local/bin/python2.6
> Module load path: ['./../lib/thrift-python-internal-only-0.9.1.zip',
> './../lib/cql-internal-only-1.4.1.zip/cql-1.4.1', '/home/iwuser/dsc-
> cassandra-2.0.2/bin', '/usr/local/bin/python2.6', '/usr/local/lib/
> python26.zip', '/usr/local/lib/python2.6', '/usr/local/lib/python2.
> 6/plat-linux2', '/usr/local/lib/python2.6/lib-tk', '/usr/local/lib/
> python2.6/lib-old', '/usr/local/lib/python2.6/lib-dynload', '/usr/
> local/lib/python2.6/site-packages']
> 
> Error: can't decompress data; zlib not available

> Any idea?
> 
> Thanks,
> Erwin Karbasi
> AT&T, Senior Software Architect
> 

> On Wed, Nov 6, 2013 at 9:27 AM, Erwin Karbasi  
wrote:
> Hello All,

> I have installed Cassandra Datastax community edition 2.2 and python
> 2.6 but when I run ./cqlsh I faced with the following error:
> "No appropriate python interpreter found."

> I know that the error indicates that I have incompatible Python 
> version but I've already installed Python 2.6 and as it seems cqlsh 
> still uses my old version and not the new 2.6 version.

> Any suggestion that shed some light would highly appreciated?

> My OS is Linux RHEL 5. 
> 
> Thanks in advance,
> Erwin Karbasi
> AT&T, Senior Software Architect

Re: Running cqlsh throws "No appropriate python interpreter found."

2013-11-06 Thread Erwin Karbasi
Now I'm experiencing this:

[iwuser@erwin-lab2 bin]$ ./cqlsh

Python CQL driver not installed, or not on PYTHONPATH.
You might try "easy_install cql".

Python: /usr/local/bin/python2.6
Module load path: ['./../lib/thrift-python-internal-only-0.9.1.zip',
'./../lib/cql-internal-only-1.4.1.zip/cql-1.4.1',
'/home/iwuser/dsc-cassandra-2.0.2/bin', '/usr/local/bin/python2.6',
'/usr/local/lib/python26.zip', '/usr/local/lib/python2.6',
'/usr/local/lib/python2.6/plat-linux2', '/usr/local/lib/python2.6/lib-tk',
'/usr/local/lib/python2.6/lib-old', '/usr/local/lib/python2.6/lib-dynload',
'/usr/local/lib/python2.6/site-packages']

*Error: can't decompress data; zlib not available*

Any idea?

Thanks,
Erwin Karbasi
AT&T, Senior Software Architect


On Wed, Nov 6, 2013 at 9:27 AM, Erwin Karbasi  wrote:

> Hello All,
>
> I have installed Cassandra Datastax community edition 2.2 and Python 2.6,
> but when I run ./cqlsh I am faced with the following error:
> "No appropriate python interpreter found."
>
> I know that the error indicates that I have an incompatible Python version,
> but I've already installed Python 2.6 and it seems cqlsh still uses my
> old version and not the new 2.6 version.
>
> Any suggestion that sheds some light would be highly appreciated.
>
> My OS is Linux RHEL 5.
>
> Thanks in advance,
> Erwin Karbasi
> AT&T, Senior Software Architect
>


Re: Managing index tables

2013-11-06 Thread Thomas Stets
Hi Tom,

thanks, I take your answer as "nobody else has found an elegant solution,
either" :-)

I guess I could use a secondary index for some cases, but there are several
reasons I can't use them in most cases. The permissions especially are
problematic.

A user may have dozens of permissions for different solutions, and the
permissions may have boundary values that limit their applicability. I
query for users with any role for a specific solution, for users with a
specific role within a specific solution, and for users with a specific role,
solution and boundary value. I have index tables for all of these cases,
and since a lot of queries only need users from a specific company I also
have index tables where the key also contains the company. Plus, I need to
support wild cards in the stored boundary values as well as in the query
terms.

With several hundred thousand users in our DB all of this works fine, but
maintaining all the index tables is no fun...
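As a rough illustration (table and column names invented), one of those index tables expressed as a CQL 3 table would look something like:

CREATE TABLE users_by_solution_role (
  solution text,
  role text,
  user_id text,
  PRIMARY KEY (solution, role, user_id)
);

-- the user IDs become clustering columns rather than raw column names, and every
-- change to a user's permissions means a matching INSERT or DELETE here, which is
-- exactly the bookkeeping described above.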

We are still on 1.1.8, preparing to switch to 1.2. As far as I know, 2.0 is
not yet recommended for production. And while having transaction would
definitely be nice, most cases of inconsistent data result from bugs in the
code or changed logic.

The data we keep in Cassandra is just a denormalized copy of part of our
data in an Oracle DB, so theoretically we could just throw everything away
and rebuild. But we can't afford to be down that long.
I have a program that scans the data in our Cassandra DB for
inconsistencies, and spits out CQL code to fix it. One complete check alone
takes about 2 days.

Anyway, thanks for your help. :-)

Thomas


On Tue, Nov 5, 2013 at 11:00 PM, Tom van den Berge wrote:

> Hi Thomas,
>
> I understand your concerns about ensuring the integrity of your data when
> having to maintain the indexes yourself.
>
> In some situations, using Cassandra's built-in secondary indexes is more
> efficient -- when many rows contain the indexed value. Maybe your
> permissions fall into this category? Obviously, the advantage is that
> Cassandra will do the maintenance on the index for you.
>
> For situations where secondary indexes are not recommended, you make your
> life a lot easier if all modifications of the indexed entity (like your
> user) are executed by one single piece of code, which is then also
> responsible for maintaining all associated indexes. And write tests to
> ensure that it works in all possible ways.
>
> I understood that Cassandra 2.0 supports transactions. I haven't looked at
> it yet, but this could also help maintaining your data integrity, when a
> failed update of one of your indexes results in a rollback of the entire
> transaction.
>
> I hope this is helpful to you.
> Tom
>
>
> On Mon, Nov 4, 2013 at 12:20 PM, Thomas Stets wrote:
>
>> What is the best way to manage index tables on update/deletion of the
>> indexed data?
>>
>> I have a table containing all kinds of data for a user, i.e. name,
>> address, contact data, company data etc. Key to this table is the user ID.
>>
>> I also maintain about a dozen index tables matching my queries, like
>> name, email address, company D.U.N.S number, permissions the user has, etc.
>> These index tables contain the user IDs matching the search key as column
>> names, with the column values left empty.
>>
>> Whenever a user is deleted or updated I have to make sure to update the
>> index tables, i.e. if the permissions of a user changes I have to remove
>> the user ID from the rows matching the permission he no longer has.
>>
>> My problem is to find all matching entries, especially for data I no
>> longer have.
>>
>> My solution so far is to keep a separate table to keep track of all index
>> tables and keys the user can be found in. In the case mentioned I look up
>> the keys for the permissions table, remove the user ID from there, then
>> remove the entry in the keys table.
>>
>> This works so far (in production for more than a year and a half), and it
>> also allows me to clean up after something has gone wrong.
>>
>> But still, all this additional level of meta information adds a lot of
>> complexity. I was wondering whether there is some kind of pattern that
>> addresses my problem. I found lots of information saying that creating the
>> index tables is the way to go, but nobody ever mentions maintaining the
>> index tables.
>>
>> tia, Thomas
>>
>
>
>
> --
>
> Drillster BV
> Middenburcht 136
> 3452MT Vleuten
> Netherlands
>
> +31 30 755 5330
>
> Open your free account at www.drillster.com
>