Re: Zurich / Swiss / Alps meetup

2012-07-11 Thread Benoit Perroud
Coming back to this thread, we are proud to announce that we have opened the
Swiss BigData UserGroup.

http://www.bigdata-usergroup.ch/

The next meetup is on July 16, with the topic "NoSQL Storage: War Stories
and Best Practices".

Hope to meet you there!

Benoit.


2012/5/17 Sasha Dolgy :
> All,
>
> A year ago I made a simple query to see if there were any users based in and
> around Zurich, Switzerland or the Alps region, interested in participating
> in some form of Cassandra User Group / Meetup.  At the time, 1-2 replies
> happened.  I didn't do much with that.
>
> Let's try this again.  Who all is interested?  I often am jealous about all
> the fun I miss out on with the regular meetups that happen stateside ...
>
> Regards,
> -sd
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com


Re: Zurich / Swiss / Alps meetup

2012-05-18 Thread Benoit Perroud
+1 !



2012/5/17 Sasha Dolgy :
> All,
>
> A year ago I made a simple query to see if there were any users based in and
> around Zurich, Switzerland or the Alps region, interested in participating
> in some form of Cassandra User Group / Meetup.  At the time, 1-2 replies
> happened.  I didn't do much with that.
>
> Let's try this again.  Who all is interested?  I often am jealous about all
> the fun I miss out on with the regular meetups that happen stateside ...
>
> Regards,
> -sd
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com



-- 
sent from my Nokia 3210


Re: sstableloader 1.1 won't stream

2012-05-07 Thread Benoit Perroud
You may want to upgrade all your nodes to 1.1.

The streaming process connects to every live node in the cluster
(you can explicitly exclude some nodes), so all nodes need to speak
1.1.



2012/5/7 Pieter Callewaert :
> Hi,
>
>
>
> I’m trying to upgrade our bulk load process in our testing env.
>
> We use the SSTableSimpleUnsortedWriter to write tables, and use
> sstableloader to stream it into our cluster.
>
> I’ve changed the writer program to fit to the 1.1 api, but now I’m having
> troubles to load them to our cluster. The cluster exists out of one 1.1 node
> and two 1.0.9 nodes.
>
>
>
> I’ve enabled debug as parameter and in the log4j conf.
>
>
>
> [root@bms-app1 ~]# ./apache-cassandra/bin/sstableloader --debug -d
> 10.10.10.100 /tmp/201205071234/MapData024/HOS/
>
> INFO 16:25:40,735 Opening
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1 (1588949 bytes)
>
> INFO 16:25:40,755 JNA not found. Native methods will be disabled.
>
> DEBUG 16:25:41,060 INDEX LOAD TIME for
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1: 327 ms.
>
> Streaming revelant part of
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to
> [/10.10.10.102, /10.10.10.100, /10.10.10.101]
>
> INFO 16:25:41,083 Stream context metadata
> [/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6557280 - 0%], 1 sstables.
>
> DEBUG 16:25:41,084 Adding file
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.
>
> INFO 16:25:41,087 Streaming to /10.10.10.102
>
> DEBUG 16:25:41,092 Files are
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6557280 - 0%
>
> INFO 16:25:41,099 Stream context metadata
> [/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6551840 - 0%], 1 sstables.
>
> DEBUG 16:25:41,100 Adding file
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.
>
> INFO 16:25:41,100 Streaming to /10.10.10.100
>
> DEBUG 16:25:41,100 Files are
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6551840 - 0%
>
> INFO 16:25:41,102 Stream context metadata
> [/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2
> progress=0/6566400 - 0%], 1 sstables.
>
> DEBUG 16:25:41,102 Adding file
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.
>
> INFO 16:25:41,102 Streaming to /10.10.10.101
>
> DEBUG 16:25:41,102 Files are
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2
> progress=0/6566400 - 0%
>
>
>
> progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1
> (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 16:25:41,107 Failed attempt 1 to
> connect to /10.10.10.101 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2
> progress=0/6566400 - 0%. Retrying in 4000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> WARN 16:25:41,108 Failed attempt 1 to connect to /10.10.10.102 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6557280 - 0%. Retrying in 4000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> WARN 16:25:41,108 Failed attempt 1 to connect to /10.10.10.100 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6551840 - 0%. Retrying in 4000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1
> (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 16:25:45,109 Failed attempt 2 to
> connect to /10.10.10.101 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2
> progress=0/6566400 - 0%. Retrying in 8000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> WARN 16:25:45,110 Failed attempt 2 to connect to /10.10.10.102 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6557280 - 0%. Retrying in 8000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> WARN 16:25:45,110 Failed attempt 2 to connect to /10.10.10.100 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1
> progress=0/6551840 - 0%. Retrying in 8000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1
> (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 16:25:53,113 Failed attempt 3 to
> connect to /10.10.10.101 to stream
> /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2
> progress=0/6566400 - 0%. Retrying in 16000 ms. (java.net.SocketException:
> Invalid argument or cannot assign requested address)
>
> WARN 16:25:53,114 Failed attempt 3 to connect to /10.10.10.102 to stream
> /tmp/201205071234/Ma

SSTableWriter and Bulk Loading life cycle enhancement

2012-05-03 Thread Benoit Perroud
Hi All,

I'm bulk loading (a lot of) data from Hadoop into Cassandra 1.0.x. The
provided CFOutputFormat is not the best fit here, so I wanted to use the
bulk loading feature. I know 1.1 comes with a BulkOutputFormat, but I
wanted to propose a simple enhancement to SSTableSimpleUnsortedWriter
that could make life easier:

When a table is flushed to disk, it would be useful to have listeners
that are triggered to perform some action (copying the sstable into
HDFS, for instance).

Please have a look at the patch below for a better idea. Do you think
it would be worthwhile to open a JIRA for this?


Regarding 1.1's BulkOutputFormat and bulk loading in general, the work
done to let a light client stream into the cluster is really great. The
issue now is that data is streamed only at the end of the task. This
causes all the tasks to store their data locally and stream everything
at the end: a lot of temporary space may be needed, and a lot of
bandwidth to the nodes is used at the "same" time. With the listener,
we would be able to start streaming as soon as the first table is
created, so the streaming bandwidth could be better balanced.
A JIRA for this as well?

Thanks

Benoit.




--- a/src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java
+++ b/src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java
@@ -21,6 +21,8 @@ package org.apache.cassandra.io.sstable;
 import java.io.File;
 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.util.LinkedList;
+import java.util.List;
 import java.util.Map;
 import java.util.TreeMap;

@@ -47,6 +49,8 @@ public class SSTableSimpleUnsortedWriter extends AbstractSSTableSimpleWriter
     private final long bufferSize;
     private long currentSize;

+    private final List<SSTableWriterListener> sSTableWrittenListeners = new LinkedList<SSTableWriterListener>();
+
     /**
      * Create a new buffering writer.
      * @param directory the directory where to write the sstables
@@ -123,5 +127,16 @@ public class SSTableSimpleUnsortedWriter extends AbstractSSTableSimpleWriter
         }
         currentSize = 0;
         keys.clear();
+
+        // Notify the registered listeners
+        for (SSTableWriterListener listener : sSTableWrittenListeners)
+        {
+            listener.onSSTableWrittenAndClosed(writer.getTableName(), writer.getColumnFamilyName(), writer.getFilename());
+        }
+    }
+
+    public void addSSTableWriterListener(SSTableWriterListener listener)
+    {
+        sSTableWrittenListeners.add(listener);
     }
 }
diff --git a/src/java/org/apache/cassandra/io/sstable/SSTableWriterListener.java b/src/java/org/apache/cassandra/io/sstable/SSTableWriterListener.java
new file mode 100644
index 000..6628d20
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/sstable/SSTableWriterListener.java
@@ -0,0 +1,9 @@
+package org.apache.cassandra.io.sstable;
+
+import java.io.IOException;
+
+public interface SSTableWriterListener
+{
+    void onSSTableWrittenAndClosed(final String tableName, final String columnFamilyName, final String filename) throws IOException;
+}
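
For illustration, here is a minimal, self-contained sketch of the proposed listener pattern in action. The names mirror the patch, but the writer here is a stub rather than the real Cassandra class, and the listener just records the flushed filename where a real one might copy the file to HDFS:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative listener interface, mirroring the patch (without the
// checked exception, for brevity).
interface SSTableWriterListener {
    void onSSTableWrittenAndClosed(String tableName, String columnFamilyName, String filename);
}

// Stand-in for the writer: notifies listeners whenever a buffer is flushed.
class SketchWriter {
    private final List<SSTableWriterListener> listeners = new ArrayList<>();

    public void addSSTableWriterListener(SSTableWriterListener l) {
        listeners.add(l);
    }

    void flush(String filename) {
        // ... the sstable would be written to disk here ...
        for (SSTableWriterListener l : listeners) {
            l.onSSTableWrittenAndClosed("ks1", "cf1", filename);
        }
    }
}

public class ListenerDemo {
    public static void main(String[] args) {
        SketchWriter writer = new SketchWriter();
        List<String> flushed = new ArrayList<>();
        // A real listener could copy the file to HDFS; here we just record it.
        writer.addSSTableWriterListener((ks, cf, file) -> flushed.add(file));
        writer.flush("/tmp/ks1-cf1-hc-1-Data.db");
        System.out.println(flushed); // [/tmp/ks1-cf1-hc-1-Data.db]
    }
}
```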


Re: Bulkload into a different CF

2012-05-01 Thread Benoit Perroud
I would first try copying instead of moving, and then drop the old
CF or the unneeded snapshot once everything is OK.


2012/5/1 Oleg Proudnikov :
> Benoit Perroud  noisette.ch> writes:
>
>>
>> You can copy the sstables (renaming them accordingly) and
>> call nodetool refresh.
>>
>
> Thank you, Benoit.
>
> In that case could I try snapshot+move&rename+refresh on a live system?
>
> Regards,
> Oleg
>
>



-- 
sent from my Nokia 3210


Re: Bulkload into a different CF

2012-05-01 Thread Benoit Perroud
!! Without any guarantee: I know it works, but I have never used this in production !!

You can copy the sstables (renaming them accordingly) and call nodetool refresh.

Don't forget to create your column family CF2 first.
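
The rename step can be sketched as follows; the file naming scheme assumed here (`Keyspace-CF-hc-N-Component.db`, with the CF name as the second dash-separated component, as in 1.0.x) should be verified against your own data directory:

```java
import java.util.regex.Pattern;

// Sketch of renaming sstable components from CF1 to CF2. Only the CF
// component of the filename is rewritten, not accidental matches elsewhere.
public class RenameSketch {
    static String renameForCf(String filename, String oldCf, String newCf) {
        String oldPart = "-" + oldCf + "-";
        String newPart = "-" + newCf + "-";
        // Quote the pattern so CF names with regex metacharacters stay literal.
        return filename.replaceFirst(Pattern.quote(oldPart), newPart);
    }

    public static void main(String[] args) {
        System.out.println(renameForCf("Keyspace1-CF1-hc-1-Data.db", "CF1", "CF2"));
        // Keyspace1-CF2-hc-1-Data.db
    }
}
```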


2012/5/1 Oleg Proudnikov :
> Hello,
>
> Is it possible to create an exact replica of a CF by these steps?
>
> 1. Take a snapshot
> 2. Isolate sstables for CF1
> 3. Rename sstables into CF2
> 4. Bulk load renamed sstables into newly created CF2 within the same Keyspace
>
> Or would you suggest using sstable2json instead?
>
> Thank you very much,
> Oleg
>



-- 
sent from my Nokia 3210


Re: Building SSTables with SSTableSimpleUnsortedWriter

2012-04-29 Thread Benoit Perroud
A big buffer size will use more heap memory when creating the tables.
I'm not sure about the impact on the server side, but it shouldn't make
a big difference. I personally use 512 MB.
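
The trade-off is roughly arithmetic: the writer flushes one sstable per filled buffer, so the sstable count is about total data divided by buffer size, and a larger buffer means fewer sstables (less compaction after loading) at the cost of more heap while writing. A quick illustrative sketch (the numbers are made up):

```java
// Back-of-the-envelope estimate of how many sstables a bulk write produces
// for a given buffer size. This is an approximation: the buffer holds the
// serialized size, so actual counts will vary.
public class BufferMath {
    static long expectedSstables(long totalBytes, long bufferBytes) {
        return (totalBytes + bufferBytes - 1) / bufferBytes; // ceiling division
    }

    public static void main(String[] args) {
        long total = 50L * 1024 * 1024 * 1024;                           // 50 GB of data
        System.out.println(expectedSstables(total, 100L * 1024 * 1024)); // 100 MB buffer -> 512
        System.out.println(expectedSstables(total, 512L * 1024 * 1024)); // 512 MB buffer -> 100
    }
}
```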





2012/4/28 sj.climber :
> Can anyone comment on best practices for setting the buffer size used by
> SSTableSimpleUnsortedWriter?  I'm presently using 100MB, but for some of the
> larger column families I'm working with, this can result in hundreds of
> SSTables.  After streaming to Cassandra via sstableloader, there's a fair
> amount of compaction work to do.
>
> What are the benefits and consequences of going with a higher buffer size?
>
> Thanks!
>
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Building-SSTables-with-SSTableSimpleUnsortedWriter-tp7507756.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.



-- 
sent from my Nokia 3210


Re: unsubscribe

2012-04-27 Thread Benoit Perroud
http://wiki.apache.org/cassandra/FAQ#unsubscribe

On 27 April 2012 at 19:20, Ramkumar Vaidyanathan (PDF) wrote:
> unsubscribe
>
>
>
>
> The information in this email and any attachments to it may be confidential
> and/or privileged. Unless you are the intended recipient (or authorized to
> receive it on behalf of the intended recipient), you may not use, copy, or
> disclose to anyone the message or attachments, in whole or in part. If you
> believe that you have received the message in error, please delete it
> forever from your systems and trash, and advise the sender by reply email.
> © 2012 PDF Solutions Inc.  All rights reserved.



-- 
sent from my Nokia 3210


Re: unsubscribe

2012-04-07 Thread Benoit Perroud
http://wiki.apache.org/cassandra/FAQ#unsubscribe

On 7 April 2012 at 14:37, Jeffrey Fass wrote:
> unsubscribe
>
>



-- 
sent from my Nokia 3210


Bulk loading errors with 1.0.8

2012-04-05 Thread Benoit Perroud
Hi All,

I'm experiencing the following errors while bulk loading data into a cluster

ERROR [Thread-23] 2012-04-05 09:58:12,252 AbstractCassandraDaemon.java
(line 139) Fatal exception in thread Thread[Thread-23,5,main]
java.lang.RuntimeException: Insufficient disk space to flush
7813594056494754913 bytes
at 
org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:635)
at 
org.apache.cassandra.streaming.StreamIn.getContextMapping(StreamIn.java:92)
at 
org.apache.cassandra.streaming.IncomingStreamReader.(IncomingStreamReader.java:68)
at 
org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

Here I'm not really sure I was able to generate 7 exabytes of data ;)


ERROR [Thread-46] 2012-04-05 09:58:14,453 AbstractCassandraDaemon.java
(line 139) Fatal exception in thread Thread[Thread-46,5,main]
java.lang.NullPointerException
at 
org.apache.cassandra.io.sstable.SSTable.getMinimalKey(SSTable.java:156)
at 
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:334)
at 
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302)
at 
org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:155)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:89)
at 
org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

This one sounds like a null key was added to the SSTable at some point,
but I'm rather confident I'm checking keys for null.

Errors are seen on different nodes; some nodes succeeded. All 5
cluster nodes run 1.0.8 with JNA enabled.

Basically, I'm generating SSTables in a Hadoop reducer and storing them
in HDFS. After the job finishes, I download them back onto a single node
and stream the files into Cassandra (yes, I definitely need to try
1.1).

Does someone have a hint or a pointer on where to start looking?

Thanks,

Benoit.


Re: [BETA RELEASE] Apache Cassandra 1.1.0-beta2 released

2012-03-27 Thread Benoit Perroud
Thanks for the quick feedback.

I will drop the schema then.

Benoit.


On 27 March 2012 at 14:50, Sylvain Lebresne wrote:
> Actually, there was a few changes to the on-disk format of schema
> between beta1 and beta2 so upgrade is not supported between those two
> beta versions.
> Sorry for any inconvenience.
>
> --
> Sylvain
>
> On Tue, Mar 27, 2012 at 12:57 PM, Benoit Perroud  wrote:
>> Hi All,
>>
>> Thanks a lot for the release.
>> I just upgraded my 1.1-beta1 to 1.1-beta2, and I get the following error :
>>
>>  INFO 10:56:17,089 Opening
>> /app/cassandra/data/data/system/LocationInfo/system-LocationInfo-hc-18
>> (74 bytes)
>>  INFO 10:56:17,092 Opening
>> /app/cassandra/data/data/system/LocationInfo/system-LocationInfo-hc-17
>> (486 bytes)
>> ERROR 10:56:17,306 Exception encountered during startup
>> java.lang.NullPointerException
>>        at 
>> org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:163)
>>        at 
>> org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:120)
>>        at org.apache.cassandra.cql.jdbc.JdbcUTF8.getString(JdbcUTF8.java:77)
>>        at org.apache.cassandra.cql.jdbc.JdbcUTF8.compose(JdbcUTF8.java:97)
>>        at org.apache.cassandra.db.marshal.UTF8Type.compose(UTF8Type.java:35)
>>        at 
>> org.apache.cassandra.cql3.UntypedResultSet$Row.getString(UntypedResultSet.java:87)
>>        at 
>> org.apache.cassandra.config.CFMetaData.fromSchemaNoColumns(CFMetaData.java:1008)
>>        at 
>> org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1053)
>>        at 
>> org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:261)
>>        at 
>> org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:242)
>>        at org.apache.cassandra.db.DefsTable.loadFromTable(DefsTable.java:158)
>>        at 
>> org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:514)
>>        at 
>> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:182)
>>        at 
>> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
>>        at 
>> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)
>>
>>
>> Thanks for your support,
>>
>> Benoit.
>>
>>
>> On 27 March 2012 at 11:55, Sylvain Lebresne wrote:
>>> The Cassandra team is pleased to announce the release of the second beta for
>>> the future Apache Cassandra 1.1.
>>>
>>> Note that this is beta software and as such is *not* ready for production 
>>> use.
>>>
>>> The goal of this release is to give a preview of what will become Cassandra
>>> 1.1 and to get wider testing before the final release. All help in testing
>>> this release would be therefore greatly appreciated and please report any
>>> problem you may encounter[3,4]. Have a look at the change log[1] and the
>>> release notes[2] to see where Cassandra 1.1 differs from the previous 
>>> series.
>>>
>>> Apache Cassandra 1.1.0-beta2[5] is available as usual from the cassandra
>>> website (http://cassandra.apache.org/download/) and a debian package is
>>> available using the 11x branch (see
>>> http://wiki.apache.org/cassandra/DebianPackaging).
>>>
>>> Thank you for your help in testing and have fun with it.
>>>
>>> [1]: http://goo.gl/nX7UL (CHANGES.txt)
>>> [2]: http://goo.gl/TB9ro (NEWS.txt)
>>> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>>> [4]: user@cassandra.apache.org
>>> [5]: 
>>> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.1.0-beta2
>>
>>
>>
>> --
>> sent from my Nokia 3210



-- 
sent from my Nokia 3210


Re: [BETA RELEASE] Apache Cassandra 1.1.0-beta2 released

2012-03-27 Thread Benoit Perroud
Hi All,

Thanks a lot for the release.
I just upgraded my 1.1-beta1 to 1.1-beta2, and I get the following error :

 INFO 10:56:17,089 Opening
/app/cassandra/data/data/system/LocationInfo/system-LocationInfo-hc-18
(74 bytes)
 INFO 10:56:17,092 Opening
/app/cassandra/data/data/system/LocationInfo/system-LocationInfo-hc-17
(486 bytes)
ERROR 10:56:17,306 Exception encountered during startup
java.lang.NullPointerException
at 
org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:163)
at 
org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:120)
at org.apache.cassandra.cql.jdbc.JdbcUTF8.getString(JdbcUTF8.java:77)
at org.apache.cassandra.cql.jdbc.JdbcUTF8.compose(JdbcUTF8.java:97)
at org.apache.cassandra.db.marshal.UTF8Type.compose(UTF8Type.java:35)
at 
org.apache.cassandra.cql3.UntypedResultSet$Row.getString(UntypedResultSet.java:87)
at 
org.apache.cassandra.config.CFMetaData.fromSchemaNoColumns(CFMetaData.java:1008)
at 
org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1053)
at 
org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:261)
at 
org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:242)
at org.apache.cassandra.db.DefsTable.loadFromTable(DefsTable.java:158)
at 
org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:514)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:182)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:353)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:106)


Thanks for your support,

Benoit.


On 27 March 2012 at 11:55, Sylvain Lebresne wrote:
> The Cassandra team is pleased to announce the release of the second beta for
> the future Apache Cassandra 1.1.
>
> Note that this is beta software and as such is *not* ready for production use.
>
> The goal of this release is to give a preview of what will become Cassandra
> 1.1 and to get wider testing before the final release. All help in testing
> this release would be therefore greatly appreciated and please report any
> problem you may encounter[3,4]. Have a look at the change log[1] and the
> release notes[2] to see where Cassandra 1.1 differs from the previous series.
>
> Apache Cassandra 1.1.0-beta2[5] is available as usual from the cassandra
> website (http://cassandra.apache.org/download/) and a debian package is
> available using the 11x branch (see
> http://wiki.apache.org/cassandra/DebianPackaging).
>
> Thank you for your help in testing and have fun with it.
>
> [1]: http://goo.gl/nX7UL (CHANGES.txt)
> [2]: http://goo.gl/TB9ro (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
> [4]: user@cassandra.apache.org
> [5]: 
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.1.0-beta2



-- 
sent from my Nokia 3210


Re: Cassandra - crash with “free() invalid pointer”

2012-03-22 Thread Benoit Perroud
Sounds like a race condition in the off heap caching while calling
Unsafe.free().

Do you use cache ? What is your use case when you encounter this error
? Are you able to reproduce it ?


2012/3/22 Maciej Miklas :
> Hi *,
>
> My Cassandra installation runs on flowing system:
>
> Linux with Kernel 2.6.32.22
> jna-3.3.0
> Java 1.7.0-b147
>
> Sometimes we are getting following error:
>
> *** glibc detected *** /var/opt/java1.7/bin/java: free(): invalid pointer:
> 0x7f66088a6000 ***
> === Backtrace: =
> /lib/libc.so.6[0x7f661d7099a8]
> /lib/libc.so.6(cfree+0x76)[0x7f661d70bab6]
> /lib64/ld-linux-x86-64.so.2(_dl_deallocate_tls+0x59)[0x7f661e02f349]
> /lib/libpthread.so.0[0x7f661de09237]
> /lib/libpthread.so.0[0x7f661de0931a]
> /lib/libpthread.so.0[0x7f661de0a0bd]
> /lib/libc.so.6(clone+0x6d)[0x7f661d76564d]
> === Memory map: 
> 0040-00401000 r-xp  68:07 537448203
> /var/opt/jdk1.7.0/bin/java
> 0060-00601000 rw-p  68:07 537448203
> /var/opt/jdk1.7.0/bin/java
> 01bae000-01fd rw-p  00:00 0
> [heap]
> 01fd-15798000 rw-p  00:00 0
> [heap]
> 40002000-40005000 ---p  00:00 0
> 40005000-40023000 rw-p  00:00 0
> 4003-40033000 ---p  00:00 0
> 40033000-40051000 rw-p  00:00 0
>
> Does anyone have similar problems? or maybe some hints?
>
> Thanks,
> Maciej



-- 
sent from my Nokia 3210


Re: Link in Wiki broken

2012-03-18 Thread Benoit Perroud
http://blip.tv/datastax/getting-to-know-the-cassandra-codebase-4034648


2012/3/18 Tharindu Mathew :
> Hi,
>
> It seems that [1] is broken. Wonder if it exists somewhere else?
>
> [1] -
> http://www.channels.com/episodes/show/11765800/Getting-to-know-the-Cassandra-Codebase
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>



-- 
sent from my Nokia 3210


Re: design that mimics twitter tweet search

2012-03-18 Thread Benoit Perroud
The simplest modeling you could use is the keyword as the key, a
timestamp/TimeUUID as the column name, and the tweet id as the value:

-> cf['keyword']['timestamp'] = tweetid

Then you do a range query to get all tweet ids sorted by time (you may
want them in reverse order), and you can limit to the number of tweets
displayed on the page.

As some rows can become large, you could use key partitioning, for
instance by concatenating the keyword with the month and year.
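
A rough in-memory model of this layout, assuming epoch-millisecond timestamps and month-based row partitioning; this sketches the data model only, not client code for a real cluster:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// One row per (keyword, month); columns sorted by timestamp; tweet id as
// the value. A reversed slice of the row gives "most recent first".
public class TweetIndexModel {
    // rowKey -> (timestamp column -> tweetId)
    private final Map<String, NavigableMap<Long, String>> rows = new HashMap<>();

    static String rowKey(String keyword, long ts) {
        // Partition large rows by concatenating keyword and year-month.
        java.time.YearMonth ym = java.time.YearMonth.from(
                java.time.Instant.ofEpochMilli(ts).atZone(java.time.ZoneOffset.UTC));
        return keyword + ":" + ym;
    }

    void index(String keyword, long ts, String tweetId) {
        rows.computeIfAbsent(rowKey(keyword, ts), k -> new TreeMap<>()).put(ts, tweetId);
    }

    // Emulates a reversed range slice limited to `limit` columns.
    List<String> latest(String keyword, long ts, int limit) {
        NavigableMap<Long, String> row = rows.getOrDefault(rowKey(keyword, ts), new TreeMap<>());
        List<String> out = new ArrayList<>();
        for (String id : row.descendingMap().values()) {
            if (out.size() == limit) break;
            out.add(id);
        }
        return out;
    }

    public static void main(String[] args) {
        TweetIndexModel m = new TweetIndexModel();
        m.index("cassandra", 1000L, "t1");
        m.index("cassandra", 2000L, "t2");
        m.index("cassandra", 3000L, "t3");
        System.out.println(m.latest("cassandra", 1000L, 2)); // [t3, t2]
    }
}
```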


2012/3/18 Sasha Dolgy :
> Hi All,
>
> With twitter, when I search for words like:  "cassandra is the bestest", 4
> tweets will appear, including one i just did.  My understand that the
> internals of twitter work in that each word in a tweet is allocated,
> irrespective of the presence of a  # hash tag, and the tweet id is assigned
> to a row for that word.  What is puzzling to me, and hopeful that some smart
> people on here can shed some light on -- is how would this work with
> Cassandra?
>
> row [ cassandra ]: key -> tweetid  / timestamp
> row [ bestest ]: key -> tweetid / timestamp
>
> I had thought that I could simply pull a list of all column names from each
> row (representing each word) and flag all occurrences (tweet id's) that
> exist in each row ... however, these rows would get quite long over time.
>
> Am I missing an easier way to get a list of all "tweetid's" that exist in
> multiple rows?
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com



-- 
sent from my Nokia 3210


Re: Cassandra 1.1 row isolation cross datacenter replication

2012-02-21 Thread Benoit Perroud
Isolation is guaranteed locally at the node level.
If two clients are reading from and writing to the same node, the one
reading will not see partial mutations.

2012/2/21 Allen Servedio :
> Hi,
>
> I saw that row level isolation was added in the beta of Cassandra 1.1 and I
> have the following question: given a ring that has two datacenters and a
> keyspace defined to replicate to both, if a row is written with local quorum
> to the first data center, will it still be isolated when it replicates to
> the other datacenter? Or is isolation for that row broken in the other
> datacenter (so that isolation is essentially back to the column level for
> the other datacenter)?
>
> Hopefully what I am asking makes sense... If not, I will give a more
> detailed example.
>
> Thanks,
> Allen



-- 
sent from my Nokia 3210


Re: Counters and Top 10

2011-12-25 Thread Benoit Perroud
With composite column names, you can even have columns composed of the
score (int) and the user id (UUID or whatever), with an empty column
value to avoid repeating the user UUID.
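
A sketch of how such composite columns would sort, using a `TreeSet` to stand in for the sorted columns of one row; the comparator (score descending, then user id) is an assumption about how the reversed composite would be configured:

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Columns named (score, userId) sort primarily by score, so taking the
// first N columns of the reverse-sorted row is the top N.
public class TopTenModel {
    record Col(int score, String userId) {}

    static List<String> topUsers(Collection<Col> cols, int n) {
        // Highest score first, userId as tie-breaker.
        TreeSet<Col> row = new TreeSet<>(
                Comparator.comparingInt(Col::score).reversed().thenComparing(Col::userId));
        row.addAll(cols);
        return row.stream().limit(n).map(Col::userId).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Col> cols = List.of(
                new Col(50, "alice"), new Col(120, "bob"),
                new Col(120, "carol"), new Col(7, "dave"));
        System.out.println(topUsers(cols, 2)); // [bob, carol]
    }
}
```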


2011/12/22 R. Verlangen :
> I would suggest you to create a CF with a single row (or multiple for
> historical data) with a date as key (utf8, e.g. 2011-12-22) and multiple
> columns for every user's score. The column (utf8) would then be the score +
> something unique of the user (e.g. hex representation of the TimeUUID). The
> value would be the TimeUUID of the user.
>
> By default columns will be sorted and you can perform a slice to get the top
> 10.
>
> 2011/12/14 cbert...@libero.it 
>
>> Hi all,
>> I'm using Cassandra in production for a small social network (~10.000
>> people).
>> Now I have to assign some "credits" to each user operation (login, write
>> post
>> and so on) and then beeing capable of providing in each moment the top 10
>> of
>> the most active users. I'm on Cassandra 0.7.6 I'd like to migrate to a new
>> version in order to use Counters for the user points but ... what about
>> the top
>> 10?
>> I was thinking about a specific ROW that always keeps the 10 most active
>> users
>> ... but I think it would be heavy (to write and to handle in thread-safe
>> mode)
>> ... can counters provide something like a "value ordered list"?
>>
>> Thanks for any help.
>> Best regards,
>>
>> Carlo
>>
>>
>



-- 
sent from my Nokia 3210


Re: need help with choosing correct tokens for ByteOrderedPartitioner

2011-11-28 Thread Benoit Perroud
You may want to add 29991231 instead of appending.
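
A small sketch of why appending skews the ring, assuming the tokens are compared numerically as the decimal strings suggest: string concatenation multiplies each token by 10^8 before adding the suffix, stretching the gaps between tokens unevenly relative to the token range, while plain addition shifts every token by the same constant and keeps the spacing intact:

```java
import java.math.BigInteger;

// Compare appending "29991231" to a token string vs adding it numerically.
public class TokenSketch {
    public static void main(String[] args) {
        BigInteger token = new BigInteger("42535295865117307932921825928971026432");
        BigInteger suffix = new BigInteger("29991231");

        // Concatenation is token * 10^8 + suffix, a huge multiplicative jump:
        BigInteger appended = new BigInteger(token.toString() + suffix);
        System.out.println(appended.equals(
                token.multiply(BigInteger.TEN.pow(8)).add(suffix))); // true

        // Addition just shifts the token by a constant:
        BigInteger added = token.add(suffix);
        System.out.println(added.subtract(token).equals(suffix)); // true
    }
}
```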

On Monday, 28 November 2011, Piavlo wrote:
> Anyone can help with this?
>
> Thanks
>
> On 11/24/2011 11:55 AM, Piavlo wrote:
>>
>>  Hi,
>>
>> We need help with choosing  correct tokens for ByteOrderedPartitioner
>> Originally the key was supposed to be member_id-mmdd,
>> but since we need to make range scans on the same member_id and varying date
ranges mmdd
>> we decided to use ByteOrderedPartitioner, so we need that same member
will be assigned to same token range.
>> So we decided that the keys will be md5(member_id)mmdd
>> Since md5 on member_id should give even distribution or member_id across
tokens.
>>
>> We have 4 nodes, and don't understand how to choose the tokens.
>> We tried the following tokens
>>
>> # ./tokengentool 4
>> token 0: 0
>> token 1: 42535295865117307932921825928971026432
>> token 2: 85070591730234615865843651857942052864
>> token 3: 127605887595351923798765477786913079296
>>
>> and appended 29991231
>>
>> so we ended up with the following tokens
>>
>> token 0: 0
>> token 1: 4253529586511730793292182592897102643229991231
>> token 2: 8507059173023461586584365185794205286429991231
>> token 3: 12760588759535192379876547778691307929629991231
>>
>> But the key end up not evenly distributed.
>>
>> So any help is appreciated.
>>
>> Thanks
>> Alex
>
>

-- 
sent from my Nokia 3210


Re: Off-heap caching through ByteBuffer.allocateDirect when JNA not available ?

2011-11-10 Thread Benoit Perroud
Thanks for the answer.
I saw the move to sun.misc.
In what sense allocateDirect is broken ?

Thanks,

Benoit.


2011/11/9 Jonathan Ellis :
> allocateDirect is broken for this purpose, but we removed the JNA
> dependency using sun.misc.Unsafe instead:
> https://issues.apache.org/jira/browse/CASSANDRA-3271
>
> On Wed, Nov 9, 2011 at 5:54 AM, Benoit Perroud  wrote:
>> Hi,
>>
>> I wonder if you have already discussed about ByteBuffer.allocateDirect
>> alternative to JNA memory allocation ?
>>
>> If so, do someone mind send me a pointer ?
>>
>> Thanks !
>>
>> Benoit.
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
sent from my Nokia 3210


Off-heap caching through ByteBuffer.allocateDirect when JNA not available ?

2011-11-09 Thread Benoit Perroud
Hi,

I wonder if you have already discussed ByteBuffer.allocateDirect as an
alternative to JNA memory allocation?

If so, would someone mind sending me a pointer?

Thanks !

Benoit.


Re: Multiple Keyword Lookup Indexes

2011-11-07 Thread Benoit Perroud
You could use secondary indexes on the other fields directly instead
of maintaining the indexes yourself:

Define your global id (it can be a UUID), and have columns loginName,
email, etc., each with a secondary index. Retrieval will then be fast.
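
For comparison, the lookup-table alternative described in the question can be modeled like this; the maps stand in for the main CF and the two index CFs, and all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// One main "table" keyed by UUID with the full user data, plus one small
// index "table" per alternative identifier mapping back to the UUID.
// Lookups via an alternative id therefore cost two reads.
public class UserLookupModel {
    private final Map<UUID, Map<String, String>> users = new HashMap<>();
    private final Map<String, UUID> byLogin = new HashMap<>();
    private final Map<String, UUID> byEmail = new HashMap<>();

    UUID create(String login, String email) {
        UUID id = UUID.randomUUID();
        Map<String, String> row = new HashMap<>();
        row.put("loginName", login);
        row.put("email", email);
        users.put(id, row);
        byLogin.put(login, id); // index row: login -> UUID
        byEmail.put(email, id); // index row: email -> UUID
        return id;
    }

    // First read the index row, then the main row.
    Map<String, String> findByLogin(String login) {
        return users.get(byLogin.get(login));
    }

    public static void main(String[] args) {
        UserLookupModel db = new UserLookupModel();
        db.create("felix", "felix@example.org");
        System.out.println(db.findByLogin("felix").get("email")); // felix@example.org
    }
}
```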

2011/11/7 Felix Sprick :
> Hallo,
>
> We are implementing a Cassandra-backed user database. The challange in
> this is that there are 4 different sort of user IDs that all need to
> be indexed in order to access user data via them quickly. For example
> the user has a unique UUID, but also a LoginName and an email address,
> which can all be used for authentication.
>
> How do I model this in Cassandra?
>
> My approach would be to have one main "table" which is indexed by the
> most frequently used lookup value as row-key, lets say this is the
> UUID. This table would contain all customer data. Then I would create
> a index "table" for each of the other login alternatives, where I just
> reference to the UUID. So each alternative login which is not using
> the UUID would require two Cassandra queries. Are there any better
> approaches to model this?
>
> Also, I read somewhere that Cassandra is not optimized for these
> "reference tables" which are very short with two columns only. What is
> the reason for that?
>
> thanks,
> Felix
>



-- 
sent from my Nokia 3210


Re: Bulk uploader issue on multi-node cluster

2011-09-23 Thread Benoit Perroud
In the sstableloader config, make sure you have the seed set and
rpc_address and rpc_port pointing to your Cassandra instance (127.0.0.2).



2011/9/23 Thamizh 

> Hi All,
>
> I am using bulk-loading to upload data(from lab02) to multi-node cluster of
> 3 machines(lab02,lab03 & lab04) with sigle ethernet card. I have created
> SSTable instance on lab02 by duplicating look back address( sudo ifconfig
> lo:2 127.0.0.2 netmask 255.0.0.0 up; ) "127.0.0.2" as rpc and storage
> address. Here 'sstableloader' ended up with below error message,
>
> Starting client (and waiting 30 seconds for gossip) ...
> java.lang.IllegalStateException: Cannot load any sstable, no live member
> found in the cluster
>
> Here, in my case, Does lab02 machine should have 2 ethernet card(one for
> cassandra original instance and another for 'sstableloader') ?
>
> Regards,
> Thamizhannal
>


Re: Possibility of going OOM using get_count

2011-09-19 Thread Benoit Perroud
The workaround for 0.7 is to call get_slice and count on the client side.
It's heavier, sure, but you can then set the start column
accordingly.
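For illustration, the paging logic of that workaround can be sketched like this, with an in-memory NavigableMap standing in for the row and a local getSlice method standing in for the Thrift call (all names below are made up, not part of any client API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

public class PagedCount {
    // Stand-in for get_slice: returns up to `count` column names,
    // starting at `start` (inclusive), in column order.
    static List<String> getSlice(NavigableMap<String, byte[]> row, String start, int count) {
        List<String> names = new ArrayList<>();
        for (String name : row.tailMap(start, true).keySet()) {
            if (names.size() == count) break;
            names.add(name);
        }
        return names;
    }

    // Count all columns by paging: fetch a slice, advance the start column
    // past the last name seen, and stop when a page comes back short.
    static long countColumns(NavigableMap<String, byte[]> row, int pageSize) {
        long total = 0;
        String start = "";
        while (true) {
            List<String> page = getSlice(row, start, pageSize);
            if (page.isEmpty()) break;
            total += page.size();
            if (page.size() < pageSize) break;
            // "\0" makes the next inclusive slice begin just after the last name.
            start = page.get(page.size() - 1) + "\0";
        }
        return total;
    }
}
```

Only one page of columns is held in memory at a time, which is the point of the workaround.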



2011/9/19 Tharindu Mathew :
> Thanks Aaron and Jake for the replies.
> Any chance of a possible workaround to use for Cassandra 0.7?
>
> On Mon, Sep 19, 2011 at 3:48 AM, aaron morton 
> wrote:
>>
>> Cool
>> Thanks, A
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 19/09/2011, at 9:55 AM, Jake Luciani wrote:
>>
>> This is fixed in 1.0
>> https://issues.apache.org/jira/browse/CASSANDRA-2894
>>
>> On Sun, Sep 18, 2011 at 2:16 PM, Tharindu Mathew 
>> wrote:
>>>
>>> Hi everyone,
>>> I noticed this line in the API docs,
>>>
>>> The method is not O(1). It takes all the columns from disk to calculate
>>> the answer. The only benefit of the method is that you do not need to pull
>>> all the columns over Thrift interface to count them.
>>>
>>> Does this mean if a row has a large number of columns calling this method
>>> might make it go OOM?
>>> Thanks in advance.
>>> --
>>> Regards,
>>>
>>> Tharindu
>>> blog: http://mackiemathew.com/
>>
>>
>>
>> --
>> http://twitter.com/tjake
>>
>
>
>
> --
> Regards,
>
> Tharindu
> blog: http://mackiemathew.com/
>


Re: import data into cassandra

2011-09-18 Thread Benoit Perroud
There is no direct way to do that, but reading a CSV and inserting
rows in Java is really easy.
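As a rough sketch of the reading side (the parser below handles only simple, unquoted fields, and the names are made up), each parsed row could then become one insert via Thrift or a higher-level client:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvRows {
    // Split simple comma-separated lines (no quoting or escaping) into rows;
    // blank lines are skipped. Each String[] could then be turned into one
    // insert, e.g. first field as row key, the rest as column values.
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) continue;
            rows.add(line.split(",", -1));
        }
        return rows;
    }
}
```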

But you may want to have a look at the new bulk loading tool,
sstableloader, described here :
http://www.datastax.com/dev/blog/bulk-loading

One small detail: it seems you are still writing to the incubator ML.
Please use user@cassandra.apache.org instead.



2011/9/16 nehalmehta :
> Hi,
>
> Is there a tool which imports data from large CSV files into Cassandra using
> Thrift API (If using JAVA, it would be great).
>
> Thanks,
> Nehal Mehta
>
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/import-data-into-cassandra-tp4627325p6801723.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
>


Re: SSTableSimpleUnsortedWriter take long time when inserting big rows

2011-09-02 Thread Benoit Perroud
Thanks for your answer.

2011/9/2 Sylvain Lebresne :
> On Fri, Sep 2, 2011 at 10:29 AM, Benoit Perroud  wrote:
>> Hi All,
>>
>> I started using SSTableSimpleUnsortedWriter to load data, and my data
>> has a few rows but a lot of column name in each rows.
>>
>> I call SSTableSimpleUnsortedWriter.newRow every 10'000 columns inserted.
>>
>> But the time taken to insert columns is increasing as the column
>> family is increasing. The problem appears because everytime we call
>> newRow, all the columns of the previous CF is added to the new CF.
>
> If I understand correctly, each row has way more that 10 000 columns, but
> you call newRow every 10 000 columns, right ?

Yes. I call newRow every 10 000 columns to be sure to flush as soon as possible.

> Note that you have the possibility to decrease the frequency of the calls to
> newRow.
>
> But anyway, I agree that the code shouldn't suck like that.
>
>> Attached is a small patch that check which is the smallest CF, and add
>> the smallest CF to the biggest one.
>>
>> Should I open I bug for that ?
>
> Please do. I'm actually thinking of a slightly different fix: we should not 
> have
> to add all the previous columns to the new column family, we should just
> directly reuse the previous column family when adding the new column.
> But the JIRA ticket will be a better place to discuss this.

Opened : https://issues.apache.org/jira/browse/CASSANDRA-3122
Let's discuss there.

Thanks !

Benoit.

> --
> Sylvain
>


SSTableSimpleUnsortedWriter take long time when inserting big rows

2011-09-02 Thread Benoit Perroud
Hi All,

I started using SSTableSimpleUnsortedWriter to load data; my data
has few rows but a lot of column names in each row.

I call SSTableSimpleUnsortedWriter.newRow every 10'000 columns inserted.

But the time taken to insert columns increases as the column
family grows. The problem appears because every time we call
newRow, all the columns of the previous CF are added to the new CF.

Attached is a small patch that checks which CF is smaller and adds
the smaller CF to the bigger one.

Should I open a bug for that?

Thanks in advance,

Benoit
Index: src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java
===
--- src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java	(revision 1164377)
+++ src/java/org/apache/cassandra/io/sstable/SSTableSimpleUnsortedWriter.java	(working copy)
@@ -73,9 +73,17 @@
 
         // Note that if the row was existing already, our size estimation will be slightly off
         // since we'll be counting the key multiple times.
-        if (previous != null)
-            columnFamily.addAll(previous);
-
+        if (previous != null) {
+            // Add the smallest CF to the other one
+            if (columnFamily.getSortedColumns().size() < previous.getSortedColumns().size()) {
+                previous.addAll(columnFamily);
+                // Re-add the previous CF to the map because it has been overwritten
+                keys.put(key, previous);
+            } else {
+                columnFamily.addAll(previous);
+            }
+        }
+
         if (currentSize > bufferSize)
             sync();
     }


Re: The way to query a CF with "start > 10 and end < 100"

2011-08-29 Thread Benoit Perroud
Queries like start > 10 and end < 100 are not straightforward to model:
you should use the value of start as the column name, and check the
second condition on the client side.

Just for comparison, modeling 10 < value < 100 is much easier
if you use your values as column names, or a CompositeType if you
have duplicate values.
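A minimal sketch of that modeling, using an in-memory sorted map to stand in for the row's columns (names here are made up): the column name is the interval's start value, the column value holds the end, the slice on start comes free from column ordering, and end < 100 is filtered client-side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

public class RangeFilter {
    // Columns are named by the interval's start value; the column value
    // holds end. A slice on start > startGt is served by column-name order;
    // end < endLt is checked on the client.
    static List<int[]> query(NavigableMap<Integer, Integer> row, int startGt, int endLt) {
        List<int[]> result = new ArrayList<>();
        // tailMap(startGt, false) = columns with name strictly greater than startGt
        for (Map.Entry<Integer, Integer> e : row.tailMap(startGt, false).entrySet()) {
            if (e.getValue() < endLt) {
                result.add(new int[]{e.getKey(), e.getValue()});
            }
        }
        return result;
    }
}
```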





2011/8/29 Guofeng Zhang :
> Hi,
>
>
>
> I have a standard CF that has column “start” and “end”. I need to query its
> rows using condition “start>10 and end<100”. Is there any better way to do
> it? Using native secondary index or creating a specific CF for the search. I
> do not know which one is better. If the late is preferred to, how the CF
> looks like? Your advice is appreciated.
>
>
>
> Thanks
>
>


Re: CompositeType

2011-08-15 Thread Benoit Perroud
You should have a look at https://github.com/edanuff/CassandraIndexedCollections

This is a rather good starting point for Composites.

2011/8/15 Stephen Pope :
>  Hey, is there any documentation or examples of how to use the CompositeType? 
> I can't find anything about it on the wiki or the datastax docs.
>
>  Cheers,
>  Steve
>


Re: Need help in CF design

2011-08-11 Thread Benoit Perroud
You can implement this query quite simply using Cassandra and secondary 
indexes.


You will have a CF "TABLE", where row keys are your PK. Just to be sure 
of my understanding: your SQL query will either return one row or no row, 
right?


3) A SliceQuery returns a range of columns for a given key; it may be your 
friend.






On 11. 08. 11 07:50, a...@9y.com wrote:

I recently started with Cassandra and found it interesting.

I was curious: in SQL we have

SELECT * from TABLE where PK="primary_key" and other_attribute
between 500 and 1000;

My questions are:
1) Is it possible to get equivalent results for the above
query (using CQL or Hector) with Cassandra?

2) How can we design the CF in that case? Using a secondary index is an
option but I am not clear how that can be applied here.

3) Is there any way we can have a range slice over column names
instead of over row keys?

I am just a novice, so can anyone help me with these questions?


Re: Setup Cassandra0.8 in Eclipse

2011-08-07 Thread Benoit Perroud

Make sure svn is on the PATH.

If you open a terminal (or cmd), running svn command should work.


On 07. 08. 11 23:39, Alvin UW wrote:

It seems svn wasn't installed, but i did install it.


Re: Fewer wide rows vs. more smaller rows

2011-08-07 Thread Benoit Perroud

Great ! Thanks for the link.



On 07. 08. 11 10:10, aaron morton wrote:
Wider rows may need to run through the slower 2-phase compaction 
process, see in_memory_compaction_limit_in_mb in the yaml file. They 
can also result in more GC, depending on work load etc.


Some testing I did on query performance 
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/


There is no magic number. The best advice is to follow Jonathan's advice.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 Aug 2011, at 08:22, Benoit Perroud wrote:


Thanks for your advice. Makes sense.

And without sticking to my dummy example, conceptually, what has a 
smaller memory footprint : 1M rows of 1 column or 1 row with 1M columns ?


And if the row key and column name are known, is there any 
performance difference between both scenarios ?


Thanks

Benoit.


On 04. 08. 11 18:24, Jonathan Ellis wrote:

"keep data you retrieve at the same time, in the same row."




Re: How to solve this kind of schema disagreement...

2011-08-05 Thread Benoit Perroud
Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
75eece10-bf48-11e0--4d205df954a7 owns the majority, so shut down
both 192.168.1.28 and 192.168.1.27 and remove their schema* and
migration* sstables.


2011/8/5 Dikang Gu :
> [default@unknown] describe cluster;
> Cluster Information:
>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>    Schema versions:
> 743fe590-bf48-11e0--4d205df954a7: [192.168.1.28]
> 75eece10-bf48-11e0--4d205df954a7: [192.168.1.9, 192.168.1.25]
> 06da9aa0-bda8-11e0--9510c23fceff: [192.168.1.27]
>
>  three different schema versions in the cluster...
> --
> Dikang Gu
> 0086 - 18611140205
>


Re: Fewer wide rows vs. more smaller rows

2011-08-04 Thread Benoit Perroud

Thanks for your advice. Makes sense.

And without sticking to my dummy example, conceptually, which has a 
smaller memory footprint: 1M rows of 1 column, or 1 row with 1M columns?


And if the row key and column name are known, is there any performance 
difference between both scenarios ?


Thanks

Benoit.


On 04. 08. 11 18:24, Jonathan Ellis wrote:

"keep data you retrieve at the same time, in the same row."


Re: HOW TO select a column or all columns that start with X

2011-08-04 Thread Benoit Perroud
https://github.com/edanuff/CassandraIndexedCollections

2011/8/4 CASSANDRA learner :
> Can you please gimme an example on this using hector client
>
> On Thu, Aug 4, 2011 at 7:18 AM, Boris Yen  wrote:
>>
>> It seems to me that your column name consists of two components. If you
>> have the luxury to upgrade your cassandra to 0.8.1+, I think you can think
>> about using the composite type/column. Conceptually, it might suit your use
>> case better.
>>
>> On Wed, Aug 3, 2011 at 5:28 AM, Eldad Yamin  wrote:
>>>
>>> Hello,
>>> I wonder if I can select a column or all columns that start with X.
>>> E.g I have columns ABC_1, ABC_2, ZZZ_1 and I want to select all columns
>>> that start with ABC_ - is that possible?
>>>
>>>
>>> Thanks!
>
>


Re: Sample Cassandra project in Tomcat

2011-08-04 Thread Benoit Perroud
Or directly what you are looking at (tomcat + cassandra using hector
client) : https://github.com/riptano/twissjava

2011/8/3 Benoit Perroud :
> 2011/8/3 CASSANDRA learner :
>> Hi,
>>  can you please send me the mailing list address of tomcat
>
> http://tomcat.apache.org/lists.html
>
>> On Wed, Aug 3, 2011 at 4:07 PM, Benoit Perroud  wrote:
>>>
>>> I suppose what you are looking for is an example of interacting with a
>>> java app.
>>>
>>> You should have a look at the high(er) level client hector
>>> https://github.com/rantav/hector/
>>> You should find what you are looking for there.
>>> If you are looking for a tomcat (and .war) example, you should send an
>>> email to the tomcat mailing list.
>>>
>>>
>>> 2011/8/3 CASSANDRA learner :
>>> > Hiii,
>>> >
>>> > Can any one pleaze send me any sample application which is (.war)
>>> > implemented in java/jsp and cassandra db (Tomcat)
>>> >
>>
>>
>


Fewer wide rows vs. more smaller rows

2011-08-04 Thread Benoit Perroud
Hi All,

From a conceptual point of view, I'm wondering about the pros & cons,
mainly in terms of access efficiency, of two approaches:

- Grouping row keys together to reduce the number of keys, but having
wider rows (with more columns)
- One object per row

Let's illustrate with an example :

I want to store objects like : User { email, firstname, lastname, ...
}, with a key hash(email).

Given the two following users :

User1 { "email1", "firstname1", "lastname1", ... }, hash("email1") = "abcdefgh"
User2 { "email2", "firstname2", "lastname2", ... }, hash("email2") = "abcdabcd"

I can either store

UserCF["abcdefgh"] = { email = "email1", firstname = "firstname1",
lastname = "lastname1" }
UserCF["abcdabcd"] = { email = "email2", firstname = "firstname2",
lastname = "lastname2" }

or do something like (for example using composites) :

UserCF["abcd"] = { ("abcd", email) = "email2", ("abcd", firstname) =
"firstname2", ("abcd", lastname) = "lastname2", ("efgh", email) =
"email1", ("efgh", firstname) = "firstname1", ("efgh", lastname) =
"lastname1" }
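The grouping in the second variant can be sketched like this, assuming hash values are hex strings and using a ":"-joined string as a stand-in for a real composite column name (all names below are made up):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class WideRowGrouping {
    // The first 4 chars of hash(email) become the row key; the remaining
    // chars prefix each composite column name, so users whose hashes share
    // a prefix land in the same wide row.
    static Map<String, SortedMap<String, String>> store = new HashMap<>();

    static void put(String hash, String field, String value) {
        String rowKey = hash.substring(0, 4);
        String columnName = hash.substring(4) + ":" + field; // stand-in for a composite
        store.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }
}
```

With hashes "abcdefgh" and "abcdabcd", both users end up as columns of the single row "abcd", matching the UserCF["abcd"] example above.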

Thanks in advance for your advice.


Re: Sample Cassandra project in Tomcat

2011-08-03 Thread Benoit Perroud
2011/8/3 CASSANDRA learner :
> Hi,
>  can you please send me the mailing list address of tomcat

http://tomcat.apache.org/lists.html

> On Wed, Aug 3, 2011 at 4:07 PM, Benoit Perroud  wrote:
>>
>> I suppose what you are looking for is an example of interacting with a
>> java app.
>>
>> You should have a look at the high(er) level client hector
>> https://github.com/rantav/hector/
>> You should find what you are looking for there.
>> If you are looking for a tomcat (and .war) example, you should send an
>> email to the tomcat mailing list.
>>
>>
>> 2011/8/3 CASSANDRA learner :
>> > Hiii,
>> >
>> > Can any one pleaze send me any sample application which is (.war)
>> > implemented in java/jsp and cassandra db (Tomcat)
>> >
>
>


Re: Significance of java_pidxxx.hprof

2011-08-03 Thread Benoit Perroud
When an OutOfMemoryError is thrown, a heap dump file named
java_pid<pid>.hprof will be created automatically if you run your Java
app with -XX:+HeapDumpOnOutOfMemoryError.

2011/8/3 CASSANDRA learner :
> As per subject, Please explain me what is the significance of
> java_pidxxx.hprof
>


Re: Killing cassandra is not working

2011-08-03 Thread Benoit Perroud
so use netstat to find out which process has opened the port.

2011/8/3 CASSANDRA learner 

> Thnks for the reply Nila
>
> When i did PS command, I could not able to find any process related to
> cassandra. Thts the problem..
>
>
> On Wed, Aug 3, 2011 at 4:12 PM, Benoit Perroud  wrote:
>
>> Seems like you have already a Cassandra instance running, so the second
>> instance cannot open the same port twice.
>>
>> I would suggest you to kill all instances of Cassandra and start it again.
>>
>>
>>
>> 2011/8/3 Nilabja Banerjee 
>>
>>> try to use *grep* command to check the port where your cassandra was
>>> running.
>>>
>>>
>>> On 3 August 2011 16:01, CASSANDRA learner wrote:
>>>
>>>> H,,
>>>>
>>>> I was running cassandra in my mac and after some time the machine got to
>>>> sleep mode. Now after the machine is On. I tried to kill the process of
>>>> cassandra. But i could not able to do that as i could not able to find out
>>>> the process id. theres no process there when i pinged PS command in mac os.
>>>> I though its already killed and i have tried to start cassandra by typing
>>>> bin/cassandra.
>>>> I got the below error (in red color) due to the same.
>>>> ERROR: JDWP Transport dt_socket failed to initialize,
>>>> TRANSPORT_INIT(510)
>>>> JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports
>>>> initialized [../../../src/share/back/debugInit.c:690]
>>>> FATAL ERROR in native method: JDWP No transports initialized,
>>>> jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
>>>>
>>>>
>>>> Can you please resolve this issue.
>>>>
>>>> Thnkx in Adv[?]
>>>>
>>>
>>>
>>
>

Re: Killing cassandra is not working

2011-08-03 Thread Benoit Perroud
It seems you already have a Cassandra instance running, so the second
instance cannot open the same port.

I would suggest you kill all instances of Cassandra and start again.



2011/8/3 Nilabja Banerjee 

> try to use *grep* command to check the port where your cassandra was
> running.
>
>
> On 3 August 2011 16:01, CASSANDRA learner wrote:
>
>> H,,
>>
>> I was running cassandra in my mac and after some time the machine got to
>> sleep mode. Now after the machine is On. I tried to kill the process of
>> cassandra. But i could not able to do that as i could not able to find out
>> the process id. theres no process there when i pinged PS command in mac os.
>> I though its already killed and i have tried to start cassandra by typing
>> bin/cassandra.
>> I got the below error (in red color) due to the same.
>> ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
>> JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized
>> [../../../src/share/back/debugInit.c:690]
>> FATAL ERROR in native method: JDWP No transports initialized,
>> jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
>>
>>
>> Can you please resolve this issue.
>>
>> Thnkx in Adv[?]
>>
>
>

Re: Sample Cassandra project in Tomcat

2011-08-03 Thread Benoit Perroud
I suppose what you are looking for is an example of interacting with
Cassandra from a Java app.

You should have a look at the high(er) level client hector
https://github.com/rantav/hector/
You should find what you are looking for there.
If you are looking for a tomcat (and .war) example, you should send an
email to the tomcat mailing list.


2011/8/3 CASSANDRA learner :
> Hiii,
>
> Can any one pleaze send me any sample application which is (.war)
> implemented in java/jsp and cassandra db (Tomcat)
>


Re: Cassandra start/stop scripts

2011-08-02 Thread Benoit Perroud
kill -9 (SIGKILL) is the worst signal to use. It has the advantage of
killing the process quickly, but no shutdown hooks are called. You should
instead use kill -15 (SIGTERM, which is the default).



2011/7/26 mcasandra :
> I need to write cassandra start/stop script. Currently I run "cassandra" to
> start and kill -9 to stop.
>
> Is this the best way? kill -9 doesn't sound right :) Wondering how others do
> it.
>
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-start-stop-scripts-tp6622977p6622977.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
>


Small typo in conf/cassandra.yaml

2011-05-10 Thread Benoit Perroud
Hi all,

I found a small typo in cassandra.yaml which can confuse
inattentive copy-pasters. Here is the patch.


Index: conf/cassandra.yaml
===
--- conf/cassandra.yaml (revision 1101465)
+++ conf/cassandra.yaml (working copy)
@@ -80,7 +80,7 @@
 # Size to allow commitlog to grow to before creating a new segment
 commitlog_rotation_threshold_in_mb: 128

-# commitlog_sync may be either "periodic" or "batch."
+# commitlog_sync may be either "periodic" or "batch".
 # When in batch mode, Cassandra won't ack writes until the commit log
 # has been fsynced to disk.  It will wait up to
 # CommitLogSyncBatchWindowInMS milliseconds for other writes, before


Kind regards,

Benoit.


Re: Usage Pattern : "unique" value of a key.

2011-01-13 Thread Benoit Perroud
Thanks for your answer.

You're right that it's unlikely two threads will have the same
timestamp, but it can happen. So it could work for user creation, but
maybe not for a more write-intensive problem.

Moreover, we cannot rely on fully time-synchronized nodes in the
cluster (only on nodes synchronized to within a few ms), so a second node
could theoretically write a smaller timestamp after the first node.
An even worse case could be the one illustrated here
(http://noisette.ch/cassandra/cassandra_unique_key_pattern.png):
nodes are synchronized, but something goes wrong (slow) during the
write, and then both nodes think the key belongs to them.
So my idea of writing a lock is not really suitable...
Does anyone have another idea to share regarding this topic?

Thanks,

Kind regards,

Benoit.

2011/1/13 Oleg Anastasyev :
> Benoit Perroud  noisette.ch> writes:
>
>>
>> My idea to solve such use case is to have both thread writing the
>> username, but with a colum like "lock-", and then read
>> the row, and find out if the first lock column appearing belong to the
>> thread. If this is the case, it can continue the process, otherwise it
>> has been preempted by another thread.
>
> This looks ok for this task. As an alternative you can avoid creating extra
> \lock-random value' column and compare timestamps of new user data you just
> written. It is unlikely that both racing threads will have exactly the same
> microsecond timestamp at the moment of creating a new user - so if data you 
> read
> have exactly the same timestamp you used to write data - this is your data.
>
> Another possible way is to use some external lock coordinator, eg zookeeper.
> Although for this task it looks a bit overkill, but this can become even more
> valuable, if you have more data concurrency issues to solve and can bear extra
> 5-10ms update operations latency.
>
>


Usage Pattern : "unique" value of a key.

2011-01-12 Thread Benoit Perroud
Hi ML,

I wonder if someone has already experimented with some kind of unique
index on a column family key.

Let's go for a short example : the key is the username. What happens
if 2 users want to signup at the same time with the same username ?

So has someone already addressed this "pattern" in Cassandra point of view ?

My idea to solve such a use case is to have both threads write the
username, but with a column like "lock-<random value>", and then read
the row and find out whether the first lock column appearing belongs to
the thread. If so, it can continue the process; otherwise it
has been preempted by another thread.
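A minimal sketch of that lock-column idea, with a TreeMap standing in for the contested row and made-up token values (the real version would go through Cassandra writes and reads, of course):

```java
import java.util.SortedMap;

public class LockColumn {
    // Each contender writes a column named "lock-<token>" under the
    // contested row key (here: puts it into the sorted map).
    static void writeLock(SortedMap<String, String> row, String token, String owner) {
        row.put("lock-" + token, owner);
    }

    // After writing, each contender reads the row back: the thread whose
    // lock column sorts first "owns" the key and may proceed.
    static boolean iWon(SortedMap<String, String> row, String myToken) {
        return row.firstKey().equals("lock-" + myToken);
    }
}
```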

Does anyone have another idea to share?

Thanks in advance,

Kind regards,

Benoit.


Re: Quick Poll: Server names

2010-07-27 Thread Benoit Perroud
We use names of (European) cities for "logical" functionalities:

- berlin01, berlin02, berlin03 are part of a MySQL cluster,
- zurich1 and zurich2 are AD,
- roma01, roma02, and so on are a Cassandra cluster for the Roma project,
- and so on.

We found this a good tradeoff.

Regards,

Benoit.

2010/7/27 uncle mantis :
> +1. Quick and simple.
>
> Regards,
>
> Michael
>
>
> On Tue, Jul 27, 2010 at 10:54 AM, Benjamin Black  wrote:
>>
>> [role][sequence].[airport code][sequence].[domain].[tld]
>
>


Re: Does anybody work about transaction on cassandra ?

2010-04-24 Thread Benoit Perroud
OK, in this particular context it means no dependencies.

Thanks for the clarification.

Kind regards,

Benoit.


2010/4/24 Jonathan Ellis :
> On Sat, Apr 24, 2010 at 12:44 PM, Benoit Perroud  wrote:
>> "orthogonal" means "90 degrees".  Two lines are orthogonal if the
>> cross at 90 degrees.
>>
>> Two ideas are orthogonal means that they are not compatible.
>
> No, it just means they don't have dependencies on each other.  In this
> case, it means you could create a transactional layer on top of
> cassandra, without having to make it part of the core.
>


Re: Does anybody work about transaction on cassandra ?

2010-04-24 Thread Benoit Perroud
"Orthogonal" means "at 90 degrees". Two lines are orthogonal if they
cross at 90 degrees.

Saying two ideas are orthogonal means that they are not compatible.

Saying transactions are orthogonal to Cassandra's design means that it
would require a lot of work and trade-offs to implement transactions in
Cassandra.

Is that clearer?

Kind regards,

Benoit

2010/4/24 dir dir :
> Do you mean orthogonal like Commit and Rollback?? For example after we
> perform Rollback, hence we cannot going back.
>
>>Including "transaction" in Cassandra needs to turn 90 degrees
>>the design of Cassandra
>
> I do not understand what is the meaning of "needs to turn 90 degrees"??
>
> Thank you.
>
> On Sun, Apr 25, 2010 at 12:30 AM, Benoit Perroud  wrote:
>>
>> "orthogonal" means "go to the opposite direction, but without going
>> back". Including "transaction" in Cassandra needs to turn 90 degrees
>> the design of Cassandra.
>>
>> Kind regards,
>>
>> Benoit.
>>
>>
>>
>> 2010/4/24 dir dir :
>> >>Transactions are orthogonal to the design of Cassandra
>> >
>> > Sorry, Would you want to tell me what is an orthogonal mean in this
>> > context??
>> > honestly I do not understand what is it.
>> >
>> > Thank you.
>> >
>> >
>> > On Thu, Apr 22, 2010 at 9:14 PM, Miguel Verde 
>> > wrote:
>> >>
>> >> No, as far as I know no one is working on transaction support in
>> >> Cassandra.  Transactions are orthogonal to the design of
>> >> Cassandra[1][2],
>> >> although a system could be designed incorporating Cassandra and other
>> >> elements a la Google's MegaStore[3] to support transactions.  Google
>> >> uses
>> >> Paxos, one might be able to use Zookeeper[4] to design such a system,
>> >> but it
>> >> would be a daunting task.
>> >>
>> >> [1] http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
>> >> [2]
>> >> http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
>> >> [3] http://perspectives.mvdirona.com/2008/07/10/GoogleMegastore.aspx
>> >> [4] http://hadoop.apache.org/zookeeper/
>> >>
>> >> On Thu, Apr 22, 2010 at 2:56 AM, Jeff Zhang  wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> I need transaction support on cassandra, so wondering is anybody work
>> >>> on
>> >>> it ?
>> >>>
>> >>>
>> >>> --
>> >>> Best Regards
>> >>>
>> >>> Jeff Zhang
>> >>
>> >
>> >
>
>


Re: Does anybody work about transaction on cassandra ?

2010-04-24 Thread Benoit Perroud
"Orthogonal" means "going in a different direction, without going
back". Including transactions in Cassandra would require turning
Cassandra's design 90 degrees.

Kind regards,

Benoit.



2010/4/24 dir dir :
>>Transactions are orthogonal to the design of Cassandra
>
> Sorry, Would you want to tell me what is an orthogonal mean in this
> context??
> honestly I do not understand what is it.
>
> Thank you.
>
>
> On Thu, Apr 22, 2010 at 9:14 PM, Miguel Verde 
> wrote:
>>
>> No, as far as I know no one is working on transaction support in
>> Cassandra.  Transactions are orthogonal to the design of Cassandra[1][2],
>> although a system could be designed incorporating Cassandra and other
>> elements a la Google's MegaStore[3] to support transactions.  Google uses
>> Paxos, one might be able to use Zookeeper[4] to design such a system, but it
>> would be a daunting task.
>>
>> [1] http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
>> [2] http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
>> [3] http://perspectives.mvdirona.com/2008/07/10/GoogleMegastore.aspx
>> [4] http://hadoop.apache.org/zookeeper/
>>
>> On Thu, Apr 22, 2010 at 2:56 AM, Jeff Zhang  wrote:
>>>
>>> Hi all,
>>>
>>> I need transaction support on cassandra, so wondering is anybody work on
>>> it ?
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>
>
>


Re: ORM in Cassandra?

2010-04-23 Thread Benoit Perroud
I understand the question more like: is there already a lib which
helps get rid of writing hardcoded and hard-to-maintain lines like:

MyClass data;
String[] myFields = {"name", "label", ...};
List<Column> columns = new ArrayList<Column>();
for (String field : myFields) {
    if (field.equals("name")) {
        columns.add(new Column(field, data.getName()));
    } else if (field.equals("label")) {
        columns.add(new Column(field, data.getLabel()));
    } else ...
}
(and the same for loading (instantiating) the object automagically).
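For illustration, the kind of generic mapping such a lib would provide can be sketched with reflection; the class and method names below are hypothetical, and a real column would hold serialized bytes rather than Objects:

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class ReflectionMapper {
    // Turn an object's getters into (column name, value) pairs, so no
    // per-field if/else chain is needed: getName -> "name", etc.
    static Map<String, Object> toColumns(Object data) {
        Map<String, Object> columns = new TreeMap<>();
        try {
            for (Method m : data.getClass().getMethods()) {
                String name = m.getName();
                if (name.startsWith("get") && !name.equals("getClass")
                        && m.getParameterCount() == 0) {
                    // Lower-case the first letter after "get" to form the column name.
                    String column = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                    columns.put(column, m.invoke(data));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
        return columns;
    }

    // Hypothetical bean used only to demonstrate the mapping.
    public static class User {
        public String getName() { return "name1"; }
        public String getLabel() { return "label1"; }
    }
}
```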

Kind regards,

Benoit.

2010/4/23 dir dir :
>>So maybe it's weird to combine ORM and Cassandra, right? Is there
>>anything we can take from ORM?
>
> Honestly I do not understand what is your question. It is clear that
> you can not combine ORM such as Hibernate or iBATIS with Cassandra.
> Cassandra it self is not a RDBMS, so you will not map the table into
> the object.
>
> Dir.
>
> On Fri, Apr 23, 2010 at 12:12 PM, aXqd  wrote:
>>
>> Hi, all:
>>
>> I know many people regard O/R Mapping as rubbish. However it is
>> undeniable that ORM is quite easy to use in most simple cases,
>> Meanwhile Cassandra is well known as No-SQL solution, a.k.a.
>> No-Relational solution.
>> So maybe it's weird to combine ORM and Cassandra, right? Is there
>> anything we can take from ORM?
>> I just hate to write CRUD functions/Data layer for each object in even
>> a disposable prototype program.
>>
>> Regards.
>> -Tian
>
>


Re: How many KeySpace will you use in a single application?

2010-04-10 Thread Benoit Perroud
One point in favor of using several keyspaces is that the replication factor is set per keyspace.

If part of your application generates a lot of data that can afford to
be lost (some non-critical logs?), then a dedicated keyspace
with a smaller replication factor can be a good thing.

Kind regards,

Benoit.

2010/4/10 Dop Sun :
> Hi, a question troubles me now: how many KeySpaces one application is better
> to use?
>
>
>
> The question is coming out since 0.6, Cassandra introduced a new API named
> as “login”, which is done against a specific keySpace. Thanks to the
> org.apache.cassandra.auth.AllowAllAuthenticator, the old version clients can
> still work without authentication.
>
>
>
> Actually, while I’m working with the previous version, I just take the
> KeySpace as another level of the whole structure, KeySpace – ColumnFamily –
> Super Column (optional) – Column – Value.  And consider the whole Cassandra
> cluster as the root of all these, and one application controls everything
> under this cluster.
>
>
>
> Now, looks like I need to re-think this and put the KeySpace as a kind of
> root. It may be better to make one application only takes one KeySpace (a
> silly question? Since all old time, one application usually uses only one
> database, but forgive me, I may abuses the flexibility of Cassandra.)? Is
> there any pros or cons to user multiple key spaces vs. single key spaces,
> other than the authentication requirements?
>
>
>
> Can anyone give me some suggestions on this?
>
>
>
> Dop


Re: 0.5.1 exception: java.io.IOException: Reached an EOL or something bizzare occured

2010-04-03 Thread Benoit Perroud
My guess is that the servers I use don't have enough I/O or CPU power.
I run in a virtualized environment, and even the vmstat command lags a lot.

But it does not appear that the overall application behavior is
degraded by this error; only the "eventually" takes a little longer.

--
Kind regards,

Benoit.

2010/4/3 Anty :
> Does anyone have solve the problem?I encounter the same error too.
>
> On Mon, Mar 29, 2010 at 12:12 AM, Benoit Perroud  wrote:
>>
>> I got the same error when the nodes are using lot of I/O, i.e during
>> compaction.
>>
>> 2010/3/28 Eric Yu :
>> > I have not restart my nodes.
>> > OK, may be I should give 0.6 a try.
>> >
>> > On Sun, Mar 28, 2010 at 9:53 AM, Jonathan Ellis 
>> > wrote:
>> >>
>> >> It means that a MessagingService socket closed unexpectedly.  If
>> >> you're starting and restarting nodes that could cause it.
>> >>
>> >> This code is obsolete in 0.6 anyway.
>> >>
>> >> On Sat, Mar 27, 2010 at 8:51 PM, Eric Yu  wrote:
>> >> > And one more clue here, when ReplicateFactor is 1, it's OK, after
>> >> > changed to
>> >> > 2, the exception occurred.
>> >> >
>> >> > On Sun, Mar 28, 2010 at 9:46 AM, Eric Yu  wrote:
>> >> >>
>> >> >> Hi Jonathan,
>> >> >>
>> >> >> I upgraded my jdk to latest version, and I am sure I start Cassandra
>> >> >> with
>> >> >> it (set JAVA_HOME in cassansra.in.sh).
>> >> >> But the exception still there, any idea?
>> >> >>
>> >> >> On Sun, Mar 28, 2010 at 12:02 AM, Jonathan Ellis 
>> >> >> wrote:
>> >> >>>
>> >> >>> This means you need to upgrade your jdk to build 18 or later
>> >> >>>
>> >> >>> On Sat, Mar 27, 2010 at 10:55 AM, Eric Yu  wrote:
>> >> >>> > Hi, list
>> >> >>> > I got this exception when insert into a cluster with 5 node, is
>> >> >>> > this
>> >> >>> > a
>> >> >>> > bug
>> >> >>> > or something else is wrong.
>> >> >>> >
>> >> >>> > here is the system log:
>> >> >>> >
>> >> >>> >  INFO [GMFD:1] 2010-03-27 23:15:16,145 Gossiper.java (line 543)
>> >> >>> > InetAddress
>> >> >>> > /172.19.15.210 is now UP
>> >> >>> > ERROR [Timer-1] 2010-03-27 23:23:27,739 TcpConnection.java (line
>> >> >>> > 308)
>> >> >>> > Closing down connection java.nio.channels.SocketChannel[connected
>> >> >>> > local=/172.19.15.209:58261 remote=/172.19.15.210:7000] with
>> >> >>> > 342218
>> >> >>> > writes
>> >> >>> > remaining.
>> >> >>> >  INFO [Timer-1] 2010-03-27 23:23:27,792 Gossiper.java (line 194)
>> >> >>> > InetAddress
>> >> >>> > /172.19.15.210 is now dead.
>> >> >>> >  INFO [GMFD:1] 2010-03-27 23:23:32,214 Gossiper.java (line 543)
>> >> >>> > InetAddress
>> >> >>> > /172.19.15.210 is now UP
>> >> >>> > ERROR [Timer-1] 2010-03-27 23:24:47,846 TcpConnection.java (line
>> >> >>> > 308)
>> >> >>> > Closing down connection java.nio.channels.SocketChannel[connected
>> >> >>> > local=/172.19.15.209:59801 remote=/172.19.15.210:7000] with
>> >> >>> > 256285
>> >> >>> > writes
>> >> >>> > remaining.
>> >> >>> >  INFO [Timer-1] 2010-03-27 23:24:47,846 Gossiper.java (line 194)
>> >> >>> > InetAddress
>> >> >>> > /172.19.15.210 is now dead.
>> >> >>> >  WARN [MESSAGING-SERVICE-POOL:1] 2010-03-27 23:25:05,580
>> >> >>> > TcpConnection.java
>> >> >>> > (line 484) Problem reading from socket connected to :
>> >> >>> > java.nio.channels.SocketChannel[connected
>> >> >>> > local=/172.19.15.209:7000
>> >> >>> > remote=/172.19.15.210:55473]
>> >> >>> >  INFO [GMFD:1] 2010-03-27 23:25:05,580 Gossiper.java (line 5

Re: multinode cluster wiki page

2010-04-03 Thread Benoit Perroud
Hi,

Nice work.
I guess there is just one small mistake: the second 192.168.1.1 should be
192.168.2.34.

And I would suggest adding a small section on making the Thrift interface
listen on more than localhost.
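
For what it's worth, here is a sketch of the fragment I mean, assuming the
0.6-era storage-conf.xml format (element name and semantics taken from memory,
so double-check against your own config; use your node's actual address
rather than the wildcard if you want to restrict access):

```xml
<!-- Address to bind the Thrift RPC service to.
     The default of localhost only accepts clients on the node itself;
     0.0.0.0 listens on all interfaces. -->
<ThriftAddress>0.0.0.0</ThriftAddress>
```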

Kind regards,

Benoit.

2010/4/3 Benjamin Black :
> Just added this to the wiki as it seemed a very frequent request on
> irc: http://wiki.apache.org/cassandra/MultinodeCluster
>
> Would very much appreciate feedback and edits to improve it.
>
>
> b
>


Re: Heap sudden jump during import

2010-04-03 Thread Benoit Perroud
Have a look at either Eclipse Memory Analyzer (they have a standalone
version of the memory analyzer) or YourKit Java Profiler (commercial,
but with an evaluation license). I have successfully loaded and browsed
heaps bigger than the available memory on the system.

Regards,

Benoit

2010/4/3 Weijun Li :
> Thank you Benoit. I did a search but couldn't find any that you mentioned.
> Both jhat and NetBeans load the entire dump file into memory. Do you know the name
> of the tools that require less memory to view the dump file?
> Thanks,
> -Weijun
>
> On Sat, Apr 3, 2010 at 12:55 AM, Benoit Perroud  wrote:
>>
>> It exists other tools than jhat to browse a heap dump, which stream
>> the heap dump instead of loading it full in memory like jhat do.
>>
>> Kind regards,
>>
>> Benoit.
>>
>> 2010/4/3 Weijun Li :
>> > I'm running a test to write 30 million columns (700bytes each) to
>> > Cassandra:
>> > the process ran smoothly for about 20mil then the heap usage suddenly
>> > jumped
>> > from 2GB to 3GB which is the up limit of JVM, --from this point
>> > Cassandra
>> > will freeze for long time (terrible latency, no response to nodetool
>> > that I
>> > have to stop the import client ) before it comes back to normal . It's a
>> > single node cluster with JVM maximum heap size of 3GB. So what could
>> > cause
>> > this spike? What kind of tool can I use to find out what are the objects
>> > that are filling the additional 1GB heap? I did a heap dump but could
>> > not get
>> > jhat to work to browse the dumped file.
>> >
>> > Thanks,
>> >
>> > -Weijun
>> >
>
>


Re: Heap sudden jump during import

2010-04-03 Thread Benoit Perroud
There are tools other than jhat for browsing a heap dump, which stream
the heap dump instead of loading it fully into memory as jhat does.

Kind regards,

Benoit.

2010/4/3 Weijun Li :
> I'm running a test to write 30 million columns (700bytes each) to Cassandra:
> the process ran smoothly for about 20mil then the heap usage suddenly jumped
> from 2GB to 3GB which is the up limit of JVM, --from this point Cassandra
> will freeze for long time (terrible latency, no response to nodetool that I
> have to stop the import client ) before it comes back to normal . It's a
> single node cluster with JVM maximum heap size of 3GB. So what could cause
> this spike? What kind of tool can I use to find out what are the objects
> that are filling the additional 1GB heap? I did a heap dump but could not get
> jhat to work to browse the dumped file.
>
> Thanks,
>
> -Weijun
>


Re: get_range_slice leads to java.lang.OutOfMemoryError?

2010-04-02 Thread Benoit Perroud
A way to read the whole db without hitting an OOM is to limit the number
of rows returned and to iterate over the query, with the starting key of
each call being the last key returned. Note that done this way, the
first key of the next iteration is the same as the last key of the
previous iteration, so remember to skip the duplicate.

The warning on SliceRange also applies to this function:

http://wiki.apache.org/cassandra/API
"How many columns to return. Similar to LIMIT 100 in SQL. May be
arbitrarily large, but Thrift will materialize the whole result into
memory before returning it to the client, so be aware that you may be
better served by iterating through slices by passing the last value of
one call in as the start of the next instead of increasing count
arbitrarily large."

Kind regards,

Benoit.

2010/4/2 Gautam Singaraju :
> I call the get_range_slice method in Java to get the list of all keys
> in Cassandra db. The db is pretty small, about 1.3GB on disk. I
> received the following error on the server:
>
> "java.lang.OutOfMemoryError: Requested array size exceeds VM limit"
>
> I changed the JVM size from 1 GB to 2 GB in
> cassandra/bin/cassandra.in.sh. But, the error still persists.
>
> Any help will be much appreciated.
> ---
> Gautam
>


Re: 0.5.1 exception: java.io.IOException: Reached an EOL or something bizzare occured

2010-03-28 Thread Benoit Perroud
I got the same error when the nodes are doing a lot of I/O, e.g. during compaction.

2010/3/28 Eric Yu :
> I have not restart my nodes.
> OK, may be I should give 0.6 a try.
>
> On Sun, Mar 28, 2010 at 9:53 AM, Jonathan Ellis  wrote:
>>
>> It means that a MessagingService socket closed unexpectedly.  If
>> you're starting and restarting nodes that could cause it.
>>
>> This code is obsolete in 0.6 anyway.
>>
>> On Sat, Mar 27, 2010 at 8:51 PM, Eric Yu  wrote:
>> > And one more clue here, when ReplicateFactor is 1, it's OK, after
>> > changed to
>> > 2, the exception occurred.
>> >
>> > On Sun, Mar 28, 2010 at 9:46 AM, Eric Yu  wrote:
>> >>
>> >> Hi Jonathan,
>> >>
>> >> I upgraded my jdk to latest version, and I am sure I start Cassandra
>> >> with
>> >> it (set JAVA_HOME in cassandra.in.sh).
>> >> But the exception still there, any idea?
>> >>
>> >> On Sun, Mar 28, 2010 at 12:02 AM, Jonathan Ellis 
>> >> wrote:
>> >>>
>> >>> This means you need to upgrade your jdk to build 18 or later
>> >>>
>> >>> On Sat, Mar 27, 2010 at 10:55 AM, Eric Yu  wrote:
>> >>> > Hi, list
>> >>> > I got this exception when insert into a cluster with 5 node, is this
>> >>> > a
>> >>> > bug
>> >>> > or something else is wrong.
>> >>> >
>> >>> > here is the system log:
>> >>> >
>> >>> >  INFO [GMFD:1] 2010-03-27 23:15:16,145 Gossiper.java (line 543)
>> >>> > InetAddress
>> >>> > /172.19.15.210 is now UP
>> >>> > ERROR [Timer-1] 2010-03-27 23:23:27,739 TcpConnection.java (line
>> >>> > 308)
>> >>> > Closing down connection java.nio.channels.SocketChannel[connected
>> >>> > local=/172.19.15.209:58261 remote=/172.19.15.210:7000] with 342218
>> >>> > writes
>> >>> > remaining.
>> >>> >  INFO [Timer-1] 2010-03-27 23:23:27,792 Gossiper.java (line 194)
>> >>> > InetAddress
>> >>> > /172.19.15.210 is now dead.
>> >>> >  INFO [GMFD:1] 2010-03-27 23:23:32,214 Gossiper.java (line 543)
>> >>> > InetAddress
>> >>> > /172.19.15.210 is now UP
>> >>> > ERROR [Timer-1] 2010-03-27 23:24:47,846 TcpConnection.java (line
>> >>> > 308)
>> >>> > Closing down connection java.nio.channels.SocketChannel[connected
>> >>> > local=/172.19.15.209:59801 remote=/172.19.15.210:7000] with 256285
>> >>> > writes
>> >>> > remaining.
>> >>> >  INFO [Timer-1] 2010-03-27 23:24:47,846 Gossiper.java (line 194)
>> >>> > InetAddress
>> >>> > /172.19.15.210 is now dead.
>> >>> >  WARN [MESSAGING-SERVICE-POOL:1] 2010-03-27 23:25:05,580
>> >>> > TcpConnection.java
>> >>> > (line 484) Problem reading from socket connected to :
>> >>> > java.nio.channels.SocketChannel[connected local=/172.19.15.209:7000
>> >>> > remote=/172.19.15.210:55473]
>> >>> >  INFO [GMFD:1] 2010-03-27 23:25:05,580 Gossiper.java (line 543)
>> >>> > InetAddress
>> >>> > /172.19.15.210 is now UP
>> >>> >  WARN [MESSAGING-SERVICE-POOL:2] 2010-03-27 23:25:05,580
>> >>> > TcpConnection.java
>> >>> > (line 484) Problem reading from socket connected to :
>> >>> > java.nio.channels.SocketChannel[connected local=/172.19.15.209:7000
>> >>> > remote=/172.19.15.210:45504]
>> >>> >  WARN [MESSAGING-SERVICE-POOL:2] 2010-03-27 23:25:05,580
>> >>> > TcpConnection.java
>> >>> > (line 485) Exception was generated at : 03/27/2010 23:25:05 on
>> >>> > thread
>> >>> > MESSAGING-SERVICE-POOL:2
>> >>> > Reached an EOL or something bizzare occured. Reading from:
>> >>> > /172.19.15.210
>> >>> > BufferSizeRemaining: 16
>> >>> > java.io.IOException: Reached an EOL or something bizzare occured.
>> >>> > Reading
>> >>> > from: /172.19.15.210 BufferSizeRemaining: 16
>> >>> >     at
>> >>> > org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>> >>> >     at
>> >>> >
>> >>> > org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>> >>> >     at
>> >>> > org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>> >>> >     at
>> >>> >
>> >>> >
>> >>> > org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>> >>> >     at
>> >>> >
>> >>> >
>> >>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >>> >     at
>> >>> >
>> >>> >
>> >>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >>> >     at java.lang.Thread.run(Thread.java:636)
>> >>> >
>> >>> >  INFO [MESSAGING-SERVICE-POOL:2] 2010-03-27 23:25:05,580
>> >>> > TcpConnection.java
>> >>> > (line 315) Closing errored connection
>> >>> > java.nio.channels.SocketChannel[connected local=/172.19.15.209:7000
>> >>> > remote=/172.19.15.210:45504]
>> >>> >  WARN [MESSAGING-SERVICE-POOL:1] 2010-03-27 23:25:05,632
>> >>> > TcpConnection.java
>> >>> > (line 485) Exception was generated at : 03/27/2010 23:25:05 on
>> >>> > thread
>> >>> > MESSAGING-SERVICE-POOL:1
>> >>> >
>> >>
>> >
>> >
>
>


Re: Nodes Timing Out

2010-03-28 Thread Benoit Perroud
Does ulimit -n return unlimited?
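
As a side note, the limit the process actually runs with can also be read
programmatically; a minimal sketch using the Python standard library, assuming
it is run under the same user and limits as the Cassandra daemon:

```python
import resource

# Query the soft and hard limits on open file descriptors for the
# current process; RLIM_INFINITY means "unlimited".
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

for name, value in (("soft", soft), ("hard", hard)):
    if value == resource.RLIM_INFINITY:
        print("%s limit: unlimited" % name)
    else:
        print("%s limit: %d" % (name, value))
```

On Linux this reads the same values that ulimit -Sn and ulimit -Hn report for
the invoking shell.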


2010/3/28 James Golick :
> unlimited
>
> On Sat, Mar 27, 2010 at 12:09 PM, Chris Goffinet  wrote:
>>
>> what's the ulimit set to?
>> -Chris
>> On Mar 27, 2010, at 10:29 AM, James Golick wrote:
>>
>> Hey,
>> I put our first cluster in to production (writing but not reading) a
>> couple of days ago. Right now, it's got two pretty sizeable nodes taking
>> about 200 writes per second each and virtually no reads.
>> Eventually, though, (and this has happened twice), both nodes seem to
>> start timing out. If I run nodetool cfstats, I get:
>> [ja...@cassandra1 ~]# /opt/cassandra/bin/nodetool -h
>> cassandra1.fetlife.com cfstats
>> Keyspace: system
>>         Read Count: 39
>>         Read Latency: 0.35925641025641025 ms.
>>         Write Count: 3
>>         Write Latency: 0.166 ms.
>>         Pending Tasks: 66
>>                 Column Family: HintsColumnFamily
>>                 SSTable count: 0
>>                 Space used (live): 0
>>                 Space used (total): 0
>> and then it just hangs there.
>> Any ideas?
>> - James
>
>