Map Reduce and Cassandra with Trigger patch
I'm having problems running a MapReduce program that uses Cassandra as input. I have already written some MapReduce programs against Cassandra 1.0.9, but now I'm trying an older version with a patch that adds trigger support (this one: https://issues.apache.org/jira/browse/CASSANDRA-1311).

When I try to run it, it throws the following error:

12/11/26 16:59:06 ERROR config.DatabaseDescriptor: Fatal error: Cannot locate cassandra.yaml on the classpath

I have had this problem before, and the solution then was simply to add the path of cassandra.yaml to a system property, but now that is not working. I also saw somewhere that one solution would be to add the line

set CLASSPATH="%CASSANDRA_HOME%\conf"

to /bin/cassandra-cli.bat, but that didn't work either.

If anyone has an idea of what to do, I would be really thankful.

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*

*"Most people do not listen with the intent to understand; they listen with the intent to reply" - Stephen Covey*
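For reference, a minimal sketch of the system-property approach mentioned above. It assumes the property is named cassandra.config, which is the name DatabaseDescriptor reads in the releases I know of; the message does not name the property, so check the patched build. As far as I understand, the value is tried as a URL first and only then as a classpath resource, so a bare filesystem path can end in the same "Cannot locate ... on the classpath" error. The default path in the code is hypothetical.

```java
import java.io.File;

class CassandraConfigProperty {
    // Assumed property name; verify it against DatabaseDescriptor in your (patched) build.
    static final String CONFIG_PROPERTY = "cassandra.config";

    /** Turns a filesystem path into a file: URL string. */
    static String toConfigUrl(String yamlPath) {
        return new File(yamlPath).toURI().toString();
    }

    public static void main(String[] args) {
        // Hypothetical default location; pass the real path as the first argument.
        String yamlPath = args.length > 0 ? args[0] : "conf/cassandra.yaml";
        // This has to run before any Cassandra class that loads the configuration is touched.
        System.setProperty(CONFIG_PROPERTY, toConfigUrl(yamlPath));
        System.out.println(CONFIG_PROPERTY + "=" + System.getProperty(CONFIG_PROPERTY));
    }
}
```

In a MapReduce job the map tasks usually run in their own JVMs, so setting the property only in the driver may not be enough; passing it as a -D option to the task JVMs (for example through mapred.child.java.opts on Hadoop 1.x) is one thing to try.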
Problem getting Column Family type
Hello!

I'm trying to get the column family type and I don't know why it's not working. I am using the following line of code:

Schema.instance.getColumnFamilyType("School", "Grades");

This keyspace and column family exist in Cassandra, yet this method only returns null. If I ask for information about the 'system' keyspace, it works. I also tried Schema.instance.getNonSystemTables(); and it returns null.

I'm using the commit log for some analysis, and the way it gets the column family information from the internal CfId is this:

Schema.instance.getCF(cfId);

That's why I think I need to use this instance of the Schema to retrieve the information I want.

Does anyone have an idea?

Thanks in advance.

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*
Re: Getting serialized Rows from CommitLogSegment file
Wow, fast! Thank you very much, Aaron.

And is there any idea about this Schema.getTables thing?

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*

2012/10/3 aaron morton
> > Do you know where (which e-mail thread) was it discussed? I would like to
> > know a little further about it.
>
> http://www.mail-archive.com/user@cassandra.apache.org/msg25033.html
>
> Cheers
>
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
Re: Getting serialized Rows from CommitLogSegment file
Hey Ben. Yes, that's what I'm doing.

Do you know where (in which e-mail thread) it was discussed? I would like to know a little more about it.

Anyway, if I try to use the method Schema.instance.getTables, which returns a list of all tables (keyspaces) registered in the system, it returns null. Do you have any idea why this is happening?

Thank you very much.

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*

2012/10/2 Ben Hood <0x6e6...@gmail.com>
> Felipe,
>
> On Tue, Oct 2, 2012 at 2:56 PM, Felipe Schmidt wrote:
> > Seems like the information was dropped or, maybe, not existent in this
> > instance of the Schema. But, as soon as I know, it's just one instance of
> > the schema in Cassandra, right?
>
> If I understand you correctly, you are trying to process the commit
> log to get a change list?
>
> If so, then this question has been asked and the general consensus is
> that whilst being possible, the commit log is an internal apparatus
> subject to change that is not guaranteed to give you the information
> you think you should get. Other suggested approaches include producing
> your event stream of mutations using AOP or multiplexing change events
> on the app layer as they go into Cassandra.
>
> HTH,
>
> Ben
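Ben's second suggestion, multiplexing change events on the app layer as they go into Cassandra, could be sketched roughly as below. This is only an illustration: ColumnWriter, ChangeEvent and ChangeCapturingWriter are hypothetical names, not part of any Cassandra client, and a HashMap stands in for the real store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical write interface standing in for the application's real Cassandra client. */
interface ColumnWriter {
    void write(String rowKey, String column, String value, long timestamp);
}

/** One captured change (hypothetical shape). */
final class ChangeEvent {
    final String rowKey;
    final String column;
    final String value;
    final long timestamp;

    ChangeEvent(String rowKey, String column, String value, long timestamp) {
        this.rowKey = rowKey;
        this.column = column;
        this.value = value;
        this.timestamp = timestamp;
    }
}

/** Decorator that forwards each write and then records a change event for it. */
final class ChangeCapturingWriter implements ColumnWriter {
    private final ColumnWriter delegate;
    private final List<ChangeEvent> changes = new ArrayList<ChangeEvent>();

    ChangeCapturingWriter(ColumnWriter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(String rowKey, String column, String value, long timestamp) {
        delegate.write(rowKey, column, value, timestamp); // if this throws, nothing is recorded
        changes.add(new ChangeEvent(rowKey, column, value, timestamp));
    }

    /** Read-only view of the events captured so far, oldest first. */
    List<ChangeEvent> changes() {
        return Collections.unmodifiableList(changes);
    }
}

class ChangeCaptureDemo {
    public static void main(String[] args) {
        final Map<String, String> fakeStore = new HashMap<String, String>(); // stands in for Cassandra
        ChangeCapturingWriter writer = new ChangeCapturingWriter(
                (rowKey, column, value, ts) -> fakeStore.put(rowKey + "/" + column, value));
        writer.write("student-1", "grade", "B", 1L);
        writer.write("student-1", "grade", "A", 2L);
        for (ChangeEvent e : writer.changes()) {
            System.out.println(e.timestamp + " " + e.rowKey + "/" + e.column + " = " + e.value);
        }
    }
}
```

The store only keeps the latest value, while the change list keeps both writes, which is the point of capturing at this layer. A real setup would presumably publish the events to a queue or a second column family rather than an in-memory list.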
Re: Getting serialized Rows from CommitLogSegment file
I found a way to do it, but now I have another issue.

I'm having a problem getting the ColumnFamily from the CfId. This information is needed to deserialize the stored ColumnFamily. When I use the method Schema.instance.getCF(cfId) to get the Pair, it throws an 'UnknownColumnFamilyException'.

It seems the information was dropped or, maybe, does not exist in this instance of the Schema. But, as far as I know, there is just one instance of the schema in Cassandra, right?

If I'm misunderstanding some structural concept of Cassandra here, or if there is another way to get this information, please let me know.

Thanks in advance.

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*

2012/10/1 Felipe Schmidt
> Hello.
>
> I'm trying to catch the serialized RowMutations from a CommitLogSegment to
> capture the data change, but I don't have much idea about how to proceed.
> Some one know a way of how to do it? I supposed that it would be kind of
> simple.
>
> Regards,
> Felipe Mathias Schmidt
> *(Computer Science UFRGS, RS, Brazil)*
Getting serialized Rows from CommitLogSegment file
Hello.

I'm trying to read the serialized RowMutations from a CommitLogSegment in order to capture data changes, but I don't have much idea how to proceed. Does anyone know a way to do it? I assumed it would be fairly simple.

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*
Extract data from cassandra log
Is there any way to extract data from the Cassandra log (the commit log) using a Java program? I tried to open this file with gedit and with tail to look at the 'structure' of the Cassandra logging, but that doesn't work well.

Thanks in advance.

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*
Trigger and customized filter
Does anyone know the answers to the following questions?

1. Does Cassandra support customized filters? By a customized filter I mean that the programmer can define their own filter to select data.
2. Does Cassandra support triggers? A trigger here has the same meaning as in an RDBMS.

Thanks in advance.

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*
Re: Retrieving old data version for a given row
I was taking a look at the tombstones stored in an SSTable and I noticed that if I perform a key deletion, the tombstone doesn't have any timestamp. It looks like this:

"key":[ ]

At all the other deletion granularities the tombstone has a timestamp. Without this information it does not seem possible to resolve conflicts when an insertion for the same key is done after this deletion. If that happens, I think Cassandra will always delete the new data because of this tombstone.

I'm using a single-node configuration, and maybe that changes how tombstones look.

Thanks in advance.

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*

2012/5/31 aaron morton
> > -Is there any other way to extract the content of an SSTable, writing a
> > java program for example instead of using sstable2json?
>
> Look at the code in sstable2json and copy it :)
>
> > -I tried to get tombstones using the thrift API, but seems to be not
> > possible, is it right? When I try, the program throws an exception.
>
> No.
> Tombstones are not returned from API (See
> ColumnFamilyStore.getColumnFamily() ).
> You can see them if you use sstable2json.
>
> Cheers
>
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
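On the conflict question at the top of this message, here is a minimal sketch of how timestamp-based reconciliation between a row deletion and later inserts can work. It assumes the row tombstone does carry a deletion timestamp even though the dump above does not print one (the Thrift remove call takes a timestamp, so I believe it does, but the thread does not confirm it). The class and method names are made up for illustration and this is not Cassandra's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowTombstoneSketch {
    /** Column name -> write timestamp, for a single row. */
    private final Map<String, Long> columnTimestamps = new LinkedHashMap<String, Long>();
    /** Timestamp of the row deletion; Long.MIN_VALUE means the row was never deleted. */
    private long rowDeletedAt = Long.MIN_VALUE;

    void insert(String column, long timestamp) {
        Long existing = columnTimestamps.get(column);
        if (existing == null || timestamp > existing) {
            columnTimestamps.put(column, timestamp);
        }
    }

    void deleteRow(long timestamp) {
        rowDeletedAt = Math.max(rowDeletedAt, timestamp);
    }

    /** A column survives only if it was written strictly after the row deletion (ties go to the delete here). */
    List<String> liveColumns() {
        List<String> live = new ArrayList<String>();
        for (Map.Entry<String, Long> e : columnTimestamps.entrySet()) {
            if (e.getValue() > rowDeletedAt) {
                live.add(e.getKey());
            }
        }
        return live;
    }

    public static void main(String[] args) {
        RowTombstoneSketch row = new RowTombstoneSketch();
        row.insert("name", 100L);   // written before the delete
        row.deleteRow(200L);        // whole-row (key) deletion
        row.insert("grade", 300L);  // written after the delete
        System.out.println(row.liveColumns()); // prints [grade]
    }
}
```

Under that assumption an insert made after the deletion is not lost: only columns whose timestamp is at or before the deletion timestamp are shadowed.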
Re: Retrieving old data version for a given row
I have further questions:
-Is there any other way to extract the content of an SSTable, for example by writing a Java program instead of using sstable2json?
-I tried to get tombstones using the Thrift API, but it seems not to be possible. Is that right? When I try, the program throws an exception.

Thanks in advance.

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)

2012/5/24 aaron morton :
> > Ok... it's really strange to me that Cassandra doesn't support data
> > versioning cause all of other key-value databases support it (at least
> > those who I know).
>
> You can design it into your data model if you need it.
>
> > I have one remaining question:
> > -in the case that I have more than 1 SSTable in the disk for the same
> > column but with different data versions, is it possible to make a
> > query to get the old version instead of the newest one?
>
> No.
> There is only ever 1 value for a column.
> The "older" copies of the column in the SSTables are artefacts of immutable
> on disk structures.
> If you want to see what's inside an SSTable use bin/sstable2json
>
> Cheers
>
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
Re: Retrieving old data version for a given row
Ok... it's really strange to me that Cassandra doesn't support data versioning, because all the other key-value databases support it (at least the ones I know).

I have one remaining question:
-If I have more than one SSTable on disk for the same column, with different versions of the data, is it possible to query for the old version instead of the newest one?

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)

2012/5/16 Dave Brosius :
> You're in for a world of hurt going down that rabbit hole. If you truly
> want version data then you should think about changing your keying to
> perhaps be a composite key where key is of form
>
> NaturalKey/VersionId
>
> Or if you want the versioning at the column level, use composite columns
> with ColumnName/VersionId format
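Dave's ColumnName/VersionId idea can be sketched with a sorted map standing in for one row's sorted columns. This is a toy model, not Cassandra's composite comparator: the zero-padded version suffix, the VersionedColumns class and its methods are assumptions made for illustration, and column names are assumed not to contain '/'.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class VersionedColumns {
    /** One row: composite column name -> value, kept sorted by name like columns in a row. */
    private final NavigableMap<String, String> columns = new TreeMap<String, String>();
    /** Next version id to hand out per logical column name. */
    private final Map<String, Long> nextVersion = new HashMap<String, Long>();

    /** Builds "ColumnName/VersionId"; zero padding keeps string order equal to numeric order. */
    static String compositeName(String columnName, long versionId) {
        return String.format("%s/%010d", columnName, versionId);
    }

    /** Every write adds a new column instead of overwriting the previous value. */
    void put(String columnName, String value) {
        Long current = nextVersion.get(columnName);
        long version = (current == null) ? 1L : current;
        nextVersion.put(columnName, version + 1);
        columns.put(compositeName(columnName, version), value);
    }

    /** All stored versions of a column, oldest first ('0' is the character right after '/'). */
    List<String> history(String columnName) {
        return new ArrayList<String>(columns.subMap(columnName + "/", columnName + "0").values());
    }

    /** Newest value, or null if the column was never written. */
    String latest(String columnName) {
        List<String> versions = history(columnName);
        return versions.isEmpty() ? null : versions.get(versions.size() - 1);
    }

    public static void main(String[] args) {
        VersionedColumns row = new VersionedColumns();
        row.put("grade", "C");
        row.put("grade", "A");
        System.out.println(row.history("grade")); // prints [C, A]
        System.out.println(row.latest("grade"));  // prints A
    }
}
```

Because each version is its own column, old values survive flushes and compaction like any other data, at the cost of the application having to prune versions it no longer needs.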
Data Versioning Support
Does Cassandra support data versioning? I have looked in many places but I'm still not sure about it.

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)
Re: Retrieving old data version for a given row
That was very helpful, thank you very much!

I still have some questions:
-Is it possible to make Cassandra keep old values after flushing? The same question applies to the memtable, before flushing. It seems to me that when I update some tuple, the old data is overwritten in the memtable, even before flushing.
-Is it possible to scan values from the memtable, maybe using the so-called Thrift API? Using the client API I can only see the newest data version; I can't see what's really happening in the memtable.

I ask because what I'm trying to do is Change Data Capture for Cassandra, and the answers will determine what kinds of approaches I can use.

Thanks in advance.

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)

2012/5/14 aaron morton :
> Cassandra does not provide access to multiple versions of the same column.
> It is essentially implementation detail.
>
> All mutations are written to the commit log in a binary format, see the
> o.a.c.db.RowMutation.getSerializedBuffer() (If you want to tail it for
> analysis you may want to change commitlog_sync in cassandra.yaml)
>
> Here is a post about looking at multiple versions of columns in an
> sstable http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/
>
> Remember that not all "versions" of a column are written to disk
> (see http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/). Also
> compaction will compress multiple versions of the same column from multiple
> files into a single version in a single file.
>
> Hope that helps.
>
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
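Aaron's point that mutations are written in a binary format is why tail and text viewers show so little. The sketch below illustrates the general idea of a binary, length-prefixed record log and of reading it back record by record. The framing used here (length, payload, CRC32) is invented for the example and is NOT Cassandra's commit log layout; for the real format the code to study is the one Aaron points to.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

class FramedLogSketch {
    /** Encodes each payload as [int length][payload bytes][long CRC32 of payload]. Invented framing. */
    static byte[] encode(List<String> payloads) {
        List<byte[]> bodies = new ArrayList<byte[]>();
        int size = 0;
        for (String payload : payloads) {
            byte[] body = payload.getBytes(StandardCharsets.UTF_8);
            bodies.add(body);
            size += 4 + body.length + 8; // length prefix + payload + checksum
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] body : bodies) {
            out.putInt(body.length).put(body).putLong(checksum(body));
        }
        return out.array();
    }

    /** Walks the log record by record, verifying each checksum. */
    static List<String> decode(byte[] log) {
        ByteBuffer in = ByteBuffer.wrap(log);
        List<String> payloads = new ArrayList<String>();
        while (in.hasRemaining()) {
            byte[] body = new byte[in.getInt()];
            in.get(body);
            if (in.getLong() != checksum(body)) {
                throw new IllegalStateException("checksum mismatch");
            }
            payloads.add(new String(body, StandardCharsets.UTF_8));
        }
        return payloads;
    }

    static long checksum(byte[] body) {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] log = encode(Arrays.asList("insert grade=B", "insert grade=A"));
        // The raw bytes mix binary length and checksum fields with the payload text.
        System.out.println(log.length + " bytes");
        System.out.println(decode(log));
    }
}
```

A reader for a format like this has to know the exact field order and sizes, which is why a program that reuses the writer's own serialization code tends to be more practical than inspecting the file by hand.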
Re: Retrieving old data version for a given row
Yes, I need this information just for academic purposes.

So, to read old data values, I tried to open the commit log using tail -f and also Ubuntu's log file viewer, but I can't see much information inside the log! Is there any other way to open this log? I didn't find any Cassandra API for this purpose.

Thanks everybody in advance.

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)

2012/5/14 zhangcheng2 :
> After compaction, the old version data will be gone!
>
> zhangcheng2
>
> From: Felipe Schmidt
> Date: 2012-05-14 05:33
> To: user
> Subject: Retrieving old data version for a given row
> I'm trying to retrieve old data version for some row but it seems not
> be possible. I'm a beginner with Cassandra and the unique aproach I
> know is looking to the SSTable in the storage folder, but if I insert
> some column and right after insert another value to the same row,
> after flushing, I only get the last value.
> Is there any way to get the old data version? Obviously, before compaction.
>
> Regards,
> Felipe Mathias Schmidt
> (Computer Science UFRGS, RS, Brazil)
Retrieving old data version for a given row
I'm trying to retrieve an old version of the data for some row, but it seems not to be possible. I'm a beginner with Cassandra and the only approach I know is looking at the SSTable in the storage folder, but if I insert some column and right afterwards insert another value into the same row, then after flushing I only get the last value.

Is there any way to get the old data version? Obviously, before compaction.

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)
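The behaviour described here (two writes to the same column, one value after the flush) is what a last-write-wins in-memory table would produce. Below is a deliberately simplified model of that, assuming the in-memory structure keeps only the newest cell per column. MemtableSketch and its methods are invented for illustration and tie handling is arbitrary; this is not Cassandra's implementation.

```java
import java.util.Map;
import java.util.TreeMap;

class MemtableSketch {
    /** A value together with the timestamp it was written at. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** "rowKey:column" -> newest cell seen so far. */
    private final Map<String, Cell> cells = new TreeMap<String, Cell>();

    /** Keeps the cell with the higher timestamp; the older value is dropped immediately. */
    void write(String rowKey, String column, String value, long timestamp) {
        String name = rowKey + ":" + column;
        Cell existing = cells.get(name);
        if (existing == null || timestamp >= existing.timestamp) {
            cells.put(name, new Cell(value, timestamp));
        }
    }

    /** Returns what would be written to disk and empties the in-memory table. */
    Map<String, String> flush() {
        Map<String, String> flushed = new TreeMap<String, String>();
        for (Map.Entry<String, Cell> e : cells.entrySet()) {
            flushed.put(e.getKey(), e.getValue().value);
        }
        cells.clear();
        return flushed;
    }

    public static void main(String[] args) {
        MemtableSketch memtable = new MemtableSketch();
        memtable.write("student-1", "grade", "B", 1L);
        memtable.write("student-1", "grade", "A", 2L); // replaces the first value in memory
        System.out.println(memtable.flush()); // prints {student-1:grade=A}
    }
}
```

In this model an older value only reaches disk if a flush happens between the two writes, which matches the observation that the first value is already gone before compaction ever runs.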