Re: Slow repair

2017-03-16 Thread siddharth verma



-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Re: Count(*) is not working

2017-02-17 Thread siddharth verma
Hi,
We faced this issue too.
You could try a reduced paging size, so that the tombstone threshold isn't
breached.

Try using "PAGING 500" in cqlsh
[ https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshPaging.html ]

Similarly, the page size can be set in the Java driver as well (see the
sketch below).

This is a workaround; for this warning, do review your data model once.
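
A minimal sketch of setting the fetch size with the DataStax Java driver 3.x
(the contact point and keyspace/table names below are placeholders, not from
this thread):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class PagedCount {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // fetch only 500 rows per page so a single page stays under the tombstone thresholds
            Statement stmt = new SimpleStatement("SELECT * FROM my_ks.my_table");
            stmt.setFetchSize(500);
            long count = 0;
            for (Row row : session.execute(stmt)) {
                count++; // the ResultSet pages transparently while iterating
            }
            System.out.println("live rows counted: " + count);
        }
    }
}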

Regards


On Fri, Feb 17, 2017 at 4:36 PM, Sylvain Lebresne 
wrote:

> On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves 
> wrote:
>
>> if you want a reliable count, you should use spark. performing a count
>> (*) will inevitably fail unless you make your server read timeouts and
>> tombstone fail thresholds ridiculous
>>
>
> That's just not true. count(*) is paged internally so while it is not
> particular fast, it shouldn't require bumping neither the read timeout nor
> the tombstone fail threshold in any way to work.
>
> In that case, it seems the partition does have many tombstones (more than
> live rows) and so the tombstone threshold is doing its job of warning about
> it.
>
>
>>
>> On 17 Feb. 2017 04:34, "Jan"  wrote:
>>
>>> Hi,
>>>
>>> could you post the output of nodetool cfstats for the table?
>>>
>>> Cheers,
>>>
>>> Jan
>>>
>>> Am 16.02.2017 um 17:00 schrieb Selvam Raman:
>>>
>>> I am not getting count as result. Where i keep on getting n number of
>>> results below.
>>>
>>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>>> LIMIT 100 (see tombstone_warn_threshold)
>>>
>>> On Thu, Feb 16, 2017 at 12:37 PM, Jan Kesten  wrote:
>>>
>>>> Hi,
>>>>
>>>> do you got a result finally?
>>>>
>>>> Those messages are simply warnings telling you that c* had to read many
>>>> tombstones while processing your query - rows that are deleted but not
>>>> garbage collected/compacted. This warning gives you some explanation why
>>>> things might be much slower than expected because per 100 rows that count
>>>> c* had to read about 15 times rows that were deleted already.
>>>>
>>>> Apart from that, count(*) is almost always slow - and there is a
>>>> default limit of 10.000 rows in a result.
>>>>
>>>> Do you really need the actual live count? To get a idea you can always
>>>> look at nodetool cfstats (but those numbers also contain deleted rows).
>>>>
>>>>
>>>> Am 16.02.2017 um 13:18 schrieb Selvam Raman:
>>>>
>>>> Hi,
>>>>
>>>> I want to know the total records count in table.
>>>>
>>>> I fired the below query:
>>>>select count(*) from tablename;
>>>>
>>>> and i have got the below output
>>>>
>>>> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
>>>> keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052)
>>>> LIMIT 100 (see tombstone_warn_threshold)
>>>>
>>>> Read 100 live rows and 1435 tombstone cells for query SELECT * FROM
>>>> keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT 100 (see
>>>> tombstone_warn_threshold)
>>>>
>>>> Read 96 live rows and 1385 tombstone cells for query SELECT * FROM
>>>> keysace.table WHERE token(id) > token(test:-2220-UV033/04) LIMIT 100 (see
>>>> tombstone_warn_threshold).
>>>>
>>>>
>>>>
>>>>
>>>> Can you please help me to get the total count of the table.
>>>>
>>>> --
>>>> Selvam Raman
>>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>>
>>>>
>>>
>>>
>>> --
>>> Selvam Raman
>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>
>>>
>>>
>


-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Re: [External] Re: Cassandra ad hoc search options

2017-01-30 Thread siddharth verma
Hi,
*Are you using the DataStax connector as well? *
Yes, we used it to query the Lucene index.

*Does it support querying against any column well (not just clustering
columns)?*
Yes, it does. We used Lucene particularly for this purpose.
(For more details, you can see:
1.
https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.10/doc/documentation.rst#searching
2. https://www.youtube.com/watch?v=Hg5s-hXy_-M )

*I’m wondering how it could build the index around them “on-the-fly”*
You can build indexes at run time, but it takes time (it took a lot of time
on our cluster, and CPU utilization went through the roof).

*did you use Spark for the full set of data or just partial*
We weren't allowed to install Spark (a tech decision).
Some tech discussions are going on around the bulk-job ecosystem.

Hence, as a workaround, we used a faster scan utility.
For all the ad hoc purposes/scripts, you could do a full scan.

I hope it helps.

Regards


On Tue, Jan 31, 2017 at 4:11 AM, Yu, John  wrote:

> A follow up question is: did you use Spark for the full set of data or
> just partial? In our case, I feel we need all the data to support ad hoc
> queries (with multiple conditional filters).
>
>
>
> Thanks,
>
> John
>
>
>
> *From:* Yu, John [mailto:john...@sandc.com]
> *Sent:* Monday, January 30, 2017 12:04 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: [External] Re: Cassandra ad hoc search options
>
>
>
> Thanks for the input! Are you using the DataStax connector as well? Does
> it support querying against any column well (not just clustering columns)?
> I’m wondering how it could build the index around them “on-the-fly”.
>
>
>
> Regards,
>
> John
>
>
>
> *From:* siddharth verma [mailto:sidd.verma29.l...@gmail.com
> ]
> *Sent:* Friday, January 27, 2017 12:15 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [External] Re: Cassandra ad hoc search options
>
>
>
> Hi
>
> We used lucene stratio plugin with C*3.0.3
>
>
>
> It helped serve a lot of our read patterns. It served prefix queries well.
>
> But created problems as repairs failed repeatedly.
>
> We might have used it sub optimally, not sure.
>
>
>
> Later, we had to do away with it, and tried to serve most of the read
> patterns with materialised views. (currently C*3.0.9)
>
>
>
> Currently, for ad hoc queries, we use Spark or a full scan.
>
>
>
> Regards,
>
>
>
> On Fri, Jan 27, 2017 at 1:03 PM, Yu, John  wrote:
>
> Thanks a lot. Mind sharing a couple of points where you feel it’s better
> than the alternatives.
>
>
>
> Regards,
>
> John
>
>
>
> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
> *Sent:* Thursday, January 26, 2017 2:33 PM
> *To:* user@cassandra.apache.org
> *Subject:* [External] Re: Cassandra ad hoc search options
>
>
>
> > With Cassandra, what are the options for ad hoc query/search similar to
> RDBMS?
>
>
>
> Your best options are Spark w/ the DataStax connector or Presto.
> Cassandra isn't built for ad-hoc queries so you need to use other tools to
> make it work.
>
>
>
> On Thu, Jan 26, 2017 at 2:22 PM Yu, John  wrote:
>
> Hi All,
>
>
>
> Hope I can get some help here. We’re using Cassandra for services, and
> recently we’re adding UI support.
>
> With Cassandra, what are the options for ad hoc query/search similar to
> RDBMS? We love the features of Cassandra but it seems it’s a known
> “weakness” that it doesn’t come with strong support of indexing and ad hoc
> queries. There’re some recent development with SASI as part of secondary
> index. However I heard from a video where it says it shall not be
> extensively used.
>
>
>
> Has anyone have much experience with SASI? How does it compare to Lucene
> plugin?
>
> What is the direction of Apache Cassandra in the search area?
>
>
>
> We’re also looking into Solr or ElasticSearch integration, but it seems it
> might take more efforts, and possibly involve data duplication.
>
> For Solr, we don’t have DSE.
>
> Sorry if this has been asked before, but I haven’t seen a more complete
> answer.
>
>
>
> Thanks!
>
> John
> --
>
>
>
>
>
>
> --
>
> Siddharth Verma
>
> (Visit https://github.com/siddv29/cfs for a high speed cassandra full
> table scan)
>



-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Re: [External] Re: Cassandra ad hoc search options

2017-01-27 Thread siddharth verma
Hi
We used lucene stratio plugin with C*3.0.3

It helped serve a lot of our read patterns. It served prefix queries well.
But created problems as repairs failed repeatedly.
We might have used it sub optimally, not sure.

Later, we had to do away with it, and tried to serve most of the read
patterns with materialised views. (currently C*3.0.9)

Currently, for ad hoc queries, we use Spark or a full scan.

Regards,

On Fri, Jan 27, 2017 at 1:03 PM, Yu, John  wrote:

> Thanks a lot. Mind sharing a couple of points where you feel it’s better
> than the alternatives.
>
>
>
> Regards,
>
> John
>
>
>
> *From:* Jonathan Haddad [mailto:j...@jonhaddad.com]
> *Sent:* Thursday, January 26, 2017 2:33 PM
> *To:* user@cassandra.apache.org
> *Subject:* [External] Re: Cassandra ad hoc search options
>
>
>
> > With Cassandra, what are the options for ad hoc query/search similar to
> RDBMS?
>
>
>
> Your best options are Spark w/ the DataStax connector or Presto.
> Cassandra isn't built for ad-hoc queries so you need to use other tools to
> make it work.
>
>
>
> On Thu, Jan 26, 2017 at 2:22 PM Yu, John  wrote:
>
> Hi All,
>
>
>
> Hope I can get some help here. We’re using Cassandra for services, and
> recently we’re adding UI support.
>
> With Cassandra, what are the options for ad hoc query/search similar to
> RDBMS? We love the features of Cassandra but it seems it’s a known
> “weakness” that it doesn’t come with strong support of indexing and ad hoc
> queries. There’re some recent development with SASI as part of secondary
> index. However I heard from a video where it says it shall not be
> extensively used.
>
>
>
> Has anyone have much experience with SASI? How does it compare to Lucene
> plugin?
>
> What is the direction of Apache Cassandra in the search area?
>
>
>
> We’re also looking into Solr or ElasticSearch integration, but it seems it
> might take more efforts, and possibly involve data duplication.
>
> For Solr, we don’t have DSE.
>
> Sorry if this has been asked before, but I haven’t seen a more complete
> answer.
>
>
>
> Thanks!
>
> John
> --
>
>
>


-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Re: parallel processing - splitting data

2017-01-19 Thread siddharth verma
Hi Frank,
You could try this
https://github.com/siddv29/cfs

I have processed 1.2 billion rows in 480 seconds with just 20 threads on
client side.
C* 3.0.9
Nodes = 6
RF = 3

Have a go at it. You might be surprised.

Regards,


On Thu, Jan 19, 2017 at 5:35 PM, Frank Hughes 
wrote:

> Hello there,
>
> I'm running a 4 node cluster of Cassandra 3.9 with a replication factor of
> 4.
>
> I want to be able to run a java process on each node only selecting a 25%
> of the data on each node,
> so i can process all of the data in parallel on each node.
>
> What is the best way to do this with the java driver ?
>
> I was assuming I could retrieve the token ranges for each node and page
> through the data using these ranges, but this includes the replicated data.
> I was hoping there was away of only selecting the data that a node is
> responsible for and avoiding the replicated data.
>
> Many thanks for any help and guidance,
>
> Frank Hughes
>



-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Re: Cassandra Config as per server hardware for heavy write

2016-11-23 Thread siddharth verma
> [...] should be easily able to handle tens of thousands of
> writes / s
>
>
>
> 2016-11-23 8:02 GMT+01:00 Jonathan Haddad :
>
> How are you benchmarking that?
>
> On Tue, Nov 22, 2016 at 9:16 PM Abhishek Kumar Maheshwari <
> abhishek.maheshw...@timesinternet.in> wrote:
>
> Hi,
>
>
>
> I have 8 servers in my Cassandra Cluster. Each server has 64 GB ram and 40
> Cores and 8 SSD. Currently I have below config in Cassandra.yaml:
>
>
>
> concurrent_reads: 32
>
> concurrent_writes: 64
>
> concurrent_counter_writes: 32
>
> compaction_throughput_mb_per_sec: 32
>
> concurrent_compactors: 8
>
>
>
> With this configuration, I can write 1700 Request/Sec per server.
>
>
>
> But our desired write performance is 3000-4000 Request/Sec per server. As
> per my Understanding Max value for these parameters can be as below:
>
> concurrent_reads: 32
>
> concurrent_writes: 128(8*16 Corew)
>
> concurrent_counter_writes: 32
>
> compaction_throughput_mb_per_sec: 128
>
> concurrent_compactors: 8 or 16 (as I have 8 SSD and 16 core reserve for
> this)
>
>
>
> Please let me know this is fine or I need to tune some other parameters
> for speedup write.
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
>
>
> Benjamin Roth
>
> Prokurist
>
>
>
> Jaumo GmbH · www.jaumo.com
>
> Wehrstraße 46 · 73035 Göppingen · Germany
>
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
>
>
>
>
>
>


-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Re: ITrigger - Help

2016-11-11 Thread siddharth verma
I haven't tried CDC either.
I have only read about it at an abstract level and suggested it as an option
for exploration.

We too use a trigger in production to indicate which primary key has been
acted upon (update/insert/delete).
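
For context, a minimal sketch of such a trigger, loosely following the
AuditTrigger example that ships with Cassandra 3.x (the class name, keyspace
and logging details here are placeholders, not our production code):

import java.util.Collection;
import java.util.Collections;

import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.db.partitions.Partition;
import org.apache.cassandra.triggers.ITrigger;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class KeyAuditTrigger implements ITrigger {
    private static final Logger logger = LoggerFactory.getLogger(KeyAuditTrigger.class);

    public Collection<Mutation> augment(Partition update) {
        // every insert/update/delete arrives here as a Partition update;
        // log which partition key was acted upon and return no extra mutations
        String key = update.metadata().getKeyValidator().getString(update.partitionKey().getKey());
        logger.info("Mutation on {}.{} for partition key {}",
                    update.metadata().ksName, update.metadata().cfName, key);
        return Collections.emptyList();
    }
}

The compiled jar goes into Cassandra's triggers directory and is attached with
CREATE TRIGGER ... ON ks.tablename USING 'KeyAuditTrigger';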

Regards

On Sat, Nov 12, 2016 at 12:08 AM, Jonathan Haddad  wrote:

> Using CDC is going to be... difficult.  First off (to my knowledge) all
> you get is a CommitLogReader.  If you take a look at the Mutation class
> (everything is serialized and deserialized there), there's no user
> reference.  You only get a keyspace, key, and a PartitionUpdate, which
> don't include any user information.
>
> Next, you may need to dedupe your messages, since you will get RF messages
> for every mutation.  CDC is per-node, vs triggers which are executed at the
> coordinator level.  This may not apply to you as you only want queries that
> came through cqlsh, but I don't see a reasonable way to differentiate all
> the mutations anyway so I think this is a bust.
>
> I haven't spent a lot of time in this code, happy to be corrected if I'm
> wrong.
>
> Jon
>
>
> On Fri, Nov 11, 2016 at 10:14 AM siddharth verma <
> sidd.verma29.l...@gmail.com> wrote:
>
>> Hi Sathish,
>> You could look into, Change Data Capture (CDC) (
>> https://issues.apache.org/jira/browse/CASSANDRA-8844 .
>> It might help you for some of your requirements.
>>
>> Regards
>> Siddharth Verma
>>
>> On Fri, Nov 11, 2016 at 11:34 PM, Jonathan Haddad 
>> wrote:
>>
>> cqlsh uses the Python driver, I don't see how there would be any way to
>> differentiate where the request came from unless you stuck an extra field
>> in the table that you always write when you're not in cqlsh, or you
>> modified cqlsh to include that field whenever it did an insert.
>>
>> Checking iTrigger source, all you get is a reference to the ColumnFamily
>> and some metadata.  At a glance of trunk, it doesn't look like you get the
>> user that initiated the query.
>>
>> To be honest, I wouldn't do any of this, it feels like it's going to
>> become an error prone mess.  Your best bet is to layer something on top of
>> the driver yourself.  The cleanest way I think think of, long term, is to
>> submit a JIRA / patch to enable some class loading & listener hooks in
>> cqlsh itself.  Without a patch and a really good use case I don't know who
>> would want to maintain that though, as it would lock the team into using
>> Python for cqlsh.
>>
>> Jon
>>
>> On Fri, Nov 11, 2016 at 9:52 AM sat  wrote:
>>
>> Hi,
>>
>> We are planning to use ITrigger to notify changes, when we execute
>> scripts or run commands in cqlsh prompt. If the operation is performed
>> through our application CRUD API, we are planning to handle notification in
>> our CRUD API itself, however if user performs some operation(like write
>> operation in cqlsh prompt) we want to handle those changes and update
>> modules that are listening to those changes.
>>
>> Could you please let us know whether it is possible to differentiate
>> updates done through cqlsh prompt and through application.
>>
>> We also thought about creating multiple users in cassandra and using
>> different user for cqlsh and for the application. If we go with this
>> approach, do we get the user who modified the table in ITrigger
>> implementation (ie., augment method)
>>
>>
>> Basically we are trying to limit/restrict usage of ITrigger just for
>> cqlsh prompt as it is little complex and risky (came to know it will impact
>> cassandra running in that node).
>>
>> Thanks and Regards
>> A.SathishKumar
>>
>>
>>
>>
>> --
>> Siddharth Verma
>> (Visit https://github.com/siddv29/cfs for a high speed cassandra full
>> table scan)
>>
>


-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Re: ITrigger - Help

2016-11-11 Thread siddharth verma
Hi Sathish,
You could look into, Change Data Capture (CDC) (
https://issues.apache.org/jira/browse/CASSANDRA-8844 .
It might help you for some of your requirements.

Regards
Siddharth Verma

On Fri, Nov 11, 2016 at 11:34 PM, Jonathan Haddad  wrote:

> cqlsh uses the Python driver, I don't see how there would be any way to
> differentiate where the request came from unless you stuck an extra field
> in the table that you always write when you're not in cqlsh, or you
> modified cqlsh to include that field whenever it did an insert.
>
> Checking iTrigger source, all you get is a reference to the ColumnFamily
> and some metadata.  At a glance of trunk, it doesn't look like you get the
> user that initiated the query.
>
> To be honest, I wouldn't do any of this, it feels like it's going to
> become an error prone mess.  Your best bet is to layer something on top of
> the driver yourself.  The cleanest way I think think of, long term, is to
> submit a JIRA / patch to enable some class loading & listener hooks in
> cqlsh itself.  Without a patch and a really good use case I don't know who
> would want to maintain that though, as it would lock the team into using
> Python for cqlsh.
>
> Jon
>
> On Fri, Nov 11, 2016 at 9:52 AM sat  wrote:
>
>> Hi,
>>
>> We are planning to use ITrigger to notify changes, when we execute
>> scripts or run commands in cqlsh prompt. If the operation is performed
>> through our application CRUD API, we are planning to handle notification in
>> our CRUD API itself, however if user performs some operation(like write
>> operation in cqlsh prompt) we want to handle those changes and update
>> modules that are listening to those changes.
>>
>> Could you please let us know whether it is possible to differentiate
>> updates done through cqlsh prompt and through application.
>>
>> We also thought about creating multiple users in cassandra and using
>> different user for cqlsh and for the application. If we go with this
>> approach, do we get the user who modified the table in ITrigger
>> implementation (ie., augment method)
>>
>>
>> Basically we are trying to limit/restrict usage of ITrigger just for
>> cqlsh prompt as it is little complex and risky (came to know it will impact
>> cassandra running in that node).
>>
>> Thanks and Regards
>> A.SathishKumar
>>
>>


-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Re: Inconsistencies in materialized views

2016-10-20 Thread siddharth verma
Hi Edward,
Thanks a lot for your help. It helped us narrow down the problem.

Regards


On Mon, Oct 17, 2016 at 9:33 PM, Edward Capriolo 
wrote:

> https://issues.apache.org/jira/browse/CASSANDRA-11198
>
> Which has problems "maybe" fixed by:
>
> https://issues.apache.org/jira/browse/CASSANDRA-11475
>
> Which has it's own set of problems.
>
> One of these patches was merged into 3.7 which tells you are running a
> version 3.6 with known bugs. Also as the feature is "new ish" you should be
> aware that "new ish" major features usually take 4-6 versions to solidify.
>
>
>
> On Mon, Oct 17, 2016 at 3:19 AM, siddharth verma <
> sidd.verma29.l...@gmail.com> wrote:
>
>> Hi,
>> We have a base table with ~300 million entries.
>> And in a recent sanity activity, I saw approx ~33k entries (in one DC)
>> which were in the materialized view, but not in the base table. (reads with
>> quorum, DCAware)
>> (I haven't done it the other way round yet, i.e. entries in base table
>> but not in materialized view)
>>
>> Could someone suggest a possible cause for the same?
>> We saw some glitches in cassandra cluster
>> 1. node down.
>> If this is the case, will repair fix the issue?
>> 2. IOPS maxed out in one DC.
>> 3. Another DC added with some glitches.
>>
>> Could someone suggest how could we replicate inconsistency between base
>> table and materialized view. Any help would be appreciated.
>>
>> C* 3.6
>> Regards
>> SIddharth Verma
>> (Visit https://github.com/siddv29/cfs for a high speed cassandra full
>> table scan)
>>
>
>


-- 
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Re: Scenarios when blocking read repair takes place

2016-10-17 Thread siddharth verma
Sorry Krishna, I didn't get what you were trying to say.

Regards
  SIddharth Verma
  (Visit https://github.com/siddv29/cfs for a high speed cassandra full
table scan)

On Sat, Oct 15, 2016 at 11:50 PM, Krishna Chandra Prajapati <
prajapat...@gmail.com> wrote:

> Hi which side is this?
> Mankapur?
>
> Krishna
>
> On Oct 14, 2016 12:15 PM, "siddharth verma" 
> wrote:
>
>> Hi,
>> Does blocking read repair take place only when we read on the primary key
>> or
>> does it take place in the following scenarios as well?
>>
>> Consistency ALL
>> 1. select * from ks.table_name
>> 2. select * from ks.table_name where token(pk) >= ? and token(pk) <= ?
>>
>> While using manual paging or automatic paging in either of the scenarios.
>>
>> Thanks
>> Siddharth Verma
>> (Visit https://github.com/siddv29/cfs for a high speed cassandra full
>> table scan)
>>
>>


Inconsistencies in materialized views

2016-10-17 Thread siddharth verma
Hi,
We have a base table with ~300 million entries.
And in a recent sanity activity, I saw approx ~33k entries (in one DC) which
were in the materialized view, but not in the base table. (reads with
quorum, DCAware)
(I haven't done it the other way round yet, i.e. entries in base table but
not in materialized view)

Could someone suggest a possible cause for the same?
We saw some glitches in cassandra cluster
1. node down.
If this is the case, will repair fix the issue?
2. IOPS maxed out in one DC.
3. Another DC added with some glitches.

Could someone suggest how could we replicate inconsistency between base
table and materialized view. Any help would be appreciated.

C* 3.6
Regards
SIddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Scenarios when blocking read repair takes place

2016-10-13 Thread siddharth verma
Hi,
Does blocking read repair take place only when we read on the primary key or
does it take place in the following scenarios as well?

Consistency ALL
1. select * from ks.table_name
2. select * from ks.table_name where token(pk) >= ? and token(pk) <= ?

While using manual paging or automatic paging in either of the scenarios.

Thanks
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table
scan)


Re: How to write a trigger in Cassandra to only detect updates of an existing row?

2016-10-04 Thread siddharth verma
Hi,
consider the schema
pk1 text,
ck1 text
v1 text,
v2 text.
PRIMARY KEY(pk1,ck1)

1. insert into ks.tablename(pk1,ck1,v1,v2) values('PK1','CK1','a','a');
2. delete from ks.tablename where pk1='PK2' and ck1='CK2';
3. insert into ks.tablename(pk1,ck1) values('PK3','CK3');
4. insert into ks.tablename(pk1,ck1,v1) values('PK4','CK4','a');

3rd case is "insert of the form when ONLY primary key values are specified"

If you are sure case 3 will never occur from your application, you can
check the length of "next" (as in the code snippet):
next.length() will be greater than zero in cases 1 and 4,
next.length() will be equal to zero in cases 2 and 3.

Thus, in spite of case 3 being an insert, in the code snippet it might appear
to be a delete.


Rephrasing
"If you are sure that your application will NOT do an insert of the form
where ONLY primary key values are specified, you can check the length of
next to indicate whether it is an insert/update (where at least one non
primary key column value is inserted) or a delete if the length is zero."
If you are sure case 3 will never occur,
then by checking next.length() you can decide whether it is an
insert/update (length > 0) OR a delete (length == 0).

I would urge you to try the snippet once on your own, to see what kind of
data it produces in *next*. You could dump the output of next into a column
of an audit table, to see that output.


Regards
Siddharth Verma

On Wed, Oct 5, 2016 at 1:23 AM, Kant Kodali  wrote:

> Hi Siddharth,
>
> I don't quite follow the assumption "If you are sure that your
> application will NOT do an insert of the form when ONLY primary key values
> are specified, you can check the length of next, to indicate whether it is
> an insert/update(where atleast one non primary key column value is
> inserted) or a delete if length is zero.". Could you please provide an
> example ?
>
> Thanks,
> kant
>
>
>
> On Tue, Oct 4, 2016 12:34 PM, siddharth verma sidd.verma29.l...@gmail.com
> wrote:
>
>> Hi,
>> I am not sure whether it will help you or not.
>> Code snippet :
>> public Collection<Mutation> augment(Partition update)
>> {
>> ...
>> StringBuilder next = new StringBuilder();
>> SearchIterator<Clustering, Row> searchIterator =
>> update.searchIterator(ColumnFilter.all(update.metadata()), false);
>> while (searchIterator.hasNext()) {
>> next.append(searchIterator.next(Clustering.EMPTY).toString() + "\001");
>> }
>> ...
>> // next carries the non primary key column values
>> }
>>
>> If you are sure that your application will NOT do an insert of the form
>> when ONLY primary key values are specified, you can check the length of
>> next, to indicate whether it is an insert/update(where atleast one non
>> primary key column value is inserted) or a delete if length is zero.
>>
>> The code snippet is to the best of my knowledge, however, kindly try it
>> once at your end, as this was part of some legacy code, and I am not
>> completely sure about it.
>>
>> Here, if the assumption stated above holds true, you could avoid a
>> cassandra select for that key.
>>
>> Thanks
>> Siddharth Verma
>>
>>
>> On Wed, Oct 5, 2016 at 12:20 AM, Kant Kodali  wrote:
>>
>> Thanks a lot, This helps me to make a decision on not to write one for
>> the performance reasons you pointed out!
>>
>>
>>
>> On Tue, Oct 4, 2016 11:42 AM, Eric Stevens migh...@gmail.com wrote:
>>
>> You would have to perform a SELECT on the row in the trigger code in
>> order to determine if there was underlying data.  Cassandra is in essence
>> an append-only data store, when an INSERT or UPDATE is executed, it has no
>> idea if there is already a row underlying it, and for write performance
>> reasons it also doesn't care.
>>
>> Note that if you do this, you're going to introduce a giant bottleneck in
>> your write path and increase the IO cost of writes.  You'll also probably
>> have some race conditions such that if two writes to the same row happen in
>> quick succession your trigger might not notice that one of them is writing
>> to the same row as the other. You might need to resort to CAS operations to
>> overcome that, along with its associated overhead.  But all that said, it
>> should be possible, though you'll have to write it for yourself in your
>> trigger code.
>>
>>
>>
>> On Tue, Oct 4, 2016 at 12:29 PM Kant Kodali  wrote:
>>
>> Hi all,
>>
>> How to write a trigger in Cassandra to detect updates? My requirement is
>> that I want a trigger to alert me only when there is an update to an
>> existing row and looks like given the way INSERT and Update works this
>> might be hard to do because INSERT will just overwrite if there is an
>> existing row and Update becomes new insert where there is no row that
>> belongs to certain partition key. is there a way to solve this problem?
>>
>> Thanks,
>>
>> kant
>>
>>
>>


Re: How to write a trigger in Cassandra to only detect updates of an existing row?

2016-10-04 Thread siddharth verma
Hi,
I am not sure whether it will help you or not.
Code snippet :
public Collection<Mutation> augment(Partition update)
{
    ...
    StringBuilder next = new StringBuilder();
    SearchIterator<Clustering, Row> searchIterator =
        update.searchIterator(ColumnFilter.all(update.metadata()), false);
    while (searchIterator.hasNext()) {
        next.append(searchIterator.next(Clustering.EMPTY).toString() + "\001");
    }
    ...
    // next carries the non primary key column values
}

If you are sure that your application will NOT do an insert of the form
where ONLY primary key values are specified, you can check the length of
next to indicate whether it is an insert/update (where at least one non
primary key column value is inserted) or a delete if the length is zero.

The code snippet is to the best of my knowledge, however, kindly try it
once at your end, as this was part of some legacy code, and I am not
completely sure about it.

Here, if the assumption stated above holds true, you could avoid a
cassandra select for that key.

Thanks
Siddharth Verma


On Wed, Oct 5, 2016 at 12:20 AM, Kant Kodali  wrote:

> Thanks a lot, This helps me to make a decision on not to write one for the
> performance reasons you pointed out!
>
>
>
> On Tue, Oct 4, 2016 11:42 AM, Eric Stevens migh...@gmail.com wrote:
>
>> You would have to perform a SELECT on the row in the trigger code in
>> order to determine if there was underlying data.  Cassandra is in essence
>> an append-only data store, when an INSERT or UPDATE is executed, it has no
>> idea if there is already a row underlying it, and for write performance
>> reasons it also doesn't care.
>>
>> Note that if you do this, you're going to introduce a giant bottleneck in
>> your write path and increase the IO cost of writes.  You'll also probably
>> have some race conditions such that if two writes to the same row happen in
>> quick succession your trigger might not notice that one of them is writing
>> to the same row as the other. You might need to resort to CAS operations to
>> overcome that, along with its associated overhead.  But all that said, it
>> should be possible, though you'll have to write it for yourself in your
>> trigger code.
>>
>>
>>
>> On Tue, Oct 4, 2016 at 12:29 PM Kant Kodali  wrote:
>>
>> Hi all,
>>
>> How to write a trigger in Cassandra to detect updates? My requirement is
>> that I want a trigger to alert me only when there is an update to an
>> existing row and looks like given the way INSERT and Update works this
>> might be hard to do because INSERT will just overwrite if there is an
>> existing row and Update becomes new insert where there is no row that
>> belongs to certain partition key. is there a way to solve this problem?
>>
>> Thanks,
>>
>> kant
>>
>>


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread siddharth verma
Hi Jon,
It wasn't allowed.
Moreover, someone who isn't familiar with Spark, and might be new to
map/filter/reduce etc. operations, could also use the utility for some simple
operations, assuming a sequential scan of the Cassandra table.

Regards
Siddharth Verma

On Tue, Oct 4, 2016 at 1:32 AM, Jonathan Haddad  wrote:

> Couldn't set up as couldn't get it working, or its not allowed?
>
> On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma <
> verma.siddha...@snapdeal.com> wrote:
>
>> Hi Jon,
>> We couldn't setup a spark cluster.
>>
>> For some use case, a spark cluster was required, but for some reason we
>> couldn't create spark cluster. Hence, one may use this utility to iterate
>> through the entire table at very high speed.
>>
>> Had to find a work around, that would be faster than paging on result set.
>>
>> Regards
>>
>> Siddharth Verma
>> *Software Engineer I - CaMS*
>> *M*: +91 9013689856, *T*: 011 22791596 *EXT*: 14697
>> CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
>> Udyog Vihar Phase - IV, Gurgaon-122016, INDIA
>>
>> On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad 
>> wrote:
>>
>> It almost sounds like you're duplicating all the work of both spark and
>> the connector. May I ask why you decided to not use the existing tools?
>>
>> On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
>> sidd.verma29.l...@gmail.com> wrote:
>>
>> Hi DuyHai,
>> Thanks for your reply.
>> A few more features planned in the next one(if there is one) like,
>> custom policy keeping in mind the replication of token range on specific
>> nodes,
>> fine graining the token range(for more speedup),
>> and a few more.
>>
>> I think, as fine graining a token range,
>> If one token range is split further in say, 2-3 parts, divided among
>> threads, this would exploit the possible parallelism on a large scaled out
>> cluster.
>>
>> And, as you mentioned the JIRA, streaming of request, that would of huge
>> help with further splitting the range.
>>
>> Thanks once again for your valuable comments. :-)
>>
>> Regards,
>> Siddharth Verma
>>
>>
>>


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread Siddharth Verma
Hi Jon,
We couldn't set up a Spark cluster.
For some use cases, a Spark cluster was required, but for some reason we
couldn't create one. Hence, one may use this utility to iterate
through the entire table at very high speed.

We had to find a workaround that would be faster than paging on a result set.

Regards

Siddharth Verma
*Software Engineer I - CaMS*
*M*: +91 9013689856, *T*: 011 22791596 *EXT*: 14697
CA2125, 2nd Floor, ASF Centre-A, Jwala Mill Road,
Udyog Vihar Phase - IV, Gurgaon-122016, INDIA

On Tue, Oct 4, 2016 at 12:41 AM, Jonathan Haddad  wrote:

> It almost sounds like you're duplicating all the work of both spark and
> the connector. May I ask why you decided to not use the existing tools?
>
> On Mon, Oct 3, 2016 at 2:21 PM siddharth verma <
> sidd.verma29.l...@gmail.com> wrote:
>
>> Hi DuyHai,
>> Thanks for your reply.
>> A few more features planned in the next one(if there is one) like,
>> custom policy keeping in mind the replication of token range on specific
>> nodes,
>> fine graining the token range(for more speedup),
>> and a few more.
>>
>> I think, as fine graining a token range,
>> If one token range is split further in say, 2-3 parts, divided among
>> threads, this would exploit the possible parallelism on a large scaled out
>> cluster.
>>
>> And, as you mentioned the JIRA, streaming of request, that would of huge
>> help with further splitting the range.
>>
>> Thanks once again for your valuable comments. :-)
>>
>> Regards,
>> Siddharth Verma
>>
>


Re: An extremely fast cassandra table full scan utility

2016-10-03 Thread siddharth verma
Hi DuyHai,
Thanks for your reply.
A few more features are planned in the next one (if there is one), like:
a custom policy keeping in mind the replication of token ranges on specific
nodes,
fine-graining the token range (for more speedup),
and a few more.

On fine-graining a token range:
if one token range is split further into, say, 2-3 parts divided among
threads, this would exploit the possible parallelism on a large scaled-out
cluster (a sketch follows below).
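
A minimal sketch of that kind of sub-splitting with the DataStax Java driver
3.x (this is not the cfs API itself; it assumes Murmur3Partitioner tokens and
placeholder keyspace/table/column names):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.TokenRange;

public class SplitScan {
    public static void main(String[] args) throws InterruptedException {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            Metadata metadata = cluster.getMetadata();
            PreparedStatement ps = session.prepare(
                "SELECT * FROM my_ks.my_table WHERE token(pk) > ? AND token(pk) <= ?");
            ExecutorService pool = Executors.newFixedThreadPool(20);
            for (TokenRange range : metadata.getTokenRanges()) {
                for (TokenRange unwrapped : range.unwrap()) {           // handle the wrap-around range
                    for (TokenRange split : unwrapped.splitEvenly(3)) { // fine-grain each range into 3 parts
                        pool.submit(() -> session.execute(ps.bind(
                                (Long) split.getStart().getValue(),     // raw Murmur3 token values
                                (Long) split.getEnd().getValue()))
                            .forEach(row -> { /* process the row */ }));
                    }
                }
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }
}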

And, as you mentioned in the JIRA, streaming of requests would be of huge
help with further splitting the range.

Thanks once again for your valuable comments. :-)

Regards,
Siddharth Verma


An extremely fast cassandra table full scan utility

2016-10-03 Thread siddharth verma
Hi,
I was working on a utility which can be used for a Cassandra full table scan
at a tremendously high velocity: cassandra fast full table scan.
How fast?
The script dumped ~229 million rows in 116 seconds, with a cluster of
6 nodes.
Data transfer rates of up to 25 MBps were observed on the Cassandra nodes.

For some use cases, a Spark cluster was required, but for some reason we
couldn't create one. Hence, one may use this utility to iterate
through the entire table at very high speed.

But now, for any full scan, I use it freely in my ad hoc Java programs to
manipulate or aggregate Cassandra data.

You can customize the options, setting fetch size, consistency level, and
degree of parallelism (number of threads) according to your need.

You can visit https://github.com/siddv29/cfs to go through the code, see
the logic behind it, or try it in your program.
A sample program is also provided.

I coded this utility in java.

Bhuvan Rawal(bhu1ra...@gmail.com) and I worked on this concept.
For python you may visit his blog(
http://casualreflections.io/tech/cassandra/python/Multiprocess-Producer-Cassandra-Python)
and github(
https://gist.github.com/bhuvanrawal/93c5ae6cdd020de47e0981d36d2c0785)

Looking forward to your suggestions and comments.

P.S. Give it a try. Trust me, the iteration speed is awesome!!
It is a bare application, built asap. If you would like to contribute to
the java utility, add or build up on it, do reach out
sidd.verma29.li...@gmail.com

Thanks and Regards,
Siddharth Verma
(previous email id on this mailing list : verma.siddha...@snapdeal.com)


Re: Return value of newQueryPlan

2016-09-02 Thread Siddharth Verma
I am debugging an issue in our cluster, trying to find the RCA of it
according to our application behavior.
I used the WhiteList policy (I asked a question about it some time back), but
it was stated that it cannot guarantee the desired behavior.
Yes, I forgot to mention, I was referring to the Java driver.
I used DCAwareRoundRobin and TokenAware policies for the application flow
(a sketch of both setups follows below).
I will ask question 1 on the driver mailing list; it would help if someone
could help with question 2.
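
A minimal sketch of the two setups mentioned above with the Java driver 3.x
(the contact points, the DC name and the pinned node are placeholders):

import java.net.InetSocketAddress;
import java.util.Collections;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

public class LbpExamples {
    public static void main(String[] args) {
        // application flow: token-aware routing on top of DC-aware round robin
        Cluster appCluster = Cluster.builder()
            .addContactPoint("10.0.0.1")
            .withLoadBalancingPolicy(new TokenAwarePolicy(
                DCAwareRoundRobinPolicy.builder().withLocalDc("dc1").build()))
            .build();

        // debugging: restrict the candidate coordinators to a single node
        Cluster debugCluster = Cluster.builder()
            .addContactPoint("10.0.0.1")
            .withLoadBalancingPolicy(new WhiteListPolicy(
                DCAwareRoundRobinPolicy.builder().build(),
                Collections.singletonList(new InetSocketAddress("10.0.0.1", 9042))))
            .build();

        appCluster.close();
        debugCluster.close();
    }
}

Even with a whitelist, the chosen coordinator still reads from whichever
replicas the consistency level requires, which is why it cannot guarantee
that the data is fetched from that particular node.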









On Fri, Sep 2, 2016 at 6:59 PM, Eric Stevens  wrote:

> These sound like driver-side questions that might be better addressed to
> your specific driver's mailing list.  But from the terminology I'd guess
> you're using a DataStax driver, possibly the Java one.
>
> If so, you can look at WhiteListPolicy if you want to target specific
> node(s).  However aside from testing specific scenarios (like performance
> testing coordinated operations) it's unlikely that with a correctly tuned
> LBP, you'll be able to do a better job of node selection than the driver is
> able to.  DCAwareRoundRobin with a child policy of TokenAware will choose a
> primary or replica node for any operations where it can know in advance.
>
> With RF == N like your setup, every piece of data is owned by every node,
> so as long as your LBP is distributive, and outside of performance testing,
> I can't see why you'd be needing to target specific nodes for anything.
>
> On Fri, Sep 2, 2016 at 1:59 AM Siddharth Verma <
> verma.siddha...@snapdeal.com> wrote:
>
>> Hi,
>> I have Dc1(3 nodes), Dc2(3 nodes),
>> RF:3 on both Dcs
>>
>> question 1 : when I create my LoadBalancingPolicy, and override
>> newQueryPlan, the list of hosts from newQueryPlan is the candidate
>> coordinator list?
>>
>> question 2 : Can i force the co-ordintor to hit a particular cassandra
>> node only. I used consistency LOCAL_ONE, but i guess, i doesn't guarantee
>> that data will be fetched from it.
>>
>> Thanks
>> Siddharth Verma
>>
>


Return value of newQueryPlan

2016-09-02 Thread Siddharth Verma
Hi,
I have Dc1(3 nodes), Dc2(3 nodes),
RF:3 on both Dcs

question 1 : when I create my LoadBalancingPolicy, and override
newQueryPlan, the list of hosts from newQueryPlan is the candidate
coordinator list?

question 2 : Can I force the coordinator to hit a particular Cassandra node
only? I used consistency LOCAL_ONE, but I guess it doesn't guarantee that
data will be fetched from it.

Thanks
Siddharth Verma


Re: ServerError: An unexpected error occurred server side; in cassandra java driver

2016-09-01 Thread Siddharth Verma
I debugged the issue a little.
AbstractFuture.get() throws java.util.concurrent.ExecutionException;
in Uninterruptibles.getUninterruptibly, interrupted gets set to true, which
leads to Thread.interrupt();
thus, in DefaultResultSetFuture,
(ResultSet) Uninterruptibles.getUninterruptibly(this) throws the exception.

It would help if someone who might have faced a similar issue could provide
his/her views.

Thanks
Siddharth


Re: ServerError: An unexpected error occurred server side; in cassandra java driver

2016-09-01 Thread Siddharth Verma
Correction : java driver version : 3.1.0


Re: ServerError: An unexpected error occurred server side; in cassandra java driver

2016-09-01 Thread Siddharth Verma
Update :
I page on the ResultSet explicitly in the program(
*boundStatement.setPagingState(PagingState.fromString(currentPageInfo));*)
When i put a *limit 10* after select statement, there is no error, with the
token range set i provided it
However, without that, there is this error, on second token range set.

Thanks
Siddharth


Re: ServerError: An unexpected error occurred server side; in cassandra java driver

2016-09-01 Thread Siddharth Verma
Hi Ben
1. cassandra 3.6
2. driver 3.2
3. Statement : select * from my_ks.my_table1 where token(pk1) >= ? and
token(pk1) <= ?

On Thu, Sep 1, 2016 at 4:56 PM, Ben Slater 
wrote:

> Hi Siddarth,
>
> It would probably help people provide and answer if you let everyone some
> more details like:
> - cassandra version and driver version you are using
> - query that is being executed when the error occurs
> - schema of the table that is being queried
>
> Cheers
> Ben
>
> On Thu, 1 Sep 2016 at 21:19 Siddharth Verma 
> wrote:
>
>> Hi,
>> Could someone help me out with the following exception in cassandra java
>> driver.
>> Why did it occur?
>> MyClass program is paging on the result set.
>>
>> com.datastax.driver.core.exceptions.ServerError: An unexpected error
>> occurred server side on /10.0.230.25:9042: java.lang.AssertionError:
>> [DecoratedKey(3529259302770464040, 53444c373134303435333030),min(
>> 2177391360409801028)]
>> at com.datastax.driver.core.exceptions.ServerError.copy(
>> ServerError.java:63)
>> at com.datastax.driver.core.exceptions.ServerError.copy(
>> ServerError.java:25)
>> at com.datastax.driver.core.DriverThrowables.propagateCause(
>> DriverThrowables.java:37)
>> at com.datastax.driver.core.DefaultResultSetFuture.
>> getUninterruptibly(DefaultResultSetFuture.java:245)
>> at com.datastax.driver.core.AbstractSession.execute(
>> AbstractSession.java:64)
>> at com.personal.trial.MyClass.fetchLoop(MyClass.java:63)
>> at com.personal.trial.MyClass.run(MyClass.java:85)
>> Caused by: com.datastax.driver.core.exceptions.ServerError: An
>> unexpected error occurred server side on /10.0.230.25:9042:
>> java.lang.AssertionError: [DecoratedKey(3529259302770464040,
>> 53444c373134303435333030),min(2177391360409801028)]
>> at com.datastax.driver.core.Responses$Error.asException(
>> Responses.java:108)
>> at com.datastax.driver.core.RequestHandler$
>> SpeculativeExecution.onSet(RequestHandler.java:500)
>> at com.datastax.driver.core.Connection$Dispatcher.
>> channelRead0(Connection.java:1012)
>> at com.datastax.driver.core.Connection$Dispatcher.
>> channelRead0(Connection.java:935)
>> at io.netty.channel.SimpleChannelInboundHandler.channelRead(
>> SimpleChannelInboundHandler.java:105)
>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(
>> AbstractChannelHandlerContext.java:342)
>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(
>> AbstractChannelHandlerContext.java:328)
>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
>> AbstractChannelHandlerContext.java:321)
>> at io.netty.handler.timeout.IdleStateHandler.channelRead(
>> IdleStateHandler.java:266)
>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(
>> AbstractChannelHandlerContext.java:342)
>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(
>> AbstractChannelHandlerContext.java:328)
>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
>> AbstractChannelHandlerContext.java:321)
>> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(
>> MessageToMessageDecoder.java:102)
>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(
>> AbstractChannelHandlerContext.java:342)
>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(
>> AbstractChannelHandlerContext.java:328)
>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
>> AbstractChannelHandlerContext.java:321)
>> at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(
>> ByteToMessageDecoder.java:293)
>> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(
>> ByteToMessageDecoder.java:267)
>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(
>> AbstractChannelHandlerContext.java:342)
>> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(
>> AbstractChannelHandlerContext.java:328)
>> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(
>> AbstractChannelHandlerContext.java:321)
>> at io.netty.channel.DefaultChannelPipeline$HeadContext.chann

ServerError: An unexpected error occurred server side; in cassandra java driver

2016-09-01 Thread Siddharth Verma
Hi,
Could someone help me out with the following exception in cassandra java
driver.
Why did it occur?
MyClass program is paging on the result set.

com.datastax.driver.core.exceptions.ServerError: An unexpected error
occurred server side on /10.0.230.25:9042: java.lang.AssertionError:
[DecoratedKey(3529259302770464040,
53444c373134303435333030),min(2177391360409801028)]
at
com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:63)
at
com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:25)
at
com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
at
com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:64)
at com.personal.trial.MyClass.fetchLoop(MyClass.java:63)
at com.personal.trial.MyClass.run(MyClass.java:85)
Caused by: com.datastax.driver.core.exceptions.ServerError: An unexpected
error occurred server side on /10.0.230.25:9042: java.lang.AssertionError:
[DecoratedKey(3529259302770464040,
53444c373134303435333030),min(2177391360409801028)]
at
com.datastax.driver.core.Responses$Error.asException(Responses.java:108)
at
com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:500)
at
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1012)
at
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:935)
at
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
at
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
at
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
at
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
at
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1280)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
at
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:890)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:564)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:505)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:419)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:391)
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:745)
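
For context, the paging loop in MyClass roughly follows the usual Java driver 3.x
pattern sketched below (session, the query string and process() are placeholders,
not the actual MyClass code); the AssertionError above is raised on the server
while one of these pages is being fetched, not by the paging code itself.

// imports assumed: com.datastax.driver.core.SimpleStatement, Statement, ResultSet, Row
Statement stmt = new SimpleStatement("SELECT * FROM my_keyspace.my_table")
        .setFetchSize(500);                 // rows fetched per page
ResultSet rs = session.execute(stmt);
for (Row row : rs) {
    process(row);                           // advancing past a page boundary fetches the next page
}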

Thanks
Siddharth


Re: Output of "select token from system.local where key = 'local' "

2016-08-30 Thread Siddharth Verma
Hi ,
Can we be sure that the token ranges in nodetool describering will be
non-overlapping?

Thanks
Siddharth Verma


Re: Output of "select token from system.local where key = 'local' "

2016-08-30 Thread Siddharth Verma
Hi Alex,
Thanks for your reply.
I saw describering yesterday, but didn't know "the first endpoint being the
primary". Thanks for that.
Is there any way to get the same information in the application?
If there isn't any way to get the same information at the application layer, I
would use this as a backup.

Siddharth Verma


Re: Output of "select token from system.local where key = 'local' "

2016-08-30 Thread Siddharth Verma
Hi,
I saw that in cassandra-driver-core,(3.1.0) Metadata.TokenMap has
primaryToTokens which has the value for ALL the nodes.
I tried to find (primary)range ownership for nodes in one DC.
And executed the following in debug mode in IDE.

TreeMap<Long, Host> primaryTokenMap = new TreeMap<>();
for (Host host :
        main.cluster.getMetadata().tokenMap.primaryToTokens.keySet()) {
    if (!host.getDatacenter().equals("dc2"))
        continue;
    for (Token token :
            main.cluster.getMetadata().tokenMap.primaryToTokens.get(host)) {
        primaryTokenMap.put((Long) token.getValue(), host);
    }
}
primaryTokenMap // evaluating this printed the map in the "evaluate code fragment" window

dc2 has 3 nodes, RF is 3
Sample entries :
244925668410340093 -> /10.0.3.79:9042
291047688656337660 -> /10.0.3.217:9042
317775761591844910 -> /10.0.3.135:9042
328177243900091789 -> /10.0.3.79:9042
329239043633655596 -> /10.0.3.135:9042


Can I safely assume
Token Range -> Host
244925668410340093 to 291047688656337660 - 1 belongs to 10.0.3.79:9042
291047688656337660 to 317775761591844910 - 1 belongs to 10.0.3.135:9042
317775761591844910 to 328177243900091789 - 1 belongs to 10.0.3.135:9042
And so on.

Is the above assumption ABSOLUTELY correct?
(Kindly suggest changes/errors, if any)
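
As a side note, a minimal sketch of getting range-to-host ownership through the
driver's public metadata API instead of the internal primaryToTokens map (the
keyspace name "my_keyspace" is a placeholder; with RF 3 each range maps to all
three replicas, so this alone does not single out the primary):

// imports assumed: com.datastax.driver.core.Metadata, TokenRange, Host, java.util.Set
Metadata metadata = main.cluster.getMetadata();
for (TokenRange range : metadata.getTokenRanges()) {
    Set<Host> replicas = metadata.getReplicas("my_keyspace", range);
    System.out.println(range.getStart() + " to " + range.getEnd() + " -> " + replicas);
}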

Any help would be great.
Thanks and Regards,
Siddharth Verma


Re: Output of "select token from system.local where key = 'local' "

2016-08-29 Thread Siddharth Verma
Hi Eric,
Thanks for your reply.
I know about V nodes, and token ownership.
If I want to find the token range, how do i do that from this data?
For example, the 1st entry is -1035756551821816651. Which range is it, i.e. is
this the start or the end, and how do I find the other extremity of this range?

Thanks,
Siddharth Verma


Output of "select token from system.local where key = 'local' "

2016-08-29 Thread Siddharth Verma
'2364536724234068419',
'2382591925891908896', '2429124684884597938', '244430838142896474',
'244925668410340093', '2485346011674373180', '2626282359632995454',
'2693462343040966729', '2718655143471808670', '2765243990793471492',
'2789661355953551747', '2792656114228506892', '2875065064454847186',
'2877658606936819232', '3115127498698455075', '3171221149125308871',
'3204314794520771859', '3225785212981795464', '328177243900091789',
'3344530729414517347', '3345238149336998459', '3502198757183164466',
'351101769587756802', '3588504874207249607', '3606382180043621196',
'3687903264019113607', '3808573696234663432', '3826930999611830586',
'3937594202513320004', '3991573227385747358', '4049586397706206888',
'4113277683763393618', '4133779736724357706', '436041502200834273',
'4409977876499706529', '4501676578785778929', '4542194500781241332',
'457304378229441003', '4625447288380261317', '4635627349070688401',
'465311781634413', '4684783897201143715', '4718159652990403843',
'4774133625331605257', '4830998188269931348', '4910055390341907857',
'4998500558397491684', '5023943523343001536', '505744292693224967',
'5162320571738032500', '5277396368371480274', '5346830336223723812',
'5360976982928138315', '5415193046016833300', '5420351854649967783',
'551728115923760971', '5611031614377902472', '564955142261319773',
'5741673630415265730', '5765467955391572803', '5777120745629762657',
'5911733864168126799', '5963370543879046708', '601095846312433394',
'6025232037324913724', '612464281122209', '6130180990267145790',
'614044759087911080', '615625082046151174', '6204536250349648494',
'6254832987275160383', '6299420497233402212', '6330219643037606275',
'6332166655931126455', '649843146990783793', '6546360696742228459',
'6667173768996479693', '6681703316740595756', '6700409845144118031',
'6768174919292984477', '6993240335038495953', '699592662848728421',
'7067614760768641586', '7464549774220819665', '7470203399706056435',
'7517126645125886173', '75179143059671689', '7532146377366822000',
'7668359789169421370', '7693365598138379140', '7693612576353655751',
'771781371808055068', '773878061649801356', '7949652196530716229',
'8101045330184519696', '8131531126072838629', '8140361357245000979',
'8302292937826438542', '8422369105285792829', '8449836158927214743',
'8499955449276612349', '850192426321138914', '8605431687486711156',
'8678800982435339567', '8687105151647096442', '8776316423500407040',
'8953364009441747611', '8979834615817990709', '8984590311736053933',
'9069513190093258312', '9080235839310378242', '9098260908744107307',
'9102600462020381711', '938833944614881458', '987385371860069729',
'990370821724370771'}


Thanks,
Siddharth Verma


Re: Cassandra 3.4 Triggers API

2016-07-29 Thread Siddharth Verma
Hi Jakub,
You can read the mail thread on how to extract clustering columns in
trigger.

https://mail-archives.apache.org/mod_mbox/cassandra-user/201605.mbox/%3CCAAam9ssYf0LvBgJ86M1Phb0ak7=jnh_acoanr8ofov4kvbr...@mail.gmail.com%3E

Extracting partition key for the operation is mentioned in the trigger
example.

Regards,
Siddharth Verma


Re: Cassandra 3.4 Triggers API

2016-07-25 Thread Siddharth Verma
Hi Jakub,
I have worked with triggers; I was auditing by time.
I considered the following partition keys for the audit table.
1. (timeuuid,uuid) OR,
2. formatted date in groups of 5 minutes
21 March 2015, 13:44:15 -> 201503211340

*Used it to batch operations by a group of minutes.*


*Caution: If you have an extremely heavy write/update workload, it may create wide rows.*
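
A minimal sketch of the 5-minute bucketing mentioned in option 2 above (the
method name is illustrative, not from the original mail):

import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

// 21 March 2015, 13:44:15 -> "201503211340"
static String fiveMinuteBucket(Date when) {
    Calendar cal = Calendar.getInstance();
    cal.setTime(when);
    cal.set(Calendar.MINUTE, (cal.get(Calendar.MINUTE) / 5) * 5); // round down to the 5-minute boundary
    return new SimpleDateFormat("yyyyMMddHHmm").format(cal.getTime());
}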
Your second point isn't clear.
If you want the values of non primary key columns to be audited, as is done
with triggers in MySQL, unfortunately, as far as I know, this can't be done.
However, the values of the partition key and clustering columns on which the
update/insert/delete was called can be extracted by using the
UnfilteredRowIterator of the Partition object you receive in the trigger
function.
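
A rough sketch of that extraction against the Cassandra 3.x internal API (treat
the method names as approximate; this API is internal and can shift between
minor versions):

// inside augment(Partition update)
CFMetaData table = update.metadata();
List<ColumnDefinition> clusteringColumns = table.clusteringColumns();
UnfilteredRowIterator it = update.unfilteredIterator();
while (it.hasNext()) {
    Unfiltered unfiltered = it.next();
    if (unfiltered instanceof Row) {
        Clustering clustering = ((Row) unfiltered).clustering();
        for (int i = 0; i < clustering.size(); i++) {
            ColumnDefinition col = clusteringColumns.get(i);
            String value = col.type.getString(clustering.get(i)); // decode the raw ByteBuffer
            // e.g. append col.name + "=" + value to the audit payload
        }
    }
}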

Thanks and Regards,
Siddharth Verma


Re: Performance impact on schema

2016-07-01 Thread Siddharth Verma
If anyone has done a POC for the same, could you please share your views?
Any help would be appreciated.

Thanks
Siddharth Verma


Performance impact on schema

2016-06-30 Thread Siddharth Verma
Hi,
We have a schema where a table has 90 columns.
We have read request patterns where say
api 1 reads column 1-20
api 2 reads column 20-40
api 3 reads column 40-60
api 4 reads column 60-90

And say, we have a write access pattern on the same corresponding group of
columns.

Does the write pattern (overlapping / non-overlapping groups of columns) affect
read performance?

For better performance, should we split the table into 4 sub tables, where
each table is used to serve respective read api?

Thanks and Regards,
Siddharth Verma


Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Siddharth Verma
id is partition key,
f_name is clustering key

We weren't querying on lucene indexes.
lucene index is on id, and f_d_name (another column).


We were facing this issue on production in one column family, due to which
we had to downgrade to 3.0.3


Re: select query on entire primary key returning more than one row in result

2016-06-13 Thread Siddharth Verma
No, all rows were not the same.
Querying only on the partition key gives 20 rows.
In the erroneous result, while querying on partition key and clustering
key, we got 16 of those 20 rows.

And for "*tombstone_threshold"* there isn't any entry at column family
level.

Thanks,
Siddharth Verma


Re: select query on entire primary key returning more than one row in result

2016-06-13 Thread Siddharth Verma
Running nodetool compact fixed the issue.

Could someone help out as to why it occurred?


select query on entire primary key returning more than one row in result

2016-06-13 Thread Siddharth Verma
Hi,
We are facing this issue on production,
We upgraded our cassandra from 3.0.3 to 3.5

When we ran a query with partition key and clustering column (entire primary
key specified), we got 16 rows in return.

We have 2DC's, each with RF 3 for our keyspace.

1. We connected with cqlsh, set consistency to LOCAL_ONE and tracing on, and
saw that we got the correct result on 3 nodes and erroneous results on 3.
Correct result : only 1 row
Erroneous result : 16 rows

2. We executed the statement while specifying only the clustering column
with ALLOW FILTERING, and then we got only one record for that partition
key.

3. While upgrading, we dropped key_cache folder on some, not all.

What could be the causes and how to fix this issue?

We speculate that it might be due to cache.

Any help would be appreciated.

Thanks
Siddharth Verma


Re: Get clustering column in Custom cassandra trigger

2016-05-26 Thread Siddharth Verma
Hi Sam,
Sorry, I couldn't understand.

I am already using
UnfilteredRowIterator unfilteredRowIterator = partition.unfilteredIterator();

while (unfilteredRowIterator.hasNext()) {
    next.append(unfilteredRowIterator.next().toString() + "\001");
}

Is there another way to access it?


Re: Get clustering column in Custom cassandra trigger

2016-05-26 Thread Siddharth Verma
Tried the following as well. Still no result.

update.metadata().clusteringColumns().toString()  -> gets the clustering column names
update.columns().toString()                       -> gets the non primary key columns
update.partitionKey().toString()                  -> gets the token range

Any help would be appreciated.

Thanks
Siddharth Verma


Get clustering column in Custom cassandra trigger

2016-05-25 Thread Siddharth Verma
hi,
I am creating a trigger in cassandra
---
public class GenericAuditTrigger implements ITrigger
{

private static SimpleDateFormat dateFormatter = new SimpleDateFormat("yyyy/MM/dd");

public Collection<Mutation> augment(Partition update)
{
String auditKeyspace = "test";
String auditTable = "audit";

RowUpdateBuilder audit = new
RowUpdateBuilder(Schema.instance.getCFMetaData(auditKeyspace, auditTable),
FBUtilities.timestampMicros(),
UUIDGen.getTimeUUID())
.clustering(dateFormatter.format(new
Date()),update.metadata().ksName,update.metadata().cfName,UUID.randomUUID());


audit.add("primary_key",update.metadata().getKeyValidator().getString(update.partitionKey().getKey()));

UnfilteredRowIterator unfilteredRowIterator =
update.unfilteredIterator();
StringBuilder next=new StringBuilder();
while(unfilteredRowIterator.hasNext()){
next.append(unfilteredRowIterator.next().toString()+"\001");
}

audit.add("values",
next.length()==0?null:next.deleteCharAt(next.length()-1).toString()+";"+update.columns().toString());

return Collections.singletonList(audit.build());
}
}

---
CREATE TABLE test.test (pk1 text, pk2 text, ck1 text, ck2 text, v1 text, v2
text, PRIMARY KEY((pk1,pk2),ck1,ck2));
---
CREATE TABLE test.audit (
timeuuid timeuuid,
date text,
keyspace_name text,
table_name text,
uuid UUID,
primary_key text,
values text,
PRIMARY KEY (timeuuid, date, keyspace_name, table_name, uuid));
---



*How to get clustering column values in trigger?*

insert into test(pk1, pk2, ck1, ck2, v1, v2) VALUES ('pk1','pk2','ck1','ck2_del','v1','v2');

select * from audit;

timeuuid  | 0d117390-227e-11e6-9d80-dd871f2f22d2
date| 2016/05/25
keyspace_name  | test
table_name | test
uuid| df274fc0-4362-42b1-a3bf-0030f8d2062f
primary_key| pk1:pk2
values| [[v1=v1 ts=1464184100769315], [v2=v2
ts=1464184100769315]]


How to audit ck1 and ck2 also?

Thanks,
Siddharth Verma


Re: [C*3.0.3]lucene indexes not deleted and nodetool repair makes DC unavailable

2016-05-17 Thread Siddharth Verma
Hi Eduardo,
Thanks for your reply. If it is fixed in 3.0.5.1, we will shift to it.

One more question,
If, instead of truncating the table, we remove some rows, are the lucene
documents and index entries for those rows deleted?


Re: IF EXISTS checks on all nodes?

2016-05-12 Thread Siddharth Verma
Hi, I missed out on some info
node 1,2,3 are in DC1
node 4,5,6 are in DC2
and RF is 3
so all data is on all nodes


@Carlos : There was only one query. And yes all nodes have same data for
col5 only
node 6 has
P1,100,A,val1,w1
P1,100,B,val2,w2
P1,200,C,val3,w_x
P1,200,D,val4,w4

node 1,2,3,4,5 have
P1,100,A,val1,w1
P1,100,B,val2,w2
P1,200,C,null,w_x

So, when "consistency all" in cqlsh
1.IF EXISTS is checked ON EVERY NODE before applying, and if it is true on
all, ONLY then it is applied
OR
2. IF EXISTS was true on one, so applied on all.


IF EXISTS checks on all nodes?

2016-05-12 Thread Siddharth Verma
Hi,
If i have inconsistent data on nodes
Scenario :
I have 2 DCs each with 3 nodes
and I have inconsistent data on them

node 1,2,3,4,5 have
P1,100,A,val1,w1
P1,100,B,val2,w2

node 6 has
P1,100,A,val1,w1
P1,100,B,val2,w2
P1,200,C,val3,w3
P1,200,D,val4,w4

col1, col2, col3,col4,col5 in table
Primary key (col1, col2, col3)

Now i execute the query from CQLSH
update mykeyspace.my_table_1 set col5 = 'w_x' where col1='P1' and col2=200
and col3='C' IF EXISTS;

Is it possible that
node 1,2,3,4,5 will get the entry
P1,200,C,null,w_x

I.e., is IF EXISTS checked per node, or only once and then executed on all?

Thanks
Siddharth Verma


Re: [C*3.0.3]lucene indexes not deleted and nodetool repair makes DC unavailable

2016-05-08 Thread Siddharth Verma
Hi Eduardo,
Thanks for your help on stratio index problem

As per your questions.

1. We ran nodetool repair on one box(no range repair), but due to it,
entire DC was non responsive.
It was up, but we were not able to connect.

2. RF is 3, and we have 2 DCs each with 3 nodes.

3. Consistency level for writes is Local_Quorum.

Thanks
Siddharth Verma


Re: Read data from specific node in cassandra

2016-05-06 Thread Siddharth Verma
@Joseph,
An incident we saw in production, and a speculation as to how it might
have occurred.

*A detailed description of use case*

*Incident*
We have a 2 DCs each with three nodes.
And our keyspace has RF 3 per DC. read_repair_chance is 0.0 for all the
tables.
After a while(we run periodic full table scans to dump data someplace
else), we saw corrupted data being dumped.
We copied the ss tables of all node of one DC to a separate cluster created
for debugging.
 We shut down two nodes of the replica cluster, so that only one was up,
and made queries in cqlsh for the possibly corrupted data.
 What we saw was: out of the three replica nodes, two had similar
data, and one had some extra data which shouldn't have been there for that
particular partition key.



*Speculation*
A possible cause we could come up with was that, on a particular day, one of the
nodes of the production DC might have gone down, and that downtime might have
exceeded the hinted_handoff_window.
Say the node went down at 12PM.
Coordinator nodes stored hints from 12PM - 3PM.
The node was started at 6PM.
All deletions/updates from 3PM - 6PM were not applied on our particular node.
And repair wasn't run on that node. After 10 days, tombstones were
deleted (gc_grace_seconds).
Now that particular node still has data which was missed in deletion, and
the data has been removed from other two nodes.
So, we can't run repair now.

Again, it is a possible speculation. We are not sure. This is the only
cause we could come up with


@User
Back to the requirement "*Read data from specific node in cassandra*"
I prematurely stated that the whitelist worked *perfectly*. However, while scanning
the data, that isn't the case; it has produced an ambiguous data dump.
This option didn't work for debugging.
Could someone suggest other alternatives?


[C*3.0.3]lucene indexes not deleted and nodetool repair makes DC unavailable

2016-05-06 Thread Siddharth Verma
Hi,
I have 2 queries. We are using cassandra dsc 3.0.3 and stratio lucene
indexes on tables.

1. When a table is truncated, the lucene index is not cleared for the same; we
see that it still occupies space on disk.

2. When we run nodetool repair, all nodes are up (nodetool status) but we
can't connect to any of the nodes in the same DC.

Any help would be appreciated

Thanks
Siddharth Verma


Re: Read data from specific node in cassandra

2016-05-06 Thread Siddharth Verma
Hi,
Whitelist worked perfectly.
Thanks for the help.

In case someone wants to use the same, the below code snippet might help
them:


private final Cluster mainCluster;
private final Session mainSession;
. . . . . . . . . . .
. . . . . . . . . . .
String mainHost = "IP_of_machine";
. . . . . . . . . . .
mainCluster = Cluster.builder()
        .addContactPoint(mainHost)
        .withQueryOptions(new QueryOptions().setFetchSize(fetchSize))
        .withCredentials(username, password)
        .withLoadBalancingPolicy(new WhiteListPolicy(
                new RoundRobinPolicy(),
                Arrays.asList(new InetSocketAddress(mainHost, 9042))))
        .build();
mainSession = mainCluster.connect();


Regards,
SIddharth Verma

On Thu, May 5, 2016 at 8:59 PM, Jeff Jirsa 
wrote:

> This doesn’t actually guarantee the behavior you think it does. There’s no
> actual way to guarantee this behavior in Cassandra, as far as I can tell. A
> long time ago there was a ticket for a “coordinator only” consistency
> level, which is nearly trivial to implement, but the use case is so narrow
> that it’s unlikely to ever be done.
>
> Here’s an example trace on system_auth, where all nodes are replicas
> (RF=N), and the data is fully repaired (data exists on the local node). The
> coordinator STILL chooses a replica other than itself (far more likely to
> see this behavior on a keyspace with a HIGH replication factor, this
> particular cluster is N in the hundreds):
>
> cqlsh> tracing on;
> Tracing is already enabled. Use TRACING OFF to disable.
> cqlsh> CONSISTENCY local_one;
> Consistency level set to LOCAL_ONE.
> cqlsh> select name from system_auth.users where name='jjirsa' limit 1;
>
>  name
> 
>  jjirsa
>
> (1 rows)
>
> Tracing session: 5ffdee70-12d5-11e6-ad58-317180027532
>
>  activity
>| timestamp  | source   |
> source_elapsed
>
> -++--+
>
> Execute CQL3 query | 2016-05-05 15:23:52.919000 | x.y.z.150 |
>0
>   Parsing select * from system_auth.users where name='jjirsa' limit 1;
> [SharedPool-Worker-7] | 2016-05-05 15:23:52.919000 | x.y.z.150 |
>  100
>Preparing statement
> [SharedPool-Worker-7] | 2016-05-05 15:23:52.92 | x.y.z.150 |
>  194
>reading data from /x.y.z.151
> [SharedPool-Worker-7] | 2016-05-05 15:23:52.92 | x.y.z.150 |
>  965
>  Sending READ message to /x.y.z.151
> [MessagingService-Outgoing-/x.y.z.151] | 2016-05-05 15:23:52.921000 |
> x.y.z.150 |   1072
>   REQUEST_RESPONSE message received from /x.y.z.151
> [MessagingService-Incoming-/x.y.z.151] | 2016-05-05 15:23:52.924000 |
> x.y.z.150 |   5433
> Processing response from /x.y.z.151
> [SharedPool-Worker-4] | 2016-05-05 15:23:52.924000 | x.y.z.150 |
> 5595
>   READ message received from /x.y.z.150
> [MessagingService-Incoming-/x.y.z.150] | 2016-05-05 15:23:52.927000 |
> x.y.z.151 |104
>  Executing single-partition query on users
> [SharedPool-Worker-6] | 2016-05-05 15:23:52.928000 | x.y.z.151 |
> 2251
>   Acquiring sstable references
> [SharedPool-Worker-6] | 2016-05-05 15:23:52.929000 | x.y.z.151 |
> 2353
>Merging memtable tombstones
> [SharedPool-Worker-6] | 2016-05-05 15:23:52.929000 | x.y.z.151 |
> 2414
>   Partition index with 0 entries found for sstable 384
> [SharedPool-Worker-6] | 2016-05-05 15:23:52.929000 | x.y.z.151 |
> 2829
>Seeking to partition beginning in data file
> [SharedPool-Worker-6] | 2016-05-05 15:23:52.93 | x.y.z.151 |
> 2913
>  Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
> [SharedPool-Worker-6] | 2016-05-05 15:23:52.93 | x.y.z.151 |
> 3263
> Merging data from memtables and 1 sstables
> [SharedPool-Worker-6] | 2016-05-05 15:23:52.931000 | x.y.z.151 |
> 3289
>  Read 1 live and 0 tombstone cells
> [SharedPool-Worker-6] | 2016-05-05 15:23:52.931000 | x.y.z.151 |
> 3323
>Enqueuing response to /x.y.z.150
> [SharedPool-Worker-6] | 2016-05-05 15:23:52.931000 | x.y.z.151 |
> 3411
>  Sending REQUEST_RESPONSE message to /x.y.z.150
> [MessagingService-Outgoing-/x.y.z.150] | 2016-05-05 15:23:52.932000 |
> x.y.z.151 |   3577
>
>   Request complete

Read data from specific node in cassandra

2016-05-05 Thread Siddharth Verma
Hi,
We have a 3 node cluster in DC1, where replication factor of keyspace is 3.
How can I read data from only one particular node with the java driver?

Thanks,
Siddharth Verma


Re: Query regarding spark on cassandra

2016-04-28 Thread Siddharth Verma
Anyways, thanks for your reply.


On Thu, Apr 28, 2016 at 1:59 PM, Hannu Kröger  wrote:

> Ok, then I don’t understand the problem.
>
> Hannu
>
> On 28 Apr 2016, at 11:19, Siddharth Verma 
> wrote:
>
> Hi Hannu,
>
> Had the issue been caused due to read, the insert, and delete statement
> would have been erroneous.
> "I saw the stdout from web-ui of spark, and the query along with true was
> printed for both the queries.".
> The statements were correct as seen on the UI.
> Thanks,
> Siddharth Verma
>
>
>
> On Thu, Apr 28, 2016 at 1:22 PM, Hannu Kröger  wrote:
>
>> Hi,
>>
>> could it be consistency level issue? If you use ONE for reads and writes,
>> might be that sometimes you don't get what you are writing.
>>
>> See:
>>
>> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>>
>> Br,
>> Hannu
>>
>>
>> 2016-04-27 20:41 GMT+03:00 Siddharth Verma 
>> :
>>
>>> Hi,
>>> I dont know, if someone has faced this problem or not.
>>> I am running a job where some data is loaded from cassandra table. From
>>> that data, i make some insert and delete statements.
>>> and execute it (using forEach)
>>>
>>> Code snippet:
>>> boolean deleteStatus=
>>> connector.openSession().execute(delete).wasApplied();
>>> boolean  insertStatus =
>>> connector.openSession().execute(insert).wasApplied();
>>> System.out.println(delete+":"+deleteStatus);
>>> System.out.println(insert+":"+insertStatus);
>>>
>>> When i run it locally, i see the respective results in the table.
>>>
>>> However when i run it on a cluster, sometimes the result is displayed
>>> and sometime the changes don't take place.
>>> I saw the stdout from web-ui of spark, and the query along with true was
>>> printed for both the queries.
>>>
>>> I can't understand, what could be the issue.
>>>
>>> Any help would be appreciated.
>>>
>>> Thanks,
>>> Siddharth Verma
>>>
>>
>>
>
>


Re: Query regarding spark on cassandra

2016-04-28 Thread Siddharth Verma
Hi Hannu,

Had the issue been caused by the read, the insert and delete statements
would have been erroneous.
"I saw the stdout from web-ui of spark, and the query along with true was
printed for both the queries.".
The statements were correct as seen on the UI.
Thanks,
Siddharth Verma



On Thu, Apr 28, 2016 at 1:22 PM, Hannu Kröger  wrote:

> Hi,
>
> could it be consistency level issue? If you use ONE for reads and writes,
> might be that sometimes you don't get what you are writing.
>
> See:
>
> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>
> Br,
> Hannu
>
>
> 2016-04-27 20:41 GMT+03:00 Siddharth Verma :
>
>> Hi,
>> I dont know, if someone has faced this problem or not.
>> I am running a job where some data is loaded from cassandra table. From
>> that data, i make some insert and delete statements.
>> and execute it (using forEach)
>>
>> Code snippet:
>> boolean deleteStatus=
>> connector.openSession().execute(delete).wasApplied();
>> boolean  insertStatus =
>> connector.openSession().execute(insert).wasApplied();
>> System.out.println(delete+":"+deleteStatus);
>> System.out.println(insert+":"+insertStatus);
>>
>> When i run it locally, i see the respective results in the table.
>>
>> However when i run it on a cluster, sometimes the result is displayed and
>> sometime the changes don't take place.
>> I saw the stdout from web-ui of spark, and the query along with true was
>> printed for both the queries.
>>
>> I can't understand, what could be the issue.
>>
>> Any help would be appreciated.
>>
>> Thanks,
>> Siddharth Verma
>>
>
>


Re: Discrepancy while paging through table, and static column updated inbetween

2016-04-28 Thread Siddharth Verma
Hi Tyler,
I have created a jira for another issue which we have encountered. It is not
limited only to our speculation about the static column update.
https://issues.apache.org/jira/browse/CASSANDRA-11680

Thanks


On Tue, Apr 19, 2016 at 10:37 PM, Tyler Hobbs  wrote:

> This sounds similar to
> https://issues.apache.org/jira/browse/CASSANDRA-10010, but that only
> affected 2.x.  Can you open a Jira ticket with your table schema, the
> problematic query, and the details you posted here?
>
> On Tue, Apr 19, 2016 at 10:25 AM, Siddharth Verma <
> verma.siddha...@snapdeal.com> wrote:
>
>> Hi,
>>
>> We are using cassandra(dsc3.0.3) on production.
>>
>> For some purpose, we were doing a full table scan (setPagingState and
>> getPagingState used on ResultSet in java program), and there has been some
>> discrepancy when we ran the same job multiple times.
>> Each time some new data was added to the output, and some was left out.
>>
>> Side Note 1 :
>> Table structure
>> col1, col2, col3, col4, col5, col6
>> Primary key(col1, col2)
>> col5 is static column
>> col6 static column. Used to explicitly store updated time when col5
>> changed
>>
>>
>> Sample Data
>> 1,A,AA,AAA,STATIC,T1
>> 1,B,BB,BBB,STATIC,T1
>> 1,C,CC,CCC,STATIC,T1
>> 1,D,DD,DDD,STATIC,T1
>>
>> For some key, sometime col6 was updated while the job was running, so
>> some values were not printed for that partition key.
>>
>> Side Note 2 :
>> we did -> select col6, writetime(col6) from ... where col1=... and
>> col2=...
>> For the data that was missed out to make sure that particular entry
>> wasn't added later.
>>
>>
>> Side Note 3:
>> The above scenario that some col6 was updated while job was running,
>> therefore some entry for that partition key was ignored, is an assumption
>> from our end.
>> We can't understand why some entries were not printed in the table scan.
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


Re: Query regarding spark on cassandra

2016-04-27 Thread Siddharth Verma
Edit:
1. dc2 node has been removed.
nodetool status shows only active nodes.
2. Repair done on all nodes.
3. Cassandra restarted

Still it doesn't solve the problem.

On Thu, Apr 28, 2016 at 9:00 AM, Siddharth Verma <
verma.siddha...@snapdeal.com> wrote:

> Hi, If the info could be used
> we are using two DCs
> dc1 - 3 nodes
> dc2 - 1 node
> however, dc2 has been down for 3-4 weeks, and we haven't removed it yet.
>
> spark slaves on same machines as the cassandra nodes.
> each node has two instances of slaves.
>
> spark master on a separate machine.
>
> If anyone could provide insight to the problem, it would be helpful.
>
> Thanks
>
> On Wed, Apr 27, 2016 at 11:11 PM, Siddharth Verma <
> verma.siddha...@snapdeal.com> wrote:
>
>> Hi,
>> I dont know, if someone has faced this problem or not.
>> I am running a job where some data is loaded from cassandra table. From
>> that data, i make some insert and delete statements.
>> and execute it (using forEach)
>>
>> Code snippet:
>> boolean deleteStatus=
>> connector.openSession().execute(delete).wasApplied();
>> boolean  insertStatus =
>> connector.openSession().execute(insert).wasApplied();
>> System.out.println(delete+":"+deleteStatus);
>> System.out.println(insert+":"+insertStatus);
>>
>> When i run it locally, i see the respective results in the table.
>>
>> However when i run it on a cluster, sometimes the result is displayed and
>> sometime the changes don't take place.
>> I saw the stdout from web-ui of spark, and the query along with true was
>> printed for both the queries.
>>
>> I can't understand, what could be the issue.
>>
>> Any help would be appreciated.
>>
>> Thanks,
>> Siddharth Verma
>>
>
>


Re: Query regarding spark on cassandra

2016-04-27 Thread Siddharth Verma
Hi, in case this info is useful:
we are using two DCs
dc1 - 3 nodes
dc2 - 1 node
however, dc2 has been down for 3-4 weeks, and we haven't removed it yet.

spark slaves on same machines as the cassandra nodes.
each node has two instances of slaves.

spark master on a separate machine.

If anyone could provide insight to the problem, it would be helpful.

Thanks

On Wed, Apr 27, 2016 at 11:11 PM, Siddharth Verma <
verma.siddha...@snapdeal.com> wrote:

> Hi,
> I dont know, if someone has faced this problem or not.
> I am running a job where some data is loaded from cassandra table. From
> that data, i make some insert and delete statements.
> and execute it (using forEach)
>
> Code snippet:
> boolean deleteStatus= connector.openSession().execute(delete).wasApplied();
> boolean  insertStatus =
> connector.openSession().execute(insert).wasApplied();
> System.out.println(delete+":"+deleteStatus);
> System.out.println(insert+":"+insertStatus);
>
> When i run it locally, i see the respective results in the table.
>
> However when i run it on a cluster, sometimes the result is displayed and
> sometime the changes don't take place.
> I saw the stdout from web-ui of spark, and the query along with true was
> printed for both the queries.
>
> I can't understand, what could be the issue.
>
> Any help would be appreciated.
>
> Thanks,
> Siddharth Verma
>


Query regarding spark on cassandra

2016-04-27 Thread Siddharth Verma
Hi,
I don't know if someone has faced this problem or not.
I am running a job where some data is loaded from a cassandra table. From
that data, I make some insert and delete statements
and execute them (using forEach).

Code snippet:
boolean deleteStatus= connector.openSession().execute(delete).wasApplied();
boolean  insertStatus =
connector.openSession().execute(insert).wasApplied();
System.out.println(delete+":"+deleteStatus);
System.out.println(insert+":"+insertStatus);

When i run it locally, i see the respective results in the table.

However, when I run it on a cluster, sometimes the result is displayed and
sometimes the changes don't take place.
I saw the stdout from web-ui of spark, and the query along with true was
printed for both the queries.

I can't understand, what could be the issue.

Any help would be appreciated.

Thanks,
Siddharth Verma


Discrepancy while paging through table, and static column updated inbetween

2016-04-19 Thread Siddharth Verma
Hi,

We are using cassandra(dsc3.0.3) on production.

For some purpose, we were doing a full table scan (setPagingState and
getPagingState used on ResultSet in java program), and there has been some
discrepancy when we ran the same job multiple times.
Each time some new data was added to the output, and some was left out.

Side Note 1 :
Table structure
col1, col2, col3, col4, col5, col6
Primary key(col1, col2)
col5 is a static column
col6 is a static column, used to explicitly store the updated time when col5 changed


Sample Data
1,A,AA,AAA,STATIC,T1
1,B,BB,BBB,STATIC,T1
1,C,CC,CCC,STATIC,T1
1,D,DD,DDD,STATIC,T1

For some keys, col6 was sometimes updated while the job was running, so some
values were not printed for that partition key.

Side Note 2 :
we did -> select col6, writetime(col6) from ... where col1=... and col2=...
For the data that was missed out to make sure that particular entry wasn't
added later.


Side Note 3:
The above scenario, that col6 was updated while the job was running and
therefore some entries for that partition key were ignored, is an assumption
on our end.
We can't understand why some entries were not printed in the table scan.
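
For reference, a minimal sketch of the setPagingState/getPagingState scan
described above, with the Java driver 3.x (the query string, page size and
process() are placeholders, not the actual job code). One known caveat of this
pattern is that the paged read is not a consistent snapshot, so rows changed
between page fetches may appear or disappear:

Statement stmt = new SimpleStatement("SELECT * FROM my_keyspace.my_table")
        .setFetchSize(500);
PagingState pagingState = null;
while (true) {
    if (pagingState != null) {
        stmt.setPagingState(pagingState);             // resume from the saved position
    }
    ResultSet rs = session.execute(stmt);
    int remaining = rs.getAvailableWithoutFetching(); // rows in the current page only
    for (Row row : rs) {
        process(row);
        if (--remaining == 0) {
            break;                                    // stop before the driver fetches the next page
        }
    }
    pagingState = rs.getExecutionInfo().getPagingState();
    if (pagingState == null) {
        break;                                        // last page reached
    }
}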


Query regarding CassandraJavaRDD while running spark job on cassandra

2016-03-11 Thread Siddharth Verma
In cassandra I have a table with the following schema.

CREATE TABLE my_keyspace.my_table1 (
col_1 text,
col_2 text,
col_3 text,
col_4 text,
col_5 text,
col_6 text,
col_7 text,
PRIMARY KEY (col_1, col_2, col_3)
) WITH CLUSTERING ORDER BY (col_2 ASC, col_3 ASC);

For processing I create a spark job.

CassandraJavaRDD<CassandraRow> data1 =
function.cassandraTable("my_keyspace", "my_table1");


1. Does it guarantee mutual exclusivity of fetched rows across all RDDs
which are on worker nodes?
(At the cost of redundancy and verbosity, I will reiterate.
Suppose I have an entry in the table : ('1','2','3','4','5','6','7')
What I mean to ask is, when I perform transformations/actions on data1
RDD), can I be sure that the above entry will be present on ONLY ONE worker
node?)

2. All the data pertaining to one partition will be on one node?
(Suppose I have the following entries in the table :
('p1','c2_1','c3_1','4','5','6','7')
('p1','c2_2','c3'_2,'4','5','6','7')
('p1','c2_3','c3_3','4','5','6','7')
('p1','c2_4','c3_4','4','5','6','7')
('p1' )
('p1' )
('p1' )
All the data for the same partition will be present on only one node?
)

3. If I have a DC specifically for analytics, and I place the spark workers
on the same machines as the cassandra nodes, for that entire DC,
can I make sure that each spark worker fetches the data from the token ranges
present on its own node? (I.e. the node doesn't fetch data present on a
different node.)
3.1 (as with the above statement which doesn't have a 'where' clause).
3.2 (as with the above statement which has a 'where' clause).


Query regarding filter and where in spark on cassandra

2016-03-07 Thread Siddharth Verma
Hi,
While working with spark running on top of cassandra, I wanted to do some
filtering on data.
It can be done either on server side(where clause while cassandraTable
query is written) or on client side(filter transformation on rdd).
Which one of them is preferred keeping performance and time in mind?

I am using spark java connector.


*References:*
*1.* https://github.com/datastax/spark-cassandra-connector/blob/master/doc/7_java_api.md
Note: See the description of filtering
<https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md>
to understand the limitations of the where method.
*2.* https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md
"To filter rows, you can use the filter transformation provided by Spark ... To
avoid this overhead, CassandraRDD offers the where method, which lets you pass
arbitrary CQL condition(s) to filter the row set on the server."
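
A hedged illustration of both approaches (reusing the my_keyspace.my_table1
schema from the earlier mail; "sc" is a JavaSparkContext and the filter value is
a placeholder; javaFunctions comes from
com.datastax.spark.connector.japi.CassandraJavaUtil):

// 1. Server side: the predicate is appended to the CQL the connector issues, so
//    Cassandra does not ship non-matching rows to Spark (subject to the pushdown
//    limitations described in the docs linked above).
CassandraJavaRDD<CassandraRow> serverSide =
        javaFunctions(sc)
                .cassandraTable("my_keyspace", "my_table1")
                .where("col_2 = ?", "some_value");

// 2. Client side: every row is read into Spark first and only then dropped by the filter.
JavaRDD<CassandraRow> clientSide =
        javaFunctions(sc)
                .cassandraTable("my_keyspace", "my_table1")
                .filter(row -> "some_value".equals(row.getString("col_2")));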

Thanks and Regards

Siddharth Verma
