Re: Filling in the blank To Do sections on the Apache Cassandra web site

2018-02-27 Thread George Webster
This is an awesome effort. Thank you

Sent from my iPhone

> On Feb 27, 2018, at 1:17 PM, Carl Mueller  
> wrote:
> 
> Nice thanks
> 
> 
>> On Tue, Feb 27, 2018 at 12:03 PM, Jon Haddad  wrote:
>> There’s a section dedicated to contributing to Cassandra documentation in 
>> the docs as well: 
>> https://cassandra.apache.org/doc/latest/development/documentation.html
>> 
>> 
>> 
>>> On Feb 27, 2018, at 9:55 AM, Kenneth Brotman  
>>> wrote:
>>> 
>>> I was just getting ready to install sphinx.  Cool.  
>>>  
>>> From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad
>>> Sent: Tuesday, February 27, 2018 9:51 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: Filling in the blank To Do sections on the Apache Cassandra 
>>> web site
>>>  
>>> The docs have been in tree for years :)
>>>  
>>> https://github.com/apache/cassandra/tree/trunk/doc
>>>  
>>> There’s even a docker image to build them so you don’t need to mess with 
>>> sphinx.  Check the README for instructions.
>>>  
>>> Jon
>>> 
>>> 
>>> On Feb 27, 2018, at 9:49 AM, Carl Mueller  
>>> wrote:
>>>  
>>> If there was a github for the docs, we could start posting content to it 
>>> for review. Not sure what the review/contribution process is on Apache. 
>>> Google searches on apache documentation and similar run into lots of noise 
>>> from actual projects.
>>> 
>>> I wouldn't mind trying to do a little doc work on the regular if there was 
>>> a wiki, a proven means to do collaborative docs. 
>>> 
>>>  
>>> On Tue, Feb 27, 2018 at 11:42 AM, Kenneth Brotman 
>>>  wrote:
>>> It’s just content for web pages.  There isn’t a working outline or any 
>>> draft on any of the JIRAs yet.  I like to keep things simple.  Did I miss 
>>> something?  What does it matter right now?
>>>  
>>> Thanks Carl,
>>>  
>>> Kenneth Brotman
>>>  
>>> From: Carl Mueller [mailto:carl.muel...@smartthings.com] 
>>> Sent: Tuesday, February 27, 2018 8:50 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: Filling in the blank To Do sections on the Apache Cassandra 
>>> web site
>>>  
>>> so... are those pages in the code tree of github? I don't see them or a 
>>> directory structure under /doc. Is mirroring the documentation between the 
>>> apache site and a github source a big issue?
>>>  
>>> On Tue, Feb 27, 2018 at 7:50 AM, Kenneth Brotman 
>>>  wrote:
>>> I was debating that.  Splitting it up into smaller tasks makes each one 
>>> seem less overwhelming.  
>>>  
>>> Kenneth Brotman
>>>  
>>> From: Josh McKenzie [mailto:jmckenzie@apache.org] 
>>> Sent: Tuesday, February 27, 2018 5:44 AM
>>> To: cassandra
>>> Subject: Re: Filling in the blank To Do sections on the Apache Cassandra 
>>> web site
>>>  
>>> Might help, organizationally, to put all these efforts under a single 
>>> ticket of "Improve web site Documentation" and add these as sub-tasks. 
>>> Should be able to do that translation post-creation (i.e. in its current 
>>> state) if that's something that makes sense to you.
>>>  
>>> On Mon, Feb 26, 2018 at 5:24 PM, Kenneth Brotman 
>>>  wrote:
>>> Here are the related JIRAs.  Please add content even if it’s not well 
>>> formed compositionally.  I or someone else will take it from there.
>>>  
>>> https://issues.apache.org/jira/browse/CASSANDRA-14274  The troubleshooting 
>>> section of the web site is empty
>>> https://issues.apache.org/jira/browse/CASSANDRA-14273  The Bulk Loading web 
>>> page on the web site is empty
>>> https://issues.apache.org/jira/browse/CASSANDRA-14272  The Backups web page 
>>> on the web site is empty
>>> https://issues.apache.org/jira/browse/CASSANDRA-14271  The Hints web page 
>>> in the web site is empty
>>> https://issues.apache.org/jira/browse/CASSANDRA-14270  The Read repair web 
>>> page is empty
>>> https://issues.apache.org/jira/browse/CASSANDRA-14269  The Data Modeling 
>>> section of the web site is empty
>>> https://issues.apache.org/jira/browse/CASSANDRA-14268  The 
>>> Architecture:Guarantees web page is empty
>>> https://issues.apache.org/jira/browse/CASSANDRA-14267  The Dynamo web page 
>>> on the Apache Cassandra site is missing content
>>> https://issues.apache.org/jira/browse/CASSANDRA-14266  The Architecture 
>>> Overview web page on the Apache Cassandra site is empty
>>>  
>>> Thanks for pitching in  
>>>  
>>> Kenneth Brotman
>>>  
>>> From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
>>> Sent: Monday, February 26, 2018 1:54 PM
>>> To: user@cassandra.apache.org
>>> Subject: RE: Filling in the blank To Do sections on the Apache Cassandra 
>>> web site
>>>  
>>> Nice!  Thanks for the help Oliver!
>>>  
>>> Kenneth Brotman
>>>  
>>> From: Oliver Ruebenacker [mailto:cur...@gmail.com] 
>>> Sent: Sunday, February 25, 2018 7:12 AM
>>> To: user@cassandra.apache.org
>>> Cc: dev@cassandra.apache.org
>>> 

Re: question of keyspace that just disappeared

2017-03-03 Thread George Webster
I think it does on drop keyspace. We had a recent enough snapshot so it
wasn't a big deal to recover. However, we didn't have a snapshot for when
the keyspace disappeared.

@Romain: I believe you are correct about reliability. We just had a repair
--full fail and CPU lock-up on one of the nodes at 100%. This occurred on a
fairly new keyspace that only receives writes. We are also now seeing a very
high percentage of read timeouts. ... might be time to rebuild the cluster.



On Fri, Mar 3, 2017 at 2:34 PM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

>
> On Fri, Mar 3, 2017 at 7:56 AM, Romain Hardouin <romainh...@yahoo.fr>
> wrote:
>
>> I suspect a lack of 3.x reliability. Cassandra could have given up with
>> dropped messages, but not with a "drop keyspace". I mean, I have already seen
>> some Spark jobs with too many executors that produce a high load average on a
>> DC. I saw a C* node with a 1 min. load avg of 140 that could still have a P99
>> read latency of 40ms. But I never saw a disappearing keyspace. There are
>> old tickets regarding C* 1.x, but as far as I remember those were due to a
>> create/drop/create keyspace.
>>
>>
>> Le Vendredi 3 mars 2017 13h44, George Webster <webste...@gmail.com> a
>> écrit :
>>
>>
>> Thank you for your reply and good to know about the debug statement. I
>> haven't
>>
>> We never dropped or re-created the keyspace before. We haven't even
>> performed writes to that keyspace in months. I also checked the permissions
>> of Apache, that user had read only access.
>>
>> Unfortunately, I reverted from a backup recently. I cannot say for sure
>> anymore if I saw something in system before the revert.
>>
>> Anyway, hopefully it was just a fluke. We have some crazy ML libraries
>> running on it; maybe Cassandra just gave up? Oh well, Cassandra is a
>> champ and we haven't really had issues with it before.
>>
>> On Thu, Mar 2, 2017 at 6:51 PM, Romain Hardouin <romainh...@yahoo.fr>
>> wrote:
>>
>> Did you inspect the system tables to see if there are any traces of your
>> keyspace? Did you ever drop and re-create this keyspace before that?
>>
>> Lines in debug appear because the fd interval is > 2 seconds (logs are in
>> nanoseconds). You can override the intervals via the
>> -Dcassandra.fd_initial_value_ms and -Dcassandra.fd_max_interval_ms
>> properties. Are you sure you didn't have these lines in the debug logs
>> before? I used to see them a lot prior to increasing the intervals to 4
>> seconds.
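As a sketch, Romain's 4-second override could be applied as JVM system properties, e.g. in cassandra-env.sh (the file location and exact values are assumptions; he only says "4 seconds"):

```shell
# Illustrative cassandra-env.sh fragment: raise both failure-detector
# intervals to 4 seconds (values are in milliseconds)
JVM_OPTS="$JVM_OPTS -Dcassandra.fd_initial_value_ms=4000"
JVM_OPTS="$JVM_OPTS -Dcassandra.fd_max_interval_ms=4000"
```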
>>
>> Best,
>>
>> Romain
>>
>> Le Mardi 28 février 2017 18h25, George Webster <webste...@gmail.com> a
>> écrit :
>>
>>
>> Hey Cassandra Users,
>>
>> We recently encountered an issue where a keyspace just disappeared. I was
>> curious if anyone has had this occur before and can provide some insight.
>>
>> We are using Cassandra 3.10: 2 DCs, 3 nodes each.
>> The data is still located in the storage folder but is no longer visible
>> inside Cassandra.
>>
>> I searched the logs for any hints of errors or commands being executed
>> that could have caused the loss of a keyspace. Unfortunately, I found nothing.
>> In the logs, the only unusual issue I saw was a series of read timeouts that
>> occurred right around when the keyspace went away. Since then I see
>> numerous entries in the debug log like the following:
>>
>> DEBUG [GossipStage:1] 2017-02-28 18:14:12,580 FailureDetector.java:457 -
>> Ignoring interval time of 2155674599 for /x.x.x..12
>> DEBUG [GossipStage:1] 2017-02-28 18:14:16,580 FailureDetector.java:457 -
>> Ignoring interval time of 2945213745 for /x.x.x.81
>> DEBUG [GossipStage:1] 2017-02-28 18:14:19,590 FailureDetector.java:457 -
>> Ignoring interval time of 2006530862 for /x.x.x..69
>> DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 -
>> Ignoring interval time of 3441841231 for /x.x.x.82
>> DEBUG [GossipStage:1] 2017-02-28 18:14:29,588 FailureDetector.java:457 -
>> Ignoring interval time of 2153964846 for /x.x.x.82
>> DEBUG [GossipStage:1] 2017-02-28 18:14:33,582 FailureDetector.java:457 -
>> Ignoring interval time of 2588593281 for /x.x.x.82
>> DEBUG [GossipStage:1] 2017-02-28 18:14:37,588 FailureDetector.java:457 -
>> Ignoring interval time of 2005305693 for /x.x.x.69
>> DEBUG [GossipStage:1] 2017-02-28 18:14:38,592 FailureDetector.java:457 -
>> Ignoring interval time of 2009244850 for /x.x.x.82
>> DEBUG [GossipStage:1] 2017-02-28 18:14:43,584 FailureDetector.java:457 -
>> Ignoring interval time of 2149192677 for /x.x.x.69
>> DEBUG [GossipStage:1] 2017-02-28 

Re: question of keyspace that just disappeared

2017-03-03 Thread George Webster
Thank you for your reply and good to know about the debug statement. I
haven't

We never dropped or re-created the keyspace before. We haven't even
performed writes to that keyspace in months. I also checked the permissions
of Apache, that user had read only access.

Unfortunately, I reverted from a backup recently. I cannot say for sure
anymore if I saw something in system before the revert.

Anyway, hopefully it was just a fluke. We have some crazy ML libraries
running on it; maybe Cassandra just gave up? Oh well, Cassandra is a
champ and we haven't really had issues with it before.

On Thu, Mar 2, 2017 at 6:51 PM, Romain Hardouin <romainh...@yahoo.fr> wrote:

> Did you inspect the system tables to see if there are any traces of your
> keyspace? Did you ever drop and re-create this keyspace before that?
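For Cassandra 3.x, a quick way to look for traces of the keyspace is the system_schema tables; a sketch in cqlsh (the keyspace name is a placeholder):

```sql
-- List all keyspaces the cluster still knows about
SELECT keyspace_name FROM system_schema.keyspaces;

-- Look for leftover table definitions of the vanished keyspace
SELECT table_name FROM system_schema.tables
WHERE keyspace_name = 'my_keyspace';
```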
>
> Lines in debug appear because the fd interval is > 2 seconds (logs are in
> nanoseconds). You can override the intervals via the -Dcassandra.fd_initial_value_ms
> and -Dcassandra.fd_max_interval_ms properties. Are you sure you didn't have
> these lines in the debug logs before? I used to see them a lot prior to
> increasing the intervals to 4 seconds.
>
> Best,
>
> Romain
>
> Le Mardi 28 février 2017 18h25, George Webster <webste...@gmail.com> a
> écrit :
>
>
> Hey Cassandra Users,
>
> We recently encountered an issue where a keyspace just disappeared. I was
> curious if anyone has had this occur before and can provide some insight.
>
> We are using Cassandra 3.10: 2 DCs, 3 nodes each.
> The data is still located in the storage folder but is no longer visible
> inside Cassandra.
>
> I searched the logs for any hints of errors or commands being executed that
> could have caused the loss of a keyspace. Unfortunately, I found nothing. In
> the logs, the only unusual issue I saw was a series of read timeouts that
> occurred right around when the keyspace went away. Since then I see
> numerous entries in the debug log like the following:
>
> DEBUG [GossipStage:1] 2017-02-28 18:14:12,580 FailureDetector.java:457 -
> Ignoring interval time of 2155674599 for /x.x.x..12
> DEBUG [GossipStage:1] 2017-02-28 18:14:16,580 FailureDetector.java:457 -
> Ignoring interval time of 2945213745 for /x.x.x.81
> DEBUG [GossipStage:1] 2017-02-28 18:14:19,590 FailureDetector.java:457 -
> Ignoring interval time of 2006530862 for /x.x.x..69
> DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 -
> Ignoring interval time of 3441841231 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:29,588 FailureDetector.java:457 -
> Ignoring interval time of 2153964846 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:33,582 FailureDetector.java:457 -
> Ignoring interval time of 2588593281 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:37,588 FailureDetector.java:457 -
> Ignoring interval time of 2005305693 for /x.x.x.69
> DEBUG [GossipStage:1] 2017-02-28 18:14:38,592 FailureDetector.java:457 -
> Ignoring interval time of 2009244850 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:43,584 FailureDetector.java:457 -
> Ignoring interval time of 2149192677 for /x.x.x.69
> DEBUG [GossipStage:1] 2017-02-28 18:14:45,605 FailureDetector.java:457 -
> Ignoring interval time of 2021180918 for /x.x.x.85
> DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 -
> Ignoring interval time of 2436026101 for /x.x.x.81
> DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 -
> Ignoring interval time of 2436187894 for /x.x.x.82
>
> During the time of the disappearing keyspace we had two concurrent
> activities:
> 1) Running a Spark job (via HDP 2.5.3 in Yarn) that was performing a
> countByKey. It was using the keyspace that disappeared. The operation
> crashed.
> 2) We created a new keyspace to test our schema. The only "fancy" thing in
> that keyspace is a few materialized view tables. Data was being loaded into
> that keyspace during the crash. The load process was extracting information
> and then just writing to Cassandra.
>
> Any ideas? Anyone seen this before?
>
> Thanks,
> George
>
>
>


question of keyspace that just disappeared

2017-02-28 Thread George Webster
Hey Cassandra Users,

We recently encountered an issue where a keyspace just disappeared. I was
curious if anyone has had this occur before and can provide some insight.

We are using Cassandra 3.10: 2 DCs, 3 nodes each.
The data is still located in the storage folder but is no longer visible
inside Cassandra.

I searched the logs for any hints of errors or commands being executed that
could have caused the loss of a keyspace. Unfortunately, I found nothing. In
the logs, the only unusual issue I saw was a series of read timeouts that
occurred right around when the keyspace went away. Since then I see
numerous entries in the debug log like the following:

DEBUG [GossipStage:1] 2017-02-28 18:14:12,580 FailureDetector.java:457 -
Ignoring interval time of 2155674599 for /x.x.x..12
DEBUG [GossipStage:1] 2017-02-28 18:14:16,580 FailureDetector.java:457 -
Ignoring interval time of 2945213745 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:19,590 FailureDetector.java:457 -
Ignoring interval time of 2006530862 for /x.x.x..69
DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 -
Ignoring interval time of 3441841231 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:29,588 FailureDetector.java:457 -
Ignoring interval time of 2153964846 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:33,582 FailureDetector.java:457 -
Ignoring interval time of 2588593281 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:37,588 FailureDetector.java:457 -
Ignoring interval time of 2005305693 for /x.x.x.69
DEBUG [GossipStage:1] 2017-02-28 18:14:38,592 FailureDetector.java:457 -
Ignoring interval time of 2009244850 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:43,584 FailureDetector.java:457 -
Ignoring interval time of 2149192677 for /x.x.x.69
DEBUG [GossipStage:1] 2017-02-28 18:14:45,605 FailureDetector.java:457 -
Ignoring interval time of 2021180918 for /x.x.x.85
DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 -
Ignoring interval time of 2436026101 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 -
Ignoring interval time of 2436187894 for /x.x.x.82
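The interval values in those DEBUG lines are nanoseconds, so each one is just over the failure detector's default 2-second threshold; a quick check (values copied from the log above):

```python
# "Ignoring interval time" values from the log above, in nanoseconds
intervals_ns = [2155674599, 2945213745, 2006530862, 3441841231]

intervals_s = [ns / 1e9 for ns in intervals_ns]
# Every interval exceeds the default 2 s threshold, hence the DEBUG noise
assert all(s > 2.0 for s in intervals_s)
```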

During the time of the disappearing keyspace we had two concurrent
activities:
1) Running a Spark job (via HDP 2.5.3 in Yarn) that was performing a
countByKey. It was using the keyspace that disappeared. The operation
crashed.
2) We created a new keyspace to test our schema. The only "fancy" thing in that
keyspace is a few materialized view tables. Data was being loaded into that
keyspace during the crash. The load process was extracting information and
then just writing to Cassandra.

Any ideas? Anyone seen this before?

Thanks,
George


Re: Question on write failures logs show Uncaught exception on thread Thread[MutationStage-1,5,main]

2016-10-24 Thread George Webster
Thank you, that is quite helpful.

On Mon, Oct 24, 2016 at 11:00 PM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> The driver will enforce a max batch size of 65k.
> This is an issue in versions of cassandra like 2.1.X. There are control
> variables for the logged and unlogged batch size. You may have to
> tweak your commitlog size as well.
>
> I demonstrate this here:
> https://github.com/edwardcapriolo/ec/blob/master/src/test/java/Base/batch/
> BigBatches2_2_6_tweeked.java
>
> Latest tick-tock version I tried worked out of the box.
>
> The only drawback of batches is potential JVM pressure. I ran some
> permutations of memory settings with the tests above. You can get a feel
> for rate + batch size and the JVM pressure it causes.
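The 65k figure is a cap on statements per batch, so large writes need to be split; a minimal, driver-agnostic sketch (the chunking helper is illustrative, not part of any driver API):

```python
MAX_BATCH_STATEMENTS = 65535  # the ~65k cap mentioned above

def chunk(statements, size=MAX_BATCH_STATEMENTS):
    """Split a list of statements into batches that stay under the cap."""
    return [statements[i:i + size] for i in range(0, len(statements), size)]

batches = chunk(list(range(200_000)))  # pretend these are INSERT statements
assert len(batches) == 4
assert all(len(b) <= MAX_BATCH_STATEMENTS for b in batches)
```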
>
> On Mon, Oct 24, 2016 at 4:10 PM, George Webster <webste...@gmail.com>
> wrote:
>
>> Hey cassandra users,
>>
>> When performing writes I have hit an issue where the server is unable to
>> perform writes. The logs show:
>>
>> WARN  [MutationStage-1] 2016-10-24 22:05:52,592
>> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on
>> thread Thread[MutationStage-1,5,main]: {}
>> java.lang.IllegalArgumentException: Mutation of 16.011MiB is too large
>> for the maximum size of 16.000MiB
>> at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262)
>> ~[apache-cassandra-3.9.jar:3.9]
>> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493)
>> ~[apache-cassandra-3.9.jar:3.9]
>> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396)
>> ~[apache-cassandra-3.9.jar:3.9]
>> at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215)
>> ~[apache-cassandra-3.9.jar:3.9]
>> at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:220)
>> ~[apache-cassandra-3.9.jar:3.9]
>> at 
>> org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:69)
>> ~[apache-cassandra-3.9.jar:3.9]
>> at 
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
>> ~[apache-cassandra-3.9.jar:3.9]
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> ~[na:1.8.0_101]
>> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorSe
>> rvice$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>> ~[apache-cassandra-3.9.jar:3.9]
>> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorSe
>> rvice$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
>> [apache-cassandra-3.9.jar:3.9]
>> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
>> [apache-cassandra-3.9.jar:3.9]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
>>
>>
>> Looking around on Google I found this guide:
>> https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-
>> It states that I can increase commitlog_segment_size_in_mb to solve the
>> problem.
>>
>> However, I wanted to ask if there are any drawbacks to doing so.
>>
>> Thank you for your guidance.
>>
>> Respectfully,
>> George
>>
>
>


Question on write failures logs show Uncaught exception on thread Thread[MutationStage-1,5,main]

2016-10-24 Thread George Webster
Hey cassandra users,

When performing writes I have hit an issue where the server is unable to
perform writes. The logs show:

WARN  [MutationStage-1] 2016-10-24 22:05:52,592
AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread
Thread[MutationStage-1,5,main]: {}
java.lang.IllegalArgumentException: Mutation of 16.011MiB is too large for
the maximum size of 16.000MiB
at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:262)
~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493)
~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396)
~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215)
~[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:220)
~[apache-cassandra-3.9.jar:3.9]
at
org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:69)
~[apache-cassandra-3.9.jar:3.9]
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
~[apache-cassandra-3.9.jar:3.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[na:1.8.0_101]
at
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
~[apache-cassandra-3.9.jar:3.9]
at
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
[apache-cassandra-3.9.jar:3.9]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
[apache-cassandra-3.9.jar:3.9]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]


Looking around on Google I found this guide:
https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-
It states that I can increase commitlog_segment_size_in_mb to solve the
problem.
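The two sizes in the exception are related: Cassandra caps a single mutation at half of one commitlog segment, and the default commitlog_segment_size_in_mb is 32, which yields exactly the 16.000MiB limit in the log:

```python
def max_mutation_size_mib(commitlog_segment_size_in_mb=32):
    # A single mutation may be at most half of one commitlog segment
    return commitlog_segment_size_in_mb / 2

assert max_mutation_size_mib() == 16.0    # default segment -> the 16.000MiB cap
assert max_mutation_size_mib(64) == 32.0  # a 64MB segment would admit the 16.011MiB mutation
```

So raising the segment size raises the cap, at the cost of larger commitlog segments on disk.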

However, I wanted to ask if there are any drawbacks to doing so.

Thank you for your guidance.

Respectfully,
George


Re: question when using SASI indexing

2016-08-05 Thread George Webster
Thanks DuyHai,

I would agree but we have not performed any delete operations in over a
month. To me this looks like a potential bug or misconfiguration (on my
end) with SASI.

I say this for a few reasons:
1) we have not performed a delete operation since the indexes were created
2) when I perform a query, against the same table, for the sha256 of an ELF
file I do receive a result.
SELECT * FROM testing.objects WHERE sha256 =
'1b218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f';

 sha256   | mime
--+-
 1b218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f | ELF
32-bit MSB  executable, PowerPC or cisco 4500, version 1 (SYSV)

3) If I don't use the SASI index and instead loop through the entries
manually, I get 187 results.
4) When I attempted the same SASI query again today, I again received
inconsistent results of between 0 and 7 rows. After a few attempts it again
began to return 0.

Do you see any errors in my index command?

CREATE CUSTOM INDEX objects_mime_idx ON testing.objects (mime) USING
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzed'
: 'true', 'analyzer_class' :
'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'tokenization_enable_stemming' : 'false', 'tokenization_locale' :
'en', 'tokenization_normalize_lowercase' : 'true',
'tokenization_skip_stop_words' : 'true'};


Some of our SASI indexes are fairly large, as we were testing the ability to
use SASI instead of Elasticsearch or basic processing through Spark. I will run
some more tests today and see if I can uncover anything.


On Fri, Aug 5, 2016 at 10:36 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> OK, the fact that you see some rows and after a while you see 0 rows means
> that those rows were deleted.
>
> Since SASI only indexes INSERT & UPDATE but not DELETE, management of
> tombstones is left to Cassandra to handle.
>
> It means that if you do an INSERT, you'll have an entry in the SASI index
> file, but when you do a DELETE, SASI does not remove the entry from its
> index file.
>
> When reading, SASI will give the partition offset to Cassandra, and
> Cassandra will fetch the data from the SSTables, then realise that there is a
> tombstone and thus return 0 rows.
>
> The only moment those entries will be removed from the SASI index file is when
> your SSTables get compacted and the data is purged.
>
> The fact that you can see some rows and then 0 rows means that some of your
> replicas have missed the tombstones.
>
> "However, after about 20 attempts, all servers started to only return 0
> results. " --> Read-repair kicks in, so the tombstones are propagated and
> then you see 0 rows.
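DuyHai's point can be modeled with a toy index that, like SASI, is updated on insert but never on delete (all names here are illustrative):

```python
# Toy model of a SASI-like index: sees INSERT/UPDATE, never DELETE
index = {}   # value -> set of row keys (the "index file")
table = {}   # key -> value (the base data; a delete leaves a tombstone)

def insert(key, value):
    table[key] = value
    index.setdefault(value, set()).add(key)  # index updated on insert

def delete(key):
    table.pop(key, None)                     # tombstone: index NOT touched

def query(value):
    # The index yields candidate keys; the base table then filters out
    # tombstoned rows, so a hit in the index can still return 0 rows.
    return [k for k in index.get(value, ()) if k in table]

insert("row-a", "ELF"); insert("row-b", "ELF")
delete("row-a")
assert query("ELF") == ["row-b"]   # stale index entry is filtered at read time
assert "row-a" in index["ELF"]     # and lingers until compaction purges it
```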
>
>
>
> On Tue, Aug 2, 2016 at 10:52 PM, George Webster <webste...@gmail.com>
> wrote:
>
>> The indexes were written about 1-2 months ago. No data has been added to
>> the servers since the indexes were created. Additionally, the indexes
>> appeared to be stable until I noticed the issue today. ... which occurred
>> after I made a large query without setting a LIMIT.
>>
>> I set the consistency level and moved the select statement between
>> different nodes. The results remained inconsistent, returning a random
>> number between 0 and 8. It did not appear to make much difference between
>> the different nodes or consistency level. However, after about 20 attempts,
>> all servers started to only return 0 results.
>>
>>
>> Lastly, this appeared in the logs during that time:
>>
>> INFO  [IndexSummaryManager:1] 2016-08-02 22:11:43,245
>> IndexSummaryRedistribution.java:74 - Redistributing index summaries
>>
>> INFO  [OptionalTasks:1] 2016-08-02 22:25:06,508 NoSpamLogger.java:91 -
>> Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
>> 1048576 bytes
>>
>> On Tue, Aug 2, 2016 at 6:58 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> One possible explanation is that you're querying data while the index
>>> files are being built, so the results are different.
>>>  The second possible explanation is the consistency level.
>>>
>>> Try the query again using CL = QUORUM, try on several nodes to see if
>>> the results are different
>>>
>>> On Tue, Aug 2, 2016 at 6:32 PM, George Webster <webste...@gmail.com>
>>> wrote:
>>>
>>>> Hey DuyHai,
>>>> Thank you for your help.
>>>>
>>>> 1) Cassandra version
>>>> [cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
>>>>
>>>

Re: question when using SASI indexing

2016-08-02 Thread George Webster
The indexes were written about 1-2 months ago. No data has been added to
the servers since the indexes were created. Additionally, the indexes
appeared to be stable until I noticed the issue today. ... which occurred
after I made a large query without setting a LIMIT.

I set the consistency level and moved the select statement between
different nodes. The results remained inconsistent, returning a random
number between 0 and 8. It did not appear to make much difference between
the different nodes or consistency level. However, after about 20 attempts,
all servers started to only return 0 results.


Lastly, this appeared in the logs during that time:

INFO  [IndexSummaryManager:1] 2016-08-02 22:11:43,245
IndexSummaryRedistribution.java:74 - Redistributing index summaries

INFO  [OptionalTasks:1] 2016-08-02 22:25:06,508 NoSpamLogger.java:91 -
Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
1048576 bytes

On Tue, Aug 2, 2016 at 6:58 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> One possible explanation is that you're querying data while the index
> files are being built, so the results are different.
>  The second possible explanation is the consistency level.
>
> Try the query again using CL = QUORUM, try on several nodes to see if the
> results are different
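In cqlsh, DuyHai's suggestion might look like this (table and predicate taken from the earlier messages):

```sql
-- Raise the read consistency for this cqlsh session, then re-run the query
CONSISTENCY QUORUM;
SELECT * FROM test.objects WHERE mime LIKE 'ELF%';
```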
>
> On Tue, Aug 2, 2016 at 6:32 PM, George Webster <webste...@gmail.com>
> wrote:
>
>> Hey DuyHai,
>> Thank you for your help.
>>
>> 1) Cassandra version
>> [cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
>>
>>
>> 2) CREATE CUSTOM INDEX statement for your index
>>
>> CREATE CUSTOM INDEX objects_mime_idx ON test.objects (mime) USING 
>> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzed' : 
>> 'true', 'analyzer_class' : 
>> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer', 
>> 'tokenization_enable_stemming' : 'false', 'tokenization_locale' : 'en', 
>> 'tokenization_normalize_lowercase' : 'true', 'tokenization_skip_stop_words' 
>> : 'true'};
>>
>>
>> 3) Consistency level used for your SELECT
>> I am using the default consistency
>> cassandra@cqlsh> CONSISTENCY
>> Current consistency level is ONE.
>>
>>
>> 4) Replication factor
>>
>> CREATE KEYSPACE system_distributed WITH REPLICATION = {
>>  'class' : 'org.apache.cassandra.locator.SimpleStrategy',
>>  'replication_factor': '3' }
>> AND DURABLE_WRITES = true;
>>
>>
>> 5) Are you creating the index when the table is EMPTY or have you created
>> the index when the table already contains some data ?
>> I created the indexes after the tables contained data.
>>
>>
>> On Tue, Aug 2, 2016 at 5:22 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> Hello George
>>>
>>> Can you provide more details ?
>>>
>>> 1) Cassandra version
>>> 2) CREATE CUSTOM INDEX statement for your index
>>> 3) Consistency level used for your SELECT
>>> 4) Replication factor
>>> 5) Are you creating the index when the table is EMPTY or have you
>>> created the index when the table already contains some data ?
>>>
>>> On Tue, Aug 2, 2016 at 4:05 PM, George Webster <webste...@gmail.com>
>>> wrote:
>>>
>>>> Hey guys and gals,
>>>>
>>>> I am having a strange issue with Cassandra SASI and I was hoping you
>>>> could help solve the mystery. My issue is inconsistency between returned
>>>> results and strange log errors.
>>>>
>>>> The biggest issue is that when I perform a query I am getting back
>>>> inconsistent results. The first few times I received between 3 and 7
>>>> results, and then I finally received 187 results. At no point did I change
>>>> the query statement. However, after I received the 187 results, all
>>>> subsequent queries returned zero results.
>>>>
>>>> my query:
>>>> SELECT *
>>>> FROM test.objects
>>>> WHERE mime LIKE 'ELF%';
>>>>
>>>> When I look in the system.log file I see the following:
>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:58:53,256
>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:59:02,978
>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>>
>>>>
>>>> When I look in the debug.log file I see the following when zero results
>>>> are returned:
>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:58:53,256
>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:59:02,978
>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>>
>>>> Additionally, I see a lot of errors in the log that state:
>>>> INFO  [OptionalTasks:1] 2016-08-02 15:40:04,310 NoSpamLogger.java:91 -
>>>> Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
>>>> 1048576 bytes
>>>> INFO  [OptionalTasks:1] 2016-08-02 15:55:04,387 NoSpamLogger.java:91 -
>>>> Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
>>>> 1048576 bytes
>>>>
>>>>
>>>> Any ideas?
>>>>
>>>>
>>>
>>
>


Re: question when using SASI indexing

2016-08-02 Thread George Webster
Hey DuyHai,
Thank you for your help.

1) Cassandra version
[cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]


2) CREATE CUSTOM INDEX statement for your index

CREATE CUSTOM INDEX objects_mime_idx ON test.objects (mime) USING
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzed'
: 'true', 'analyzer_class' :
'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'tokenization_enable_stemming' : 'false', 'tokenization_locale' :
'en', 'tokenization_normalize_lowercase' : 'true',
'tokenization_skip_stop_words' : 'true'};


3) Consistency level used for your SELECT
I am using the default consistency
cassandra@cqlsh> CONSISTENCY
Current consistency level is ONE.


4) Replication factor

CREATE KEYSPACE system_distributed WITH REPLICATION = {
'class' : 'org.apache.cassandra.locator.SimpleStrategy',
'replication_factor': '3' }
AND DURABLE_WRITES = true;


5) Are you creating the index when the table is EMPTY or have you created
the index when the table already contains some data ?
I created the indexes after the tables contained data.
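Since the index was created on tables that already held data, the initial index build runs in the background, and queries issued before it completes can return partial results. Cassandra records completed builds in `system."IndexInfo"` (where, by historical quirk, the `table_name` column holds the keyspace name). A sketch of checking that, assuming a driver session executes the CQL and hands back the rows; the `test` keyspace and `objects_mime_idx` names come from the statements above:

```python
# Check that an index finished its initial build. Cassandra marks
# completed builds in system."IndexInfo"; executing CHECK_BUILD_CQL
# through a driver session (not shown) yields the built index names.
CHECK_BUILD_CQL = (
    'SELECT index_name FROM system."IndexInfo" '
    "WHERE table_name = 'test'"  # column holds the keyspace name
)

def index_is_built(built_names, index_name: str) -> bool:
    # built_names: index_name values returned by CHECK_BUILD_CQL
    return any(name == index_name for name in built_names)

print(index_is_built(["objects_mime_idx"], "objects_mime_idx"))  # True
```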


On Tue, Aug 2, 2016 at 5:22 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Hello George
>
> Can you provide more details ?
>
> 1) Cassandra version
> 2) CREATE CUSTOM INDEX statement for your index
> 3) Consistency level used for your SELECT
> 4) Replication factor
> 5) Are you creating the index when the table is EMPTY or have you created
> the index when the table already contains some data ?
>
> On Tue, Aug 2, 2016 at 4:05 PM, George Webster <webste...@gmail.com>
> wrote:
>
>> Hey guys and gals,
>>
>> I am having a strange issue with Cassandra SASI and I was hoping you
>> could help solve the mystery. My issue is inconsistency between returned
>> results and strange log errors.
>>
>> The biggest issue is that when I perform a query I get back
>> inconsistent results. The first few times I received between 3 and 7
>> results, and then I finally received 187 results. At no point did I
>> change the query statement. However, after I received the 187 results,
>> any subsequent queries returned zero results.
>>
>> my query:
>> SELECT *
>> FROM test.objects
>> WHERE mime LIKE 'ELF%';
>>
>> When I look in the system.log file I see the following:
>> WARN  [SharedPool-Worker-1] 2016-08-02 15:58:53,256
>> SelectStatement.java:351 - Aggregation query used without partition key
>> WARN  [SharedPool-Worker-1] 2016-08-02 15:59:02,978
>> SelectStatement.java:351 - Aggregation query used without partition key
>>
>>
>> When I look in the debug.log file I see the following when zero results
>> are returned:
>> WARN  [SharedPool-Worker-1] 2016-08-02 15:58:53,256
>> SelectStatement.java:351 - Aggregation query used without partition key
>> WARN  [SharedPool-Worker-1] 2016-08-02 15:59:02,978
>> SelectStatement.java:351 - Aggregation query used without partition key
>>
>> Additionally, I see a lot of INFO messages in the log that state:
>> INFO  [OptionalTasks:1] 2016-08-02 15:40:04,310 NoSpamLogger.java:91 -
>> Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
>> 1048576 bytes
>> INFO  [OptionalTasks:1] 2016-08-02 15:55:04,387 NoSpamLogger.java:91 -
>> Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
>> 1048576 bytes
>>
>>
>> Any ideas?
>>
>>
>


question when using SASI indexing

2016-08-02 Thread George Webster
Hey guys and gals,

I am having a strange issue with Cassandra SASI and I was hoping you could
help solve the mystery. My issue is inconsistency between returned results
and strange log errors.

The biggest issue is that when I perform a query I get back inconsistent
results. The first few times I received between 3 and 7 results, and then I
finally received 187 results. At no point did I change the query statement.
However, after I received the 187 results, any subsequent queries returned
zero results.

my query:
SELECT *
FROM test.objects
WHERE mime LIKE 'ELF%';

When I look in the system.log file I see the following:
WARN  [SharedPool-Worker-1] 2016-08-02 15:58:53,256
SelectStatement.java:351 - Aggregation query used without partition key
WARN  [SharedPool-Worker-1] 2016-08-02 15:59:02,978
SelectStatement.java:351 - Aggregation query used without partition key


When I look in the debug.log file I see the following when zero results are
returned:
WARN  [SharedPool-Worker-1] 2016-08-02 15:58:53,256
SelectStatement.java:351 - Aggregation query used without partition key
WARN  [SharedPool-Worker-1] 2016-08-02 15:59:02,978
SelectStatement.java:351 - Aggregation query used without partition key

Additionally, I see a lot of INFO messages in the log that state:
INFO  [OptionalTasks:1] 2016-08-02 15:40:04,310 NoSpamLogger.java:91 -
Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
1048576 bytes
INFO  [OptionalTasks:1] 2016-08-02 15:55:04,387 NoSpamLogger.java:91 -
Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
1048576 bytes
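Those INFO lines appear to come from the off-heap buffer pool limit: 536870912 bytes is exactly 512 MiB, which matches the default `file_cache_size_in_mb: 512` in cassandra.yaml, and each refused allocation is a single 1 MiB chunk. A quick sanity check of the arithmetic:

```python
# The logged limit is exactly a 512 MiB buffer pool (the default
# file_cache_size_in_mb: 512), and each refused allocation is one
# 1 MiB chunk.
limit_bytes = 536870912
chunk_bytes = 1048576

print(limit_bytes // (1024 * 1024))  # 512 -> pool size in MiB
print(chunk_bytes // 1024)           # 1024 -> chunk size in KiB (1 MiB)
print(limit_bytes // chunk_bytes)    # 512 -> chunks that fit in the pool
```

If that pool is routinely exhausted, raising `file_cache_size_in_mb` may help, at the cost of more off-heap memory.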


Any ideas?