Re: All clear to import dtest

2017-06-15 Thread Murukesh Mohanan
Will the Github repo continue to get updates (perhaps as a mirror of the
ASF one)? Just checking to see if any changes need to be made in the near
future for an in-house CI pipeline running dtests.
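
For anyone wiring up such a pipeline, the change should amount to re-pointing
the clone step once the new repo exists. A rough sketch of the CI glue in
Python (the ASF repo URL below is a guess until CASSANDRA-13613 lands, and
the nosetests entry point is assumed to stay as it is today):

    import os
    import subprocess

    # Hypothetical future home of the repo -- the real URL is unknown until
    # the migration actually happens.
    DTEST_REPO = "https://git-wip-us.apache.org/repos/asf/cassandra-dtest.git"

    def run_dtests(cassandra_dir, workdir="cassandra-dtest"):
        # Clone or update the dtest checkout, then run the suite against a
        # local Cassandra build identified by CASSANDRA_DIR.
        if not os.path.isdir(workdir):
            subprocess.check_call(["git", "clone", DTEST_REPO, workdir])
        else:
            subprocess.check_call(["git", "-C", workdir, "pull", "--ff-only"])
        env = dict(os.environ, CASSANDRA_DIR=cassandra_dir)
        # dtest is currently driven by nosetests; -x stops on first failure.
        subprocess.check_call(["nosetests", "-x"], cwd=workdir, env=env)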

On Fri, 16 Jun 2017 at 09:39 Nate McCall  wrote:

> We got through all the legal shenanigans necessary to import dtest
> into the project:
> https://issues.apache.org/jira/browse/CASSANDRA-13584
>
> This is just a heads-up that I have created CASSANDRA-13613 for
> tracking the actual import; next week I will be adding some sub-tasks
> for the migration into ASF infra and the housekeeping around it.
>
> To sum up previous mailing list conversations [0], we will be creating a
> new "cassandra-dtest" git repo at the ASF with the same committer list
> as the project.
>
> Anybody have any remaining issues or concerns here?
>
> Thanks,
> -Nate
>
> [0]
> https://lists.apache.org/thread.html/840fd900fb7f6568bfa008d122d4375b708c1f7f1b5929018118d5d5@%3Cdev.cassandra.apache.org%3E
>
--

Murukesh Mohanan,
Yahoo! Japan


All clear to import dtest

2017-06-15 Thread Nate McCall
We got through all the legal shenanigans necessary to import dtest
into the project:
https://issues.apache.org/jira/browse/CASSANDRA-13584

This is just a heads-up that I have created CASSANDRA-13613 for
tracking the actual import; next week I will be adding some sub-tasks
for the migration into ASF infra and the housekeeping around it.

To sum up previous mailing list conversations [0], we will be creating a
new "cassandra-dtest" git repo at the ASF with the same committer list
as the project.

Anybody have any remaining issues or concerns here?

Thanks,
-Nate

[0] 
https://lists.apache.org/thread.html/840fd900fb7f6568bfa008d122d4375b708c1f7f1b5929018118d5d5@%3Cdev.cassandra.apache.org%3E




Re: New contribution - Burst Hour Compaction Strategy

2017-06-15 Thread Pedro Gordo
Hi

Thanks for engaging in this discussion!

Cameron, regarding the benchmark, I need to spend some time exploring the
stress tool options, but I aim to create a stress test that goes on for a
period of at least 48 hours, and then run it for all strategies (with a
24-hour burst for BHCS). I want this test to be as comprehensive as
possible, so I'll need some time to do it properly. The key search is
optimised, so I'm curious to see the results of these tests; I'll share them
when ready.

If the test results are not good enough to push for BHCS, I'll do what
Jeff suggested (a wrapper to schedule). I don't plan to push for something
just for the sake of contributing; I want to see improvements! :) However,
the workaround from Ed Capriolo's presentation seems to be exactly that.
Is this workaround an established industry practice, or a last-resort hack?
Either way, do you think users would still prefer to switch to a schedule
configured per table (or keyspace)?
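
For concreteness, that workaround boils down to toggling compaction from
outside the node on a timer. A minimal sketch in Python (the nodetool
subcommands are real; the window hours and throughput value are made up):

    import subprocess

    def nodetool(*args):
        subprocess.check_call(["nodetool"] + list(args))

    def open_offpeak_window():
        # Off-peak: let background compaction run unthrottled.
        nodetool("enableautocompaction")
        nodetool("setcompactionthroughput", "0")  # 0 means unthrottled

    def close_offpeak_window():
        # Peak hours: rather than stopping compaction outright, throttle it
        # hard so it never fully stops but stays out of the latency path.
        nodetool("setcompactionthroughput", "8")  # MB/s, illustrative value

    # Both functions would typically be invoked from cron at the edges of
    # each off-peak window, e.g. at 01:00 and 07:00.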

A couple of things to note (correct me if I'm wrong):
- The STCS major compaction creates two massive SSTables. It seems to me
this is not desirable, as I've always seen the big SSTables of STCS as its
main disadvantage.
- LCS reduces the number of key copies using the levels, but we can still
have as many copies of a key as there are levels. Wouldn't it be desirable to
have a single copy of each key in a single table, as BHCS does?

From the above points, and expanding on the idea of the wrapper around the
existing in-tree strategies, do you think it would be an advantage to offer
the following process instead?
- Background compaction: it would run the background compaction of the
strategy chosen by the user during the configured schedule.
- Scheduled major compaction: it would run the BHCS major compaction at the
chosen time (e.g. once per day at an off-peak time), which makes keys
unique across all tables (an improvement over LCS), with the SSTables capped
at a configurable maximum size (an improvement over STCS). This seems to me
the optimal situation (a single copy of each key, and a controllable table
size) that everyone would like to see on their clusters; a rough sketch
follows below.
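
Concretely, the major-compaction half could look like this (a Python sketch
only: heapq.merge stands in for Cassandra's real merge, and reconciling
duplicate keys across inputs is elided):

    import heapq

    def scheduled_major_compaction(sstables, max_sstable_bytes, row_size):
        # Each input is an iterable of rows already sorted by key; heapq.merge
        # yields them in one globally sorted pass. Reconciling duplicate keys
        # across inputs (which a real compaction must do) is elided here.
        merged = heapq.merge(*sstables)
        outputs, current, size = [], [], 0
        for row in merged:
            current.append(row)
            size += row_size(row)
            # Cut a new output whenever the cap is hit, so no result grows
            # into one giant STCS-style SSTable.
            if size >= max_sstable_bytes:
                outputs.append(current)
                current, size = [], 0
        if current:
            outputs.append(current)
        # With duplicates reconciled, each key would end up in exactly one
        # bounded-size output: the property described above.
        return outputs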

What do you think of this approach?

Pedro Gordo

On 15 June 2017 at 04:01, Jeff Jirsa  wrote:

> Hi Pedro,
>
> I did a quick read through of your strategy, and have a few personal
> thoughts:
>
> First, writing a compaction strategy is a lot of work, and it's great to
> see new contributors take on ambitious projects. There are even a handful
> of ideas in here that may be useful to other strategies.
>
> The overall concept is interesting - many companies have "peak" times and
> "offpeak" times, and being able to run compaction only during offpeak may
> be really helpful. This concept is actually in the old wiki dating back
> many years; for example, Ed Capriolo gave a talk (
> https://www.slideshare.net/edwardcapriolo/m6d-cassandrapresentation -
> check out slide #28) where he showed how to achieve this with cron
> and nodetool.
>
> The actual logic you use to select candidates probably isn't perfect,
> because it can be pretty nuanced. But rather than focus on that, if we take
> advantage of the larger concept that it's useful to be able to turn
> compaction on/off on a schedule, there may be another opportunity - rather
> than try to re-implement some of the concepts of LCS without using LCS, you
> could just make BurstHourCompactionStrategy a wrapper around any
> user-specified compaction strategy. That is, it may be much less work and
> much more valuable if you actually let users specify which underlying
> compaction strategy to wrap, and then simply use the wrapped strategy's
> getNextBackgroundCompactionTask()
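
(Sketched abstractly below in Python; the real strategy API lives in Java
under org.apache.cassandra.db.compaction, so the class, method, and option
names here are stand-ins rather than Cassandra's actual signatures:)

    import datetime

    class ScheduledStrategyWrapper(object):
        # Delegates to any wrapped strategy, but only inside the burst window.
        def __init__(self, wrapped, start_hour, end_hour):
            self.wrapped = wrapped        # e.g. an STCS or LCS stand-in
            self.start_hour = start_hour  # window bounds, hours 0-23
            self.end_hour = end_hour

        def in_window(self, now=None):
            h = (now or datetime.datetime.now()).hour
            if self.start_hour <= self.end_hour:
                return self.start_hour <= h < self.end_hour
            return h >= self.start_hour or h < self.end_hour  # wraps midnight

        def get_next_background_task(self, gc_before):
            # Outside the window, report "nothing to compact"; inside it,
            # hand straight off to the user-specified strategy.
            if not self.in_window():
                return None
            return self.wrapped.get_next_background_task(gc_before)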
>
> For the project to be willing to accept a 5th compaction strategy, I
> imagine committers will want to see some benchmarks and hopefully some
> concrete examples of how it's beneficial and solves a problem that can't be
> better solved in other ways. I think at a high level many people can
> understand how it's useful, but you may want to compare/contrast it to Ed's
> method in the deck above (in particular, using nodetool and cron, you can
> have multiple "compaction-enabled" windows during the day, and you can
> throttle it so that it never fully stops, but slows down enough that it
> doesn't impact latencies).
>
> Again, that's just my personal thought based on a quick read through.
>
> Nice work so far!
> - Jeff
>
>
> On Wed, Jun 14, 2017 at 2:49 PM, Pedro Gordo 
> wrote:
>
>> Hi
>>
>> I've addressed the issues with Git. I believe this is what Stefan was
>> asking for: https://github.com/sedulam/cassandra/tree/12201
>> I've also added more tests for BHCS, including more for wide rows,
>> following Jeff's suggestion.
>>
>> Thanks for the directions so far! If there's something else you would like
>> to see tested or some metrics, please let me know what would be relevant.
>>
>> All the best
>>
>>
>> Pedro Gordo
>>
>> On 13 June 2017 at 15:43, Pedro Gordo  wrote:
>>
>> > Hi all
>> >
>> > Although a couple of people eng

Re: Is concurrent_batchlog_writes option used/implemented?

2017-06-15 Thread Jeremiah D Jordan
The project-hosted docs can be found here:
http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html 


If you find something wrong in those, open a JIRA.

DataStax has a documentation feedback page here if you want to contact their 
documentation team: 
http://docs.datastax.com/en/landing_page/doc/landing_page/contact.html 


-Jeremiah


> On Jun 15, 2017, at 11:22 AM, Jason Brown  wrote:
> 
> Hey Tomas,
> 
> Thanks for finding these errors. Unfortunately, those are problems on the
> Datastax-hosted documentation, not the docs hosted by the Apache project.
> To fix those problems you should contact Datastax (I don't have a URL handy
> rn, but if one of the DS folks who follow this list can add one that would
> be great).
> 
> I can't look right now, but do we have similar documentation on the Apache
> docs?
> 
> Thanks,
> 
> Jason
> 
> On Thu, Jun 15, 2017 at 01:46 Tomas Repik  wrote:
> 
>> And yet another glitch in the documentation at:
>> https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__cqlTruncateequest_timeout_in_ms
>> 
>> I guess it should be truncate_timeout_in_ms instead.
>> 
>> Is there a more proper way to report these kinds of issues? If so,
>> thanks for giving any directions.
>> 
>> Tomas
>> 
>> - Original Message -
>>> Thanks for the information, I thought this would be the case ...
>>> 
>>> I found another option that is not documented properly: the
>>> allocate_tokens_for_local_replication_factor [1] option is not found in
>>> any config file; instead, the allocate_tokens_for_keyspace option is
>>> present. I guess the latter is the replacement for the former, but I
>>> can't see it documented anywhere. Thanks for any clarification.
>>> 
>>> Tomas
>>> 
>>> [1]
>>> 
>>> https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__allocate_tokens_for_local_replication_factor
>>> 
>> 



Re: Is concurrent_batchlog_writes option used/implemented?

2017-06-15 Thread Jason Brown
Hey Tomas,

Thanks for finding these errors. Unfortunately, those are problems on the
Datastax-hosted documentation, not the docs hosted by the Apache project.
To fix those problems you should contact Datastax (I don't have a URL handy
rn, but if one of the DS folks who follow this list can add one that would
be great).

I can't look right now, but do we have similar documentation on the Apache
docs?

Thanks,

Jason

On Thu, Jun 15, 2017 at 01:46 Tomas Repik  wrote:

> And yet another glitch in the documentation at:
> https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__cqlTruncateequest_timeout_in_ms
>
> I guess it should be truncate_timeout_in_ms instead.
>
> Is there a more proper way to report these kinds of issues? If so,
> thanks for giving any directions.
>
> Tomas
>
> - Original Message -
> > Thanks for the information, I thought this would be the case ...
> >
> > I found another option that is not documented properly: the
> > allocate_tokens_for_local_replication_factor [1] option is not found in
> > any config file; instead, the allocate_tokens_for_keyspace option is
> > present. I guess the latter is the replacement for the former, but I
> > can't see it documented anywhere. Thanks for any clarification.
> >
> > Tomas
> >
> > [1]
> >
> > https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__allocate_tokens_for_local_replication_factor
> >
>


Re: Is concurrent_batchlog_writes option used/implemented?

2017-06-15 Thread Tomas Repik
And yet another glitch in the documentation at: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__cqlTruncateequest_timeout_in_ms

I guess it should be truncate_timeout_in_ms instead.
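
For what it's worth, one quick way to see which option names a given install
actually accepts is to scan the shipped cassandra.yaml; a throwaway Python
sketch (the yaml path is an assumption):

    import re

    # List option names mentioning "truncate" or "allocate_tokens" in the
    # shipped cassandra.yaml, including commented-out defaults.
    with open("/etc/cassandra/cassandra.yaml") as f:
        for line in f:
            m = re.match(r"^#?\s*([a-z_]+):", line)
            if m and ("truncate" in m.group(1) or "allocate_tokens" in m.group(1)):
                print(m.group(1))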

Is there a more proper way to report these kinds of issues? If so,
thanks for giving any directions.

Tomas

- Original Message -
> Thanks for the information, I thought this would be the case ...
> 
> I found another option that is not documented properly: the
> allocate_tokens_for_local_replication_factor [1] option is not found in any
> config file; instead, the allocate_tokens_for_keyspace option is present. I
> guess the latter is the replacement for the former, but I can't see it
> documented anywhere. Thanks for any clarification.
> 
> Tomas
>  
> [1]
> https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__allocate_tokens_for_local_replication_factor
> 
