Re: Apache Incubation task items

2018-05-31 Thread Atul Mohan
Hello Charles,
Going through the item list, I just had a quick question. Are all these
tasks meant to be taken up only by the committers? If there are tasks which
can be completed by contributors, I would be happy to help.

Thanks,
Atul

On Thu, May 31, 2018 at 12:42 PM, Charles Allen  wrote:

> https://github.com/druid-io/druid/projects/3 is a list of all the items in
> http://incubator.apache.org/projects/druid.html
>
> We will need help getting these resourced and completed. For an item to be
> completed and closed, the page at
> http://incubator.apache.org/projects/druid.html needs to be updated with any
> relevant information.
>
> I have also created a new label
> https://github.com/druid-io/druid/issues?q=is%3Aissue+is%3Aopen+label%3AApache
> for any issues related to being a part of ASF, not specifically related to the
> Druid code itself.
>
> The kanban board is in no specific order, so please do not take the
> relative order or issue number as any sort of indicator.
>
> Thank you all for your assistance as we go along this exciting path!
>
> Cheers,
> Charles Allen
>



-- 
Atul Mohan



Apache Incubation task items

2018-05-31 Thread Charles Allen
https://github.com/druid-io/druid/projects/3 is a list of all the items in
http://incubator.apache.org/projects/druid.html

We will need help getting these resourced and completed. For an item to be
completed and closed, the page at
http://incubator.apache.org/projects/druid.html needs to be updated with any
relevant information.

I have also created a new label
https://github.com/druid-io/druid/issues?q=is%3Aissue+is%3Aopen+label%3AApache
for any issues related to being a part of ASF, not specifically related to the
Druid code itself.

The kanban board is in no specific order, so please do not take the
relative order or issue number as any sort of indicator.

Thank you all for your assistance as we go along this exciting path!

Cheers,
Charles Allen


Re: Access to jira

2018-05-31 Thread Gian Merlino
We should probably have a label for it too.

On Thu, May 31, 2018 at 9:23 AM, Gian Merlino  wrote:

> I don't see why not!
>
> On Thu, May 31, 2018 at 9:21 AM, Charles Allen  wrote:
>
>> Sounds good. I'd like to put some more formal tracking and responsibility
>> to the remaining incubator items. Would github issues be the preferred
>> place to do that?
>>
>> On Thu, May 31, 2018 at 9:20 AM Gian Merlino wrote:
>>
>> > I think we are planning to keep using GitHub issues, based on the
>> > discussion in the migration logistics thread. And based on the fact that
>> > Apache seems to allow that now (https://github.com/apache/fluo was given
>> > as an example). So probably the right thing to do is update
>> > http://incubator.apache.org/projects/druid.html accordingly?
>> >
>> > On Thu, May 31, 2018 at 9:15 AM, Charles Allen wrote:
>> >
>> > > Hi all
>> > >
>> > > http://incubator.apache.org/projects/druid.html says that
>> > > https://issues.apache.org/jira/browse/DRUID is our issue tracker, but I
>> > > don't seem to have access to it. Does anyone know how to apply for access
>> > > using an existing Apache JIRA login?
>> > >
>> > > Thanks,
>> > > Charles Allen
>> > >
>> >
>>
>
>


Re: Access to jira

2018-05-31 Thread Gian Merlino
I don't see why not!

On Thu, May 31, 2018 at 9:21 AM, Charles Allen  wrote:

> Sounds good. I'd like to put some more formal tracking and responsibility
> to the remaining incubator items. Would github issues be the preferred
> place to do that?
>
> On Thu, May 31, 2018 at 9:20 AM Gian Merlino wrote:
>
> > I think we are planning to keep using GitHub issues, based on the
> > discussion in the migration logistics thread. And based on the fact that
> > Apache seems to allow that now (https://github.com/apache/fluo was given
> > as an example). So probably the right thing to do is update
> > http://incubator.apache.org/projects/druid.html accordingly?
> >
> > On Thu, May 31, 2018 at 9:15 AM, Charles Allen wrote:
> >
> > > Hi all
> > >
> > > http://incubator.apache.org/projects/druid.html says that
> > > https://issues.apache.org/jira/browse/DRUID is our issue tracker, but I
> > > don't seem to have access to it. Does anyone know how to apply for access
> > > using an existing Apache JIRA login?
> > >
> > > Thanks,
> > > Charles Allen
> > >
> >
>


Re: Access to jira

2018-05-31 Thread Charles Allen
Sounds good. I'd like to put some more formal tracking and responsibility
to the remaining incubator items. Would github issues be the preferred
place to do that?

On Thu, May 31, 2018 at 9:20 AM Gian Merlino  wrote:

> I think we are planning to keep using GitHub issues, based on the
> discussion in the migration logistics thread. And based on the fact that
> Apache seems to allow that now (https://github.com/apache/fluo was given
> as an example). So probably the right thing to do is update
> http://incubator.apache.org/projects/druid.html accordingly?
>
> On Thu, May 31, 2018 at 9:15 AM, Charles Allen  wrote:
>
> > Hi all
> >
> > http://incubator.apache.org/projects/druid.html says that
> > https://issues.apache.org/jira/browse/DRUID is our issue tracker, but I
> > don't seem to have access to it. Does anyone know how to apply for access
> > using an existing Apache JIRA login?
> >
> > Thanks,
> > Charles Allen
> >
>


Re: Access to jira

2018-05-31 Thread Gian Merlino
I think we are planning to keep using GitHub issues, based on the
discussion in the migration logistics thread. And based on the fact that
Apache seems to allow that now (https://github.com/apache/fluo was given as
an example). So probably the right thing to do is update
http://incubator.apache.org/projects/druid.html accordingly?

On Thu, May 31, 2018 at 9:15 AM, Charles Allen  wrote:

> Hi all
>
> http://incubator.apache.org/projects/druid.html says that
> https://issues.apache.org/jira/browse/DRUID is our issue tracker, but I
> don't seem to have access to it. Does anyone know how to apply for access
> using an existing Apache JIRA login?
>
> Thanks,
> Charles Allen
>


Access to jira

2018-05-31 Thread Charles Allen
Hi all

http://incubator.apache.org/projects/druid.html says that
https://issues.apache.org/jira/browse/DRUID is our issue tracker, but I
don't seem to have access to it. Does anyone know how to apply for access
using an existing Apache JIRA login?

Thanks,
Charles Allen


Re: A question about Druid design

2018-05-31 Thread Anastasia Braginsky
Hi Gian,
Thanks for the explanations!
I have one more question:

You say that
"...the RollupFactsHolder there will be a _single_ fact row per TimeAndDims...
But with the PlainFactsHolder there may be more than one fact row per
TimeAndDims..."

In PlainFactsHolder we actually have more than one fact row per Timestamp,
or am I missing something? I mean, in RollupFactsHolder could you scan only
the TimeAndDims (leading to rows) with some Timestamp and get the same result?
Is it true that TimeAndDims are ordered first by time anyway?
I am most likely missing something; I would just like to understand what :)

Thanks,
Anastasia

On Wednesday, May 30, 2018, 10:56:26 AM GMT+3, Gian Merlino wrote:

Hi Anastasia,

1) At ingestion time the FactsHolder is sorted. The unsorted code path is
used by groupBy v1, which hasn't been common since groupBy v2 was made the
default a few releases ago. So I would only worry about the sorted case.

2) PlainFactsHolder is used when the user has disabled rollup at ingestion
time. The idea is that with the RollupFactsHolder there will be a _single_
fact row per TimeAndDims (and Druid may combine multiple input rows into
one indexed fact row). But with the PlainFactsHolder there may be more than
one fact row per TimeAndDims (in particular: there will be one fact row per
input row).
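
To make the distinction in point 2 concrete, here is a minimal Java sketch.
It is not Druid's actual code: the class FactsHolderSketch, the Key type, and
the rollup/plain methods are hypothetical stand-ins, and a single long sum
stands in for the aggregators. It only illustrates "one fact row per key"
versus "one fact row per input row" on a sorted map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class FactsHolderSketch {

    // Hypothetical stand-in for a TimeAndDims key: ordered by timestamp
    // first, then by the dimension values (flattened to a string here).
    static final class Key implements Comparable<Key> {
        final long timestamp;
        final String dims;

        Key(long timestamp, String dims) {
            this.timestamp = timestamp;
            this.dims = dims;
        }

        @Override
        public int compareTo(Key other) {
            int byTime = Long.compare(timestamp, other.timestamp);
            return byTime != 0 ? byTime : dims.compareTo(other.dims);
        }
    }

    // Rollup-style: input rows with an equal key are combined, so each key
    // maps to exactly one aggregated fact row (a running sum in this sketch).
    static TreeMap<Key, Long> rollup(long[] times, String[] dims, long[] metrics) {
        TreeMap<Key, Long> facts = new TreeMap<>();
        for (int i = 0; i < times.length; i++) {
            facts.merge(new Key(times[i], dims[i]), metrics[i], Long::sum);
        }
        return facts;
    }

    // Plain-style: nothing is combined, so one key can hold several fact
    // rows, one per input row.
    static TreeMap<Key, List<Long>> plain(long[] times, String[] dims, long[] metrics) {
        TreeMap<Key, List<Long>> facts = new TreeMap<>();
        for (int i = 0; i < times.length; i++) {
            facts.computeIfAbsent(new Key(times[i], dims[i]), k -> new ArrayList<>())
                 .add(metrics[i]);
        }
        return facts;
    }

    public static void main(String[] args) {
        long[] times = {1000L, 1000L, 2000L};
        String[] dims = {"page=a", "page=a", "page=b"};
        long[] metrics = {3L, 4L, 5L};

        // Two input rows share the key (1000, "page=a").
        System.out.println("rollup keys: " + rollup(times, dims, metrics).size());
        System.out.println("plain rows under first key: "
            + plain(times, dims, metrics).firstEntry().getValue());
    }
}
```

With the sample input, the two rows sharing a timestamp and dimensions
collapse into one summed row in the rollup map, while the plain map keeps
both rows under the same key.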

Hope this helps.

On Wed, May 30, 2018 at 12:14 AM, Anastasia Braginsky <
anas...@oath.com.invalid> wrote:

> Hi,
> Recall our suggestion to use the new concurrent map named Oak as a base
> for Incremental Index. Oak stands for Off-heap Allocated Keys; for more
> details please see issue #5698. We have made great progress with Oak
> integration and with stabilizing OakIndex performance. We have some questions
> regarding FactsHolder. As we explained in our design document and
> refactoring suggestion, we prefer to remove the FactsHolder usage in
> the OakIndex, because Oak maps the keys (Time&Dims) to the values
> (Aggregators) directly. Therefore the Oak mapping is always sorted and only
> from keys to values. From here we have two questions.
>
> 1. Unsorted FactsHolder: It is understandable that unsorted mapping via
> HashMap (O(1) access) might be faster than sorted mapping (O(log N) access).
> The question is whether the unsorted variant is used frequently. When is it
> used? And is it acceptable that in this case Oak will give slightly lower
> performance?
>
> 2. Regarding Plain- vs Rollup- FactsHolder: It can be seen that
> PlainFactsHolder is holding a queue of Key->Value (Time&Dims->Aggregator)
> per Timestamp, where the sorting is via Timestamp. Therefore, Oak
> mostly implements the sorted RollupFactsHolder logic. Additionally, Timestamp
> is also a part of Time&Dims and the sorting is initially according to
> Timestamp, then other dimensions. The question is what are the use-cases
> where the PlainFactsHolder rather than Rollup is used? And is there any
> functionality that can be given by Plain but not by Rollup?
>
> Thanks,
> Anastasia
>