[DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-23 Thread Akash Nilugal
Hi Community,

Timeseries data are measurements or events that are tracked, monitored,
downsampled, and aggregated over time. Timeseries data analysis helps in
analyzing or monitoring aggregated data over a period of time to make better
business decisions. Since CarbonData supports OLAP datamaps such as
preaggregate and MV, and since time series is of utmost importance, we can
support timeseries for CarbonData over the MV datamap model.

Currently CarbonData supports timeseries on the preaggregate datamap, but it
is an alpha feature, and there are many limitations when we compare it with
existing timeseries databases or projects that support time series, such as
Apache Druid or InfluxDB. So, in this feature we can support timeseries while
avoiding the limitations of the current system. After analyzing the existing
timeseries databases such as InfluxDB and Apache Druid, I have prepared a
solution/design document. Any inputs, improvements, or suggestions are most
welcome.

I have created JIRA https://issues.apache.org/jira/browse/CARBONDATA-3525 for
this. Later I will create sub-JIRAs for tracking.
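
To make the proposal concrete, here is a rough sketch of what the datamap DDL
could look like, assuming a CarbonData-enabled SparkSession named spark. The
property names ('timestamp_column', 'granularity') and the overall syntax are
illustrative placeholders taken from this discussion; the final design may
differ.

    // Hypothetical DDL sketch only; the syntax is not finalized in this proposal.
    // Assumes `spark` is an existing CarbonData-enabled SparkSession.
    spark.sql(
      """
        | CREATE DATAMAP sales_hour_agg ON TABLE sales
        | USING 'mv'
        | DMPROPERTIES (
        |   'timestamp_column' = 'event_time',
        |   'granularity' = 'hour')
        | AS
        |   SELECT event_time, SUM(price) AS sum_price
        |   FROM sales
        |   GROUP BY event_time
      """.stripMargin)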


Regards,
Akash R Nilugal


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-23 Thread chetan bhat
Hi Akash,

1. For the preaggregate table deprecation as part of subtask 7, can specific
details be provided in the design doc?
2. Will alter table add partition be supported now for a table that has a
timeseries MV datamap? If not supported, it can be noted in the design doc.
3. Will complex datatypes be supported for the timeseries MV datamap? If not
supported, it can be noted in the design doc.

Regards
Chetan


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-24 Thread Manhua
Hi Akash

Can the user specify the granularity, such as 5 minutes or 15 minutes?

Is there any constraint on the timestamp_column's datatype, e.g. DATE,
TIMESTAMP, or BIGINT (Unix timestamp)?


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-24 Thread Akash Nilugal
Hi Manhua,

Thanks for the questions. Please find my comments below.
1. The user can specify the granularity as, say, minute; we take that as 1
unit (1 minute in this case) and store the data. Then, at query time, he can
mention in the UDF how many minutes of data he wants for a minute-level query
(see the sketch below).

2. Since we will be loading data into the datamap from the fact table, there
is no restriction on the datatype.
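
For illustration, a minute-level query could then look roughly like the
sketch below; the UDF shape follows the existing preaggregate timeseries
feature, and the table/column names are made up.

    // Illustrative only: minute-level aggregation through the timeseries UDF,
    // mirroring the shape used by the existing preaggregate timeseries feature.
    spark.sql(
      """
        | SELECT timeseries(event_time, 'minute') AS ts_minute, SUM(price)
        | FROM sales
        | GROUP BY timeseries(event_time, 'minute')
      """.stripMargin)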

Regards,
Akash


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-24 Thread Akash Nilugal
Hi Chetan,

1. Deprecated means not recommended for use, so I think no further details are
required for that.
2. Complex datatypes and add partition are not supported; this will be updated
in the document.


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-28 Thread xuchuanyin
Hi Akash, glad to see this feature proposed; I have some questions about it.
Please note that in each item below the quoted text comes from the design
document attached in the corresponding JIRA, and my comment follows after
'==='.

1. 
"Currently carbondata supports timeseries on preaggregate datamap, but its
an alpha feature"
===
It has been some time since the preaggregate datamap was introduced and it is
still **alpha**; why is it still not product-ready? Will the new feature run
into a similar situation?

2.
"there are so many limitations when we compare and analyze the existing
timeseries database or projects which supports time series like apache druid
or influxdb"
===
What are the actual limitations? Please also give an example.

3.
"Segment_Timestamp_Min"
===
Suggest using camel-case style like 'segmentTimestampMin'

4.
"RP is way of telling the system, for how long the data should be kept"
===
Since the function is simple, I'd suggest using 'retentionTime'=15 and
'timeUnit'='day' instead of 'RP'='15_days'

5.
"When the data load is called for main table, use an spark accumulator to
get the maximum value of timestamp in that load and return to the load."
===
How can you get the Spark accumulator? The load is launched using
loading-by-dataframe, not global-sort-by-spark.

6.
For the rest of the content, still reading.






Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-29 Thread Akash Nilugal
Hi xuchuanyin,

Thanks for the comments/suggestions.

1. Preaggregate is productized, but not timeseries with preaggregate; I think
that is the confusion, if I am right.
2. Limitations like auto sampling/rollup (which we will be supporting now),
retention policies, etc.
3. segmentTimestampMin: I will consider this in the design.
4. RP is added as a separate task. I thought that instead of maintaining two
variables it would be better to maintain one and parse it, but I will consider
your point based on feasibility during implementation.
5. We use an accumulator which takes a list: before writing the index files we
take the min/max of the timestamp column and fill the accumulator, and then we
can access accumulator.value in the driver after the load is finished (a
sketch follows below).
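
A minimal, self-contained sketch of this idea; this is not the actual loading
code, and all names are illustrative. Each task records the (min, max) of the
timestamp values it handled, and the driver reads the merged list once the
job finishes.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.util.CollectionAccumulator
    import scala.collection.JavaConverters._

    object LoadMinMaxSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[2]").appName("minmax-sketch").getOrCreate()
        val acc: CollectionAccumulator[(Long, Long)] =
          spark.sparkContext.collectionAccumulator[(Long, Long)]("loadMinMax")

        // Stand-in for the per-task index-writing path: each partition adds
        // the min/max of the timestamps it saw.
        spark.sparkContext.parallelize(Seq(1L, 5L, 3L, 9L, 7L), 2)
          .foreachPartition { iter =>
            val ts = iter.toList
            if (ts.nonEmpty) acc.add((ts.min, ts.max))
          }

        // Driver side, after the load: overall min/max for this load, which
        // could be used to decide the ranges to load into the datamap.
        val pairs = acc.value.asScala
        println(s"load min = ${pairs.map(_._1).min}, max = ${pairs.map(_._2).max}")
        spark.stop()
      }
    }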

Regards,
Akash R Nilugal 


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-29 Thread Ajantha Bhat
+1.

I have some suggestions and questions.

1. In DMPROPERTIES, instead of 'timestamp_column' I suggest using
'timeseries_column', so that it won't give the impression that only the
timestamp datatype is supported; also, please update the document with all
the supported datatypes.

2. Querying this datamap table directly is also supported, right? Is rewriting
the main-table plan to refer to the datamap table meant to spare the user from
changing his query, or is there another reason?

3. If the user has not created a day-granularity datamap, but only an
hour-granularity one: when a query is at day granularity, will data be fetched
from the hour-granularity datamap and aggregated, or fetched from the main
table?

Thanks,
Ajantha


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-30 Thread Akash Nilugal
Hi Ajantha,

Thanks for the queries and suggestions.

1. Yes, this is a good suggestion; I will include this change. Both date and
timestamp columns are supported, which will be updated in the document.
2. Yes, you are right.
3. You are right: if the day level is not available, we will try to get the
whole day's data from the hour level; if that is not available either, as
explained in the design document, we will get the data from the datamap UNION
data from the main table based on the user query (see the sketch below).
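
A rough sketch of the kind of rewritten query this implies, with a
hypothetical hour-level datamap table sales_hour_dm and an arbitrary cutoff
standing in for "rows the datamap does not yet cover" (UNION ALL is used here,
in line with the later comments in this thread):

    // Illustrative rewrite only; the real rewrite happens inside the MV module.
    // `sales_hour_dm`, its columns, and the cutoff literal are hypothetical.
    spark.sql(
      """
        | SELECT ts_day, SUM(v) FROM (
        |   SELECT timeseries(ts_hour, 'day') AS ts_day, sum_price AS v
        |   FROM sales_hour_dm
        |   UNION ALL
        |   SELECT timeseries(event_time, 'day') AS ts_day, price AS v
        |   FROM sales
        |   WHERE event_time > '2019-09-30 23:00:00'
        | ) t
        | GROUP BY ts_day
      """.stripMargin)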

Regards,
Akash R Nilugal


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-30 Thread Kumar Vishal
Hi Akash,

In the design document you haven't mentioned how to handle data loading of the
timeseries datamap for older segments [existing tables]. If the customer's
main table data is also stored based on time [increasing time] in different
segments, he can use this feature as well.

We can discuss and finalize the solution.

-Regards
Kumar Vishal


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-09-30 Thread Akash Nilugal
Hi Vishal,

In the design document, in the impact analysis section, there is a topic on
compatibility/legacy stores: basically, for old tables, when the datamap is
created we load all the timeseries datamaps at their different granularities.
I think this should work; please let me know if you have further
suggestions/comments.

Regards,
Akash R Nilugal


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-01 Thread babu lal jangir
Hi Akash, thanks for the Time Series DataMap proposal.
Please check the points below.

1. During query planning, change UNION to UNION ALL; otherwise rows will be
lost if the same value appears.
2. Does the system start the load for the next granularity-level table as soon
as it matches the data condition, or does the next granularity-level table
have to wait till the current granularity-level table is finished? Please
handle this if possible.
3. Add a configuration to load multiple ranges at a time (across granularity
tables).
4. Please check whether the current data load's min/max is enough to determine
the current load. There should be no need to refer to the DataMap's min/max,
because data-loading range preparation can go wrong if loading happens from
multiple drivers. I think the rules below are enough for loading.
    4.a. Create MV should sync the data. On any failure, rebuild should sync
again; till then the MV will be disabled.
    4.b. Each load has independent ranges and should load only those ranges.
On failure the MV may go into the disabled state (only if an intermediate
range's load fails; the last load's failure will NOT disable the MV).
5. We can make data loading synchronous, because queries can anyway be served
from the fact table if any segment is in-progress in the datamap.
6. In the data-loading pipeline, on failures in an intermediate timeseries
datamap we can still continue loading the next level's data (ignore if
already handled).
   For example:
   DataMaps: hour, day, month level
   Load data (10 days): 2018-01-01 01:00:00 to 2018-01-10 01:00:00
   Failure at the hour level during the range
   2018-01-06 01:00:00 to 2018-01-06 01:00:00
   At this point the hour level has 5 days of data, so start loading the day
level.
7. Add a subtask to support loading of in-between missing time (incremental
but old records, e.g. if the timeseries device stopped working for some time).


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-01 Thread Akash Nilugal
Hi Vishal,

I got your point; I have changed it accordingly and updated the document in
the JIRA. Please check.

Regards,
Akash R Nilugal


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-01 Thread Akash Nilugal
Hi Babu,

Thanks for the inputs. Please find my comments below.
1. I will change UNION to UNION ALL (see the sketch below).
2. For auto datamap loading, once the data is loaded into the
lower-granularity datamap, we load the higher-level datamap from the
lower-level datamap. But as per your point, I think you are suggesting loading
from the main table itself.
3. Similar to the 2nd point; I think we can decide whether a configuration is
needed.
4. a. I think the max of the datamap is required to decide the range for the
load, because we may need it in the failure case.
   b. This point will be taken care of.
5. Yes, data loading is synchronous in the current design; as it is non-lazy,
it will happen with the main table load only.
6. Yes, this will be handled.
7. A task has already been added in the JIRA.
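
To illustrate point 1 with hypothetical slice tables: if the datamap slice and
the main-table slice each contribute an identical partial row, plain UNION
de-duplicates and silently drops one contribution before the final roll-up,
whereas UNION ALL keeps both.

    // dm_slice / fact_slice are hypothetical partial results feeding a final SUM.
    // UNION collapses two equal (ts, partial_sum) rows into one, under-counting;
    // UNION ALL preserves both contributions.
    val lossy = spark.sql(
      "SELECT ts, partial_sum FROM dm_slice UNION SELECT ts, partial_sum FROM fact_slice")
    val safe = spark.sql(
      "SELECT ts, partial_sum FROM dm_slice UNION ALL SELECT ts, partial_sum FROM fact_slice")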

Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-04 Thread Ravindra Pesala
Hi Akash,

I have the following suggestions.

1. I think it is redundant to specify granularity inside create datamap; the
user can use the respective granularity UDF in his query, like time(1h) or
time(1d), etc.

2. Better to create separate RP commands and let the user add the RP on the
datamap, or even on the main table. It would be more manageable if you make RP
an independent feature instead of including it in the datamap.

3. I am not getting why exactly we need an accumulator instead of using the
index min/max. Can you explain with a scenario?

4. Why store min/max at the segment level? We can get it from the datamap
also, right?

5. Is the union of high-granularity tables with low-granularity tables really
needed? Is any other timeseries DB doing it? Or do we have any known use case?

Regards,
Ravindra.


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-04 Thread Akash Nilugal
Hi Ravi,

1. I forgot to mention the CTAS query in the create datamap statement; I have
updated the document. During create datamap the user can give the granularity,
and during query just the UDF. That should be fine, right?
2. I think maybe we can mention the RP policy in the DM properties as well,
and then perhaps provide add RP, drop RP, and alter RP for existing and older
datamaps (a sketch of both options follows below). RP will be taken up as a
separate subtask and handled later. That should be fine, I think.
3. Consider a scenario where the datamap is already created and then a load
happens on the main table: I use the accumulator to get all the min/max values
to the driver, so that I can avoid reading the index files in the driver in
order to load the datamap.
   The other scenario is when the main table already has segments and then the
datamap is created; then we will read the index files from each segment to
decide the min/max of the timestamp column.
4. We are not storing min/max in the main table's table status; we are storing
it in the datamap table's table status file, so that it can be used to prepare
the plan during the query phase.
5. Other timeseries DBs support getting only the data present at hour or day
level, i.e. aggregated data. Since we cannot miss data, the plan is to get the
data from higher to lower levels. Maybe it does not make much difference from
minute to second, but it makes a difference from year to month, so we cannot
avoid aggregations from the main table.


Regards,
Akash R Nilugal


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-06 Thread Ravindra Pesala
Hi Akash,

1. I feel that the user providing the granularity is redundant; providing the
respective UDF in the select query should be enough.

2. I think it is better to add the RP management now itself; otherwise, if you
start by adding it to the DM properties as a temporary measure, it will never
be moved. Better to put in a little more effort to decouple it from datamaps.

3. I feel the accumulator is an added cost; we already have a feature in
development to load the datamap immediately after the load happens, so why not
use that? And if the datamap is already in memory, why do we need min/max at
the segment level?

4. I feel there must be some reason why other timeseries DBs do not support
the union of data. Consider a scenario where we have data from 1 pm to
4.30 pm, meaning the 4-5 pm data is still loading. When the user asks for the
data at hour level, I feel it is safe to give the data for the 1, 2, and 3
o'clock hours, because the 4 pm data is not yet complete. That way the user at
least comes to know that the 4 pm data is not available and can query the
lower-level data if he needs it.
I think it is better to get some real use cases for how users want this
timeseries data.

Regards,
Ravindra.
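
For illustration, the two DDL styles being debated might look roughly like the
sketch below. This is only a sketch: it assumes a SparkSession named spark,
the table and column names are made up, and the exact DMPROPERTIES keys and
UDF name were still under discussion in this thread.

// Style 1: granularity declared as a datamap property; the engine would
// internally rewrite the CTAS query with the matching timeseries UDF.
spark.sql(
  """CREATE DATAMAP sales_hourly ON TABLE sales USING 'mv'
    |DMPROPERTIES ('event_time'='order_time', 'granularity'='hour')
    |AS SELECT order_time, SUM(quantity) FROM sales GROUP BY order_time
  """.stripMargin)

// Style 2 (suggested above): the UDF in the CTAS query itself carries the
// granularity, so no extra property and no internal rewrite are needed.
spark.sql(
  """CREATE DATAMAP sales_hourly ON TABLE sales USING 'mv'
    |AS SELECT timeseries(order_time, 'hour'), SUM(quantity)
    |  FROM sales GROUP BY timeseries(order_time, 'hour')
  """.stripMargin)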


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-07 Thread Akash Nilugal
Hi Ravi,

1. i) During create datamap, the user does not mention the UDF in the CTAS
query; if the granularity is present in the DM properties, we internally
rewrite the CTAS query with the UDF and then load the data to the datamap, as
per the current design.
   ii) But if we ask the user to give the CTAS query with the UDF, there is no
need to rewrite the query internally; we can just load the data and avoid the
granularity in the DM properties.
Currently I'm planning to do the first one. Please give your input on this.

2. OK, we will not keep the RP management in the DM properties; we will use a
separate command and do proper decoupling.

3. I think you are referring to the cache pre-priming in the index server. The
problem with this is that we cannot be sure whether the cache was loaded for a
segment, because as per the pre-priming design, if loading the cache fails
after the data load to the main table, we ignore it, as the query takes care
of it. So we cannot completely rely on that feature for the min/max.
As for the accumulator, I'm not calculating anything again; I just take the
min/max before writing the index file during data load and use it in the
driver to prepare the data load ranges for the datamaps.

The reason to keep the segment min/max in the table status of the datamap is
that, first, it will be helpful in RP scenarios, and second, we will not miss
any data when loading the datamap from the main table [if data first came from
1:00 to 4:15 and the next load brings 5:10 to 6:00, there is a chance of
missing the 15 minutes of data from 4:00 to 4:15]. It will be helpful in
querying too, so we can avoid the problem I mentioned above with datamaps
loaded in the cache.

4. I agree, your point is a valid one. I will do more analysis on this based
on user use cases, and then we can decide finally. That would be better.

Please give your inputs/suggestions on the above points.

Regards,
Akash R Nilugal
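
As an illustration of the gap described in point 3, here is a minimal sketch
of how the next load range could be derived from the stored segment min/max
(all names here are hypothetical; the real design was still being worked out):

// Sketch: extend the next load range back to the floor of the last loaded
// max, so a partially loaded interval (4:00-4:15 at hour granularity) is
// re-covered when the next batch (5:10-6:00) arrives.
case class LoadRange(start: Long, end: Long)

def floorTo(ts: Long, granularityMs: Long): Long = ts - ts % granularityMs

def nextLoadRange(segMin: Long, segMax: Long,
                  lastLoadedMax: Option[Long],
                  granularityMs: Long): LoadRange = {
  val start = lastLoadedMax match {
    // Previous load stopped mid-interval: restart from that interval's floor.
    case Some(prevMax) if prevMax < segMin => floorTo(prevMax, granularityMs)
    case _ => floorTo(segMin, granularityMs)
  }
  LoadRange(start, segMax)
}

// With hour granularity: the previous load ended at 4:15 and the new data
// spans 5:10-6:00, so the range starts at 4:00 and 4:00-4:15 is not lost.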


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-07 Thread Ravindra Pesala
Hi Akash,

1. It is better to keep it simple and let the user provide the UDF he wants in
the query. Then there is no need to rewrite the query and no need to provide
an extra granularity property.

3. I got your point on why you want to use an accumulator to get the min/max.
What worries me is that it should not add complexity to generate the min/max,
as we already have this information available. I don't think we should be too
bothered about reading the min/max in the data loading phase, as loading is
already a heavy-duty job and adding a few more milliseconds does no harm. But
as you mentioned it is easier your way, so we can go ahead with it.


Regards,
Ravindra.
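
For comparison, the alternative hinted at in point 3 amounts to something like
the sketch below (hypothetical types only; CarbonData's real index reader
interfaces differ): fold the per-blocklet min/max that the index files already
record into a segment-level range, instead of collecting it again with an
accumulator.

// Sketch: derive a segment's event-time range from the min/max values that
// are already persisted in its index, avoiding a second pass over the data.
case class BlockletMinMax(min: Long, max: Long)

def segmentRange(blocklets: Seq[BlockletMinMax]): Option[(Long, Long)] =
  blocklets
    .reduceOption((a, b) =>
      BlockletMinMax(math.min(a.min, b.min), math.max(a.max, b.max)))
    .map(m => (m.min, m.max))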


Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-07 Thread Manhua
UDFs might have a performance problem: Spark built-in functions, Spark UDFs
and Hive UDFs perform differently.
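
A small sketch of the difference being pointed out, assuming a DataFrame df
with a timestamp column event_time: a built-in function such as date_trunc
stays visible to Catalyst and can be optimized, while an equivalent
hand-rolled UDF is opaque to the planner.

import org.apache.spark.sql.functions._

// Built-in: Catalyst sees date_trunc and can reason about it.
val byBuiltin = df.groupBy(date_trunc("HOUR", col("event_time"))).count()

// Hand-rolled UDF doing the same truncation: the optimizer cannot look
// inside it, so optimization opportunities around it are lost.
val truncToHour = udf((ts: java.sql.Timestamp) =>
  new java.sql.Timestamp(ts.getTime - ts.getTime % (3600 * 1000L)))
val byUdf = df.groupBy(truncToHour(col("event_time"))).count()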



Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-10-21 Thread Akash Nilugal
Hi All,

Based on further analysis of Druid and InfluxDB, the current design fails to
cover late-arriving data during load. So I have updated the design document to
support late data and attached it in the JIRA. Please help to review it;
suggestions are welcome.

Regards,
Akash
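
As a rough illustration of the late-data case (hypothetical names; the actual
handling is described in the design document): rows whose timestamps fall
below the datamap's current max land in intervals that were already
aggregated, and those intervals have to be identified and rebuilt.

// Sketch: find the start points of already-aggregated intervals that a
// late-arriving batch [segMin, segMax] invalidates.
def intervalsToRebuild(segMin: Long, segMax: Long,
                       datamapMax: Long, granularityMs: Long): Seq[Long] = {
  if (segMin >= datamapMax) Seq.empty // nothing arrived late
  else {
    val lateEnd = math.min(segMax, datamapMax)
    val first   = segMin - segMin % granularityMs
    val last    = lateEnd - lateEnd % granularityMs
    (first to last by granularityMs)  // interval start timestamps to rebuild
  }
}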



Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-11-18 Thread Kumar Vishal
+1
-Regards
Kumar Vishal



Re: [DISCUSSION] Support Time Series for MV datamap and autodatamap loading of timeseries datamaps

2019-11-18 Thread Kunal Kapoor
+1


Regards
Kunal Kapoor

>