Hi Vishal,

I got your point. I have changed the document accordingly and updated it in Jira;
please check.

Regards,
Akash R Nilugal

On 2019/09/30 17:09:44, Kumar Vishal <kumarvishal1...@gmail.com> wrote: 
> Hi Akash,
> 
> In this design document you haven't mentioned how to handle data loading
> for the timeseries datamap for older segments [existing table].
> If the customer's main table data is also stored based on time [increasing
> time] in different segments, they can use this feature as well.
> 
> We can discuss and finalize the solution.
> 
> -Regards
> Kumar Vishal
> 
> On Mon, Sep 30, 2019 at 2:42 PM Akash Nilugal <akashnilu...@gmail.com>
> wrote:
> 
> > Hi Ajantha,
> >
> > Thanks for the queries and suggestions
> >
> > 1. Yes, this is a good suggestion; I will include this change. Both date and
> > timestamp columns are supported; the document will be updated.
> > 2. Yes, you are right.
> > 3. You are right. If the day level is not available, then we will try to
> > get the whole day's data from the hour level; if that is not available, as
> > explained in the design document, we will get the data from the datamap
> > UNION the data from the main table, based on the user's query.
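To illustrate the rollup described above, here is a hypothetical sketch (plain Python, not CarbonData's actual query-rewrite logic) of how a day-granularity result can be derived by aggregating hour-granularity datamap rows:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hour-granularity datamap rows: (hour timestamp, measure sum).
hourly_rows = [
    ("2019-09-30 00:00", 10),
    ("2019-09-30 01:00", 5),
    ("2019-10-01 00:00", 7),
]

def rollup_to_day(rows):
    """Aggregate hour-level sums up to day level, the way a day-granularity
    query could be answered from an hour-granularity datamap."""
    daily = defaultdict(int)
    for ts, value in rows:
        day = datetime.strptime(ts, "%Y-%m-%d %H:%M").date().isoformat()
        daily[day] += value
    return dict(daily)

print(rollup_to_day(hourly_rows))
# {'2019-09-30': 15, '2019-10-01': 7}
```

Only time ranges with no datamap coverage would then fall back to the main table, per the UNION approach described above.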
> >
> > Regards,
> > Akash R Nilugal
> >
> >
> > On 2019/09/30 06:56:45, Ajantha Bhat <ajanthab...@gmail.com> wrote:
> > > +1,
> > >
> > > I have some suggestions and questions.
> > >
> > > 1. In DMPROPERTIES, instead of 'timestamp_column' I suggest using
> > > 'timeseries_column', so that it won't give the impression that only the
> > > timestamp datatype is supported; also, please update the document with all
> > > the supported datatypes.
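For context, a hypothetical sketch of how the suggested property name might appear in a CREATE DATAMAP statement (the table, column, and granularity property names here are illustrative; the exact DDL is defined in the design document):

```sql
-- Illustrative only: 'timeseries_column' is the suggested rename of
-- 'timestamp_column'; other names are placeholders, not confirmed syntax.
CREATE DATAMAP sales_hour ON TABLE sales
USING 'timeseries'
DMPROPERTIES (
  'timeseries_column' = 'order_time',
  'hour_granularity'  = '1'
)
AS SELECT order_time, SUM(amount) FROM sales GROUP BY order_time;
```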
> > >
> > > 2. Querying this datamap table directly is also supported, right? Is
> > > rewriting the main table's query plan to refer to the datamap table meant
> > > to save the user from changing their query, or is there another reason?
> > >
> > > 3. If the user has not created a day-granularity datamap, but only an
> > > hour-granularity datamap: when a query has day granularity, will data be
> > > fetched from the hour-granularity datamap and aggregated, or will it be
> > > fetched from the main table?
> > >
> > > Thanks,
> > > Ajantha
> > >
> > > On Mon, Sep 30, 2019 at 11:46 AM Akash Nilugal <akashnilu...@gmail.com>
> > > wrote:
> > >
> > > > Hi xuchuanyin,
> > > >
> > > > Thanks for the comments/suggestions.
> > > >
> > > > 1. Preaggregate is productized, but not timeseries with preaggregate;
> > > > I think you got confused with that, if I'm right.
> > > > 2. Limitations such as auto sampling or rollup, which we will be
> > > > supporting now, and retention policies, etc.
> > > > 3. segmentTimestampMin: this I will consider in the design.
> > > > 4. RP is added as a separate task. I thought that, instead of
> > > > maintaining two variables, it is better to maintain one and parse it.
> > > > But I will consider your point based on feasibility during
> > > > implementation.
> > > > 5. We use an accumulator which takes a list, so before writing the
> > > > index files we take the min/max of the timestamp column, fill the
> > > > accumulator, and then we can access accumulator.value in the driver
> > > > after the load is finished.
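A minimal sketch of the accumulator semantics described above (plain Python standing in for Spark's CollectionAccumulator; the names are illustrative, not CarbonData's actual code): each task contributes its partition's (min, max) timestamp, and the driver reduces the collected values after the load finishes.

```python
# Illustrative stand-in for a Spark CollectionAccumulator: each load task
# appends its partition's (min, max) timestamp before writing index files.
accumulator = []  # plays the role of CollectionAccumulator.value

def record_partition_minmax(timestamps):
    """Executor side: contribute this partition's (min, max) timestamp."""
    accumulator.append((min(timestamps), max(timestamps)))

def load_minmax():
    """Driver side: overall (min, max) across all partitions of the load."""
    return (min(lo for lo, _ in accumulator),
            max(hi for _, hi in accumulator))

record_partition_minmax([1569810000, 1569813600])
record_partition_minmax([1569817200, 1569820800])
print(load_minmax())  # (1569810000, 1569820800)
```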
> > > >
> > > > Regards,
> > > > Akash R Nilugal
> > > >
> > > > On 2019/09/28 10:46:31, xuchuanyin <xuchuan...@apache.org> wrote:
> > > > > Hi Akash, glad to see the feature proposed, and I have some questions
> > > > > about it. Please note that the quoted passages below are taken from
> > > > > the design document attached in the corresponding Jira, and my
> > > > > comments follow the '===' markers.
> > > > >
> > > > > 1.
> > > > > "Currently carbondata supports timeseries on preaggregate datamap,
> > > > > but its an alpha feature"
> > > > > ===
> > > > > It has been some time since the preaggregate datamap was introduced
> > and
> > > > it
> > > > > is still **alpha**, why it is still not product-ready? Will the new
> > > > feature
> > > > > also come into the similar situation?
> > > > >
> > > > > 2.
> > > > > "there are so many limitations when we compare and analyze the
> > > > > existing timeseries database or projects which supports time series
> > > > > like apache druid or influxdb"
> > > > > ===
> > > > > What are the actual limitations? Please give an example as well.
> > > > >
> > > > > 3.
> > > > > "Segment_Timestamp_Min"
> > > > > ===
> > > > > Suggest using camel-case style like 'segmentTimestampMin'
> > > > >
> > > > > 4.
> > > > > "RP is way of telling the system, for how long the data should be
> > > > > kept"
> > > > > ===
> > > > > Since the function is simple, I'd suggest using 'retentionTime'=15
> > > > > and 'timeUnit'='day' instead of 'RP'='15_days'
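To make the tradeoff concrete, a hypothetical sketch of parsing a combined 'RP'='15_days' value into the two fields the alternative proposal would store separately (the function name and format are illustrative; the real property format is defined in the design document):

```python
def parse_rp(rp):
    """Parse a combined retention-policy value such as '15_days' into
    (retention_time, time_unit). Hypothetical sketch of the single-property
    approach; the separate-property approach would skip this parsing step."""
    amount, _, unit = rp.partition("_")
    if not amount.isdigit() or not unit:
        raise ValueError(f"malformed RP value: {rp!r}")
    return int(amount), unit

print(parse_rp("15_days"))  # (15, 'days')
```

The single combined property keeps the DDL surface smaller at the cost of this parsing and validation; two separate properties avoid the parse but must be validated as a pair.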
> > > > >
> > > > > 5.
> > > > > "When the data load is called for main table, use an spark
> > > > > accumulator to get the maximum value of timestamp in that load and
> > > > > return to the load."
> > > > > ===
> > > > > How can you get the spark accumulator? The load is launched using
> > > > > loading-by-dataframe, not using global-sort-by-spark.
> > > > >
> > > > > 6.
> > > > > For the rest of the content, still reading.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sent from:
> > > > > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> > > > >
> > > >
> > >
> >
> 
