Hi Akash,

In this design document you haven't mentioned how to handle data loading
for the timeseries datamap for older segments (existing tables).
If the customer's main table data is also stored based on time (increasing
time) across different segments, he can use this feature as well.
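For illustration only: if the new datamap were to follow the existing
deferred-rebuild convention, older segments might be backfilled roughly as
below. This is a sketch under that assumption; the session object, property
names, and DDL are placeholders taken from this thread, not from the design
document.

// Hypothetical: create the timeseries datamap without loading it, then
// trigger an explicit rebuild that scans the already-existing segments.
// 'carbon' is assumed to be a CarbonSession.
carbon.sql(
  """CREATE DATAMAP sales_hour ON TABLE sales
    |USING 'timeseries'
    |WITH DEFERRED REBUILD
    |DMPROPERTIES ('timeseries_column'='order_time',
    |              'granularity'='hour')
    |AS SELECT order_time, sum(price) FROM sales GROUP BY order_time
  """.stripMargin)
carbon.sql("REBUILD DATAMAP sales_hour")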

We can discuss and finalize the solution.

-Regards
Kumar Vishal

On Mon, Sep 30, 2019 at 2:42 PM Akash Nilugal <akashnilu...@gmail.com>
wrote:

> Hi Ajantha,
>
> Thanks for the queries and suggestions
>
> 1. Yes, this is a good suggestion; I'll include this change. Both date and
> timestamp columns are supported, and this will be updated in the document.
> 2. Yes, you are right.
> 3. You are right: if a day-level datamap is not available, then we will try
> to get the whole day's data from the hour level; if that is not available
> either, as explained in the design document, we will get the data from the
> datamap UNION the data from the main table, based on the user's query (see
> the sketch below).
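>
> For illustration, the fallback in point 3 might behave like this minimal
> sketch (assuming a CarbonSession named carbon; the table, datamap, and
> function names are placeholders from this thread, not final syntax):
>
> // User query at day granularity, while only an hour-level datamap exists.
> // Running carbon.sql(userQuery) should give the same result as the
> // rewritten form below.
> val userQuery =
>   """SELECT timeseries(order_time, 'day'), sum(price)
>     |FROM sales
>     |GROUP BY timeseries(order_time, 'day')""".stripMargin
> // Conceptually the plan is rewritten to roll up the hour-level datamap,
> // and the time range the datamap does not yet cover is read from the
> // main table and UNIONed in:
> val rewritten =
>   """SELECT timeseries(event_time, 'day'), sum(agg_price)
>     |FROM sales_hour
>     |WHERE event_time < '2019-09-30 00:00:00'
>     |GROUP BY timeseries(event_time, 'day')
>     |UNION ALL
>     |SELECT timeseries(order_time, 'day'), sum(price)
>     |FROM sales
>     |WHERE order_time >= '2019-09-30 00:00:00'
>     |GROUP BY timeseries(order_time, 'day')""".stripMargin
> carbon.sql(rewritten).show()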
>
> Regards,
> Akash R Nilugal
>
>
> On 2019/09/30 06:56:45, Ajantha Bhat <ajanthab...@gmail.com> wrote:
> > +1,
> >
> > I have some suggestions and questions.
> >
> > 1. In DMPROPERTIES, instead of 'timestamp_column', I suggest using
> > 'timeseries_column', so that it won't give the impression that only the
> > timestamp datatype is supported; also, please update the document with
> > all the supported datatypes.
> >
> > 2. Querying this datamap table directly is also supported, right? Is
> > rewriting the main table's plan to refer to the datamap table meant to
> > spare the user from changing his query, or is there another reason?
> >
> > 3. If the user has not created a day-granularity datamap, but only an
> > hour-granularity datamap, and a query is at day granularity, will the
> > data be fetched from the hour-granularity datamap and aggregated, or
> > will it be fetched from the main table?
> >
> > Thanks,
> > Ajantha
> >
> > On Mon, Sep 30, 2019 at 11:46 AM Akash Nilugal <akashnilu...@gmail.com>
> > wrote:
> >
> > > Hi xuchuanyin,
> > >
> > > Thanks for the comments and suggestions.
> > >
> > > 1. Preaggregate is productized, but not timeseries with preaggregate;
> > > I think you got confused between the two, if I'm right.
> > > 2. Limitations such as auto sampling and rollup (which we will be
> > > supporting now), retention policies, etc.
> > > 3. segmentTimestampMin: I will consider this in the design.
> > > 4. RP is added as a separate task. I thought that instead of
> > > maintaining two variables it is better to maintain one and parse it,
> > > but I will consider your point based on feasibility during
> > > implementation.
> > > 5. We use an accumulator which takes a list: before writing the index
> > > files we take the min/max of the timestamp column, fill the
> > > accumulator, and then access accumulator.value in the driver after the
> > > load is finished. A sketch follows below.
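> > >
> > > A minimal Scala sketch of this accumulator flow, assuming a
> > > CollectionAccumulator of (min, max) pairs (the names here are
> > > illustrative, not the actual CarbonData internals):
> > >
> > > import scala.collection.JavaConverters._
> > > import org.apache.spark.sql.SparkSession
> > >
> > > val spark = SparkSession.builder().master("local[*]")
> > >   .appName("minmax-sketch").getOrCreate()
> > > // One (min, max) pair is added per partition/task.
> > > val minMaxAcc =
> > >   spark.sparkContext.collectionAccumulator[(Long, Long)]("tsMinMax")
> > > val data = spark.sparkContext.parallelize(1L to 1000L, 4)
> > > // In the real flow this would run just before the index files are
> > > // written for a load.
> > > data.foreachPartition { iter =>
> > >   val values = iter.toArray
> > >   if (values.nonEmpty) minMaxAcc.add((values.min, values.max))
> > > }
> > > // Driver side, after the load finishes: reduce the per-task pairs
> > > // to the segment-level min/max.
> > > val pairs = minMaxAcc.value.asScala
> > > println(s"range: [${pairs.map(_._1).min}, ${pairs.map(_._2).max}]")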
> > >
> > > Regards,
> > > Akash R Nilugal
> > >
> > > On 2019/09/28 10:46:31, xuchuanyin <xuchuan...@apache.org> wrote:
> > > > Hi Akash, glad to see the feature proposed; I have some questions
> > > > about this. Please note that some of the following items quote the
> > > > design document attached in the corresponding JIRA, with my comments
> > > > following the '===' marker.
> > > >
> > > > 1.
> > > > "Currently carbondata supports timeseries on preaggregate datamap,
> > > > but its an alpha feature"
> > > > ===
> > > > It has been some time since the preaggregate datamap was introduced
> > > > and it is still **alpha**; why is it still not product-ready? Will
> > > > the new feature also end up in a similar situation?
> > > >
> > > > 2.
> > > > "there are so many limitations when we compare and analyze the
> > > > existing timeseries database or projects which supports time series
> > > > like apache druid or influxdb"
> > > > ===
> > > > What are the actual limitations? Besides, please give an example of
> > > > this.
> > > >
> > > > 3.
> > > > "Segment_Timestamp_Min"
> > > > ===
> > > > Suggest using camel-case style like 'segmentTimestampMin'
> > > >
> > > > 4.
> > > > "RP is way of telling the system, for how long the data should be
> > > > kept"
> > > > ===
> > > > Since the function is simple, I'd suggest using 'retentionTime'=15
> > > > and 'timeUnit'='day' instead of 'RP'='15_days'
> > > >
> > > > 5.
> > > > "When the data load is called for main table, use an spark
> > > > accumulator to get the maximum value of timestamp in that load and
> > > > return to the load."
> > > > ===
> > > > How can you get the spark accumulator? The load is launched using
> > > > loading-by-dataframe, not using global-sort-by-spark.
> > > >
> > > > 6.
> > > > For the rest of the content, still reading.
> > > >
> > >
> >
>
