Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread manish gupta
+1

It is a good feature to have. Once the design document is uploaded we will
get a better idea of how it will be implemented.

Regards
Manish Gupta



Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread Kumar Vishal
Hi Xuchuanyin,

I am working on the design document and have already captured all the
points you mentioned. I will share it once it is finished.

-Regards
Kumar Vishal



Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread Ravindra Pesala
Hi Vishal,

+1

Thank you for starting a discussion on it. It will be a very helpful
feature that improves query performance and reduces the memory footprint.
Please add the design document for it.

Regards,
Ravindra.



-- 
Thanks & Regards,
Ravi


Re: [Discussion] Carbon Local Dictionary Support

2018-06-04 Thread xuchuanyin
Hi, Kumar:
  Local dictionary will be a nice feature, and other formats like Parquet
all support it.

  My concern is how you will implement this feature:

  1. What is the scope of `local`? Page level (all rows in a page),
blocklet level (all pages in a blocklet), or block level (all blocklets in
a block)?

  2. Where will you store the local dictionary?

  3. How will you decide whether to enable the local dictionary for a
column?

  4. Have you considered falling back to plain encoding if the local
dictionary encoding consumes more space?

  5. Will you still work on the V3 format or start a new V4 (or V3.1)
version?

  Anyway, I am concerned about data loading performance. Please pay
attention to it while implementing this feature.



--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


[Discussion] Carbon Local Dictionary Support

2018-06-04 Thread Kumar Vishal
Hi Community,

Currently CarbonData supports either global dictionary or No-Dictionary
(plain text stored in LV format) encoding for storing dimension column
data.

*Bottlenecks with Global Dictionary*

   1. The dictionary file is mutable, so global dictionary cannot be
      supported in storage environments that do not support append.
   2. It is difficult for users to determine which columns should be
      dictionary columns when the table has many columns.
   3. Global dictionary generation generally slows down the load process.

*Bottlenecks with No-Dictionary*

   1. Storage size is high.
   2. Queries on No-Dictionary columns are slower because more data is
      read and processed.
   3. Filtering is slower on No-Dictionary columns because the number of
      comparisons is high.
   4. Memory footprint is high.

The above bottlenecks can be solved by *generating a local dictionary for
low cardinality columns at the blocklet level*, which brings the following
benefits:

   1. Dictionary generation can be supported on different storage
      environments regardless of which file operations (such as append)
      they support.
   2. It removes the extra read/write IO on the dictionary files that are
      generated for global dictionary.
   3. Users no longer need to identify the dictionary columns themselves
      when a table has many columns.
   4. It gives better compression on low cardinality dimension columns.
   5. Filter queries on No-Dictionary columns with a local dictionary will
      be faster because the filter is applied on encoded data.
   6. It reduces the store size and memory footprint because only unique
      values are stored in the local dictionary and the column data is
      stored in encoded form.
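
As a rough illustration of the idea (a sketch only, not CarbonData's
actual implementation), the Python below builds a dictionary for the
values of one column in a single blocklet, stores the rows as integer
codes, and evaluates an equality filter on the codes. The function names
and the `max_cardinality` cut-off are hypothetical and chosen only for
this example; the proposal above does not specify how low cardinality is
detected.

```python
from typing import List, Optional, Tuple


def build_local_dictionary(
    values: List[str], max_cardinality: int
) -> Optional[Tuple[List[str], List[int]]]:
    """Encode the column values of one blocklet with a local dictionary.

    Returns (dictionary, encoded_rows), or None when the number of
    distinct values exceeds max_cardinality (an assumed cut-off), in
    which case the caller would keep the plain No-Dictionary form.
    """
    dictionary: List[str] = []   # code -> value, unique values only
    index = {}                   # value -> code, for fast lookup
    encoded: List[int] = []      # one integer code per row
    for value in values:
        code = index.get(value)
        if code is None:
            if len(dictionary) >= max_cardinality:
                return None      # too many distinct values: fall back
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        encoded.append(code)
    return dictionary, encoded


def filter_equals(dictionary: List[str], encoded: List[int], target: str) -> List[int]:
    """Return row positions equal to target by comparing integer codes."""
    if target not in dictionary:
        return []                # value absent from this blocklet
    code = dictionary.index(target)
    return [row for row, c in enumerate(encoded) if c == code]


blocklet = ["IN", "US", "IN", "CN", "US", "IN"]
result = build_local_dictionary(blocklet, max_cardinality=4)
if result is not None:
    dictionary, encoded = result
    matching_rows = filter_equals(dictionary, encoded, "US")
```

With this input the dictionary holds three unique values and the six rows
are stored as small integers. The filter resolves the literal to a code
once and then compares integers per row, and a blocklet that does not
contain the value at all can be skipped after the dictionary lookup.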

Please provide your comments. Any suggestions from the community are most
welcome. Please let me know if any clarification is needed.

-Regards
Kumar Vishal


Re: Support updating/deleting data for stream table

2018-06-04 Thread Raghunandan S
Hi,
Those are two steps of the same solution, not different solutions. We can
create a parent JIRA that covers both and implement only part of it for
now. The parent JIRA would be closed when all the child JIRAs are
implemented.

Regards
Raghu

On Sun, 3 Jun 2018, 1:07 pm Liang Chen wrote:

> Hi
>
> +1 for considering solution 1 first
>
> Regards
> Liang
>
> xm_zzc wrote
> > Hi Raghu:
> >   Yes, you are right. That is why I said solution 1 is not very precise
> > when some of the data you want to update/delete is still stored in
> > stream segments; solution 2 can handle the scenario you mentioned.
> >   But in my opinion, deleting historical data is a more common scenario
> > than updating data. The data size of a stream table grows day by day,
> > and users generally want to delete specific data to keep it from
> > getting too large. For example, a user who wants to keep data for one
> > year needs to delete the data older than one year every day. On the
> > other hand, solution 2 is more complicated than solution 1, so we need
> > to consider its implementation in depth.
> >   For the above reasons, Liang Chen, Jacky, David and I preferred to
> > implement solution 1 first. Is that OK with you?
> >
> >   Are there any other suggestions?