Re: About bucket feature in carbon

2018-02-09 Thread Ravindra Pesala
Yes Jacky, we will refactor it and use the partition flow.

On 9 February 2018 at 13:44, Jacky Li <13561...@qq.com> wrote:

> Hi Ravindra,
>
> You mean we can do one round of refactoring for the bucketed table feature in
> CarbonData 1.4.
> I am fine with it.
>
> Regards,
> Jacky
>
>
> > On 9 February 2018, at 3:49 PM, Ravindra Pesala wrote:
> >
> > Hi Likun,
> >
> > I feel it is better to change the implementation to use Spark's bucketing
> > generation, just as standard Hive partitions are generated. It will be
> > easy to change once the partition feature is implemented. It is also a
> > useful feature for joining big tables, and hash-based buckets and
> > CLUSTERED BY make queries faster. So it is better to change the
> > implementation instead of removing it.
> >
> > Regards,
> > Ravindra.
> >
> > On 9 February 2018 at 13:14, Jacky Li  wrote:
> >
> >> Hi,
> >>
> >> One year ago, CarbonData 1.0.0 introduced the bucket table feature. It was
> >> expected to improve join performance by avoiding shuffling if both tables
> >> are bucketed on the same column with the same number of buckets.
> >>
> >> However, since this feature was introduced, personally speaking, it has not
> >> been widely used in the community, and it creates maintenance overhead for
> >> the developers in the community (for every new pull request, all
> >> bucket-related test cases need to be fixed).
> >>
> >> And now that carbon has integrated with the Spark standard partition,
> >> developers can add bucket support using the Spark bucketed table feature
> >> in future if it is required.
> >>
> >> So, I propose to remove the bucket feature after CarbonData 1.3.0 version.
> >> What do you think?
> >>
> >> Regards,
> >> Jacky
> >>
> >>
> >
> >
> > --
> > Thanks & Regards,
> > Ravi
>
>
>
>


-- 
Thanks & Regards,
Ravi


Re: About bucket feature in carbon

2018-02-09 Thread Jacky Li
Hi Ravindra,

You mean we can do one round of refactoring for the bucketed table feature in
CarbonData 1.4.
I am fine with it.

Regards,
Jacky


> On 9 February 2018, at 3:49 PM, Ravindra Pesala wrote:
> 
> Hi Likun,
> 
> I feel it is better to change the implementation to use Spark's bucketing
> generation, just as standard Hive partitions are generated. It will be
> easy to change once the partition feature is implemented. It is also a
> useful feature for joining big tables, and hash-based buckets and
> CLUSTERED BY make queries faster. So it is better to change the
> implementation instead of removing it.
> 
> Regards,
> Ravindra.
> 
> On 9 February 2018 at 13:14, Jacky Li  wrote:
> 
>> Hi,
>> 
>> One year ago, CarbonData 1.0.0 introduced the bucket table feature. It was
>> expected to improve join performance by avoiding shuffling if both tables
>> are bucketed on the same column with the same number of buckets.
>> 
>> However, since this feature was introduced, personally speaking, it has not
>> been widely used in the community, and it creates maintenance overhead for
>> the developers in the community (for every new pull request, all
>> bucket-related test cases need to be fixed).
>> 
>> And now that carbon has integrated with the Spark standard partition,
>> developers can add bucket support using the Spark bucketed table feature
>> in future if it is required.
>> 
>> So, I propose to remove the bucket feature after CarbonData 1.3.0 version.
>> What do you think?
>> 
>> Regards,
>> Jacky
>> 
>> 
> 
> 
> -- 
> Thanks & Regards,
> Ravi
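
To illustrate the point discussed in this thread, here is a minimal Python sketch (not CarbonData or Spark code) of why bucketing both tables on the same column with the same number of buckets lets a join run bucket by bucket, without repartitioning (shuffling) either side. The names `bucket_of`, `write_bucketed` and `bucketed_join`, and the choice of CRC32 as the hash function, are assumptions made only for this example.

```python
import zlib
from collections import defaultdict

def bucket_of(key, num_buckets):
    # Deterministic hash, so both tables agree on the bucket of a given key.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets

def write_bucketed(rows, key_index, num_buckets):
    # Simulate writing a table: rows are grouped into buckets by the join column.
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key_index], num_buckets)].append(row)
    return buckets

def bucketed_join(left_buckets, right_buckets, left_key, right_key):
    # Same column and same bucket count: bucket i can only match bucket i.
    if len(left_buckets) != len(right_buckets):
        raise ValueError("bucket counts differ; a shuffle would be required")
    result = []
    for left, right in zip(left_buckets, right_buckets):
        # Local hash join inside one pair of buckets.
        index = defaultdict(list)
        for row in right:
            index[row[right_key]].append(row)
        for row in left:
            for match in index.get(row[left_key], []):
                result.append(row + match)
    return result

orders = [(1, "book"), (2, "pen"), (3, "ink"), (1, "lamp")]
users = [(1, "alice"), (2, "bob"), (4, "dave")]
joined = bucketed_join(write_bucketed(orders, 0, 4),
                       write_bucketed(users, 0, 4), 0, 0)
```

Each pair of buckets could be joined by an independent task, which is where the expected saving comes from. If the bucket counts differ, the sketch refuses to join, which stands in for the shuffle that would otherwise be needed.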





Re: [Discussion] Implement Lucene DataMap to support full text search

2018-02-09 Thread Jacky Li
Hi,

Thanks for listing these sub-tasks.
I think we can start with the first three tasks, which simply enable writing the
text index and querying on it. The other sub-tasks (4 to 10) can be picked up by
community developers.

Regards,
Jacky

> On 7 February 2018, at 5:25 PM, David CaiQiang wrote:
> 
> Hi all,
> Let's discuss supporting full-text search.
> 
> A solution is to embed the Lucene search library, index the text columns of
> each segment, and support searching on text columns.
> 
> The sub-tasks are listed below.
> 
> 1). create a Lucene DataMap with the 'text_columns' property and build the
> Lucene DataMap for all existing segments
> 
>    create datamap <datamap_name> on <table_name>
>    using 'lucene'
>    dmproperties('text_columns'='col1,col2')
> 
> 2). load data should build the Lucene DataMap for the segment
> 
> 3). query with the Lucene DataMap when filters contain the match UDF
> 
> 4). compaction should rebuild the Lucene DataMap for the new segment
> 
> 5). update and delete data should sync the Lucene DataMap
> 
> 6). show DataMap for the Lucene DataMap
> 
> 7). delete segment should remove the Lucene DataMap of this segment
> 
> 8). drop table should remove the Lucene DataMap of all segments
> 
> 9). block the streaming feature if the table has a Lucene DataMap
> 
> 10). Pre-aggregate DataMap feature does not support the match UDF
> 
> Any suggestions or questions?
> 
> 
> -
> Best Regards
> David Cai
> --
> Sent from: 
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
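
As a rough illustration of sub-tasks 1) to 3), the following Python sketch builds one text index per segment and answers a match-style filter by returning the matching row ids in each segment. It is a toy inverted index, not Lucene and not the CarbonData DataMap API. The names `tokenize`, `SegmentTextIndex`, `build_index_for_segments` and `match` are hypothetical, and the sample rows are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on non-alphanumerics; a stand-in for a real analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())

class SegmentTextIndex:
    """Toy inverted index over the text columns of one segment."""

    def __init__(self, text_columns):
        self.text_columns = text_columns
        self.postings = defaultdict(set)  # (column, term) -> set of row ids

    def add_row(self, row_id, row):
        for column in self.text_columns:
            for term in tokenize(row.get(column, "")):
                self.postings[(column, term)].add(row_id)

    def match(self, column, query):
        # Row ids whose column contains every query term (AND semantics).
        terms = tokenize(query)
        if not terms:
            return set()
        hits = [self.postings.get((column, t), set()) for t in terms]
        return set.intersection(*hits)

def build_index_for_segments(segments, text_columns):
    # One index per segment, built for all existing segments.
    indexes = {}
    for segment_id, rows in segments.items():
        index = SegmentTextIndex(text_columns)
        for row_id, row in enumerate(rows):
            index.add_row(row_id, row)
        indexes[segment_id] = index
    return indexes

segments = {
    "0": [{"col1": "fast columnar store", "col2": "spark"},
          {"col1": "full text search", "col2": "lucene index"}],
    "1": [{"col1": "text index per segment", "col2": "datamap"}],
}
indexes = build_index_for_segments(segments, ["col1", "col2"])
hits = {seg: sorted(idx.match("col1", "text")) for seg, idx in indexes.items()}
```

Keeping the index per segment means a data load only has to build the index for its new segment, and deleting a segment only has to drop that segment's index, which is the shape sub-tasks 2) and 7) describe.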



Re: [Discussion] Implement Lucene DataMap to support full text search

2018-02-09 Thread David CaiQiang
It will be an independent module.
The layout may be like this:

carbondata
   |_ datamap
        |__ lucene



-
Best Regards
David Cai
--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/


Re: [Discussion] Implement Lucene DataMap to support full text search

2018-02-09 Thread 徐传印

Where will the code for this feature live?
I think it will be in a separate module. It would be better to treat datamaps 
as plugins that are not tightly coupled to the carbondata core/processing modules.
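
One way to picture the plugin idea is a small registry: the core looks up a datamap provider by the name given in `using '...'` and never imports the provider's code. The Python sketch below shows only that pattern. It is not how CarbonData is implemented, and `register_datamap`, `create_datamap` and `LuceneLikeDataMap` are hypothetical names.

```python
_DATAMAP_FACTORIES = {}

def register_datamap(provider_name):
    # Decorator a plugin module uses to register its factory under a name.
    def decorator(factory):
        _DATAMAP_FACTORIES[provider_name.lower()] = factory
        return factory
    return decorator

def create_datamap(provider_name, properties):
    # Core code resolves the provider by name only; it imports no plugin code.
    try:
        factory = _DATAMAP_FACTORIES[provider_name.lower()]
    except KeyError:
        raise ValueError("unknown datamap provider: " + provider_name)
    return factory(properties)

# The class below would live in a separate module (for example the lucene one).
@register_datamap("lucene")
class LuceneLikeDataMap:
    def __init__(self, properties):
        # Mirrors the 'text_columns' property from the proposal.
        self.text_columns = properties["text_columns"].split(",")

datamap = create_datamap("lucene", {"text_columns": "col1,col2"})
```

With this shape, adding or removing a datamap type touches only its own module, which is the decoupling from core/processing asked for above.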

> -Original Messages-
> From: "David CaiQiang" 
> Sent Time: 2018-02-07 17:25:39 (Wednesday)
> To: dev@carbondata.apache.org
> Cc: 
> Subject: [Discussion] Implement Lucene DataMap to support full text search
> 
> Hi all,
> Let's discuss supporting full-text search.
> 
> A solution is to embed the Lucene search library, index the text columns of
> each segment, and support searching on text columns.
> 
> The sub-tasks are listed below.
> 
> 1). create a Lucene DataMap with the 'text_columns' property and build the
> Lucene DataMap for all existing segments
> 
>    create datamap <datamap_name> on <table_name>
>    using 'lucene'
>    dmproperties('text_columns'='col1,col2')
> 
> 2). load data should build the Lucene DataMap for the segment
> 
> 3). query with the Lucene DataMap when filters contain the match UDF
> 
> 4). compaction should rebuild the Lucene DataMap for the new segment
> 
> 5). update and delete data should sync the Lucene DataMap
> 
> 6). show DataMap for the Lucene DataMap
> 
> 7). delete segment should remove the Lucene DataMap of this segment
> 
> 8). drop table should remove the Lucene DataMap of all segments
> 
> 9). block the streaming feature if the table has a Lucene DataMap
> 
> 10). Pre-aggregate DataMap feature does not support the match UDF
> 
> Any suggestions or questions?
> 
> 
> 
> -
> Best Regards
> David Cai
> --
> Sent from: 
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/