ok





Best regards!
Yuhai Cen


On 10 December 2017 at 11:17, Jacky Li <jacky.li...@qq.com> wrote:
Hi Yuhai Cen,

As Ravindra said, I think we finally need two OutputFormats.

1. CarbonTableOutputFormat
This is needed to maintain the segment structure of carbondata and to enable 
all segment-related commands for the partitioned table, such as Show Segments, 
Delete Segment, etc.

2. CarbonFileOutputFormat
This will write carbondata files directly into the partition folder without the 
segment folder, so the segment-related commands may not work in this case. This 
OutputFormat is an incremental effort on top of the CarbonTableOutputFormat work.

So for now we are focusing on implementing CarbonTableOutputFormat; once it is 
done, CarbonFileOutputFormat can be added later.
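To make the difference between the two OutputFormats concrete, their target
layouts can be sketched as path construction. This is only an illustration:
the helper names and signatures below are invented for this sketch and are
not CarbonData's actual API.

```python
import posixpath

def table_output_path(table_path, partition_dir, segment_id, file_name):
    """CarbonTableOutputFormat-style layout: files live under a segment
    folder inside the partition folder, so segment-related commands
    (Show Segments, Delete Segment, ...) keep working."""
    return posixpath.join(table_path, partition_dir,
                          "Segment_%s" % segment_id, file_name)

def file_output_path(table_path, partition_dir, file_name):
    """CarbonFileOutputFormat-style layout: files go directly into the
    partition folder; without the segment folder, segment-related
    commands may not work."""
    return posixpath.join(table_path, partition_dir, file_name)

print(table_output_path("TABLE_PATH", "Customer=US", 0,
                        "part-00-12212.carbondata"))
# TABLE_PATH/Customer=US/Segment_0/part-00-12212.carbondata
print(file_output_path("TABLE_PATH", "Customer=US",
                       "part-00-12212.carbondata"))
# TABLE_PATH/Customer=US/part-00-12212.carbondata
```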

Regards,
Jacky


> On 9 December 2017 at 13:50, 岑玉海 <cenyuha...@163.com> wrote:
> 
> I still insist that if we want to make carbon a general fileformat in the 
> hadoop ecosystem, we should support the standard hive/spark folder structure.
> 
> we can use the folder structure like this:
>   TABLE_PATH
>       Customer=US
>                  |--Segment_0
>                           |---0-12212.carbonindex
>                           |---PART-00-12212.carbondata
>                           |---0-34343.carbonindex
>                           |---PART-00-34343.carbondata
> or
> TABLE_PATH
>   Customer=US
>        |--Part0
>             |--Fact
>                  |--Segment_0
>                           |---0-12212.carbonindex
>                           |---PART-00-12212.carbondata
>                           |---0-34343.carbonindex
>                           |---PART-00-34343.carbondata 
> 
> 
> 
> I know there will be some impact on compaction and segment management. 
> @Jacky @Ravindra @chenliang @David CaiQiang  can you estimate the impact?
>     
> 
> Best regards!
> Yuhai Cen
> 
> On 5 December 2017 at 15:29, Ravindra Pesala <ravi.pes...@gmail.com> wrote: 
> Hi Jacky, 
> 
> Here we have the main problem with the underlying segment-based design of 
> carbon. For every incremental load, carbon creates a segment and manages the 
> segments through the tablestatus file. The changes would be very big and the 
> impact high if we tried to change this design. We would also have a 
> backward-compatibility problem when the folder structure changes in new 
> loads. 
> 
> Regards, 
> Ravindra. 
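For readers unfamiliar with the design Ravindra describes, here is a toy
sketch of segment management through a status file. The data layout and field
names below are invented purely for illustration; CarbonData's actual
tablestatus format differs.

```python
def add_segment(tablestatus, segment_id, status="Success"):
    """Each incremental load appends a new segment entry to the
    table-status structure instead of rewriting existing data."""
    tablestatus.append({"segmentId": segment_id, "status": status})
    return tablestatus

def valid_segments(tablestatus):
    """Queries read only the segments whose load succeeded; a deleted
    segment is just marked, its files need not move."""
    return [s["segmentId"] for s in tablestatus if s["status"] == "Success"]

entries = []
add_segment(entries, "0")
add_segment(entries, "1")
add_segment(entries, "2", status="Marked for Delete")
print(valid_segments(entries))  # ['0', '1']
```

This is why moving to a hive/spark-style partition layout touches so much:
the segment entries, not the folder names, are what compaction and segment
commands key on.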
> 
> On 5 December 2017 at 10:12, 岑玉海 <cenyuha...@163.com> wrote: 
> 
> > Hi, Ravindra: 
> >    I read your design document. Why not use the standard hive/spark 
> > folder structure? Is there any problem with using the hive/spark folder 
> > structure? 
> > 
> > Best regards! 
> > Yuhai Cen 
> > 
> > 
> > On 4 December 2017 at 14:09, Ravindra Pesala <ravi.pes...@gmail.com> wrote: 
> > Hi, 
> > 
> > 
> > Please find the design document for standard partition support in carbon. 
> > https://docs.google.com/document/d/1NJo_Qq4eovl7YRuT9O7yWTL0P378HnC8WT0-6pkQ7GQ/edit?usp=sharing 
> > 
> > Regards, 
> > Ravindra. 
> > 
> > 
> > On 27 November 2017 at 17:36, cenyuhai11 <cenyuha...@163.com> wrote: 
> > The datasource API still has a problem: it does not support hybrid 
> > fileformat tables. A detailed description of hybrid fileformat tables is 
> > in this issue: https://issues.apache.org/jira/browse/CARBONDATA-1377. 
> > 
> > All partitions of a datasource table must have the same fileformat, so we 
> > can't change the fileformat to carbondata with the command "alter table 
> > table_xxx set fileformat carbondata;" 
> > 
> > So I think implementing TableReader is the right way. 
> > 
> > -- 
> > Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/ 
> > 
> > -- 
> > 
> > Thanks & Regards, 
> > Ravi 
> > 
> 
> 
> 
> --  
> Thanks & Regards, 
> Ravi 
