Re: [DISCUSS] Support transactional table in SDK

2018-12-19 Thread Nicholas
Hi Jacky,
I have already reviewed the design doc in CARBONDATA-3152. What is the
current progress of supporting transactional tables in the SDK? In my opinion,
creating the online segment first is necessary for now.





Re: [DISCUSS] Support transactional table in SDK

2018-12-10 Thread Jacky Li



> On Dec 8, 2018, at 3:53 PM, Liang Chen wrote:
> 
> Hi
> 
> Good idea, thank you for starting this discussion.
> 
> Agree with Ravi's comments; we need to double-check some limitations after
> introducing the feature.
> 
> Flink and Kafka integration can be discussed later.
> For using the SDK to write new data to an existing CarbonData table, some
> questions:
> 1. How do we ensure the SDK creates the same index, dictionary, etc. policies
> as the existing table?
Likun: the SDK uses the same writer provided in the carbondata-core module, so it
follows the same “policy” you mentioned.
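
For reference, a minimal sketch of how the SDK writer is typically used today.
The exact builder methods (e.g. withCsvInput) have shifted between SDK versions,
so treat this as illustrative rather than a fixed API:

import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;

public class SdkWriteExample {
  public static void main(String[] args) throws Exception {
    // The schema must match the target table so the same encoding/index
    // policy applies to the files the SDK produces.
    Field[] fields = new Field[] {
        new Field("name", DataTypes.STRING),
        new Field("age", DataTypes.INT)
    };
    CarbonWriter writer = CarbonWriter.builder()
        .outputPath("/tmp/carbon_output")   // today: a flat folder, no segment metadata
        .withCsvInput(new Schema(fields))
        .build();
    writer.write(new String[] {"alice", "30"});
    writer.close();                         // flushes a complete .carbondata file
  }
}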

> 2. Can you please help me understand this proposal further: what valuable
> scenarios require this feature?
Likun: currently, the SDK writes carbondata files into a flat folder and loses all
features built on top of the segment concept, such as show segment, delete segment,
compaction, datamap, MV, data update, delete, streaming, global dictionary,
etc.
By introducing this feature (support transactional table in SDK), applications
can use the SDK in a non-Spark environment to write new carbondata files and still
enjoy a transactional table with segment support and all of the features above.

Basically, these new APIs in the SDK add a new way to write data into an existing
carbondata table. They are for non-Spark environments such as Flink, Kafka Streams,
Cassandra, or any other Java application. A hypothetical sketch follows.
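
Since the API shape is still being settled in the design doc, the following is
only a hypothetical sketch; tablePath() and onlineSegment() are invented names
for illustration, not the actual proposal in CARBONDATA-3152:

// Hypothetical sketch only -- the real API is under discussion in
// CARBONDATA-3152. tablePath() and onlineSegment() are invented names.
CarbonWriter writer = CarbonWriter.builder()
    .tablePath("hdfs://nn/warehouse/sales")  // an existing transactional table
    .onlineSegment()                         // append into a new online segment
    .withCsvInput(schema)
    .build();
writer.write(new String[] {"2018-12-10", "100"});
writer.close();  // on close, the finished files are committed to tablestatus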

> 
> 
> After having the online segment, one can use this feature to implement
> ApacheFlink-CarbonData integration, or Apache KafkaStream-CarbonData
> integration, or just use the SDK to write new data to an existing CarbonData
> table; the integration level can be the same as the current
> Spark-CarbonData integration.
> 
> Regards
> Liang
> 
> 





Re: [DISCUSS] Support transactional table in SDK

2018-12-10 Thread Jacky Li
Hi Nicholas,

Yes, this is a feature required for flink-carbon to write to a transactional
table. You are welcome to participate in this. I think you can contribute by
reviewing the design doc in CARBONDATA-3152 first; after we settle on the API
we can open sub-tasks for this ticket.


Regards,
Jacky

> On Dec 10, 2018, at 1:55 PM, Nicholas wrote:
> 
> Hi Jacky,
> Carbon should support transactional tables in the SDK before the
> ApacheFlink-CarbonData integration. After having the online segment, I can use
> this feature to implement the ApacheFlink-CarbonData integration. Therefore,
> can I participate in the development of this feature, to facilitate the
> ApacheFlink-CarbonData integration?
> 
> 





Re: [DISCUSS] Support transactional table in SDK

2018-12-10 Thread Jacky Li



> On Dec 7, 2018, at 11:05 PM, ravipesala wrote:
> 
> Hi Jacky,
> 
> It's a good idea to support writing transactional tables from the SDK. But we
> need to add the following limitations as well:
> 1. It can work only on file systems that can take an append lock, like HDFS.
Likun: yes, since we need to overwrite the table status file, we need file locking. A sketch of the pattern follows.
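
For illustration only, a minimal sketch of the lock-update-unlock pattern on
HDFS, assuming an atomic create-if-absent as the lock primitive; the file names
are illustrative, and CarbonData's own lock classes would be used in practice:

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TableStatusUpdateSketch {
  public static void updateTableStatus(FileSystem fs, Path tableStatus,
      byte[] newContent) throws Exception {
    Path lockFile = new Path(tableStatus.getParent(), "tablestatus.lock");
    // create(path, overwrite=false) fails if the file already exists, so it
    // acts as an exclusive lock acquisition on HDFS.
    FSDataOutputStream lock = fs.create(lockFile, false);
    try {
      // Write the new status to a temp file, then rename it into place, so a
      // reader never observes a partially written status file.
      Path tmp = new Path(tableStatus.getParent(), "tablestatus.tmp");
      try (FSDataOutputStream out = fs.create(tmp, true)) {
        out.write(newContent);
      }
      fs.delete(tableStatus, false);   // HDFS rename will not overwrite
      fs.rename(tmp, tableStatus);
    } finally {
      lock.close();
      fs.delete(lockFile, false);      // release the lock
    }
  }
}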

> 2. Compaction and delete segment cannot be done on an online segment until it
> is converted to a transactional segment.
Likun: Compaction and other data management work will still be done by the
CarbonSession application in a standard Spark cluster.

> 3. The SDK writer should be responsible for adding a complete carbondata file
> to the online segment once the writing is done; it should not add any
> half-written data.
Likun: yes, I have mentioned this in the design doc; a sketch of the write-then-rename commit is below.
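
For clarity, a hedged sketch of that rule, assuming the writer stages each data
file under a temporary name and renames it only after a full flush; the file
names and the helper are illustrative, not the actual writer code:

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CompleteFileCommitSketch {
  // Illustrative helper: stage the file, then rename atomically so the online
  // segment only ever sees complete .carbondata files.
  public static Path commitDataFile(FileSystem fs, Path segmentDir,
      byte[] fileContent) throws Exception {
    Path tmpFile = new Path(segmentDir, "part-0-0.carbondata.inprogress");
    try (FSDataOutputStream out = fs.create(tmpFile, true)) {
      out.write(fileContent);  // all blocklets for this file
      out.hsync();             // durable before the commit rename
    }
    Path finalFile = new Path(segmentDir, "part-0-0.carbondata");
    fs.rename(tmpFile, finalFile);  // atomic on HDFS: all or nothing
    // Only after this point is the file registered in the online segment.
    return finalFile;
  }
}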

> 
> Also, as we are trying to update the tablestatus from other modules like the
> SDK, we had better consider the segment interface first. Please go through
> the JIRA:
> https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-2827
> 
> 
> Regards,
> Ravindra
> 
> 
> 
> 



Re: [DISCUSS] Support transactional table in SDK

2018-12-07 Thread Liang Chen
Hi

Good idea, thank you for starting this discussion.

Agree with Ravi's comments; we need to double-check some limitations after
introducing the feature.

Flink and Kafka integration can be discussed later.
For using the SDK to write new data to an existing CarbonData table, some
questions:
1. How do we ensure the SDK creates the same index, dictionary, etc. policies
as the existing table?
2. Can you please help me understand this proposal further: what valuable
scenarios require this feature?


After having the online segment, one can use this feature to implement
ApacheFlink-CarbonData integration, or Apache KafkaStream-CarbonData
integration, or just use the SDK to write new data to an existing CarbonData
table; the integration level can be the same as the current
Spark-CarbonData integration.

Regards
Liang





Re: [DISCUSS] Support transactional table in SDK

2018-12-07 Thread ravipesala
Hi Jacky,

It's a good idea to support writing transactional tables from the SDK. But we
need to add the following limitations as well:
 1. It can work only on file systems that can take an append lock, like HDFS.
 2. Compaction and delete segment cannot be done on an online segment until it
is converted to a transactional segment.
 3. The SDK writer should be responsible for adding a complete carbondata file
to the online segment once the writing is done; it should not add any
half-written data.
 
Also, as we are trying to update the tablestatus from other modules like the
SDK, we had better consider the segment interface first. Please go through
the JIRA:
https://issues.apache.org/jira/projects/CARBONDATA/issues/CARBONDATA-2827


Regards,
Ravindra







[DISCUSS] Support transactional table in SDK

2018-12-06 Thread Jacky Li


Hi All,

In order to support application integration without a central coordinator, such
as Flink and Kafka Streams, transactional tables need to be supported in the SDK,
and a new type of segment called Online Segment is proposed. A rough illustration
is below.
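
As a rough illustration only (not the actual design; the real statuses live in
carbondata-core, and ONLINE is just the proposal in this thread), the new state
could sit alongside the existing segment statuses like this:

// Hypothetical illustration of where the proposed ONLINE state could sit;
// the actual SegmentStatus enum in carbondata-core differs in detail.
public enum SegmentStatusSketch {
  SUCCESS,             // committed and visible to queries
  MARKED_FOR_DELETE,   // logically deleted, awaiting cleanup
  INSERT_IN_PROGRESS,  // a load is running
  ONLINE               // proposed: being appended to by an SDK writer
}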

Since it is hard to describe the motivation and design in a good format in this
mail, I have attached a document to CARBONDATA-3152. Please review the doc and
provide your feedback.

https://issues.apache.org/jira/browse/CARBONDATA-3152

Regards,
Jacky