[DISCUSS]The cluster version management

2020-10-11 Thread Houliang Qi
Hi all,


I’d like to start a discussion about cluster version management.

As someone mentioned, there should be a develop branch and a release
branch [1]. The develop branch receives the latest development work, and the
release branch holds only the features that will ship in the upcoming
release.

Likewise, I hope that cluster development can have two branches:
cluster_develop and cluster_release.

The cluster_develop branch merges the code from the stand-alone develop
branch with the latest cluster development.

The cluster_release branch is used to release selected features. Only
features that are going to be released, and bug fixes, may be merged into
cluster_release; other newly developed features may not. Once
cluster_release has been fully tested, it can be released.

For the upcoming release, I would like to check out a cluster_release
branch after the cluster_premerge branch is merged into master (develop), and
then merge master into the cluster_new (cluster_develop) branch.

I also think that new features that have not been tested, or that need more
than one month of testing, should be switched off when the cluster version is
released.
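One way to picture the "switch off" idea is a default-off feature flag read from configuration. The sketch below is only an illustration under assumptions: the class name FeatureSwitchSketch and the property key pattern feature.<name>.enabled are hypothetical and are not existing IoTDB configuration options.

```java
import java.util.Properties;

// Hypothetical sketch of a default-off feature switch; not IoTDB code.
final class FeatureSwitchSketch {
    private final Properties props;

    FeatureSwitchSketch(Properties props) {
        this.props = props;
    }

    // A feature is on only if the configuration explicitly enables it,
    // so untested features stay off in a release unless someone opts in.
    boolean isEnabled(String feature) {
        String key = "feature." + feature + ".enabled"; // assumed key pattern
        return Boolean.parseBoolean(props.getProperty(key, "false"));
    }
}
```

With this shape, a release build would ship a configuration that lists only the tested features, and anything missing from it is treated as disabled.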

Does anyone have some ideas about this?


[1] 
https://lists.apache.org/thread.html/rf7dce8d4cfcf4001feeba139cc897d6b40a1741e06ef87aabd56d8c9%40%3Cdev.iotdb.apache.org%3E




Thanks,
---
Houliang Qi



[RESULT][VOTE]Migrate the default branch from "master" to "main"/"develop"

2020-10-11 Thread Houliang Qi
Hi all,
As the vote [1] has been open for more than 72 hours, the results are as follows:

[D] Develop +7
[M] Main +1
[K] Keep master 0



Thanks to everybody who voted!


[1] 
https://lists.apache.org/thread.html/rf7dce8d4cfcf4001feeba139cc897d6b40a1741e06ef87aabd56d8c9%40%3Cdev.iotdb.apache.org%3E


Thanks,
---
Houliang Qi



Re: [VOTE] Migrate the default branch from "master" to "main"/"develop"

2020-10-11 Thread Houliang Qi
Hi Xiangdong,


Thank you for your advice. I will start a new discussion thread as you suggested.


Thanks,
---
Houliang Qi



Re: Share some experiment results about Gorilla encoding algorithm

2020-10-11 Thread Xiangdong Huang
Hi,

> I think we can change the name of the old Gorilla encoding to
> TSEncoding.OLD_GORILLA in the code under the premise of ensuring the
> compatibility of the old TsFiles, and then reserve TSEncoding.GORILLA for
> the re-implemented version. This may minimize the impact on users.

I prefer this approach. OLD_GORILLA can still be serialized as "6", and we
then assign a new short value to the new GORILLA.
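The renaming scheme discussed in this thread can be sketched as an enum whose serialized code stays stable for the old encoder. This is a minimal illustration, not IoTDB's actual TSEncoding: the enum name EncodingSketch and its methods are hypothetical, and the code 8 for the new encoder is an assumed placeholder, since the thread only fixes the value 6 for the old one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not IoTDB's actual TSEncoding enum.
enum EncodingSketch {
    // The legacy implementation keeps the code that existing files already contain.
    OLD_GORILLA((short) 6),
    // Assumed placeholder code for the re-implemented encoder.
    GORILLA((short) 8);

    private static final Map<Short, EncodingSketch> BY_CODE = new HashMap<>();

    static {
        for (EncodingSketch e : values()) {
            BY_CODE.put(e.code, e);
        }
    }

    private final short code;

    EncodingSketch(short code) {
        this.code = code;
    }

    // Value written into the file to identify the encoding.
    short serialize() {
        return code;
    }

    // Maps a stored code back to an encoding; unknown codes are rejected.
    static EncodingSketch deserialize(short code) {
        EncodingSketch e = BY_CODE.get(code);
        if (e == null) {
            throw new IllegalArgumentException("Unknown encoding code: " + code);
        }
        return e;
    }
}
```

Because old files only ever contain the code 6, they keep resolving to the legacy decoder, while newly written files can carry the new code without a TsFile version bump.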

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院




Re: [VOTE] Migrate the default branch from "master" to "main"/"develop"

2020-10-11 Thread Xiangdong Huang
Hi Houliang,

Thanks for raising this.

1. You'd better start a new thread entitled "[VOTE][RESULT] ." for
the vote.
2. Start another thread to discuss the cluster version management.

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院




Re: [VOTE] Migrate the default branch from "master" to "main"/"develop"

2020-10-11 Thread Houliang Qi
Hi all,
As the vote has been open for more than 72 hours, the results are as follows:


[D] Develop +7
[M] Main +1
[K] Keep master 0


I’d also like to start a new discussion about this:


As someone mentioned, there should be a develop branch and a release
branch. The develop branch receives the latest development work, and the
release branch holds only the features that will ship in the upcoming
release.


Likewise, I hope that cluster development can have two branches:
cluster_develop and cluster_release.


The cluster_develop branch merges the code from the stand-alone develop
branch with the latest cluster development.


The cluster_release branch is used to release selected features. Only
features that are going to be released, and bug fixes, may be merged into
cluster_release; other newly developed features may not. Once
cluster_release has been fully tested, it can be released.


For the upcoming release, I would like to check out a cluster_release
branch after the cluster_premerge branch is merged into master (develop), and
then merge master into the cluster_new (cluster_develop) branch.


I also think that new features that have not been tested, or that need more
than one month of testing, should be switched off when the cluster version is
released.


Does anyone have some ideas about this?


Thanks,
---
Houliang Qi
On 09/24/2020 02:38,Kevin A. McGrail wrote:
I am +1 to rename it but don't have any good input on what would be a
good name to use going forward.  I'll defer to others on that.

On 9/22/2020 4:23 AM, Xiangdong Huang wrote:
Hi,

There is a movement to move the default branch from "master" to "main" or
"develop", for two reasons:

- Many people around the world think the word "master" carries another
meaning.
- The word "master" does not clearly describe the branch's purpose. As we
use the branch as our main working/developing branch, "main" or "develop"
may be better.

We had a discussion on private@ and I think it is time to start a vote in
public.

So, I'd like to call a formal vote for changing the default branch:

- [M] main
- [D] develop
- [K] Keep "master"

The vote will last at least 72 hours.
The name that gets the most votes (and at least 3 votes) wins.

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院

--
Kevin A. McGrail
kmcgr...@apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


Re: Draft for board report

2020-10-11 Thread Xiangdong Huang
Thanks, Houliang.
You are right. I copied that project's template...
Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院




Re:Draft for board report

2020-10-11 Thread Houliang Qi
Hi Xiangdong,
Thanks for summarizing IoTDB's main work over the last month. I have only one
minor issue:
> The Apache CarbonData is an IoT native database with high performance ...
Maybe it should be:
The Apache IoTDB is an IoT native database with high performance...




Thanks,
---
Houliang Qi




Re: Share some experiment results about Gorilla encoding algorithm

2020-10-11 Thread Steve Su
Hi,

From my point of view, since the reimplementation of this algorithm does not 
change the structure of TsFile, there is no need to upgrade the version number 
of TsFile to 03.

I think we can change the name of the old Gorilla encoding to 
TSEncoding.OLD_GORILLA in the code under the premise of ensuring the 
compatibility of the old TsFiles, and then reserve TSEncoding.GORILLA for the 
re-implemented version. This may minimize the impact on users.

What do you think? :)

Steve Su

-- Original Message --
From: "dev" ;
Sent: Saturday, October 10, 2020, 11:35 PM
To: "dev";
Subject: Re: Share some experiment results about Gorilla encoding algorithm

Hi,

Nice!

One question: if we reimplement the Gorilla algorithm, how should we handle
version compatibility?

1. Upgrade the TsFile version to 03, or
2. Add a new encoding name for the corrected Gorilla.

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


On Sat, Oct 10, 2020 at 10:20 PM, Steve Su wrote:

> Hi,
>
> Recently, we realized that the Gorilla encoding algorithm that has been
> used inside IoTDB may have some issues, because it will cause time series
> data (the value part) to become more space-consuming after encoding. This
> is not in line with expectations. Usually after using Gorilla encoding, the
> data will take up less space.
>
> I found a very good open source Gorilla algorithm implementation by
> Michael on Github (see https://github.com/burmanm/gorilla-tsc). I
> compared the difference in encoding / decoding time cost and compression
> rate between the version implemented by Michael and the version used
> internally by IoTDB, and found that the version used inside IoTDB does have
> a lot of room for improvement.
>
> See
> https://cwiki.apache.org/confluence/display/IOTDB/Gorilla+encoding+algorithm
> for more experiment details.
>
> I think we can refer to Michael's implementation to re-implement the
> algorithm inside IoTDB to reduce the compression rate (fix potential
> errors) and improve performance. I have created a JIRA (see
> https://issues.apache.org/jira/browse/IOTDB-938) for this. If possible, I
> would be happy to re-implement the algorithm.
>
> Thanks,
> Steve Su

Draft for board report

2020-10-11 Thread Xiangdong Huang
Hi all,


IoTDB graduated last month, and the board requires a report every month for
the first 3 months. Here is the draft. If there are no issues, I will
submit it on Sep 14th.


## Description:


- The Apache CarbonData is an IoT native database with high performance

for data management and analysis on both the edge and the cloud.



## Issues:


 - There are no new issues requiring board attention at this time.


## Activity:


As IoTDB already reported in Sep, we only summarize the activities from Sep
to Oct.


- Apache IoTDB graduated last month (Sep 16th). Common tasks have been done
for migrating IoTDB from the incubator to a TLP, including the repository,
the website description, the Travis-CI settings, etc.


- We gave two public talks about IoTDB: one is "Use cases and optimizations
of IoTDB" at ApacheCon 2020, and the other is "IoTDB and Hadoop: Connecting
the edge and the cloud open source ecosystem for IIoT" at a Hadoop meetup in
Shanghai, China.


- Two proposals were submitted and accepted for Outreachy internships


- We are organizing the design documents on IoTDB's cwiki space



- We are working on v0.11

- https://cwiki.apache.org/confluence/display/IOTDB/v0.11


## Health Report:


As IoTDB already reported in Sep, we only summarize the health report from
Sep to Oct.


- Commit activity:

  - 434 commits in the last month


- GitHub PR activity:

  - 136 PRs opened on GitHub, last month

  - 108 PRs closed on GitHub, last month



## Releases:


  - 0.10.1 was released on 2020-08-23



## Project Composition:


 - There are currently 35 committers and 23 PMC members in this project.

 - The Committer-to-PMC ratio is roughly 3:2.



## Community changes:


 - Chao Wang was added as a committer on 2020-09-03


## JIRA activity:


 - 81 issues opened in JIRA, last month

Best,
---
Xiangdong Huang
School of Software, Tsinghua University


Re: Does someone want to maintain the TsFile Golang version

2020-10-11 Thread Xiangdong Huang
Hi Giorgio,

If it is just an IDL (interface description language), we still have to
implement these interfaces in each language.

I do not think protobuf works. It can describe the format of TsFile, but
using protobuf to serialize the data may not be a good idea,
because it then becomes hard to know the meaning of each byte.

Besides, I am not sure protobuf can provide the best compression.

For example, the first 12 bytes are the magic string plus the version. With
protobuf, how would we enforce that?

Many fields would probably have to be defined as bytes (defining them as
strings would waste space, since we already know the length of each string).

But maybe we can do some experiments.
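To make the fixed-offset concern concrete, here is a small sketch of writing and checking a 12-byte header by hand. It is an illustration under assumptions, not the TsFile reader: the 6-byte ASCII magic "TsFile", the 6-byte ASCII version field, and the class name HeaderSketch are assumed for the example; the thread only states that the first 12 bytes hold the magic string plus the version.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a hand-rolled fixed-layout header; not IoTDB code.
final class HeaderSketch {
    static final String MAGIC = "TsFile";   // assumed 6-byte magic string
    static final int VERSION_LENGTH = 6;    // assumed 6-byte version field
    static final int HEADER_LENGTH = MAGIC.length() + VERSION_LENGTH; // 12 bytes

    // Writes magic + version at fixed offsets, with no tags or length prefixes.
    static byte[] write(String version) {
        byte[] v = version.getBytes(StandardCharsets.US_ASCII);
        if (v.length != VERSION_LENGTH) {
            throw new IllegalArgumentException("version must be " + VERSION_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH);
        buf.put(MAGIC.getBytes(StandardCharsets.US_ASCII));
        buf.put(v);
        return buf.array();
    }

    // Validates the magic string and returns the version found at bytes 6..11.
    static String readVersion(byte[] file) {
        if (file.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("file shorter than header");
        }
        String magic = new String(file, 0, MAGIC.length(), StandardCharsets.US_ASCII);
        if (!MAGIC.equals(magic)) {
            throw new IllegalArgumentException("bad magic string: " + magic);
        }
        return new String(file, MAGIC.length(), VERSION_LENGTH, StandardCharsets.US_ASCII);
    }
}
```

Every byte position here has a known meaning, which is the property the message is worried about losing when a general-purpose serializer decides the layout.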

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


On Sun, Oct 11, 2020 at 4:19 PM, Giorgio Zoppi wrote:

> Hello,
> We definitely need an IDL for the TsFile; from the IDL we can define a
> serialization mechanism that is not language dependent.
> Can TsFile be described with protobuf, for example?
> BR,
> Giorgio
>


Re: Does someone want to maintain the TsFile Golang version

2020-10-11 Thread Giorgio Zoppi
Hello,
We definitely need an IDL for the TsFile; from the IDL we can define a
serialization mechanism that is not language dependent.
Can TsFile be described with protobuf, for example?
BR,
Giorgio