Re: [VOTE] Release 0.7.0, release candidate #1

2021-01-21 Thread Balaji Varadarajan
+1 (binding)

1. Ran release validation script successfully.
2. Build successful.
3. Quickstart succeeded.

Checking Checksum of Source Release
Checksum Check of Source Release - [OK]

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 34972  100 34972    0     0  68978      0 --:--:-- --:--:-- --:--:-- 68842

Checking Signature
Signature Check - [OK]

Checking for binary files in source release
No Binary Files in Source Release? - [OK]

Checking for DISCLAIMER
DISCLAIMER file exists ? [OK]

Checking for LICENSE and NOTICE
License file exists ? [OK]
Notice file exists ? [OK]

Performing custom Licensing Check
Licensing Check Passed [OK]

Running RAT Check
RAT Check Passed [OK]

On Thursday, January 21, 2021, 12:44:15 AM PST, Vinoth Chandar 
 wrote:  
 
 Hi everyone,

Please review and vote on the release candidate #1 for the version 0.7.0,
as follows:

[ ] +1, Approve the release

[ ] -1, Do not approve the release (please provide specific comments)



The complete staging area is available for your review, which includes:

* JIRA release notes [1],

* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint 7F2A3BEB922181B06ACB1AA45F7D09E581D2BCB6 [3],

* all artifacts to be deployed to the Maven Central Repository [4],

* source code tag "release-0.7.0-rc1" [5],



The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.



Thanks,

Release Manager



[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348721


[2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.7.0-rc1/

[3] https://dist.apache.org/repos/dist/release/hudi/KEYS

[4] https://repository.apache.org/content/repositories/orgapachehudi-1027/

[5] https://github.com/apache/hudi/tree/release-0.7.0-rc1
  

Re: [Announce] Clustering feature available in beta

2021-01-21 Thread Vinoth Chandar
This is really really promising! I think the gains will be much higher if
clustered over a larger window of commits!
We can keep improving this over time.

I'll be sure to link the results to the doc updates.

On Wed, Jan 20, 2021 at 10:40 PM Satish Kotha 
wrote:

> Hello everyone,
>
> We see ~60% improvement in query runtime for some datasets. See an example
> documented here
> <
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-PerformanceEvaluation
> >.
> Please try out this feature and share any feedback.
> I have included commands to run async clustering in the example section
> <
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-PerformanceEvaluation
> >.
> You could also setup inline clustering using commands in this section
> <
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-Commandstoscheduleandrunclustering
> >
> .
>
> Thanks
> Satish
>
> On Tue, Dec 22, 2020 at 10:32 PM Vinoth Chandar  wrote:
>
> > Please help us test this more, before RC is cut! :)
> >
> > On Tue, Dec 22, 2020 at 10:23 PM Satish Kotha
>  > >
> > wrote:
> >
> > > Hello all,
> > >
> > > Clustering feature landed 
> on
> > > master branch and is available in beta. This feature can be used to do
> > > following
> > > 1) Stitch small files into larger files
> > > 2) Change data layout on disk by sorting data using different columns
> > (for
> > > query/storage optimization)
> > >
> > > If you are interested in the above use cases, appreciate it if you can
> > try
> > > out this feature. I have included commands to run clustering in this
> > > section
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+speed+and+query+performance#RFC19Clusteringdataforspeedandqueryperformance-Commandstoscheduleandrunclustering
> > > >
> > > (along
> > > with caveats as this feature is still in beta).
> > >
> > > Any feedback is welcome. I'm also on #general room in slack. Please
> feel
> > > free to ping me if you have any questions/comments.
> > >
> > > Thanks
> > > Satish
> > >
> >
>
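
A rough spark-shell sketch of the inline clustering configs described above, for
readers who want to try the beta without digging through the RFC first. The config
keys are taken from the clustering RFC; the table name, fields, sort columns and
path are placeholders, and exact key names and defaults should be double-checked
against the 0.7.0 docs.

  import org.apache.spark.sql.SaveMode

  // toy input standing in for real data
  val inputDF = spark.range(0, 1000).selectExpr(
    "cast(id as string) as uuid",
    "current_timestamp() as ts",
    "concat('city_', cast(id % 10 as string)) as city")

  inputDF.write.format("hudi").
    option("hoodie.table.name", "clustering_demo").                          // placeholder table
    option("hoodie.datasource.write.recordkey.field", "uuid").
    option("hoodie.datasource.write.precombine.field", "ts").
    option("hoodie.datasource.write.partitionpath.field", "city").
    option("hoodie.clustering.inline", "true").                              // cluster inline after commits
    option("hoodie.clustering.inline.max.commits", "4").                     // trigger every 4 commits
    option("hoodie.clustering.plan.strategy.small.file.limit", "629145600"). // stitch files under ~600MB
    option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824").
    option("hoodie.clustering.plan.strategy.sort.columns", "city,ts").       // re-sort data layout
    mode(SaveMode.Overwrite).                                                // first write bootstraps the table
    save("/tmp/hudi/clustering_demo")                                        // placeholder base path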


Re: [VOTE] Release 0.7.0, release candidate #1

2021-01-21 Thread nishith agarwal
+1 binding

- Build successful
- Release validation script ran successfully
- Quick start ran successfully

./release/validate_staged_release.sh --release=0.7.0 --rc_num=1
/tmp/validation_scratch_dir_001 ~/hoodie-0.7/hudi/scripts
Downloading from svn co https://dist.apache.org/repos/dist//dev/hudi
Validating hudi-0.7.0-rc1 with release type "dev"
Checking Checksum of Source Release
Checksum Check of Source Release - [OK]

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 34972  100 34972    0     0  96076      0 --:--:-- --:--:-- --:--:-- 96341
Checking Signature
Signature Check - [OK]

Checking for binary files in source release
No Binary Files in Source Release? - [OK]

Checking for DISCLAIMER
DISCLAIMER file exists ? [OK]

Checking for LICENSE and NOTICE
License file exists ? [OK]
Notice file exists ? [OK]

Performing custom Licensing Check
Licensing Check Passed [OK]

Running RAT Check
RAT Check Passed [OK]

Thanks,
Nishith

On Thu, Jan 21, 2021 at 8:46 PM Sivabalan  wrote:

> +1 binding
>
> - checksums and signatures [OK]
> - successfully built [OK]
> - ran quick start guide [OK]
> - Ran release validation guide [OK]
> - Verified artifacts in staging repo [OK]
> - Ran test suite job w/ inserts, upserts, deletes and validation(spark sql
> and hive). Also same job w/ metadata enabled as well [OK]
>
>
> ./release/validate_staged_release.sh --release=0.7.0 --rc_num=1
> /tmp/validation_scratch_dir_001
> ~/Documents/personal/projects/siva_hudi/hudi_070_rc1/hudi-0.7.0-rc1/scripts
> Downloading from svn co https://dist.apache.org/repos/dist//dev/hudi
> Validating hudi-0.7.0-rc1 with release type "dev"
> Checking Checksum of Source Release
> Checksum Check of Source Release - [OK]
>
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 34972  100 34972    0     0   105k      0 --:--:-- --:--:-- --:--:--  104k
> Checking Signature
> Signature Check - [OK]
>
> Checking for binary files in source release
> No Binary Files in Source Release? - [OK]
>
> Checking for DISCLAIMER
> DISCLAIMER file exists ? [OK]
>
> Checking for LICENSE and NOTICE
> License file exists ? [OK]
> Notice file exists ? [OK]
>
> Performing custom Licensing Check
> Licensing Check Passed [OK]
>
> Running RAT Check
> RAT Check Passed [OK]
>
>
> On Thu, Jan 21, 2021 at 8:21 PM Satish Kotha  >
> wrote:
>
> > +1,
> >
> > 1) Able to build
> > 2) Integration tests pass
> > 3) Unit tests pass locally
> > 4) Successfully ran clustering on a small dataset (metadata table not
> > enabled)
> > 5) Verified insert, upsert, insert_overwrite works using QuickStart
> > commands on COW table (metadata table not enabled)
> >
> >
> >
> > On Thu, Jan 21, 2021 at 12:44 AM Vinoth Chandar 
> wrote:
> >
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #1 for the version
> 0.7.0,
> > > as follows:
> > >
> > > [ ] +1, Approve the release
> > >
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > >
> > >
> > > The complete staging area is available for your review, which includes:
> > >
> > > * JIRA release notes [1],
> > >
> > > * the official Apache source release and binary convenience releases to
> > be
> > > deployed to dist.apache.org [2], which are signed with the key with
> > > fingerprint 7F2A3BEB922181B06ACB1AA45F7D09E581D2BCB6 [3],
> > >
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > >
> > > * source code tag "release-0.7.0-rc1" [5],
> > >
> > >
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Release Manager
> > >
> > >
> > >
> > > [1]
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348721
> > >
> > >
> > > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.7.0-rc1/
> > >
> > > [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
> > >
> > > [4]
> > https://repository.apache.org/content/repositories/orgapachehudi-1027/
> > >
> > > [5] https://github.com/apache/hudi/tree/release-0.7.0-rc1
> > >
> >
>
>
> --
> Regards,
> -Sivabalan
>


Re: [VOTE] Release 0.7.0, release candidate #1

2021-01-21 Thread Sivabalan
+1 binding

- checksums and signatures [OK]
- successfully built [OK]
- ran quick start guide [OK]
- Ran release validation guide [OK]
- Verified artifacts in staging repo [OK]
- Ran test suite job w/ inserts, upserts, deletes and validation (Spark SQL
and Hive). Also ran the same job w/ metadata enabled [OK]


./release/validate_staged_release.sh --release=0.7.0 --rc_num=1
/tmp/validation_scratch_dir_001
~/Documents/personal/projects/siva_hudi/hudi_070_rc1/hudi-0.7.0-rc1/scripts
Downloading from svn co https://dist.apache.org/repos/dist//dev/hudi
Validating hudi-0.7.0-rc1 with release type "dev"
Checking Checksum of Source Release
Checksum Check of Source Release - [OK]

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 34972  100 34972    0     0   105k      0 --:--:-- --:--:-- --:--:--  104k
Checking Signature
Signature Check - [OK]

Checking for binary files in source release
No Binary Files in Source Release? - [OK]

Checking for DISCLAIMER
DISCLAIMER file exists ? [OK]

Checking for LICENSE and NOTICE
License file exists ? [OK]
Notice file exists ? [OK]

Performing custom Licensing Check
Licensing Check Passed [OK]

Running RAT Check
RAT Check Passed [OK]


On Thu, Jan 21, 2021 at 8:21 PM Satish Kotha 
wrote:

> +1,
>
> 1) Able to build
> 2) Integration tests pass
> 3) Unit tests pass locally
> 4) Successfully ran clustering on a small dataset (metadata table not
> enabled)
> 5) Verified insert, upsert, insert_overwrite works using QuickStart
> commands on COW table (metadata table not enabled)
>
>
>
> On Thu, Jan 21, 2021 at 12:44 AM Vinoth Chandar  wrote:
>
> > Hi everyone,
> >
> > Please review and vote on the release candidate #1 for the version 0.7.0,
> > as follows:
> >
> > [ ] +1, Approve the release
> >
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> >
> > The complete staging area is available for your review, which includes:
> >
> > * JIRA release notes [1],
> >
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint 7F2A3BEB922181B06ACB1AA45F7D09E581D2BCB6 [3],
> >
> > * all artifacts to be deployed to the Maven Central Repository [4],
> >
> > * source code tag "release-0.7.0-rc1" [5],
> >
> >
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> >
> >
> > Thanks,
> >
> > Release Manager
> >
> >
> >
> > [1]
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348721
> >
> >
> > [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.7.0-rc1/
> >
> > [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
> >
> > [4]
> https://repository.apache.org/content/repositories/orgapachehudi-1027/
> >
> > [5] https://github.com/apache/hudi/tree/release-0.7.0-rc1
> >
>


-- 
Regards,
-Sivabalan


Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2021-01-21 Thread pzwpzw

Thanks, vino yang! I have moved the doc to RFC-25. We can continue the discussion there.

On January 22, 2021 at 9:27 AM, vino yang wrote:


Hi zhiwei,

Done! Now, you should have cwiki permission.

Best,
Vino

On Friday, January 22, 2021 at 12:06 AM, pzwpzw wrote:


That is great! Can you give me the permission to the cwiki? My cwiki id
is: zhiwei .
I will move it to there and continue the disscussion.


On January 21, 2021 at 11:19 PM, Gary Li wrote:


Hi pengzhiwei,


Thanks for the proposal. That’s a great feature. Can we move the design
doc to cwiki page as a new RFC? We can continue the discussion from there.


Thanks,


Best Regards,
Gary Li




From: pzwpzw 
Reply-To: "dev@hudi.apache.org" 
Date: Wednesday, January 20, 2021 at 11:52 PM
To: "dev@hudi.apache.org" 
Cc: "dev@hudi.apache.org" 
Subject: Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite


Hi, we have implemented the spark sql extension for hudi in our Internal
version. Here is the main implementation, including the extension sql
syntax and implementation scheme on spark. I am waiting for your feedback.
Any comments are welcome~




https://docs.google.com/document/d/1KC6Rae67CUaCUpKoIAkM6OTAGuOWFPD9qtNfqchAl1o/edit#heading=h.oeoy1y14sifu




On December 23, 2020 at 12:30 AM, Vinoth Chandar wrote:
Sounds great. There will be a RFC/DISCUSS thread once 0.7.0 is out I think.
love to have you involved.


On Tue, Dec 22, 2020 at 3:20 AM pzwpzw 
wrote:




Yes, it looks good .
We are building the spark sql extensions to support for hudi in
our internal version.
I am interested in participating in the extension of SparkSQL on hudi.
On December 22, 2020 at 4:30 PM, Vinoth Chandar wrote:


Hi,


I think what we are landing on finally is.


- Keep pushing for SparkSQL support using Spark extensions route
- Calcite effort will be a separate/orthogonal approach, down the line


Please feel free to correct me, if I got this wrong.


On Mon, Dec 21, 2020 at 3:30 AM pzwpzw 
wrote:


Hi 受春柏, here is my point. We can use Calcite to build a common SQL layer to
process engine-independent SQL, for example most of the DDL, the Hoodie CLI
commands, and a parser for the common SQL extensions (e.g. MERGE INTO).
Engine-specific syntax can be handed off to the respective engine to process.
If the common SQL layer can handle the input SQL, it handles it; otherwise the
statement is routed to the engine for processing. In the long term, the common
layer will become richer and more complete.
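
A bare-bones illustration of that routing idea, using only Calcite's public parser
API; the fallback hook below is a stand-in for handing the statement to the engine's
own parser (Spark/Flink/Hive), and none of this is existing Hudi code.

  import org.apache.calcite.sql.SqlNode
  import org.apache.calcite.sql.parser.{SqlParseException, SqlParser}

  // Try the engine-independent (Calcite) layer first; delegate to the engine
  // when Calcite cannot parse the statement.
  def parseWithFallback(sql: String)(delegateToEngine: String => Unit): Option[SqlNode] =
    try {
      Some(SqlParser.create(sql).parseStmt())
    } catch {
      case _: SqlParseException =>
        delegateToEngine(sql)
        None
    }

  // parseWithFallback("SELECT uuid, ts FROM hudi_trips")(s => println(s"delegated: $s"))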


On December 21, 2020 at 4:38 PM, 受春柏 wrote:




Hi,all






That's very good,Hudi SQL syntax can support Flink、hive and other analysis


components at the same time,


But there are some questions about SparkSQL. SparkSQL syntax is in


conflict with Calctite syntax.Is our strategy


user migration or syntax compatibility?


In addition ,will it also support write SQL?


On 2020-12-19 02:10:16, "Nishith" wrote:




That’s awesome. Looks like we have a consensus on Calcite. Look forward to


the RFC as well!






-Nishith






On Dec 18, 2020, at 9:03 AM, Vinoth Chandar  wrote:






Sounds good. Look forward to a RFC/DISCUSS thread.






Thanks




Vinoth






On Thu, Dec 17, 2020 at 6:04 PM Danny Chan  wrote:






Yes, Apache Flink basically reuse the DQL syntax of Apache Calcite, i would




add support for SQL connectors of Hoodie Flink soon ~




Currently, i'm preparing a refactoring to the current Flink writer code.






On Friday, December 18, 2020 at 6:39 AM, Vinoth Chandar wrote:






Thanks Kabeer for the note on gmail. Did not realize that. :)






My desired use case is user use the Hoodie CLI to execute these SQLs.




They can choose what engine to use by a CLI config option.






Yes, that is also another attractive aspect of this route. We can build




out




a common SQL layer and have this translate to the underlying engine




(sounds




like Hive huh)




Longer term, if we really think we can more easily implement a full DML +




DDL + DQL, we can proceed with this.






As others pointed out, for Spark SQL, it might be good to try the Spark




extensions route, before we take this on more fully.






The other part where Calcite is great is, all the support for




windowing/streaming in its syntax.




Danny, I guess if we should be able to leverage that through a deeper




Flink/Hudi integration?








On Thu, Dec 17, 2020 at 1:07 PM Vinoth Chandar 




wrote:






I think Dongwook is investigating on the same lines. and it does seem




better to pursue this first, before trying other approaches.










On Tue, Dec 15, 2020 at 1:38 AM pzwpzw 




wrote:






Yeah I agree with Nishith that an option way is to look at the




ways




to




plug in custom logical and physical plans in Spark. It can simplify




the




implementation and reuse the Spark SQL syntax. And also users




familiar




with




Spark SQL will be able to use HUDi's SQL features more quickly.




In fact, spark have provided the SparkSessionExtensions interface for




implement custom syntax extensions and SQL rewrite rule.
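
A bare-bones illustration of the SparkSessionExtensions hook mentioned above (Spark
2.4 API); the extension class and the no-op rule are placeholders, not Hudi's actual
implementation, which would inject a parser for MERGE INTO etc. plus rewrite rules.

  import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
  import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
  import org.apache.spark.sql.catalyst.rules.Rule

  // Placeholder extension: registers a resolution rule that leaves plans untouched.
  class ExampleHudiExtensions extends (SparkSessionExtensions => Unit) {
    override def apply(extensions: SparkSessionExtensions): Unit =
      extensions.injectResolutionRule { _ =>
        new Rule[LogicalPlan] { override def apply(plan: LogicalPlan): LogicalPlan = plan }
      }
  }

  // Wire it up programmatically, or via --conf spark.sql.extensions=<fully.qualified.ClassName>
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("extensions-demo")
    .withExtensions(new ExampleHudiExtensions)
    .getOrCreate()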


Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2021-01-21 Thread vino yang
Hi zhiwei,

Done! Now, you should have cwiki permission.

Best,
Vino

On Friday, January 22, 2021 at 12:06 AM, pzwpzw wrote:

> That is great!  Can you give me the permission to the cwiki? My cwiki id
> is: zhiwei .
> I will move it to there and continue the disscussion.
>
> On January 21, 2021 at 11:19 PM, Gary Li wrote:
>
> Hi pengzhiwei,
>
> Thanks for the proposal. That’s a great feature. Can we move the design
> doc to cwiki page as a new RFC? We can continue the discussion from there.
>
> Thanks,
>
> Best Regards,
> Gary Li
>
>
> From: pzwpzw 
> Reply-To: "dev@hudi.apache.org" 
> Date: Wednesday, January 20, 2021 at 11:52 PM
> To: "dev@hudi.apache.org" 
> Cc: "dev@hudi.apache.org" 
> Subject: Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite
>
> Hi, we have implemented the spark sql extension for hudi in our Internal
> version. Here is the main implementation, including the extension sql
> syntax and implementation scheme on spark. I am waiting for your feedback.
> Any comments are welcome~
>
>
> https://docs.google.com/document/d/1KC6Rae67CUaCUpKoIAkM6OTAGuOWFPD9qtNfqchAl1o/edit#heading=h.oeoy1y14sifu
>
>
> On December 23, 2020 at 12:30 AM, Vinoth Chandar wrote:
> Sounds great. There will be a RFC/DISCUSS thread once 0.7.0 is out I think.
> love to have you involved.
>
> On Tue, Dec 22, 2020 at 3:20 AM pzwpzw 
> wrote:
>
>
> Yes, it looks good .
> We are building the spark sql extensions to support for hudi in
> our internal version.
> I am interested in participating in the extension of SparkSQL on hudi.
> On December 22, 2020 at 4:30 PM, Vinoth Chandar wrote:
>
> Hi,
>
> I think what we are landing on finally is.
>
> - Keep pushing for SparkSQL support using Spark extensions route
> - Calcite effort will be a separate/orthogonal approach, down the line
>
> Please feel free to correct me, if I got this wrong.
>
> On Mon, Dec 21, 2020 at 3:30 AM pzwpzw 
> wrote:
>
> Hi 受春柏 ,here is my point. We can use Calcite to build a common sql layer
>
> to process engine independent SQL, for example most of the DDL、Hoodie CLI
>
> command and also provide parser for the common SQL extensions(e.g. Merge
>
> Into). The Engine-related syntax can be taught to the respective engines to
>
> process. If the common sql layer can handle the input sql, it handle
>
> it.Otherwise it is routed to the engine for processing. In long term, the
>
> common layer will more and more rich and perfect.
>
> On December 21, 2020 at 4:38 PM, 受春柏 wrote:
>
>
> Hi,all
>
>
>
> That's very good,Hudi SQL syntax can support Flink、hive and other analysis
>
> components at the same time,
>
> But there are some questions about SparkSQL. SparkSQL syntax is in
>
> conflict with Calctite syntax.Is our strategy
>
> user migration or syntax compatibility?
>
> In addition ,will it also support write SQL?
>
>
> On 2020-12-19 02:10:16, "Nishith" wrote:
>
>
> That’s awesome. Looks like we have a consensus on Calcite. Look forward to
>
> the RFC as well!
>
>
>
> -Nishith
>
>
>
> On Dec 18, 2020, at 9:03 AM, Vinoth Chandar  wrote:
>
>
>
> Sounds good. Look forward to a RFC/DISCUSS thread.
>
>
>
> Thanks
>
>
> Vinoth
>
>
>
> On Thu, Dec 17, 2020 at 6:04 PM Danny Chan  wrote:
>
>
>
> Yes, Apache Flink basically reuse the DQL syntax of Apache Calcite, i would
>
>
> add support for SQL connectors of Hoodie Flink soon ~
>
>
> Currently, i'm preparing a refactoring to the current Flink writer code.
>
>
>
> On Friday, December 18, 2020 at 6:39 AM, Vinoth Chandar wrote:
>
>
>
> Thanks Kabeer for the note on gmail. Did not realize that. :)
>
>
>
> My desired use case is user use the Hoodie CLI to execute these SQLs.
>
>
> They can choose what engine to use by a CLI config option.
>
>
>
> Yes, that is also another attractive aspect of this route. We can build
>
>
> out
>
>
> a common SQL layer and have this translate to the underlying engine
>
>
> (sounds
>
>
> like Hive huh)
>
>
> Longer term, if we really think we can more easily implement a full DML +
>
>
> DDL + DQL, we can proceed with this.
>
>
>
> As others pointed out, for Spark SQL, it might be good to try the Spark
>
>
> extensions route, before we take this on more fully.
>
>
>
> The other part where Calcite is great is, all the support for
>
>
> windowing/streaming in its syntax.
>
>
> Danny, I guess if we should be able to leverage that through a deeper
>
>
> Flink/Hudi integration?
>
>
>
>
> On Thu, Dec 17, 2020 at 1:07 PM Vinoth Chandar 
>
>
> wrote:
>
>
>
> I think Dongwook is investigating on the same lines. and it does seem
>
>
> better to pursue this first, before trying other approaches.
>
>
>
>
>
> On Tue, Dec 15, 2020 at 1:38 AM pzwpzw 
>
> .invalid>
>
>
> wrote:
>
>
>
> Yeah I agree with Nishith that an option way is to look at the
>
>
> ways
>
>
> to
>
>
> plug in custom logical and physical plans in Spark. It can simplify
>
>
> the
>
>
> implementation and reuse the Spark SQL syntax. And also users
>
>
> familiar
>
>
> with
>
>
> Spark SQL will be able to use HUDi's SQL features more quickly.
>
>
> In fact, spark have provided the SparkSessionExtensions int

Re: [VOTE] Release 0.7.0, release candidate #1

2021-01-21 Thread Satish Kotha
+1,

1) Able to build
2) Integration tests pass
3) Unit tests pass locally
4) Successfully ran clustering on a small dataset (metadata table not
enabled)
5) Verified insert, upsert, insert_overwrite works using QuickStart
commands on COW table (metadata table not enabled)
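
For anyone reproducing item 5, a hedged spark-shell sketch of an insert_overwrite
write on a COW table; the table name, fields and path are placeholders, and
"insert_overwrite" is the new 0.7.0 write operation being exercised.

  import org.apache.spark.sql.SaveMode

  val batch = spark.range(0, 50).selectExpr(
    "cast(id as string) as uuid",
    "current_timestamp() as ts",
    "concat('region_', cast(id % 3 as string)) as region")

  batch.write.format("hudi").
    option("hoodie.table.name", "cow_overwrite_demo").                   // placeholder table
    option("hoodie.datasource.write.recordkey.field", "uuid").
    option("hoodie.datasource.write.precombine.field", "ts").
    option("hoodie.datasource.write.partitionpath.field", "region").
    option("hoodie.datasource.write.operation", "insert_overwrite").     // overwrite matching partitions
    mode(SaveMode.Append).
    save("/tmp/hudi/cow_overwrite_demo")                                 // placeholder base path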



On Thu, Jan 21, 2021 at 12:44 AM Vinoth Chandar  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 0.7.0,
> as follows:
>
> [ ] +1, Approve the release
>
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
>
> The complete staging area is available for your review, which includes:
>
> * JIRA release notes [1],
>
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint 7F2A3BEB922181B06ACB1AA45F7D09E581D2BCB6 [3],
>
> * all artifacts to be deployed to the Maven Central Repository [4],
>
> * source code tag "release-0.7.0-rc1" [5],
>
>
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
>
>
> Thanks,
>
> Release Manager
>
>
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348721
>
>
> [2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.7.0-rc1/
>
> [3] https://dist.apache.org/repos/dist/release/hudi/KEYS
>
> [4] https://repository.apache.org/content/repositories/orgapachehudi-1027/
>
> [5] https://github.com/apache/hudi/tree/release-0.7.0-rc1
>


Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2021-01-21 Thread pzwpzw

That is great! Can you give me permission to the cwiki? My cwiki id is: zhiwei.
I will move it there and continue the discussion.

On January 21, 2021 at 11:19 PM, Gary Li wrote:


Hi pengzhiwei,

Thanks for the proposal. That’s a great feature. Can we move the design doc to 
cwiki page as a new RFC? We can continue the discussion from there.

Thanks,

Best Regards,
Gary Li


From: pzwpzw 
Reply-To: "dev@hudi.apache.org" 
Date: Wednesday, January 20, 2021 at 11:52 PM
To: "dev@hudi.apache.org" 
Cc: "dev@hudi.apache.org" 
Subject: Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

Hi, we have implemented the spark sql extension for hudi in our Internal 
version. Here is the main implementation, including the extension sql syntax 
and implementation scheme on spark. I am waiting for your feedback. Any 
comments are welcome~

https://docs.google.com/document/d/1KC6Rae67CUaCUpKoIAkM6OTAGuOWFPD9qtNfqchAl1o/edit#heading=h.oeoy1y14sifu


On December 23, 2020 at 12:30 AM, Vinoth Chandar wrote:
Sounds great. There will be a RFC/DISCUSS thread once 0.7.0 is out I think.
love to have you involved.

On Tue, Dec 22, 2020 at 3:20 AM pzwpzw 
wrote:


Yes, it looks good .
We are building the spark sql extensions to support for hudi in
our internal version.
I am interested in participating in the extension of SparkSQL on hudi.
On December 22, 2020 at 4:30 PM, Vinoth Chandar wrote:

Hi,

I think what we are landing on finally is.

- Keep pushing for SparkSQL support using Spark extensions route
- Calcite effort will be a separate/orthogonal approach, down the line

Please feel free to correct me, if I got this wrong.

On Mon, Dec 21, 2020 at 3:30 AM pzwpzw 
wrote:

Hi 受春柏 ,here is my point. We can use Calcite to build a common sql layer

to process engine independent SQL, for example most of the DDL、Hoodie CLI

command and also provide parser for the common SQL extensions(e.g. Merge

Into). The Engine-related syntax can be taught to the respective engines to

process. If the common sql layer can handle the input sql, it handle

it.Otherwise it is routed to the engine for processing. In long term, the

common layer will more and more rich and perfect.

On December 21, 2020 at 4:38 PM, 受春柏 wrote:


Hi,all



That's very good,Hudi SQL syntax can support Flink、hive and other analysis

components at the same time,

But there are some questions about SparkSQL. SparkSQL syntax is in

conflict with Calctite syntax.Is our strategy

user migration or syntax compatibility?

In addition ,will it also support write SQL?


On 2020-12-19 02:10:16, "Nishith" wrote:


That’s awesome. Looks like we have a consensus on Calcite. Look forward to

the RFC as well!



-Nishith



On Dec 18, 2020, at 9:03 AM, Vinoth Chandar  wrote:



Sounds good. Look forward to a RFC/DISCUSS thread.



Thanks


Vinoth



On Thu, Dec 17, 2020 at 6:04 PM Danny Chan  wrote:



Yes, Apache Flink basically reuse the DQL syntax of Apache Calcite, i would


add support for SQL connectors of Hoodie Flink soon ~


Currently, i'm preparing a refactoring to the current Flink writer code.



On Friday, December 18, 2020 at 6:39 AM, Vinoth Chandar wrote:



Thanks Kabeer for the note on gmail. Did not realize that. :)



My desired use case is user use the Hoodie CLI to execute these SQLs.


They can choose what engine to use by a CLI config option.



Yes, that is also another attractive aspect of this route. We can build


out


a common SQL layer and have this translate to the underlying engine


(sounds


like Hive huh)


Longer term, if we really think we can more easily implement a full DML +


DDL + DQL, we can proceed with this.



As others pointed out, for Spark SQL, it might be good to try the Spark


extensions route, before we take this on more fully.



The other part where Calcite is great is, all the support for


windowing/streaming in its syntax.


Danny, I guess if we should be able to leverage that through a deeper


Flink/Hudi integration?




On Thu, Dec 17, 2020 at 1:07 PM Vinoth Chandar 


wrote:



I think Dongwook is investigating on the same lines. and it does seem


better to pursue this first, before trying other approaches.





On Tue, Dec 15, 2020 at 1:38 AM pzwpzw 


wrote:



Yeah I agree with Nishith that an option way is to look at the


ways


to


plug in custom logical and physical plans in Spark. It can simplify


the


implementation and reuse the Spark SQL syntax. And also users


familiar


with


Spark SQL will be able to use HUDi's SQL features more quickly.


In fact, spark have provided the SparkSessionExtensions interface for


implement custom syntax extensions and SQL rewrite rule.







https://spark.apache.org/docs/2.4.5/api/java/org/apache/spark/sql/SparkSessionExtensions.html

Re:  Reply:Re: [DISCUSS] SQL Support using Apache Calcite

2021-01-21 Thread Gary Li
Hi pengzhiwei,

Thanks for the proposal. That’s a great feature. Can we move the design doc to 
cwiki page as a new RFC? We can continue the discussion from there.

Thanks,

Best Regards,
Gary Li


From: pzwpzw 
Reply-To: "dev@hudi.apache.org" 
Date: Wednesday, January 20, 2021 at 11:52 PM
To: "dev@hudi.apache.org" 
Cc: "dev@hudi.apache.org" 
Subject: Re: Reply:Re: [DISCUSS] SQL Support using Apache Calcite

Hi, we have implemented the spark sql extension for hudi in our Internal 
version. Here is the main implementation, including the extension sql syntax 
and implementation scheme  on spark. I am waiting for your feedback. Any 
comments are welcome~

https://docs.google.com/document/d/1KC6Rae67CUaCUpKoIAkM6OTAGuOWFPD9qtNfqchAl1o/edit#heading=h.oeoy1y14sifu


On December 23, 2020 at 12:30 AM, Vinoth Chandar wrote:
Sounds great. There will be a RFC/DISCUSS thread once 0.7.0 is out I think.
love to have you involved.

On Tue, Dec 22, 2020 at 3:20 AM pzwpzw 
wrote:


Yes, it looks good .
We are building the spark sql extensions to support for hudi in
our internal version.
I am interested in participating in the extension of SparkSQL on hudi.
On December 22, 2020 at 4:30 PM, Vinoth Chandar wrote:

Hi,

I think what we are landing on finally is.

- Keep pushing for SparkSQL support using Spark extensions route
- Calcite effort will be a separate/orthogonal approach, down the line

Please feel free to correct me, if I got this wrong.

On Mon, Dec 21, 2020 at 3:30 AM pzwpzw 
wrote:

Hi 受春柏 ,here is my point. We can use Calcite to build a common sql layer

to process engine independent SQL, for example most of the DDL、Hoodie CLI

command and also provide parser for the common SQL extensions(e.g. Merge

Into). The Engine-related syntax can be taught to the respective engines to

process. If the common sql layer can handle the input sql, it handle

it.Otherwise it is routed to the engine for processing. In long term, the

common layer will more and more rich and perfect.

On December 21, 2020 at 4:38 PM, 受春柏 wrote:


Hi,all



That's very good,Hudi SQL syntax can support Flink、hive and other analysis

components at the same time,

But there are some questions about SparkSQL. SparkSQL syntax is in

conflict with Calctite syntax.Is our strategy

user migration or syntax compatibility?

In addition ,will it also support write SQL?


On 2020-12-19 02:10:16, "Nishith" wrote:


That’s awesome. Looks like we have a consensus on Calcite. Look forward to

the RFC as well!



-Nishith



On Dec 18, 2020, at 9:03 AM, Vinoth Chandar  wrote:



Sounds good. Look forward to a RFC/DISCUSS thread.



Thanks


Vinoth



On Thu, Dec 17, 2020 at 6:04 PM Danny Chan  wrote:



Yes, Apache Flink basically reuse the DQL syntax of Apache Calcite, i would


add support for SQL connectors of Hoodie Flink soon ~


Currently, i'm preparing a refactoring to the current Flink writer code.



On Friday, December 18, 2020 at 6:39 AM, Vinoth Chandar wrote:



Thanks Kabeer for the note on gmail. Did not realize that. :)



My desired use case is user use the Hoodie CLI to execute these SQLs.


They can choose what engine to use by a CLI config option.



Yes, that is also another attractive aspect of this route. We can build


out


a common SQL layer and have this translate to the underlying engine


(sounds


like Hive huh)


Longer term, if we really think we can more easily implement a full DML +


DDL + DQL, we can proceed with this.



As others pointed out, for Spark SQL, it might be good to try the Spark


extensions route, before we take this on more fully.



The other part where Calcite is great is, all the support for


windowing/streaming in its syntax.


Danny, I guess if we should be able to leverage that through a deeper


Flink/Hudi integration?




On Thu, Dec 17, 2020 at 1:07 PM Vinoth Chandar 


wrote:



I think Dongwook is investigating on the same lines. and it does seem


better to pursue this first, before trying other approaches.





On Tue, Dec 15, 2020 at 1:38 AM pzwpzw 


wrote:



Yeah I agree with Nishith that an option way is to look at the


ways


to


plug in custom logical and physical plans in Spark. It can simplify


the


implementation and reuse the Spark SQL syntax. And also users


familiar


with


Spark SQL will be able to use HUDi's SQL features more quickly.


In fact, spark have provided the SparkSessionExtensions interface for


implement custom syntax extensions and SQL rewrite rule.







https://spark.apache.org/docs/2.4.5/api/java/org/apache/spark/sql/SparkSessionExtensions.html


.


We can use the SparkS

[VOTE] Release 0.7.0, release candidate #1

2021-01-21 Thread Vinoth Chandar
Hi everyone,

Please review and vote on the release candidate #1 for the version 0.7.0,
as follows:

[ ] +1, Approve the release

[ ] -1, Do not approve the release (please provide specific comments)



The complete staging area is available for your review, which includes:

* JIRA release notes [1],

* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint 7F2A3BEB922181B06ACB1AA45F7D09E581D2BCB6 [3],

* all artifacts to be deployed to the Maven Central Repository [4],

* source code tag "release-0.7.0-rc1" [5],



The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.



Thanks,

Release Manager



[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348721


[2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.7.0-rc1/

[3] https://dist.apache.org/repos/dist/release/hudi/KEYS

[4] https://repository.apache.org/content/repositories/orgapachehudi-1027/

[5] https://github.com/apache/hudi/tree/release-0.7.0-rc1