[DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

2020-02-22 Thread Sivabalan
As Apache Hudi is getting widely adopted, performance has become the need
of the hour. This RFC focuses on improving the performance of the Hudi index
by introducing a record-level index. The proposal is to implement a new index
format that is a mapping of (recordKey <-> partition, fileId) or
((recordKey, partitionPath) → fileId). This mapping will be stored and
maintained by Hudi as another implementation of HoodieIndex. Record-level
indexing is expected to boost both read and write performance.
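
To make the proposed mapping concrete, here is a minimal in-memory sketch of a recordKey -> (partitionPath, fileId) lookup. It is only an illustration under stated assumptions: the names RecordLocation, InMemoryRecordIndex, update, lookup and tag are hypothetical and are not taken from the RFC or from the HoodieIndex interface, and a real index would be persisted rather than held in a dict.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RecordLocation:
    """Where a record lives: a partition path plus a file id (hypothetical type)."""
    partition_path: str
    file_id: str


class InMemoryRecordIndex:
    """Toy record-level index: recordKey -> (partitionPath, fileId)."""

    def __init__(self) -> None:
        self._locations: Dict[str, RecordLocation] = {}

    def update(self, record_key: str, partition_path: str, file_id: str) -> None:
        # Record (or overwrite) the location of a key after a write.
        self._locations[record_key] = RecordLocation(partition_path, file_id)

    def lookup(self, record_key: str) -> Optional[RecordLocation]:
        # Return the known location, or None if the key was never written.
        return self._locations.get(record_key)

    def tag(
        self, record_keys: Iterable[str]
    ) -> Tuple[Dict[str, RecordLocation], List[str]]:
        # Split incoming keys into updates (location known) and inserts (unknown).
        updates: Dict[str, RecordLocation] = {}
        inserts: List[str] = []
        for key in record_keys:
            location = self.lookup(key)
            if location is None:
                inserts.append(key)
            else:
                updates[key] = location
        return updates, inserts
```

With such a mapping, a writer can route an incoming record straight to the file that already holds its key, and a point read can skip every other file.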

Here is the link to the RFC.

Appreciate your review and thoughts.

-- 
Regards,
-Sivabalan


Re:Re: [DISCUSS] How to correct the license header of entrypoint.sh script

2020-02-22 Thread lamberken


Right, will do.


Thanks,
Lamber-Ken

At 2020-02-22 22:35:13, "vbal...@apache.org"  wrote:
> 
>+1 on ensuring all scripts in the Hudi codebase follow the same convention for
>licensing.
>
>Balaji.V
>
>On Saturday, February 22, 2020, 06:16:29 AM PST, Suneel Marthi wrote:
> 
> Please go ahead and make the change @lamberken
>
>I was just looking at scripts from Hive and Kafka projects, see below.
>
>https://github.com/apache/hive/blob/master/bin/init-hive-dfs.sh
>https://github.com/apache/hive/blob/master/bin/hive-config.sh
>
>https://github.com/apache/kafka/blob/trunk/bin/connect-distributed.sh
>https://github.com/apache/kafka/blob/trunk/bin/kafka-leader-election.sh
>
>I suggest fixing all the script files to be consistent with the Apache license
>guide.
>
>
>
>On Sat, Feb 22, 2020 at 8:53 AM lamberken  wrote:
>
>>
>>
>> Hi all,
>>
>>
>> During the voting process on the rc1 0.5.1-incubating release, Justin pointed
>> out that docker/hoodie/hadoop/base/entrypoint.sh has an incorrect license
>> header. However, many script files use the same license header as
>> "entrypoint.sh".
>>
>>
>> The Apache license guide [2] says "The text should be enclosed in the
>> appropriate comment syntax for the file format."
>> So, do we need to remove the repeated "#", as in the following change?
>>
>>
>>
>> 
>> #  Licensed to the Apache Software Foundation (ASF) under one
>> #  or more contributor license agreements.  See the NOTICE file
>> #  distributed with this work for additional information
>> #  regarding copyright ownership.  The ASF licenses this file
>> #  to you under the Apache License, Version 2.0 (the
>> #  "License"); you may not use this file except in compliance
>> #  with the License.  You may obtain a copy of the License at
>> #
>> #  http://www.apache.org/licenses/LICENSE-2.0
>> #
>> #  Unless required by applicable law or agreed to in writing, software
>> #  distributed under the License is distributed on an "AS IS" BASIS,
>> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> #  See the License for the specific language governing permissions and
>> # limitations under the License.
>>
>> 
>>
>>
>> #
>> #  Licensed to the Apache Software Foundation (ASF) under one
>> #  or more contributor license agreements.  See the NOTICE file
>> #  distributed with this work for additional information
>> #  regarding copyright ownership.  The ASF licenses this file
>> #  to you under the Apache License, Version 2.0 (the
>> #  "License"); you may not use this file except in compliance
>> #  with the License.  You may obtain a copy of the License at
>> #
>> #  http://www.apache.org/licenses/LICENSE-2.0
>> #
>> #  Unless required by applicable law or agreed to in writing, software
>> #  distributed under the License is distributed on an "AS IS" BASIS,
>> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> #  See the License for the specific language governing permissions and
>> # limitations under the License.
>> #
>>
>>
>> Any thoughts are welcome, thanks.
>>
>>
>> Thanks,
>> Lamber-Ken
>>
>>
>> [1]
>> https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E
>> [2] https://www.apache.org/licenses/LICENSE-2.0
>>
>>
>  


Re: [DISCUSS] How to correct the license header of entrypoint.sh script

2020-02-22 Thread vbal...@apache.org
 
+1 on ensuring all scripts in the Hudi codebase follow the same convention for
licensing.

Balaji.V

On Saturday, February 22, 2020, 06:16:29 AM PST, Suneel Marthi wrote:
 
 Please go ahead and make the change @lamberken

I was just looking at scripts from Hive and Kafka projects, see below.

https://github.com/apache/hive/blob/master/bin/init-hive-dfs.sh
https://github.com/apache/hive/blob/master/bin/hive-config.sh

https://github.com/apache/kafka/blob/trunk/bin/connect-distributed.sh
https://github.com/apache/kafka/blob/trunk/bin/kafka-leader-election.sh

I suggest fixing all the script files to be consistent with the Apache license
guide.



On Sat, Feb 22, 2020 at 8:53 AM lamberken  wrote:

>
>
> Hi all,
>
>
> During the voting process on the rc1 0.5.1-incubating release, Justin pointed
> out that docker/hoodie/hadoop/base/entrypoint.sh has an incorrect license
> header. However, many script files use the same license header as
> "entrypoint.sh".
>
>
> The Apache license guide [2] says "The text should be enclosed in the
> appropriate comment syntax for the file format."
> So, do we need to remove the repeated "#", as in the following change?
>
>
>
> 
> #  Licensed to the Apache Software Foundation (ASF) under one
> #  or more contributor license agreements.  See the NOTICE file
> #  distributed with this work for additional information
> #  regarding copyright ownership.  The ASF licenses this file
> #  to you under the Apache License, Version 2.0 (the
> #  "License"); you may not use this file except in compliance
> #  with the License.  You may obtain a copy of the License at
> #
> #      http://www.apache.org/licenses/LICENSE-2.0
> #
> #  Unless required by applicable law or agreed to in writing, software
> #  distributed under the License is distributed on an "AS IS" BASIS,
> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> #  See the License for the specific language governing permissions and
> # limitations under the License.
>
> 
>
>
> #
> #  Licensed to the Apache Software Foundation (ASF) under one
> #  or more contributor license agreements.  See the NOTICE file
> #  distributed with this work for additional information
> #  regarding copyright ownership.  The ASF licenses this file
> #  to you under the Apache License, Version 2.0 (the
> #  "License"); you may not use this file except in compliance
> #  with the License.  You may obtain a copy of the License at
> #
> #      http://www.apache.org/licenses/LICENSE-2.0
> #
> #  Unless required by applicable law or agreed to in writing, software
> #  distributed under the License is distributed on an "AS IS" BASIS,
> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> #  See the License for the specific language governing permissions and
> # limitations under the License.
> #
>
>
> Any thoughts are welcome, thanks.
>
>
> Thanks,
> Lamber-Ken
>
>
> [1]
> https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E
> [2] https://www.apache.org/licenses/LICENSE-2.0
>
>
  

Re: [DISCUSS] How to correct the license header of entrypoint.sh script

2020-02-22 Thread Suneel Marthi
Please go ahead and make the change @lamberken

I was just looking at scripts from Hive and Kafka projects, see below.

https://github.com/apache/hive/blob/master/bin/init-hive-dfs.sh
https://github.com/apache/hive/blob/master/bin/hive-config.sh

https://github.com/apache/kafka/blob/trunk/bin/connect-distributed.sh
https://github.com/apache/kafka/blob/trunk/bin/kafka-leader-election.sh

I suggest fixing all the script files to be consistent with the Apache license
guide.



On Sat, Feb 22, 2020 at 8:53 AM lamberken  wrote:

>
>
> Hi all,
>
>
> During the voting process on the rc1 0.5.1-incubating release, Justin pointed
> out that docker/hoodie/hadoop/base/entrypoint.sh has an incorrect license
> header. However, many script files use the same license header as
> "entrypoint.sh".
>
>
> The Apache license guide [2] says "The text should be enclosed in the
> appropriate comment syntax for the file format."
> So, do we need to remove the repeated "#", as in the following change?
>
>
>
> 
> #  Licensed to the Apache Software Foundation (ASF) under one
> #  or more contributor license agreements.  See the NOTICE file
> #  distributed with this work for additional information
> #  regarding copyright ownership.  The ASF licenses this file
> #  to you under the Apache License, Version 2.0 (the
> #  "License"); you may not use this file except in compliance
> #  with the License.  You may obtain a copy of the License at
> #
> #  http://www.apache.org/licenses/LICENSE-2.0
> #
> #  Unless required by applicable law or agreed to in writing, software
> #  distributed under the License is distributed on an "AS IS" BASIS,
> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> #  See the License for the specific language governing permissions and
> # limitations under the License.
>
> 
>
>
> #
> #  Licensed to the Apache Software Foundation (ASF) under one
> #  or more contributor license agreements.  See the NOTICE file
> #  distributed with this work for additional information
> #  regarding copyright ownership.  The ASF licenses this file
> #  to you under the Apache License, Version 2.0 (the
> #  "License"); you may not use this file except in compliance
> #  with the License.  You may obtain a copy of the License at
> #
> #  http://www.apache.org/licenses/LICENSE-2.0
> #
> #  Unless required by applicable law or agreed to in writing, software
> #  distributed under the License is distributed on an "AS IS" BASIS,
> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> #  See the License for the specific language governing permissions and
> # limitations under the License.
> #
>
>
> Any thoughts are welcome, thanks.
>
>
> Thanks,
> Lamber-Ken
>
>
> [1]
> https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E
> [2] https://www.apache.org/licenses/LICENSE-2.0
>
>
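
The header change discussed in this thread could be applied across script files with something like the sketch below. It rests on an assumption: that the repeated "#" in the quoted message means rule lines made up entirely of "#" characters framing the license text (the archived message does not show such lines). The function name normalize_header is hypothetical, and only the leading comment block of a file is touched.

```python
import re
from typing import List

# A line consisting only of two or more '#' characters (assumed "repeated #").
_HASH_RULE = re.compile(r"^#{2,}\s*$")


def normalize_header(lines: List[str]) -> List[str]:
    """Replace '#'-only rule lines in the leading comment block with a single '#'."""
    out: List[str] = []
    in_header = True
    for line in lines:
        # The header ends at the first line that is not a comment.
        if in_header and not line.startswith("#"):
            in_header = False
        if in_header and _HASH_RULE.match(line):
            out.append("#")
        else:
            out.append(line)
    return out
```

A shebang line such as "#!/bin/bash" is left alone because it is not made up solely of "#" characters, and rule lines appearing later in the script body are not modified.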