[jira] [Created] (HIVE-25571) Fix Metastore script for Oracle Database

2021-09-28 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-25571:
---

 Summary: Fix Metastore script for Oracle Database
 Key: HIVE-25571
 URL: https://issues.apache.org/jira/browse/HIVE-25571
 Project: Hive
  Issue Type: Bug
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Error:1
{noformat}
354/359      CREATE UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS 
(AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE);
Error: ORA-00955: name is already used by an existing object 
(state=42000,code=955)
Aborting command set because "force" is false and command failed: "CREATE 
UNIQUE INDEX DBPRIVILEGEINDEX ON DC_PRIVS 
(AUTHORIZER,NAME,PRINCIPAL_NAME,PRINCIPAL_TYPE,DC_PRIV,GRANTOR,GRANTOR_TYPE);"
[ERROR] 2021-09-29 09:18:59.075 [main] MetastoreSchemaTool - Schema 
initialization FAILED! Metastore state would be inconsistent!
Schema initialization FAILED! Metastore state would be inconsistent!{noformat}
Error:2
{noformat}
Error: ORA-00900: invalid SQL statement (state=42000,code=900)
Aborting command set because "force" is false and command failed: "===
-- HIVE-24396
-- Create DataCo{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25570) Hive should send full URL path for authorization for the command insert overwrite location

2021-09-28 Thread Sai Hemanth Gantasala (Jira)
Sai Hemanth Gantasala created HIVE-25570:


 Summary: Hive should send full URL path for authorization for the 
command insert overwrite location
 Key: HIVE-25570
 URL: https://issues.apache.org/jira/browse/HIVE-25570
 Project: Hive
  Issue Type: Bug
  Components: Authorization, HiveServer2
Affects Versions: 4.0.0
Reporter: Sai Hemanth Gantasala
Assignee: Sai Hemanth Gantasala


For authorization, Hive is currently sending the path given as input from the 
user for the command, for eg
{code:java}
insert overwrite directory 
'/user/warehouse/tablespace/external/something/new/test_new_tb1' select * from 
test_tb1;
{code}
Hive is sending the path as 
'/user/warehouse/tablespace/external/something/new/test_new_tb1' 
Instead, Hive should send a fully qualified path for authorization,  for e.g: 
'hdfs://hostname:port_name/user/warehouse/tablespace/external/something/new/test_new_tb1'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Commit message guidelines

2021-09-28 Thread Peter Vary
Hi Stamatis,

I am generally positive for every standardization process. My experience 
thought me that a standardized code is usually better in the long run than any 
super optimized solution where we specialize every codepath for the locally 
best optimum.

First question:
- Is it possible to check/enforce these guidelines somehow?

If we have some tool to enforce this, that would be super helpful.

I agree with most of your points.

There are a few question marks, but I can live with all of the proposed 
requirements if that is the consensus:
- I think having contributor / reviewer / PR number in the commit message is 
helpful (at least for me). I use the `git log --pretty=online|grep` quite 
often, and one usage is when I would like to collect the PR's which were 
authored or reviewed by a contributor
- I am not sure about the 72 character line length - the PRs are most often 
viewed by tools which do the wrapping based on the available screen size. 
Wouldn't we want to relay on that?

Thanks,
Peter

> On Sep 28, 2021, at 11:25, Alessandro Solimando 
>  wrote:
> 
> Hi Stamatis,
> thanks for the suggestions, I think they are reasonable, the project would
> benefit from their adoption.
> 
> Regarding the removal of contributor/reviewers names from the commit
> message favoring the use of git metadata, there has been a similar discussion
> in Calcite ML
> 
> which
> led to consensus.
> 
> Best regards,
> Alessandro
> 
> On Fri, 24 Sept 2021 at 16:40, Stamatis Zampetakis 
> wrote:
> 
>> Hi all,
>> 
>> I think we all more or less follow some standard pattern when committing in
>> Hive but with some small effort we could make things more uniform and
>> hopefully better.
>> I would like to start a discussion about creating some guidelines, which we
>> could put to the wiki or in contributing.md, to improve the quality of our
>> history (git log).
>> I outline some suggestions below to kick off the discussion. Many things in
>> the list are minor (and maybe even personal preferences) but one thing
>> which is really missing from the project is B3 especially the *why* part.
>> Why is the commit necessary? Why has the change been made?.
>> In some cases the why part is also missing from the JIRA making the code
>> harder to maintain.
>> 
>> Subject line:
>> S1. Start with the Jira id capitalized and followed immediately (no space)
>> by double colon (:)
>> S2. Leave one space after the Jira id, and start the summary with a capital
>> letter
>> S3. Keep it concise (ideally less than 72 characters) and provide a useful
>> description of the change
>> S4. Do not include or end the line with period
>> S5. Do not include the pull request id in the summary
>> S6. Use imperative mood (“Add a handler …”) rather than past tense (“Added
>> a handler …”) or present tense (“Adds a handler …”)
>> S7. Avoid using "Fix"; If you are fixing a bug, it is sufficient to
>> describe the bug (“NullPointerException if user is unknown”) and people
>> will correctly surmise that the purpose of your change is to fix the bug.
>> S8. Do not add a contributor's name; the author tag is made exactly for
>> this and can be explored/parsed much more efficiently by tools/people for
>> stats or other purposes
>> S9. Do not add reviewers name; information is present in multiple places
>> (e.g., committer tag, PR, JIRA)
>> 
>> Message body: (Trivial changes may not require a body)
>> B1. Separate subject from body with a blank line
>> B2. Wrap the body at 72 characters
>> B3. Use the body to explain what and why vs. how
>> Example
>> "Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan and
>> Converter to avoid executing the fallback method which in many cases
>> returns null and can cause NPE when this value propagates up the call
>> stack."
>> vs.
>> "Added handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
>> and Converter"
>> B4. If multiple authors include them using the standard GitHub marker,
>> "Co-authored-by:", followed by the name and email of the author (e.g.,
>> Co-authored-by: Marton Bod )
>> B5. If the reviewer is different from committer (or merge via GitHub UI)
>> use "Reviewed-by:" followed by the name and email of the reviewer (e.g.,
>> Reviewed-by: Stamatis Zampetakis )
>> B6. Use "Co-authored-by"/"Reviewed-by" on each own line and repeat as many
>> times as authors/reviewers.
>> B7. Include the PR id at the end of the message (e.g., Closes #2514);
>> someone can easily navigate back to the PR to check comments, reviewers,
>> etc.
>> 
>> A sample commit message following these guidelines is shown below:
>> 
>> commit de7781f29f82083fe01274b4d436b52920a89173
>> Author: Soumyakanti Das 
>> Commit: Stamatis Zampetakis 
>> 
>>HIVE-25354: NPE when estimating row count in external JDBC tables
>> 
>>Add handler methods in HiveRelMdDistictRowCount for JdbcHiv

[jira] [Created] (HIVE-25569) Enable table definition over a single file

2021-09-28 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25569:
---

 Summary: Enable table definition over a single file
 Key: HIVE-25569
 URL: https://issues.apache.org/jira/browse/HIVE-25569
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Suppose there is a directory where multiple files are present - and by a 3rd 
party database system this is perfectly normal - because its treating a single 
file as the contents of the table.

Tables defined in the metastore follow a different principle - tables are 
considered to be under a directory - and all files under that directory are the 
contents of that directory.

To enable seamless migration/evaluation of Hive and other databases using HMS 
as a metadatabackend the ability to define a table over a single file would be 
usefull.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25568) Estimate TopNKey operator statistics.

2021-09-28 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-25568:
-

 Summary: Estimate TopNKey operator statistics.
 Key: HIVE-25568
 URL: https://issues.apache.org/jira/browse/HIVE-25568
 Project: Hive
  Issue Type: Improvement
Reporter: Krisztian Kasa


Currently TopNKey operator has the same statistics as it's parent operator:
{code}
 TableScan
  alias: src
  Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
  Top N Key Operator
sort order: +
keys: key (type: string)
null sort order: z
Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
top n: 5
{code}
This operator filters out rows and this should be indicated in statistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Commit message guidelines

2021-09-28 Thread Alessandro Solimando
Hi Stamatis,
thanks for the suggestions, I think they are reasonable, the project would
benefit from their adoption.

Regarding the removal of contributor/reviewers names from the commit
message favoring the use of git metadata, there has been a similar discussion
in Calcite ML

which
led to consensus.

Best regards,
Alessandro

On Fri, 24 Sept 2021 at 16:40, Stamatis Zampetakis 
wrote:

> Hi all,
>
> I think we all more or less follow some standard pattern when committing in
> Hive but with some small effort we could make things more uniform and
> hopefully better.
> I would like to start a discussion about creating some guidelines, which we
> could put to the wiki or in contributing.md, to improve the quality of our
> history (git log).
> I outline some suggestions below to kick off the discussion. Many things in
> the list are minor (and maybe even personal preferences) but one thing
> which is really missing from the project is B3 especially the *why* part.
> Why is the commit necessary? Why has the change been made?.
> In some cases the why part is also missing from the JIRA making the code
> harder to maintain.
>
> Subject line:
> S1. Start with the Jira id capitalized and followed immediately (no space)
> by double colon (:)
> S2. Leave one space after the Jira id, and start the summary with a capital
> letter
> S3. Keep it concise (ideally less than 72 characters) and provide a useful
> description of the change
> S4. Do not include or end the line with period
> S5. Do not include the pull request id in the summary
> S6. Use imperative mood (“Add a handler …”) rather than past tense (“Added
> a handler …”) or present tense (“Adds a handler …”)
> S7. Avoid using "Fix"; If you are fixing a bug, it is sufficient to
> describe the bug (“NullPointerException if user is unknown”) and people
> will correctly surmise that the purpose of your change is to fix the bug.
> S8. Do not add a contributor's name; the author tag is made exactly for
> this and can be explored/parsed much more efficiently by tools/people for
> stats or other purposes
> S9. Do not add reviewers name; information is present in multiple places
> (e.g., committer tag, PR, JIRA)
>
> Message body: (Trivial changes may not require a body)
> B1. Separate subject from body with a blank line
> B2. Wrap the body at 72 characters
> B3. Use the body to explain what and why vs. how
> Example
> "Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan and
> Converter to avoid executing the fallback method which in many cases
> returns null and can cause NPE when this value propagates up the call
> stack."
> vs.
> "Added handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
> and Converter"
> B4. If multiple authors include them using the standard GitHub marker,
> "Co-authored-by:", followed by the name and email of the author (e.g.,
> Co-authored-by: Marton Bod )
> B5. If the reviewer is different from committer (or merge via GitHub UI)
> use "Reviewed-by:" followed by the name and email of the reviewer (e.g.,
> Reviewed-by: Stamatis Zampetakis )
> B6. Use "Co-authored-by"/"Reviewed-by" on each own line and repeat as many
> times as authors/reviewers.
> B7. Include the PR id at the end of the message (e.g., Closes #2514);
> someone can easily navigate back to the PR to check comments, reviewers,
> etc.
>
> A sample commit message following these guidelines is shown below:
>
> commit de7781f29f82083fe01274b4d436b52920a89173
> Author: Soumyakanti Das 
> Commit: Stamatis Zampetakis 
>
> HIVE-25354: NPE when estimating row count in external JDBC tables
>
> Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
> and Converter to avoid executing the fallback method which in many
> cases returns null and can cause NPE when this value propagates up the
> call stack.
>
> Co-authored-by: Krisztian Kasa 
>
> Reviewed-by: Peter Vary 
> Reviewed-by: Zoltan Haindrich 
>
> Closes #2514
>
> Let me know your thoughts.
>
> Best,
> Stamatis
>