Re: Reviewers and assignees of PRs

2021-02-04 Thread Zoltan Haindrich

Hey All!

After working through some further requirements I was able to make this work 
and merge it!
It has already found some PRs which change the parser/thrift API - ones I 
would have missed otherwise.
I hope this can help us increase our PR review rate. I would like to 
suggest that all committers add some rules to the .github/assign-by-files.yml 
file.
Note that an assignee must be a member of the "hive-committers" github group.
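For reference, a rule could look something like the sketch below - the exact syntax is defined by the auto-assign-reviewer-by-files plugin, and both the globs and the usernames here are made up for illustration:

```yaml
# Hypothetical entries for .github/assign-by-files.yml:
# each glob matches changed files and lists the committers to assign.
"**/*.g":
  - some-parser-committer
"**/if/*.thrift":
  - some-thrift-committer
```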

cheers,
Zoltan


On 12/11/20 12:00 PM, Zoltan Haindrich wrote:

Hey All!

I prepared the things needed for this a long time ago - I've only opened the 
PR now...

If you would like to extend the assign-by-files rules - please either leave a comment on the PR, 
or use the "Edit file" option on github to add your changes!

https://github.com/apache/hive/pull/1767/files

cheers,
Zoltan

On 6/18/20 11:41 AM, Zoltan Haindrich wrote:

Hey all!

I'm happy to see that (I guess) everyone is using the PR-based workflow without issues - there is still some flakiness from time to time; but I feel that patches go in 
faster - and I have a feeling we have more reviews going on as well - which is awesome!


I've read a bit about github's "reviewers" / "assignee" fields - because they seemed 
somewhat confusing...
Basically both of them can be a group of users - the meaning of these fields 
is up to the community to define.
I would like to propose using "reviewers" for people from whom 
reviews might be expected,
and the assignee field to list those who should approve the change before it goes in 
(anyone may add assignees/reviewers).

We sometimes forget PRs and they become "stale" - most of them just fall through the cracks... To prevent this, the best thing would be for everyone to self-assign PRs 
which are in his/her area of interest.


Sometimes a given feature needs to change not-closely-related parts of the codebase - this is usually fine; but there are places which might need "more 
eyes" on reviews.

In the past I was sometimes surprised by some interesting changes in say the 
thrift api / package.jdo / antlr stuff.

Because the jira title may not suggest which files will be changed - I wanted to 
find a way to automatically add some kind of notification to PRs.

Today I found a neat solution to this [1] - which goes a little bit beyond what I anticipated - there is a small plugin which can auto-add reviewers based 
on the changed files (adding a reviewer also emits an email) - I had to fix a few small issues with it to make sure it works [2].


I really like this approach because it could change the direction of things - contributors wouldn't necessarily need to look for 
reviewers themselves. (But this seems more like sci-fi right now - let's start small and go from there...)


I propose to collect some globs and reviewers in a google doc [3] before we first 
commit this file into the repo - so that everyone can add the things he/she is 
interested in.

cheers,
Zoltan

[1] https://github.com/marketplace/actions/auto-assign-reviewer-by-files
[2] https://github.com/kgyrtkirk/auto-assign-reviewer-by-files
[3] 
https://docs.google.com/document/d/11n9acHby31rwVHfRW4zxxYukymHS-tTSYlJEghZwJaY/edit?usp=sharing


[jira] [Created] (HIVE-24734) Sanity check in HiveSplitGenerator available slot calculation

2021-02-04 Thread Zoltan Matyus (Jira)
Zoltan Matyus created HIVE-24734:


 Summary: Sanity check in HiveSplitGenerator available slot 
calculation
 Key: HIVE-24734
 URL: https://issues.apache.org/jira/browse/HIVE-24734
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 4.0.0
Reporter: Zoltan Matyus


HiveSplitGenerator calculates the number of available slots from available 
memory like this:

{code:java}
if (getContext() != null) {
  totalResource = getContext().getTotalAvailableResource().getMemory();
  taskResource = getContext().getVertexTaskResource().getMemory();
  availableSlots = totalResource / taskResource;
}
{code}

I had a scenario where the total memory was calculated correctly, but the task 
memory came back as -1. This led to errors like these:

{noformat}
tez.HiveSplitGenerator: Number of input splits: 1. -3641 available slots, 1.7 
waves. Input format is: org.apache.hadoop.hive.ql.io.HiveInputFormat

Estimated number of tasks: -6189 for bucket 1

java.lang.IllegalArgumentException: Illegal Capacity: -6189
{noformat}

Admittedly, this happened during development, and hopefully it will not occur on a 
properly configured cluster. (Although I'm not sure what the issue was on my 
setup; possibly Xmx set higher than physical memory.)

In any case, an availableSlots value below 1 will never lead to the 
desired behavior, so in such cases we could emit a warning and correct the 
value to 1.
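A minimal sketch of the proposed guard (the class and method names below are made up for illustration; the real code lives in HiveSplitGenerator and reads both values from the Tez context):

{code:java}
// Sketch: clamp availableSlots to at least 1 and warn when the
// inputs would produce a nonsensical (<= 0) result.
public class SlotCalc {
    static int availableSlots(int totalResource, int taskResource) {
        // taskResource may come back as -1 on a misconfigured setup
        int slots = taskResource > 0 ? totalResource / taskResource : 0;
        if (slots < 1) {
            System.err.println("WARN: computed " + slots
                + " available slots (total=" + totalResource
                + ", task=" + taskResource + "); correcting to 1");
            return 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(availableSlots(16384, 4096)); // healthy case
        System.out.println(availableSlots(16384, -1));   // broken case, clamped
    }
}
{code}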



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Contributions from dataproc-metastore

2021-02-04 Thread Zoltan Haindrich

Hey All!

It seems to me that someone has opened a "dataproc-metastore" account on 
github and is contributing to Hive through that user.
I personally don't like that the account is not a real person - it looks more 
like a team or group inside Google.

This account already has a commit which is very confusing:
* the github account is https://github.com/dataproc-metastore
* the jira is assigned to Cameron Moberg 
https://issues.apache.org/jira/browse/HIVE-24470
* the actual commits in the PR were made by Zhou Fang https://github.com/coufon
* the commit is attributed to "Zhou Fang" - 
https://github.com/apache/hive/commit/b0309b7f023d9785c3a842d70d0fc471252101bf
* the jira is still open...but that's not really relevant - that can be fixed 
in no time :D

I think we should stop merging PRs from sources like this (or is it too much to 
ask that the user have a matching github account)?

This "dataproc-metastore" user had one more PR open - I was a bit angry because 
of the above; so I've closed it.

Let me know what you think!

cheers,
Zoltan


[jira] [Created] (HIVE-24735) Implement TIMESTAMP WITH LOCAL TIME ZONE integration with ORC

2021-02-04 Thread Panagiotis Garefalakis (Jira)
Panagiotis Garefalakis created HIVE-24735:
-

 Summary: Implement TIMESTAMP WITH LOCAL TIME ZONE integration with 
ORC
 Key: HIVE-24735
 URL: https://issues.apache.org/jira/browse/HIVE-24735
 Project: Hive
  Issue Type: New Feature
Reporter: Panagiotis Garefalakis
Assignee: Panagiotis Garefalakis


TIMESTAMP_INSTANT in ORC is equivalent to TIMESTAMP_WITH_LOCAL_TIME_ZONE type 
in Hive. Support to read/write timestamp with local time zone in ORC was added 
as part of ORC-189.

We should implement their 
[integration|https://github.com/apache/hive/pull/1823#discussion_r564077084].





[jira] [Created] (HIVE-24736) Make buffer tracking in LLAP cache with BP wrapper more accurate

2021-02-04 Thread Jira
Ádám Szita created HIVE-24736:
-

 Summary: Make buffer tracking in LLAP cache with BP wrapper more 
accurate
 Key: HIVE-24736
 URL: https://issues.apache.org/jira/browse/HIVE-24736
 Project: Hive
  Issue Type: Improvement
  Components: llap
Reporter: Ádám Szita
Assignee: Ádám Szita


HIVE-22492 introduced thread-local buffers in which LlapCacheableBuffer 
instances are stored before entering the LRFU policy's heap - so that lock 
contention is eased.

This is a nice performance improvement, but it comes at the cost of losing 
exact accounting of llap buffer instances - e.g. if a user issues a purge command, 
not all of the cache space is freed up as one would expect, because purge only 
considers buffers that the policy knows about. In this case we'd see in LLAP's 
iomem servlet that the LRFU policy is empty, but a table may still have its 
full content loaded.

Also, if we use text-based tables, during cache load a set of ephemeral 
-OrcEncode threads is used. Buffers attached to these 
threads' thread-local structures are ultimately lost. In an edge case we could 
load lots of data into the cache by reading in many distinct smaller text 
tables whose buffers never reach the LRFU policy; the cache hit ratio will 
then suffer as a consequence (the memory manager will give up asking LRFU to 
evict, and will free up random buffers).

I propose we track the amount of data stored in the BP wrapper 
thread-locals, and flush it into the heap as the first step of a purge request. 
This will enhance supportability.
We should also replace the ephemeral OrcEncode threads with a thread pool, which 
could actually be a small performance improvement on its own by saving the time 
and memory spent on thread lifecycle management.
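A rough sketch of the tracking idea, with made-up names (the real types are LlapCacheableBuffer and the LRFU policy's internal heap; this only illustrates the accounting and the flush-on-purge step):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: count the bytes sitting in the per-thread staging lists so a
// purge can flush them into the shared policy heap before evicting.
public class BpWrapperSketch {
    private final ThreadLocal<List<Long>> staged =
        ThreadLocal.withInitial(ArrayList::new);
    private final AtomicLong stagedBytes = new AtomicLong();
    private final List<Long> policyHeap = new ArrayList<>(); // stands in for LRFU

    void cache(long bufferSize) {
        staged.get().add(bufferSize);
        stagedBytes.addAndGet(bufferSize); // exact accounting of staged data
    }

    // First step of a purge: move this thread's staged buffers to the heap
    // so that the policy actually knows about them.
    synchronized void flushToHeap() {
        List<Long> mine = staged.get();
        for (long b : mine) {
            policyHeap.add(b);
            stagedBytes.addAndGet(-b);
        }
        mine.clear();
    }

    long stagedBytes() { return stagedBytes.get(); }
    int heapSize() { return policyHeap.size(); }
}
{code}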





[jira] [Created] (HIVE-24737) Remove Configuration TEZ_SIMPLE_CUSTOM_EDGE_TINY_BUFFER_SIZE_MB

2021-02-04 Thread David Mollitor (Jira)
David Mollitor created HIVE-24737:
-

 Summary: Remove Configuration 
TEZ_SIMPLE_CUSTOM_EDGE_TINY_BUFFER_SIZE_MB
 Key: HIVE-24737
 URL: https://issues.apache.org/jira/browse/HIVE-24737
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor


Please remove {{TEZ_SIMPLE_CUSTOM_EDGE_TINY_BUFFER_SIZE_MB}}.

In practice it is never actually used.  Can it just be assigned a sensible 
hard-coded value?

This seems like an over-optimization at the cost of yet another configuration.





[jira] [Created] (HIVE-24738) Reuse committed filelist from directInsert manifest during loadPartition

2021-02-04 Thread Peter Varga (Jira)
Peter Varga created HIVE-24738:
--

 Summary: Reuse committed filelist from directInsert manifest 
during loadPartition
 Key: HIVE-24738
 URL: https://issues.apache.org/jira/browse/HIVE-24738
 Project: Hive
  Issue Type: Sub-task
Reporter: Peter Varga
Assignee: Peter Varga


This way the costly FileSystem listing can be avoided.





[jira] [Created] (HIVE-24739) Clarify Usage of Thrift TServerEventHandler and Count Number of Messages Processed

2021-02-04 Thread David Mollitor (Jira)
David Mollitor created HIVE-24739:
-

 Summary: Clarify Usage of Thrift TServerEventHandler and Count 
Number of Messages Processed
 Key: HIVE-24739
 URL: https://issues.apache.org/jira/browse/HIVE-24739
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


Make the messages emitted from {{TServerEventHandler}} more meaningful.  Also, 
track the number of messages that each client sends, to aid in troubleshooting.

I run into this issue all the time, and this would greatly help clarify the 
logging.





[jira] [Created] (HIVE-24740) Can't order by an unselected column

2021-02-04 Thread Oleksiy Sayankin (Jira)
Oleksiy Sayankin created HIVE-24740:
---

 Summary: Can't order by an unselected column
 Key: HIVE-24740
 URL: https://issues.apache.org/jira/browse/HIVE-24740
 Project: Hive
  Issue Type: Bug
Reporter: Oleksiy Sayankin


{code}
CREATE TABLE t1 (column1 STRING);
{code}

{code}
select substr(column1,1,4), avg(column1) from t1 group by substr(column1,1,4) 
order by column1;
{code}

{code}
org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:87 Invalid table 
alias or column reference 'column1': (possible column names are: _c0, _c1, 
.(tok_function substr (tok_table_or_col column1) 1 4), .(tok_function avg 
(tok_table_or_col column1)))
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5645)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5576)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.getOrderByExpression(CalcitePlanner.java:4326)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.beginGenOBLogicalPlan(CalcitePlanner.java:4230)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genOBLogicalPlan(CalcitePlanner.java:4136)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5326)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1864)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1810)
at 
org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:130)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:915)
at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:179)
at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1571)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:562)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12538)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:456)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:315)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:223)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:492)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:445)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:403)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:229)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:744)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:714)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:170)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver(TestCliDriver.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:135)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.

[jira] [Created] (HIVE-24741) get_partitions_ps_with_auth performance can be improved when it is requesting all the partitions

2021-02-04 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-24741:
--

 Summary: get_partitions_ps_with_auth performance can be improved 
when it is requesting all the partitions
 Key: HIVE-24741
 URL: https://issues.apache.org/jira/browse/HIVE-24741
 Project: Hive
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The {{get_partitions_ps_with_auth}} API does not support DirectSQL. I have seen 
some large production use-cases where this API is used heavily (specifically 
from Spark applications) to request all the partitions of a table. 
The performance of this API when requesting all the partitions of a table 
can be significantly improved (~4x in a real-world large workload) 
if we forward this API call to a directSQL-enabled API.





Re: Contributions from dataproc-metastore

2021-02-04 Thread Vihang Karajgaonkar
Thanks Zoltan for your email.

Just to give some context, dataproc-metastore is Google's metastore
compatible cloud service. The good news is that they are happy and willing
to contribute any improvements/fixes to Apache Hive (metastore
specifically) instead of forking out the repository.
They also contributed their proposed changes here:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158869886

I think it makes sense to have individual users contribute the PR so that
we can attribute the patch accordingly. When I merged their PR I asked them
offline who is the end user for this PR and they mentioned they are still
figuring out who is going to be the point of contact for the open-source
contributions. While merging the PR, github suggested the author name and I
used that.

> I was a bit angry because of the above; so I've closed it.
I feel this is a bit against the spirit of open-source hive and it would be
great to have a wiki page for commit guidelines and ask them to refer to
it. The only wiki that I find about commit guidelines is
https://cwiki.apache.org/confluence/display/Hive/HowToCommit which
definitely needs an update.

On Thu, Feb 4, 2021 at 1:02 AM Zoltan Haindrich  wrote:

> Hey All!
>
> It seems to me that someone have opened a "dataproc-metastore" account on
> github and is contributing to Hive thru that user.
> I personally don't like that the account is not a real person - it looks
> more like a team or group inside Google.
>
> This account already has a commit which is very confusing:
> * the github account is https://github.com/dataproc-metastore
> * the jira is assigned to Cameron Moberg
> https://issues.apache.org/jira/browse/HIVE-24470
> * the actual commits in the PR were made by Zhou Fang
> https://github.com/coufon
> * the commit is attributed to "Zhou Fang" -
> https://github.com/apache/hive/commit/b0309b7f023d9785c3a842d70d0fc471252101bf
> * the jira is still open...but that's not really relevant - that can be
> fixed in no time :D
>
> I think we should stop merging PRs from sources like this (or is it too
> much to ask that the user should have a matching github account)?
>
> This "dataproc-metastore" user had one more PR open - I was a bit angry
> because of the above; so I've closed it.
>
> Let me know what you think!
>
> cheers,
> Zoltan
>


Re: Contributions from dataproc-metastore

2021-02-04 Thread Cameron Moberg
Hello!

Thanks for bringing this up! As Vihang said, our team created the account
`dataproc-metastore` due to the newness of our product and some internal
contribution processes we wanted to work out first. Going forward we see
the benefit of using our personal accounts for attribution and clarity
while doing open source work, and we will migrate to using those.

We'll go ahead and re-open that PR under either Zhou's or my account, and we
are excited to be part of this open source community in the future!

Cheers,
Cameron

On 2021/02/04 20:11:26, Vihang Karajgaonkar  wrote:
> Thanks Zoltan for your email.
>
> Just to give some context, dataproc-metastore is Google's metastore
> compatible cloud service. The good news is that they are happy and willing
> to contribute any improvements/fixes to Apache Hive (metastore
> specifically) instead of forking out the repository.
> They also contributed their proposed changes here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158869886
>
> I think it makes sense to have individual users contribute the PR so that
> we can attribute the patch accordingly. When I merged their PR I asked them
> offline who is the end user for this PR and they mentioned they are still
> figuring out who is going to be the point of contact for the open-source
> contributions. While merging the PR, github suggested the author name and I
> used that.
>
> > I was a bit angry because of the above; so I've closed it.
>
> I feel this is a bit against the spirit of open-source hive and it would be
> great to have a wiki page for commit guidelines and ask them to refer to
> it. The only wiki that I find about commit guidelines is
> https://cwiki.apache.org/confluence/display/Hive/HowToCommit which
> definitely needs an update.
>
> On Thu, Feb 4, 2021 at 1:02 AM Zoltan Haindrich  wrote:
>
> > Hey All!
> >
> > It seems to me that someone have opened a "dataproc-metastore" account on
> > github and is contributing to Hive thru that user.
> > I personally don't like that the account is not a real person - it looks
> > more like a team or group inside Google.
> >
> > This account already has a commit which is very confusing:
> > * the github account is https://github.com/dataproc-metastore
> > * the jira is assigned to Cameron Moberg
> > https://issues.apache.org/jira/browse/HIVE-24470
> > * the actual commits in the PR were made by Zhou Fang
> > https://github.com/coufon
> > * the commit is attributed to "Zhou Fang" -
> > https://github.com/apache/hive/commit/b0309b7f023d9785c3a842d70d0fc471252101bf
> > * the jira is still open...but that's not really relevant - that can be
> > fixed in no time :D
> >
> > I think we should stop merging PRs from sources like this (or is it too
> > much to ask that the user should have a matching github account)?
> >
> > This "dataproc-metastore" user had one more PR open - I was a bit angry
> > because of the above; so I've closed it.
> >
> > Let me know what you think!
> >
> > cheers,
> > Zoltan
> >


[jira] [Created] (HIVE-24742) Support router path or view fs path in Hive table location

2021-02-04 Thread Aihua Xu (Jira)
Aihua Xu created HIVE-24742:
---

 Summary: Support router path or view fs path in Hive table location
 Key: HIVE-24742
 URL: https://issues.apache.org/jira/browse/HIVE-24742
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 3.1.2
Reporter: Aihua Xu
Assignee: Aihua Xu


In 
[FileUtils.java|https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L747],
 the equalsFileSystem function checks the base URL to determine whether source and 
destination are on the same cluster, and decides whether to copy or move the data. That 
will not work for viewfs or a router-based file system, since viewfs://ns-default/a 
and viewfs://ns-default/b may be on different physical clusters.

FileSystem in HDFS provides a resolvePath() function to resolve a path to its 
physical location. We can support viewfs and router through that function.
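To illustrate the problem: the current check effectively reduces to comparing URI scheme and authority, which cannot tell two viewfs mount points apart. A simplified, self-contained sketch (the class and method names are made up; the real logic in FileUtils is more involved):

{code:java}
import java.net.URI;

// Sketch: a scheme/authority comparison says two viewfs paths are on the
// "same" file system even when their mount points resolve to different
// physical clusters - which is why resolvePath() must be consulted first.
public class FsCompareSketch {
    static boolean sameFsByUri(URI a, URI b) {
        return eq(a.getScheme(), b.getScheme())
            && eq(a.getAuthority(), b.getAuthority());
    }

    private static boolean eq(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }
}
{code}

Here sameFsByUri("viewfs://ns-default/a", "viewfs://ns-default/b") returns true, even though the two paths may live on different physical clusters.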







Re: Contributions from dataproc-metastore

2021-02-04 Thread Stamatis Zampetakis
Apache requires signing an ICLA [1] for committers and a clear intention of
contributing from contributors [2].
From the above, I would say that it is important to know which individual
is contributing the code, and Zoltan did well to raise awareness
around this topic.
Of course, not everyone is familiar with these processes so as Vihang
pointed out it would be good to improve the documentation and point people
to that when necessary.

Best,
Stamatis

[1] https://www.apache.org/licenses/icla.pdf
[2] https://apetro.ghost.io/apache-contributors-no-cla/

On Thu, Feb 4, 2021 at 9:12 PM Vihang Karajgaonkar 
wrote:

> Thanks Zoltan for your email.
>
> Just to give some context, dataproc-metastore is Google's metastore
> compatible cloud service. The good news is that they are happy and willing
> to contribute any improvements/fixes to Apache Hive (metastore
> specifically) instead of forking out the repository.
> They also contributed their proposed changes here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158869886
>
> I think it makes sense to have individual users contribute the PR so that
> we can attribute the patch accordingly. When I merged their PR I asked them
> offline who is the end user for this PR and they mentioned they are still
> figuring out who is going to be the point of contact for the open-source
> contributions. While merging the PR, github suggested the author name and I
> used that.
>
> > I was a bit angry because of the above; so I've closed it.
> I feel this is a bit against the spirit of open-source hive and it would be
> great to have a wiki page for commit guidelines and ask them to refer to
> it. The only wiki that I find about commit guidelines is
> https://cwiki.apache.org/confluence/display/Hive/HowToCommit which
> definitely needs an update.
>
> On Thu, Feb 4, 2021 at 1:02 AM Zoltan Haindrich  wrote:
>
> > Hey All!
> >
> > It seems to me that someone have opened a "dataproc-metastore" account on
> > github and is contributing to Hive thru that user.
> > I personally don't like that the account is not a real person - it looks
> > more like a team or group inside Google.
> >
> > This account already has a commit which is very confusing:
> > * the github account is https://github.com/dataproc-metastore
> > * the jira is assigned to Cameron Moberg
> > https://issues.apache.org/jira/browse/HIVE-24470
> > * the actual commits in the PR were made by Zhou Fang
> > https://github.com/coufon
> > * the commit is attributed to "Zhou Fang" -
> >
> https://github.com/apache/hive/commit/b0309b7f023d9785c3a842d70d0fc471252101bf
> > * the jira is still open...but that's not really relevant - that can be
> > fixed in no time :D
> >
> > I think we should stop merging PRs from sources like this (or is it too
> > much to ask that the user should have a matching github account)?
> >
> > This "dataproc-metastore" user had one more PR open - I was a bit angry
> > because of the above; so I've closed it.
> >
> > Let me know what you think!
> >
> > cheers,
> > Zoltan
> >
>


[jira] [Created] (HIVE-24743) [HS2] Send tableId to get_partitions_by_names_req HMS API from HS2

2021-02-04 Thread Kishen Das (Jira)
Kishen Das created HIVE-24743:
-

 Summary: [HS2] Send tableId to get_partitions_by_names_req HMS API 
from HS2
 Key: HIVE-24743
 URL: https://issues.apache.org/jira/browse/HIVE-24743
 Project: Hive
  Issue Type: Sub-task
Reporter: Kishen Das


As part of HIVE-23821 (Send tableId in request for all the new HMS 
get_partition APIs) we added logic to send tableId in the request for several 
get_partition APIs, but it looks like it was missed for getPartitionsByNames. 
TableId and validWriteIdList are used to maintain consistency when an HMS API 
response is being served from a remote cache.





[jira] [Created] (HIVE-24744) Deletion of previous dump dir fails with NPE for ptests

2021-02-04 Thread Arko Sharma (Jira)
Arko Sharma created HIVE-24744:
--

 Summary: Deletion of previous dump dir fails with NPE for ptests
 Key: HIVE-24744
 URL: https://issues.apache.org/jira/browse/HIVE-24744
 Project: Hive
  Issue Type: Bug
Reporter: Arko Sharma
Assignee: Arko Sharma





