Re: [DISCUSS] Hive 3.2

2020-11-13 Thread Matt McCline
A few comments. I am going to move forward with a VOTE on Hive 3.2 next.

-----Original Message-----
From: Matt McCline 
Sent: Monday, October 26, 2020 2:12 PM
To: dev@hive.apache.org
Subject: RE: [EXTERNAL] Re: [DISCUSS] Hive 3.2


Hi László,

Thank you for your response.

Since 3.1.3-rc0 was tagged on Jan 13, master has accumulated 3156 commits that 
are not in this tag. I mostly wanted to address the huge number of changes in master.
We could do a 3.1.3 release with a modest number of changes, and a 3.2 with 
perhaps many or all of the 3,000+ changes in master.
What do you think?
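
For anyone who wants to reproduce a count like the one above, here is a minimal sketch using git rev-list. It assumes a local clone of the Hive repository with both the tag and master fetched; the helper names are illustrative and not part of any Hive tooling.

```python
import subprocess


def commits_ahead_command(tag, branch="master"):
    """Build the git command counting commits reachable from `branch` but not from `tag`."""
    return ["git", "rev-list", "--count", f"{tag}..{branch}"]


def count_commits_ahead(repo_dir, tag, branch="master"):
    """Run the command inside `repo_dir` (assumed to be a git clone) and return the count."""
    result = subprocess.run(
        commits_ahead_command(tag, branch),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

Calling count_commits_ahead with the path to the clone and "release-3.1.3-rc0" should give the number of commits in master that the tag does not contain; the exact figure depends on when master was fetched.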

Matt

-----Original Message-----
From: László Bodor 
Sent: Monday, October 26, 2020 4:19 AM
To: dev@hive.apache.org
Subject: [EXTERNAL] Re: [DISCUSS] Hive 3.2

Sorry, I posted an incorrect link for 3.1.3-rc0; the correct one is:
https://github.com/apache/hive/releases/tag/release-3.1.3-rc0

On Mon, 26 Oct 2020 at 12:17, László Bodor 
wrote:

> Hey!
>
> I'm also interested in PMCs' opinion. I think it should be released 
> from branch-3, otherwise, it's a 4.0, right? (which is a heavier 
> discussion, and I don't know what Hive4 will be about.) On 3.x we have 
> an official 3.1.2 and an abandoned 3.1.3-rc0, which is not yet 
> released as far as I can see. I guess the next release is supposed to 
> be 3.1.3 as we haven't changed tez/hadoop/orc dependencies since that, 
> and I don't think branch-3 was actively maintained.
>
> Regards,
> Laszlo Bodor
>
> On Thu, 22 Oct 2020 at 21:24, Matt McCline 
>  wrote:
>
>> Hey,
>> Hive master is about 2 years ahead of 3.1 - it seems like time to 
>> release those changes.
>> So, let us have a community discussion about creating a Hive 3.2 release.
>> I volunteer to be the release manager. I have not done that before, 
>> so I will need help.
>> I will start a VOTE thread soon, but I would like to hear some 
>> opinions first.
>>
>> Thank you,
>> Matt
>>
>> (It is unclear if there are enough major features or dependencies on 
>> projects that necessitate a major version bump)
>>
>>


[jira] [Created] (HIVE-24387) Metastore access through JDBC handler does not use correct database accessor

2020-11-13 Thread Jesus Camacho Rodriguez (Jira)
Jesus Camacho Rodriguez created HIVE-24387:
--

 Summary: Metastore access through JDBC handler does not use 
correct database accessor
 Key: HIVE-24387
 URL: https://issues.apache.org/jira/browse/HIVE-24387
 Project: Hive
  Issue Type: Bug
  Components: JDBC storage handler
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


There are some differences in the SQL syntax for each RDBMS generated by the 
database accessor. For metastore, we always end up with the default accessor, 
which leads to errors, e.g., when a limit query is executed for a 
Postgres-backed metastore.

{code}
Error: java.io.IOException: java.io.IOException: 
org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
while trying to get column names: ERROR: syntax error at or near "{"
Position: 200 (state=,code=0)

SELECT "TBL_COLUMN_GRANT_ID", "COLUMN_NAME", "CREATE_TIME", "GRANT_OPTION", 
"GRANTOR", "GRANTOR_TYPE", "PRINCIPAL_NAME", "PRINCIPAL_TYPE", "TBL_COL_PRIV", 
"TBL_ID", "AUTHORIZER" FROM "TBL_COL_PRIVS"
{LIMIT 1}
{code}
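
The failure mode can be sketched as follows. This is an illustration under stated assumptions, not the handler's actual code: the class names, the registry, and the "METASTORE" type label are hypothetical stand-ins. It only shows how falling back to a default accessor would produce the escape-style limit clause seen in the error, while a Postgres-specific accessor would emit plain LIMIT.

```python
class GenericAccessor:
    """Hypothetical default accessor: appends an escape-style limit clause."""

    def add_limit(self, sql, limit):
        return f"{sql} {{LIMIT {limit}}}"


class PostgresAccessor(GenericAccessor):
    """Hypothetical Postgres accessor: appends the native LIMIT clause."""

    def add_limit(self, sql, limit):
        return f"{sql} LIMIT {limit}"


# Hypothetical registry keyed by the configured database type.
ACCESSORS = {"POSTGRES": PostgresAccessor}


def accessor_for(db_type):
    """Pick an accessor by type; unknown types fall back to the default one."""
    return ACCESSORS.get(db_type.upper(), GenericAccessor)()


query = 'SELECT "TBL_ID" FROM "TBL_COL_PRIVS"'
# A type that is not mapped to its backing RDBMS ends up with the default accessor.
print(accessor_for("METASTORE").add_limit(query, 1))
print(accessor_for("POSTGRES").add_limit(query, 1))
```

Under this sketch, a fix would resolve the metastore's backing RDBMS before the lookup so that the Postgres accessor is chosen; whether the actual patch takes that route is not stated in this report.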




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Result of the TPC-DS benchmark on Hive master branch

2020-11-13 Thread Sungwoo Park
Hi Zoltan,

I have run another fresh TPC-DS test using the latest commit. Here is the
summary:

Commits used:

1) Hive, master, e9f72e654750de208227d46a22e983413b080c6c (HIVE-24366, Thu
Nov 12)
2) Tez, 0.10.0, 22fec6c0ecc7ebe6f6f28800935cc6f69794dad5 (CHANGES.txt
updated with TEZ-4238, Thu Oct 8)

Scenario:

1) create a database consisting of external tables from a 100GB TPC-DS text
dataset
2) create a database consisting of ORC tables
3) compute column statistics, set tez.runtime.compress=false
4) run TPC-DS queries and check the results

Configuration:

1) set hive.execution.engine=tez, hive.execution.mode=container
2) set hive.cbo.enable=true

Experiment #1: hive.optimize.shared.work.dppunion=true

Query 2 fails:

java.lang.IllegalArgumentException: Edge [Reducer 9 :
org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor] -> [Map 6 :
org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST :
org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >>
org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager
}) already defined!

Query 14 fails:

org.apache.hadoop.hive.ql.parse.SemanticException: EXCEPT and INTERSECT
operations are only supported with Cost Based Optimizations enabled. Please
set 'hive.cbo.enable' to true!

Query 59 fails:

java.lang.IllegalArgumentException: Edge [Reducer 6 :
org.apache.hadoop.hive.ql.exec.tez.ReduceTezProcessor] -> [Map 4 :
org.apache.hadoop.hive.ql.exec.tez.MapTezProcessor] ({ BROADCAST :
org.apache.tez.runtime.library.input.UnorderedKVInput >> PERSISTED >>
org.apache.tez.runtime.library.output.UnorderedKVOutput >> NullEdgeManager
}) already defined!

Experiment #2: hive.optimize.shared.work.dppunion=false

Query 14 fails:

org.apache.hive.service.cli.HiveSQLException: Error while compiling
statement: FAILED: SemanticException EXCEPT and INTERSECT operations are
only supported with Cost Based Optimizations enabled. Please set
'hive.cbo.enable' to true!

Summary:

1. With hive.optimize.shared.work.dppunion=true, queries 2 and 59 fail.
Please see the attachment for stack traces.

2. Query 14 fails in both cases, and it seems like another bug. Note that
hive.cbo.enable is set to true when running query 14.

3. For some queries, the number of rows is different between the two
experiments. In most cases, it seems to be rounding errors, but the
difference is rather large for some queries (e.g., query 29 and 58). Please
see the attachment for the result.
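
The row-count comparison in item 3 can be sketched as below. This is a minimal illustration, assuming each experiment's results have been collected into a dict from query number to row count; the function name and the numbers are placeholders, not the measured results from the attachment.

```python
def diff_row_counts(run_a, run_b):
    """Return {query: (rows_a, rows_b)} for queries present in both runs whose counts differ."""
    diffs = {}
    for query in sorted(run_a.keys() & run_b.keys()):
        if run_a[query] != run_b[query]:
            diffs[query] = (run_a[query], run_b[query])
    return diffs


# Placeholder values for illustration only; these are NOT measured results.
dppunion_on = {1: 10, 2: 7, 3: 100}
dppunion_off = {1: 12, 2: 7, 3: 100}
print(diff_row_counts(dppunion_on, dppunion_off))
```

Queries that failed in one experiment are simply absent from that dict and are skipped, so only queries that completed in both runs are compared.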

I could open a new Jira for this issue, or create a sub-task of HIVE-24384.
Or perhaps HIVE-24384 is already enough. So please let me know which would
be good for you.

(I have automated the entire experiment, so if you would like to see the
result of testing a new commit, I would be happy to rerun the experiment
and get back to you.)

Cheers,

--- Sungwoo

On Thu, Nov 12, 2020 at 10:49 PM Zoltan Haindrich  wrote:

> Hey Sungwoo!
>
> On 11/12/20 10:23 AM, Sungwoo Park wrote:
> > Hi Zoltan,
> >
> > I used the same hive-site.xml for the previous test (which was okay) and
> > the new test (which failed), so my guess is that it is perhaps due to a
> > commit since the previous test. Let me try later to identify the commit
> > that fails query 14, with the hope that identifying such a commit might be
> > useful in debugging.
>
> That would definitely help - if you could share the 2 commit hashes; it
> might be possible that we could guess it from the commit message or
> something.
>
>
> > Another question: is HIVE-24360 part of a solution to the problem of
> > hive.optimize.shared.work.dppunion?
> > I have tried the latest commit (which includes HIVE-24360) using the
> > TPC-DS benchmark, and it seems like the problem still exists.
>
> Yes, HIVE-24360 should have fixed that - do you still see an exception
> coming from tez-api reporting edge errors?
> I will also pick these changes for a smaller benchmark run soon...but I'm
> not running any right now. Could you also note for which query you've seen
> the exception, so that I could also check it?
> Could you please open a jira about this - and add the actual exception
> trace/etc if available?
>
> cheers,
> Zoltan
>
> >
> > Cheers,
> >
> > --- Sungwoo
> >
> > On Mon, Nov 9, 2020 at 6:18 PM Zoltan Haindrich  wrote:
> >
> >> Hey Sungwoo!
> >>
> >> Regarding Q14 / "java.lang.RuntimeException: equivalence mapping violation"
> >>
> >> From the stack trace you shared, it seems like the mapper has already
> >> seen both the filter and the AST node earlier - and they are in separate
> >> mapping groups (which is unfortunate). I think it won't be simple to
> >> track that down - it will definitely need some debugging.
> >> The best would be to have a repro query for it...
> >>
> >> note: we already run q14 in TestTezPerf*Driver - could it be possible
> >> that we've disabled some features in the hive-site.xml for these tests,
> >> and that's why we haven't seen it before?
> >>
> >> cheers,
> >> Zoltan
> >>
> >>
> >
>

Re: Credits page - Edits

2020-11-13 Thread Narayanan Venkateswaran
Hi,

svn co https://svn.apache.org/repos/asf/hive/hcatalog-historical/site/

worked for me.

Narayanan

On Fri, Nov 13, 2020 at 1:54 PM Anishek Agarwal  wrote:

> Hello,
>
> For hive under
>
> https://cwiki.apache.org/confluence/display/Hive/HowToCommit#HowToCommit-Newcommitters
>
> the step
> svn co https://svn.apache.org/repos/asf/hive/site hive-site
> says the URL is incorrect. Does anyone know what the correct location is
> for the above repo?
>
> thanks
> anishek
>


[jira] [Created] (HIVE-24386) Add builder methods for GetTablesRequest and GetPartitionsRequest to HiveMetaStoreClient

2020-11-13 Thread Narayanan Venkateswaran (Jira)
Narayanan Venkateswaran created HIVE-24386:
--

 Summary: Add builder methods for GetTablesRequest and 
GetPartitionsRequest to HiveMetaStoreClient
 Key: HIVE-24386
 URL: https://issues.apache.org/jira/browse/HIVE-24386
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Reporter: Narayanan Venkateswaran
Assignee: Narayanan Venkateswaran


Builder methods for GetTablesRequest and GetPartitionsRequest should be added 
to the HiveMetaStoreClient class.





[jira] [Created] (HIVE-24385) Does the Hive date data type have a value range?

2020-11-13 Thread Jira
罗鹏程 created HIVE-24385:
--

 Summary: Does the Hive date data type have a value range?
 Key: HIVE-24385
 URL: https://issues.apache.org/jira/browse/HIVE-24385
 Project: Hive
  Issue Type: Task
  Components: CLI
Affects Versions: 2.1.1
 Environment: hive2.1
Reporter: 罗鹏程


The field's data type is date.

Inserting "-99-99" yields null; inserting "-99-99" yields "2020-10-19".

After changing the data type to string, inserting "-99-99" again yields "-99-99".

Does date have a value range?





[jira] [Created] (HIVE-24384) SharedWorkOptimizer improvements

2020-11-13 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24384:
---

 Summary: SharedWorkOptimizer improvements
 Key: HIVE-24384
 URL: https://issues.apache.org/jira/browse/HIVE-24384
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


This started as a small feature addition, but due to the sheer volume of the 
q.out changes it's better to do smaller changes at a time, which means more 
tickets...





[jira] [Created] (HIVE-24383) Add Table type to HPL/SQL

2020-11-13 Thread Attila Magyar (Jira)
Attila Magyar created HIVE-24383:


 Summary: Add Table type to HPL/SQL
 Key: HIVE-24383
 URL: https://issues.apache.org/jira/browse/HIVE-24383
 Project: Hive
  Issue Type: Improvement
  Components: hpl/sql
Reporter: Attila Magyar
Assignee: Attila Magyar








Credits page - Edits

2020-11-13 Thread Anishek Agarwal
Hello,

For hive under
https://cwiki.apache.org/confluence/display/Hive/HowToCommit#HowToCommit-Newcommitters

the step
svn co https://svn.apache.org/repos/asf/hive/site hive-site
says the URL is incorrect. Does anyone know what the correct location is
for the above repo?

thanks
anishek