[jira] [Created] (HIVE-24734) Sanity check in HiveSplitGenerator available slot calculation
Zoltan Matyus created HIVE-24734:

Summary: Sanity check in HiveSplitGenerator available slot calculation
Key: HIVE-24734
URL: https://issues.apache.org/jira/browse/HIVE-24734
Project: Hive
Issue Type: Bug
Components: Tez
Affects Versions: 4.0.0
Reporter: Zoltan Matyus

HiveSplitGenerator calculates the number of available slots from the available memory like this:

{code:java}
if (getContext() != null) {
  totalResource = getContext().getTotalAvailableResource().getMemory();
  taskResource = getContext().getVertexTaskResource().getMemory();
  availableSlots = totalResource / taskResource;
}
{code}

I had a scenario where the total memory was calculated correctly, but the task memory was returned as -1. This led to errors like these:

{noformat}
tez.HiveSplitGenerator: Number of input splits: 1. -3641 available slots, 1.7 waves. Input format is: org.apache.hadoop.hive.ql.io.HiveInputFormat
Estimated number of tasks: -6189 for bucket 1
java.lang.IllegalArgumentException: Illegal Capacity: -6189
{noformat}

Admittedly, this happened during development, and hopefully will not occur on a properly configured cluster. (I'm not sure what the issue was on my setup; possibly XMX was set higher than physical memory.) In any case, a value of availableSlots < 1 will never lead to the desired behavior, so in such cases we could emit a warning and correct the value to 1.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
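The correction proposed in HIVE-24734 could look roughly like the sketch below. It is not the actual Hive patch: the class name `SlotSanityCheck`, the method `availableSlots`, and the use of `java.util.logging` are assumptions made only to keep the example self-contained. It also guards against a task memory of 0, which the report does not mention but which would make the division throw.

```java
import java.util.logging.Logger;

/** Illustrative sketch of the proposed sanity check; all names here are hypothetical. */
final class SlotSanityCheck {
  private static final Logger LOG = Logger.getLogger(SlotSanityCheck.class.getName());

  /**
   * Derives the slot count from total and per-task memory and falls back to 1
   * whenever the result would be smaller than 1.
   */
  static int availableSlots(int totalResource, int taskResource) {
    // A task size of 0 would throw ArithmeticException on division, and a
    // negative one (such as the -1 from the report) flips the sign of the result.
    int slots = taskResource > 0 ? totalResource / taskResource : 0;
    if (slots < 1) {
      LOG.warning("Cannot derive a sane slot count from totalResource=" + totalResource
          + ", taskResource=" + taskResource + "; using 1 instead");
      return 1;
    }
    return slots;
  }
}
```

With the values implied by the log excerpt (total memory 3641, task memory -1), this returns 1 instead of -3641, so the later capacity calculation no longer sees a negative number.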
[jira] [Created] (HIVE-24691) Ban commons-logging (again)
Zoltan Matyus created HIVE-24691:

Summary: Ban commons-logging (again)
Key: HIVE-24691
URL: https://issues.apache.org/jira/browse/HIVE-24691
Project: Hive
Issue Type: Bug
Components: Logging
Affects Versions: 4.0.0
Reporter: Zoltan Matyus
Assignee: Zoltan Matyus

All usage of commons-logging was removed from Hive once already, in HIVE-20019. However, new usages have been added since, despite an attempt to ban the library (bannedDependencies). I'm removing all usages again and adding another way to ban it (restrictImports).
[jira] [Created] (HIVE-24213) Incorrect exception in the Merge MapJoinTask into its child MapRedTask optimizer
Zoltan Matyus created HIVE-24213:

Summary: Incorrect exception in the Merge MapJoinTask into its child MapRedTask optimizer
Key: HIVE-24213
URL: https://issues.apache.org/jira/browse/HIVE-24213
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 4.0.0
Reporter: Zoltan Matyus
Assignee: Zoltan Matyus

The {{CommonJoinTaskDispatcher#mergeMapJoinTaskIntoItsChildMapRedTask}} method throws a {{SemanticException}} if the number of {{FileSinkOperator}}s it finds is not exactly 1. The exception is justified if zero operators are found, but there are valid use cases where multiple {{FileSinkOperator}}s exist. Example: the MapJoin and its child are used in a common table expression, which is used for multiple inserts.
[jira] [Created] (HIVE-22767) beeline doesn't parse semicolons in comments properly
Zoltan Matyus created HIVE-22767:

Summary: beeline doesn't parse semicolons in comments properly
Key: HIVE-22767
URL: https://issues.apache.org/jira/browse/HIVE-22767
Project: Hive
Issue Type: Bug
Components: Beeline
Reporter: Zoltan Matyus

HIVE-12646 fixed the handling of semicolons in quoted strings, but left the problem of semicolons in comments unsolved. E.g. with beeline connected to any database...

this works:
{code:sql}select 1; select /* */ 2; select /* */ 3;{code}

this doesn't work:
{code:sql}select 1; select /* ; */ 2; select /* ; */ 3;{code}

This has been fixed and reintroduced before (possibly multiple times). Ideally, there should be a single utility method somewhere to separate comments, strings and commands -- with the proper testing in place (q files). However, I'm trying to make this fix back-portable, so a light touch is needed. I'm focusing on beeline for now, and only writing (very thorough) unit tests, as I cannot exclude any new q files from TestCliDriver (which would break, since it's using a different parsing method).

P.S. Excerpt of the error message:
{noformat}
0: jdbc:hive2://...> select 1; select /* ; */ 2; select /* ; */ 3;
INFO : Compiling command(queryId=...): select 1
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=...); Time taken: 0.38 seconds
INFO : Executing command(queryId=...): select 1
INFO : Completed executing command(queryId=...); Time taken: 0.004 seconds
INFO : OK
+--+
| _c0 |
+--+
| 1|
+--+
1 row selected (2.007 seconds)
INFO : Compiling command(queryId=...): select /*
ERROR : FAILED: ParseException line 1:9 cannot recognize input near '' '' '' in select clause
org.apache.hadoop.hive.ql.parse.ParseException: line 1:9 cannot recognize input near '' '' '' in select clause
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:233)
	at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:79)
	at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:72)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:598)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1505)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1452)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1447)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
	at ...
{noformat}

Similarly, the following query also fails:
{code:sql}select /* ' */ 1; select /* ' */ 2;{code}

I suspect line comments are also not handled properly, but I cannot reproduce this in interactive beeline...
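The "single utility method" that HIVE-22767 wishes for could be sketched as a small state machine that only treats a semicolon as a separator outside of strings and comments. This is an illustration, not Beeline's actual parser or the eventual fix: the class `StatementSplitter` and its `split` method are hypothetical, and the handling of backslash escapes inside quotes and of `--` line comments reflects assumptions about the dialect.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative comment- and quote-aware statement splitter; all names are hypothetical. */
final class StatementSplitter {
  private enum State { NORMAL, SINGLE_QUOTE, DOUBLE_QUOTE, BLOCK_COMMENT, LINE_COMMENT }

  /** Splits a line into commands at semicolons that are outside strings and comments. */
  static List<String> split(String line) {
    List<String> commands = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    State state = State.NORMAL;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      char next = i + 1 < line.length() ? line.charAt(i + 1) : '\0';
      switch (state) {
        case NORMAL:
          if (c == ';') {
            addIfNotBlank(commands, current);
            current.setLength(0);
            continue; // the separator itself is not part of any command
          } else if (c == '\'') {
            state = State.SINGLE_QUOTE;
          } else if (c == '"') {
            state = State.DOUBLE_QUOTE;
          } else if (c == '/' && next == '*') {
            state = State.BLOCK_COMMENT;
            current.append(c); // consume both characters so "/*/" does not close the comment
            c = next;
            i++;
          } else if (c == '-' && next == '-') {
            state = State.LINE_COMMENT;
          }
          break;
        case SINGLE_QUOTE:
        case DOUBLE_QUOTE:
          if (c == '\\' && next != '\0') {
            current.append(c); // keep the backslash and skip over the escaped character
            c = next;
            i++;
          } else if ((state == State.SINGLE_QUOTE && c == '\'')
              || (state == State.DOUBLE_QUOTE && c == '"')) {
            state = State.NORMAL;
          }
          break;
        case BLOCK_COMMENT:
          if (c == '*' && next == '/') {
            current.append(c);
            c = next;
            i++;
            state = State.NORMAL;
          }
          break;
        case LINE_COMMENT:
          if (c == '\n') {
            state = State.NORMAL;
          }
          break;
      }
      current.append(c);
    }
    addIfNotBlank(commands, current);
    return commands;
  }

  private static void addIfNotBlank(List<String> out, StringBuilder sb) {
    String command = sb.toString().trim();
    if (!command.isEmpty()) {
      out.add(command);
    }
  }
}
```

Applied to the failing input from the report, `split("select 1; select /* ; */ 2; select /* ; */ 3;")` yields three commands with the comments left intact, and the `/* ' */` case is handled the same way because quotes are ignored while inside a comment.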
[jira] [Created] (HIVE-22236) Fail to create View selecting View containing NOT IN subquery
Zoltan Matyus created HIVE-22236:

Summary: Fail to create View selecting View containing NOT IN subquery
Key: HIVE-22236
URL: https://issues.apache.org/jira/browse/HIVE-22236
Project: Hive
Issue Type: Bug
Reporter: Zoltan Matyus
Assignee: Zoltan Matyus

* Given a complicated view with a select statement that has a subquery containing "{{NOT IN}}"
* Hive fails to create a simple view as {{SELECT * FROM complicated_view}}
* (with CBO disabled).

The unparse replacements of the complicated view are applied to the text of the simple view, resulting in {{IllegalArgumentException: replace: range invalid}} exceptions from {{org.antlr.runtime.TokenRewriteStream.replace}}.
[jira] [Created] (HIVE-21584) Java 11 preparation: system class loader is not URLClassLoader
Zoltan Matyus created HIVE-21584:

Summary: Java 11 preparation: system class loader is not URLClassLoader
Key: HIVE-21584
URL: https://issues.apache.org/jira/browse/HIVE-21584
Project: Hive
Issue Type: Task
Components: Hive
Affects Versions: 4.0.0
Reporter: Zoltan Matyus
Assignee: Zoltan Matyus

Currently, Hive assumes that the system class loader is an instance of {{URLClassLoader}}. In Java 11 this is not the case. There are a few (unresolved) JIRAs about specific occurrences of {{URLClassLoader}} (e.g. [HIVE-21237|https://issues.apache.org/jira/browse/HIVE-21237], [HIVE-17909|https://issues.apache.org/jira/browse/HIVE-17909]), but none that says _"remove all occurrences"_. I also couldn't find an umbrella "Java 11 upgrade" JIRA. This ticket is to remove all unconditional casts of arbitrary class loaders to {{URLClassLoader}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
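One way to avoid the unconditional cast described in HIVE-21584 is to check the loader's type first and fall back to the `java.class.path` system property for the system class loader. The sketch below only illustrates that pattern and is not taken from the Hive change: the class `ClassPathUrls` and the method `urlsOf` are hypothetical, and returning an empty list for other loader types is an assumption about what a caller would want.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative helper that never casts blindly to URLClassLoader; names are hypothetical. */
final class ClassPathUrls {

  /** Returns the URLs a loader searches, as far as they can be determined. */
  static List<URL> urlsOf(ClassLoader loader) {
    if (loader instanceof URLClassLoader) {
      // Safe: the cast happens only after the type check.
      return Arrays.asList(((URLClassLoader) loader).getURLs());
    }
    if (loader == ClassLoader.getSystemClassLoader()) {
      // Since Java 9 the system class loader is no longer a URLClassLoader,
      // so read the class path from the standard system property instead.
      List<URL> urls = new ArrayList<>();
      String classPath = System.getProperty("java.class.path", "");
      for (String entry : classPath.split(File.pathSeparator)) {
        if (entry.isEmpty()) {
          continue;
        }
        try {
          urls.add(new File(entry).toURI().toURL());
        } catch (MalformedURLException e) {
          // Skip entries that cannot be expressed as a URL.
        }
      }
      return urls;
    }
    // Unknown loader type: nothing can be said about its search path.
    return Collections.emptyList();
  }
}
```

A call such as `ClassPathUrls.urlsOf(ClassLoader.getSystemClassLoader())` then works on both Java 8 and Java 11 without a `ClassCastException`.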