[jira] [Updated] (HIVE-15444) tez.queue.name is invalid after tez job running on CLI
[ https://issues.apache.org/jira/browse/HIVE-15444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleksiy Sayankin updated HIVE-15444:
    Attachment: HIVE-15444.2.patch

> tez.queue.name is invalid after tez job running on CLI
> --
>
> Key: HIVE-15444
> URL: https://issues.apache.org/jira/browse/HIVE-15444
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.1, 2.2.0
> Reporter: Hui Fei
> Assignee: Oleksiy Sayankin
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.2.0
>
> Attachments: HIVE-15444.1.patch, HIVE-15444.2.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> {code}
> hive> set tez.queue.name;
> tez.queue.name is undefined
> hive> set tez.queue.name=HQ_OLPS;
> hive> set tez.queue.name;
> tez.queue.name=HQ_OLPS
> {code}
> {code}
> hive> insert into abc values(2,2);
> Query ID = hadoop_20161216181208_6c382e49-ac4a-4f52-ba1e-3ed962733fc1
> Total jobs = 1
> Launching Job 1 out of 1
> Status: Running (Executing on YARN cluster with App id application_1481877998678_0011)
> --
> VERTICES     MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --
> Map 1 ..     container  SUCCEEDED      1          1        0        0       0       0
> --
> VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 6.57 s
> --
> Loading data to table default.abc
> OK
> Time taken: 19.983 seconds
> {code}
> {code}
> hive> set tez.queue.name;
> tez.queue.name is undefined
> hive> set hive.execution.engine;
> hive.execution.engine=tez
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
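The symptom above — a session-level override that disappears once a job has run — matches a pattern where job execution rebuilds its configuration from defaults without re-applying the session's overrides. A minimal, hypothetical Python sketch of the bug and the fix (the function and variable names are illustrative, not Hive's actual code):

```python
def build_job_conf_buggy(defaults, session_overrides):
    # Buggy: the job configuration is rebuilt from defaults only,
    # so "set tez.queue.name=HQ_OLPS" from the session is lost.
    return dict(defaults)

def build_job_conf_fixed(defaults, session_overrides):
    # Fixed: re-apply the session's overrides on top of the defaults
    # every time a job configuration is (re)built.
    conf = dict(defaults)
    conf.update(session_overrides)
    return conf
```

With the fixed variant, `set tez.queue.name;` after a job would still report the session's value instead of "undefined".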
[jira] [Updated] (HIVE-15444) tez.queue.name is invalid after tez job running on CLI
[ https://issues.apache.org/jira/browse/HIVE-15444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleksiy Sayankin updated HIVE-15444:
    Description:
{code}
hive> set tez.queue.name;
tez.queue.name is undefined
hive> set tez.queue.name=HQ_OLPS;
hive> set tez.queue.name;
tez.queue.name=HQ_OLPS
{code}
{code}
hive> insert into abc values(2,2);
Query ID = hadoop_20161216181208_6c382e49-ac4a-4f52-ba1e-3ed962733fc1
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1481877998678_0011)
--
VERTICES     MODE       STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--
Map 1 ..     container  SUCCEEDED      1          1        0        0       0       0
--
VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 6.57 s
--
Loading data to table default.abc
OK
Time taken: 19.983 seconds
{code}
{code}
hive> set tez.queue.name;
tez.queue.name is undefined
hive> set hive.execution.engine;
hive.execution.engine=tez
{code}

was: (the same transcript, previously without {code} formatting)
[jira] [Commented] (HIVE-15444) tez.queue.name is invalid after tez job running on CLI
[ https://issues.apache.org/jira/browse/HIVE-15444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254762#comment-17254762 ]

Oleksiy Sayankin commented on HIVE-15444:
    Created https://github.com/apache/hive/pull/1815
[jira] [Updated] (HIVE-15444) tez.queue.name is invalid after tez job running on CLI
[ https://issues.apache.org/jira/browse/HIVE-15444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-15444:
    Labels: pull-request-available (was: )
[jira] [Work logged] (HIVE-15444) tez.queue.name is invalid after tez job running on CLI
[ https://issues.apache.org/jira/browse/HIVE-15444?focusedWorklogId=528325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528325 ]

ASF GitHub Bot logged work on HIVE-15444:
    Author: ASF GitHub Bot
    Created on: 25/Dec/20 07:45
    Start Date: 25/Dec/20 07:45
    Worklog Time Spent: 10m

Work Description: oleksiy-sayankin opened a new pull request #1815:
URL: https://github.com/apache/hive/pull/1815

### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 528325)
Remaining Estimate: 0h
Time Spent: 10m
[jira] [Assigned] (HIVE-15444) tez.queue.name is invalid after tez job running on CLI
[ https://issues.apache.org/jira/browse/HIVE-15444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleksiy Sayankin reassigned HIVE-15444:
    Assignee: Oleksiy Sayankin (was: Hui Fei)
[jira] [Updated] (HIVE-15444) tez.queue.name is invalid after tez job running on CLI
[ https://issues.apache.org/jira/browse/HIVE-15444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleksiy Sayankin updated HIVE-15444:
    Status: In Progress (was: Patch Available)
[jira] [Resolved] (HIVE-24305) avro decimal schema is not properly populating scale/precision if value is enclosed in quote
[ https://issues.apache.org/jira/browse/HIVE-24305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naresh P R resolved HIVE-24305.
    Fix Version/s: 4.0.0
    Resolution: Fixed

> avro decimal schema is not properly populating scale/precision if value is enclosed in quote
> --
>
> Key: HIVE-24305
> URL: https://issues.apache.org/jira/browse/HIVE-24305
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> {code:java}
> CREATE TABLE test_quoted_scale_precision STORED AS AVRO TBLPROPERTIES
> ('avro.schema.literal'='{"type":"record","name":"DecimalTest","namespace":"com.example.test","fields":[{"name":"Decimal24_6","type":["null",{"type":"bytes","logicalType":"decimal","precision":24,"scale":"6"}]}]}');
>
> desc test_quoted_scale_precision;
> // current output
> decimal24_6 decimal(24,0)
> // expected output
> decimal24_6 decimal(24,6)
> {code}
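The root cause is visible in the schema literal itself: "scale":"6" is a JSON string rather than a JSON number, and a parser that only accepts numeric values falls back to a scale of 0. A small illustrative Python sketch of coercing quoted values (this is not Hive's AvroSerDe code; the `as_int` helper is hypothetical):

```python
import json

# The decimal logicalType annotation from the schema literal above,
# with "scale" given as a quoted string rather than a JSON number.
decimal_type = json.loads(
    '{"type":"bytes","logicalType":"decimal","precision":24,"scale":"6"}'
)

def as_int(value, default=0):
    # Accept both a JSON number and a quoted numeric string.
    if isinstance(value, int):
        return value
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return default

precision = as_int(decimal_type.get("precision"))
scale = as_int(decimal_type.get("scale"))
print(f"decimal({precision},{scale})")  # decimal(24,6)
```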
[jira] [Commented] (HIVE-24305) avro decimal schema is not properly populating scale/precision if value is enclosed in quote
[ https://issues.apache.org/jira/browse/HIVE-24305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254737#comment-17254737 ]

Naresh P R commented on HIVE-24305:
    Thanks for the review & commit [~lpinter].
[jira] [Updated] (HIVE-15820) comment at the head of beeline -e
[ https://issues.apache.org/jira/browse/HIVE-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-15820:
    Labels: patch pull-request-available (was: patch)

> comment at the head of beeline -e
> --
>
> Key: HIVE-15820
> URL: https://issues.apache.org/jira/browse/HIVE-15820
> Project: Hive
> Issue Type: Bug
> Components: Beeline
> Affects Versions: 1.2.1, 2.1.1
> Reporter: muxin
> Assignee: muxin
> Priority: Major
> Labels: patch, pull-request-available
> Attachments: HIVE-15820.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> $ beeline -u jdbc:hive2://localhost:1 -n test -e "
> --asdfasdfasdfasdf
> select * from test_table;
> "
> The expected result of the above command should be all rows of test_table (the same
> as when run in beeline interactive mode), but it does not output anything.
> The cause is that the -e option reads the commands as one string, and the method
> dispatch(String line) first calls isComment(String line), which uses
> 'lineTrimmed.startsWith("#") || lineTrimmed.startsWith("--")'
> to treat the whole command string as a comment.
> Two ways can be considered to fix this problem:
> 1. in method initArgs(String[] args), split the command by '\n' into a command list
> before dispatch when cl.getOptionValues('e') != null
> 2. in method dispatch(String line), remove comments using this:
> {code:java}
> static String removeComments(String line) {
>   if (line == null || line.isEmpty()) {
>     return line;
>   }
>   StringBuilder builder = new StringBuilder();
>   int escape = -1;
>   for (int index = 0; index < line.length(); index++) {
>     if (index < line.length() - 1 && line.charAt(index) == line.charAt(index + 1)) {
>       if (escape == -1 && line.charAt(index) == '-') {
>         // find \n as the end of comment
>         index = line.indexOf('\n', index + 1);
>         // there is no sql after this comment, so just break out
>         if (-1 == index) {
>           break;
>         }
>       }
>     }
>     char letter = line.charAt(index);
>     if (letter == escape) {
>       escape = -1; // Turn escape off.
>     } else if (escape == -1 && (letter == '\'' || letter == '"')) {
>       escape = letter; // Turn escape on.
>     }
>     builder.append(letter);
>   }
>   return builder.toString();
> }
> {code}
> The second way can be a general solution to remove all comments starting with
> '--' in a sql statement.
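The second approach can be illustrated outside of Beeline. Below is a rough Python transcription of the quote-aware comment stripping sketched in the Java method above — an illustration of the idea, not the actual patch:

```python
def remove_comments(line: str) -> str:
    # Strip "--" comments that are not inside single or double quotes.
    # A comment runs to the end of its line; text after the newline is kept.
    out = []
    escape = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(line):
        if escape is None and line.startswith("--", i):
            # Skip ahead to the next newline; if there is none,
            # there is no SQL after this comment, so stop.
            nl = line.find("\n", i)
            if nl == -1:
                break
            i = nl  # the newline itself is kept below
        ch = line[i]
        if ch == escape:
            escape = None          # closing quote: turn escape off
        elif escape is None and ch in ("'", '"'):
            escape = ch            # opening quote: turn escape on
        out.append(ch)
        i += 1
    return "".join(out)
```

A leading comment no longer swallows the statement that follows it, while `--` inside string literals is preserved.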
[jira] [Work logged] (HIVE-15820) comment at the head of beeline -e
[ https://issues.apache.org/jira/browse/HIVE-15820?focusedWorklogId=528297&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528297 ]

ASF GitHub Bot logged work on HIVE-15820:
    Author: ASF GitHub Bot
    Created on: 25/Dec/20 03:04
    Start Date: 25/Dec/20 03:04
    Worklog Time Spent: 10m

Work Description: ujc714 opened a new pull request #1814:
URL: https://github.com/apache/hive/pull/1814

### What changes were proposed in this pull request?
1) Don't check whether a line is a comment in Beeline.dispatch(). Instead, remove the comments from the line.
2) Replace removeComments(String, int[]) with removeComments(String) in Commands.handleMultiLineCmd().

### Why are the changes needed?
1) The queries in the '-e' parameter are passed to Beeline.dispatch() as a single string even though they may span multiple lines. If the first line is a comment, the remaining lines are ignored. We should pass the query strings to Commands.execute().
2) HiveStringUtils.removeComments(String, int[]) is meant for a single line. If we use it on a multi-line string that contains a comment line, the lines after that comment line are discarded. HiveStringUtils.removeComments(String) splits a multi-line string into single lines and calls HiveStringUtils.removeComments(String, int[]) on each of them, so it is the method we should use in Commands.handleMultiLineCmd().

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
1) org.apache.hive.beeline.testLinesEndingWithComments is used to test HiveStringUtils.removeComments(String).
2) org.apache.hive.beeline.cli.testSqlFromCmdWithComments* are used to test queries passed via the '-e' option, and testSqlFromCmdWithComments2 tests a query that follows a comment line.

Issue Time Tracking
---
Worklog Id: (was: 528297)
Remaining Estimate: 0h
Time Spent: 10m
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.1
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=528293&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528293 ]

ASF GitHub Bot logged work on HIVE-21737:
    Author: ASF GitHub Bot
    Created on: 25/Dec/20 02:21
    Start Date: 25/Dec/20 02:21
    Worklog Time Spent: 10m

Work Description: sunchao commented on pull request #1806:
URL: https://github.com/apache/hive/pull/1806#issuecomment-751149489

@iemejia what do you think? do you still plan to get this in 2.3.8? I'm hoping that we can catch the Spark 3.1.0 release train.
@wangyum curious if you have verified whether [this case](https://github.com/apache/spark/pull/26804#issuecomment-714729078) is resolved with Hive 2.3.8. I plan to check this by myself too before starting a new vote.

Issue Time Tracking
---
Worklog Id: (was: 528293)
Time Spent: 7h 20m (was: 7h 10m)

> Upgrade Avro to version 1.10.1
> --
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Reporter: Ismaël Mejía
> Assignee: Fokko Driesprong
> Priority: Major
> Labels: pull-request-available
> Attachments: 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch
>
> Time Spent: 7h 20m
> Remaining Estimate: 0h
>
> Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without
> Jackson in the public API and Guava as a dependency. Worth the update.
[jira] [Comment Edited] (HIVE-15820) comment at the head of beeline -e
[ https://issues.apache.org/jira/browse/HIVE-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254363#comment-17254363 ]

Robbie Zhang edited comment on HIVE-15820 at 12/25/20, 1:38 AM:

HIVE-16935 introduces HiveStringUtils.removeComments() into Commands.java but this issue still exists. There are two problems:
# Beeline.isComment() checks a single line, but the '-e' option passes multiple lines as a single line to Beeline.dispatch(). If the first line starts with '--' or '#', the remaining lines are treated as comments, so they won't be passed to Commands.execute() at all.
# HiveStringUtils.removeComments(String, int[]) is meant for a single line. It checks whether the line starts with '--' or '#'; if it does, an empty string is returned immediately. For multiple lines, we should use HiveStringUtils.removeComments(String) instead.
I'll provide a patch later.

was (Author: robbie):
HIVE-16935 introduces HiveStringUtils.removeComments() into Commands.java but this issue still exists. There are two problems:
# Beeline.isComment() doesn't work properly on multiple lines. If the first line starts with '--' or '#', the remaining lines are treated as comments, so they won't be passed to Commands.execute() at all.
# HiveStringUtils.removeComments(String, int[]) is meant for a single line. It checks whether the line starts with '--' or '#'; if it does, an empty string is returned immediately. For multiple lines, we should use HiveStringUtils.removeComments(String) instead.
I'll provide a patch later.
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.0
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=528272&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528272 ]

ASF GitHub Bot logged work on HIVE-24484:
    Author: ASF GitHub Bot
    Created on: 25/Dec/20 00:46
    Start Date: 25/Dec/20 00:46
    Worklog Time Spent: 10m

Work Description: wangyum commented on pull request #1742:
URL: https://github.com/apache/hive/pull/1742#issuecomment-751136951

Any update?

Issue Time Tracking
---
Worklog Id: (was: 528272)
Time Spent: 40m (was: 0.5h)

> Upgrade Hadoop to 3.3.0
> --
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.1
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=528271&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528271 ]

ASF GitHub Bot logged work on HIVE-21737:
    Author: ASF GitHub Bot
    Created on: 25/Dec/20 00:45
    Start Date: 25/Dec/20 00:45
    Worklog Time Spent: 10m

Work Description: wangyum commented on pull request #1806:
URL: https://github.com/apache/hive/pull/1806#issuecomment-751136805

@sunchao @iemejia Could we release Hive 2.3.8 first?

Issue Time Tracking
---
Worklog Id: (was: 528271)
Time Spent: 7h 10m (was: 7h)
[jira] [Commented] (HIVE-24566) Add Parquet Stats Optimization
[ https://issues.apache.org/jira/browse/HIVE-24566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254633#comment-17254633 ]

Jesus Camacho Rodriguez commented on HIVE-24566:

[~belugabehr], yes, I think this approach could potentially improve performance for such queries. I guess you referred to a 'single multi-threaded processor' to avoid launching any jobs to compute these queries. For tables with a large number of files, computing from metadata would still be a useful optimization even if jobs are launched.

> Add Parquet Stats Optimization
> --
>
> Key: HIVE-24566
> URL: https://issues.apache.org/jira/browse/HIVE-24566
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Priority: Major
>
> Parquet files store min/max/count data in footer metadata.
> When a query is submitted to a Parquet table and stats are not available,
> Hive should launch a single multi-threaded processor that simply reads the
> metadata of each Parquet file instead of walking through every single record
> in the table.
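The idea in the description — answering min/max/count queries from footers alone — can be sketched as a thread pool that reads only per-file metadata and merges it. The `read_footer` function below is a hypothetical stand-in; a real implementation would decode the footer with a Parquet library (e.g. parquet-mr's ParquetFileReader) instead of the hard-coded values used here:

```python
from concurrent.futures import ThreadPoolExecutor

def read_footer(path):
    # Hypothetical stand-in for decoding one Parquet footer.
    # Returns (min, max, row_count) for a single column, without
    # touching any data pages.
    fake_footers = {
        "a.parquet": (1, 9, 100),
        "b.parquet": (0, 7, 50),
    }
    return fake_footers[path]

def table_stats(paths, workers=8):
    # Read all footers concurrently, then merge: the table-level min is
    # the min of per-file mins, and so on. No record is ever scanned.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        footers = list(ex.map(read_footer, paths))
    mins, maxs, counts = zip(*footers)
    return min(mins), max(maxs), sum(counts)
```

For a table with many files, the per-file reads are I/O-bound, which is why a single multi-threaded process can cover them without launching a cluster job.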
[jira] [Commented] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders
[ https://issues.apache.org/jira/browse/HIVE-24569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254632#comment-17254632 ] Stamatis Zampetakis commented on HIVE-24569: I believe relying on {{LlapRoutingAppenderPurgePolicy}} is very prone to errors and hard to reproduce bugs due to race conditions. I think the safest way to move forward and avoid a lot of this small fixes in the future is to rely on a less precise but more robust {{PurgePolicy}} such as the {{IdlePurgePolicy}}. I tested a config with {{IdlePurgePolicy}} on a test cluster and the problem no longer appears. Possibly as part of this fix I will try to get rid of {{LlapRoutingAppenderPurgePolicy}} and related classes also for HS2 and not only for LLAP. > LLAP daemon leaks file descriptors/log4j appenders > -- > > Key: HIVE-24569 > URL: https://issues.apache.org/jira/browse/HIVE-24569 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.2.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > Attachments: llap-appender-gc-roots.png > > > With HIVE-9756 query logs in LLAP are directed to different files (file per > query) using a Log4j2 routing appender. Without a purge policy in place, > appenders are created dynamically by the routing appender, one for each > query, and remain in memory forever. The dynamic appenders write to files so > each appender holds to a file descriptor. > Further work HIVE-14224 has mitigated the issue by introducing a custom > purging policy (LlapRoutingAppenderPurgePolicy) which deletes the dynamic > appenders (and closes the respective files) when the query is completed > (org.apache.hadoop.hive.llap.daemon.impl.QueryTracker#handleLogOnQueryCompletion). > > However, in the presence of multiple threads appending to the logs there are > race conditions. In an internal Hive cluster the number of file descriptors > started going up approx one descriptor leaking per query. 
After some > debugging it turned out that one thread (running the > QueryTracker#handleLogOnQueryCompletion) signals that the query has finished > and thus the purge policy should get rid of the respective appender (and > close the file) while another (Task-Executor-0) attempts to append another > log message for the same query. The initial appender is closed after the > request from the query tracker but a new one is created to accommodate the > message from the task executor and the latter is never removed, thus creating > a leak. > Similar leaks have been identified and fixed for HS2 with the most similar > one being that described > [here|https://issues.apache.org/jira/browse/HIVE-22753?focusedCommentId=17021041&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17021041]. > > The problem depends on the timing of threads, so it may not manifest in all > versions between 2.2.0 and 4.0.0. Usually the leak can be seen either via > lsof (or other similar command) with the following output: > {noformat} > # 1494391 is the PID of the LLAP daemon process > ls -ltr /proc/1494391/fd > ...
> lrwx-- 1 hive hadoop 64 Dec 24 12:08 978 -> > /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121724_66ce273d-54a9-4dcd-a9fb-20cb5691cef7-dag_1608659125567_0008_194.log > lrwx-- 1 hive hadoop 64 Dec 24 12:08 977 -> > /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121804_ce53eeb5-c73f-4999-b7a4-b4dd04d4e4de-dag_1608659125567_0008_197.log > lrwx-- 1 hive hadoop 64 Dec 24 12:08 974 -> > /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224122002_1693bd7d-2f0e-4673-a8d1-b7cb14a02204-dag_1608659125567_0008_204.log > lrwx-- 1 hive hadoop 64 Dec 24 12:08 989 -> > /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121909_6a56218f-06c7-4906-9907-4b6dd824b100-dag_1608659125567_0008_201.log > lrwx-- 1 hive hadoop 64 Dec 24 12:08 984 -> > /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121754_78ef49a0-bc23-478f-9a16-87fa25e7a287-dag_1608659125567_0008_196.log > lrwx-- 1 hive hadoop 64 Dec 24 12:08 983 -> > /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121855_e65b9ebf-b2ec-4159-9570-1904442b7048-dag_1608659125567_0008_200.log > lrwx-- 1 hive hadoop 64 Dec 24 12:08 981 -> > /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121818_e9051ae3-1316-46af-aabb-22c53ed2fda7-dag_1608659125567
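The comment above proposes replacing the custom purge policy with the stock {{IdlePurgePolicy}}, which closes and removes dynamic appenders after a period of inactivity instead of on an explicit completion signal. A minimal sketch of what such a Log4j2 routing-appender configuration could look like (the appender names, the {{queryId}} context key, and the file path pattern are illustrative assumptions, not the actual LLAP configuration):

```
<!-- Sketch: one file appender is created dynamically per queryId,
     and IdlePurgePolicy closes any appender idle for 5+ minutes. -->
<Routing name="query-routing">
  <Routes pattern="$${ctx:queryId}">
    <Route>
      <File name="query-file"
            fileName="${sys:hive.log.dir}/${ctx:queryId}.log">
        <PatternLayout pattern="%d{ISO8601} %5p [%t] %c{2}: %m%n"/>
      </File>
    </Route>
  </Routes>
  <IdlePurgePolicy timeToLive="5" timeUnit="minutes"/>
</Routing>
```

The trade-off is imprecision: a long-idle but still-running query may have its appender purged and re-created, but a late log event can no longer resurrect an appender forever, since the idle timer will eventually purge it again.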
[jira] [Commented] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders
[ https://issues.apache.org/jira/browse/HIVE-24569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254631#comment-17254631 ] Stamatis Zampetakis commented on HIVE-24569: HIVE-22127 also concerns HS2 and most likely it was solved by HIVE-17128.
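The leak described in HIVE-24569 boils down to a check-then-act race: the purge runs on query completion, but a late log event lazily re-creates an appender that no purge will ever visit again. A self-contained sketch of that pattern (the class, the map of strings standing in for appenders, and the method names are illustrative inventions, not Hive's actual QueryTracker or routing code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrates the leak: purge removes the appender for a finished query,
// but a late append re-creates one that is never purged again.
public class AppenderRaceSketch {
    static final Map<String, String> appenders = new ConcurrentHashMap<>();

    // Called by the routing logic on every log event.
    static void append(String queryId, String msg) {
        // Lazy (re-)creation: this is the step that resurrects an
        // appender after the purge has already run for the query.
        appenders.computeIfAbsent(queryId, q -> "appender-for-" + q);
    }

    // Stand-in for QueryTracker-style completion handling.
    static void purge(String queryId) {
        appenders.remove(queryId); // the real code also closes the file
    }

    public static void main(String[] args) {
        append("q1", "running");  // appender created
        purge("q1");              // query done: appender removed/closed
        append("q1", "late log"); // late message re-creates it...
        // ...and no purge will ever run for q1 again: a leaked appender.
        System.out.println(appenders.containsKey("q1")); // true -> leak
    }
}
```

In multithreaded reality the `purge` and the late `append` interleave nondeterministically, which is why the leak depends on timing and does not reproduce on every version or workload.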
[jira] [Commented] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders
[ https://issues.apache.org/jira/browse/HIVE-24569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254630#comment-17254630 ] Stamatis Zampetakis commented on HIVE-24569: The fix for HIVE-17128 deals with more or less the same problem but in the HS2 process.
[jira] [Updated] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders
[ https://issues.apache.org/jira/browse/HIVE-24569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-24569: --- Description: With HIVE-9756 query logs in LLAP are directed to different files (file per query) using a Log4j2 routing appender. Without a purge policy in place, appenders are created dynamically by the routing appender, one for each query, and remain in memory forever. The dynamic appenders write to files so each appender holds on to a file descriptor. Further work in HIVE-14224 mitigated the issue by introducing a custom purging policy (LlapRoutingAppenderPurgePolicy) which deletes the dynamic appenders (and closes the respective files) when the query is completed (org.apache.hadoop.hive.llap.daemon.impl.QueryTracker#handleLogOnQueryCompletion). However, in the presence of multiple threads appending to the logs there are race conditions. In an internal Hive cluster the number of file descriptors started going up, with approximately one descriptor leaking per query. After some debugging it turned out that one thread (running the QueryTracker#handleLogOnQueryCompletion) signals that the query has finished and thus the purge policy should get rid of the respective appender (and close the file) while another (Task-Executor-0) attempts to append another log message for the same query. The initial appender is closed after the request from the query tracker but a new one is created to accommodate the message from the task executor and the latter is never removed, thus creating a leak. Similar leaks have been identified and fixed for HS2 with the most similar one being that described [here|https://issues.apache.org/jira/browse/HIVE-22753?focusedCommentId=17021041&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17021041]. The problem depends on the timing of threads, so it may not manifest in all versions between 2.2.0 and 4.0.0.
Usually the leak can be seen either via lsof (or other similar command) with the following output: {noformat} # 1494391 is the PID of the LLAP daemon process ls -ltr /proc/1494391/fd ... lrwx-- 1 hive hadoop 64 Dec 24 12:08 978 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121724_66ce273d-54a9-4dcd-a9fb-20cb5691cef7-dag_1608659125567_0008_194.log lrwx-- 1 hive hadoop 64 Dec 24 12:08 977 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121804_ce53eeb5-c73f-4999-b7a4-b4dd04d4e4de-dag_1608659125567_0008_197.log lrwx-- 1 hive hadoop 64 Dec 24 12:08 974 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224122002_1693bd7d-2f0e-4673-a8d1-b7cb14a02204-dag_1608659125567_0008_204.log lrwx-- 1 hive hadoop 64 Dec 24 12:08 989 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121909_6a56218f-06c7-4906-9907-4b6dd824b100-dag_1608659125567_0008_201.log lrwx-- 1 hive hadoop 64 Dec 24 12:08 984 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121754_78ef49a0-bc23-478f-9a16-87fa25e7a287-dag_1608659125567_0008_196.log lrwx-- 1 hive hadoop 64 Dec 24 12:08 983 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121855_e65b9ebf-b2ec-4159-9570-1904442b7048-dag_1608659125567_0008_200.log lrwx-- 1 hive hadoop 64 Dec 24 12:08 981 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121818_e9051ae3-1316-46af-aabb-22c53ed2fda7-dag_1608659125567_0008_198.log lrwx-- 1 hive hadoop 64 Dec 24 12:08 980 -> /hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121744_fcf37921-4351-4368-95ee-b5be2592d89a-dag_1608659125567_0008_195.log lrwx-- 1 hive hadoop 64 Dec 24 12:08 979 -> 
/hadoop/yarn/log/application_1608659125567_0006/container_e04_1608659125567_0006_01_02/hive_20201224121837_e80c0024-f6bc-4b3c-85ed-5c0c85c55787-dag_1608659125567_0008_199.log {noformat} or in the heap dump with many appenders (in my case {{LlapWrappedAppender}}) indirectly holding open file descriptors: !llap-appender-gc-roots.png! was: With HIVE-9756 query logs in LLAP are directed to different files (file per query) using a Log4j2 routing appender. Without a purge policy in place, appenders are created dynamically by the routing appender, one for each query, and remain in memory forever. The dynamic appenders write to files so each appender holds on to a file descriptor. Further work in HIVE-14224 mitigated the issue by introducing a custom purging policy (LlapRoutingAppenderPurgePolicy) which deletes the dynamic appenders (and closes the respective files) when the query is completed (org.apa
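Besides lsof, the leak trend can be tracked by simply counting the entries under /proc/&lt;pid&gt;/fd over time. A small sketch, assuming a Linux host (the class name is made up; pass the LLAP daemon's PID instead of "self" to watch the daemon):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Counts the open file descriptors of a process via /proc (Linux only).
public class FdCounter {
    // Returns the number of open fds, or -1 if /proc is unavailable
    // (non-Linux platform, unknown PID, or insufficient permissions).
    public static long countOpenFds(String pid) {
        Path fdDir = Paths.get("/proc", pid, "fd");
        if (!Files.isDirectory(fdDir)) {
            return -1;
        }
        try (Stream<Path> entries = Files.list(fdDir)) {
            return entries.count();
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // "self" resolves to the current JVM; sample this periodically
        // for a suspect process and a steadily growing count with one
        // extra fd per query is the signature of this leak.
        System.out.println("open fds: " + countOpenFds("self"));
    }
}
```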
[jira] [Assigned] (HIVE-24569) LLAP daemon leaks file descriptors/log4j appenders
[ https://issues.apache.org/jira/browse/HIVE-24569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-24569: --
[jira] [Updated] (HIVE-21961) Update jetty version to 9.4.x
[ https://issues.apache.org/jira/browse/HIVE-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-21961: -- Labels: pull-request-available (was: ) > Update jetty version to 9.4.x > - > > Key: HIVE-21961 > URL: https://issues.apache.org/jira/browse/HIVE-21961 > Project: Hive > Issue Type: Task >Reporter: Oleksiy Sayankin >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Attachments: HIVE-21961.02.patch, HIVE-21961.03.patch, > HIVE-21961.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Update jetty version to 9.4.x -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21961) Update jetty version to 9.4.x
[ https://issues.apache.org/jira/browse/HIVE-21961?focusedWorklogId=528228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528228 ] ASF GitHub Bot logged work on HIVE-21961: - Author: ASF GitHub Bot Created on: 24/Dec/20 17:58 Start Date: 24/Dec/20 17:58 Worklog Time Spent: 10m Work Description: wangyum opened a new pull request #1813: URL: https://github.com/apache/hive/pull/1813 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 528228) Remaining Estimate: 0h Time Spent: 10m > Update jetty version to 9.4.x > - > > Key: HIVE-21961 > URL: https://issues.apache.org/jira/browse/HIVE-21961 > Project: Hive > Issue Type: Task >Reporter: Oleksiy Sayankin >Assignee: László Bodor >Priority: Major > Attachments: HIVE-21961.02.patch, HIVE-21961.03.patch, > HIVE-21961.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Update jetty version to 9.4.x -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24568) Fix guice compatibility issues
[ https://issues.apache.org/jira/browse/HIVE-24568?focusedWorklogId=528222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528222 ] ASF GitHub Bot logged work on HIVE-24568: - Author: ASF GitHub Bot Created on: 24/Dec/20 17:29 Start Date: 24/Dec/20 17:29 Worklog Time Spent: 10m Work Description: wangyum commented on pull request #1812: URL: https://github.com/apache/hive/pull/1812#issuecomment-750932073 Maybe we need to upgrade jetty: https://issues.apache.org/jira/browse/HIVE-21961 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 528222) Time Spent: 50m (was: 40m) > Fix guice compatibility issues > -- > > Key: HIVE-24568 > URL: https://issues.apache.org/jira/browse/HIVE-24568 > Project: Hive > Issue Type: Improvement >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > {noformat} > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType; > at > com.google.inject.multibindings.Multibinder.collectionOfProvidersOf(Multibinder.java:202) > at > com.google.inject.multibindings.Multibinder$RealMultibinder.<init>(Multibinder.java:283) > at > com.google.inject.multibindings.Multibinder$RealMultibinder.<init>(Multibinder.java:258) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
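A NoSuchMethodError like the one quoted in the issue is the usual symptom of mixed Guice artifact versions on the classpath (the multibindings classes moved into guice core in the 4.x line). One hedged way to diagnose and pin a single version in Maven, assuming this is indeed a version conflict (the version number below is illustrative, not necessarily the one Hive settled on):

```
<!-- Sketch: force one Guice version so core and multibindings cannot
     diverge on the classpath. Version is an illustrative assumption. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.inject</groupId>
      <artifactId>guice</artifactId>
      <version>4.2.3</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running `mvn dependency:tree -Dincludes=com.google.inject` first shows which transitive dependency drags in the conflicting Guice version.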
[jira] [Commented] (HIVE-21961) Update jetty version to 9.4.x
[ https://issues.apache.org/jira/browse/HIVE-21961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254621#comment-17254621 ] Yuming Wang commented on HIVE-21961: Any update? > Update jetty version to 9.4.x > - > > Key: HIVE-21961 > URL: https://issues.apache.org/jira/browse/HIVE-21961 > Project: Hive > Issue Type: Task >Reporter: Oleksiy Sayankin >Assignee: László Bodor >Priority: Major > Attachments: HIVE-21961.02.patch, HIVE-21961.03.patch, > HIVE-21961.patch > > > Update jetty version to 9.4.x -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed
[ https://issues.apache.org/jira/browse/HIVE-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254555#comment-17254555 ] Stamatis Zampetakis commented on HIVE-17128: Hi [~euigeun_chung], kind of late reply but where did you observe the leak, in HS2 or LLAP daemons? Did you log an issue finally? I just discovered today that there is a leak in LLAP so will open a jira soon. > Operation Logging leaks file descriptors as the log4j Appender is never closed > -- > > Key: HIVE-17128 > URL: https://issues.apache.org/jira/browse/HIVE-17128 > Project: Hive > Issue Type: Bug > Components: Logging >Reporter: Andrew Sherman >Assignee: Andrew Sherman >Priority: Major > Fix For: 3.0.0 > > Attachments: HIVE-17128.1.patch, HIVE-17128.2.patch, > HIVE-17128.3.patch > > > [HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 > RoutingAppender to automatically output the log for each query into each > individual operation log file. As log4j does not know when a query is > finished it keeps the OutputStream in the Appender open even when the query > completes. The stream holds a file descriptor and so we leak file > descriptors. Note that we are already careful to close any streams reading > from the operation log file. > h2. Fix > To fix this we use a technique described in the comments of [LOG4J2-510] > which uses reflection to close the appender. The test in > TestOperationLoggingLayout will be extended to check that the Appender is > closed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
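The LOG4J2-510 technique mentioned in the HIVE-17128 fix description closes resources that the public Appender API leaves open by reaching into private state with reflection. A self-contained sketch of that reflective pattern on a stand-in class (the stand-in class and its field name are invented for illustration; the real patch targets the Log4j appender internals):

```java
import java.io.Closeable;
import java.io.IOException;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Demonstrates closing a resource held in a private field via reflection,
// the same pattern the LOG4J2-510 workaround applies to Log4j appenders.
public class ReflectiveClose {
    // Stand-in for an appender that hides a closeable resource.
    static class StandInAppender {
        private final Map<String, Closeable> streams = new HashMap<>();
        boolean closedSomething = false;
        StandInAppender() {
            streams.put("query1", () -> closedSomething = true);
        }
    }

    // Reaches into the private 'streams' field and closes every entry.
    static boolean closeHiddenStreams(StandInAppender app) {
        try {
            Field f = StandInAppender.class.getDeclaredField("streams");
            f.setAccessible(true);
            @SuppressWarnings("unchecked")
            Map<String, Closeable> streams =
                (Map<String, Closeable>) f.get(app);
            for (Closeable c : streams.values()) {
                c.close();
            }
            streams.clear();
            return true;
        } catch (ReflectiveOperationException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        StandInAppender app = new StandInAppender();
        closeHiddenStreams(app);
        System.out.println(app.closedSomething); // true
    }
}
```

Reflection is a last resort here: it couples the fix to private implementation details, which is part of why the later comments prefer replacing the whole purge mechanism over patching it.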
[jira] [Commented] (HIVE-9490) [Parquet] Support Alter Table/Partition Concatenate
[ https://issues.apache.org/jira/browse/HIVE-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254484#comment-17254484 ] Vladimir Vilinski commented on HIVE-9490: - Has nobody cared for a year? It seems that the Hadoop era is coming to an end. > [Parquet] Support Alter Table/Partition Concatenate > --- > > Key: HIVE-9490 > URL: https://issues.apache.org/jira/browse/HIVE-9490 > Project: Hive > Issue Type: Sub-task >Reporter: Dong Chen >Assignee: Dong Chen >Priority: Major > Attachments: HIVE-9490.patch-testcase > > > Parquet should support > {{ALTER TABLE table_name \[PARTITION (partition_key = 'partition_value')\] > CONCATENATE;}} > If the table or partition contains many small Parquet files, then the above > command will merge them into larger files. The merge should happen at the row > group level, thereby avoiding the overhead of decompressing and decoding the > data. > It is only supported by RCFiles or ORCFiles now. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-24568) Fix guice compatibility issues
[ https://issues.apache.org/jira/browse/HIVE-24568?focusedWorklogId=528040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528040 ] ASF GitHub Bot logged work on HIVE-24568: - Author: ASF GitHub Bot Created on: 24/Dec/20 08:26 Start Date: 24/Dec/20 08:26 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #1812: URL: https://github.com/apache/hive/pull/1812#issuecomment-750803705 avro-mapred depends on a different servlet version, so maybe it is accidentally excluded. I updated #1635 with this change and it is now passing the itests locally, let's see if anything new appears. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 528040) Time Spent: 40m (was: 0.5h) > Fix guice compatibility issues > -- > > Key: HIVE-24568 > URL: https://issues.apache.org/jira/browse/HIVE-24568 > Project: Hive > Issue Type: Improvement >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType; > at > com.google.inject.multibindings.Multibinder.collectionOfProvidersOf(Multibinder.java:202) > at > com.google.inject.multibindings.Multibinder$RealMultibinder.<init>(Multibinder.java:283) > at > com.google.inject.multibindings.Multibinder$RealMultibinder.<init>(Multibinder.java:258) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (HIVE-21737) Upgrade Avro to version 1.10.1
[ https://issues.apache.org/jira/browse/HIVE-21737?focusedWorklogId=528038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-528038 ] ASF GitHub Bot logged work on HIVE-21737: - Author: ASF GitHub Bot Created on: 24/Dec/20 08:24 Start Date: 24/Dec/20 08:24 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #1635: URL: https://github.com/apache/hive/pull/1635#issuecomment-750803292 I included @wangyum's fix in this patch; let's see if it works now :crossed_fingers: This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 528038) Time Spent: 7h (was: 6h 50m) > Upgrade Avro to version 1.10.1 > -- > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HIVE-21737-Make-Avro-use-in-Hive-compatible-with-Avr.patch > > Time Spent: 7h > Remaining Estimate: 0h > > Avro >= 1.9.x brings a lot of fixes, including a leaner version of Avro without > Jackson in the public API and Guava as a dependency. Worth the update. -- This message was sent by Atlassian Jira (v8.3.4#803005)