[jira] [Commented] (HIVE-15850) Proper handling of timezone in Druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861178#comment-15861178 ]

ASF GitHub Bot commented on HIVE-15850:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/hive/pull/145

> Proper handling of timezone in Druid storage handler
>
>                 Key: HIVE-15850
>                 URL: https://issues.apache.org/jira/browse/HIVE-15850
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15850.patch
>
> We need to make sure that filters on timestamp are passed to Druid with the
> correct timezone. After CALCITE-1617, Calcite will generate a Druid query
> with intervals without timezone specification. In Druid, these intervals
> will be assumed to be in UTC (if Druid is running in UTC, which is currently
> the recommendation). However, in Hive, those intervals should be assumed to
> be in the user timezone. Thus, we should respect Hive semantics and include
> the user timezone in the intervals passed to Druid.
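To illustrate the intent of the fix, here is a minimal Joda-Time sketch (not the actual patch; the timezone ID and class name are illustrative) of serializing a filter interval with the user's offset so that a UTC-based Druid broker reads the bounds as Hive meant them:

{code}
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class DruidIntervalSketch {
  public static void main(String[] args) {
    // Assumption: the Hive user session runs in America/Los_Angeles.
    DateTimeZone userTz = DateTimeZone.forID("America/Los_Angeles");
    DateTime from = new DateTime(2017, 1, 1, 0, 0, userTz);
    DateTime to = new DateTime(2017, 1, 2, 0, 0, userTz);
    // Prints 2017-01-01T00:00:00.000-08:00/2017-01-02T00:00:00.000-08:00.
    // Without the explicit offset, Druid would treat the same wall-clock
    // bounds as UTC and shift the query window by eight hours.
    System.out.println(from + "/" + to);
  }
}
{code}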
[jira] [Updated] (HIVE-15850) Proper handling of timezone in Druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15850:
-------------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0
           Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~bslim] and [~ashutoshc]!
[jira] [Commented] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861176#comment-15861176 ]

Hive QA commented on HIVE-15863:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852031/HIVE-15863.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10244 tests executed

*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3487/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3487/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3487/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852031 - PreCommit-HIVE-Build

> Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone
>
>                 Key: HIVE-15863
>                 URL: https://issues.apache.org/jira/browse/HIVE-15863
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>
> Related to CALCITE-1623.
> At query preparation time, Calcite uses a Calendar to hold the value of DATE,
> TIME, and TIMESTAMP literals. It assumes that the Calendar has a UTC (GMT)
> time zone, and bad things might happen if it does not. Currently, we pass the
> Calendar object with the user timezone from Hive. We need to pass it with UTC
> timezone and make the inverse conversion when we go back from Calcite to Hive.
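As a rough sketch of the convention the issue describes (names and plumbing are assumptions, not the actual patch), the Calendar handed to Calcite can simply be created in UTC:

{code}
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

public class UtcCalendarSketch {
  /** Builds the UTC calendar Calcite expects for a literal given in epoch millis. */
  static Calendar utcCalendarFor(long epochMillis) {
    Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT);
    cal.setTimeInMillis(epochMillis);
    // A calendar like this can then back e.g. RexBuilder.makeTimestampLiteral;
    // the inverse shift is applied when translating back from Calcite to Hive.
    return cal;
  }
}
{code}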
[jira] [Updated] (HIVE-15229) 'like any' and 'like all' operators in hive
[ https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simanchal Das updated HIVE-15229:
---------------------------------
    Status: Patch Available  (was: Open)

> 'like any' and 'like all' operators in hive
>
>                 Key: HIVE-15229
>                 URL: https://issues.apache.org/jira/browse/HIVE-15229
>             Project: Hive
>          Issue Type: New Feature
>          Components: Operators
>            Reporter: Simanchal Das
>            Assignee: Simanchal Das
>            Priority: Minor
>         Attachments: HIVE-15229.1.patch, HIVE-15229.2.patch
>
> In Teradata, the 'like any' and 'like all' operators are mostly used when
> matching a text field against a number of patterns. They are equivalent to
> a chain of like operators, as in the examples below.
> {noformat}
> --like any
> select col1 from table1 where col2 like any ('%accountant%', '%accounting%',
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like conditions
> select col1 from table1 where col2 like '%accountant%' or col2 like
> '%accounting%' or col2 like '%retail%' or col2 like '%bank%' or col2 like
> '%insurance%';
> --like all
> select col1 from table1 where col2 like all ('%accountant%', '%accounting%',
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like conditions
> select col1 from table1 where col2 like '%accountant%' and col2 like
> '%accounting%' and col2 like '%retail%' and col2 like '%bank%' and col2 like
> '%insurance%';
> {noformat}
> Problem statement:
> Nowadays many data warehouse projects are being migrated from Teradata to
> Hive, and data engineers and business analysts regularly look for these two
> operators. If we introduce them in Hive, many scripts will migrate smoothly
> instead of having to be rewritten with multiple like operators.
> Result:
> 1. 'LIKE ANY' returns true if the text (column value) matches any pattern.
> 2. 'LIKE ALL' returns true if the text (column value) matches all patterns.
> 3. 'LIKE ANY' and 'LIKE ALL' return NULL not only if the expression on the
> left hand side is NULL, but also if one of the patterns in the list is NULL.
[jira] [Updated] (HIVE-15229) 'like any' and 'like all' operators in hive
[ https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simanchal Das updated HIVE-15229:
---------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-15229) 'like any' and 'like all' operators in hive
[ https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simanchal Das updated HIVE-15229:
---------------------------------
    Attachment: (was: HIVE-15229.2.patch)
[jira] [Updated] (HIVE-15229) 'like any' and 'like all' operators in hive
[ https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simanchal Das updated HIVE-15229:
---------------------------------
    Attachment: HIVE-15229.2.patch
[jira] [Updated] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15863:
-------------------------------------------
    Attachment: (was: HIVE-15863.patch)
[jira] [Updated] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15863:
-------------------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
[ https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-15860:
--------------------------
    Attachment: HIVE-15860.2.patch

Patch v2 adds the check only when the job is in STARTED state and we can't get the job info. I think it's better because it avoids checking the remote context every time the monitor runs. [~xuefuz] what do you think?

> RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
>
>                 Key: HIVE-15860
>                 URL: https://issues.apache.org/jira/browse/HIVE-15860
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-15860.1.patch, HIVE-15860.2.patch
>
> It happens when RemoteDriver crashes between {{JobStarted}} and
> {{JobSubmitted}}, e.g. when killed by {{kill -9}}. RemoteSparkJobMonitor will
> consider that the job has started, but it can't get the job info because it
> hasn't received the JobId. The monitor then loops forever.
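A minimal sketch of what such a check could look like in the STARTED branch of the monitor loop ({{isRemoteActive()}} is a hypothetical probe standing in for whatever the patch actually calls on the remote context):

{code}
      case STARTED:
        JobExecutionStatus sparkJobState = sparkJobStatus.getState();
        // No JobId has arrived yet, so job info is unavailable. Distinguish
        // a dead driver from a slow JobSubmitted by probing the remote context.
        if (sparkJobState == null && !sparkJobStatus.isRemoteActive()) {
          console.printError("Remote Spark driver disconnected before submitting the job.");
          running = false;
          done = true;
          rc = 2;
          break;
        }
        // ... existing RUNNING handling continues as before ...
{code}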
[jira] [Comment Edited] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout
[ https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861114#comment-15861114 ]

Xuefu Zhang edited comment on HIVE-15671 at 2/10/17 11:10 AM:
--------------------------------------------------------------

Hi [~vanzin], to backtrack a little bit, I have a followup question about your comment.
{quote}
That's kinda hard to solve, because the server doesn't know which client connected until two things happen: first the driver has started, second the driver completed the SASL handshake to identify itself. A lot of things can go wrong in that time. There's already some code, IIRC, that fails the session if the spark-submit job dies with an error, but aside from that, it's kinda hard to do more.
{quote}
I was talking about the server detecting a driver problem after it has connected back to the server. I'm wondering which timeout applies in case of a problem on the driver side, such as a long GC or a stalled connection between the server and the driver. It's rather long if this timeout is also server.connect.timeout, which is increased to 10m in our case to accommodate the busy cluster. To me it doesn't seem that such a timeout exists, in the absence of a heartbeat mechanism.

was (Author: xuefuz):
(The earlier revision was identical except that the {quote} block was mistakenly closed with a {code} tag.)

> RPCServer.registerClient() erroneously uses server/client handshake timeout
> for connection timeout
>
>                 Key: HIVE-15671
>                 URL: https://issues.apache.org/jira/browse/HIVE-15671
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher,
>         config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of
> *hive.spark.client.server.connect.timeout*, which is meant as the timeout for
> the handshake between the Hive client and the remote Spark driver. Instead,
> the timeout should be *hive.spark.client.connect.timeout*, which is the
> timeout for the remote Spark driver connecting back to the Hive client.
[jira] [Commented] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout
[ https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861114#comment-15861114 ]

Xuefu Zhang commented on HIVE-15671:
------------------------------------

Hi [~vanzin], to backtrack a little bit, I have a followup question about your comment.
{quote}
That's kinda hard to solve, because the server doesn't know which client connected until two things happen: first the driver has started, second the driver completed the SASL handshake to identify itself. A lot of things can go wrong in that time. There's already some code, IIRC, that fails the session if the spark-submit job dies with an error, but aside from that, it's kinda hard to do more.
{code}
I was talking about the server detecting a driver problem after it has connected back to the server. I'm wondering which timeout applies in case of a problem on the driver side, such as a long GC or a stalled connection between the server and the driver. It's rather long if this timeout is also server.connect.timeout, which is increased to 10m in our case to accommodate the busy cluster. To me it doesn't seem that such a timeout exists, in the absence of a heartbeat mechanism.
[jira] [Commented] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout
[ https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861113#comment-15861113 ]

Xuefu Zhang commented on HIVE-15671:
------------------------------------

Thanks, [~lirui]. I will try to reproduce it, though that might be hard to do.
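For reference, the fix implied by the issue description is roughly a one-line change in RPCServer; {{getConnectTimeoutMs()}} is assumed here to be the accessor for *hive.spark.client.connect.timeout*:

{code}
  /**
   * Tells the RPC server to expect a connection from a new client.
   */
  public Future<Rpc> registerClient(final String clientId, String secret,
      RpcDispatcher serverDispatcher) {
    // Use the client connection timeout, not the server/client handshake
    // timeout returned by getServerConnectTimeoutMs().
    return registerClient(clientId, secret, serverDispatcher,
        config.getConnectTimeoutMs());
  }
{code}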
[jira] [Commented] (HIVE-15850) Proper handling of timezone in Druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1586#comment-1586 ]

Hive QA commented on HIVE-15850:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851860/HIVE-15850.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10230 tests executed

*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=97)
	[parallel_join1.q,union27.q,union12.q,groupby7_map_multi_single_reducer.q,varchar_join1.q,join7.q,join_reorder4.q,skewjoinopt2.q,bucketsortoptimize_insert_2.q,smb_mapjoin_17.q,script_env_var1.q,groupby7_map.q,groupby3.q,bucketsortoptimize_insert_8.q,union20.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3486/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3486/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3486/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851860 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15730) JDBC should use SQLFeatureNotSupportedException where appropriate instead of SQLException
[ https://issues.apache.org/jira/browse/HIVE-15730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861084#comment-15861084 ]

Sankar Hariappan commented on HIVE-15730:
-----------------------------------------

Thanks [~vgumashta] for the commit!

> JDBC should use SQLFeatureNotSupportedException where appropriate instead of SQLException
>
>                 Key: HIVE-15730
>                 URL: https://issues.apache.org/jira/browse/HIVE-15730
>             Project: Hive
>          Issue Type: Bug
>          Components: JDBC
>            Reporter: Thejas M Nair
>            Assignee: Sankar Hariappan
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15730.01.patch
>
> An example is HiveBaseResultSet.rowDeleted. It throws SQLException("Method
> not supported") instead of SQLFeatureNotSupportedException. For that optional
> method, the use of SQLFeatureNotSupportedException is more appropriate.
> See
> http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#rowDeleted()
> http://docs.oracle.com/javase/7/docs/api/java/sql/SQLFeatureNotSupportedException.html
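A sketch of the change pattern for such optional methods (illustrative, not the exact patch):

{code}
@Override
public boolean rowDeleted() throws SQLException {
  // Optional ResultSet method that Hive JDBC does not implement: signal
  // "feature not supported" rather than throwing a generic SQLException.
  throw new SQLFeatureNotSupportedException("Method not supported");
}
{code}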
[jira] [Commented] (HIVE-15858) Beeline ^C doesn't close the session
[ https://issues.apache.org/jira/browse/HIVE-15858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861073#comment-15861073 ]

Sankar Hariappan commented on HIVE-15858:
-----------------------------------------

Thanks for the review, [~xuefuz]! I see the test results right below your comment. The few failures seem unrelated to the patch.

> Beeline ^C doesn't close the session
>
>                 Key: HIVE-15858
>                 URL: https://issues.apache.org/jira/browse/HIVE-15858
>             Project: Hive
>          Issue Type: Bug
>          Components: Beeline
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>         Attachments: HIVE-15858.01.patch
>
> When multiple connections are opened from Beeline to HiveServer2 and the
> client is closed with the !quit or ^C command, not all of the
> connections/sessions get closed. !quit closes the current active connection
> but fails to close the other open sessions; ^C doesn't close any session.
> This behaviour is seen only with the HTTP transport mode
> (hive.server2.transport.mode=http). In BINARY mode, the server triggers the
> session close when a tcp connection is closed by the peer.
[jira] [Commented] (HIVE-15858) Beeline ^C doesn't close the session
[ https://issues.apache.org/jira/browse/HIVE-15858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861064#comment-15861064 ]

Sankar Hariappan commented on HIVE-15858:
-----------------------------------------

Thanks [~vihangk1] for the comment! I agree with [~thejas]'s point that these changes are meant for different reasons.
[jira] [Commented] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
[ https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861036#comment-15861036 ]

Rui Li commented on HIVE-15860:
-------------------------------

A more specific way to fix it is to add the check only when the job has started and {{sparkJobStatus.getState()}} returns null. The SENT and QUEUED branches are covered by the monitor timeout, and the SUCCEEDED and FAILED branches break the loop themselves, so we only need to worry about the STARTED branch.
[jira] [Commented] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
[ https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861037#comment-15861037 ]

Hive QA commented on HIVE-15860:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852020/HIVE-15860.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10244 tests executed

*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_partitioned] (batchId=10)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3485/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3485/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3485/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852020 - PreCommit-HIVE-Build
[jira] [Updated] (HIVE-15850) Proper handling of timezone in Druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15850:
-------------------------------------------
    Status: Patch Available  (was: In Progress)
[jira] [Issue Comment Deleted] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15863:
-------------------------------------------
    Comment: was deleted

(was: auto-generated Hive QA pre-commit report for attachment 12851933: -1, 10 failed/errored test(s), 10242 tests executed; see https://builds.apache.org/job/PreCommit-HIVE-Build/3474/testReport)
[jira] [Issue Comment Deleted] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15863:
-------------------------------------------
    Comment: was deleted

(was: auto-generated Hive QA pre-commit report for attachment 12851933: -1, 10 failed/errored test(s), 10239 tests executed; see https://builds.apache.org/job/PreCommit-HIVE-Build/3475/testReport)
[jira] [Updated] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15863:
-------------------------------------------
    Attachment: (was: HIVE-15863.patch)
[jira] [Issue Comment Deleted] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15863:
-------------------------------------------
    Comment: was deleted

(was: auto-generated Hive QA pre-commit report for attachment 12851929: -1, 11 failed/errored test(s), 10242 tests executed; see https://builds.apache.org/job/PreCommit-HIVE-Build/3470/testReport)
[jira] [Updated] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-15863:
-------------------------------------------
    Attachment: HIVE-15863.patch
[jira] [Commented] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
[ https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861020#comment-15861020 ]

Rui Li commented on HIVE-15860:
-------------------------------

[~xuefuz] - yeah, the monitor loops forever in that case. For the monitor, the job has started because we have received the JobStarted event. So it goes to this switch branch every time it wakes up:
{code}
      case STARTED:
        JobExecutionStatus sparkJobState = sparkJobStatus.getState();
        if (sparkJobState == JobExecutionStatus.RUNNING) {
          Map progressMap = sparkJobStatus.getSparkStageProgress();
          if (!running) {
            perfLogger.PerfLogEnd(CLASS_NAME, PerfLogger.SPARK_SUBMIT_TO_RUNNING);
            printAppInfo();
            // print job stages.
            console.printInfo("\nQuery Hive on Spark job[" + sparkJobStatus.getJobId() +
                "] stages: " + Arrays.toString(sparkJobStatus.getStageIds()));
            console.printInfo("\nStatus: Running (Hive on Spark job[" +
                sparkJobStatus.getJobId() + "])");
            running = true;
            String format = "Job Progress Format\nCurrentTime StageId_StageAttemptId: " +
                "SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount";
            if (!inPlaceUpdate) {
              console.printInfo(format);
            } else {
              console.logInfo(format);
            }
          }
          printStatus(progressMap, lastProgressMap);
          lastProgressMap = progressMap;
        }
        break;
{code}
However, {{sparkJobStatus.getState()}} always returns null because we haven't received the JobSubmitted event, which carries the JobId. At this point we need a way to tell whether the connection has broken or whether there's just a big gap between JobStarted and JobSubmitted (see HIVE-9370). So I added the check to see if the client is still alive.
[jira] [Commented] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
[ https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860967#comment-15860967 ]

Xuefu Zhang commented on HIVE-15860:
------------------------------------

Hi [~lirui], thanks for working on this. Just to clarify, does the monitor loop forever in that case? It seems that it does, even though the broken connection is already detected at the RPC layer. As a result, the user session will hang forever without making any progress.
[jira] [Commented] (HIVE-15871) Cannot insert into target table because column number/types are different with hive.merge.cardinality.check=false
[ https://issues.apache.org/jira/browse/HIVE-15871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860956#comment-15860956 ]

Hive QA commented on HIVE-15871:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851986/HIVE-15871.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10230 tests executed

*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
TestSparkCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=123)
	[groupby3_map.q,union26.q,mapreduce1.q,mapjoin_addjar.q,bucket_map_join_spark1.q,udf_example_add.q,multi_insert_with_join.q,sample7.q,auto_join_nulls.q,ppd_outer_join4.q,load_dyn_part8.q,alter_merge_orc.q,sample6.q,bucket_map_join_1.q,auto_sortmerge_join_9.q]
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3484/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3484/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3484/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851986 - PreCommit-HIVE-Build

> Cannot insert into target table because column number/types are different
> with hive.merge.cardinality.check=false
>
>                 Key: HIVE-15871
>                 URL: https://issues.apache.org/jira/browse/HIVE-15871
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>         Attachments: HIVE-15871.02.patch, HIVE-15871.03.patch
>
> A merge statement with WHEN MATCHED and hive.merge.cardinality.check=false
> causes errors like
> {noformat}
> FAILED: SemanticException [Error 10044]: Line 11:12 Cannot insert into target
> table because column number/types are different 'part_0': Table insclause-0
> has 3 columns, but query has 4 columns.
> {noformat}
[jira] [Commented] (HIVE-15792) Hive should raise SemanticException when LPAD/RPAD pad character's length is 0
[ https://issues.apache.org/jira/browse/HIVE-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860943#comment-15860943 ]

Lefty Leverenz commented on HIVE-15792:
---------------------------------------

Thanks for the documentation, [~nandakumar131]. Looks good.

> Hive should raise SemanticException when LPAD/RPAD pad character's length is 0
>
>                 Key: HIVE-15792
>                 URL: https://issues.apache.org/jira/browse/HIVE-15792
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Chovan
>            Assignee: Nandakumar
>            Priority: Minor
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15792.000.patch, HIVE-15792.001.patch, HIVE-15792.002.patch
>
> For example, SELECT LPAD('A', 2, ''); will cause an infinite loop and the
> running query will hang without any error. It would be great if this could be
> prevented by checking the pad character's length and, if it is 0, throwing a
> SemanticException.
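A minimal sketch of the kind of guard the fix implies (the placement, variable names, and message are assumptions, not the actual patch):

{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;

// Inside the LPAD/RPAD implementation, before the padding loop: an empty
// pad string can never extend the input, so the loop would never terminate.
if (pad.length() == 0) {
  throw new UDFArgumentException("LPAD/RPAD: pad string must not be empty");
}
{code}

UDFArgumentException is a subclass of SemanticException, so throwing it surfaces the error at analysis time as the issue requests.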
[jira] [Updated] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
[ https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-15860:
--------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
[ https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-15860:
--------------------------
    Attachment: HIVE-15860.1.patch

Actually the Rpc's listener can detect that the remote end has become inactive and mark the SparkClient as dead accordingly. However, since we haven't received the JobId on the Hive side, RemoteSparkJobStatus won't contact the remote driver for job info and simply returns null to the monitor, which therefore never finds out that the Rpc channel has already closed. The patch adds a check every time the monitor runs, to make sure the remote context is still alive. The fix is for remote mode only, because I don't think the issue exists in local mode.
[jira] [Commented] (HIVE-15672) LLAP text cache: improve first query perf II
[ https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860927#comment-15860927 ] Lefty Leverenz commented on HIVE-15672: --- Doc note: This adds *hive.llap.io.encode.vector.serde.async.enabled* to HiveConf.java, so it needs to be documented in the wiki for release 2.2.0. * [Configuration Properties -- LLAP I/O | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-LLAPI/O] Added a TODOC2.2 label. > LLAP text cache: improve first query perf II > > > Key: HIVE-15672 > URL: https://issues.apache.org/jira/browse/HIVE-15672 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-15672.01.patch, HIVE-15672.02.patch, > HIVE-15672.03.patch, HIVE-15672.04.patch, HIVE-15672.05.patch, > HIVE-15672.06.patch, HIVE-15672.07.patch, HIVE-15672.08.patch > > > 4) Send VRB to the pipeline and write ORC in parallel (in background). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15672) LLAP text cache: improve first query perf II
[ https://issues.apache.org/jira/browse/HIVE-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-15672: -- Labels: TODOC2.2 (was: ) > LLAP text cache: improve first query perf II > > > Key: HIVE-15672 > URL: https://issues.apache.org/jira/browse/HIVE-15672 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-15672.01.patch, HIVE-15672.02.patch, > HIVE-15672.03.patch, HIVE-15672.04.patch, HIVE-15672.05.patch, > HIVE-15672.06.patch, HIVE-15672.07.patch, HIVE-15672.08.patch > > > 4) Send VRB to the pipeline and write ORC in parallel (in background). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15489) Alternatively use table scan stats for HoS
[ https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860918#comment-15860918 ] Xuefu Zhang commented on HIVE-15489: Thanks for working on this, [~csun]! I made one pass over this patch and have the following thoughts: 1. The new configuration might deserve a better name; "hive.spark.use.ts.stats" seems a little too general. Please consider a more specific name, something like "hive_on_spark.use.file.size.for.mapjoin". Very minor though. 2. For the new property, we probably want to default to the old behavior when checking in (a small illustration follows below). Maybe we can have some test cases run with this new configuration on. 3. If the join op doesn't come directly from a table scan, I saw we are still using operator stats to decide on mapjoin. This can still cause the issue of inaccurate estimation, right? Should we just not convert it to map join in such a case? 4. There seem to be some test failures in the above run. Are they related? > Alternatively use table scan stats for HoS > -- > > Key: HIVE-15489 > URL: https://issues.apache.org/jira/browse/HIVE-15489 > Project: Hive > Issue Type: Improvement > Components: Spark, Statistics >Affects Versions: 2.2.0 >Reporter: Chao Sun >Assignee: Chao Sun > Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, > HIVE-15489.3.patch, HIVE-15489.4.patch, HIVE-15489.wip.patch > > > For MapJoin in HoS, we should provide an option to only use stats in the TS > rather than the populated stats in each of the join branches. This could be > pretty conservative but more reliable. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
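On point 2, a small illustration of keeping the old behavior as the default so that tests must opt in explicitly. The property name is the one under discussion in this thread, not a committed HiveConf entry; this is a sketch, not patch code.
```
import org.apache.hadoop.conf.Configuration;

public class TsStatsFlagSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration(false);

        // Old behavior by default: per-branch operator stats decide mapjoin.
        boolean useTsStats = conf.getBoolean("hive.spark.use.ts.stats", false);
        System.out.println("use TS stats: " + useTsStats); // false

        // Tests exercising the new path would opt in explicitly.
        conf.setBoolean("hive.spark.use.ts.stats", true);
        System.out.println("use TS stats: "
            + conf.getBoolean("hive.spark.use.ts.stats", false)); // true
    }
}
```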
[jira] [Comment Edited] (HIVE-15872) The PERCENTILE UDAF does not work with empty set
[ https://issues.apache.org/jira/browse/HIVE-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860908#comment-15860908 ] Chaozhong Yang edited comment on HIVE-15872 at 2/10/17 8:30 AM: According to the SQL standard, percentile_approx should return null for zero elements rather than throw an IndexOutOfBoundsException in the reducer. was (Author: debugger87): According to the SQL standard, percentile_approx should return null for zero elements rather than throw an IndexOutOfBoundsException in the reducer.
```
@Override
public Object terminate(AggregationBuffer agg) throws HiveException {
  PercentileAggBuf myagg = (PercentileAggBuf) agg;
  if (myagg.histogram.getUsedBins() < 1) {
    // SQL standard - return null for zero elements
    return null;
  } else {
    assert(myagg.quantiles != null);
    return new DoubleWritable(myagg.histogram.quantile(myagg.quantiles[0]));
  }
}
```
> The PERCENTILE UDAF does not work with empty set > > > Key: HIVE-15872 > URL: https://issues.apache.org/jira/browse/HIVE-15872 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 2.1.2 > > Attachments: HIVE-15872.patch > > > 1. Original SQL: > select > percentile_approx( > column0, > array(0.50, 0.70, 0.90, 0.95, 0.99) > ) > from > my_table > where > date = '20170207' > and column1 = 'value1' > and column2 = 'value2' > and column3 = 'value3' > and column4 = 'value4' > and column5 = 'value5' > 2. Exception StackTrace: > Error: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256) at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) at > org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > ... 7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 7 more Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) at > java.util.ArrayList.get(ArrayList.java:429) at > org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.merge(NumericHistogram.java:134) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.merge(GenericUDAFPercentileApprox.java:318) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:612) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761) > ... 8 more > 3. Review data: > select > column0 > from > my_table > where > date = '20170207' > and column1 = 'value1' > and column2 = 'value2' > and column3 = 'value3' > and column4 = 'value4' > and column5 = 'value5' > After running this SQL, we found the result is NULL. > 4. What's the meaning of [0.0, 1.0] in the stacktrace? > In GenericUDAFPercentileApproxEvaluator, the method `merge` should process an > ArrayList named partialHistogram. Normally, the basic structure of > partialHistogram is [npercentiles, percentile0, percentile1..., nbins, > bin0.x, bin0.y, bin1.x, bin1.y,...]. However, if we process NULL (empty > set) column values, the partialHistogram will only contain [npercentiles(0), > nbins(1)]. That's the reason why the stacktrace shows strange row data: > {"key":{},"value":{"_col0":[0.0,1.0]}} > Before we call histogram#merge (on-line histogram algorithm from p
[jira] [Commented] (HIVE-15864) Fix typo introduced in HIVE-14754
[ https://issues.apache.org/jira/browse/HIVE-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860909#comment-15860909 ] Barna Zsombor Klara commented on HIVE-15864: Failures are well-known flaky/failing tests (HIVE-15744, HIVE-15696) and should not be related. > Fix typo introduced in HIVE-14754 > - > > Key: HIVE-15864 > URL: https://issues.apache.org/jira/browse/HIVE-15864 > Project: Hive > Issue Type: Sub-task > Components: Hive, HiveServer2, Metastore >Affects Versions: 2.2.0 >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Attachments: HIVE-15864.patch > > > hs2_suceeded_queries needs another "c": hs2_succeeded_queries. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-15872) The PERCENTILE UDAF does not work with empty set
[ https://issues.apache.org/jira/browse/HIVE-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860908#comment-15860908 ] Chaozhong Yang edited comment on HIVE-15872 at 2/10/17 8:25 AM: According to the SQL standard, percentile_approx should return null for zero elements rather than throw an IndexOutOfBoundsException in the reducer.
```
@Override
public Object terminate(AggregationBuffer agg) throws HiveException {
  PercentileAggBuf myagg = (PercentileAggBuf) agg;
  if (myagg.histogram.getUsedBins() < 1) {
    // SQL standard - return null for zero elements
    return null;
  } else {
    assert(myagg.quantiles != null);
    return new DoubleWritable(myagg.histogram.quantile(myagg.quantiles[0]));
  }
}
```
was (Author: debugger87): According to the SQL standard, percentile_approx should return null for zero elements rather than throw an IndexOutOfBoundsException in the reducer.
@Override
public Object terminate(AggregationBuffer agg) throws HiveException {
  PercentileAggBuf myagg = (PercentileAggBuf) agg;
  if (myagg.histogram.getUsedBins() < 1) {
    // SQL standard - return null for zero elements
    return null;
  } else {
    assert(myagg.quantiles != null);
    return new DoubleWritable(myagg.histogram.quantile(myagg.quantiles[0]));
  }
}
> The PERCENTILE UDAF does not work with empty set > > > Key: HIVE-15872 > URL: https://issues.apache.org/jira/browse/HIVE-15872 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 2.1.2 > > Attachments: HIVE-15872.patch > > > 1. Original SQL: > select > percentile_approx( > column0, > array(0.50, 0.70, 0.90, 0.95, 0.99) > ) > from > my_table > where > date = '20170207' > and column1 = 'value1' > and column2 = 'value2' > and column3 = 'value3' > and column4 = 'value4' > and column5 = 'value5' > 2. Exception StackTrace: > Error: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256) at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) at > org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > ... 7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 7 more Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) at > java.util.ArrayList.get(ArrayList.java:429) at > org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.merge(NumericHistogram.java:134) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.merge(GenericUDAFPercentileApprox.java:318) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:612) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761) > ... 8 more > 3. Review data: > select > column0 > from > my_table > where > date = '20170207' > and column1 = 'value1' > and column2 = 'value2' > and column3 = 'value3' > and column4 = 'value4' > and column5 = 'value5' > After running this SQL, we found the result is NULL. > 4. What's the meaning of [0.0, 1.0] in the stacktrace? > In GenericUDAFPercentileApproxEvaluator, the method `merge` should process an > ArrayList named partialHistogram. Normally, the basic structure of > partialH
[jira] [Commented] (HIVE-15872) The PERCENTILE UDAF does not work with empty set
[ https://issues.apache.org/jira/browse/HIVE-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860908#comment-15860908 ] Chaozhong Yang commented on HIVE-15872: --- According to the SQL standard, percentile_approx should return null for zero elements rather than throw an IndexOutOfBoundsException in the reducer.
@Override
public Object terminate(AggregationBuffer agg) throws HiveException {
  PercentileAggBuf myagg = (PercentileAggBuf) agg;
  if (myagg.histogram.getUsedBins() < 1) {
    // SQL standard - return null for zero elements
    return null;
  } else {
    assert(myagg.quantiles != null);
    return new DoubleWritable(myagg.histogram.quantile(myagg.quantiles[0]));
  }
}
> The PERCENTILE UDAF does not work with empty set > > > Key: HIVE-15872 > URL: https://issues.apache.org/jira/browse/HIVE-15872 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 2.1.2 > > Attachments: HIVE-15872.patch > > > 1. Original SQL: > select > percentile_approx( > column0, > array(0.50, 0.70, 0.90, 0.95, 0.99) > ) > from > my_table > where > date = '20170207' > and column1 = 'value1' > and column2 = 'value2' > and column3 = 'value3' > and column4 = 'value4' > and column5 = 'value5' > 2. Exception StackTrace: > Error: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256) at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) at > org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > ... 7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 7 more Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) at > java.util.ArrayList.get(ArrayList.java:429) at > org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.merge(NumericHistogram.java:134) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.merge(GenericUDAFPercentileApprox.java:318) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:612) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761) > ... 8 more > 3. Review data: > select > column0 > from > my_table > where > date = '20170207' > and column1 = 'value1' > and column2 = 'value2' > and column3 = 'value3' > and column4 = 'value4' > and column5 = 'value5' > After running this SQL, we found the result is NULL. > 4. What's the meaning of [0.0, 1.0] in the stacktrace? > In GenericUDAFPercentileApproxEvaluator, the method `merge` should process an > ArrayList named partialHistogram. Normally, the basic structure of > partialHistogram is [npercentiles, percentile0, percentile1..., nbins, > bin0.x, bin0.y, bin1.x, bin1.y,...]. However, if we process NULL (empty > set) column values, the partialHistogram will only contain [npercentiles(0), > nbins(1)]. That's the reason why the stacktrace shows strange row data: > {"key":{},"value":{"_col0":[0.0,1.0]}} > Before we call histogram#merge (on-line histogram algorithm from paper: > http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf ), the > partialHistogram should remove elements that store percentiles, e.g. > `partialHistogram.subList(0, nquantiles+1).clear();`. In the case of e
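To make point 4 of the description concrete, here is a standalone sketch of the serialized partial-histogram layout and the empty-set guard it implies. The method name is illustrative, not the exact GenericUDAFPercentileApproxEvaluator code.
```
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Layout described above: [npercentiles, percentile0..percentileN-1,
// nbins, bin0.x, bin0.y, ...]
public class PartialHistogramSketch {

    static List<Double> stripPercentilesForMerge(ArrayList<Double> partial) {
        int nquantiles = partial.get(0).intValue();
        // Remove the percentile prefix before the on-line histogram merge.
        partial.subList(0, nquantiles + 1).clear();
        // For an empty set the partial is just [npercentiles(0), nbins(1)],
        // so after stripping there is no bin data left: skip the merge
        // instead of indexing past the end of the list.
        if (partial.size() <= 1) {
            return null;
        }
        return partial;
    }

    public static void main(String[] args) {
        // The strange row from the stacktrace: {"_col0":[0.0,1.0]}
        ArrayList<Double> emptySetPartial = new ArrayList<>(Arrays.asList(0.0, 1.0));
        System.out.println(stripPercentilesForMerge(emptySetPartial)); // null -> skip merge
    }
}
```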
[jira] [Comment Edited] (HIVE-15872) The PERCENTILE UDAF does not work with empty set
[ https://issues.apache.org/jira/browse/HIVE-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860908#comment-15860908 ] Chaozhong Yang edited comment on HIVE-15872 at 2/10/17 8:25 AM: According to the SQL standard, percentile_approx should return null for zero elements rather than throw an IndexOutOfBoundsException in the reducer.
@Override
public Object terminate(AggregationBuffer agg) throws HiveException {
  PercentileAggBuf myagg = (PercentileAggBuf) agg;
  if (myagg.histogram.getUsedBins() < 1) {
    // SQL standard - return null for zero elements
    return null;
  } else {
    assert(myagg.quantiles != null);
    return new DoubleWritable(myagg.histogram.quantile(myagg.quantiles[0]));
  }
}
was (Author: debugger87): According to the SQL standard, percentile_approx should return null for zero elements rather than throw an IndexOutOfBoundsException in the reducer.
@Override
public Object terminate(AggregationBuffer agg) throws HiveException {
  PercentileAggBuf myagg = (PercentileAggBuf) agg;
  if (myagg.histogram.getUsedBins() < 1) {
    // SQL standard - return null for zero elements
    return null;
  } else {
    assert(myagg.quantiles != null);
    return new DoubleWritable(myagg.histogram.quantile(myagg.quantiles[0]));
  }
}
> The PERCENTILE UDAF does not work with empty set > > > Key: HIVE-15872 > URL: https://issues.apache.org/jira/browse/HIVE-15872 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Chaozhong Yang >Assignee: Chaozhong Yang > Fix For: 2.1.2 > > Attachments: HIVE-15872.patch > > > 1. Original SQL: > select > percentile_approx( > column0, > array(0.50, 0.70, 0.90, 0.95, 0.99) > ) > from > my_table > where > date = '20170207' > and column1 = 'value1' > and column2 = 'value2' > and column3 = 'value3' > and column4 = 'value4' > and column5 = 'value5' > 2. Exception StackTrace: > Error: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256) at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) at > org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) > ... 7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766) > at > org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) > ... 7 more Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) at > java.util.ArrayList.get(ArrayList.java:429) at > org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.merge(NumericHistogram.java:134) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.merge(GenericUDAFPercentileApprox.java:318) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:612) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761) > ... 8 more > 3. Review data: > select > column0 > from > my_table > where > date = '20170207' > and column1 = 'value1' > and column2 = 'value2' > and column3 = 'value3' > and column4 = 'value4' > and column5 = 'value5' > After running this SQL, we found the result is NULL. > 4. What's the meaning of [0.0, 1.0] in the stacktrace? > In GenericUDAFPercentileApproxEvaluator, the method `merge` should process an > ArrayList named partialHistogram. Normally, the basic structure of > partialHistogram
[jira] [Commented] (HIVE-15430) Change SchemaTool table validator to test based on the dbType
[ https://issues.apache.org/jira/browse/HIVE-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860898#comment-15860898 ] Hive QA commented on HIVE-15430: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12851985/HIVE-15430.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10244 tests executed *Failed tests:* {noformat} TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235) org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=140) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223) org.apache.hive.beeline.TestSchemaTool.testValidateSchemaTables (batchId=211) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3483/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3483/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3483/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12851985 - PreCommit-HIVE-Build > Change SchemaTool table validator to test based on the dbType > - > > Key: HIVE-15430 > URL: https://issues.apache.org/jira/browse/HIVE-15430 > Project: Hive > Issue Type: Sub-task > Components: Hive >Affects Versions: 2.2.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-15430.1.patch > > > Currently the validator parses the "oracle" schema file to determine what > tables are expected in the database (mostly because its schema file is easier to > parse than the other syntaxes). We have learnt from HIVE-15118 that > not all schema files contain the same number of tables. For example, Derby has > an old table that is never used and that the other DBs do not contain. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
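For illustration, a sketch of what a dbType-aware schema-file lookup could look like. The directory layout and hive-schema-<version>.<dbType>.sql naming follow the usual metastore script convention but are assumptions here, not SchemaTool's exact code.
```
import java.nio.file.Path;
import java.nio.file.Paths;

public class SchemaFileSketch {
    // Pick the schema file for the database under validation instead of
    // always parsing the Oracle one.
    static Path schemaFileFor(String dbType, String version, Path scriptBase) {
        // e.g. scripts/metastore/upgrade/derby/hive-schema-2.2.0.derby.sql
        String fileName = String.format("hive-schema-%s.%s.sql", version, dbType);
        return scriptBase.resolve(dbType).resolve(fileName);
    }

    public static void main(String[] args) {
        Path base = Paths.get("scripts", "metastore", "upgrade");
        System.out.println(schemaFileFor("derby", "2.2.0", base));
        System.out.println(schemaFileFor("mysql", "2.2.0", base));
    }
}
```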