[jira] [Commented] (HIVE-7228) StreamPrinter should be joined to calling thread
[ https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031801#comment-14031801 ] Hive QA commented on HIVE-7228: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650319/HIVE-7228.patch {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5536 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/469/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/469/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-469/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12650319 StreamPrinter should be joined to calling thread - Key: HIVE-7228 URL: https://issues.apache.org/jira/browse/HIVE-7228 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Pankit Thapar Assignee: Pankit Thapar Priority: Minor Attachments: HIVE-7228.patch ISSUE: The StreamPrinter class is used for connecting an input stream (connected to output) of a process with the output stream of a Session (CliSessionState/SessionState class). It acts as a pipe between the two and transfers data from the input stream to the output stream. THE TRANSFER OPERATION RUNS IN A SEPARATE THREAD. From some of the current usages of this class, I noticed that the calling threads do not wait for the transfer operation to complete. That is, the calling thread does not join the StreamPrinter threads. The calling thread moves forward assuming that the respective output stream already has the data it needs. But this is not always a safe assumption, since the StreamPrinter thread may not have finished execution by the time the calling thread expects it to. FIX: To ensure that the calling thread waits for the StreamPrinter threads to complete, the StreamPrinter threads are joined to the calling thread. Please note that without the fix, TestCliDriverMethods#testRun failed intermittently (roughly 1 in 30 runs). This test does not fail with the fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
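The fix described above is the usual pump-and-join pattern: the printer thread copies the stream, and the caller join()s it before reading the result. A minimal self-contained sketch follows; all names here are illustrative, not Hive's actual StreamPrinter API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamPump {
    // Start a thread that copies everything from 'in' to 'out', and return it
    // so the caller can join() it. This mirrors the StreamPrinter role: a pipe
    // between a process's output and a session's output stream.
    static Thread pump(InputStream in, OutputStream out) {
        Thread t = new Thread(() -> {
            byte[] buf = new byte[4096];
            int n;
            try {
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        InputStream in = new ByteArrayInputStream(
                "query output".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Thread printer = pump(in, out);
        // Without this join() the caller may read 'out' before the transfer
        // finishes -- the intermittent race the patch fixes.
        printer.join();
        System.out.println(out.toString()); // prints "query output"
    }
}
```

Dropping the join() reintroduces exactly the race described in the report: `out` may still be empty (or partially written) when the caller reads it.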
[jira] [Commented] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031804#comment-14031804 ] Gunther Hagleitner commented on HIVE-7212: -- No new failures. Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch, HIVE-7212.2.patch, HIVE-7212.3.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that, we're currently restarting sessions if the resources have changed (add file/add jar/etc.). Instead of doing this we should just have Tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6385) UDF degrees() doesn't take decimal as input
[ https://issues.apache.org/jira/browse/HIVE-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031811#comment-14031811 ] Lefty Leverenz commented on HIVE-6385: -- [~lars_francke] documented this in the wiki in February 2014 (and I added version information in March): * [UDFs -- Mathematical Functions | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-MathematicalFunctions] * [doc diffs for HIVE-6385 (degrees) and other Hive 0.13.0 jiras | https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=27362046&selectedPageVersions=79&selectedPageVersions=77] UDF degrees() doesn't take decimal as input --- Key: HIVE-6385 URL: https://issues.apache.org/jira/browse/HIVE-6385 Project: Hive Issue Type: Improvement Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-6385.patch HIVE-6246 and HIVE-6327 added decimal support in most of the mathematical UDFs, including radians(). However, such support is still missing for UDF degrees(). This fills the gap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6385) UDF degrees() doesn't take decimal as input
[ https://issues.apache.org/jira/browse/HIVE-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031814#comment-14031814 ] Lefty Leverenz commented on HIVE-6385: -- Also documented in Data Types: * [Hive Data Types -- Mathematical UDFs | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-MathematicalUDFs] UDF degrees() doesn't take decimal as input --- Key: HIVE-6385 URL: https://issues.apache.org/jira/browse/HIVE-6385 Project: Hive Issue Type: Improvement Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-6385.patch HIVE-6246 and HIVE-6327 added decimal support in most of the mathematical UDFs, including radians(). However, such support is still missing for UDF degrees(). This fills the gap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-3976: - Labels: (was: TODOC13) Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: New Feature Components: Query Processor, Types Affects Versions: 0.11.0 Reporter: Mark Grover Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-3976.1.patch, HIVE-3976.10.patch, HIVE-3976.11.patch, HIVE-3976.2.patch, HIVE-3976.3.patch, HIVE-3976.4.patch, HIVE-3976.5.patch, HIVE-3976.6.patch, HIVE-3976.7.patch, HIVE-3976.8.patch, HIVE-3976.9.patch, HIVE-3976.patch, remove_prec_scale.diff HIVE-2693 introduced support for Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031816#comment-14031816 ] Lefty Leverenz commented on HIVE-3976: -- [~lars_francke] documented this in the wiki: * [Hive Data Types -- Decimals | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Decimals] * [doc diffs for HIVE-3976 | https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=27838462&selectedPageVersions=26&selectedPageVersions=25] Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: New Feature Components: Query Processor, Types Affects Versions: 0.11.0 Reporter: Mark Grover Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-3976.1.patch, HIVE-3976.10.patch, HIVE-3976.11.patch, HIVE-3976.2.patch, HIVE-3976.3.patch, HIVE-3976.4.patch, HIVE-3976.5.patch, HIVE-3976.6.patch, HIVE-3976.7.patch, HIVE-3976.8.patch, HIVE-3976.9.patch, HIVE-3976.patch, remove_prec_scale.diff HIVE-2693 introduced support for Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification
[ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031824#comment-14031824 ] Lefty Leverenz commented on HIVE-1466: -- [~prasadm] documented this in the DDL and DML wikidocs: * [DDL: Create Table (row_format) | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable] * [DDL: Row Format, Storage Format, and SerDe | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe] ** [DDL doc diffs for HIVE-1466 | https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=27362034&selectedPageVersions=72&selectedPageVersions=71] * [DML: Writing data into the filesystem from queries | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries] ** [DML doc diffs for HIVE-1466 | https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=27362036&selectedPageVersions=21&selectedPageVersions=20] Add NULL DEFINED AS to ROW FORMAT specification --- Key: HIVE-1466 URL: https://issues.apache.org/jira/browse/HIVE-1466 Project: Hive Issue Type: New Feature Components: SQL Reporter: Adam Kramer Assignee: Prasad Mujumdar Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-1466.1.patch, HIVE-1466.2.patch NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification
[ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-1466: - Labels: (was: TODOC13) Add NULL DEFINED AS to ROW FORMAT specification --- Key: HIVE-1466 URL: https://issues.apache.org/jira/browse/HIVE-1466 Project: Hive Issue Type: New Feature Components: SQL Reporter: Adam Kramer Assignee: Prasad Mujumdar Fix For: 0.13.0 Attachments: HIVE-1466.1.patch, HIVE-1466.2.patch NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Documentation Policy
Should we create JIRA for these so that the work to be done on these does not get lost? ... or should we schedule a doc blitz to take care of as many as possible right away? (Inclusive OR.) -- Lefty On Sat, Jun 14, 2014 at 10:35 PM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: A few more from older releases: *0.10*: https://issues.apache.org/jira/browse/HIVE-2397?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC10%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC *0.11:* https://issues.apache.org/jira/browse/HIVE-3073?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC11%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC *0.12:* https://issues.apache.org/jira/browse/HIVE-5161?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC12%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC Should we create JIRA for these so that the work to be done on these does not get lost? On Fri, Jun 13, 2014 at 5:59 PM, Lefty Leverenz leftylever...@gmail.com wrote: Agreed, deleting TODOC## simplifies the labels field, so we should just use comments to keep track of docs done. Besides, doc tasks can get complicated -- my gmail inbox has a few messages with simultaneous done and to-do labels -- so comments are best for tracking progress. Also, as Szehon noticed, links in the comments make it easy to find the docs. +1 on (a): delete TODOCs when done; don't add any new labels. -- Lefty On Fri, Jun 13, 2014 at 1:31 PM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: +1 on deleting the TODOC tag as I think it's assumed by default that once an enhancement is done, it will be doc'ed. We may consider adding an additional docdone tag but I think we can instead just wait for a +1 from the contributor that the documentation is satisfactory (and assume a implicit +1 for no reply) before deleting the TODOC tag. 
On Fri, Jun 13, 2014 at 1:32 PM, Szehon Ho sze...@cloudera.com wrote: Yea, I'd imagine the TODOC tag pollutes the query of TODOC's and confuses the state of a JIRA, so its probably best to remove it. The idea of docdone is to query what docs got produced and needs review? It might be nice to have a tag for that, to easily signal to contributor or interested parties to take a look. On a side note, I already find very helpful your JIRA comments with links to doc-wikis, both to inform the contributor and just as reference for others. Thanks again for the great work. On Fri, Jun 13, 2014 at 1:33 AM, Lefty Leverenz leftylever...@gmail.com wrote: One more question: what should we do after the documentation is done for a JIRA ticket? (a) Just remove the TODOC## label. (b) Replace TODOC## with docdone (no caps, no version number). (c) Add a docdone label but keep TODOC##. (d) Something else. -- Lefty On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland br...@cloudera.com wrote: Thank you guys! This is great work. On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Going through the issues, I think overall Lefty did an awesome job catching and documenting most of them in time. Following are some of the 0.13 and 0.14 ones which I found which either do not have documentation or have outdated one and probably need one to be consumeable. Contributors, feel free to remove the label if you disagree. *TODOC13:* https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed) *TODOC14:* https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed) I'll continue digging through the queue going backwards to 0.12 and 0.11 and see if I find similar stuff there as well. 
On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com kulkarni.swar...@gmail.com wrote: Feel free to label such jiras with this keyword and ask the contributors for more information if you need any. Cool. I'll start chugging through the queue today adding labels as apt. On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com wrote: Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13? Sounds good to me.
[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031827#comment-14031827 ] Hive QA commented on HIVE-5771: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650333/HIVE-5771.12.patch {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 5615 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_exists org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/470/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/470/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-470/ Messages: {noformat} 
Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12650333 Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.12.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly Currently there is no constant folding/propagation optimizer, all expressions are evaluated at runtime. HIVE-2470 did a great job on evaluating constants on UDF initializing phase, however, it is still a runtime evaluation and it doesn't propagate constants from a subquery to outside. It may reduce I/O and accelerate process if we introduce such an optimizer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5607) Hive fails to parse the % (mod) sign after brackets.
[ https://issues.apache.org/jira/browse/HIVE-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-5607: - Labels: TODOC14 (was: ) Hive fails to parse the % (mod) sign after brackets. -- Key: HIVE-5607 URL: https://issues.apache.org/jira/browse/HIVE-5607 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.10.0 Reporter: dima machlin Assignee: Xuefu Zhang Priority: Minor Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-5607.1.patch, HIVE-5607.patch the scenario : create table t(a int); select * from t order by (a)%7; will fail with the following exception : FAILED: ParseException line 1:28 mismatched input '%' expecting EOF near ')' I must mention that this *does* work in 0.7.1 and doesn't work in 0.10 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6810) Provide example and update docs to show use of back tick when doing SHOW GRANT
[ https://issues.apache.org/jira/browse/HIVE-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6810: - Labels: TODOC12 (was: ) Provide example and update docs to show use of back tick when doing SHOW GRANT -- Key: HIVE-6810 URL: https://issues.apache.org/jira/browse/HIVE-6810 Project: Hive Issue Type: Improvement Components: Documentation Affects Versions: 0.12.0 Reporter: Udai Kiran Potluri Labels: TODOC12 The Docs at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-ViewingGrantedPrivileges Do not show an example or mention need to use back tick (`) character especially when there are special characters. Per HIVE-2074, all GRANT/REVOKE need a back tick character when using -. Similarly, with the SHOW GRANT USER if the user id has a .. For eg: SHOW GRANT USER `abc.xyz` ON TABLE mock_opt; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6684) Beeline does not accept comments that are preceded by spaces
[ https://issues.apache.org/jira/browse/HIVE-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-6684: - Labels: TODOC14 (was: ) Beeline does not accept comments that are preceded by spaces Key: HIVE-6684 URL: https://issues.apache.org/jira/browse/HIVE-6684 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.10.0 Reporter: Jeremy Beard Labels: TODOC14 Fix For: 0.14.0 Attachments: HIVE-6684.1.patch, HIVE-6684.2.patch Beeline throws an error if single-line comments are indented with spaces. This works in the embedded Hive CLI. For example:
SELECT
    -- this is the field we want
    field
FROM table;
Error: Error while processing statement: FAILED: ParseException line 1:71 cannot recognize input near 'EOF' 'EOF' 'EOF' in select clause (state=42000,code=4) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7159: - Attachment: HIVE-7159.5.patch .5 fixes some of the failures. For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, HIVE-7159.5.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7159: - Status: Patch Available (was: Open) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, HIVE-7159.5.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-7159: - Status: Open (was: Patch Available) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, HIVE-7159.5.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031858#comment-14031858 ] Gunther Hagleitner commented on HIVE-7159: -- +1 once the tests pass For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, HIVE-7159.5.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition
[ https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031859#comment-14031859 ] Gunther Hagleitner commented on HIVE-7159: -- rb: https://reviews.apache.org/r/22553/ For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition Key: HIVE-7159 URL: https://issues.apache.org/jira/browse/HIVE-7159 Project: Hive Issue Type: Bug Reporter: Harish Butani Assignee: Harish Butani Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, HIVE-7159.4.patch, HIVE-7159.5.patch A join B on A.x = B.y can be transformed to (A where x is not null) join (B where y is not null) on A.x = B.y Apart from avoiding shuffling null keyed rows it also avoids issues with reduce-side skew when there are a lot of null values in the data. Thanks to [~gopalv] for the analysis and coming up with the solution. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031894#comment-14031894 ] Hive QA commented on HIVE-6584: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650362/HIVE-6584.4.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5536 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/472/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/472/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-472/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12650362 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch, HIVE-6584.4.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows an MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7233) File hive-hwi-0.13.1 not found on lib folder
Dinh Hoang Luong created HIVE-7233: -- Summary: File hive-hwi-0.13.1 not found on lib folder Key: HIVE-7233 URL: https://issues.apache.org/jira/browse/HIVE-7233 Project: Hive Issue Type: New Feature Components: Web UI Affects Versions: 0.13.1 Reporter: Dinh Hoang Luong I found that line 27 of the file .../apache-hive-0.13.1-sr/hwi/pom.xml specifies <package>jar</package> where it should specify <package>war</package>, which is why the hive-hwi-0.13.1 file is not found in the lib folder. -- This message was sent by Atlassian JIRA (v6.2#6252)
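The stripped markup in the report above is presumably the Maven packaging element. A sketch of the likely intended change follows; note that Maven's element is spelled {{packaging}} (the report's "package" reads like a typo or stripped markup), and the surrounding context is assumed, not taken from the actual hwi/pom.xml.

```xml
<!-- hwi/pom.xml fragment (sketch, assumed context). Maven's element name is
     <packaging>; building the Hive Web Interface module as a war rather than
     a jar would presumably produce the missing hive-hwi artifact. -->
<packaging>war</packaging>
```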
[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031977#comment-14031977 ] David Chen commented on HIVE-7094: -- See HIVE-7230 for the patch for adding the Eclipse formatter file. Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
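The two-subclass plan in the description above is the template-method pattern. Below is a toy sketch of that shape only: apart from the three class names taken from the comment, everything (the comma-separated record format, the partition-key extraction, the LocalWriter type) is invented for illustration and is not HCatalog's real API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Abstract base: the shared write() path lives here; only the choice of the
// per-record local writer differs between static and dynamic partitioning.
abstract class FileRecordWriterContainer {
    final List<String> output = new ArrayList<>();

    // Template method: common code for both implementations.
    final void write(String record) {
        getLocalWriter(record).write(record);
    }

    // Subclasses supply the local writer (stand-in for the RecordWriter /
    // ObjectInspector / SerDe bundle mentioned in the comment).
    abstract LocalWriter getLocalWriter(String record);

    interface LocalWriter { void write(String record); }
}

// Static partitioning: a single writer for the whole container.
class StaticFileRecordWriterContainer extends FileRecordWriterContainer {
    private final LocalWriter writer = r -> output.add("static:" + r);
    @Override LocalWriter getLocalWriter(String record) { return writer; }
}

// Dynamic partitioning: one writer per partition value, created lazily.
class DynamicFileRecordWriterContainer extends FileRecordWriterContainer {
    private final Map<String, LocalWriter> writers = new HashMap<>();
    @Override LocalWriter getLocalWriter(String record) {
        String part = record.split(",")[0];  // toy partition key
        return writers.computeIfAbsent(part, p -> r -> output.add(p + ":" + r));
    }
}

public class Main {
    public static void main(String[] args) {
        DynamicFileRecordWriterContainer dp = new DynamicFileRecordWriterContainer();
        dp.write("a,1");
        dp.write("b,2");
        dp.write("a,3");
        System.out.println(dp.output); // [a:a,1, b:b,2, a:a,3]
    }
}
```

The point of the refactoring is visible in the sketch: write() contains no `if (dynamicPartitioning)` branch at all; the condition is replaced by which subclass was constructed.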
[jira] [Commented] (HIVE-7210) NPE with No plan file found when running Driver instances on multiple threads
[ https://issues.apache.org/jira/browse/HIVE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031990#comment-14031990 ] Hive QA commented on HIVE-7210: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650358/HIVE-7210.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5536 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.conf.TestHiveConf.testConfProperties {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/474/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/474/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-474/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12650358 NPE with No plan file found when running Driver instances on multiple threads --- Key: HIVE-7210 URL: https://issues.apache.org/jira/browse/HIVE-7210 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Gunther Hagleitner Attachments: HIVE-7210.1.patch Informatica has a multithreaded application running multiple instances of CLIDriver. 
When running concurrent queries they sometimes hit the following error: {noformat} 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://ICRHHW21NODE1:8020/tmp/hive-qamercury/hive_2014-05-30_10-24-57_346_890014621821056491-2/-mr-10002/6169987c-3263-4737-b5cb-38daab882afb/map.xml 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/qamercury/.staging/job_1401360353644_0078 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.lang.NullPointerException(null)' java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:271) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at 
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271) at
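As a generic illustration (an assumption about the failure pattern, not the actual Hive fix), the hazard with concurrent Driver instances is shared mutable state keyed by a scratch path: one thread's cleanup or overwrite can remove the plan another thread is about to read, producing "No plan file found" and the NPE that follows. The names below are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plan lookup keyed by scratch-dir path. If sessions share
// paths (or state is cleared by another thread), the lookup returns null.
public class PlanCacheDemo {
    static final Map<String, String> plans = new ConcurrentHashMap<>();
    static {
        plans.put("/tmp/hive-demo/map.xml", "mapwork");
    }

    static String readPlan(String path) {
        String plan = plans.get(path);
        // Defensive null check: fail with a clear message instead of an
        // NPE deeper in split computation (HiveInputFormat.init).
        if (plan == null) {
            throw new IllegalStateException("No plan file found: " + path);
        }
        return plan;
    }

    public static void main(String[] args) {
        System.out.println(readPlan("/tmp/hive-demo/map.xml"));
    }
}
```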
[jira] [Updated] (HIVE-7228) StreamPrinter should be joined to calling thread
[ https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7228: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Pankit! StreamPrinter should be joined to calling thread - Key: HIVE-7228 URL: https://issues.apache.org/jira/browse/HIVE-7228 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Pankit Thapar Assignee: Pankit Thapar Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7228.patch ISSUE: The StreamPrinter class connects an input stream (attached to a process's output) with the output stream of a Session (CliSessionState/SessionState). It acts as a pipe between the two and transfers data from the input stream to the output stream. THE TRANSFER OPERATION RUNS IN A SEPARATE THREAD. In some of the current usages of this class, the calling threads do not wait for the transfer operation to complete. That is, the calling thread does not join the StreamPrinter threads. The calling thread moves forward assuming that the respective output stream already has the data it needs, but that is not always a safe assumption: the StreamPrinter thread may not have finished execution by the time the calling thread expects the data. FIX: To ensure that the calling thread waits for the StreamPrinter threads to complete, the StreamPrinter threads are joined to the calling thread. Please note, without the fix, TestCliDriverMethods#testRun failed intermittently (roughly 1 in 30 runs). This test does not fail with this fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
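The race and the fix can be sketched in a few lines of Java (a hypothetical {{PumpDemo}} class standing in for StreamPrinter, not the actual Hive code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class PumpDemo {
    // Start a background thread that copies everything from in to out,
    // the same role StreamPrinter plays for process output.
    static Thread pump(InputStream in, OutputStream out) {
        Thread t = new Thread(() -> {
            try {
                byte[] buf = new byte[1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        t.start();
        return t;
    }

    public static String run() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Thread t = pump(new ByteArrayInputStream("hello".getBytes()), out);
        try {
            // Without this join, the caller may read `out` before the pump
            // thread finishes copying, which is the intermittent failure
            // described in the issue. Joining guarantees the transfer is done.
            t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```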
[jira] [Commented] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031998#comment-14031998 ] Vikram Dixit K commented on HIVE-7212: -- +1 LGTM Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch, HIVE-7212.2.patch, HIVE-7212.3.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that we're currently restarting sessions if the resources have changed. (add file/add jar/etc). Instead of doing this we should just have tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032019#comment-14032019 ] Hive QA commented on HIVE-7230: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650394/HIVE-7230.1.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5611 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.conf.TestHiveConf.testConfProperties {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/475/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/475/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-475/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12650394 Add Eclipse formatter file for Hive coding conventions -- Key: HIVE-7230 URL: https://issues.apache.org/jira/browse/HIVE-7230 Project: Hive Issue Type: Improvement Reporter: David Chen Assignee: David Chen Attachments: HIVE-7230.1.patch Eclipse's formatter is a convenient way to clean up formatting for Java code. Currently, there is no Eclipse formatter file checked into Hive's codebase. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032029#comment-14032029 ] Swarnim Kulkarni commented on HIVE-7230: Duplicate of HIVE-6317 Add Eclipse formatter file for Hive coding conventions -- Key: HIVE-7230 URL: https://issues.apache.org/jira/browse/HIVE-7230 Project: Hive Issue Type: Improvement Reporter: David Chen Assignee: David Chen Attachments: HIVE-7230.1.patch Eclipse's formatter is a convenient way to clean up formatting for Java code. Currently, there is no Eclipse formatter file checked into Hive's codebase. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions
[ https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032030#comment-14032030 ] Swarnim Kulkarni commented on HIVE-7230: As noted on HIVE-6317, you might be able to fix this by simply adding the following to the pom file:
{noformat}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-eclipse-plugin</artifactId>
  <version>${maven.eclipse.plugin.version}</version>
  <configuration>
    <downloadJavadocs>true</downloadJavadocs>
    <downloadSources>true</downloadSources>
    <workspaceActiveCodeStyleProfileName>GoogleStyle</workspaceActiveCodeStyleProfileName>
    <workspaceCodeStylesURL>https://google-styleguide.googlecode.com/svn/trunk/eclipse-java-google-style.xml</workspaceCodeStylesURL>
  </configuration>
</plugin>
{noformat}
Add Eclipse formatter file for Hive coding conventions -- Key: HIVE-7230 URL: https://issues.apache.org/jira/browse/HIVE-7230 Project: Hive Issue Type: Improvement Reporter: David Chen Assignee: David Chen Attachments: HIVE-7230.1.patch Eclipse's formatter is a convenient way to clean up formatting for Java code. Currently, there is no Eclipse formatter file checked into Hive's codebase. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-6317) Add eclipse code formatter to hive projects
[ https://issues.apache.org/jira/browse/HIVE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni resolved HIVE-6317. Resolution: Duplicate Add eclipse code formatter to hive projects --- Key: HIVE-6317 URL: https://issues.apache.org/jira/browse/HIVE-6317 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.12.0 Reporter: Swarnim Kulkarni Currently on hive trunk, it seems like the eclipse formatter doesn't get automatically imported (it used to happen some time ago). We should probably fix that so all changes going forward are formatted consistently according to this formatter. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval
[ https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032033#comment-14032033 ] Gopal V commented on HIVE-7232: --- [~ashutoshc]: Incorrect results as well. Ran the same query with Tez and MR, got different results. MR doesn't hit the same scenario because of the empty Map task, which doesn't have any input columns named reducesinkkey0. Tez seems to hit a corner case where there are 2 shuffle joins one after the other - there is an input col named KEY.reducesinkkey0 and an output col named reducesinkkey0, which have no relation to each other.
{code}
$ diff -y -W 72 results/q5.tez.txt results/q5.mr.txt
CHINA     985314.0848      | VIETNAM   1.897236998313891E10
INDIA     819113.441801    | CHINA     1.894405687452681E10
VIETNAM   637407.2255      | INDONESIA 1.89306456994551
JAPAN     523754.9791      | JAPAN     1.892184676125508E10
INDONESIA 517900.1924      | INDIA     1.886882412417209E10
{code}
ReduceSink is emitting NULL keys due to failed keyEval -- Key: HIVE-7232 URL: https://issues.apache.org/jira/browse/HIVE-7232 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Gopal V After HIVE-4867 has been merged in, some queries have exhibited a very weird skew towards NULL keys emitted from the ReduceSinkOperator. Added extra logging to print expr.column() in ExprNodeColumnEvaluator in reduce sink.
{code}
2014-06-14 00:37:19,186 INFO [TezChild] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: numDistributionKeys = 1 {null --> ExprNodeColumnEvaluator(_col10)} key_row={"reducesinkkey0":442}
{code}
{code}
HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null);
int distKeyLength = firstKey.getDistKeyLength();
if (distKeyLength <= 1) {
  StringBuffer x1 = new StringBuffer();
  x1.append("numDistributionKeys = " + numDistributionKeys + "\n");
  for (int i = 0; i < numDistributionKeys; i++) {
    x1.append(cachedKeys[0][i] + " --> " + keyEval[i] + "\n");
  }
  x1.append("key_row=" + SerDeUtils.getJSONString(row, keyObjectInspector));
  LOG.info("GOPAL: " + x1.toString());
}
{code}
The query is tpc-h query5, with extra NULL checks just to be sure.
{code}
SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer, orders, lineitem, supplier, nation, region
WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey
  AND c_nationkey = s_nationkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey
  AND r_name = 'ASIA' AND o_orderdate >= '1994-01-01' AND o_orderdate < '1995-01-01'
  and l_orderkey is not null and c_custkey is not null and l_suppkey is not null
  and c_nationkey is not null and s_nationkey is not null and n_regionkey is not null
GROUP BY n_name ORDER BY revenue DESC;
{code}
The reducer which has the issue has the following plan:
{code}
Reducer 3
  Reduce Operator Tree:
    Join Operator
      condition map:
        Inner Join 0 to 1
      condition expressions:
        0 {KEY.reducesinkkey0} {VALUE._col2}
        1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
      outputColumnNames: _col0, _col3, _col10, _col11, _col14
      Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
      Reduce Output Operator
        key expressions: _col10 (type: int)
        sort order: +
        Map-reduce partition columns: _col10 (type: int)
        Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
        value expressions: _col0 (type: int), _col3 (type: int), _col11 (type: int), _col14 (type: string)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032064#comment-14032064 ] Hive QA commented on HIVE-7094: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650393/HIVE-7094.3.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/478/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/478/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-478/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-478/source-prep.txt + [[ false == 
\t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . Reverted 'conf/hive-default.xml.template' Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitPack.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestSerializationUtils.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/core/target hcatalog/streaming/target hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestUnrolledBitPack.java + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1602783. At revision 1602783. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12650393 Separate out static/dynamic
[jira] [Commented] (HIVE-7219) Improve performance of serialization utils in ORC
[ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032062#comment-14032062 ] Hive QA commented on HIVE-7219: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650413/HIVE-7219.3.patch {color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 5578 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testStoreFuncAllSimpleTypes 
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/476/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/476/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-476/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 20 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12650413 Improve performance of serialization utils in ORC - Key: HIVE-7219 URL: https://issues.apache.org/jira/browse/HIVE-7219 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7219.1.patch, HIVE-7219.2.patch, HIVE-7219.3.patch, orc-read-perf-jmh-benchmark.png ORC uses serialization utils heavily for reading and writing data. The bitpacking and unpacking code in writeInts() and readInts() can be unrolled for better performance. Also double reader/writer performance can be improved by bulk reading/writing from/to byte array. -- This message was sent by Atlassian JIRA (v6.2#6252)
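The unrolling idea behind the patch can be illustrated with a toy bit-packer (illustrative only, not the actual ORC SerializationUtils code): for a fixed bit width, the per-bit inner loop can be replaced by straight-line shifts, and doubles can similarly be staged through a byte[] buffer instead of being written one byte at a time.

```java
public class UnrollDemo {
    // Generic path: pack two 4-bit values into one byte, bit by bit,
    // the shape of the loop inside writeInts()/readInts().
    static byte packGeneric(int hi, int lo) {
        int out = 0;
        for (int b = 3; b >= 0; b--) out = (out << 1) | ((hi >>> b) & 1);
        for (int b = 3; b >= 0; b--) out = (out << 1) | ((lo >>> b) & 1);
        return (byte) out;
    }

    // Unrolled path for the fixed 4-bit width: two shifts, no loop,
    // which is what makes the specialized cases faster.
    static byte packUnrolled(int hi, int lo) {
        return (byte) ((hi << 4) | (lo & 0xF));
    }

    public static void main(String[] args) {
        System.out.println(packGeneric(9, 5) == packUnrolled(9, 5));
    }
}
```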
[jira] [Created] (HIVE-7234) Select on decimal column throws NPE
Ashish Kumar Singh created HIVE-7234: Summary: Select on decimal column throws NPE Key: HIVE-7234 URL: https://issues.apache.org/jira/browse/HIVE-7234 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Select on decimal column throws NPE for values greater than maximum permissible value (99) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7198) HiveServer2 CancelOperation does not work for long running queries
[ https://issues.apache.org/jira/browse/HIVE-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis resolved HIVE-7198. - Resolution: Duplicate It's fixed by HIVE-5901. Feel free to reopen this issue if it's reproduced in hive-0.13.0. HiveServer2 CancelOperation does not work for long running queries -- Key: HIVE-7198 URL: https://issues.apache.org/jira/browse/HIVE-7198 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Romain Rigaux Sending the CancelOperation() call does not always stop the query and its related MapReduce jobs. e.g. from https://issues.cloudera.org/browse/HUE-2144 {code} I guess you're right. But the strange thing is that the canceled query shows in job browser as 'Running' and the percents go up - 0%, 50%, then the job is failed. How does the cancelling actually work? Is it like the hadoop kill command? It seems to me like it works until certain phase of map reduce is done. And another thing - after cancelling the job in Hue I can kill it with hadoop job -kill job_id. If it was killed already, it would show no such job. {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
Review Request 22612: HIVE-7234: Handle nulls from decimal columns elegantly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22612/ --- Review request for hive, Szehon Ho and Xuefu Zhang. Bugs: HIVE-7234 https://issues.apache.org/jira/browse/HIVE-7234 Repository: hive-git Description --- HIVE-7234: Handle nulls from decimal columns elegantly Diffs - common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java ad0901548217fbb828a01f8f5edda64581ac2c1e data/files/decimal_10_0.txt PRE-CREATION data/files/decimal_9_0.txt PRE-CREATION itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestDecimal.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyHiveDecimal.java 78cc3819c61f5a1bcb0cdd3425a0105416c26861 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 5a4623729ec955bbe8fcf662503b42ff8735eaad Diff: https://reviews.apache.org/r/22612/diff/ Testing --- Added unit tests to test the scenario. Thanks, Ashish Singh
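A minimal sketch of the null handling the review title describes (an assumption: enforcing decimal precision yields null for out-of-range values, and serializing without a null check is what throws the NPE; the class and method names here are hypothetical, not the actual LazyHiveDecimal code):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalGuard {
    static final int MAX_PRECISION = 38; // Hive's decimal precision limit

    // Returns null for values that cannot fit, mirroring the overflow case.
    static BigDecimal enforce(BigDecimal d) {
        return d.precision() > MAX_PRECISION ? null : d;
    }

    // Null-safe serialization: emit SQL NULL instead of dereferencing null.
    static String serialize(BigDecimal d) {
        BigDecimal enforced = enforce(d);
        return enforced == null ? "NULL" : enforced.toPlainString();
    }

    public static void main(String[] args) {
        BigDecimal tooBig = new BigDecimal(BigInteger.TEN.pow(39)); // 40 digits
        System.out.println(serialize(tooBig));               // overflow -> NULL
        System.out.println(serialize(new BigDecimal("99"))); // in range
    }
}
```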
[jira] [Updated] (HIVE-7234) Select on decimal column throws NPE
[ https://issues.apache.org/jira/browse/HIVE-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh updated HIVE-7234: - Status: Patch Available (was: Open) Select on decimal column throws NPE --- Key: HIVE-7234 URL: https://issues.apache.org/jira/browse/HIVE-7234 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7234.patch Select on decimal column throws NPE for values greater than maximum permissible value (99) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7234) Select on decimal column throws NPE
[ https://issues.apache.org/jira/browse/HIVE-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh updated HIVE-7234: - Attachment: HIVE-7234.patch Select on decimal column throws NPE --- Key: HIVE-7234 URL: https://issues.apache.org/jira/browse/HIVE-7234 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7234.patch Select on decimal column throws NPE for values greater than maximum permissible value (99) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval
[ https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis reassigned HIVE-7232: --- Assignee: Navis ReduceSink is emitting NULL keys due to failed keyEval -- Key: HIVE-7232 URL: https://issues.apache.org/jira/browse/HIVE-7232 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Navis After HIVE-4867 has been merged in, some queries have exhibited a very weird skew towards NULL keys emitted from the ReduceSinkOperator. Added extra logging to print expr.column() in ExprNodeColumnEvaluator in reduce sink.
{code}
2014-06-14 00:37:19,186 INFO [TezChild] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: numDistributionKeys = 1 {null --> ExprNodeColumnEvaluator(_col10)} key_row={"reducesinkkey0":442}
{code}
{code}
HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null);
int distKeyLength = firstKey.getDistKeyLength();
if (distKeyLength <= 1) {
  StringBuffer x1 = new StringBuffer();
  x1.append("numDistributionKeys = " + numDistributionKeys + "\n");
  for (int i = 0; i < numDistributionKeys; i++) {
    x1.append(cachedKeys[0][i] + " --> " + keyEval[i] + "\n");
  }
  x1.append("key_row=" + SerDeUtils.getJSONString(row, keyObjectInspector));
  LOG.info("GOPAL: " + x1.toString());
}
{code}
The query is tpc-h query5, with extra NULL checks just to be sure.
{code}
SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer, orders, lineitem, supplier, nation, region
WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey
  AND c_nationkey = s_nationkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey
  AND r_name = 'ASIA' AND o_orderdate >= '1994-01-01' AND o_orderdate < '1995-01-01'
  and l_orderkey is not null and c_custkey is not null and l_suppkey is not null
  and c_nationkey is not null and s_nationkey is not null and n_regionkey is not null
GROUP BY n_name ORDER BY revenue DESC;
{code}
The reducer which has the issue has the following plan:
{code}
Reducer 3
  Reduce Operator Tree:
    Join Operator
      condition map:
        Inner Join 0 to 1
      condition expressions:
        0 {KEY.reducesinkkey0} {VALUE._col2}
        1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
      outputColumnNames: _col0, _col3, _col10, _col11, _col14
      Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
      Reduce Output Operator
        key expressions: _col10 (type: int)
        sort order: +
        Map-reduce partition columns: _col10 (type: int)
        Statistics: Num rows: 18344 Data size: 95229140992 Basic stats: COMPLETE Column stats: NONE
        value expressions: _col0 (type: int), _col3 (type: int), _col11 (type: int), _col14 (type: string)
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval
[ https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032088#comment-14032088 ] Ashutosh Chauhan commented on HIVE-7232: Seems like this also can get triggered for MR path. I think latest patch on HIVE-5771 is failing for test like subquery_in.q because they are hitting into this issue. ReduceSink is emitting NULL keys due to failed keyEval -- Key: HIVE-7232 URL: https://issues.apache.org/jira/browse/HIVE-7232 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Navis After HIVE-4867 has been merged in, some queries have exhibited a very weird skew towards NULL keys emitted from the ReduceSinkOperator. Added extra logging to print expr.column() in ExprNodeColumnEvaluator in reduce sink. {code} 2014-06-14 00:37:19,186 INFO [TezChild] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: numDistributionKeys = 1 {null -- ExprNodeColumnEvaluator(_col10)} key_row={reducesinkkey0:442} {code} {code} HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null); int distKeyLength = firstKey.getDistKeyLength(); if(distKeyLength = 1) { StringBuffer x1 = new StringBuffer(); x1.append(numDistributionKeys = + numDistributionKeys + \n); for (int i = 0; i numDistributionKeys; i++) { x1.append(cachedKeys[0][i] + -- + keyEval[i] + \n); } x1.append(key_row=+ SerDeUtils.getJSONString(row, keyObjectInspector)); LOG.info(GOPAL: + x1.toString()); } {code} The query is tpc-h query5, with extra NULL checks just to be sure. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
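The skew described above follows directly from how hash partitioning treats a constant key: every row whose distribution key evaluates to NULL serializes to the same key bytes, so the partitioner routes all of them to a single reducer. A minimal sketch of that effect (illustrative only, not Hive's actual partitioner code; the class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeySkew {
    // Simplified hash partitioner: null keys all hash identically,
    // so every null-keyed row lands in the same partition.
    static int partition(Object key, int numReducers) {
        int hash = (key == null) ? 0 : key.hashCode();
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        Object[] keys = {null, null, null, 42, 7, null};
        Map<Integer, Integer> counts = new HashMap<>();
        for (Object k : keys) {
            counts.merge(partition(k, 4), 1, Integer::sum);
        }
        // All null keys pile up in one partition, while non-null keys spread out.
        System.out.println(counts);
    }
}
```

This is why a failed keyEval that emits NULLs shows up as one hot reducer rather than as an outright error.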
[jira] [Commented] (HIVE-7219) Improve performance of serialization utils in ORC
[ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032090#comment-14032090 ] Gunther Hagleitner commented on HIVE-7219: -- These failures look related to the patch (at least some of them). I looked at orc_analyze: the golden files need to be updated with the new sizes. For orc_split_elimination: the order of records seems to have changed in some queries; I'm not sure how this patch causes that, but it should be looked at. Improve performance of serialization utils in ORC - Key: HIVE-7219 URL: https://issues.apache.org/jira/browse/HIVE-7219 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7219.1.patch, HIVE-7219.2.patch, HIVE-7219.3.patch, orc-read-perf-jmh-benchmark.png ORC uses serialization utils heavily for reading and writing data. The bit-packing and unpacking code in writeInts() and readInts() can be unrolled for better performance. The double reader/writer performance can also be improved by bulk reading/writing from/to a byte array. -- This message was sent by Atlassian JIRA (v6.2#6252)
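The unrolling proposed in HIVE-7219 can be illustrated with a toy 4-bit unpacker (hypothetical code, not ORC's actual writeInts()/readInts(), which handle many bit widths and buffer boundaries): the baseline does per-value shift/mask bookkeeping, while the unrolled variant emits two values per packed byte per iteration, halving the loop overhead.

```java
public class BitUnpack4 {
    // Baseline: per-value loop reading 4-bit values (two packed per byte).
    static int[] unpackLoop(byte[] packed, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            int b = packed[i >> 1] & 0xff;
            out[i] = ((i & 1) == 0) ? (b >>> 4) : (b & 0xf);
        }
        return out;
    }

    // Unrolled: one byte yields two outputs per iteration, no per-value branch.
    // Assumes count is even, for brevity.
    static int[] unpackUnrolled(byte[] packed, int count) {
        int[] out = new int[count];
        for (int i = 0, j = 0; i < count; i += 2, j++) {
            int b = packed[j] & 0xff;
            out[i] = b >>> 4;
            out[i + 1] = b & 0xf;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = { (byte) 0x12, (byte) 0x3F };  // packs the values 1, 2, 3, 15
        System.out.println(java.util.Arrays.toString(unpackLoop(packed, 4)));
        System.out.println(java.util.Arrays.toString(unpackUnrolled(packed, 4)));
    }
}
```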
[jira] [Created] (HIVE-7235) TABLESAMPLE on join table is regarded as alias
Navis created HIVE-7235: --- Summary: TABLESAMPLE on join table is regarded as alias Key: HIVE-7235 URL: https://issues.apache.org/jira/browse/HIVE-7235 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial {noformat} SELECT c_custkey, o_custkey FROM customer tablesample (1000 ROWS) join orders tablesample (1000 ROWS) on c_custkey = o_custkey; {noformat} Fails with NPE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7182) ResultSet is not closed in JDBCStatsPublisher#init()
[ https://issues.apache.org/jira/browse/HIVE-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] steve, Oh updated HIVE-7182: Status: Patch Available (was: Open) ResultSet is not closed in JDBCStatsPublisher#init() Key: HIVE-7182 URL: https://issues.apache.org/jira/browse/HIVE-7182 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: steve, Oh Priority: Minor Attachments: HIVE-7182.1.patch, HIVE-7182.2.patch, HIVE-7182.patch {code} ResultSet rs = dbm.getTables(null, null, JDBCStatsUtils.getStatTableName(), null); boolean tblExists = rs.next(); {code} rs is not closed upon return from init(). If stmt.executeUpdate() throws an exception, stmt.close() would be skipped; the close() call should be placed in a finally block. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7182) ResultSet is not closed in JDBCStatsPublisher#init()
[ https://issues.apache.org/jira/browse/HIVE-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] steve, Oh updated HIVE-7182: Attachment: HIVE-7182.2.patch I reattached the patch after fixing a compile error and rebasing. HIVE-7182.2.patch is rebased against current trunk. ResultSet is not closed in JDBCStatsPublisher#init() Key: HIVE-7182 URL: https://issues.apache.org/jira/browse/HIVE-7182 Project: Hive Issue Type: Bug Reporter: Ted Yu Assignee: steve, Oh Priority: Minor Attachments: HIVE-7182.1.patch, HIVE-7182.2.patch, HIVE-7182.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
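The fix HIVE-7182 describes is the standard close-in-finally pattern, which try-with-resources expresses directly. A minimal sketch of the guarantee it provides, using a stand-in AutoCloseable instead of real java.sql types (so it runs without a database; names here are illustrative, not from the patch):

```java
import java.util.ArrayList;
import java.util.List;

public class CloseInFinally {
    static final List<String> closed = new ArrayList<>();

    // Stand-in for a JDBC Statement/ResultSet; real code would use java.sql types.
    static class Resource implements AutoCloseable {
        final String name;
        Resource(String name) { this.name = name; }
        public void close() { closed.add(name); }
    }

    static void init(boolean fail) {
        // try-with-resources closes rs then stmt (reverse declaration order),
        // even when the body throws -- exactly what the finally-block fix ensures.
        try (Resource stmt = new Resource("stmt");
             Resource rs = new Resource("rs")) {
            if (fail) {
                throw new RuntimeException("executeUpdate failed");
            }
        }
    }

    public static void main(String[] args) {
        try { init(true); } catch (RuntimeException expected) { }
        System.out.println(closed); // both resources closed despite the exception
    }
}
```

With the original code, an exception from stmt.executeUpdate() would skip both close() calls; here neither can be skipped.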
[jira] [Work started] (HIVE-7236) Tez progress monitor should indicate running/failed tasks
[ https://issues.apache.org/jira/browse/HIVE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-7236 started by Gopal V. Tez progress monitor should indicate running/failed tasks - Key: HIVE-7236 URL: https://issues.apache.org/jira/browse/HIVE-7236 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Currently, the only logging in TezJobMonitor is for completed tasks. This makes it hard to locate task stalls and task failures. Failure scenarios are harder to debug, in particular when analyzing query runs on a cluster with bad nodes. Change the job monitor to log running and failed tasks as follows. {code} Map 1: 0(+157,-1)/1755 Reducer 2: 0/1 Map 1: 0(+168,-1)/1755 Reducer 2: 0/1 Map 1: 0(+189,-1)/1755 Reducer 2: 0/1 Map 1: 0(+189,-1)/1755 Reducer 2: 0/1 {code} That is, 189 tasks running, 1 failed, and 0 complete. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7236) Tez progress monitor should indicate running/failed tasks
Gopal V created HIVE-7236: - Summary: Tez progress monitor should indicate running/failed tasks Key: HIVE-7236 URL: https://issues.apache.org/jira/browse/HIVE-7236 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7236) Tez progress monitor should indicate running/failed tasks
[ https://issues.apache.org/jira/browse/HIVE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-7236: -- Attachment: HIVE-7236.1.patch Tez progress monitor should indicate running/failed tasks - Key: HIVE-7236 URL: https://issues.apache.org/jira/browse/HIVE-7236 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-7236.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
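The counter format proposed in HIVE-7236 packs three numbers into one token: `complete(+running,-failed)/total`, collapsing to `complete/total` when nothing is in flight. A hypothetical helper showing how such a line could be rendered (the names are illustrative, not taken from the patch):

```java
public class TezProgressFormat {
    // Renders "complete(+running,-failed)/total", e.g. "0(+189,-1)/1755";
    // falls back to the plain "complete/total" form when nothing is in flight.
    static String render(String vertex, int complete, int running, int failed, int total) {
        if (running == 0 && failed == 0) {
            return vertex + ": " + complete + "/" + total;
        }
        return vertex + ": " + complete + "(+" + running + ",-" + failed + ")/" + total;
    }

    public static void main(String[] args) {
        System.out.println(render("Map 1", 0, 189, 1, 1755));
        System.out.println(render("Reducer 2", 0, 0, 0, 1));
    }
}
```

So `Map 1: 0(+189,-1)/1755` reads as: 0 of 1755 tasks complete, 189 running, 1 failed.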
[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval
[ https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032106#comment-14032106 ] Navis commented on HIVE-7232: - The failure of subquery_in.q in HIVE-5771 does not seem to be caused by HIVE-4867 directly, but it is strongly related, because HIVE-4867 has (intentionally) broken an internal assumption on the keys/values of RS. With the constant propagation optimizer, subquery_in.q makes different keys for each alias of the join, which seems invalid. {code} -- sq_1 Reduce Output Operator key expressions: _col1 (type: int) sort order: ++ Map-reduce partition columns: _col1 (type: int) {code} and {code} -- others Reduce Output Operator key expressions: _col0 (type: int), _col1 (type: int) sort order: ++ Map-reduce partition columns: _col0 (type: int), _col1 (type: int) {code} ReduceSink is emitting NULL keys due to failed keyEval -- Key: HIVE-7232 URL: https://issues.apache.org/jira/browse/HIVE-7232 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Navis -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval
[ https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032108#comment-14032108 ] Navis commented on HIVE-7232: - For this problem, I cannot understand how the RS which is a child of JOIN can get a ROW of the format {noformat} {reducesinkkey0:442} {noformat} In my reading, the join would emit a ROW and rowOI labeled with the output columns, like below {noformat} _col0 {KEY.reducesinkkey0} _col3 {VALUE._col2} _col10 {VALUE._col0} _col11 {KEY.reducesinkkey0} _col14 {VALUE._col3} {noformat} I don't have an environment for hadoop-2, so it's hard to verify; it might take some time. ReduceSink is emitting NULL keys due to failed keyEval -- Key: HIVE-7232 URL: https://issues.apache.org/jira/browse/HIVE-7232 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Navis -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7231) Improve ORC padding
[ https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032109#comment-14032109 ] Hive QA commented on HIVE-7231: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12650431/HIVE-7231.1.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5536 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/479/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/479/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-479/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12650431 Improve ORC padding --- Key: HIVE-7231 URL: https://issues.apache.org/jira/browse/HIVE-7231 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-7231.1.patch Current ORC padding is not optimal because of fixed stripe sizes within block. The padding overhead will be significant in some cases. 
Also, the padding percentage relative to the stripe size is not configurable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5771: --- Status: Open (was: Patch Available) [~tedxu] Navis's observation [here|https://issues.apache.org/jira/browse/HIVE-7232?focusedCommentId=14032108&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14032108] seems correct about the failing subquery_in.q test case. Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.12.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly Currently there is no constant folding/propagation optimizer; all expressions are evaluated at runtime. HIVE-2470 did a great job on evaluating constants in the UDF initializing phase; however, it is still a runtime evaluation and it doesn't propagate constants from a subquery to the outside. It may reduce I/O and accelerate processing if we introduce such an optimizer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7198) HiveServer2 CancelOperation does not work for long running queries
[ https://issues.apache.org/jira/browse/HIVE-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032115#comment-14032115 ] Romain Rigaux commented on HIVE-7198: - Nice! Will have a try! HiveServer2 CancelOperation does not work for long running queries -- Key: HIVE-7198 URL: https://issues.apache.org/jira/browse/HIVE-7198 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Romain Rigaux Sending the CancelOperation() call does not always stop the query and its related MapReduce jobs. e.g. from https://issues.cloudera.org/browse/HUE-2144 {code} I guess you're right. But the strange thing is that the canceled query shows in job browser as 'Running' and the percents go up - 0%, 50%, then the job is failed. How does the cancelling actually work? Is it like the hadoop kill command? It seems to me like it works until certain phase of map reduce is done. And another thing - after cancelling the job in Hue I can kill it with hadoop job -kill job_id. If it was killed already, it would show no such job. {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval
[ https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032123#comment-14032123 ] Gopal V commented on HIVE-7232: --- [~navis]: I can run tests for you if you have a patch file with log lines. I can reproduce this issue consistently on all recent runs of this query. ReduceSink is emitting NULL keys due to failed keyEval -- Key: HIVE-7232 URL: https://issues.apache.org/jira/browse/HIVE-7232 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Navis -- This message was sent by Atlassian JIRA (v6.2#6252)