[jira] [Work started] (HIVE-13873) Column pruning for nested fields
[ https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-13873 started by Ferdinand Xu. --- > Column pruning for nested fields > > > Key: HIVE-13873 > URL: https://issues.apache.org/jira/browse/HIVE-13873 > Project: Hive > Issue Type: New Feature > Components: Logical Optimizer >Reporter: Xuefu Zhang >Assignee: Ferdinand Xu > > Some columnar file formats such as Parquet store fields of struct type > column by column as well, using the encoding described in the Google Dremel paper. It's very > common in big data for data to be stored in structs while queries only need > a subset of the fields in the structs. However, presently Hive still > needs to read the whole struct regardless of whether all fields are selected. > Therefore, pruning unwanted sub-fields in structs or nested fields at file > reading time would be a big performance boost for such scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-13873) Column pruning for nested fields
[ https://issues.apache.org/jira/browse/HIVE-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-13873: --- Assignee: Ferdinand Xu > Column pruning for nested fields > > > Key: HIVE-13873 > URL: https://issues.apache.org/jira/browse/HIVE-13873 > Project: Hive > Issue Type: New Feature > Components: Logical Optimizer >Reporter: Xuefu Zhang >Assignee: Ferdinand Xu > > Some columnar file formats such as Parquet store fields of struct type > column by column as well, using the encoding described in the Google Dremel paper. It's very > common in big data for data to be stored in structs while queries only need > a subset of the fields in the structs. However, presently Hive still > needs to read the whole struct regardless of whether all fields are selected. > Therefore, pruning unwanted sub-fields in structs or nested fields at file > reading time would be a big performance boost for such scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
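The pruning idea in HIVE-13873 can be illustrated with a minimal, self-contained sketch: given all leaf paths in a schema and the paths a query actually references, keep only the leaves a columnar reader needs to materialize. All names here (prunePaths, the dotted-path convention) are hypothetical illustrations, not Hive's actual optimizer API.

```java
import java.util.*;

// Hedged sketch of nested-field pruning: a leaf survives if the query selects
// it directly, selects an ancestor struct (the whole subtree is needed), or
// selects a descendant of it. Hive's real implementation works on operator
// trees; this only demonstrates the path-matching logic.
public class NestedPruning {
    static List<String> prunePaths(List<String> schemaLeaves, Set<String> selected) {
        List<String> kept = new ArrayList<>();
        for (String leaf : schemaLeaves) {
            for (String sel : selected) {
                if (leaf.equals(sel)
                        || leaf.startsWith(sel + ".")   // whole struct selected
                        || sel.startsWith(leaf + ".")) { // selection goes deeper
                    kept.add(leaf);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> leaves = Arrays.asList("s.a", "s.b.x", "s.b.y", "t");
        // Query touches only s.b.x and t: s.a and s.b.y can be skipped at read time.
        System.out.println(prunePaths(leaves, new HashSet<>(Arrays.asList("s.b.x", "t"))));
    }
}
```

With this rule, selecting `s.b.x` keeps only that one leaf of `s`, which is exactly the saving a Dremel-style columnar layout makes possible.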
[jira] [Updated] (HIVE-13840) Orc split generation is reading file footers twice
[ https://issues.apache.org/jira/browse/HIVE-13840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13840: - Attachment: HIVE-13840-branch-1.patch Committed to branch-1 as well > Orc split generation is reading file footers twice > -- > > Key: HIVE-13840 > URL: https://issues.apache.org/jira/browse/HIVE-13840 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Fix For: 1.3.0, 2.1.0 > > Attachments: HIVE-13840-branch-1.patch, HIVE-13840.1.patch, > HIVE-13840.2.patch, HIVE-13840.3.patch > > > Recent refactorings to move orc out introduced a regression in split > generation. This leads to reading the orc file footers twice during split > generation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13840) Orc split generation is reading file footers twice
[ https://issues.apache.org/jira/browse/HIVE-13840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13840: - Fix Version/s: 1.3.0 > Orc split generation is reading file footers twice > -- > > Key: HIVE-13840 > URL: https://issues.apache.org/jira/browse/HIVE-13840 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Fix For: 1.3.0, 2.1.0 > > Attachments: HIVE-13840-branch-1.patch, HIVE-13840.1.patch, > HIVE-13840.2.patch, HIVE-13840.3.patch > > > Recent refactorings to move orc out introduced a regression in split > generation. This leads to reading the orc file footers twice during split > generation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13841) Orc split generation returns different strategies with cache enabled vs disabled
[ https://issues.apache.org/jira/browse/HIVE-13841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13841: - Fix Version/s: 1.3.0 > Orc split generation returns different strategies with cache enabled vs > disabled > > > Key: HIVE-13841 > URL: https://issues.apache.org/jira/browse/HIVE-13841 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.1.0 > > Attachments: HIVE-13841-branch-1.patch, HIVE-13841.1.patch > > > The split strategy chosen by OrcInputFormat should not change when enabling or > disabling the footer cache. Currently, if the footer cache is disabled, minSplits in > OrcInputFormat.Context will be set to -1, which is used during the determination > of split strategies. minSplits should be set to the requested value or some > default instead of the cache size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
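The shape of the fix described in HIVE-13841 can be sketched in a few lines: derive minSplits from the requested value, falling back to a default, so that the chosen split strategy no longer depends on whether the footer cache is enabled. The method name and the default constant below are assumptions for illustration, not Hive's actual code.

```java
// Hedged sketch: resolve minSplits independently of the footer cache. The
// bug was that a disabled cache leaked -1 into the split-strategy decision.
public class MinSplits {
    static final int DEFAULT_MIN_SPLITS = 1; // hypothetical default

    static int resolveMinSplits(int requested) {
        // A non-positive value (e.g. -1 from a disabled cache) must not be
        // used as-is; fall back to the default instead.
        return requested > 0 ? requested : DEFAULT_MIN_SPLITS;
    }

    public static void main(String[] args) {
        System.out.println(resolveMinSplits(-1)); // cache disabled: default wins
        System.out.println(resolveMinSplits(16)); // explicit request wins
    }
}
```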
[jira] [Commented] (HIVE-13913) LLAP: introduce backpressure to recordreader
[ https://issues.apache.org/jira/browse/HIVE-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330996#comment-15330996 ] Sergey Shelukhin commented on HIVE-13913: - isClosed is not thread safe, I might just restore it to not working for now. There's also some other bug I got distracted from... will update the patch eventually. > LLAP: introduce backpressure to recordreader > > > Key: HIVE-13913 > URL: https://issues.apache.org/jira/browse/HIVE-13913 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13913.01.patch, HIVE-13913.02.patch, > HIVE-13913.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
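The "isClosed is not thread safe" concern above has a standard remedy, shown here as a generic sketch rather than the actual LLAP record reader code: an AtomicBoolean gives the flag cross-thread visibility and makes close() idempotent. The class and method names are illustrative only.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hedged sketch of a thread-safe close flag. A plain boolean field read from
// another thread has no visibility guarantee; AtomicBoolean fixes that and
// compareAndSet lets exactly one caller win the close.
public class CloseableReader {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    boolean isClosed() {
        return closed.get();
    }

    // Returns true only for the thread that actually performed the close.
    boolean close() {
        return closed.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        CloseableReader r = new CloseableReader();
        System.out.println(r.isClosed()); // false
        System.out.println(r.close());    // true: this call closed it
        System.out.println(r.close());    // false: already closed
    }
}
```

For a simple flag with no compound check-then-act, a volatile boolean would also do; compareAndSet is needed once "close exactly once" matters.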
[jira] [Updated] (HIVE-13841) Orc split generation returns different strategies with cache enabled vs disabled
[ https://issues.apache.org/jira/browse/HIVE-13841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13841: - Attachment: HIVE-13841-branch-1.patch Also committed patch to branch-1. > Orc split generation returns different strategies with cache enabled vs > disabled > > > Key: HIVE-13841 > URL: https://issues.apache.org/jira/browse/HIVE-13841 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 2.1.0 > > Attachments: HIVE-13841-branch-1.patch, HIVE-13841.1.patch > > > Split strategy chosen by OrcInputFormat should not change when enabling or > disabling footer cache. Currently if footer cache is disabled minSplits in > OrcInputFormat.Context will be set to -1 which is used during determination > of split strategies. minSplits should be set to requested value or some > default instead of cache size -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14016) Vectorization: VectorGroupByRollupOperator and VectorGroupByCubeOperator
[ https://issues.apache.org/jira/browse/HIVE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-14016: -- Assignee: Gopal V > Vectorization: VectorGroupByRollupOperator and VectorGroupByCubeOperator > > > Key: HIVE-14016 > URL: https://issues.apache.org/jira/browse/HIVE-14016 > Project: Hive > Issue Type: Improvement > Components: Vectorization >Reporter: Gopal V >Assignee: Gopal V > > Rollup and Cube queries are not vectorized today due to the missing > grouping-sets support inside vector group by. > The cube and rollup operators can be shimmed onto the end of the pipeline by > converting a single-row writer into a multiple-row writer. > The corresponding non-vectorized loop is as follows:
> {code}
> if (groupingSetsPresent) {
>   Object[] newKeysArray = newKeys.getKeyArray();
>   Object[] cloneNewKeysArray = new Object[newKeysArray.length];
>   for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>     cloneNewKeysArray[keyPos] = newKeysArray[keyPos];
>   }
>   for (int groupingSetPos = 0; groupingSetPos < groupingSets.size(); groupingSetPos++) {
>     for (int keyPos = 0; keyPos < groupingSetsPosition; keyPos++) {
>       newKeysArray[keyPos] = null;
>     }
>     FastBitSet bitset = groupingSetsBitSet[groupingSetPos];
>     // Some keys need to be left null, corresponding to that grouping set.
>     for (int keyPos = bitset.nextSetBit(0); keyPos >= 0;
>          keyPos = bitset.nextSetBit(keyPos + 1)) {
>       newKeysArray[keyPos] = cloneNewKeysArray[keyPos];
>     }
>     newKeysArray[groupingSetsPosition] = newKeysGroupingSets[groupingSetPos];
>     processKey(row, rowInspector);
>   }
> }
> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
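The non-vectorized loop quoted in HIVE-14016 can be distilled into a standalone, runnable form: each input key row is expanded into one output row per grouping set, with positions outside the set nulled out. This is a simplified sketch; java.util.BitSet stands in for Hive's FastBitSet, and the surrounding GroupBy machinery is omitted.

```java
import java.util.*;

// Hedged illustration of grouping-set key expansion: for ROLLUP(a, b) the
// grouping sets are {a,b}, {a}, and {}, so one input row yields three rows.
public class GroupingSetsExpand {
    static List<Object[]> expand(Object[] keys, List<BitSet> groupingSets) {
        List<Object[]> out = new ArrayList<>();
        for (BitSet bitset : groupingSets) {
            Object[] row = new Object[keys.length]; // starts all-null
            // Only positions present in this grouping set keep their value.
            for (int pos = bitset.nextSetBit(0); pos >= 0; pos = bitset.nextSetBit(pos + 1)) {
                row[pos] = keys[pos];
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet both = new BitSet(); both.set(0); both.set(1); // (a, b)
        BitSet first = new BitSet(); first.set(0);            // (a)
        BitSet none = new BitSet();                           // () -- grand total
        for (Object[] row : expand(new Object[]{"a1", "b1"}, Arrays.asList(both, first, none))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Vectorizing this amounts to turning the single-row expansion into a batch-level multi-row writer, which is what the ticket proposes to shim onto the end of the pipeline.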
[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck
[ https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13984: --- Status: Patch Available (was: Open) > Use multi-threaded approach to listing files for msck > - > > Key: HIVE-13984 > URL: https://issues.apache.org/jira/browse/HIVE-13984 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch, > HIVE-13984.03.patch, HIVE-13984.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
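The multi-threaded listing idea in HIVE-13984 is, in essence, fanning the per-directory listings out over a fixed thread pool. The sketch below uses a placeholder lister instead of Hive's real FileSystem calls; only the concurrency pattern is the point.

```java
import java.util.*;
import java.util.concurrent.*;

// Hedged sketch: list many partition directories in parallel and merge the
// results in submission order. listOneDir is a stand-in for a real listing.
public class ParallelList {
    static List<String> listAll(List<String> dirs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String dir : dirs) {
                futures.add(pool.submit(() -> listOneDir(dir)));
            }
            List<String> files = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                files.addAll(f.get()); // blocks; preserves submission order
            }
            return files;
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder for a real directory listing.
    static List<String> listOneDir(String dir) {
        return Arrays.asList(dir + "/part-0", dir + "/part-1");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(listAll(Arrays.asList("p=1", "p=2"), 2));
    }
}
```

Collecting futures in order keeps the output deterministic even though the listings themselves run concurrently, which matters when msck compares the result against metastore partitions.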
[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14014: --- Status: Patch Available (was: Open) > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch > > > The same problem happens when the source table is not empty, e.g., when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck
[ https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13984: --- Attachment: HIVE-13984.04.patch > Use multi-threaded approach to listing files for msck > - > > Key: HIVE-13984 > URL: https://issues.apache.org/jira/browse/HIVE-13984 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch, > HIVE-13984.03.patch, HIVE-13984.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13984) Use multi-threaded approach to listing files for msck
[ https://issues.apache.org/jira/browse/HIVE-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-13984: --- Status: Open (was: Patch Available) > Use multi-threaded approach to listing files for msck > - > > Key: HIVE-13984 > URL: https://issues.apache.org/jira/browse/HIVE-13984 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13984.01.patch, HIVE-13984.02.patch, > HIVE-13984.03.patch, HIVE-13984.04.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14014: --- Status: Open (was: Patch Available) > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch > > > The same problem happens when the source table is not empty, e.g., when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13961) ACID: Major compaction fails to include the original bucket files if there's no delta directory
[ https://issues.apache.org/jira/browse/HIVE-13961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330894#comment-15330894 ] Eugene Koifman commented on HIVE-13961: --- +1 > ACID: Major compaction fails to include the original bucket files if there's > no delta directory > --- > > Key: HIVE-13961 > URL: https://issues.apache.org/jira/browse/HIVE-13961 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng >Priority: Blocker > Attachments: HIVE-13961.1.patch, HIVE-13961.2.patch, > HIVE-13961.3.patch, HIVE-13961.4.patch, HIVE-13961.5.patch, HIVE-13961.6.patch > > > The issue can be reproduced by steps below: > 1. Insert a row to Non-ACID table > 2. Convert Non-ACID to ACID table (i.e. set transactional=true table property) > 3. Perform Major compaction -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13958) hive.strict.checks.type.safety should apply to decimals, as well as IN... and BETWEEN... ops
[ https://issues.apache.org/jira/browse/HIVE-13958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330839#comment-15330839 ] Sergey Shelukhin commented on HIVE-13958: - One nit on RB; also, this doesn't actually cover decimal <-> string case > hive.strict.checks.type.safety should apply to decimals, as well as IN... and > BETWEEN... ops > > > Key: HIVE-13958 > URL: https://issues.apache.org/jira/browse/HIVE-13958 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Takuma Wakamori > Labels: patch > Attachments: HIVE-13958.01.patch, HIVE-13958.02.patch, > HIVE-13958.03.patch > > > String to decimal auto-casts should be prohibited for compares -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13958) hive.strict.checks.type.safety should apply to decimals, as well as IN... and BETWEEN... ops
[ https://issues.apache.org/jira/browse/HIVE-13958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13958: Attachment: HIVE-13958.03.patch The same patch as 02; it looks like HiveQA died and skipped this patch. > hive.strict.checks.type.safety should apply to decimals, as well as IN... and > BETWEEN... ops > > > Key: HIVE-13958 > URL: https://issues.apache.org/jira/browse/HIVE-13958 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Takuma Wakamori > Labels: patch > Attachments: HIVE-13958.01.patch, HIVE-13958.02.patch, > HIVE-13958.03.patch > > > String to decimal auto-casts should be prohibited for compares -- This message was sent by Atlassian JIRA (v6.3.4#6332)
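Why string-to-decimal auto-casts are dangerous for compares, as HIVE-13958 argues, can be shown with plain BigDecimal (this is a standalone illustration, not Hive code): routing a comparison through double silently changes the value.

```java
import java.math.BigDecimal;

// A decimal built from the string "0.1" is exactly 0.1; one built from the
// double 0.1 carries the binary rounding error. An implicit
// string -> double -> decimal cast path can therefore flip a comparison
// that looks exact in the query text, which is why strict type checks
// should reject it.
public class DecimalCompare {
    public static void main(String[] args) {
        BigDecimal fromString = new BigDecimal("0.1"); // exactly 0.1
        BigDecimal fromDouble = new BigDecimal(0.1);   // 0.1000000000000000055511...
        System.out.println(fromString.compareTo(fromDouble) == 0); // false
    }
}
```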
[jira] [Commented] (HIVE-13617) LLAP: support non-vectorized execution in IO
[ https://issues.apache.org/jira/browse/HIVE-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330823#comment-15330823 ] Sergey Shelukhin commented on HIVE-13617: - [~spena] when files are explicitly specified in qfile, they are run regardless of the properties file. For now I just added the out file. [~prasanth_j] can you please review > LLAP: support non-vectorized execution in IO > > > Key: HIVE-13617 > URL: https://issues.apache.org/jira/browse/HIVE-13617 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13617-wo-11417.patch, HIVE-13617-wo-11417.patch, > HIVE-13617.01.patch, HIVE-13617.03.patch, HIVE-13617.04.patch, > HIVE-13617.05.patch, HIVE-13617.06.patch, HIVE-13617.patch, HIVE-13617.patch, > HIVE-15396-with-oi.patch > > > Two approaches - a separate decoding path, into rows instead of VRBs; or > decoding VRBs into rows on a higher level (the original LlapInputFormat). I > think the latter might be better - it's not a hugely important path, and perf > in non-vectorized case is not the best anyway, so it's better to make do with > much less new code and architectural disruption. > Some ORC patches in progress introduce an easy to reuse (or so I hope, > anyway) VRB-to-row conversion, so we should just use that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14002) Extend limit propagation to subsequent RS operators
[ https://issues.apache.org/jira/browse/HIVE-14002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330819#comment-15330819 ] Ashutosh Chauhan commented on HIVE-14002: - I am not sure we can allow *any* operator between two RSs other than GBy. E.g., a Filter can be problematic: if the first Limit generates only N rows and the Filter eats all of them, we will get an incorrect result. Even for the Select operator we can allow this only for column references and constants. > Extend limit propagation to subsequent RS operators > --- > > Key: HIVE-14002 > URL: https://issues.apache.org/jira/browse/HIVE-14002 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-14002.patch > > > On some occasions, for instance when RS dedup does not kick in, it is useful > to propagate the limit to subsequent RS operators, as this will reduce > intermediary results and improve performance. This issue covers that extension. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
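The correctness concern in the comment above is easy to demonstrate outside Hive with Java streams (a toy model, not Hive's operator pipeline): applying a limit before a filter is not the same query as applying it after.

```java
import java.util.*;
import java.util.stream.*;

// Filter-then-limit is what the query asks for; limit-then-filter (the
// unsound pushdown) can discard every row the filter would have kept.
public class LimitPushdown {
    static List<Integer> filterThenLimit(List<Integer> rows, int n) {
        return rows.stream().filter(x -> x % 2 == 0).limit(n).collect(Collectors.toList());
    }

    static List<Integer> limitThenFilter(List<Integer> rows, int n) {
        return rows.stream().limit(n).filter(x -> x % 2 == 0).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4);
        System.out.println(filterThenLimit(rows, 2)); // [2, 4]
        System.out.println(limitThenFilter(rows, 2)); // [2] -- rows lost
    }
}
```

This is why the comment restricts the operators allowed between two ReduceSinks: only transformations that neither drop nor create rows (e.g. a Select over column references and constants) keep the pushed-down limit equivalent.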
[jira] [Resolved] (HIVE-14010) parquet-logging.properties from HIVE_CONF_DIR should be used when available
[ https://issues.apache.org/jira/browse/HIVE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-14010. -- Resolution: Fixed Fix Version/s: 2.2.0 2.1.0 1.3.0 Committed to branch-1, branch-2.1 and master. > parquet-logging.properties from HIVE_CONF_DIR should be used when available > --- > > Key: HIVE-14010 > URL: https://issues.apache.org/jira/browse/HIVE-14010 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.1.0, 2.2.0 > > Attachments: HIVE-14010.1.patch > > > Following up on HIVE-13954, when parquet-logging.properties is available in > HIVE_CONF_DIR it should be used first. When not available fallback to > relative path from bin directory. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14010) parquet-logging.properties from HIVE_CONF_DIR should be used when available
[ https://issues.apache.org/jira/browse/HIVE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330804#comment-15330804 ] Ashutosh Chauhan commented on HIVE-14010: - +1 > parquet-logging.properties from HIVE_CONF_DIR should be used when available > --- > > Key: HIVE-14010 > URL: https://issues.apache.org/jira/browse/HIVE-14010 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14010.1.patch > > > Following up on HIVE-13954, when parquet-logging.properties is available in > HIVE_CONF_DIR it should be used first. When not available fallback to > relative path from bin directory. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
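The lookup order HIVE-14010 describes — prefer parquet-logging.properties from HIVE_CONF_DIR, otherwise fall back to the bin-relative copy — can be sketched as a pure function. The "existing" set below stands in for real filesystem existence checks, and the fallback path layout is an assumption for illustration.

```java
import java.util.*;

// Hedged sketch of config-file resolution: HIVE_CONF_DIR wins when the file
// is present there; otherwise use the copy relative to the bin directory.
public class ConfResolve {
    static String resolve(String confDir, String binDir, Set<String> existing) {
        String preferred = confDir + "/parquet-logging.properties";
        String fallback = binDir + "/../conf/parquet-logging.properties";
        return existing.contains(preferred) ? preferred : fallback;
    }

    public static void main(String[] args) {
        Set<String> fs = new HashSet<>(Collections.singletonList("/etc/hive/parquet-logging.properties"));
        System.out.println(resolve("/etc/hive", "/opt/hive/bin", fs));                      // conf dir wins
        System.out.println(resolve("/etc/hive", "/opt/hive/bin", Collections.emptySet())); // fallback
    }
}
```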
[jira] [Commented] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330792#comment-15330792 ] Ashutosh Chauhan commented on HIVE-14014: - +1 > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch > > > The same problem happens when the source table is not empty, e.g., when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330790#comment-15330790 ] Pengcheng Xiong commented on HIVE-14014: Done. > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch > > > The same problem happens when the source table is not empty, e.g., when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13833) Add an initial delay when starting the heartbeat
[ https://issues.apache.org/jira/browse/HIVE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13833: - Resolution: Fixed Fix Version/s: 2.2.0 1.3.0 Status: Resolved (was: Patch Available) > Add an initial delay when starting the heartbeat > > > Key: HIVE-13833 > URL: https://issues.apache.org/jira/browse/HIVE-13833 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0, 2.1.0 >Reporter: Wei Zheng >Assignee: Wei Zheng >Priority: Minor > Fix For: 1.3.0, 2.2.0 > > Attachments: HIVE-13833.1.patch, HIVE-13833.2.patch, > HIVE-13833.3.patch, HIVE-13833.4.patch > > > Since the scheduling of the heartbeat happens immediately after lock acquisition, > it's unnecessary to send a heartbeat at the moment the locks are acquired. Add an > initial delay to skip this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13833) Add an initial delay when starting the heartbeat
[ https://issues.apache.org/jira/browse/HIVE-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330787#comment-15330787 ] Wei Zheng commented on HIVE-13833: -- Committed to master and branch-1. Thanks Eugene for the review. > Add an initial delay when starting the heartbeat > > > Key: HIVE-13833 > URL: https://issues.apache.org/jira/browse/HIVE-13833 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0, 2.1.0 >Reporter: Wei Zheng >Assignee: Wei Zheng >Priority: Minor > Fix For: 1.3.0, 2.2.0 > > Attachments: HIVE-13833.1.patch, HIVE-13833.2.patch, > HIVE-13833.3.patch, HIVE-13833.4.patch > > > Since the scheduling of the heartbeat happens immediately after lock acquisition, > it's unnecessary to send a heartbeat at the moment the locks are acquired. Add an > initial delay to skip this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
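The general shape of the HIVE-13833 fix maps directly onto ScheduledExecutorService.scheduleAtFixedRate, whose first argument is an initial delay: setting it to one heartbeat interval skips the redundant heartbeat at the moment the locks are acquired. This is a hedged sketch, not the actual DbTxnManager code.

```java
import java.util.concurrent.*;

// Heartbeats with an initial delay: the first beat fires only after
// initialDelayMs, never at t=0. Returns true once `beats` beats have fired.
public class HeartbeatSchedule {
    static boolean runHeartbeats(long initialDelayMs, long periodMs, int beats) throws Exception {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(beats);
        ScheduledFuture<?> task = exec.scheduleAtFixedRate(
                latch::countDown,           // stands in for sending a heartbeat
                initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
        boolean done = latch.await(5, TimeUnit.SECONDS); // generous timeout
        task.cancel(false);
        exec.shutdown();
        return done;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runHeartbeats(20, 20, 2));
    }
}
```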
[jira] [Commented] (HIVE-11166) HiveHBaseTableOutputFormat can't call getFileExtension(JobConf jc, boolean isCompressed, HiveOutputFormat hiveOutputFormat)
[ https://issues.apache.org/jira/browse/HIVE-11166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330789#comment-15330789 ] Aihua Xu commented on HIVE-11166: - [~Yun Zhao] The change seems reasonable to me. Can we add one unit test to cover this hbase test case? > HiveHBaseTableOutputFormat can't call getFileExtension(JobConf jc, boolean > isCompressed, HiveOutputFormat hiveOutputFormat) > - > > Key: HIVE-11166 > URL: https://issues.apache.org/jira/browse/HIVE-11166 > Project: Hive > Issue Type: Bug > Components: HBase Handler, Spark >Reporter: meiyoula >Assignee: Yun Zhao > Attachments: HIVE-11166.2.patch, HIVE-11166.patch > > > I create a hbase table with HBaseStorageHandler in JDBCServer of spark, then > execute the *insert into* sql statement, ClassCastException occurs. > {quote} > Error: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 1 in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in > stage 3.0 (TID 12, vm-17): java.lang.ClassCastException: > org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to > org.apache.hadoop.hive.ql.io.HiveOutputFormat > at > org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:72) > at > org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:71) > at > org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:91) > at > org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:115) > at > org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:84) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:112) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93) > at > 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > {quote} > It's because of the Spark code below. For an HBase table, the outputFormat is > HiveHBaseTableOutputFormat, which isn't an instance of HiveOutputFormat. > {quote} > @transient private lazy val > outputFormat = conf.value.getOutputFormat.asInstanceOf[HiveOutputFormat[AnyRef, Writable]] > val extension = Utilities.getFileExtension(conf.value, > fileSinkConf.getCompressed, outputFormat) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14014: --- Status: Patch Available (was: Open) > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch > > > The same problem happens when the source table is not empty, e.g., when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14014: --- Attachment: HIVE-14014.02.patch > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch > > > The same problem happens when the source table is not empty, e.g., when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14014: --- Status: Open (was: Patch Available) > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch, HIVE-14014.02.patch > > > The same problem happens when the source table is not empty, e.g., when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14011) MessageFactory is not pluggable
[ https://issues.apache.org/jira/browse/HIVE-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sravya Tirukkovalur updated HIVE-14011: --- Attachment: HIVE-14011.patch Attaching a fix. > MessageFactory is not pluggable > --- > > Key: HIVE-14011 > URL: https://issues.apache.org/jira/browse/HIVE-14011 > Project: Hive > Issue Type: Bug >Reporter: Sravya Tirukkovalur > Attachments: HIVE-14011.patch > > > The property "hcatalog.message.factory.impl.json" is available to use a custom > message factory implementation. However, it is not actually pluggable, as > MessageFactory is hardcoded to use JSONMessageFactory. > https://github.com/apache/hive/blob/26b5c7b56a4f28ce3eabc0207566cce46b29b558/hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/messaging/MessageFactory.java#L39 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13970) refactor LLAPIF splits - get rid of SubmitWorkInfo
[ https://issues.apache.org/jira/browse/HIVE-13970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13970: Attachment: HIVE-13970.patch The patch for HiveQA again... > refactor LLAPIF splits - get rid of SubmitWorkInfo > -- > > Key: HIVE-13970 > URL: https://issues.apache.org/jira/browse/HIVE-13970 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13970.only.patch, HIVE-13970.patch, HIVE-13970.patch > > > First we build the signable vertex spec, convert it into bytes (as we > should), and put it inside SubmitWorkInfo. Then we serialize that into byte[] > and put it into LlapInputSplit. Then we serialize that to return... We should > get rid of one of the steps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14009) Acid DB creation error in HiveQA
[ https://issues.apache.org/jira/browse/HIVE-14009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330767#comment-15330767 ] Eugene Koifman commented on HIVE-14009: --- [~spena] could you comment? > Acid DB creation error in HiveQA > > > Key: HIVE-14009 > URL: https://issues.apache.org/jira/browse/HIVE-14009 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Seen when running TestEncryptedHDFSCliDriver, at least with Hadoop 2.7.2 > (HIVE-13930). > Looks like such issues are usually caused by concurrent db creation from > multiple threads. > {noformat} > java.lang.RuntimeException: Unable to set up transaction database for > testing: Exception during creation of file > /home/hiveptest/54.219.24.101-hiveptest-0/apache-github-source-source/itests/qtest/target/tmp/junit_metastore_db/seg0/cc60.dat > for container > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkQFileTestHack(TxnHandler.java:2172) > ~[hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.setConf(TxnHandler.java:228) > ~[hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.txn.TxnUtils.getTxnStore(TxnUtils.java:96) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTxnHandler(HiveMetaStore.java:557) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5902) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_25] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_25] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_25] > at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25] > at > 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at com.sun.proxy.$Proxy111.heartbeat(Unknown Source) [?:?] > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:2140) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_25] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_25] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_25] > at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25] > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at com.sun.proxy.$Proxy112.heartbeat(Unknown Source) [?:?] 
> at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.heartbeat(DbTxnManager.java:663) > [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:423) > [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:633) > [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [?:1.8.0_25] > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > [?:1.8.0_25] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > [?:1.8.0_25] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > [?:1.8.0_25] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_25] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_25] > at java.lang.Thread.run(Thread.java:745) [?:1.8.0_25] > Caused by: java.sql.SQLException: Exception during creation of file > /home/hiveptest/54.219.24.101-hiveptest-0/apache-github-source-source/itests/qtest/target/tmp/junit_metastore_db/seg0/cc60.dat > for container > at > org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown > Source) ~[derby-10.10.2.0.jar:?] > at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source) >
[jira] [Updated] (HIVE-13930) upgrade Hive to latest Hadoop version
[ https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13930: Attachment: HIVE-13930.03.patch HiveQA failed silently, trying again. > upgrade Hive to latest Hadoop version > - > > Key: HIVE-13930 > URL: https://issues.apache.org/jira/browse/HIVE-13930 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, > HIVE-13930.03.patch, HIVE-13930.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14009) Acid DB creation error in HiveQA
[ https://issues.apache.org/jira/browse/HIVE-14009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330760#comment-15330760 ] Sergey Shelukhin commented on HIVE-14009: - No idea. > Acid DB creation error in HiveQA > > > Key: HIVE-14009 > URL: https://issues.apache.org/jira/browse/HIVE-14009 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Seen when running TestEncryptedHDFSCliDriver, at least with Hadoop 2.7.2 > (HIVE-13930). > Looks like such issues are usually caused by concurrent db creation from > multiple threads. > {noformat} > java.lang.RuntimeException: Unable to set up transaction database for > testing: Exception during creation of file > /home/hiveptest/54.219.24.101-hiveptest-0/apache-github-source-source/itests/qtest/target/tmp/junit_metastore_db/seg0/cc60.dat > for container > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkQFileTestHack(TxnHandler.java:2172) > ~[hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.setConf(TxnHandler.java:228) > ~[hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.txn.TxnUtils.getTxnStore(TxnUtils.java:96) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getTxnHandler(HiveMetaStore.java:557) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.heartbeat(HiveMetaStore.java:5902) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_25] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_25] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_25] > at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25] > at > 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at com.sun.proxy.$Proxy111.heartbeat(Unknown Source) [?:?] > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.heartbeat(HiveMetaStoreClient.java:2140) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_25] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_25] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_25] > at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25] > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) > [hive-metastore-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at com.sun.proxy.$Proxy112.heartbeat(Unknown Source) [?:?] 
> at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.heartbeat(DbTxnManager.java:663) > [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.heartbeat(DbTxnManager.java:423) > [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$Heartbeater.run(DbTxnManager.java:633) > [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [?:1.8.0_25] > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > [?:1.8.0_25] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > [?:1.8.0_25] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > [?:1.8.0_25] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [?:1.8.0_25] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [?:1.8.0_25] > at java.lang.Thread.run(Thread.java:745) [?:1.8.0_25] > Caused by: java.sql.SQLException: Exception during creation of file > /home/hiveptest/54.219.24.101-hiveptest-0/apache-github-source-source/itests/qtest/target/tmp/junit_metastore_db/seg0/cc60.dat > for container > at > org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown > Source) ~[derby-10.10.2.0.jar:?] > at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source) >
[jira] [Updated] (HIVE-13771) LLAPIF: generate app ID
[ https://issues.apache.org/jira/browse/HIVE-13771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13771: Attachment: HIVE-13771.03.patch Looks like HiveQA failed silently > LLAPIF: generate app ID > --- > > Key: HIVE-13771 > URL: https://issues.apache.org/jira/browse/HIVE-13771 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13771.01.patch, HIVE-13771.02.patch, > HIVE-13771.03.patch, HIVE-13771.patch > > > See comments in the HIVE-13675 patch. The uniqueness needs to be ensured; the > user may be allowed to supply a prefix (e.g. his YARN app Id, if any) for > ease of tracking -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330755#comment-15330755 ] Ashutosh Chauhan commented on HIVE-14014: - partition dir may not get created in commit(), so it's better to pass in a {{filescreated}} boolean. > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch > > > The same problem happens when the source table is not empty, e.g., when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13986) LLAP: kill Tez AM on token errors from plugin
[ https://issues.apache.org/jira/browse/HIVE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13986: Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Committed to master. > LLAP: kill Tez AM on token errors from plugin > - > > Key: HIVE-13986 > URL: https://issues.apache.org/jira/browse/HIVE-13986 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.2.0 > > Attachments: HIVE-13986.01.patch, HIVE-13986.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330703#comment-15330703 ] Ashutosh Chauhan commented on HIVE-13985: - Can you create a RB entry for this? > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-2.1.patch, > HIVE-13985.1.patch, HIVE-13985.2.patch > > > HIVE-13840 fixed some issues with additional file system invocations during > split generation. Similarly, this jira will fix issues with additional file > system invocations on the task side. To avoid reading footers on the task > side, users can set hive.orc.splits.include.file.footer to true which will > serialize the orc footers on the splits. But this has issues with serializing > unwanted information like column statistics and other metadata which are not > really required for reading the orc split on the task side. We can reduce the > payload on the orc splits by serializing only the minimum required > information (stripe information, types, compression details). This will > decrease the payload on the orc splits and can potentially avoid OOMs in > application master (AM) during split generation. This jira also addresses other > issues concerning the AM cache. The local cache used by the AM is a soft reference > cache. This can introduce unpredictability across multiple runs of the same > query. We can cache the serialized footer in the local cache and also use a > strong reference cache which should avoid memory pressure and will have > better predictability. > One other improvement that we can do is when > hive.orc.splits.include.file.footer is set to false, on the task side we make > one additional file system call to know the size of the file. 
If we can > serialize the file length in the orc split, this can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
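The last improvement described above (carrying the file length inside the ORC split so the task side can skip a file-status call) can be sketched as below. This is an illustrative payload only, not Hive's actual OrcSplit; the class and field names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical slimmed-down split payload. It carries only what the task
// side needs (the split's offset/length plus the total file length), so the
// reader does not need an extra getFileStatus() call to learn the file
// size, and no footer column statistics are serialized.
public class SlimOrcSplit {
    public final String path;
    public final long offset;
    public final long length;     // length of this split
    public final long fileLength; // total file length, carried to skip an FS call

    public SlimOrcSplit(String path, long offset, long length, long fileLength) {
        this.path = path;
        this.offset = offset;
        this.length = length;
        this.fileLength = fileLength;
    }

    public byte[] toBytes() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(path);
            out.writeLong(offset);
            out.writeLong(length);
            out.writeLong(fileLength);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static SlimOrcSplit fromBytes(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            return new SlimOrcSplit(in.readUTF(), in.readLong(), in.readLong(), in.readLong());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the file length rides along in the serialized split, the task-side reader can open the file and seek without first asking the file system for its size.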
[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330681#comment-15330681 ] Sergey Shelukhin commented on HIVE-13974: - Some comments on RB > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13974.01.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14014: --- Attachment: HIVE-14014.01.patch > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch > > > The same problem happens when the source table is not empty, e.g., when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330679#comment-15330679 ] Pengcheng Xiong commented on HIVE-14014: [~ashutoshc], could u take a look? Thanks. > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch > > > The same problem happens when source table is not empty, e.g,, when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14014) zero length file is being created for empty bucket in tez mode (II)
[ https://issues.apache.org/jira/browse/HIVE-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14014: --- Status: Patch Available (was: Open) > zero length file is being created for empty bucket in tez mode (II) > --- > > Key: HIVE-14014 > URL: https://issues.apache.org/jira/browse/HIVE-14014 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14014.01.patch > > > The same problem happens when the source table is not empty, e.g., when "limit 0" > is not there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13696) Monitor fair-scheduler.xml and automatically update/validate jobs submitted to fair-scheduler
[ https://issues.apache.org/jira/browse/HIVE-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330656#comment-15330656 ] Yongzhi Chen commented on HIVE-13696: - LGTM +1. But I do not know this part of the code well, [~prasadm], could you review the patch? Thanks > Monitor fair-scheduler.xml and automatically update/validate jobs submitted > to fair-scheduler > - > > Key: HIVE-13696 > URL: https://issues.apache.org/jira/browse/HIVE-13696 > Project: Hive > Issue Type: Improvement >Reporter: Reuben Kuhnert >Assignee: Reuben Kuhnert > Attachments: HIVE-13696.01.patch, HIVE-13696.02.patch, > HIVE-13696.06.patch, HIVE-13696.08.patch, HIVE-13696.11.patch, > HIVE-13696.13.patch > > > Ensure that jobs are placed into the correct queue according to > {{fair-scheduler.xml}}. Jobs should be placed into the correct queue, and > users should not be able to submit jobs to queues they do not have access to. > This patch builds on the existing functionality in {{FairSchedulerShim}} to > route jobs to a user-specific queue based on {{fair-scheduler.xml}} > configuration (leveraging the Yarn {{QueuePlacementPolicy}} class). In > addition to configuring job routing at session connect (current behavior), > the routing is validated per submission to yarn (when impersonation is off). > A {{FileSystemWatcher}} class is included to monitor changes in the > {{fair-scheduler.xml}} file (so updates are automatically reloaded when the > file pointed to by {{yarn.scheduler.fair.allocation.file}} is changed). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
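The reload behavior described for {{fair-scheduler.xml}} can be sketched with a simple modification-time check: re-read the allocation file only when its mtime has advanced since the last load. This is an illustration of the idea only; Hive's actual {{FileSystemWatcher}} may work differently, and the names below are hypothetical.

```java
// Minimal sketch of a modification-time based reload check, along the
// lines of monitoring fair-scheduler.xml: callers pass the allocation
// file's current lastModified() value before routing a job, and the rules
// are re-parsed only when that value has changed since the last load.
public class AllocFileMonitor {
    private long lastLoadedMtime = -1L;
    private int reloadCount = 0; // stands in for "re-parse fair-scheduler.xml"

    // Returns true when a reload happened. In real code the caller would
    // obtain mtime from new File(allocPath).lastModified().
    public boolean reloadIfChanged(long mtime) {
        if (mtime == lastLoadedMtime) {
            return false; // unchanged since last load
        }
        lastLoadedMtime = mtime;
        reloadCount++; // real code would re-read the queue placement rules here
        return true;
    }

    public int getReloadCount() { return reloadCount; }
}
```

Polling the mtime before each submission gives per-submission validation without a background watcher thread; a WatchService-based design would push change events instead.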
[jira] [Updated] (HIVE-13959) MoveTask should only release its query associated locks
[ https://issues.apache.org/jira/browse/HIVE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-13959: --- Resolution: Fixed Fix Version/s: 2.1.1 2.2.0 Status: Resolved (was: Patch Available) HIVE-13959.patch has been committed to 2.2.0 and 2.1.1. Thanks [~ychena] for the review. > MoveTask should only release its query associated locks > --- > > Key: HIVE-13959 > URL: https://issues.apache.org/jira/browse/HIVE-13959 > Project: Hive > Issue Type: Bug > Components: Locking >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-13959.1.patch, HIVE-13959.patch, HIVE-13959.patch > > > releaseLocks in MoveTask releases all locks under a HiveLockObject pathNames. > But some of the locks under this pathNames might be for other queries and should > not be released. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14013) Describe table doesn't show unicode properly
[ https://issues.apache.org/jira/browse/HIVE-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14013: Status: Patch Available (was: Open) Patch-1: in various places, use UTF-8 encoding when writing to the output stream. We need to come up with a Hive-specific escape() version, since the common one also escapes unicode characters, which causes the issue. > Describe table doesn't show unicode properly > > > Key: HIVE-14013 > URL: https://issues.apache.org/jira/browse/HIVE-14013 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14013.1.patch > > > Describe table output will show comments incorrectly rather than the unicode > itself. > {noformat} > hive> desc formatted t1; > # Detailed Table Information > Table Type: MANAGED_TABLE > Table Parameters: > COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"} > comment \u8868\u4E2D\u6587\u6D4B\u8BD5 > numFiles0 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
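The escaping problem behind the patch can be illustrated as follows: a generic Java-style escape routine emits \uXXXX for every character above 0x7F, which is exactly why the comment shows as escape sequences, so a Hive-specific variant should escape only control characters and leave Unicode intact. The sketch below is an assumption about the intended behavior, not the actual patch code.

```java
// Sketch: escape control characters and backslashes, but pass non-ASCII
// characters (e.g. CJK) through unchanged, unlike escape routines that
// emit a \uXXXX sequence for every char above 0x7F.
public class UnicodeSafeEscape {
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\t': sb.append("\\t"); break;
                case '\r': sb.append("\\r"); break;
                default:
                    if (c < 0x20) {
                        // remaining control chars still get escaped
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        sb.append(c); // CJK and other non-ASCII survive intact
                    }
            }
        }
        return sb.toString();
    }
}
```

With such a routine, a comment like 表中文测试 round-trips through the describe output unchanged instead of becoming \u8868\u4E2D....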
[jira] [Updated] (HIVE-14013) Describe table doesn't show unicode properly
[ https://issues.apache.org/jira/browse/HIVE-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14013: Attachment: HIVE-14013.1.patch > Describe table doesn't show unicode properly > > > Key: HIVE-14013 > URL: https://issues.apache.org/jira/browse/HIVE-14013 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.2.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14013.1.patch > > > Describe table output will show comments incorrectly rather than the unicode > itself. > {noformat} > hive> desc formatted t1; > # Detailed Table Information > Table Type: MANAGED_TABLE > Table Parameters: > COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"} > comment \u8868\u4E2D\u6587\u6D4B\u8BD5 > numFiles0 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14000) (ORC) Changing a numeric type column of a partitioned table to lower type set values to something other than 'NULL'
[ https://issues.apache.org/jira/browse/HIVE-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330266#comment-15330266 ] Matt McCline commented on HIVE-14000: - About half of the change is eliminating unused members and parameters that were left over after the row-reading parts of all the tree readers were removed. > (ORC) Changing a numeric type column of a partitioned table to lower type set > values to something other than 'NULL' > --- > > Key: HIVE-14000 > URL: https://issues.apache.org/jira/browse/HIVE-14000 > Project: Hive > Issue Type: Bug > Components: Hive, ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14000.01.patch > > > When an integer column is changed to a type that is smaller (e.g. bigint to > int) and set hive.metastore.disallow.incompatible.col.type.changes=false, the > data is clipped instead of being NULL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14000) (ORC) Changing a numeric type column of a partitioned table to lower type set values to something other than 'NULL'
[ https://issues.apache.org/jira/browse/HIVE-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330251#comment-15330251 ] Matt McCline commented on HIVE-14000: - [~sershe] I added a link to RB. Yes, the only related failures are that I didn't update the MiniTez Q file outputs and mistakenly updated schema_evol_stats. > (ORC) Changing a numeric type column of a partitioned table to lower type set > values to something other than 'NULL' > --- > > Key: HIVE-14000 > URL: https://issues.apache.org/jira/browse/HIVE-14000 > Project: Hive > Issue Type: Bug > Components: Hive, ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14000.01.patch > > > When an integer column is changed to a type that is smaller (e.g. bigint to > int) and set hive.metastore.disallow.incompatible.col.type.changes=false, the > data is clipped instead of being NULL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14008) Duplicate line in LLAP SecretManager
[ https://issues.apache.org/jira/browse/HIVE-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14008: Resolution: Fixed Fix Version/s: 2.1.1 2.2.0 Status: Resolved (was: Patch Available) Committed to branches. > Duplicate line in LLAP SecretManager > > > Key: HIVE-14008 > URL: https://issues.apache.org/jira/browse/HIVE-14008 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Trivial > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14008.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
[ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330193#comment-15330193 ] Sergey Shelukhin commented on HIVE-13974: - Test failures look related. I'll look at the patch later today. > ORC Schema Evolution doesn't support add columns to non-last STRUCT columns > --- > > Key: HIVE-13974 > URL: https://issues.apache.org/jira/browse/HIVE-13974 > Project: Hive > Issue Type: Bug > Components: Hive, ORC, Transactions >Affects Versions: 1.3.0, 2.1.0, 2.2.0 >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13974.01.patch > > > Currently, the included columns are based on the fileSchema and not the > readerSchema which doesn't work for adding columns to non-last STRUCT data > type columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14000) (ORC) Changing a numeric type column of a partitioned table to lower type set values to something other than 'NULL'
[ https://issues.apache.org/jira/browse/HIVE-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330187#comment-15330187 ] Sergey Shelukhin commented on HIVE-14000: - Is it possible to have an RB? Also, test failures look related. > (ORC) Changing a numeric type column of a partitioned table to lower type set > values to something other than 'NULL' > --- > > Key: HIVE-14000 > URL: https://issues.apache.org/jira/browse/HIVE-14000 > Project: Hive > Issue Type: Bug > Components: Hive, ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14000.01.patch > > > When an integer column is changed to a type that is smaller (e.g. bigint to > int) and set hive.metastore.disallow.incompatible.col.type.changes=false, the > data is clipped instead of being NULL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330182#comment-15330182 ] Sergey Shelukhin commented on HIVE-13957: - Committed there too. Thanks! > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 1.3.0, 2.2.0, 2.1.1, 2.0.2 > > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
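The cast-side inconsistency described in this issue can be shown outside Hive with plain Java: depending on whether the conversion is applied to the decimal column value or to the string IN() list items, the same membership test gives different answers. The sketch below is illustrative only and does not reproduce Hive's actual vectorizer or type-coercion code.

```java
import java.math.BigDecimal;

// Illustrates why it matters which side of an IN() predicate the cast is
// applied to: "0.0" and "0" are equal as decimals but not as strings.
public class InListCast {
    // Cast applied to the IN() list: parse each string item as a decimal
    // and compare numerically against the decimal column value.
    public static boolean inCastingList(BigDecimal col, String... list) {
        for (String item : list) {
            if (col.compareTo(new BigDecimal(item)) == 0) return true;
        }
        return false;
    }

    // Cast applied to the column: render the decimal as a string and
    // compare textually against the unconverted list items.
    public static boolean inCastingColumn(BigDecimal col, String... list) {
        for (String item : list) {
            if (col.toPlainString().equals(item)) return true;
        }
        return false;
    }
}
```

For a decimal column holding 0.0 and the predicate IN ('0'), one strategy matches and the other does not, so two execution paths using different strategies produce different query results.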
[jira] [Updated] (HIVE-14012) some ColumnVector-s are missing ensureSize
[ https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14012: Reporter: Takahiko Saito (was: Sergey Shelukhin) > some ColumnVector-s are missing ensureSize > -- > > Key: HIVE-14012 > URL: https://issues.apache.org/jira/browse/HIVE-14012 > Project: Hive > Issue Type: Bug >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Attachments: HIVE-14012.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13957: Fix Version/s: 2.1.1 > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 1.3.0, 2.2.0, 2.1.1, 2.0.2 > > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-13957: Target Version/s: (was: 2.1.1) > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 1.3.0, 2.2.0, 2.0.2 > > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13648) ORC Schema Evolution doesn't support same type conversion for VARCHAR, CHAR, or DECIMAL when maxLength or precision/scale is different
[ https://issues.apache.org/jira/browse/HIVE-13648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330181#comment-15330181 ] Prasanth Jayachandran commented on HIVE-13648: -- [~mmccline] I can see precision and scale being enforced in the decimal conversion, but I don't see the change in maxLength being enforced for the char/varchar conversion reader. Also, the fileType argument passed to StringGroupFromStringGroupTreeReader seems to be unused. > ORC Schema Evolution doesn't support same type conversion for VARCHAR, CHAR, > or DECIMAL when maxLength or precision/scale is different > -- > > Key: HIVE-13648 > URL: https://issues.apache.org/jira/browse/HIVE-13648 > Project: Hive > Issue Type: Bug >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13648.01.patch, HIVE-13648.02.patch > > > E.g. when a data file that is copied in has a VARCHAR maxLength that doesn't match > the DDL's maxLength. This error is produced: > {code} > java.io.IOException: ORC does not support type conversion from file type > varchar(145) (36) to reader type varchar(114) (36) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14012) some ColumnVector-s are missing ensureSize
[ https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330176#comment-15330176 ] Sergey Shelukhin commented on HIVE-14012: - [~owen.omalley] can you comment on the above? I understand you added the complex type vectors. > some ColumnVector-s are missing ensureSize > -- > > Key: HIVE-14012 > URL: https://issues.apache.org/jira/browse/HIVE-14012 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14012.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14006) Hive query with UNION ALL fails with ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-14006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam reassigned HIVE-14006: Assignee: Naveen Gangam > Hive query with UNION ALL fails with ArrayIndexOutOfBoundsException > --- > > Key: HIVE-14006 > URL: https://issues.apache.org/jira/browse/HIVE-14006 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > > set hive.cbo.enable=false; > DROP VIEW IF EXISTS a_view; > DROP TABLE IF EXISTS table_a1; > DROP TABLE IF EXISTS table_a2; > DROP TABLE IF EXISTS table_b1; > DROP TABLE IF EXISTS table_b2; > CREATE TABLE table_a1 > (composite_key STRING); > CREATE TABLE table_a2 > (composite_key STRING); > CREATE TABLE table_b1 > (composite_key STRING, col1 STRING); > CREATE TABLE table_b2 > (composite_key STRING); > CREATE VIEW a_view AS > SELECT > substring(a1.composite_key, 1, locate('|',a1.composite_key) - 1) AS autoname, > NULL AS col1 > FROM table_a1 a1 > FULL OUTER JOIN table_a2 a2 > ON a1.composite_key = a2.composite_key > UNION ALL > SELECT > substring(b1.composite_key, 1, locate('|',b1.composite_key) - 1) AS autoname, > b1.col1 AS col1 > FROM table_b1 b1 > FULL OUTER JOIN table_b2 b2 > ON b1.composite_key = b2.composite_key; > INSERT INTO TABLE table_b1 > SELECT * FROM ( > SELECT 'something|awful', 'col1' > )s ; > SELECT autoname > FROM a_view > WHERE autoname='something'; > fails with > Diagnostic Messages for this Task: > Error: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row {"_col0":"something"} > at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row {"_col0":"something"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507) > at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) > ... 8 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hive.ql.exec.UnionOperator.processOp(UnionOperator.java:134) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) > at > org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497) > The same query succeeds when {{hive.ppd.remove.duplicatefilters=false}} with > or without CBO on. It also succeeds with just CBO on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
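The workaround named at the end of the report can be applied per session. A minimal sketch, using only the setting and query shape taken from the report itself:

```sql
-- Per-session workaround from the report: disabling duplicate-filter
-- removal in predicate pushdown avoids the ArrayIndexOutOfBoundsException
-- for this UNION ALL query shape (with or without CBO).
set hive.ppd.remove.duplicatefilters=false;
SELECT autoname FROM a_view WHERE autoname='something';
```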
[jira] [Updated] (HIVE-14012) some ColumnVector-s are missing ensureSize
[ https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14012: Status: Patch Available (was: Open) > some ColumnVector-s are missing ensureSize > -- > > Key: HIVE-14012 > URL: https://issues.apache.org/jira/browse/HIVE-14012 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14012.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14012) some ColumnVector-s are missing ensureSize
[ https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14012: Attachment: HIVE-14012.patch [~prasanth_j] [~mmccline] can you take a look? Also, do List and Map vectors need ensureSize? It doesn't look like it, but I wonder if something needs to be done with child vectors. > some ColumnVector-s are missing ensureSize > -- > > Key: HIVE-14012 > URL: https://issues.apache.org/jira/browse/HIVE-14012 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14012.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
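For context, a hedged sketch of what an ensureSize-style grow operation on a column vector typically does. The class below is a simplified stand-in, not Hive's actual ColumnVector code (real subclasses also carry isNull/noNulls/isRepeating state):

```java
// Simplified stand-in for a vectorized column buffer.
class SimpleLongColumnVector {
    long[] vector;

    SimpleLongColumnVector(int size) {
        vector = new long[size];
    }

    // Grow the backing array to at least 'size' entries, optionally
    // preserving existing data. A ColumnVector subclass that forgets to
    // override this keeps its old, too-small buffer and overflows later.
    void ensureSize(int size, boolean preserveData) {
        if (vector.length >= size) {
            return; // already large enough, nothing to do
        }
        long[] old = vector;
        vector = new long[size];
        if (preserveData) {
            System.arraycopy(old, 0, vector, 0, old.length);
        }
    }
}
```

Shrinking is never performed: asking for a smaller size is a no-op, which keeps previously written values valid.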
[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13985: - Attachment: HIVE-13985-branch-2.1.patch HIVE-13985-branch-1.patch Attaching branch-1 and branch-2.1 patches > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-2.1.patch, > HIVE-13985.1.patch, HIVE-13985.2.patch > > > HIVE-13840 fixed some issues with additional file system invocations during > split generation. Similarly, this jira will fix issues with additional file > system invocations on the task side. To avoid reading footers on the task > side, users can set hive.orc.splits.include.file.footer to true which will > serialize the orc footers on the splits. But this has issues with serializing > unwanted information like column statistics and other metadata which are not > really required for reading an orc split on the task side. We can reduce the > payload on the orc splits by serializing only the minimum required > information (stripe information, types, compression details). This will > decrease the payload on the orc splits and can potentially avoid OOMs in > application master (AM) during split generation. This jira also addresses other > issues concerning the AM cache. The local cache used by the AM is a soft reference > cache. This can introduce unpredictability across multiple runs of the same > query. We can cache the serialized footer in the local cache and also use a > strong reference cache which should avoid memory pressure and will have > better predictability. 
> One other improvement that we can do is when > hive.orc.splits.include.file.footer is set to false, on the task side we make > one additional file system call to know the size of the file. If we can > serialize the file length in the orc split this can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
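The soft- vs strong-reference point above can be sketched as follows (illustrative only, not the actual AM cache code): entries in a bounded strong-reference LRU survive until evicted by the size policy, so behavior is deterministic across runs, whereas SoftReference entries may vanish whenever the GC runs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded strong-reference LRU cache: entries are evicted only by the
// explicit size policy, never silently by the garbage collector.
class FooterCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    FooterCache(int maxEntries) {
        super(16, 0.75f, true); // access-order = true gives LRU semantics
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // drop least-recently-used entry
    }
}
```

Caching the already-serialized footer bytes (rather than large deserialized objects) keeps the memory footprint of such a strong-reference cache predictable.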
[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems
[ https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330108#comment-15330108 ] Sergey Shelukhin commented on HIVE-13901: - Um, no +1 for now, some feedback on RB > Hivemetastore add partitions can be slow depending on filesystems > - > > Key: HIVE-13901 > URL: https://issues.apache.org/jira/browse/HIVE-13901 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch > > > Depending on FS, creating external tables & adding partitions can be > expensive (e.g msck which adds all partitions). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13985: - Attachment: HIVE-13985.2.patch > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13985.1.patch, HIVE-13985.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329890#comment-15329890 ] Hive QA commented on HIVE-14007: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12810403/HIVE-14007.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 35 failed/errored test(s), 9438 tests executed *Failed tests:* {noformat} TestBitFieldReader - did not produce a TEST-*.xml file TestBitPack - did not produce a TEST-*.xml file TestColumnStatistics - did not produce a TEST-*.xml file TestColumnStatisticsImpl - did not produce a TEST-*.xml file TestDataReaderProperties - did not produce a TEST-*.xml file TestDynamicArray - did not produce a TEST-*.xml file TestFileDump - did not produce a TEST-*.xml file TestInStream - did not produce a TEST-*.xml file TestIntegerCompressionReader - did not produce a TEST-*.xml file TestJsonFileDump - did not produce a TEST-*.xml file TestMemoryManager - did not produce a TEST-*.xml file TestNewIntegerEncoding - did not produce a TEST-*.xml file TestOrcNullOptimization - did not produce a TEST-*.xml file TestOrcTimezone1 - did not produce a TEST-*.xml file TestOrcTimezone2 - did not produce a TEST-*.xml file TestOrcWideTable - did not produce a TEST-*.xml file TestOutStream - did not produce a TEST-*.xml file TestRLEv2 - did not produce a TEST-*.xml file TestReaderImpl - did not produce a TEST-*.xml file TestRecordReaderImpl - did not produce a TEST-*.xml file TestRunLengthByteReader - did not produce a TEST-*.xml file TestRunLengthIntegerReader - did not produce a TEST-*.xml file TestSerializationUtils - did not produce a TEST-*.xml file TestStreamName - did not produce a TEST-*.xml file TestStringDictionary - did not produce a TEST-*.xml file TestStringRedBlackTree - did not produce a TEST-*.xml file TestTypeDescription - did not produce a TEST-*.xml file 
TestUnrolledBitPack - did not produce a TEST-*.xml file TestVectorOrcFile - did not produce a TEST-*.xml file TestZlib - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.ql.TestTxnCommands.testSimpleAcidInsert {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/123/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/123/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-123/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 35 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12810403 - PreCommit-HIVE-MASTER-Build > Replace ORC module with ORC release > --- > > Key: HIVE-14007 > URL: https://issues.apache.org/jira/browse/HIVE-14007 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > Attachments: HIVE-14007.patch > > > This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329862#comment-15329862 ] Prasanth Jayachandran commented on HIVE-13985: -- Targeting this patch for branch-2.1. Since orc is moving out in HIVE-14007 will wait for master commit. [~ashutoshc] fyi. > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13985.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13985: - Attachment: HIVE-13985.1.patch > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13985.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13985: - Status: Patch Available (was: Open) > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13985.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13951) GenericUDFArray should constant fold at compile time
[ https://issues.apache.org/jira/browse/HIVE-13951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Zadoroshnyak updated HIVE-13951: --- Priority: Critical (was: Major) > GenericUDFArray should constant fold at compile time > > > Key: HIVE-13951 > URL: https://issues.apache.org/jira/browse/HIVE-13951 > Project: Hive > Issue Type: Bug > Components: UDF >Affects Versions: 1.3.0, 2.1.0 >Reporter: Sergey Zadoroshnyak >Priority: Critical > > 1. Hive constant propagation optimizer is enabled. > hive.optimize.constant.propagation=true; > 2. Hive query: > select array('Total','Total') from some_table; > ERROR: org.apache.hadoop.hive.ql.optimizer.ConstantPropagateProcFactory > (ConstantPropagateProcFactory.java:evaluateFunction(939)) - Unable to > evaluate org.apache.hadoop.hive.ql.udf.generic.GenericUDFArray@3d26c423. > Return value unrecoginizable. > Details: > During compilation of a query, Hive checks if any subexpression of a specified > expression can be evaluated to be constant and replaces such a subexpression > with the constant. > If the expression is a deterministic UDF and all the subexpressions are > constants, the value will be calculated immediately at compilation time > (not runtime). > So array is a deterministic UDF and 'Total' is a string constant, so Hive tries > to replace the result of evaluating the UDF with a constant. > But it looks like Hive only supports primitives and struct objects. > So, array is not supported yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
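The folding rule the optimizer applies can be sketched generically. This is an illustrative stand-in, not Hive's actual ConstantPropagateProcFactory logic; the class and method names below are invented for the example:

```java
import java.util.List;
import java.util.function.Function;

// Illustrative constant folding: if a function is deterministic and all
// of its arguments are already known constants, evaluate it once at
// compile time and substitute the resulting literal.
class ConstantFolder {
    // 'null' in the argument list stands for a non-constant argument;
    // a null return means "leave the expression for runtime evaluation".
    static Object foldIfConstant(boolean deterministic,
                                 List<Object> args,
                                 Function<List<Object>, Object> udf) {
        boolean allConstant = args.stream().noneMatch(a -> a == null);
        if (deterministic && allConstant) {
            return udf.apply(args); // fold now instead of at runtime
        }
        return null;
    }
}
```

The ticket's point is that even when both conditions hold for array('Total','Total'), the optimizer cannot represent the folded result, because only primitive and struct constants are supported, not list constants.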
[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in
[ https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329638#comment-15329638 ] Sergio Peña commented on HIVE-13964: Thanks [~ayousufi]. This is working very well. Now the only problem is the test {{TestBeeLineWithArgs}} that fails on HiveQA. Whenever you see a {{did not produce a TEST-*.xml file}} message, it means that the test was taking too long and PTest had to kill the process. Currently, we have 40m of expiration time to run a test. Could you take a look at it? Maybe there are some tests that are waiting for user/pass to be passed, and they are hanging the test execution. > Add a parameter to beeline to allow a properties file to be passed in > - > > Key: HIVE-13964 > URL: https://issues.apache.org/jira/browse/HIVE-13964 > Project: Hive > Issue Type: New Feature > Components: Beeline >Affects Versions: 2.0.1 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch, > HIVE-13964.03.patch, HIVE-13964.04.patch > > > HIVE-6652 removed the ability to pass in a properties file as a beeline > parameter. It may be a useful feature to be able to pass the file in as a > parameter, such as --property-file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-14007: - Attachment: HIVE-14007.patch This patch makes the change and deletes the files. > Replace ORC module with ORC release > --- > > Key: HIVE-14007 > URL: https://issues.apache.org/jira/browse/HIVE-14007 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > Attachments: HIVE-14007.patch > > > This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14007) Replace ORC module with ORC release
[ https://issues.apache.org/jira/browse/HIVE-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-14007: - Status: Patch Available (was: Open) > Replace ORC module with ORC release > --- > > Key: HIVE-14007 > URL: https://issues.apache.org/jira/browse/HIVE-14007 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.2.0 >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > > This completes moving the core ORC reader & writer to the ORC project. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13928) Hive2: float value need to be single quoted inside where clause to return rows when it doesn't have to be
[ https://issues.apache.org/jira/browse/HIVE-13928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329567#comment-15329567 ] Takahiko Saito commented on HIVE-13928: --- I don't think anyone is working on it. Cc: [~mmccline] [~jdere] > Hive2: float value need to be single quoted inside where clause to return > rows when it doesn't have to be > - > > Key: HIVE-13928 > URL: https://issues.apache.org/jira/browse/HIVE-13928 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Takahiko Saito >Priority: Critical > > The below select where with float value does not return any row: > {noformat} > 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> drop table test; > No rows affected (0.212 seconds) > 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> create table test (f float); > No rows affected (1.131 seconds) > 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> insert into table test values > (-35664.76),(29497.34); > No rows affected (2.482 seconds) > 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> select * from test; > ++--+ > | test.f | > ++--+ > | -35664.76 | > | 29497.34 | > ++--+ > 2 rows selected (0.142 seconds) > 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> select * from test where f = > -35664.76; > +-+--+ > | test.f | > +-+--+ > +-+--+ > {noformat} > The workaround is to single quote float value: > {noformat} > 0: jdbc:hive2://os-r7-mvjkcu-hiveserver2-11-4> select * from test where f = > '-35664.76'; > ++--+ > | test.f | > ++--+ > | -35664.76 | > ++--+ > 1 row selected (0.163 seconds) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
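A likely root cause (an assumption, not confirmed in the ticket): the bare literal -35664.76 is parsed as a DOUBLE, and widening the stored FLOAT value to DOUBLE produces a slightly different number, so the equality never matches. The effect is easy to reproduce in plain Java:

```java
class FloatLiteralDemo {
    public static void main(String[] args) {
        float storedFloat = -35664.76f; // what the FLOAT column holds
        double literal = -35664.76;     // how the bare SQL literal is parsed

        // -35664.76 has no exact binary representation, so widening the
        // float does NOT recover the double literal's value; the predicate
        // f = -35664.76 therefore compares two unequal doubles.
        System.out.println((double) storedFloat == literal); // false

        // Quoting the literal ('-35664.76') lets Hive cast it to FLOAT
        // instead, so both sides round the same way and compare equal.
        System.out.println(storedFloat == (float) literal);
    }
}
```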
[jira] [Commented] (HIVE-13966) DbNotificationListener: can lose DDL operation notifications
[ https://issues.apache.org/jira/browse/HIVE-13966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329382#comment-15329382 ] Reuben Kuhnert commented on HIVE-13966: --- Looking at this pattern in a number of metastore functions: {code} if (!success) { ms.rollbackTransaction(); if (madeDir) { wh.deleteDir(tblPath, true); } } for (MetaStoreEventListener listener : listeners) { CreateTableEvent createTableEvent = new CreateTableEvent(tbl, success, this); createTableEvent.setEnvironmentContext(envContext); listener.onCreateTable(createTableEvent); } {code} I'm noticing that {{DBNotificationListener}} is a subclass of {{MetastoreEventListener}}. When you say we should not require bringing all post event listeners into the transaction (but we do want to bring in {{DbNotificationListener}}), would that mean having a separate hierarchy for those listeners that *should* be part of the transaction? Is that what is meant by 'synchronous' (part of the transaction) or do we mean 'synchronous' as in not queued for processing later, per: {code} * Design overview: This listener takes any event, builds a NotificationEventResponse, * and puts it on a queue. There is a dedicated thread that reads entries from the queue and * places them in the database. The reason for doing it in a separate thread is that we want to * avoid slowing down other metadata operations with the work of putting the notification into * the database. Also, occasionally the thread needs to clean the database of old records. We * definitely don't want to do that as part of another metadata operation. */ public class DbNotificationListener extends MetaStoreEventListener { {code} > DbNotificationListener: can loose DDL operation notifications > - > > Key: HIVE-13966 > URL: https://issues.apache.org/jira/browse/HIVE-13966 > Project: Hive > Issue Type: Bug > Components: HCatalog >Reporter: Nachiket Vaidya >Priority: Critical > > The code for each API in HiveMetaStore.java is like this: > 1. 
openTransaction() > 2. -- operation-- > 3. commit() or rollback() based on the result of the operation. > 4. add an entry to the notification log (unconditionally) > If the operation failed (in step 2), we still add an entry to the notification > log. Found this issue in testing. > It is still ok as this is a case of a false positive. > If the operation is successful and adding to the notification log failed, the > user will get a MetaException. It will not roll back the operation, as it is > already committed. We need to handle this case so that we will not have false > negatives. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13735) Query involving only partition columns need not launch mr/tez job
[ https://issues.apache.org/jira/browse/HIVE-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329349#comment-15329349 ] Rajesh Balamohan commented on HIVE-13735: - Thanks [~Takuma] - I have updated the assignee. > Query involving only partition columns need not launch mr/tez job > - > > Key: HIVE-13735 > URL: https://issues.apache.org/jira/browse/HIVE-13735 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Takuma Wakamori > > codebase: hive master > dataset: tpc-ds 10 TB scale > e.g queries: > {noformat} > hive> show partitions web_sales; > ... > ... > Time taken: 0.13 seconds, Fetched: 1824 row(s) > hive> select distinct ws_sold_date_sk from web_sales; > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 1 .. container SUCCEEDED 1 100 > 0 0 > Reducer 2 .. container SUCCEEDED 1 100 > 0 0 > -- > VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 2.70 s > -- > Status: DAG finished successfully in 2.70 seconds > .. > Time taken: 3.964 seconds, Fetched: 1824 row(s) > hive> select distinct ws_sold_date_sk from web_sales order by ws_sold_date_sk; > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 1 .. container SUCCEEDED80180100 > 0 0 > Reducer 2 .. container SUCCEEDED 1 100 > 0 0 > Reducer 3 .. container SUCCEEDED 1 100 > 0 0 > -- > VERTICES: 03/03 [==>>] 100% ELAPSED TIME: 23.05 s > -- > Status: DAG finished successfully in 23.05 seconds > ... > Time taken: 27.095 seconds, Fetched: 1824 row(s) > {noformat} > since the info is already available in metastore, it might not need to launch > these jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
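As a related experiment (an assumption: availability and coverage of this flag vary by Hive version), Hive has a metadata-only optimization setting that can answer some partition-column-only queries from the metastore without scanning data files; whether it covers this DISTINCT case is exactly what the ticket asks about:

```sql
-- Session setting to try for this case; hive.optimize.metadataonly exists
-- in HiveConf, but whether it short-circuits DISTINCT on a partition
-- column depends on the Hive version.
set hive.optimize.metadataonly=true;
select distinct ws_sold_date_sk from web_sales;
```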
[jira] [Updated] (HIVE-13735) Query involving only partition columns need not launch mr/tez job
[ https://issues.apache.org/jira/browse/HIVE-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-13735: Assignee: Takuma Wakamori > Query involving only partition columns need not launch mr/tez job > - > > Key: HIVE-13735 > URL: https://issues.apache.org/jira/browse/HIVE-13735 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Takuma Wakamori > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13903) getFunctionInfo is downloading jar on every call
[ https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13903: --- Fix Version/s: 2.2.0 > getFunctionInfo is downloading jar on every call > > > Key: HIVE-13903 > URL: https://issues.apache.org/jira/browse/HIVE-13903 > Project: Hive > Issue Type: Bug >Reporter: Rajat Khandelwal >Assignee: Rajat Khandelwal > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch, > HIVE-13903.02.patch > > > On queries using permanent UDFs, the jar file of the UDF is downloaded > multiple times, with each call originating from Registry.getFunctionInfo. This > increases time for the query, especially if that query is just an explain > query. The jar should be downloaded once, and not downloaded again if the UDF > class is accessible in the current thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329237#comment-15329237 ] Jesus Camacho Rodriguez commented on HIVE-13982: [~ashutoshc], fails are unrelated. Could you review the patch? Thanks > Extensions to RS dedup: execute with different column order and sorting > direction if possible > - > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > RS dedup should kick in for these cases, avoiding an additional shuffle stage. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329225#comment-15329225 ] Jesus Camacho Rodriguez commented on HIVE-13957: [~sershe], the fix can be pushed to branch-2.1 and the fix version set to 2.1.1. About 2.1.0, it is waiting for your vote! :p Thanks > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 1.3.0, 2.2.0, 2.0.2 > > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
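Why the cast direction matters can be shown with plain BigDecimal: casting the string constants to decimal compares numeric values, while casting the decimal column to string compares text, and the two can disagree. This is only an illustration of the mismatch, not the actual vectorized code path:

```java
// Hedged illustration: for decimal IN (string), the side that gets cast
// determines whether the comparison is numeric or textual.
import java.math.BigDecimal;

public class InCastDirection {
    // Cast the string constant to decimal: numeric comparison.
    static boolean equalAsDecimal(BigDecimal column, String constant) {
        return column.compareTo(new BigDecimal(constant)) == 0;
    }

    // Cast the decimal column to string instead: textual comparison.
    static boolean equalAsString(BigDecimal column, String constant) {
        return column.toPlainString().equals(constant);
    }

    public static void main(String[] args) {
        BigDecimal column = new BigDecimal("2.0");
        System.out.println(equalAsDecimal(column, "2")); // true: 2.0 == 2 numerically
        System.out.println(equalAsString(column, "2"));  // false: "2.0" != "2"
    }
}
```

When the vectorized and non-vectorized paths pick opposite directions, `column IN ('2')` can match rows under one plan and miss them under the other, which is exactly the incorrect-results risk the issue describes.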
[jira] [Commented] (HIVE-13735) Query involving only partition columns need not launch mr/tez job
[ https://issues.apache.org/jira/browse/HIVE-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329101#comment-15329101 ] Takuma Wakamori commented on HIVE-13735: Hi, [~rajesh.balamohan]. If there is no one working on this issue, I will try. However, I imagine that my solution may be ad-hoc; the corresponding code will be executed only under the condition like: (the query is select) && (have distinct clause) && (select_expr is a partitioning key) && (...). If it is OK, could anyone assign me to this issue? Thanks. > Query involving only partition columns need not launch mr/tez job > - > > Key: HIVE-13735 > URL: https://issues.apache.org/jira/browse/HIVE-13735 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan > > codebase: hive master > dataset: tpc-ds 10 TB scale > e.g queries: > {noformat} > hive> show partitions web_sales; > ... > ... > Time taken: 0.13 seconds, Fetched: 1824 row(s) > hive> select distinct ws_sold_date_sk from web_sales; > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 1 .. container SUCCEEDED 1 100 > 0 0 > Reducer 2 .. container SUCCEEDED 1 100 > 0 0 > -- > VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 2.70 s > -- > Status: DAG finished successfully in 2.70 seconds > .. > Time taken: 3.964 seconds, Fetched: 1824 row(s) > hive> select distinct ws_sold_date_sk from web_sales order by ws_sold_date_sk; > -- > VERTICES MODESTATUS TOTAL COMPLETED RUNNING PENDING > FAILED KILLED > -- > Map 1 .. container SUCCEEDED80180100 > 0 0 > Reducer 2 .. container SUCCEEDED 1 100 > 0 0 > Reducer 3 .. container SUCCEEDED 1 100 > 0 0 > -- > VERTICES: 03/03 [==>>] 100% ELAPSED TIME: 23.05 s > -- > Status: DAG finished successfully in 23.05 seconds > ... > Time taken: 27.095 seconds, Fetched: 1824 row(s) > {noformat} > since the info is already available in metastore, it might not need to launch > these jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
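The condition [~wakaman] outlines can be sketched as a single predicate: a SELECT DISTINCT whose expressions are all partition keys can be answered from partition metadata without launching a job. The names below are illustrative, not Hive's planner code:

```java
// Hedged sketch of the metadata-only eligibility check described above.
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MetadataOnlyCheck {
    static boolean canUseMetastoreOnly(boolean isSelectDistinct,
                                       List<String> selectExprs,
                                       Set<String> partitionKeys) {
        // Every selected expression must be a partitioning key; DISTINCT makes
        // the partition list itself the exact answer.
        return isSelectDistinct && partitionKeys.containsAll(selectExprs);
    }

    public static void main(String[] args) {
        Set<String> partKeys = new HashSet<>(Arrays.asList("ws_sold_date_sk"));
        System.out.println(canUseMetastoreOnly(true,
            Arrays.asList("ws_sold_date_sk"), partKeys)); // true: metastore only
        System.out.println(canUseMetastoreOnly(true,
            Arrays.asList("ws_item_sk"), partKeys));      // false: needs a scan
    }
}
```

As the {noformat} output above shows, the distinct-partition-column query already returns the same 1824 rows as `show partitions`, so short-circuiting to the metastore would save the entire DAG.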
[jira] [Commented] (HIVE-14003) queries running against llap hang at times - preemption issues
[ https://issues.apache.org/jira/browse/HIVE-14003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329052#comment-15329052 ] Siddharth Seth commented on HIVE-14003: --- [~hagleitn] - mind taking a look at the patch, and providing some more information on dummyOps / mergeOps? An interrupt would ideally stop an operation - however it's really a suggestion, and we cannot rely on libraries to handle them correctly. I suspect most of Hadoop has issues here. An HDFS jira was created and has already been fixed. The abort flag serves to protect against operations which reset the interrupt status - which is where the avoid blocking op comment comes in. In most cases we'll be OK with an abort flag check. > queries running against llap hang at times - preemption issues > -- > > Key: HIVE-14003 > URL: https://issues.apache.org/jira/browse/HIVE-14003 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0 >Reporter: Takahiko Saito >Assignee: Siddharth Seth > Attachments: HIVE-14003.01.patch > > > The preemption logic in the Hive processor needs some more work. There are > definitely windows where the abort flag is completely dropped within the Hive > processor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13986) LLAP: kill Tez AM on token errors from plugin
[ https://issues.apache.org/jira/browse/HIVE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329021#comment-15329021 ] Siddharth Seth commented on HIVE-13986: --- +1 > LLAP: kill Tez AM on token errors from plugin > - > > Key: HIVE-13986 > URL: https://issues.apache.org/jira/browse/HIVE-13986 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13986.01.patch, HIVE-13986.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14008) Duplicate line in LLAP SecretManager
[ https://issues.apache.org/jira/browse/HIVE-14008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329016#comment-15329016 ] Siddharth Seth commented on HIVE-14008: --- +1 > Duplicate line in LLAP SecretManager > > > Key: HIVE-14008 > URL: https://issues.apache.org/jira/browse/HIVE-14008 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Trivial > Attachments: HIVE-14008.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14010) parquet-logging.properties from HIVE_CONF_DIR should be used when available
[ https://issues.apache.org/jira/browse/HIVE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14010: - Attachment: HIVE-14010.1.patch [~hagleitn] Can you please take a look? It's related to the previous logging issue. > parquet-logging.properties from HIVE_CONF_DIR should be used when available > --- > > Key: HIVE-14010 > URL: https://issues.apache.org/jira/browse/HIVE-14010 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14010.1.patch > > > Following up on HIVE-13954, when parquet-logging.properties is available in > HIVE_CONF_DIR it should be used first. When not available, fall back to the > relative path from the bin directory. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14010) parquet-logging.properties from HIVE_CONF_DIR should be used when available
[ https://issues.apache.org/jira/browse/HIVE-14010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14010: - Description: Following up on HIVE-13954, when parquet-logging.properties is available in HIVE_CONF_DIR it should be used first. When not available, fall back to the relative path from the bin directory. NO PRECOMMIT TESTS was:Following up on HIVE-13954, when parquet-logging.properties is available in HIVE_CONF_DIR it should be used first. When not available, fall back to the relative path from the bin directory. > parquet-logging.properties from HIVE_CONF_DIR should be used when available > --- > > Key: HIVE-14010 > URL: https://issues.apache.org/jira/browse/HIVE-14010 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > Following up on HIVE-13954, when parquet-logging.properties is available in > HIVE_CONF_DIR it should be used first. When not available, fall back to the > relative path from the bin directory. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
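The lookup order described in HIVE-14010 amounts to a two-step resolution: prefer the copy under HIVE_CONF_DIR, otherwise fall back to the path relative to the bin directory. A minimal sketch, with hypothetical paths and method names (not the actual patch):

```java
// Hedged sketch of the config-file resolution order described above.
import java.io.File;

public class LoggingPropsResolver {
    static File resolve(String hiveConfDir, String binDir) {
        if (hiveConfDir != null) {
            File confCopy = new File(hiveConfDir, "parquet-logging.properties");
            if (confCopy.exists()) {
                return confCopy; // config-dir copy wins when present
            }
        }
        // Fallback: path relative to the bin directory (illustrative layout).
        return new File(binDir, "../conf/parquet-logging.properties");
    }

    public static void main(String[] args) {
        // With no HIVE_CONF_DIR set, the relative fallback is used.
        System.out.println(resolve(null, "bin").getPath());
    }
}
```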
[jira] [Commented] (HIVE-13956) LLAP: external client output is writing to channel before it is writable again
[ https://issues.apache.org/jira/browse/HIVE-13956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328992#comment-15328992 ] Prasanth Jayachandran commented on HIVE-13956: -- [~jdere] Can you please rebase the patch? The test failures look unrelated. > LLAP: external client output is writing to channel before it is writable again > -- > > Key: HIVE-13956 > URL: https://issues.apache.org/jira/browse/HIVE-13956 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-13956.1.patch > > > Rows are being written/flushed on the output channel without checking if the > channel is writable. Introduce a writability check/wait. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
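The writability check/wait called for above can be sketched against a minimal hypothetical channel interface; the real code targets a Netty channel and its writability signal, and would park on a writability-changed event rather than poll, so everything here is illustrative:

```java
// Hedged sketch: block (with a timeout) until the channel can accept more
// data before writing, instead of flushing rows into a full outbound buffer.
import java.util.concurrent.TimeUnit;

public class WritableWait {
    interface OutputChannel {
        boolean isWritable();
        void write(byte[] row);
    }

    /** Wait until the channel is writable, then write; false on timeout. */
    static boolean writeWhenWritable(OutputChannel ch, byte[] row, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!ch.isWritable()) {
            if (System.nanoTime() > deadline) {
                return false; // give up rather than overflow the outbound buffer
            }
            Thread.sleep(1); // real code would wait on a writability-changed callback
        }
        ch.write(row);
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Toy channel that becomes writable after a few polls.
        OutputChannel ch = new OutputChannel() {
            int polls = 0;
            public boolean isWritable() { return ++polls > 3; }
            public void write(byte[] row) { System.out.println("wrote " + row.length + " bytes"); }
        };
        System.out.println(writeWhenWritable(ch, new byte[8], 1000)); // true
    }
}
```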
[jira] [Commented] (HIVE-13964) Add a parameter to beeline to allow a properties file to be passed in
[ https://issues.apache.org/jira/browse/HIVE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328989#comment-15328989 ] Hive QA commented on HIVE-13964: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12810164/HIVE-13964.04.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10191 tests executed *Failed tests:* {noformat} TestBeeLineWithArgs - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/120/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/120/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-120/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12810164 - PreCommit-HIVE-MASTER-Build > Add a parameter to beeline to allow a properties file to be passed in > - > > Key: HIVE-13964 > URL: https://issues.apache.org/jira/browse/HIVE-13964 > Project: Hive > Issue Type: New Feature > Components: Beeline >Affects Versions: 2.0.1 >Reporter: Abdullah Yousufi >Assignee: Abdullah Yousufi >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-13964.01.patch, HIVE-13964.02.patch, > HIVE-13964.03.patch, HIVE-13964.04.patch > > > HIVE-6652 removed the ability to pass in a properties file as a beeline > parameter. It may be a useful feature to be able to pass the file in as a > parameter, such as --property-file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
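What a --property-file option boils down to can be sketched with java.util.Properties: parse key=value pairs and apply them as connection settings. The class name and keys below are illustrative, not Beeline's actual implementation:

```java
// Hedged sketch: parse properties-file contents of the kind a --property-file
// flag would point at (a file path would feed the same parser via a Reader).
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertyFileLoad {
    static Properties load(String contents) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(contents));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties p = load("url=jdbc:hive2://localhost:10000\n"
                          + "driver=org.apache.hive.jdbc.HiveDriver\n");
        System.out.println(p.getProperty("url")); // jdbc:hive2://localhost:10000
    }
}
```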
[jira] [Updated] (HIVE-13903) getFunctionInfo is downloading jar on every call
[ https://issues.apache.org/jira/browse/HIVE-13903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated HIVE-13903: --- Resolution: Fixed Fix Version/s: 2.1.1 Status: Resolved (was: Patch Available) Committed. Thanks [~prongs]. Thanks [~jcamachorodriguez] for review. > getFunctionInfo is downloading jar on every call > > > Key: HIVE-13903 > URL: https://issues.apache.org/jira/browse/HIVE-13903 > Project: Hive > Issue Type: Bug >Reporter: Rajat Khandelwal >Assignee: Rajat Khandelwal > Fix For: 2.1.1 > > Attachments: HIVE-13903.01.patch, HIVE-13903.01.patch, > HIVE-13903.02.patch > > > On queries using permanent UDFs, the jar file of the UDF is downloaded > multiple times, each call originating from Registry.getFunctionInfo. This > increases the time for the query, especially if that query is just an explain > query. The jar should be downloaded once, and not downloaded again if the UDF > class is accessible in the current thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)