[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3
[ https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416662#comment-15416662 ] Lefty Leverenz commented on HIVE-14270: --- Thanks for the fix, Sergio. +1 for the configuration parameters. > Write temporary data to HDFS when doing inserts on tables located on S3 > --- > > Key: HIVE-14270 > URL: https://issues.apache.org/jira/browse/HIVE-14270 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, > HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch, HIVE-14270.6.patch > > > Currently, when doing INSERT statements on tables located on S3, Hive writes > and reads temporary (or intermediate) files to S3 as well. > If HDFS is still the default filesystem on Hive, then we can keep such > temporary files on HDFS to keep things running faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
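A minimal sketch of how such a configuration parameter could look (the property name below is an assumption based on the patch discussion, not confirmed against the committed patch):
{noformat}
<!-- hive-site.xml: keep temporary/intermediate data on HDFS instead of S3.
     Property name is illustrative, taken from the HIVE-14270 discussion. -->
<property>
  <name>hive.blobstore.use.blobstore.as.scratchdir</name>
  <value>false</value>
</property>
{noformat}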
[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3
[ https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416644#comment-15416644 ] Hive QA commented on HIVE-14270: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12823131/HIVE-14270.6.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10410 tests executed *Failed tests:* {noformat} TestMsgBusConnection - did not produce a TEST-*.xml file TestQueryLifeTimeHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/848/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/848/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-848/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12823131 - PreCommit-HIVE-MASTER-Build > Write temporary data to HDFS when doing inserts on tables located on S3 > --- > > Key: HIVE-14270 > URL: https://issues.apache.org/jira/browse/HIVE-14270 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, > HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch, HIVE-14270.6.patch > > > Currently, when doing INSERT statements on tables located on S3, Hive writes > and reads temporary (or intermediate) files to S3 as well. > If HDFS is still the default filesystem on Hive, then we can keep such > temporary files on HDFS to keep things running faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases
[ https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416521#comment-15416521 ] Hive QA commented on HIVE-14511: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12823128/HIVE-14511.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10407 tests executed *Failed tests:* {noformat} TestMsgBusConnection - did not produce a TEST-*.xml file TestQueryLifeTimeHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testPartitionsCheck {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/847/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/847/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-847/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12823128 - PreCommit-HIVE-MASTER-Build > Improve MSCK for partitioned table to deal with special cases > - > > Key: HIVE-14511 > URL: https://issues.apache.org/jira/browse/HIVE-14511 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14511.01.patch > > > Some users will have a folder rather than a file under the last partition > folder. However, msck is going to search for the leaf folder rather than the > last partition folder. We need to improve that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
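To illustrate the special case described above, a hypothetical layout (paths are illustrative only) where the leaf-folder search goes wrong:
{noformat}
/warehouse/tbl/part=1/data_file              expected: files directly under the last partition folder
/warehouse/tbl/part=2/extra_dir/data_file    special case: a folder under the last partition folder;
                                             msck searches for the leaf folder 'extra_dir' instead of
                                             stopping at the last partition folder 'part=2'
{noformat}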
[jira] [Comment Edited] (HIVE-12546) Hive beeline doesn't support arrow keys and tab
[ https://issues.apache.org/jira/browse/HIVE-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416378#comment-15416378 ] Junjie Chen edited comment on HIVE-12546 at 8/11/16 2:22 AM: - Hi [~sershe], I also tried with the OS X (10.11.6) terminal; it works fine over ssh. Can we close this? was (Author: junjie): Hi [~sershe], I also tried with the OS X (10.11.6) terminal; it works fine over ssh. Could you please specify which version of Beeline and what environment you were using? It would be better if you could dump the full environment and describe the reproduction steps. > Hive beeline doesn't support arrow keys and tab > --- > > Key: HIVE-12546 > URL: https://issues.apache.org/jira/browse/HIVE-12546 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Sergey Shelukhin > > On CLI, up/down arrows navigate history, tab auto-completes, and left/right > arrows move around the command text. > Trying to use beeline, I see that these just print key codes or the tab into > the command text. > This should be fixed before removing CLI in favor of beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416380#comment-15416380 ] Hive QA commented on HIVE-14483: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12823125/HIVE-14483.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10460 tests executed *Failed tests:* {noformat} TestMsgBusConnection - did not produce a TEST-*.xml file TestQueryLifeTimeHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/846/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/846/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-846/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12823125 - PreCommit-HIVE-MASTER-Build > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Sergey Zadoroshnyak >Priority: Critical > Fix For: 2.2.0 > > Attachments: HIVE-14483.01.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce?
> Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024) > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before the invocation of > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
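Putting the report above together, a minimal sketch of the proposed one-line fix in context (the method body is abbreviated; only the ensureSize() guard is new, and the signature and calls are taken from the description above):
{code:java}
// Sketch of the fix described above, in
// org.apache.orc.impl.TreeReaderFactory.BytesColumnVectorUtil#commonReadByteArrays.
private static void commonReadByteArrays(InStream stream, IntegerReader lengths,
    LongColumnVector scratchlcv, BytesColumnVector result,
    final int batchSize) throws IOException {
  // scratchlcv.vector is allocated with the default length (1024); grow it
  // before reading batchSize lengths so that batchSize > 1024 (e.g. 1026)
  // cannot overrun it and throw ArrayIndexOutOfBoundsException.
  scratchlcv.ensureSize(batchSize, false);
  lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
  // ... remaining byte-array reading logic unchanged ...
}
{code}
This mirrors the guard StringDictionaryTreeReader already performs before its nextVector() call, which is why the dictionary path does not hit the exception.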
[jira] [Commented] (HIVE-12546) Hive beeline doesn't support arrow keys and tab
[ https://issues.apache.org/jira/browse/HIVE-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416378#comment-15416378 ] Junjie Chen commented on HIVE-12546: Hi [~sershe], I also tried with the OS X (10.11.6) terminal; it works fine over ssh. Could you please specify which version of Beeline and what environment you were using? It would be better if you could dump the full environment and describe the reproduction steps. > Hive beeline doesn't support arrow keys and tab > --- > > Key: HIVE-12546 > URL: https://issues.apache.org/jira/browse/HIVE-12546 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Sergey Shelukhin > > On CLI, up/down arrows navigate history, tab auto-completes, and left/right > arrows move around the command text. > Trying to use beeline, I see that these just print key codes or the tab into > the command text. > This should be fixed before removing CLI in favor of beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14063) beeline to auto connect to the HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416318#comment-15416318 ] Darren Lo commented on HIVE-14063: -- Hi Vihang, Thanks for considering my points. In my opinion, the fact that we're using a general-purpose JDBC client under the hood should be an implementation detail. If you're trying to make the new HS2-based experience as close as possible to the old Hive CLI experience (which this JIRA is required for, but isn't focused on), part of that is making the configs reasonable to hive users. If hive clients have to know that beeline is a special thing that takes configs in a way they're not used to, that's a burden on those clients. People can probably get past that burden, sure, but if it can be avoided, so much the better. Perhaps when we invoke "hive" (and we want that to internally use beeline / HS2) it can read files from nice, hive-specific locations, and then pass those to beeline in whatever way works for beeline. This hides the beeline details from the hive client. It's a bit trickier when invoking beeline directly if you want to maintain the purity of it being a general-purpose JDBC client. If we use standard Hadoop conf format, clients could decide to leverage the hadoop credential provider to protect / obfuscate passwords. If users are going to connect to an HS2 that is different from what's listed in /etc/hive/conf, then with Sergio's suggestion we can load from somewhere like ~/.hive/ to get those overrides, or they can set HIVE_CONF_DIR=/path/to/custom/hive/conf. They can also just type !connect in the prompt. Also, just a thought, but if the purity of beeline is paramount, you could maybe implement a feature where you give it some beeline-specific configuration file with something like: {noformat} extra.configs.loader=org.apache.HiveToBeelineConfigLoader {noformat} which would allow beeline to have no native knowledge of Hive. Instead, it would have a generic config loading plugin mechanism that hive could leverage to make it load from hive-site.xml. Quite a bit of extra work to get the configs though, I know. > beeline to auto connect to the HiveServer2 > -- > > Key: HIVE-14063 > URL: https://issues.apache.org/jira/browse/HIVE-14063 > Project: Hive > Issue Type: Improvement > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Attachments: beeline.conf.template > > > Currently one has to give a jdbc:hive2 url in order for Beeline to connect to a > hiveserver2 instance. It would be great if Beeline could get the info somehow > (from a properties file at a well-known location?) and connect automatically > if the user doesn't specify such a url. If the properties file is not present, > then beeline would expect the user to provide the url and credentials using > !connect or ./beeline -u .. commands > While Beeline is flexible (being a mere JDBC client), most environments would > have just a single HS2. Having users manually connect to this via either > "beeline ~/.propsfile" or -u or !connect statements lowers the > user experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
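To make the plugin idea above concrete, a rough sketch of what such a generic loader contract might look like (every name here is hypothetical; nothing like this exists in Beeline today):
{code:java}
// Hypothetical contract sketched from the comment above; the interface,
// its method, and HiveToBeelineConfigLoader are illustrative only.
public interface BeelineConfigLoader {
  // Connection properties (url, user, password, ...) from an external source.
  java.util.Properties loadConnectionProperties();
}

// A Hive-aware implementation could read hive-site.xml, so Beeline itself
// needs no native knowledge of Hive.
public class HiveToBeelineConfigLoader implements BeelineConfigLoader {
  @Override
  public java.util.Properties loadConnectionProperties() {
    // Loads hive-site.xml from the classpath (e.g. via HIVE_CONF_DIR).
    org.apache.hadoop.conf.Configuration conf =
        new org.apache.hadoop.conf.Configuration(false);
    conf.addResource("hive-site.xml");
    java.util.Properties props = new java.util.Properties();
    for (java.util.Map.Entry<String, String> entry : conf) {
      props.setProperty(entry.getKey(), entry.getValue());
    }
    return props;
  }
}
{code}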
[jira] [Updated] (HIVE-14376) Schema evolution tests takes a long time
[ https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14376: - Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) No related test failures. Thanks for the review! Committed patch to master. > Schema evolution tests takes a long time > > > Key: HIVE-14376 > URL: https://issues.apache.org/jira/browse/HIVE-14376 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0, 2.1.1 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14376.1.patch, HIVE-14376.2.patch > > > schema_evol_* tests (14 tests) take 47 mins in Tez, 40 mins in TestCliDriver. > The same set of tests was added to llap as well in HIVE-14355, which will take some > more time. Most tests use INSERT INTO table, which can be slow and is not > needed. Some individual tests even take about 5 mins to run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14504) tez_join_hash.q test is slow
[ https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416243#comment-15416243 ] Prasanth Jayachandran commented on HIVE-14504: -- This will also bring the runtime for MiniLlap from 7 min to 1m 30s. > tez_join_hash.q test is slow > > > Key: HIVE-14504 > URL: https://issues.apache.org/jira/browse/HIVE-14504 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14504.1.patch > > > tez_join_hash.q also explicitly sets execution engine to mr which slows down > the entire test. Test takes around 7 mins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14376) Schema evolution tests takes a long time
[ https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416229#comment-15416229 ] Hive QA commented on HIVE-14376: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12823116/HIVE-14376.2.patch {color:green}SUCCESS:{color} +1 due to 26 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10406 tests executed *Failed tests:* {noformat} TestMsgBusConnection - did not produce a TEST-*.xml file TestQueryLifeTimeHook - did not produce a TEST-*.xml file org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/845/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/845/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-845/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12823116 - PreCommit-HIVE-MASTER-Build > Schema evolution tests takes a long time > > > Key: HIVE-14376 > URL: https://issues.apache.org/jira/browse/HIVE-14376 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0, 2.1.1 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Attachments: HIVE-14376.1.patch, HIVE-14376.2.patch > > > schema_evol_* tests (14 tests) take 47 mins in Tez, 40 mins in TestCliDriver. > The same set of tests was added to llap as well in HIVE-14355, which will take some > more time. Most tests use INSERT INTO table, which can be slow and is not > needed. Some individual tests even take about 5 mins to run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14513) Enhance custom query feature in LDAP atn to support resultset of ldap groups
[ https://issues.apache.org/jira/browse/HIVE-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14513: - Attachment: HIVE-14513.patch The proposed fix uses an existing LDAP configuration property {{hive.server2.authentication.ldap.groupMembershipKey}} to find the users if the user-configured LDAP query returns a group instead of a user. This property is already used for group filters. > Enhance custom query feature in LDAP atn to support resultset of ldap groups > > > Key: HIVE-14513 > URL: https://issues.apache.org/jira/browse/HIVE-14513 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.0.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > Attachments: HIVE-14513.patch > > > The LDAP Authenticator can be configured to use a result set from an LDAP query to > authenticate. However, it is expected that this LDAP query would only return > a set of users (aka full DNs for the users in LDAP). > However, it's not always straightforward to be able to author queries that > return users. For example, say you would like to allow "all users from group1 > and group2" to be authenticated. The LDAP query has to return a union of all > members of group1 and group2. > For example, one common configuration is that groups contain a list of their > users > "dn: uid=group1,ou=Groups,dc=example,dc=com", > "distinguishedName: uid=group1,ou=Groups,dc=example,dc=com", > "objectClass: top", > "objectClass: groupOfNames", > "objectClass: ExtensibleObject", > "cn: group1", > "ou: Groups", > "sn: group1", > "member: uid=user1,ou=People,dc=example,dc=com", > The query > {{(&(objectClass=groupOfNames)(|(cn=group1)(cn=group2)))}} > will return the entries > uid=group1,ou=Groups,dc=example,dc=com > uid=group2,ou=Groups,dc=example,dc=com > but there is no means to form a query that would return just the values of > the "member" attributes (LDAP client tools are able to do this by filtering the > attributes on these entries). > So it would be useful to support specifying queries that > return groups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
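As an illustration of how the proposal combines the two existing properties (the exact interaction is an assumption about the attached patch, not its confirmed semantics):
{noformat}
# The user-configured custom query returns group entries rather than users:
hive.server2.authentication.ldap.customLDAPQuery =
    (&(objectClass=groupOfNames)(|(cn=group1)(cn=group2)))
# With the proposed fix, the member users are then resolved through this attribute:
hive.server2.authentication.ldap.groupMembershipKey = member
{noformat}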
[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416168#comment-15416168 ] Abdullah Yousufi commented on HIVE-14373: - Hey [~poeppt], that's a pretty good idea and also the ideal behavior for blobstore testing. We'd need to investigate how that would work and what changes would be necessary on hiveqa. Perhaps it would make more sense as a follow-up ticket once this change is in? Let me know if that sounds reasonable. Also, thanks for your reviews on the patch as well! > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Abdullah Yousufi > Attachments: HIVE-14373.patch > > > With Hive doing improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because they will need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify they work > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure
[ https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-14396: --- Attachment: HIVE-14396.1.patch > CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver > count.q failure > --- > > Key: HIVE-14396 > URL: https://issues.apache.org/jira/browse/HIVE-14396 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-14396.1.patch > > > Currently there are three different failures. > Set hive.cbo.returnpath.hiveop=true for all cases. > 1) The first case is a wrong result for the following query: > {code:title=failure 1 Wrong result} > explain select count(1), count(*), count(a), count(b), count(c), count(d), > count(distinct a), count(distinct b), count(distinct c), count(distinct d), > count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct > a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), > count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), > count(distinct a,b,c,d) from abcd; > {code} > This occurs due to a bug in HiveCalciteUtil.getExprNodes. While looking up the > corresponding expression for an aggregate function's argument, the wrong index is > being used. > 2) An out-of-bounds exception for the following: > {code} > set hive.map.aggr=false > explain select count(1), count(*), count(a), count(b), count(c), count(d), > count(distinct a), count(distinct b), count(distinct c), count(distinct d), > count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct > a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), > count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), > count(distinct a,b,c,d) from abcd; > {code} > The above happens while converting a Calcite Aggregation to Hive's group by > operator. > 3) Once the above exception case is fixed, the same query with > hive.map.aggr=false gives wrong results. The problem in this case is that while > creating the expression for an aggregate function's argument we end up with the wrong > column info from the underlying reduce sink operator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure
[ https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-14396: --- Status: Patch Available (was: Open) > CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver > count.q failure > --- > > Key: HIVE-14396 > URL: https://issues.apache.org/jira/browse/HIVE-14396 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-14396.1.patch > > > Currently there are three different failures. > Set hive.cbo.returnpath.hiveop=true for all cases. > 1) The first case is a wrong result for the following query: > {code:title=failure 1 Wrong result} > explain select count(1), count(*), count(a), count(b), count(c), count(d), > count(distinct a), count(distinct b), count(distinct c), count(distinct d), > count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct > a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), > count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), > count(distinct a,b,c,d) from abcd; > {code} > This occurs due to a bug in HiveCalciteUtil.getExprNodes. While looking up the > corresponding expression for an aggregate function's argument, the wrong index is > being used. > 2) An out-of-bounds exception for the following: > {code} > set hive.map.aggr=false > explain select count(1), count(*), count(a), count(b), count(c), count(d), > count(distinct a), count(distinct b), count(distinct c), count(distinct d), > count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct > a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), > count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), > count(distinct a,b,c,d) from abcd; > {code} > The above happens while converting a Calcite Aggregation to Hive's group by > operator. > 3) Once the above exception case is fixed, the same query with > hive.map.aggr=false gives wrong results. The problem in this case is that while > creating the expression for an aggregate function's argument we end up with the wrong > column info from the underlying reduce sink operator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-14507) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters failure
[ https://issues.apache.org/jira/browse/HIVE-14507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-14507. -- Resolution: Duplicate Duplicate of HIVE-14420. The issue is with tez counters not being published correctly from RS. > org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters > failure > > > Key: HIVE-14507 > URL: https://issues.apache.org/jira/browse/HIVE-14507 > Project: Hive > Issue Type: Sub-task >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > > Fails locally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14504) tez_join_hash.q test is slow
[ https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14504: - Status: Patch Available (was: Open) > tez_join_hash.q test is slow > > > Key: HIVE-14504 > URL: https://issues.apache.org/jira/browse/HIVE-14504 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14504.1.patch > > > tez_join_hash.q also explicitly sets execution engine to mr which slows down > the entire test. Test takes around 7 mins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14504) tez_join_hash.q test is slow
[ https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14504: - Attachment: HIVE-14504.1.patch This patch removes explicitly setting the execution engine to mr. The execution engine was initially set explicitly to check correctness with MR; I reran the test using TestCliDriver to check correctness with MR instead. After the separation the test now runs in <4 mins in Tez and 1m 30s in MR. If both run in parallel this should be much better than what we had before. [~sseth] can you please review? > tez_join_hash.q test is slow > > > Key: HIVE-14504 > URL: https://issues.apache.org/jira/browse/HIVE-14504 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14504.1.patch > > > tez_join_hash.q also explicitly sets execution engine to mr which slows down > the entire test. Test takes around 7 mins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416108#comment-15416108 ] Sergey Shelukhin commented on HIVE-14448: - +1 pending tests > Queries with predicate fail when ETL split strategy is chosen for ACID tables > - > > Key: HIVE-14448 > URL: https://issues.apache.org/jira/browse/HIVE-14448 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14448.01.patch, HIVE-14448.patch > > > When the ETL split strategy is applied to ACID tables with predicate pushdown > (SARG enabled), split generation fails for ACID. This bug will usually be > exposed when working with data at scale, because in most other cases only > the BI split strategy is chosen. My guess is that this is happening because the > correct readerSchema is not being picked up when we try to extract SARG > column names. > The quickest way to reproduce is to add the following unit test to > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java > {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} > @Test > public void testETLSplitStrategyForACID() throws Exception { > hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL"); > hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true); > runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)"); > runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'"); > runWorker(hiveConf); > List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL > + " where a = 1"); > int[][] resultData = new int[][] {{1,2}}; > Assert.assertEquals(stringifyValues(resultData), rs); > } > {code} > Back-trace for this failed test is as follows: > {code} > exec.Task: Job Submission failed with exception > 'java.lang.RuntimeException(ORC split generation failed with exception: > java.lang.NegativeArraySizeException)' > java.lang.RuntimeException: ORC split generation failed with exception: > java.lang.NegativeArraySizeException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at >
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417) > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280) > at sun.reflect.N
[jira] [Updated] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Zadoroshnyak updated HIVE-14483: --- Attachment: (was: 0001-HIVE-14483.patch) > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Sergey Zadoroshnyak >Priority: Critical > Fix For: 2.2.0 > > Attachments: HIVE-14483.01.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024) > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before the invocation of > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14426) Extensive logging on info level in WebHCat
[ https://issues.apache.org/jira/browse/HIVE-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-14426: -- Attachment: HIVE-14426.3.patch Instead of changing the log level, an explicit parameter is added to remove the possibility of an unintended data leak. > Extensive logging on info level in WebHCat > -- > > Key: HIVE-14426 > URL: https://issues.apache.org/jira/browse/HIVE-14426 > Project: Hive > Issue Type: Bug >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14426.2.patch, HIVE-14426.3.patch, HIVE-14426.patch > > > There is extensive logging in WebHCat at info level, and even some > sensitive information could be logged -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416083#comment-15416083 ] Sergey Zadoroshnyak commented on HIVE-14483: After upgrading to Hive 2.1.0, we only found the exception for StringDirectTreeReader. But I think that you should ask [~owen.omalley]; he is responsible for user story https://issues.apache.org/jira/browse/HIVE-12159 and he has kept silent. > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Sergey Zadoroshnyak >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch, HIVE-14483.01.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024) > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix?
> add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before the invocation of > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14515) Schema evolution uses slow INSERT INTO .. VALUES
[ https://issues.apache.org/jira/browse/HIVE-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14515: Summary: Schema evolution uses slow INSERT INTO .. VALUES (was: Schema evolution use slow INSERT INTO .. VALUES) > Schema evolution uses slow INSERT INTO .. VALUES > > > Key: HIVE-14515 > URL: https://issues.apache.org/jira/browse/HIVE-14515 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > > Use LOAD DATA LOCAL INPATH and INSERT INTO TABLE ... SELECT * FROM instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
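A sketch of the intended rewrite pattern (table and file names are illustrative only):
{noformat}
-- Before: slow, row-at-a-time inserts
INSERT INTO TABLE schema_evol_test VALUES (1, 'original'), (2, 'original');

-- After: bulk-load a prepared data file into a staging table, then copy
LOAD DATA LOCAL INPATH '../../data/files/schema_evolution/data.txt' INTO TABLE staging;
INSERT INTO TABLE schema_evol_test SELECT * FROM staging;
{noformat}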
[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416077#comment-15416077 ] Matt McCline commented on HIVE-14448: - Submitted new approach. > Queries with predicate fail when ETL split strategy is chosen for ACID tables > - > > Key: HIVE-14448 > URL: https://issues.apache.org/jira/browse/HIVE-14448 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14448.01.patch, HIVE-14448.patch > > > When the ETL split strategy is applied to ACID tables with predicate pushdown > (SARG enabled), split generation fails for ACID. This bug will usually be > exposed when working with data at scale, because in most other cases only > the BI split strategy is chosen. My guess is that this is happening because the > correct readerSchema is not being picked up when we try to extract SARG > column names. > The quickest way to reproduce is to add the following unit test to > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java > {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} > @Test > public void testETLSplitStrategyForACID() throws Exception { > hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL"); > hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true); > runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)"); > runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'"); > runWorker(hiveConf); > List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL > + " where a = 1"); > int[][] resultData = new int[][] {{1,2}}; > Assert.assertEquals(stringifyValues(resultData), rs); > } > {code} > Back-trace for this failed test is as follows: > {code} > exec.Task: Job Submission failed with exception > 'java.lang.RuntimeException(ORC split generation failed with exception: > java.lang.NegativeArraySizeException)' > java.lang.RuntimeException: ORC split generation failed with exception: > java.lang.NegativeArraySizeException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at >
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417) > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280) > at sun.reflect.Na
[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14448: Status: Patch Available (was: In Progress) > Queries with predicate fail when ETL split strategy is chosen for ACID tables > - > > Key: HIVE-14448 > URL: https://issues.apache.org/jira/browse/HIVE-14448 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14448.01.patch, HIVE-14448.patch > > > When the ETL split strategy is applied to ACID tables with predicate pushdown > (SARG enabled), split generation fails for ACID. This bug will usually be > exposed when working with data at scale, because in most other cases only > the BI split strategy is chosen. My guess is that this is happening because the > correct readerSchema is not being picked up when we try to extract SARG > column names. > The quickest way to reproduce is to add the following unit test to > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java > {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} > @Test > public void testETLSplitStrategyForACID() throws Exception { > hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL"); > hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true); > runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)"); > runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'"); > runWorker(hiveConf); > List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL > + " where a = 1"); > int[][] resultData = new int[][] {{1,2}}; > Assert.assertEquals(stringifyValues(resultData), rs); > } > {code} > Back-trace for this failed test is as follows: > {code} > exec.Task: Job Submission failed with exception > 'java.lang.RuntimeException(ORC split generation failed with exception: > java.lang.NegativeArraySizeException)' > java.lang.RuntimeException: ORC split generation failed with exception: > java.lang.NegativeArraySizeException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) > at
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417) > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Nat
[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14448: Attachment: HIVE-14448.01.patch > Queries with predicate fail when ETL split strategy is chosen for ACID tables > - > > Key: HIVE-14448 > URL: https://issues.apache.org/jira/browse/HIVE-14448 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14448.01.patch, HIVE-14448.patch > > > When the ETL split strategy is applied to ACID tables with predicate pushdown > (SARG enabled), split generation fails for ACID. This bug will usually be > exposed when working with data at scale, because in most other cases only > the BI split strategy is chosen. My guess is that this is happening because the > correct readerSchema is not being picked up when we try to extract SARG > column names. > The quickest way to reproduce is to add the following unit test to > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java > {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} > @Test > public void testETLSplitStrategyForACID() throws Exception { > hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL"); > hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true); > runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)"); > runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'"); > runWorker(hiveConf); > List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL > + " where a = 1"); > int[][] resultData = new int[][] {{1,2}}; > Assert.assertEquals(stringifyValues(resultData), rs); > } > {code} > Back-trace for this failed test is as follows: > {code} > exec.Task: Job Submission failed with exception > 'java.lang.RuntimeException(ORC split generation failed with exception: > java.lang.NegativeArraySizeException)' > java.lang.RuntimeException: ORC split generation failed with exception: > java.lang.NegativeArraySizeException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) > at
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417) > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14448: Status: In Progress (was: Patch Available) > Queries with predicate fail when ETL split strategy is chosen for ACID tables > - > > Key: HIVE-14448 > URL: https://issues.apache.org/jira/browse/HIVE-14448 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14448.01.patch, HIVE-14448.patch > > > When the ETL split strategy is applied to ACID tables with predicate pushdown > (SARG enabled), split generation fails for ACID. This bug will usually be > exposed when working with data at scale, because otherwise, in most cases, only the > BI split strategy is chosen. My guess is that this is happening because the > correct readerSchema is not being picked up when we try to extract SARG > column names. > The quickest way to reproduce is to add the following unit test to > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java > {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} > @Test > public void testETLSplitStrategyForACID() throws Exception { > hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL"); > hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true); > runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)"); > runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'"); > runWorker(hiveConf); > List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL > + " where a = 1"); > int[][] resultData = new int[][] {{1,2}}; > Assert.assertEquals(stringifyValues(resultData), rs); > } > {code} > Back-trace for this failed test is as follows: > {code} > exec.Task: Job Submission failed with exception > 'java.lang.RuntimeException(ORC split generation failed with exception: > java.lang.NegativeArraySizeException)' > java.lang.RuntimeException: ORC split generation failed with exception: > java.lang.NegativeArraySizeException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417) > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[jira] [Commented] (HIVE-14405) Have tests log to the console along with hive.log
[ https://issues.apache.org/jira/browse/HIVE-14405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416074#comment-15416074 ] Hive QA commented on HIVE-14405: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12821686/HIVE-14405.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10460 tests executed *Failed tests:* {noformat} TestMsgBusConnection - did not produce a TEST-*.xml file TestQueryLifeTimeHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1 org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/844/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/844/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-844/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12821686 - PreCommit-HIVE-MASTER-Build > Have tests log to the console along with hive.log > - > > Key: HIVE-14405 > URL: https://issues.apache.org/jira/browse/HIVE-14405 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14405.01.patch > > > When running tests from the IDE (not itests), logs end up going to hive.log - > making it difficult to debug tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3
[ https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-14270: --- Attachment: HIVE-14270.6.patch > Write temporary data to HDFS when doing inserts on tables located on S3 > --- > > Key: HIVE-14270 > URL: https://issues.apache.org/jira/browse/HIVE-14270 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, > HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch, HIVE-14270.6.patch > > > Currently, when doing INSERT statements on tables located on S3, Hive writes > and reads temporary (or intermediate) files on S3 as well. > If HDFS is still the default filesystem on Hive, then we can keep such > temporary files on HDFS to keep things running faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
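For a concrete picture of the behavior this patch targets, the sketch below keeps intermediate data on HDFS while the final table lives on S3. hive.exec.scratchdir is the long-standing scratch-directory setting; the blobstore-specific parameter name is assumed from this patch's discussion and may not match what was committed.
{code:sql}
-- Sketch only: hive.blobstore.use.blobstore.as.scratchdir is an assumed name
SET hive.exec.scratchdir=hdfs:///tmp/hive;             -- intermediate files stay on HDFS
SET hive.blobstore.use.blobstore.as.scratchdir=false;  -- assumed: do not write scratch data to S3
INSERT OVERWRITE TABLE s3_table SELECT * FROM staging; -- only the final data lands on S3
{code}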
[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415995#comment-15415995 ] Eugene Koifman commented on HIVE-14199: --- I think what [~gopalv] meant by legacy is "transactional_properties=legacy", though the issues with streaming/bucketing were fixed long ago in HIVE-11983 > Enable Bucket Pruning for ACID tables > - > > Key: HIVE-14199 > URL: https://issues.apache.org/jira/browse/HIVE-14199 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch > > > Currently, ACID tables do not benefit from the bucket pruning feature > introduced in HIVE-11525. The reason for this has been the fact that bucket > pruning happens at the split generation level and, for ACID, traditionally the > delta files were never split. The parallelism for ACID was then restricted to > the number of buckets. There would be as many splits as the number of buckets, > and each worker processing one split would inevitably read all the delta > files for that bucket, even when the query may have originally required only > one of the buckets to be read. > However, HIVE-14035 now enables the delta files to be split as well. This > means that we now have enough information at the split generation > level to determine the appropriate buckets to process for the delta files. This > can efficiently allow us to prune unnecessary buckets for delta files and > will lead to good performance gains for a large number of selective queries on > ACID tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14063) beeline to auto connect to the HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415977#comment-15415977 ] Vihang Karajgaonkar commented on HIVE-14063: Including these values in hive-site.xml is a good suggestion, with the advantage of leveraging existing configuration frameworks to deliver the configs to Beeline. But IMO there are the following problems with using hive-site.xml 1. Beeline is in theory a general-purpose JDBC client. As far as I understand, it was not designed to be a Hive-only client. Adding these values to hive-site.xml is essentially tying this feature to Hive-specific environments. While one may argue that Beeline is almost always used for Hive, whether to treat it as a Hive-specific JDBC client or not is something the larger community has to decide. 2. Different users could potentially have different connection endpoints like URLs and credentials (even authentication mechanisms for different HS2 instances?), which means we need to store the user-specific information in some other place (environment variables/configuration file) I don't see any particular advantage of using two sources of information for getting a few configuration values, since we will have to read both files in any case. Similarly, user-specific configurations like Hive variables and hiveConf need to be separated from the all-purpose hive-site.xml. All this file will contain is a user-friendly way to describe the connection URL (which includes hosts, authentication, default db, hivevars and user-specific hiveConf) > beeline to auto connect to the HiveServer2 > -- > > Key: HIVE-14063 > URL: https://issues.apache.org/jira/browse/HIVE-14063 > Project: Hive > Issue Type: Improvement > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Attachments: beeline.conf.template > > > Currently one has to give a jdbc:hive2 URL in order for Beeline to connect to a > HiveServer2 instance. It would be great if Beeline could get the info somehow > (from a properties file at a well-known location?) and connect automatically > if the user doesn't specify such a URL. If the properties file is not present, > then Beeline would expect the user to provide the URL and credentials using > !connect or ./beeline -u .. commands > While Beeline is flexible (being a mere JDBC client), most environments would > have just a single HS2. Having users manually connect to it via either > "beeline ~/.propsfile" or -u or !connect statements hurts the user > experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
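To make the discussion concrete, the user-level file being debated could look like the sketch below. Every key name here is hypothetical; the actual format was still open at this point (see the attached beeline.conf.template).
{code}
# Hypothetical ~/.beeline/beeline.conf sketch -- all key names are illustrative only
url = jdbc:hive2://hs2.example.com:10000/default
user = hive
# user-specific hivevars and hiveConf, kept out of hive-site.xml
hivevar.env = prod
hiveconf.hive.exec.parallel = true
{code}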
[jira] [Commented] (HIVE-11525) Bucket pruning
[ https://issues.apache.org/jira/browse/HIVE-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415976#comment-15415976 ] Eugene Koifman commented on HIVE-11525: --- For completeness: The streaming API was fixed in HIVE-11983 > Bucket pruning > -- > > Key: HIVE-11525 > URL: https://issues.apache.org/jira/browse/HIVE-11525 > Project: Hive > Issue Type: Improvement > Components: Logical Optimizer >Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0 >Reporter: Maciek Kocon >Assignee: Gopal V > Labels: TODOC2.0 > Fix For: 2.0.0 > > Attachments: HIVE-11525.1.patch, HIVE-11525.2.patch, > HIVE-11525.3.patch, HIVE-11525.WIP.patch > > > Logically and functionally bucketing and partitioning are quite similar - > both provide a mechanism to segregate and separate the table's data based on > its content. Thanks to that, significant further optimisations like > [partition] PRUNING or [bucket] MAP JOIN are possible. > The difference seems to be imposed by design, where PARTITIONing is > open/explicit while BUCKETing is discrete/implicit. > Partitioning seems to be very common, if not a standard feature, in all current > RDBMS, while BUCKETING seems to be HIVE-specific. > In a way, BUCKETING could also be called "hashing" or simply "IMPLICIT > PARTITIONING". > Although these two are recognised as two separate features > available in Hive, nothing should prevent leveraging the same existing > query/join optimisations across the two. > BUCKET pruning > Enable partition-PRUNING-equivalent optimisation for queries on BUCKETED > tables > The simplest example is for queries like: > "SELECT … FROM x WHERE colA=123123" > to read only the relevant bucket file rather than all file-buckets that > belong to a table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
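To make the description concrete, a minimal HiveQL sketch follows (table and data invented for illustration; the hive.tez.bucket.pruning flag name is assumed from this patch). With 32 buckets hashed on colA, an equality predicate on colA maps to exactly one bucket, so split generation can skip the other 31 bucket files.
{code:sql}
-- Illustrative only: a bucketed table where pruning can apply
CREATE TABLE x (colA BIGINT, colB STRING)
CLUSTERED BY (colA) INTO 32 BUCKETS;

SET hive.tez.bucket.pruning=true;      -- assumed flag name from this patch
SELECT * FROM x WHERE colA = 123123;   -- hash(123123) % 32 selects a single bucket file
{code}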
[jira] [Updated] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases
[ https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14511: --- Status: Patch Available (was: Open) [~ashutoshc], could you take a look? Thanks. Actually, I found that if you have p1=1/p2=1/files and also p1=1/p2=2/p3=3/files, the current master will check whether the number of partition specifications matches and will throw an exception. > Improve MSCK for partitioned table to deal with special cases > - > > Key: HIVE-14511 > URL: https://issues.apache.org/jira/browse/HIVE-14511 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14511.01.patch > > > Some users will have a folder rather than a file under the last partition > folder. However, msck is going to search for the leaf folder rather than the > last partition folder. We need to improve that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases
[ https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14511: --- Attachment: HIVE-14511.01.patch > Improve MSCK for partitioned table to deal with special cases > - > > Key: HIVE-14511 > URL: https://issues.apache.org/jira/browse/HIVE-14511 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-14511.01.patch > > > Some users will have a folder rather than a file under the last partition > folder. However, msck is going to search for the leaf folder rather than the > last partition folder. We need to improve that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415940#comment-15415940 ] Sergey Shelukhin commented on HIVE-14483: - Patch looks good to me... attaching the same patch for HiveQA (with the expected name pattern). Is a similar fix needed in other places for other complex readers? > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Sergey Zadoroshnyak >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch, HIVE-14483.01.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader, which contains StringDirectTreeReader as its > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024) > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as there is > a check scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > Add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invoking > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
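For readers following along, the one-line fix described above would sit in context roughly as follows; this is a sketch reconstructed from the description, not the committed patch verbatim.
{code:java}
// Sketch of TreeReaderFactory.BytesColumnVectorUtil.commonReadByteArrays with the fix;
// reconstructed from the description above, not copied from the committed patch.
private static void commonReadByteArrays(InStream stream, IntegerReader lengths,
    LongColumnVector scratchlcv, BytesColumnVector result,
    final int batchSize) throws IOException {
  // The fix: grow the scratch vector before reading length values into it.
  // Without this, a batchSize larger than the default vector length (1024)
  // overruns scratchlcv.vector inside RunLengthIntegerReaderV2.nextVector.
  scratchlcv.ensureSize(batchSize, false);
  lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
  // ... remainder unchanged: sum the lengths and read the byte data
  // from the stream into result ...
}
{code}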
[jira] [Updated] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14483: Assignee: Sergey Zadoroshnyak (was: Sergey Shelukhin) > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Sergey Zadoroshnyak >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch, HIVE-14483.01.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader, which contains StringDirectTreeReader as its > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024) > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as there is > a check scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > Add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invoking > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14483: Attachment: HIVE-14483.01.patch > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Sergey Shelukhin >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch, HIVE-14483.01.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader, which contains StringDirectTreeReader as its > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024) > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as there is > a check scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > Add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invoking > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-14483: --- Assignee: Sergey Shelukhin (was: Owen O'Malley) > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Sergey Shelukhin >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch, HIVE-14483.01.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader, which contains StringDirectTreeReader as its > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024) > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as there is > a check scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > Add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invoking > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases
[ https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415905#comment-15415905 ] Pengcheng Xiong commented on HIVE-14511: [~sershe], thanks for your attention. I have exactly the same concern as you, but I cannot convince [~ashutoshc]. :) > Improve MSCK for partitioned table to deal with special cases > - > > Key: HIVE-14511 > URL: https://issues.apache.org/jira/browse/HIVE-14511 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > > Some users will have a folder rather than a file under the last partition > folder. However, msck is going to search for the leaf folder rather than the > last partition folder. We need to improve that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14376) Schema evolution tests takes a long time
[ https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415898#comment-15415898 ] Matt McCline commented on HIVE-14376: - +1 LGTM > Schema evolution tests takes a long time > > > Key: HIVE-14376 > URL: https://issues.apache.org/jira/browse/HIVE-14376 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0, 2.1.1 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Attachments: HIVE-14376.1.patch, HIVE-14376.2.patch > > > schema_evol_* tests (14 tests) take 47 mins in Tez, 40 mins in TestCliDriver. > The same set of tests was added to LLAP as well in HIVE-14355, which will add some > more time. Most tests use INSERT INTO a table, which can be slow and is not > needed. Some individual tests take about 5 mins to run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14501) MiniTez test for union_type_chk.q is slow
[ https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14501: -- Fix Version/s: 2.2.0 > MiniTez test for union_type_chk.q is slow > - > > Key: HIVE-14501 > URL: https://issues.apache.org/jira/browse/HIVE-14501 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 2.2.0 > > Attachments: HIVE-14501.1.patch > > > union_type_chk.q runs on minimr and minitez but the test itself explicitly > sets the execution engine to mr. It takes around 10 mins to run this test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14462) Reduce number of partition check calls in add_partitions
[ https://issues.apache.org/jira/browse/HIVE-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415848#comment-15415848 ] Hive QA commented on HIVE-14462: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12823006/HIVE-14462.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10459 tests executed *Failed tests:* {noformat} TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file TestMsgBusConnection - did not produce a TEST-*.xml file TestQueryLifeTimeHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partitions_json org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_add_partition_with_whitelist org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/843/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/843/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-843/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12823006 - PreCommit-HIVE-MASTER-Build > Reduce number of partition check calls in add_partitions > > > Key: HIVE-14462 > URL: https://issues.apache.org/jira/browse/HIVE-14462 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-14462.1.patch, HIVE-14462.2.patch, > HIVE-14462.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415819#comment-15415819 ] Siddharth Seth commented on HIVE-14502: --- I don't think we should move MiniLlap to use MiniHBase - it is not the default at the moment. Maybe retain a few tests on MiniTez which can run with MiniHBase. Setup time for MiniHBase metastore-based tests is 3 minutes, versus 1 minute for regular tests. The 1 minute will be cut down after HIVE-13496. A similar effort could be taken up for MiniHBase. > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > LLAP shares most of the codepath with Tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To > cut down this test time significantly, it makes sense to move the MiniTez > tests over to MiniLlap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14448: -- Priority: Critical (was: Major) > Queries with predicate fail when ETL split strategy is chosen for ACID tables > - > > Key: HIVE-14448 > URL: https://issues.apache.org/jira/browse/HIVE-14448 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14448.patch > > > When the ETL split strategy is applied to ACID tables with predicate pushdown > (SARG enabled), split generation fails for ACID. This bug will usually be > exposed when working with data at scale, because otherwise, in most cases, only the > BI split strategy is chosen. My guess is that this is happening because the > correct readerSchema is not being picked up when we try to extract SARG > column names. > The quickest way to reproduce is to add the following unit test to > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java > {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} > @Test > public void testETLSplitStrategyForACID() throws Exception { > hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL"); > hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true); > runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)"); > runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'"); > runWorker(hiveConf); > List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL > + " where a = 1"); > int[][] resultData = new int[][] {{1,2}}; > Assert.assertEquals(stringifyValues(resultData), rs); > } > {code} > Back-trace for this failed test is as follows: > {code} > exec.Task: Job Submission failed with exception > 'java.lang.RuntimeException(ORC split generation failed with exception: > java.lang.NegativeArraySizeException)' > java.lang.RuntimeException: ORC split generation failed with exception: > java.lang.NegativeArraySizeException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417) > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > su
[jira] [Commented] (HIVE-14405) Have tests log to the console along with hive.log
[ https://issues.apache.org/jira/browse/HIVE-14405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415812#comment-15415812 ] Siddharth Seth commented on HIVE-14405: --- Looks like all of that is already in place. Re-triggering a jenkins run to see what this looks like. May need to change the console logging to INFO level (and let default debug logs go to hive.log) > Have tests log to the console along with hive.log > - > > Key: HIVE-14405 > URL: https://issues.apache.org/jira/browse/HIVE-14405 > Project: Hive > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14405.01.patch > > > When running tests from the IDE (not itests), logs end up going to hive.log - > making it difficult to debug tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
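For context, the behavior being discussed amounts to a log4j2 configuration along these lines; this is a hypothetical sketch of the idea (INFO and above to the console, full DEBUG to hive.log), not the actual patch, and the file location property is assumed.
{code:title=hypothetical log4j2 properties sketch}
appenders = console, file

appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
# a threshold filter keeps DEBUG noise off the console
appender.console.filter.threshold.type = ThresholdFilter
appender.console.filter.threshold.level = INFO

appender.file.type = File
appender.file.name = file
# assumed location property; Hive tests typically derive this from the build dir
appender.file.fileName = ${sys:test.tmp.dir}/hive.log
appender.file.layout.type = PatternLayout
appender.file.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n

# DEBUG still reaches hive.log; the console filter hides it from the console
rootLogger.level = DEBUG
rootLogger.appenderRefs = console, file
rootLogger.appenderRef.console.ref = console
rootLogger.appenderRef.file.ref = file
{code}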
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415807#comment-15415807 ] Sergey Zadoroshnyak commented on HIVE-14483: [~sershe] Do you know who is responsible for the Hive ORC module? > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader, which contains StringDirectTreeReader as its > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024) > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as there is > a check scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > Add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invoking > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases
[ https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415794#comment-15415794 ] Sergey Shelukhin commented on HIVE-14511: - Where does that folder come from? I think if it's for the use case where people use msck to "import" manually created partitions, we should not encourage it; it's not even a supported scenario. If it continues to be misused, at least it can be misused correctly :) Otherwise, did Hive itself create those nested directories? Is it for the ACID case or something like that? > Improve MSCK for partitioned table to deal with special cases > - > > Key: HIVE-14511 > URL: https://issues.apache.org/jira/browse/HIVE-14511 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > > Some users will have a folder rather than a file under the last partition > folder. However, msck is going to search for the leaf folder rather than the > last partition folder. We need to improve that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14376) Schema evolution tests takes a long time
[ https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14376: - Attachment: HIVE-14376.2.patch Rebased patch > Schema evolution tests takes a long time > > > Key: HIVE-14376 > URL: https://issues.apache.org/jira/browse/HIVE-14376 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0, 2.1.1 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Minor > Attachments: HIVE-14376.1.patch, HIVE-14376.2.patch > > > schema_evol_* tests (14 tests) take 47 mins in Tez, 40 mins in TestCliDriver. > The same set of tests was added to LLAP as well in HIVE-14355, which will add some > more time. Most tests use INSERT INTO a table, which can be slow and is not > needed. Some individual tests take about 5 mins to run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14453) refactor physical writing of ORC data and metadata to FS from the logical writers
[ https://issues.apache.org/jira/browse/HIVE-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415784#comment-15415784 ] Sergey Shelukhin commented on HIVE-14453: - The JDBC failures are unrelated. > refactor physical writing of ORC data and metadata to FS from the logical > writers > - > > Key: HIVE-14453 > URL: https://issues.apache.org/jira/browse/HIVE-14453 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14453.01.patch, HIVE-14453.patch > > > ORC data doesn't have to go directly into an HDFS stream via buffers; it can > go somewhere else (e.g. a write-thru cache, or an addressable system that > doesn't require the stream blocks to be held in memory before writing them > all together). > To that effect, it would be nice to abstract the data block/metadata > structure creation from the physical file concerns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
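The refactoring described here lends itself to a small interface sketch; the names below are illustrative only, not the actual API from the attached patch.
{code:java}
// Hypothetical sketch of the abstraction described: the logical ORC writer
// assembles stripes and metadata, then hands finished buffers to a pluggable
// physical layer. Names are illustrative, not the actual HIVE-14453 API.
interface PhysicalSink {
  void writeStripe(ByteBuffer stripe) throws IOException;    // data blocks
  void writeMetadata(ByteBuffer footer) throws IOException;  // file footer/metadata
  void close() throws IOException;
}
// An HDFS-backed sink would stream buffers to an FSDataOutputStream, while a
// cache-backed sink could hand the same buffers to a write-through cache
// without ever materializing the full stream in memory.
{code}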
[jira] [Updated] (HIVE-14501) MiniTez test for union_type_chk.q is slow
[ https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14501: - Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the review! Committed to master. > MiniTez test for union_type_chk.q is slow > - > > Key: HIVE-14501 > URL: https://issues.apache.org/jira/browse/HIVE-14501 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14501.1.patch > > > union_type_chk.q runs on minimr and minitez but the test itself explicitly > sets the execution engine to mr. It takes around 10 mins to run this test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests
[ https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415760#comment-15415760 ] Prasanth Jayachandran commented on HIVE-14502: -- Yeah. That will be taken care of. IIRC [~sseth] was mentioning a 3-min setup time for the HBase cluster, which we have to cut down. > Convert MiniTez tests to MiniLlap tests > --- > > Key: HIVE-14502 > URL: https://issues.apache.org/jira/browse/HIVE-14502 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > LLAP shares most of the codepath with Tez. MiniLlapCliDriver is much faster > than MiniTezCliDriver because of threaded executors and caching. > MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To > cut down this test time significantly, it makes sense to move the MiniTez > tests over to MiniLlap tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14501) MiniTez test for union_type_chk.q is slow
[ https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415756#comment-15415756 ] Siddharth Seth commented on HIVE-14501: --- +1 > MiniTez test for union_type_chk.q is slow > - > > Key: HIVE-14501 > URL: https://issues.apache.org/jira/browse/HIVE-14501 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14501.1.patch > > > union_type_chk.q runs on minimr and minitez but the test itself explicitly > sets the execution engine to mr. It takes around 10 mins to run this test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415736#comment-15415736 ] Thomas Poepping commented on HIVE-14373: Here is an example of how S3A proposes to solve a similar problem: https://issues.apache.org/jira/browse/HADOOP-13446 > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Abdullah Yousufi > Attachments: HIVE-14373.patch > > > With Hive making improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because they will need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify it works > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415696#comment-15415696 ] Thomas Poepping commented on HIVE-14373: The build server is part of Amazon EC2, is it not? Why would we not allow hiveqa to run these tests using those credentials, while still keeping them out of the source? It would be easier for committers and contributors if it were obvious that their change might break something, without relying on committers to have to run the tests manually. > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Abdullah Yousufi > Attachments: HIVE-14373.patch > > > With Hive making improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because they will need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify it works > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14479) Add some join tests for acid table
[ https://issues.apache.org/jira/browse/HIVE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415645#comment-15415645 ] Eugene Koifman commented on HIVE-14479: --- For the test in TestTxnCommands2, I would add a check that compaction actually ran. See testCompactWithDelete() for an example. Otherwise, you'll never know if compaction failed and did nothing. Also, in .q files, I see various options being set to trigger a specific type of join. Is that not needed in TestTxnCommands2? > Add some join tests for acid table > -- > > Key: HIVE-14479 > URL: https://issues.apache.org/jira/browse/HIVE-14479 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-14479.1.patch, HIVE-14479.2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
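The kind of check Eugene describes (as done in testCompactWithDelete()) asks the metastore's transaction store for the compaction history after running the worker; a minimal sketch, assuming the TestTxnCommands2 helpers shown elsewhere in this thread:
{code:java}
// Inside a TestTxnCommands2-style test, after "alter table ... compact 'MAJOR'"
// and runWorker(hiveConf): verify the compactor actually did its work.
TxnStore txnHandler = TxnUtils.getTxnStore(hiveConf);
ShowCompactResponse resp = txnHandler.showCompact(new ShowCompactRequest());
Assert.assertEquals("Unexpected number of compactions in history",
    1, resp.getCompactsSize());
// A state of "ready for cleaning" (or "succeeded") means compaction ran;
// "initiated" or "failed" would mean it silently did nothing.
Assert.assertEquals("Unexpected compaction state",
    TxnStore.CLEANING_RESPONSE, resp.getCompacts().get(0).getState());
{code}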
[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415622#comment-15415622 ] Abdullah Yousufi commented on HIVE-14373: - Hey [~yalovyyi], currently the best way to switch between different s3 clients would be to use different key names in auth-keys.xml. I created auth-keys.xml.template as an s3a example, but that could easily be changed for s3n. However, I agree that the bucket variable name in that file should not be specific to s3a. Also, thanks a ton for the review on the patch; I'll get to it shortly. > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Abdullah Yousufi > Attachments: HIVE-14373.patch > > > With Hive making improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because they will need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify they work > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13754) Fix resource leak in HiveClientCache
[ https://issues.apache.org/jira/browse/HIVE-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-13754: Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0, 1.2.1 (was: 1.2.1, 2.0.0) Status: Resolved (was: Patch Available) > Fix resource leak in HiveClientCache > > > Key: HIVE-13754 > URL: https://issues.apache.org/jira/browse/HIVE-13754 > Project: Hive > Issue Type: Bug > Components: Clients >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Fix For: 2.0.0 > > Attachments: HIVE-13754-branch-1.patch, HIVE-13754.1-branch-1.patch, > HIVE-13754.1.patch, HIVE-13754.patch > > > Found that the {{users}} reference count can go into negative values, which > prevents {{tearDownIfUnused}} from closing the client connection when called. > This leads to a build-up of clients which have been evicted from the cache, > are no longer in use, but have not been shut down. > GC will eventually call {{finalize}}, which forcibly closes the connection > and cleans up the client, but I have seen as many as several hundred open > client connections as a result. > The main source of this is RetryingMetaStoreClient, which will > call {{reconnect}} on acquire, which calls {{close}}. This decrements > {{users}} to -1 on the reconnect; acquire then increases it to 0 while > the client is in use, and it drops back to -1 on release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
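To make the failure mode concrete, a simplified sketch of the reference-count bug (an illustration only, not the actual HiveClientCache code): an unguarded decrement lets the count go negative, so the "unused" check never fires again:
{code:java}
// Simplified illustration; field and method names mirror the description.
class CachedClient {
  private int users = 0;

  void acquire() { users++; }

  // RetryingMetaStoreClient calls reconnect() on acquire, and reconnect()
  // calls close(), driving users from 0 to -1 before acquire() runs.
  void close() { users--; }

  void tearDownIfUnused() {
    if (users == 0) {        // never true once users has drifted to -1,
      shutdownConnection();  // so the connection is never closed here
    }
  }

  private void shutdownConnection() { /* release the Thrift connection */ }
}
{code}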
[jira] [Commented] (HIVE-13754) Fix resource leak in HiveClientCache
[ https://issues.apache.org/jira/browse/HIVE-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415599#comment-15415599 ] Mithun Radhakrishnan commented on HIVE-13754: - Committed to master. Thanks, [~cdrome]! > Fix resource leak in HiveClientCache > > > Key: HIVE-13754 > URL: https://issues.apache.org/jira/browse/HIVE-13754 > Project: Hive > Issue Type: Bug > Components: Clients >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Fix For: 2.0.0 > > Attachments: HIVE-13754-branch-1.patch, HIVE-13754.1-branch-1.patch, > HIVE-13754.1.patch, HIVE-13754.patch > > > Found that the {{users}} reference count can go into negative values, which > prevents {{tearDownIfUnused}} from closing the client connection when called. > This leads to a build-up of clients which have been evicted from the cache, > are no longer in use, but have not been shut down. > GC will eventually call {{finalize}}, which forcibly closes the connection > and cleans up the client, but I have seen as many as several hundred open > client connections as a result. > The main source of this is RetryingMetaStoreClient, which will > call {{reconnect}} on acquire, which calls {{close}}. This decrements > {{users}} to -1 on the reconnect; acquire then increases it to 0 while > the client is in use, and it drops back to -1 on release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13756) Map failure attempts to delete reducer _temporary directory on multi-query pig query
[ https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-13756: Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0, 1.2.1 (was: 1.2.1, 2.0.0) Status: Resolved (was: Patch Available) > Map failure attempts to delete reducer _temporary directory on multi-query > pig query > > > Key: HIVE-13756 > URL: https://issues.apache.org/jira/browse/HIVE-13756 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Fix For: 2.0.0 > > Attachments: HIVE-13756-branch-1.patch, HIVE-13756.1-branch-1.patch, > HIVE-13756.1.patch, HIVE-13756.patch > > > A pig script, executed with multi-query enabled, that reads the source data > and writes it as-is into TABLE_A, as well as performing a group-by operation > on the data which is written into TABLE_B, can produce erroneous results if > any map fails. This results in a single MR job that writes the map output to > a scratch directory relative to TABLE_A and the reducer output to a scratch > directory relative to TABLE_B. > If one or more maps fail, it will delete the attempt data relative to TABLE_A, > but it also deletes the _temporary directory relative to TABLE_B. This has > the unintended side-effect of preventing subsequent maps from committing > their data. This means that any maps which successfully completed before the > first map failure will have their data committed as expected, but other maps > will not, resulting in an incomplete result set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
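A hypothetical Pig Latin sketch of the script shape the description talks about (table names, loaders, and the grouping key are placeholders); with multi-query enabled, both STORE statements are served by the single MR job described above:
{code}
-- map side writes src as-is to TABLE_A; the reduce side writes the
-- grouped result to TABLE_B, all within one multi-query MR job
src = LOAD 'source_table' USING org.apache.hive.hcatalog.pig.HCatLoader();
STORE src INTO 'table_a' USING org.apache.hive.hcatalog.pig.HCatStorer();
grp = GROUP src BY key;
agg = FOREACH grp GENERATE group AS key, COUNT(src) AS cnt;
STORE agg INTO 'table_b' USING org.apache.hive.hcatalog.pig.HCatStorer();
{code}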
[jira] [Commented] (HIVE-13756) Map failure attempts to delete reducer _temporary directory on multi-query pig query
[ https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415574#comment-15415574 ] Mithun Radhakrishnan commented on HIVE-13756: - Committed to master. Thanks, [~cdrome]! > Map failure attempts to delete reducer _temporary directory on multi-query > pig query > > > Key: HIVE-13756 > URL: https://issues.apache.org/jira/browse/HIVE-13756 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13756-branch-1.patch, HIVE-13756.1-branch-1.patch, > HIVE-13756.1.patch, HIVE-13756.patch > > > A pig script, executed with multi-query enabled, that reads the source data > and writes it as-is into TABLE_A, as well as performing a group-by operation > on the data which is written into TABLE_B, can produce erroneous results if > any map fails. This results in a single MR job that writes the map output to > a scratch directory relative to TABLE_A and the reducer output to a scratch > directory relative to TABLE_B. > If one or more maps fail, it will delete the attempt data relative to TABLE_A, > but it also deletes the _temporary directory relative to TABLE_B. This has > the unintended side-effect of preventing subsequent maps from committing > their data. This means that any maps which successfully completed before the > first map failure will have their data committed as expected, but other maps > will not, resulting in an incomplete result set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14462) Reduce number of partition check calls in add_partitions
[ https://issues.apache.org/jira/browse/HIVE-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-14462: Status: Patch Available (was: Open) > Reduce number of partition check calls in add_partitions > > > Key: HIVE-14462 > URL: https://issues.apache.org/jira/browse/HIVE-14462 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-14462.1.patch, HIVE-14462.2.patch, > HIVE-14462.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
[ https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-12181: Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Pushed to master. > Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver > --- > > Key: HIVE-12181 > URL: https://issues.apache.org/jira/browse/HIVE-12181 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Fix For: 2.2.0 > > Attachments: HIVE-12181.1.patch, HIVE-12181.10.patch, > HIVE-12181.12.patch, HIVE-12181.13.patch, HIVE-12181.15.patch, > HIVE-12181.2.patch, HIVE-12181.3.patch, HIVE-12181.4.patch, > HIVE-12181.7.patch, HIVE-12181.8.patch, HIVE-12181.9.patch, HIVE-12181.patch, > HIVE-12181.patch > > > There was a performance concern earlier, but HIVE-7587 has fixed that. We can > change the default to true now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
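For anyone following along, the flipped default corresponds to this setting; before this change, a .q test that wanted column statistics under MiniTez had to set it explicitly:
{code}
set hive.stats.fetch.column.stats=true;
{code}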
[jira] [Commented] (HIVE-14418) Hive config validation prevents unsetting the settings
[ https://issues.apache.org/jira/browse/HIVE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415454#comment-15415454 ] Hive QA commented on HIVE-14418: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12822926/HIVE-14418.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 65 failed/errored test(s), 10460 tests executed *Failed tests:* {noformat} TestMsgBusConnection - did not produce a TEST-*.xml file TestQueryLifeTimeHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_00_nonpart_empty org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_01_nonpart org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_02_00_part_empty org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_02_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_03_nonpart_over_compat org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_all_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_05_some_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_06_one_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_07_all_part_over_nonoverlap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_08_nonpart_rename org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_09_part_spec_nonoverlap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_10_external_managed org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_11_managed_external org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_12_external_location org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_13_managed_location org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_14_managed_location_over_existing org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_15_external_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_16_part_external org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_17_part_managed org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_18_part_external org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_19_00_part_external_location org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_19_part_external_location org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_20_part_managed_location org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_21_export_authsuccess org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_22_import_exist_authsuccess org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_23_import_part_authsuccess org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_24_import_nonexist_authsuccess org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_25_export_parentpath_has_inaccessible_children org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_hidden_files org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repl_1_drop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repl_2_exim_basic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repl_3_exim_metadata org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_import org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_export 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_00_unsupported_schema org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_01_nonpart_over_loaded org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_02_all_part_over_overlap org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_03_nonpart_noncompat_colschema org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_04_nonpart_noncompat_colnumber org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_05_nonpart_noncompat_coltype org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_06_nonpart_noncompat_storage org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_07_nonpart_noncompat_ifof org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_08_nonpart_noncompat_serde org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_09_nonpart_noncompat_serdeparam org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_10_nonpart_noncompat_bucketing org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_11_nonpart_noncompat_sorting org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_12_nonnative_export org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliD
[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3
[ https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415405#comment-15415405 ] Sergio Peña commented on HIVE-14270: Thanks [~leftylev] for catching it. You have a good eye for these documentation issues :) > Write temporary data to HDFS when doing inserts on tables located on S3 > --- > > Key: HIVE-14270 > URL: https://issues.apache.org/jira/browse/HIVE-14270 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, > HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch > > > Currently, when doing INSERT statements on tables located at S3, Hive writes > and reads temporary (or intermediate) files to S3 as well. > If HDFS is still the default filesystem on Hive, then we can keep such > temporary files on HDFS to keep things running faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14063) beeline to auto connect to the HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415393#comment-15415393 ] Sergio Peña commented on HIVE-14063: Keeping the configs in hive-site.xml just to follow the standard sounds like a good idea. However, we also want users to be able to have their own credentials in a place where Hive can look for them, like ~/.hive/hive-site.xml or any other place. > beeline to auto connect to the HiveServer2 > -- > > Key: HIVE-14063 > URL: https://issues.apache.org/jira/browse/HIVE-14063 > Project: Hive > Issue Type: Improvement > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Attachments: beeline.conf.template > > > Currently one has to give a jdbc:hive2 url in order for Beeline to connect to a > hiveserver2 instance. It would be great if Beeline could get the info somehow > (from a properties file at a well-known location?) and connect automatically > if the user doesn't specify such a url. If the properties file is not present, > then beeline would expect the user to provide the url and credentials using > !connect or ./beeline -u .. commands. > While Beeline is flexible (being a mere JDBC client), most environments would > have just a single HS2. Having users manually connect to it via either > "beeline ~/.propsfile" or -u or !connect statements lowers the user > experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
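A rough sketch of the idea under discussion; the file location, name, and keys below are illustrative assumptions (the attached beeline.conf.template is the actual proposal):
{code}
# hypothetical ~/.beeline/beeline.conf picked up automatically at startup
url=jdbc:hive2://hs2.example.com:10000/default
user=hive
password=secret
{code}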
[jira] [Updated] (HIVE-14457) Partitions in encryption zone are still trashed though an exception is returned
[ https://issues.apache.org/jira/browse/HIVE-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-14457: --- Resolution: Fixed Fix Version/s: 2.1.1 2.2.0 Status: Resolved (was: Patch Available) Committed to 2.2.0 and 2.1.1, thanks [~ychena] and [~spena] for the review. > Partitions in encryption zone are still trashed though an exception is > returned > --- > > Key: HIVE-14457 > URL: https://issues.apache.org/jira/browse/HIVE-14457 > Project: Hive > Issue Type: Bug > Components: Encryption, Metastore >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14457.patch > > > drop_partition_common in HiveMetaStore still drops partitions in encryption > zone without PURGE even though it returns an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
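For context, PURGE is the documented way to bypass the trash when dropping data, which matters inside an HDFS encryption zone because moving files to a trash directory outside the zone fails. For example (table and partition names are illustrative):
{code:sql}
-- skips the trash entirely instead of attempting a cross-encryption-zone move
ALTER TABLE sales DROP PARTITION (ds='2016-08-10') PURGE;
{code}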
[jira] [Updated] (HIVE-14404) Allow delimiterfordsv to use multiple-character delimiters
[ https://issues.apache.org/jira/browse/HIVE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marta Kuczora updated HIVE-14404: - Status: Patch Available (was: Open) > Allow delimiterfordsv to use multiple-character delimiters > -- > > Key: HIVE-14404 > URL: https://issues.apache.org/jira/browse/HIVE-14404 > Project: Hive > Issue Type: Improvement >Reporter: Stephen Measmer >Assignee: Marta Kuczora > Attachments: HIVE-14404.patch > > > HIVE-5871 allows for reading multiple-character delimiters. Would like the > ability to use outputformat=dsv and define multiple-character delimiters. > Today delimiterfordsv only uses one character even if multiple are passed. > For example: > when I use: > beeline>!set outputformat dsv > beeline>!set delimiterfordsv "^-^" > I get: > 111201081253106275^31-Oct-2011 > 00:00:00^Text^201605232823^2016051968232151^201605232823_2016051968232151_0_0_1 > > Would like it to be: > 111201081253106275^-^31-Oct-2011 > 00:00:00^-^Text^-^201605232823^-^2016051968232151^-^201605232823_2016051968232151_0_0_1 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14404) Allow delimiterfordsv to use multiple-character delimiters
[ https://issues.apache.org/jira/browse/HIVE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marta Kuczora updated HIVE-14404: - Status: Open (was: Patch Available) > Allow delimiterfordsv to use multiple-character delimiters > -- > > Key: HIVE-14404 > URL: https://issues.apache.org/jira/browse/HIVE-14404 > Project: Hive > Issue Type: Improvement >Reporter: Stephen Measmer >Assignee: Marta Kuczora > Attachments: HIVE-14404.patch > > > HIVE-5871 allows for reading multiple-character delimiters. Would like the > ability to use outputformat=dsv and define multiple-character delimiters. > Today delimiterfordsv only uses one character even if multiple are passed. > For example: > when I use: > beeline>!set outputformat dsv > beeline>!set delimiterfordsv "^-^" > I get: > 111201081253106275^31-Oct-2011 > 00:00:00^Text^201605232823^2016051968232151^201605232823_2016051968232151_0_0_1 > > Would like it to be: > 111201081253106275^-^31-Oct-2011 > 00:00:00^-^Text^-^201605232823^-^2016051968232151^-^201605232823_2016051968232151_0_0_1 > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415295#comment-15415295 ] Illya Yalovyy commented on HIVE-14373: -- [~abdullah], Will it be possible to implement the test framework so that it is easy to test other blob stores? Currently the configuration and the project rely on the s3a configuration. I think it will be useful to be able to quickly switch to s3n or other implementations. I would be glad to assist if required. Thank you for this work. It looks really great and useful! > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Abdullah Yousufi > Attachments: HIVE-14373.patch > > > With Hive making improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because they will need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify they work > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
[ https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415253#comment-15415253 ] Hive QA commented on HIVE-12181: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12822904/HIVE-12181.15.patch {color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10445 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-dynamic_partition_pruning.q-vector_char_mapjoin1.q-unionDistinct_2.q-and-12-more - did not produce a TEST-*.xml file TestMsgBusConnection - did not produce a TEST-*.xml file TestQueryLifeTimeHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/839/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/839/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-839/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12822904 - PreCommit-HIVE-MASTER-Build > Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver > --- > > Key: HIVE-12181 > URL: https://issues.apache.org/jira/browse/HIVE-12181 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12181.1.patch, HIVE-12181.10.patch, > HIVE-12181.12.patch, HIVE-12181.13.patch, HIVE-12181.15.patch, > HIVE-12181.2.patch, HIVE-12181.3.patch, HIVE-12181.4.patch, > HIVE-12181.7.patch, HIVE-12181.8.patch, HIVE-12181.9.patch, HIVE-12181.patch, > HIVE-12181.patch > > > There was a performance concern earlier, but HIVE-7587 has fixed that. We can > change the default to true now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline reassigned HIVE-14448: --- Assignee: Matt McCline (was: Sergey Shelukhin) > Queries with predicate fail when ETL split strategy is chosen for ACID tables > - > > Key: HIVE-14448 > URL: https://issues.apache.org/jira/browse/HIVE-14448 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Matt McCline > Attachments: HIVE-14448.patch > > > When the ETL split strategy is applied to ACID tables with predicate pushdown > (SARG enabled), split generation fails for ACID. This bug will usually be > exposed when working with data at scale, because in most other cases only the > BI split strategy is chosen. My guess is that this is happening because the > correct readerSchema is not being picked up when we try to extract SARG > column names. > Quickest way to reproduce is to add the following unit test to > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java > {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} > @Test > public void testETLSplitStrategyForACID() throws Exception { > hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL"); > hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true); > runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)"); > runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'"); > runWorker(hiveConf); > List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL > + " where a = 1"); > int[][] resultData = new int[][] {{1,2}}; > Assert.assertEquals(stringifyValues(resultData), rs); > } > {code} > Back-trace for this failed test is as follows: > {code} > exec.Task: Job Submission failed with exception > 'java.lang.RuntimeException(ORC split generation failed with exception: > java.lang.NegativeArraySizeException)' > java.lang.RuntimeException: ORC split generation failed with exception: > java.lang.NegativeArraySizeException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) > at 
org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417) > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.Nati
[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415220#comment-15415220 ] Matt McCline commented on HIVE-14448: - This is the area I was struggling with for HIVE-14214: ORC Schema Evolution and Predicate Push Down do not work together (no rows returned). I'll take this over. > Queries with predicate fail when ETL split strategy is chosen for ACID tables > - > > Key: HIVE-14448 > URL: https://issues.apache.org/jira/browse/HIVE-14448 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Saket Saurabh >Assignee: Sergey Shelukhin > Attachments: HIVE-14448.patch > > > When the ETL split strategy is applied to ACID tables with predicate pushdown > (SARG enabled), split generation fails for ACID. This bug will usually be > exposed when working with data at scale, because in most other cases only the > BI split strategy is chosen. My guess is that this is happening because the > correct readerSchema is not being picked up when we try to extract SARG > column names. > Quickest way to reproduce is to add the following unit test to > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java > {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} > @Test > public void testETLSplitStrategyForACID() throws Exception { > hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL"); > hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true); > runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)"); > runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'"); > runWorker(hiveConf); > List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL > + " where a = 1"); > int[][] resultData = new int[][] {{1,2}}; > Assert.assertEquals(stringifyValues(resultData), rs); > } > {code} > Back-trace for this failed test is as follows: > {code} > exec.Task: Job Submission failed with exception > 'java.lang.RuntimeException(ORC split generation failed with exception: > java.lang.NegativeArraySizeException)' > java.lang.RuntimeException: ORC split generation failed with exception: > java.lang.NegativeArraySizeException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417) > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119) > at > org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292) > at > org.apache.hadoop.hive.ql.TestTxnC
[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3
[ https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415113#comment-15415113 ] Hive QA commented on HIVE-14373: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12822895/HIVE-14373.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10460 tests executed *Failed tests:* {noformat} TestMsgBusConnection - did not produce a TEST-*.xml file TestQueryLifeTimeHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/838/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/838/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-838/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12822895 - PreCommit-HIVE-MASTER-Build > Add integration tests for hive on S3 > > > Key: HIVE-14373 > URL: https://issues.apache.org/jira/browse/HIVE-14373 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Abdullah Yousufi > Attachments: HIVE-14373.patch > > > With Hive making improvements to run on S3, it would be ideal to have better > integration testing on S3. > These S3 tests won't be able to be executed by HiveQA because they will need > Amazon credentials. We need to write a suite based on ideas from the Hadoop > project where: > - an xml file is provided with S3 credentials > - a committer must run these tests manually to verify they work > - the xml file should not be part of the commit, and hiveqa should not run > these tests. > https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14462) Reduce number of partition check calls in add_partitions
[ https://issues.apache.org/jira/browse/HIVE-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-14462: Attachment: HIVE-14462.3.patch > Reduce number of partition check calls in add_partitions > > > Key: HIVE-14462 > URL: https://issues.apache.org/jira/browse/HIVE-14462 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-14462.1.patch, HIVE-14462.2.patch, > HIVE-14462.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14462) Reduce number of partition check calls in add_partitions
[ https://issues.apache.org/jira/browse/HIVE-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-14462: Status: Open (was: Patch Available) In some corner cases, it is possible that partitions can have nested & multiple directories (e.g. table/ii=1/jj=15/q=10/r=20/s=30/00_0, table/ii=1/jj=15/q=11/r=22/s=33/00_0, where ii and jj are the only partition columns). {{HiveMetastoreChecker.getPartitionName}} ends up resolving partition names as "ii=1/jj=15/q=11/r=22/s=33" and "ii=1/jj=15/q=10/r=20/s=30". When msck is run, it ends up throwing a duplicate-partition exception for ii=1/jj=15 in the metastore. msck then falls back to {{msckAddPartitionsOneByOne}}, which tries to repair one partition at a time and ignores any exceptions. So the job essentially completes, but ends up making lots of calls to the metastore and can be very slow. I will attach the latest patch in RB. Without Patch: msck runtime for 1 partitions in small cluster: *370 seconds* With Patch: msck runtime for 1 partitions in small cluster: *62 seconds* > Reduce number of partition check calls in add_partitions > > > Key: HIVE-14462 > URL: https://issues.apache.org/jira/browse/HIVE-14462 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-14462.1.patch, HIVE-14462.2.patch, > HIVE-14462.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
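A minimal sketch of the resolution issue (a hypothetical helper, not the actual HiveMetaStoreChecker code): the partition name should be cut off at the table's partition-column depth rather than at the leaf directory:
{code:java}
// Hypothetical illustration: keep only as many key=value path components
// as the table has partition columns.
static String partitionName(String leafRelPath, int numPartCols) {
  String[] parts = leafRelPath.split("/");
  return String.join("/", java.util.Arrays.copyOfRange(parts, 0, numPartCols));
}
// partitionName("ii=1/jj=15/q=10/r=20/s=30", 2) -> "ii=1/jj=15"
{code}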
[jira] [Comment Edited] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415069#comment-15415069 ] Sergey Zadoroshnyak edited comment on HIVE-14483 at 8/10/16 10:21 AM: -- please ignore this comment was (Author: spring): please ingore this comment > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure a StringTreeReader which contains a StringDirectTreeReader as the > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024), > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as there is > a guard scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before the invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
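The proposed one-line fix from the description, shown in context as a fragment of commonReadByteArrays (following the guard StringDictionaryTreeReader already uses):
{code:java}
// Grow the scratch vector first so a batchSize larger than the default
// 1024 cannot overflow scratchlcv.vector.
scratchlcv.ensureSize((int) batchSize, false);
lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
{code}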
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415080#comment-15415080 ] Sergey Zadoroshnyak commented on HIVE-14483: [~owen.omalley] [~prasanth_j] Could you please review the pull request? > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure a StringTreeReader which contains a StringDirectTreeReader as the > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024), > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as there is > a guard scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before the invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415067#comment-15415067 ] Sergey Zadoroshnyak edited comment on HIVE-14483 at 8/10/16 10:21 AM: -- please ignore this comment was (Author: spring): please ingore this comment > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure a StringTreeReader which contains a StringDirectTreeReader as the > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024), > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as there is > a guard scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before the invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Zadoroshnyak updated HIVE-14483: --- Attachment: 0001-HIVE-14483.patch Fix java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure a StringTreeReader which contains a StringDirectTreeReader as the > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024), > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as there is > a guard scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before the invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Zadoroshnyak updated HIVE-14483: --- Comment: was deleted (was: Fix java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays) > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > Attachments: 0001-HIVE-14483.patch > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure a StringTreeReader which contains a StringDirectTreeReader as the > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector, boolean[] isNull, final > int batchSize) > scratchlcv is a LongColumnVector with a long[] vector (length 1024), > which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv, result, batchSize); > as a result, in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we receive an > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as there is > a guard scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by the corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line: > scratchlcv.ensureSize((int) batchSize, false); > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before the invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Zadoroshnyak updated HIVE-14483: --- Status: Patch Available (was: Open) > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector,boolean[] isNull, final > int batchSize) > scratchlcv is LongColumnVector with long[] vector (length 1024) > which execute BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv,result, batchSize); > as result in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we received > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line : > scratchlcv.ensureSize((int) batchSize, false) ; > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415068#comment-15415068 ] Sergey Zadoroshnyak commented on HIVE-14483: please ignore this comment > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector,boolean[] isNull, final > int batchSize) > scratchlcv is LongColumnVector with long[] vector (length 1024) > which execute BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv,result, batchSize); > as result in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we received > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line : > scratchlcv.ensureSize((int) batchSize, false) ; > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Zadoroshnyak updated HIVE-14483: --- Comment: was deleted (was: please ignore this comment) > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector,boolean[] isNull, final > int batchSize) > scratchlcv is LongColumnVector with long[] vector (length 1024) > which execute BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv,result, batchSize); > as result in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we received > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line : > scratchlcv.ensureSize((int) batchSize, false) ; > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415069#comment-15415069 ] Sergey Zadoroshnyak commented on HIVE-14483: please ignore this comment > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector,boolean[] isNull, final > int batchSize) > scratchlcv is LongColumnVector with long[] vector (length 1024) > which execute BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv,result, batchSize); > as result in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we received > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line : > scratchlcv.ensureSize((int) batchSize, false) ; > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415067#comment-15415067 ] Sergey Zadoroshnyak commented on HIVE-14483: please ignore this comment > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector,boolean[] isNull, final > int batchSize) > scratchlcv is LongColumnVector with long[] vector (length 1024) > which execute BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv,result, batchSize); > as result in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we received > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line : > scratchlcv.ensureSize((int) batchSize, false) ; > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415048#comment-15415048 ] ASF GitHub Bot commented on HIVE-14483: --- GitHub user szador opened a pull request: https://github.com/apache/hive/pull/96 HIVE-14483 Fix java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays You can merge this pull request into a Git repository by running: $ git pull https://github.com/szador/hive HIVE-14483 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/96.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #96 > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector,boolean[] isNull, final > int batchSize) > scratchlcv is LongColumnVector with long[] vector (length 1024) > which execute BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv,result, batchSize); > as result in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we received > ArrayIndexOutOfBoundsException. 
> If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line : > scratchlcv.ensureSize((int) batchSize, false) ; > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415022#comment-15415022 ] ASF GitHub Bot commented on HIVE-14483: --- Github user szador closed the pull request at: https://github.com/apache/hive/pull/95 > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector,boolean[] isNull, final > int batchSize) > scratchlcv is LongColumnVector with long[] vector (length 1024) > which execute BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv,result, batchSize); > as result in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we received > ArrayIndexOutOfBoundsException. > If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? 
> add only one line : > scratchlcv.ensureSize((int) batchSize, false) ; > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
[ https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415018#comment-15415018 ] ASF GitHub Bot commented on HIVE-14483: --- GitHub user szador opened a pull request: https://github.com/apache/hive/pull/95 HIVE-14483 Fix java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays You can merge this pull request into a Git repository by running: $ git pull https://github.com/szador/hive HIVE-14483_Fix_ArrayIndexOutOfBoundsException Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/95.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #95 commit c7c564141d1e960e33fd582017da19649ef9003d Author: szador Date: 2016-08-10T09:26:22Z HIVE-14483 Fix java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > java.lang.ArrayIndexOutOfBoundsException > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays > -- > > Key: HIVE-14483 > URL: https://issues.apache.org/jira/browse/HIVE-14483 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Sergey Zadoroshnyak >Assignee: Owen O'Malley >Priority: Critical > Fix For: 2.2.0 > > > Error message: > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231) > at > org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268) > at > org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368) > at > org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212) > at > org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) > at > org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737) > at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) > ... 22 more > How to reproduce? > Configure StringTreeReader which contains StringDirectTreeReader as > TreeReader (DIRECT or DIRECT_V2 column encoding) > batchSize = 1026; > invoke method nextVector(ColumnVector previousVector,boolean[] isNull, final > int batchSize) > scratchlcv is LongColumnVector with long[] vector (length 1024) > which execute BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, > scratchlcv,result, batchSize); > as result in method commonReadByteArrays(stream, lengths, scratchlcv, > result, (int) batchSize) we received > ArrayIndexOutOfBoundsException. 
> If we use StringDictionaryTreeReader, then there is no exception, as we have > a verification scratchlcv.ensureSize((int) batchSize, false) before > reader.nextVector(scratchlcv, scratchlcv.vector, batchSize); > These changes were made for Hive 2.1.0 by corresponding commit > https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467 > for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley > How to fix? > add only one line : > scratchlcv.ensureSize((int) batchSize, false) ; > in method > org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream > stream, IntegerReader lengths, > LongColumnVector scratchlcv, > BytesColumnVector result, final int batchSize) before invocation > lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12176) NullPointerException in VectorUDAFMinString
[ https://issues.apache.org/jira/browse/HIVE-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414949#comment-15414949 ] Ashish Kumar commented on HIVE-12176: - Additionally, whenever we hit the "Hive Runtime Error while processing row" error, Hive should also print the offending row. That would be really helpful for debugging. > NullPointerException in VectorUDAFMinString > --- > > Key: HIVE-12176 > URL: https://issues.apache.org/jira/browse/HIVE-12176 > Project: Hive > Issue Type: Bug > Components: SQL, UDF, Vectorization >Affects Versions: 1.2.1 > Environment: HDP 2.3 + Kerberos >Reporter: Hari Sekhon >Assignee: Jitendra Nath Pandey >Priority: Critical > Attachments: hive-nullpointexception-mr.txt, > hive-nullpointexception-tez.txt > > > Hive gets the following NullPointerException when trying to do a group by > aggregate. > This occurs whether the engine is Tez or MR, but I'm told this works on our > other cluster which is HDP 2.2. > I'm attaching the full outputs from the hive session, but here is the crux of > it: > {code} > Error: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing row > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163) > ... 
8 more > Caused by: java.lang.NullPointerException > at java.lang.System.arraycopy(Native Method) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.gen.VectorUDAFMinString$Aggregation.assign(VectorUDAFMinString.java:78) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.gen.VectorUDAFMinString$Aggregation.checkValue(VectorUDAFMinString.java:65) > at > org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.gen.VectorUDAFMinString.aggregateInput(VectorUDAFMinString.java:279) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processAggregators(VectorGroupByOperator.java:157) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.processBatch(VectorGroupByOperator.java:335) > at > org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:880) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:117) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97) > at > org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45) > ... 9 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
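The root cause in the trace above is visible in the innermost frame: System.arraycopy throws NullPointerException when its source array is null, which is plausibly what happens when the generated assign() copies from a BytesColumnVector slot that was never populated. A minimal, self-contained illustration (not Hive code; names are made up):
{code}
// Standalone demo of the failure mode: arraycopy rejects a null source array.
public class ArrayCopyNpe {
  public static void main(String[] args) {
    byte[] src = null;                    // stands in for an unset vector entry
    byte[] dst = new byte[8];
    System.arraycopy(src, 0, dst, 0, 4);  // throws java.lang.NullPointerException
  }
}
{code}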
[jira] [Commented] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez
[ https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414941#comment-15414941 ] Ashish Kumar commented on HIVE-7765: What's the fix version for this issue? > Null pointer error with UNION ALL on partitioned tables using Tez > - > > Key: HIVE-7765 > URL: https://issues.apache.org/jira/browse/HIVE-7765 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0, 0.13.1 > Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1; > Hadoop 2.2.6, Tez 0.5.2, Hive 0.14.0, CentOS 6.6 >Reporter: Chris Dragga > > When executing a UNION ALL query in Tez over partitioned tables where at > least one table is empty, Hive fails to execute the query, returning the > message "FAILED: NullPointerException null". No stack trace accompanies this > message. Removing partitioning solves this problem, as does switching to > MapReduce as the execution engine. > This can be reproduced using a variant of the example tables from the > "Getting Started" documentation on the Hive wiki. To create the schema, use > CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); > CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); > Then, load invites with data (e.g., using the instructions > [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]) > and execute the following: > SELECT * FROM invites > UNION ALL > SELECT * FROM empty_invites; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14501) MiniTez test for union_type_chk.q is slow
[ https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414926#comment-15414926 ] Hive QA commented on HIVE-14501: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12822872/HIVE-14501.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10460 tests executed *Failed tests:* {noformat} TestMsgBusConnection - did not produce a TEST-*.xml file TestQueryLifeTimeHook - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/837/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/837/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-837/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12822872 - PreCommit-HIVE-MASTER-Build > MiniTez test for union_type_chk.q is slow > - > > Key: HIVE-14501 > URL: https://issues.apache.org/jira/browse/HIVE-14501 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14501.1.patch > > > union_type_chk.q runs on minimr and minitez but the test itself explicitly > sets execution engine as mr. It takes around 10 mins to run this test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14340) Add a new hook triggers before query compilation and after query execution
[ https://issues.apache.org/jira/browse/HIVE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414920#comment-15414920 ] Lefty Leverenz commented on HIVE-14340: --- Doc note: This adds *hive.query.lifetime.hooks* to HiveConf.java in release 2.2.0, so it will need to be documented in the Configuration Properties wikidoc. * [Configuration Properties -- Query and DDL Execution | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution] > Add a new hook triggers before query compilation and after query execution > -- > > Key: HIVE-14340 > URL: https://issues.apache.org/jira/browse/HIVE-14340 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Chao Sun >Assignee: Chao Sun > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-14340.0.patch, HIVE-14340.1.patch, > HIVE-14340.2.patch > > > In some cases we may need to have a hook that activates before query > compilation and after query execution. For instance, to dynamically generate a UDF > specifically for the running query and clean up its resources after the query > is done. The current hooks only cover pre & post semantic analysis and pre & > post query execution, which doesn't fit the requirement. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
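For the wiki write-up, it may help to show what such a hook looks like. The sketch below assumes the interface shape described in this issue (callbacks before/after compilation and before/after execution); the interface and context class names follow the patch's naming and should be verified against the committed code.
{code}
// Hypothetical query-lifetime hook (assumed API, see note above).
import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;

public class TempUdfHook implements QueryLifeTimeHook {
  @Override
  public void beforeCompile(QueryLifeTimeHookContext ctx) {
    // e.g. generate and register a UDF used only by this query
  }

  @Override
  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
    // compilation finished; hasError reports compile failures
  }

  @Override
  public void beforeExecution(QueryLifeTimeHookContext ctx) {
    // the query is about to run
  }

  @Override
  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
    // e.g. drop the temporary UDF and release its resources
  }
}
{code}
The implementing class would then be registered by listing its name in *hive.query.lifetime.hooks*, comma-separated like the other hook properties.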
[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
[ https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414874#comment-15414874 ] Lefty Leverenz commented on HIVE-14035: --- So did I. > Enable predicate pushdown to delta files created by ACID Transactions > - > > Key: HIVE-14035 > URL: https://issues.apache.org/jira/browse/HIVE-14035 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Saket Saurabh >Assignee: Saket Saurabh > Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, > HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, > HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, > HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, > HIVE-14035.13.patch, HIVE-14035.patch > > > In current Hive version, delta files created by ACID transactions do not > allow predicate pushdown if they contain any update/delete events. This is > done to preserve correctness when following a multi-version approach during > event collapsing, where an update event overwrites an existing insert event. > This JIRA proposes to split an update event into a combination of a delete > event followed by a new insert event, that can enable predicate push down to > all delta files without breaking correctness. To support backward > compatibility for this feature, this JIRA also proposes to add some sort of > versioning to ACID that can allow different versions of ACID transactions to > co-exist together. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14412) Add a timezone-aware timestamp
[ https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414860#comment-15414860 ] Rui Li commented on HIVE-14412: --- If I run {{mvn dependency:tree}}, only sshj-0.8.1 is found in the dependency tree, and its pom doesn't have this unbounded jdk version range. Not sure why newer versions of sshj (e.g. 0.10.0, 0.10.1-SNAPSHOT) are also downloaded during the build. > Add a timezone-aware timestamp > -- > > Key: HIVE-14412 > URL: https://issues.apache.org/jira/browse/HIVE-14412 > Project: Hive > Issue Type: Sub-task > Components: Hive >Reporter: Rui Li >Assignee: Rui Li > Attachments: HIVE-14412.1.patch, HIVE-14412.1.patch > > > Java's Timestamp stores the time elapsed since the epoch. While it's by > itself unambiguous, ambiguity arises when we parse a string into a timestamp, or > convert a timestamp to a string, causing problems like HIVE-14305. > To solve the issue, I think we should make timestamp timezone-aware. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
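The ambiguity the description refers to is easy to demonstrate in plain Java: the same wall-clock string parses to different epoch instants depending on the JVM's default time zone, so it is the string conversion, not the stored value, that is ambiguous. A self-contained example:
{code}
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampAmbiguity {
  public static void main(String[] args) {
    String s = "2016-08-10 12:00:00";

    TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
    long utc = Timestamp.valueOf(s).getTime();   // parsed as a UTC wall clock

    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
    long la = Timestamp.valueOf(s).getTime();    // same string, different instant

    System.out.println(la - utc);  // 25200000 ms, i.e. the 7-hour PDT offset
  }
}
{code}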
[jira] [Commented] (HIVE-14322) Postgres db issues after Datanucleus 4.x upgrade
[ https://issues.apache.org/jira/browse/HIVE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414845#comment-15414845 ] Lefty Leverenz commented on HIVE-14322: --- Doc note: This adds *datanucleus.rdbms.initializeColumnInfo* to HiveConf.java in 2.0.2, 2.1.1, and 2.2.0 so it will need to be documented in the MetaStore section of Configuration Properties for those releases. * [Configuration Properties -- MetaStore | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-MetaStore] What other values can be used, besides NONE? Added labels TODOC2.0 (for 2.0.2), TODOC2.1.1, and TODOC2.2 (not needed if 2.1.1 is released before 2.2.0). > Postgres db issues after Datanucleus 4.x upgrade > > > Key: HIVE-14322 > URL: https://issues.apache.org/jira/browse/HIVE-14322 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.0.1 >Reporter: Thejas M Nair >Assignee: Sergey Shelukhin > Labels: TODOC2.0, TODOC2.1.1, TODOC2.2 > Fix For: 2.2.0, 2.1.1, 2.0.2 > > Attachments: HIVE-14322.02.patch, HIVE-14322.03.patch, > HIVE-14322.04.patch, HIVE-14322.1.patch > > > With the upgrade to datanucleus 4.x versions in HIVE-6113, hive does not > work properly with postgres. > The nullable fields in the database have string "NULL::character varying" > instead of real NULL values. This causes various issues. > One example is - > {code} > hive> create table t(i int); > OK > Time taken: 1.9 seconds > hive> create view v as select * from t; > OK > Time taken: 0.542 seconds > hive> select * from v; > FAILED: SemanticException Unable to fetch table v. > java.net.URISyntaxException: Relative path in absolute URI: > NULL::character%20varying > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
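For the configuration write-up, a hive-site.xml entry would presumably look like the fragment below; the property name and the NONE value come from this issue, while any other accepted values would be defined by DataNucleus rather than Hive, so the DataNucleus documentation is the authority there.
{code}
<!-- Assumed hive-site.xml fragment: property name and the NONE value are
     taken from this issue; other values, if any, are DataNucleus-defined. -->
<property>
  <name>datanucleus.rdbms.initializeColumnInfo</name>
  <value>NONE</value>
</property>
{code}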
[jira] [Updated] (HIVE-14322) Postgres db issues after Datanucleus 4.x upgrade
[ https://issues.apache.org/jira/browse/HIVE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-14322: -- Labels: TODOC2.0 TODOC2.1.1 TODOC2.2 (was: ) > Postgres db issues after Datanucleus 4.x upgrade > > > Key: HIVE-14322 > URL: https://issues.apache.org/jira/browse/HIVE-14322 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.0.1 >Reporter: Thejas M Nair >Assignee: Sergey Shelukhin > Labels: TODOC2.0, TODOC2.1.1, TODOC2.2 > Fix For: 2.2.0, 2.1.1, 2.0.2 > > Attachments: HIVE-14322.02.patch, HIVE-14322.03.patch, > HIVE-14322.04.patch, HIVE-14322.1.patch > > > With the upgrade to datanucleus 4.x versions in HIVE-6113, hive does not > work properly with postgres. > The nullable fields in the database have string "NULL::character varying" > instead of real NULL values. This causes various issues. > One example is - > {code} > hive> create table t(i int); > OK > Time taken: 1.9 seconds > hive> create view v as select * from t; > OK > Time taken: 0.542 seconds > hive> select * from v; > FAILED: SemanticException Unable to fetch table v. > java.net.URISyntaxException: Relative path in absolute URI: > NULL::character%20varying > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)