[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416662#comment-15416662
 ] 

Lefty Leverenz commented on HIVE-14270:
---

Thanks for the fix, Sergio.

+1 for the configuration parameters.

> Write temporary data to HDFS when doing inserts on tables located on S3
> ---
>
> Key: HIVE-14270
> URL: https://issues.apache.org/jira/browse/HIVE-14270
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, 
> HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch, HIVE-14270.6.patch
>
>
> Currently, when doing INSERT statements on tables located on S3, Hive writes 
> and reads temporary (or intermediate) files on S3 as well. 
> If HDFS is still the default filesystem for Hive, then we can keep such 
> temporary files on HDFS to keep things running faster.
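As a rough illustration of the idea (a minimal sketch only; the helper class, 
scheme checks, and scratch path below are hypothetical, not the attached patch):
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ScratchDirChooser {
  // Hypothetical helper: decide where INSERT writes its intermediate data.
  static Path chooseScratchDir(Configuration conf, Path tableDir) throws IOException {
    String scheme = tableDir.toUri().getScheme();
    boolean onS3 = "s3".equals(scheme) || "s3n".equals(scheme) || "s3a".equals(scheme);
    if (onS3) {
      // Keep temporary files on the default filesystem (typically HDFS).
      return new Path(FileSystem.get(conf).getUri().toString(), "/tmp/hive-scratch");
    }
    // Otherwise stage next to the table, as today.
    return new Path(tableDir, ".hive-staging");
  }
}
{code}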



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416644#comment-15416644
 ] 

Hive QA commented on HIVE-14270:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823131/HIVE-14270.6.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10410 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/848/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/848/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-848/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823131 - PreCommit-HIVE-MASTER-Build

> Write temporary data to HDFS when doing inserts on tables located on S3
> ---
>
> Key: HIVE-14270
> URL: https://issues.apache.org/jira/browse/HIVE-14270
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, 
> HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch, HIVE-14270.6.patch
>
>
> Currently, when doing INSERT statements on tables located on S3, Hive writes 
> and reads temporary (or intermediate) files on S3 as well. 
> If HDFS is still the default filesystem for Hive, then we can keep such 
> temporary files on HDFS to keep things running faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases

2016-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416521#comment-15416521
 ] 

Hive QA commented on HIVE-14511:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823128/HIVE-14511.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10407 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.ql.metadata.TestHiveMetaStoreChecker.testPartitionsCheck
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/847/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/847/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-847/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823128 - PreCommit-HIVE-MASTER-Build

> Improve MSCK for partitioned table to deal with special cases
> -
>
> Key: HIVE-14511
> URL: https://issues.apache.org/jira/browse/HIVE-14511
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14511.01.patch
>
>
> Some users will have a folder rather than a file under the last partition 
> folder. However, MSCK searches for the leaf folder rather than the last 
> partition folder. We need to improve that.
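To illustrate the case described above (the layout below is made up):
{noformat}
/warehouse/t1/year=2016/month=08/                  <- last partition folder (what MSCK should register)
/warehouse/t1/year=2016/month=08/extra/part-00000  <- leaf folder holding the data file
{noformat}
Here MSCK descends to the leaf folder {{extra/}} instead of stopping at the last 
partition folder {{month=08/}}, so the partition is not picked up correctly.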



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-12546) Hive beeline doesn't support arrow keys and tab

2016-08-10 Thread Junjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416378#comment-15416378
 ] 

Junjie Chen edited comment on HIVE-12546 at 8/11/16 2:22 AM:
-

Hi [~sershe],
I also tried with the OS X (10.11.6) terminal; it works fine over ssh.

Can we close this?


was (Author: junjie):
Hi [~sershe],
I also tried with the OS X (10.11.6) terminal; it works fine over ssh. Could you 
please specify which version of Beeline and what environment you were using? It 
would be better if you could dump the full environment and elaborate the 
reproduction steps. 

> Hive beeline doesn't support arrow keys and tab
> ---
>
> Key: HIVE-12546
> URL: https://issues.apache.org/jira/browse/HIVE-12546
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Sergey Shelukhin
>
> On CLI, up/down arrows navigate history, tab auto-completes, and left/right 
> arrows move around the command text.
> Trying to use beeline, I see that these just print key codes or the tab into 
> the command text. 
> This should be fixed before removing CLI in favor of beeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416380#comment-15416380
 ] 

Hive QA commented on HIVE-14483:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823125/HIVE-14483.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10460 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/846/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/846/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-846/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823125 - PreCommit-HIVE-MASTER-Build

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Sergey Zadoroshnyak
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-14483.01.patch
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader, which contains a StringDirectTreeReader as the 
> TreeReader (DIRECT or DIRECT_V2 column encoding), with batchSize = 1026, and 
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull, 
> final int batchSize).
> scratchlcv is a LongColumnVector with a long[] vector (length 1024), which 
> executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, scratchlcv, 
> result, batchSize);
> as a result, in the method commonReadByteArrays(stream, lengths, scratchlcv, 
> result, (int) batchSize) we receive an ArrayIndexOutOfBoundsException.
> If we use StringDictionaryTreeReader, then there is no exception, as we have 
> the verification scratchlcv.ensureSize((int) batchSize, false) before 
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were made for Hive 2.1.0 by the corresponding commit 
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
>  for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley.
> How to fix?
> Add only one line:
> scratchlcv.ensureSize((int) batchSize, false);
> in the method 
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
>  stream, IntegerReader lengths, LongColumnVector scratchlcv, 
> BytesColumnVector result, final int batchSize) before the invocation 
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);

[jira] [Commented] (HIVE-12546) Hive beeline doesn't support arrow keys and tab

2016-08-10 Thread Junjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416378#comment-15416378
 ] 

Junjie Chen commented on HIVE-12546:


Hi [~sershe],
I also tried with the OS X (10.11.6) terminal; it works fine over ssh. Could you 
please specify which version of Beeline and what environment you were using? It 
would be better if you could dump the full environment and elaborate the 
reproduction steps. 

> Hive beeline doesn't support arrow keys and tab
> ---
>
> Key: HIVE-12546
> URL: https://issues.apache.org/jira/browse/HIVE-12546
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Sergey Shelukhin
>
> On CLI, up/down arrows navigate history, tab auto-completes, and left/right 
> arrows move around the command text.
> Trying to use beeline, I see that these just print key codes or the tab into 
> the command text. 
> This should be fixed before removing CLI in favor of beeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14063) beeline to auto connect to the HiveServer2

2016-08-10 Thread Darren Lo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416318#comment-15416318
 ] 

Darren Lo commented on HIVE-14063:
--

Hi Vihang,

Thanks for considering my points.

In my opinion, the fact that we're using a general-purpose JDBC client under 
the hood should be an implementation detail. If you're trying to make the new 
HS2-based experience as close as possible to the old Hive CLI experience (which 
this JIRA is required for, but isn't focused on), part of that is making the 
configs reasonable to hive users. If hive clients have to know that beeline is 
a special thing that takes configs in a way they're not used to, that's a 
burden on those clients. People can probably get past that burden, sure, but if 
it can be avoided, so much the better.

Perhaps when we invoke "hive" (and we want that to internally use beeline / 
HS2) it can read files from nice, hive-specific locations, and then pass those 
to beeline in whatever way works for beeline. This hides the beeline details 
from the hive client. It's a bit trickier when invoking beeline directly if you 
want to maintain the purity of it being a general-purpose JDBC client.

If we use standard Hadoop conf format, clients could decide to leverage the 
hadoop credential provider to protect / obfuscate passwords.

If users are going to connect to an HS2 that is different from what's listed in 
/etc/hive/conf, then with Sergio's suggestion we can load from somewhere like 
~/.hive/ to get those overrides, or they can set 
HIVE_CONF_DIR=/path/to/custom/hive/conf. They can also just type !connect in 
the prompt.

Also, just a thought, but if the purity of beeline is paramount, you could 
maybe implement a feature where you give it some beeline-specific configuration 
file with something like:
{noformat}
extra.configs.loader=org.apache.HiveToBeelineConfigLoader
{noformat}
This would allow beeline to have no native knowledge of Hive. Instead, it 
would have a generic config-loading plugin mechanism that Hive could leverage 
to make it load from hive-site.xml. Quite a bit of extra work to get the 
configs, though, I know.
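A minimal sketch of what that could look like (purely illustrative; the 
interface and loader names below are hypothetical, not an existing Beeline API):
{code:java}
import java.util.Properties;

// Hypothetical Beeline-side extension point: Beeline stays a generic JDBC
// client and only knows how to call a loader named in its own config file.
public interface ExtraConfigsLoader {
  // Returns connection properties (url, user, password, ...) from an
  // external source; Beeline would merge them into its defaults.
  Properties loadConfigs();
}

// Hypothetical Hive-provided implementation that would read hive-site.xml.
class HiveToBeelineConfigLoader implements ExtraConfigsLoader {
  @Override
  public Properties loadConfigs() {
    Properties props = new Properties();
    // e.g. parse hive-site.xml from HIVE_CONF_DIR and map the relevant
    // entries to JDBC connection properties.
    props.setProperty("url", "jdbc:hive2://localhost:10000/default");
    return props;
  }
}
{code}
Beeline would instantiate the class named by {{extra.configs.loader}} via 
reflection and merge the returned properties into its defaults before prompting 
the user.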

> beeline to auto connect to the HiveServer2
> --
>
> Key: HIVE-14063
> URL: https://issues.apache.org/jira/browse/HIVE-14063
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: beeline.conf.template
>
>
> Currently one has to give a jdbc:hive2 URL in order for Beeline to connect to a 
> hiveserver2 instance. It would be great if Beeline could get the info somehow 
> (from a properties file at a well-known location?) and connect automatically 
> if the user doesn't specify such a URL. If the properties file is not present, 
> then beeline would expect the user to provide the URL and credentials using 
> !connect or ./beeline -u .. commands.
> While Beeline is flexible (being a mere JDBC client), most environments would 
> have just a single HS2. Having users manually connect to it via either 
> "beeline ~/.propsfile" or -u or !connect statements degrades the user 
> experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14376) Schema evolution tests takes a long time

2016-08-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14376:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

No related test failures. Thanks for the review! Committed patch to master.

> Schema evolution tests takes a long time
> 
>
> Key: HIVE-14376
> URL: https://issues.apache.org/jira/browse/HIVE-14376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14376.1.patch, HIVE-14376.2.patch
>
>
> schema_evol_* tests (14 tests) take 47 mins in Tez and 40 mins in TestCliDriver. 
> The same set of tests was added to LLAP as well in HIVE-14355, which will add 
> some more time. Most tests use INSERT INTO TABLE, which can be slow and is not 
> needed. Even some individual tests take about 5 mins to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14504) tez_join_hash.q test is slow

2016-08-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416243#comment-15416243
 ] 

Prasanth Jayachandran commented on HIVE-14504:
--

This will also bring the runtime for MiniLlap from 7 min to 1m 30s.

> tez_join_hash.q test is slow
> 
>
> Key: HIVE-14504
> URL: https://issues.apache.org/jira/browse/HIVE-14504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14504.1.patch
>
>
> tez_join_hash.q also explicitly sets the execution engine to mr, which slows 
> down the entire test. The test takes around 7 mins. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14376) Schema evolution tests takes a long time

2016-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416229#comment-15416229
 ] 

Hive QA commented on HIVE-14376:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823116/HIVE-14376.2.patch

{color:green}SUCCESS:{color} +1 due to 26 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10406 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/845/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/845/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-845/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823116 - PreCommit-HIVE-MASTER-Build

> Schema evolution tests takes a long time
> 
>
> Key: HIVE-14376
> URL: https://issues.apache.org/jira/browse/HIVE-14376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Attachments: HIVE-14376.1.patch, HIVE-14376.2.patch
>
>
> schema_evol_* tests (14 tests) take 47 mins in Tez and 40 mins in TestCliDriver. 
> The same set of tests was added to LLAP as well in HIVE-14355, which will add 
> some more time. Most tests use INSERT INTO TABLE, which can be slow and is not 
> needed. Even some individual tests take about 5 mins to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14513) Enhance custom query feature in LDAP atn to support resultset of ldap groups

2016-08-10 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14513:
-
Attachment: HIVE-14513.patch

The proposed fix uses an existing LDAP configuration property 
{{hive.server2.authentication.ldap.groupMembershipKey}} to find the users if 
the user-configured LDAP query returns a group instead of a user. This property 
is already used for group filters.
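As a rough sketch of that idea (illustrative only, not the attached patch; the 
helper class below is made up), resolving the users of a returned group via the 
configured membership attribute with plain JNDI might look like:
{code:java}
import java.util.ArrayList;
import java.util.List;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;

public class GroupMemberResolver {
  // groupDn: e.g. "uid=group1,ou=Groups,dc=example,dc=com"
  // membershipKey: value of hive.server2.authentication.ldap.groupMembershipKey,
  // e.g. "member"
  static List<String> resolveMembers(DirContext ctx, String groupDn,
      String membershipKey) throws NamingException {
    List<String> userDns = new ArrayList<>();
    // Fetch only the membership attribute of the group entry.
    Attributes attrs = ctx.getAttributes(groupDn, new String[] { membershipKey });
    Attribute members = attrs.get(membershipKey);
    if (members != null) {
      NamingEnumeration<?> values = members.getAll();
      while (values.hasMore()) {
        // Each value is a full user DN to authenticate against.
        userDns.add(String.valueOf(values.next()));
      }
    }
    return userDns;
  }
}
{code}
Each returned value is a full user DN, which is what the authenticator already 
expects from the custom query.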

> Enhance custom query feature in LDAP atn to support resultset of ldap groups
> 
>
> Key: HIVE-14513
> URL: https://issues.apache.org/jira/browse/HIVE-14513
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-14513.patch
>
>
> LDAP Authenticator can be configured to use a result set from an LDAP query to 
> authenticate. However, it is expected that this LDAP query would only return 
> a set of users (i.e. full DNs for the users in LDAP).
> However, it's not always straightforward to author queries that 
> return users. For example, say you would like to allow "all users from group1 
> and group2" to be authenticated. The LDAP query has to return a union of all 
> members of group1 and group2.
> For example, one common configuration is that groups contain a list of their 
> users:
>   "dn: uid=group1,ou=Groups,dc=example,dc=com",
>   "distinguishedName: uid=group1,ou=Groups,dc=example,dc=com",
>   "objectClass: top",
>   "objectClass: groupOfNames",
>   "objectClass: ExtensibleObject",
>   "cn: group1",
>   "ou: Groups",
>   "sn: group1",
>   "member: uid=user1,ou=People,dc=example,dc=com",
> The query 
> {{(&(objectClass=groupOfNames)(|(cn=group1)(cn=group2)))}}
> will return the entries
> uid=group1,ou=Groups,dc=example,dc=com
> uid=group2,ou=Groups,dc=example,dc=com
> but there is no means to form a query that would return just the values of 
> the "member" attributes. (LDAP client tools are able to do this by filtering 
> the attributes on these entries.)
> So it would be useful to have support for specifying queries that 
> return groups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-08-10 Thread Abdullah Yousufi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416168#comment-15416168
 ] 

Abdullah Yousufi commented on HIVE-14373:
-

Hey [~poeppt], that's a pretty good idea and also the ideal behavior for 
blobstore testing. We'd need to investigate how that would work and what 
changes would be necessary on hiveqa. Perhaps it would make more sense as a 
follow-up ticket once this change is in? Let me know if that sounds reasonable. 
Also, thanks for your reviews on the patch!

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.patch
>
>
> With Hive making improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be able to be executed by HiveQA because they need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials (an example is sketched below)
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure
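Following the hadoop-aws convention from the link above, such a credentials file 
might look like this (illustrative values; fs.s3a.access.key and 
fs.s3a.secret.key are the standard Hadoop S3A property names):
{code:xml}
<!-- auth-keys.xml: kept out of version control, supplied by the committer -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_AWS_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
{code}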



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-10 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14396:
---
Attachment: HIVE-14396.1.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch
>
>
> Currently there are three different failures.
> Set hive.cbo.returnpath.hiveop=true for all cases.
> 1) The first case is a wrong result for the following query:
> {code:title=failure 1 Wrong result}
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> This occurs due to a bug in HiveCalciteUtil.getExprNodes: while looking for the 
> corresponding expression for an aggregate function's argument, the wrong index 
> is used.
> 2) An out-of-bounds exception for the following:
> {code}
> set hive.map.aggr=false
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> The above happens while converting a Calcite aggregation to Hive's group-by 
> operator.
> 3) Once the above exception is fixed, the same query with 
> hive.map.aggr=false gives wrong results. The problem in this case is that while 
> creating the expression for an aggregate function's argument we end up with the 
> wrong column info from the underlying reduce sink operator. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-10 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14396:
---
Status: Patch Available  (was: Open)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch
>
>
> Currently there are three different failures.
> Set hive.cbo.returnpath.hiveop=true for all cases.
> 1) The first case is a wrong result for the following query:
> {code:title=failure 1 Wrong result}
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> This occurs due to a bug in HiveCalciteUtil.getExprNodes: while looking for the 
> corresponding expression for an aggregate function's argument, the wrong index 
> is used.
> 2) An out-of-bounds exception for the following:
> {code}
> set hive.map.aggr=false
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> The above happens while converting a Calcite aggregation to Hive's group-by 
> operator.
> 3) Once the above exception is fixed, the same query with 
> hive.map.aggr=false gives wrong results. The problem in this case is that while 
> creating the expression for an aggregate function's argument we end up with the 
> wrong column info from the underlying reduce sink operator. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14507) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters failure

2016-08-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-14507.
--
Resolution: Duplicate

Duplicate of HIVE-14420. The issue is with tez counters not being published 
correctly from RS. 

> org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
>  failure
> 
>
> Key: HIVE-14507
> URL: https://issues.apache.org/jira/browse/HIVE-14507
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> Fails locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14504) tez_join_hash.q test is slow

2016-08-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14504:
-
Status: Patch Available  (was: Open)

> tez_join_hash.q test is slow
> 
>
> Key: HIVE-14504
> URL: https://issues.apache.org/jira/browse/HIVE-14504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14504.1.patch
>
>
> tez_join_hash.q also explicitly sets the execution engine to mr, which slows 
> down the entire test. The test takes around 7 mins. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14504) tez_join_hash.q test is slow

2016-08-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14504:
-
Attachment: HIVE-14504.1.patch

This patch removes explicitly setting the execution engine to mr; the explicit 
engine switch was originally there to check correctness against MR. I reran the 
test using TestCliDriver to verify correctness with MR. After the separation, 
the test now runs in <4 mins on Tez and 1m 30s on MR. If both run in parallel, 
this should be much better than what we had before. 

[~sseth] can you please review?

> tez_join_hash.q test is slow
> 
>
> Key: HIVE-14504
> URL: https://issues.apache.org/jira/browse/HIVE-14504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14504.1.patch
>
>
> tez_join_hash.q also explicitly sets the execution engine to mr, which slows 
> down the entire test. The test takes around 7 mins. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416108#comment-15416108
 ] 

Sergey Shelukhin commented on HIVE-14448:
-

+1 pending tests

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because otherwise in most cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract the SARG 
> column names.
> Quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>  @Test
>   public void testETLSplitStrategyForACID() throws Exception {
> hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
> hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
> runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
> runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
> runWorker(hiveConf);
> List rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  
> + " where a = 1");
> int[][] resultData = new int[][] {{1,2}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
>   at sun.reflect.N

[jira] [Updated] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Zadoroshnyak updated HIVE-14483:
---
Attachment: (was: 0001-HIVE-14483.patch)

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Sergey Zadoroshnyak
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-14483.01.patch
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader, which contains a StringDirectTreeReader as the 
> TreeReader (DIRECT or DIRECT_V2 column encoding), with batchSize = 1026, and 
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull, 
> final int batchSize).
> scratchlcv is a LongColumnVector with a long[] vector (length 1024), which 
> executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, scratchlcv, 
> result, batchSize);
> as a result, in the method commonReadByteArrays(stream, lengths, scratchlcv, 
> result, (int) batchSize) we receive an ArrayIndexOutOfBoundsException.
> If we use StringDictionaryTreeReader, then there is no exception, as we have 
> the verification scratchlcv.ensureSize((int) batchSize, false) before 
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were made for Hive 2.1.0 by the corresponding commit 
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
>  for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley.
> How to fix?
> Add only one line:
> scratchlcv.ensureSize((int) batchSize, false);
> in the method 
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
>  stream, IntegerReader lengths, LongColumnVector scratchlcv, 
> BytesColumnVector result, final int batchSize) before the invocation 
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
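For clarity, the one-line fix described above would sit in context roughly as 
follows (a sketch based on the description; class wrapper, imports, and the 
rest of the method body are omitted, so this is not the exact patch):
{code:java}
// In org.apache.orc.impl.TreeReaderFactory.BytesColumnVectorUtil:
private static void commonReadByteArrays(InStream stream, IntegerReader lengths,
    LongColumnVector scratchlcv, BytesColumnVector result,
    final int batchSize) throws IOException {
  // Proposed fix: grow the scratch vector so it can hold batchSize entries
  // (it is allocated with the default length of 1024, while batchSize may be
  // larger, e.g. 1026).
  scratchlcv.ensureSize((int) batchSize, false);
  lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
  // ... existing logic that reads the byte arrays continues here ...
}
{code}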



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14426) Extensive logging on info level in WebHCat

2016-08-10 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-14426:
--
Attachment: HIVE-14426.3.patch

Instead of changing the log level, an explicit parameter is added to remove the 
possibility of an unintended data leak.

> Extensive logging on info level in WebHCat
> --
>
> Key: HIVE-14426
> URL: https://issues.apache.org/jira/browse/HIVE-14426
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14426.2.patch, HIVE-14426.3.patch, HIVE-14426.patch
>
>
> There is extensive logging in WebHCat at the info level, and even some 
> sensitive information could be logged



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416083#comment-15416083
 ] 

Sergey Zadoroshnyak commented on HIVE-14483:


After upgrading to Hive 2.1.0, we only found the exception for 
StringDirectTreeReader. 

But I think you should ask [~owen.omalley]; he is responsible for the user 
story https://issues.apache.org/jira/browse/HIVE-12159, and he keeps silent...

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Sergey Zadoroshnyak
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch, HIVE-14483.01.patch
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader, which contains a StringDirectTreeReader as the 
> TreeReader (DIRECT or DIRECT_V2 column encoding), with batchSize = 1026, and 
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull, 
> final int batchSize).
> scratchlcv is a LongColumnVector with a long[] vector (length 1024), which 
> executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, scratchlcv, 
> result, batchSize);
> as a result, in the method commonReadByteArrays(stream, lengths, scratchlcv, 
> result, (int) batchSize) we receive an ArrayIndexOutOfBoundsException.
> If we use StringDictionaryTreeReader, then there is no exception, as we have 
> the verification scratchlcv.ensureSize((int) batchSize, false) before 
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were made for Hive 2.1.0 by the corresponding commit 
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
>  for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley.
> How to fix?
> Add only one line:
> scratchlcv.ensureSize((int) batchSize, false);
> in the method 
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
>  stream, IntegerReader lengths, LongColumnVector scratchlcv, 
> BytesColumnVector result, final int batchSize) before the invocation 
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14515) Schema evolution uses slow INSERT INTO .. VALUES

2016-08-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14515:

Summary: Schema evolution uses slow INSERT INTO .. VALUES  (was: Schema 
evolution use slow INSERT INTO .. VALUES)

> Schema evolution uses slow INSERT INTO .. VALUES
> 
>
> Key: HIVE-14515
> URL: https://issues.apache.org/jira/browse/HIVE-14515
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
>
> Use LOAD DATA LOCAL INPATH and INSERT INTO TABLE ... SELECT * FROM instead.
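As an illustration of the intended pattern (the table and file names below are 
made up):
{code:sql}
-- Slow: each INSERT INTO ... VALUES statement launches its own job
-- INSERT INTO TABLE schema_evol_test VALUES (1, 'original'), (2, 'original');

-- Faster: bulk-load a prepared local file, then populate the target in one job
LOAD DATA LOCAL INPATH 'data/files/schema_evolution/part1.txt'
  INTO TABLE schema_evol_staging;
INSERT INTO TABLE schema_evol_test SELECT * FROM schema_evol_staging;
{code}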



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416077#comment-15416077
 ] 

Matt McCline commented on HIVE-14448:
-

Submitted new approach.

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because otherwise in most cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract the SARG 
> column names.
> Quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>  @Test
>   public void testETLSplitStrategyForACID() throws Exception {
> hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
> hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
> runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
> runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
> runWorker(hiveConf);
> List rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  
> + " where a = 1");
> int[][] resultData = new int[][] {{1,2}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
>   at sun.reflect.Na

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: Patch Available  (was: In Progress)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because otherwise in most cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract the SARG 
> column names.
> Quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>  @Test
>   public void testETLSplitStrategyForACID() throws Exception {
> hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
> hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
> runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
> runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
> runWorker(hiveConf);
> List rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  
> + " where a = 1");
> int[][] resultData = new int[][] {{1,2}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Nat

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Attachment: HIVE-14448.01.patch

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug is usually 
> exposed when working with data at scale, because in most other cases only the 
> BI split strategy is chosen. My guess is that this happens because the 
> correct readerSchema is not being picked up when we try to extract the SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>  @Test
>   public void testETLSplitStrategyForACID() throws Exception {
> hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
> hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
> runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
> runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
> runWorker(hiveConf);
> List rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  
> + " where a = 1");
> int[][] resultData = new int[][] {{1,2}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {code}

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: In Progress  (was: Patch Available)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.patch

[jira] [Commented] (HIVE-14405) Have tests log to the console along with hive.log

2016-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416074#comment-15416074
 ] 

Hive QA commented on HIVE-14405:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12821686/HIVE-14405.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10460 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/844/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/844/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-844/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12821686 - PreCommit-HIVE-MASTER-Build

> Have tests log to the console along with hive.log
> -
>
> Key: HIVE-14405
> URL: https://issues.apache.org/jira/browse/HIVE-14405
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14405.01.patch
>
>
> When running tests from the IDE (not itests), logs end up going to hive.log - 
> making it difficult to debug tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-10 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14270:
---
Attachment: HIVE-14270.6.patch

> Write temporary data to HDFS when doing inserts on tables located on S3
> ---
>
> Key: HIVE-14270
> URL: https://issues.apache.org/jira/browse/HIVE-14270
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, 
> HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch, HIVE-14270.6.patch
>
>
> Currently, when doing INSERT statements on tables located on S3, Hive writes 
> and reads temporary (or intermediate) files on S3 as well. 
> If HDFS is still the default filesystem in Hive, then we can keep such 
> temporary files on HDFS to keep things running faster.
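For illustration, a minimal sketch of the idea (the method and names below are 
invented here, not taken from the patch): choose the staging location from the 
filesystem scheme of the target table, so blobstore-backed tables stage their 
intermediate data on the default HDFS filesystem.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ScratchDirSketch {
  // Hypothetical helper: tables on an s3/s3a/s3n filesystem stage to the HDFS
  // scratch dir; everything else keeps staging next to the table as before.
  static Path pickStagingDir(Configuration conf, Path tableDir, Path hdfsScratchDir)
      throws IOException {
    String scheme = tableDir.getFileSystem(conf).getScheme();
    boolean onBlobstore = scheme != null && scheme.startsWith("s3");
    return onBlobstore ? hdfsScratchDir : new Path(tableDir, ".hive-staging");
  }
}
{code}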



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415995#comment-15415995
 ] 

Eugene Koifman commented on HIVE-14199:
---

I think what [~gopalv] meant by legacy is "transactional_properties=legacy", 
though the issues with streaming/bucketing were fixed long ago in HIVE-11983.
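For readers unfamiliar with the property, a hypothetical table created with that 
setting might look like this (illustrative DDL only; whether 'legacy' is accepted 
as an explicit value depends on the HIVE-14035 implementation):
{code}
// Uses the TestTxnCommands2-style helper quoted elsewhere in this digest.
runStatementOnDriver("create table legacy_acid (a int, b int) " +
    "clustered by (a) into 2 buckets stored as orc " +
    "tblproperties ('transactional'='true', 'transactional_properties'='legacy')");
{code}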

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at the split generation level, and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets: there would be as many splits as the number of buckets, 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables even the delta files to be split. What 
> this means is that we now have enough information at the split generation 
> level to determine the appropriate buckets to process for the delta files. This 
> can allow us to efficiently prune unnecessary buckets for delta files and 
> will lead to good performance gains for a large number of selective queries on 
> ACID tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14063) beeline to auto connect to the HiveServer2

2016-08-10 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415977#comment-15415977
 ] 

Vihang Karajgaonkar commented on HIVE-14063:


It is a good suggestion to include these values in hive-site.xml, with the 
advantage of leveraging existing configuration frameworks to deliver the 
configs to Beeline. But IMO there are the following problems with using 
hive-site.xml:

1. Beeline is in theory a general-purpose JDBC client. As far as I understand, 
it was not designed to be a Hive-only client. Adding these values to 
hive-site.xml essentially ties this feature to Hive-specific environments. 
While one may argue that Beeline is almost always used for Hive, the larger 
community in general has to decide whether to treat it as a Hive-specific JDBC 
client or not.

2. Different users could potentially have different connection endpoints like 
urls and credentials (even authentication mechanisms for different HS2 
instances?), which means we need to store the user-specific information in 
some other place (environment variables or a configuration file). I don't see 
any particular advantage in using two sources of information for a few 
configuration values, since we will have to read both files in any case. 
Similarly, user-specific configurations like hive variables and hiveConf need 
to be separated from the all-purpose general hive-site.xml. All such a file 
will contain is a user-friendly way to describe the connection url (which 
includes hosts, authentication, default db, hivevars and user-specific 
hiveConf); see the sketch below.
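As a concrete illustration of that idea, a sketch of the lookup Beeline could do 
on startup; the file location, keys and helper below are invented for this 
sketch, not an agreed design:
{code}
import java.io.File;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class AutoConnectSketch {
  // Reads ~/.beeline/connection.properties (hypothetical location) and opens a
  // JDBC connection; Beeline would fall back to !connect if the file is absent.
  static Connection autoConnect() throws Exception {
    File f = new File(System.getProperty("user.home"),
        ".beeline/connection.properties");
    Properties p = new Properties();
    try (FileReader r = new FileReader(f)) {
      p.load(r);
    }
    String url = p.getProperty("url", "jdbc:hive2://localhost:10000/default");
    // Assumes the Hive JDBC driver is on the classpath.
    return DriverManager.getConnection(url,
        p.getProperty("user", ""), p.getProperty("password", ""));
  }
}
{code}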

> beeline to auto connect to the HiveServer2
> --
>
> Key: HIVE-14063
> URL: https://issues.apache.org/jira/browse/HIVE-14063
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: beeline.conf.template
>
>
> Currently one has to give a jdbc:hive2 url in order for Beeline to connect to a 
> hiveserver2 instance. It would be great if Beeline could get the info somehow 
> (from a properties file at a well-known location?) and connect automatically 
> if the user doesn't specify such a url. If the properties file is not present, 
> then beeline would expect the user to provide the url and credentials using 
> !connect or ./beeline -u .. commands.
> While Beeline is flexible (being a mere JDBC client), most environments would 
> have just a single HS2. Having users manually connect to it via either 
> "beeline ~/.propsfile" or -u or !connect statements lowers the overall 
> user experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11525) Bucket pruning

2016-08-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415976#comment-15415976
 ] 

Eugene Koifman commented on HIVE-11525:
---

For completeness: The streaming API was fixed in HIVE-11983

> Bucket pruning
> --
>
> Key: HIVE-11525
> URL: https://issues.apache.org/jira/browse/HIVE-11525
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0
>Reporter: Maciek Kocon
>Assignee: Gopal V
>  Labels: TODOC2.0
> Fix For: 2.0.0
>
> Attachments: HIVE-11525.1.patch, HIVE-11525.2.patch, 
> HIVE-11525.3.patch, HIVE-11525.WIP.patch
>
>
> Logically and functionally, bucketing and partitioning are quite similar - 
> both provide a mechanism to segregate and separate the table's data based on 
> its content. Thanks to that, significant further optimisations like 
> [partition] PRUNING or [bucket] MAP JOIN are possible.
> The difference seems to be imposed by design, where PARTITIONing is 
> open/explicit while BUCKETing is discrete/implicit.
> Partitioning seems to be very common, if not a standard feature, in all current 
> RDBMS, while BUCKETING seems to be HIVE specific only.
> In a way, BUCKETING could also be called "hashing" or simply "IMPLICIT 
> PARTITIONING".
> Regardless of the fact that these two are recognised as two separate features 
> available in Hive, there should be nothing to prevent leveraging the same 
> existing query/join optimisations across the two.
> BUCKET pruning
> Enable a partition-PRUNING-equivalent optimisation for queries on BUCKETED 
> tables.
> The simplest example is for queries like:
> "SELECT … FROM x WHERE colA=123123"
> to read only the relevant bucket file rather than all file-buckets that 
> belong to a table.
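For illustration, a minimal end-to-end sketch of that example (hypothetical 
table and data, written against the runStatementOnDriver test helper quoted 
elsewhere in this digest):
{code}
// Four bucket files are written for table x; colA is the bucketing column.
runStatementOnDriver("create table x (colA int, colB string) " +
    "clustered by (colA) into 4 buckets stored as orc");
runStatementOnDriver("insert into x values (123123, 'a'), (7, 'b')");
// With bucket pruning, the constant predicate hashes to exactly one bucket,
// so only 1 of the 4 bucket files has to be scanned.
runStatementOnDriver("select colB from x where colA = 123123");
{code}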



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases

2016-08-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14511:
---
Status: Patch Available  (was: Open)

[~ashutoshc], could you take a look? Thanks. Actually, I found that if you have 
p1=1/p2=1/files and also p1=1/p2=2/p3=3/files (see the illustration below), the 
current master will check whether the #partition specifications match and will 
throw an exception.
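For illustration, a hypothetical warehouse layout of the kind described above 
(paths invented):
{noformat}
warehouse/t/p1=1/p2=1/000000_0
warehouse/t/p1=1/p2=2/p3=3/000000_0   <- one level deeper than t's partition spec
{noformat}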

> Improve MSCK for partitioned table to deal with special cases
> -
>
> Key: HIVE-14511
> URL: https://issues.apache.org/jira/browse/HIVE-14511
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14511.01.patch
>
>
> Some users will have a folder rather than a file under the last partition 
> folder. However, msck is going to search for the leaf folder rather than the 
> last partition folder. We need to improve that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases

2016-08-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14511:
---
Attachment: HIVE-14511.01.patch

> Improve MSCK for partitioned table to deal with special cases
> -
>
> Key: HIVE-14511
> URL: https://issues.apache.org/jira/browse/HIVE-14511
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14511.01.patch
>
>
> Some users will have a folder rather than a file under the last partition 
> folder. However, msck is going to search for the leaf folder rather than the 
> last partition folder. We need to improve that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415940#comment-15415940
 ] 

Sergey Shelukhin commented on HIVE-14483:
-

Patch looks good to me... attaching the same patch for HiveQA (with the 
expected name pattern). Is a similar fix needed in other places for other 
complex readers?

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Sergey Zadoroshnyak
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch, HIVE-14483.01.patch
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader which contains a StringDirectTreeReader as the 
> TreeReader (DIRECT or DIRECT_V2 column encoding), with
> batchSize = 1026;
> and invoke the method nextVector(ColumnVector previousVector, boolean[] isNull, 
> final int batchSize).
> scratchlcv is a LongColumnVector with a long[] vector (length 1024)
> which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, 
> scratchlcv, result, batchSize);
> as a result, in the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) we receive an 
> ArrayIndexOutOfBoundsException.
> If we use StringDictionaryTreeReader, then there is no exception, as we have 
> the verification scratchlcv.ensureSize((int) batchSize, false) before 
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were made for Hive 2.1.0 by the corresponding commit 
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
>  for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley.
> How to fix?
> Add only one line:
> scratchlcv.ensureSize((int) batchSize, false);
> in the method 
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
>  stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize) before the invocation of 
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize); see the sketch 
> after this description.
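A sketch of the proposed one-line fix in context (paraphrased from the 
description above, not the exact TreeReaderFactory source; the rest of the 
method is elided):
{code}
private static void commonReadByteArrays(InStream stream, IntegerReader lengths,
    LongColumnVector scratchlcv, BytesColumnVector result, final int batchSize)
    throws IOException {
  // scratchlcv.vector is allocated at the default size (1024), so grow it
  // first; otherwise a batchSize such as 1026 overruns the array.
  scratchlcv.ensureSize((int) batchSize, false);  // the proposed fix
  lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
  // ... remainder of the method unchanged ...
}
{code}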



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14483:

Assignee: Sergey Zadoroshnyak  (was: Sergey Shelukhin)

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Sergey Zadoroshnyak
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch, HIVE-14483.01.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14483:

Attachment: HIVE-14483.01.patch

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Sergey Shelukhin
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch, HIVE-14483.01.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-14483:
---

Assignee: Sergey Shelukhin  (was: Owen O'Malley)

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Sergey Shelukhin
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch, HIVE-14483.01.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases

2016-08-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415905#comment-15415905
 ] 

Pengcheng Xiong commented on HIVE-14511:


[~sershe], thanks for your attention. I have exactly the same concern as you, 
but I cannot convince [~ashutoshc]. :)

> Improve MSCK for partitioned table to deal with special cases
> -
>
> Key: HIVE-14511
> URL: https://issues.apache.org/jira/browse/HIVE-14511
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> Some users will have a folder rather than a file under the last partition 
> folder. However, msck is going to search for the leaf folder rather than the 
> last partition folder. We need to improve that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14376) Schema evolution tests takes a long time

2016-08-10 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415898#comment-15415898
 ] 

Matt McCline commented on HIVE-14376:
-

+1 LGTM

> Schema evolution tests takes a long time
> 
>
> Key: HIVE-14376
> URL: https://issues.apache.org/jira/browse/HIVE-14376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Attachments: HIVE-14376.1.patch, HIVE-14376.2.patch
>
>
> schema_evol_* tests (14 tests) take 47 mins in tez and 40 mins in TestCliDriver. 
> The same set of tests was added to llap as well in HIVE-14355, which will take 
> some more time. Most tests use INSERT into a table, which can be slow and is not 
> needed. Even some individual tests take about 5 mins to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14501) MiniTez test for union_type_chk.q is slow

2016-08-10 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14501:
--
Fix Version/s: 2.2.0

> MiniTez test for union_type_chk.q is slow
> -
>
> Key: HIVE-14501
> URL: https://issues.apache.org/jira/browse/HIVE-14501
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.2.0
>
> Attachments: HIVE-14501.1.patch
>
>
> union_type_chk.q runs on minimr and minitez, but the test itself explicitly 
> sets the execution engine to mr. It takes around 10 mins to run this test. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14462) Reduce number of partition check calls in add_partitions

2016-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415848#comment-15415848
 ] 

Hive QA commented on HIVE-14462:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823006/HIVE-14462.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10459 tests 
executed
*Failed tests:*
{noformat}
TestHS2ImpersonationWithRemoteMS - did not produce a TEST-*.xml file
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_exist
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partitions_json
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_add_partition_with_whitelist
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/843/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/843/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-843/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823006 - PreCommit-HIVE-MASTER-Build

> Reduce number of partition check calls in add_partitions
> 
>
> Key: HIVE-14462
> URL: https://issues.apache.org/jira/browse/HIVE-14462
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14462.1.patch, HIVE-14462.2.patch, 
> HIVE-14462.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-10 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415819#comment-15415819
 ] 

Siddharth Seth commented on HIVE-14502:
---

I don't think we should move MiniLlap to use MiniHBase - it is not the default 
at the moment. Maybe retain a few tests on MiniTez which can run with 
MiniHBase. Setup time for MiniHBase metastore based tests is 3 minutes, versus 
1 minute for regular tests. The 1 minute will be cut down after HIVE-13496; a 
similar effort could be taken up for MiniHBase.

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Llap shares most of the codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14448:
--
Priority: Critical  (was: Major)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.patch

[jira] [Commented] (HIVE-14405) Have tests log to the console along with hive.log

2016-08-10 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415812#comment-15415812
 ] 

Siddharth Seth commented on HIVE-14405:
---

Looks like all of that is already in place. Re-triggering a jenkins run to see 
what this looks like. May need to change the console logging to INFO level (and 
let default debug logs go to hive.log)
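A minimal log4j2 sketch of that split, assuming a test-only properties file 
(names and paths are illustrative, not the actual Hive test config): the console 
appender is capped at INFO while full DEBUG output still goes to hive.log.
{code}
status = WARN
name = HiveTestLoggingSketch

appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
appender.console.filter.threshold.type = ThresholdFilter
appender.console.filter.threshold.level = INFO

appender.file.type = File
appender.file.name = file
appender.file.fileName = ${sys:test.tmp.dir:-target}/hive.log
appender.file.layout.type = PatternLayout
appender.file.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n

rootLogger.level = DEBUG
rootLogger.appenderRef.console.ref = console
rootLogger.appenderRef.file.ref = file
{code}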

> Have tests log to the console along with hive.log
> -
>
> Key: HIVE-14405
> URL: https://issues.apache.org/jira/browse/HIVE-14405
> Project: Hive
>  Issue Type: Task
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14405.01.patch
>
>
> When running tests from the IDE (not itests), logs end up going to hive.log - 
> making it difficult to debug tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415807#comment-15415807
 ] 

Sergey Zadoroshnyak commented on HIVE-14483:


[~sershe]

Do you know who is responsible for the Hive ORC module?



>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14511) Improve MSCK for partitioned table to deal with special cases

2016-08-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415794#comment-15415794
 ] 

Sergey Shelukhin commented on HIVE-14511:
-

Where does that folder come from? I think if it's for the use case where people 
use msck to "import" manually created partitions, we should not encourage it; 
it's not even a supported scenario. If it continues to be misused, at least it 
can be misused correctly :)
Otherwise, did Hive create those directories with nested directories? Is it for 
the ACID case or something like that?

> Improve MSCK for partitioned table to deal with special cases
> -
>
> Key: HIVE-14511
> URL: https://issues.apache.org/jira/browse/HIVE-14511
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> Some users will have a folder rather than a file under the last partition 
> folder. However, msck is going to search for the leaf folder rather than the 
> last partition folder. We need to improve that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14376) Schema evolution tests takes a long time

2016-08-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14376:
-
Attachment: HIVE-14376.2.patch

Rebased patch

> Schema evolution tests takes a long time
> 
>
> Key: HIVE-14376
> URL: https://issues.apache.org/jira/browse/HIVE-14376
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>Priority: Minor
> Attachments: HIVE-14376.1.patch, HIVE-14376.2.patch
>
>
> schema_evol_* tests (14 tests) take 47 mins in tez and 40 mins in TestCliDriver. 
> The same set of tests was added to llap as well in HIVE-14355, which will take 
> some more time. Most tests use INSERT into a table, which can be slow and is not 
> needed. Even some individual tests take about 5 mins to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14453) refactor physical writing of ORC data and metadata to FS from the logical writers

2016-08-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415784#comment-15415784
 ] 

Sergey Shelukhin commented on HIVE-14453:
-

Jdbc failures are unrelated.

> refactor physical writing of ORC data and metadata to FS from the logical 
> writers
> -
>
> Key: HIVE-14453
> URL: https://issues.apache.org/jira/browse/HIVE-14453
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14453.01.patch, HIVE-14453.patch
>
>
> ORC data doesn't have to go directly into an HDFS stream via buffers; it can 
> go somewhere else (e.g. a write-thru cache, or an addressable system that 
> doesn't require the stream blocks to be held in memory before writing them 
> all together).
> To that effect, it would be nice to abstract the data block/metadata 
> structure creation from the physical file concerns.
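A hypothetical shape for that abstraction (the interface and method names are 
invented for illustration, not the committed API):
{code}
import java.io.IOException;
import java.nio.ByteBuffer;

// The logical ORC writer keeps building stripe data and metadata buffers;
// where those bytes land becomes pluggable behind an interface like this.
interface PhysicalWriterSketch {
  void writeData(ByteBuffer buffer) throws IOException;      // stripe streams
  void writeMetadata(ByteBuffer buffer) throws IOException;  // indexes, footer
  void close() throws IOException;
}
// An HDFS-backed implementation would wrap an output stream; a cache-backed
// one could hand the same buffers to a write-through cache instead.
{code}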



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14501) MiniTez test for union_type_chk.q is slow

2016-08-10 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14501:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the review! Committed to master. 

> MiniTez test for union_type_chk.q is slow
> -
>
> Key: HIVE-14501
> URL: https://issues.apache.org/jira/browse/HIVE-14501
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14501.1.patch
>
>
> union_type_chk.q runs on minimr and minitez, but the test itself explicitly 
> sets the execution engine to mr. It takes around 10 mins to run this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14502) Convert MiniTez tests to MiniLlap tests

2016-08-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415760#comment-15415760
 ] 

Prasanth Jayachandran commented on HIVE-14502:
--

Yeah. That will be taken care of. IIRC [~sseth] mentioned a 3 min setup 
time for the hbase cluster, which we have to cut down. 

> Convert MiniTez tests to MiniLlap tests
> ---
>
> Key: HIVE-14502
> URL: https://issues.apache.org/jira/browse/HIVE-14502
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
>
> Llap shares most of its codepath with tez. MiniLlapCliDriver is much faster 
> than MiniTezCliDriver because of threaded executors and caching. 
> MiniTezCliDriver tests take around 3hr 15mins to run around 400 tests. To 
> cut down this test time significantly, it makes sense to move the mini tez 
> tests over to mini llap tests.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14501) MiniTez test for union_type_chk.q is slow

2016-08-10 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415756#comment-15415756
 ] 

Siddharth Seth commented on HIVE-14501:
---

+1

> MiniTez test for union_type_chk.q is slow
> -
>
> Key: HIVE-14501
> URL: https://issues.apache.org/jira/browse/HIVE-14501
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14501.1.patch
>
>
> union_type_chk.q runs on minimr and minitez, but the test itself explicitly 
> sets the execution engine to mr. It takes around 10 mins to run this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-08-10 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415736#comment-15415736
 ] 

Thomas Poepping commented on HIVE-14373:


Here is an example of how S3A proposes to solve a similar problem: 
https://issues.apache.org/jira/browse/HADOOP-13446

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.patch
>
>
> With Hive doing improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be executable by HiveQA because they need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-08-10 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415696#comment-15415696
 ] 

Thomas Poepping commented on HIVE-14373:


The build server is part of Amazon EC2, is it not? Why would we not allow hiveqa 
to run these tests using those credentials, while still keeping them out of the 
source? It would be easier for committers and contributors if it were obvious 
that their change might break something, without relying on committers having 
to run the tests manually.

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.patch
>
>
> With Hive doing improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be executable by HiveQA because they need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14479) Add some join tests for acid table

2016-08-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415645#comment-15415645
 ] 

Eugene Koifman commented on HIVE-14479:
---

For the test in TestTxnCommands2, I would add a check that compaction actually 
ran; see testCompactWithDelete() for an example. Otherwise, you'll never know if 
compaction failed and did nothing. A minimal sketch of such a check follows below.

Also, in the .q files, I see various options being set to trigger a specific type of 
join. Is that not needed in TestTxnCommands2?
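
Assuming the TxnStore/ShowCompactRequest metastore API that testCompactWithDelete() uses (state constants may differ by branch):

{code}
// Assert that a compaction was actually scheduled and completed, rather than
// assuming runWorker() succeeded silently.
TxnStore txnHandler = TxnUtils.getTxnStore(hiveConf);
ShowCompactResponse resp = txnHandler.showCompact(new ShowCompactRequest());
Assert.assertEquals("Unexpected number of compactions", 1, resp.getCompactsSize());
// a compaction that failed or never ran would surface a different state here
Assert.assertEquals(TxnStore.CLEANING_RESPONSE, resp.getCompacts().get(0).getState());
{code}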

> Add some join tests for acid table
> --
>
> Key: HIVE-14479
> URL: https://issues.apache.org/jira/browse/HIVE-14479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-14479.1.patch, HIVE-14479.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-08-10 Thread Abdullah Yousufi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415622#comment-15415622
 ] 

Abdullah Yousufi commented on HIVE-14373:
-

Hey [~yalovyyi], currently the best way to switch between different s3 clients 
would be to use the different key names in auth-keys.xml. I created 
auth-keys.xml.template as an s3a example, but that could be easily changed for 
s3n. However, I agree that the bucket variable name in that file should not be 
specific to s3a. Also, thanks a ton for the review on the patch; I'll get to 
that shortly.
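
For context, a minimal sketch of such an auth-keys.xml; the fs.s3a.* names are the standard Hadoop S3A credential properties, while the bucket property name is illustrative and may not match the actual template:

{code}
<configuration>
  <!-- standard Hadoop S3A credential properties -->
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
  <!-- illustrative suite property; the real template's name may differ -->
  <property>
    <name>test.hive.s3.bucket</name>
    <value>your-test-bucket</value>
  </property>
</configuration>
{code}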

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.patch
>
>
> With Hive doing improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be executable by HiveQA because they need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13754) Fix resource leak in HiveClientCache

2016-08-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-13754:

  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0, 1.2.1  (was: 1.2.1, 2.0.0)
  Status: Resolved  (was: Patch Available)

> Fix resource leak in HiveClientCache
> 
>
> Key: HIVE-13754
> URL: https://issues.apache.org/jira/browse/HIVE-13754
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Fix For: 2.0.0
>
> Attachments: HIVE-13754-branch-1.patch, HIVE-13754.1-branch-1.patch, 
> HIVE-13754.1.patch, HIVE-13754.patch
>
>
> Found that the {{users}} reference count can go into negative values, which 
> prevents {{tearDownIfUnused}} from closing the client connection when called.
> This leads to a build-up of clients which have been evicted from the cache, 
> are no longer in use, but have not been shut down.
> GC will eventually call {{finalize}}, which forcibly closes the connection 
> and cleans up the client, but I have seen as many as several hundred open 
> client connections as a result.
> The main cause of this is RetryingMetaStoreClient, which calls 
> {{reconnect}} on acquire, and {{reconnect}} in turn calls {{close}}. This 
> decrements {{users}} to -1 on the reconnect; acquire then increases it to 0 
> while the client is in use, and it drops back to -1 on release.
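
A minimal sketch of the kind of guard that keeps the count consistent, assuming a {{users}} counter on the cached client (illustrative, not the actual patch):

{code}
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: never let the reference count go below zero, so that
// tearDownIfUnused() can still recognize "unused" as users == 0.
class CacheableClient {
  private final AtomicInteger users = new AtomicInteger(0);

  void acquire() {
    users.incrementAndGet();
  }

  void release() {
    // close() invoked outside the acquire/release protocol (e.g. through
    // reconnect) would otherwise push this negative; clamp it at zero.
    users.updateAndGet(u -> Math.max(0, u - 1));
  }

  boolean isUnused() {
    return users.get() == 0; // safe: the count is never negative
  }
}
{code}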



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13754) Fix resource leak in HiveClientCache

2016-08-10 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415599#comment-15415599
 ] 

Mithun Radhakrishnan commented on HIVE-13754:
-

Committed to master. Thanks, [~cdrome]!

> Fix resource leak in HiveClientCache
> 
>
> Key: HIVE-13754
> URL: https://issues.apache.org/jira/browse/HIVE-13754
> Project: Hive
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Fix For: 2.0.0
>
> Attachments: HIVE-13754-branch-1.patch, HIVE-13754.1-branch-1.patch, 
> HIVE-13754.1.patch, HIVE-13754.patch
>
>
> Found that the {{users}} reference count can go into negative values, which 
> prevents {{tearDownIfUnused}} from closing the client connection when called.
> This leads to a build-up of clients which have been evicted from the cache, 
> are no longer in use, but have not been shut down.
> GC will eventually call {{finalize}}, which forcibly closes the connection 
> and cleans up the client, but I have seen as many as several hundred open 
> client connections as a result.
> The main cause of this is RetryingMetaStoreClient, which calls 
> {{reconnect}} on acquire, and {{reconnect}} in turn calls {{close}}. This 
> decrements {{users}} to -1 on the reconnect; acquire then increases it to 0 
> while the client is in use, and it drops back to -1 on release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13756) Map failure attempts to delete reducer _temporary directory on multi-query pig query

2016-08-10 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-13756:

  Resolution: Fixed
   Fix Version/s: 2.0.0
Target Version/s: 2.0.0, 1.2.1  (was: 1.2.1, 2.0.0)
  Status: Resolved  (was: Patch Available)

> Map failure attempts to delete reducer _temporary directory on multi-query 
> pig query
> 
>
> Key: HIVE-13756
> URL: https://issues.apache.org/jira/browse/HIVE-13756
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Fix For: 2.0.0
>
> Attachments: HIVE-13756-branch-1.patch, HIVE-13756.1-branch-1.patch, 
> HIVE-13756.1.patch, HIVE-13756.patch
>
>
> A pig script executed with multi-query enabled that reads the source data 
> and writes it as-is into TABLE_A, while also performing a group-by operation 
> on the data and writing the result into TABLE_B, can produce erroneous results 
> if any map fails. This setup results in a single MR job that writes the map 
> output to a scratch directory relative to TABLE_A and the reducer output to a 
> scratch directory relative to TABLE_B.
> If one or more maps fail, the failure handling deletes the attempt data 
> relative to TABLE_A, but it also deletes the _temporary directory relative to 
> TABLE_B. This has the unintended side-effect of preventing subsequent maps 
> from committing their data. This means that any maps which completed 
> successfully before the first map failure will have their data committed as 
> expected, while the others will not, resulting in an incomplete result set.
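
A minimal sketch of the scoping implied above, assuming the failure handler knows the failing attempt id (paths and names are illustrative, not the actual patch):

{code}
// Illustrative only: on map failure, delete just this attempt's output under
// TABLE_A's scratch directory; leave TABLE_B's shared _temporary directory
// alone, since other attempts still need it in order to commit.
Path attemptDir = new Path(tableAScratchDir,
    FileOutputCommitter.TEMP_DIR_NAME + Path.SEPARATOR + attemptId);
if (fs.exists(attemptDir)) {
  fs.delete(attemptDir, true);
}
{code}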



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13756) Map failure attempts to delete reducer _temporary directory on multi-query pig query

2016-08-10 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415574#comment-15415574
 ] 

Mithun Radhakrishnan commented on HIVE-13756:
-

Committed to master. Thanks, [~cdrome]!

> Map failure attempts to delete reducer _temporary directory on multi-query 
> pig query
> 
>
> Key: HIVE-13756
> URL: https://issues.apache.org/jira/browse/HIVE-13756
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13756-branch-1.patch, HIVE-13756.1-branch-1.patch, 
> HIVE-13756.1.patch, HIVE-13756.patch
>
>
> A pig script executed with multi-query enabled that reads the source data 
> and writes it as-is into TABLE_A, while also performing a group-by operation 
> on the data and writing the result into TABLE_B, can produce erroneous results 
> if any map fails. This setup results in a single MR job that writes the map 
> output to a scratch directory relative to TABLE_A and the reducer output to a 
> scratch directory relative to TABLE_B.
> If one or more maps fail, the failure handling deletes the attempt data 
> relative to TABLE_A, but it also deletes the _temporary directory relative to 
> TABLE_B. This has the unintended side-effect of preventing subsequent maps 
> from committing their data. This means that any maps which completed 
> successfully before the first map failure will have their data committed as 
> expected, while the others will not, resulting in an incomplete result set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14462) Reduce number of partition check calls in add_partitions

2016-08-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14462:

Status: Patch Available  (was: Open)

> Reduce number of partition check calls in add_partitions
> 
>
> Key: HIVE-14462
> URL: https://issues.apache.org/jira/browse/HIVE-14462
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14462.1.patch, HIVE-14462.2.patch, 
> HIVE-14462.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver

2016-08-10 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12181:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master.

> Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
> ---
>
> Key: HIVE-12181
> URL: https://issues.apache.org/jira/browse/HIVE-12181
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-12181.1.patch, HIVE-12181.10.patch, 
> HIVE-12181.12.patch, HIVE-12181.13.patch, HIVE-12181.15.patch, 
> HIVE-12181.2.patch, HIVE-12181.3.patch, HIVE-12181.4.patch, 
> HIVE-12181.7.patch, HIVE-12181.8.patch, HIVE-12181.9.patch, HIVE-12181.patch, 
> HIVE-12181.patch
>
>
> There was a performance concern earlier, but HIVE-7587 has fixed that. We can 
> change the default to true now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14418) Hive config validation prevents unsetting the settings

2016-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415454#comment-15415454
 ] 

Hive QA commented on HIVE-14418:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822926/HIVE-14418.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 65 failed/errored test(s), 10460 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_00_nonpart_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_01_nonpart
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_02_00_part_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_02_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_03_nonpart_over_compat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_all_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_05_some_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_06_one_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_07_all_part_over_nonoverlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_08_nonpart_rename
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_09_part_spec_nonoverlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_10_external_managed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_11_managed_external
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_12_external_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_13_managed_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_14_managed_location_over_existing
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_15_external_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_16_part_external
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_17_part_managed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_18_part_external
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_19_00_part_external_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_19_part_external_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_20_part_managed_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_21_export_authsuccess
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_22_import_exist_authsuccess
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_23_import_part_authsuccess
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_24_import_nonexist_authsuccess
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_25_export_parentpath_has_inaccessible_children
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_hidden_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repl_1_drop
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repl_2_exim_basic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_repl_3_exim_metadata
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_import
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_export
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_00_unsupported_schema
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_01_nonpart_over_loaded
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_02_all_part_over_overlap
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_03_nonpart_noncompat_colschema
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_04_nonpart_noncompat_colnumber
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_05_nonpart_noncompat_coltype
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_06_nonpart_noncompat_storage
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_07_nonpart_noncompat_ifof
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_08_nonpart_noncompat_serde
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_09_nonpart_noncompat_serdeparam
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_10_nonpart_noncompat_bucketing
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_11_nonpart_noncompat_sorting
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_exim_12_nonnative_export
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliD

[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3

2016-08-10 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415405#comment-15415405
 ] 

Sergio Peña commented on HIVE-14270:


Thanks [~leftylev] for catching it. You have a good eye for these documentation issues :)

> Write temporary data to HDFS when doing inserts on tables located on S3
> ---
>
> Key: HIVE-14270
> URL: https://issues.apache.org/jira/browse/HIVE-14270
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, 
> HIVE-14270.3.patch, HIVE-14270.4.patch, HIVE-14270.5.patch
>
>
> Currently, when doing INSERT statements on tables located at S3, Hive writes 
> and reads temporary (or intermediate) files to S3 as well. 
> If HDFS is still the default filesystem on Hive, then we can keep such 
> temporary files on HDFS to keep things run faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14063) beeline to auto connect to the HiveServer2

2016-08-10 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415393#comment-15415393
 ] 

Sergio Peña commented on HIVE-14063:


Keeping the configs in hive-site.xml just to follow the standard sounds like a 
good idea. However, we also want users to be able to keep their own credentials 
in a place where Hive can look for them, like ~/.hive/hive-site.xml or some 
other place. 

> beeline to auto connect to the HiveServer2
> --
>
> Key: HIVE-14063
> URL: https://issues.apache.org/jira/browse/HIVE-14063
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: beeline.conf.template
>
>
> Currently one has to give a jdbc:hive2 url in order for Beeline to connect to a 
> hiveserver2 instance. It would be great if Beeline could get the info somehow 
> (from a properties file at a well-known location?) and connect automatically 
> if the user doesn't specify such a url. If the properties file is not present, 
> then beeline would expect the user to provide the url and credentials using 
> !connect or ./beeline -u .. commands.
> While Beeline is flexible (being a mere JDBC client), most environments have 
> just a single HS2. Having users manually connect to it via either 
> "beeline ~/.propsfile" or -u or !connect statements degrades the user 
> experience.
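
For illustration, a sketch of what such a well-known properties file could hold; the file name and keys are hypothetical, not the attached beeline.conf.template:

{noformat}
# hypothetical ~/.beeline/beeline.conf (all names illustrative)
url=jdbc:hive2://hs2.example.com:10000/default
driver=org.apache.hive.jdbc.HiveDriver
user=hive
password=
{noformat}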



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14457) Partitions in encryption zone are still trashed though an exception is returned

2016-08-10 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-14457:
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed to 2.2.0 and 2.1.1, thanks [~ychena] and [~spena] for review.

> Partitions in encryption zone are still trashed though an exception is 
> returned
> ---
>
> Key: HIVE-14457
> URL: https://issues.apache.org/jira/browse/HIVE-14457
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption, Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14457.patch
>
>
> drop_partition_common in HiveMetaStore still drops partitions in an encryption 
> zone without PURGE even though it returns an exception. 
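
A minimal sketch of the intended ordering, assuming the HDFS encryption shim's {{isPathEncrypted}} check (illustrative; the actual patch may structure this differently):

{code}
// Illustrative only: fail BEFORE any data is moved, so a returned exception
// really means the partition data was not trashed.
if (!ifPurge && hdfsEncryptionShim != null
    && hdfsEncryptionShim.isPathEncrypted(partPath)) {
  throw new MetaException("Unable to drop partition at " + partPath
      + ": it is in an encryption zone; retry with PURGE");
}
// only reached when the partition data may actually be removed
wh.deleteDir(partPath, true, ifPurge);
{code}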



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14404) Allow delimiterfordsv to use multiple-character delimiters

2016-08-10 Thread Marta Kuczora (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-14404:
-
Status: Patch Available  (was: Open)

> Allow delimiterfordsv to use multiple-character delimiters
> --
>
> Key: HIVE-14404
> URL: https://issues.apache.org/jira/browse/HIVE-14404
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stephen Measmer
>Assignee: Marta Kuczora
> Attachments: HIVE-14404.patch
>
>
> HIVE-5871 allows for reading multiple-character delimiters. Would like the 
> ability to use outputformat=dsv and define multiple-character delimiters. 
> Today delimiterfordsv only uses one character even if multiple are passed.
> For example:
> when I use:
> beeline>!set outputformat dsv
> beeline>!set delimiterfordsv "^-^"
>  I get:
> 111201081253106275^31-Oct-2011 
> 00:00:00^Text^201605232823^2016051968232151^201605232823_2016051968232151_0_0_1
>  
> Would like it to be:
> 111201081253106275^-^31-Oct-2011 
> 00:00:00^-^Text^-^201605232823^-^2016051968232151^-^201605232823_2016051968232151_0_0_1
>  
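
A minimal sketch of the requested behavior, with hypothetical names (the actual Beeline output-format classes may differ):

{code}
// Illustrative only: join DSV fields with the full delimiter string instead
// of truncating it to its first character.
static String joinRow(String[] fields, String delimiterForDsv) {
  StringBuilder row = new StringBuilder();
  for (int i = 0; i < fields.length; i++) {
    if (i > 0) {
      row.append(delimiterForDsv); // not delimiterForDsv.charAt(0)
    }
    row.append(fields[i]);
  }
  return row.toString();
}
{code}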



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14404) Allow delimiterfordsv to use multiple-character delimiters

2016-08-10 Thread Marta Kuczora (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora updated HIVE-14404:
-
Status: Open  (was: Patch Available)

> Allow delimiterfordsv to use multiple-character delimiters
> --
>
> Key: HIVE-14404
> URL: https://issues.apache.org/jira/browse/HIVE-14404
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stephen Measmer
>Assignee: Marta Kuczora
> Attachments: HIVE-14404.patch
>
>
> HIVE-5871 allows for reading multiple-character delimiters. Would like the 
> ability to use outputformat=dsv and define multiple-character delimiters. 
> Today delimiterfordsv only uses one character even if multiple are passed.
> For example:
> when I use:
> beeline>!set outputformat dsv
> beeline>!set delimiterfordsv "^-^"
>  I get:
> 111201081253106275^31-Oct-2011 
> 00:00:00^Text^201605232823^2016051968232151^201605232823_2016051968232151_0_0_1
>  
> Would like it to be:
> 111201081253106275^-^31-Oct-2011 
> 00:00:00^-^Text^-^201605232823^-^2016051968232151^-^201605232823_2016051968232151_0_0_1
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-08-10 Thread Illya Yalovyy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415295#comment-15415295
 ] 

Illya Yalovyy commented on HIVE-14373:
--

[~abdullah],

Will it be possible to implement the test framework so that it is easy to test 
other blob stores? Currently the configuration and the project rely on the s3a 
configuration. I think it would be useful to be able to quickly switch to s3n or 
other implementations.
I would be glad to assist if required.

Thank you for this work. It looks really great and useful!

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.patch
>
>
> With Hive doing improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be executable by HiveQA because they need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver

2016-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415253#comment-15415253
 ] 

Hive QA commented on HIVE-12181:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822904/HIVE-12181.15.patch

{color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10445 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-dynamic_partition_pruning.q-vector_char_mapjoin1.q-unionDistinct_2.q-and-12-more
 - did not produce a TEST-*.xml file
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_tez_dynpart_hashjoin_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/839/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/839/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-839/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822904 - PreCommit-HIVE-MASTER-Build

> Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
> ---
>
> Key: HIVE-12181
> URL: https://issues.apache.org/jira/browse/HIVE-12181
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12181.1.patch, HIVE-12181.10.patch, 
> HIVE-12181.12.patch, HIVE-12181.13.patch, HIVE-12181.15.patch, 
> HIVE-12181.2.patch, HIVE-12181.3.patch, HIVE-12181.4.patch, 
> HIVE-12181.7.patch, HIVE-12181.8.patch, HIVE-12181.9.patch, HIVE-12181.patch, 
> HIVE-12181.patch
>
>
> There was a performance concern earlier, but HIVE-7587 has fixed that. We can 
> change the default to true now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline reassigned HIVE-14448:
---

Assignee: Matt McCline  (was: Sergey Shelukhin)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
> Attachments: HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug is usually 
> exposed when working with data at scale, because in most other cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract the SARG 
> column names.
> Quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>  @Test
>   public void testETLSplitStrategyForACID() throws Exception {
> hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
> hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
> runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
> runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
> runWorker(hiveConf);
> List<String> rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  
> + " where a = 1");
> int[][] resultData = new int[][] {{1,2}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.Nati

[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-10 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415220#comment-15415220
 ] 

Matt McCline commented on HIVE-14448:
-

This is the area I was struggling with for HIVE-14214: ORC Schema Evolution and 
Predicate Push Down do not work together (no rows returned).
I'll take this over.

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug is usually 
> exposed when working with data at scale, because in most other cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract the SARG 
> column names.
> Quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>  @Test
>   public void testETLSplitStrategyForACID() throws Exception {
> hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
> hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
> runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
> runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
> runWorker(hiveConf);
> List<String> rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  
> + " where a = 1");
> int[][] resultData = new int[][] {{1,2}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> org.apache.hadoop.hive.ql.TestTxnC

[jira] [Commented] (HIVE-14373) Add integration tests for hive on S3

2016-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415113#comment-15415113
 ] 

Hive QA commented on HIVE-14373:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822895/HIVE-14373.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10460 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/838/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/838/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-838/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822895 - PreCommit-HIVE-MASTER-Build

> Add integration tests for hive on S3
> 
>
> Key: HIVE-14373
> URL: https://issues.apache.org/jira/browse/HIVE-14373
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergio Peña
>Assignee: Abdullah Yousufi
> Attachments: HIVE-14373.patch
>
>
> With Hive doing improvements to run on S3, it would be ideal to have better 
> integration testing on S3.
> These S3 tests won't be executable by HiveQA because they need 
> Amazon credentials. We need to write a suite based on ideas from the Hadoop 
> project where:
> - an xml file is provided with S3 credentials
> - a committer must run these tests manually to verify they work
> - the xml file should not be part of the commit, and hiveqa should not run 
> these tests.
> https://wiki.apache.org/hadoop/HowToContribute#Submitting_patches_against_object_stores_such_as_Amazon_S3.2C_OpenStack_Swift_and_Microsoft_Azure



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14462) Reduce number of partition check calls in add_partitions

2016-08-10 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-14462:

Attachment: HIVE-14462.3.patch

> Reduce number of partition check calls in add_partitions
> 
>
> Key: HIVE-14462
> URL: https://issues.apache.org/jira/browse/HIVE-14462
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14462.1.patch, HIVE-14462.2.patch, 
> HIVE-14462.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14462) Reduce number of partition check calls in add_partitions

2016-08-10 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-14462:

Status: Open  (was: Patch Available)


In some corner cases, it is possible for partitions to have nested, multiple 
directories (e.g. table/ii=1/jj=15/q=10/r=20/s=30/00_0 and 
table/ii=1/jj=15/q=11/r=22/s=33/00_0, where ii and jj are the only 
partition columns).
{{HiveMetastoreChecker.getPartitionName}} ends up resolving the partition names as 
"ii=1/jj=15/q=11/r=22/s=33" and "ii=1/jj=15/q=10/r=20/s=30".  
When msck is run, it ends up throwing a duplicate-partition exception for 
ii=1, jj=15 in the MS. msck then falls back to {{msckAddPartitionsOneByOne}}, which 
tries to repair one partition at a time and ignores any exceptions. So the job 
essentially completes, but it ends up making lots of calls to the MS and can be 
too slow. I will attach the latest patch in RB.

Without Patch:
=
msck runtime for 1 partitions in small cluster: *370 seconds*

With Patch:
===
msck runtime for 1 partitions in small cluster: *62 seconds*
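
A minimal sketch of the truncation idea, assuming the checker knows the table's partition column count (illustrative, not the actual patch):

{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import org.apache.hadoop.fs.Path;

// Illustrative only: build the partition name from the first N name=value
// components below the table root, where N is the number of partition
// columns; anything nested deeper is ignored.
static String getPartitionName(Path tablePath, Path leafPath, int numPartCols) {
  Deque<String> components = new ArrayDeque<>();
  for (Path p = leafPath; p != null && !p.equals(tablePath); p = p.getParent()) {
    components.addFirst(p.getName());
  }
  List<String> parts = new ArrayList<>(components);
  // e.g. [ii=1, jj=15, q=10, r=20, s=30] with 2 partition cols -> "ii=1/jj=15"
  return String.join("/", parts.subList(0, Math.min(numPartCols, parts.size())));
}
{code}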

> Reduce number of partition check calls in add_partitions
> 
>
> Key: HIVE-14462
> URL: https://issues.apache.org/jira/browse/HIVE-14462
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14462.1.patch, HIVE-14462.2.patch, 
> HIVE-14462.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415069#comment-15415069
 ] 

Sergey Zadoroshnyak edited comment on HIVE-14483 at 8/10/16 10:21 AM:
--

please ignore this comment


was (Author: spring):
please ingore this comment

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader which contains a StringDirectTreeReader as the 
> TreeReader (DIRECT or DIRECT_V2 column encoding), set
> batchSize = 1026;
> and invoke the method nextVector(ColumnVector previousVector, boolean[] isNull, final 
> int batchSize).
> scratchlcv is a LongColumnVector with a long[] vector (length 1024), 
> which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, 
> scratchlcv, result, batchSize);
> as a result, in the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) we receive an 
> ArrayIndexOutOfBoundsException.
> If we use StringDictionaryTreeReader, then there is no exception, as we have 
> the check scratchlcv.ensureSize((int) batchSize, false) before 
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were made for Hive 2.1.0 by the corresponding commit 
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
>  for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley.
> How to fix?
> Add only one line:
> scratchlcv.ensureSize((int) batchSize, false);
> in the method 
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
>  stream, IntegerReader lengths,
> LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize) before the invocation of 
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
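
For readability, the described one-line fix shown in context (signature as given above; this mirrors the existing check in StringDictionaryTreeReader):

{code}
private static void commonReadByteArrays(InStream stream, IntegerReader lengths,
    LongColumnVector scratchlcv, BytesColumnVector result,
    final int batchSize) throws IOException {
  // the fix: the scratch long[] defaults to 1024 entries, so a larger batch
  // (e.g. 1026) must grow it before the length reader fills batchSize slots
  scratchlcv.ensureSize((int) batchSize, false);
  lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);
  // ... remainder of the method unchanged ...
}
{code}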



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415080#comment-15415080
 ] 

Sergey Zadoroshnyak commented on HIVE-14483:


[~owen.omalley] [~prasanth_j]

Could you please review  pull request?

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader which contains a StringDirectTreeReader as the 
> TreeReader (DIRECT or DIRECT_V2 column encoding), set
> batchSize = 1026;
> and invoke the method nextVector(ColumnVector previousVector, boolean[] isNull, final 
> int batchSize).
> scratchlcv is a LongColumnVector with a long[] vector (length 1024), 
> which executes BytesColumnVectorUtil.readOrcByteArrays(stream, lengths, 
> scratchlcv, result, batchSize);
> as a result, in the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) we receive an 
> ArrayIndexOutOfBoundsException.
> If we use StringDictionaryTreeReader, then there is no exception, as we have 
> the check scratchlcv.ensureSize((int) batchSize, false) before 
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were made for Hive 2.1.0 by the corresponding commit 
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
>  for task https://issues.apache.org/jira/browse/HIVE-12159 by Owen O'Malley.
> How to fix?
> Add only one line:
> scratchlcv.ensureSize((int) batchSize, false);
> in the method 
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
>  stream, IntegerReader lengths,
> LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize) before the invocation of 
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415067#comment-15415067
 ] 

Sergey Zadoroshnyak edited comment on HIVE-14483 at 8/10/16 10:21 AM:
--

please ignore this comment


was (Author: spring):
please ingore this comment

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Zadoroshnyak updated HIVE-14483:
---
Attachment: 0001-HIVE-14483.patch

Fix java.lang.ArrayIndexOutOfBoundsException 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Zadoroshnyak updated HIVE-14483:
---
Comment: was deleted

(was: Fix java.lang.ArrayIndexOutOfBoundsException 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays)

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: 0001-HIVE-14483.patch
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Zadoroshnyak updated HIVE-14483:
---
Status: Patch Available  (was: Open)

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415068#comment-15415068
 ] 

Sergey Zadoroshnyak commented on HIVE-14483:


please ignore this comment

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Zadoroshnyak updated HIVE-14483:
---
Comment: was deleted

(was: please ingore this comment)

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415069#comment-15415069
 ] 

Sergey Zadoroshnyak commented on HIVE-14483:


please ignore this comment

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread Sergey Zadoroshnyak (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415067#comment-15415067
 ] 

Sergey Zadoroshnyak commented on HIVE-14483:


please ignore this comment

>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415048#comment-15415048
 ] 

ASF GitHub Bot commented on HIVE-14483:
---

GitHub user szador opened a pull request:

https://github.com/apache/hive/pull/96

HIVE-14483

Fix java.lang.ArrayIndexOutOfBoundsException 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/szador/hive HIVE-14483

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/96.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #96






>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415022#comment-15415022
 ] 

ASF GitHub Bot commented on HIVE-14483:
---

Github user szador closed the pull request at:

https://github.com/apache/hive/pull/95


>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14483) java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

2016-08-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415018#comment-15415018
 ] 

ASF GitHub Bot commented on HIVE-14483:
---

GitHub user szador opened a pull request:

https://github.com/apache/hive/pull/95

HIVE-14483

Fix java.lang.ArrayIndexOutOfBoundsException 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/szador/hive 
HIVE-14483_Fix_ArrayIndexOutOfBoundsException

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/95.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #95


commit c7c564141d1e960e33fd582017da19649ef9003d
Author: szador 
Date:   2016-08-10T09:26:22Z

HIVE-14483

Fix java.lang.ArrayIndexOutOfBoundsException 
org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays




>  java.lang.ArrayIndexOutOfBoundsException 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays
> --
>
> Key: HIVE-14483
> URL: https://issues.apache.org/jira/browse/HIVE-14483
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 2.1.0
>Reporter: Sergey Zadoroshnyak
>Assignee: Owen O'Malley
>Priority: Critical
> Fix For: 2.2.0
>
>
> Error message:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
> at 
> org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
> at 
> org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
> at 
> org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
> at 
> org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
> at 
> org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
> at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> How to reproduce?
> Configure a StringTreeReader that contains a StringDirectTreeReader as its
> TreeReader (DIRECT or DIRECT_V2 column encoding), set batchSize = 1026, and
> invoke the method nextVector(ColumnVector previousVector, boolean[] isNull,
> final int batchSize).
> scratchlcv is a LongColumnVector whose long[] vector has length 1024; it is
> passed into BytesColumnVectorUtil.readOrcByteArrays(stream, lengths,
> scratchlcv, result, batchSize);
> as a result, the method commonReadByteArrays(stream, lengths, scratchlcv,
> result, (int) batchSize) throws an ArrayIndexOutOfBoundsException.
> With StringDictionaryTreeReader there is no exception, because it calls
> scratchlcv.ensureSize((int) batchSize, false) before
> reader.nextVector(scratchlcv, scratchlcv.vector, batchSize);
> These changes were introduced in Hive 2.1.0 by commit
> https://github.com/apache/hive/commit/0ac424f0a17b341efe299da167791112e4a953e9#diff-a1cec556fb2db4b69a1a4127a6908177R1467
> for HIVE-12159 (https://issues.apache.org/jira/browse/HIVE-12159) by Owen O'Malley.
> How to fix?
> Add a single line,
> scratchlcv.ensureSize((int) batchSize, false);
> in the method
> org.apache.orc.impl.TreeReaderFactory#BytesColumnVectorUtil#commonReadByteArrays(InStream
> stream, IntegerReader lengths, LongColumnVector scratchlcv,
> BytesColumnVector result, final int batchSize), before the call to
> lengths.nextVector(scratchlcv, scratchlcv.vector, batchSize);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12176) NullPointerException in VectorUDAFMinString

2016-08-10 Thread Ashish Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414949#comment-15414949
 ] 

Ashish Kumar commented on HIVE-12176:
-

Additionally, whenever we hit the "Hive Runtime Error while processing row"
issue, Hive should also print the offending row. That would be really helpful
for debugging.

> NullPointerException in VectorUDAFMinString
> ---
>
> Key: HIVE-12176
> URL: https://issues.apache.org/jira/browse/HIVE-12176
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, UDF, Vectorization
>Affects Versions: 1.2.1
> Environment: HDP 2.3 + Kerberos
>Reporter: Hari Sekhon
>Assignee: Jitendra Nath Pandey
>Priority: Critical
> Attachments: hive-nullpointexception-mr.txt, 
> hive-nullpointexception-tez.txt
>
>
> Hive gets the following NullPointerException when trying to go a group by 
> aggregate.
> This occurs whether the engine is Tez or MR, but I'm told this works on our 
> other cluster which is HDP 2.2.
> I'm attaching the full outputs from the hive session, but here is the crux of 
> it:
> {code}
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:52)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
> ... 8 more
> Caused by: java.lang.NullPointerException
> at java.lang.System.arraycopy(Native Method)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.gen.VectorUDAFMinString$Aggregation.assign(VectorUDAFMinString.java:78)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.gen.VectorUDAFMinString$Aggregation.checkValue(VectorUDAFMinString.java:65)
> at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.aggregates.gen.VectorUDAFMinString.aggregateInput(VectorUDAFMinString.java:279)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeBase.processAggregators(VectorGroupByOperator.java:157)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.processBatch(VectorGroupByOperator.java:335)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.process(VectorGroupByOperator.java:880)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator.process(VectorFilterOperator.java:117)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
> at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
> ... 9 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7765) Null pointer error with UNION ALL on partitioned tables using Tez

2016-08-10 Thread Ashish Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414941#comment-15414941
 ] 

Ashish Kumar commented on HIVE-7765:


What's the fix version for this issue?

> Null pointer error with UNION ALL on partitioned tables using Tez
> -
>
> Key: HIVE-7765
> URL: https://issues.apache.org/jira/browse/HIVE-7765
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 0.13.1
> Environment: Tez 0.4.1, Ubuntu 12.04, Hadoop 2.4.1;
> Hadoop 2.2.6, Tez 0.5.2, Hive 0.14.0, CentOS 6.6
>Reporter: Chris Dragga
>
> When executing a UNION ALL query in Tez over partitioned tables where at 
> least one table is empty, Hive fails to execute the query, returning the 
> message "FAILED: NullPointerException null".  No stack trace accompanies this 
> message.  Removing partitioning solves this problem, as does switching to 
> MapReduce as the execution engine.
> This can be reproduced using a variant of the example tables from the 
> "Getting Started" documentation on the Hive wiki.  To create the schema, use
> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
> CREATE TABLE empty_invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
> Then, load invites with data (e.g., using the instructions 
> [here|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations])
>  and execute the following:
> SELECT * FROM invites
> UNION ALL
> SELECT * FROM empty_invites;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14501) MiniTez test for union_type_chk.q is slow

2016-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414926#comment-15414926
 ] 

Hive QA commented on HIVE-14501:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12822872/HIVE-14501.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10460 tests 
executed
*Failed tests:*
{noformat}
TestMsgBusConnection - did not produce a TEST-*.xml file
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/837/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/837/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-837/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12822872 - PreCommit-HIVE-MASTER-Build

> MiniTez test for union_type_chk.q is slow
> -
>
> Key: HIVE-14501
> URL: https://issues.apache.org/jira/browse/HIVE-14501
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14501.1.patch
>
>
> union_type_chk.q runs on minimr and minitez, but the test itself explicitly
> sets the execution engine to mr. It takes around 10 minutes to run this test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14340) Add a new hook triggers before query compilation and after query execution

2016-08-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414920#comment-15414920
 ] 

Lefty Leverenz commented on HIVE-14340:
---

Doc note:  This adds *hive.query.lifetime.hooks* to HiveConf.java in release 
2.2.0, so it will need to be documented in the Configuration Properties wikidoc.

* [Configuration Properties -- Query and DDL Execution | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution]

> Add a new hook triggers before query compilation and after query execution
> --
>
> Key: HIVE-14340
> URL: https://issues.apache.org/jira/browse/HIVE-14340
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>  Labels: TODOC2.2
> Fix For: 2.2.0
>
> Attachments: HIVE-14340.0.patch, HIVE-14340.1.patch, 
> HIVE-14340.2.patch
>
>
> In some cases we may need a hook that activates before query compilation and
> after query execution. For instance, dynamically generating a UDF
> specifically for the running query and cleaning up the resources after the
> query is done. The current hooks only cover pre & post semantic analysis and
> pre & post query execution, which doesn't fit this requirement.
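As an illustration, here is a minimal sketch of such a hook. The interface and
method names reflect my reading of the QueryLifeTimeHook added by this patch;
verify the signatures against the committed code. The hook class itself is
hypothetical.

{code}
import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;

// Hypothetical example hook: register a per-query UDF before compilation and
// release it after execution, covering the gap the description mentions.
public class TempUdfLifeTimeHook implements QueryLifeTimeHook {
  @Override
  public void beforeCompile(QueryLifeTimeHookContext ctx) {
    // e.g. generate and register a UDF tailored to the running query
  }

  @Override
  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
    // no-op in this sketch
  }

  @Override
  public void beforeExecution(QueryLifeTimeHookContext ctx) {
    // no-op in this sketch
  }

  @Override
  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
    // clean up the per-query resources, even when hasError is true
  }
}
// Enabled via: hive.query.lifetime.hooks=<fully qualified class name>
{code}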



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414874#comment-15414874
 ] 

Lefty Leverenz commented on HIVE-14035:
---

So did I.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not
> allow predicate pushdown if they contain any update/delete events. This is
> done to preserve correctness when following a multi-version approach during
> event collapsing, where an update event overwrites an existing insert event.
> This JIRA proposes to split an update event into a delete event followed by a
> new insert event, which can enable predicate pushdown to all delta files
> without breaking correctness. To support backward compatibility, this JIRA
> also proposes adding versioning to ACID so that different versions of ACID
> transactions can co-exist.
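To illustrate the proposed rewrite, here is a conceptual sketch. The Event and
RowId types are hypothetical stand-ins invented for illustration, not Hive's
ACID classes.

{code}
import java.util.ArrayList;
import java.util.List;

// Conceptual only: an update event is rewritten as a delete of the old row
// version followed by an insert of the new version, so delta readers no longer
// have to collapse an update over its original insert event.
public class AcidUpdateSplitSketch {
  static class RowId {
    final long txnId; final long rowId;
    RowId(long txnId, long rowId) { this.txnId = txnId; this.rowId = rowId; }
  }

  static class Event {
    final String type; final RowId target; final Object[] row;
    Event(String type, RowId target, Object[] row) {
      this.type = type; this.target = target; this.row = row;
    }
  }

  static List<Event> splitUpdate(RowId oldVersion, RowId newVersion, Object[] newRow) {
    List<Event> events = new ArrayList<>();
    events.add(new Event("delete", oldVersion, null));    // tombstone the old version
    events.add(new Event("insert", newVersion, newRow));  // re-insert the new data
    return events;
  }
}
{code}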



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14412) Add a timezone-aware timestamp

2016-08-10 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414860#comment-15414860
 ] 

Rui Li commented on HIVE-14412:
---

If I run {{mvn dependency:tree}}, only sshj-0.8.1 is found in the dependency
tree, and it doesn't have this unbounded JDK version range in its pom. Not sure
why newer versions of sshj (e.g. 0.10.0, 0.10.1-SNAPSHOT) are also downloaded
during the build.

> Add a timezone-aware timestamp
> --
>
> Key: HIVE-14412
> URL: https://issues.apache.org/jira/browse/HIVE-14412
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-14412.1.patch, HIVE-14412.1.patch
>
>
> Java's Timestamp stores the time elapsed since the epoch. While it is
> unambiguous by itself, ambiguity arises when we parse a string into a
> timestamp or convert a timestamp to a string, causing problems like
> HIVE-14305.
> To solve the issue, I think we should make timestamp timezone-aware.
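A small self-contained example of the ambiguity (plain JDK, nothing
Hive-specific assumed): the same epoch instant formats to different strings,
and the same string parses to different instants, depending on the time zone in
effect.

{code}
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class TimestampAmbiguity {
  public static void main(String[] args) throws Exception {
    Timestamp ts = new Timestamp(0L);  // the epoch: 1970-01-01T00:00:00 UTC
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
    System.out.println(fmt.format(ts));  // 1970-01-01 00:00:00

    fmt.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));
    System.out.println(fmt.format(ts));  // 1969-12-31 16:00:00 (same instant)

    // Parsing is ambiguous the other way: one string, different instants.
    System.out.println(fmt.parse("1970-01-01 00:00:00").getTime());  // 28800000
  }
}
{code}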



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14322) Postgres db issues after Datanucleus 4.x upgrade

2016-08-10 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414845#comment-15414845
 ] 

Lefty Leverenz commented on HIVE-14322:
---

Doc note:  This adds *datanucleus.rdbms.initializeColumnInfo* to HiveConf.java 
in 2.0.2, 2.1.1, and 2.2.0 so it will need to be documented in the MetaStore 
section of Configuration Properties for those releases.

* [Configuration Properties -- MetaStore | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-MetaStore]

What other values can be used, besides NONE?

Added labels TODOC2.0 (for 2.0.2), TODOC2.1.1, and TODOC2.2 (not needed if 
2.1.1 is released before 2.2.0).
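For reference, a minimal sketch of setting the new property programmatically.
It assumes only the property name and the NONE value mentioned above; other
accepted values are not confirmed here.

{code}
import org.apache.hadoop.hive.conf.HiveConf;

public class InitializeColumnInfoExample {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Ask DataNucleus not to initialize column info with database defaults,
    // avoiding the "NULL::character varying" values on Postgres.
    conf.set("datanucleus.rdbms.initializeColumnInfo", "NONE");
    System.out.println(conf.get("datanucleus.rdbms.initializeColumnInfo"));
  }
}
{code}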

> Postgres db issues after Datanucleus 4.x upgrade
> 
>
> Key: HIVE-14322
> URL: https://issues.apache.org/jira/browse/HIVE-14322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Thejas M Nair
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0, TODOC2.1.1, TODOC2.2
> Fix For: 2.2.0, 2.1.1, 2.0.2
>
> Attachments: HIVE-14322.02.patch, HIVE-14322.03.patch, 
> HIVE-14322.04.patch, HIVE-14322.1.patch
>
>
> With the upgrade to DataNucleus 4.x versions in HIVE-6113, Hive does not
> work properly with Postgres.
> Nullable fields in the database contain the string "NULL::character varying"
> instead of real NULL values, which causes various issues.
> One example:
> {code}
> hive> create table t(i int);
> OK
> Time taken: 1.9 seconds
> hive> create view v as select * from t;
> OK
> Time taken: 0.542 seconds
> hive> select * from v;
> FAILED: SemanticException Unable to fetch table v. 
> java.net.URISyntaxException: Relative path in absolute URI: 
> NULL::character%20varying
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14322) Postgres db issues after Datanucleus 4.x upgrade

2016-08-10 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-14322:
--
Labels: TODOC2.0 TODOC2.1.1 TODOC2.2  (was: )

> Postgres db issues after Datanucleus 4.x upgrade
> 
>
> Key: HIVE-14322
> URL: https://issues.apache.org/jira/browse/HIVE-14322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Thejas M Nair
>Assignee: Sergey Shelukhin
>  Labels: TODOC2.0, TODOC2.1.1, TODOC2.2
> Fix For: 2.2.0, 2.1.1, 2.0.2
>
> Attachments: HIVE-14322.02.patch, HIVE-14322.03.patch, 
> HIVE-14322.04.patch, HIVE-14322.1.patch
>
>
> With the upgrade to DataNucleus 4.x versions in HIVE-6113, Hive does not
> work properly with Postgres.
> Nullable fields in the database contain the string "NULL::character varying"
> instead of real NULL values, which causes various issues.
> One example:
> {code}
> hive> create table t(i int);
> OK
> Time taken: 1.9 seconds
> hive> create view v as select * from t;
> OK
> Time taken: 0.542 seconds
> hive> select * from v;
> FAILED: SemanticException Unable to fetch table v. 
> java.net.URISyntaxException: Relative path in absolute URI: 
> NULL::character%20varying
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)