[jira] [Created] (HIVE-14569) Enable reuseForks for most modules

2016-08-17 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-14569:
-

 Summary: Enable reuseForks for most modules
 Key: HIVE-14569
 URL: https://issues.apache.org/jira/browse/HIVE-14569
 Project: Hive
  Issue Type: Sub-task
Reporter: Siddharth Seth


Follow up from 
https://issues.apache.org/jira/browse/HIVE-14540?focusedCommentId=15422359&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15422359
Hive tests should run with reuseForks=true



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14568) Hive Decimal Returns NULL

2016-08-17 Thread gurmukh singh (JIRA)
gurmukh singh created HIVE-14568:


 Summary: Hive Decimal Returns NULL
 Key: HIVE-14568
 URL: https://issues.apache.org/jira/browse/HIVE-14568
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0, 1.0.0
 Environment: Centos 6.7, Hadoop 2.7.2
Reporter: gurmukh singh


Hi

I was under the impression that the bug 
https://issues.apache.org/jira/browse/HIVE-5022 was fixed, but I see the same 
issue in Hive 1.0 and Hive 1.2 as well.

hive> desc mul_table;
OK
prc decimal(38,28)
vol decimal(38,10)
Time taken: 0.068 seconds, Fetched: 2 row(s)

hive> select prc, vol, prc*vol as cost from mul_table;
OK
1.2             200     NULL
1.44            200     NULL
2.14            100     NULL
3.004           50      NULL
1.2             200     NULL
Time taken: 0.048 seconds, Fetched: 5 row(s)

Rather than returning NULL, it should raise an error or round off.

I understand that I can use double instead of decimal, or can cast it, but 
silently returning NULL will let many problems go unnoticed.
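For context, here is a rough sketch of the decimal result-type derivation that 
appears to cause this (an approximation of Hive's rule, not its source code):

```python
# Illustrative sketch (not Hive source): Hive 1.x derives the result
# type of decimal multiplication roughly as
#   scale     = s1 + s2
#   precision = p1 + p2 + 1
# with both capped at the maximum precision of 38. For
# decimal(38,28) * decimal(38,10) the combined scale alone reaches 38,
# which leaves zero integer digits, so any product >= 1 overflows and
# Hive returns NULL instead of rounding or raising an error.

MAX_PRECISION = 38

def multiply_result_type(p1, s1, p2, s2):
    """Approximate Hive decimal-multiply type derivation."""
    scale = min(s1 + s2, MAX_PRECISION)
    precision = min(p1 + p2 + 1, MAX_PRECISION)
    return precision, scale

precision, scale = multiply_result_type(38, 28, 38, 10)
integer_digits = precision - scale
print(precision, scale, integer_digits)  # 38 38 0 -> even 1.2 * 200 cannot fit
```

Declaring the columns with smaller scales (or casting before multiplying) leaves 
room for integer digits, which is why the workarounds below behave differently.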

hive> desc mul_table2;
OK
prc double
vol decimal(14,10)
Time taken: 0.049 seconds, Fetched: 2 row(s)

hive> select * from mul_table2;
OK
1.4             200
1.34            200
7.34            100
7454533.354544  100
Time taken: 0.028 seconds, Fetched: 4 row(s)


hive> select prc, vol, prc*vol  as cost from mul_table2;
OK
7.34            100     734.0
7.34            1000    7340.0
1.0004          1000    1000.4
7454533.354544  100     7.454533354544E8   <- Wrong result
7454533.354544  1000    7.454533354544E9   <- Wrong result
Time taken: 0.025 seconds, Fetched: 5 row(s)


Casting:

hive> select prc, vol, cast(prc*vol as decimal(38,10)) as cost from mul_table2;
OK
7.34            100     734
7.34            1000    7340
1.0004          1000    1000.4
7454533.354544  100     745453335.4544
7454533.354544  1000    7454533354.544
Time taken: 0.026 seconds, Fetched: 5 row(s) 





Review Request 51194: HIVE-14566: LLAP IO reads timestamp wrongly

2016-08-17 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51194/
---

Review request for hive, Gopal V and Sergey Shelukhin.


Bugs: HIVE-14566
https://issues.apache.org/jira/browse/HIVE-14566


Repository: hive-git


Description
---

HIVE-14566: LLAP IO reads timestamp wrongly


Diffs
-

  itests/src/test/resources/testconfiguration.properties 
2c868074ee8dc51b800b7ecd930abea7793a221a 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java
 94e4750ddc3d4954820f819257f54be0b39e5f08 
  
llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcStripeMetadata.java
 687458681a12b8345e42a5d8669a6c5d3ebc2119 
  orc/src/java/org/apache/orc/impl/TreeReaderFactory.java 
e6fef918b22bf5fac83a4ece5576250476daff27 
  
ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
 b44da0689f98fe88147f77387e86b50fd3a9b6c4 
  ql/src/test/results/clientpositive/llap/orc_merge12.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/51194/diff/


Testing
---


Thanks,

Prasanth_J



[jira] [Created] (HIVE-14567) After enabling Hive Parquet Vectorization, POWER_TEST of query24 in TPCx-BB(BigBench) failed with 1TB scale factor, but successful with 3TB scale factor

2016-08-17 Thread KaiXu (JIRA)
KaiXu created HIVE-14567:


 Summary: After enabling Hive Parquet Vectorization, POWER_TEST of 
query24 in TPCx-BB(BigBench) failed with 1TB scale factor, but successful with 
3TB scale factor
 Key: HIVE-14567
 URL: https://issues.apache.org/jira/browse/HIVE-14567
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Hive
Affects Versions: 2.1.0
 Environment: Apache Hadoop2.6.0
Apache Hive2.1.0
JDK1.8.0_73
TPCx-BB 1.0.1
Reporter: KaiXu
Priority: Critical


We use TPCx-BB(BigBench) to evaluate the performance of Hive Parquet 
Vectorization in our local cluster(E5-2699 v3, 256G, 72 vcores, 1 master node + 
5 worker nodes). During our performance test, we found that query24 in TPCx-BB 
failed with the 1TB scale factor but succeeded with the 3TB scale factor under 
the same conditions. We retried with the 100GB/10GB/1GB scale factors; they all 
failed. That is to say, the query fails at smaller data scales but succeeds at 
larger ones, which seems very unusual.





[jira] [Created] (HIVE-14566) LLAP IO reads timestamp wrongly

2016-08-17 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-14566:


 Summary: LLAP IO reads timestamp wrongly
 Key: HIVE-14566
 URL: https://issues.apache.org/jira/browse/HIVE-14566
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 2.0.1, 2.1.0, 2.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
Priority: Critical


HIVE-10127 is causing incorrect results when orc_merge12.q is run in LLAP: 
timestamps are read wrongly.
{code:title=LLAP IO Enabled}
hive> select atimestamp1 from alltypesorc3xcols limit 10;
OK
1969-12-31 15:59:46.674
NULL
1969-12-31 15:59:55.787
1969-12-31 15:59:44.187
1969-12-31 15:59:50.434
1969-12-31 16:00:15.007
1969-12-31 16:00:07.021
1969-12-31 16:00:04.963
1969-12-31 15:59:52.176
1969-12-31 15:59:44.569
{code}

{code:title=LLAP IO Disabled}
hive> select atimestamp1 from alltypesorc3xcols limit 10;
OK
1969-12-31 15:59:46.674
NULL
1969-12-31 15:59:55.787
1969-12-31 15:59:44.187
1969-12-31 15:59:50.434
1969-12-31 16:00:14.007
1969-12-31 16:00:06.021
1969-12-31 16:00:03.963
1969-12-31 15:59:52.176
1969-12-31 15:59:44.569
{code}
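Note that only the rows at or after 1969-12-31 16:00:00 Pacific (the Unix 
epoch) are shifted by one second; the pre-epoch rows agree. As a purely 
illustrative sketch of that failure class (not the actual HIVE-10127 code), an 
adjustment applied with the wrong condition on one side of the epoch produces 
exactly this pattern:

```python
# Illustrative only -- not the actual HIVE-10127 code. A classic way to
# get a one-second skew on exactly one side of the epoch is to apply an
# extra seconds adjustment with the wrong condition when reassembling a
# (seconds, nanos) pair into milliseconds.

def correct_millis(seconds, nanos):
    # Sub-second part stored as positive nanos; seconds may be negative.
    return seconds * 1000 + nanos // 1_000_000

def buggy_millis(seconds, nanos):
    # Erroneous extra adjustment applied on the positive side only.
    if seconds >= 0 and nanos != 0:
        seconds += 1  # off by one for timestamps after the epoch
    return seconds * 1000 + nanos // 1_000_000

# 16:00:14.007 Pacific == 14.007s after the epoch
print(correct_millis(14, 7_000_000))  # 14007
print(buggy_millis(14, 7_000_000))    # 15007 -> reads as 16:00:15.007
# Pre-epoch rows (e.g. 15:59:46.674) come out the same in both paths:
print(buggy_millis(-14, 674_000_000) == correct_millis(-14, 674_000_000))  # True
```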





[jira] [Created] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column

2016-08-17 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-14565:
---

 Summary: CBO (Calcite Return Path) Handle field access for nested 
column
 Key: HIVE-14565
 URL: https://issues.apache.org/jira/browse/HIVE-14565
 Project: Hive
  Issue Type: Sub-task
  Components: Logical Optimizer
Affects Versions: 2.1.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


ExprNodeConverter doesn't handle field access currently.





Review Request 51193: HIVE-14358: Add metrics for number of queries executed for each execution engine (mr, spark, tez)

2016-08-17 Thread Barna Zsombor Klara

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51193/
---

Review request for hive.


Repository: hive-git


Description
---

HIVE-14358: Add metrics for number of queries executed for each execution 
engine (mr, spark, tez)


Diffs
-

  
common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsConstant.java
 9dc96f9c6412720a891b5c55e2074049c893d780 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 
183ed829ef1742e48539f8928293d56b77bc43c8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 
eeaa54320ffaa7ba5d6ebece80a0cb4aadc1dada 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 
ce1106d91db9ef75e7b425d5950f888bacbfb3e5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 
ac922ce486babe042984d87a7f7442cbfc11484f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 
0b494aa5548f8e6ae76e2d0eea9a7afb33961f97 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 
25c4514b34fb2ed4fc8b1238059bd9dc29d2741b 
  ql/src/test/org/apache/hadoop/hive/ql/exec/mr/TestMapRedTask.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/exec/mr/TestMapredLocalTask.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSparkTask.java 
PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 
53672a9783b4d13c5eed4ef01f5c16af568a0a41 

Diff: https://reviews.apache.org/r/51193/diff/


Testing
---

Ran the new unit tests in the ql project, everything was green.
Checked that the metrics for map reduce and spark tasks were appearing and 
being incremented correctly using JMX. 
Map reduce tasks were being created by a simple select statement containing a 
join.
Spark tasks were being created by the same query with the spark execution 
engine being used.
The metrics were correct across several beeline connections, and were reset 
once the HiveServer2 was restarted.
The metric collection can be turned on/off using the configuration variable 
"hive.server2.metrics.enabled". No errors/exceptions encountered when the 
metrics were disabled.

NB: only the root tasks increment the counter, since the original JIRA was 
about counting the number of queries issued against each execution engine; a 
complex query resulting in more than one task should therefore only count as 
one, as per my understanding.
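The root-task-only counting rule can be sketched like this (class and metric 
names here are illustrative, not Hive's actual ones):

```python
# Sketch of counting queries per execution engine by incrementing only
# at root tasks, so a multi-task query is counted once. Names are
# illustrative; this is not Hive's Task/Metrics code.

from collections import Counter

class Task:
    def __init__(self, engine, is_root):
        self.engine = engine      # "mr", "spark", or "tez"
        self.is_root = is_root

metrics = Counter()

def on_task_start(task):
    if task.is_root:              # child tasks of the same query are skipped
        metrics[f"hive_{task.engine}_tasks"] += 1

# One complex MR query compiled into a root task plus two children:
for t in [Task("mr", True), Task("mr", False), Task("mr", False)]:
    on_task_start(t)
on_task_start(Task("spark", True))  # one Spark query

print(dict(metrics))  # {'hive_mr_tasks': 1, 'hive_spark_tasks': 1}
```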


Thanks,

Barna Zsombor Klara



[jira] [Created] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.

2016-08-17 Thread zhihai xu (JIRA)
zhihai xu created HIVE-14564:


 Summary: Column Pruning generates out of order columns in 
SelectOperator which cause ArrayIndexOutOfBoundsException.
 Key: HIVE-14564
 URL: https://issues.apache.org/jira/browse/HIVE-14564
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 2.1.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical


Column Pruning generates out of order columns in SelectOperator which cause 
ArrayIndexOutOfBoundsException.

{code}
2016-07-26 21:49:24,390 FATAL [main] 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
{"_col0":null,"_col1":0,"_col2":36,"_col3":"499ec44-6dd2-4709-a019-33d6d484ed90�\u0001U5�\u001c��\t\u001b�\u","_col4":"5264db53-d650-4678-9261-cdd51efab8bb","_col5":"cb5233dd-214a-4b0b-b43e-0f41befb5c5c","_col6":"","_col8":48,"_col9":null,"_col10":"1befb5c5c�\u00192016-06-09T15:31:15+00:00\u0002\u0005Rider\u0011svc-dash","_col11":64,"_col12":null,"_col13":null,"_col14":"ber.com�\u0001U5ߨP�\u0001U5ᷨider)
 - 
1000\u0005Rider\u0011svc-d...@uber.com�\u0001U4�;x�\u0001U5\u0004��\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u","_col15":"","_col16":null}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ArrayIndexOutOfBoundsException
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
at 
org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:550)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:377)
... 13 more
{code}

The exception occurs because serialization and deserialization don't match. 
The serialization by LazyBinarySerDe in the previous MapReduce job used a 
different order of columns. When the current MapReduce job deserializes the 
intermediate sequence file generated by the previous job, it gets corrupted 
data because LazyBinaryStruct reads the fields in the wrong order. The column 
mismatch between serialization and deserialization is caused by 
SelectOperator's column pruning ({{ColumnPrunerSelectProc}}).
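As a minimal sketch of why a writer/reader column-order mismatch surfaces as 
an out-of-bounds read (illustrative Python, not the LazyBinary wire format): 
when a field's bytes are reinterpreted as another field's length, the reader 
runs past the end of the buffer, much like the `Text.set` AIOOBE above.

```python
# Illustrative: a fixed-width int field followed by a length-prefixed
# string field, loosely analogous to LazyBinary struct fields.

import struct

def write_row(int_val, str_val):
    # writer's column order: [int, string]
    data = str_val.encode("utf-8")
    return struct.pack(">I", int_val) + struct.pack(">I", len(data)) + data

def read_string_field(buf, offset):
    (length,) = struct.unpack_from(">I", buf, offset)
    field = buf[offset + 4: offset + 4 + length]
    if len(field) != length:
        raise IndexError(f"field length {length} runs past buffer")  # cf. AIOOBE
    return field.decode("utf-8")

buf = write_row(500, "rider")
# Reader using the writer's order: skip the 4 int bytes, then read the string.
print(read_string_field(buf, 4))  # rider
# A reader whose pruned schema puts the string field first misreads the
# int value 500 as a string length and runs off the end of the buffer:
try:
    read_string_field(buf, 0)
except IndexError as e:
    print("corrupt read:", e)
```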





Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

2016-08-17 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review146056
---



What about dropping Super CSV so that we can keep the 'dsv' format and have it 
support both single and multiple characters?
I don't like introducing another 'dsv2' format for multi-character delimiters. 
It might be confusing for users.

- Sergio Pena


On Aug. 17, 2016, 2:14 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> ---
> 
> (Updated Aug. 17, 2016, 2:14 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu 
> Zhang.
> 
> 
> Bugs: HIVE-14404
> https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Introduced a new outputformat (dsv2) which supports multiple characters as 
> delimiter.
> For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is 
> used. This library doesn’t support multiple characters as delimiter. Since 
> the same logic is used for generating csv2, tsv2 and dsv outputformats, I 
> decided not to change this logic, rather introduce a new outputformat (dsv2) 
> which supports multiple characters as delimiter. 
> The new dsv2 outputformat has the same escaping logic as the dsv outputformat 
> if the quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using 
> multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and 
> created two new child classes to separate the logic for single-character and 
> multi-character delimiters: SingleCharSeparatedValuesOutputFormat and 
> MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the 
> SeparatedValuesOutputFormat and moved the methods specific to the 
> single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only 
> moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string 
> in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   
> beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java
>  PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 
> 66d9fd0 
>   
> beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java
>  PRE-CREATION 
>   beeline/src/main/resources/BeeLine.properties 95b8fa1 
>   
> itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java
>  892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> ---
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using 
> multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



[jira] [Created] (HIVE-14563) StatsOptimizer treats NULL in a wrong way

2016-08-17 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-14563:
--

 Summary: StatsOptimizer treats NULL in a wrong way
 Key: HIVE-14563
 URL: https://issues.apache.org/jira/browse/HIVE-14563
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong


{code}
POSTHOOK: query: explain select count(key) from (select null as key from src)src
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: 1
  Processor Tree:
ListSink

PREHOOK: query: select count(key) from (select null as key from src)src
PREHOOK: type: QUERY
PREHOOK: Input: default@src
 A masked pattern was here 
POSTHOOK: query: select count(key) from (select null as key from src)src
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
 A masked pattern was here 
500
{code}
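For reference, COUNT(col) is defined to ignore NULLs, so the correct answer 
here is 0; the 500 looks like the table's row-count statistic, which is only 
valid for COUNT(*). A small illustrative sketch of the semantic difference 
(plain Python, not Hive code):

```python
# COUNT(col) skips NULLs, while the row-count statistic that
# StatsOptimizer substitutes corresponds to COUNT(*).

rows = [{"key": None} for _ in range(500)]  # select null as key from src

count_star = len(rows)                                    # COUNT(*)   -> 500
count_key = sum(1 for r in rows if r["key"] is not None)  # COUNT(key) -> 0

print(count_star, count_key)  # 500 0: answering COUNT(key) from the
                              # row-count stat yields the wrong 500
```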





Review Request 51191: Improve MSCK for partitioned table to deal with special cases

2016-08-17 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51191/
---

Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

HIVE-14511


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java 
a164b12 
  ql/src/test/queries/clientnegative/msck_repair_1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/msck_repair_1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/msck_repair_2.q PRE-CREATION 
  ql/src/test/results/clientnegative/msck_repair_1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/msck_repair_1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/msck_repair_2.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/51191/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Created] (HIVE-14562) CBO (Calcite Return Path) Wrong results for limit + offset

2016-08-17 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-14562:
---

 Summary: CBO (Calcite Return Path) Wrong results for limit + offset
 Key: HIVE-14562
 URL: https://issues.apache.org/jira/browse/HIVE-14562
 Project: Hive
  Issue Type: Sub-task
  Components: Logical Optimizer
Affects Versions: 2.1.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


offset is missed altogether.





Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-17 Thread Owen O'Malley
On Wed, Aug 17, 2016 at 10:46 AM, Alan Gates  wrote:

> +1 for making the API clean and easy for other projects to work with.  A
> few questions:
>
> 1) Would this also make it easier for Parquet and others to implement
> Hive’s ACID interfaces?
>

Currently the ACID interfaces haven't been moved over to storage-api,
although it would make sense to do so at some point.


>
> 2) Would we make any attempt to coordinate version numbers between Hive
> and the storage module, or would a given version of Hive just depend on a
> given version of the storage module?
>

The two options that I see are:

* Let the numbers run separately starting from 2.2.0.
* Tie the numbers together with an additional level of versioning (eg.
2.2.0.0).

I think that letting the two version numbers diverge is better in the long
term. For example, if you need to make an incompatible change, it is pretty
ugly to do it as a fourth level version number (eg. an incompatible change
from 2.2.0.0 to 2.2.0.1). At the beginning, I expect that storage-api would
move faster than Hive, but as it stabilizes I expect it might start moving
slower than Hive.

I'd propose that we have Hive's build use a released version of storage-api
rather than a snapshot.

Thoughts?

   Owen


> Alan.
>
> > On Aug 15, 2016, at 17:01, Owen O'Malley  wrote:
> >
> > All,
> >
> > As part of moving ORC out of Hive, we pulled all of the vectorization
> > storage and sarg classes into a separate module, which is named
> > storage-api.  Although it is currently only used by ORC, it could be used
> > by Parquet or Avro if they wanted to make a fast vectorized reader that
> > read directly in to Hive's VectorizedRowBatch without needing a shim or
> > data copy. Note that this is in many ways similar to pulling the Arrow
> > project out of Drill.
> >
> > This unfortunately still leaves us with a circular dependency between
> Hive
> > and ORC. I'd hoped that storage-api wouldn't change that much, but that
> > doesn't seem to be happening. As a result, ORC ends up shipping its own
> > fork of storage-api.
> >
> > Although we could make a new project for just the storage-api, I think it
> > would be better to make it a subproject of Hive that is released
> > independently.
> >
> > What do others think?
> >
> >   Owen
>
>


[jira] [Created] (HIVE-14561) Upgrade version of spring for ptest2 to work with Java8

2016-08-17 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-14561:
-

 Summary: Upgrade version of spring for ptest2 to work with Java8
 Key: HIVE-14561
 URL: https://issues.apache.org/jira/browse/HIVE-14561
 Project: Hive
  Issue Type: Task
Reporter: Siddharth Seth


Spring 3.2.1 does not work with Java 8.
We could switch to 4.3.2.RELEASE or 3.2.16.





[jira] [Created] (HIVE-14560) Support exchange partition between s3 and hdfs tables

2016-08-17 Thread Abdullah Yousufi (JIRA)
Abdullah Yousufi created HIVE-14560:
---

 Summary: Support exchange partition between s3 and hdfs tables
 Key: HIVE-14560
 URL: https://issues.apache.org/jira/browse/HIVE-14560
 Project: Hive
  Issue Type: Bug
Reporter: Abdullah Yousufi
Assignee: Abdullah Yousufi
 Fix For: 2.2.0


{code}
alter table s3_tbl exchange partition (country='USA', state='CA') with table 
hdfs_tbl;
{code}
results in:
{code}
Error: Error while processing statement: FAILED: Execution Error, return code 1 
from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got 
exception: java.lang.IllegalArgumentException Wrong FS: 
s3a://hive-on-s3/s3_tbl/country=USA/state=CA, expected: hdfs://localhost:9000) 
(state=08S01,code=1)
{code}
because the check for whether the s3 destination table path exists occurs on 
the hdfs filesystem.

Furthermore, exchanging between s3 and hdfs fails because the hdfs rename 
operation is not supported across filesystems. The fix uses copy + deletion in 
the case that the filesystems differ.
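The described fix can be sketched as follows (function names are illustrative; 
this is not Hive's actual code, which lives in the DDL/metastore layer):

```python
# Rename only works within a single filesystem, so fall back to
# copy-then-delete when source and destination filesystems differ.

import os
import shutil

def exchange_move(src, dst):
    try:
        os.rename(src, dst)        # fast path: same filesystem
    except OSError:
        shutil.copy2(src, dst)     # cross-filesystem: copy the data...
        os.remove(src)             # ...then delete the original
```

The same copy-plus-delete shape applies whether the endpoints are local disk, 
hdfs, or s3a; only the filesystem clients differ.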





Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-17 Thread Sushanth Sowmyan
+1 for having a separate storage-api project to define common interfaces
for people to develop against. It'll make things much easier to develop
against generically.

I'm okay (+0) with the sub-project idea, as opposed to enthusiastic about it, 
mostly because I have reservations that it'll encourage laziness: in practice 
it will wind up being tied to Hive releases and development, and over time 
assumptions about how Hive works and what is available will bleed in. But, 
still, having a notion of separation will definitely help.

On Aug 17, 2016 11:39, "Prasanth Jayachandran" <
pjayachand...@hortonworks.com> wrote:

> +1 for making it a subproject with separate (preferably shorter) release
> cycle. The module in itself is too small for a separate project. Also
> having a faster release cycle will resolve circular dependency and will
> help other projects make use of vectorization, sarg, bloom filter etc.
>
> For version management, how about adding another version after patch
> version i.e sub-project version?
> Example: 2.2.0.[0] will be storage api’s release version. Hive will always
> depend on 2.2.0-SNAPSHOT. I think maven will let us release modules with
> different versions. https://dev.c-ware.de/confluence/display/PUBLIC/
> Releasing+modules+of+a+multi-module+project+with+
> independent+version+numbers
>
> Thanks
> Prasanth
>
> > On Aug 17, 2016, at 10:46 AM, Alan Gates  wrote:
> >
> > +1 for making the API clean and easy for other projects to work with.  A
> few questions:
> >
> > 1) Would this also make it easier for Parquet and others to implement
> Hive’s ACID interfaces?
> >
> > 2) Would we make any attempt to coordinate version numbers between Hive
> and the storage module, or would a given version of Hive just depend on a
> given version of the storage module?
> >
> > Alan.
> >
> >> On Aug 15, 2016, at 17:01, Owen O'Malley  wrote:
> >>
> >> All,
> >>
> >> As part of moving ORC out of Hive, we pulled all of the vectorization
> >> storage and sarg classes into a separate module, which is named
> >> storage-api.  Although it is currently only used by ORC, it could be
> used
> >> by Parquet or Avro if they wanted to make a fast vectorized reader that
> >> read directly in to Hive's VectorizedRowBatch without needing a shim or
> >> data copy. Note that this is in many ways similar to pulling the Arrow
> >> project out of Drill.
> >>
> >> This unfortunately still leaves us with a circular dependency between
> Hive
> >> and ORC. I'd hoped that storage-api wouldn't change that much, but that
> >> doesn't seem to be happening. As a result, ORC ends up shipping its own
> >> fork of storage-api.
> >>
> >> Although we could make a new project for just the storage-api, I think
> it
> >> would be better to make it a subproject of Hive that is released
> >> independently.
> >>
> >> What do others think?
> >>
> >>  Owen
> >
> >
>
>


[jira] [Created] (HIVE-14559) Remove setting hive.execution.engine in qfiles

2016-08-17 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-14559:


 Summary: Remove setting hive.execution.engine in qfiles
 Key: HIVE-14559
 URL: https://issues.apache.org/jira/browse/HIVE-14559
 Project: Hive
  Issue Type: Sub-task
  Components: Test
Affects Versions: 2.2.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


Some qfiles explicitly set the execution engine. If we run those tests on 
different mini CliDrivers, it could be very slow.





Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-17 Thread Prasanth Jayachandran
+1 for making it a subproject with separate (preferably shorter) release cycle. 
The module in itself is too small for a separate project. Also having a faster 
release cycle will resolve circular dependency and will help other projects 
make use of vectorization, sarg, bloom filter etc.

For version management, how about adding another version after patch version 
i.e sub-project version? 
Example: 2.2.0.[0] will be storage api’s release version. Hive will always 
depend on 2.2.0-SNAPSHOT. I think maven will let us release modules with 
different versions. 
https://dev.c-ware.de/confluence/display/PUBLIC/Releasing+modules+of+a+multi-module+project+with+independent+version+numbers

Thanks
Prasanth 

> On Aug 17, 2016, at 10:46 AM, Alan Gates  wrote:
> 
> +1 for making the API clean and easy for other projects to work with.  A few 
> questions:
> 
> 1) Would this also make it easier for Parquet and others to implement Hive’s 
> ACID interfaces?
> 
> 2) Would we make any attempt to coordinate version numbers between Hive and 
> the storage module, or would a given version of Hive just depend on a given 
> version of the storage module?
> 
> Alan.
> 
>> On Aug 15, 2016, at 17:01, Owen O'Malley  wrote:
>> 
>> All,
>> 
>> As part of moving ORC out of Hive, we pulled all of the vectorization
>> storage and sarg classes into a separate module, which is named
>> storage-api.  Although it is currently only used by ORC, it could be used
>> by Parquet or Avro if they wanted to make a fast vectorized reader that
>> read directly in to Hive's VectorizedRowBatch without needing a shim or
>> data copy. Note that this is in many ways similar to pulling the Arrow
>> project out of Drill.
>> 
>> This unfortunately still leaves us with a circular dependency between Hive
>> and ORC. I'd hoped that storage-api wouldn't change that much, but that
>> doesn't seem to be happening. As a result, ORC ends up shipping its own
>> fork of storage-api.
>> 
>> Although we could make a new project for just the storage-api, I think it
>> would be better to make it a subproject of Hive that is released
>> independently.
>> 
>> What do others think?
>> 
>>  Owen
> 
> 



[jira] [Created] (HIVE-14558) Add support for listing views similar to "show tables"

2016-08-17 Thread Naveen Gangam (JIRA)
Naveen Gangam created HIVE-14558:


 Summary: Add support for listing views similar to "show tables"
 Key: HIVE-14558
 URL: https://issues.apache.org/jira/browse/HIVE-14558
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.0.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam


Users have been asking for such a feature, where they can get a list of views 
separately.

So perhaps a syntax similar to the "show tables" command?
show views [in/from ] []

Does it make sense to add such a command? Or is it not worth the effort?





Re: [DISCUSS] Making storage-api a separately released artifact

2016-08-17 Thread Alan Gates
+1 for making the API clean and easy for other projects to work with.  A few 
questions:

1) Would this also make it easier for Parquet and others to implement Hive’s 
ACID interfaces?

2) Would we make any attempt to coordinate version numbers between Hive and the 
storage module, or would a given version of Hive just depend on a given version 
of the storage module?

Alan.

> On Aug 15, 2016, at 17:01, Owen O'Malley  wrote:
> 
> All,
> 
> As part of moving ORC out of Hive, we pulled all of the vectorization
> storage and sarg classes into a separate module, which is named
> storage-api.  Although it is currently only used by ORC, it could be used
> by Parquet or Avro if they wanted to make a fast vectorized reader that
> read directly in to Hive's VectorizedRowBatch without needing a shim or
> data copy. Note that this is in many ways similar to pulling the Arrow
> project out of Drill.
> 
> This unfortunately still leaves us with a circular dependency between Hive
> and ORC. I'd hoped that storage-api wouldn't change that much, but that
> doesn't seem to be happening. As a result, ORC ends up shipping its own
> fork of storage-api.
> 
> Although we could make a new project for just the storage-api, I think it
> would be better to make it a subproject of Hive that is released
> independently.
> 
> What do others think?
> 
>   Owen



Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

2016-08-17 Thread Peter Vary

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/#review145993
---


Ship it!




Hi Marta,

Thanks. LGTM (non binding)

Peter

- Peter Vary


On Aug. 17, 2016, 2:14 p.m., Marta Kuczora wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50896/
> ---
> 
> (Updated Aug. 17, 2016, 2:14 p.m.)
> 
> 
> Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu 
> Zhang.
> 
> 
> Bugs: HIVE-14404
> https://issues.apache.org/jira/browse/HIVE-14404
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Introduced a new output format (dsv2) which supports multi-character 
> delimiters.
> The dsv, csv2 and tsv2 output formats are generated with the Super CSV 
> library, which doesn't support multi-character delimiters. Since the same 
> logic is used for generating the csv2, tsv2 and dsv output formats, I 
> decided not to change that logic but rather to introduce a new output 
> format (dsv2) which supports multi-character delimiters.
> The new dsv2 output format uses the same escaping logic as the dsv output 
> format when quoting is not disabled.
> Extended the TestBeeLineWithArgs tests with new test steps which are using 
> multiple characters as delimiter.
> 
> Main changes in the code:
>  - Changed the SeparatedValuesOutputFormat class to be an abstract class and 
> created two new child classes to separate the logic for single-character and 
> multi-character delimiters: SingleCharSeparatedValuesOutputFormat and 
> MultiCharSeparatedValuesOutputFormat
> 
>  - Kept the methods which are used by both children in the 
> SeparatedValuesOutputFormat and moved the methods specific to the 
> single-character case to the SingleCharSeparatedValuesOutputFormat class.
> 
>  - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only 
> moved some parts to the child class.
> 
>  - Implemented the value escaping and concatenation with the delimiter string 
> in the MultiCharSeparatedValuesOutputFormat.
> 
> 
> Diffs
> -
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
>   beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
>   
> beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java
>  PRE-CREATION 
>   beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 
> 66d9fd0 
>   
> beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java
>  PRE-CREATION 
>   beeline/src/main/resources/BeeLine.properties 95b8fa1 
>   
> itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java
>  892c733 
> 
> Diff: https://reviews.apache.org/r/50896/diff/
> 
> 
> Testing
> ---
> 
> - Tested manually in BeeLine.
> - Extended the TestBeeLineWithArgs tests with new test steps which are using 
> multiple characters as delimiter.
> 
> 
> Thanks,
> 
> Marta Kuczora
> 
>



Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters

2016-08-17 Thread Marta Kuczora

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50896/
---

(Updated Aug. 17, 2016, 2:14 p.m.)


Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang.


Changes
---

Patch is fixed according to the review:
- Display an error message if a multi-character delimiter is set with the dsv 
output format; in this case it falls back to the default dsv delimiter.
- Introduced a new constant for the default dsv2 delimiter to avoid the 
String<->char conversions.


Bugs: HIVE-14404
https://issues.apache.org/jira/browse/HIVE-14404


Repository: hive-git


Description
---

Introduced a new output format (dsv2) which supports multi-character 
delimiters.
The dsv, csv2 and tsv2 output formats are generated with the Super CSV 
library, which doesn't support multi-character delimiters. Since the same 
logic is used for generating the csv2, tsv2 and dsv output formats, I 
decided not to change that logic but rather to introduce a new output 
format (dsv2) which supports multi-character delimiters.
The new dsv2 output format uses the same escaping logic as the dsv output 
format when quoting is not disabled.
Extended the TestBeeLineWithArgs tests with new test steps which are using 
multiple characters as delimiter.

Main changes in the code:
 - Changed the SeparatedValuesOutputFormat class to be an abstract class and 
created two new child classes to separate the logic for single-character and 
multi-character delimiters: SingleCharSeparatedValuesOutputFormat and 
MultiCharSeparatedValuesOutputFormat

 - Kept the methods which are used by both children in the 
SeparatedValuesOutputFormat and moved the methods specific to the 
single-character case to the SingleCharSeparatedValuesOutputFormat class.

 - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only 
moved some parts to the child class.

 - Implemented the value escaping and concatenation with the delimiter string 
in the MultiCharSeparatedValuesOutputFormat.
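
The escaping-and-concatenation logic described above can be sketched as follows. This is only an illustration of the technique (quote any value that contains the delimiter, doubling embedded quotes); the class and method names are hypothetical, not the actual MultiCharSeparatedValuesOutputFormat code.

```java
// Hedged sketch of multi-character-delimiter output: quote any value that
// contains the delimiter, the quote character, or a newline, doubling
// embedded quotes CSV-style. Names are illustrative, not Hive's.
public class MultiCharDelimiterSketch {

    static String escape(String value, String delim, char quote) {
        boolean needsQuoting = value.contains(delim)
                || value.indexOf(quote) >= 0
                || value.contains("\n");
        if (!needsQuoting) {
            return value;
        }
        // Double every embedded quote character, then wrap the value.
        String doubled = value.replace(String.valueOf(quote),
                String.valueOf(quote) + quote);
        return quote + doubled + quote;
    }

    static String joinRow(String[] fields, String delim, char quote) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                row.append(delim);  // multi-character delimiter between fields
            }
            row.append(escape(fields[i], delim, quote));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"a", "b||c", "d\"e"};
        System.out.println(joinRow(fields, "||", '"'));
        // prints: a||"b||c"||"d""e"
    }
}
```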


Diffs (updated)
-

  beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 
  
beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java
 PRE-CREATION 
  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 
66d9fd0 
  
beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java
 PRE-CREATION 
  beeline/src/main/resources/BeeLine.properties 95b8fa1 
  
itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 
892c733 

Diff: https://reviews.apache.org/r/50896/diff/


Testing
---

- Tested manually in BeeLine.
- Extended the TestBeeLineWithArgs tests with new test steps which are using 
multiple characters as delimiter.


Thanks,

Marta Kuczora



[jira] [Created] (HIVE-14557) Nullpointer When both SkewJoin and Mapjoin Enabled

2016-08-17 Thread Nemon Lou (JIRA)
Nemon Lou created HIVE-14557:


 Summary: Nullpointer When both SkewJoin  and Mapjoin Enabled
 Key: HIVE-14557
 URL: https://issues.apache.org/jira/browse/HIVE-14557
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 2.1.0, 1.1.0
Reporter: Nemon Lou


The following sql failed with return code 2 on mr.
{noformat}
create table a(id int,id1 int);
create table b(id int,id1 int);
create table c(id int,id1 int);
set hive.optimize.skewjoin=true;
select a.id,b.id,c.id1 from a,b,c where a.id=b.id and a.id1=c.id1;
{noformat}
Error log as follows:
{noformat}
2016-08-17 21:13:42,081 INFO [main] 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
Id =0
  
Id =21
  
Id =28
  
Id =16
  
  <\Children>
  Id = 28 null<\Parent>
<\FS>
  <\Children>
  Id = 21 nullId = 33 
Id =33
  null
  <\Children>
  <\Parent>
<\HASHTABLEDUMMY><\Parent>
<\MAPJOIN>
  <\Children>
  Id = 0 null<\Parent>
<\TS>
  <\Children>
  <\Parent>
<\MAP>
2016-08-17 21:13:42,084 INFO [main] 
org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing operator TS[21]
2016-08-17 21:13:42,084 INFO [main] 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Initializing dummy operator
2016-08-17 21:13:42,086 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: 
DESERIALIZE_ERRORS:0, RECORDS_IN:0, 
2016-08-17 21:13:42,087 ERROR [main] 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Hit error while closing operators 
- failing tree
2016-08-17 21:13:42,088 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.RuntimeException: Hive Runtime Error while 
closing operators
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:682)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
... 8 more

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Sometimes a realm is required for some users in a beeline connection, even though LDAP authentication is configured!

2016-08-17 Thread mathes waran
Hi,
While connecting with beeline with LDAP authentication configured, some
users connect without a realm while others must supply one. This happens
because, when the users were created in Active Directory, the display name
differs from the logon name. Authentication should validate only the logon
name, but for some users it appears to validate the display name instead.
Please refer to the attached images and share your ideas if you have
overcome this.

Thanks,
Matheswaran.S

On Tue, Aug 16, 2016 at 10:32 AM, mathes waran 
wrote:

> Hi,
>
> Problem: Why is a realm required for some users in a beeline connection,
> even though LDAP authentication is configured?
>
> I have configured the beeline connection with LDAP authentication and it
> works fine. Sometimes users connect over it without any realm, which is
> fine, but over the same connection other users are required to supply a
> fully qualified domain name. LDAP authentication should not require a
> domain for users, yet here it asks some users for a domain name.
>
> Please find the error details and the attached screenshot:
> Error: Could not open client transport with JDBC Uri:
> jdbc:hive2://mylapn2215:1/default;: Peer indicated failure: PLAIN auth
> failed: LDAP Authentication failed for user (state=08S01,code=0)
>
> Please suggest your ideas if you have overcome this.
>
> Thanks in advance,
>
> Matheskrishna
>


Re: YourKit open source license

2016-08-17 Thread calvin hung
Hi Rui,

This is what I get from sa...@yourkit.com:

"We provide free licenses only to project committers. If you have no commit 
power, you can ask Apache people to request license for you; or use 
evaluation license."

Is it possible any committer could help?
It seems that I can only use an evaluation license for now.
Thanks for your help anyway.

Calvin


 On Wed, 17 Aug 2016 10:03:37 +0800 Rui Li wrote 

Our wiki doesn't mention it's only for committers. Anyway I suggest you
contact YourKit sales to figure it out.

On Tue, Aug 16, 2016 at 8:38 PM, calvin hung  wrote:

> Thanks for your response, Rui.
>
> I don't have an apache email account.
> It looks like only committers can get an email account, according to this
> page: http://www.apache.org/dev/committers.html
> Does it mean that only Hive committers can get YourKit free licenses for
> Hive performance profiling?
>
>  On Tue, 16 Aug 2016 13:33:34 +0800 Rui Li <lirui.fu...@gmail.com> wrote 
>
> If I remember correctly, I just contacted the sales of YourKit and they
> sent me the license by email. You'd better send your email using your
> apache email account, in order to convince them you're a developer of Hive.
>
> On Tue, Aug 16, 2016 at 2:51 AM, calvin hung <calvinh...@wasaitech.com>
> wrote:
>
> > Hi Rui and Alan,
> >
> > Could you or any nice guy share more detailed steps for getting a
> > YourKit license for Hive?
> > I've searched the full Hive dev mail archive but found no exact steps
> > to get one. Thanks!
> >
> > Calvin
> >
> > From: "Li, Rui" <rui...@intel.com>
> > Date: Tue, 31 Mar 2015 01:22:51 +
> > To: "dev@hive.apache.org" <dev@hive.apache.org>
> >
> > - Contents -
> >
> > Thanks Alan! But I don't see Hive in the sponsored open source project
> > list. I'll contact them anyway.
> >
> > Cheers,
> > Rui Li
> >
> > From: Alan Gates [mailto:alanfga...@gmail.com]
> > Sent: Tuesday, March 31, 2015 1:02 AM
> > To: dev@hive.apache.org
> > Subject: Re: YourKit open source license
> >
> > See https://www.yourkit.com/customers/.
> >
> > Alan.
> >
> > Li, Rui
> > March 30, 2015 at 0:54
> >
> > Hi guys,
> >
> > I want to use YourKit to profile hive performance. According to the
> > wiki <https://cwiki.apache.org/confluence/display/Hive/Performance>
> > hive has been granted open source license. Could anybody tell me how
> > I can get the license? Thanks!
> >
> > Cheers,
> > Rui Li
>
> -- 
> Best regards!
> Rui Li
> Cell: (+86) 13564950210

-- 
Best regards!
Rui Li
Cell: (+86) 13564950210








[jira] [Created] (HIVE-14556) Load data into text table fail caused by IndexOutOfBoundsException

2016-08-17 Thread Niklaus Xiao (JIRA)
Niklaus Xiao created HIVE-14556:
---

 Summary: Load data into text table fail caused by 
IndexOutOfBoundsException
 Key: HIVE-14556
 URL: https://issues.apache.org/jira/browse/HIVE-14556
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 1.3.0, 2.2.0
Reporter: Niklaus Xiao
Assignee: Niklaus Xiao


{code}
echo "1" > foo.txt

0: jdbc:hive2://189.39.151.74:21066/> create table foo(id int) stored as 
textfile;
No rows affected (1.846 seconds)
0: jdbc:hive2://189.39.151.74:21066/> load data local inpath '/foo.txt' into 
table foo;
Error: Error while compiling statement: FAILED: SemanticException Unable to 
load data to destination table. Error: java.lang.IndexOutOfBoundsException 
(state=42000,code=4)
{code}

Exception:
{code}
2016-08-17 17:15:36,301 | ERROR | HiveServer2-Handler-Pool: Thread-55 | FAILED: 
SemanticException Unable to load data to destination table. Error: 
java.lang.IndexOutOfBoundsException
org.apache.hadoop.hive.ql.parse.SemanticException: Unable to load data to 
destination table. Error: java.lang.IndexOutOfBoundsException
at 
org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.ensureFileFormatsMatch(LoadSemanticAnalyzer.java:356)
at 
org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:236)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:473)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:325)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1358)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1340
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)