[jira] [Created] (HIVE-14569) Enable reuseForks for most modules
Siddharth Seth created HIVE-14569: - Summary: Enable reuseForks for most modules Key: HIVE-14569 URL: https://issues.apache.org/jira/browse/HIVE-14569 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Follow up from https://issues.apache.org/jira/browse/HIVE-14540?focusedCommentId=15422359&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15422359 Hive tests should run with reuseForks=true -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14568) Hive Decimal Returns NULL
gurmukh singh created HIVE-14568: Summary: Hive Decimal Returns NULL Key: HIVE-14568 URL: https://issues.apache.org/jira/browse/HIVE-14568 Project: Hive Issue Type: Bug Affects Versions: 1.2.0, 1.0.0 Environment: Centos 6.7, Hadoop 2.7.2 Reporter: gurmukh singh Hi, I was under the impression that the bug https://issues.apache.org/jira/browse/HIVE-5022 got fixed, but I see the same issue in Hive 1.0 and Hive 1.2 as well. hive> desc mul_table; OK prc decimal(38,28) vol decimal(38,10) Time taken: 0.068 seconds, Fetched: 2 row(s) hive> select prc, vol, prc*vol as cost from mul_table; OK 1.2 200 NULL 1.44200 NULL 2.14100 NULL 3.004 50 NULL 1.2 200 NULL Time taken: 0.048 seconds, Fetched: 5 row(s) Rather than returning NULL, it should give an error or round off. I understand that I can use double instead of decimal, or can cast it, but silently returning NULL will make many things go unnoticed. hive> desc mul_table2; OK prc double vol decimal(14,10) Time taken: 0.049 seconds, Fetched: 2 row(s) hive> select * from mul_table2; OK 1.4 200 1.34200 7.34100 7454533.354544 100 Time taken: 0.028 seconds, Fetched: 4 row(s) hive> select prc, vol, prc*vol as cost from mul_table2; OK 7.34100 734.0 7.3410007340.0 1.0004 10001000.4 7454533.354544 100 7.454533354544E8 <- Wrong result 7454533.354544 10007.454533354544E9 <- Wrong result Time taken: 0.025 seconds, Fetched: 5 row(s) Casting: hive> select prc, vol, cast(prc*vol as decimal(38,10)) as cost from mul_table2; OK 7.34100 734 7.3410007340 1.0004 10001000.4 7454533.354544 100 745453335.4544 7454533.354544 10007454533354.544 Time taken: 0.026 seconds, Fetched: 5 row(s) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
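For context on why these multiplications overflow: Hive derives the result type of a decimal multiplication as roughly precision p1 + p2 + 1 and scale s1 + s2, capped at 38 digits (see HIVE-3976), and in the 1.x line a value that does not fit yields NULL rather than rounding. The Python sketch below models that type inference to show that decimal(38,28) * decimal(38,10) leaves no integer digits at all; the function name and capping details are an illustration, not Hive's actual code.

```python
# Hedged sketch of Hive's decimal multiplication typing rule
# (precision/scale inference as described around HIVE-3976;
# the function and exact capping behavior are hypothetical).
MAX_PRECISION = 38

def multiply_type(p1, s1, p2, s2):
    """Infer the result type of decimal(p1,s1) * decimal(p2,s2)."""
    scale = s1 + s2
    precision = p1 + p2 + 1
    # Cap precision at 38; older Hive kept the full scale, so the
    # integer part may end up with zero digits and overflow to NULL.
    capped_precision = min(precision, MAX_PRECISION)
    capped_scale = min(scale, capped_precision)
    integer_digits = capped_precision - capped_scale
    return capped_precision, capped_scale, integer_digits

# decimal(38,28) * decimal(38,10): scale 38 consumes all 38 digits,
# leaving no room for the integer part, so even 1.2 * 200 cannot be
# represented and Hive returns NULL instead of rounding.
p, s, int_digits = multiply_type(38, 28, 38, 10)
```

The same rule explains why the explicit cast to decimal(38,10) works: it trades scale for integer digits before the result is materialized.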
Review Request 51194: HIVE-14566: LLAP IO reads timestamp wrongly
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51194/ --- Review request for hive, Gopal V and Sergey Shelukhin. Bugs: HIVE-14566 https://issues.apache.org/jira/browse/HIVE-14566 Repository: hive-git Description --- HIVE-14566: LLAP IO reads timestamp wrongly Diffs - itests/src/test/resources/testconfiguration.properties 2c868074ee8dc51b800b7ecd930abea7793a221a llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java 94e4750ddc3d4954820f819257f54be0b39e5f08 llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcStripeMetadata.java 687458681a12b8345e42a5d8669a6c5d3ebc2119 orc/src/java/org/apache/orc/impl/TreeReaderFactory.java e6fef918b22bf5fac83a4ece5576250476daff27 ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java b44da0689f98fe88147f77387e86b50fd3a9b6c4 ql/src/test/results/clientpositive/llap/orc_merge12.q.out PRE-CREATION Diff: https://reviews.apache.org/r/51194/diff/ Testing --- Thanks, Prasanth_J
[jira] [Created] (HIVE-14567) After enabling Hive Parquet Vectorization, POWER_TEST of query24 in TPCx-BB(BigBench) failed with 1TB scale factor, but successful with 3TB scale factor
KaiXu created HIVE-14567: Summary: After enabling Hive Parquet Vectorization, POWER_TEST of query24 in TPCx-BB(BigBench) failed with 1TB scale factor, but successful with 3TB scale factor Key: HIVE-14567 URL: https://issues.apache.org/jira/browse/HIVE-14567 Project: Hive Issue Type: Bug Components: File Formats, Hive Affects Versions: 2.1.0 Environment: Apache Hadoop 2.6.0 Apache Hive 2.1.0 JDK 1.8.0_73 TPCx-BB 1.0.1 Reporter: KaiXu Priority: Critical We use TPCx-BB (BigBench) to evaluate the performance of Hive Parquet Vectorization in our local cluster (E5-2699 v3, 256G, 72 vcores, 1 master node + 5 worker nodes). During our performance test, we found that query24 in TPCx-BB failed with a 1TB scale factor but succeeded with a 3TB scale factor under the same conditions. We retried with 100GB/10GB/1GB scale factors; they all failed. That is to say, it fails with smaller data scales but succeeds with larger ones, which seems very unusual. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14566) LLAP IO reads timestamp wrongly
Prasanth Jayachandran created HIVE-14566: Summary: LLAP IO reads timestamp wrongly Key: HIVE-14566 URL: https://issues.apache.org/jira/browse/HIVE-14566 Project: Hive Issue Type: Bug Components: llap Affects Versions: 2.0.1, 2.1.0, 2.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Priority: Critical HIVE-10127 is causing incorrect results when orc_merge12.q is run in llap. It reads timestamp wrongly. {code:title=LLAP IO Enabled} hive> select atimestamp1 from alltypesorc3xcols limit 10; OK 1969-12-31 15:59:46.674 NULL 1969-12-31 15:59:55.787 1969-12-31 15:59:44.187 1969-12-31 15:59:50.434 1969-12-31 16:00:15.007 1969-12-31 16:00:07.021 1969-12-31 16:00:04.963 1969-12-31 15:59:52.176 1969-12-31 15:59:44.569 {code} {code:title=LLAP IO Disabled} hive> select atimestamp1 from alltypesorc3xcols limit 10; OK 1969-12-31 15:59:46.674 NULL 1969-12-31 15:59:55.787 1969-12-31 15:59:44.187 1969-12-31 15:59:50.434 1969-12-31 16:00:14.007 1969-12-31 16:00:06.021 1969-12-31 16:00:03.963 1969-12-31 15:59:52.176 1969-12-31 15:59:44.569 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
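The two listings differ by exactly one second on the post-epoch rows (16:00:15.007 vs 16:00:14.007, etc.) while the pre-epoch rows agree. ORC keeps a timestamp's whole seconds (relative to a base) and its nanoseconds in separate streams, and recombining them involves a boundary correction around the epoch. The sketch below is a hypothetical model, not the actual TreeReaderFactory logic, showing how mis-applying such a correction to the wrong range produces exactly this drift.

```python
# Illustrative sketch (NOT the actual ORC reader code) of a
# seconds/nanos recombination with a mis-applied boundary adjustment.
def decode_millis(seconds, nanos, buggy=False):
    """Recombine a seconds value and a nanosecond field into millis."""
    millis = seconds * 1000 + nanos // 1_000_000
    if buggy and seconds >= 0 and nanos != 0:
        # hypothetical adjustment applied to the wrong range: bumps
        # post-epoch values by one second; pre-epoch values unaffected
        millis += 1000
    return millis

# 14.007s after the epoch (16:00:14.007 in the listing above):
good = decode_millis(14, 7_000_000)        # correct reader
bad = decode_millis(14, 7_000_000, True)   # drifts by one second

# a pre-epoch value decodes identically either way, matching the
# agreement of the 15:59:xx rows in both listings
assert decode_millis(-14, 7_000_000) == decode_millis(-14, 7_000_000, True)
```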
[jira] [Created] (HIVE-14565) CBO (Calcite Return Path) Handle field access for nested column
Ashutosh Chauhan created HIVE-14565: --- Summary: CBO (Calcite Return Path) Handle field access for nested column Key: HIVE-14565 URL: https://issues.apache.org/jira/browse/HIVE-14565 Project: Hive Issue Type: Sub-task Components: Logical Optimizer Affects Versions: 2.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan ExprNodeConverter doesn't handle field access currently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Review Request 51193: HIVE-14358: Add metrics for number of queries executed for each execution engine (mr, spark, tez)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51193/ --- Review request for hive. Repository: hive-git Description --- HIVE-14358: Add metrics for number of queries executed for each execution engine (mr, spark, tez) Diffs - common/src/java/org/apache/hadoop/hive/common/metrics/common/MetricsConstant.java 9dc96f9c6412720a891b5c55e2074049c893d780 ql/src/java/org/apache/hadoop/hive/ql/Driver.java 183ed829ef1742e48539f8928293d56b77bc43c8 ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java eeaa54320ffaa7ba5d6ebece80a0cb4aadc1dada ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java ce1106d91db9ef75e7b425d5950f888bacbfb3e5 ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java ac922ce486babe042984d87a7f7442cbfc11484f ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTask.java 0b494aa5548f8e6ae76e2d0eea9a7afb33961f97 ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 25c4514b34fb2ed4fc8b1238059bd9dc29d2741b ql/src/test/org/apache/hadoop/hive/ql/exec/mr/TestMapRedTask.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/mr/TestMapredLocalTask.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSparkTask.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 53672a9783b4d13c5eed4ef01f5c16af568a0a41 Diff: https://reviews.apache.org/r/51193/diff/ Testing --- Ran the new unit tests in the ql project, everything was green. Checked that the metrics for map reduce and spark tasks were appearing and being incremented correctly using JMX. Map reduce tasks were being created by a simple select statement containing a join. Spark tasks were being created by the same query with the spark execution engine being used. The metrics were correct across several beeline connections, and were reset once the HiveServer2 was restarted. The metric collection can be turned on/off using the configuration variable "hive.server2.metrics.enabled". 
No errors/exceptions encountered when the metrics were disabled. NB: only the root tasks increment the counter, since the original JIRA was about counting the number of queries issued against each execution engine, so a complex query resulting in more than one task should only count as one, as per my understanding. Thanks, Barna Zsombor Klara
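The root-task-only counting described above can be sketched as follows. This is a hypothetical model in Python, not Hive's actual Metrics API, and the metric names are invented for illustration:

```python
# Toy model of per-engine query counting: only the root task of a
# query increments the counter, so a plan with several tasks still
# counts as one query. Metric names here are made up.
from collections import Counter

metrics = Counter()

class Task:
    engine = "mr"

    def __init__(self, is_root=True):
        self.is_root = is_root

    def execute(self):
        if self.is_root:  # child tasks of the same query skip this
            metrics[f"queries_{self.engine}"] += 1

class SparkTask(Task):
    engine = "spark"

# one query that fans out into a root task plus a child task
root = SparkTask()
child = SparkTask(is_root=False)
root.execute()
child.execute()
```

After both tasks run, `metrics["queries_spark"]` is 1, matching the "complex query counts as one" behavior the review describes.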
[jira] [Created] (HIVE-14564) Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException.
zhihai xu created HIVE-14564: Summary: Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException. Key: HIVE-14564 URL: https://issues.apache.org/jira/browse/HIVE-14564 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 2.1.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Column Pruning generates out of order columns in SelectOperator which cause ArrayIndexOutOfBoundsException. {code} 2016-07-26 21:49:24,390 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"_col0":null,"_col1":0,"_col2":36,"_col3":"499ec44-6dd2-4709-a019-33d6d484ed90�\u0001U5�\u001c��\t\u001b�\u","_col4":"5264db53-d650-4678-9261-cdd51efab8bb","_col5":"cb5233dd-214a-4b0b-b43e-0f41befb5c5c","_col6":"","_col8":48,"_col9":null,"_col10":"1befb5c5c�\u00192016-06-09T15:31:15+00:00\u0002\u0005Rider\u0011svc-dash","_col11":64,"_col12":null,"_col13":null,"_col14":"ber.com�\u0001U5ߨP�\u0001U5ᷨider) - 1000\u0005Rider\u0011svc-d...@uber.com�\u0001U4�;x�\u0001U5\u0004��\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u\u","_col15":"","_col16":null} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ArrayIndexOutOfBoundsException at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497) ... 9 more Caused by: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at org.apache.hadoop.io.Text.set(Text.java:225) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264) at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201) at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77) at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:550) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:377) ... 13 more {code} The exception is because the serialization and deserialization doesn't match. The serialization by LazyBinarySerDe from previous MapReduce job used different order of columns. When the current MapReduce job deserialized the intermediate sequence file generated by previous MapReduce job, it will get corrupted data from the deserialization using wrong order of columns by LazyBinaryStruct. 
The mismatch in column order between serialization and deserialization is caused by the SelectOperator's column pruning in {{ColumnPrunerSelectProc}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
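A toy illustration of this failure mode, assuming a LazyBinary-style positional, length-prefixed encoding (the layout here is invented, not Hive's actual wire format): when writer and reader disagree on column order, a data value gets reinterpreted as a field length and the reader runs past the end of the buffer, the moral equivalent of the ArrayIndexOutOfBoundsException above.

```python
# Positional serialization is order-sensitive: the reader recovers
# fields purely by position, so a column-order mismatch between the
# producing and consuming MapReduce jobs yields corrupt reads.
import struct

def serialize(i, s):
    """Write an (int, string) row: big-endian int, then length-prefixed bytes."""
    data = s.encode()
    return struct.pack(">i", i) + struct.pack(">i", len(data)) + data

def deserialize_wrong_order(buf):
    """Reader wrongly assumes column 0 is the string: the int column
    is misread as a string length, which may exceed the buffer."""
    (length,) = struct.unpack_from(">i", buf, 0)
    return buf[4:4 + length]  # short/corrupt read instead of "ab"

buf = serialize(1_000_000, "ab")       # 10 bytes total
chunk = deserialize_wrong_order(buf)   # asks for 1,000,000 bytes
assert len(chunk) < 1_000_000          # buffer far too short
```

In Java, the analogous out-of-range copy inside `Text.set` is what surfaces as `ArrayIndexOutOfBoundsException` in the stack trace above.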
Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50896/#review146056 --- What about stopping the use of Super CSV, so that we can keep the 'dsv' format supporting both single and multiple characters? I don't like the use of another 'dsv2' format for multiple ones. It might be confusing for users. - Sergio Pena On Aug. 17, 2016, 2:14 p.m., Marta Kuczora wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50896/ > --- > > (Updated Aug. 17, 2016, 2:14 p.m.) > > > Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu > Zhang. > > > Bugs: HIVE-14404 > https://issues.apache.org/jira/browse/HIVE-14404 > > > Repository: hive-git > > > Description > --- > > Introduced a new outputformat (dsv2) which supports multiple characters as > delimiter. > For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is > used. This library doesn’t support multiple characters as delimiter. Since > the same logic is used for generating csv2, tsv2 and dsv outputformats, I > decided not to change this logic, rather introduce a new outputformat (dsv2) > which supports multiple characters as delimiter. > The new dsv2 outputformat has the same escaping logic as the dsv outputformat > if the quoting is not disabled. > Extended the TestBeeLineWithArgs tests with new test steps which are using > multiple characters as delimiter. > > Main changes in the code: > - Changed the SeparatedValuesOutputFormat class to be an abstract class and > created two new child classes to separate the logic for single-character and > multi-character delimiters: SingleCharSeparatedValuesOutputFormat and > MultiCharSeparatedValuesOutputFormat > > - Kept the methods which are used by both children in the > SeparatedValuesOutputFormat and moved the methods specific to the > single-character case to the SingleCharSeparatedValuesOutputFormat class. 
> > - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only > moved some parts to the child class. > > - Implemented the value escaping and concatenation with the delimiter string > in the MultiCharSeparatedValuesOutputFormat. > > > Diffs > - > > beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 > beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 > > beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java > PRE-CREATION > beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java > 66d9fd0 > > beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java > PRE-CREATION > beeline/src/main/resources/BeeLine.properties 95b8fa1 > > itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java > 892c733 > > Diff: https://reviews.apache.org/r/50896/diff/ > > > Testing > --- > > - Tested manually in BeeLine. > - Extended the TestBeeLineWithArgs tests with new test steps which are using > multiple characters as delimiter. > > > Thanks, > > Marta Kuczora > >
[jira] [Created] (HIVE-14563) StatsOptimizer treats NULL in a wrong way
Pengcheng Xiong created HIVE-14563: -- Summary: StatsOptimizer treats NULL in a wrong way Key: HIVE-14563 URL: https://issues.apache.org/jira/browse/HIVE-14563 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong {code} POSTHOOK: query: explain select count(key) from (select null as key from src)src POSTHOOK: type: QUERY STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: 1 Processor Tree: ListSink PREHOOK: query: select count(key) from (select null as key from src)src PREHOOK: type: QUERY PREHOOK: Input: default@src A masked pattern was here POSTHOOK: query: select count(key) from (select null as key from src)src POSTHOOK: type: QUERY POSTHOOK: Input: default@src A masked pattern was here 500 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
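A toy model of what goes wrong (illustrative Python, not the StatsOptimizer code): count(col) must exclude NULLs, so an answer computed purely from table statistics has to use rowCount - numNulls; answering with rowCount alone reproduces the wrong 500 above.

```python
# StatsOptimizer answers count(col) from table/column statistics
# instead of scanning. Since count(col) ignores NULLs, the correct
# stats-based answer is rowCount - numNulls; using rowCount alone
# is the bug this JIRA describes.
def count_from_stats(row_count, num_nulls, buggy=False):
    return row_count if buggy else row_count - num_nulls

# src has 500 rows, and `select null as key` makes every key NULL
correct = count_from_stats(500, 500)        # should be 0
wrong = count_from_stats(500, 500, True)    # the 500 Hive returned
```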
Review Request 51191: Improve MSCK for partitioned table to deal with special cases
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/51191/ --- Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- HIVE-14511 Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java a164b12 ql/src/test/queries/clientnegative/msck_repair_1.q PRE-CREATION ql/src/test/queries/clientpositive/msck_repair_1.q PRE-CREATION ql/src/test/queries/clientpositive/msck_repair_2.q PRE-CREATION ql/src/test/results/clientnegative/msck_repair_1.q.out PRE-CREATION ql/src/test/results/clientpositive/msck_repair_1.q.out PRE-CREATION ql/src/test/results/clientpositive/msck_repair_2.q.out PRE-CREATION Diff: https://reviews.apache.org/r/51191/diff/ Testing --- Thanks, pengcheng xiong
[jira] [Created] (HIVE-14562) CBO (Calcite Return Path) Wrong results for limit + offset
Ashutosh Chauhan created HIVE-14562: --- Summary: CBO (Calcite Return Path) Wrong results for limit + offset Key: HIVE-14562 URL: https://issues.apache.org/jira/browse/HIVE-14562 Project: Hive Issue Type: Sub-task Components: Logical Optimizer Affects Versions: 2.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan offset is missed altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] Making storage-api a separately released artifact
On Wed, Aug 17, 2016 at 10:46 AM, Alan Gates wrote: > +1 for making the API clean and easy for other projects to work with. A > few questions: > > 1) Would this also make it easier for Parquet and others to implement > Hive’s ACID interfaces? > Currently the ACID interfaces haven't been moved over to storage-api, although it would make sense to do so at some point. > > 2) Would we make any attempt to coordinate version numbers between Hive > and the storage module, or would a given version of Hive just depend on a > given version of the storage module? > The two options that I see are: * Let the numbers run separately starting from 2.2.0. * Tie the numbers together with an additional level of versioning (eg. 2.2.0.0). I think that letting the two version numbers diverge is better in the long term. For example, if you need to make an incompatible change, it is pretty ugly to do it as a fourth level version number (eg. an incompatible change from 2.2.0.0 to 2.2.0.1). At the beginning, I expect that storage-api would move faster than Hive, but as it stabilizes I expect it might start moving slower than Hive. I'd propose that we have Hive's build use a released version of storage-api rather than a snapshot. Thoughts? Owen > Alan. > > > On Aug 15, 2016, at 17:01, Owen O'Malley wrote: > > > > All, > > > > As part of moving ORC out of Hive, we pulled all of the vectorization > > storage and sarg classes into a separate module, which is named > > storage-api. Although it is currently only used by ORC, it could be used > > by Parquet or Avro if they wanted to make a fast vectorized reader that > > read directly in to Hive's VectorizedRowBatch without needing a shim or > > data copy. Note that this is in many ways similar to pulling the Arrow > > project out of Drill. > > > > This unfortunately still leaves us with a circular dependency between > Hive > > and ORC. I'd hoped that storage-api wouldn't change that much, but that > > doesn't seem to be happening. 
As a result, ORC ends up shipping its own > > fork of storage-api. > > > > Although we could make a new project for just the storage-api, I think it > > would be better to make it a subproject of Hive that is released > > independently. > > > > What do others think? > > > > Owen > >
[jira] [Created] (HIVE-14561) Upgrade version of spring for ptest2 to work with Java8
Siddharth Seth created HIVE-14561: - Summary: Upgrade version of spring for ptest2 to work with Java8 Key: HIVE-14561 URL: https://issues.apache.org/jira/browse/HIVE-14561 Project: Hive Issue Type: Task Reporter: Siddharth Seth 3.2.1 does not work with Java8. We could switch to 4.3.2.RELEASE or 3.2.16 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14560) Support exchange partition between s3 and hdfs tables
Abdullah Yousufi created HIVE-14560: --- Summary: Support exchange partition between s3 and hdfs tables Key: HIVE-14560 URL: https://issues.apache.org/jira/browse/HIVE-14560 Project: Hive Issue Type: Bug Reporter: Abdullah Yousufi Assignee: Abdullah Yousufi Fix For: 2.2.0 {code} alter table s3_tbl exchange partition (country='USA', state='CA') with table hdfs_tbl; {code} results in: {code} Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: java.lang.IllegalArgumentException Wrong FS: s3a://hive-on-s3/s3_tbl/country=USA/state=CA, expected: hdfs://localhost:9000) (state=08S01,code=1) {code} because the check for whether the s3 destination table path exists occurs on the hdfs filesystem. Furthermore, exchanging from s3 to hdfs fails because the hdfs rename operation is not supported across filesystems. The fix uses copy + deletion in the case that the filesystems differ. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
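The described fix, rename when possible and copy + delete otherwise, is the same pattern standard tools use for cross-device moves. A minimal local-filesystem sketch in Python (the Hive patch itself operates on Hadoop FileSystem objects, not these os/shutil calls):

```python
# Sketch of the rename-with-fallback pattern: a rename only works
# within a single filesystem, so exchanging a partition between
# hdfs:// and s3a:// locations must fall back to copy + delete.
# Local temp directories stand in for the two filesystems here.
import os
import shutil
import tempfile

def move_partition(src, dst):
    """Move a partition directory, copying then deleting the source
    when a plain rename is impossible (e.g. across filesystems)."""
    try:
        os.rename(src, dst)            # fast path: same filesystem
    except OSError:
        shutil.copytree(src, dst)      # cross-filesystem: copy...
        shutil.rmtree(src)             # ...then delete the source

# demo: move a partition directory containing one data file
base = tempfile.mkdtemp()
src = os.path.join(base, "country=USA")
os.makedirs(src)
with open(os.path.join(src, "part-00000"), "w") as f:
    f.write("row")
dst = os.path.join(base, "exchanged")
move_partition(src, dst)
```

This mirrors what `shutil.move` itself does for cross-device moves: try the cheap rename first, then degrade to copy-then-delete.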
Re: [DISCUSS] Making storage-api a separately released artifact
+1 for having a separate storage-api project to define common interfaces for people to develop against. It'll make things much easier to develop against generically. I'm okay(+0) with the sub-project idea as opposed to enthusiastic about it, mostly because I have reservations that it'll encourage laziness and will in practice wind up being tied to hive releases and dev and over time assumptions of how hive works and what is available will bleed in. But, still, having a motion of separation will definitely help. On Aug 17, 2016 11:39, "Prasanth Jayachandran" < pjayachand...@hortonworks.com> wrote: > +1 for making it a subproject with separate (preferably shorter) release > cycle. The module in itself is too small for a separate project. Also > having a faster release cycle will resolve circular dependency and will > help other projects make use of vectorization, sarg, bloom filter etc. > > For version management, how about adding another version after patch > version i.e sub-project version? > Example: 2.2.0.[0] will be storage api’s release version. Hive will always > depend on 2.2.0-SNAPSHOT. I think maven will let us release modules with > different versions. https://dev.c-ware.de/confluence/display/PUBLIC/ > Releasing+modules+of+a+multi-module+project+with+ > independent+version+numbers > > Thanks > Prasanth > > > On Aug 17, 2016, at 10:46 AM, Alan Gates wrote: > > > > +1 for making the API clean and easy for other projects to work with. A > few questions: > > > > 1) Would this also make it easier for Parquet and others to implement > Hive’s ACID interfaces? > > > > 2) Would we make any attempt to coordinate version numbers between Hive > and the storage module, or would a given version of Hive just depend on a > given version of the storage module? > > > > Alan. 
> > > >> On Aug 15, 2016, at 17:01, Owen O'Malley wrote: > >> > >> All, > >> > >> As part of moving ORC out of Hive, we pulled all of the vectorization > >> storage and sarg classes into a separate module, which is named > >> storage-api. Although it is currently only used by ORC, it could be > used > >> by Parquet or Avro if they wanted to make a fast vectorized reader that > >> read directly in to Hive's VectorizedRowBatch without needing a shim or > >> data copy. Note that this is in many ways similar to pulling the Arrow > >> project out of Drill. > >> > >> This unfortunately still leaves us with a circular dependency between > Hive > >> and ORC. I'd hoped that storage-api wouldn't change that much, but that > >> doesn't seem to be happening. As a result, ORC ends up shipping its own > >> fork of storage-api. > >> > >> Although we could make a new project for just the storage-api, I think > it > >> would be better to make it a subproject of Hive that is released > >> independently. > >> > >> What do others think? > >> > >> Owen > > > > > >
[jira] [Created] (HIVE-14559) Remove setting hive.execution.engine in qfiles
Prasanth Jayachandran created HIVE-14559: Summary: Remove setting hive.execution.engine in qfiles Key: HIVE-14559 URL: https://issues.apache.org/jira/browse/HIVE-14559 Project: Hive Issue Type: Sub-task Components: Test Affects Versions: 2.2.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Some qfiles are explicitly setting the execution engine. If we run those tests on different Mini CliDrivers, it could be very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] Making storage-api a separately released artifact
+1 for making it a subproject with separate (preferably shorter) release cycle. The module in itself is too small for a separate project. Also having a faster release cycle will resolve circular dependency and will help other projects make use of vectorization, sarg, bloom filter etc. For version management, how about adding another version after patch version i.e sub-project version? Example: 2.2.0.[0] will be storage api’s release version. Hive will always depend on 2.2.0-SNAPSHOT. I think maven will let us release modules with different versions. https://dev.c-ware.de/confluence/display/PUBLIC/Releasing+modules+of+a+multi-module+project+with+independent+version+numbers Thanks Prasanth > On Aug 17, 2016, at 10:46 AM, Alan Gates wrote: > > +1 for making the API clean and easy for other projects to work with. A few > questions: > > 1) Would this also make it easier for Parquet and others to implement Hive’s > ACID interfaces? > > 2) Would we make any attempt to coordinate version numbers between Hive and > the storage module, or would a given version of Hive just depend on a given > version of the storage module? > > Alan. > >> On Aug 15, 2016, at 17:01, Owen O'Malley wrote: >> >> All, >> >> As part of moving ORC out of Hive, we pulled all of the vectorization >> storage and sarg classes into a separate module, which is named >> storage-api. Although it is currently only used by ORC, it could be used >> by Parquet or Avro if they wanted to make a fast vectorized reader that >> read directly in to Hive's VectorizedRowBatch without needing a shim or >> data copy. Note that this is in many ways similar to pulling the Arrow >> project out of Drill. >> >> This unfortunately still leaves us with a circular dependency between Hive >> and ORC. I'd hoped that storage-api wouldn't change that much, but that >> doesn't seem to be happening. As a result, ORC ends up shipping its own >> fork of storage-api. 
>> >> Although we could make a new project for just the storage-api, I think it >> would be better to make it a subproject of Hive that is released >> independently. >> >> What do others think? >> >> Owen > >
[jira] [Created] (HIVE-14558) Add support for listing views similar to "show tables"
Naveen Gangam created HIVE-14558: Summary: Add support for listing views similar to "show tables" Key: HIVE-14558 URL: https://issues.apache.org/jira/browse/HIVE-14558 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.0.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Users have been asking for such a feature, where they can get a list of views separately. So perhaps a syntax similar to the "show tables" command? show views [in/from ] [] Does it make sense to add such a command? Or is it not worth the effort? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [DISCUSS] Making storage-api a separately released artifact
+1 for making the API clean and easy for other projects to work with. A few questions: 1) Would this also make it easier for Parquet and others to implement Hive’s ACID interfaces? 2) Would we make any attempt to coordinate version numbers between Hive and the storage module, or would a given version of Hive just depend on a given version of the storage module? Alan. > On Aug 15, 2016, at 17:01, Owen O'Malley wrote: > > All, > > As part of moving ORC out of Hive, we pulled all of the vectorization > storage and sarg classes into a separate module, which is named > storage-api. Although it is currently only used by ORC, it could be used > by Parquet or Avro if they wanted to make a fast vectorized reader that > read directly in to Hive's VectorizedRowBatch without needing a shim or > data copy. Note that this is in many ways similar to pulling the Arrow > project out of Drill. > > This unfortunately still leaves us with a circular dependency between Hive > and ORC. I'd hoped that storage-api wouldn't change that much, but that > doesn't seem to be happening. As a result, ORC ends up shipping its own > fork of storage-api. > > Although we could make a new project for just the storage-api, I think it > would be better to make it a subproject of Hive that is released > independently. > > What do others think? > > Owen
Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50896/#review145993 --- Ship it! Hi Marta, Thanks. LGTM (non binding) Peter - Peter Vary On Aug. 17, 2016, 2:14 p.m., Marta Kuczora wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/50896/ > --- > > (Updated Aug. 17, 2016, 2:14 p.m.) > > > Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu > Zhang. > > > Bugs: HIVE-14404 > https://issues.apache.org/jira/browse/HIVE-14404 > > > Repository: hive-git > > > Description > --- > > Introduced a new outputformat (dsv2) which supports multiple characters as > delimiter. > For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is > used. This library doesn’t support multiple characters as delimiter. Since > the same logic is used for generating csv2, tsv2 and dsv outputformats, I > decided not to change this logic, rather introduce a new outputformat (dsv2) > which supports multiple characters as delimiter. > The new dsv2 outputformat has the same escaping logic as the dsv outputformat > if the quoting is not disabled. > Extended the TestBeeLineWithArgs tests with new test steps which are using > multiple characters as delimiter. > > Main changes in the code: > - Changed the SeparatedValuesOutputFormat class to be an abstract class and > created two new child classes to separate the logic for single-character and > multi-character delimiters: SingleCharSeparatedValuesOutputFormat and > MultiCharSeparatedValuesOutputFormat > > - Kept the methods which are used by both children in the > SeparatedValuesOutputFormat and moved the methods specific to the > single-character case to the SingleCharSeparatedValuesOutputFormat class. > > - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only > moved some parts to the child class. 
> > - Implemented the value escaping and concatenation with the delimiter string > in the MultiCharSeparatedValuesOutputFormat. > > > Diffs > - > > beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 > beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 > > beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java > PRE-CREATION > beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java > 66d9fd0 > > beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java > PRE-CREATION > beeline/src/main/resources/BeeLine.properties 95b8fa1 > > itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java > 892c733 > > Diff: https://reviews.apache.org/r/50896/diff/ > > > Testing > --- > > - Tested manually in BeeLine. > - Extended the TestBeeLineWithArgs tests with new test steps which are using > multiple characters as delimiter. > > > Thanks, > > Marta Kuczora > >
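The escaping-plus-concatenation approach described for MultiCharSeparatedValuesOutputFormat can be sketched roughly as follows. This is an illustrative stand-alone example, not the actual patch: the class name, method signature, and the quote-doubling rule are assumptions.

```java
import java.util.StringJoiner;

// Hypothetical sketch (not the Hive patch): a value containing the
// delimiter string or the quote character is wrapped in quotes, with
// embedded quotes doubled; the escaped values are then joined with the
// multi-character delimiter string.
public class MultiCharDelimiterSketch {
    public static String join(String[] values, String delim, char quote) {
        StringJoiner joiner = new StringJoiner(delim);
        for (String v : values) {
            if (v.contains(delim) || v.indexOf(quote) >= 0) {
                // Double embedded quote characters, then quote the whole value.
                v = quote + v.replace(String.valueOf(quote), "" + quote + quote) + quote;
            }
            joiner.add(v);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // prints a||"b||c"||d
        System.out.println(join(new String[]{"a", "b||c", "d"}, "||", '"'));
    }
}
```

With the delimiter "||", only the middle value needs quoting, so the row stays unambiguous for a reader splitting on the delimiter.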
Re: Review Request 50896: HIVE-14404: Allow delimiterfordsv to use multiple-character delimiters
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50896/ --- (Updated Aug. 17, 2016, 2:14 p.m.) Review request for hive, Naveen Gangam, Sergio Pena, Szehon Ho, and Xuefu Zhang. Changes --- Patch is fixed according to the review: - Display an error message if multiple character delimiter is set with dsv output format. In this case it will fall back to the default dsv delimiter. - Introduced new constant for default dsv2 delimiter to avoid the String<->char conversions. Bugs: HIVE-14404 https://issues.apache.org/jira/browse/HIVE-14404 Repository: hive-git Description --- Introduced a new outputformat (dsv2) which supports multiple characters as delimiter. For generating the dsv, csv2 and tsv2 outputformats, the Super CSV library is used. This library doesn’t support multiple characters as delimiter. Since the same logic is used for generating csv2, tsv2 and dsv outputformats, I decided not to change this logic, rather introduce a new outputformat (dsv2) which supports multiple characters as delimiter. The new dsv2 outputformat has the same escaping logic as the dsv outputformat if the quoting is not disabled. Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter. Main changes in the code: - Changed the SeparatedValuesOutputFormat class to be an abstract class and created two new child classes to separate the logic for single-character and multi-character delimiters: SingleCharSeparatedValuesOutputFormat and MultiCharSeparatedValuesOutputFormat - Kept the methods which are used by both children in the SeparatedValuesOutputFormat and moved the methods specific to the single-character case to the SingleCharSeparatedValuesOutputFormat class. - Didn’t change the logic which was in the SeparatedValuesOutputFormat, only moved some parts to the child class. - Implemented the value escaping and concatenation with the delimiter string in the MultiCharSeparatedValuesOutputFormat. 
Diffs (updated) - beeline/src/java/org/apache/hive/beeline/BeeLine.java e0fa032 beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java e6e24b1 beeline/src/java/org/apache/hive/beeline/MultiCharSeparatedValuesOutputFormat.java PRE-CREATION beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 66d9fd0 beeline/src/java/org/apache/hive/beeline/SingleCharSeparatedValuesOutputFormat.java PRE-CREATION beeline/src/main/resources/BeeLine.properties 95b8fa1 itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 892c733 Diff: https://reviews.apache.org/r/50896/diff/ Testing --- - Tested manually in BeeLine. - Extended the TestBeeLineWithArgs tests with new test steps which are using multiple characters as delimiter. Thanks, Marta Kuczora
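The fallback behavior described in the review update (error message plus fallback to the default dsv delimiter when a multi-character delimiter is configured for dsv) could look roughly like this. All names here are illustrative assumptions, not the actual BeeLine code.

```java
// Hypothetical sketch of the validation/fallback described in the review
// update (class, method, and constant names are invented): dsv accepts
// only a single-character delimiter, so longer values trigger a warning
// and fall back to the default.
public class DsvDelimiterCheck {
    static final char DEFAULT_DSV_DELIMITER = '|';

    public static char resolveDelimiter(String configured) {
        if (configured == null || configured.isEmpty()) {
            return DEFAULT_DSV_DELIMITER;
        }
        if (configured.length() > 1) {
            // dsv cannot handle multi-character delimiters; dsv2 can.
            System.err.println("Warning: dsv supports only single-character"
                + " delimiters; falling back to '" + DEFAULT_DSV_DELIMITER + "'");
            return DEFAULT_DSV_DELIMITER;
        }
        return configured.charAt(0);
    }

    public static void main(String[] args) {
        // prints | (after a warning on stderr)
        System.out.println(resolveDelimiter("||"));
    }
}
```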
[jira] [Created] (HIVE-14557) Nullpointer When both SkewJoin and Mapjoin Enabled
Nemon Lou created HIVE-14557:
Summary: Nullpointer When both SkewJoin and Mapjoin Enabled
Key: HIVE-14557
URL: https://issues.apache.org/jira/browse/HIVE-14557
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 2.1.0, 1.1.0
Reporter: Nemon Lou

The following SQL fails with return code 2 on MR.
{noformat}
create table a(id int,id1 int);
create table b(id int,id1 int);
create table c(id int,id1 int);
set hive.optimize.skewjoin=true;
select a.id,b.id,c.id1 from a,b,c where a.id=b.id and a.id1=c.id1;
{noformat}
Error log as follows:
{noformat}
2016-08-17 21:13:42,081 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Id =0 Id =21 Id =28 Id =16 <\Children> Id = 28 null<\Parent> <\FS> <\Children> Id = 21 nullId = 33 Id =33 null <\Children> <\Parent> <\HASHTABLEDUMMY><\Parent> <\MAPJOIN> <\Children> Id = 0 null<\Parent> <\TS> <\Children> <\Parent> <\MAP>
2016-08-17 21:13:42,084 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing operator TS[21]
2016-08-17 21:13:42,084 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Initializing dummy operator
2016-08-17 21:13:42,086 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0, RECORDS_IN:0,
2016-08-17 21:13:42,087 ERROR [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: Hit error while closing operators - failing tree
2016-08-17 21:13:42,088 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:207)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:682)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:696)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:189)
... 8 more
{noformat}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
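The NullPointerException in MapJoinOperator.closeOp suggests that close-time code dereferences state which the skew-join code path never initialized. A minimal illustration of that general failure pattern and the corresponding null guard (hypothetical code, not Hive source):

```java
// Hypothetical illustration (not the actual Hive code): an operator's
// close hook reads a member that is only set when the hash table was
// actually loaded; when the skew-join plan skips that step, the member
// stays null and closing throws a NullPointerException.
public class CloseOpSketch {
    private int[] hashTableStats;  // may remain null if loading was skipped

    public int closeUnsafe() {
        return hashTableStats.length;  // throws NullPointerException when null
    }

    public int closeGuarded() {
        // Defensive variant: treat "never initialized" as empty.
        return hashTableStats == null ? 0 : hashTableStats.length;
    }

    public static void main(String[] args) {
        CloseOpSketch op = new CloseOpSketch();
        // prints 0
        System.out.println(op.closeGuarded());
    }
}
```

Whether the real fix should guard in closeOp or ensure initialization on the skew-join path is a design question for the actual patch; this only shows the shape of the crash.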
Re: Sometimes a realm is required for some users in a Beeline connection, even though LDAP authentication is configured
Hi,

While connecting with Beeline against a cluster configured for LDAP authentication, some users can connect without a realm while others must include one. This happens because, when the users were created in Active Directory, the display name differs from the logon name for some of them. Authentication should validate against the logon name only, but for some users it appears to validate against the display name instead.

Please refer to the attached images and share any ideas if you have run into this.

Thanks,
Matheswaran.S

On Tue, Aug 16, 2016 at 10:32 AM, mathes waran wrote:
> Hi,
>
> Problem: Why is a realm required for some users in a Beeline connection, even though LDAP authentication is configured?
>
> I have configured the Beeline connection with LDAP authentication and it works fine. Some users can connect without any realm, which is expected, but with the same connection other users must supply the fully qualified domain name. LDAP authentication should not require a domain for users, yet here a domain name is demanded for some of them.
>
> Please find the error details and attached screenshot:
> Error: Could not open client transport with JDBC Uri: jdbc:hive2://mylapn2215:1/default;: Peer indicated failure: PLAIN auth failed: LDAP Authentication failed for user (state=08S01,code=0)
>
> Please suggest your ideas if you have overcome this.
>
> Thanks in advance,
>
> Matheskrishna
Re: YourKit open source license
Hi Rui,

This is what I get from sa...@yourkit.com: "We provide free licenses only to project committers. If you have no commit power, you can ask Apache people to request license for you; or use evaluation license."

Is it possible any committer could help? It seems I can only use an evaluation license for now. Thanks for your help anyway.

Calvin

On Wed, 17 Aug 2016 10:03:37 +0800, Rui Li wrote:
Our wiki doesn't mention it's only for committers. Anyway, I suggest you contact YourKit sales to figure it out.

On Tue, Aug 16, 2016 at 8:38 PM, calvin hung wrote:
> > Thanks for your response, Rui.
> > I don't have an apache email account.
> > It looks like only committers can get an email account, according to this page: http://www.apache.org/dev/committers.html
> > Does it mean that only Hive committers can get YourKit free licenses for Hive performance profiling?
> >
> > On Tue, 16 Aug 2016 13:33:34 +0800 Rui Li <lirui.fu...@gmail.com> wrote:
> > > If I remember correctly, I just contacted the sales of YourKit and they sent me the license by email. You'd better send your email using your apache email account, in order to convince them you're a developer of Hive.
> > >
> > > On Tue, Aug 16, 2016 at 2:51 AM, calvin hung <calvinh...@wasaitech.com> wrote:
> > > Hi Rui and Alan,
> > >
> > > Could you or any nice guy share more detailed steps for getting a YourKit license for Hive?
> > > I've searched the full Hive dev mail archive but found no exact steps to get one.
> > > Thanks!
> > >
> > > Calvin
> > > From: "Li, Rui" <rui...@intel.com>
> > > Date: Tue, 31 Mar 2015 01:22:51 +
> > > To: "dev@hive.apache.org" <dev@hive.apache.org>
> > >
> > > - Contents -
> > >
> > > Thanks Alan! But I don't see Hive in the sponsored open source project list. I'll contact them anyway.
> > > Cheers,
> > > Rui Li
> > >
> > > From: Alan Gates [mailto:alanfga...@gmail.com]
> > > Sent: Tuesday, March 31, 2015 1:02 AM
> > > To: dev@hive.apache.org
> > > Subject: Re: YourKit open source license
> > >
> > > See https://www.yourkit.com/customers/.
> > >
> > > Alan.
> > >
> > > Li, Rui
> > > March 30, 2015 at 0:54
> > >
> > > Hi guys,
> > >
> > > I want to use YourKit to profile Hive performance. According to the wiki (https://cwiki.apache.org/confluence/display/Hive/Performance), Hive has been granted an open source license. Could anybody tell me how I can get the license? Thanks!
> > >
> > > Cheers,
> > > Rui Li
> >
> > --
> > Best regards!
> > Rui Li
> > Cell: (+86) 13564950210

--
Best regards!
Rui Li
Cell: (+86) 13564950210
[jira] [Created] (HIVE-14556) Load data into text table fail caused by IndexOutOfBoundsException
Niklaus Xiao created HIVE-14556:
Summary: Load data into text table fail caused by IndexOutOfBoundsException
Key: HIVE-14556
URL: https://issues.apache.org/jira/browse/HIVE-14556
Project: Hive
Issue Type: Bug
Components: File Formats
Affects Versions: 1.3.0, 2.2.0
Reporter: Niklaus Xiao
Assignee: Niklaus Xiao

{code}
echo "1" > foo.txt
0: jdbc:hive2://189.39.151.74:21066/> create table foo(id int) stored as textfile;
No rows affected (1.846 seconds)
0: jdbc:hive2://189.39.151.74:21066/> load data local inpath '/foo.txt' into table foo;
Error: Error while compiling statement: FAILED: SemanticException Unable to load data to destination table. Error: java.lang.IndexOutOfBoundsException (state=42000,code=4)
{code}

Exception:
{code}
2016-08-17 17:15:36,301 | ERROR | HiveServer2-Handler-Pool: Thread-55 | FAILED: SemanticException Unable to load data to destination table. Error: java.lang.IndexOutOfBoundsException
org.apache.hadoop.hive.ql.parse.SemanticException: Unable to load data to destination table. Error: java.lang.IndexOutOfBoundsException
at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.ensureFileFormatsMatch(LoadSemanticAnalyzer.java:356)
at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:236)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:473)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:325)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1358)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1340
{code}
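The stack trace points at ensureFileFormatsMatch, which presumably inspects the leading bytes of the file being loaded. An IndexOutOfBoundsException on a two-byte file such as "1\n" is the classic symptom of copying a fixed-length magic prefix without first checking the file length. The following is a hypothetical illustration of that bug class; the magic bytes and method names are invented, not Hive's actual check.

```java
import java.util.Arrays;

// Hypothetical illustration (not Hive source) of how a fixed-length
// header check can throw IndexOutOfBoundsException on a file shorter
// than the magic prefix, and how a length check avoids it.
public class HeaderCheckSketch {
    static final byte[] MAGIC = {'S', 'E', 'Q'};  // illustrative magic bytes

    // Buggy version: copies MAGIC.length bytes unconditionally, so a
    // shorter file makes System.arraycopy throw IndexOutOfBoundsException.
    public static boolean unsafeCheck(byte[] file) {
        byte[] header = new byte[MAGIC.length];
        System.arraycopy(file, 0, header, 0, MAGIC.length);
        return Arrays.equals(header, MAGIC);
    }

    // Fixed version: a file shorter than the magic cannot match it.
    public static boolean safeCheck(byte[] file) {
        if (file.length < MAGIC.length) {
            return false;
        }
        return unsafeCheck(file);
    }

    public static void main(String[] args) {
        byte[] tiny = {'1', '\n'};  // like the reporter's foo.txt
        // prints false
        System.out.println(safeCheck(tiny));
    }
}
```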