[jira] [Updated] (HIVE-6310) Fix a few minimr test failures

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6310:
---

Fix Version/s: 0.13.0

> Fix a few minimr test failures
> --
>
> Key: HIVE-6310
> URL: https://issues.apache.org/jira/browse/HIVE-6310
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.13.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-6310.1.patch, HIVE-6310.2.patch, HIVE-6310.patch
>
>
> These test cases are:
> {code}
> ql/src/test/queries/clientpositive/import_exported_table.q
> ql/src/test/queries/clientpositive/load_hdfs_file_with_space_in_the_name.q
> ql/src/test/queries/clientpositive/root_dir_external_table.q
> {code}
> They are failing because of an existing hdfs:///tmp/test directory, possibly 
> left over by other tests.
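> A minimal cleanup sketch, assuming the leftover directory is the one named 
> above (hypothetical test-harness code, not the actual patch):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class ScratchDirCleanup {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     Path leftover = new Path("/tmp/test");
>     if (fs.exists(leftover)) {
>       fs.delete(leftover, true); // recursive delete of the stale directory
>     }
>   }
> }
> {code}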



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6360) Hadoop 2.3 + Tez 0.3

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6360:
---

Fix Version/s: 0.13.0

> Hadoop 2.3 + Tez 0.3
> 
>
> Key: HIVE-6360
> URL: https://issues.apache.org/jira/browse/HIVE-6360
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.13.0
>
> Attachments: HIVE-6360.1.patch, HIVE-6360.2.patch
>
>
> There are some things pending that rely on hadoop 2.3 or tez 0.3. These are 
> not released yet, but will be soon. I'm proposing to collect these in the tez 
> branch and do a merge back once these components have been released at that 
> version.
> The things depending on 0.3 or hadoop 2.3 are:
> - Zero Copy read for ORC
> - Unions in Tez
> - Tez on secure clusters
> - Changes to DagUtils to reflect tez 0.2 -> 0.3
> - Prewarm containers



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6323) Fix unit test file_with_header_footer_negative.q in TestNegativeMinimrCliDriver

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6323:
---

Fix Version/s: 0.13.0

> Fix unit test file_with_header_footer_negative.q in 
> TestNegativeMinimrCliDriver
> ---
>
> Key: HIVE-6323
> URL: https://issues.apache.org/jira/browse/HIVE-6323
> Project: Hive
>  Issue Type: Bug
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
> Fix For: 0.13.0
>
> Attachments: HIVE-6323.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6354) Some index test golden files produce non-deterministic stats in explain

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6354:
---

Fix Version/s: 0.13.0

> Some index test golden files produce non-deterministic stats in explain
> ---
>
> Key: HIVE-6354
> URL: https://issues.apache.org/jira/browse/HIVE-6354
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.13.0
>
> Attachments: HIVE-6354.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6333) Generate vectorized plan for decimal expressions.

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6333:
---

Fix Version/s: 0.13.0

> Generate vectorized plan for decimal expressions.
> -
>
> Key: HIVE-6333
> URL: https://issues.apache.org/jira/browse/HIVE-6333
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Fix For: 0.13.0
>
> Attachments: HIVE-6333.1.patch, HIVE-6333.2.patch, HIVE-6333.3.patch, 
> HIVE-6333.4.patch, HIVE-6333.5.patch, HIVE-6333.6.patch
>
>
> Transform the non-vectorized plan into a vectorized plan for supported decimal 
> expressions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6369) ORC Writer (int RLE v2) fails with ArrayIndexOutOfBounds

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6369:
---

Fix Version/s: 0.13.0

> ORC Writer (int RLE v2) fails with ArrayIndexOutOfBounds
> 
>
> Key: HIVE-6369
> URL: https://issues.apache.org/jira/browse/HIVE-6369
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.13.0
> Environment: hadoop-2.2 + hive-0.13-trunk (f1807ede)
>Reporter: Gopal V
>Assignee: Prasanth J
>  Labels: orcfile
> Fix For: 0.13.0
>
>
> The ORC writer for the store_sales TPC-DS table fails with:
> {code}
> 2014-01-30 09:23:07,819 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.ArrayIndexOutOfBoundsException: 2
>   at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.preparePatchedBlob(RunLengthIntegerWriterV2.java:593)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.determineEncoding(RunLengthIntegerWriterV2.java:541)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.flush(RunLengthIntegerWriterV2.java:682)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.writeStripe(WriterImpl.java:752)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1330)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1699)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:1868)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6372) getDatabaseMajor/Minor version returns wrong values

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6372:
---

Fix Version/s: 0.13.0

> getDatabaseMajor/Minor version returns wrong values
> ---
>
> Key: HIVE-6372
> URL: https://issues.apache.org/jira/browse/HIVE-6372
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Fix For: 0.13.0
>
> Attachments: HIVE-6372.patch
>
>
> Currently getDatabaseMajorVersion returns 13 and getDatabaseMinorVersion 
> returns 0; the index into the version string is off by one (the correct values 
> are 0 and 13).
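> A hedged illustration of the off-by-one (a hypothetical helper, not the Hive 
> source): for a version string like "0.13.0", major is token 0 and minor is 
> token 1; reading tokens 1 and 2 instead yields the reported 13/0.
> {code:java}
> public class VersionTokens {
>   static int token(String version, int index) {
>     return Integer.parseInt(version.split("\\.")[index]);
>   }
>
>   public static void main(String[] args) {
>     String v = "0.13.0";
>     System.out.println(token(v, 0) + "." + token(v, 1)); // 0.13 (correct)
>     System.out.println(token(v, 1) + "." + token(v, 2)); // 13.0 (the bug)
>   }
> }
> {code}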



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6396) Implement vectorized unary minus for decimal

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6396:
---

Fix Version/s: 0.13.0

> Implement vectorized unary minus for decimal
> 
>
> Key: HIVE-6396
> URL: https://issues.apache.org/jira/browse/HIVE-6396
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Fix For: 0.13.0
>
>
> Implement vectorized unary minus for decimal.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6420) upgrade script for Hive 13 is missing for Derby

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6420:
---

Fix Version/s: 0.13.0

> upgrade script for Hive 13 is missing for Derby
> ---
>
> Key: HIVE-6420
> URL: https://issues.apache.org/jira/browse/HIVE-6420
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: HIVE-6420.patch
>
>
> There's an upgrade script for all DSes but not for Derby. Nothing needs to be 
> done in that script but I'm being told that some tools might break if there's 
> no matching file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6475) Implement support for appending to mutable tables in HCatalog

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6475:
---

Fix Version/s: 0.13.0

> Implement support for appending to mutable tables in HCatalog
> -
>
> Key: HIVE-6475
> URL: https://issues.apache.org/jira/browse/HIVE-6475
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog, Metastore, Query Processor, Thrift API
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Fix For: 0.13.0
>
> Attachments: 6475.log, 6475.log.hadoop2, HIVE-6475.2.patch, 
> HIVE-6475.patch
>
>
> Part of HIVE-6405, this is the implementation of the append feature on the 
> HCatalog side. If a table is mutable, we must support appending to existing 
> data instead of erroring out as a duplicate publish.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6496) Queries fail to Vectorize.

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6496:
---

Fix Version/s: 0.13.0

> Queries fail to Vectorize.
> --
>
> Key: HIVE-6496
> URL: https://issues.apache.org/jira/browse/HIVE-6496
> Project: Hive
>  Issue Type: Sub-task
>  Components: Vectorization
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Fix For: 0.13.0
>
> Attachments: HIVE-6496.1.patch, HIVE-6496.2.patch, HIVE-6496.3.patch, 
> HIVE-6496.4.patch
>
>
> The following issues are causing many queries to fail to vectorize:
> 1) NPE because the row resolver is null.
> 2) VectorUDFAdapter doesn't handle decimal.
> 3) Decimal casts to boolean, timestamp, and string fail because the classes 
> are not annotated appropriately.
> 4) Decimal modulo fails to vectorize because GenericUDFOPMod is not annotated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6491) ClassCastException in AbstractParquetMapInspector

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6491:
---

Fix Version/s: 0.13.0

> ClassCastException in AbstractParquetMapInspector
> -
>
> Key: HIVE-6491
> URL: https://issues.apache.org/jira/browse/HIVE-6491
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
> Environment: cdh5-beta2, trunk
>Reporter: Andrey Stepachev
> Fix For: 0.13.0
>
>
> AbstractParquetMapInspector uses the wrong class in a cast:
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java#L144
> The cast should be to AbstractParquetMapInspector, but the code currently reads:
> {code:java}
> final StandardParquetHiveMapInspector other = 
> (StandardParquetHiveMapInspector) obj;
> {code}
> This conversion leads to a ClassCastException when a 
> DeepParquetHiveMapInspector is passed in:
> {code}
> Caused by: java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.io.parquet.serde.DeepParquetHiveMapInspector cannot 
> be cast to 
> org.apache.hadoop.hive.ql.io.parquet.serde.StandardParquetHiveMapInspector
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.AbstractParquetMapInspector.equals(AbstractParquetMapInspector.java:131)
> at java.util.AbstractList.equals(AbstractList.java:523)
> at java.util.AbstractList.equals(AbstractList.java:523)
> at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:996)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:281)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.getStandardStructObjectInspector(ObjectInspectorFactory.java:268)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1022)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:453)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:409)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:188)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
> at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:80)
> ... 31 more
> {code}
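> A hedged sketch of the described fix (a method-body sketch for the existing 
> class; the field names keyInspector and valueInspector are assumed, and this 
> is not the committed patch). Casting to the abstract base type keeps equals() 
> working when a DeepParquetHiveMapInspector is passed in:
> {code:java}
> @Override
> public boolean equals(final Object obj) {
>   if (!(obj instanceof AbstractParquetMapInspector)) {
>     return false;
>   }
>   // cast to the abstract type instead of the Standard subclass
>   final AbstractParquetMapInspector other = (AbstractParquetMapInspector) obj;
>   return other.keyInspector.equals(keyInspector)
>       && other.valueInspector.equals(valueInspector);
> }
> {code}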



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6477) Aggregation functions for tiny/smallint broken with parquet

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6477:
---

Fix Version/s: 0.13.0

> Aggregation functions for tiny/smallint broken with parquet
> ---
>
> Key: HIVE-6477
> URL: https://issues.apache.org/jira/browse/HIVE-6477
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>  Labels: parquet
> Fix For: 0.13.0
>
> Attachments: HIVE-6477.2.patch, HIVE-6477.patch
>
>
> Given the following table:
> {noformat}
> CREATE TABLE IF NOT EXISTS commontypesagg (
> id int,
> bool_col boolean,
> tinyint_col tinyint,
> smallint_col smallint,
> int_col int,
> bigint_col bigint,
> float_col float,
> double_col double,
> date_string_col string,
> string_col string)
> PARTITIONED BY (year int, month int, day int)
> STORED AS PARQUET;
> {noformat}
> The following queries throw a ClassCastException:
> {noformat}
> select count(tinyint_col), min(tinyint_col), max(tinyint_col), 
> sum(tinyint_col) from commontypesagg;
> select count(smallint_col), min(smallint_col), max(smallint_col), 
> sum(smallint_col) from commontypesagg;
> {noformat}
> The exception is the following:
> {noformat}
> 2014-01-29 14:02:11,381 INFO org.apache.hadoop.mapred.TaskStatus: 
> task-diagnostic-info for task attempt_201401290934_0006_m_01_1 : 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row 
> {"id":null,"bool_col":null,"tinyint_col":1,"smallint_col":null,"int_col":null,"bigint_col":null,"float_col":null,"double_col":null,"date_string_col":null,"string_col":null,"year":"2009","month":"1"}
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row 
> {"id":null,"bool_col":null,"tinyint_col":1,"smallint_col":null,"int_col":null,"bigint_col":null,"float_col":null,"double_col":null,"date_string_col":null,"string_col":null,"year":"2009","month":"1"}
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:157)
> ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast 
> to java.lang.Byte
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:796)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:844)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:844)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:844)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519)
> ... 9 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable 
> cannot be cast to java.lang.Byte
> at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaByteObjectInspector.get(JavaByteObjectInspector.java:40)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:666)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631)
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109)
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96)
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183)
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:629)
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:826)
> at 
> org

[jira] [Updated] (HIVE-6496) Queries fail to Vectorize.

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6496:
---

Component/s: Vectorization

> Queries fail to Vectorize.
> --
>
> Key: HIVE-6496
> URL: https://issues.apache.org/jira/browse/HIVE-6496
> Project: Hive
>  Issue Type: Sub-task
>  Components: Vectorization
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Fix For: 0.13.0
>
> Attachments: HIVE-6496.1.patch, HIVE-6496.2.patch, HIVE-6496.3.patch, 
> HIVE-6496.4.patch
>
>
> The following issues are causing many queries to fail to vectorize:
> 1) NPE because the row resolver is null.
> 2) VectorUDFAdapter doesn't handle decimal.
> 3) Decimal casts to boolean, timestamp, and string fail because the classes 
> are not annotated appropriately.
> 4) Decimal modulo fails to vectorize because GenericUDFOPMod is not annotated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6506) hcatalog should automatically work with new tableproperties in ORC

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6506:
---

Fix Version/s: 0.13.0

> hcatalog should automatically work with new tableproperties in ORC
> --
>
> Key: HIVE-6506
> URL: https://issues.apache.org/jira/browse/HIVE-6506
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Thejas M Nair
> Fix For: 0.13.0
>
>
> HIVE-5504 has changes to handle existing table properties for the ORC file 
> format. But it does not automatically pick up newly added table properties. We 
> should refactor ORC so that its table property list can be automatically 
> determined.
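> A hedged sketch of such a refactor (the enum is hypothetical; the property 
> keys shown are existing ORC table properties): keeping the list in one place 
> lets HCatalog iterate over it instead of hard-coding names that go stale.
> {code:java}
> public enum OrcTableProperty {
>   COMPRESS("orc.compress"),
>   STRIPE_SIZE("orc.stripe.size"),
>   ROW_INDEX_STRIDE("orc.row.index.stride");
>
>   private final String propName;
>
>   OrcTableProperty(String propName) {
>     this.propName = propName;
>   }
>
>   public String propName() {
>     return propName;
>   }
> }
> {code}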



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6542) build error with jdk 7

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6542:
---

Fix Version/s: 0.13.0

> build error with jdk 7
> --
>
> Key: HIVE-6542
> URL: https://issues.apache.org/jira/browse/HIVE-6542
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.13.0
>
> Attachments: HIVE-6542.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6542) build error with jdk 7

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6542:
---

Affects Version/s: 0.13.0

> build error with jdk 7
> --
>
> Key: HIVE-6542
> URL: https://issues.apache.org/jira/browse/HIVE-6542
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.13.0
>
> Attachments: HIVE-6542.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6530) JDK 7 trunk build fails after HIVE-6418 patch

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6530:
---

Fix Version/s: 0.13.0

> JDK 7 trunk build fails after HIVE-6418 patch
> -
>
> Key: HIVE-6530
> URL: https://issues.apache.org/jira/browse/HIVE-6530
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Prasad Mujumdar
>Assignee: Navis
>Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HIVE-6530.1.patch.txt, HIVE-6530.2.patch.txt
>
>
> JDK7 build fails with following error 
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) 
> on project hive-exec: Compilation failure
> [ERROR] 
> /home/prasadm/repos/apache/hive-trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/LazyFlatRowContainer.java:[118,15]
>  name clash: add(java.util.List) in 
> org.apache.hadoop.hive.ql.exec.persistence.LazyFlatRowContainer overrides a 
> method whose erasure is the same as another method, yet neither overrides the 
> other
> [ERROR] first method:  add(E) in java.util.AbstractCollection
> [ERROR] second method: add(ROW) in 
> org.apache.hadoop.hive.ql.exec.persistence.AbstractRowContainer
> [ERROR] -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hive-exec
> {noformat}
> LazyFlatRowContainer.java is a new file added as part of the HIVE-6418 patch. 
> It extends AbstractCollection and implements AbstractRowContainer; both 
> declare an add() method, and the two signatures conflict after type erasure 
> on JDK 7. One common workaround is sketched below.
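> One common way around this kind of erasure clash, sketched here with 
> hypothetical names (this is not the actual fix), is composition instead of 
> extending AbstractCollection, so only one add-style signature exists:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> public class FlatRowContainer<ROW> {
>   private final List<ROW> rows = new ArrayList<ROW>();
>
>   // distinct name: cannot clash with java.util.Collection#add(E)
>   public void addRow(ROW row) {
>     rows.add(row);
>   }
>
>   public int rowCount() {
>     return rows.size();
>   }
> }
> {code}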



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6529) Tez output files are out of date

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6529:
---

Fix Version/s: 0.13.0

> Tez output files are out of date
> 
>
> Key: HIVE-6529
> URL: https://issues.apache.org/jira/browse/HIVE-6529
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: HIVE-6529.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6557) TestSchemaTool tests are failing

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6557:
---

Fix Version/s: 0.13.0

> TestSchemaTool tests are failing
> 
>
> Key: HIVE-6557
> URL: https://issues.apache.org/jira/browse/HIVE-6557
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.14.0
>Reporter: Vikram Dixit K
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6566) Incorrect union-all plan with map-joins on Tez

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6566:
---

Fix Version/s: 0.13.0

> Incorrect union-all plan with map-joins on Tez
> --
>
> Key: HIVE-6566
> URL: https://issues.apache.org/jira/browse/HIVE-6566
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.13.0
>
> Attachments: HIVE-6566.1.patch, HIVE-6566.2.patch
>
>
> The Tez DAG is hooked up incorrectly for some union-all queries involving map 
> joins. That pattern is quite common and results in either an NPE or invalid 
> results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6571) query id should be available for logging during query compilation

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6571:
---

Fix Version/s: 0.13.0

> query id should be available for logging during query compilation
> -
>
> Key: HIVE-6571
> URL: https://issues.apache.org/jira/browse/HIVE-6571
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: HIVE-6571.1.patch
>
>
> It would be nice to have the query id set during compilation, to tie log 
> messages together, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6663) remove TUGIContainingProcessor class as it is not used anymore

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6663:
---

Fix Version/s: 0.13.0

> remove TUGIContainingProcessor class as it is not used anymore
> --
>
> Key: HIVE-6663
> URL: https://issues.apache.org/jira/browse/HIVE-6663
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.13.0
>
> Attachments: HIVE-6663.1.patch
>
>
> After the HIVE-6312 changes, the TUGIContainingProcessor class is unused. It 
> should be removed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6698) hcat.py script does not correctly load the hbase storage handler jars

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6698:
---

Fix Version/s: 0.14.0

> hcat.py script does not correctly load the hbase storage handler jars
> -
>
> Key: HIVE-6698
> URL: https://issues.apache.org/jira/browse/HIVE-6698
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Fix For: 0.14.0
>
> Attachments: HIVE-6698.patch
>
>
> Currently, queries using the HBaseHCatStorageHandler fail when run via 
> hcat.py. Example query:
> {code}
> create table pig_hbase_1(key string, age string, gpa string)
> STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
> TBLPROPERTIES ('hbase.columns.mapping'=':key,info:age,info:gpa');
> {code}
> The following error is seen in the hcat logs:
> {noformat}
> 2014-03-18 08:25:49,437 ERROR ql.Driver (SessionState.java:printError(541)) - 
> FAILED: SemanticException java.io.IOException: Error in loading storage 
> handler.org.apache.hcatalog.hbase.HBaseHCatStorageHandler
> org.apache.hadoop.hive.ql.parse.SemanticException: java.io.IOException: Error 
> in loading storage handler.org.apache.hcatalog.hbase.HBaseHCatStorageHandler
>   at 
> org.apache.hive.hcatalog.cli.SemanticAnalysis.CreateTableHook.postAnalyze(CreateTableHook.java:208)
>   at 
> org.apache.hive.hcatalog.cli.SemanticAnalysis.HCatSemanticAnalyzer.postAnalyze(HCatSemanticAnalyzer.java:242)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:295)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:949)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:997)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:885)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:875)
>   at org.apache.hive.hcatalog.cli.HCatDriver.run(HCatDriver.java:43)
>   at org.apache.hive.hcatalog.cli.HCatCli.processCmd(HCatCli.java:259)
>   at org.apache.hive.hcatalog.cli.HCatCli.processLine(HCatCli.java:213)
>   at org.apache.hive.hcatalog.cli.HCatCli.main(HCatCli.java:172)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.io.IOException: Error in loading storage 
> handler.org.apache.hcatalog.hbase.HBaseHCatStorageHandler
>   at 
> org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:432)
>   at 
> org.apache.hive.hcatalog.cli.SemanticAnalysis.CreateTableHook.postAnalyze(CreateTableHook.java:199)
>   ... 16 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hcatalog.hbase.HBaseHCatStorageHandler
>   at java.net.URLClassLoader$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at 
> org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:426)
>   ... 17 more
> {noformat}
> The problem is that the hbaseStorageJar path is no longer correct after the 
> merge of hcat into hive. Also, as per HIVE-6695, we should add HBASE_LIB to 
> the classpath.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6687) JDBC ResultSet fails to get value by qualified projection name

2014-03-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943945#comment-13943945
 ] 

Ashutosh Chauhan commented on HIVE-6687:


Seems like the checked-in .q.out file is incorrect; what we are getting now 
seems more appropriate.

> JDBC ResultSet fails to get value by qualified projection name
> --
>
> Key: HIVE-6687
> URL: https://issues.apache.org/jira/browse/HIVE-6687
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>  Labels: documentation
> Fix For: 0.12.1
>
> Attachments: HIVE-6687.3.patch
>
>
> Getting a value from the result set using the fully qualified name throws an 
> exception. The only workaround today is to use the position of the column 
> instead of the column label.
> {code}
> String sql = "select r1.x, r2.x from r1 join r2 on r1.y=r2.y";
> ResultSet res = stmt.executeQuery(sql);
> res.getInt("r1.x");
> {code}
> res.getInt("r1.x"); would throw exception unknown column even though sql 
> specifies it.
> Fix is to fix resultsetschema in semantic analyzer.
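> A hedged workaround sketch using only standard JDBC calls, matching the 
> positional access described above (the connection URL is illustrative):
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> public class PositionalAccess {
>   public static void main(String[] args) throws Exception {
>     Class.forName("org.apache.hive.jdbc.HiveDriver");
>     Connection conn =
>         DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
>     Statement stmt = conn.createStatement();
>     ResultSet res =
>         stmt.executeQuery("select r1.x, r2.x from r1 join r2 on r1.y = r2.y");
>     while (res.next()) {
>       // read by position, since the qualified labels are not resolvable
>       int r1x = res.getInt(1);
>       int r2x = res.getInt(2);
>       System.out.println(r1x + ", " + r2x);
>     }
>     conn.close();
>   }
> }
> {code}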



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6650) hive.optimize.index.filter breaks non-index where with HBaseStorageHandler

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6650:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to 0.13 & trunk.

> hive.optimize.index.filter breaks non-index where with HBaseStorageHandler
> --
>
> Key: HIVE-6650
> URL: https://issues.apache.org/jira/browse/HIVE-6650
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.12.0
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Fix For: 0.13.0
>
> Attachments: HIVE-6650.0.patch, HIVE-6650.1.patch, HIVE-6650.2.patch, 
> HIVE-6650.3.patch
>
>
> With the above enabled, where clauses including non-rowkey columns cannot be 
> used with the HBaseStorageHandler. Job fails to launch with the following 
> exception.
> {noformat}
> java.lang.RuntimeException: Unexpected residual predicate (s_address = '200 
> WEST 56TH STREET')
> at 
> org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.convertFilter(HiveHBaseTableInputFormat.java:292)
> at 
> org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:495)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:294)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:303)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:518)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
> at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
> at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1437)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1215)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1043)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Job Submission failed with exception 'java.lang.RuntimeException(Unexpected 
> residual predicate (s_address = '200 WEST 56TH STREET'))'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> {noformat}
> I believe this bug was introduced in HIVE-2036; see the change to 
> OpProcFactory.java that always includes the full predicate, even after the 
> storage handler negotiates the predicates it can push down. Since this 
> behavior is divergent from input formats (they cannot negotiate), there's no 
> harm in the SH ignoring non-indexed predicates -- Hive respects all of them at 
> a layer above anyway. Might as well remove the check/exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-4598) Incorrect results when using subquery in multi table insert

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-4598.


Resolution: Fixed
  Assignee: Navis

Fixed via HIVE-4293

> Incorrect results when using subquery in multi table insert
> ---
>
> Key: HIVE-4598
> URL: https://issues.apache.org/jira/browse/HIVE-4598
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.10.0, 0.11.0
>Reporter: Sebastian
>Assignee: Navis
> Attachments: HIVE-4598.1.patch.txt
>
>
> I'm using a multi table insert like this (where <source> lost its angle 
> brackets in the original report):
> FROM <source>
> INSERT INTO TABLE t PARTITION (type='x')
> SELECT * WHERE type='x'
> INSERT INTO TABLE t PARTITION (type='y')
> SELECT * WHERE type='y';
> Now when <source> is the name of a table, everything works as expected.
> However, if I use a subquery as <source>, the query runs but it inserts all 
> results from the subquery into each partition, as if there were no "WHERE" 
> clauses in the selects.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-5964) Hive missing a filter predicate causing wrong results joining tables after sort by

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-5964.


   Resolution: Fixed
Fix Version/s: 0.13.0

Fixed via HIVE-4293

> Hive missing a filter predicate causing wrong results joining tables after 
> sort by
> --
>
> Key: HIVE-5964
> URL: https://issues.apache.org/jira/browse/HIVE-5964
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.10.0, 0.11.0
>Reporter: dima machlin
>Assignee: Navis
>Priority: Blocker
> Fix For: 0.13.0
>
>
> It seems like the predicate pushdown optimization fails under certain 
> conditions, causing wrong results: a filter predicate appears to be completely 
> disregarded by the query processor.
> Here is the scenario (assuming "dual" table exists) :
> set hive.optimize.ppd=true;
> drop table if exists test_tbl ;
> create table test_tbl (id string,name string);
> insert into table test_tbl
> select 'a','b' from dual;
> test_tbl now contains :
> a b
> the following query :
> select t2.* 
> from
> (select id,name from (select id,name from test_tbl) t1 sort by id) t2
>  join test_tbl t3 on (t2.id=t3.id )
> where t2.name='c' and t3.id='a';
> returns :
> a b
> The filter "t2.name='c'" is missing from the execution plan and obviously 
> doesn't apply.
> The filter "t3.id='a'" does appear in the plan and is applied before 
> the join.
> If the query changes a little, such as removing the sort by, removing the t1 
> sub-query, or disabling hive.optimize.ppd, then the predicate appears.
> I'm able to reproduce the problem in both Hive 0.10 and Hive 0.11, although it 
> seems to work fine in Hive 0.7.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6395) multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6395:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Tested the test case in the patch on trunk, which now has HIVE-4293. It now 
passes. Feel free to reopen if you can still repro in some other form.

> multi-table insert from select transform fails if optimize.ppd enabled
> --
>
> Key: HIVE-6395
> URL: https://issues.apache.org/jira/browse/HIVE-6395
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Fix For: 0.13.0
>
> Attachments: HIVE-6395.patch, test.py
>
>
> {noformat}
> set hive.optimize.ppd=true;
> add file ./test.py;
> from (select transform(test.*) using 'python ./test.py'
> as id,name,state from test) t0
> insert overwrite table test2 select * where state=1
> insert overwrite table test3 select * where state=2;
> {noformat}
> In the above example, the select transform returns an extra column, and that 
> column is used in the where clauses of the multi-insert selects. However, if 
> the optimization is on, the query plan is wrong:
> filter (state=1 and state=2) //impossible
> --> select, insert into test1
> --> select, insert into test2
> The correct query plan for hive.optimize.ppd=false is:
> filter (state=1)
> --> select, insert into test1
> filter (state=2)
> --> select, insert into test2
> For reference
> {noformat}
> create table test (id int, name string)
> create table test2(id int, name string, state int)
> create table test3(id int, name string, state int)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-4293) Predicates following UDTF operator are removed by PPD

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4293:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk & 0.13. Thanks, Navis!

> Predicates following UDTF operator are removed by PPD
> -
>
> Key: HIVE-4293
> URL: https://issues.apache.org/jira/browse/HIVE-4293
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: D9933.6.patch, HIVE-4293.10.patch, 
> HIVE-4293.11.patch.txt, HIVE-4293.12.patch, HIVE-4293.13.patch, 
> HIVE-4293.7.patch.txt, HIVE-4293.8.patch.txt, HIVE-4293.9.patch.txt, 
> HIVE-4293.D9933.1.patch, HIVE-4293.D9933.2.patch, HIVE-4293.D9933.3.patch, 
> HIVE-4293.D9933.4.patch, HIVE-4293.D9933.5.patch
>
>
> For example, 
> {noformat}
> explain SELECT value from (
>   select explode(array(key, value)) as (value) from (
> select * FROM src WHERE key > 200
>   ) A
> ) B WHERE value > 300
> ;
> {noformat}
> This makes a plan like the following, with the last predicate removed:
> {noformat}
>   TableScan
> alias: src
> Filter Operator
>   predicate:
>   expr: (key > 200.0)
>   type: boolean
>   Select Operator
> expressions:
>   expr: array(key,value)
>   type: array
> outputColumnNames: _col0
> UDTF Operator
>   function name: explode
>   Select Operator
> expressions:
>   expr: col
>   type: string
> outputColumnNames: _col0
> File Output Operator
>   compressed: false
>   GlobalTableId: 0
>   table:
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6687) JDBC ResultSet fails to get value by qualified projection name

2014-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943907#comment-13943907
 ] 

Hive QA commented on HIVE-6687:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12636089/HIVE-6687.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5437 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_print_header
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1898/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1898/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12636089

> JDBC ResultSet fails to get value by qualified projection name
> --
>
> Key: HIVE-6687
> URL: https://issues.apache.org/jira/browse/HIVE-6687
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>  Labels: documentation
> Fix For: 0.12.1
>
> Attachments: HIVE-6687.3.patch
>
>
> Getting a value from the result set using the fully qualified name throws an 
> exception. The only workaround today is to use the position of the column 
> instead of the column label.
> {code}
> String sql = "select r1.x, r2.x from r1 join r2 on r1.y=r2.y";
> ResultSet res = stmt.executeQuery(sql);
> res.getInt("r1.x");
> {code}
> res.getInt("r1.x"); would throw exception unknown column even though sql 
> specifies it.
> Fix is to fix resultsetschema in semantic analyzer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6687) JDBC ResultSet fails to get value by qualified projection name

2014-03-21 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943874#comment-13943874
 ] 

Prasad Mujumdar commented on HIVE-6687:
---

+1


> JDBC ResultSet fails to get value by qualified projection name
> --
>
> Key: HIVE-6687
> URL: https://issues.apache.org/jira/browse/HIVE-6687
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>  Labels: documentation
> Fix For: 0.12.1
>
> Attachments: HIVE-6687.3.patch
>
>
> Getting a value from the result set using the fully qualified name throws an 
> exception. The only workaround today is to use the position of the column 
> instead of the column label.
> {code}
> String sql = "select r1.x, r2.x from r1 join r2 on r1.y=r2.y";
> ResultSet res = stmt.executeQuery(sql);
> res.getInt("r1.x");
> {code}
> res.getInt("r1.x"); would throw exception unknown column even though sql 
> specifies it.
> Fix is to fix resultsetschema in semantic analyzer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4293) Predicates following UDTF operator are removed by PPD

2014-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943865#comment-13943865
 ] 

Hive QA commented on HIVE-4293:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12635401/HIVE-4293.13.patch

{color:green}SUCCESS:{color} +1 5439 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1897/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1897/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12635401

> Predicates following UDTF operator are removed by PPD
> -
>
> Key: HIVE-4293
> URL: https://issues.apache.org/jira/browse/HIVE-4293
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Critical
> Attachments: D9933.6.patch, HIVE-4293.10.patch, 
> HIVE-4293.11.patch.txt, HIVE-4293.12.patch, HIVE-4293.13.patch, 
> HIVE-4293.7.patch.txt, HIVE-4293.8.patch.txt, HIVE-4293.9.patch.txt, 
> HIVE-4293.D9933.1.patch, HIVE-4293.D9933.2.patch, HIVE-4293.D9933.3.patch, 
> HIVE-4293.D9933.4.patch, HIVE-4293.D9933.5.patch
>
>
> For example, 
> {noformat}
> explain SELECT value from (
>   select explode(array(key, value)) as (value) from (
> select * FROM src WHERE key > 200
>   ) A
> ) B WHERE value > 300
> ;
> {noformat}
> This makes a plan like the following, with the last predicate removed:
> {noformat}
>   TableScan
> alias: src
> Filter Operator
>   predicate:
>   expr: (key > 200.0)
>   type: boolean
>   Select Operator
> expressions:
>   expr: array(key,value)
>   type: array
> outputColumnNames: _col0
> UDTF Operator
>   function name: explode
>   Select Operator
> expressions:
>   expr: col
>   type: string
> outputColumnNames: _col0
> File Output Operator
>   compressed: false
>   GlobalTableId: 0
>   table:
>   input format: org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6455) Scalable dynamic partitioning and bucketing optimization

2014-03-21 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6455:
-

Attachment: HIVE-6455.20.patch

The changes in PartitionDesc related to Interners are no longer required. The 
diffs are consistent if all the tests for TestParse are run together, which I 
believe is what the precommit QA does. Refreshed the patch against trunk.

> Scalable dynamic partitioning and bucketing optimization
> 
>
> Key: HIVE-6455
> URL: https://issues.apache.org/jira/browse/HIVE-6455
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: optimization
> Attachments: HIVE-6455.1.patch, HIVE-6455.1.patch, 
> HIVE-6455.10.patch, HIVE-6455.10.patch, HIVE-6455.11.patch, 
> HIVE-6455.12.patch, HIVE-6455.13.patch, HIVE-6455.13.patch, 
> HIVE-6455.14.patch, HIVE-6455.15.patch, HIVE-6455.16.patch, 
> HIVE-6455.17.patch, HIVE-6455.17.patch.txt, HIVE-6455.18.patch, 
> HIVE-6455.19.patch, HIVE-6455.2.patch, HIVE-6455.20.patch, HIVE-6455.3.patch, 
> HIVE-6455.4.patch, HIVE-6455.4.patch, HIVE-6455.5.patch, HIVE-6455.6.patch, 
> HIVE-6455.7.patch, HIVE-6455.8.patch, HIVE-6455.9.patch, HIVE-6455.9.patch
>
>
> The current implementation of dynamic partitioning works by keeping at least 
> one record writer open per dynamic partition directory. In the case of 
> bucketing, there can be multispray file writers, which further adds to the 
> number of open record writers. The record writers of column-oriented file 
> formats (like ORC, RCFile, etc.) keep in-memory buffers (value buffers or 
> compression buffers) open all the time to buffer up the rows and compress them 
> before flushing to disk. Since these buffers are maintained on a per-column 
> basis, the amount of constant memory required at runtime increases as the 
> number of partitions and the number of columns per partition increase. This 
> often leads to OutOfMemory (OOM) exceptions in mappers or reducers, depending 
> on the number of open record writers. Users often tune the JVM heap size 
> (runtime memory) to get over such OOM issues.
> With this optimization, the dynamic partition columns and bucketing columns 
> (in the case of bucketed tables) are sorted before being fed to the reducers. 
> Since the partitioning and bucketing columns are sorted, each reducer can keep 
> only one record writer open at any time, thereby reducing the memory pressure 
> on the reducers. This optimization scales well as the number of partitions and 
> the number of columns per partition increase, at the cost of sorting the 
> columns.
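> A hedged sketch of the core idea, with hypothetical types (the real change 
> lives in Hive's file sink path): when rows arrive sorted by partition key, at 
> most one writer is open at a time, and the previous one is closed whenever the 
> key changes.
> {code:java}
> import java.io.IOException;
>
> interface RecordWriter {
>   void write(String row) throws IOException;
>   void close() throws IOException;
> }
>
> interface WriterFactory {
>   RecordWriter open(String partition) throws IOException;
> }
>
> class SortedDynamicPartitionWriter {
>   private final WriterFactory factory;
>   private String currentPartition;
>   private RecordWriter writer;
>
>   SortedDynamicPartitionWriter(WriterFactory factory) {
>     this.factory = factory;
>   }
>
>   // rows must arrive sorted by partitionKey for this to be correct
>   void process(String partitionKey, String row) throws IOException {
>     if (!partitionKey.equals(currentPartition)) {
>       if (writer != null) {
>         writer.close(); // at most one record writer open per reducer
>       }
>       writer = factory.open(partitionKey);
>       currentPartition = partitionKey;
>     }
>     writer.write(row);
>   }
>
>   void finish() throws IOException {
>     if (writer != null) {
>       writer.close();
>     }
>   }
> }
> {code}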



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6455) Scalable dynamic partitioning and bucketing optimization

2014-03-21 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943861#comment-13943861
 ] 

Prasanth J commented on HIVE-6455:
--

[~leftylev] The following configs are no longer required:
hive.cache.table.desc.max.entries
hive.cache.properties.max.entries
hive.cache.io.format.max.entries

> Scalable dynamic partitioning and bucketing optimization
> 
>
> Key: HIVE-6455
> URL: https://issues.apache.org/jira/browse/HIVE-6455
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: optimization
> Attachments: HIVE-6455.1.patch, HIVE-6455.1.patch, 
> HIVE-6455.10.patch, HIVE-6455.10.patch, HIVE-6455.11.patch, 
> HIVE-6455.12.patch, HIVE-6455.13.patch, HIVE-6455.13.patch, 
> HIVE-6455.14.patch, HIVE-6455.15.patch, HIVE-6455.16.patch, 
> HIVE-6455.17.patch, HIVE-6455.17.patch.txt, HIVE-6455.18.patch, 
> HIVE-6455.19.patch, HIVE-6455.2.patch, HIVE-6455.20.patch, HIVE-6455.3.patch, 
> HIVE-6455.4.patch, HIVE-6455.4.patch, HIVE-6455.5.patch, HIVE-6455.6.patch, 
> HIVE-6455.7.patch, HIVE-6455.8.patch, HIVE-6455.9.patch, HIVE-6455.9.patch
>
>
> The current implementation of dynamic partitioning works by keeping at least 
> one record writer open per dynamic partition directory. In the case of 
> bucketing, there can be multispray file writers, which further adds to the 
> number of open record writers. The record writers of column-oriented file 
> formats (like ORC, RCFile, etc.) keep in-memory buffers (value buffers or 
> compression buffers) open all the time to buffer up the rows and compress them 
> before flushing to disk. Since these buffers are maintained on a per-column 
> basis, the amount of constant memory required at runtime increases as the 
> number of partitions and the number of columns per partition increase. This 
> often leads to OutOfMemory (OOM) exceptions in mappers or reducers, depending 
> on the number of open record writers. Users often tune the JVM heap size 
> (runtime memory) to get over such OOM issues.
> With this optimization, the dynamic partition columns and bucketing columns 
> (in the case of bucketed tables) are sorted before being fed to the reducers. 
> Since the partitioning and bucketing columns are sorted, each reducer can keep 
> only one record writer open at any time, thereby reducing the memory pressure 
> on the reducers. This optimization scales well as the number of partitions and 
> the number of columns per partition increase, at the cost of sorting the 
> columns.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6395) multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943851#comment-13943851
 ] 

Xuefu Zhang commented on HIVE-6395:
---

{quote}
Although I am curious why this is duplicated in HIVE-4293 which is about 
subquery + udtf, I wonder if its a part of the main fix, or just an additional 
fix that got added?
{quote}

I didn't read the patch in HIVE-4293, but from Hive's perspective, UDTF is very 
similar to TRANSFORM(), except that the former is done via the UDTF's Java code 
and the latter in an external script via streaming. For this reason, the 
problem here might be a sub-problem of HIVE-4293. This is just my guess.

> multi-table insert from select transform fails if optimize.ppd enabled
> --
>
> Key: HIVE-6395
> URL: https://issues.apache.org/jira/browse/HIVE-6395
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6395.patch, test.py
>
>
> {noformat}
> set hive.optimize.ppd=true;
> add file ./test.py;
> from (select transform(test.*) using 'python ./test.py'
> as id,name,state from test) t0
> insert overwrite table test2 select * where state=1
> insert overwrite table test3 select * where state=2;
> {noformat}
> In the above example, the select transform returns an extra column, and that 
> column is used in the where clause of the multi-insert selects.  However, if 
> the optimization is on, the query plan is wrong:
> filter (state=1 and state=2) //impossible
> --> select, insert into test1
> --> select, insert into test2
> The correct query plan for hive.optimize.ppd=false is:
> filter (state=1)
> --> select, insert into test1
> filter (state=2)
> --> select, insert into test2
> For reference
> {noformat}
> create table test (id int, name string)
> create table test2(id int, name string, state int)
> create table test3(id int, name string, state int)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-6631) NPE when select a field of a struct from a table stored by ORC

2014-03-21 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved HIVE-6631.


Resolution: Duplicate

> NPE when select a field of a struct from a table stored by ORC
> --
>
> Key: HIVE-6631
> URL: https://issues.apache.org/jira/browse/HIVE-6631
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Serializers/Deserializers
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Yin Huai
>
> I have a table like this ...
> {code:sql}
> create table lineitem_orc_cg
> (
> CG1 STRUCT<L_SUPPKEY:INT,
>L_COMMITDATE:STRING,
>L_RECEIPTDATE:STRING,
>L_SHIPINSTRUCT:STRING,
>L_SHIPMODE:STRING,
>L_COMMENT:STRING,
>L_TAX:float,
>L_RETURNFLAG:STRING,
>L_LINESTATUS:STRING,
>L_LINENUMBER:INT,
>L_ORDERKEY:INT>,
> CG2 STRUCT<L_EXTENDEDPRICE:float,
>L_DISCOUNT:float,
>L_SHIPDATE:STRING>
> )
> row format serde 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> stored as orc tblproperties ("orc.compress"="NONE");
> {code}
> When I want to select a field from a struct by using
> {code:sql}
> select cg1.l_comment from lineitem_orc_cg limit 1;
> {code}
> I got 
> {code}
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:928)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:954)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:459)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:415)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:189)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:409)
>   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:133)
>   ... 22 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6716) ORC struct throws NPE for tables with inner structs having null values

2014-03-21 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943843#comment-13943843
 ] 

Yin Huai commented on HIVE-6716:


OK, I have marked that one as a duplicate. Thanks.

> ORC struct throws NPE for tables with inner structs having null values 
> ---
>
> Key: HIVE-6716
> URL: https://issues.apache.org/jira/browse/HIVE-6716
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-6716.1.patch
>
>
> ORCStruct should return null when object passed to 
> getStructFieldsDataAsList(Object obj) is null.
> {code}
> public List<Object> getStructFieldsDataAsList(Object object) {
>   OrcStruct struct = (OrcStruct) object;
>   List<Object> result = new ArrayList<Object>(struct.fields.length);
> {code}
> In the above code, struct.fields will throw an NPE if struct is null.
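A hedged sketch of the guard being described (surrounding OrcStruct internals assumed; not necessarily the attached patch):

{code:java}
// Hedged sketch: return null for a null inner struct instead of dereferencing it.
public List<Object> getStructFieldsDataAsList(Object object) {
  if (object == null) {
    return null;                               // null inner struct: no NPE
  }
  OrcStruct struct = (OrcStruct) object;
  List<Object> result = new ArrayList<Object>(struct.fields.length);
  Collections.addAll(result, struct.fields);   // needs java.util.Collections
  return result;
}
{code}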



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6716) ORC struct throws NPE for tables with inner structs having null values

2014-03-21 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943842#comment-13943842
 ] 

Prasanth J commented on HIVE-6716:
--

[~yhuai] Yes, it looks like the same one: in your description, if the cg1 inner 
struct is null it will throw an NPE. You can mark it as a duplicate.

> ORC struct throws NPE for tables with inner structs having null values 
> ---
>
> Key: HIVE-6716
> URL: https://issues.apache.org/jira/browse/HIVE-6716
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-6716.1.patch
>
>
> ORCStruct should return null when object passed to 
> getStructFieldsDataAsList(Object obj) is null.
> {code}
> public List<Object> getStructFieldsDataAsList(Object object) {
>   OrcStruct struct = (OrcStruct) object;
>   List<Object> result = new ArrayList<Object>(struct.fields.length);
> {code}
> In the above code, struct.fields will throw an NPE if struct is null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6716) ORC struct throws NPE for tables with inner structs having null values

2014-03-21 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943839#comment-13943839
 ] 

Yin Huai commented on HIVE-6716:


[~prasanth_j] It is the same bug as I mentioned in 
https://issues.apache.org/jira/browse/HIVE-6631, right? If so, I will mark that 
one as duplicate.

> ORC struct throws NPE for tables with inner structs having null values 
> ---
>
> Key: HIVE-6716
> URL: https://issues.apache.org/jira/browse/HIVE-6716
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-6716.1.patch
>
>
> ORCStruct should return null when object passed to 
> getStructFieldsDataAsList(Object obj) is null.
> {code}
> public List<Object> getStructFieldsDataAsList(Object object) {
>   OrcStruct struct = (OrcStruct) object;
>   List<Object> result = new ArrayList<Object>(struct.fields.length);
> {code}
> In the above code, struct.fields will throw an NPE if struct is null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6395) multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943837#comment-13943837
 ] 

Szehon Ho commented on HIVE-6395:
-

Thanks Xuefu. It's not possible, unless I'm misunderstanding the question: the 
entire FilterOp is pushed down, and the predicate is just a part of it. Navis's 
patch also seems to remove the candidate in the multi-insert case, as it's not 
implemented yet in PPD (see OpProcFactory.getChildWalkerInfo).

Although I am curious why this is duplicated in HIVE-4293, which is about 
subquery + UDTF, I wonder if it's a part of the main fix, or just an additional 
fix that got added?

> multi-table insert from select transform fails if optimize.ppd enabled
> --
>
> Key: HIVE-6395
> URL: https://issues.apache.org/jira/browse/HIVE-6395
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6395.patch, test.py
>
>
> {noformat}
> set hive.optimize.ppd=true;
> add file ./test.py;
> from (select transform(test.*) using 'python ./test.py'
> as id,name,state from test) t0
> insert overwrite table test2 select * where state=1
> insert overwrite table test3 select * where state=2;
> {noformat}
> In the above example, the select transform returns an extra column, and that 
> column is used in the where clause of the multi-insert selects.  However, if 
> the optimization is on, the query plan is wrong:
> filter (state=1 and state=2) //impossible
> --> select, insert into test1
> --> select, insert into test2
> The correct query plan for hive.optimize.ppd=false is:
> filter (state=1)
> --> select, insert into test1
> filter (state=2)
> --> select, insert into test2
> For reference
> {noformat}
> create table test (id int, name string)
> create table test2(id int, name string, state int)
> create table test3(id int, name string, state int)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6650) hive.optimize.index.filter breaks non-index where with HBaseStorageHandler

2014-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943819#comment-13943819
 ] 

Hive QA commented on HIVE-6650:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12635847/HIVE-6650.3.patch

{color:green}SUCCESS:{color} +1 5437 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1896/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1896/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12635847

> hive.optimize.index.filter breaks non-index where with HBaseStorageHandler
> --
>
> Key: HIVE-6650
> URL: https://issues.apache.org/jira/browse/HIVE-6650
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.12.0
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Attachments: HIVE-6650.0.patch, HIVE-6650.1.patch, HIVE-6650.2.patch, 
> HIVE-6650.3.patch
>
>
> With the above enabled, where clauses including non-rowkey columns cannot be 
> used with the HBaseStorageHandler. Job fails to launch with the following 
> exception.
> {noformat}
> java.lang.RuntimeException: Unexpected residual predicate (s_address = '200 
> WEST 56TH STREET')
> at 
> org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.convertFilter(HiveHBaseTableInputFormat.java:292)
> at 
> org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:495)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:294)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:303)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:518)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
> at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
> at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1437)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1215)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1043)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Job Submission failed with exception 'java.lang.RuntimeException(Unexpected 
> residual predicate (s_address = '200 WEST 56TH STREET'))'
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> {noformat}
> I believe this bug was introduced in HIVE-2036, see change to 
> OpProcFactory.j
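For context, a hedged illustration of the storage-handler predicate contract involved here (see org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler; variable names are illustrative and the fragment is not runnable on its own): the handler splits a WHERE expression into a part it can evaluate itself and a residual part Hive should evaluate as an ordinary filter.

{code:java}
// Hedged illustration; Hive internals assumed.
HiveStoragePredicateHandler.DecomposedPredicate decomposed =
    predicateHandler.decomposePredicate(jobConf, deserializer, whereExpr);

ExprNodeDesc pushed = decomposed.pushedPredicate;      // e.g. row-key conditions
ExprNodeDesc residual = decomposed.residualPredicate;  // e.g. s_address = '...'
// Hive should apply 'residual' as a normal FilterOperator; the stack trace
// above shows it instead reaching HiveHBaseTableInputFormat.convertFilter,
// which treats any residual predicate as an error.
{code}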

[jira] [Updated] (HIVE-6708) ConstantVectorExpression should create copies of data objects rather than referencing them

2014-03-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6708:


Status: Patch Available  (was: Open)

> ConstantVectorExpression should create copies of data objects rather than 
> referencing them
> --
>
> Key: HIVE-6708
> URL: https://issues.apache.org/jira/browse/HIVE-6708
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6708-1.patch, HIVE-6708.2.patch
>
>
> 1. The ConstantVectorExpression vector should be updated for bytes column 
> vectors and decimal column vectors. The current code changes the reference to 
> the vector, which might be shared across multiple columns.
> 2. VectorizationContext.foldConstantsForUnaryExpression(ExprNodeDesc 
> exprDesc) has a minor bug in deciding when to constant-fold the expression.
> The following code should replace the corresponding piece of code in the 
> trunk.
> ..
> GenericUDF gudf = ((ExprNodeGenericFuncDesc) exprDesc).getGenericUDF();
> if (gudf instanceof GenericUDFOPNegative || gudf instanceof 
> GenericUDFOPPositive
> || castExpressionUdfs.contains(gudf.getClass())
> ... 
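A hedged sketch of the copy-instead-of-reference idea for the bytes case, assuming the standard BytesColumnVector API (setVal copies into the vector's own buffer, whereas setRef stores a shared reference); this is not necessarily what the attached patch does:

{code:java}
// Hedged sketch: copy the constant into the vector instead of referencing a
// buffer that other columns may share and later mutate.
byte[] constant = value.getBytes();         // 'value' assumed from context
bytesColumnVector.initBuffer();             // allocate the vector's own buffer
bytesColumnVector.isRepeating = true;       // one constant for the whole batch
bytesColumnVector.setVal(0, constant, 0, constant.length);
{code}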



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6708) ConstantVectorExpression should create copies of data objects rather than referencing them

2014-03-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6708:


Status: Open  (was: Patch Available)

> ConstantVectorExpression should create copies of data objects rather than 
> referencing them
> --
>
> Key: HIVE-6708
> URL: https://issues.apache.org/jira/browse/HIVE-6708
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6708-1.patch, HIVE-6708.2.patch
>
>
> 1. The ConstantVectorExpression vector should be updated for bytes column 
> vectors and decimal column vectors. The current code changes the reference to 
> the vector, which might be shared across multiple columns.
> 2. VectorizationContext.foldConstantsForUnaryExpression(ExprNodeDesc 
> exprDesc) has a minor bug in deciding when to constant-fold the expression.
> The following code should replace the corresponding piece of code in the 
> trunk.
> ..
> GenericUDF gudf = ((ExprNodeGenericFuncDesc) exprDesc).getGenericUDF();
> if (gudf instanceof GenericUDFOPNegative || gudf instanceof 
> GenericUDFOPPositive
> || castExpressionUdfs.contains(gudf.getClass())
> ... 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6642) Query fails to vectorize when a non string partition column is part of the query expression

2014-03-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6642:


Status: Open  (was: Patch Available)

> Query fails to vectorize when a non string partition column is part of the 
> query expression
> ---
>
> Key: HIVE-6642
> URL: https://issues.apache.org/jira/browse/HIVE-6642
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6642-2.patch, HIVE-6642.1.patch
>
>
> drop table if exists alltypesorc_part;
> CREATE TABLE alltypesorc_part (
> ctinyint tinyint,
> csmallint smallint,
> cint int,
> cbigint bigint,
> cfloat float,
> cdouble double,
> cstring1 string,
> cstring2 string,
> ctimestamp1 timestamp,
> ctimestamp2 timestamp,
> cboolean1 boolean,
> cboolean2 boolean) partitioned by (ds int) STORED AS ORC;
> insert overwrite table alltypesorc_part partition (ds=2011) select * from 
> alltypesorc limit 100;
> insert overwrite table alltypesorc_part partition (ds=2012) select * from 
> alltypesorc limit 200;
> explain select *
> from (select ds from alltypesorc_part) t1,
>  alltypesorc t2
> where t1.ds = t2.cint
> order by t2.ctimestamp1
> limit 100;
> The above query fails to vectorize because (select ds from alltypesorc_part) 
> t1 returns a string column and the join equality on t2 is performed on an int 
> column. The correct output when vectorization is turned on should be:
> STAGE DEPENDENCIES:
>   Stage-5 is a root stage
>   Stage-2 depends on stages: Stage-5
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-5
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> t1:alltypesorc_part
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> t1:alltypesorc_part
>   TableScan
> alias: alltypesorc_part
> Statistics: Num rows: 300 Data size: 62328 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Select Operator
>   expressions: ds (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 300 Data size: 1200 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   HashTable Sink Operator
> condition expressions:
>   0 {_col0}
>   1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} 
> {cdouble} {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} 
> {cboolean2}
> keys:
>   0 _col0 (type: int)
>   1 cint (type: int)
>   Stage: Stage-2
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: t2
> Statistics: Num rows: 3536 Data size: 1131711 Basic stats: 
> COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {_col0}
> 1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} {cdouble} 
> {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} {cboolean2}
>   keys:
> 0 _col0 (type: int)
> 1 cint (type: int)
>   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, 
> _col6, _col7, _col8, _col9, _col10, _col11, _col12
>   Statistics: Num rows: 3889 Data size: 1244882 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (_col0 = _col3) (type: boolean)
> Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: int), _col1 (type: tinyint), 
> _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), _col5 (type: 
> float), _col6 (type: double), _col7 (type: string), _col8 (type: string), 
> _col\
> 9 (type: timestamp), _col10 (type: timestamp), _col11 (type: boolean), _col12 
> (type: boolean)
>   outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
>   Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col9 (type: timestamp)
> sort order: +
> Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: int), _col1 (type: 
> tinyint), _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), 
> _col5 (type: float), _col6 (typ

[jira] [Updated] (HIVE-6642) Query fails to vectorize when a non string partition column is part of the query expression

2014-03-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6642:


Status: Patch Available  (was: Open)

> Query fails to vectorize when a non string partition column is part of the 
> query expression
> ---
>
> Key: HIVE-6642
> URL: https://issues.apache.org/jira/browse/HIVE-6642
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6642-2.patch, HIVE-6642.1.patch
>
>
> drop table if exists alltypesorc_part;
> CREATE TABLE alltypesorc_part (
> ctinyint tinyint,
> csmallint smallint,
> cint int,
> cbigint bigint,
> cfloat float,
> cdouble double,
> cstring1 string,
> cstring2 string,
> ctimestamp1 timestamp,
> ctimestamp2 timestamp,
> cboolean1 boolean,
> cboolean2 boolean) partitioned by (ds int) STORED AS ORC;
> insert overwrite table alltypesorc_part partition (ds=2011) select * from 
> alltypesorc limit 100;
> insert overwrite table alltypesorc_part partition (ds=2012) select * from 
> alltypesorc limit 200;
> explain select *
> from (select ds from alltypesorc_part) t1,
>  alltypesorc t2
> where t1.ds = t2.cint
> order by t2.ctimestamp1
> limit 100;
> The above query fails to vectorize because (select ds from alltypesorc_part) 
> t1 returns a string column and the join equality on t2 is performed on an int 
> column. The correct output when vectorization is turned on should be:
> STAGE DEPENDENCIES:
>   Stage-5 is a root stage
>   Stage-2 depends on stages: Stage-5
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-5
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> t1:alltypesorc_part
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> t1:alltypesorc_part
>   TableScan
> alias: alltypesorc_part
> Statistics: Num rows: 300 Data size: 62328 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Select Operator
>   expressions: ds (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 300 Data size: 1200 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   HashTable Sink Operator
> condition expressions:
>   0 {_col0}
>   1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} 
> {cdouble} {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} 
> {cboolean2}
> keys:
>   0 _col0 (type: int)
>   1 cint (type: int)
>   Stage: Stage-2
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: t2
> Statistics: Num rows: 3536 Data size: 1131711 Basic stats: 
> COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {_col0}
> 1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} {cdouble} 
> {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} {cboolean2}
>   keys:
> 0 _col0 (type: int)
> 1 cint (type: int)
>   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, 
> _col6, _col7, _col8, _col9, _col10, _col11, _col12
>   Statistics: Num rows: 3889 Data size: 1244882 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (_col0 = _col3) (type: boolean)
> Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: int), _col1 (type: tinyint), 
> _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), _col5 (type: 
> float), _col6 (type: double), _col7 (type: string), _col8 (type: string), 
> _col\
> 9 (type: timestamp), _col10 (type: timestamp), _col11 (type: boolean), _col12 
> (type: boolean)
>   outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
>   Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col9 (type: timestamp)
> sort order: +
> Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: int), _col1 (type: 
> tinyint), _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), 
> _col5 (type: float), _col6 (typ

[jira] [Commented] (HIVE-6570) Hive variable substitution does not work with the "source" command

2014-03-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943746#comment-13943746
 ] 

Ashutosh Chauhan commented on HIVE-6570:


I am waiting to hear from [~appodictic]; it seems he had some concerns.

> Hive variable substitution does not work with the "source" command
> --
>
> Key: HIVE-6570
> URL: https://issues.apache.org/jira/browse/HIVE-6570
> Project: Hive
>  Issue Type: Bug
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-6570.1.patch
>
>
> The following does not work:
> {code}
> source ${hivevar:test-dir}/test.q;
> {code}
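A hedged sketch of the kind of fix (the CLI's "source" branch and the variable-substitution helper names are assumed from the Hive code base of this era; the final patch may differ):

{code:java}
// Hedged sketch inside CliDriver's handling of the "source" command:
// substitute variables in the path before resolving the file.
String cmd_1 = getFirstCmd(cmd.trim(), tokens[0].length());         // path after "source"
cmd_1 = new VariableSubstitution().substitute(ss.getConf(), cmd_1); // expands ${hivevar:test-dir}
File sourceFile = new File(cmd_1);
int rc = processFile(sourceFile.getPath());
{code}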



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6570) Hive variable substitution does not work with the "source" command

2014-03-21 Thread Anthony Hsu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943740#comment-13943740
 ] 

Anthony Hsu commented on HIVE-6570:
---

Ping

> Hive variable substitution does not work with the "source" command
> --
>
> Key: HIVE-6570
> URL: https://issues.apache.org/jira/browse/HIVE-6570
> Project: Hive
>  Issue Type: Bug
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
> Attachments: HIVE-6570.1.patch
>
>
> The following does not work:
> {code}
> source ${hivevar:test-dir}/test.q;
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6728) Missing file override-container-log4j.properties in Hcatalog

2014-03-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-6728:
-

Attachment: HIVE-6728.patch

Added the missing file to the assembly descriptor.

> Missing file override-container-log4j.properties in Hcatalog 
> -
>
> Key: HIVE-6728
> URL: https://issues.apache.org/jira/browse/HIVE-6728
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.13.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-6728.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6728) Missing file override-container-log4j.properties in Hcatalog

2014-03-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-6728:
-

Status: Patch Available  (was: Open)

NO PRECOMMIT TESTS

> Missing file override-container-log4j.properties in Hcatalog 
> -
>
> Key: HIVE-6728
> URL: https://issues.apache.org/jira/browse/HIVE-6728
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.13.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-6728.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6728) Missing file override-container-log4j.properties in Hcatalog

2014-03-21 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-6728:


 Summary: Missing file override-container-log4j.properties in 
Hcatalog 
 Key: HIVE-6728
 URL: https://issues.apache.org/jira/browse/HIVE-6728
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.13.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6222) Make Vector Group By operator abandon grouping if too many distinct keys

2014-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943714#comment-13943714
 ] 

Hive QA commented on HIVE-6222:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12636015/HIVE-6222.5.patch

{color:green}SUCCESS:{color} +1 5437 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1894/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1894/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12636015

> Make Vector Group By operator abandon grouping if too many distinct keys
> 
>
> Key: HIVE-6222
> URL: https://issues.apache.org/jira/browse/HIVE-6222
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
>  Labels: vectorization
> Attachments: HIVE-6222.1.patch, HIVE-6222.2.patch, HIVE-6222.3.patch, 
> HIVE-6222.4.patch, HIVE-6222.5.patch
>
>
> Row-mode GBY becomes a pass-through if not enough aggregation occurs on the 
> map side, relying on the shuffle+reduce side to do the work. Have VGBY do 
> the same.
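A hedged sketch of the heuristic (field names and thresholds below are illustrative, not VGBY's actual configuration):

{code:java}
// Hedged sketch: abandon map-side hash aggregation when it isn't reducing rows.
class GroupByFlushPolicy {
  private static final long MIN_ROWS_TO_CHECK = 100000;  // illustrative threshold
  private static final float MIN_REDUCTION = 0.5f;       // want at least 2x fewer keys

  // True when nearly every input row produced its own distinct key, i.e.
  // map-side aggregation buys nothing and the operator should flush its
  // partial aggregates and become a pass-through.
  boolean shouldAbandon(long rowsIn, long distinctKeys) {
    return rowsIn >= MIN_ROWS_TO_CHECK && distinctKeys > rowsIn * MIN_REDUCTION;
  }
}
{code}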



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6727) Table level stats for external tables are set incorrectly

2014-03-21 Thread Harish Butani (JIRA)
Harish Butani created HIVE-6727:
---

 Summary: Table level stats for external tables are set incorrectly
 Key: HIVE-6727
 URL: https://issues.apache.org/jira/browse/HIVE-6727
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani


If you do the following:
{code}
CREATE EXTERNAL TABLE anaylyze_external (a INT) LOCATION 'data/files/ext_test';
describe formatted anaylyze_external;
{code}
The table level stats are:
{noformat}
Table Parameters:
COLUMN_STATS_ACCURATE   true
EXTERNALTRUE
numFiles0
numRows 6
rawDataSize 6
totalSize   0
{noformat}
numFiles and totalSize are always 0.
The issue: MetaStoreUtils.updateUnpartitionedTableStatsFast attempts to set 
table-level stats from FileStatus, but it doesn't account for external tables; 
it always calls Warehouse.getFileStatusesForUnpartitionedTable.
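A hedged sketch of the fix direction (helper names and return types below are assumptions based on the description, not the eventual patch):

{code:java}
// Hedged sketch: list files from the table's own location for external
// tables instead of always asking the warehouse for the path.
FileStatus[] fileStatuses;
if (MetaStoreUtils.isExternalTable(tbl)) {
  Path location = new Path(tbl.getSd().getLocation());  // declared LOCATION
  FileSystem fs = location.getFileSystem(conf);
  fileStatuses = fs.listStatus(location);
} else {
  fileStatuses = wh.getFileStatusesForUnpartitionedTable(db, tbl);
}
// numFiles and totalSize would then be derived from fileStatuses as before.
{code}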



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6642) Query fails to vectorize when a non string partition column is part of the query expression

2014-03-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6642:


Attachment: HIVE-6642-2.patch

cc-ing [~rhbutani] for reviewing the new patch

> Query fails to vectorize when a non string partition column is part of the 
> query expression
> ---
>
> Key: HIVE-6642
> URL: https://issues.apache.org/jira/browse/HIVE-6642
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6642-2.patch, HIVE-6642.1.patch
>
>
> drop table if exists alltypesorc_part;
> CREATE TABLE alltypesorc_part (
> ctinyint tinyint,
> csmallint smallint,
> cint int,
> cbigint bigint,
> cfloat float,
> cdouble double,
> cstring1 string,
> cstring2 string,
> ctimestamp1 timestamp,
> ctimestamp2 timestamp,
> cboolean1 boolean,
> cboolean2 boolean) partitioned by (ds int) STORED AS ORC;
> insert overwrite table alltypesorc_part partition (ds=2011) select * from 
> alltypesorc limit 100;
> insert overwrite table alltypesorc_part partition (ds=2012) select * from 
> alltypesorc limit 200;
> explain select *
> from (select ds from alltypesorc_part) t1,
>  alltypesorc t2
> where t1.ds = t2.cint
> order by t2.ctimestamp1
> limit 100;
> The above query fails to vectorize because (select ds from alltypesorc_part) 
> t1 returns a string column and the join equality on t2 is performed on an int 
> column. The correct output when vectorization is turned on should be:
> STAGE DEPENDENCIES:
>   Stage-5 is a root stage
>   Stage-2 depends on stages: Stage-5
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-5
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> t1:alltypesorc_part
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> t1:alltypesorc_part
>   TableScan
> alias: alltypesorc_part
> Statistics: Num rows: 300 Data size: 62328 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Select Operator
>   expressions: ds (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 300 Data size: 1200 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   HashTable Sink Operator
> condition expressions:
>   0 {_col0}
>   1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} 
> {cdouble} {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} 
> {cboolean2}
> keys:
>   0 _col0 (type: int)
>   1 cint (type: int)
>   Stage: Stage-2
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: t2
> Statistics: Num rows: 3536 Data size: 1131711 Basic stats: 
> COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   condition expressions:
> 0 {_col0}
> 1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} {cdouble} 
> {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} {cboolean2}
>   keys:
> 0 _col0 (type: int)
> 1 cint (type: int)
>   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, 
> _col6, _col7, _col8, _col9, _col10, _col11, _col12
>   Statistics: Num rows: 3889 Data size: 1244882 Basic stats: 
> COMPLETE Column stats: NONE
>   Filter Operator
> predicate: (_col0 = _col3) (type: boolean)
> Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col0 (type: int), _col1 (type: tinyint), 
> _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), _col5 (type: 
> float), _col6 (type: double), _col7 (type: string), _col8 (type: string), 
> _col\
> 9 (type: timestamp), _col10 (type: timestamp), _col11 (type: boolean), _col12 
> (type: boolean)
>   outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
>   Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> key expressions: _col9 (type: timestamp)
> sort order: +
> Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: int), _col1 (type: 
> tinyint), _col2 (type: smallint), _col3 (type: int), _col4 (type: bi

[jira] [Resolved] (HIVE-6369) ORC Writer (int RLE v2) fails with ArrayIndexOutOfBounds

2014-03-21 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J resolved HIVE-6369.
--

Resolution: Duplicate

Duplicate of HIVE-6382

> ORC Writer (int RLE v2) fails with ArrayIndexOutOfBounds
> 
>
> Key: HIVE-6369
> URL: https://issues.apache.org/jira/browse/HIVE-6369
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.13.0
> Environment: hadoop-2.2 + hive-0.13-trunk (f1807ede)
>Reporter: Gopal V
>Assignee: Prasanth J
>  Labels: orcfile
>
> The ORC writer for store_sales TPC-DS table fails with 
> {code}
> 2014-01-30 09:23:07,819 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.ArrayIndexOutOfBoundsException: 2
>   at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.preparePatchedBlob(RunLengthIntegerWriterV2.java:593)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.determineEncoding(RunLengthIntegerWriterV2.java:541)
>   at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.flush(RunLengthIntegerWriterV2.java:682)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.writeStripe(WriterImpl.java:752)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1330)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1699)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:1868)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6682) nonstaged mapjoin table memory check may be broken

2014-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6682:
---

Attachment: HIVE-6682.03.patch

> nonstaged mapjoin table memory check may be broken
> --
>
> Key: HIVE-6682
> URL: https://issues.apache.org/jira/browse/HIVE-6682
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6682.01.patch, HIVE-6682.02.patch, 
> HIVE-6682.03.patch, HIVE-6682.patch
>
>
> We are getting the below error from the task, while the staged load works 
> correctly. 
> We don't set the memory threshold that low, so it seems the settings are just 
> not handled correctly. This seems to always trigger on the first check. Given 
> that a map task might hold a bunch more than just the hashmap, we may also 
> need to adjust the memory check (e.g. have separate configs).
> {noformat}
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 
> 2014-03-14 08:11:21 Processing rows:20  Hashtable size: 
> 19  Memory usage:   204001888   percentage: 0.197
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 
> 2014-03-14 08:11:21 Processing rows:20  Hashtable size: 
> 19  Memory usage:   204001888   percentage: 0.197
>   at 
> org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(HashTableLoader.java:104)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:165)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1026)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1030)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1030)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>   ... 8 more
> Caused by: 
> org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 
> 2014-03-14 08:11:21   Processing rows:20  Hashtable size: 
> 19  Memory usage:   204001888   percentage: 0.197
>   at 
> org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:91)
>   at 
> org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:248)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:791)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:346)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.loadDirectly(HashTableLoader.java:147)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(HashTableLoader.java:82)
>   ... 15 more
> {noformat}
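For context, a hedged sketch of the kind of check MapJoinMemoryExhaustionHandler performs (the threshold plumbing, which is the suspected bug, is simplified here):

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hedged sketch: current heap usage is compared against a configured fraction
// of the max heap. The trace above fires at percentage 0.197, which points at
// the threshold configuration rather than real memory exhaustion.
class MemoryCheck {
  private final MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
  private final long maxHeap = bean.getHeapMemoryUsage().getMax();
  private final double maxUsageFraction;   // e.g. 0.90, from configuration

  MemoryCheck(double maxUsageFraction) {
    this.maxUsageFraction = maxUsageFraction;
  }

  void check(long rowsProcessed, long hashTableSize) {
    long used = bean.getHeapMemoryUsage().getUsed();
    double percentage = (double) used / maxHeap;
    if (percentage > maxUsageFraction) {
      throw new IllegalStateException("Processing rows: " + rowsProcessed
          + " Hashtable size: " + hashTableSize
          + " Memory usage: " + used + " percentage: " + percentage);
    }
  }
}
{code}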



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6682) nonstaged mapjoin table memory check may be broken

2014-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943677#comment-13943677
 ] 

Sergey Shelukhin commented on HIVE-6682:


The 2 failing tests are unrelated. For one of them, a broken golden file was 
committed with the original JIRA. Let me update. I hope the +1 still stands.

> nonstaged mapjoin table memory check may be broken
> --
>
> Key: HIVE-6682
> URL: https://issues.apache.org/jira/browse/HIVE-6682
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6682.01.patch, HIVE-6682.02.patch, HIVE-6682.patch
>
>
> We are getting the below error from the task, while the staged load works 
> correctly. 
> We don't set the memory threshold that low, so it seems the settings are just 
> not handled correctly. This seems to always trigger on the first check. Given 
> that a map task might hold a bunch more than just the hashmap, we may also 
> need to adjust the memory check (e.g. have separate configs).
> {noformat}
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 
> 2014-03-14 08:11:21 Processing rows:20  Hashtable size: 
> 19  Memory usage:   204001888   percentage: 0.197
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 
> 2014-03-14 08:11:21 Processing rows:20  Hashtable size: 
> 19  Memory usage:   204001888   percentage: 0.197
>   at 
> org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(HashTableLoader.java:104)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:150)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:165)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1026)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1030)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1030)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>   ... 8 more
> Caused by: 
> org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 
> 2014-03-14 08:11:21   Processing rows:20  Hashtable size: 
> 19  Memory usage:   204001888   percentage: 0.197
>   at 
> org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:91)
>   at 
> org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:248)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:791)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:375)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:346)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.loadDirectly(HashTableLoader.java:147)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.HashTableLoader.load(HashTableLoader.java:82)
>   ... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6395) multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943646#comment-13943646
 ] 

Ashutosh Chauhan commented on HIVE-6395:


I think HIVE-4293 indeed fixes this and is more complete. [~rhbutani] / 
[~navis] Can you verify ?

> multi-table insert from select transform fails if optimize.ppd enabled
> --
>
> Key: HIVE-6395
> URL: https://issues.apache.org/jira/browse/HIVE-6395
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6395.patch, test.py
>
>
> {noformat}
> set hive.optimize.ppd=true;
> add file ./test.py;
> from (select transform(test.*) using 'python ./test.py'
> as id,name,state from test) t0
> insert overwrite table test2 select * where state=1
> insert overwrite table test3 select * where state=2;
> {noformat}
> In the above example, the select transform returns an extra column, and that 
> column is used in the where clause of the multi-insert selects.  However, if 
> the optimization is on, the query plan is wrong:
> filter (state=1 and state=2) //impossible
> --> select, insert into test1
> --> select, insert into test2
> The correct query plan for hive.optimize.ppd=false is:
> filter (state=1)
> --> select, insert into test1
> filter (state=2)
> --> select, insert into test2
> For reference
> {noformat}
> create table test (id int, name string)
> create table test2(id int, name string, state int)
> create table test3(id int, name string, state int)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6060) Define API for RecordUpdater and UpdateReader

2014-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943639#comment-13943639
 ] 

Sergey Shelukhin commented on HIVE-6060:


and the above bugfix...

> Define API for RecordUpdater and UpdateReader
> -
>
> Key: HIVE-6060
> URL: https://issues.apache.org/jira/browse/HIVE-6060
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-6060.patch, HIVE-6060.patch, HIVE-6060.patch, 
> HIVE-6060.patch, HIVE-6060.patch, HIVE-6060.patch, acid-io.patch, 
> h-5317.patch, h-5317.patch, h-5317.patch, h-6060.patch, h-6060.patch
>
>
> We need to define some new APIs for how Hive interacts with the file formats 
> since it needs to be much richer than the current RecordReader and 
> RecordWriter.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6395) multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943635#comment-13943635
 ] 

Xuefu Zhang commented on HIVE-6395:
---

[~szehon] The patch looks good to me. However, just one thing to consider: your 
patch would turn PPD off in the given case, which has performance implications. 
If we instead push down (state == 1 || state == 2) in the test case while 
keeping the individual filters in place, wouldn't that work? I'm not sure if 
this is possible or has caveats. What do you think?

> multi-table insert from select transform fails if optimize.ppd enabled
> --
>
> Key: HIVE-6395
> URL: https://issues.apache.org/jira/browse/HIVE-6395
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6395.patch, test.py
>
>
> {noformat}
> set hive.optimize.ppd=true;
> add file ./test.py;
> from (select transform(test.*) using 'python ./test.py'
> as id,name,state from test) t0
> insert overwrite table test2 select * where state=1
> insert overwrite table test3 select * where state=2;
> {noformat}
> In the above example, the select transform returns an extra column, and that 
> column is used in the where clause of the multi-insert selects.  However, if 
> the optimization is on, the query plan is wrong:
> filter (state=1 and state=2) //impossible
> --> select, insert into test1
> --> select, insert into test2
> The correct query plan for hive.optimize.ppd=false is:
> filter (state=1)
> --> select, insert into test1
> filter (state=2)
> --> select, insert into test2
> For reference
> {noformat}
> create table test (id int, name string)
> create table test2(id int, name string, state int)
> create table test3(id int, name string, state int)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6701) Analyze table compute statistics for decimal columns.

2014-03-21 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943626#comment-13943626
 ] 

Shreepadma Venugopalan commented on HIVE-6701:
--

The extra unused fields were added in HIVE-1362 precisely to avoid upgrading the 
schema.

> Analyze table compute statistics for decimal columns.
> -
>
> Key: HIVE-6701
> URL: https://issues.apache.org/jira/browse/HIVE-6701
> Project: Hive
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6701.02.patch, HIVE-6701.1.patch
>
>
> Analyze table should compute statistics for decimal columns as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Xuefu Zhang


> On March 21, 2014, 9:50 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 180
> > 
> >
> > Just for my understanding, for the given example, what's the filterOp, 
> > what's the parent, and what are the siblings?
> 
> Szehon Ho wrote:
> Hi Xuefu, thanks for looking.  Like in my ascii diagram above, filter op 
> is the (Filter).  The parent is the script operator.

I guess "script" is the parent, based on your comments.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/#review38215
---


On March 21, 2014, 9:05 p.m., Szehon Ho wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19549/
> ---
> 
> (Updated March 21, 2014, 9:05 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In this scenario, PPD on the script (transform) operator did the following 
> wrong predicate pushdown:
> 
> script --> filter (state=1)
>--> select, insert into test1
>-->filter (state=2)
>--> select, insert into test2
> 
> into:
> 
> script --> filter (state=1 and state=2)   //not possible.
>  --> select, insert into test1
>  --> select, insert into test2
> 
> 
> The bug was a combination of two things, first that these filters got chosen 
> by FilterPPD as 'candidate' pushdown predicates, and that the ScriptPPD 
> called  "mergeWithChildrenPred + createFilters" which did the above 
> transformation due to them being marked.  
> 
> ScriptPPD was one of the few simple operators that did this. I tried with some 
> other parent operators like extract (see my added test in transform_ppr2.q) 
> and also just a select operator, and could not reproduce the issue with those.
> 
> The fix is to skip marking a predicate as a 'candidate' for the pushdown if 
> it is a sibling of another filter.  We still want to pushdown children of 
> transform-operator with grandchildren, etc.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 40298e1 
>   ql/src/test/queries/clientpositive/transform_ppd_multi.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/transform_ppr2.q 85ef3ac 
>   ql/src/test/results/clientpositive/transform_ppd_multi.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/transform_ppr2.q.out 4bddc69 
> 
> Diff: https://reviews.apache.org/r/19549/diff/
> 
> 
> Testing
> ---
> 
> Reproduced the issue in transform_ppd_multi.q, and also reproduced a similar 
> issue with an extract (cluster) operator in transform_ppr2.q.  Ran other 
> transform_ppd and general ppd tests to ensure no regression.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>
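A hedged sketch of the sibling-filter guard described in the review above (operator API details assumed; not the exact patch):

{code:java}
// Hedged sketch: a filter's predicate is only marked as a pushdown candidate
// when no other FilterOperator branches off the same parent, so multi-insert
// branches keep their separate filters instead of being AND-ed together.
private static boolean hasSiblingFilter(Operator<? extends OperatorDesc> op) {
  for (Operator<? extends OperatorDesc> parent : op.getParentOperators()) {
    for (Operator<? extends OperatorDesc> sibling : parent.getChildOperators()) {
      if (sibling != op && sibling instanceof FilterOperator) {
        return true;
      }
    }
  }
  return false;
}
{code}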



[jira] [Commented] (HIVE-6687) JDBC ResultSet fails to get value by qualified projection name

2014-03-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943611#comment-13943611
 ] 

Ashutosh Chauhan commented on HIVE-6687:


+1 LGTM

> JDBC ResultSet fails to get value by qualified projection name
> --
>
> Key: HIVE-6687
> URL: https://issues.apache.org/jira/browse/HIVE-6687
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>  Labels: documentation
> Fix For: 0.12.1
>
> Attachments: HIVE-6687.3.patch
>
>
> Getting value from result set using fully qualified name would throw 
> exception. Only solution today is to use position of the column as opposed to 
> column label.
> {code}
> String sql = "select r1.x, r2.x from r1 join r2 on r1.y=r2.y";
> ResultSet res = stmt.executeQuery(sql);
> res.getInt("r1.x");
> {code}
> res.getInt("r1.x") would throw an "unknown column" exception even though the 
> SQL specifies it.
> The fix is to correct the result set schema in the semantic analyzer.
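Until then, a hedged workaround sketch using position-based access, which the description says is the only reliable option today:

{code:java}
// Hedged workaround sketch: read columns by position rather than by the
// qualified label, which currently throws an "unknown column" exception.
String sql = "select r1.x, r2.x from r1 join r2 on r1.y=r2.y";
ResultSet res = stmt.executeQuery(sql);
while (res.next()) {
  int leftX = res.getInt(1);    // r1.x by position
  int rightX = res.getInt(2);   // r2.x by position
}
{code}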



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6395) multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943607#comment-13943607
 ] 

Szehon Ho commented on HIVE-6395:
-

Actually, I just saw your fix for HIVE-4293. Is it a more complete fix for the 
same situation?

> multi-table insert from select transform fails if optimize.ppd enabled
> --
>
> Key: HIVE-6395
> URL: https://issues.apache.org/jira/browse/HIVE-6395
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6395.patch, test.py
>
>
> {noformat}
> set hive.optimize.ppd=true;
> add file ./test.py;
> from (select transform(test.*) using 'python ./test.py'
> as id,name,state from test) t0
> insert overwrite table test2 select * where state=1
> insert overwrite table test3 select * where state=2;
> {noformat}
> In the above example, the select transform returns an extra column, and that 
> column is used in the where clause of the multi-insert selects.  However, if 
> the optimization is on, the query plan is wrong:
> filter (state=1 and state=2) //impossible
> --> select, insert into test1
> --> select, insert into test2
> The correct query plan for hive.optimize.ppd=false is:
> filter (state=1)
> --> select, insert into test1
> filter (state=2)
> --> select, insert into test2
> For reference
> {noformat}
> create table test (id int, name string)
> create table test2(id int, name string, state int)
> create table test3(id int, name string, state int)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Szehon Ho


> On March 21, 2014, 9:50 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 180
> > 
> >
> > Just for my understanding, for the given example, what's the filterOp, 
> > what's the parent, and what are the siblings?

Hi Xuefu, thanks for looking.  As in my ASCII diagram above, the filter op is 
the (Filter), and the parent is the script operator; the siblings are the 
filter's peer operators under that same parent (here, the other filter branch).


- Szehon


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/#review38215
---


On March 21, 2014, 9:05 p.m., Szehon Ho wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19549/
> ---
> 
> (Updated March 21, 2014, 9:05 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In this scenario, PPD on the script (transform) operator did the following 
> wrong predicate pushdown:
> 
> script --> filter (state=1)
>--> select, insert into test1
>-->filter (state=2)
>--> select, insert into test2
> 
> into:
> 
> script --> filter (state=1 and state=2)   //not possible.
>  --> select, insert into test1
>  --> select, insert into test2
> 
> 
> The bug was a combination of two things: first, these filters got chosen 
> by FilterPPD as 'candidate' pushdown predicates; second, ScriptPPD 
> called "mergeWithChildrenPred + createFilters", which did the above 
> transformation because they were marked.
> 
> ScriptPPD was one of the few simple operators that did this; I tried some 
> other parent operators like extract (see my added test in transform_ppr2.q) 
> and also just a select operator, and could not reproduce the issue with those.
> 
> The fix is to skip marking a predicate as a 'candidate' for pushdown if 
> it is a sibling of another filter.  We still want to push down children of 
> the transform operator with grandchildren, etc.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 40298e1 
>   ql/src/test/queries/clientpositive/transform_ppd_multi.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/transform_ppr2.q 85ef3ac 
>   ql/src/test/results/clientpositive/transform_ppd_multi.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/transform_ppr2.q.out 4bddc69 
> 
> Diff: https://reviews.apache.org/r/19549/diff/
> 
> 
> Testing
> ---
> 
> Reproduced the issue in transform_ppd_multi.q, and reproduced a similar 
> issue with an extract (cluster) operator in transform_ppr2.q.  Ran other 
> transform_ppd and general ppd tests to ensure no regression.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>



Re: Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/#review38215
---



ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java


Just for my understanding, for the given example, what's the filterOp, 
what's the parent, and what are the siblings?


- Xuefu Zhang


On March 21, 2014, 9:05 p.m., Szehon Ho wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19549/
> ---
> 
> (Updated March 21, 2014, 9:05 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> In this scenario, PPD on the script (transform) operator did the following 
> wrong predicate pushdown:
> 
> script --> filter (state=1)
>--> select, insert into test1
>-->filter (state=2)
>--> select, insert into test2
> 
> into:
> 
> script --> filter (state=1 and state=2)   //not possible.
>  --> select, insert into test1
>  --> select, insert into test2
> 
> 
> The bug was a combination of two things: first, these filters got chosen 
> by FilterPPD as 'candidate' pushdown predicates; second, ScriptPPD 
> called "mergeWithChildrenPred + createFilters", which did the above 
> transformation because they were marked.
> 
> ScriptPPD was one of the few simple operators that did this; I tried some 
> other parent operators like extract (see my added test in transform_ppr2.q) 
> and also just a select operator, and could not reproduce the issue with those.
> 
> The fix is to skip marking a predicate as a 'candidate' for pushdown if 
> it is a sibling of another filter.  We still want to push down children of 
> the transform operator with grandchildren, etc.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 40298e1 
>   ql/src/test/queries/clientpositive/transform_ppd_multi.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/transform_ppr2.q 85ef3ac 
>   ql/src/test/results/clientpositive/transform_ppd_multi.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/transform_ppr2.q.out 4bddc69 
> 
> Diff: https://reviews.apache.org/r/19549/diff/
> 
> 
> Testing
> ---
> 
> Reproduced the issue in transform_ppd_multi.q, and reproduced a similar 
> issue with an extract (cluster) operator in transform_ppr2.q.  Ran other 
> transform_ppd and general ppd tests to ensure no regression.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>



[jira] [Commented] (HIVE-6701) Analyze table compute statistics for decimal columns.

2014-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943597#comment-13943597
 ] 

Sergey Shelukhin commented on HIVE-6701:


https://reviews.apache.org/r/19552

> Analyze table compute statistics for decimal columns.
> -
>
> Key: HIVE-6701
> URL: https://issues.apache.org/jira/browse/HIVE-6701
> Project: Hive
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6701.02.patch, HIVE-6701.1.patch
>
>
> Analyze table should compute statistics for decimal columns as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 19552: HIVE-6701 Analyze table compute statistics for decimal columns.

2014-03-21 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19552/
---

Review request for hive and Jitendra Pandey.


Repository: hive-git


Description
---

See JIRA


Diffs
-

  data/files/decimal.txt PRE-CREATION 
  metastore/if/hive_metastore.thrift b3f01d6 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_types.h d0998e0 
  metastore/src/gen/thrift/gen-cpp/hive_metastore_types.cpp 59ac959 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ColumnStatisticsData.java
 848188a 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Decimal.java
 PRE-CREATION 
  
metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/DecimalColumnStatsData.java
 PRE-CREATION 
  metastore/src/gen/thrift/gen-php/metastore/Types.php 39062f9 
  metastore/src/gen/thrift/gen-py/hive_metastore/ttypes.py 2e9f238 
  metastore/src/gen/thrift/gen-rb/hive_metastore_types.rb b768b7f 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
325aa8b 
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 
af54095 
  
metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java
 eb23cf9 
  
metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java
 c7ac9b9 
  metastore/src/model/package.jdo 158fdcd 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java 99b062f 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/DecimalNumDistinctValueEstimator.java
 PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFComputeStats.java 
7348478 
  ql/src/test/queries/clientpositive/compute_stats_decimal.q PRE-CREATION 
  ql/src/test/results/clientpositive/compute_stats_decimal.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/19552/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Updated] (HIVE-6701) Analyze table compute statistics for decimal columns.

2014-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6701:
---

Status: Patch Available  (was: Open)

> Analyze table compute statistics for decimal columns.
> -
>
> Key: HIVE-6701
> URL: https://issues.apache.org/jira/browse/HIVE-6701
> Project: Hive
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6701.02.patch, HIVE-6701.1.patch
>
>
> Analyze table should compute statistics for decimal columns as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6701) Analyze table compute statistics for decimal columns.

2014-03-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6701:
---

Attachment: HIVE-6701.02.patch

The metastore work and the q file.
Note that fields for decimal (stored as varchar) have existed in the schema 
since HIVE-1362; they just weren't used, so upgrade scripts are not necessary.
Most of the patch is generated code...

> Analyze table compute statistics for decimal columns.
> -
>
> Key: HIVE-6701
> URL: https://issues.apache.org/jira/browse/HIVE-6701
> Project: Hive
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6701.02.patch, HIVE-6701.1.patch
>
>
> Analyze table should compute statistics for decimal columns as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6687) JDBC ResultSet fails to get value by qualified projection name

2014-03-21 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943587#comment-13943587
 ] 

Laljo John Pullokkaran commented on HIVE-6687:
--

Review Board: https://reviews.apache.org/r/19551/

> JDBC ResultSet fails to get value by qualified projection name
> --
>
> Key: HIVE-6687
> URL: https://issues.apache.org/jira/browse/HIVE-6687
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>  Labels: documentation
> Fix For: 0.12.1
>
> Attachments: HIVE-6687.3.patch
>
>
> Getting a value from the result set using a fully qualified name throws an 
> exception. The only workaround today is to use the position of the column 
> instead of the column label.
> {code}
> String sql = "select r1.x, r2.x from r1 join r2 on r1.y=r2.y";
> ResultSet res = stmt.executeQuery(sql);
> res.getInt("r1.x");
> {code}
> res.getInt("r1.x") throws an "unknown column" exception even though the SQL 
> specifies that column.
> The fix is to correct the ResultSet schema in the semantic analyzer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6723) Tez golden files need to be updated

2014-03-21 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6723:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to 0.13 & trunk.

> Tez golden files need to be updated
> ---
>
> Key: HIVE-6723
> URL: https://issues.apache.org/jira/browse/HIVE-6723
> Project: Hive
>  Issue Type: Task
>  Components: Tests, Tez
>Affects Versions: 0.13.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.13.0
>
> Attachments: HIVE-6723.patch
>
>
> Golden files are out of date.
> NO PRECOMMIT TESTS
> since these are purely .q.out changes



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 19503: JDBC ResultSet fails to get value by qualified projection name

2014-03-21 Thread John Pullokkaran


> On March 20, 2014, 10:32 p.m., Harish Butani wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java, line 9425
> > 
> >
> > Can you verify that there is no issue if table/column names have the 
> > '.' character. Sounds like jdbc treats column names as a string, so this 
> > should be ok.

1. The JDBC ResultSet treats the label as a string.
2. I don't think the qualifier would contain anything more than the table name, 
so there should only be one ".".


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19503/#review37998
---


On March 20, 2014, 10:24 p.m., Harish Butani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19503/
> ---
> 
> (Updated March 20, 2014, 10:24 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: hive-6687
> https://issues.apache.org/jira/browse/hive-6687
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> JDBC ResultSet fails to get value by qualified projection name
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java
>  dac62d5 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> c91df83 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java e1e427f 
> 
> Diff: https://reviews.apache.org/r/19503/diff/
> 
> 
> Testing
> ---
> 
>   
> 
> 
> Thanks,
> 
> Harish Butani
> 
>



[jira] [Commented] (HIVE-1362) Optimizer statistics on columns in tables and partitions

2014-03-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943576#comment-13943576
 ] 

Sergey Shelukhin commented on HIVE-1362:


This jira adds, but doesn't use, decimal fields in the schema... I am going to 
"reuse" them for HIVE-6701. We probably cannot use a native decimal type due to 
Derby limitations (a maximum precision of 31, while Hive supports 38), so 
between string and binary there may not be a difference that matters.

> Optimizer statistics on columns in tables and partitions
> 
>
> Key: HIVE-1362
> URL: https://issues.apache.org/jira/browse/HIVE-1362
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Ning Zhang
>Assignee: Shreepadma Venugopalan
> Fix For: 0.10.0
>
> Attachments: HIVE-1362-gen_thrift.1.patch.txt, 
> HIVE-1362-gen_thrift.2.patch.txt, HIVE-1362-gen_thrift.3.patch.txt, 
> HIVE-1362-gen_thrift.4.patch.txt, HIVE-1362-gen_thrift.5.patch.txt, 
> HIVE-1362-gen_thrift.6.patch.txt, HIVE-1362.1.patch.txt, 
> HIVE-1362.10.patch.txt, HIVE-1362.11.patch.txt, HIVE-1362.2.patch.txt, 
> HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt, HIVE-1362.5.patch.txt, 
> HIVE-1362.6.patch.txt, HIVE-1362.7.patch.txt, HIVE-1362.8.patch.txt, 
> HIVE-1362.9.patch.txt, HIVE-1362.D6339.1.patch, 
> HIVE-1362_gen-thrift.10.patch.txt, HIVE-1362_gen-thrift.7.patch.txt, 
> HIVE-1362_gen-thrift.8.patch.txt, HIVE-1362_gen-thrift.9.patch.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6687) JDBC ResultSet fails to get value by qualified projection name

2014-03-21 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943568#comment-13943568
 ] 

Lefty Leverenz commented on HIVE-6687:
--

The documentation of *hive.resultset.use.unique.column.names* looks good to me. 
 When the time comes I'll add it to the wiki.

> JDBC ResultSet fails to get value by qualified projection name
> --
>
> Key: HIVE-6687
> URL: https://issues.apache.org/jira/browse/HIVE-6687
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>  Labels: documentation
> Fix For: 0.12.1
>
> Attachments: HIVE-6687.3.patch
>
>
> Getting a value from the result set using a fully qualified name throws an 
> exception. The only workaround today is to use the position of the column 
> instead of the column label.
> {code}
> String sql = "select r1.x, r2.x from r1 join r2 on r1.y=r2.y";
> ResultSet res = stmt.executeQuery(sql);
> res.getInt("r1.x");
> {code}
> res.getInt("r1.x") throws an "unknown column" exception even though the SQL 
> specifies that column.
> The fix is to correct the ResultSet schema in the semantic analyzer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6692) Location for new table or partition should be a write entity

2014-03-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943569#comment-13943569
 ] 

Hive QA commented on HIVE-6692:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12635255/HIVE-6692.1.patch.txt

{color:red}ERROR:{color} -1 due to 60 failed/errored test(s), 5437 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_if_with_path_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_concatenate_inherit_table_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_like
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_drop
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_drop_database_removes_partition_dirs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_drop_index_removes_partition_dirs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_drop_table_removes_partition_dirs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_00_nonpart_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_01_nonpart
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_02_00_part_empty
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_02_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_03_nonpart_over_compat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_all_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_05_some_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_06_one_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_07_all_part_over_nonoverlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_08_nonpart_rename
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_09_part_spec_nonoverlap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_10_external_managed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_11_managed_external
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_12_external_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_13_managed_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_14_managed_location_over_existing
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_15_external_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_16_part_external
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_17_part_managed
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_18_part_external
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_19_00_part_external_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_19_part_external_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_20_part_managed_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_22_import_exist_authsuccess
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_23_import_part_authsuccess
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_24_import_nonexist_authsuccess
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_hidden_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_hook_context_cs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_overwrite_local_directory_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insertexternal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_fs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_fs_overwrite
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rename_external_partition_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rename_partition_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rename_table_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_delimited
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_noscan_2
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_external_table_with_space_in_location_path
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_file_with_header_footer
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_import_exported_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority

[jira] [Commented] (HIVE-6687) JDBC ResultSet fails to get value by qualified projection name

2014-03-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943552#comment-13943552
 ] 

Ashutosh Chauhan commented on HIVE-6687:


[~jpullokkaran] Can you also update RB with latest patch?

> JDBC ResultSet fails to get value by qualified projection name
> --
>
> Key: HIVE-6687
> URL: https://issues.apache.org/jira/browse/HIVE-6687
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
>  Labels: documentation
> Fix For: 0.12.1
>
> Attachments: HIVE-6687.3.patch
>
>
> Getting a value from the result set using a fully qualified name throws an 
> exception. The only workaround today is to use the position of the column 
> instead of the column label.
> {code}
> String sql = "select r1.x, r2.x from r1 join r2 on r1.y=r2.y";
> ResultSet res = stmt.executeQuery(sql);
> res.getInt("r1.x");
> {code}
> res.getInt("r1.x") throws an "unknown column" exception even though the SQL 
> specifies that column.
> The fix is to correct the ResultSet schema in the semantic analyzer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/
---

(Updated March 21, 2014, 9:05 p.m.)


Review request for hive.


Repository: hive-git


Description (updated)
---

In this scenario, PPD on the script (transform) operator did the following 
wrong predicate pushdown:

script --> filter (state=1)
   --> select, insert into test1
   -->filter (state=2)
   --> select, insert into test2

into:

script --> filter (state=1 and state=2)   //not possible.
 --> select, insert into test1
 --> select, insert into test2


The bug was a combination of two things: first, these filters got chosen by 
FilterPPD as 'candidate' pushdown predicates; second, ScriptPPD called 
"mergeWithChildrenPred + createFilters", which did the above transformation 
because they were marked.

ScriptPPD was one of the few simple operators that did this; I tried some other 
parent operators like extract (see my added test in transform_ppr2.q) and also 
just a select operator, and could not reproduce the issue with those.

The fix is to skip marking a predicate as a 'candidate' for pushdown if it is a 
sibling of another filter.  We still want to push down children of the 
transform operator with grandchildren, etc.
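
To make the idea concrete, here is a hypothetical sketch of the sibling check. The class and method names are made up and this is not the actual OpProcFactory change; only Operator and FilterOperator are real Hive classes.

{code}
import org.apache.hadoop.hive.ql.exec.FilterOperator;
import org.apache.hadoop.hive.ql.exec.Operator;

// Hypothetical helper: a filter's predicate stays a pushdown candidate only
// if no sibling under the same parent is itself a filter. Otherwise,
// merging the children's predicates would AND the sibling filters together.
final class SiblingFilterCheck {
  static boolean isPushdownCandidate(Operator<?> filterOp) {
    for (Operator<?> parent : filterOp.getParentOperators()) {
      for (Operator<?> sibling : parent.getChildOperators()) {
        if (sibling != filterOp && sibling instanceof FilterOperator) {
          return false; // sibling filter found; do not mark as a candidate
        }
      }
    }
    return true;
  }
}
{code}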


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 40298e1 
  ql/src/test/queries/clientpositive/transform_ppd_multi.q PRE-CREATION 
  ql/src/test/queries/clientpositive/transform_ppr2.q 85ef3ac 
  ql/src/test/results/clientpositive/transform_ppd_multi.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/transform_ppr2.q.out 4bddc69 

Diff: https://reviews.apache.org/r/19549/diff/


Testing
---

Reproduced the issue in transform_ppd_multi.q, and reproduced a similar issue 
with an extract (cluster) operator in transform_ppr2.q.  Ran other 
transform_ppd and general ppd tests to ensure no regression.


Thanks,

Szehon Ho



[jira] [Updated] (HIVE-6395) multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6395:


Affects Version/s: 0.13.0
   Status: Patch Available  (was: Open)

> multi-table insert from select transform fails if optimize.ppd enabled
> --
>
> Key: HIVE-6395
> URL: https://issues.apache.org/jira/browse/HIVE-6395
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6395.patch, test.py
>
>
> {noformat}
> set hive.optimize.ppd=true;
> add file ./test.py;
> from (select transform(test.*) using 'python ./test.py'
> as id,name,state from test) t0
> insert overwrite table test2 select * where state=1
> insert overwrite table test3 select * where state=2;
> {noformat}
> In the above example, the select transform returns an extra column, and that 
> column is used in the where clause of the multi-insert selects.  However, if 
> the optimization is on, the query plan is wrong:
> filter (state=1 and state=2) //impossible
> --> select, insert into test1
> --> select, insert into test2
> The correct query plan for hive.optimize.ppd=false is:
> filter (state=1)
> --> select, insert into test1
> filter (state=2)
> --> select, insert into test2
> For reference
> {noformat}
> create table test (id int, name string)
> create table test2(id int, name string, state int)
> create table test3(id int, name string, state int)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/
---

(Updated March 21, 2014, 9:01 p.m.)


Review request for hive.


Changes
---

Cleaned up comments.


Repository: hive-git


Description (updated)
---

In this scenario, PPD on the script (transform) operator did the following 
wrong predicate pushdown:

script --> filter (state=1)
   --> select, insert into test1
   -->filter (state=2)
   --> select, insert into test2

into:

script --> filter (state=1 and state=2)   //not possible.
 --> select, insert into test1
 --> select, insert into test2


The bug was a combination of two things: first, these filters got chosen by 
FilterPPD as 'candidate' pushdown predicates; second, ScriptPPD called 
"mergeWithChildrenPred + createFilters", which did the above transformation 
because they were marked.

ScriptPPD was one of the few simple operators that did this; I tried some other 
combinations like extract (see my added test in transform_ppr2.q) and also 
just a select operator, and could not reproduce the issue with those.

The fix is to skip marking a predicate as a 'candidate' for pushdown if it 
is a sibling of another filter.  We still want to push down children of the 
select transform with grandchildren, etc.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 40298e1 
  ql/src/test/queries/clientpositive/transform_ppd_multi.q PRE-CREATION 
  ql/src/test/queries/clientpositive/transform_ppr2.q 85ef3ac 
  ql/src/test/results/clientpositive/transform_ppd_multi.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/transform_ppr2.q.out 4bddc69 

Diff: https://reviews.apache.org/r/19549/diff/


Testing
---

Reproduced the issue in transform_ppd_multi.q, and reproduced a similar issue 
with an extract (cluster) operator in transform_ppr2.q.  Ran other 
transform_ppd and general ppd tests to ensure no regression.


Thanks,

Szehon Ho



Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/
---

Review request for hive.


Repository: hive-git


Description
---

In this scenario, PPD on the script (transform) operator did the following 
wrong predicate pushdown:

script --> filter (state=1)
   --> select, insert into test1
   -->filter (state=2)
   --> select, insert into test2

into:

script --> filter (state=1 and state=2)   //not possible.
 --> select, insert into test1
 --> select, insert into test2


The bug was a combination of two things: first, these filters got chosen by 
FilterPPD; second, ScriptPPD called the sequence "mergeWithChildrenPred 
/createFilters (pred)", which did the above transformation.  ScriptPPD was one 
of the few simple operators that did this; I tried some other combinations 
like extract (see my added test in transform_ppr2.q) and also just a select 
operator.

The fix is to skip marking a predicate as a 'candidate' for pushdown if it 
is a sibling of another filter.  We still want to push down children of the 
select transform with grandchildren, etc.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 40298e1 
  ql/src/test/queries/clientpositive/transform_ppd_multi.q PRE-CREATION 
  ql/src/test/queries/clientpositive/transform_ppr2.q 85ef3ac 
  ql/src/test/results/clientpositive/transform_ppd_multi.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/transform_ppr2.q.out 4bddc69 

Diff: https://reviews.apache.org/r/19549/diff/


Testing
---

Reproduced the issue in transform_ppd_multi.q, and reproduced a similar issue 
with an extract (cluster) operator in transform_ppr2.q.  Ran other 
transform_ppd and general ppd tests to ensure no regression.


Thanks,

Szehon Ho



[jira] [Updated] (HIVE-6724) HCatStorer throws ClassCastException while storing tinyint/smallint data

2014-03-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-6724:
-

Status: Patch Available  (was: Open)

> HCatStorer throws ClassCastException while storing tinyint/smallint data
> 
>
> Key: HIVE-6724
> URL: https://issues.apache.org/jira/browse/HIVE-6724
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-6724.patch
>
>
> given Hive tables:
> 1) create table pig_hcatalog_1 (si smallint)  STORED AS TEXTFILE;
> 2) create table all100k (si smallint, ti tinyint) STORED AS TEXTFILE;
> the following sequence of steps (assuming there is data in all100k)
> {noformat}
> a=load 'all100k' using org.apache.hive.hcatalog.pig.HCatLoader();
> b = foreach a generate si;
> store b into 'pig_hcatalog_1' using org.apache.hive.hcatalog.pig.HCatStorer();
> {noformat}
> produces 
> {noformat}
> org.apache.hadoop.mapred.YarnChild: Exception running child : 
> java.lang.ClassCastException: java.lang.Short cannot be cast to 
> java.lang.Integer
>   at 
> org.apache.hive.hcatalog.pig.HCatBaseStorer.getJavaObj(HCatBaseStorer.java:372)
>   at 
> org.apache.hive.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:306)
>   at org.apache.hive.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:61)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6724) HCatStorer throws ClassCastException while storing tinyint/smallint data

2014-03-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-6724:
-

Attachment: HIVE-6724.patch

> HCatStorer throws ClassCastException while storing tinyint/smallint data
> 
>
> Key: HIVE-6724
> URL: https://issues.apache.org/jira/browse/HIVE-6724
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-6724.patch
>
>
> given Hive tables:
> 1) create table pig_hcatalog_1 (si smallint)  STORED AS TEXTFILE;
> 2) create table all100k (si smallint, ti tinyint) STORED AS TEXTFILE;
> the following sequence of steps (assuming there is data in all100k)
> {noformat}
> a=load 'all100k' using org.apache.hive.hcatalog.pig.HCatLoader();
> b = foreach a generate si;
> store b into 'pig_hcatalog_1' using org.apache.hive.hcatalog.pig.HCatStorer();
> {noformat}
> produces 
> {noformat}
> org.apache.hadoop.mapred.YarnChild: Exception running child : 
> java.lang.ClassCastException: java.lang.Short cannot be cast to 
> java.lang.Integer
>   at 
> org.apache.hive.hcatalog.pig.HCatBaseStorer.getJavaObj(HCatBaseStorer.java:372)
>   at 
> org.apache.hive.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:306)
>   at org.apache.hive.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:61)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6395) multi-table insert from select transform fails if optimize.ppd enabled

2014-03-21 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6395:


Attachment: HIVE-6395.patch

> multi-table insert from select transform fails if optimize.ppd enabled
> --
>
> Key: HIVE-6395
> URL: https://issues.apache.org/jira/browse/HIVE-6395
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6395.patch, test.py
>
>
> {noformat}
> set hive.optimize.ppd=true;
> add file ./test.py;
> from (select transform(test.*) using 'python ./test.py'
> as id,name,state from test) t0
> insert overwrite table test2 select * where state=1
> insert overwrite table test3 select * where state=2;
> {noformat}
> In the above example, the select transform returns an extra column, and that 
> column is used in the where clause of the multi-insert selects.  However, if 
> the optimization is on, the query plan is wrong:
> filter (state=1 and state=2) //impossible
> --> select, insert into test1
> --> select, insert into test2
> The correct query plan for hive.optimize.ppd=false is:
> filter (state=1)
> --> select, insert into test1
> filter (state=2)
> --> select, insert into test2
> For reference
> {noformat}
> create table test (id int, name string)
> create table test2(id int, name string, state int)
> create table test3(id int, name string, state int)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6708) ConstantVectorExpression should create copies of data objects rather than referencing them

2014-03-21 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943523#comment-13943523
 ] 

Jitendra Nath Pandey commented on HIVE-6708:


+1

> ConstantVectorExpression should create copies of data objects rather than 
> referencing them
> --
>
> Key: HIVE-6708
> URL: https://issues.apache.org/jira/browse/HIVE-6708
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6708-1.patch, HIVE-6708.2.patch
>
>
> 1. The ConstantVectorExpression vector should be updated for 
> BytesColumnVectors and DecimalColumnVectors. The current code changes the 
> reference to the vector, which might be shared across multiple columns.
> 2. VectorizationContext.foldConstantsForUnaryExpression(ExprNodeDesc 
> exprDesc) has a minor bug as to when to constant fold the expression.
> The following code should replace the corresponding piece of code in the 
> trunk.
> ..
> GenericUDF gudf = ((ExprNodeGenericFuncDesc) exprDesc).getGenericUDF();
> if (gudf instanceof GenericUDFOPNegative || gudf instanceof 
> GenericUDFOPPositive
> || castExpressionUdfs.contains(gudf.getClass())
> ... 
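
As an illustration of point 1 above, here is a minimal copy-versus-reference sketch, assuming Hive's BytesColumnVector API (initBuffer/setVal/setRef); the class and the fillConstant helper are made up, and this is not the actual patch.

{code}
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;

// Illustrative only: fill a constant column by copying the bytes into the
// vector's own buffer (setVal) instead of storing a shared reference
// (setRef) that other columns could observe being mutated.
public class ConstantCopySketch {
  static void fillConstant(BytesColumnVector col, byte[] constant) {
    col.initBuffer();                            // allocate the backing buffer
    col.isRepeating = true;                      // one value for the whole batch
    col.setVal(0, constant, 0, constant.length); // copies, does not alias
  }
}
{code}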



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6724) HCatStorer throws ClassCastException while storing tinyint/smallint data

2014-03-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943524#comment-13943524
 ] 

Eugene Koifman commented on HIVE-6724:
--

The use case is Hive-Pig-Hive.
HCatLoader automatically sets hcat.data.tiny.small.int.promotion=true, as that 
is required by Pig. HCatStorer did the opposite for no good reason. Since Pig 
doesn't evaluate each statement separately, the Storer action clobbered the 
Loader action (the context that contains the configuration is shared). I 
changed the Storer not to do that.
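
A tiny sketch of the clobbering, using Hadoop's Configuration directly to stand in for the shared Pig context; the sequence of calls is illustrative, not the real HCatLoader/HCatStorer code.

{code}
import org.apache.hadoop.conf.Configuration;

public class SharedConfClobber {
  static final String PROMOTION = "hcat.data.tiny.small.int.promotion";

  public static void main(String[] args) {
    Configuration conf = new Configuration(false); // shared across the script
    conf.setBoolean(PROMOTION, true);   // what HCatLoader needs for Pig
    conf.setBoolean(PROMOTION, false);  // what the old HCatStorer did
    // Both operators see the Storer's value, so the Loader breaks:
    System.out.println(conf.getBoolean(PROMOTION, false)); // prints false
    // The fix: the Storer leaves the property untouched.
  }
}
{code}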

> HCatStorer throws ClassCastException while storing tinyint/smallint data
> 
>
> Key: HIVE-6724
> URL: https://issues.apache.org/jira/browse/HIVE-6724
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> given Hive tables:
> 1) create table pig_hcatalog_1 (si smallint)  STORED AS TEXTFILE;
> 2) create table all100k (si smallint, ti tinyint) STORED AS TEXTFILE;
> the following sequence of steps (assuming there is data in all100k)
> {noformat}
> a=load 'all100k' using org.apache.hive.hcatalog.pig.HCatLoader();
> b = foreach a generate si;
> store b into 'pig_hcatalog_1' using org.apache.hive.hcatalog.pig.HCatStorer();
> {noformat}
> produces 
> {noformat}
> org.apache.hadoop.mapred.YarnChild: Exception running child : 
> java.lang.ClassCastException: java.lang.Short cannot be cast to 
> java.lang.Integer
>   at 
> org.apache.hive.hcatalog.pig.HCatBaseStorer.getJavaObj(HCatBaseStorer.java:372)
>   at 
> org.apache.hive.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:306)
>   at org.apache.hive.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:61)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6698) hcat.py script does not correctly load the hbase storage handler jars

2014-03-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943510#comment-13943510
 ] 

Sushanth Sowmyan commented on HIVE-6698:


Committed. Thanks, Deepesh!

> hcat.py script does not correctly load the hbase storage handler jars
> -
>
> Key: HIVE-6698
> URL: https://issues.apache.org/jira/browse/HIVE-6698
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Attachments: HIVE-6698.patch
>
>
> Currently, queries using the HBaseHCatStorageHandler fail when run using 
> hcat.py. An example query:
> {code}
> create table pig_hbase_1(key string, age string, gpa string)
> STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
> TBLPROPERTIES ('hbase.columns.mapping'=':key,info:age,info:gpa');
> {code}
> The following error is seen in the hcat logs:
> {noformat}
> 2014-03-18 08:25:49,437 ERROR ql.Driver (SessionState.java:printError(541)) - 
> FAILED: SemanticException java.io.IOException: Error in loading storage 
> handler.org.apache.hcatalog.hbase.HBaseHCatStorageHandler
> org.apache.hadoop.hive.ql.parse.SemanticException: java.io.IOException: Error 
> in loading storage handler.org.apache.hcatalog.hbase.HBaseHCatStorageHandler
>   at 
> org.apache.hive.hcatalog.cli.SemanticAnalysis.CreateTableHook.postAnalyze(CreateTableHook.java:208)
>   at 
> org.apache.hive.hcatalog.cli.SemanticAnalysis.HCatSemanticAnalyzer.postAnalyze(HCatSemanticAnalyzer.java:242)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:295)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:949)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:997)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:885)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:875)
>   at org.apache.hive.hcatalog.cli.HCatDriver.run(HCatDriver.java:43)
>   at org.apache.hive.hcatalog.cli.HCatCli.processCmd(HCatCli.java:259)
>   at org.apache.hive.hcatalog.cli.HCatCli.processLine(HCatCli.java:213)
>   at org.apache.hive.hcatalog.cli.HCatCli.main(HCatCli.java:172)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.io.IOException: Error in loading storage 
> handler.org.apache.hcatalog.hbase.HBaseHCatStorageHandler
>   at 
> org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:432)
>   at 
> org.apache.hive.hcatalog.cli.SemanticAnalysis.CreateTableHook.postAnalyze(CreateTableHook.java:199)
>   ... 16 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hcatalog.hbase.HBaseHCatStorageHandler
>   at java.net.URLClassLoader$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at 
> org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:426)
>   ... 17 more
> {noformat}
> The problem is that the hbaseStorageJar path became incorrect with the merging 
> of hcat into hive. Also, as per HIVE-6695, we should add HBASE_LIB to the 
> classpath.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6698) hcat.py script does not correctly load the hbase storage handler jars

2014-03-21 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6698:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> hcat.py script does not correctly load the hbase storage handler jars
> -
>
> Key: HIVE-6698
> URL: https://issues.apache.org/jira/browse/HIVE-6698
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Attachments: HIVE-6698.patch
>
>
> Currently, queries using the HBaseHCatStorageHandler fail when run using 
> hcat.py. An example query:
> {code}
> create table pig_hbase_1(key string, age string, gpa string)
> STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
> TBLPROPERTIES ('hbase.columns.mapping'=':key,info:age,info:gpa');
> {code}
> The following error is seen in the hcat logs:
> {noformat}
> 2014-03-18 08:25:49,437 ERROR ql.Driver (SessionState.java:printError(541)) - 
> FAILED: SemanticException java.io.IOException: Error in loading storage 
> handler.org.apache.hcatalog.hbase.HBaseHCatStorageHandler
> org.apache.hadoop.hive.ql.parse.SemanticException: java.io.IOException: Error 
> in loading storage handler.org.apache.hcatalog.hbase.HBaseHCatStorageHandler
>   at 
> org.apache.hive.hcatalog.cli.SemanticAnalysis.CreateTableHook.postAnalyze(CreateTableHook.java:208)
>   at 
> org.apache.hive.hcatalog.cli.SemanticAnalysis.HCatSemanticAnalyzer.postAnalyze(HCatSemanticAnalyzer.java:242)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:295)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:949)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:997)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:885)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:875)
>   at org.apache.hive.hcatalog.cli.HCatDriver.run(HCatDriver.java:43)
>   at org.apache.hive.hcatalog.cli.HCatCli.processCmd(HCatCli.java:259)
>   at org.apache.hive.hcatalog.cli.HCatCli.processLine(HCatCli.java:213)
>   at org.apache.hive.hcatalog.cli.HCatCli.main(HCatCli.java:172)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.io.IOException: Error in loading storage 
> handler.org.apache.hcatalog.hbase.HBaseHCatStorageHandler
>   at 
> org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:432)
>   at 
> org.apache.hive.hcatalog.cli.SemanticAnalysis.CreateTableHook.postAnalyze(CreateTableHook.java:199)
>   ... 16 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hcatalog.hbase.HBaseHCatStorageHandler
>   at java.net.URLClassLoader$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at 
> org.apache.hive.hcatalog.common.HCatUtil.getStorageHandler(HCatUtil.java:426)
>   ... 17 more
> {noformat}
> The problem is that the hbaseStorageJar path became incorrect with the merging 
> of hcat into hive. Also, as per HIVE-6695, we should add HBASE_LIB to the 
> classpath.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6726) Hcat cli does not close SessionState

2014-03-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943501#comment-13943501
 ] 

Sushanth Sowmyan commented on HIVE-6726:


[~hagleitn], could I bug you to have a quick look at this patch to see if this 
is sufficient?

> Hcat cli does not close SessionState
> 
>
> Key: HIVE-6726
> URL: https://issues.apache.org/jira/browse/HIVE-6726
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-6726.patch
>
>
> When running HCat E2E tests, it was observed that the hcat cli left Tez 
> sessions on the RM which die only upon timeout. The expected behavior is to 
> clean up the Tez sessions immediately upon exit. This causes slowness in 
> system tests, as over time a lot of orphaned Tez sessions hang around.
> On looking through the code, it seems obvious in retrospect: HCatCli starts 
> a SessionState but does not explicitly call close on it, exiting the JVM 
> through System.exit instead. This needs to be changed to explicitly call 
> SessionState.close() before exiting.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6726) Hcat cli does not close SessionState

2014-03-21 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6726:
---

Status: Patch Available  (was: Open)

> Hcat cli does not close SessionState
> 
>
> Key: HIVE-6726
> URL: https://issues.apache.org/jira/browse/HIVE-6726
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-6726.patch
>
>
> When running HCat E2E tests, it was observed that the hcat cli left Tez 
> sessions on the RM which die only upon timeout. The expected behavior is to 
> clean up the Tez sessions immediately upon exit. This causes slowness in 
> system tests, as over time a lot of orphaned Tez sessions hang around.
> On looking through the code, it seems obvious in retrospect: HCatCli starts 
> a SessionState but does not explicitly call close on it, exiting the JVM 
> through System.exit instead. This needs to be changed to explicitly call 
> SessionState.close() before exiting.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6726) Hcat cli does not close SessionState

2014-03-21 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6726:
---

Attachment: HIVE-6726.patch

Attaching patch.

> Hcat cli does not close SessionState
> 
>
> Key: HIVE-6726
> URL: https://issues.apache.org/jira/browse/HIVE-6726
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-6726.patch
>
>
> When running HCat E2E tests, it was observed that the hcat cli left Tez 
> sessions on the RM which die only upon timeout. The expected behavior is to 
> clean up the Tez sessions immediately upon exit. This causes slowness in 
> system tests, as over time a lot of orphaned Tez sessions hang around.
> On looking through the code, it seems obvious in retrospect: HCatCli starts 
> a SessionState but does not explicitly call close on it, exiting the JVM 
> through System.exit instead. This needs to be changed to explicitly call 
> SessionState.close() before exiting.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6726) Hcat cli does not close SessionState

2014-03-21 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-6726:
--

 Summary: Hcat cli does not close SessionState
 Key: HIVE-6726
 URL: https://issues.apache.org/jira/browse/HIVE-6726
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0, 0.14.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


When running HCat E2E tests, it was observed that the hcat cli left Tez 
sessions on the RM which die only upon timeout. The expected behavior is to 
clean up the Tez sessions immediately upon exit. This causes slowness in system 
tests, as over time a lot of orphaned Tez sessions hang around.

On looking through the code, it seems obvious in retrospect: HCatCli starts a 
SessionState but does not explicitly call close on it, exiting the JVM through 
System.exit instead. This needs to be changed to explicitly call 
SessionState.close() before exiting.
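
A minimal sketch of the shape of such a fix; the wrapper class and method are hypothetical, not the actual HCatCli patch, and only SessionState itself is a real Hive class.

{code}
import org.apache.hadoop.hive.ql.session.SessionState;

public class CliExitSketch {
  // Close the session before terminating, so Tez sessions are released
  // instead of lingering on the RM until they time out.
  static void exit(SessionState ss, int code) {
    try {
      if (ss != null) {
        ss.close();
      }
    } catch (Exception e) {
      // best effort during shutdown; still exit with the original code
    } finally {
      System.exit(code);
    }
  }
}
{code}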



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6687) JDBC ResultSet fails to get value by qualified projection name

2014-03-21 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-6687:
-

Status: Patch Available  (was: Open)

> JDBC ResultSet fails to get value by qualified projection name
> --
>
> Key: HIVE-6687
> URL: https://issues.apache.org/jira/browse/HIVE-6687
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Fix For: 0.12.1
>
> Attachments: HIVE-6687.3.patch
>
>
> Getting a value from the result set using a fully qualified name throws an
> exception. The only workaround today is to address the column by position
> rather than by column label.
> {code}
> String sql = "select r1.x, r2.x from r1 join r2 on r1.y=r2.y";
> ResultSet res = stmt.executeQuery(sql);
> res.getInt("r1.x");
> {code}
> res.getInt("r1.x") throws an "unknown column" exception even though the SQL
> specifies that column.
> The fix is to correct the result set schema in the semantic analyzer.
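
For illustration, the positional workaround mentioned above, continuing the
snippet from the description (standard JDBC semantics; nothing here is
specific to the eventual fix):

{code}
// Positional access (1-based, in projection order) works even when the
// qualified label does not: column 1 is r1.x, column 2 is r2.x above.
while (res.next()) {
  int r1x = res.getInt(1);
  int r2x = res.getInt(2);
}
{code}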



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6708) ConstantVectorExpression should create copies of data objects rather than referencing them

2014-03-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6708:


Attachment: HIVE-6708.2.patch

> ConstantVectorExpression should create copies of data objects rather than 
> referencing them
> --
>
> Key: HIVE-6708
> URL: https://issues.apache.org/jira/browse/HIVE-6708
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6708-1.patch, HIVE-6708.2.patch
>
>
> 1. ConstantVectorExpression should be updated for BytesColumnVector and
> DecimalColumnVector outputs. The current code changes the reference held by
> the vector, which might be shared across multiple columns.
> 2. VectorizationContext.foldConstantsForUnaryExpression(ExprNodeDesc
> exprDesc) has a minor bug in deciding when to constant-fold the expression.
> The following code should replace the corresponding piece of code in the
> trunk.
> {code}
> ..
> GenericUDF gudf = ((ExprNodeGenericFuncDesc) exprDesc).getGenericUDF();
> if (gudf instanceof GenericUDFOPNegative || gudf instanceof
> GenericUDFOPPositive
> || castExpressionUdfs.contains(gudf.getClass())
> ...
> {code}
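
A minimal sketch of the copy-vs-reference distinction in point 1, using
Hive's BytesColumnVector API (simplified relative to the actual
ConstantVectorExpression code, so treat the setup as illustrative):

{code}
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;

// A shared byte[] standing in for the expression's stored constant.
byte[] constantBytes = "const".getBytes();
BytesColumnVector outV = new BytesColumnVector(1024);
outV.initBuffer();

// setRef stores a reference to constantBytes: if the array is later mutated
// or reused for another column, the "constant" silently changes too.
outV.setRef(0, constantBytes, 0, constantBytes.length);

// setVal copies the bytes into the vector's own buffer, insulating the
// constant from later writes to constantBytes.
outV.setVal(0, constantBytes, 0, constantBytes.length);
{code}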



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6708) ConstantVectorExpression should create copies of data objects rather than referencing them

2014-03-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6708:


Attachment: (was: HIVE-6708.2.patch)

> ConstantVectorExpression should create copies of data objects rather than 
> referencing them
> --
>
> Key: HIVE-6708
> URL: https://issues.apache.org/jira/browse/HIVE-6708
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6708-1.patch
>
>
> 1. ConstantVectorExpression should be updated for BytesColumnVector and
> DecimalColumnVector outputs. The current code changes the reference held by
> the vector, which might be shared across multiple columns.
> 2. VectorizationContext.foldConstantsForUnaryExpression(ExprNodeDesc
> exprDesc) has a minor bug in deciding when to constant-fold the expression.
> The following code should replace the corresponding piece of code in the
> trunk.
> {code}
> ..
> GenericUDF gudf = ((ExprNodeGenericFuncDesc) exprDesc).getGenericUDF();
> if (gudf instanceof GenericUDFOPNegative || gudf instanceof
> GenericUDFOPPositive
> || castExpressionUdfs.contains(gudf.getClass())
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6687) JDBC ResultSet fails to get value by qualified projection name

2014-03-21 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-6687:
-

Attachment: HIVE-6687.3.patch

> JDBC ResultSet fails to get value by qualified projection name
> --
>
> Key: HIVE-6687
> URL: https://issues.apache.org/jira/browse/HIVE-6687
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.12.0
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Fix For: 0.12.1
>
> Attachments: HIVE-6687.3.patch
>
>
> Getting a value from the result set using a fully qualified name throws an
> exception. The only workaround today is to address the column by position
> rather than by column label.
> {code}
> String sql = "select r1.x, r2.x from r1 join r2 on r1.y=r2.y";
> ResultSet res = stmt.executeQuery(sql);
> res.getInt("r1.x");
> {code}
> res.getInt("r1.x") throws an "unknown column" exception even though the SQL
> specifies that column.
> The fix is to correct the result set schema in the semantic analyzer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6725) getTables() only returns partial table descriptions

2014-03-21 Thread Jonathan Seidman (JIRA)
Jonathan Seidman created HIVE-6725:
--

 Summary: getTables() only returns partial table descriptions 
 Key: HIVE-6725
 URL: https://issues.apache.org/jira/browse/HIVE-6725
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.12.0
Reporter: Jonathan Seidman
Priority: Minor


The ResultSet from calling DatabaseMetaData.getTables() only returns 5 columns, 
as opposed to the 10 columns called for in the JDBC spec:

TABLE_CAT String => table catalog (may be null)
TABLE_SCHEM String => table schema (may be null)
TABLE_NAME String => table name
TABLE_TYPE String => table type. Typical types are "TABLE", "VIEW", "SYSTEM 
TABLE", "GLOBAL TEMPORARY", "LOCAL TEMPORARY", "ALIAS", "SYNONYM".
REMARKS String => explanatory comment on the table
TYPE_CAT String => the types catalog (may be null)
TYPE_SCHEM String => the types schema (may be null)
TYPE_NAME String => type name (may be null)
SELF_REFERENCING_COL_NAME String => name of the designated "identifier" column 
of a typed table (may be null)
REF_GENERATION String => specifies how values in SELF_REFERENCING_COL_NAME are 
created. Values are "SYSTEM", "USER", "DERIVED". (may be null)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6657) Add test coverage for Kerberos authentication implementation using Hadoop's miniKdc

2014-03-21 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943438#comment-13943438
 ] 

Prasad Mujumdar commented on HIVE-6657:
---

[~brocknoland] The patch was rebased after the initial review due to conflicts
on trunk. I would appreciate it if you could take another quick look. Thanks!

> Add test coverage for Kerberos authentication implementation using Hadoop's 
> miniKdc
> ---
>
> Key: HIVE-6657
> URL: https://issues.apache.org/jira/browse/HIVE-6657
> Project: Hive
>  Issue Type: Improvement
>  Components: Authentication, Testing Infrastructure, Tests
>Affects Versions: 0.13.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Attachments: HIVE-6657.2.patch, HIVE-6657.3.patch, HIVE-6657.4.patch, 
> HIVE-6657.4.patch, HIVE-6657.5.patch, HIVE-6657.5.patch
>
>
> Hadoop 2.3 includes the miniKdc module, which provides a KDC that downstream
> projects can use to implement unit tests for Kerberos authentication code.
> Hive has a lot of code related to Kerberos and delegation tokens for
> authentication, as well as for accessing secure Hadoop resources. This code
> has almost no coverage in the unit tests. We need to add unit tests using the
> miniKdc module.
> Note that Hadoop 2.3 doesn't include a secure mini-cluster. Until one is
> available, we can at least test authentication for components like
> HiveServer2, the Metastore, and WebHCat.
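
For context, the basic shape of a miniKdc-based test fixture (principal names
and paths here are illustrative and not taken from the actual patch):

{code}
import java.io.File;
import java.util.Properties;
import org.apache.hadoop.minikdc.MiniKdc;

// Start an embedded KDC in a scratch directory.
Properties conf = MiniKdc.createConf();
MiniKdc kdc = new MiniKdc(conf, new File("target/minikdc-work"));
kdc.start();

// Provision a service principal and keytab for the component under test.
File keytab = new File("target/hive.keytab");
kdc.createPrincipal(keytab, "hive/localhost");

// ... point HiveServer2/Metastore Kerberos settings at kdc.getRealm() and
// the keytab, then exercise the authentication path ...

kdc.stop();
{code}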



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6724) HCatStorer throws ClassCastException while storing tinyint/smallint data

2014-03-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-6724:
-

Description: 
given Hive tables:
1) create table pig_hcatalog_1 (si smallint)  STORED AS TEXTFILE;
2) create table all100k (si smallint, ti tinyint) STORED ;

the following sequence of steps (assuming there is data in all100k)

{noformat}
a=load 'all100k' using org.apache.hive.hcatalog.pig.HCatLoader();
b = foreach a generate si;
store b into 'pig_hcatalog_1' using org.apache.hive.hcatalog.pig.HCatStorer();
{noformat}
produces 
{noformat}
org.apache.hadoop.mapred.YarnChild: Exception running child : 
java.lang.ClassCastException: java.lang.Short cannot be cast to 
java.lang.Integer
at 
org.apache.hive.hcatalog.pig.HCatBaseStorer.getJavaObj(HCatBaseStorer.java:372)
at 
org.apache.hive.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:306)
at org.apache.hive.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:61)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
at 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
{noformat}
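
The failing cast suggests HCatBaseStorer.getJavaObj assumes an Integer for
smallint columns, while HCatLoader hands back a Short. As an illustrative
sketch only (not the actual method, which handles many more types), a
defensive conversion through Number would avoid the blind cast:

{code}
import org.apache.hive.hcatalog.data.schema.HCatFieldSchema;

// Hypothetical helper: widen through Number instead of casting to Integer.
static Object toStoredObject(Object pigObj, HCatFieldSchema.Type type) {
  switch (type) {
    case SMALLINT:
      return ((Number) pigObj).shortValue();  // accepts Short or Integer
    case TINYINT:
      return ((Number) pigObj).byteValue();
    default:
      return pigObj;
  }
}
{code}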



  was:
given Hive tables:
1) create table pig_hcatalog_1 (si smallint)  STORED AS TEXTFILE;
2) create table all100k (si smallint, ti tinyint) STORED ;

the following sequence of steps (assuming there is data in all100k)

a=load 'all100k' using org.apache.hive.hcatalog.pig.HCatLoader();
b = foreach a generate si;
store b into 'pig_hcatalog_1' using org.apache.hive.hcatalog.pig.HCatStorer();

produces 
{noformat}
org.apache.hadoop.mapred.YarnChild: Exception running child : 
java.lang.ClassCastException: java.lang.Short cannot be cast to 
java.lang.Integer
at 
org.apache.hive.hcatalog.pig.HCatBaseStorer.getJavaObj(HCatBaseStorer.java:372)
at 
org.apache.hive.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:306)
at org.apache.hive.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:61)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:635)
at 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.

[jira] [Updated] (HIVE-6708) ConstantVectorExpression should create copies of data objects rather than referencing them

2014-03-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-6708:


Attachment: HIVE-6708.2.patch

> ConstantVectorExpression should create copies of data objects rather than 
> referencing them
> --
>
> Key: HIVE-6708
> URL: https://issues.apache.org/jira/browse/HIVE-6708
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6708-1.patch, HIVE-6708.2.patch
>
>
> 1. ConstantVectorExpression should be updated for BytesColumnVector and
> DecimalColumnVector outputs. The current code changes the reference held by
> the vector, which might be shared across multiple columns.
> 2. VectorizationContext.foldConstantsForUnaryExpression(ExprNodeDesc
> exprDesc) has a minor bug in deciding when to constant-fold the expression.
> The following code should replace the corresponding piece of code in the
> trunk.
> {code}
> ..
> GenericUDF gudf = ((ExprNodeGenericFuncDesc) exprDesc).getGenericUDF();
> if (gudf instanceof GenericUDFOPNegative || gudf instanceof
> GenericUDFOPPositive
> || castExpressionUdfs.contains(gudf.getClass())
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

