[jira] [Updated] (HIVE-9658) Reduce parquet memory usage by bypassing java primitive objects on ETypeConverter

2015-05-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-9658:
--
Description: 
The ETypeConverter class passes Writable objects to the collection converters 
in order to be read later by the map/reduce functions. These objects are all 
wrapped in a unique ArrayWritable object.

We can save some memory by returning the Java primitive objects instead, in
order to prevent memory allocation. The only writable object needed by
map/reduce is ArrayWritable. If we create another writable class in which to
store primitive objects (Object), then we can stop using all primitive
writables.

  was:
NO PRECOMMIT TESTS

The ETypeConverter class passes Writable objects to the collection converters 
in order to be read later by the map/reduce functions. These objects are all 
wrapped in a unique ArrayWritable object.

We can save some memory by returning the Java primitive objects instead, in
order to prevent memory allocation. The only writable object needed by
map/reduce is ArrayWritable. If we create another writable class in which to
store primitive objects (Object), then we can stop using all primitive
writables.


 Reduce parquet memory usage by bypassing java primitive objects on 
 ETypeConverter
 -

 Key: HIVE-9658
 URL: https://issues.apache.org/jira/browse/HIVE-9658
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9658.1.patch, HIVE-9658.2.patch, HIVE-9658.3.patch, 
 HIVE-9658.4.patch, HIVE-9658.5.patch


 The ETypeConverter class passes Writable objects to the collection converters 
 in order to be read later by the map/reduce functions. These objects are all 
 wrapped in a unique ArrayWritable object.
 We can save some memory by returning the Java primitive objects instead, in
 order to prevent memory allocation. The only writable object needed by
 map/reduce is ArrayWritable. If we create another writable class in which to
 store primitive objects (Object), then we can stop using all primitive
 writables.
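 For illustration, a minimal sketch of the idea: one Writable container that
 carries plain Java values, so the converters no longer allocate an
 IntWritable/DoubleWritable/etc. per value. The class name ObjectArrayWritable
 is hypothetical; it is not necessarily what the attached patches use.
 {code:java}
 import java.io.DataInput;
 import java.io.DataOutput;
 import java.io.IOException;
 import org.apache.hadoop.io.Writable;

 // Single Writable wrapper around plain Java objects (Integer, Double,
 // String, ...), so no per-value primitive Writable is ever allocated.
 public class ObjectArrayWritable implements Writable {
   private Object[] values;

   public void set(Object[] values) { this.values = values; }
   public Object[] get() { return values; }

   // The wrapper only travels in memory between the record reader and the
   // operators, so wire serialization is intentionally unsupported.
   @Override
   public void write(DataOutput out) throws IOException {
     throw new UnsupportedOperationException("in-memory carrier only");
   }

   @Override
   public void readFields(DataInput in) throws IOException {
     throw new UnsupportedOperationException("in-memory carrier only");
   }
 }
 {code}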



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10427) collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554784#comment-14554784
 ] 

Alexander Pivovarov commented on HIVE-10427:


I recommend adding non-primitive array sort tests to the integration tests in
ql/src/test/queries/clientpositive/udf_sort_array.q

And to JUnit tests - TestGenericUDFSortArray (which does not exist yet).
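For reference, a rough sketch of what such a JUnit test could look like,
assuming sort_array's non-primitive support (HIVE-10788) is in place; the
test data and assertions here are illustrative only:
{code:java}
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredJavaObject;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDFSortArray;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class TestGenericUDFSortArray {

  @Test
  public void testSortArrayOfArrays() throws Exception {
    GenericUDFSortArray udf = new GenericUDFSortArray();

    // Argument type: array<array<int>> -- a non-primitive element type.
    ObjectInspector elemOI = ObjectInspectorFactory.getStandardListObjectInspector(
        PrimitiveObjectInspectorFactory.javaIntObjectInspector);
    ObjectInspector argOI = ObjectInspectorFactory.getStandardListObjectInspector(elemOI);
    udf.initialize(new ObjectInspector[] { argOI });

    List<List<Integer>> input =
        Arrays.asList(Arrays.asList(3, 4), Arrays.asList(1, 2));
    DeferredObject[] args = { new DeferredJavaObject(input) };

    // Expect element-wise list comparison to put [1,2] before [3,4].
    assertEquals(Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4)),
        udf.evaluate(args));
  }
}
{code}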


 collect_list() and collect_set() should accept struct types as argument
 ---

 Key: HIVE-10427
 URL: https://issues.apache.org/jira/browse/HIVE-10427
 Project: Hive
  Issue Type: Wish
  Components: UDF
Reporter: Alexander Behm
Assignee: Chao Sun
 Attachments: HIVE-10427.1.patch, HIVE-10427.2.patch, 
 HIVE-10427.3.patch


 The collect_list() and collect_set() functions currently only accept scalar 
 argument types. It would be very useful if these functions could also accept 
 struct argument types for creating nested data from flat data.
 For example, suppose I wanted to create a nested customers/orders table from 
 two flat tables, customers and orders. Then it'd be very convenient to write 
 something like this:
 {code}
 insert into table nested_customers_orders
 select c.*, collect_list(named_struct('oid', o.oid, 'order_date', o.date...))
 from customers c inner join orders o on (c.cid = o.oid)
 group by c.cid
 {code}
 Thank you for your consideration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10778) LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2

2015-05-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-10778:
---

Assignee: Sergey Shelukhin

 LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2
 -

 Key: HIVE-10778
 URL: https://issues.apache.org/jira/browse/HIVE-10778
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: llap-hs2-heap.png


 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2.
 !llap-hs2-heap.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10788) Change sort_array to support non-primitive types

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-10788:

Component/s: UDF

 Change sort_array to support non-primitive types
 

 Key: HIVE-10788
 URL: https://issues.apache.org/jira/browse/HIVE-10788
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Chao Sun
Assignee: Chao Sun

 Currently {{sort_array}} only supports primitive types. As we already support
 comparison between non-primitive types, it makes sense to remove this
 restriction.
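 The premise here is that Hive can already compare non-primitive values via
 ObjectInspectorUtils.compare, so sort_array's element comparator could be
 written against any ObjectInspector. A hedged sketch (the class below is
 illustrative, not the actual patch):
 {code:java}
 import java.util.Comparator;
 import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
 import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;

 // Compares two array elements of any type described by the same inspector;
 // works for structs and lists as well as primitives.
 public class GenericElementComparator implements Comparator<Object> {
   private final ObjectInspector elementOI;

   public GenericElementComparator(ObjectInspector elementOI) {
     this.elementOI = elementOI;
   }

   @Override
   public int compare(Object left, Object right) {
     return ObjectInspectorUtils.compare(left, elementOI, right, elementOI);
   }
 }
 {code}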



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9658) Reduce parquet memory usage by bypassing java primitive objects on ETypeConverter

2015-05-21 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-9658:
--
Attachment: HIVE-9658.6.patch

[~Ferd] Could you review patch 6? It is the version that will be committed to master.

 Reduce parquet memory usage by bypassing java primitive objects on 
 ETypeConverter
 -

 Key: HIVE-9658
 URL: https://issues.apache.org/jira/browse/HIVE-9658
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergio Peña
Assignee: Sergio Peña
 Attachments: HIVE-9658.1.patch, HIVE-9658.2.patch, HIVE-9658.3.patch, 
 HIVE-9658.4.patch, HIVE-9658.5.patch, HIVE-9658.6.patch


 The ETypeConverter class passes Writable objects to the collection converters 
 in order to be read later by the map/reduce functions. These objects are all 
 wrapped in a unique ArrayWritable object.
 We can save some memory by returning the java primitive objects instead in 
 order to prevent memory allocation. The only writable object needed by 
 map/reduce is ArrayWritable. If we create another writable class where to 
 store primitive objects (Object), then we can stop using all primitive 
 wirtables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-9152:
---
Attachment: (was: HIVE-9152.9-spark.patch)

 Dynamic Partition Pruning [Spark Branch]
 

 Key: HIVE-9152
 URL: https://issues.apache.org/jira/browse/HIVE-9152
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Chao Sun
 Attachments: HIVE-9152.1-spark.patch, HIVE-9152.2-spark.patch, 
 HIVE-9152.3-spark.patch, HIVE-9152.4-spark.patch, HIVE-9152.5-spark.patch, 
 HIVE-9152.6-spark.patch, HIVE-9152.8-spark.patch, HIVE-9152.9-spark.patch


 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8769:
--
Attachment: (was: HIVE-8769.04.patch)

 Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
 join (PK/FK pattern not detected)
 --

 Key: HIVE-8769
 URL: https://issues.apache.org/jira/browse/HIVE-8769
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch, 
 HIVE-8769.03.patch


 TPC-DS Q82 is running slower than Hive 13 because the join type is not
 correct.
 The estimate for item x inventory x date_dim is 227 Million rows while the
 actual is 3K rows.
 Hive 13 finishes in 753 seconds.
 Hive 14 finishes in 1,267 seconds.
 Hive 14 + force map join finished in 431 seconds.
 Query
 {code}
 select  i_item_id
,i_item_desc
,i_current_price
  from item, inventory, date_dim, store_sales
  where i_current_price between 30 and 30+30
  and inv_item_sk = i_item_sk
  and d_date_sk=inv_date_sk
  and d_date between '2002-05-30' and '2002-07-30'
  and i_manufact_id in (437,129,727,663)
  and inv_quantity_on_hand between 100 and 500
  and ss_item_sk = i_item_sk
  group by i_item_id,i_item_desc,i_current_price
  order by i_item_id
  limit 100
 {code}
 Plan 
 {code}
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
 Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE)
 Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
 Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
   DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
   alias: item
   filterExpr: ((i_current_price BETWEEN 30 AND 60 and 
 (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
 boolean)
   Statistics: Num rows: 462000 Data size: 663862160 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ((i_current_price BETWEEN 30 AND 60 and 
 (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
 boolean)
 Statistics: Num rows: 115500 Data size: 34185680 Basic 
 stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: i_item_sk (type: int), i_item_id (type: 
 string), i_item_desc (type: string), i_current_price (type: float)
   outputColumnNames: _col0, _col1, _col2, _col3
   Statistics: Num rows: 115500 Data size: 33724832 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: _col0 (type: int)
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 115500 Data size: 33724832 
 Basic stats: COMPLETE Column stats: COMPLETE
 value expressions: _col1 (type: string), _col2 (type: 
 string), _col3 (type: float)
 Execution mode: vectorized
 Map 2 
 Map Operator Tree:
 TableScan
   alias: date_dim
   filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
 and d_date_sk is not null) (type: boolean)
   Statistics: Num rows: 73049 Data size: 81741831 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
 and d_date_sk is not null) (type: boolean)
 Statistics: Num rows: 36524 Data size: 3579352 Basic 
 stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: d_date_sk (type: int)
   outputColumnNames: _col0
   Statistics: Num rows: 36524 Data size: 146096 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: _col0 (type: int)
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 36524 Data size: 146096 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Select Operator
 expressions: _col0 (type: int)
 outputColumnNames: _col0
 Statistics: Num rows: 36524 

[jira] [Updated] (HIVE-10786) Propagate range for column stats

2015-05-21 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10786:
--
Assignee: Pengcheng Xiong  (was: Jesus Camacho Rodriguez)

 Propagate range for column stats
 

 Key: HIVE-10786
 URL: https://issues.apache.org/jira/browse/HIVE-10786
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Pengcheng Xiong

 For column stats, Calcite doesn't propagate range. The range of a column will
 help us in deciding filter cardinality for inequalities.
 The range of values of a column, together with NDV, will help us build
 histograms of uniform height.
 This needs special handling for each operator (see the sketch after this list):
 - Inner join where col is part of the join key: range is the lowest range of lhs, rhs
 - Outer join: range of the outer side if col is from the outer side
 - Filter inequality on a literal (x < 10): range is restricted on the upper side
 by the literal value
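 An illustrative-only sketch of these rules, using a hypothetical Range value
 class (this is not Hive or Calcite API):
 {code:java}
 public class Range {
   final double min, max;

   Range(double min, double max) { this.min = min; this.max = max; }

   // Inner join on a key column: keep only the overlap of both sides.
   static Range innerJoin(Range lhs, Range rhs) {
     return new Range(Math.max(lhs.min, rhs.min), Math.min(lhs.max, rhs.max));
   }

   // Outer join: the outer side's range survives unchanged.
   static Range outerJoin(Range outerSide) { return outerSide; }

   // Filter `col < literal`: the upper bound is clipped by the literal.
   static Range filterLessThan(Range in, double literal) {
     return new Range(in.min, Math.min(in.max, literal));
   }
 }
 {code}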



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10787) MatchPath misses the last matched row from the final result set

2015-05-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554796#comment-14554796
 ] 

Hive QA commented on HIVE-10787:




{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734579/HIVE-10787.1.patch

{color:green}SUCCESS:{color} +1 8967 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3988/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3988/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3988/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12734579 - PreCommit-HIVE-TRUNK-Build

 MatchPath misses the last matched row from the final result set
 ---

 Key: HIVE-10787
 URL: https://issues.apache.org/jira/browse/HIVE-10787
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 1.2.0
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Attachments: HIVE-10787.1.patch


 For example, if you have a STAR(*) pattern at the end, the current code
 misses the last row from the final result. Say the pattern is (LATE.EARLY*)
 and the matched rows are:
 1. LATE
 2. EARLY
 In the current implementation, the final 'tpath' misses the last EARLY and
 returns only LATE. Ideally it should return LATE and EARLY.
 The following code snippet shows the bug.
 {noformat}
 0. SymbolFunctionResult rowResult = symbolFn.match(row, pItr);
 1. while (rowResult.matches && pItr.hasNext())
 2. {
 3.   row = pItr.next();
 4.   rowResult = symbolFn.match(row, pItr);
 5. }
 6.
 7. result.nextRow = pItr.getIndex() - 1;
 {noformat}
 Line 7 of the code always moves the row index back by one. If, in some cases,
 the loop (line 1) is never executed (due to pItr.hasNext() being 'false'), the
 code still moves the row pointer back by one, even though line 0 found the
 first match and the iterator reached the end.
 I'm uploading a patch which I already tested.
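 A self-contained sketch of the off-by-one fix described above (SymbolFn and
 PathIterator are stand-ins for MatchPath's real types; this may differ from
 the attached patch):
 {code:java}
 import java.util.List;

 class MatchPathSketch {
   interface SymbolFn { boolean matches(Object row); }

   static final class PathIterator {
     private final List<Object> rows;
     private int index;
     PathIterator(List<Object> rows) { this.rows = rows; }
     boolean hasNext() { return index < rows.size(); }
     Object next() { return rows.get(index++); }
     int getIndex() { return index; }
   }

   // Returns the index of the first row after the matched run.
   static int matchRun(SymbolFn symbolFn, PathIterator pItr, Object firstRow) {
     boolean matches = symbolFn.matches(firstRow);
     while (matches && pItr.hasNext()) {
       matches = symbolFn.matches(pItr.next());
     }
     // Step back only when the loop stopped on a non-matching row; if the
     // iterator was exhausted while the last row still matched, keep that
     // row (this is the row the current code drops).
     return matches ? pItr.getIndex() : pItr.getIndex() - 1;
   }
 }
 {code}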
   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10778) LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2

2015-05-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10778:

Attachment: HIVE-10778.patch

Simple patch. [~gopalv], that relies on the assumption that these threads are
one-shots and will actually exit, doesn't it? Also, how come clearWorkMap
doesn't solve the problem; should we add logging around it to see why?

[~thejas] can you take a look?
1) Is this a good way to detect HS2? I was thinking of adding a static boolean
set to true in startHiveServer2 when it determines the options are for start;
but it looks like the session is also always initialized in init. Would it be
present at all times?
2) Would the compilation threads that access this map exit after every query,
or stick around? In the latter case a different fix is needed.
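For discussion, a minimal sketch of the thread-local idea being debated (the
names loosely mirror Utilities.gWorkMap; this is not the attached patch):
{code:java}
import java.util.HashMap;
import java.util.Map;

class WorkMapSketch {
  // One plan cache per thread instead of a single static global map, so
  // entries die with the compilation thread that created them.
  private static final ThreadLocal<Map<String, Object>> WORK_MAP =
      ThreadLocal.withInitial(HashMap::new);

  static Object getWork(String planPath) {
    return WORK_MAP.get().get(planPath);
  }

  static void setWork(String planPath, Object work) {
    WORK_MAP.get().put(planPath, work);
  }

  // As noted above, this only helps if compilation threads are one-shot;
  // pooled threads would still need an explicit clear per query.
  static void clearForThread() {
    WORK_MAP.remove();
  }
}
{code}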

 LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2
 -

 Key: HIVE-10778
 URL: https://issues.apache.org/jira/browse/HIVE-10778
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10778.patch, llap-hs2-heap.png


 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2.
 !llap-hs2-heap.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10722) external table creation with msck in Hive can create unusable partition

2015-05-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555094#comment-14555094
 ] 

Sergey Shelukhin commented on HIVE-10722:
-

[~sushanth] can you take a look?

 external table creation with msck in Hive can create unusable partition
 ---

 Key: HIVE-10722
 URL: https://issues.apache.org/jira/browse/HIVE-10722
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.1, 1.0.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-10722.patch


 There can be directories in HDFS containing unprintable characters; when
 doing hadoop fs -ls, these characters are not even visible, and can only be
 seen, for example, if the output is piped through od.
 When these are loaded via msck, they are stored in e.g. mysql as ? (a literal
 question mark, findable via LIKE '%?%' in the db) and show accordingly in Hive.
 However, datanucleus appears to encode it as %3F; this causes the partition
 to be unusable - it cannot be dropped, and other operations like drop table
 get stuck (didn't investigate in detail why; drop table got unstuck as soon
 as the partition was removed from the metastore).
 We should probably have a 2-way option for such cases - error out on load
 (the default), or convert to '?'/drop such characters (and have a partition
 that actually works, too).
 We should also check if partitions with '?' inserted explicitly work at all
 with datanucleus.
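 An illustrative sketch of the proposed 2-way option: when msck finds a
 directory name with unprintable characters, either fail or sanitize,
 depending on configuration. The names here are hypothetical, not Hive's
 actual hive.msck.path.validation implementation:
 {code:java}
 class PartitionNameValidator {
   enum Mode { THROW, SANITIZE }

   static String validate(String dirName, Mode mode) {
     StringBuilder clean = new StringBuilder(dirName.length());
     boolean dirty = false;
     for (char c : dirName.toCharArray()) {
       if (c < 0x20 || c == 0x7F) {  // unprintable control character
         dirty = true;
         clean.append('?');          // visible stand-in
       } else {
         clean.append(c);
       }
     }
     if (dirty && mode == Mode.THROW) {
       throw new IllegalArgumentException(
           "partition path contains unprintable characters: " + clean);
     }
     return clean.toString();
   }
 }
 {code}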



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10723) better logging/etc. for stuck metastore

2015-05-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554995#comment-14554995
 ] 

Sergey Shelukhin commented on HIVE-10723:
-

[~thejas] [~sushanth] [~apivovarov] can you guys +1? ;)

 better logging/etc. for stuck metastore
 ---

 Key: HIVE-10723
 URL: https://issues.apache.org/jira/browse/HIVE-10723
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10723.01.patch, HIVE-10723.02.patch, 
 HIVE-10723.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-21 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555068#comment-14555068
 ] 

Laljo John Pullokkaran commented on HIVE-6867:
--

The check for partition columns seems wrong:
return (getPartCols().size() != 0);

 Bucketized Table feature fails in some cases
 

 Key: HIVE-6867
 URL: https://issues.apache.org/jira/browse/HIVE-6867
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Laljo John Pullokkaran
Assignee: Pengcheng Xiong
 Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch


 Bucketized Table feature fails in some cases. If src & destination are
 bucketed on the same key, and if the actual data in the src is not bucketed
 (because the data got loaded using LOAD DATA LOCAL INPATH), then the data
 won't be bucketed while writing to the destination.
 Example
 --
 CREATE TABLE P1(key STRING, val STRING)
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE
 P1;
 -- perform an insert to make sure there are 2 files
 INSERT OVERWRITE TABLE P1 select key, val from P1;
 --
 This is not a regression. This has never worked.
 It was only discovered due to Hadoop2 changes.
 In Hadoop1, in local mode, the number of reducers will always be 1, regardless
 of what is requested by the app. Hadoop2 now honors the number-of-reducers
 setting in local mode (by spawning threads).
 The long-term solution seems to be to prevent LOAD DATA for bucketed tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10729) Query failed when select complex columns from joinned table (tez map join only)

2015-05-21 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555069#comment-14555069
 ] 

Selina Zhang commented on HIVE-10729:
-

The above unit test failure does not seem relevant to this patch.

 Query failed when select complex columns from joinned table (tez map join 
 only)
 ---

 Key: HIVE-10729
 URL: https://issues.apache.org/jira/browse/HIVE-10729
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.2.0
Reporter: Selina Zhang
Assignee: Selina Zhang
 Attachments: HIVE-10729.1.patch, HIVE-10729.2.patch


 When a map join happens, if the projection columns include complex data
 types, the query will fail.
 Steps to reproduce:
 {code:sql}
 hive> set hive.auto.convert.join;
 hive.auto.convert.join=true
 hive> desc foo;
 a     array<int>
 hive> select * from foo;
 [1,2]
 hive> desc src_int;
 key   int
 value string
 hive> select * from src_int where key=2;
 2     val_2
 hive> select * from foo join src_int src on src.key = foo.a[1];
 {code}
 Query will fail with stack trace
 {noformat}
 Caused by: java.lang.ClassCastException: 
 org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to 
 [Ljava.lang.Object;
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector.getList(StandardListObjectInspector.java:111)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:314)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:262)
   at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:246)
   at 
 org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:50)
   at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:692)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:644)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:676)
   at 
 org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:754)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:386)
   ... 23 more
 {noformat}
 Similar error when projection columns include a map:
 {code:sql}
 hive> CREATE TABLE test (a INT, b MAP<INT, STRING>) STORED AS ORC;
 hive> INSERT OVERWRITE TABLE test SELECT 1, MAP(1, 'val_1', 2, 'val_2') FROM
 src LIMIT 1;
 hive> select * from src join test where src.key=test.a;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7926) long-lived daemons for query fragment execution, I/O and caching

2015-05-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555079#comment-14555079
 ] 

Sergey Shelukhin commented on HIVE-7926:


There are too many child JIRAs here. I wonder if we should create separate
JIRAs for some stages of completion so we could have more manageable lists.

 long-lived daemons for query fragment execution, I/O and caching
 

 Key: HIVE-7926
 URL: https://issues.apache.org/jira/browse/HIVE-7926
 Project: Hive
  Issue Type: New Feature
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: LLAPdesigndocument.pdf


 We are proposing a new execution model for Hive that is a combination of 
 existing process-based tasks and long-lived daemons running on worker nodes. 
 These nodes can take care of efficient I/O, caching and query fragment 
 execution, while heavy lifting like most joins, ordering, etc. can be handled 
 by tasks.
 The proposed model is not a 2-system solution for small and large queries;
 nor is it a separate execution engine like MR or Tez. It can be used by
 any Hive execution engine, if support is added; in the future even external
 products (e.g. Pig) can use it.
 The document with the high-level design we are proposing will be attached
 shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10677:
---
Attachment: HIVE-10677.02.patch

 hive.exec.parallel=true has problem when it is used for analyze table column 
 stats
 --

 Key: HIVE-10677
 URL: https://issues.apache.org/jira/browse/HIVE-10677
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-10677.01.patch, HIVE-10677.02.patch


 To reproduce it in q tests:
 {code}
 hive> set hive.exec.parallel;
 hive.exec.parallel=true
 hive> analyze table src compute statistics for columns;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
 Caused by: java.io.IOException: java.lang.InterruptedException
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
   at org.apache.hadoop.util.Shell.run(Shell.java:455)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
   at 
 org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
   ... 7 more
 hive> Job Submission failed with exception 'java.lang.RuntimeException(Error
 caching map.xml: java.io.IOException: java.lang.InterruptedException)'
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-21 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555045#comment-14555045
 ] 

Laljo John Pullokkaran commented on HIVE-9392:
--

Please remove the empty space and the unused import.
Add documentation about using internal names as opposed to fully qualified
names.

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, 
 HIVE-9392.4.patch, HIVE-9392.5.patch, HIVE-9392.6.patch


 In JoinStatsRule.process, the join column statistics are stored in the
 HashMap joinedColStats. The key used, which is ColStatistics.fqColName, is
 duplicated between join columns in the same vertex; as a result, distinctVals
 ends up having duplicated values, which negatively affects the join
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.
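 An illustrative sketch of the kind of fix discussed (keying the stats map by
 a name that is unique within the vertex instead of by fqColName; types and
 names here are hypothetical, not the actual patch):
 {code:java}
 import java.util.HashMap;
 import java.util.Map;

 class JoinStatsSketch {
   static final class ColStats {
     final long ndv;
     ColStats(long ndv) { this.ndv = ndv; }
   }

   private final Map<String, ColStats> joinedColStats = new HashMap<>();

   // Qualify the internal column name with its operator id, so that
   // "RS_3:KEY.reducesinkkey0" and "RS_7:KEY.reducesinkkey0" no longer
   // overwrite each other the way two identical fqColName keys do.
   void put(String operatorId, String internalColName, ColStats stats) {
     joinedColStats.put(operatorId + ":" + internalColName, stats);
   }

   ColStats get(String operatorId, String internalColName) {
     return joinedColStats.get(operatorId + ":" + internalColName);
   }
 }
 {code}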



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9392) JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to column names having duplicated fqColumnName

2015-05-21 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555046#comment-14555046
 ] 

Laljo John Pullokkaran commented on HIVE-9392:
--

+1

 JoinStatsRule miscalculates join cardinality as incorrect NDV is used due to 
 column names having duplicated fqColumnName
 

 Key: HIVE-9392
 URL: https://issues.apache.org/jira/browse/HIVE-9392
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
Priority: Critical
 Attachments: HIVE-9392.1.patch, HIVE-9392.2.patch, HIVE-9392.3.patch, 
 HIVE-9392.4.patch, HIVE-9392.5.patch, HIVE-9392.6.patch


 In JoinStatsRule.process, the join column statistics are stored in the
 HashMap joinedColStats. The key used, which is ColStatistics.fqColName, is
 duplicated between join columns in the same vertex; as a result, distinctVals
 ends up having duplicated values, which negatively affects the join
 cardinality estimation.
 The duplicate keys are usually named KEY.reducesinkkey0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10722) external table creation with msck in Hive can create unusable partition

2015-05-21 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555128#comment-14555128
 ] 

Sushanth Sowmyan commented on HIVE-10722:
-

I'm +1 on the change in general. Would it be possible to add one more test, a
negative test for hive.msck.path.validation=throw?

 external table creation with msck in Hive can create unusable partition
 ---

 Key: HIVE-10722
 URL: https://issues.apache.org/jira/browse/HIVE-10722
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.1, 1.0.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: HIVE-10722.patch


 There can be directories in HDFS containing unprintable characters; when
 doing hadoop fs -ls, these characters are not even visible, and can only be
 seen, for example, if the output is piped through od.
 When these are loaded via msck, they are stored in e.g. mysql as ? (a literal
 question mark, findable via LIKE '%?%' in the db) and show accordingly in Hive.
 However, datanucleus appears to encode it as %3F; this causes the partition
 to be unusable - it cannot be dropped, and other operations like drop table
 get stuck (didn't investigate in detail why; drop table got unstuck as soon
 as the partition was removed from the metastore).
 We should probably have a 2-way option for such cases - error out on load
 (the default), or convert to '?'/drop such characters (and have a partition
 that actually works, too).
 We should also check if partitions with '?' inserted explicitly work at all
 with datanucleus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-10677:
---
Attachment: (was: HIVE-10677.02.patch)

 hive.exec.parallel=true has problem when it is used for analyze table column 
 stats
 --

 Key: HIVE-10677
 URL: https://issues.apache.org/jira/browse/HIVE-10677
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-10677.01.patch


 To reproduce it in q tests:
 {code}
 hive> set hive.exec.parallel;
 hive.exec.parallel=true
 hive> analyze table src compute statistics for columns;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
 Caused by: java.io.IOException: java.lang.InterruptedException
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
   at org.apache.hadoop.util.Shell.run(Shell.java:455)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
   at 
 org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
   ... 7 more
 hive> Job Submission failed with exception 'java.lang.RuntimeException(Error
 caching map.xml: java.io.IOException: java.lang.InterruptedException)'
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10728) deprecate unix_timestamp(void) and make it deterministic

2015-05-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555014#comment-14555014
 ] 

Sergey Shelukhin commented on HIVE-10728:
-

[~ashutoshc] ping?

 deprecate unix_timestamp(void) and make it deterministic
 

 Key: HIVE-10728
 URL: https://issues.apache.org/jira/browse/HIVE-10728
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10728.01.patch, HIVE-10728.patch


 We have a proper current_timestamp function that is not evaluated at runtime.
 The behavior of unix_timestamp(void) is both surprising and prevents some
 optimizations on the other overload, since the function becomes
 non-deterministic.
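 For background, Hive's optimizer keys off the UDFType annotation: results of
 a function marked deterministic can be constant-folded, which is what the
 void overload currently blocks. A hedged sketch of a deterministic overload
 (the class and its date handling are illustrative, not the actual patch):
 {code:java}
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.hive.ql.udf.UDFType;

 @UDFType(deterministic = true)  // allows folding/caching of results
 public class UnixTimestampOfArgSketch extends UDF {
   // unix_timestamp(string)-style overload: a pure function of its input,
   // unlike the zero-argument overload that reads the current time.
   public long evaluate(String timestampText) {
     return java.sql.Timestamp.valueOf(timestampText).getTime() / 1000L;
   }
 }
 {code}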



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10101) LLAP: enable yourkit profiling of tasks

2015-05-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555015#comment-14555015
 ] 

Sergey Shelukhin commented on HIVE-10101:
-

[~gopalv] ping?

 LLAP: enable yourkit profiling of tasks
 ---

 Key: HIVE-10101
 URL: https://issues.apache.org/jira/browse/HIVE-10101
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10101.02.patch, HIVE-10101.03.patch, 
 HIVE-10101.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10702) COUNT(*) over windowing 'x preceding and y preceding' doesn't work properly

2015-05-21 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10702:

Attachment: HIVE-10702.patch

 COUNT(*) over windowing 'x preceding and y preceding' doesn't work properly
 ---

 Key: HIVE-10702
 URL: https://issues.apache.org/jira/browse/HIVE-10702
 Project: Hive
  Issue Type: Sub-task
  Components: PTF-Windowing
Reporter: Aihua Xu
Assignee: Aihua Xu
 Attachments: HIVE-10702.patch


 Given the following query:
 {noformat}
 select ts, f, count(*) over (partition by ts order by f rows between 2 
 preceding and 1 preceding) from over10k limit 100;
 {noformat}
 It returns the result:
 {noformat}
 2013-03-01 09:11:58.70307   3.17    0
 2013-03-01 09:11:58.70307   10.89   0
 2013-03-01 09:11:58.70307   14.54   1
 2013-03-01 09:11:58.70307   14.78   1
 2013-03-01 09:11:58.70307   17.85   1
 2013-03-01 09:11:58.70307   20.61   1
 2013-03-01 09:11:58.70307   28.69   1
 2013-03-01 09:11:58.70307   29.22   1
 2013-03-01 09:11:58.70307   31.17   1
 2013-03-01 09:11:58.70307   38.35   1
 2013-03-01 09:11:58.70307   38.61   1
 {noformat}
 For most rows it should return count 2 rather than 1.
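 A tiny worked model of the expected frame size (not Hive code): for ROWS
 BETWEEN 2 PRECEDING AND 1 PRECEDING, row i's frame is [i-2, i-1] clipped to
 the partition, so row 0 sees 0 rows, row 1 sees 1, and every later row sees 2.
 {code:java}
 class FrameCountSketch {
   static long frameCount(int i, int partitionSize) {
     int start = Math.max(0, i - 2);
     int end = Math.min(partitionSize - 1, i - 1);  // excludes the current row
     return Math.max(0, end - start + 1);
   }

   public static void main(String[] args) {
     for (int i = 0; i < 6; i++) {
       System.out.println("row " + i + " -> count " + frameCount(i, 6));
     }
     // Prints 0, 1, 2, 2, 2, 2 -- matching the mostly-2 expectation above.
   }
 }
 {code}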



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10658) Insert with values clause may expose data that should be encrypted

2015-05-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10658:
--
Attachment: HIVE-10658.5.patch

 Insert with values clause may expose data that should be encrypted
 --

 Key: HIVE-10658
 URL: https://issues.apache.org/jira/browse/HIVE-10658
 Project: Hive
  Issue Type: Sub-task
  Components: Security
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-10658.2.patch, HIVE-10658.3.patch, 
 HIVE-10658.4.patch, HIVE-10658.5.patch


 The insert into T values() operation uses a temporary table.
 The data in temp tables is stored under hive.exec.scratchdir, which is not
 usually encrypted. This is a similar issue to using the scratchdir for
 staging query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10728) deprecate unix_timestamp(void) and make it deterministic

2015-05-21 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555085#comment-14555085
 ] 

Ashutosh Chauhan commented on HIVE-10728:
-

I will let [~alangates] comment on what the policy is for deprecating UDFs.
Throwing an exception as you have done breaks backward compatibility, as I
see it.

 deprecate unix_timestamp(void) and make it deterministic
 

 Key: HIVE-10728
 URL: https://issues.apache.org/jira/browse/HIVE-10728
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HIVE-10728.01.patch, HIVE-10728.patch


 We have a proper current_timestamp function that is not evaluated at runtime.
 The behavior of unix_timestamp(void) is both surprising and prevents some
 optimizations on the other overload, since the function becomes
 non-deterministic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10781) HadoopJobExecHelper Leaks RunningJobs

2015-05-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555086#comment-14555086
 ] 

Hive QA commented on HIVE-10781:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734392/HIVE-10781.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8966 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3990/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3990/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3990/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12734392 - PreCommit-HIVE-TRUNK-Build

 HadoopJobExecHelper Leaks RunningJobs
 -

 Key: HIVE-10781
 URL: https://issues.apache.org/jira/browse/HIVE-10781
 Project: Hive
  Issue Type: Bug
  Components: Hive, HiveServer2
Affects Versions: 0.13.1, 1.2.0
Reporter: Nemon Lou
Assignee: Chinna Rao Lalam
 Attachments: HIVE-10781.patch


 On one of our busy hadoop clusters, HiveServer2 holds more than 4000
 org.apache.hadoop.mapred.JobClient$NetworkedJob instances, while having fewer
 than 3 background handler threads at the same time.
 All these instances are held in one LinkedList, the static runningJobs
 property of org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.
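 A hedged sketch of the obvious remedy (drop jobs from the static list once
 they finish; simplified types, not necessarily how the attached patch does
 it):
 {code:java}
 import java.io.IOException;
 import java.util.Iterator;
 import java.util.LinkedList;
 import java.util.List;
 import org.apache.hadoop.mapred.RunningJob;

 class RunningJobsSketch {
   private static final List<RunningJob> runningJobs = new LinkedList<RunningJob>();

   static synchronized void register(RunningJob job) {
     runningJobs.add(job);
   }

   // Call from the job-progress loop (and shutdown hooks) so completed jobs
   // do not pin JobClient$NetworkedJob instances in the static list forever.
   static synchronized void reapCompleted() throws IOException {
     for (Iterator<RunningJob> it = runningJobs.iterator(); it.hasNext(); ) {
       if (it.next().isComplete()) {
         it.remove();
       }
     }
   }
 }
 {code}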



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10778) LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2

2015-05-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555117#comment-14555117
 ] 

Sergey Shelukhin commented on HIVE-10778:
-

Ok, this solution is not going to work. 

 LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2
 -

 Key: HIVE-10778
 URL: https://issues.apache.org/jira/browse/HIVE-10778
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10778.patch, llap-hs2-heap.png


 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2.
 !llap-hs2-heap.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10778) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2

2015-05-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10778:

Summary: LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2  
(was: LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2)

 LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
 

 Key: HIVE-10778
 URL: https://issues.apache.org/jira/browse/HIVE-10778
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10778.patch, llap-hs2-heap.png


 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2.
 !llap-hs2-heap.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10427) collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555209#comment-14555209
 ] 

Hive QA commented on HIVE-10427:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734596/HIVE-10427.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8968 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3991/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3991/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3991/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12734596 - PreCommit-HIVE-TRUNK-Build

 collect_list() and collect_set() should accept struct types as argument
 ---

 Key: HIVE-10427
 URL: https://issues.apache.org/jira/browse/HIVE-10427
 Project: Hive
  Issue Type: Wish
  Components: UDF
Reporter: Alexander Behm
Assignee: Chao Sun
 Attachments: HIVE-10427.1.patch, HIVE-10427.2.patch, 
 HIVE-10427.3.patch


 The collect_list() and collect_set() functions currently only accept scalar 
 argument types. It would be very useful if these functions could also accept 
 struct argument types for creating nested data from flat data.
 For example, suppose I wanted to create a nested customers/orders table from 
 two flat tables, customers and orders. Then it'd be very convenient to write 
 something like this:
 {code}
 insert into table nested_customers_orders
 select c.*, collect_list(named_struct('oid', o.oid, 'order_date', o.date...))
 from customers c inner join orders o on (c.cid = o.oid)
 group by c.cid
 {code}
 Thank you for your consideration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8769) Physical optimizer : Incorrect CE results in a shuffle join instead of a Map join (PK/FK pattern not detected)

2015-05-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-8769:
--
Attachment: HIVE-8769.04.patch

address review comments.

 Physical optimizer : Incorrect CE results in a shuffle join instead of a Map 
 join (PK/FK pattern not detected)
 --

 Key: HIVE-8769
 URL: https://issues.apache.org/jira/browse/HIVE-8769
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Attachments: HIVE-8769.01.patch, HIVE-8769.02.patch, 
 HIVE-8769.03.patch, HIVE-8769.04.patch


 TPC-DS Q82 is running slower than Hive 13 because the join type is not
 correct.
 The estimate for item x inventory x date_dim is 227 Million rows while the
 actual is 3K rows.
 Hive 13 finishes in 753 seconds.
 Hive 14 finishes in 1,267 seconds.
 Hive 14 + force map join finished in 431 seconds.
 Query
 {code}
 select  i_item_id
,i_item_desc
,i_current_price
  from item, inventory, date_dim, store_sales
  where i_current_price between 30 and 30+30
  and inv_item_sk = i_item_sk
  and d_date_sk=inv_date_sk
  and d_date between '2002-05-30' and '2002-07-30'
  and i_manufact_id in (437,129,727,663)
  and inv_quantity_on_hand between 100 and 500
  and ss_item_sk = i_item_sk
  group by i_item_id,i_item_desc,i_current_price
  order by i_item_id
  limit 100
 {code}
 Plan 
 {code}
 STAGE PLANS:
   Stage: Stage-1
 Tez
   Edges:
 Map 7 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
 Reducer 4 <- Map 3 (SIMPLE_EDGE), Map 7 (SIMPLE_EDGE)
 Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
 Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
   DagName: mmokhtar_20141106005353_7a2eb8df-12ff-4fe9-89b4-30f1e4e3fb90:1
   Vertices:
 Map 1 
 Map Operator Tree:
 TableScan
   alias: item
   filterExpr: ((i_current_price BETWEEN 30 AND 60 and 
 (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
 boolean)
   Statistics: Num rows: 462000 Data size: 663862160 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: ((i_current_price BETWEEN 30 AND 60 and 
 (i_manufact_id) IN (437, 129, 727, 663)) and i_item_sk is not null) (type: 
 boolean)
 Statistics: Num rows: 115500 Data size: 34185680 Basic 
 stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: i_item_sk (type: int), i_item_id (type: 
 string), i_item_desc (type: string), i_current_price (type: float)
   outputColumnNames: _col0, _col1, _col2, _col3
   Statistics: Num rows: 115500 Data size: 33724832 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: _col0 (type: int)
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 115500 Data size: 33724832 
 Basic stats: COMPLETE Column stats: COMPLETE
 value expressions: _col1 (type: string), _col2 (type: 
 string), _col3 (type: float)
 Execution mode: vectorized
 Map 2 
 Map Operator Tree:
 TableScan
   alias: date_dim
   filterExpr: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
 and d_date_sk is not null) (type: boolean)
   Statistics: Num rows: 73049 Data size: 81741831 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Filter Operator
 predicate: (d_date BETWEEN '2002-05-30' AND '2002-07-30' 
 and d_date_sk is not null) (type: boolean)
 Statistics: Num rows: 36524 Data size: 3579352 Basic 
 stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: d_date_sk (type: int)
   outputColumnNames: _col0
   Statistics: Num rows: 36524 Data size: 146096 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: _col0 (type: int)
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 36524 Data size: 146096 Basic 
 stats: COMPLETE Column stats: COMPLETE
   Select Operator
 expressions: _col0 (type: int)
 outputColumnNames: _col0
 

[jira] [Updated] (HIVE-10789) union distinct query with NULL constant on both the sides throws Unsuported vector output type: void error

2015-05-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-10789:

Attachment: HIVE-10789.01.patch

 union distinct query with NULL constant on both the sides throws Unsuported 
 vector output type: void error
 

 Key: HIVE-10789
 URL: https://issues.apache.org/jira/browse/HIVE-10789
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.2.1

 Attachments: HIVE-10789.01.patch


 A NULL expression in the SELECT projection list causes an exception to be
 thrown instead of simply not vectorizing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-21 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-6867:
--
Attachment: HIVE-6867.03.patch

 Bucketized Table feature fails in some cases
 

 Key: HIVE-6867
 URL: https://issues.apache.org/jira/browse/HIVE-6867
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Laljo John Pullokkaran
Assignee: Pengcheng Xiong
 Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
 HIVE-6867.03.patch


 Bucketized Table feature fails in some cases. If src & destination are
 bucketed on the same key, and if the actual data in the src is not bucketed
 (because the data got loaded using LOAD DATA LOCAL INPATH), then the data
 won't be bucketed while writing to the destination.
 Example
 --
 CREATE TABLE P1(key STRING, val STRING)
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE
 P1;
 -- perform an insert to make sure there are 2 files
 INSERT OVERWRITE TABLE P1 select key, val from P1;
 --
 This is not a regression. This has never worked.
 It was only discovered due to Hadoop2 changes.
 In Hadoop1, in local mode, the number of reducers will always be 1, regardless
 of what is requested by the app. Hadoop2 now honors the number-of-reducers
 setting in local mode (by spawning threads).
 The long-term solution seems to be to prevent LOAD DATA for bucketed tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6867) Bucketized Table feature fails in some cases

2015-05-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555283#comment-14555283
 ] 

Pengcheng Xiong commented on HIVE-6867:
---

address [~jpullokkaran]'s comments

 Bucketized Table feature fails in some cases
 

 Key: HIVE-6867
 URL: https://issues.apache.org/jira/browse/HIVE-6867
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Laljo John Pullokkaran
Assignee: Pengcheng Xiong
 Attachments: HIVE-6867.01.patch, HIVE-6867.02.patch, 
 HIVE-6867.03.patch


 Bucketized Table feature fails in some cases. If src & destination are
 bucketed on the same key, and if the actual data in the src is not bucketed
 (because the data got loaded using LOAD DATA LOCAL INPATH), then the data
 won't be bucketed while writing to the destination.
 Example
 --
 CREATE TABLE P1(key STRING, val STRING)
 CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '/Users/jp/apache-hive1/data/files/P1.txt' INTO TABLE
 P1;
 -- perform an insert to make sure there are 2 files
 INSERT OVERWRITE TABLE P1 select key, val from P1;
 --
 This is not a regression. This has never worked.
 It was only discovered due to Hadoop2 changes.
 In Hadoop1, in local mode, the number of reducers will always be 1, regardless
 of what is requested by the app. Hadoop2 now honors the number-of-reducers
 setting in local mode (by spawning threads).
 The long-term solution seems to be to prevent LOAD DATA for bucketed tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10630) Renaming tables across encryption zones renames table even though the operation throws error

2015-05-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10630:
--
Fix Version/s: 1.2.1

 Renaming tables across encryption zones renames table even though the 
 operation throws error
 

 Key: HIVE-10630
 URL: https://issues.apache.org/jira/browse/HIVE-10630
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore, Security
Reporter: Deepesh Khandelwal
Assignee: Eugene Koifman
 Fix For: 1.3.0, 1.2.1

 Attachments: HIVE-10630.patch


 Create a table with data in encryption zone 1 and then rename it to
 encryption zone 2.
 {noformat}
 hive> alter table encdb1.testtbl rename to encdb2.testtbl;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. Unable to 
 access old location 
 hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl for 
 table encdb1.testtbl
 {noformat}
 Even though the command errors out, the table is renamed. I think the right 
 behavior should be to not rename the table at all, including the metadata.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled

2015-05-21 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555389#comment-14555389
 ] 

Matt McCline commented on HIVE-10244:
-

For 1.2.1, I would prefer to not vectorize if GroupByDesc.pruneGroupingSetId 
is true.
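
A minimal, self-contained sketch of that guard (stubbed types; the flag name 
follows the comment above and is an assumption, not the actual patch):

{code}
// Stub standing in for org.apache.hadoop.hive.ql.plan.GroupByDesc.
class GroupByDescStub {
    private final boolean pruneGroupingSetId;   // set for ROLLUP / GROUPING SETS queries
    GroupByDescStub(boolean prune) { this.pruneGroupingSetId = prune; }
    boolean pruneGroupingSetId() { return pruneGroupingSetId; }
}

class VectorizerGuard {
    /** Returns false (do not vectorize) when the grouping-set id column must be pruned. */
    static boolean validateGroupBy(GroupByDescStub desc) {
        if (desc.pruneGroupingSetId()) {
            // Fall back to the row-mode reducer instead of failing at runtime,
            // as in the stack trace below.
            return false;
        }
        return true;
    }
}
{code}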

 Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when 
 hive.vectorized.execution.reduce.enabled is enabled
 ---

 Key: HIVE-10244
 URL: https://issues.apache.org/jira/browse/HIVE-10244
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Attachments: explain_q80_vectorized_reduce_on.txt


 Query 
 {code}
 set hive.vectorized.execution.reduce.enabled=true;
 with ssr as
  (select  s_store_id as store_id,
   sum(ss_ext_sales_price) as sales,
   sum(coalesce(sr_return_amt, 0)) as returns,
   sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit
   from store_sales left outer join store_returns on
  (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number),
  date_dim,
  store,
  item,
  promotion
  where ss_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date) 
   and (cast('1998-09-04' as date))
and ss_store_sk = s_store_sk
and ss_item_sk = i_item_sk
and i_current_price > 50
and ss_promo_sk = p_promo_sk
and p_channel_tv = 'N'
  group by s_store_id)
  ,
  csr as
  (select  cp_catalog_page_id as catalog_page_id,
   sum(cs_ext_sales_price) as sales,
   sum(coalesce(cr_return_amount, 0)) as returns,
   sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit
   from catalog_sales left outer join catalog_returns on
  (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number),
  date_dim,
  catalog_page,
  item,
  promotion
  where cs_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and cs_catalog_page_sk = cp_catalog_page_sk
and cs_item_sk = i_item_sk
and i_current_price > 50
and cs_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by cp_catalog_page_id)
  ,
  wsr as
  (select  web_site_id,
   sum(ws_ext_sales_price) as sales,
   sum(coalesce(wr_return_amt, 0)) as returns,
   sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit
   from web_sales left outer join web_returns on
  (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number),
  date_dim,
  web_site,
  item,
  promotion
  where ws_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and ws_web_site_sk = web_site_sk
and ws_item_sk = i_item_sk
and i_current_price > 50
and ws_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by web_site_id)
   select  channel
 , id
 , sum(sales) as sales
 , sum(returns) as returns
 , sum(profit) as profit
  from 
  (select 'store channel' as channel
 , concat('store', store_id) as id
 , sales
 , returns
 , profit
  from   ssr
  union all
  select 'catalog channel' as channel
 , concat('catalog_page', catalog_page_id) as id
 , sales
 , returns
 , profit
  from  csr
  union all
  select 'web channel' as channel
 , concat('web_site', web_site_id) as id
 , sales
 , returns
 , profit
  from   wsr
  ) x
  group by channel, id with rollup
  order by channel
  ,id
  limit 100
 {code}
 Exception 
 {code}
 Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, 
 diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing vector batch (tag=0) 
 \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8
 \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7
 \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
   at 
 

[jira] [Updated] (HIVE-10629) Dropping table in an encrypted zone does not drop warehouse directory

2015-05-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10629:
--
Fix Version/s: 1.2.1

 Dropping table in an encrypted zone does not drop warehouse directory
 -

 Key: HIVE-10629
 URL: https://issues.apache.org/jira/browse/HIVE-10629
 Project: Hive
  Issue Type: Sub-task
  Components: Security
Affects Versions: 1.1.0
Reporter: Deepesh Khandelwal
Assignee: Eugene Koifman
 Fix For: 1.3.0, 1.2.1

 Attachments: HIVE-10629.2.patch, HIVE-10629.3.patch, 
 HIVE-10629.4.patch, HIVE-10629.5.patch, HIVE-10629.patch


 Drop table in an encrypted zone removes the table but not its data. The 
 client sees the following on Hive CLI:
 {noformat}
 hive> drop table testtbl;
 OK
 Time taken: 0.158 seconds
 {noformat}
 On the Hive Metastore log following error is thrown:
 {noformat}
 2015-05-05 08:55:27,665 ERROR [pool-6-thread-142]: hive.log 
 (MetaStoreUtils.java:logAndThrowMetaException(1200)) - Got exception: 
 java.io.IOException Failed to move to trash: 
 hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl
 java.io.IOException: Failed to move to trash: 
 hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl
 at 
 org.apache.hadoop.fs.TrashPolicyDefault.moveToTrash(TrashPolicyDefault.java:160)
 at org.apache.hadoop.fs.Trash.moveToTrash(Trash.java:114)
 at org.apache.hadoop.fs.Trash.moveToAppropriateTrash(Trash.java:95)
 at 
 org.apache.hadoop.hive.shims.Hadoop23Shims.moveToAppropriateTrash(Hadoop23Shims.java:270)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreFsImpl.deleteDir(HiveMetaStoreFsImpl.java:47)
 at 
 org.apache.hadoop.hive.metastore.Warehouse.deleteDir(Warehouse.java:229)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.deleteTableData(HiveMetaStore.java:1584)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1552)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1705)
 at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
 at com.sun.proxy.$Proxy13.drop_table_with_environment_context(Unknown 
 Source)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_table_with_environment_context.getResult(ThriftHiveMetastore.java:9256)
 
 {noformat}
 The client should throw the error and maybe fail the drop table call. To 
 delete the table data one currently has to use {{drop table testtbl purge}}, 
 which removes the table data permanently, skipping the trash.
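
 For reference, a minimal sketch of the behavior this report asks for: let the 
 trash failure propagate so the DROP fails visibly. It reuses the same 
 Trash.moveToAppropriateTrash call seen in the stack trace; the purge flag and 
 method name here are illustrative, not the committed fix.

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

class WarehouseDirCleaner {
    // Deletes a table directory; 'purge' skips the trash entirely, which is
    // the only option that works inside an encryption zone today.
    static void deleteDir(FileSystem fs, Path dir, Configuration conf, boolean purge)
            throws IOException {
        if (purge) {
            if (!fs.delete(dir, true)) {
                throw new IOException("Failed to purge " + dir);
            }
            return;
        }
        // moveToAppropriateTrash throws for paths inside an encryption zone;
        // letting that propagate makes the DROP TABLE fail visibly instead of
        // reporting OK while leaving the data behind.
        if (!Trash.moveToAppropriateTrash(fs, dir, conf)) {
            throw new IOException("Failed to move " + dir + " to trash");
        }
    }
}
{code}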



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10747) Enable the cleanup of side effect for the Encryption related qfile test

2015-05-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10747:
--
Fix Version/s: 1.2.1

 Enable the cleanup of side effect for the Encryption related qfile test
 ---

 Key: HIVE-10747
 URL: https://issues.apache.org/jira/browse/HIVE-10747
 Project: Hive
  Issue Type: Sub-task
  Components: Testing Infrastructure
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Fix For: 1.3.0, 1.2.1

 Attachments: HIVE-10747.patch


 The hive conf is not reset in the clearTestSideEffects method, which was 
 introduced in HIVE-8900. This will pollute other qfiles' settings when they 
 are run by TestEncryptedHDFSCliDriver.
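
 A minimal sketch of the kind of cleanup this implies, assuming the harness 
 keeps its conf as a Properties-like object (QTestUtil's real field and method 
 names may differ):

{code}
import java.util.Properties;

class ConfSnapshot {
    private final Properties saved = new Properties();

    // Call once after init, before the first qfile runs.
    void snapshot(Properties liveConf) {
        saved.clear();
        saved.putAll(liveConf);
    }

    // Call from clearTestSideEffects so a 'set x=y;' issued by one qfile
    // cannot leak into the next one.
    void restore(Properties liveConf) {
        liveConf.clear();
        liveConf.putAll(saved);
    }
}
{code}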



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10640) Vectorized query with NULL constant throws Unsuported vector output type: void error

2015-05-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-10640:

Attachment: HIVE-10640.02.patch

 Vectorized query with NULL constant  throws Unsuported vector output type: 
 void error
 ---

 Key: HIVE-10640
 URL: https://issues.apache.org/jira/browse/HIVE-10640
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.3.0

 Attachments: HIVE-10640.01.patch, HIVE-10640.02.patch


 This query from join_nullsafe.q, when vectorized, throws "Unsuported vector 
 output type: void" during execution...
 {noformat}
 select * from myinput1 a join myinput1 b on a.key=b.value AND a.key is NULL;
 {noformat}
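
 One way to avoid the error, sketched below with a stub that mirrors the 
 semantics of Hive's ColumnVector flags: represent a typeless NULL constant as 
 a repeating all-null vector instead of rejecting the void type. Illustrative 
 only, not necessarily what the attached patch does.

{code}
class ColumnVectorStub {
    boolean noNulls = true;       // false once any entry may be null
    boolean isRepeating = false;  // true when entry 0 describes the whole batch
    final boolean[] isNull;
    ColumnVectorStub(int batchSize) { isNull = new boolean[batchSize]; }
}

class NullConstantVectorizer {
    // A constant NULL: one repeating entry, and that entry is null.
    static ColumnVectorStub constantNull(int batchSize) {
        ColumnVectorStub v = new ColumnVectorStub(batchSize);
        v.noNulls = false;
        v.isRepeating = true;
        v.isNull[0] = true;
        return v;
    }
}
{code}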



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled

2015-05-21 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-10244:

Attachment: HIVE-10244.01.patch

 Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when 
 hive.vectorized.execution.reduce.enabled is enabled
 ---

 Key: HIVE-10244
 URL: https://issues.apache.org/jira/browse/HIVE-10244
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Attachments: HIVE-10244.01.patch, explain_q80_vectorized_reduce_on.txt


 Query 
 {code}
 set hive.vectorized.execution.reduce.enabled=true;
 with ssr as
  (select  s_store_id as store_id,
   sum(ss_ext_sales_price) as sales,
   sum(coalesce(sr_return_amt, 0)) as returns,
   sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit
   from store_sales left outer join store_returns on
  (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number),
  date_dim,
  store,
  item,
  promotion
  where ss_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date) 
   and (cast('1998-09-04' as date))
and ss_store_sk = s_store_sk
and ss_item_sk = i_item_sk
and i_current_price > 50
and ss_promo_sk = p_promo_sk
and p_channel_tv = 'N'
  group by s_store_id)
  ,
  csr as
  (select  cp_catalog_page_id as catalog_page_id,
   sum(cs_ext_sales_price) as sales,
   sum(coalesce(cr_return_amount, 0)) as returns,
   sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit
   from catalog_sales left outer join catalog_returns on
  (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number),
  date_dim,
  catalog_page,
  item,
  promotion
  where cs_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and cs_catalog_page_sk = cp_catalog_page_sk
and cs_item_sk = i_item_sk
and i_current_price > 50
and cs_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by cp_catalog_page_id)
  ,
  wsr as
  (select  web_site_id,
   sum(ws_ext_sales_price) as sales,
   sum(coalesce(wr_return_amt, 0)) as returns,
   sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit
   from web_sales left outer join web_returns on
  (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number),
  date_dim,
  web_site,
  item,
  promotion
  where ws_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and ws_web_site_sk = web_site_sk
and ws_item_sk = i_item_sk
and i_current_price > 50
and ws_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by web_site_id)
   select  channel
 , id
 , sum(sales) as sales
 , sum(returns) as returns
 , sum(profit) as profit
  from 
  (select 'store channel' as channel
 , concat('store', store_id) as id
 , sales
 , returns
 , profit
  from   ssr
  union all
  select 'catalog channel' as channel
 , concat('catalog_page', catalog_page_id) as id
 , sales
 , returns
 , profit
  from  csr
  union all
  select 'web channel' as channel
 , concat('web_site', web_site_id) as id
 , sales
 , returns
 , profit
  from   wsr
  ) x
  group by channel, id with rollup
  order by channel
  ,id
  limit 100
 {code}
 Exception 
 {code}
 Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, 
 diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing vector batch (tag=0) 
 \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8
 \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7
 \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
   at 
 

[jira] [Updated] (HIVE-10778) LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2

2015-05-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10778:

Attachment: HIVE-10778.01.patch

alternative approach - clear the map where needed
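
A sketch of that approach, assuming gWorkMap is the static plan-path-to-work 
cache that Utilities exposes; the per-query clearing hook is illustrative:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PlanCache {
    // Stands in for Utilities.gWorkMap: deserialized plans keyed by plan path.
    private static final Map<String, Object> gWorkMap = new ConcurrentHashMap<>();

    static void cache(String planPath, Object work) {
        gWorkMap.put(planPath, work);
    }

    // Called when the driver closes a query; without this, a long-lived HS2
    // process accumulates every plan it ever ran, as in the heap dump below.
    static void clearForQuery(String queryScratchDirPrefix) {
        gWorkMap.keySet().removeIf(k -> k.startsWith(queryScratchDirPrefix));
    }
}
{code}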

 LLAP: Utilities::gWorkMap needs to be cleaned in HiveServer2
 

 Key: HIVE-10778
 URL: https://issues.apache.org/jira/browse/HIVE-10778
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10778.01.patch, HIVE-10778.patch, llap-hs2-heap.png


 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2.
 !llap-hs2-heap.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10630) Renaming tables across encryption zones renames table even though the operation throws error

2015-05-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555373#comment-14555373
 ] 

Eugene Koifman commented on HIVE-10630:
---

Committed to master and 1.2.1

 Renaming tables across encryption zones renames table even though the 
 operation throws error
 

 Key: HIVE-10630
 URL: https://issues.apache.org/jira/browse/HIVE-10630
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore, Security
Reporter: Deepesh Khandelwal
Assignee: Eugene Koifman
 Fix For: 1.3.0, 1.2.1

 Attachments: HIVE-10630.patch


 Create a table with data in an encrypted zone 1 and then rename it to 
 encrypted zone 2.
 {noformat}
 hive> alter table encdb1.testtbl rename to encdb2.testtbl;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. Unable to 
 access old location 
 hdfs://node-1.example.com:8020/apps/hive/warehouse/encdb1.db/testtbl for 
 table encdb1.testtbl
 {noformat}
 Even though the command errors out, the table is renamed. I think the right 
 behavior should be to not rename the table at all, including the metadata.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10789) union distinct query with NULL constant on both the sides throws Unsuported vector output type: void error

2015-05-21 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555405#comment-14555405
 ] 

Gunther Hagleitner commented on HIVE-10789:
---

LGTM +1 assuming tests pass.

 union distinct query with NULL constant on both the sides throws Unsuported 
 vector output type: void error
 

 Key: HIVE-10789
 URL: https://issues.apache.org/jira/browse/HIVE-10789
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Fix For: 1.2.1

 Attachments: HIVE-10789.01.patch


 A NULL expression in the SELECT projection list causes an exception to be 
 thrown instead of simply not vectorizing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10777) LLAP: add pre-fragment and per-table cache details

2015-05-21 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10777:

Attachment: HIVE-10777.WIP.patch

backup patch

 LLAP: add pre-fragment and per-table cache details
 --

 Key: HIVE-10777
 URL: https://issues.apache.org/jira/browse/HIVE-10777
 Project: Hive
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10777.WIP.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10790) orc file sql excute fail

2015-05-21 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555490#comment-14555490
 ] 

xiaowei wang commented on HIVE-10790:
-


{code}
 FSDataOutputStream getStream() throws IOException {
   if (rawWriter == null) {
     rawWriter = fs.create(path, false, HDFS_BUFFER_SIZE,
-        fs.getDefaultReplication(), blockSize);
+        fs.getDefaultReplication(path), blockSize);
     rawWriter.writeBytes(OrcFile.MAGIC);
     headerLength = rawWriter.getPos();
{code}
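
The reason the one-line change works, as a self-contained sketch: on viewfs:// 
the no-argument overload cannot resolve a mount point and throws 
NotInMountpointException, while the Path overload resolves the mount first.

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ReplicationLookup {
    // The Path overload works on viewfs because the mount point is resolved
    // from the path; the deprecated no-arg overload has nothing to resolve.
    static short replicationFor(FileSystem fs, Path file) {
        return fs.getDefaultReplication(file);
    }

    public static void main(String[] args) throws IOException {
        Path p = new Path(args[0]);            // e.g. viewfs://cluster/tmp/f.orc
        FileSystem fs = p.getFileSystem(new Configuration());
        System.out.println(replicationFor(fs, p));
    }
}
{code}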

 orc file sql excute fail 
 -

 Key: HIVE-10790
 URL: https://issues.apache.org/jira/browse/HIVE-10790
 Project: Hive
  Issue Type: Bug
  Components: API
Affects Versions: 0.13.0, 0.14.0
 Environment: Hadoop 2.5.0-cdh5.3.2 
 hive 0.14
Reporter: xiaowei wang
Assignee: xiaowei wang

 Inserting from a text table into an ORC table, like
 insert overwrite table custom.rank_less_orc_none 
 partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text 
 where logdate='2015051500';
 will throw an error: java.lang.RuntimeException: Hive Runtime Error 
 while closing operators
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
 getDefaultReplication on empty path is invalid
 at 
 org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
 at 
 org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
 at 
 org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
 at 
 org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
 ... 8 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10776) Schema on insert for bucketed tables throwing NullPointerException

2015-05-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10776:
--
Attachment: HIVE-10776.patch

[~alangates], could you review please

 Schema on insert for bucketed tables throwing NullPointerException
 --

 Key: HIVE-10776
 URL: https://issues.apache.org/jira/browse/HIVE-10776
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
 Environment: Linux, Windows
Reporter: Aswathy Chellammal Sreekumar
Assignee: Eugene Koifman
 Attachments: HIVE-10776.patch


 Hive schema-on-insert queries with select * are failing with the exception 
 below
 {noformat}
 2015-05-15 19:29:01,278 ERROR [main]: ql.Driver 
 (SessionState.java:printError(957)) - FAILED: NullPointerException null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7257)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6100)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6271)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8972)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8863)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9708)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9601)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10037)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:323)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
 at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {noformat}
 Steps to reproduce
 {noformat}
 set hive.support.concurrency=true;
 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
 set hive.enforce.bucketing=true;
 drop table if exists studenttab10k;
 create table studenttab10k (age int, name varchar(50),gpa decimal(3,2));
 insert into studenttab10k values(1,'foo', 1.1), (2,'bar', 2.3),(3,'baz', 3.1);
 drop table if exists student_acid;
 create table student_acid (age int, name varchar(50),gpa decimal(3,2), grade 
 int) 
 clustered by (age) into 2 buckets
 stored as orc
 tblproperties ('transactional'='true');
 insert into student_acid(name,age,gpa) select * from studenttab10k;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10776) Schema on insert for bucketed tables throwing NullPointerException

2015-05-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10776:
--
Fix Version/s: (was: 1.2.0)

 Schema on insert for bucketed tables throwing NullPointerException
 --

 Key: HIVE-10776
 URL: https://issues.apache.org/jira/browse/HIVE-10776
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
 Environment: Linux, Windows
Reporter: Aswathy Chellammal Sreekumar
Assignee: Eugene Koifman
 Attachments: HIVE-10776.patch


 Hive schema-on-insert queries with select * are failing with the exception 
 below
 {noformat}
 2015-05-15 19:29:01,278 ERROR [main]: ql.Driver 
 (SessionState.java:printError(957)) - FAILED: NullPointerException null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7257)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6100)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6271)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8972)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8863)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9708)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9601)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10037)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:323)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
 at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {noformat}
 Steps to reproduce
 {noformat}
 set hive.support.concurrency=true;
 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
 set hive.enforce.bucketing=true;
 drop table if exists studenttab10k;
 create table studenttab10k (age int, name varchar(50),gpa decimal(3,2));
 insert into studenttab10k values(1,'foo', 1.1), (2,'bar', 2.3),(3,'baz', 3.1);
 drop table if exists student_acid;
 create table student_acid (age int, name varchar(50),gpa decimal(3,2), grade 
 int) 
 clustered by (age) into 2 buckets
 stored as orc
 tblproperties ('transactional'='true');
 insert into student_acid(name,age,gpa) select * from studenttab10k;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10776) Schema on insert for bucketed tables throwing NullPointerException

2015-05-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555345#comment-14555345
 ] 

Eugene Koifman commented on HIVE-10776:
---

[~sushanth], this is a good candidate for 1.2.1.  Support for specifying a 
column list, as in {{insert into student_acid(age) select * from studenttab10k;}}, 
was added in 1.2.  It NPEs if the target table is bucketed.
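
A guess at the shape of a fix, sketched with plain collections (the real code 
is in SemanticAnalyzer.genReduceSinkPlan): resolve each bucketing column 
against the columns the insert actually provides, and fail with a clear 
message instead of an NPE when one is missing. Purely illustrative; not the 
attached patch.

{code}
import java.util.Map;

class BucketColumnResolver {
    // The NPE suggests a lookup for a bucketing column returned null when the
    // insert listed only a subset of the target table's columns.
    static String resolveOrFail(Map<String, String> insertSchema, String bucketCol) {
        String expr = insertSchema.get(bucketCol);
        if (expr == null) {
            throw new IllegalStateException("Bucketing column '" + bucketCol
                + "' is not supplied by the insert column list");
        }
        return expr;
    }
}
{code}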

 Schema on insert for bucketed tables throwing NullPointerException
 --

 Key: HIVE-10776
 URL: https://issues.apache.org/jira/browse/HIVE-10776
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
 Environment: Linux, Windows
Reporter: Aswathy Chellammal Sreekumar
Assignee: Eugene Koifman
 Attachments: HIVE-10776.patch


 Hive schema-on-insert queries with select * are failing with the exception 
 below
 {noformat}
 2015-05-15 19:29:01,278 ERROR [main]: ql.Driver 
 (SessionState.java:printError(957)) - FAILED: NullPointerException null
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:7257)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBucketingSortingDest(SemanticAnalyzer.java:6100)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6271)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8972)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8863)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9708)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9601)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10037)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:323)
 at 
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10048)
 at 
 org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:207)
 at 
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
 at 
 org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
 at 
 org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:714)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
 {noformat}
 Steps to reproduce
 {noformat}
 set hive.support.concurrency=true;
 set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
 set hive.enforce.bucketing=true;
 drop table if exists studenttab10k;
 create table studenttab10k (age int, name varchar(50),gpa decimal(3,2));
 insert into studenttab10k values(1,'foo', 1.1), (2,'bar', 2.3),(3,'baz', 3.1);
 drop table if exists student_acid;
 create table student_acid (age int, name varchar(50),gpa decimal(3,2), grade 
 int) 
 clustered by (age) into 2 buckets
 stored as orc
 tblproperties ('transactional'='true');
 insert into student_acid(name,age,gpa) select * from studenttab10k;
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9658) Reduce parquet memory usage by bypassing java primitive objects on ETypeConverter

2015-05-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555350#comment-14555350
 ] 

Hive QA commented on HIVE-9658:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734600/HIVE-9658.6.patch

{color:red}ERROR:{color} -1 due to 56 failed/errored test(s), 8965 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_null_element
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_multi_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_optional_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_required_elements
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_structs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_groups
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_create
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_null
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_arrays_of_ints
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_maps
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_nested_complex
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_read_backward_compatible_files
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_schema_evolution
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_primitives
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_single_field_struct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAmbiguousSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testHiveRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testMultiFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewOptionalGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewRequiredGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftPrimitiveInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftSingleFieldGroupInList
org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testUnannotatedListOfGroups
org.apache.hadoop.hive.ql.io.parquet.TestDataWritableWriter.testSimpleType
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testDoubleMapWithStructValue
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testMapWithComplexKey
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testNestedMap
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalIntArray
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOptionalPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapRequiredPrimitive
org.apache.hadoop.hive.ql.io.parquet.TestParquetSerDe.testParquetHiveSerDe
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testRegularMap
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testRegularMap
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testEmptyContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testNullContainer
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testRegularList

[jira] [Commented] (HIVE-10244) Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when hive.vectorized.execution.reduce.enabled is enabled

2015-05-21 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555386#comment-14555386
 ] 

Matt McCline commented on HIVE-10244:
-

I think this is directly related to HIVE-9347.

 Vectorization : TPC-DS Q80 fails with java.lang.ClassCastException when 
 hive.vectorized.execution.reduce.enabled is enabled
 ---

 Key: HIVE-10244
 URL: https://issues.apache.org/jira/browse/HIVE-10244
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Matt McCline
 Attachments: explain_q80_vectorized_reduce_on.txt


 Query 
 {code}
 set hive.vectorized.execution.reduce.enabled=true;
 with ssr as
  (select  s_store_id as store_id,
   sum(ss_ext_sales_price) as sales,
   sum(coalesce(sr_return_amt, 0)) as returns,
   sum(ss_net_profit - coalesce(sr_net_loss, 0)) as profit
   from store_sales left outer join store_returns on
  (ss_item_sk = sr_item_sk and ss_ticket_number = sr_ticket_number),
  date_dim,
  store,
  item,
  promotion
  where ss_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date) 
   and (cast('1998-09-04' as date))
and ss_store_sk = s_store_sk
and ss_item_sk = i_item_sk
and i_current_price > 50
and ss_promo_sk = p_promo_sk
and p_channel_tv = 'N'
  group by s_store_id)
  ,
  csr as
  (select  cp_catalog_page_id as catalog_page_id,
   sum(cs_ext_sales_price) as sales,
   sum(coalesce(cr_return_amount, 0)) as returns,
   sum(cs_net_profit - coalesce(cr_net_loss, 0)) as profit
   from catalog_sales left outer join catalog_returns on
  (cs_item_sk = cr_item_sk and cs_order_number = cr_order_number),
  date_dim,
  catalog_page,
  item,
  promotion
  where cs_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and cs_catalog_page_sk = cp_catalog_page_sk
and cs_item_sk = i_item_sk
and i_current_price > 50
and cs_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by cp_catalog_page_id)
  ,
  wsr as
  (select  web_site_id,
   sum(ws_ext_sales_price) as sales,
   sum(coalesce(wr_return_amt, 0)) as returns,
   sum(ws_net_profit - coalesce(wr_net_loss, 0)) as profit
   from web_sales left outer join web_returns on
  (ws_item_sk = wr_item_sk and ws_order_number = wr_order_number),
  date_dim,
  web_site,
  item,
  promotion
  where ws_sold_date_sk = d_date_sk
and d_date between cast('1998-08-04' as date)
   and (cast('1998-09-04' as date))
 and ws_web_site_sk = web_site_sk
and ws_item_sk = i_item_sk
and i_current_price > 50
and ws_promo_sk = p_promo_sk
and p_channel_tv = 'N'
 group by web_site_id)
   select  channel
 , id
 , sum(sales) as sales
 , sum(returns) as returns
 , sum(profit) as profit
  from 
  (select 'store channel' as channel
 , concat('store', store_id) as id
 , sales
 , returns
 , profit
  from   ssr
  union all
  select 'catalog channel' as channel
 , concat('catalog_page', catalog_page_id) as id
 , sales
 , returns
 , profit
  from  csr
  union all
  select 'web channel' as channel
 , concat('web_site', web_site_id) as id
 , sales
 , returns
 , profit
  from   wsr
  ) x
  group by channel, id with rollup
  order by channel
  ,id
  limit 100
 {code}
 Exception 
 {code}
 Vertex failed, vertexName=Reducer 5, vertexId=vertex_1426707664723_1377_1_22, 
 diagnostics=[Task failed, taskId=task_1426707664723_1377_1_22_00, 
 diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running 
 task:java.lang.RuntimeException: java.lang.RuntimeException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
 processing vector batch (tag=0) 
 \N\N09.285817653506076E84.639990363237801E7-1.1814318134887291E8
 \N\N04.682909323885761E82.2415242712669864E7-5.966176123188091E7
 \N\N01.2847032699693155E96.300096113768728E7-5.94963316209578E8
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at 
 org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:330)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
   at 
 

[jira] [Commented] (HIVE-10746) Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by

2015-05-21 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555456#comment-14555456
 ] 

Greg Senia commented on HIVE-10746:
---

After an offline discussion with Gopal V, he determined the cause of this 
problem: starting in Hive 0.14, org.apache.hadoop.mapred.TextInputFormat 
uses whatever is defined in the property 
mapreduce.input.fileinputformat.split.minsize. In my case this was set to 
1... Unfortunately that is 1 byte, so it created 40040 splits, creating 40400 
reads of the single 3MB file...

Hope this helps someone else out.

The value should be around half of the HDFS block size; in my case 64MB, since 
my block size is 128MB:
mapreduce.input.fileinputformat.split.minsize=67108864


Gopal V, if no fix is coming, should we resolve/close this JIRA?
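
For anyone hitting the same thing, the arithmetic behind it, sketched with the 
split-size formula used by the old mapred FileInputFormat (the tiny goalSize 
here stands in for total input bytes divided by a large requested map count, 
and is an assumption for illustration):

{code}
class SplitMath {
    // mapred FileInputFormat.computeSplitSize: the minsize acts as a floor.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long fileSize = 3_883_880L;            // the single input file above
        long blockSize = 128L * 1024 * 1024;   // 128MB HDFS block
        long goalSize = 100;                   // tiny goal when many maps are requested

        long bad  = computeSplitSize(goalSize, 1L, blockSize);                // minsize=1
        long good = computeSplitSize(goalSize, 64L * 1024 * 1024, blockSize); // minsize=64MB

        System.out.println(fileSize / bad + " splits with minsize=1");        // tens of thousands
        System.out.println(Math.max(1, fileSize / good) + " split with minsize=64MB");
    }
}
{code}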

 Hive 0.14.x and Hive 1.2.0 w/ Tez 0.5.3/Tez 0.6.0 Slow group by/order by
 

 Key: HIVE-10746
 URL: https://issues.apache.org/jira/browse/HIVE-10746
 Project: Hive
  Issue Type: Bug
  Components: Hive, Tez
Affects Versions: 0.14.0, 0.14.1, 1.2.0, 1.1.0, 1.1.1
Reporter: Greg Senia
Priority: Critical
 Attachments: slow_query_output.zip


 The following query: SELECT appl_user_id, arsn_cd, COUNT(*) as RecordCount 
 FROM adw.crc_arsn GROUP BY appl_user_id,arsn_cd ORDER BY appl_user_id; runs 
 consistently fast in Spark and MapReduce on Hive 1.2.0. When attempting to 
 run this same query with Tez as the execution engine, it consistently runs 
 for 300-500 seconds, which seems extremely long. This is a basic external 
 table delimited by tabs, stored as a single file in a folder. In Hive 0.13 
 this query with Tez runs fast; I tested with Hive 0.14, 0.14.1/1.0.0 and now 
 Hive 1.2.0, and there clearly is something going awry with Hive w/Tez as an 
 execution engine on single- or small-file tables. I can attach further logs 
 if someone needs them for deeper analysis.
 HDFS Output:
 hadoop fs -ls /example_dw/crc/arsn
 Found 2 items
 -rwxr-x---   6 loaduser hadoopusers  0 2015-05-17 20:03 
 /example_dw/crc/arsn/_SUCCESS
 -rwxr-x---   6 loaduser hadoopusers3883880 2015-05-17 20:03 
 /example_dw/crc/arsn/part-m-0
 Hive Table Describe:
 hive> describe formatted crc_arsn;
 OK
 # col_name  data_type   comment 
  
 arsn_cd string  
 clmlvl_cd   string  
 arclss_cd   string  
 arclssg_cd  string  
 arsn_prcsr_rmk_ind  string  
 arsn_mbr_rspns_ind  string  
 savtyp_cd   string  
 arsn_eff_dt string  
 arsn_exp_dt string  
 arsn_pstd_dts   string  
 arsn_lstupd_dts string  
 arsn_updrsn_txt string  
 appl_user_idstring  
 arsntyp_cd  string  
 pre_d_indicator string  
 arsn_display_txtstring  
 arstat_cd   string  
 arsn_tracking_nostring  
 arsn_cstspcfc_ind   string  
 arsn_mstr_rcrd_ind  string  
 state_specific_ind  string  
 region_specific_in  string  
 arsn_dpndnt_cd  string  
 unit_adjustment_in  string  
 arsn_mbr_only_ind   string  
 arsn_qrmb_ind   string  
  
 # Detailed Table Information 
 Database:   adw  
 Owner:  loadu...@exa.example.com   
 CreateTime: Mon Apr 28 13:28:05 EDT 2014 
 LastAccessTime: UNKNOWN  
 Protect Mode:   None 
 Retention:  0
 Location:   hdfs://xhadnnm1p.example.com:8020/example_dw/crc/arsn 

 Table Type: EXTERNAL_TABLE   
 Table Parameters:
 EXTERNALTRUE
 

[jira] [Commented] (HIVE-10790) orc file sql excute fail

2015-05-21 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555485#comment-14555485
 ] 

xiaowei wang commented on HIVE-10790:
-

I think the ORC WriterImpl invokes a deprecated method of ViewFileSystem, 
getDefaultReplication().

 orc file sql excute fail 
 -

 Key: HIVE-10790
 URL: https://issues.apache.org/jira/browse/HIVE-10790
 Project: Hive
  Issue Type: Bug
  Components: API
Affects Versions: 0.13.0, 0.14.0
 Environment: Hadoop 2.5.0-cdh5.3.2 
 hive 0.14
Reporter: xiaowei wang
Assignee: xiaowei wang

 Inserting from a text table into an ORC table, like
 insert overwrite table custom.rank_less_orc_none 
 partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text 
 where logdate='2015051500';
 will throw an error: java.lang.RuntimeException: Hive Runtime Error 
 while closing operators
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
 getDefaultReplication on empty path is invalid
 at 
 org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
 at 
 org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
 at 
 org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
 at 
 org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
 ... 8 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem

2015-05-21 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10711:
---
Fix Version/s: 1.2.1

 Tez HashTableLoader attempts to allocate more memory than available when 
 HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
 --

 Key: HIVE-10711
 URL: https://issues.apache.org/jira/browse/HIVE-10711
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, 
 HIVE-10711.3.patch


 Tez HashTableLoader bases its memory allocation on 
 HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the 
 process max memory, this can result in the HashTableLoader trying to use 
 more memory than is available to the process.
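
 A sketch of the clamp the summary calls for (the 0.55 headroom fraction is an 
 assumption for illustration, not the committed value):

{code}
class HashTableMemoryPlanner {
    // Never size hash tables off the raw threshold alone; clamp it to what
    // this JVM can actually allocate.
    static long effectiveThreshold(long configuredNoConditionalTaskSize) {
        long processMax = Runtime.getRuntime().maxMemory();   // -Xmx
        long usable = (long) (processMax * 0.55);             // leave headroom
        return Math.min(configuredNoConditionalTaskSize, usable);
    }
}
{code}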



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem

2015-05-21 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10711:
---
Attachment: HIVE-10711.3.patch

Updated patch based on feedback.

 Tez HashTableLoader attempts to allocate more memory than available when 
 HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
 --

 Key: HIVE-10711
 URL: https://issues.apache.org/jira/browse/HIVE-10711
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, 
 HIVE-10711.3.patch


 Tez HashTableLoader bases its memory allocation on 
 HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the 
 process max memory, this can result in the HashTableLoader trying to use 
 more memory than is available to the process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10022) DFS in authorization might take too long

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-10022:

Fix Version/s: (was: 1.0.1)
   1.3.0

 DFS in authorization might take too long
 

 Key: HIVE-10022
 URL: https://issues.apache.org/jira/browse/HIVE-10022
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Affects Versions: 0.14.0
Reporter: Pankit Thapar
Assignee: Pankit Thapar
 Fix For: 1.3.0

 Attachments: HIVE-10022.2.patch, HIVE-10022.patch


 I am testing a query like:
 set hive.test.authz.sstd.hs2.mode=true;
 set 
 hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
 set 
 hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
 set hive.security.authorization.enabled=true;
 set user.name=user1;
 create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
 location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
 Now, in the above query, since authorization is true, we end up calling 
 doAuthorizationV2(), which ultimately calls 
 SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
 FileUtils.isActionPermittedForFileHierarchy(), with the object we are trying 
 to authorize (or, if the object does not exist, its ancestor).
 The logic in FileUtils.isActionPermittedForFileHierarchy() is a DFS.
 Now assume we have a path a/b/c/d that we are trying to authorize.
 If a/b/c/d does not exist, we would call 
 FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/, assuming a/b/c 
 also does not exist.
 If the subtree under a/b has millions of files, then 
 FileUtils.isActionPermittedForFileHierarchy() is going to check the file 
 permission on each of those objects.
 I do not completely understand why we have to check file permissions on all 
 the objects in a branch of the tree that we are not trying to read from or 
 write to.
 We could check the file permission on the ancestor that exists and, if it 
 matches what we expect, return true.
 Please confirm whether this is a bug so that I can submit a patch, or else 
 let me know what I am missing.
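
 The suggested alternative, sketched against the plain Hadoop FileSystem API: 
 walk up to the nearest existing ancestor and check its permission once, 
 instead of walking down the whole subtree. Only the owner case is shown; 
 group/other handling is elided.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

class AncestorPermCheck {
    // "/" always exists, so the walk terminates.
    static Path nearestExistingAncestor(FileSystem fs, Path p) throws IOException {
        Path cur = p;
        while (cur != null && !fs.exists(cur)) {
            cur = cur.getParent();
        }
        return cur;
    }

    // One getFileStatus + one permission check, however big the subtree is.
    static boolean ownerActionPermitted(FileSystem fs, Path p, FsAction action)
            throws IOException {
        FileStatus st = fs.getFileStatus(nearestExistingAncestor(fs, p));
        return st.getPermission().getUserAction().implies(action);
    }
}
{code}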



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-05-21 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10704:
---
Assignee: Mostafa Mokhtar  (was: Jason Dere)

 Errors in Tez HashTableLoader when estimated table size is 0
 

 Key: HIVE-10704
 URL: https://issues.apache.org/jira/browse/HIVE-10704
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch


 Couple of issues:
 - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
 tables, the largest-small-table selection is wrong and could select the large 
 table (which results in an NPE).
 - The memory estimates can either divide by zero or allocate 0 memory if the 
 table size is 0. Try to come up with a sensible default for this.
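
 A sketch of guards for both points (the 1MB fallback is an illustrative 
 default, not the committed one):

{code}
class SmallTableSizing {
    static final long DEFAULT_SIZE = 1L << 20;   // fallback when stats report 0

    static long sanitized(long estimatedSize) {
        return estimatedSize <= 0 ? DEFAULT_SIZE : estimatedSize;
    }

    // Pick the biggest small table, never the streamed big table, even when
    // every estimate is 0.
    static int biggestSmallTable(long[] sizes, int bigTablePos) {
        int best = -1;
        long bestSize = -1;
        for (int i = 0; i < sizes.length; i++) {
            if (i == bigTablePos) {
                continue;   // never pick the large table
            }
            long s = sanitized(sizes[i]);
            if (s > bestSize) {
                bestSize = s;
                best = i;
            }
        }
        return best;
    }
}
{code}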



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-05-21 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10704:
---
Attachment: HIVE-10704.3.patch

Rebase patch on latest. 

 Errors in Tez HashTableLoader when estimated table size is 0
 

 Key: HIVE-10704
 URL: https://issues.apache.org/jira/browse/HIVE-10704
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
 HIVE-10704.3.patch


 Couple of issues:
 - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
 tables, the largest-small-table selection is wrong and could select the large 
 table (which results in an NPE).
 - The memory estimates can either divide by zero or allocate 0 memory if the 
 table size is 0. Try to come up with a sensible default for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9683) Hive metastore thrift client connections hang indefinitely

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-9683:
---
Fix Version/s: (was: 1.0.1)
   1.0.0

 Hive metastore thrift client connections hang indefinitely
 --

 Key: HIVE-9683
 URL: https://issues.apache.org/jira/browse/HIVE-9683
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.0.0, 1.0.1
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: 1.0.0

 Attachments: HIVE-9683.1.patch


 THRIFT-2788 fixed network-partition problems that affect Thrift client 
 connections.
 Since hive-1.0 is on thrift-0.9.0 which is affected by the bug, a workaround 
 can be applied to prevent indefinite connection hangs during net-splits.
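
 A sketch of what such a client-side workaround could look like (host, port, 
 and timeout are made up; thrift 0.9.x TSocket API): give the socket a finite 
 read timeout so a network partition surfaces as an exception instead of a 
 permanent hang.
 {code}
 import org.apache.thrift.transport.TSocket;
 import org.apache.thrift.transport.TTransportException;

 public class MetastoreSocketTimeout {
   public static TSocket openWithTimeout() throws TTransportException {
     // the third argument is the socket timeout in milliseconds
     TSocket socket = new TSocket("metastore-host", 9083, 60 * 1000);
     socket.open();  // fails fast instead of hanging indefinitely
     return socket;
   }
 }
 {code}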



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-21 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1425#comment-1425
 ] 

Pengcheng Xiong commented on HIVE-10677:


[~ashutoshc] and [~jpullokkaran], the test failure is unrelated and I think the 
patch is ready to go. Thanks.

 hive.exec.parallel=true has problem when it is used for analyze table column 
 stats
 --

 Key: HIVE-10677
 URL: https://issues.apache.org/jira/browse/HIVE-10677
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-10677.01.patch, HIVE-10677.02.patch


 To reproduce it in q tests:
 {code}
 hive set hive.exec.parallel;
 hive.exec.parallel=true
 hive analyze table src compute statistics for columns;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
 Caused by: java.io.IOException: java.lang.InterruptedException
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
   at org.apache.hadoop.util.Shell.run(Shell.java:455)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
   at 
 org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
   ... 7 more
 hive Job Submission failed with exception 'java.lang.RuntimeException(Error 
 caching map.xml: java.io.IOException: java.lang.InterruptedException)'
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10240) Patch HIVE-9473 breaks KERBEROS

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-10240:

Fix Version/s: (was: 1.0.1)

 Patch HIVE-9473 breaks KERBEROS
 ---

 Key: HIVE-10240
 URL: https://issues.apache.org/jira/browse/HIVE-10240
 Project: Hive
  Issue Type: Bug
  Components: Authentication, HiveServer2
Affects Versions: 1.0.0
Reporter: Olaf Flebbe
Assignee: Vaibhav Gumashta

 The patch from HIVE-9473 introduces a regression. HiveServer2 no longer starts 
 properly with our config (more or less the Bigtop environment):
 SQL std auth enabled, doAs disabled, Tez enabled, Kerberos enabled.
 The problem seems to be that the Kerberos ticket is not present when HiveServer2 
 first tries to access HDFS. With HIVE-9473 reverted, getting the ticket is 
 one of the first things HiveServer2 does.
 Posting the startup of vanilla hive-1.0.0, and the startup of a hive-1.0.0 with 
 the commit below reverted, where HiveServer2 starts correctly.
 {code}
 commit 35582c2065a6b90b003a656bdb3b0ff08b0c35b9
 Author: Thejas Nair the...@apache.org
 Date:   Fri Jan 30 00:05:50 2015 +
 HIVE-9473 : sql std auth should disallow built-in udfs that allow any 
 java methods to be called (Thejas Nair, reviewed by Jason Dere)
 
 git-svn-id: 
 https://svn.apache.org/repos/asf/hive/branches/branch-1.0@1655891 
 13f79535-47bb-0310-9956-ffa450edef68
 {code}
 Startup of vanilla hive-1.0.0 hive-server2 
 {code}
 STARTUP_MSG:   build = 
 git://os2-debian80/net/os2-debian80/fs1/olaf/bigtop/output/hive/hive-1.0.0 -r 
 813996292c9f966109f990127ddd5673cf813125; compiled by 'olaf' on Tue Apr 7 
 09:33:01 CEST 2015
 /
 2015-04-07 10:23:52,579 INFO  [main]: server.HiveServer2 
 (HiveServer2.java:startHiveServer2(292)) - Starting HiveServer2
 2015-04-07 10:23:53,104 INFO  [main]: metastore.HiveMetaStore 
 (HiveMetaStore.java:newRawStore(556)) - 0: Opening raw store with 
 implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 2015-04-07 10:23:53,135 INFO  [main]: metastore.ObjectStore 
 (ObjectStore.java:initialize(264)) - ObjectStore, initialize called
 2015-04-07 10:23:54,775 INFO  [main]: metastore.ObjectStore 
 (ObjectStore.java:getPMF(345)) - Setting MetaStore object pin classes with 
 hive.metastore.cache.pinobjtypes=Table,StorageDescriptor,SerDeInfo,Pa
 rtition,Database,Type,FieldSchema,Order
 2015-04-07 10:23:56,953 INFO  [main]: metastore.MetaStoreDirectSql 
 (MetaStoreDirectSql.java:init(132)) - Using direct SQL, underlying DB is 
 DERBY
 2015-04-07 10:23:56,954 INFO  [main]: metastore.ObjectStore 
 (ObjectStore.java:setConf(247)) - Initialized ObjectStore
 2015-04-07 10:23:57,275 INFO  [main]: metastore.HiveMetaStore 
 (HiveMetaStore.java:createDefaultRoles_core(630)) - Added admin role in 
 metastore
 2015-04-07 10:23:57,276 INFO  [main]: metastore.HiveMetaStore 
 (HiveMetaStore.java:createDefaultRoles_core(639)) - Added public role in 
 metastore
 2015-04-07 10:23:58,241 WARN  [main]: ipc.Client (Client.java:run(675)) - 
 Exception encountered while connecting to the server : 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]
 2015-04-07 10:23:58,248 WARN  [main]: ipc.Client (Client.java:run(675)) - 
 Exception encountered while connecting to the server : 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]
 2015-04-07 10:23:58,249 INFO  [main]: retry.RetryInvocationHandler 
 (RetryInvocationHandler.java:invoke(140)) - Exception while invoking 
 getFileInfo of class ClientNamenodeProtocolTranslatorPB over 
 node2.proto.bsi.de/192.168.100.22:8020 after 1 fail over attempts. Trying to 
 fail over immediately.
 java.io.IOException: Failed on local exception: java.io.IOException: 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]; Host Details : local host is: 
 node2.proto.bsi.de/192.168.100.22; destination host is: 
 node2.proto.bsi.de:8020; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
 at org.apache.hadoop.ipc.Client.call(Client.java:1472)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
 at 

[jira] [Updated] (HIVE-10794) Remove the dependence from ErrorMsg to HiveUtils

2015-05-21 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-10794:
-
Attachment: HIVE-10794.patch

 Remove the dependence from ErrorMsg to HiveUtils
 

 Key: HIVE-10794
 URL: https://issues.apache.org/jira/browse/HIVE-10794
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley
 Attachments: HIVE-10794.patch


 HiveUtils has a large set of dependencies and ErrorMsg only needs the new 
 line constant. Breaking the dependence will reduce the dependency set from 
 ErrorMsg significantly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10658) Insert with values clause may expose data that should be encrypted

2015-05-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1478#comment-1478
 ] 

Hive QA commented on HIVE-10658:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734645/HIVE-10658.5.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8969 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3995/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3995/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3995/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12734645 - PreCommit-HIVE-TRUNK-Build

 Insert with values clause may expose data that should be encrypted
 --

 Key: HIVE-10658
 URL: https://issues.apache.org/jira/browse/HIVE-10658
 Project: Hive
  Issue Type: Sub-task
  Components: Security
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-10658.2.patch, HIVE-10658.3.patch, 
 HIVE-10658.4.patch, HIVE-10658.5.patch


 The Insert into T values() operation uses a temporary table.
 The data in temp tables is stored under hive.exec.scratchdir, which is not 
 usually encrypted.  This is a similar issue to using the scratchdir for staging 
 query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10781) HadoopJobExecHelper Leaks RunningJobs

2015-05-21 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1404#comment-1404
 ] 

Nemon Lou commented on HIVE-10781:
--

I think Utilities.clearWork(job) should also be put into the try block in 
ExecDriver.java.
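
In other words, something of this shape (illustrative only, not the actual 
ExecDriver code):
{code}
import java.io.IOException;
import org.apache.hadoop.hive.ql.exec.Utilities;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class ClearWorkInFinally {
  void runJob(JobClient jc, JobConf job) throws IOException {
    RunningJob rj = null;
    try {
      rj = jc.submitJob(job);
      // ... monitor rj until completion ...
    } finally {
      Utilities.clearWork(job);  // runs on success and on exception alike
    }
  }
}
{code}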

 HadoopJobExecHelper Leaks RunningJobs
 -

 Key: HIVE-10781
 URL: https://issues.apache.org/jira/browse/HIVE-10781
 Project: Hive
  Issue Type: Bug
  Components: Hive, HiveServer2
Affects Versions: 0.13.1, 1.2.0
Reporter: Nemon Lou
Assignee: Chinna Rao Lalam
 Attachments: HIVE-10781.patch


 On one of our busy Hadoop clusters, HiveServer2 holds more than 4000 
 org.apache.hadoop.mapred.JobClient$NetworkedJob instances, while it has fewer 
 than 3 background handler threads at the same time.
 All these instances are held in one LinkedList, the static runningJobs field 
 of org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.
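
 A hedged sketch of one possible fix shape (field and method names mirror the 
 description, not an actual patch): drop each job from the static list as soon 
 as it finishes, so the list stays bounded by the number of in-flight jobs.
 {code}
 import java.util.Collections;
 import java.util.LinkedList;
 import java.util.List;
 import org.apache.hadoop.mapred.RunningJob;

 public class BoundedRunningJobs {
   public static final List<RunningJob> runningJobs =
       Collections.synchronizedList(new LinkedList<RunningJob>());

   void track(RunningJob rj) {
     runningJobs.add(rj);
     try {
       // ... progress monitoring / kill handling ...
     } finally {
       runningJobs.remove(rj);  // release the reference once the job is done
     }
   }
 }
 {code}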



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-05-21 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10704:
---
Fix Version/s: 1.2.1

 Errors in Tez HashTableLoader when estimated table size is 0
 

 Key: HIVE-10704
 URL: https://issues.apache.org/jira/browse/HIVE-10704
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Jason Dere
 Fix For: 1.2.1

 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch


 Couple of issues:
 - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
 tables, the largest small table selection is wrong and could select the large 
 table (which results in NPE)
 - The memory estimates can either divide-by-zero, or allocate 0 memory if the 
 table size is 0. Try to come up with a sensible default for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem

2015-05-21 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10711:
---
Assignee: Mostafa Mokhtar  (was: Jason Dere)

 Tez HashTableLoader attempts to allocate more memory than available when 
 HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
 --

 Key: HIVE-10711
 URL: https://issues.apache.org/jira/browse/HIVE-10711
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch


 Tez HashTableLoader bases its memory allocation on 
 HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the 
 process max memory, the HashTableLoader can try to use more memory than is 
 available to the process.
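
 A minimal illustration of the clamp this implies (the 0.8 fraction is 
 arbitrary; this is not the committed patch):
 {code}
 // hconf: the job Configuration in scope
 long noCondTaskSize = HiveConf.getLongVar(
     hconf, HiveConf.ConfVars.HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD);
 long processMax = Runtime.getRuntime().maxMemory();
 // never size the hash table above what the JVM can actually allocate
 long usableSize = Math.min(noCondTaskSize, (long) (processMax * 0.8));
 {code}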



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10677) hive.exec.parallel=true has problem when it is used for analyze table column stats

2015-05-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1419#comment-1419
 ] 

Hive QA commented on HIVE-10677:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734637/HIVE-10677.02.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 8968 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3994/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3994/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3994/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12734637 - PreCommit-HIVE-TRUNK-Build

 hive.exec.parallel=true has problem when it is used for analyze table column 
 stats
 --

 Key: HIVE-10677
 URL: https://issues.apache.org/jira/browse/HIVE-10677
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-10677.01.patch, HIVE-10677.02.patch


 To reproduce it in q tests:
 {code}
 hive set hive.exec.parallel;
 hive.exec.parallel=true
 hive analyze table src compute statistics for columns;
 FAILED: Execution Error, return code 1 from 
 org.apache.hadoop.hive.ql.exec.ColumnStatsTask
 java.lang.RuntimeException: Error caching map.xml: java.io.IOException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:747)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:682)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setMapRedWork(Utilities.java:674)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:375)
   at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)
 Caused by: java.io.IOException: java.lang.InterruptedException
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:541)
   at org.apache.hadoop.util.Shell.run(Shell.java:455)
   at 
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
   at org.apache.hadoop.util.Shell.execCommand(Shell.java:774)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
   at 
 org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:472)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:460)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:426)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773)
   at 
 org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:715)
   ... 7 more
 hive Job Submission failed with exception 'java.lang.RuntimeException(Error 
 caching map.xml: java.io.IOException: java.lang.InterruptedException)'
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9509) Restore partition spec validation removed by HIVE-9445

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-9509:
---
Fix Version/s: (was: 1.1.1)
   (was: 1.0.1)

 Restore partition spec validation removed by HIVE-9445
 --

 Key: HIVE-9509
 URL: https://issues.apache.org/jira/browse/HIVE-9509
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 1.2.0

 Attachments: HIVE-9509.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9831) HiveServer2 should use ConcurrentHashMap in ThreadFactory

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-9831:
---
Fix Version/s: (was: 1.1.1)
   (was: 1.0.1)

 HiveServer2 should use ConcurrentHashMap in ThreadFactory
 -

 Key: HIVE-9831
 URL: https://issues.apache.org/jira/browse/HIVE-9831
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta
 Fix For: 1.2.0

 Attachments: HIVE-9831.1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9593) ORC Reader should ignore unknown metadata streams

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-9593:
---
Fix Version/s: (was: 1.0.1)

 ORC Reader should ignore unknown metadata streams 
 --

 Key: HIVE-9593
 URL: https://issues.apache.org/jira/browse/HIVE-9593
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.11.0, 0.12.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0
Reporter: Gopal V
Assignee: Owen O'Malley
 Fix For: 1.1.0

 Attachments: HIVE-9593.no-autogen.patch, hive-9593.patch


 ORC readers should ignore metadata streams which are non-essential additions 
 to the main data streams.
 This will include additional indices, histograms or anything we add as an 
 optional stream.
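
 A sketch of the convention being asked for (stream kinds follow ORC's 
 protobuf definition; readStream/skipStream are hypothetical helpers):
 {code}
 switch (stream.getKind()) {
   case PRESENT:
   case DATA:
   case LENGTH:
     readStream(stream);  // essential streams the reader understands
     break;
   default:
     // unknown optional metadata (extra indexes, histograms, ...):
     // skip over its bytes instead of failing on an unrecognized kind
     skipStream(stream.getLength());
 }
 {code}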



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10622) Hive doc error: 'from' is a keyword, when use it as a column name throw error.

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-10622:

Fix Version/s: (was: 1.1.1)

 Hive doc error: 'from' is a keyword, when use it as a column name throw error.
 --

 Key: HIVE-10622
 URL: https://issues.apache.org/jira/browse/HIVE-10622
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.1.1
Reporter: Anne Yu

 Per https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML: using 
 from as a column name in CREATE TABLE throws an error.
 {code}
 CREATE TABLE pageviews (userid VARCHAR(64), link STRING, from STRING)
   PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS 
 STORED AS ORC;
 Error: Error while compiling statement: FAILED: ParseException line 1:57 
 cannot recognize input near 'from' 'STRING' ')' in column specification 
 (state=42000,code=4)
 {code}
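
 For what it's worth, the usual workaround (not a doc fix) is to quote the 
 reserved word in backticks so the parser accepts it as a column name:
 {code}
 CREATE TABLE pageviews (userid VARCHAR(64), link STRING, `from` STRING)
   PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS
   STORED AS ORC;
 {code}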



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9445) Revert HIVE-5700 - enforce single date format for partition column storage

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-9445:
---
Fix Version/s: (was: 1.0.1)

 Revert HIVE-5700 - enforce single date format for partition column storage
 --

 Key: HIVE-9445
 URL: https://issues.apache.org/jira/browse/HIVE-9445
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0, 0.14.1
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Blocker
 Fix For: 1.1.0

 Attachments: HIVE-9445.1.patch, HIVE-9445.1.patch


 HIVE-5700 has the following issues:
 * HIVE-8730 - fails mysql upgrades
 * Does not upgrade all metadata, e.g. {{PARTITIONS.PART_NAME}}. See comments 
 in HIVE-5700.
 * Completely corrupts postgres, see below.
 With a postgres metastore on 0.12, I executed the following:
 {noformat}
 CREATE TABLE HIVE5700_DATE_PARTED (line string) PARTITIONED BY (ddate date);
 CREATE TABLE HIVE5700_STRING_PARTED (line string) PARTITIONED BY (ddate 
 string);
 ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='NOT_DATE');
 ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='20150121');
 ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='20150122');
 ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='2015-01-23');
 ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='NOT_DATE');
 ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='20150121');
 ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='20150122');
 ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='2015-01-23');
 LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE 
 HIVE5700_DATE_PARTED PARTITION (ddate='NOT_DATE');
 LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE 
 HIVE5700_DATE_PARTED PARTITION (ddate='20150121');
 LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE 
 HIVE5700_DATE_PARTED PARTITION (ddate='20150122');
 LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE 
 HIVE5700_DATE_PARTED PARTITION (ddate='2015-01-23');
 LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE 
 HIVE5700_STRING_PARTED PARTITION (ddate='NOT_DATE');
 LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE 
 HIVE5700_STRING_PARTED PARTITION (ddate='20150121');
 LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE 
 HIVE5700_STRING_PARTED PARTITION (ddate='20150122');
 LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE 
 HIVE5700_STRING_PARTED PARTITION (ddate='2015-01-23');
 hive show partitions HIVE5700_DATE_PARTED;  
 OK
 ddate=20150121
 ddate=20150122
 ddate=2015-01-23
 ddate=NOT_DATE
 Time taken: 0.052 seconds, Fetched: 4 row(s)
 hive show partitions HIVE5700_STRING_PARTED;
 OK
 ddate=20150121
 ddate=20150122
 ddate=2015-01-23
 ddate=NOT_DATE
 Time taken: 0.051 seconds, Fetched: 4 row(s)
 {noformat}
 I then took a dump of the database named {{postgres-pre-upgrade.sql}} and the 
 data in the dump looks good:
 {noformat}
 [root@hive5700-1-1 ~]# egrep -A9 '^COPY PARTITIONS|^COPY 
 PARTITION_KEY_VALS' postgres-pre-upgrade.sql 
 COPY PARTITIONS (PART_ID, CREATE_TIME, LAST_ACCESS_TIME, PART_NAME, 
 SD_ID, TBL_ID) FROM stdin;
 3 1421943647  0   ddate=NOT_DATE  6   2
 4 1421943647  0   ddate=20150121  7   2
 5 1421943648  0   ddate=20150122  8   2
 6 1421943664  0   ddate=NOT_DATE  9   3
 7 1421943664  0   ddate=20150121  10  3
 8 1421943665  0   ddate=20150122  11  3
 9 1421943694  0   ddate=2015-01-2312  2
 101421943695  0   ddate=2015-01-2313  3
 \.
 --
 COPY PARTITION_KEY_VALS (PART_ID, PART_KEY_VAL, INTEGER_IDX) FROM 
 stdin;
 3 NOT_DATE0
 4 201501210
 5 201501220
 6 NOT_DATE0
 7 201501210
 8 201501220
 9 2015-01-23  0
 102015-01-23  0
 \.
 {noformat}
 I then upgraded to 0.13 and subsequently upgraded the MS with the following 
 command: {{schematool -dbType postgres -upgradeSchema -verbose}}
 The file {{postgres-post-upgrade.sql}} is the post-upgrade db dump. As you 
 can see the data is completely corrupt.
 {noformat}
 [root@hive5700-1-1 ~]# egrep -A9 '^COPY PARTITIONS|^COPY 
 PARTITION_KEY_VALS' postgres-post-upgrade.sql 
 COPY PARTITIONS (PART_ID, CREATE_TIME, LAST_ACCESS_TIME, PART_NAME, 
 SD_ID, TBL_ID) FROM stdin;
 3 1421943647  0   ddate=NOT_DATE  6   2
 4 1421943647  0   ddate=20150121  7   2
 5 1421943648  0   ddate=20150122  8   2
 6 1421943664  0   ddate=NOT_DATE  9   3
 7 1421943664  0   ddate=20150121  10  3
 8 1421943665  0   ddate=20150122  11  

[jira] [Commented] (HIVE-10790) orc file sql excute fail

2015-05-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1429#comment-1429
 ] 

Gopal V commented on HIVE-10790:


[~wisgood]: is this with NameNode HA? If so, can you put up the change as a 
.patch file?

 orc file sql excute fail 
 -

 Key: HIVE-10790
 URL: https://issues.apache.org/jira/browse/HIVE-10790
 Project: Hive
  Issue Type: Bug
  Components: API
Affects Versions: 0.13.0, 0.14.0
 Environment: Hadoop 2.5.0-cdh5.3.2 
 hive 0.14
Reporter: xiaowei wang
Assignee: xiaowei wang

 Inserting from a text table into an ORC table, like: 
 insert overwrite table custom.rank_less_orc_none 
 partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text 
 where logdate='2015051500';
 throws an error: Error: java.lang.RuntimeException: Hive Runtime Error 
 while closing operators
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
 at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
 getDefaultReplication on empty path is invalid
 at 
 org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
 at 
 org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
 at 
 org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
 at 
 org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
 at 
 org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
 ... 8 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases

2015-05-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1445#comment-1445
 ] 

Gopal V commented on HIVE-10792:


Are you sure this is related to ORC at all?

{code}
create temporary table test_txt (c0 int, c1 int) stored as textfile;
insert into test_txt values (0, 1);
select * from test_txt t1 union all select * from test_txt t2 where t2.c0 = 1;
{code}

returns the same result.

 PPD leads to wrong answer when mapper scans the same table with multiple 
 aliases
 

 Key: HIVE-10792
 URL: https://issues.apache.org/jira/browse/HIVE-10792
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0
Reporter: Dayue Gao
Assignee: Dayue Gao
Priority: Critical

 Here are the steps to reproduce the bug.
 First of all, prepare a simple ORC table with one row
 {code}
 create table test_orc (c0 int, c1 int) stored as ORC;
 {code}
 Table: test_orc
 ||c0||c1||
 |0|1|
 The following SQL gets an empty result, which is not expected
 {code}
 select * from test_orc t1
 union all
 select * from test_orc t2
 where t2.c0 = 1
 {code}
 Self join is also broken
 {code}
 set hive.auto.convert.join=false; -- force common join
 select * from test_orc t1
 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0);
 {code}
 It gets an empty result while the expected answer is
 ||t1.c0||t1.c1||t2.c0||t2.c1||
 |0|1|NULL|NULL|
 In these cases, we push down predicates into OrcInputFormat. As a result, 
 TableScanOperator for t1 can't receive its rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases

2015-05-21 Thread Dayue Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1444#comment-1444
 ] 

Dayue Gao commented on HIVE-10792:
--

I think in HiveInputFormat#pushProjectionsAndFilters, _pushFilters_ shouldn't 
be called if there is more than one alias. Please correct me if I'm wrong.
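
The guard would look roughly like this (illustrative shape only; whether it is 
the right fix is exactly the open question):
{code}
// only push the filter when a single alias maps to this input path;
// otherwise a pushed predicate drops rows for every alias sharing the split
if (aliases.size() == 1) {
  pushFilters(jobConf, tableScan);
}
// with multiple aliases, let each TableScanOperator filter its own rows
{code}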

 PPD leads to wrong answer when mapper scans the same table with multiple 
 aliases
 

 Key: HIVE-10792
 URL: https://issues.apache.org/jira/browse/HIVE-10792
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0
Reporter: Dayue Gao
Assignee: Dayue Gao
Priority: Critical

 Here are the steps to reproduce the bug.
 First of all, prepare a simple ORC table with one row
 {code}
 create table test_orc (c0 int, c1 int) stored as ORC;
 {code}
 Table: test_orc
 ||c0||c1||
 |0|1|
 The following SQL gets an empty result, which is not expected
 {code}
 select * from test_orc t1
 union all
 select * from test_orc t2
 where t2.c0 = 1
 {code}
 Self join is also broken
 {code}
 set hive.auto.convert.join=false; -- force common join
 select * from test_orc t1
 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0);
 {code}
 It gets an empty result while the expected answer is
 ||t1.c0||t1.c1||t2.c0||t2.c1||
 |0|1|NULL|NULL|
 In these cases, we push down predicates into OrcInputFormat. As a result, 
 TableScanOperator for t1 can't receive its rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases

2015-05-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1460#comment-1460
 ] 

Gopal V commented on HIVE-10792:


And the output seems to be correct, because no row has {{c0 = 1}}?

 PPD leads to wrong answer when mapper scans the same table with multiple 
 aliases
 

 Key: HIVE-10792
 URL: https://issues.apache.org/jira/browse/HIVE-10792
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0
Reporter: Dayue Gao
Assignee: Dayue Gao
Priority: Critical

 Here are the steps to reproduce the bug.
 First of all, prepare a simple ORC table with one row
 {code}
 create table test_orc (c0 int, c1 int) stored as ORC;
 {code}
 Table: test_orc
 ||c0||c1||
 |0|1|
 The following SQL gets an empty result, which is not expected
 {code}
 select * from test_orc t1
 union all
 select * from test_orc t2
 where t2.c0 = 1
 {code}
 Self join is also broken
 {code}
 set hive.auto.convert.join=false; -- force common join
 select * from test_orc t1
 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0);
 {code}
 It gets an empty result while the expected answer is
 ||t1.c0||t1.c1||t2.c0||t2.c1||
 |0|1|NULL|NULL|
 In these cases, we push down predicates into OrcInputFormat. As a result, 
 TableScanOperator for t1 can't receive its rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10792) PPD leads to wrong answer when mapper scans the same table with multiple aliases

2015-05-21 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-10792:
---
Attachment: HIVE-10792.test.sql

 PPD leads to wrong answer when mapper scans the same table with multiple 
 aliases
 

 Key: HIVE-10792
 URL: https://issues.apache.org/jira/browse/HIVE-10792
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.2.0, 1.1.0
Reporter: Dayue Gao
Assignee: Dayue Gao
Priority: Critical
 Attachments: HIVE-10792.test.sql


 Here are the steps to reproduce the bug.
 First of all, prepare a simple ORC table with one row
 {code}
 create table test_orc (c0 int, c1 int) stored as ORC;
 {code}
 Table: test_orc
 ||c0||c1||
 |0|1|
 The following SQL gets an empty result, which is not expected
 {code}
 select * from test_orc t1
 union all
 select * from test_orc t2
 where t2.c0 = 1
 {code}
 Self join is also broken
 {code}
 set hive.auto.convert.join=false; -- force common join
 select * from test_orc t1
 left outer join test_orc t2 on (t1.c0=t2.c0 and t2.c1=0);
 {code}
 It gets an empty result while the expected answer is
 ||t1.c0||t1.c1||t2.c0||t2.c1||
 |0|1|NULL|NULL|
 In these cases, we push down predicates into OrcInputFormat. As a result, 
 TableScanOperator for t1 can't receive its rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10794) Remove the dependence from ErrorMsg to HiveUtils

2015-05-21 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1470#comment-1470
 ] 

Owen O'Malley commented on HIVE-10794:
--

The recursive set of dependencies for HiveUtils is 16901 classes, and the next 
largest one is HiveConf$ConfVars at 281, so removing the HiveUtils dependency 
will shrink the set dramatically.

{code}
  Class org.apache.hadoop.hive.ql.ErrorMsg (16901, 1)
Forward:
  org.antlr.runtime.tree.Tree (7, 2)
  org.apache.hadoop.hive.ql.metadata.HiveUtils (16901, 2)
  org.antlr.runtime.Token (4, 2)
  org.apache.hadoop.hive.conf.HiveConf$ConfVars (281, 1)
  org.apache.hadoop.hive.ql.parse.ASTNode (10, 2)
  org.apache.hadoop.hive.ql.parse.ASTNodeOrigin (10, 2)
{code}
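
Assuming the newline constant really is the only thing used (as the description 
says), the edge could be broken by inlining it; a sketch (the exact field name 
in ErrorMsg may differ):
{code}
// before: reaching into HiveUtils pulls in its 16901-class closure
// private static final String LINE_SEP = HiveUtils.LINE_SEP;
// after: no HiveUtils import needed
private static final String LINE_SEP = System.getProperty("line.separator");
{code}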

 Remove the dependence from ErrorMsg to HiveUtils
 

 Key: HIVE-10794
 URL: https://issues.apache.org/jira/browse/HIVE-10794
 Project: Hive
  Issue Type: Sub-task
Reporter: Owen O'Malley

 HiveUtils has a large set of dependencies and ErrorMsg only needs the new 
 line constant. Breaking the dependence will reduce the dependency set from 
 ErrorMsg significantly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10711) Tez HashTableLoader attempts to allocate more memory than available when HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem

2015-05-21 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1485#comment-1485
 ] 

Alexander Pivovarov commented on HIVE-10711:


1. Should we put the following code inside the if block where 
hashtableMemoryUsage is actually used?
{code}
+float hashtableMemoryUsage = HiveConf.getFloatVar(
+hconf, HiveConf.ConfVars.HIVEHASHTABLEFOLLOWBYGBYMAXMEMORYUSAGE);
{code}
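
I.e. something like this (sketch; the surrounding condition name is invented):
{code}
if (isFollowedByGroupBy) {  // hypothetical guard where the value is used
  float hashtableMemoryUsage = HiveConf.getFloatVar(
      hconf, HiveConf.ConfVars.HIVEHASHTABLEFOLLOWBYGBYMAXMEMORYUSAGE);
  // ... use hashtableMemoryUsage ...
}
{code}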


 Tez HashTableLoader attempts to allocate more memory than available when 
 HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD exceeds process max mem
 --

 Key: HIVE-10711
 URL: https://issues.apache.org/jira/browse/HIVE-10711
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10711.1.patch, HIVE-10711.2.patch, 
 HIVE-10711.3.patch


 Tez HashTableLoader bases its memory allocation on 
 HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD. If this value is larger than the 
 process max memory, the HashTableLoader can try to use more memory than is 
 available to the process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10658) Insert with values clause may expose data that should be encrypted

2015-05-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1497#comment-1497
 ] 

Eugene Koifman commented on HIVE-10658:
---

The 2 failed tests failed to init the metastore DB properly.  It's not related 
to the changes in this patch.  The same error in TestStreaming test cases can 
be seen in other runs.

[~spena] or [~alangates], could you review please?

 Insert with values clause may expose data that should be encrypted
 --

 Key: HIVE-10658
 URL: https://issues.apache.org/jira/browse/HIVE-10658
 Project: Hive
  Issue Type: Sub-task
  Components: Security
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-10658.2.patch, HIVE-10658.3.patch, 
 HIVE-10658.4.patch, HIVE-10658.5.patch


 The Insert into T values() operation uses a temporary table.
 The data in temp tables is stored under hive.exec.scratchdir, which is not 
 usually encrypted.  This is a similar issue to using the scratchdir for staging 
 query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10704) Errors in Tez HashTableLoader when estimated table size is 0

2015-05-21 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1498#comment-1498
 ] 

Alexander Pivovarov commented on HIVE-10704:


Can you update RB?
{code}
$ rbt post -g yes -u
{code}

 Errors in Tez HashTableLoader when estimated table size is 0
 

 Key: HIVE-10704
 URL: https://issues.apache.org/jira/browse/HIVE-10704
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10704.1.patch, HIVE-10704.2.patch, 
 HIVE-10704.3.patch


 Couple of issues:
 - If the table sizes in MapJoinOperator.getParentDataSizes() are 0 for all 
 tables, the largest small table selection is wrong and could select the large 
 table (which results in NPE)
 - The memory estimates can either divide-by-zero, or allocate 0 memory if the 
 table size is 0. Try to come up with a sensible default for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10793) Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront

2015-05-21 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10793:
---
Attachment: HIVE-10793.1.patch

 Hybrid Hybrid Grace Hash Join : Don't allocate all hash table memory upfront
 

 Key: HIVE-10793
 URL: https://issues.apache.org/jira/browse/HIVE-10793
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.2.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 1.2.1

 Attachments: HIVE-10793.1.patch


 HybridHashTableContainer will allocate memory based on the estimate, which means 
 that if the actual size is less than the estimate, the allocated memory won't be 
 used.
 The number of partitions is calculated based on the estimated data size:
 {code}
 numPartitions = calcNumPartitions(memoryThreshold, estimatedTableSize, 
 minNumParts, minWbSize,
   nwayConf);
 {code}
 Then based on number of partitions writeBufferSize is set
 {code}
 writeBufferSize = (int)(estimatedTableSize / numPartitions);
 {code}
 Each hash partition will allocate one WriteBuffer, with no further allocation 
 if the estimated data size is correct.
 The suggested solution is to reduce writeBufferSize by a factor such that only X% 
 of the memory is preallocated.
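
 A sketch of that suggestion (PREALLOC_FRACTION is an invented name standing in 
 for the X% above):
 {code}
 private static final float PREALLOC_FRACTION = 0.5f;  // illustrative value

 // preallocate only a fraction of the estimate per partition, keeping the
 // existing minimum write-buffer size as a floor
 writeBufferSize = Math.max(
     (int) ((estimatedTableSize / numPartitions) * PREALLOC_FRACTION),
     minWbSize);
 {code}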



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]

2015-05-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554895#comment-14554895
 ] 

Hive QA commented on HIVE-9152:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734602/HIVE-9152.9-spark.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8725 tests 
executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucket6.q-scriptfile1_win.q-quotedid_smb.q-and-1-more - did 
not produce a TEST-*.xml file
TestMinimrCliDriver-bucketizedhiveinputformat.q-empty_dir_in_table.q - did not 
produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-infer_bucket_sort_map_operators.q-load_hdfs_file_with_space_in_the_name.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-import_exported_table.q-truncate_column_buckets.q-bucket_num_reducers2.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-infer_bucket_sort_num_buckets.q-parallel_orderby.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-join1.q-infer_bucket_sort_bucketed_table.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-bucket5.q-infer_bucket_sort_merge.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-input16_cc.q-temp_table_external.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-bucket_num_reducers.q-scriptfile1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx_cbo_2.q-bucketmapjoin6.q-bucket4.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-reduce_deduplicate.q-infer_bucket_sort_dyn_part.q-udf_using.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-uber_reduce.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
 - did not produce a TEST-*.xml file
TestMinimrCliDriver-stats_counter_partitioned.q-external_table_with_space_in_location_path.q-disable_merge_for_bucketing.q-and-1-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_spark_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_spark_dynamic_partition_pruning_2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/864/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/864/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-864/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12734602 - PreCommit-HIVE-SPARK-Build

 Dynamic Partition Pruning [Spark Branch]
 

 Key: HIVE-9152
 URL: https://issues.apache.org/jira/browse/HIVE-9152
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Chao Sun
 Attachments: HIVE-9152.1-spark.patch, HIVE-9152.2-spark.patch, 
 HIVE-9152.3-spark.patch, HIVE-9152.4-spark.patch, HIVE-9152.5-spark.patch, 
 HIVE-9152.6-spark.patch, HIVE-9152.8-spark.patch, HIVE-9152.9-spark.patch


 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10778) LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2

2015-05-21 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554896#comment-14554896
 ] 

Thejas M Nair commented on HIVE-10778:
--

[~sershe]
1. Yes, SessionState.get().isHiveServerQuery() is a good way to check if it's in 
HS2.
2. Compilation threads do get re-used across queries. Something that shouldn't 
be re-used across sessions should ideally be part of objects tied to the Driver 
object's lifetime.
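
Given 1 and 2, the thread-local shape under discussion would be roughly this 
(sketch only; the committed fix may scope the map differently):
{code}
private static final ThreadLocal<Map<Path, BaseWork>> gWorkMap =
    new ThreadLocal<Map<Path, BaseWork>>() {
      @Override
      protected Map<Path, BaseWork> initialValue() {
        return new HashMap<Path, BaseWork>();
      }
    };
// and anything session-scoped should instead be released when the owning
// Driver is closed, rather than living in a static map
{code}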


 LLAP: Utilities::gWorkMap needs thread-locals for HiveServer2
 -

 Key: HIVE-10778
 URL: https://issues.apache.org/jira/browse/HIVE-10778
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Affects Versions: llap
Reporter: Gopal V
Assignee: Sergey Shelukhin
 Fix For: llap

 Attachments: HIVE-10778.patch, llap-hs2-heap.png


 95% of heap is occupied by the Utilities::gWorkMap in the llap branch HS2.
 !llap-hs2-heap.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10658) Insert with values clause may expose data that should be encrypted

2015-05-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554946#comment-14554946
 ] 

Hive QA commented on HIVE-10658:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734585/HIVE-10658.4.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 8969 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_encryption_insert_values
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3989/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3989/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3989/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12734585 - PreCommit-HIVE-TRUNK-Build

 Insert with values clause may expose data that should be encrypted
 --

 Key: HIVE-10658
 URL: https://issues.apache.org/jira/browse/HIVE-10658
 Project: Hive
  Issue Type: Sub-task
  Components: Security
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-10658.2.patch, HIVE-10658.3.patch, 
 HIVE-10658.4.patch


 The Insert into T values() operation uses a temporary table.
 The data in temp tables is stored under hive.exec.scratchdir, which is not 
 usually encrypted.  This is a similar issue to using the scratchdir for staging 
 query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10427) collect_list() and collect_set() should accept struct types as argument

2015-05-21 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554767#comment-14554767
 ] 

Alexander Pivovarov commented on HIVE-10427:


Should we open a separate JIRA for adding non-primitive array sort functionality 
to sort_array?
{code}
modified:   
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSortArray.java
deleted:ql/src/test/queries/clientnegative/udf_sort_array_wrong3.q
modified:   
ql/src/test/results/clientnegative/udf_sort_array_wrong2.q.out
{code}
Can you also use the common GenericUDF methods in GenericUDFSortArray, if 
possible? I put one suggestion on RB.

 collect_list() and collect_set() should accept struct types as argument
 ---

 Key: HIVE-10427
 URL: https://issues.apache.org/jira/browse/HIVE-10427
 Project: Hive
  Issue Type: Wish
  Components: UDF
Reporter: Alexander Behm
Assignee: Chao Sun
 Attachments: HIVE-10427.1.patch, HIVE-10427.2.patch, 
 HIVE-10427.3.patch


 The collect_list() and collect_set() functions currently only accept scalar 
 argument types. It would be very useful if these functions could also accept 
 struct argument types for creating nested data from flat data.
 For example, suppose I wanted to create a nested customers/orders table from 
 two flat tables, customers and orders. Then it'd be very convenient to write 
 something like this:
 {code}
 insert into table nested_customers_orders
 select c.*, collect_list(named_struct('oid', o.oid, 'order_date', o.date, ...))
 from customers c inner join orders o on (c.cid = o.oid)
 group by c.cid
 {code}
 Thanks you for your consideration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10786) Propagate Histograms in Calcite/Physical Optimizer

2015-05-21 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10786:
--
Summary: Propagate Histograms in Calcite/Physical Optimizer  (was: 
Propagate range for column stats)

 Propagate Histograms in Calcite/Physical Optimizer
 --

 Key: HIVE-10786
 URL: https://issues.apache.org/jira/browse/HIVE-10786
 Project: Hive
  Issue Type: Bug
Reporter: Jesus Camacho Rodriguez
Assignee: Pengcheng Xiong

 For column stats, Calcite doesn't propagate the range. The range of a column 
 will help us in deciding filter cardinality for inequalities.
 The range of values of a column and its NDV together will help us build 
 histograms of uniform height.
 This needs special handling for each operator:
 - Inner Join where col is part of the join key: range is the intersection of 
 the lhs and rhs ranges
 - Outer Join: range of the outer side if col is from the outer side
 - Filter inequality on a literal (x < 10): range is restricted on the upper 
 side by the literal value
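
 A worked illustration of the filter rule, with invented numbers:
 {noformat}
 stats:   column x has range [0, 100], NDV = 50, 1000 rows
 filter:  x < 10
 result:  range narrows to [0, 10); under a uniform-height assumption the
          selectivity is (10 - 0) / (100 - 0) = 0.1, i.e. ~100 rows survive
 {noformat}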



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10684) Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary jar files

2015-05-21 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554545#comment-14554545
 ] 

Ferdinand Xu commented on HIVE-10684:
-

Hi [~sushanth], do you have some cycles to review this JIRA? Thank you!

 Fix the unit test failures for HIVE-7553 after HIVE-10674 removed the binary 
 jar files
 --

 Key: HIVE-10684
 URL: https://issues.apache.org/jira/browse/HIVE-10684
 Project: Hive
  Issue Type: Bug
  Components: Tests
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-10684.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9922) Compile hive failed

2015-05-21 Thread Rudd Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554542#comment-14554542
 ] 

Rudd Chen commented on HIVE-9922:
-

I'm facing the same problem when compiling Hive 1.1.0 on Mac.

I found the jar file but cannot download it:
http://repo.spring.io/libs-release/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/
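
One workaround, assuming you can fetch the jar manually (e.g. from the Spring 
repo above, matching the version the build asks for), is to install it into the 
local Maven repository yourself:
{code}
mvn install:install-file \
  -DgroupId=org.pentaho -DartifactId=pentaho-aggdesigner-algorithm \
  -Dversion=5.1.3-jhyde -Dpackaging=jar \
  -Dfile=pentaho-aggdesigner-algorithm-5.1.3-jhyde.jar
{code}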


 Compile hive failed
 ---

 Key: HIVE-9922
 URL: https://issues.apache.org/jira/browse/HIVE-9922
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 1.0.0
 Environment: red hat linux6.3
Reporter: dqpylf
 Attachments: log-hive1.0, log-hive1.1


 Hi,
 Compiling Hive failed; please refer to the following information:
 [INFO] 
 
 [INFO] Reactor Summary:
 [INFO] 
 [INFO] Hive ... SUCCESS [ 31.673 
 s]
 [INFO] Hive Shims Common .. SUCCESS [ 20.184 
 s]
 [INFO] Hive Shims 0.20  SUCCESS [ 10.680 
 s]
 [INFO] Hive Shims Secure Common ... SUCCESS [ 14.380 
 s]
 [INFO] Hive Shims 0.20S ... SUCCESS [  5.792 
 s]
 [INFO] Hive Shims 0.23  SUCCESS [ 25.961 
 s]
 [INFO] Hive Shims . SUCCESS [  1.550 
 s]
 [INFO] Hive Common  SUCCESS [ 30.775 
 s]
 [INFO] Hive Serde . SUCCESS [01:21 
 min]
 [INFO] Hive Metastore . SUCCESS [02:39 
 min]
 [INFO] Hive Ant Utilities . SUCCESS [  4.433 
 s]
 [INFO] Hive Query Language  FAILURE [04:51 
 min]
 [INFO] Hive Service ... SKIPPED
 [INFO] Hive Accumulo Handler .. SKIPPED
 [INFO] Hive JDBC .. SKIPPED
 [INFO] Hive Beeline ... SKIPPED
 [INFO] Hive CLI ... SKIPPED
 [INFO] Hive Contrib ... SKIPPED
 [INFO] Hive HBase Handler . SKIPPED
 [INFO] Hive HCatalog .. SKIPPED
 [INFO] Hive HCatalog Core . SKIPPED
 [INFO] Hive HCatalog Pig Adapter .. SKIPPED
 [INFO] Hive HCatalog Server Extensions  SKIPPED
 [INFO] Hive HCatalog Webhcat Java Client .. SKIPPED
 [INFO] Hive HCatalog Webhcat .. SKIPPED
 [INFO] Hive HCatalog Streaming  SKIPPED
 [INFO] Hive HWI ... SKIPPED
 [INFO] Hive ODBC .. SKIPPED
 [INFO] Hive Shims Aggregator .. SKIPPED
 [INFO] Hive TestUtils . SKIPPED
 [INFO] Hive Packaging . SKIPPED
 [INFO] 
 
 [INFO] BUILD FAILURE
 [INFO] 
 
 [INFO] Total time: 11:26 min
 [INFO] Finished at: 2015-03-10T22:51:30-07:00
 [INFO] Final Memory: 72M/451M
 [INFO] 
 
 [WARNING] The requested profile "dist" could not be activated because it 
 does not exist.
 [ERROR] Failed to execute goal on project hive-exec: Could not resolve 
 dependencies for project org.apache.hive:hive-exec:jar:1.0.0: The following 
 artifacts could not be resolved: 
 org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.3-jhyde, 
 eigenbase:eigenbase-properties:jar:1.1.4, net.hydromatic:linq4j:jar:0.4, 
 net.hydromatic:quidem:jar:0.1.1: Could not find artifact 
 org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.3-jhyde in nexus-osc 
 (http://maven.oschina.net/content/groups/public/) - [Help 1]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10749) Implement Insert ACID statement for parquet

2015-05-21 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554637#comment-14554637
 ] 

Alan Gates commented on HIVE-10749:
---

Looks good to me, other than the one question I had on streaming ingest.  You 
may also want to get a review from [~owen.omalley] since he did most of the ORC 
work for this and thus understands the file format pieces more completely than 
I do.

 Implement Insert ACID statement for parquet
 ---

 Key: HIVE-10749
 URL: https://issues.apache.org/jira/browse/HIVE-10749
 Project: Hive
  Issue Type: Sub-task
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu
 Attachments: HIVE-10749.1.patch, HIVE-10749.1.patch, HIVE-10749.patch


 We need to implement the insert statement for the Parquet format, as was done 
 for ORC.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme

2015-05-21 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-9716:
---
Component/s: Query Processor

 Map job fails when table's LOCATION does not have scheme
 

 Key: HIVE-9716
 URL: https://issues.apache.org/jira/browse/HIVE-9716
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor
 Fix For: 1.2.0

 Attachments: HIVE-9716.1.patch


 When a table's location (the value of the 'LOCATION' column in the SDS table 
 in the metastore) does not have a scheme, the map job fails. For example, 
 running select count ( * ) from t1 produces the following exception:
 {noformat}
 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
 job_local2120192529_0001
 java.lang.Exception: java.lang.RuntimeException: 
 java.lang.IllegalStateException: Invalid input path 
 file:/user/hive/warehouse/t1/data
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
 Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: 
 Invalid input path file:/user/hive/warehouse/t1/data
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.IllegalStateException: Invalid input path 
 file:/user/hive/warehouse/t1/data
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
   ... 9 more
 {noformat}
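 The trace suggests the scheme-less location fails to match the fully 
 qualified input split path in MapOperator. A minimal sketch of the usual 
 remedy, qualifying the raw location before comparison (the Hadoop Path and 
 FileSystem calls are standard; the wrapper class is hypothetical, not the 
 attached patch):
 {code}
 import java.io.IOException;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class PathQualifier {
   // Qualify a possibly scheme-less metastore location such as
   // "/user/hive/warehouse/t1" so it compares equal to fully qualified
   // input paths like "hdfs://nn:8020/user/hive/warehouse/t1".
   public static Path qualify(String location, Configuration conf)
       throws IOException {
     Path raw = new Path(location);
     FileSystem fs = raw.getFileSystem(conf);
     return fs.makeQualified(raw); // fills in scheme and authority
   }
 }
 {code}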



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9152) Dynamic Partition Pruning [Spark Branch]

2015-05-21 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-9152:
---
Attachment: (was: HIVE-9152.10-spark.patch)

 Dynamic Partition Pruning [Spark Branch]
 

 Key: HIVE-9152
 URL: https://issues.apache.org/jira/browse/HIVE-9152
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Brock Noland
Assignee: Chao Sun
 Attachments: HIVE-9152.1-spark.patch, HIVE-9152.2-spark.patch, 
 HIVE-9152.3-spark.patch, HIVE-9152.4-spark.patch, HIVE-9152.5-spark.patch, 
 HIVE-9152.6-spark.patch, HIVE-9152.8-spark.patch, HIVE-9152.9-spark.patch


 Tez implemented dynamic partition pruning in HIVE-7826. This is a nice 
 optimization and we should implement the same in HOS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10453) HS2 leaking open file descriptors when using UDFs

2015-05-21 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-10453:

Component/s: UDF

 HS2 leaking open file descriptors when using UDFs
 -

 Key: HIVE-10453
 URL: https://issues.apache.org/jira/browse/HIVE-10453
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Fix For: 1.3.0

 Attachments: HIVE-10453.1.patch, HIVE-10453.2.patch


 1. create a custom function by
 CREATE FUNCTION myfunc AS 'someudfclass' using jar 'hdfs:///tmp/myudf.jar';
 2. Create a simple jdbc client, just do 
 connect, 
 run simple query which using the function such as:
 select myfunc(col1) from sometable
 3. Disconnect.
 Check open file for HiveServer2 by:
 lsof -p HSProcID | grep myudf.jar
 You will see the leak as:
 {noformat}
 java  28718 ychen  txt  REG1,4741 212977666 
 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
 java  28718 ychen  330r REG1,4741 212977666 
 /private/var/folders/6p/7_njf13d6h144wldzbbsfpz8gp/T/1bfe3de0-ac63-4eba-a725-6a9840f1f8d5_resources/myudf.jar
 {noformat}
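 A common remedy for this class of leak, sketched below, is to close the 
 session-level classloader when the session shuts down so the JVM releases 
 the jar handles; URLClassLoader has implemented Closeable since Java 7 (the 
 hook shown is hypothetical, not the attached patch):
 {code}
 import java.io.IOException;
 import java.net.URLClassLoader;

 public class SessionCleanup {
   // On session close, close the URLClassLoader created for
   // "CREATE FUNCTION ... USING JAR" so the JVM releases the open
   // .jar file descriptors.
   public static void releaseUdfJars(ClassLoader sessionLoader) {
     if (sessionLoader instanceof URLClassLoader) {
       try {
         ((URLClassLoader) sessionLoader).close();
       } catch (IOException e) {
         // Best effort: log and continue shutting the session down.
         System.err.println("Failed to close session classloader: " + e);
       }
     }
   }
 }
 {code}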



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8872) Hive view of HBase range scan intermittently returns incorrect data.

2015-05-21 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-8872:
---
Component/s: HBase Handler

 Hive view of HBase range scan intermittently returns incorrect data.
 

 Key: HIVE-8872
 URL: https://issues.apache.org/jira/browse/HIVE-8872
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.13.1
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Fix For: 1.1.0

 Attachments: HIVE-8872.1.patch, HIVE-8872.2.patch


 This needs to run on a cluster:
 1. Create a Hive external table pointing to an HBase table.
 2. Create views over the Hive table (for example, 30 views); each view looks 
 like the following, with a different range check:
 CREATE VIEW hview_nn AS SELECT * FROM hivehbasetable WHERE (pk >= 'pk_nn_0' 
 AND pk <= 'pk_nn_A')
 3. Create the same number of new Hive tables as views.
 4. Then run several queries in parallel (30 threads; see the sketch after 
 these steps):
 INSERT OVERWRITE TABLE hivenewtable_nn SELECT * FROM hview_nn   //nn is from 
 01 to 30
 5. After the inserts, check the new tables; some values are not right.
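 A sketch of step 4 as a standalone driver (the JDBC URL is a placeholder; 
 the table and view names follow the steps above):
 {code}
 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.Statement;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;

 public class ParallelViewInsert {
   public static void main(String[] args) throws Exception {
     Class.forName("org.apache.hive.jdbc.HiveDriver");
     ExecutorService pool = Executors.newFixedThreadPool(30);
     for (int i = 1; i <= 30; i++) {
       final String nn = String.format("%02d", i);
       pool.submit(() -> {
         // One connection per thread, each inserting through its own view.
         try (Connection conn = DriverManager
                  .getConnection("jdbc:hive2://host:10000/default");
              Statement stmt = conn.createStatement()) {
           stmt.execute("INSERT OVERWRITE TABLE hivenewtable_" + nn
               + " SELECT * FROM hview_" + nn);
         } catch (Exception e) {
           e.printStackTrace();
         }
       });
     }
     pool.shutdown();
   }
 }
 {code}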



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10658) Insert with values clause may expose data that should be encrypted

2015-05-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554647#comment-14554647
 ] 

Eugene Koifman commented on HIVE-10658:
---

I didn't notice isPathEncrypted - I will make use of it.

It cannot be read-only, as you are doing an insert into a table in that zone.

getStrongestEncryptedTablePath() doesn't quite work, as it applies after the 
plan is resolved, and I need to handle the temp table well before that happens.
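
For reference, a minimal sketch of an isPathEncrypted-style check built on 
HDFS's HdfsAdmin (HdfsAdmin and getEncryptionZoneForPath are real hadoop-hdfs 
APIs; the wrapper class is hypothetical, not the patch under review):
{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class EncryptionZoneCheck {
  // True if the path sits inside an HDFS encryption zone. A check like
  // this lets the planner reject (or relocate) a scratchdir that would
  // leak plaintext for an encrypted target table.
  public static boolean isPathEncrypted(URI nameNode, Configuration conf,
      Path path) throws IOException {
    HdfsAdmin admin = new HdfsAdmin(nameNode, conf);
    return admin.getEncryptionZoneForPath(path) != null;
  }
}
{code}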

 Insert with values clause may expose data that should be encrypted
 --

 Key: HIVE-10658
 URL: https://issues.apache.org/jira/browse/HIVE-10658
 Project: Hive
  Issue Type: Sub-task
  Components: Security
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-10658.2.patch, HIVE-10658.3.patch


 The Insert into T values() operation uses a temporary table.
 The data in temp tables is stored under hive.exec.scratchdir, which is not 
 usually encrypted. This is a similar issue to using the scratchdir for 
 staging query results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8448) Union All might not work due to the type conversion issue

2015-05-21 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-8448:
---
Priority: Major  (was: Minor)

 Union All might not work due to the type conversion issue
 -

 Key: HIVE-8448
 URL: https://issues.apache.org/jira/browse/HIVE-8448
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Chaoyu Tang
Assignee: Yongzhi Chen
 Fix For: 1.1.0

 Attachments: HIVE-8448.4.patch


 create table t1 (val date);
 insert overwrite table t1 select '2014-10-10' from src limit 1;
 create table t2 (val varchar(10));
 insert overwrite table t2 select '2014-10-10' from src limit 1; 
 ==
 Query:
 select t.val from
 (select val from t1
 union all
 select val from t1
 union all
 select val from t2
 union all
 select val from t1) t;
 ==
 Will throw exception: 
 {code}
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Incompatible 
 types for union operator
   at 
 org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:86)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:464)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:420)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:464)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:420)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:443)
   at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:133)
   ... 22 more
 {code}
 It was because at the query parse step getCommonClassForUnionAll is used, 
 but at execution getCommonClass is used; they are not applied consistently 
 for union. The latter does not support the implicit conversion from date to 
 string, which is the cause of the problem.
 The change to fix this particular union issue might be simple, but I noticed 
 that there are three versions of getCommonClass (getCommonClass, 
 getCommonClassForComparison, getCommonClassForUnionAll) and wonder if they 
 need to be cleaned up and refactored.
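 To make the mismatch concrete, a small sketch using the resolvers named 
 above (FunctionRegistry and TypeInfoFactory are real Hive classes; exact 
 return values depend on the Hive version):
 {code}
 import org.apache.hadoop.hive.ql.exec.FunctionRegistry;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
 import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;

 public class UnionTypeMismatch {
   public static void main(String[] args) {
     TypeInfo date = TypeInfoFactory.dateTypeInfo;
     TypeInfo varchar10 = TypeInfoFactory.getVarcharTypeInfo(10);

     // Parse time is permissive: date and varchar(10) resolve to a
     // common type.
     TypeInfo atParse = FunctionRegistry.getCommonClassForUnionAll(date,
         varchar10);

     // Execution time (UnionOperator) uses the stricter resolver, which
     // does not implicitly convert date to string; this is what surfaces
     // as "Incompatible types for union operator".
     TypeInfo atExec = FunctionRegistry.getCommonClass(date, varchar10);

     System.out.println("parse-time common type: " + atParse);
     System.out.println("exec-time common type:  " + atExec);
   }
 }
 {code}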



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10709) Update Avro version to 1.7.7

2015-05-21 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-10709:

Fix Version/s: 1.3.0

 Update Avro version to 1.7.7
 

 Key: HIVE-10709
 URL: https://issues.apache.org/jira/browse/HIVE-10709
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Swarnim Kulkarni
Assignee: Swarnim Kulkarni
  Labels: Avro
 Fix For: 1.3.0

 Attachments: HIVE-10709.1.patch, HIVE-10709.2.patch, 
 HIVE-10709.2.patch, HIVE-10790.3.patch


 We should update the Avro version to 1.7.7 to consume some of the nicer 
 compatibility features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10771) separatorChar has no effect in CREATE TABLE AS SELECT statement

2015-05-21 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554594#comment-14554594
 ] 

Yongzhi Chen commented on HIVE-10771:
-

Thank you [~xuefuz] for reviewing it. 

 separatorChar has no effect in CREATE TABLE AS SELECT statement
 ---

 Key: HIVE-10771
 URL: https://issues.apache.org/jira/browse/HIVE-10771
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Attachments: HIVE-10771.1.patch


 To replicate:
 CREATE TABLE separator_test 
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
 WITH SERDEPROPERTIES ("separatorChar" = "|", "quoteChar" = "\"", 
 "escapeChar" = "\\") 
 STORED AS TEXTFILE
 AS
 SELECT * FROM sample_07;
 Then hadoop fs -cat /user/hive/warehouse/separator_test/*
 53-3032,Truck drivers, heavy and tractor-trailer,1693590,37560
 53-3033,Truck drivers, light or delivery services,922900,28820
 53-3041,Taxi drivers and chauffeurs,165590,22740
 The separator is still ',', not '|' as specified.
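 Until CTAS propagates SERDEPROPERTIES, the usual workaround is to declare 
 the table (and its serde properties) explicitly and then populate it; a 
 sketch via JDBC (the URL and the sample_07 column names are assumptions):
 {code}
 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.Statement;

 public class CtasSerdeWorkaround {
   public static void main(String[] args) throws Exception {
     Class.forName("org.apache.hive.jdbc.HiveDriver");
     try (Connection conn = DriverManager
              .getConnection("jdbc:hive2://host:10000/default");
          Statement stmt = conn.createStatement()) {
       // Declare the table and its SERDEPROPERTIES explicitly first ...
       stmt.execute("CREATE TABLE separator_test (code string,"
           + " description string, total_emp int, salary int)"
           + " ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'"
           + " WITH SERDEPROPERTIES (\"separatorChar\" = \"|\")"
           + " STORED AS TEXTFILE");
       // ... then populate it, so the serde settings are honored on write.
       stmt.execute("INSERT OVERWRITE TABLE separator_test"
           + " SELECT * FROM sample_07");
     }
   }
 }
 {code}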



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10784) Beeline requires new line (EOL) at the end of an Hive SQL script (NullPointerException)

2015-05-21 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554596#comment-14554596
 ] 

Chaoyu Tang commented on HIVE-10784:


I wonder whether the code changes from HIVE-9877 and HIVE-10541 might have 
already addressed the issue you observed.

 Beeline requires new line (EOL) at the end of an Hive SQL script 
 (NullPointerException)
 ---

 Key: HIVE-10784
 URL: https://issues.apache.org/jira/browse/HIVE-10784
 Project: Hive
  Issue Type: Bug
  Components: Beeline, CLI
Affects Versions: 0.13.1
 Environment: Linux 2.6.32 (Red Hat 4.4.7)
Reporter: Andrey Dmitriev
Assignee: Chinna Rao Lalam
Priority: Minor
 Attachments: HIVE-10784.patch


 The Beeline tool requires a new line at the end of a Hive/Impala SQL script; 
 otherwise the last statement will not be executed, or a NullPointerException 
 will be thrown.
 # If a statement ends without an end of line AND the semicolon is on the same 
 line, then the statement will be ignored; i.e.
 {code}select * from TABLE;EOF{code} will *not* be executed
 # If a statement ends without an end of line BUT the semicolon is on the next 
 line, then the statement will be executed, but 
 {color:red};java.lang.NullPointerException{color} will be thrown; i.e.
 {code}select * from TABLE
 ;EOF{code} will be executed, but print 
 {color:red};java.lang.NullPointerException{color}
 # If a statement ends with an end of line, regardless of where the semicolon 
 is, then the statement will be executed; i.e.
 {code}select * from TABLE;
 EOLEOF{code}
 or
 {code}select * from TABLE
 ;EOLEOF{code}
 will be executed
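 Until a fix lands, a client-side guard is to ensure the script ends with a 
 newline before submitting it; a minimal sketch (the helper name is 
 hypothetical):
 {code}
 import java.io.IOException;
 import java.nio.charset.StandardCharsets;
 import java.nio.file.Files;
 import java.nio.file.Paths;

 public class ScriptLoader {
   // Guarantee the script ends with a newline before handing it to
   // Beeline/the driver, so the last statement is neither ignored nor
   // triggering the NullPointerException described above.
   public static String readScript(String file) throws IOException {
     String sql = new String(Files.readAllBytes(Paths.get(file)),
         StandardCharsets.UTF_8);
     return sql.endsWith("\n") ? sql : sql + "\n";
   }
 }
 {code}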



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10658) Insert with values clause may expose data that should be encrypted

2015-05-21 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554603#comment-14554603
 ] 

Hive QA commented on HIVE-10658:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12734470/HIVE-10658.3.patch

{color:red}ERROR:{color} -1 due to 69 failed/errored test(s), 5434 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestCompareCliDriver.initializationError
org.apache.hadoop.hive.cli.TestContribCliDriver.initializationError
org.apache.hadoop.hive.cli.TestContribNegativeCliDriver.initializationError
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_dynamic
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver_encryption_insert_partition_static
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_bulk
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_snapshot
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_join
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_timestamp
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_timestamp_format
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_generatehfiles_require_family_path
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMinimrCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeCliDriver.initializationError
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.initializationError
org.apache.hadoop.hive.cli.TestSparkCliDriver.initializationError
org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_ambiguous_join_col
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_duplicate_alias
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_garbage
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_insert_wrong_number_columns
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_create_table
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_dot
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_function_param2
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_index
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_select
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_macro_reserved_word
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_missing_overwrite
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_nonkey_groupby
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_quoted_string
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unknown_column1

[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-05-21 Thread Elliot West (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554625#comment-14554625
 ] 

Elliot West commented on HIVE-10165:


h3. Current status
I've had to change tack with this recently as I found that what I had built 
upon the existing API was not actually suited to the ETL merge use cases. 
Consider that the existing API is focused on the task of continuously writing 
small batches of new data and making that data available in Hive rapidly. 
Conversely, my use case is focused on infrequently writing large batches of 
changes that should only be available in Hive as a single batch or not at all. 
I've tried to summarise the differences:

h3. Use case comparison
||Attribute||Streaming case (current API)||Merge case (proposed API)||
|Ingest type|Data arrives continuously|Merges are performed periodically and 
the deltas are applied in a single batch|
|Transaction scope|Transactions are created for small batches of writes|The 
entire delta should be applied within a single transaction|
|Data availability|Surfaces new data to users frequently and quickly|Change 
sets should be applied atomically, either the effect of the delta is visible or 
it is not.|
|Sensitive to record order|No, records do not have pre-existing {{lastTxnIds}} 
or {{bucketIds}}. Records are likely being written into a single partition 
(today's date for example)|Yes, all mutated records have existing 
{{RecordIdentifiers}} and must be grouped by ({{partitionValues}}, 
{{bucketId}}) and sorted by {{lastTxnId}}. These record coordinates initially 
arrive in an effectively random order.|
|Impact of a write failure|Transaction can be aborted and the producer can 
choose to resubmit failed records, as ordering is not important.|Ingest for the 
respective group must be halted and failed records resubmitted to preserve the 
sequence.|
|User perception of missing data|Data has not arrived yet → latency?|This 
data is inconsistent, some records have been updated, but other related records 
have not - consider here the classic transfer between bank accounts scenario|
|API end point scope|A given {{HiveEndPoint}} instance submits many 
transactions to a specific bucket, in a specific partition, of a specific 
table|An API is required that writes changes to an unknown set of buckets, in 
an unknown set of partitions, of a specific table (but perhaps more than one), 
within a single transaction.|

I think this table highlights two key points:
# A merge is not that useful if it cannot be atomic (i.e. the entire delta is 
applied in a single transaction).
# The current streaming API is based on the premise that {{partitionValues}} 
and {{bucketIds}} are known before ingestion and so the whole stack can be 
constructed with these as constants. Transactions are a small scale concern 
(small batches of writes) and therefore are not available to coordinate larger 
sets of operations across partitions and buckets.

h3. Proposal
In summary, I do not believe that the current API can or should be bent to 
handle the merge case as I think it is a different animal. Instead I propose an 
alternate API where the transaction is the highest-level construct. It presents 
two core collaborators: a client ({{MutatorClient}}) that manages a 
long-running transaction, and workers ({{MutatorCoordinator}}s) that coordinate 
updates within the transaction via managed {{OrcRecordUpdater}} instances. The 
mutation workload can be scaled horizontally by partitioning records by 
({{partitionValues}}, {{bucketId}}) across a number of workers:
{panel}
{code}
// CLIENT/TOOL END
//
// Create a client to manage our transaction - a singleton instance in the
// job client
MutatorClient client = ...; // knows how to get a transaction and manage a
                            // Hive lock

// Get the transaction
Transaction transaction = client.newTransaction();
transaction.begin();

// CLUSTER / WORKER END
//
// A job submitted to the cluster
// The job partitions the data by (partitionValues, ROW__ID.bucketId)
// and orders the groups by (ROW__ID.lastTransactionId)

// One of these sits at the output of each of the job's tasks
MutatorCoordinator coordinator = ...; // knows how to read bucketIds, write
                                      // records, and create OrcRecordUpdaters

coordinator.insert(partitionValues1, record1);
coordinator.update(partitionValues2, record2);
coordinator.delete(partitionValues3, record3);
// millions of operations

coordinator.close();

// CLIENT/TOOL END
//
// The tasks have completed, control is back at the tool

transaction.commit();

client.close();
{code}
{panel}
h3. Relation to the current streaming API
I believe that there is some potential for reuse by factoring out common 
implementation code blocks into independent classes. I also believe this would 
improve the current API implementation by 

[jira] [Updated] (HIVE-10098) HS2 local task for map join fails in KMS encrypted cluster

2015-05-21 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-10098:

Component/s: Security

 HS2 local task for map join fails in KMS encrypted cluster
 --

 Key: HIVE-10098
 URL: https://issues.apache.org/jira/browse/HIVE-10098
 Project: Hive
  Issue Type: Bug
  Components: Security
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
 Fix For: 1.2.0

 Attachments: HIVE-10098.1.patch, HIVE-10098.2.patch


 Env: KMS was enabled after the cluster was Kerberos-secured. 
 Problem: Any Hive query via Beeline that performs a MapJoin fails with a 
 java.lang.reflect.UndeclaredThrowableException from 
 KMSClientProvider.addDelegationTokens.
 {code}
 2015-03-18 08:49:17,948 INFO [main]: Configuration.deprecation 
 (Configuration.java:warnOnceIfDeprecated(1022)) - mapred.input.dir is 
 deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 
 2015-03-18 08:49:19,048 WARN [main]: security.UserGroupInformation 
 (UserGroupInformation.java:doAs(1645)) - PriviledgedActionException as:hive 
 (auth:KERBEROS) 
 cause:org.apache.hadoop.security.authentication.client.AuthenticationException:
  GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt) 
 2015-03-18 08:49:19,050 ERROR [main]: mr.MapredLocalTask 
 (MapredLocalTask.java:executeFromChildJVM(314)) - Hive Runtime Error: Map 
 local work failed 
 java.io.IOException: java.io.IOException: 
 java.lang.reflect.UndeclaredThrowableException 
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:634)
  
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:363)
  
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:337)
  
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:303)
 at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:735) 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
 at java.lang.reflect.Method.invoke(Method.java:606) 
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212) 
 Caused by: java.io.IOException: 
 java.lang.reflect.UndeclaredThrowableException 
 at 
 org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:826)
  
 at 
 org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
  
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2017)
  
 at 
 org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
  
 at 
 org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
  
 at 
 org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
  
 at 
 org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205) 
 at 
 org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) 
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:413)
  
 at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:559)
  
 ... 9 more 
 Caused by: java.lang.reflect.UndeclaredThrowableException 
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1655)
  
 at 
 org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:808)
  
 ... 18 more 
 Caused by: 
 org.apache.hadoop.security.authentication.client.AuthenticationException: 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt) 
 at 
 org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:306)
  
 at 
 org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:196)
  
 at 
 org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:127)
 {code}
 To make sure a map join happens, the test needs a small table joined with a 
 large one, for example:
 {code}
 CREATE TABLE if not exists jsmall (code string, des string, t int, s int) ROW 
 FORMAT DELIMITED FIELDS TERMINATED BY '\t';
 CREATE TABLE if not exists jbig1 (code string, des string, t int, s int) ROW 
 FORMAT DELIMITED FIELDS TERMINATED BY '\t';
 load data local inpath '/tmp/jdata' into table jsmall;
 load data local inpath '/tmp/jdata' into table 
