[jira] [Updated] (HIVE-10994) Hive.moveFile should not fail on a no-op move

2015-06-15 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-10994:

Fix Version/s: 2.0.0

> Hive.moveFile should not fail on a no-op move
> -
>
> Key: HIVE-10994
> URL: https://issues.apache.org/jira/browse/HIVE-10994
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.2.1, 2.0.0
>
> Attachments: HIVE-10994.patch
>
>
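There is no description on the issue, but the summary implies the fix: a move 
whose source and destination are the same location should return quietly 
instead of failing. A minimal hedged sketch of such a guard, using 
illustrative names rather than the actual Hive.moveFile signature:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch only, not the actual Hive.moveFile implementation:
// treat a move whose source and destination resolve to the same qualified
// path as a no-op instead of an error.
public class NoOpMoveSketch {
  static void moveFile(FileSystem fs, Path src, Path dst) throws IOException {
    if (fs.makeQualified(src).equals(fs.makeQualified(dst))) {
      return;                                   // no-op move: nothing to do
    }
    if (!fs.rename(src, dst)) {
      throw new IOException("Unable to move " + src + " to " + dst);
    }
  }
}
{code}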




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-11012) LLAP: fix some tests in the branch and revert incorrectly committed changed out files (from HIVE-11014)

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HIVE-11012.
-
   Resolution: Fixed
Fix Version/s: llap

committed to branch

> LLAP: fix some tests in the branch and revert incorrectly committed changed 
> out files (from HIVE-11014)
> ---
>
> Key: HIVE-11012
> URL: https://issues.apache.org/jira/browse/HIVE-11012
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>
> I am assigning some new issues to people and fixing whatever random issues 
> from HIVE-10997. So far I have fixed all the TestLocationQueries/MtQueries 
> failures, the list_bucket* Kryo exception, and some Tez NPEs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11018) Turn on cbo in more q files

2015-06-15 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11018:

Attachment: HIVE-11018.patch

No code changes. Only test changes.

> Turn on cbo in more q files
> ---
>
> Key: HIVE-11018
> URL: https://issues.apache.org/jira/browse/HIVE-11018
> Project: Hive
>  Issue Type: Task
>  Components: Tests
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-11018.patch
>
>
> There are a few tests in which cbo was turned off for various reasons. Those 
> reasons don't exist anymore. For those tests, we should turn on cbo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11014) LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing tests have result changes compared to master

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587254#comment-14587254
 ] 

Sergey Shelukhin commented on HIVE-11014:
-

Feel free to create separate jiras if the changes are for different reasons.

> LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, 
> vector_outer_join2 and cbo_windowing tests have result changes compared to 
> master
> ---
>
> Key: HIVE-11014
> URL: https://issues.apache.org/jira/browse/HIVE-11014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Matt McCline
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11012) LLAP: fix some tests in the branch and revert incorrectly committed changed out files (from HIVE-11014)

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11012:

Summary: LLAP: fix some tests in the branch and revert incorrectly 
committed changed out files (from HIVE-11014)  (was: LLAP: fix some tests in 
the branch)

> LLAP: fix some tests in the branch and revert incorrectly committed changed 
> out files (from HIVE-11014)
> ---
>
> Key: HIVE-11012
> URL: https://issues.apache.org/jira/browse/HIVE-11012
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> I am assigning some new issues to people and fixing whatever random issues 
> from HIVE-10997. So far I have fixed all the TestLocationQueries/MtQueries 
> failures, the list_bucket* Kryo exception, and some Tez NPEs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11014) LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, vector_outer_join2 and cbo_windowing tests have result changes compared to master

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11014:

Summary: LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, 
vector_outer_join2 and cbo_windowing tests have result changes compared to 
master  (was: LLAP: MiniTez vector_binary_join_groupby test has result changes 
compared to master)

> LLAP: MiniTez vector_binary_join_groupby, vector_outer_join1, 
> vector_outer_join2 and cbo_windowing tests have result changes compared to 
> master
> ---
>
> Key: HIVE-11014
> URL: https://issues.apache.org/jira/browse/HIVE-11014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Matt McCline
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too aggressive

2015-06-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587229#comment-14587229
 ] 

Eugene Koifman commented on HIVE-11008:
---

There are 2 places where StatusDelegator.run() is called: Server.showJobList() 
and Server.showJobId(). Don't we need the same logic in both places?

Can the setting of the 2 properties be moved into StatusDelegator.run(), just 
before ShimLoader.getHadoopShims().getWebHCatShim()?
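
For reference, a rough sketch of the relocation being suggested; the property 
names and values below are placeholders, since the actual settings live in the 
patch:

{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch of the suggestion above: set the two retry-related
// properties inside StatusDelegator.run() itself, just before obtaining the
// shim, so both Server.showJobList() and Server.showJobId() pick them up.
// The property names/values are placeholders, not the ones in the patch.
public class RetrySettingsSketch {
  static Configuration withRetrySettings(Configuration base) {
    Configuration conf = new Configuration(base);
    conf.set("placeholder.history.retry.policy", "bounded");   // placeholder
    conf.set("placeholder.history.retry.interval.ms", "1000"); // placeholder
    // ...then call ShimLoader.getHadoopShims().getWebHCatShim(conf, ugi)
    return conf;
  }
}
{code}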

> webhcat GET /jobs retries on getting job details from history server is too 
> aggressive
> -
>
> Key: HIVE-11008
> URL: https://issues.apache.org/jira/browse/HIVE-11008
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 1.2.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-11008.1.patch
>
>
> Webhcat's "jobs" API gets the list of jobs from the RM and then gets details 
> from the history server.
> The RM has a policy of retaining a fixed number of jobs to accommodate the 
> memory it has, while the HistoryServer retains jobs based on their age. As a 
> result, jobs that the RM returns might not be present in the HistoryServer, 
> which can result in a failure. WebHCat also ends up retrying on failures even 
> if they happen because the job actually does not exist.
> The retries to get details from the HistoryServer in such cases are too 
> aggressive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9248) Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode

2015-06-15 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587209#comment-14587209
 ] 

Jason Dere commented on HIVE-9248:
--

+1

> Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is 
> Hash mode
> ---
>
> Key: HIVE-9248
> URL: https://issues.apache.org/jira/browse/HIVE-9248
> Project: Hive
>  Issue Type: Bug
>  Components: Tez, Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-9248.01.patch, HIVE-9248.02.patch, 
> HIVE-9248.03.patch, HIVE-9248.04.patch, HIVE-9248.05.patch, HIVE-9248.06.patch
>
>
> Under Tez and Vectorization, ReduceWork is not getting vectorized unless its 
> GROUP BY operator is in MergePartial mode. Add the valid cases where GROUP BY 
> is in Hash mode (and presumably there are downstream reducers that will do 
> MergePartial).
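
A hedged sketch of the gist of the description; the helper below is 
illustrative, not the actual Vectorizer validation code:

{code}
import org.apache.hadoop.hive.ql.plan.GroupByDesc;

// Illustrative only: widen the ReduceWork vectorization check so a GROUP BY
// operator in Hash mode is accepted in addition to MergePartial mode.
public class ReduceVectorizationCheckSketch {
  static boolean isGroupByModeSupported(GroupByDesc desc) {
    GroupByDesc.Mode mode = desc.getMode();
    return mode == GroupByDesc.Mode.MERGEPARTIAL
        || mode == GroupByDesc.Mode.HASH;   // the newly allowed case
  }
}
{code}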



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11014) LLAP: MiniTez vector_binary_join_groupby test has result changes compared to master

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587205#comment-14587205
 ] 

Sergey Shelukhin commented on HIVE-11014:
-

Note: right now there are some incorrect changes committed there; I'm going to 
commit the master version again.

> LLAP: MiniTez vector_binary_join_groupby test has result changes compared to 
> master
> ---
>
> Key: HIVE-11014
> URL: https://issues.apache.org/jira/browse/HIVE-11014
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Matt McCline
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10907) Hive on Tez: Classcast exception in some cases with SMB joins

2015-06-15 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-10907:
--
Affects Version/s: 1.0.0
   1.2.0

> Hive on Tez: Classcast exception in some cases with SMB joins
> -
>
> Key: HIVE-10907
> URL: https://issues.apache.org/jira/browse/HIVE-10907
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.2.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Fix For: 1.2.1
>
> Attachments: HIVE-10907.1.patch, HIVE-10907.2.patch, 
> HIVE-10907.3.patch, HIVE-10907.4.patch
>
>
> In cases where there is a mix of map-side work and reduce-side work, we get a 
> ClassCastException because we assume homogeneity in the code. We need to fix 
> this correctly; for now this is a workaround.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-10915) ORC fails to read table with a 38Gb ORC file

2015-06-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran resolved HIVE-10915.
--
Resolution: Fixed

Fixed by HIVE-10685. Verified it against the TPCH scale-1000 lineitem table.

> ORC fails to read table with a 38Gb ORC file
> 
>
> Key: HIVE-10915
> URL: https://issues.apache.org/jira/browse/HIVE-10915
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 1.3.0
>Reporter: Gopal V
>
> {code}
> hive>  set mapreduce.input.fileinputformat.split.maxsize=1;
> hive> set  mapreduce.input.fileinputformat.split.maxsize=1;
> hive> alter table lineitem concatenate;
> ..
> hive> dfs -ls /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem;
> Found 12 items
> -rwxr-xr-x   3 gopal supergroup 41368976599 2015-06-03 15:49 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/00_0
> -rwxr-xr-x   3 gopal supergroup 36226719673 2015-06-03 15:48 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/01_0
> -rwxr-xr-x   3 gopal supergroup 27544042018 2015-06-03 15:50 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/02_0
> -rwxr-xr-x   3 gopal supergroup 23147063608 2015-06-03 15:44 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/03_0
> -rwxr-xr-x   3 gopal supergroup 21079035936 2015-06-03 15:44 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/04_0
> -rwxr-xr-x   3 gopal supergroup 13813961419 2015-06-03 15:43 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/05_0
> -rwxr-xr-x   3 gopal supergroup  8155299977 2015-06-03 15:40 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/06_0
> -rwxr-xr-x   3 gopal supergroup  6264478613 2015-06-03 15:40 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/07_0
> -rwxr-xr-x   3 gopal supergroup  4653393054 2015-06-03 15:40 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/08_0
> -rwxr-xr-x   3 gopal supergroup  3621672928 2015-06-03 15:39 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/09_0
> -rwxr-xr-x   3 gopal supergroup  1460919310 2015-06-03 15:38 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/10_0
> -rwxr-xr-x   3 gopal supergroup   485129789 2015-06-03 15:38 
> /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/11_0
> {code}
> Errors occur even without PPD.
> Suspicious offsets in the stream information:
> {code}
> Caused by: java.io.EOFException: Read past end of RLE integer from compressed 
> stream Stream for column 1 kind DATA position: 1608840 length: 1608840 range: 
> 0 offset: 1608840 limit: 1608840 range 0 = 0 to 1608840 uncompressed: 36845 
> to 36845
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:56)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:346)
> at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:582)
> at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2026)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1070)
> ... 25 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587171#comment-14587171
 ] 

Jesus Camacho Rodriguez commented on HIVE-10996:


I can reproduce the problem in 1.2. Still investigating the issue...
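
Until the root cause is fixed, the workaround from the issue description can 
be applied; a minimal sketch (the CLI equivalent is 
set hive.optimize.remove.identity.project=false;):

{code}
import org.apache.hadoop.hive.conf.HiveConf;

// Sketch of the workaround noted in the issue description: disable the
// identity-project removal optimization introduced by HIVE-8435.
public class IdentityProjectWorkaround {
  static HiveConf withoutIdentityProjectRemoval() {
    HiveConf conf = new HiveConf();
    conf.setBoolean("hive.optimize.remove.identity.project", false);
    return conf;
  }
}
{code}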

> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0
>Reporter: Gautam Kowshik
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0, but not on 0.13, which seems 
> like a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1 21  20  Bob 1234
> 1 31  30  Bob 1234
> 3 51  50  Jeff1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that produce 
> the issue. I will note that if I convert the inner join to a table and select 
> from that, the issue does not appear.
> Update: Found that turning off hive.optimize.remove.identity.project fixes 
> this issue. This optimization was introduced in 
> https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-15 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Attachment: HIVE-10999.1-spark.patch

> Upgrade Spark dependency to 1.4 [Spark Branch]
> --
>
> Key: HIVE-10999
> URL: https://issues.apache.org/jira/browse/HIVE-10999
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-10999.1-spark.patch, HIVE-10999.1-spark.patch
>
>
> Spark 1.4.0 is released. Let's update the dependency version from 1.3.1 to 
> 1.4.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10999) Upgrade Spark dependency to 1.4 [Spark Branch]

2015-06-15 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-10999:
---
Comment: was deleted

(was: 

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12739493/HIVE-10999.1-spark.patch

{color:red}ERROR:{color} -1 due to 604 failed/errored test(s), 7420 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.initializationError
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucket6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketizedhiveinputformat
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_empty_dir_in_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_external_table_with_space_in_location_path
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_leftsemijoin_mr
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_quotedid_smb
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_schemeAuthority2
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_smb_mapjoin_8
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_stats_counter_partitioned
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_truncate_column_buckets
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_uber_reduce
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_add_part_multiple
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join17
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDrive

[jira] [Commented] (HIVE-11006) improve logging wrt ACID module

2015-06-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587157#comment-14587157
 ] 

Alan Gates commented on HIVE-11006:
---

Will review.

> improve logging wrt ACID module
> ---
>
> Key: HIVE-11006
> URL: https://issues.apache.org/jira/browse/HIVE-11006
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11006.patch
>
>
> especially around metastore DB operations (TxnHandler) which are retried or 
> fail for some reason.
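
A hedged sketch of the kind of diagnostics this implies; the class name and 
message shape are illustrative, not the attached patch:

{code}
import java.sql.SQLException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative only: log each failed or retried metastore DB operation with
// enough context (operation, attempt, SQL state) to reconstruct it later.
public class TxnRetryLoggingSketch {
  private static final Log LOG = LogFactory.getLog(TxnRetryLoggingSketch.class);

  static void logRetry(String operation, int attempt, SQLException e) {
    LOG.warn("Retrying " + operation + " (attempt " + attempt + ") after "
        + e.getSQLState() + ": " + e.getMessage(), e);
  }
}
{code}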



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10937) LLAP: make ObjectCache for plans work properly in the daemon

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10937:

Attachment: HIVE-10937.01.patch

simplified patch
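
For context, a hypothetical sketch of the pattern at stake: in the LLAP daemon 
many fragments from different queries share one long-lived JVM, so a plan 
cache has to be keyed per query rather than holding one process-wide slot. 
This illustrates the idea only; it is not the attached patch:

{code}
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Illustration of a per-query plan cache for a shared daemon JVM.
public final class PerQueryPlanCache {
  private final ConcurrentHashMap<String, Object> plans =
      new ConcurrentHashMap<String, Object>();

  @SuppressWarnings("unchecked")
  public <T> T retrieve(String queryId, Callable<T> loader) throws Exception {
    Object plan = plans.get(queryId);
    if (plan == null) {
      plan = loader.call();                      // deserialize the plan once
      Object prior = plans.putIfAbsent(queryId, plan);
      if (prior != null) {
        plan = prior;                            // another fragment won the race
      }
    }
    return (T) plan;
  }
}
{code}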

> LLAP: make ObjectCache for plans work properly in the daemon
> 
>
> Key: HIVE-10937
> URL: https://issues.apache.org/jira/browse/HIVE-10937
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
> Attachments: HIVE-10937.01.patch, HIVE-10937.patch
>
>
> There's a perf hit otherwise, especially when the stupid planner creates 1009 
> reducers of 4 MB each.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-15 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587114#comment-14587114
 ] 

Jesus Camacho Rodriguez commented on HIVE-10996:


This seems to be fixed in HIVE-9613. The fix was backported to 1.0, but not to 
1.1.

[~hagleitn], [~brocknoland], could we backport HIVE-9613 to 1.1 to solve this 
issue? Thanks

> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0
>Reporter: Gautam Kowshik
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0, but not on 0.13, which seems 
> like a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1 21  20  Bob 1234
> 1 31  30  Bob 1234
> 3 51  50  Jeff1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that produce 
> the issue. I will note that if I convert the inner join to a table and select 
> from that, the issue does not appear.
> Update: Found that turning off hive.optimize.remove.identity.project fixes 
> this issue. This optimization was introduced in 
> https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10107) Union All : Vertex missing stats resulting in OOM and inefficient plans

2015-06-15 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-10107:
---
Fix Version/s: 1.2.1

> Union All : Vertex missing stats resulting in OOM and inefficient plans
> 
>
> Key: HIVE-10107
> URL: https://issues.apache.org/jira/browse/HIVE-10107
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Pengcheng Xiong
> Fix For: 1.2.1
>
>
> Reducer vertices sending data to a UNION ALL edge are missing statistics; as 
> a result, we either use very few reducers in the UNION ALL edge or decide to 
> broadcast the results of the UNION ALL.
> Query
> {code}
> select 
> count(*) rowcount
> from
> (select 
> ss_item_sk, ss_ticket_number, ss_store_sk
> from
> store_sales a, store_returns b
> where
> a.ss_item_sk = b.sr_item_sk
> and a.ss_ticket_number = b.sr_ticket_number union all select 
> ss_item_sk, ss_ticket_number, ss_store_sk
> from
> store_sales c, store_returns d
> where
> c.ss_item_sk = d.sr_item_sk
> and c.ss_ticket_number = d.sr_ticket_number) t
> group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
> having rowcount > 1;
> {code}
> Plan snippet 
> {code}
>  Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 
> (CONTAINS)
> Reducer 4 <- Union 3 (SIMPLE_EDGE)
> Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 
> (CONTAINS)
>   Reducer 4
> Reduce Operator Tree:
>   Group By Operator
> aggregations: count(VALUE._col0)
> keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 
> (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE 
> Column stats: COMPLETE
> Filter Operator
>   predicate: (_col3 > 1) (type: boolean)
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: COMPLETE
>   Select Operator
> expressions: _col3 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: COMPLETE
> File Output Operator
>   compressed: false
>   Statistics: Num rows: 0 Data size: 0 Basic stats: NONE 
> Column stats: COMPLETE
>   table:
>   input format: 
> org.apache.hadoop.mapred.TextInputFormat
>   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Reducer 7
> Reduce Operator Tree:
>   Merge Join Operator
> condition map:
>  Inner Join 0 to 1
> keys:
>   0 ss_item_sk (type: int), ss_ticket_number (type: int)
>   1 sr_item_sk (type: int), sr_ticket_number (type: int)
> outputColumnNames: _col1, _col6, _col8, _col27, _col34
> Filter Operator
>   predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: 
> boolean)
>   Select Operator
> expressions: _col1 (type: int), _col8 (type: int), _col6 
> (type: int)
> outputColumnNames: _col0, _col1, _col2
> Group By Operator
>   aggregations: count()
>   keys: _col2 (type: int), _col0 (type: int), _col1 
> (type: int)
>   mode: hash
>   outputColumnNames: _col0, _col1, _col2, _col3
>   Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: 
> int), _col2 (type: int)
> sort order: +++
> Map-reduce partition columns: _col0 (type: int), 
> _col1 (type: int), _col2 (type: int)
> value expressions: _col3 (type: bigint)
> {code}
> The full explain plan 
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 
> (CONTAINS)
> Reducer 4 <- Union 3 (SIMPLE_EDGE)
> Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 
> (CONTAINS)
>   DagName: mmokhtar_201502141

[jira] [Commented] (HIVE-11010) Accumulo storage handler queries via HS2 fail

2015-06-15 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587091#comment-14587091
 ] 

Josh Elser commented on HIVE-11010:
---

Thanks for filing this. Was doing some debugging with [~taksaito] on the 
AccumuloStorageHandler -- loaded some data into both HBase and Accumulo, ran 
some Hive queries against both, and found that when we ran the Accumulo queries 
via HiveServer2 (but not in the local client) the queries would fail on the RPC 
handshakes. Short story: AccumuloStorageHandler queries with Kerberos on don't 
work with HiveServer2.

I think what was happening is that the additions to the AccumuloStorageHandler 
in HIVE-10857 don't work as expected because HS2 is going to be running with 
its own Kerberos credentials. I think we need to change how we set up the 
credentials inside the AccumuloStorageHandler so that it works regardless of 
whether it runs in a local Hive client or in HS2 -- running a doAs with a proxy 
user instead of replacing the HS2 credentials.

The second half is that we'd need to make sure Accumulo itself is configured to 
allow HS2 to proxy on behalf of users -- not relevant for Hive code, but 
something to document for users to set up in Accumulo.
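
A hedged sketch of the doAs-with-proxy direction described above; the body of 
run() is illustrative, and Accumulo must separately be configured to allow the 
proxy:

{code}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

// Illustrative only: wrap the Accumulo work in a doAs() as a proxy user over
// HS2's own login identity, instead of replacing the HS2 credentials.
public class ProxyUserSketch {
  static void runAsProxy(final String endUser) throws Exception {
    UserGroupInformation hs2User = UserGroupInformation.getLoginUser();
    UserGroupInformation proxy =
        UserGroupInformation.createProxyUser(endUser, hs2User);
    proxy.doAs(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws Exception {
        // connect to Accumulo as the proxied end user here (illustrative)
        return null;
      }
    });
  }
}
{code}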

> Accumulo storage handler queries via HS2 fail
> -
>
> Key: HIVE-11010
> URL: https://issues.apache.org/jira/browse/HIVE-11010
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
> Environment: Secure
>Reporter: Takahiko Saito
>Assignee: Josh Elser
> Fix For: 1.2.1
>
>
> On a Kerberized cluster, the Accumulo storage handler throws an error, 
> "[usrname]@[principlaname] is not allowed to impersonate [username]" 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11010) Accumulo storage handler queries via HS2 fail

2015-06-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-11010:
--
Fix Version/s: 1.2.1

> Accumulo storage handler queries via HS2 fail
> -
>
> Key: HIVE-11010
> URL: https://issues.apache.org/jira/browse/HIVE-11010
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
> Environment: Secure
>Reporter: Takahiko Saito
>Assignee: Josh Elser
> Fix For: 1.2.1
>
>
> On a Kerberized cluster, the Accumulo storage handler throws an error, 
> "[usrname]@[principlaname] is not allowed to impersonate [username]" 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11010) Accumulo storage handler queries via HS2 fail

2015-06-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-11010:
--
Affects Version/s: 1.2.1

> Accumulo storage handler queries via HS2 fail
> -
>
> Key: HIVE-11010
> URL: https://issues.apache.org/jira/browse/HIVE-11010
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0, 1.2.1
> Environment: Secure
>Reporter: Takahiko Saito
>Assignee: Josh Elser
> Fix For: 1.2.1
>
>
> On a Kerberized cluster, the Accumulo storage handler throws an error, 
> "[usrname]@[principlaname] is not allowed to impersonate [username]" 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-4239) Remove lock on compilation stage

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-4239:
---
Attachment: HIVE-4239.07.patch

Rebased the patch. Some other commit has refactored Hive out of the session 
class, so the issue with this change is moot.
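
As background, a hypothetical illustration of the goal of this issue: compile 
under a narrower (e.g. per-session) lock instead of one process-wide lock, so 
concurrent HS2 sessions no longer serialize on the compilation stage. A sketch 
of the idea, not the patch:

{code}
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: one lock per session rather than per process.
public class SessionCompileLockSketch {
  private final ReentrantLock compileLock = new ReentrantLock();

  public void compile(Runnable compileStep) {
    compileLock.lock();
    try {
      compileStep.run();   // parse/analyze/plan the query here
    } finally {
      compileLock.unlock();
    }
  }
}
{code}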

> Remove lock on compilation stage
> 
>
> Key: HIVE-4239
> URL: https://issues.apache.org/jira/browse/HIVE-4239
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Query Processor
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
> Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, 
> HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, 
> HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11010) Accumulo storage handler queries via HS2 fail

2015-06-15 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HIVE-11010:
--
Summary: Accumulo storage handler queries via HS2 fail  (was: Accumulo 
storage handler throws "[usrname]@[principlaname] is not allowed to impersonate 
[username]" via beeline on kerberized cluster)

> Accumulo storage handler queries via HS2 fail
> -
>
> Key: HIVE-11010
> URL: https://issues.apache.org/jira/browse/HIVE-11010
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.2.0
> Environment: Secure
>Reporter: Takahiko Saito
>Assignee: Josh Elser
>
> On a Kerberized cluster, the Accumulo storage handler throws an error, 
> "[usrname]@[principlaname] is not allowed to impersonate [username]" 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10937) LLAP: make ObjectCache for plans work properly in the daemon

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587071#comment-14587071
 ] 

Sergey Shelukhin commented on HIVE-10937:
-

I didn't get a repro of the issues reported on the cluster.

> LLAP: make ObjectCache for plans work properly in the daemon
> 
>
> Key: HIVE-10937
> URL: https://issues.apache.org/jira/browse/HIVE-10937
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: llap
>
> Attachments: HIVE-10937.patch
>
>
> There's a perf hit otherwise, especially when the stupid planner creates 1009 
> reducers of 4 MB each.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10841) [WHERE col is not null] does not work sometimes for queries with many JOIN statements

2015-06-15 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587064#comment-14587064
 ] 

Laljo John Pullokkaran commented on HIVE-10841:
---

Committed to branch 1.0.

> [WHERE col is not null] does not work sometimes for queries with many JOIN 
> statements
> -
>
> Key: HIVE-10841
> URL: https://issues.apache.org/jira/browse/HIVE-10841
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.2.0, 1.3.0
>Reporter: Alexander Pivovarov
>Assignee: Laljo John Pullokkaran
> Fix For: 1.2.1
>
> Attachments: HIVE-10841.03.patch, HIVE-10841.1.patch, 
> HIVE-10841.2.patch, HIVE-10841.patch
>
>
> The result from the following SELECT query is 3 rows but it should be 1 row.
> I checked it in MySQL - it returned 1 row.
> To reproduce the issue in Hive
> 1. prepare tables
> {code}
> drop table if exists L;
> drop table if exists LA;
> drop table if exists FR;
> drop table if exists A;
> drop table if exists PI;
> drop table if exists acct;
> create table L as select 4436 id;
> create table LA as select 4436 loan_id, 4748 aid, 4415 pi_id;
> create table FR as select 4436 loan_id;
> create table A as select 4748 id;
> create table PI as select 4415 id;
> create table acct as select 4748 aid, 10 acc_n, 122 brn;
> insert into table acct values(4748, null, null);
> insert into table acct values(4748, null, null);
> {code}
> 2. run SELECT query
> {code}
> select
>   acct.ACC_N,
>   acct.brn
> FROM L
> JOIN LA ON L.id = LA.loan_id
> JOIN FR ON L.id = FR.loan_id
> JOIN A ON LA.aid = A.id
> JOIN PI ON PI.id = LA.pi_id
> JOIN acct ON A.id = acct.aid
> WHERE
>   L.id = 4436
>   and acct.brn is not null;
> {code}
> the result is 3 rows
> {code}
> 10122
> NULL  NULL
> NULL  NULL
> {code}
> but it should be 1 row
> {code}
> 10122
> {code}
> 2.1 "explain select ..." output for hive-1.3.0 MR
> {code}
> STAGE DEPENDENCIES:
>   Stage-12 is a root stage
>   Stage-9 depends on stages: Stage-12
>   Stage-0 depends on stages: Stage-9
> STAGE PLANS:
>   Stage: Stage-12
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> a 
>   Fetch Operator
> limit: -1
> acct 
>   Fetch Operator
> limit: -1
> fr 
>   Fetch Operator
> limit: -1
> l 
>   Fetch Operator
> limit: -1
> pi 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> a 
>   TableScan
> alias: a
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: id is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> acct 
>   TableScan
> alias: acct
> Statistics: Num rows: 3 Data size: 31 Basic stats: COMPLETE 
> Column stats: NONE
> Filter Operator
>   predicate: aid is not null (type: boolean)
>   Statistics: Num rows: 2 Data size: 20 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 _col5 (type: int)
>   1 id (type: int)
>   2 aid (type: int)
> fr 
>   TableScan
> alias: fr
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (loan_id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> l 
>   TableScan
> alias: l
> Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: (id = 4436) (type: boolean)
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   HashTable Sink Operator
> keys:
>   0 4436 (type: int)
>   1 4436 (type: int)
>   2 4436 (type: int)
> pi 
>   TableScan
> alias: pi
> Statistics: Num rows: 1 Data size: 4 Basic stat

[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587060#comment-14587060
 ] 

Prasanth Jayachandran commented on HIVE-10940:
--

+1

> HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader 
> call
> -
>
> Key: HIVE-10940
> URL: https://issues.apache.org/jira/browse/HIVE-10940
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10940.01.patch, HIVE-10940.patch
>
>
> {code}
> String filterText = filterExpr.getExprString();
> String filterExprSerialized = Utilities.serializeExpression(filterExpr);
> {code}
> the serializeExpression initializes Kryo and produces a new packed object for 
> every split.
> HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.
> And Kryo is very slow to do this for a large filter clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11006) improve logging wrt ACID module

2015-06-15 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587045#comment-14587045
 ] 

Eugene Koifman commented on HIVE-11006:
---

[~sushanth], could we get this into 1.2.1?  It's only logging changes but will 
make diagnostics easier.

> improve logging wrt ACID module
> ---
>
> Key: HIVE-11006
> URL: https://issues.apache.org/jira/browse/HIVE-11006
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11006.patch
>
>
> especially around metastore DB operations (TxnHandler) which are retried or 
> fail for some reason.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10940:

Attachment: HIVE-10940.01.patch

> HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader 
> call
> -
>
> Key: HIVE-10940
> URL: https://issues.apache.org/jira/browse/HIVE-10940
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10940.01.patch, HIVE-10940.patch
>
>
> {code}
> String filterText = filterExpr.getExprString();
> String filterExprSerialized = Utilities.serializeExpression(filterExpr);
> {code}
> the serializeExpression initializes Kryo and produces a new packed object for 
> every split.
> HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.
> And Kryo is very slow to do this for a large filter clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10685) Alter table concatenate operator will cause duplicate data

2015-06-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-10685:
-
Attachment: (was: HIVE-10685.1.patch)

> Alter table concatenate operator will cause duplicate data
> --
>
> Key: HIVE-10685
> URL: https://issues.apache.org/jira/browse/HIVE-10685
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 1.2.1
>Reporter: guoliming
>Assignee: guoliming
>Priority: Critical
> Fix For: 1.2.0, 1.1.0
>
> Attachments: HIVE-10685.patch
>
>
> "Orders" table has 15 rows and stored as ORC. 
> {noformat}
> hive> select count(*) from orders;
> OK
> 15
> Time taken: 37.692 seconds, Fetched: 1 row(s)
> {noformat}
> The table contains 14 files; the size of each file is about 2.1 to 3.2 GB.
> After executing the command ALTER TABLE orders CONCATENATE;
> the table now has 1530115000 rows.
> My Hive version is 1.1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587018#comment-14587018
 ] 

Sergey Shelukhin commented on HIVE-10940:
-

The text representation is preserved for backward compat (if you mean the 
original one we used to serialize). Will add logging.
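
A hedged sketch of the optimization under review: run the Kryo serialization 
of the pushed-down filter once and reuse the result across getRecordReader() 
calls. Field and class names here are illustrative, not the committed patch:

{code}
import org.apache.hadoop.hive.ql.exec.Utilities;
import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;

// Illustrative only: memoize the serialized filter so repeated pushFilters()
// calls do not pay a fresh Kryo pass per split.
public class FilterSerializationCacheSketch {
  private String serializedFilter;

  public synchronized String get(ExprNodeGenericFuncDesc filterExpr) {
    if (serializedFilter == null) {
      serializedFilter = Utilities.serializeExpression(filterExpr); // one pass
    }
    return serializedFilter;
  }
}
{code}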

> HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader 
> call
> -
>
> Key: HIVE-10940
> URL: https://issues.apache.org/jira/browse/HIVE-10940
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10940.patch
>
>
> {code}
> String filterText = filterExpr.getExprString();
> String filterExprSerialized = Utilities.serializeExpression(filterExpr);
> {code}
> the serializeExpression initializes Kryo and produces a new packed object for 
> every split.
> HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.
> And Kryo is very slow to do this for a large filter clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HIVE-10685) Alter table concatenate operator will cause duplicate data

2015-06-15 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran reopened HIVE-10685:
--

I am going to revert the committed patch and apply the original patch. The 
committed patch will not work, as the stripe index increment is outside of the 
continue path. 
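
Purely as an illustration of the loop bug described above (placeholder types 
and helpers, not the ORC merge code): when the index increment sits after a 
continue, the skip path never advances the index and the same stripe gets 
processed again.

{code}
import java.util.List;

// Illustration only: the index must advance on every path, including the
// skip path, or the loop re-reads (and re-appends) the same stripe.
public class StripeLoopSketch {
  static void mergeStripes(List<String> stripes) {
    int idx = 0;
    while (idx < stripes.size()) {
      String stripe = stripes.get(idx);
      if (stripe.isEmpty()) {   // stand-in for the real skip condition
        idx++;                  // advance before the continue, or loop forever
        continue;
      }
      System.out.println("appending " + stripe);  // stand-in for the append
      idx++;
    }
  }
}
{code}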

> Alter table concatenate operator will cause duplicate data
> --
>
> Key: HIVE-10685
> URL: https://issues.apache.org/jira/browse/HIVE-10685
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 1.2.1
>Reporter: guoliming
>Assignee: guoliming
>Priority: Critical
> Fix For: 1.2.0, 1.1.0
>
> Attachments: HIVE-10685.1.patch, HIVE-10685.patch
>
>
> "Orders" table has 15 rows and stored as ORC. 
> {noformat}
> hive> select count(*) from orders;
> OK
> 15
> Time taken: 37.692 seconds, Fetched: 1 row(s)
> {noformat}
> The table contains 14 files; the size of each file is about 2.1 to 3.2 GB.
> After executing the command ALTER TABLE orders CONCATENATE;
> the table now has 1530115000 rows.
> My Hive version is 1.1.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call

2015-06-15 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586996#comment-14586996
 ] 

Prasanth Jayachandran commented on HIVE-10940:
--

Patch mostly looks good, although it would be good to add some debug logging 
after each of the null checks. Also, from a simple reference lookup, we don't 
seem to be using the textual representation of the filter expression anywhere. 
I don't think we need to set the text representation of the filter expression; 
if we need it, we have methods in PlanUtils to do so.

[~ashutoshc]/[~gopalv] Any idea why we set the filter expression in text form 
to job conf?

> HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader 
> call
> -
>
> Key: HIVE-10940
> URL: https://issues.apache.org/jira/browse/HIVE-10940
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 1.2.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10940.patch
>
>
> {code}
> String filterText = filterExpr.getExprString();
> String filterExprSerialized = Utilities.serializeExpression(filterExpr);
> {code}
> the serializeExpression initializes Kryo and produces a new packed object for 
> every split.
> HiveInputFormat::getRecordReader -> pushProjectionAndFilters -> pushFilters.
> And Kryo is very slow to do this for a large filter clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-15 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586978#comment-14586978
 ] 

Laljo John Pullokkaran commented on HIVE-10996:
---

[~jcamachorodriguez] Could you take a look? It seems to be related to the DT 
removal in HIVE-8435.

> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0
>Reporter: Gautam Kowshik
>Priority: Minor
> Attachments: explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0, but not on 0.13, which seems 
> like a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1 21  20  Bob 1234
> 1 31  30  Bob 1234
> 3 51  50  Jeff1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that produce 
> the issue. I will note that if I convert the inner join to a table and select 
> from that, the issue does not appear.
> Update: Found that turning off hive.optimize.remove.identity.project fixes 
> this issue. This optimization was introduced in 
> https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-15 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10996:
--
Priority: Critical  (was: Minor)

> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0
>Reporter: Gautam Kowshik
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0, but not on 0.13, which seems 
> like a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1 21  20  Bob 1234
> 1 31  30  Bob 1234
> 3 51  50  Jeff 1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that produce 
> the issue. I will note that if I convert the inner join to a table and select 
> from that, the issue does not appear.
> Update: Found that turning off hive.optimize.remove.identity.project fixes 
> this issue. This optimization was introduced in 
> https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results

2015-06-15 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-10996:
--
Assignee: Jesus Camacho Rodriguez

> Aggregation / Projection over Multi-Join Inner Query producing incorrect 
> results
> 
>
> Key: HIVE-10996
> URL: https://issues.apache.org/jira/browse/HIVE-10996
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.0.0, 1.2.0, 1.1.0
>Reporter: Gautam Kowshik
>Assignee: Jesus Camacho Rodriguez
>Priority: Minor
> Attachments: explain_q1.txt, explain_q2.txt
>
>
> We see the following problem on 1.1.0 and 1.2.0, but not on 0.13, which 
> seems like a regression.
> The following query (Q1) produces no results:
> {code}
> select s
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> {code}
> While this one (Q2) does produce results :
> {code}
> select *
> from (
>   select last.*, action.st2, action.n
>   from (
> select purchase.s, purchase.timestamp, max (mevt.timestamp) as 
> last_stage_timestamp
> from (select * from purchase_history) purchase
> join (select * from cart_history) mevt
> on purchase.s = mevt.s
> where purchase.timestamp > mevt.timestamp
> group by purchase.s, purchase.timestamp
>   ) last
>   join (select * from events) action
>   on last.s = action.s and last.last_stage_timestamp = action.timestamp
> ) list;
> 1 21  20  Bob 1234
> 1 31  30  Bob 1234
> 3 51  50  Jeff 1234
> {code}
> The setup to test this is:
> {code}
> create table purchase_history (s string, product string, price double, 
> timestamp int);
> insert into purchase_history values ('1', 'Belt', 20.00, 21);
> insert into purchase_history values ('1', 'Socks', 3.50, 31);
> insert into purchase_history values ('3', 'Belt', 20.00, 51);
> insert into purchase_history values ('4', 'Shirt', 15.50, 59);
> create table cart_history (s string, cart_id int, timestamp int);
> insert into cart_history values ('1', 1, 10);
> insert into cart_history values ('1', 2, 20);
> insert into cart_history values ('1', 3, 30);
> insert into cart_history values ('1', 4, 40);
> insert into cart_history values ('3', 5, 50);
> insert into cart_history values ('4', 6, 60);
> create table events (s string, st2 string, n int, timestamp int);
> insert into events values ('1', 'Bob', 1234, 20);
> insert into events values ('1', 'Bob', 1234, 30);
> insert into events values ('1', 'Bob', 1234, 25);
> insert into events values ('2', 'Sam', 1234, 30);
> insert into events values ('3', 'Jeff', 1234, 50);
> insert into events values ('4', 'Ted', 1234, 60);
> {code}
> I realize select * and select s are not all that interesting in this context, 
> but what led us to this issue was that select count(distinct s) was not 
> returning results. The above queries are the simplified queries that produce 
> the issue. I will note that if I convert the inner join to a table and select 
> from that, the issue does not appear.
> Update: Found that turning off hive.optimize.remove.identity.project fixes 
> this issue. This optimization was introduced in 
> https://issues.apache.org/jira/browse/HIVE-8435



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586753#comment-14586753
 ] 

Alan Gates commented on HIVE-10972:
---

Yes, I'll take a look.

> DummyTxnManager always locks the current database in shared mode, which is 
> incorrect.
> -
>
> Key: HIVE-10972
> URL: https://issues.apache.org/jira/browse/HIVE-10972
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-10972.2.patch, HIVE-10972.patch
>
>
> In DummyTxnManager [line 163 | 
> http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163],
>  it always locks the current database. 
> That is not correct since the current database can be "db1", and the query 
> can be "select * from db2.tb1", which will lock db1 unnecessarily.
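> A minimal sketch of the unnecessary lock (database and table names as in the 
> description):
> {code}
> use db1;                -- current database
> select * from db2.tb1;  -- only db2/db2.tb1 need locks, yet DummyTxnManager
>                         -- also takes a shared lock on db1
> {code}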



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depend on the last SEL

2015-06-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11007:
---
Attachment: HIVE-11007.01.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's 
> mapInputToDP should depend on the last SEL
> -
>
> Key: HIVE-11007
> URL: https://issues.apache.org/jira/browse/HIVE-11007
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11007.01.patch
>
>
> In the dynamic partitioning case we may, for example, end up with the 
> operator tree TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated from 
> SEL1 rather than SEL2, which causes an error in the return path.
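> A hedged sketch of a dynamic-partition insert whose plan can take this 
> TS-SEL-SEL-FS shape (table and column names are hypothetical):
> {code}
> set hive.exec.dynamic.partition.mode=nonstrict;
> -- the expression in the projection can introduce a second SEL before FS
> insert overwrite table dst partition (ds)
> select key, upper(value) as value, ds from src;
> {code}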



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depend on the last SEL

2015-06-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11007:
---
Attachment: (was: HIVE-11007.01.patch)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's 
> mapInputToDP should depend on the last SEL
> -
>
> Key: HIVE-11007
> URL: https://issues.apache.org/jira/browse/HIVE-11007
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11007.01.patch
>
>
> In the dynamic partitioning case we may, for example, end up with the 
> operator tree TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated from 
> SEL1 rather than SEL2, which causes an error in the return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10986) Check of fs.trash.interval in HiveMetaStore should be consistent with Trash.moveToAppropriateTrash()

2015-06-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-10986:
--
Attachment: HIVE-10986.patch

> Check of fs.trash.interval in HiveMetaStore should be consistent with 
> Trash.moveToAppropriateTrash()
> 
>
> Key: HIVE-10986
> URL: https://issues.apache.org/jira/browse/HIVE-10986
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 1.2.1
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-10986.patch
>
>
> This is a follow-up to HIVE-10629.
> Trash.moveToAppropriateTrash() takes core-site.xml, but HiveMetaStore checks 
> "hiveConf", which is a problem when the two disagree.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (HIVE-10884) Enable some beeline tests and turn on HIVE-4239 by default

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10884:

Comment: was deleted

(was: It looks like the instrumentation needs to be updated to run beeline 
tests... )

> Enable some beeline tests and turn on HIVE-4239 by default
> --
>
> Key: HIVE-10884
> URL: https://issues.apache.org/jira/browse/HIVE-10884
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10884.01.patch, HIVE-10884.02.patch, 
> HIVE-10884.03.patch, HIVE-10884.04.patch, HIVE-10884.05.patch, 
> HIVE-10884.patch
>
>
> See comments in HIVE-4239.
> Beeline tests with parallelism need to be enabled to turn compilation 
> parallelism on by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10884) Enable some beeline tests and turn on HIVE-4239 by default

2015-06-15 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10884:

Attachment: HIVE-10884.05.patch

Beeline tests weren't attempted. Attempting to remove the exclude from 
hivetest...

> Enable some beeline tests and turn on HIVE-4239 by default
> --
>
> Key: HIVE-10884
> URL: https://issues.apache.org/jira/browse/HIVE-10884
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10884.01.patch, HIVE-10884.02.patch, 
> HIVE-10884.03.patch, HIVE-10884.04.patch, HIVE-10884.05.patch, 
> HIVE-10884.patch
>
>
> See comments in HIVE-4239.
> Beeline tests with parallelism need to be enabled to turn compilation 
> parallelism on by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10884) Enable some beeline tests and turn on HIVE-4239 by default

2015-06-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586674#comment-14586674
 ] 

Sergey Shelukhin commented on HIVE-10884:
-

It looks like the instrumentation needs to be updated to run beeline tests... 

> Enable some beeline tests and turn on HIVE-4239 by default
> --
>
> Key: HIVE-10884
> URL: https://issues.apache.org/jira/browse/HIVE-10884
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-10884.01.patch, HIVE-10884.02.patch, 
> HIVE-10884.03.patch, HIVE-10884.04.patch, HIVE-10884.05.patch, 
> HIVE-10884.patch
>
>
> See comments in HIVE-4239.
> Beeline tests with parallelism need to be enabled to turn compilation 
> parallelism on by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10991) CBO: Calcite Operator To Hive Operator (Calcite Return Path): NonBlockingOpDeDupProc did not kick in rcfile_merge2.q

2015-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586613#comment-14586613
 ] 

Hive QA commented on HIVE-10991:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12739659/HIVE-10991.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 9008 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4269/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4269/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4269/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12739659 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
> NonBlockingOpDeDupProc did not kick in rcfile_merge2.q
> 
>
> Key: HIVE-10991
> URL: https://issues.apache.org/jira/browse/HIVE-10991
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10991.patch
>
>
> NonBlockingOpDeDupProc did not kick in rcfile_merge2.q in return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11008) webhcat GET /jobs retries on getting job details from history server is too aggressive

2015-06-15 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-11008:
-
Attachment: HIVE-11008.1.patch

Patch from [~cwelch]


> webhcat GET /jobs retries on getting job details from history server is too 
> aggressive
> -
>
> Key: HIVE-11008
> URL: https://issues.apache.org/jira/browse/HIVE-11008
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 1.2.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-11008.1.patch
>
>
> The webhcat "jobs" api gets the list of jobs from the RM and then gets 
> details from the history server.
> The RM has a policy of retaining a fixed number of jobs to accommodate the 
> memory it has, while the HistoryServer retains jobs based on their age. As a 
> result, jobs that the RM returns might not be present in the HistoryServer, 
> which can result in a failure. WebHCat also ends up retrying on failures 
> even if they happen because the job actually does not exist. 
> The retries to get details from the HistoryServer in such cases are too 
> aggressive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10984) "Lock table" explicit lock command doesn't lock the database object.

2015-06-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10984:

Description: 
There is an issue in ZooKeeperHiveLockManager.java: when locking a table 
exclusively, it doesn't lock the database object (which it does when the lock 
comes from a query).
The current implementation of ZooKeeperHiveLockManager will lock the object 
and its parents, but won't check the children when it tries to acquire a lock 
on an object. This allows the following scenario, which should not be 
permitted but currently goes through.

{noformat}
use default; 
lock table db1.tbl1 shared; 
lock database db1 exclusive;
{noformat}

Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable 
failure cases.


  was:
There is an issue in ZooKeeperHiveLockManager.java, in which when locking 
exclusively on an object we didn't check if the children are locked.

So the following should not be allowed.
{noformat}
use default; 
lock table lockneg2.tstsrcpart shared; 
lock database lockneg2 exclusive;
{noformat}

Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable 
failure cases.


> "Lock table" explicit lock command doesn't lock the database object.
> 
>
> Key: HIVE-10984
> URL: https://issues.apache.org/jira/browse/HIVE-10984
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> There is an issue in ZooKeeperHiveLockManager.java: when locking a table 
> exclusively, it doesn't lock the database object (which it does when the 
> lock comes from a query).
> The current implementation of ZooKeeperHiveLockManager will lock the object 
> and its parents, but won't check the children when it tries to acquire a 
> lock on an object. This allows the following scenario, which should not be 
> permitted but currently goes through.
> {noformat}
> use default; 
> lock table db1.tbl1 shared; 
> lock database db1 exclusive;
> {noformat}
> Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable 
> failure cases.
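> A minimal illustration of the expected semantics (names as in the scenario 
> above):
> {noformat}
> use default;
> lock table db1.tbl1 shared;    -- should implicitly lock db1 in shared mode
> lock database db1 exclusive;   -- should then conflict and be refused
> {noformat}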



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10984) "Lock table" explicit lock command doesn't lock the database object.

2015-06-15 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-10984:

Summary: "Lock table" explicit lock command doesn't lock the database 
object.  (was: When ZooKeeperHiveLockManager locks an object exclusively, it 
doesn't check the lock on the children.)

> "Lock table" explicit lock command doesn't lock the database object.
> 
>
> Key: HIVE-10984
> URL: https://issues.apache.org/jira/browse/HIVE-10984
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>
> There is an issue in ZooKeeperHiveLockManager.java, in which when locking 
> exclusively on an object we didn't check if the children are locked.
> So the following should not be allowed.
> {noformat}
> use default; 
> lock table lockneg2.tstsrcpart shared; 
> lock database lockneg2 exclusive;
> {noformat}
> Also check the test case lockneg_try_lock_db_in_use.q to add more reasonable 
> failure cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11006) improve logging wrt ACID module

2015-06-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-11006:
--
Attachment: HIVE-11006.patch

[~alangates] Could you review, please?

> improve logging wrt ACID module
> ---
>
> Key: HIVE-11006
> URL: https://issues.apache.org/jira/browse/HIVE-11006
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11006.patch
>
>
> especially around metastore DB operations (TxnHandler) which are retried or 
> fail for some reason.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11007) CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's mapInputToDP should depend on the last SEL

2015-06-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11007:
---
Attachment: HIVE-11007.01.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): dpCtx's 
> mapInputToDP should depend on the last SEL
> -
>
> Key: HIVE-11007
> URL: https://issues.apache.org/jira/browse/HIVE-11007
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11007.01.patch
>
>
> In the dynamic partitioning case we may, for example, end up with the 
> operator tree TS0-SEL1-SEL2-FS3. The dpCtx's mapInputToDP is populated from 
> SEL1 rather than SEL2, which causes an error in the return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11004) PermGen OOM error in Hiveserver2

2015-06-15 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586438#comment-14586438
 ] 

Mostafa Mokhtar commented on HIVE-11004:


[~martinbenson]
Try setting hive.orc.cache.stripe.details.size=-1 and restart HS2. 
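For clarity, a hedged sketch of the suggested setting as a HiveQL command; per 
the comment above, it would normally go into the HS2 configuration, followed 
by a restart:
{code}
-- disable the ORC stripe-details cache (-1), as suggested above
set hive.orc.cache.stripe.details.size=-1;
{code}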

> PermGen OOM error in Hiveserver2
> 
>
> Key: HIVE-11004
> URL: https://issues.apache.org/jira/browse/HIVE-11004
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
> Environment: cdh 5.4
>Reporter: Martin Benson
>Priority: Critical
>
> Periodically, HiveServer2 will become unresponsive, and the logs show the 
> following error:
> 2:28:22.965 PM ERROR   org.apache.hadoop.hive.ql.io.orc.OrcInputFormat 
> Unexpected Exception
> java.lang.OutOfMemoryError: PermGen space
> 2:28:22.969 PM WARN
> org.apache.hive.service.cli.thrift.ThriftCLIService 
> Error fetching results: 
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> java.lang.RuntimeException: serious problem
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:343)
>   at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:250)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:656)
>   at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.RuntimeException: serious problem
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:338)
>   ... 13 more
> Caused by: java.lang.RuntimeException: serious problem
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:478)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:944)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:969)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:362)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:294)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
>   ... 17 more
> Caused by: java.lang.OutOfMemoryError: PermGen space
> There does not appear to be an obvious trigger for this (other than the fact 
> that the error mentions ORC). If further details would be helpful in 
> diagnosing the issue, please let me know and I'll supply them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10533) CBO (Calcite Return Path): Join to MultiJoin support for outer joins

2015-06-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10533:
---
Attachment: HIVE-10533.03.patch

> CBO (Calcite Return Path): Join to MultiJoin support for outer joins
> 
>
> Key: HIVE-10533
> URL: https://issues.apache.org/jira/browse/HIVE-10533
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10533.01.patch, HIVE-10533.02.patch, 
> HIVE-10533.02.patch, HIVE-10533.03.patch, HIVE-10533.patch
>
>
> CBO return path: auto_join7.q can be used to reproduce the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10991) CBO: Calcite Operator To Hive Operator (Calcite Return Path): NonBlockingOpDeDupProc did not kick in rcfile_merge2.q

2015-06-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-10991:
---
Attachment: HIVE-10991.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
> NonBlockingOpDeDupProc did not kick in rcfile_merge2.q
> 
>
> Key: HIVE-10991
> URL: https://issues.apache.org/jira/browse/HIVE-10991
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10991.patch
>
>
> NonBlockingOpDeDupProc did not kick in rcfile_merge2.q in return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-10991) CBO: Calcite Operator To Hive Operator (Calcite Return Path): NonBlockingOpDeDupProc did not kick in rcfile_merge2.q

2015-06-15 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-10991:
--

Assignee: Jesus Camacho Rodriguez  (was: Pengcheng Xiong)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
> NonBlockingOpDeDupProc did not kick in rcfile_merge2.q
> 
>
> Key: HIVE-10991
> URL: https://issues.apache.org/jira/browse/HIVE-10991
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-10991.patch
>
>
> NonBlockingOpDeDupProc did not kick in rcfile_merge2.q in return path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11005) CBO: Calcite Operator To Hive Operator (Calcite Return Path) : Regression on the latest master

2015-06-15 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11005:
---
Assignee: Jesus Camacho Rodriguez

> CBO: Calcite Operator To Hive Operator (Calcite Return Path) : Regression on 
> the latest master
> --
>
> Key: HIVE-11005
> URL: https://issues.apache.org/jira/browse/HIVE-11005
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Jesus Camacho Rodriguez
>
> Tests cbo_join.q and cbo_views.q fail on the return path. Part of the stack 
> trace is:
> {code}
> 2015-06-15 09:51:53,377 ERROR [main]: parse.CalcitePlanner 
> (CalcitePlanner.java:genOPTree(282)) - CBO failed, skipping CBO.
> java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
> at 
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305)
> at 
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284)
> at 
> com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.java:80)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveInsertExchange4JoinRule.onMatch(HiveInsertExchange4JoinRule.java:101)
> at 
> org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:326)
> at 
> org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:515)
> at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:392)
> at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:255)
> at 
> org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:125)
> at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:207)
> at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:194)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:888)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:771)
> at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:109)
> at 
> org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:876)
> at 
> org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:145)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10233) Hive on LLAP: Memory manager

2015-06-15 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-10233:
-
Attachment: HIVE-10233-WIP-8.patch

Uploading the WIP-8 patch for join-only MM.

> Hive on LLAP: Memory manager
> 
>
> Key: HIVE-10233
> URL: https://issues.apache.org/jira/browse/HIVE-10233
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: llap
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-10233-WIP-2.patch, HIVE-10233-WIP-3.patch, 
> HIVE-10233-WIP-4.patch, HIVE-10233-WIP-5.patch, HIVE-10233-WIP-6.patch, 
> HIVE-10233-WIP-7.patch, HIVE-10233-WIP-8.patch
>
>
> We need a memory manager in llap/tez to manage the usage of memory across 
> threads. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586234#comment-14586234
 ] 

Hive QA commented on HIVE-10165:




{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12739615/HIVE-10165.7.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 9085 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join28
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchAbort
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4268/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4268/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4268/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12739615 - PreCommit-HIVE-TRUNK-Build

> Improve hive-hcatalog-streaming extensibility and support updates and deletes.
> --
>
> Key: HIVE-10165
> URL: https://issues.apache.org/jira/browse/HIVE-10165
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: streaming_api
> Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, 
> HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, 
> mutate-system-overview.png
>
>
> h3. Overview
> I'd like to extend the 
> [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
>  API so that it also supports the writing of record updates and deletes in 
> addition to the already supported inserts.
> h3. Motivation
> We have many Hadoop processes outside of Hive that merge changed facts into 
> existing datasets. Traditionally we achieve this by reading in a 
> ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
> sequence and then applying a function to determine inserted, updated, and 
> deleted rows. However, in our current scheme we must rewrite all partitions 
> that may potentially contain changes. In practice the number of mutated 
> records is very small when compared with the records contained in a 
> partition. This approach results in a number of operational issues:
> * Excessive amount of write activity required for small data changes.
> * Downstream applications cannot robustly read these datasets while they are 
> being updated.
> * Due to the scale of the updates (hundreds of partitions), the scope for 
> contention is high. 
> I believe we can address this problem by instead writing only the changed 
> records to a Hive transactional table. This should drastically reduce the 
> amount of data that we need to write and also provide a means for managing 
> concurrent access to the data. Our existing merge processes can read and 
> retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
> an updated form of the hive-hcatalog-streaming API which will then have the 
> required data to perform an update or insert in a transactional manner. 
> h3. Benefits
> * Enables the creation of large-scale dataset merge processes  
> * Opens up Hive transactional functionality in an accessible manner to 
> processes that operate outside of Hive.
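> As context, a hedged sketch of the kind of ACID target table such a merge 
> process would mutate (table name and bucket count are hypothetical):
> {code}
> create table fact_mutable (id bigint, payload string)
> clustered by (id) into 8 buckets
> stored as orc
> tblproperties ('transactional'='true');
> {code}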



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11004) PermGen OOM error in Hiveserver2

2015-06-15 Thread Martin Benson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Benson updated HIVE-11004:
-
Summary: PermGen OOM error in Hiveserver2  (was: PermGen)

> PermGen OOM error in Hiveserver2
> 
>
> Key: HIVE-11004
> URL: https://issues.apache.org/jira/browse/HIVE-11004
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.1.0
> Environment: cdh 5.4
>Reporter: Martin Benson
>Priority: Critical
>
> Periodically, HiveServer2 will become unresponsive, and the logs show the 
> following error:
> 2:28:22.965 PM ERROR   org.apache.hadoop.hive.ql.io.orc.OrcInputFormat 
> Unexpected Exception
> java.lang.OutOfMemoryError: PermGen space
> 2:28:22.969 PM WARN
> org.apache.hive.service.cli.thrift.ThriftCLIService 
> Error fetching results: 
> org.apache.hive.service.cli.HiveSQLException: java.io.IOException: 
> java.lang.RuntimeException: serious problem
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:343)
>   at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:250)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:656)
>   at 
> org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:451)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
>   at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.RuntimeException: serious problem
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:507)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1655)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:338)
>   ... 13 more
> Caused by: java.lang.RuntimeException: serious problem
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$Context.waitForTasks(OrcInputFormat.java:478)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:944)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:969)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:362)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:294)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
>   ... 17 more
> Caused by: java.lang.OutOfMemoryError: PermGen space
> There does not appear to be an obvious trigger for this (other than the fact 
> that the error mentions ORC). If further details would be helpful in 
> diagnosing the issue, please let me know and I'll supply them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586162#comment-14586162
 ] 

Aihua Xu commented on HIVE-10972:
-

[~alangates] It seems you worked on the initial version. Can you also take a 
look at the change to see if it will cause any issues?

> DummyTxnManager always locks the current database in shared mode, which is 
> incorrect.
> -
>
> Key: HIVE-10972
> URL: https://issues.apache.org/jira/browse/HIVE-10972
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-10972.2.patch, HIVE-10972.patch
>
>
> In DummyTxnManager [line 163 | 
> http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163],
>  it always locks the current database. 
> That is not correct since the current database can be "db1", and the query 
> can be "select * from db2.tb1", which will lock db1 unnecessarily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10972) DummyTxnManager always locks the current database in shared mode, which is incorrect.

2015-06-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586157#comment-14586157
 ] 

Aihua Xu commented on HIVE-10972:
-

The failing tests are not related.

> DummyTxnManager always locks the current database in shared mode, which is 
> incorrect.
> -
>
> Key: HIVE-10972
> URL: https://issues.apache.org/jira/browse/HIVE-10972
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-10972.2.patch, HIVE-10972.patch
>
>
> In DummyTxnManager [line 163 | 
> http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java#163],
>  it always locks the current database. 
> That is not correct since the current database can be "db1", and the query 
> can be "select * from db2.tb1", which will lock db1 unnecessarily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-06-15 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586098#comment-14586098
 ] 

Chaoyu Tang commented on HIVE-7018:
---

[~ychena] Looks like the HMS upgrade test failed; do you know the reason?

> Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
> not others
> -
>
> Key: HIVE-7018
> URL: https://issues.apache.org/jira/browse/HIVE-7018
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Yongzhi Chen
> Attachments: HIVE-7018.1.patch, HIVE-7018.2.patch, HIVE-7018.3.patch, 
> HIVE-7018.4.patch
>
>
> It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
> column while mysql does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10754) Pig+Hcatalog doesn't work properly since we need to clone the Job instance in HCatLoader

2015-06-15 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586087#comment-14586087
 ] 

Aihua Xu commented on HIVE-10754:
-

[~mithun] Sorry for the late reply; I was busy with something else. It seems 
to be a Hadoop-version-related issue.

Would it be fair to update all the calls in HCatalog to use the new 
getInstance(), since the old constructor is deprecated anyway? If you agree, I 
will use this jira to do that and update the title to reflect it.

> Pig+Hcatalog doesn't work properly since we need to clone the Job instance in 
> HCatLoader
> 
>
> Key: HIVE-10754
> URL: https://issues.apache.org/jira/browse/HIVE-10754
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-10754.patch
>
>
> {noformat}
> Create table tbl1 (key string, value string) stored as rcfile;
> Create table tbl2 (key string, value string);
> insert into tbl1 values( '1', '111');
> insert into tbl2 values('1', '2');
> {noformat}
> Pig script:
> {noformat}
> src_tbl1 = FILTER tbl1 BY (key == '1');
> prj_tbl1 = FOREACH src_tbl1 GENERATE
>key as tbl1_key,
>value as tbl1_value,
>'333' as tbl1_v1;
>
> src_tbl2 = FILTER tbl2 BY (key == '1');
> prj_tbl2 = FOREACH src_tbl2 GENERATE
>key as tbl2_key,
>value as tbl2_value;
>
> dump prj_tbl1;
> dump prj_tbl2;
> result = JOIN prj_tbl1 BY (tbl1_key), prj_tbl2 BY (tbl2_key);
> prj_result = FOREACH result 
>   GENERATE  prj_tbl1::tbl1_key AS key1,
> prj_tbl1::tbl1_value AS value1,
> prj_tbl1::tbl1_v1 AS v1,
> prj_tbl2::tbl2_key AS key2,
> prj_tbl2::tbl2_value AS value2;
>
> dump prj_result;
> {noformat}
> The expected result is (1,111,333,1,2) while the actual result is 
> (1,2,333,1,2). We need to clone the job instance in HCatLoader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10165) Improve hive-hcatalog-streaming extensibility and support updates and deletes.

2015-06-15 Thread Elliot West (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliot West updated HIVE-10165:
---
Attachment: HIVE-10165.7.patch

> Improve hive-hcatalog-streaming extensibility and support updates and deletes.
> --
>
> Key: HIVE-10165
> URL: https://issues.apache.org/jira/browse/HIVE-10165
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 1.2.0
>Reporter: Elliot West
>Assignee: Elliot West
>  Labels: streaming_api
> Attachments: HIVE-10165.0.patch, HIVE-10165.4.patch, 
> HIVE-10165.5.patch, HIVE-10165.6.patch, HIVE-10165.7.patch, 
> mutate-system-overview.png
>
>
> h3. Overview
> I'd like to extend the 
> [hive-hcatalog-streaming|https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]
>  API so that it also supports the writing of record updates and deletes in 
> addition to the already supported inserts.
> h3. Motivation
> We have many Hadoop processes outside of Hive that merge changed facts into 
> existing datasets. Traditionally we achieve this by reading in a 
> ground-truth dataset and a modified dataset, grouping by a key, sorting by a 
> sequence and then applying a function to determine inserted, updated, and 
> deleted rows. However, in our current scheme we must rewrite all partitions 
> that may potentially contain changes. In practice the number of mutated 
> records is very small when compared with the records contained in a 
> partition. This approach results in a number of operational issues:
> * Excessive amount of write activity required for small data changes.
> * Downstream applications cannot robustly read these datasets while they are 
> being updated.
> * Due to the scale of the updates (hundreds of partitions), the scope for 
> contention is high. 
> I believe we can address this problem by instead writing only the changed 
> records to a Hive transactional table. This should drastically reduce the 
> amount of data that we need to write and also provide a means for managing 
> concurrent access to the data. Our existing merge processes can read and 
> retain each record's {{ROW_ID}}/{{RecordIdentifier}} and pass this through to 
> an updated form of the hive-hcatalog-streaming API which will then have the 
> required data to perform an update or insert in a transactional manner. 
> h3. Benefits
> * Enables the creation of large-scale dataset merge processes  
> * Opens up Hive transactional functionality in an accessible manner to 
> processes that operate outside of Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10989) HoS can't control number of map tasks for runtime skew join [Spark Branch]

2015-06-15 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585889#comment-14585889
 ] 

Xuefu Zhang commented on HIVE-10989:


Makes sense. +1

> HoS can't control number of map tasks for runtime skew join [Spark Branch]
> --
>
> Key: HIVE-10989
> URL: https://issues.apache.org/jira/browse/HIVE-10989
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-10989.1-spark.patch
>
>
> Flags {{hive.skewjoin.mapjoin.map.tasks}} and 
> {{hive.skewjoin.mapjoin.min.split}} are used to control the number of map 
> tasks for the map join of runtime skew join. They work well for MR but have 
> no effect for Spark.
> This makes runtime skew join less useful, i.e. we just end up with slow 
> mappers instead of slow reducers.
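> For reference, a hedged sketch of these knobs in use (the values shown are 
> the usual defaults, for illustration only):
> {code}
> set hive.optimize.skewjoin=true;               -- enable runtime skew join
> set hive.skewjoin.key=100000;                  -- rows per key deemed skewed
> set hive.skewjoin.mapjoin.map.tasks=10000;     -- map tasks for the map join
> set hive.skewjoin.mapjoin.min.split=33554432;  -- minimum split size in bytes
> {code}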



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6500) Stats collection via filesystem

2015-06-15 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585874#comment-14585874
 ] 

Damien Carol commented on HIVE-6500:


Please ignore my last comment.

> Stats collection via filesystem
> ---
>
> Key: HIVE-6500
> URL: https://issues.apache.org/jira/browse/HIVE-6500
> Project: Hive
>  Issue Type: New Feature
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.13.0
>
> Attachments: HIVE-6500.2.patch, HIVE-6500.3.patch, HIVE-6500.patch
>
>
> Recently, support for stats gathering via counters was [added | 
> https://issues.apache.org/jira/browse/HIVE-4632]. Although it's useful, it 
> has the following issues:
> * [Length of counter group name is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
> * [Length of counter name is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
> * [Number of distinct counter groups are limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
> * [Number of distinct counters are limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
> Although these limits are configurable, setting them to a higher value 
> implies increased memory load on the AM and the job history server.
> Now, whether these limits make sense or not is [debatable | 
> https://issues.apache.org/jira/browse/MAPREDUCE-5680]; either way, it is 
> desirable that Hive not make use of the framework's counter feature, so that 
> we can evolve this feature without relying on framework support. 
> Filesystem-based counter collection is a step in that direction.
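> A hedged sketch of selecting the filesystem-based publisher (the temp 
> location value is illustrative; the property itself is discussed in the 
> comments below):
> {code}
> set hive.stats.dbclass=fs;               -- filesystem-based stats collection
> set hive.stats.tmp.loc=/tmp/hive_stats;  -- scratch location (example value)
> {code}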



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6500) Stats collection via filesystem

2015-06-15 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585873#comment-14585873
 ] 

Damien Carol commented on HIVE-6500:


Please ignore my last comment.

> Stats collection via filesystem
> ---
>
> Key: HIVE-6500
> URL: https://issues.apache.org/jira/browse/HIVE-6500
> Project: Hive
>  Issue Type: New Feature
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.13.0
>
> Attachments: HIVE-6500.2.patch, HIVE-6500.3.patch, HIVE-6500.patch
>
>
> Recently, support for stats gathering via counters was [added | 
> https://issues.apache.org/jira/browse/HIVE-4632]. Although it's useful, it 
> has the following issues:
> * [Length of counter group name is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
> * [Length of counter name is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
> * [Number of distinct counter groups are limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
> * [Number of distinct counters are limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
> Although these limits are configurable, setting them to a higher value 
> implies increased memory load on the AM and the job history server.
> Now, whether these limits make sense or not is [debatable | 
> https://issues.apache.org/jira/browse/MAPREDUCE-5680]; either way, it is 
> desirable that Hive not make use of the framework's counter feature, so that 
> we can evolve this feature without relying on framework support. 
> Filesystem-based counter collection is a step in that direction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10542) Full outer joins in tez produce incorrect results in certain cases

2015-06-15 Thread Goun Na (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585725#comment-14585725
 ] 

Goun Na commented on HIVE-10542:


No patch available for Hive 1.1?

> Full outer joins in tez produce incorrect results in certain cases
> --
>
> Key: HIVE-10542
> URL: https://issues.apache.org/jira/browse/HIVE-10542
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.0.0, 1.2.0, 1.1.0, 1.3.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
>Priority: Blocker
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-10542.1.patch, HIVE-10542.2.patch, 
> HIVE-10542.3.patch, HIVE-10542.4.patch, HIVE-10542.5.patch, 
> HIVE-10542.6.patch, HIVE-10542.7.patch, HIVE-10542.8.patch, HIVE-10542.9.patch
>
>
> If there are no records for one of the tables in the full outer join, we do 
> not read the other input and end up not producing rows that we should.
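> A minimal sketch of the failing shape (hypothetical tables; t_empty holds no 
> rows, so every row of t_full should still be returned):
> {code}
> select f.id, e.id
> from t_full f
> full outer join t_empty e
> on f.id = e.id;
> {code}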



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6500) Stats collection via filesystem

2015-06-15 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585683#comment-14585683
 ] 

Damien Carol commented on HIVE-6500:


[~ashutoshc] Did you miss the property *hive.stats.tmp.loc* in 
_common/src/java/org/apache/hadoop/hive/conf/HiveConf.java_ ?

> Stats collection via filesystem
> ---
>
> Key: HIVE-6500
> URL: https://issues.apache.org/jira/browse/HIVE-6500
> Project: Hive
>  Issue Type: New Feature
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.13.0
>
> Attachments: HIVE-6500.2.patch, HIVE-6500.3.patch, HIVE-6500.patch
>
>
> Recently, support for stats gathering via counters was [added | 
> https://issues.apache.org/jira/browse/HIVE-4632]. Although it's useful, it 
> has the following issues:
> * [Length of counter group name is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
> * [Length of counter name is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
> * [Number of distinct counter groups are limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
> * [Number of distinct counters are limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
> Although these limits are configurable, setting them to a higher value 
> implies increased memory load on the AM and the job history server.
> Now, whether these limits make sense or not is [debatable | 
> https://issues.apache.org/jira/browse/MAPREDUCE-5680]; either way, it is 
> desirable that Hive not make use of the framework's counter feature, so that 
> we can evolve this feature without relying on framework support. 
> Filesystem-based counter collection is a step in that direction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-6500) Stats collection via filesystem

2015-06-15 Thread Damien Carol (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585678#comment-14585678
 ] 

Damien Carol commented on HIVE-6500:


[~leftylev] This JIRA added a new property, *hive.stats.tmp.loc*, that is NOT 
documented. This property is also missing from the "hive-default.xml" file.

> Stats collection via filesystem
> ---
>
> Key: HIVE-6500
> URL: https://issues.apache.org/jira/browse/HIVE-6500
> Project: Hive
>  Issue Type: New Feature
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.13.0
>
> Attachments: HIVE-6500.2.patch, HIVE-6500.3.patch, HIVE-6500.patch
>
>
> Recently, support for stats gathering via counters was [added | 
> https://issues.apache.org/jira/browse/HIVE-4632]. Although it's useful, it 
> has the following issues:
> * [Length of counter group name is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
> * [Length of counter name is limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
> * [Number of distinct counter groups are limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
> * [Number of distinct counters are limited | 
> https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
> Although these limits are configurable, setting them to a higher value 
> implies increased memory load on the AM and the job history server.
> Now, whether these limits make sense or not is [debatable | 
> https://issues.apache.org/jira/browse/MAPREDUCE-5680]; either way, it is 
> desirable that Hive not make use of the framework's counter feature, so that 
> we can evolve this feature without relying on framework support. 
> Filesystem-based counter collection is a step in that direction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)