[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0

2015-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362833#comment-14362833
 ] 

Lefty Leverenz commented on HIVE-9957:
--

Since this is broken in 1.1.0 and fixed in 1.2.0, should it be documented in 
the wiki?

* [Getting Started -- Requirements | 
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-Requirements]
* [Installing Hive | 
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation#AdminManualInstallation-InstallingHive]

> Hive 1.1.0 not compatible with Hadoop 2.4.0
> ---
>
> Key: HIVE-9957
> URL: https://issues.apache.org/jira/browse/HIVE-9957
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Vivek Shrivastava
>Assignee: Sergio Peña
> Fix For: 1.2.0
>
> Attachments: HIVE-9957.1.patch
>
>
> Getting this exception while accessing data through Hive. 
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider;
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.<init>(Hadoop23Shims.java:1152)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
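
The NoSuchMethodError above surfaces at run time because Hive 1.1.0 was built 
against a Hadoop where DFSClient.getKeyProvider() exists (it shipped with HDFS 
transparent encryption), while Hadoop 2.4.0 lacks it. As a minimal sketch 
(hypothetical, not the actual HIVE-9957 patch), a shim could probe for the 
optional API via reflection and skip the encryption shim on older clusters:

{code}
import org.apache.hadoop.hdfs.DFSClient;

public final class KeyProviderProbe {
    private KeyProviderProbe() {}

    // True when the running Hadoop exposes DFSClient.getKeyProvider();
    // on Hadoop 2.4.0 the method is absent, so the encryption shim must
    // not be constructed at all.
    public static boolean hasKeyProviderApi() {
        try {
            DFSClient.class.getMethod("getKeyProvider");
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
{code}

Hadoop23Shims.createHdfsEncryptionShim() could consult such a probe and return 
a no-op shim when the API is missing.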



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9968) Hive DDL docs don't include partitioned views

2015-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362839#comment-14362839
 ] 

Lefty Leverenz commented on HIVE-9968:
--

[~jbeard], would you like to take care of this yourself?

* [About This Wiki -- How to get permission to edit | 
https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki#AboutThisWiki-Howtogetpermissiontoedit]

> Hive DDL docs don't include partitioned views
> -
>
> Key: HIVE-9968
> URL: https://issues.apache.org/jira/browse/HIVE-9968
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 0.8.0, 1.0.0
>Reporter: Jeremy Beard
>  Labels: documentation
>
> Partitioned views have been in Hive for over four years, but the Hive DDL 
> syntax documentation doesn't mention them (!).
> - 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/AlterView
> - https://cwiki.apache.org/confluence/display/Hive/PartitionedViews
> - https://issues.apache.org/jira/browse/HIVE-1941



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9971) Clean up operator class

2015-03-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362870#comment-14362870
 ] 

Hive QA commented on HIVE-9971:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12704719/HIVE-9971.1.patch

{color:red}ERROR:{color} -1 due to 204 failed/errored test(s), 7762 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_vectorization_ppd
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_bucket
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_mapjoin1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_coalesce_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_data_types
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_10_0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_precision
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_trailing
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_udf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_distinct_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_left_outer_join
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_non_string_partition
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partitioned_date_time
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_reduce_groupby_decimal
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_varchar_mapjoin1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_varchar_simple
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_14
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_15
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_div0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_nested_udf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part_project
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_date_funcs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_nested_mapjoin
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vecto

[jira] [Updated] (HIVE-9819) Add timeout check inside the HMS server

2015-03-16 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-9819:

Attachment: HIVE-9819.patch

Re-running this patch for testing, based on the fix in HIVE-9906.

> Add timeout check inside the HMS server
> ---
>
> Key: HIVE-9819
> URL: https://issues.apache.org/jira/browse/HIVE-9819
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Dong Chen
>Assignee: Dong Chen
> Attachments: HIVE-9819.patch, HIVE-9819.patch, HIVE-9819.patch
>
>
> In HIVE-9253, a timeout check mechanism was added for long-running methods in 
> the HMS server. We should add this check to each of the inner loops inside the 
> HMS server.
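
As a rough illustration of the pattern (hypothetical helper, not the actual 
HIVE-9253 code): fix a per-request deadline when the call starts and re-check 
it on every iteration of a long inner loop, so a slow request fails fast 
instead of running unbounded.

{code}
import java.util.List;
import java.util.concurrent.TimeoutException;

public class HmsDeadlineSketch {
    private final long deadlineMillis;

    public HmsDeadlineSketch(long timeoutMillis) {
        // Deadline fixed once, when the HMS request starts.
        this.deadlineMillis = System.currentTimeMillis() + timeoutMillis;
    }

    private void checkTimeout() throws TimeoutException {
        if (System.currentTimeMillis() > deadlineMillis) {
            throw new TimeoutException("HMS request exceeded its deadline");
        }
    }

    // A long-running method iterating over many partitions: the point of
    // HIVE-9819 is to re-check the deadline inside the inner loop, not
    // only once at method entry.
    public void processPartitions(List<String> partitionNames)
            throws TimeoutException {
        for (String name : partitionNames) {
            checkTimeout();
            // ... per-partition metastore work for 'name' ...
        }
    }
}
{code}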



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9896) \N un-recognized in AVRO format Hive tables

2015-03-16 Thread Madhan Sundararajan Devaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhan Sundararajan Devaki updated HIVE-9896:
-
Description: 
We Sqooped (1.4.5) data from many RDBMSs into HDFS in text format with the 
options --null-non-string '\N' --null-string '\N'.
When we load these into Hive tables in text format, the \N is properly 
recognized as NULL and we are able to use SQL clauses such as IS NULL and IS 
NOT NULL against columns.
However, when we convert the text files into AVRO (1.7.6) with SNAPPY 
compression through Pig (0.12) and try to query using the above SQL clauses, 
the query does not return results as expected.
Further, we have to use column_name = '\N' or column_name <> '\N' as a 
workaround.

  was:
We Sqooped (1.4.5) data from many RDBMSs into HDFS in text format with the 
options --null-non-string '\N' --null-string '\N'.
When we load these into Hive tables in text format, the \N is properly 
recognized as NULL and we are able to use SQL clauses such as IS NULL and IS 
NOT NULL against columns.
However, when we convert the text files into AVRO (1.7.6) with SNAPPY 
compression and try to query using the above SQL clauses, the query does not 
return results as expected.
Further, we have to use column_name = '\N' or column_name <> '\N' as a 
workaround.


> \N un-recognized in AVRO format Hive tables
> ---
>
> Key: HIVE-9896
> URL: https://issues.apache.org/jira/browse/HIVE-9896
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema, File Formats, Hive
>Affects Versions: 0.13.0
> Environment: CDH5.2.1, RHEL6.5, Java 7
>Reporter: Madhan Sundararajan Devaki
>
> We Sqooped (1.4.5) data from many RDBMSs into HDFS in text format with the 
> options --null-non-string '\N' --null-string '\N'.
> When we load these into Hive tables in text format, the \N is properly 
> recognized as NULL and we are able to use SQL clauses such as IS NULL and IS 
> NOT NULL against columns.
> However, when we convert the text files into AVRO (1.7.6) with SNAPPY 
> compression through Pig (0.12) and try to query using the above SQL clauses, 
> the query does not return results as expected.
> Further, we have to use column_name = '\N' or column_name <> '\N' as a 
> workaround.
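
The likely root cause (an assumption about the described pipeline, not 
confirmed in this issue): Avro encodes NULL as a typed null inside a union 
schema, so a converter that writes the literal two-character string "\N" 
produces a non-null value that IS NULL can never match. A minimal Avro sketch 
of the difference:

{code}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class AvroNullExample {
    public static void main(String[] args) {
        // Nullable column: the usual ["null", "string"] union.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
            + "{\"name\":\"col\",\"type\":[\"null\",\"string\"],"
            + "\"default\":null}]}");

        GenericRecord nullRow = new GenericData.Record(schema);
        nullRow.put("col", null);   // a real NULL: IS NULL matches this

        GenericRecord badRow = new GenericData.Record(schema);
        badRow.put("col", "\\N");   // the literal string "\N": non-null,
                                    // so IS NULL never matches it

        System.out.println(nullRow); // {"col": null}
        System.out.println(badRow);  // {"col": "\\N"}
    }
}
{code}

If that is what the Pig conversion did, the fix belongs on the conversion 
side: emit Avro nulls for the \N markers instead of copying the marker text.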



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9819) Add timeout check inside the HMS server

2015-03-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363043#comment-14363043
 ] 

Hive QA commented on HIVE-9819:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12704741/HIVE-9819.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7764 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3041/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3041/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3041/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12704741 - PreCommit-HIVE-TRUNK-Build

> Add timeout check inside the HMS server
> ---
>
> Key: HIVE-9819
> URL: https://issues.apache.org/jira/browse/HIVE-9819
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Dong Chen
>Assignee: Dong Chen
> Attachments: HIVE-9819.patch, HIVE-9819.patch, HIVE-9819.patch
>
>
> In HIVE-9253, a timeout check mechanism was added for long-running methods in 
> the HMS server. We should add this check to each of the inner loops inside the 
> HMS server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9972) Append to existing, already closed RCFile

2015-03-16 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HIVE-9972:

Attachment: RCFile-append.patch

> Append to existing, already closed RCFile
> -
>
> Key: HIVE-9972
> URL: https://issues.apache.org/jira/browse/HIVE-9972
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Reporter: Max Lapan
> Attachments: RCFile-append.patch
>
>
> In our project we use RCFiles to store incoming streams of data, which are 
> periodically processed by MR jobs. To minimise delays in data availability 
> for these jobs, I extended RCFile.Writer to support appending new data to an 
> already closed file, which makes new data available immediately after the 
> close call.
> I think this kind of functionality can be useful beyond our case.
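
The general shape of the improvement (a hedged sketch; the real append entry 
point for RCFile.Writer is defined by the attached patch, and the path below 
is hypothetical) is to reopen the existing file for append rather than create 
a new one, so new rows become readable right after each close:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/incoming/stream.rc"); // hypothetical path

        // FileSystem.append() reopens a closed file for writing at its end
        // (requires an append-capable FileSystem such as HDFS). An
        // append-aware RCFile.Writer would build on this, writing new row
        // groups after the existing ones.
        try (FSDataOutputStream out = fs.append(file)) {
            out.write(new byte[0]); // placeholder for serialized row groups
        }
    }
}
{code}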



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9368) Physical optimizer : Join order in Explain is different from join order provided by Calcite

2015-03-16 Thread Douglas Moore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Douglas Moore updated HIVE-9368:

Description: 
Join order in explain is different from that provided by Calcite; this was 
observed during the POC. 

Logical plan from Calcite:
{code}
2015-01-13 18:54:42,892 DEBUG [main]: parse.CalcitePlanner 
(CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], 
cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], lvl5_id=[$5], 
view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9], 
lob_id=[$10], product_id=[$11], fprs_lvl5_id=[$12], ssn_id=[$13], 
account_id=[$14], mtd_balance=[$15]): rowcount = 2.53152774E8, cumulative cost 
= {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 636
  HiveAggregate(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}], 
agg#0=[SUM($15)]): rowcount = 2.53152774E8, cumulative cost = 
{3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 634
HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$24], 
$f6=[$6], $f7=[$7], $f8=[$8], $f9=[$9], $f10=[$10], $f11=[$11], $f12=[$21], 
$f13=[$18], $f14=[$19], $f15=[*($13, $20)]): rowcount = 3.401053197411791E11, 
cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 632
  HiveProject(scale=[$7], time_key_num=[$8], dataset_code=[$9], 
cost_center_lvl1_id=[$10], cost_pool_lvl6_id=[$11], activity_id=[$12], 
view_lvl1_id=[$13], from_lvl1_id=[$14], plan_id=[$15], client_id=[$16], 
lob_id=[$17], product_id=[$18], fprs_id=[$19], mtd_balance=[$20], 
time_key_num0=[$0], activity_id0=[$1], plan_id0=[$2], fprs_id0=[$3], 
ssn_id=[$4], account_id=[$5], driver_pct=[$6], lvl5_id=[$25], 
current_ind=[$26], fprs_id1=[$27], lvl5_id0=[$21], rollup_key=[$22], 
current_ind0=[$23], activity_id1=[$24]): rowcount = 3.401053197411791E11, 
cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 692
HiveJoin(condition=[AND(AND(AND(=($8, $0), =($15, $2)), =($19, $3)), 
=($12, $1))], joinType=[inner]): rowcount = 3.401053197411791E11, cumulative 
cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 690
  HiveProject(time_key_num=[$0], activity_id=[$1], plan_id=[$2], 
fprs_id=[$3], ssn_id=[$4], account_id=[$5], driver_pct=[$6]): rowcount = 
2.926396239E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 590

HiveTableScan(table=[[fidelity.fcap_drivers_part_exp_inter_bucket_256]]): 
rowcount = 2.926396239E9, cumulative cost = {0}, id = 465
  HiveJoin(condition=[=($12, $20)], joinType=[inner]): rowcount = 
1.0871372980143067E8, cumulative cost = {2.2067125966323376E7 rows, 0.0 cpu, 
0.0 io}, id = 688
HiveJoin(condition=[=($5, $17)], joinType=[inner]): rowcount = 
1.4392118216323378E7, cumulative cost = {6880237.75 rows, 0.0 cpu, 0.0 io}, id 
= 653
  HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], 
cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], activity_id=[$5], 
view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9], 
lob_id=[$10], product_id=[$11], fprs_id=[$12], mtd_balance=[$14]): rowcount = 
6870067.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 587

HiveTableScan(table=[[fidelity.fcap_agg_prod_exp_nofund_decimal]]): rowcount = 
6870067.0, cumulative cost = {0}, id = 464
  HiveProject(lvl5_id=[$36], rollup_key=[$48], current_ind=[$51], 
activity_id=[$60]): rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 cpu, 
0.0 io}, id = 628
HiveFilter(condition=[AND(=($51, 'Y'), =($48, 'TOTACT'))]): 
rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 626
  HiveTableScan(table=[[fidelity.fobi_activity_dim_mv]]): 
rowcount = 40683.0, cumulative cost = {0}, id = 467
HiveProject(lvl5_id=[$36], current_ind=[$51], fprs_id=[$58]): 
rowcount = 794770.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 622
  HiveFilter(condition=[=($51, 'Y')]): rowcount = 794770.0, 
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 620
HiveTableScan(table=[[fidelity.fobi_fprs_dim_mv_orc]]): 
rowcount = 1589540.0, cumulative cost = {0}, id = 466
{code}

Plan #1 with Fetch column stats on 
{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
Tez
  Edges:
Map 4 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
  DagName: mmokhtar_20150113185454_d7ce6ecf-2d50-45ed-8a88-6283bb091b0e:3
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: driver
  filterExpr: (((time_key_num is not null and plan_id is not 
null)

[jira] [Updated] (HIVE-9974) Sensitive data redaction: data appears in name of mapreduce job

2015-03-16 Thread Sergio Peña (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-9974:
--
Attachment: HIVE-9974.1.patch

Here's the patch.
There was one remaining method that needed data redaction. Because execute() 
runs after compileInternal(), we can simply read the already-redacted query 
from the conf variable. 
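
A rough sketch of the idea (hypothetical rule and helper names, not the 
committed patch): apply the configured redaction pattern to the query string 
during compilation, and derive the MR job name only from the redacted text, 
since job names are built from a truncated copy of the query.

{code}
import java.util.regex.Pattern;

public class QueryRedactionSketch {
    // Hypothetical redaction rule; real deployments configure these.
    private static final Pattern SENSITIVE = Pattern.compile("B0096EZHM2");

    static String redact(String query) {
        return SENSITIVE.matcher(query).replaceAll("****");
    }

    // Truncation for the job name must happen AFTER redaction; truncating
    // first is how the sensitive literal leaked into the job name.
    static String jobName(String query, int maxLen) {
        String redacted = redact(query);
        return redacted.length() <= maxLen
            ? redacted : redacted.substring(0, maxLen);
    }

    public static void main(String[] args) {
        String q = "select product, userid from reviews"
                 + " where product='B0096EZHM2'";
        System.out.println(jobName(q, 50));
    }
}
{code}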

> Sensitive data redaction: data appears in name of mapreduce job
> ---
>
> Key: HIVE-9974
> URL: https://issues.apache.org/jira/browse/HIVE-9974
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-9974.1.patch
>
>
> Set up a cluster, configured a redaction rule to redact "B0096EZHM2", and ran 
> Hive queries on the cluster.
> Looking at the YARN RM web UI and Job History Server web UI, I see that the 
> mapreduce jobs spawned by the Hive queries have the sensitive data 
> ("B0096EZHM2") showing in the job names:
> e.g., "select product, useri...product='B0096EZHM2'(Stage"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9974) Sensitive data redaction: data appears in name of mapreduce job

2015-03-16 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363245#comment-14363245
 ] 

Sergio Peña commented on HIVE-9974:
---

[~xuefuz] Could you help me review this code?
Thanks.

> Sensitive data redaction: data appears in name of mapreduce job
> ---
>
> Key: HIVE-9974
> URL: https://issues.apache.org/jira/browse/HIVE-9974
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-9974.1.patch
>
>
> Set up a cluster, configured a redaction rule to redact "B0096EZHM2", and ran 
> Hive queries on the cluster.
> Looking at the YARN RM web UI and Job History Server web UI, I see that the 
> mapreduce jobs spawned by the Hive queries have the sensitive data 
> ("B0096EZHM2") showing in the job names:
> e.g., "select product, useri...product='B0096EZHM2'(Stage"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9368) Physical optimizer : Join order in Explain is different from join order provided by Calcite

2015-03-16 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar resolved HIVE-9368.
---
Resolution: Done
  Assignee: Mostafa Mokhtar  (was: Vikram Dixit K)

> Physical optimizer : Join order in Explain is different from join order 
> provided by Calcite
> ---
>
> Key: HIVE-9368
> URL: https://issues.apache.org/jira/browse/HIVE-9368
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.0
>
> Attachments: explain_fetch_column_stats_off.txt, 
> explain_fetch_column_stats_on.txt
>
>
> Join order in explain is different from that provided by Calcite; this was 
> observed during the POC. 
> Logical plan from Calcite:
> {code}
> 2015-01-13 18:54:42,892 DEBUG [main]: parse.CalcitePlanner 
> (CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
> HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], 
> cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], lvl5_id=[$5], 
> view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9], 
> lob_id=[$10], product_id=[$11], fprs_lvl5_id=[$12], ssn_id=[$13], 
> account_id=[$14], mtd_balance=[$15]): rowcount = 2.53152774E8, cumulative 
> cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 636
>   HiveAggregate(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}], 
> agg#0=[SUM($15)]): rowcount = 2.53152774E8, cumulative cost = 
> {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 634
> HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$24], 
> $f6=[$6], $f7=[$7], $f8=[$8], $f9=[$9], $f10=[$10], $f11=[$11], $f12=[$21], 
> $f13=[$18], $f14=[$19], $f15=[*($13, $20)]): rowcount = 3.401053197411791E11, 
> cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 632
>   HiveProject(scale=[$7], time_key_num=[$8], dataset_code=[$9], 
> cost_center_lvl1_id=[$10], cost_pool_lvl6_id=[$11], activity_id=[$12], 
> view_lvl1_id=[$13], from_lvl1_id=[$14], plan_id=[$15], client_id=[$16], 
> lob_id=[$17], product_id=[$18], fprs_id=[$19], mtd_balance=[$20], 
> time_key_num0=[$0], activity_id0=[$1], plan_id0=[$2], fprs_id0=[$3], 
> ssn_id=[$4], account_id=[$5], driver_pct=[$6], lvl5_id=[$25], 
> current_ind=[$26], fprs_id1=[$27], lvl5_id0=[$21], rollup_key=[$22], 
> current_ind0=[$23], activity_id1=[$24]): rowcount = 3.401053197411791E11, 
> cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 692
> HiveJoin(condition=[AND(AND(AND(=($8, $0), =($15, $2)), =($19, $3)), 
> =($12, $1))], joinType=[inner]): rowcount = 3.401053197411791E11, cumulative 
> cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 690
>   HiveProject(time_key_num=[$0], activity_id=[$1], plan_id=[$2], 
> fprs_id=[$3], ssn_id=[$4], account_id=[$5], driver_pct=[$6]): rowcount = 
> 2.926396239E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 590
> 
> HiveTableScan(table=[[fidelity.fcap_drivers_part_exp_inter_bucket_256]]): 
> rowcount = 2.926396239E9, cumulative cost = {0}, id = 465
>   HiveJoin(condition=[=($12, $20)], joinType=[inner]): rowcount = 
> 1.0871372980143067E8, cumulative cost = {2.2067125966323376E7 rows, 0.0 cpu, 
> 0.0 io}, id = 688
> HiveJoin(condition=[=($5, $17)], joinType=[inner]): rowcount = 
> 1.4392118216323378E7, cumulative cost = {6880237.75 rows, 0.0 cpu, 0.0 io}, 
> id = 653
>   HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], 
> cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], activity_id=[$5], 
> view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9], 
> lob_id=[$10], product_id=[$11], fprs_id=[$12], mtd_balance=[$14]): rowcount = 
> 6870067.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 587
> 
> HiveTableScan(table=[[fidelity.fcap_agg_prod_exp_nofund_decimal]]): rowcount 
> = 6870067.0, cumulative cost = {0}, id = 464
>   HiveProject(lvl5_id=[$36], rollup_key=[$48], current_ind=[$51], 
> activity_id=[$60]): rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 
> cpu, 0.0 io}, id = 628
> HiveFilter(condition=[AND(=($51, 'Y'), =($48, 'TOTACT'))]): 
> rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 626
>   HiveTableScan(table=[[fidelity.fobi_activity_dim_mv]]): 
> rowcount = 40683.0, cumulative cost = {0}, id = 467
> HiveProject(lvl5_id=[$36], current_ind=[$51], fprs_id=[$58]): 
> rowcount = 794770.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 622
>   HiveFilter(condition=[=($51, 'Y')]): rowcount = 794770.0, 
> cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 620
> Hi

[jira] [Updated] (HIVE-9368) Physical optimizer : Join order in Explain is different from join order provided by Calcite

2015-03-16 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-9368:
--
Attachment: (was: explain_fetch_column_stats_off.txt)

> Physical optimizer : Join order in Explain is different from join order 
> provided by Calcite
> ---
>
> Key: HIVE-9368
> URL: https://issues.apache.org/jira/browse/HIVE-9368
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.0
>
>
> Join order in explain is different from that provided by Calcite; this was 
> observed during the POC. 
> Logical plan from Calcite:
> {code}
> 2015-01-13 18:54:42,892 DEBUG [main]: parse.CalcitePlanner 
> (CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
> HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], 
> cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], lvl5_id=[$5], 
> view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9], 
> lob_id=[$10], product_id=[$11], fprs_lvl5_id=[$12], ssn_id=[$13], 
> account_id=[$14], mtd_balance=[$15]): rowcount = 2.53152774E8, cumulative 
> cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 636
>   HiveAggregate(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}], 
> agg#0=[SUM($15)]): rowcount = 2.53152774E8, cumulative cost = 
> {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 634
> HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$24], 
> $f6=[$6], $f7=[$7], $f8=[$8], $f9=[$9], $f10=[$10], $f11=[$11], $f12=[$21], 
> $f13=[$18], $f14=[$19], $f15=[*($13, $20)]): rowcount = 3.401053197411791E11, 
> cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 632
>   HiveProject(scale=[$7], time_key_num=[$8], dataset_code=[$9], 
> cost_center_lvl1_id=[$10], cost_pool_lvl6_id=[$11], activity_id=[$12], 
> view_lvl1_id=[$13], from_lvl1_id=[$14], plan_id=[$15], client_id=[$16], 
> lob_id=[$17], product_id=[$18], fprs_id=[$19], mtd_balance=[$20], 
> time_key_num0=[$0], activity_id0=[$1], plan_id0=[$2], fprs_id0=[$3], 
> ssn_id=[$4], account_id=[$5], driver_pct=[$6], lvl5_id=[$25], 
> current_ind=[$26], fprs_id1=[$27], lvl5_id0=[$21], rollup_key=[$22], 
> current_ind0=[$23], activity_id1=[$24]): rowcount = 3.401053197411791E11, 
> cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 692
> HiveJoin(condition=[AND(AND(AND(=($8, $0), =($15, $2)), =($19, $3)), 
> =($12, $1))], joinType=[inner]): rowcount = 3.401053197411791E11, cumulative 
> cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 690
>   HiveProject(time_key_num=[$0], activity_id=[$1], plan_id=[$2], 
> fprs_id=[$3], ssn_id=[$4], account_id=[$5], driver_pct=[$6]): rowcount = 
> 2.926396239E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 590
> 
> HiveTableScan(table=[[fidelity.fcap_drivers_part_exp_inter_bucket_256]]): 
> rowcount = 2.926396239E9, cumulative cost = {0}, id = 465
>   HiveJoin(condition=[=($12, $20)], joinType=[inner]): rowcount = 
> 1.0871372980143067E8, cumulative cost = {2.2067125966323376E7 rows, 0.0 cpu, 
> 0.0 io}, id = 688
> HiveJoin(condition=[=($5, $17)], joinType=[inner]): rowcount = 
> 1.4392118216323378E7, cumulative cost = {6880237.75 rows, 0.0 cpu, 0.0 io}, 
> id = 653
>   HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], 
> cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], activity_id=[$5], 
> view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9], 
> lob_id=[$10], product_id=[$11], fprs_id=[$12], mtd_balance=[$14]): rowcount = 
> 6870067.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 587
> 
> HiveTableScan(table=[[fidelity.fcap_agg_prod_exp_nofund_decimal]]): rowcount 
> = 6870067.0, cumulative cost = {0}, id = 464
>   HiveProject(lvl5_id=[$36], rollup_key=[$48], current_ind=[$51], 
> activity_id=[$60]): rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 
> cpu, 0.0 io}, id = 628
> HiveFilter(condition=[AND(=($51, 'Y'), =($48, 'TOTACT'))]): 
> rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 626
>   HiveTableScan(table=[[fidelity.fobi_activity_dim_mv]]): 
> rowcount = 40683.0, cumulative cost = {0}, id = 467
> HiveProject(lvl5_id=[$36], current_ind=[$51], fprs_id=[$58]): 
> rowcount = 794770.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 622
>   HiveFilter(condition=[=($51, 'Y')]): rowcount = 794770.0, 
> cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 620
> HiveTableScan(table=[[fidelity.fobi_fprs_dim_mv_orc]]): 
> rowcount = 1589540.0, cumulative cost = {0}, id = 466
> {

[jira] [Updated] (HIVE-9368) Physical optimizer : Join order in Explain is different from join order provided by Calcite

2015-03-16 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-9368:
--
Description: 
Join order in explain is different from that provided by Calcite. 



  was:
Join order in explain is different from that provided by Calcite; this was 
observed during the POC. 

Logical plan from Calcite:
{code}
2015-01-13 18:54:42,892 DEBUG [main]: parse.CalcitePlanner 
(CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], 
cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], lvl5_id=[$5], 
view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9], 
lob_id=[$10], product_id=[$11], fprs_lvl5_id=[$12], ssn_id=[$13], 
account_id=[$14], mtd_balance=[$15]): rowcount = 2.53152774E8, cumulative cost 
= {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 636
  HiveAggregate(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}], 
agg#0=[SUM($15)]): rowcount = 2.53152774E8, cumulative cost = 
{3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 634
HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$24], 
$f6=[$6], $f7=[$7], $f8=[$8], $f9=[$9], $f10=[$10], $f11=[$11], $f12=[$21], 
$f13=[$18], $f14=[$19], $f15=[*($13, $20)]): rowcount = 3.401053197411791E11, 
cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 632
  HiveProject(scale=[$7], time_key_num=[$8], dataset_code=[$9], 
cost_center_lvl1_id=[$10], cost_pool_lvl6_id=[$11], activity_id=[$12], 
view_lvl1_id=[$13], from_lvl1_id=[$14], plan_id=[$15], client_id=[$16], 
lob_id=[$17], product_id=[$18], fprs_id=[$19], mtd_balance=[$20], 
time_key_num0=[$0], activity_id0=[$1], plan_id0=[$2], fprs_id0=[$3], 
ssn_id=[$4], account_id=[$5], driver_pct=[$6], lvl5_id=[$25], 
current_ind=[$26], fprs_id1=[$27], lvl5_id0=[$21], rollup_key=[$22], 
current_ind0=[$23], activity_id1=[$24]): rowcount = 3.401053197411791E11, 
cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 692
HiveJoin(condition=[AND(AND(AND(=($8, $0), =($15, $2)), =($19, $3)), 
=($12, $1))], joinType=[inner]): rowcount = 3.401053197411791E11, cumulative 
cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 690
  HiveProject(time_key_num=[$0], activity_id=[$1], plan_id=[$2], 
fprs_id=[$3], ssn_id=[$4], account_id=[$5], driver_pct=[$6]): rowcount = 
2.926396239E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 590

HiveTableScan(table=[[fidelity.fcap_drivers_part_exp_inter_bucket_256]]): 
rowcount = 2.926396239E9, cumulative cost = {0}, id = 465
  HiveJoin(condition=[=($12, $20)], joinType=[inner]): rowcount = 
1.0871372980143067E8, cumulative cost = {2.2067125966323376E7 rows, 0.0 cpu, 
0.0 io}, id = 688
HiveJoin(condition=[=($5, $17)], joinType=[inner]): rowcount = 
1.4392118216323378E7, cumulative cost = {6880237.75 rows, 0.0 cpu, 0.0 io}, id 
= 653
  HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], 
cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], activity_id=[$5], 
view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9], 
lob_id=[$10], product_id=[$11], fprs_id=[$12], mtd_balance=[$14]): rowcount = 
6870067.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 587

HiveTableScan(table=[[fidelity.fcap_agg_prod_exp_nofund_decimal]]): rowcount = 
6870067.0, cumulative cost = {0}, id = 464
  HiveProject(lvl5_id=[$36], rollup_key=[$48], current_ind=[$51], 
activity_id=[$60]): rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 cpu, 
0.0 io}, id = 628
HiveFilter(condition=[AND(=($51, 'Y'), =($48, 'TOTACT'))]): 
rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 626
  HiveTableScan(table=[[fidelity.fobi_activity_dim_mv]]): 
rowcount = 40683.0, cumulative cost = {0}, id = 467
HiveProject(lvl5_id=[$36], current_ind=[$51], fprs_id=[$58]): 
rowcount = 794770.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 622
  HiveFilter(condition=[=($51, 'Y')]): rowcount = 794770.0, 
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 620
HiveTableScan(table=[[fidelity.fobi_fprs_dim_mv_orc]]): 
rowcount = 1589540.0, cumulative cost = {0}, id = 466
{code}

Plan #1 with Fetch column stats on 
{code}
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
Tez
  Edges:
Map 4 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
  DagName: mmokhtar_20150113185454_d7ce6ecf-2d50-45ed-8a88-6283bb091b0e:3
  Vertices:
Map 1
Map Operator Tree:
TableScan
  alias: driver
  

[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for <200ms tasks

2015-03-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9976:
--
Attachment: llap_vertex_200ms.png

> Possible race condition in DynamicPartitionPruner for <200ms tasks
> --
>
> Key: HIVE-9976
> URL: https://issues.apache.org/jira/browse/HIVE-9976
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Gunther Hagleitner
> Attachments: llap_vertex_200ms.png
>
>
> There is a race condition in the DynamicPartitionPruner between 
> DynamicPartitionPruner::processVertex() and 
> DynamicPartitionPruner::addEvent() for tasks that respond with both the 
> result and success in a single heartbeat sequence.
> {code}
> 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
> tez.DynamicPartitionPruner: Expecting: 1, received: 0
> 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
> Vertex Input: store_sales initializer failed, 
> vertex=vertex_1424502260528_1113_4_04 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
> dynamic parition pruning
> {code}
> All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger 
> this, which seems to be happening with LLAP.
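
A minimal sketch of the ordering hazard (illustrative only; the names do not 
match the real Tez/Hive classes): if vertex-success handling asserts the 
expected event count before the event delivered in the same heartbeat has been 
queued, fast tasks trip the assertion.

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PrunerRaceSketch {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final int expectedEvents = 1;

    // Called when an upstream task reports a pruning result.
    void addEvent(String event) {
        events.add(event);
    }

    // Called when the vertex is reported successful. If success and the
    // event arrive in one heartbeat, this can run first and observe zero
    // events, producing "Expecting: 1, received: 0".
    void processVertex() {
        if (events.size() < expectedEvents) {
            throw new IllegalStateException(
                "Expecting: " + expectedEvents + ", received: " + events.size());
        }
    }

    public static void main(String[] args) {
        PrunerRaceSketch p = new PrunerRaceSketch();
        try {
            p.processVertex();        // success handled before addEvent()
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        p.addEvent("pruning-result"); // arrives too late
    }
}
{code}

A fix has to tolerate the event arriving after the success notification, for 
example by finalizing only once both the expected events and the source task 
completions have been observed.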



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for <200ms tasks

2015-03-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9976:
--
Description: 
There is a race condition in the DynamicPartitionPruner between 
DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() 
for tasks that respond with both the result and success in a single heartbeat 
sequence.

{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
Vertex Input: store_sales initializer failed, 
vertex=vertex_1424502260528_1113_4_04 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
dynamic parition pruning
{code}

!llap_vertex_200ms.png!

All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, 
which seems to be happening with LLAP.

  was:
There is a race condition in the DynamicPartitionPruner between 
DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() 
for tasks that respond with both the result and success in a single heartbeat 
sequence.

{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
Vertex Input: store_sales initializer failed, 
vertex=vertex_1424502260528_1113_4_04 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
dynamic parition pruning
{code}

All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, 
which seems to be happening with LLAP.


> Possible race condition in DynamicPartitionPruner for <200ms tasks
> --
>
> Key: HIVE-9976
> URL: https://issues.apache.org/jira/browse/HIVE-9976
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Gunther Hagleitner
> Attachments: llap_vertex_200ms.png
>
>
> There is a race condition in the DynamicPartitionPruner between 
> DynamicPartitionPruner::processVertex() and 
> DynamicPartitionPruner::addEvent() for tasks that respond with both the 
> result and success in a single heartbeat sequence.
> {code}
> 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
> tez.DynamicPartitionPruner: Expecting: 1, received: 0
> 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
> Vertex Input: store_sales initializer failed, 
> vertex=vertex_1424502260528_1113_4_04 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
> dynamic parition pruning
> {code}
> !llap_vertex_200ms.png!
> All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger 
> this, which seems to be happening with LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9368) Physical optimizer : Join order in Explain is different from join order provided by Calcite

2015-03-16 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-9368:
--
Fix Version/s: (was: 1.2.0)

> Physical optimizer : Join order in Explain is different from join order 
> provided by Calcite
> ---
>
> Key: HIVE-9368
> URL: https://issues.apache.org/jira/browse/HIVE-9368
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
>
> Join order in explain is different from that provided by Calcite. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9368) Physical optimizer : Join order in Explain is different from join order provided by Calcite

2015-03-16 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-9368:
--
Attachment: (was: explain_fetch_column_stats_on.txt)

> Physical optimizer : Join order in Explain is different from join order 
> provided by Calcite
> ---
>
> Key: HIVE-9368
> URL: https://issues.apache.org/jira/browse/HIVE-9368
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 0.14.0
>Reporter: Mostafa Mokhtar
>Assignee: Mostafa Mokhtar
> Fix For: 1.2.0
>
>
> Join order in explain is different from that provided by Calcite; this was 
> observed during the POC. 
> Logical plan from Calcite:
> {code}
> 2015-01-13 18:54:42,892 DEBUG [main]: parse.CalcitePlanner 
> (CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
> HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], 
> cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], lvl5_id=[$5], 
> view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9], 
> lob_id=[$10], product_id=[$11], fprs_lvl5_id=[$12], ssn_id=[$13], 
> account_id=[$14], mtd_balance=[$15]): rowcount = 2.53152774E8, cumulative 
> cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 636
>   HiveAggregate(group=[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14}], 
> agg#0=[SUM($15)]): rowcount = 2.53152774E8, cumulative cost = 
> {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 634
> HiveProject($f0=[$0], $f1=[$1], $f2=[$2], $f3=[$3], $f4=[$4], $f5=[$24], 
> $f6=[$6], $f7=[$7], $f8=[$8], $f9=[$9], $f10=[$10], $f11=[$11], $f12=[$21], 
> $f13=[$18], $f14=[$19], $f15=[*($13, $20)]): rowcount = 3.401053197411791E11, 
> cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 632
>   HiveProject(scale=[$7], time_key_num=[$8], dataset_code=[$9], 
> cost_center_lvl1_id=[$10], cost_pool_lvl6_id=[$11], activity_id=[$12], 
> view_lvl1_id=[$13], from_lvl1_id=[$14], plan_id=[$15], client_id=[$16], 
> lob_id=[$17], product_id=[$18], fprs_id=[$19], mtd_balance=[$20], 
> time_key_num0=[$0], activity_id0=[$1], plan_id0=[$2], fprs_id0=[$3], 
> ssn_id=[$4], account_id=[$5], driver_pct=[$6], lvl5_id=[$25], 
> current_ind=[$26], fprs_id1=[$27], lvl5_id0=[$21], rollup_key=[$22], 
> current_ind0=[$23], activity_id1=[$24]): rowcount = 3.401053197411791E11, 
> cumulative cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 692
> HiveJoin(condition=[AND(AND(AND(=($8, $0), =($15, $2)), =($19, $3)), 
> =($12, $1))], joinType=[inner]): rowcount = 3.401053197411791E11, cumulative 
> cost = {3.057177094767754E9 rows, 0.0 cpu, 0.0 io}, id = 690
>   HiveProject(time_key_num=[$0], activity_id=[$1], plan_id=[$2], 
> fprs_id=[$3], ssn_id=[$4], account_id=[$5], driver_pct=[$6]): rowcount = 
> 2.926396239E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 590
> 
> HiveTableScan(table=[[fidelity.fcap_drivers_part_exp_inter_bucket_256]]): 
> rowcount = 2.926396239E9, cumulative cost = {0}, id = 465
>   HiveJoin(condition=[=($12, $20)], joinType=[inner]): rowcount = 
> 1.0871372980143067E8, cumulative cost = {2.2067125966323376E7 rows, 0.0 cpu, 
> 0.0 io}, id = 688
> HiveJoin(condition=[=($5, $17)], joinType=[inner]): rowcount = 
> 1.4392118216323378E7, cumulative cost = {6880237.75 rows, 0.0 cpu, 0.0 io}, 
> id = 653
>   HiveProject(scale=[$0], time_key_num=[$1], dataset_code=[$2], 
> cost_center_lvl1_id=[$3], cost_pool_lvl6_id=[$4], activity_id=[$5], 
> view_lvl1_id=[$6], from_lvl1_id=[$7], plan_id=[$8], client_id=[$9], 
> lob_id=[$10], product_id=[$11], fprs_id=[$12], mtd_balance=[$14]): rowcount = 
> 6870067.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 587
> 
> HiveTableScan(table=[[fidelity.fcap_agg_prod_exp_nofund_decimal]]): rowcount 
> = 6870067.0, cumulative cost = {0}, id = 464
>   HiveProject(lvl5_id=[$36], rollup_key=[$48], current_ind=[$51], 
> activity_id=[$60]): rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 
> cpu, 0.0 io}, id = 628
> HiveFilter(condition=[AND(=($51, 'Y'), =($48, 'TOTACT'))]): 
> rowcount = 10170.75, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 626
>   HiveTableScan(table=[[fidelity.fobi_activity_dim_mv]]): 
> rowcount = 40683.0, cumulative cost = {0}, id = 467
> HiveProject(lvl5_id=[$36], current_ind=[$51], fprs_id=[$58]): 
> rowcount = 794770.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 622
>   HiveFilter(condition=[=($51, 'Y')]): rowcount = 794770.0, 
> cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 620
> HiveTableScan(table=[[fidelity.fobi_fprs_dim_mv_orc]]): 
> rowcount = 1589540.0, cumulative cost = {0}, id = 466
> {c

[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for <200ms tasks

2015-03-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9976:
--
Description: 
There is a race condition in the DynamicPartitionPruner between 
DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() 
for tasks that respond with both the result and success in a single heartbeat 
sequence.

{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
Vertex Input: store_sales initializer failed, 
vertex=vertex_1424502260528_1113_4_04 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
dynamic parition pruning
{code}

!llap_vertex_200ms.png!

All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, 
which seems to be consistently happening with LLAP.

  was:
There is a race condition in the DynamicPartitionPruner between 
DynamicPartitionPruner::processVertex() and DynamicPartitionPruner::addEvent() 
for tasks that respond with both the result and success in a single heartbeat 
sequence.

{code}
2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
tez.DynamicPartitionPruner: Expecting: 1, received: 0
2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
Vertex Input: store_sales initializer failed, 
vertex=vertex_1424502260528_1113_4_04 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
dynamic parition pruning
{code}

!llap_vertex_200ms.png!

All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger this, 
which seems to be happening with LLAP.


> Possible race condition in DynamicPartitionPruner for <200ms tasks
> --
>
> Key: HIVE-9976
> URL: https://issues.apache.org/jira/browse/HIVE-9976
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Gunther Hagleitner
> Attachments: llap_vertex_200ms.png
>
>
> There is a race condition in the DynamicPartitionPruner between 
> DynamicPartitionPruner::processVertex() and 
> DynamicPartitionPruner::addEvent() for tasks that respond with both the 
> result and success in a single heartbeat sequence.
> {code}
> 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
> tez.DynamicPartitionPruner: Expecting: 1, received: 0
> 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
> Vertex Input: store_sales initializer failed, 
> vertex=vertex_1424502260528_1113_4_04 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
> dynamic parition pruning
> {code}
> !llap_vertex_200ms.png!
> All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger 
> this, which seems to be consistently happening with LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9976) Possible race condition in DynamicPartitionPruner for <200ms tasks

2015-03-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9976:
--
Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7926

> Possible race condition in DynamicPartitionPruner for <200ms tasks
> --
>
> Key: HIVE-9976
> URL: https://issues.apache.org/jira/browse/HIVE-9976
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tez
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Gunther Hagleitner
> Attachments: llap_vertex_200ms.png
>
>
> There is a race condition in the DynamicPartitionPruner between 
> DynamicPartitionPruner::processVertex() and 
> DynamicPartitionPruner::addEvent() for tasks that respond with both the 
> result and success in a single heartbeat sequence.
> {code}
> 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
> tez.DynamicPartitionPruner: Expecting: 1, received: 0
> 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
> Vertex Input: store_sales initializer failed, 
> vertex=vertex_1424502260528_1113_4_04 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
> dynamic parition pruning
> {code}
> !llap_vertex_200ms.png!
> All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger 
> this, which seems to be consistently happening with LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8817) Create unit test where we insert into an encrypted table and then read from it with pig

2015-03-16 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-8817:
---
Attachment: HIVE-8817.patch

[~spena], can you help me review it? Thank you!

> Create unit test where we insert into an encrypted table and then read from 
> it with pig
> ---
>
> Key: HIVE-8817
> URL: https://issues.apache.org/jira/browse/HIVE-8817
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: encryption-branch
>Reporter: Brock Noland
>Assignee: Ferdinand Xu
> Fix For: encryption-branch
>
> Attachments: HIVE-8817.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-03-16 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363355#comment-14363355
 ] 

Aihua Xu commented on HIVE-3454:


Thanks [~jdere]. We need to correct the MathExpr.longToTimestamp() function, 
since we have an inconsistency with the int/long types. Correct?

> Problem with CAST(BIGINT as TIMESTAMP)
> --
>
> Key: HIVE-3454
> URL: https://issues.apache.org/jira/browse/HIVE-3454
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
> 0.13.1
>Reporter: Ryan Harris
>Assignee: Aihua Xu
>  Labels: newbie, newdev, patch
> Attachments: HIVE-3454.1.patch.txt, HIVE-3454.3.patch, HIVE-3454.patch
>
>
> Ran into an issue while working with timestamp conversion.
> CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
> time from the BIGINT returned by unix_timestamp().
> Instead, however, a 1970-01-16 timestamp is returned.
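
The 1970-01-16 result is the classic seconds-versus-milliseconds confusion: 
unix_timestamp() returns seconds since the epoch, while java.sql.Timestamp's 
long constructor expects milliseconds, so a seconds value is read as a moment 
roughly sixteen days after 1970-01-01. A small sketch:

{code}
import java.sql.Timestamp;

public class CastBigintSketch {
    public static void main(String[] args) {
        long unixSeconds = 1347000000L; // a unix_timestamp() value from 2012

        // Misread as milliseconds: ~15.6 days after the epoch -> 1970-01-16.
        System.out.println(new Timestamp(unixSeconds));

        // Scaled to milliseconds first -> the intended 2012 timestamp.
        System.out.println(new Timestamp(unixSeconds * 1000L));
    }
}
{code}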



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9970) Hive on spark

2015-03-16 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363377#comment-14363377
 ] 

Jimmy Xiang commented on HIVE-9970:
---

Is there any exception in your hive.log? It would be great if you could attach 
your log here. If your log shows that the /tmp/spark-events folder is missing, 
you can try creating that folder, or set spark.eventLog.enabled=false, and run 
your CLI/query again.

By the way, you don't need to do this: "add jar 
/opt/spark-1.2.1/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar;".




> Hive on spark
> -
>
> Key: HIVE-9970
> URL: https://issues.apache.org/jira/browse/HIVE-9970
> Project: Hive
>  Issue Type: Bug
>Reporter: Amithsha
>
> Hi all,
> Recently I configured Spark 1.2.0; my environment is Hadoop 2.6.0 and
> Hive 1.1.0. I tried Hive on Spark, and while executing an INSERT INTO
> I got the following error.
> Query ID = hadoop2_20150313162828_8764adad-a8e4-49da-9ef5-35e4ebd6bc63
> Total jobs = 1
> Launching Job 1 out of 1
> In order to change the average load for a reducer (in bytes):
> set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
> set hive.exec.reducers.max=
> In order to set a constant number of reducers:
> set mapreduce.job.reduces=
> Failed to execute spark task, with exception
> 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create
> spark client.)'
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> I have added the spark-assembly jar in the Hive lib directory, and also in 
> the Hive console using the add jar command, followed by these steps:
> set spark.home=/opt/spark-1.2.1/;
> add jar 
> /opt/spark-1.2.1/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar;
> set hive.execution.engine=spark;
> set spark.master=spark://xxx:7077;
> set spark.eventLog.enabled=true;
> set spark.executor.memory=512m;
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
> Can anyone suggest a solution?
> Thanks & Regards
> Amithsha



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9970) Hive on spark

2015-03-16 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363384#comment-14363384
 ] 

Jimmy Xiang commented on HIVE-9970:
---

When spark.eventLog.enabled is set to true, you'd set spark.eventLog.dir to an 
existing folder (default /tmp/spark-events). I added this setting to the Hive 
on Spark Getting Started page.

> Hive on spark
> -
>
> Key: HIVE-9970
> URL: https://issues.apache.org/jira/browse/HIVE-9970
> Project: Hive
>  Issue Type: Bug
>Reporter: Amithsha
>
> Hi all,
> Recently I configured Spark 1.2.0; my environment is Hadoop 2.6.0 and Hive 
> 1.1.0. While executing an insert into with Hive on Spark, I am getting the 
> following error.
> Query ID = hadoop2_20150313162828_8764adad-a8e4-49da-9ef5-35e4ebd6bc63
> Total jobs = 1
> Launching Job 1 out of 1
> In order to change the average load for a reducer (in bytes):
> set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
> set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
> set mapreduce.job.reduces=<number>
> Failed to execute spark task, with exception
> 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create
> spark client.)'
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> I have added the spark-assembly jar in the Hive lib directory,
> and also in the Hive console via the add jar command, followed by these steps:
> set spark.home=/opt/spark-1.2.1/;
> add jar 
> /opt/spark-1.2.1/assembly/target/scala-2.10/spark-assembly-1.2.1-hadoop2.4.0.jar;
> set hive.execution.engine=spark;
> set spark.master=spark://xxx:7077;
> set spark.eventLog.enabled=true;
> set spark.executor.memory=512m;
> set spark.serializer=org.apache.spark.serializer.KryoSerializer;
> Can anyone suggest a fix?
> Thanks & Regards
> Amithsha



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9974) Sensitive data redaction: data appears in name of mapreduce job

2015-03-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363397#comment-14363397
 ] 

Xuefu Zhang commented on HIVE-9974:
---

+1

> Sensitive data redaction: data appears in name of mapreduce job
> ---
>
> Key: HIVE-9974
> URL: https://issues.apache.org/jira/browse/HIVE-9974
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-9974.1.patch
>
>
> Set up a cluster, configured a redaction rule to redact "B0096EZHM2", and ran 
> Hive queries on the cluster.
> Looking at the YARN RM web UI and Job History Server web UI, I see that the 
> mapreduce jobs spawned by the Hive queries have the sensitive data 
> ("B0096EZHM2") showing in the job names:
> e.g., "select product, useri...product='B0096EZHM2'(Stage"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9974) Sensitive data redaction: data appears in name of mapreduce job

2015-03-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363410#comment-14363410
 ] 

Hive QA commented on HIVE-9974:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12704778/HIVE-9974.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7764 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_context_ngrams
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3042/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3042/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3042/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12704778 - PreCommit-HIVE-TRUNK-Build

> Sensitive data redaction: data appears in name of mapreduce job
> ---
>
> Key: HIVE-9974
> URL: https://issues.apache.org/jira/browse/HIVE-9974
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-9974.1.patch
>
>
> Set up a cluster, configured a redaction rule to redact "B0096EZHM2", and ran 
> Hive queries on the cluster.
> Looking at the YARN RM web UI and Job History Server web UI, I see that the 
> mapreduce jobs spawned by the Hive queries have the sensitive data 
> ("B0096EZHM2") showing in the job names:
> e.g., "select product, useri...product='B0096EZHM2'(Stage"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9978) LLAP: OrcColumnVectorProducer should handle reading isPresent columns only

2015-03-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9978:
--
Summary: LLAP: OrcColumnVectorProducer should handle reading isPresent 
columns only  (was: OrcColumnVectorProducer should handle reading isPresent 
columns only)

> LLAP: OrcColumnVectorProducer should handle reading isPresent columns only
> --
>
> Key: HIVE-9978
> URL: https://issues.apache.org/jira/browse/HIVE-9978
> Project: Hive
>  Issue Type: Sub-task
>  Components: File Formats
>Affects Versions: llap
>Reporter: Gopal V
>
> LlapInputFormat does not understand the difference between empty columns list 
> and null columns list.
> The empty columns list indicates no columns read except the root struct 
> isPresent column, while the null columns list indicates that all columns are 
> being read.
> {code}
> select count(1) from store_sales join date_dim on ss_sold_date_sk = d_date_sk 
> where d_date = '1998-01-01';
> ...
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.llap.io.decode.OrcColumnVectorProducer.createReadPipeline(OrcColumnVectorProducer.java:72)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.startRead(LlapInputFormat.java:181)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:99)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9978) LLAP: OrcColumnVectorProducer should handle reading isPresent columns only

2015-03-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9978:
--
Assignee: Sergey Shelukhin

> LLAP: OrcColumnVectorProducer should handle reading isPresent columns only
> --
>
> Key: HIVE-9978
> URL: https://issues.apache.org/jira/browse/HIVE-9978
> Project: Hive
>  Issue Type: Sub-task
>  Components: File Formats
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>
> LlapInputFormat does not understand the difference between empty columns list 
> and null columns list.
> The empty columns list indicates no columns read except the root struct 
> isPresent column, while the null columns list indicates that all columns are 
> being read.
> {code}
> select count(1) from store_sales join date_dim on ss_sold_date_sk = d_date_sk 
> where d_date = '1998-01-01';
> ...
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.llap.io.decode.OrcColumnVectorProducer.createReadPipeline(OrcColumnVectorProducer.java:72)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.startRead(LlapInputFormat.java:181)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:99)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9978) LLAP: OrcColumnVectorProducer should handle reading isPresent columns only

2015-03-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363475#comment-14363475
 ] 

Gopal V commented on HIVE-9978:
---

I see comments about this in code from [~sershe].

{code}
try {
  List<Integer> includedCols = ColumnProjectionUtils.isReadAllColumns(job)
      ? null : ColumnProjectionUtils.getReadColumnIDs(job);
  if (includedCols.isEmpty()) {
    includedCols = null; // Also means read all columns? WTF?
  }
{code}
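
A minimal sketch of keeping the two cases distinct, so "read all columns" and 
"read only the isPresent/root data" stop collapsing into one another (the class 
and flag names are assumptions, not the committed fix):

{code}
import java.util.List;

import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;
import org.apache.hadoop.mapred.JobConf;

public class ColumnListSemantics {
  // null => read all columns; empty list => only the root isPresent column.
  static String interpret(JobConf job) {
    List<Integer> includedCols = ColumnProjectionUtils.isReadAllColumns(job)
        ? null
        : ColumnProjectionUtils.getReadColumnIDs(job);
    if (includedCols == null) {
      return "ALL_COLUMNS";
    }
    // An empty list is a legitimate request (e.g. select count(1) ...),
    // not shorthand for "everything".
    return includedCols.isEmpty() ? "IS_PRESENT_ONLY" : "PROJECTED:" + includedCols;
  }
}
{code}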

> LLAP: OrcColumnVectorProducer should handle reading isPresent columns only
> --
>
> Key: HIVE-9978
> URL: https://issues.apache.org/jira/browse/HIVE-9978
> Project: Hive
>  Issue Type: Sub-task
>  Components: File Formats
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>
> LlapInputFormat does not understand the difference between empty columns list 
> and null columns list.
> The empty columns list indicates no columns read except the root struct 
> isPresent column, while the null columns list indicates that all columns are 
> being read.
> {code}
> select count(1) from store_sales join date_dim on ss_sold_date_sk = d_date_sk 
> where d_date = '1998-01-01';
> ...
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.llap.io.decode.OrcColumnVectorProducer.createReadPipeline(OrcColumnVectorProducer.java:72)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.startRead(LlapInputFormat.java:181)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:140)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapInputFormat$LlapRecordReader.next(LlapInputFormat.java:99)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 22 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9976) LLAP: Possible race condition in DynamicPartitionPruner for <200ms tasks

2015-03-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9976:
--
Summary: LLAP: Possible race condition in DynamicPartitionPruner for <200ms 
tasks  (was: Possible race condition in DynamicPartitionPruner for <200ms tasks)

> LLAP: Possible race condition in DynamicPartitionPruner for <200ms tasks
> 
>
> Key: HIVE-9976
> URL: https://issues.apache.org/jira/browse/HIVE-9976
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tez
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Gunther Hagleitner
> Attachments: llap_vertex_200ms.png
>
>
> Race condition in the DynamicPartitionPruner between 
> DynamicPartitionPruner::processVertex() and 
> DynamicPartitionpruner::addEvent() for tasks which respond with both the 
> result and success in a single heartbeat sequence.
> {code}
> 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
> tez.DynamicPartitionPruner: Expecting: 1, received: 0
> 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
> Vertex Input: store_sales initializer failed, 
> vertex=vertex_1424502260528_1113_4_04 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
> dynamic parition pruning
> {code}
> !llap_vertex_200ms.png!
> All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger 
> this, which seems to be consistently happening with LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9937) LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join

2015-03-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363494#comment-14363494
 ] 

Gopal V commented on HIVE-9937:
---

[~mmccline]: Pretty impressive performance difference: for a shuffle-heavy 
group-by it is almost a ~3x CPU saving.

But there are some off-by-one errors somewhere; the results for a few keys 
seem incorrect in the smaller test runs. Trying to produce a narrower test case.

> LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new 
> Vectorized Map Join
> --
>
> Key: HIVE-9937
> URL: https://issues.apache.org/jira/browse/HIVE-9937
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
> Attachments: HIVE-9937.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9937) LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join

2015-03-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363496#comment-14363496
 ] 

Gopal V commented on HIVE-9937:
---

{code}
Caused by: java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow$StringExtractorByValue.extract(VectorExtractRow.java:427)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:675)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:835)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:135)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:835)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:160)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
... 18 more
{code}

> LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new 
> Vectorized Map Join
> --
>
> Key: HIVE-9937
> URL: https://issues.apache.org/jira/browse/HIVE-9937
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
> Attachments: HIVE-9937.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0

2015-03-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363502#comment-14363502
 ] 

Thejas M Nair commented on HIVE-9957:
-

[~spena] Thanks for the fix! However, there is one improvement that can be 
made. Exceptions are expensive, so rather than throwing them every time, we 
can use the design pattern followed for some other functions in 
Hadoop23Shims:

{code}
  protected static final Method accessMethod;
  protected static final Method getPasswordMethod;

  static {
Method m = null;
try {
  m = FileSystem.class.getMethod("access", Path.class, FsAction.class);
} catch (NoSuchMethodException err) {
  // This version of Hadoop does not support FileSystem.access().
}
accessMethod = m;

try {
  m = Configuration.class.getMethod("getPassword", String.class);
} catch (NoSuchMethodException err) {
  // This version of Hadoop does not support getPassword(); just retrieve
  // the password from conf.
  m = null;
}
getPasswordMethod = m;
  }
{code}
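
For example, the cached Method can then be consumed on the hot path with a 
cheap null check instead of a thrown exception (a rough sketch of the consumer 
side, building on the fields above; not part of the patch):

{code}
private static String getPassword(Configuration conf, String name) throws Exception {
  if (getPasswordMethod == null) {
    // Old Hadoop without Configuration.getPassword(): plain conf lookup.
    return conf.get(name);
  }
  char[] pw = (char[]) getPasswordMethod.invoke(conf, name);
  return (pw == null) ? null : new String(pw);
}
{code}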

Would you be able to make that improvement? We can use a new JIRA for that.


> Hive 1.1.0 not compatible with Hadoop 2.4.0
> ---
>
> Key: HIVE-9957
> URL: https://issues.apache.org/jira/browse/HIVE-9957
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Vivek Shrivastava
>Assignee: Sergio Peña
> Fix For: 1.2.0
>
> Attachments: HIVE-9957.1.patch
>
>
> Getting this exception while accessing data through Hive. 
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider;
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.(Hadoop23Shims.java:1152)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9950) fix rehash in CuckooSetBytes and CuckooSetLong

2015-03-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363507#comment-14363507
 ] 

Thejas M Nair commented on HIVE-9950:
-

[~mmccline] Can you please review this fix ?


> fix rehash in CuckooSetBytes and CuckooSetLong
> --
>
> Key: HIVE-9950
> URL: https://issues.apache.org/jira/browse/HIVE-9950
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Minor
> Attachments: HIVE-9950.1.patch
>
>
> both classes have the following
> {code}
> if (prev1 == null) {
>   prev1 = t1;
>   prev1 = t2;
> }
> {code}
> most probably it should be
> {code}
> if (prev1 == null) {
>   prev1 = t1;
>   prev2 = t2;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0

2015-03-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363503#comment-14363503
 ] 

Thejas M Nair commented on HIVE-9957:
-

[~leftylev] Yes, I think it would be useful to document it under those two 
sections. [~spena] Do you know the Hadoop version that introduced this API? 
(We should list that it's broken for all 2.x versions earlier than that.)

> Hive 1.1.0 not compatible with Hadoop 2.4.0
> ---
>
> Key: HIVE-9957
> URL: https://issues.apache.org/jira/browse/HIVE-9957
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Vivek Shrivastava
>Assignee: Sergio Peña
> Fix For: 1.2.0
>
> Attachments: HIVE-9957.1.patch
>
>
> Getting this exception while accessing data through Hive. 
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider;
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.(Hadoop23Shims.java:1152)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9971) Clean up operator class

2015-03-16 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-9971:
-
Attachment: HIVE-9971.2.patch

> Clean up operator class
> ---
>
> Key: HIVE-9971
> URL: https://issues.apache.org/jira/browse/HIVE-9971
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-9971.1.patch, HIVE-9971.2.patch
>
>
> This is mostly cleanup, although it does enhance the pipeline in one 
> respect: it introduces async init for operators and uses it for hash table 
> loading where desired.
> There's a bunch of weird code associated with the operator class:
> - initialize isn't recursive, rather initializeOp is supposed to call 
> initializeChildren. That has led to bugs in the past.
> - setExecContext and passExecContext. Both are recursive, but passExecContext 
> calls setExecContext and then recurses again. Boo.
> - lots of (getChildren() != null) although that can't happen anymore
> - TezCacheAccess is a hack. We should just leave init of inputs up to the 
> operator that needs it.
> - Need some sanity checks that make sure that operators were all initialized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9960) Hive not backward compatible while adding optional new field to struct in parquet files

2015-03-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-9960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-9960:
--
Attachment: HIVE-9960.1.patch

> Hive not backward compatible while adding optional new field to struct in 
> parquet files
> -
>
> Key: HIVE-9960
> URL: https://issues.apache.org/jira/browse/HIVE-9960
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Arup Malakar
>Assignee: Sergio Peña
> Attachments: HIVE-9960.1.patch
>
>
> I recently added an optional field to a struct. When I tried to query old 
> data with the new Hive table, which has the new field as a column, it throws 
> an error. Any clue how I can make it backward compatible so that I am still 
> able to query old data with the new table definition?
>  
> I am using the hive-0.14.0 release with the HIVE-8909 patch applied.
> Details:
> New optional field in a struct
> {code}
> struct Event {
> 1: optional Type type;
> 2: optional map values;
> 3: optional i32 num = -1; // <--- New field
> }
> {code}
> Main thrift definition
> {code}
>  10: optional list<Event> events;
> {code}
> Corresponding hive table definition
> {code}
>   events array< struct , num: int>>)
> {code}
> Try to read something from the old data, using the new table definition
> {{select events from table1 limit 1;}}
> Failed with exception:
> {code}
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> Error thrown:
> 15/03/12 17:23:43 [main]: ERROR CliDriver: Failed with exception 
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}

[jira] [Commented] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data

2015-03-16 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363528#comment-14363528
 ] 

Gopal V commented on HIVE-9979:
---

Even the String dictionary readers are triggering exceptions:

{code}
2015-03-16 10:20:51,439 
[pool-2-thread-3(container_1_1141_01_000192_gopal_20150316102020_c8c92488-6a61-401e-8298-401dace286dc:1_Map
 1_191_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Getting 
data for column 9 RG 112 stream DATA at 62278935, 1057137 index position 0: 
compressed [62614934, 63139228)
2015-03-16 10:20:51,439 
[pool-2-thread-6(container_1_1141_01_000211_gopal_20150316102020_c8c92488-6a61-401e-8298-401dace286dc:1_Map
 1_210_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Getting 
stripe-level stream [LENGTH, kind: DICTIONARY_V2
dictionarySize: 3
] for column 9 RG 91 at 64139927, 5
...
Caused by: java.io.EOFException
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDirect(RecordReaderUtils.java:286)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDiskRanges(RecordReaderUtils.java:266)
at 
org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:234)
at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
at 
org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
... 4 more
{code}

> LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
> 
>
> Key: HIVE-9979
> URL: https://issues.apache.org/jira/browse/HIVE-9979
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>
> When the cache is enabled, queries throw different over-read exceptions.
> It looks like the batchSize changes as you read data; the end-of-stripe 
> batchSize is smaller than the default size (the super calls change it).
> {code}
> Caused by: java.io.EOFException: Can't finish byte read from uncompressed 
> stream DATA position: 262144 length: 262144 range: 0 offset: 46399488 limit: 
> 46399488
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1556)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517)
> at 
> org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35)
> at 
> org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
> at 
> org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
> ... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9960) Hive not backward compatible while adding optional new field to struct in parquet files

2015-03-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-9960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña reassigned HIVE-9960:
-

Assignee: Sergio Peña

> Hive not backward compatible while adding optional new field to struct in 
> parquet files
> -
>
> Key: HIVE-9960
> URL: https://issues.apache.org/jira/browse/HIVE-9960
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Arup Malakar
>Assignee: Sergio Peña
> Attachments: HIVE-9960.1.patch
>
>
> I recently added an optional field to a struct. When I tried to query old 
> data with the new Hive table, which has the new field as a column, it throws 
> an error. Any clue how I can make it backward compatible so that I am still 
> able to query old data with the new table definition?
>  
> I am using the hive-0.14.0 release with the HIVE-8909 patch applied.
> Details:
> New optional field in a struct
> {code}
> struct Event {
> 1: optional Type type;
> 2: optional map values;
> 3: optional i32 num = -1; // <--- New field
> }
> {code}
> Main thrift definition
> {code}
>  10: optional list<Event> events;
> {code}
> Corresponding hive table definition
> {code}
>   events array< struct , num: int>>)
> {code}
> Try to read something from the old data, using the new table definition
> {{select events from table1 limit 1;}}
> Failed with exception:
> {code}
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> Error thrown:
> 15/03/12 17:23:43 [main]: ERROR CliDriver: Failed with exception 
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
>   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
>   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}

[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0

2015-03-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363532#comment-14363532
 ] 

Sergio Peña commented on HIVE-9957:
---

Yes, I will use this approach to detect the method incompatibility. I see that 
by making it static, the try/catch will be called only once.
Thanks [~thejas]

[~leftylev] The Hadoop version that introduced encryption is 2.6.0.

> Hive 1.1.0 not compatible with Hadoop 2.4.0
> ---
>
> Key: HIVE-9957
> URL: https://issues.apache.org/jira/browse/HIVE-9957
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Vivek Shrivastava
>Assignee: Sergio Peña
> Fix For: 1.2.0
>
> Attachments: HIVE-9957.1.patch
>
>
> Getting this exception while accessing data through Hive. 
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider;
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.(Hadoop23Shims.java:1152)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9953) fix NPE in WindowingTableFunction

2015-03-16 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363575#comment-14363575
 ] 

Jason Dere commented on HIVE-9953:
--

Just curious, is this Jira (and the other NPE fixes) the work for HIVE-9809, 
broken down into separate bugs?

> fix NPE in WindowingTableFunction
> -
>
> Key: HIVE-9953
> URL: https://issues.apache.org/jira/browse/HIVE-9953
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Attachments: HIVE-9953.1.patch
>
>
> WindowingTableFunction line 1193
> {code}
> // now
> return (s1 == null && s2 == null) || s1.equals(s2);
> // should be
> return (s1 == null && s2 == null) || (s1 != null && s1.equals(s2));
> {code}
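
On Java 7+, the same null-safe comparison can also be written with a JDK 
helper (an equivalent alternative, not necessarily the committed patch):

{code}
import java.util.Objects;

public class NullSafeEquals {
  static boolean sameValue(Object s1, Object s2) {
    // Covers s1 == null, s2 == null, and both-null in a single call.
    return Objects.equals(s1, s2);
  }
}
{code}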



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9857) Create Factorial UDF

2015-03-16 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363585#comment-14363585
 ] 

Alexander Pivovarov commented on HIVE-9857:
---

The function description was added to the Hive wiki: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-MathematicalFunctions

> Create Factorial UDF
> 
>
> Key: HIVE-9857
> URL: https://issues.apache.org/jira/browse/HIVE-9857
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Fix For: 1.2.0
>
> Attachments: HIVE-9857.1.patch, HIVE-9857.2.patch, HIVE-9857.3.patch
>
>
> Function signature: factorial(int a): bigint
> For example, 5! = 5*4*3*2*1 = 120
> {code}
> select factorial(5);
> OK
> 120
> {code}
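
A rough Java sketch of the math the UDF wraps (the range handling is an 
assumption; 20! is the largest factorial that fits in a signed 64-bit bigint):

{code}
public class FactorialSketch {
  public static long factorial(int n) {
    if (n < 0 || n > 20) {
      // 21! overflows a long; the real UDF's out-of-range behavior may differ.
      throw new IllegalArgumentException("factorial(n) supported here for 0 <= n <= 20");
    }
    long result = 1L;
    for (int i = 2; i <= n; i++) {
      result *= i;
    }
    return result; // factorial(5) == 120
  }
}
{code}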



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9949) remove not used parameters from String.format

2015-03-16 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363593#comment-14363593
 ] 

Gunther Hagleitner commented on HIVE-9949:
--

+1

> remove not used parameters from String.format
> -
>
> Key: HIVE-9949
> URL: https://issues.apache.org/jira/browse/HIVE-9949
> Project: Hive
>  Issue Type: Bug
>  Components: Spark, Tez
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Attachments: HIVE-9949.1.patch
>
>
> SparkJobMonitor (79) and TezJobMonitor (788) call
> {code}
> String.format("%s: -/-\t", stageName, complete, total)
> {code}
> complete and total can be removed because the pattern uses only the first 
> parameter, stageName
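
The fix is presumably just to drop the dead arguments (a sketch):

{code}
// Only stageName is consumed by the "%s" pattern; complete and total were dead.
String.format("%s: -/-\t", stageName);
{code}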



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-03-16 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen reassigned HIVE-7018:
--

Assignee: Yongzhi Chen

> Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
> not others
> -
>
> Key: HIVE-7018
> URL: https://issues.apache.org/jira/browse/HIVE-7018
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Yongzhi Chen
>
> It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
> column while mysql does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-03-16 Thread Yongzhi Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363600#comment-14363600
 ] 

Yongzhi Chen commented on HIVE-7018:


The LINK_TARGET_ID column is not used; it should be removed. 

> Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
> not others
> -
>
> Key: HIVE-7018
> URL: https://issues.apache.org/jira/browse/HIVE-7018
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Yongzhi Chen
>
> It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
> column while mysql does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9858) Create cbrt (cube root) UDF

2015-03-16 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363611#comment-14363611
 ] 

Alexander Pivovarov commented on HIVE-9858:
---

The function description was added to the Hive wiki: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-MathematicalFunctions

> Create cbrt (cube root) UDF
> ---
>
> Key: HIVE-9858
> URL: https://issues.apache.org/jira/browse/HIVE-9858
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
> Fix For: 1.2.0
>
> Attachments: HIVE-9858.1.patch, HIVE-9858.1.patch, HIVE-9858.2.patch
>
>
> returns the cube root of a double value
> cbrt(double a) : double
> For example:
> {code}
> select cbrt(87860583272930481.0);
> OK
> 444561.0
> {code}
> I noticed that Math.pow(a, 1.0/3.0) and the Hive power UDF return 
> 444560.965 for the example above.
> However, Math.cbrt returns 444561.0.
> This is why we should have a cbrt function in Hive.
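
The precision gap is easy to reproduce in plain Java (a quick sketch; pow's 
exact output may vary in the last digits):

{code}
public class CbrtVsPow {
  public static void main(String[] args) {
    double a = 87860583272930481.0;
    // pow goes through a generic exponent path and can land just below the
    // exact root, which truncates badly if later cast to an integer type.
    System.out.println(Math.pow(a, 1.0 / 3.0));
    // cbrt is a dedicated cube-root routine and prints 444561.0 here.
    System.out.println(Math.cbrt(a));
  }
}
{code}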



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9953) fix NPE in WindowingTableFunction

2015-03-16 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363620#comment-14363620
 ] 

Alexander Pivovarov commented on HIVE-9953:
---

Yes. This is the mail thread on it: 
http://mail-archives.apache.org/mod_mbox/hive-dev/201503.mbox/%3CCADx-ob3tLwG91Qh5YS4P83-=0ESg0E=ttppjtver4jdwqhe...@mail.gmail.com%3E

> fix NPE in WindowingTableFunction
> -
>
> Key: HIVE-9953
> URL: https://issues.apache.org/jira/browse/HIVE-9953
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Fix For: 1.2.0
>
> Attachments: HIVE-9953.1.patch
>
>
> WindowingTableFunction line 1193
> {code}
> // now
> return (s1 == null && s2 == null) || s1.equals(s2);
> // should be
> return (s1 == null && s2 == null) || (s1 != null && s1.equals(s2));
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9947) ScriptOperator replaceAll uses unescaped dot and result is not assigned

2015-03-16 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363633#comment-14363633
 ] 

Gunther Hagleitner commented on HIVE-9947:
--

This is weird. Looking at the code, the blacklisting happens before we strip 
out the "unsafe" chars. So it seems we don't need to call replaceAll at all; 
doing so would break the blacklisting (which is why you see the newlines in 
the output?)

> ScriptOperator replaceAll uses unescaped dot and result is not assigned
> ---
>
> Key: HIVE-9947
> URL: https://issues.apache.org/jira/browse/HIVE-9947
> Project: Hive
>  Issue Type: Bug
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Minor
> Attachments: HIVE-9947.1.patch, HIVE-9947.2.patch
>
>
> ScriptOperator line 155
> {code}
> //now
> b.replaceAll(".", "_");
> // should be
> b = b.replace('.', '_');
> {code}
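
A tiny standalone demonstration of why the original line is doubly wrong: the 
"." is a regex wildcard in replaceAll, and Strings are immutable, so the 
unassigned result is silently discarded:

{code}
public class ReplaceAllDemo {
  public static void main(String[] args) {
    String b = "a.b";
    // Regex dot matches every character; the computed "___" is then dropped
    // because the result is never assigned.
    b.replaceAll(".", "_");
    // The literal, assigned form does what was intended.
    String fixed = b.replace('.', '_');
    System.out.println(b + " -> " + fixed); // prints: a.b -> a_b
  }
}
{code}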



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-03-16 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363650#comment-14363650
 ] 

Jason Dere commented on HIVE-3454:
--

Whoops, yes it is longToTimestamp() that should be fixed, since the point of 
this is to correct the int/long to timestamp behavior.
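
A minimal sketch of that direction, assuming the agreed semantics treat an 
int/long value as seconds since the epoch (the actual patch may differ):

{code}
import java.sql.Timestamp;

public class LongToTimestampSketch {
  // unix_timestamp() yields seconds, but java.sql.Timestamp's constructor
  // takes milliseconds; reading the value as millis is what collapses
  // ~1.4e9 seconds into mid-January 1970.
  public static Timestamp longToTimestamp(long seconds) {
    return new Timestamp(seconds * 1000L);
  }
}
{code}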

> Problem with CAST(BIGINT as TIMESTAMP)
> --
>
> Key: HIVE-3454
> URL: https://issues.apache.org/jira/browse/HIVE-3454
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
> 0.13.1
>Reporter: Ryan Harris
>Assignee: Aihua Xu
>  Labels: newbie, newdev, patch
> Attachments: HIVE-3454.1.patch.txt, HIVE-3454.3.patch, HIVE-3454.patch
>
>
> Ran into an issue while working with timestamp conversion.
> CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
> time from the BIGINT returned by unix_timestamp()
> Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-03-16 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-7018:
---
Attachment: HIVE-7018.1.patch

> Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
> not others
> -
>
> Key: HIVE-7018
> URL: https://issues.apache.org/jira/browse/HIVE-7018
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Yongzhi Chen
> Attachments: HIVE-7018.1.patch
>
>
> It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
> column while mysql does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data

2015-03-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9979:
---
Assignee: Prasanth Jayachandran  (was: Sergey Shelukhin)

> LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
> 
>
> Key: HIVE-9979
> URL: https://issues.apache.org/jira/browse/HIVE-9979
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
>
> When the cache is enabled, queries throw different over-read exceptions.
> It looks like the batchSize changes as you read data; the end-of-stripe 
> batchSize is smaller than the default size (the super calls change it).
> {code}
> Caused by: java.io.EOFException: Can't finish byte read from uncompressed 
> stream DATA position: 262144 length: 262144 range: 0 offset: 46399488 limit: 
> 46399488
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1556)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517)
> at 
> org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35)
> at 
> org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
> at 
> org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
> ... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9947) ScriptOperator replaceAll uses unescaped dot and result is not assigned

2015-03-16 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9947:
--
Attachment: HIVE-9947.3.patch

Removed the replaceAll line.

> ScriptOperator replaceAll uses unescaped dot and result is not assigned
> ---
>
> Key: HIVE-9947
> URL: https://issues.apache.org/jira/browse/HIVE-9947
> Project: Hive
>  Issue Type: Bug
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Minor
> Attachments: HIVE-9947.1.patch, HIVE-9947.2.patch, HIVE-9947.3.patch
>
>
> ScriptOperator line 155
> {code}
> //now
> b.replaceAll(".", "_");
> // should be
> b = b.replace('.', '_');
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9730) make sure logging is never called when not needed in perf-sensitive places

2015-03-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363710#comment-14363710
 ] 

Sergey Shelukhin commented on HIVE-9730:


[~gopalv] ping?

> make sure logging is never called when not needed in perf-sensitive places
> --
>
> Key: HIVE-9730
> URL: https://issues.apache.org/jira/browse/HIVE-9730
> Project: Hive
>  Issue Type: Improvement
>  Components: Logging
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.2.0
>
> Attachments: HIVE-9730.patch, log4j-llap.png
>
>
> log4j logging has really inefficient serialization
> !log4j-llap.png!
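
The usual guard pattern the summary points at (illustrative only; the actual 
patch touches many call sites):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class GuardedLogging {
  private static final Log LOG = LogFactory.getLog(GuardedLogging.class);

  static void report(long rowCount, String path) {
    // Without the guard, the argument string is concatenated even when DEBUG
    // is off, which is exactly the serialization cost profiled above.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Processed " + rowCount + " rows for " + path);
    }
  }
}
{code}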



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9870) Add JvmPauseMonitor threads to HMS and HS2 daemons

2015-03-16 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363733#comment-14363733
 ] 

Harsh J commented on HIVE-9870:
---

[~vgumashta] - Just checking to see if any other changes are required or if 
this can be committed?

> Add JvmPauseMonitor threads to HMS and HS2 daemons
> --
>
> Key: HIVE-9870
> URL: https://issues.apache.org/jira/browse/HIVE-9870
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Affects Versions: 1.1.0
>Reporter: Harsh J
>Assignee: Harsh J
>Priority: Minor
> Attachments: HIVE-9870.patch, HIVE-9870.patch, HIVE-9870.patch
>
>
> hadoop-common carries a nifty thread that logs GC or non-GC pauses within 
> the JVM if they exceed a specific threshold.
> This has been immeasurably useful in supporting several clusters, in 
> identifying GC or other forms of process pauses as the root cause of some 
> event being investigated.
> The HMS and HS2 daemons are good targets for running similar threads. The 
> monitor can be loaded in an if-available style.
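
A hedged sketch of the "if-available" idea: the class name is Hadoop's real 
org.apache.hadoop.util.JvmPauseMonitor, but the reflective wiring and the 
constructor signature here are assumptions:

{code}
import org.apache.hadoop.conf.Configuration;

public class PauseMonitorLoader {
  static void startIfAvailable(Configuration conf) {
    try {
      Class<?> clazz = Class.forName("org.apache.hadoop.util.JvmPauseMonitor");
      Object monitor = clazz.getConstructor(Configuration.class).newInstance(conf);
      clazz.getMethod("start").invoke(monitor);
    } catch (ClassNotFoundException e) {
      // hadoop-common predates JvmPauseMonitor: run without pause logging.
    } catch (ReflectiveOperationException e) {
      // Constructor/method shape differs in this Hadoop version: degrade gracefully.
    }
  }
}
{code}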



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-03-16 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363764#comment-14363764
 ] 

Aihua Xu commented on HIVE-7018:


+1. Looks good to me. The code is not used and causes confusion. 

> Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
> not others
> -
>
> Key: HIVE-7018
> URL: https://issues.apache.org/jira/browse/HIVE-7018
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Yongzhi Chen
> Attachments: HIVE-7018.1.patch
>
>
> It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
> column while mysql does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8789) UT: fix udf_context_aware

2015-03-16 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363779#comment-14363779
 ] 

Aihua Xu commented on HIVE-8789:


It seems the query we are executing does not start an MR/Spark job, so the 
MapredContext won't be set. 

> UT: fix udf_context_aware 
> --
>
> Key: HIVE-8789
> URL: https://issues.apache.org/jira/browse/HIVE-8789
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: spark-branch
>Reporter: Thomas Friedrich
>Assignee: Aihua Xu
>Priority: Minor
>
> The test udf_context_aware fails with a NPE here:
> Thread [main] (Suspended (exception NullPointerException))
>   DummyContextUDF.evaluate(GenericUDF$DeferredObject[]) line: 42  
>   ExprNodeGenericFuncEvaluator._evaluate(Object, int) line: 169   
>   ExprNodeGenericFuncEvaluator(ExprNodeEvaluator).evaluate(Object, 
> int) line: 77   
>   ExprNodeGenericFuncEvaluator(ExprNodeEvaluator).evaluate(Object) 
> line: 65
>   SelectOperator.processOp(Object, int) line: 77  
>   TableScanOperator(Operator).forward(Object, ObjectInspector) line: 
> 815   
>   TableScanOperator.processOp(Object, int) line: 95   
>   FetchOperator.pushRow(InspectableObject) line: 577  
>   FetchOperator.pushRow() line: 569   
>   FetchTask.fetch(List) line: 138 
>   Driver.getResults(List) line: 1661  
>   CliDriver.processLocalCmd(String, CommandProcessor, CliSessionState) 
> line: 267  
>   CliDriver.processCmd(String) line: 199  
>   CliDriver.processLine(String, boolean) line: 410
>   CliDriver.processLine(String) line: 345 
>   QTestUtil.executeClient(String) line: 832   
>   TestSparkCliDriver.runTest(String, String, String) line: 136
>   TestSparkCliDriver.testCliDriver_udf_context_aware() line: 120  
>   NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not 
> available [native method]  
>   NativeMethodAccessorImpl.invoke(Object, Object[]) line: 57  
>   DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43  
>   Method.invoke(Object, Object...) line: 606  
>   TestSparkCliDriver(TestCase).runTest() line: 176
>   TestSparkCliDriver(TestCase).runBare() line: 141
>   TestResult$1.protect() line: 122
>   TestResult.runProtected(Test, Protectable) line: 142
>   TestResult.run(TestCase) line: 125  
>   TestSparkCliDriver(TestCase).run(TestResult) line: 129  
>   TestSuite.runTest(Test, TestResult) line: 255   
>   TestSuite.run(TestResult) line: 250 
>   SuiteMethod(JUnit38ClassRunner).run(RunNotifier) line: 84   
>   JUnit4Provider.execute(Class, RunNotifier, String[]) line: 264   
>   JUnit4Provider.executeTestSet(Class, RunListener, RunNotifier) line: 
> 153 
>   JUnit4Provider.invoke(Object) line: 124 
>   ForkedBooter.invokeProviderInSameClassLoader(Object, Object, 
> ProviderConfiguration, boolean, StartupConfiguration, boolean) line: 200   
>   ForkedBooter.runSuitesInProcess(Object, StartupConfiguration, 
> ProviderConfiguration, PrintStream) line: 153 
>   ForkedBooter.main(String[]) line: 103   
> While debugging I found that the MapredContext object is null here:
> Reporter reporter = context.getReporter();



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9976) LLAP: Possible race condition in DynamicPartitionPruner for <200ms tasks

2015-03-16 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363839#comment-14363839
 ] 

Gunther Hagleitner commented on HIVE-9976:
--

[~sseth] I thought this couldn't happen. The Tez API was supposed to guarantee 
delivery of events before completion. Should I open a Tez issue?
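
To make the suspected interleaving concrete, a minimal sketch (an illustration of the ordering problem only, not the actual DynamicPartitionPruner code):

{code}
// Illustration only (assumed shape): two callbacks share a counter, and the
// failure is about ordering, not mutual exclusion. If onVertexCompleted()
// runs before onEvent() for the same heartbeat, the count check trips with
// "Expecting: 1, received: 0".
class PrunerRaceSketch {
  private final int expectedEvents = 1;
  private int receivedEvents = 0;

  synchronized void onEvent() {            // event-delivery path
    receivedEvents++;
  }

  synchronized void onVertexCompleted() {  // completion path
    if (receivedEvents != expectedEvents) {
      throw new IllegalStateException(
          "Expecting: " + expectedEvents + ", received: " + receivedEvents);
    }
  }
}
{code}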

> LLAP: Possible race condition in DynamicPartitionPruner for <200ms tasks
> 
>
> Key: HIVE-9976
> URL: https://issues.apache.org/jira/browse/HIVE-9976
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tez
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Gunther Hagleitner
> Attachments: llap_vertex_200ms.png
>
>
> Race condition in the DynamicPartitionPruner between 
> DynamicPartitionPruner::processVertex() and 
> DynamicPartitionPruner::addEvent() for tasks which respond with both the 
> result and success in a single heartbeat sequence.
> {code}
> 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
> tez.DynamicPartitionPruner: Expecting: 1, received: 0
> 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
> Vertex Input: store_sales initializer failed, 
> vertex=vertex_1424502260528_1113_4_04 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
> dynamic parition pruning
> {code}
> !llap_vertex_200ms.png!
> All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger 
> this, which seems to be consistently happening with LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9971) Clean up operator class

2015-03-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363857#comment-14363857
 ] 

Hive QA commented on HIVE-9971:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12704823/HIVE-9971.2.patch

{color:red}ERROR:{color} -1 due to 60 failed/errored test(s), 7754 tests 
executed
*Failed tests:*
{noformat}
TestSparkCliDriver-parallel_join1.q-ptf_general_queries.q-avro_joins.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_bucket
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_mapjoin1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_data_types
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_10_0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_4
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_precision
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_trailing
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_udf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_non_string_partition
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partitioned_date_time
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_reduce_groupby_decimal
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_varchar_mapjoin1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_varchar_simple
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_12
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_14
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_15
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_div0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_limit
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_part_project
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_date_funcs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_rcfile_columnar
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_timestamp_funcs
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_data_types
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_orderby_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_string_concat
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_12
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_14
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_9
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_div0
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_part
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vectorization_part_project
org.apache.hadoop.hiv

[jira] [Commented] (HIVE-9976) LLAP: Possible race condition in DynamicPartitionPruner for <200ms tasks

2015-03-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363909#comment-14363909
 ] 

Siddharth Seth commented on HIVE-9976:
--

I'll take a look. Assuming this was run with a Tez 0.7 snapshot?

> LLAP: Possible race condition in DynamicPartitionPruner for <200ms tasks
> 
>
> Key: HIVE-9976
> URL: https://issues.apache.org/jira/browse/HIVE-9976
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tez
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Gunther Hagleitner
> Attachments: llap_vertex_200ms.png
>
>
> Race condition in the DynamicPartitionPruner between 
> DynamicPartitionPruner::processVertex() and 
> DynamicPartitionPruner::addEvent() for tasks which respond with both the 
> result and success in a single heartbeat sequence.
> {code}
> 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
> tez.DynamicPartitionPruner: Expecting: 1, received: 0
> 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
> Vertex Input: store_sales initializer failed, 
> vertex=vertex_1424502260528_1113_4_04 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
> dynamic parition pruning
> {code}
> !llap_vertex_200ms.png!
> All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger 
> this, which seems to be consistently happening with LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9756) LLAP: use log4j 2 for llap

2015-03-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363919#comment-14363919
 ] 

Siddharth Seth commented on HIVE-9756:
--

[~gopalv] - Tez is moving to slf4j in the 0.7 release (TEZ-2176). 
Unfortunately, Hadoop provides log4j as well, so this may be problematic 
anyway. We'll find out once the Tez patch goes in. 

> LLAP: use log4j 2 for llap
> --
>
> Key: HIVE-9756
> URL: https://issues.apache.org/jira/browse/HIVE-9756
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gunther Hagleitner
>Assignee: Gopal V
>
> For the INFO logging, we'll need to use the log4j-jcl 2.x upgrade path to get 
> throughput-friendly logging.
> http://logging.apache.org/log4j/2.0/manual/async.html#Performance
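
As background on the linked async-logging mode, a minimal sketch (an illustration, not Hive or Tez code; it assumes log4j-api, log4j-core, and the LMAX Disruptor jars are on the classpath):

{code}
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class AsyncLoggingSketch {
  public static void main(String[] args) {
    // Must be set before the first Logger is created; it switches every
    // logger to the asynchronous (Disruptor-backed) implementation.
    System.setProperty("Log4jContextSelector",
        "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");
    Logger log = LogManager.getLogger(AsyncLoggingSketch.class);
    log.info("async logging enabled");
  }
}
{code}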



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to "none", allowing authentication without password

2015-03-16 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-9934:
---
Attachment: HIVE-9934.2.patch

(cc [~prasadm] [~xuefuz]). I was able to reproduce the issue after disabling 
JDBC authentication and using the Hadoop-provided {{SaslPlainServerFactory}}. I 
needed to do the latter because the Hive-provided SASL server implementation 
checks for an empty password, which prevents the issue. However, if the Hadoop 
version of the class gets loaded first (it does not check whether the password 
is null or empty), then the issue can still happen.

In this patch I also included a simple unit test. Ideally we would write an 
end-to-end test, but that involves non-trivial work. I'll put that in a 
follow-up JIRA.
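
For illustration, a minimal sketch of the usual guard (an assumed shape, not the actual patch): fail fast on empty credentials so JNDI never downgrades the simple bind to the "none" mechanism.

{code}
import java.util.Hashtable;
import javax.naming.AuthenticationException;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.InitialDirContext;

public class LdapBindSketch {
  private final String ldapUrl;  // e.g. "ldap://ldap.example.com:389" (assumed)

  public LdapBindSketch(String ldapUrl) {
    this.ldapUrl = ldapUrl;
  }

  public void authenticate(String user, String password) throws NamingException {
    // Without this check, an empty password makes JNDI convert the bind to
    // anonymous ("none") and the server returns an unauthenticated success.
    if (password == null || password.isEmpty()) {
      throw new AuthenticationException("Empty passwords are not allowed");
    }
    Hashtable<String, Object> env = new Hashtable<>();
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
    env.put(Context.PROVIDER_URL, ldapUrl);
    env.put(Context.SECURITY_AUTHENTICATION, "simple");
    env.put(Context.SECURITY_PRINCIPAL, user);
    env.put(Context.SECURITY_CREDENTIALS, password);
    // A successful construction means the bind (and thus the password) was
    // accepted; a wrong password raises AuthenticationException.
    new InitialDirContext(env).close();
  }
}
{code}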

> Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to 
> degrade the authentication mechanism to "none", allowing authentication 
> without password
> --
>
> Key: HIVE-9934
> URL: https://issues.apache.org/jira/browse/HIVE-9934
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.1.0
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch
>
>
> Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to 
> degrade the authentication mechanism to "none", allowing authentication 
> without password.
> See: http://docs.oracle.com/javase/jndi/tutorial/ldap/security/simple.html
> “If you supply an empty string, an empty byte/char array, or null to the 
> Context.SECURITY_CREDENTIALS environment property, then the authentication 
> mechanism will be "none". This is because the LDAP requires the password to 
> be nonempty for simple authentication. The protocol automatically converts 
> the authentication to "none" if a password is not supplied.”
>  
> Since the LdapAuthenticationProviderImpl.Authenticate method is relying on a 
> NamingException being thrown during creation of initial context, it does not 
> fail when the context result is an “unauthenticated” positive response from 
> the LDAP server. The end result is, one can authenticate with HiveServer2 
> using the LdapAuthenticationProviderImpl with only a user name and an empty 
> password.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9960) Hive not backward compatibilie while adding optional new field to struct in parquet files

2015-03-16 Thread Arup Malakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363971#comment-14363971
 ] 

Arup Malakar commented on HIVE-9960:


[~spena] I have been using Hive 0.14 with HIVE-8909 applied on it. This patch 
doesn't apply cleanly on top of that, as it seems to be meant for trunk. On the 
other hand, I am unable to compile trunk against Hadoop 2.3.0, as it complains 
about the KeyProvider package, among other things. I would need to port this 
patch to 0.14, with HIVE-8909, to be able to test it.

> Hive not backward compatibilie while adding optional new field to struct in 
> parquet files
> -
>
> Key: HIVE-9960
> URL: https://issues.apache.org/jira/browse/HIVE-9960
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Arup Malakar
>Assignee: Sergio Peña
> Attachments: HIVE-9960.1.patch
>
>
> I recently added an optional field to a struct. When I tried to query old 
> data with the new Hive table, which has the new field as a column, it threw 
> an error. Any clue how I can make it backward compatible so that I am still 
> able to query old data with the new table definition?
>  
> I am using the hive-0.14.0 release with the HIVE-8909 patch applied.
> Details:
> New optional field in a struct
> {code}
> struct Event {
> 1: optional Type type;
> 2: optional map values;
> 3: optional i32 num = -1; // <--- New field
> }
> {code}
> Main thrift definition
> {code}
>  10: optional list events;
> {code}
> Corresponding hive table definition
> {code}
>   events array< struct , num: int>>)
> {code}
> Try to read something from the old data, using the new table definition
> {{select events from table1 limit 1;}}
> Failed with exception:
> {code}
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> Error thrown:
> 15/03/12 17:23:43 [main]: ERROR CliDriver: Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

[jira] [Updated] (HIVE-9981) Avoid throwing many exceptions when attempting to create new hdfs encryption shim

2015-03-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-9981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-9981:
--
Attachment: HIVE-9981.1.patch

Attached is the patch for this fix.

I had to check for the getEncryptionZoneForPath method from the 
isHdfsEncryptionSupported method instead of using a static {} block because I 
was getting compilation errors that the HdfsAdmin class was not found. I don't 
know why, but I need the class only for the encryption-support check anyway.

> Avoid throwing many exceptions when attempting to create new hdfs encryption 
> shim
> -
>
> Key: HIVE-9981
> URL: https://issues.apache.org/jira/browse/HIVE-9981
> Project: Hive
>  Issue Type: Improvement
>  Components: Encryption
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-9981.1.patch
>
>
> Hadoop23Shims.createHdfsEncryptionShim() is throwing an exception for versions 
> lower than 2.6.0 every time a query is executed.
> Exceptions are expensive, so rather than throwing them every time, we can use 
> the design pattern followed for some other functions in Hadoop23Shims:
> {code}
>   protected static final Method accessMethod;
>   protected static final Method getPasswordMethod;
>   static {
> Method m = null;
> try {
>   m = FileSystem.class.getMethod("access", Path.class, FsAction.class);
> } catch (NoSuchMethodException err) {
>   // This version of Hadoop does not support FileSystem.access().
> }
> accessMethod = m;
> try {
>   m = Configuration.class.getMethod("getPassword", String.class);
> } catch (NoSuchMethodException err) {
>   // This version of Hadoop does not support getPassword(), just retrieve 
> password from conf.
>   m = null;
> }
> getPasswordMethod = m;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9960) Hive not backward compatibilie while adding optional new field to struct in parquet files

2015-03-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-9960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364020#comment-14364020
 ] 

Sergio Peña commented on HIVE-9960:
---

Thanks [~amalakar]. If you can test this patch on Hive 0.14, that would be great.

Regarding the KeyProvider issue: is it a compilation problem or a runtime 
problem? There is a patch for Hadoop/KeyProvider incompatibility issues in 
HIVE-9957, which was committed to trunk two days ago. That fix addresses 
runtime errors with Hadoop versions lower than 2.6.0.

> Hive not backward compatibilie while adding optional new field to struct in 
> parquet files
> -
>
> Key: HIVE-9960
> URL: https://issues.apache.org/jira/browse/HIVE-9960
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Arup Malakar
>Assignee: Sergio Peña
> Attachments: HIVE-9960.1.patch
>
>
> I recently added an optional field to a struct. When I tried to query old 
> data with the new Hive table, which has the new field as a column, it threw 
> an error. Any clue how I can make it backward compatible so that I am still 
> able to query old data with the new table definition?
>  
> I am using the hive-0.14.0 release with the HIVE-8909 patch applied.
> Details:
> New optional field in a struct
> {code}
> struct Event {
> 1: optional Type type;
> 2: optional map values;
> 3: optional i32 num = -1; // <--- New field
> }
> {code}
> Main thrift definition
> {code}
>  10: optional list events;
> {code}
> Corresponding hive table definition
> {code}
>   events array< struct , num: int>>)
> {code}
> Try to read something from the old data, using the new table definition
> {{select events from table1 limit 1;}}
> Failed with exception:
> {code}
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> Error thrown:
> 15/03/12 17:23:43 [main]: ERROR CliDriver: Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

[jira] [Updated] (HIVE-9982) CBO (Calcite Return Path): Prune TS Relnode schema

2015-03-16 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-9982:
-
Attachment: HIVE-9982.patch

> CBO (Calcite Return Path): Prune TS Relnode schema
> --
>
> Key: HIVE-9982
> URL: https://issues.apache.org/jira/browse/HIVE-9982
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Fix For: 1.2.0
>
> Attachments: HIVE-9982.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9953) fix NPE in WindowingTableFunction

2015-03-16 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364038#comment-14364038
 ] 

Jason Dere commented on HIVE-9953:
--

Would you be able to link all of these Jiras to HIVE-9809, or to make them 
subtasks of that Jira?

> fix NPE in WindowingTableFunction
> -
>
> Key: HIVE-9953
> URL: https://issues.apache.org/jira/browse/HIVE-9953
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Fix For: 1.2.0
>
> Attachments: HIVE-9953.1.patch
>
>
> WindowingTableFunction line 1193
> {code}
> // now
> return (s1 == null && s2 == null) || s1.equals(s2);
> // should be
> return (s1 == null && s2 == null) || (s1 != null && s1.equals(s2));
> {code}
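
For reference, the corrected expression is exactly the null-safe comparison that java.util.Objects has provided since Java 7:

{code}
// Equivalent to the fixed line: true when both are null, otherwise a
// null-guarded s1.equals(s2).
return java.util.Objects.equals(s1, s2);
{code}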



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9982) CBO (Calcite Return Path): Prune TS Relnode schema

2015-03-16 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran resolved HIVE-9982.
--
Resolution: Fixed

> CBO (Calcite Return Path): Prune TS Relnode schema
> --
>
> Key: HIVE-9982
> URL: https://issues.apache.org/jira/browse/HIVE-9982
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Fix For: 1.2.0
>
> Attachments: HIVE-9982.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9976) LLAP: Possible race condition in DynamicPartitionPruner for <200ms tasks

2015-03-16 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364039#comment-14364039
 ] 

Gunther Hagleitner commented on HIVE-9976:
--

[~gopalv] said that it was on Tez 0.7.

> LLAP: Possible race condition in DynamicPartitionPruner for <200ms tasks
> 
>
> Key: HIVE-9976
> URL: https://issues.apache.org/jira/browse/HIVE-9976
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tez
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Gunther Hagleitner
> Attachments: llap_vertex_200ms.png
>
>
> Race condition in the DynamicPartitionPruner between 
> DynamicPartitionPruner::processVertex() and 
> DynamicPartitionPruner::addEvent() for tasks which respond with both the 
> result and success in a single heartbeat sequence.
> {code}
> 2015-03-16 07:05:01,589 ERROR [InputInitializer [Map 1] #0] 
> tez.DynamicPartitionPruner: Expecting: 1, received: 0
> 2015-03-16 07:05:01,590 ERROR [Dispatcher thread: Central] impl.VertexImpl: 
> Vertex Input: store_sales initializer failed, 
> vertex=vertex_1424502260528_1113_4_04 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Incorrect event count in 
> dynamic parition pruning
> {code}
> !llap_vertex_200ms.png!
> All 4 upstream vertices of Map 1 need to finish within ~200ms to trigger 
> this, which seems to be consistently happening with LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9966) Get rid of customBucketMapJoin field from MapJoinDesc

2015-03-16 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-9966:
---
Attachment: HIVE-9966.patch

Patch contains additional clean-up around that flag. It passed all the 
relevant tests: 
bucket_map_join_tez1.q, tez_bmj_schema_evolution.q, tez_smb_main.q, bucket_map_join_tez2.q

> Get rid of customBucketMapJoin field from MapJoinDesc
> -
>
> Key: HIVE-9966
> URL: https://issues.apache.org/jira/browse/HIVE-9966
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning, Tez
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-9966.patch
>
>
> Currently, it's used to determine whether the BMJ is running in the mapper or 
> the reducer in the ReduceSinkMapJoinProc rule. But this determination can be 
> made locally by examining the operator tree in the rule.
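
A minimal sketch of what examining the operator tree locally could look like (an assumption about the approach, not the committed change): walk the parent operators and report reduce-side execution when a reduce boundary sits above the join.

{code}
import java.util.List;

// Stand-in types; in Hive these would be Operator/ReduceSinkOperator. The walk
// is the point: the mapper-vs-reducer question is answered locally by looking
// for a reduce boundary above the join, with no extra flag on the descriptor.
class OperatorTreeSketch {
  static class Op {
    boolean isReduceBoundary;  // true for a ReduceSink-like operator
    List<Op> parents;
  }

  static boolean runsInReducer(Op op) {
    if (op.parents == null) {
      return false;
    }
    for (Op parent : op.parents) {
      if (parent.isReduceBoundary || runsInReducer(parent)) {
        return true;
      }
    }
    return false;
  }
}
{code}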



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9960) Hive not backward compatibilie while adding optional new field to struct in parquet files

2015-03-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364044#comment-14364044
 ] 

Hive QA commented on HIVE-9960:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12704824/HIVE-9960.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7769 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3044/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3044/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3044/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12704824 - PreCommit-HIVE-TRUNK-Build

> Hive not backward compatibilie while adding optional new field to struct in 
> parquet files
> -
>
> Key: HIVE-9960
> URL: https://issues.apache.org/jira/browse/HIVE-9960
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Arup Malakar
>Assignee: Sergio Peña
> Attachments: HIVE-9960.1.patch
>
>
> I recently added an optional field to a struct. When I tried to query old 
> data with the new Hive table, which has the new field as a column, it threw 
> an error. Any clue how I can make it backward compatible so that I am still 
> able to query old data with the new table definition?
>  
> I am using the hive-0.14.0 release with the HIVE-8909 patch applied.
> Details:
> New optional field in a struct
> {code}
> struct Event {
> 1: optional Type type;
> 2: optional map values;
> 3: optional i32 num = -1; // <--- New field
> }
> {code}
> Main thrift definition
> {code}
>  10: optional list events;
> {code}
> Corresponding hive table definition
> {code}
>   events array< struct , num: int>>)
> {code}
> Try to read something from the old data, using the new table definition
> {{select events from table1 limit 1;}}
> Failed with exception:
> {code}
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> Error thrown:
> 15/03/12 17:23:43 [main]: ERROR CliDriver: Failed with exception 
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ArrayIndexOutOfBoundsException: 2
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)

[jira] [Commented] (HIVE-9981) Avoid throwing many exceptions when attempting to create new hdfs encryption shim

2015-03-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364049#comment-14364049
 ] 

Thejas M Nair commented on HIVE-9981:
-

[~spena] It does not look like this will avoid the exceptions in every call to 
this function with pre-2.6.0 Hadoop versions: getEncryptionZoneForPathMethod 
will always be null.
Maybe use a Boolean object isHdfsEncryptionSupported instead? (If it is null, 
initialize it to true/false.)
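
A minimal sketch of that suggestion (assumed shape, not the committed patch): probe for the Hadoop 2.6.0 API once, cache the result in a Boolean, and never construct the exception again.

{code}
import org.apache.hadoop.fs.Path;

public class EncryptionSupportSketch {
  // null = not probed yet; TRUE/FALSE once the one-time reflective check ran.
  private static Boolean hdfsEncryptionSupported;

  public static synchronized boolean isHdfsEncryptionSupported() {
    if (hdfsEncryptionSupported == null) {
      try {
        // HdfsAdmin.getEncryptionZoneForPath(Path) only exists on Hadoop >= 2.6.0.
        Class<?> hdfsAdmin = Class.forName("org.apache.hadoop.hdfs.client.HdfsAdmin");
        hdfsAdmin.getMethod("getEncryptionZoneForPath", Path.class);
        hdfsEncryptionSupported = Boolean.TRUE;
      } catch (ClassNotFoundException | NoSuchMethodException e) {
        hdfsEncryptionSupported = Boolean.FALSE;  // probed once, thrown once
      }
    }
    return hdfsEncryptionSupported;
  }
}
{code}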



> Avoid throwing many exceptions when attempting to create new hdfs encryption 
> shim
> -
>
> Key: HIVE-9981
> URL: https://issues.apache.org/jira/browse/HIVE-9981
> Project: Hive
>  Issue Type: Improvement
>  Components: Encryption
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-9981.1.patch
>
>
> Hadoop23Shims.createHdfsEncryptionShim() is throwing an exception for versions 
> lower than 2.6.0 every time a query is executed.
> Exceptions are expensive, so rather than throwing them every time, we can use 
> the design pattern followed for some other functions in Hadoop23Shims:
> {code}
>   protected static final Method accessMethod;
>   protected static final Method getPasswordMethod;
>   static {
> Method m = null;
> try {
>   m = FileSystem.class.getMethod("access", Path.class, FsAction.class);
> } catch (NoSuchMethodException err) {
>   // This version of Hadoop does not support FileSystem.access().
> }
> accessMethod = m;
> try {
>   m = Configuration.class.getMethod("getPassword", String.class);
> } catch (NoSuchMethodException err) {
>   // This version of Hadoop does not support getPassword(), just retrieve 
> password from conf.
>   m = null;
> }
> getPasswordMethod = m;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9966) Get rid of customBucketMapJoin field from MapJoinDesc

2015-03-16 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364068#comment-14364068
 ] 

Vikram Dixit K commented on HIVE-9966:
--

+1 LGTM pending tests.

> Get rid of customBucketMapJoin field from MapJoinDesc
> -
>
> Key: HIVE-9966
> URL: https://issues.apache.org/jira/browse/HIVE-9966
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning, Tez
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-9966.patch
>
>
> Currently, it's used to determine whether the BMJ is running in the mapper or 
> the reducer in the ReduceSinkMapJoinProc rule. But this determination can be 
> made locally by examining the operator tree in the rule.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to "none", allowing authentication without password

2015-03-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364092#comment-14364092
 ] 

Xuefu Zhang commented on HIVE-9934:
---

+1

> Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to 
> degrade the authentication mechanism to "none", allowing authentication 
> without password
> --
>
> Key: HIVE-9934
> URL: https://issues.apache.org/jira/browse/HIVE-9934
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.1.0
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch
>
>
> Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to 
> degrade the authentication mechanism to "none", allowing authentication 
> without password.
> See: http://docs.oracle.com/javase/jndi/tutorial/ldap/security/simple.html
> “If you supply an empty string, an empty byte/char array, or null to the 
> Context.SECURITY_CREDENTIALS environment property, then the authentication 
> mechanism will be "none". This is because the LDAP requires the password to 
> be nonempty for simple authentication. The protocol automatically converts 
> the authentication to "none" if a password is not supplied.”
>  
> Since the LdapAuthenticationProviderImpl.Authenticate method is relying on a 
> NamingException being thrown during creation of initial context, it does not 
> fail when the context result is an “unauthenticated” positive response from 
> the LDAP server. The end result is, one can authenticate with HiveServer2 
> using the LdapAuthenticationProviderImpl with only a user name and an empty 
> password.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9981) Avoid throwing many exceptions when attempting to create new hdfs encryption shim

2015-03-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-9981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-9981:
--
Attachment: HIVE-9981.2.patch

Thanks [~thejas]
I did not see that issue. This patch uses a Boolean object instead to keep 
track of the encryption support.

> Avoid throwing many exceptions when attempting to create new hdfs encryption 
> shim
> -
>
> Key: HIVE-9981
> URL: https://issues.apache.org/jira/browse/HIVE-9981
> Project: Hive
>  Issue Type: Improvement
>  Components: Encryption
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-9981.1.patch, HIVE-9981.2.patch
>
>
> Hadoop23Shims.createHdfsEncryptionShim() is throwing an exception for versions 
> lower than 2.6.0 every time a query is executed.
> Exceptions are expensive, so rather than throwing them every time, we can use 
> the design pattern followed for some other functions in Hadoop23Shims:
> {code}
>   protected static final Method accessMethod;
>   protected static final Method getPasswordMethod;
>   static {
> Method m = null;
> try {
>   m = FileSystem.class.getMethod("access", Path.class, FsAction.class);
> } catch (NoSuchMethodException err) {
>   // This version of Hadoop does not support FileSystem.access().
> }
> accessMethod = m;
> try {
>   m = Configuration.class.getMethod("getPassword", String.class);
> } catch (NoSuchMethodException err) {
>   // This version of Hadoop does not support getPassword(), just retrieve 
> password from conf.
>   m = null;
> }
> getPasswordMethod = m;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9953) fix NPE in WindowingTableFunction

2015-03-16 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364133#comment-14364133
 ] 

Alexander Pivovarov commented on HIVE-9953:
---

Done

> fix NPE in WindowingTableFunction
> -
>
> Key: HIVE-9953
> URL: https://issues.apache.org/jira/browse/HIVE-9953
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Trivial
> Fix For: 1.2.0
>
> Attachments: HIVE-9953.1.patch
>
>
> WindowingTableFunction line 1193
> {code}
> // now
> return (s1 == null && s2 == null) || s1.equals(s2);
> // should be
> return (s1 == null && s2 == null) || (s1 != null && s1.equals(s2));
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9983) Vectorizer doesn't vectorize (1) partitions with different schema (2) any MapWork with >1 table scans in MR

2015-03-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9983:
---
Component/s: Vectorization

> Vectorizer doesn't vectorize (1) partitions with different schema (2) any 
> MapWork with >1 table scans in MR
> ---
>
> Key: HIVE-9983
> URL: https://issues.apache.org/jira/browse/HIVE-9983
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Matt McCline
>
> For some tests, tables are created as follows:
> {noformat}
> CREATE TABLE orc_llap_part(
> csmallint SMALLINT,
> cint INT,
> cbigint BIGINT,
> cfloat FLOAT,
> cdouble DOUBLE,
> cstring1 STRING,
> cstring2 STRING,
> ctimestamp1 TIMESTAMP,
> ctimestamp2 TIMESTAMP,
> cboolean1 BOOLEAN,
> cboolean2 BOOLEAN
> ) PARTITIONED BY (ctinyint TINYINT) STORED AS ORC;
> CREATE TABLE orc_llap_dim_part(
> cbigint BIGINT
> ) PARTITIONED BY (ctinyint TINYINT) STORED AS ORC;
> INSERT OVERWRITE TABLE orc_llap_part PARTITION (ctinyint)
> SELECT csmallint, cint, cbigint, cfloat, cdouble, cstring1, cstring2, 
> ctimestamp1, ctimestamp2, cboolean1, cboolean2, ctinyint FROM alltypesorc;
> INSERT OVERWRITE TABLE orc_llap_dim_part PARTITION (ctinyint)
> SELECT sum(cbigint) as cbigint, ctinyint FROM alltypesorc WHERE ctinyint > 10 
> AND ctinyint < 21 GROUP BY ctinyint;
> {noformat}
> The query is:
> {noformat}
> explain
>   SELECT oft.ctinyint, oft.cint FROM orc_llap_part oft
>   INNER JOIN orc_llap_dim_part od ON oft.ctinyint = od.ctinyint;
> {noformat}
> This results in a failure to vectorize in MR:
> {noformat}
> Could not vectorize partition 
> pfile:/Users/sergey/git/hive3/itests/qtest/target/warehouse/orc_llap_dim_part/ctinyint=11.
>   Its column names cbigint do not match the other column names 
> csmallint,cint,cbigint,cfloat,cdouble,cstring1,cstring2,ctimestamp1,ctimestamp2,cboolean1,cboolean2
> {noformat}
> This is comparing schemas from different tables because MapWork has 2 
> TableScan-s; in Tez this error will never happen as MapWork will not have 2 
> scans.
> In Tez (and MR as well), the other case can happen, namely partitions of the 
> same table having different schemas.
> Tez case can be solved by making a super-schema to include all variations and 
> handling missing columns where necessary.
> MR case may be harder to solve.
> Of note is that despite the schemas being different (and one not being a 
> prefix of the other by coincidence or some such), the query passes if 
> validation is commented out. Perhaps in some cases it can work?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9983) Vectorizer doesn't vectorize (1) partitions with different schema anywhere (2) any MapWork with >1 table scans in MR

2015-03-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9983:
---
Summary: Vectorizer doesn't vectorize (1) partitions with different schema 
anywhere (2) any MapWork with >1 table scans in MR  (was: Vectorizer doesn't 
vectorize (1) partitions with different schema (2) any MapWork with >1 table 
scans in MR)

> Vectorizer doesn't vectorize (1) partitions with different schema anywhere 
> (2) any MapWork with >1 table scans in MR
> 
>
> Key: HIVE-9983
> URL: https://issues.apache.org/jira/browse/HIVE-9983
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Matt McCline
>
> For some tests, tables are created as follows:
> {noformat}
> CREATE TABLE orc_llap_part(
> csmallint SMALLINT,
> cint INT,
> cbigint BIGINT,
> cfloat FLOAT,
> cdouble DOUBLE,
> cstring1 STRING,
> cstring2 STRING,
> ctimestamp1 TIMESTAMP,
> ctimestamp2 TIMESTAMP,
> cboolean1 BOOLEAN,
> cboolean2 BOOLEAN
> ) PARTITIONED BY (ctinyint TINYINT) STORED AS ORC;
> CREATE TABLE orc_llap_dim_part(
> cbigint BIGINT
> ) PARTITIONED BY (ctinyint TINYINT) STORED AS ORC;
> INSERT OVERWRITE TABLE orc_llap_part PARTITION (ctinyint)
> SELECT csmallint, cint, cbigint, cfloat, cdouble, cstring1, cstring2, 
> ctimestamp1, ctimestamp2, cboolean1, cboolean2, ctinyint FROM alltypesorc;
> INSERT OVERWRITE TABLE orc_llap_dim_part PARTITION (ctinyint)
> SELECT sum(cbigint) as cbigint, ctinyint FROM alltypesorc WHERE ctinyint > 10 
> AND ctinyint < 21 GROUP BY ctinyint;
> {noformat}
> The query is:
> {noformat}
> explain
>   SELECT oft.ctinyint, oft.cint FROM orc_llap_part oft
>   INNER JOIN orc_llap_dim_part od ON oft.ctinyint = od.ctinyint;
> {noformat}
> This results in a failure to vectorize in MR:
> {noformat}
> Could not vectorize partition 
> pfile:/Users/sergey/git/hive3/itests/qtest/target/warehouse/orc_llap_dim_part/ctinyint=11.
>   Its column names cbigint do not match the other column names 
> csmallint,cint,cbigint,cfloat,cdouble,cstring1,cstring2,ctimestamp1,ctimestamp2,cboolean1,cboolean2
> {noformat}
> This is comparing schemas from different tables because MapWork has 2 
> TableScan-s; in Tez this error will never happen as MapWork will not have 2 
> scans.
> In Tez (and MR as well), the other case can happen, namely partitions of the 
> same table having different schemas.
> Tez case can be solved by making a super-schema to include all variations and 
> handling missing columns where necessary.
> MR case may be harder to solve.
> Of note is that despite the schemas being different (and one not being a 
> prefix of the other by coincidence or some such), the query passes if 
> validation is commented out. Perhaps in some cases it can work?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9981) Avoid throwing many exceptions when attempting to create new hdfs encryption shim

2015-03-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364141#comment-14364141
 ] 

Thejas M Nair commented on HIVE-9981:
-

+1
Thanks [~spena]!


> Avoid throwing many exceptions when attempting to create new hdfs encryption 
> shim
> -
>
> Key: HIVE-9981
> URL: https://issues.apache.org/jira/browse/HIVE-9981
> Project: Hive
>  Issue Type: Improvement
>  Components: Encryption
>Reporter: Sergio Peña
>Assignee: Sergio Peña
> Attachments: HIVE-9981.1.patch, HIVE-9981.2.patch
>
>
> Hadoop23Shims.createHdfsEncryptionShim() is throwing an exception for versions 
> lower than 2.6.0 every time a query is executed.
> Exceptions are expensive, so rather than throwing them every time, we can use 
> the design pattern followed for some other functions in Hadoop23Shims:
> {code}
>   protected static final Method accessMethod;
>   protected static final Method getPasswordMethod;
>   static {
> Method m = null;
> try {
>   m = FileSystem.class.getMethod("access", Path.class, FsAction.class);
> } catch (NoSuchMethodException err) {
>   // This version of Hadoop does not support FileSystem.access().
> }
> accessMethod = m;
> try {
>   m = Configuration.class.getMethod("getPassword", String.class);
> } catch (NoSuchMethodException err) {
>   // This version of Hadoop does not support getPassword(), just retrieve 
> password from conf.
>   m = null;
> }
> getPasswordMethod = m;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9984) JoinReorder's getOutputSize is exponential

2015-03-16 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-9984:
-
Attachment: HIVE-9984.1.patch

> JoinReorder's getOutputSize is exponential
> --
>
> Key: HIVE-9984
> URL: https://issues.apache.org/jira/browse/HIVE-9984
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-9984.1.patch
>
>
> Found by [~mmokhtar]. This causes major issues in large plans (50+ joins). A 
> simple fix would be to memoize the recursion. There should also be a flag to 
> switch this optimization off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9984) JoinReorder's getOutputSize is exponential

2015-03-16 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364176#comment-14364176
 ] 

Gunther Hagleitner commented on HIVE-9984:
--

[~gopalv] take a look?

> JoinReorder's getOutputSize is exponential
> --
>
> Key: HIVE-9984
> URL: https://issues.apache.org/jira/browse/HIVE-9984
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-9984.1.patch
>
>
> Found by [~mmokhtar]. This causes major issues in large plans (50+ joins). A 
> simple fix would be to memoize the recursion. There should also be a flag to 
> switch this optimization off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9985) Vectorization: NPE for added columns in ORC non-partitioned tables

2015-03-16 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9985:
---
Attachment: HIVE-9985.01.patch

> Vectorization: NPE for added columns in ORC non-partitioned tables
> --
>
> Key: HIVE-9985
> URL: https://issues.apache.org/jira/browse/HIVE-9985
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-9985.01.patch
>
>
> If you add STRING columns to a non-partitioned table (ORC format) and try to 
> read the added STRING column using vectorization, you will get a 
> NullPointerException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7018) Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but not others

2015-03-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364193#comment-14364193
 ] 

Hive QA commented on HIVE-7018:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12704842/HIVE-7018.1.patch

{color:green}SUCCESS:{color} +1 7769 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3045/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3045/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3045/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12704842 - PreCommit-HIVE-TRUNK-Build

> Table and Partition tables have column LINK_TARGET_ID in Mysql scripts but 
> not others
> -
>
> Key: HIVE-7018
> URL: https://issues.apache.org/jira/browse/HIVE-7018
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Yongzhi Chen
> Attachments: HIVE-7018.1.patch
>
>
> It appears that at least postgres and oracle do not have the LINK_TARGET_ID 
> column while mysql does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9979) LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data

2015-03-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364251#comment-14364251
 ] 

Sergey Shelukhin commented on HIVE-9979:


Filed HIVE-9986 for the call stack in the comment.

> LLAP: LLAP Cached readers for StringDirectTreeReaders over-read data
> 
>
> Key: HIVE-9979
> URL: https://issues.apache.org/jira/browse/HIVE-9979
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: llap
>Reporter: Gopal V
>Assignee: Prasanth Jayachandran
>
> When the cache is enabled, queries throw different over-read exceptions.
> It looks like the batchSize changes as you read data; the end-of-stripe 
> batchSize is smaller than the default size (the super calls change it).
> {code}
> Caused by: java.io.EOFException: Can't finish byte read from uncompressed 
> stream DATA position: 262144 length: 262144 range: 0 offset: 46399488 limit: 
> 46399488
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.commonReadByteArrays(RecordReaderImpl.java:1556)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$BytesColumnVectorUtil.readOrcByteArrays(RecordReaderImpl.java:1569)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringDirectTreeReader.nextVector(RecordReaderImpl.java:1691)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StringTreeReader.nextVector(RecordReaderImpl.java:1517)
> at 
> org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:115)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:108)
> at 
> org.apache.hadoop.hive.llap.io.decode.EncodedDataConsumer.consumeData(EncodedDataConsumer.java:35)
> at 
> org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:314)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
> at 
> org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
> ... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9986) LLAP: EOFException in reader

2015-03-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9986:
---
Description: 
From HIVE-9979
{noformat}
2015-03-16 10:20:51,439 
[pool-2-thread-3(container_1_1141_01_000192_gopal_20150316102020_c8c92488-6a61-401e-8298-401dace286dc:1_Map
 1_191_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Getting 
data for column 9 RG 112 stream DATA at 62278935, 1057137 index position 0: 
compressed [62614934, 63139228)
2015-03-16 10:20:51,439 
[pool-2-thread-6(container_1_1141_01_000211_gopal_20150316102020_c8c92488-6a61-401e-8298-401dace286dc:1_Map
 1_210_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Getting 
stripe-level stream [LENGTH, kind: DICTIONARY_V2
dictionarySize: 3
] for column 9 RG 91 at 64139927, 5
...
Caused by: java.io.EOFException
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDirect(RecordReaderUtils.java:286)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDiskRanges(RecordReaderUtils.java:266)
at 
org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:234)
at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
at 
org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
... 4 more
{noformat}

  was:
{noformat}
2015-03-16 10:20:51,439 
[pool-2-thread-3(container_1_1141_01_000192_gopal_20150316102020_c8c92488-6a61-401e-8298-401dace286dc:1_Map
 1_191_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Getting 
data for column 9 RG 112 stream DATA at 62278935, 1057137 index position 0: 
compressed [62614934, 63139228)
2015-03-16 10:20:51,439 
[pool-2-thread-6(container_1_1141_01_000211_gopal_20150316102020_c8c92488-6a61-401e-8298-401dace286dc:1_Map
 1_210_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Getting 
stripe-level stream [LENGTH, kind: DICTIONARY_V2
dictionarySize: 3
] for column 9 RG 91 at 64139927, 5
...
Caused by: java.io.EOFException
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDirect(RecordReaderUtils.java:286)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDiskRanges(RecordReaderUtils.java:266)
at 
org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:234)
at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
at 
org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
at 
org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
... 4 more
{noformat}


> LLAP: EOFException in reader
> 
>
> Key: HIVE-9986
> URL: https://issues.apache.org/jira/browse/HIVE-9986
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>
> From HIVE-9979
> {noformat}
> 2015-03-16 10:20:51,439 
> [pool-2-thread-3(container_1_1141_01_000192_gopal_20150316102020_c8c92488-6a61-401e-8298-401dace286dc:1_Map
>  1_191_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Getting 
> data for column 9 RG 112 stream DATA at 62278935, 1057137 index position 0: 
> compressed [62614934, 63139228)
> 2015-03-16 10:20:51,439 
> [pool-2-thread-6(container_1_1141_01_000211_gopal_20150316102020_c8c92488-6a61-401e-8298-401dace286dc:1_Map
>  1_210_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Getting 
> stripe-level stream [LENGTH, kind: DICTIONARY_V2
> dictionarySize: 3
> ] for column 9 RG 91 at 64139927, 5
> ...
> Caused by: java.io.EOFException
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDirect(RecordReaderUtils.java:286)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDiskRanges(RecordReaderUtils.java:266)
> at 
> org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:234)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
> at 
> org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
> ... 4 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9986) LLAP: EOFException in reader

2015-03-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9986:
---
Fix Version/s: llap

> LLAP: EOFException in reader
> 
>
> Key: HIVE-9986
> URL: https://issues.apache.org/jira/browse/HIVE-9986
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Fix For: llap
>
>
> From HIVE-9979
> {noformat}
> 2015-03-16 10:20:51,439 
> [pool-2-thread-3(container_1_1141_01_000192_gopal_20150316102020_c8c92488-6a61-401e-8298-401dace286dc:1_Map
>  1_191_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Getting 
> data for column 9 RG 112 stream DATA at 62278935, 1057137 index position 0: 
> compressed [62614934, 63139228)
> 2015-03-16 10:20:51,439 
> [pool-2-thread-6(container_1_1141_01_000211_gopal_20150316102020_c8c92488-6a61-401e-8298-401dace286dc:1_Map
>  1_210_0)] INFO org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl: Getting 
> stripe-level stream [LENGTH, kind: DICTIONARY_V2
> dictionarySize: 3
> ] for column 9 RG 91 at 64139927, 5
> ...
> Caused by: java.io.EOFException
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDirect(RecordReaderUtils.java:286)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderUtils.readDiskRanges(RecordReaderUtils.java:266)
> at 
> org.apache.hadoop.hive.ql.io.orc.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:234)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:280)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:44)
> at 
> org.apache.hadoop.hive.common.CallableWithNdc.call(CallableWithNdc.java:37)
> ... 4 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9984) JoinReorder's getOutputSize is exponential

2015-03-16 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9984:
--
Attachment: HIVE-9984.2.patch

I don't think the HashMap is a good idea; IdentityHashMap is safer.

Made the memoization more obvious in the attached patch.
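
For reference, a minimal sketch of the memoization pattern under discussion 
(illustrative names, not the actual JoinReorder code):
{code}
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// IdentityHashMap keys on object identity rather than equals()/hashCode(),
// which is safer for operator-tree nodes that may not define equality.
class OutputSizeEstimator {
  private final Map<Node, Long> memo = new IdentityHashMap<>();

  long getOutputSize(Node op) {
    Long cached = memo.get(op);
    if (cached != null) {
      return cached; // each node is computed once: exponential becomes linear
    }
    long size = op.baseSize();
    for (Node child : op.children()) {
      size += getOutputSize(child);
    }
    memo.put(op, size);
    return size;
  }

  interface Node { // stand-in for Hive's operator tree
    long baseSize();
    List<Node> children();
  }
}
{code}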

> JoinReorder's getOutputSize is exponential
> --
>
> Key: HIVE-9984
> URL: https://issues.apache.org/jira/browse/HIVE-9984
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-9984.1.patch, HIVE-9984.2.patch
>
>
> Found by [~mmokhtar]. Causes major issues in large plans (50+ joins). Simple 
> fix would be to memoize the recursion. There should also be a flag to switch 
> this opt off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9947) ScriptOperator replaceAll uses unescaped dot and result is not assigned

2015-03-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364318#comment-14364318
 ] 

Hive QA commented on HIVE-9947:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12704848/HIVE-9947.3.patch

{color:green}SUCCESS:{color} +1 7769 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3046/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3046/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3046/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12704848 - PreCommit-HIVE-TRUNK-Build

> ScriptOperator replaceAll uses unescaped dot and result is not assigned
> ---
>
> Key: HIVE-9947
> URL: https://issues.apache.org/jira/browse/HIVE-9947
> Project: Hive
>  Issue Type: Bug
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Minor
> Attachments: HIVE-9947.1.patch, HIVE-9947.2.patch, HIVE-9947.3.patch
>
>
> ScriptOperator line 155
> {code}
> //now
> b.replaceAll(".", "_");
> // should be
> b = b.replace('.', '_');
> {code}
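>
> For illustration, a standalone demo (not Hive code) of the two problems: the 
> regex dot matches every character, and Strings are immutable so the result 
> must be assigned.
> {code}
> public class ReplaceAllDemo {
>   public static void main(String[] args) {
>     String b = "a.b.c";
>     b.replaceAll(".", "_");                      // result discarded: b unchanged
>     System.out.println(b);                       // a.b.c
>     System.out.println(b.replaceAll(".", "_"));  // _____ (regex '.' matches all)
>     System.out.println(b.replace('.', '_'));     // a_b_c (the intended fix)
>   }
> }
> {code}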



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9943) CBO (Calcite Return Path): GroupingID translation from Calcite [CBO branch]

2015-03-16 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran resolved HIVE-9943.
--
Resolution: Fixed

> CBO (Calcite Return Path): GroupingID translation from Calcite [CBO branch]
> ---
>
> Key: HIVE-9943
> URL: https://issues.apache.org/jira/browse/HIVE-9943
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: cbo-branch
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: cbo-branch
>
> Attachments: HIVE-9943.cbo.patch
>
>
> The translation from Calcite back to Hive might produce wrong results while 
> interacting with other Calcite optimization rules. Further, we could ease the 
> translation of the Aggregate operator with grouping sets to Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9971) Clean up operator class

2015-03-16 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-9971:
-
Attachment: HIVE-9971.3.patch

> Clean up operator class
> ---
>
> Key: HIVE-9971
> URL: https://issues.apache.org/jira/browse/HIVE-9971
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-9971.1.patch, HIVE-9971.2.patch, HIVE-9971.3.patch
>
>
> This is mostly cleanup although it does enhance the pipeline in one respect. 
> It introduces async init for operators and uses it for hash table loading 
> where desired.
> There's a bunch of weird code associated with the operator class:
> - initialize isn't recursive; rather, initializeOp is supposed to call 
> initializeChildren. That has led to bugs in the past (see the sketch after 
> this list).
> - setExecContext and passExecContext. Both are recursive, but passExecContext 
> calls setExecContext and then recurses again. Boo.
> - lots of (getChildren() != null) although that can't happen anymore
> - TezCacheAccess is a hack. We should just leave init of inputs up to the 
> operator that needs it.
> - Need some sanity checks that make sure that operators were all initialized.
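>
> As a sketch of that first point (illustrative, not the actual Operator API):
> {code}
> // Today's pattern: every initializeOp override must remember to call
> // initializeChildren, or the subtree is silently left uninitialized.
> abstract class Op {
>   protected Op[] children = new Op[0];
>
>   void initialize() { initializeOp(); }
>
>   protected void initializeOp() { initializeChildren(); }
>
>   protected final void initializeChildren() {
>     for (Op child : children) {
>       child.initialize();
>     }
>   }
> }
>
> // A recursive initialize would make forgetting impossible:
> abstract class SaferOp {
>   protected SaferOp[] children = new SaferOp[0];
>
>   final void initialize() {
>     initializeOp();              // subclass hook, with no recursion duty
>     for (SaferOp child : children) {
>       child.initialize();        // the framework handles the recursion
>     }
>   }
>
>   protected void initializeOp() {}
> }
> {code}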



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9975) Renaming a nonexisting partition should not throw out NullPointerException

2015-03-16 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-9975:
--
Attachment: HIVE-9975.patch

It is a simple change: it adds a check for the partition being renamed, and if 
it is null (does not exist), throws a HiveException.
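
In outline, the change looks like this (a sketch with stand-in types, not the 
exact patch):
{code}
import java.util.Map;

// Look the source partition up first and fail with a HiveException instead
// of letting a null Partition turn into an NPE inside DDLTask.renamePartition.
class RenamePartitionGuard {
  static void renamePartition(HiveDb db, Table tbl,
      Map<String, String> oldPartSpec, Map<String, String> newPartSpec)
      throws HiveException {
    Partition oldPart = db.getPartition(tbl, oldPartSpec);
    if (oldPart == null) {
      throw new HiveException("Rename partition: source partition "
          + oldPartSpec + " does not exist");
    }
    db.renamePartition(tbl, oldPartSpec, newPartSpec);
  }

  // Minimal stand-ins so the sketch compiles on its own; the real Hive
  // classes have richer signatures.
  interface HiveDb {
    Partition getPartition(Table t, Map<String, String> spec);
    void renamePartition(Table t, Map<String, String> oldSpec,
        Map<String, String> newSpec) throws HiveException;
  }
  interface Table {}
  interface Partition {}
  static class HiveException extends Exception {
    HiveException(String msg) { super(msg); }
  }
}
{code}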

> Renaming a nonexisting partition should not throw out NullPointerException
> --
>
> Key: HIVE-9975
> URL: https://issues.apache.org/jira/browse/HIVE-9975
> Project: Hive
>  Issue Type: Bug
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>Priority: Minor
> Attachments: HIVE-9975.patch
>
>
> Renaming a nonexisting partition should not throw out NullPointerException. 
> create table testpart (col1 int, col2 string, col3 string) partitioned by 
> (part string);
> alter table testpart partition (part = 'nonexisting') rename to partition 
> (part = 'existing');
> we get an NPE like the following:
> {code}
> 15/03/16 10:16:11 ERROR exec.DDLTask: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.DDLTask.renamePartition(DDLTask.java:944)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:350)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1642)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1402)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1187)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1053)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1043)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. null
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]

2015-03-16 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364420#comment-14364420
 ] 

Rui Li commented on HIVE-9697:
--

Hi [~xuefuz], I remember there was some discussion about this one. Any 
conclusions about how we'll deal with it?

> Hive on Spark is not as aggressive as MR on map join [Spark Branch]
> ---
>
> Key: HIVE-9697
> URL: https://issues.apache.org/jira/browse/HIVE-9697
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>
> We have a finding from running some Big-Bench cases:
> when the same small table size threshold is used, the Map Join operator will 
> not be generated in the stage plans for Hive on Spark, while it will be 
> generated for Hive on MR.
> For example, When we run BigBench Q25, the meta info of one input ORC table 
> is as below:
> totalSize=1748955 (about 1.5M)
> rawDataSize=123050375 (about 120M)
> If we use the following parameter settings,
> set hive.auto.convert.join=true;
> set hive.mapjoin.smalltable.filesize=2500;
> set hive.auto.convert.join.noconditionaltask=true;
> set hive.auto.convert.join.noconditionaltask.size=100000000; (100M)
> Map Join will be enabled for Hive on MR mode, while it will not be enabled 
> for Hive on Spark.
> We found that for Hive on MR, the HDFS file size for the table 
> (ContentSummary.getLength(), which should approximate the value of ‘totalSize’) 
> will be used to compare with the threshold 100M (smaller than 100M), while 
> for Hive on Spark 'rawDataSize' will be used to compare with the threshold 
> 100M (larger than 100M). That's why MapJoin is not enabled for Hive on Spark 
> in this case, and as a result Hive on Spark gets much lower performance than 
> Hive on MR.
> When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M), 
> MapJoin will be enabled for Hive on Spark mode as well, and Hive on Spark 
> then has performance similar to Hive on MR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to "none", allowing authentication without password

2015-03-16 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364424#comment-14364424
 ] 

Hive QA commented on HIVE-9934:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12704870/HIVE-9934.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 7769 tests executed
*Failed tests:*
{noformat}
TestCustomAuthentication - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3047/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3047/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3047/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12704870 - PreCommit-HIVE-TRUNK-Build

> Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to 
> degrade the authentication mechanism to "none", allowing authentication 
> without password
> --
>
> Key: HIVE-9934
> URL: https://issues.apache.org/jira/browse/HIVE-9934
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.1.0
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch
>
>
> Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to 
> degrade the authentication mechanism to "none", allowing authentication 
> without password.
> See: http://docs.oracle.com/javase/jndi/tutorial/ldap/security/simple.html
> “If you supply an empty string, an empty byte/char array, or null to the 
> Context.SECURITY_CREDENTIALS environment property, then the authentication 
> mechanism will be "none". This is because the LDAP requires the password to 
> be nonempty for simple authentication. The protocol automatically converts 
> the authentication to "none" if a password is not supplied.”
>  
> Since the LdapAuthenticationProviderImpl.Authenticate method is relying on a 
> NamingException being thrown during creation of initial context, it does not 
> fail when the context result is an “unauthenticated” positive response from 
> the LDAP server. The end result is, one can authenticate with HiveServer2 
> using the LdapAuthenticationProviderImpl with only a user name and an empty 
> password.
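>
> A minimal sketch of the kind of guard that closes this hole (hypothetical 
> code, not the actual patch): reject empty credentials up front, before JNDI 
> can silently downgrade the bind to "none".
> {code}
> import java.util.Hashtable;
> import javax.naming.Context;
> import javax.naming.NamingException;
> import javax.naming.directory.InitialDirContext;
>
> public class LdapAuthGuard {
>   public static void authenticate(String user, String password)
>       throws NamingException {
>     if (password == null || password.isEmpty()) {
>       // An empty password turns simple authentication into "none",
>       // so the bind would "succeed" without proving anything.
>       throw new NamingException("Empty passwords are not allowed");
>     }
>     Hashtable<String, Object> env = new Hashtable<>();
>     env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
>     env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389"); // placeholder
>     env.put(Context.SECURITY_AUTHENTICATION, "simple");
>     env.put(Context.SECURITY_PRINCIPAL, user);
>     env.put(Context.SECURITY_CREDENTIALS, password);
>     new InitialDirContext(env).close(); // throws on bad credentials
>   }
> }
> {code}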



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9934) Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to degrade the authentication mechanism to "none", allowing authentication without password

2015-03-16 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-9934:
---
Attachment: HIVE-9934.3.patch

There's a redundant field {{hiveServer2}} in the previous patch. This patch 
removes it - it shouldn't affect test results.

> Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to 
> degrade the authentication mechanism to "none", allowing authentication 
> without password
> --
>
> Key: HIVE-9934
> URL: https://issues.apache.org/jira/browse/HIVE-9934
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.1.0
>Reporter: Chao
>Assignee: Chao
> Attachments: HIVE-9934.1.patch, HIVE-9934.2.patch, HIVE-9934.3.patch
>
>
> Vulnerability in LdapAuthenticationProviderImpl enables HiveServer2 client to 
> degrade the authentication mechanism to "none", allowing authentication 
> without password.
> See: http://docs.oracle.com/javase/jndi/tutorial/ldap/security/simple.html
> “If you supply an empty string, an empty byte/char array, or null to the 
> Context.SECURITY_CREDENTIALS environment property, then the authentication 
> mechanism will be "none". This is because the LDAP requires the password to 
> be nonempty for simple authentication. The protocol automatically converts 
> the authentication to "none" if a password is not supplied.”
>  
> Since the LdapAuthenticationProviderImpl.Authenticate method is relying on a 
> NamingException being thrown during creation of initial context, it does not 
> fail when the context result is an “unauthenticated” positive response from 
> the LDAP server. The end result is, one can authenticate with HiveServer2 
> using the LdapAuthenticationProviderImpl with only a user name and an empty 
> password.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9819) Add timeout check inside the HMS server

2015-03-16 Thread Dong Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364463#comment-14364463
 ] 

Dong Chen commented on HIVE-9819:
-

The failed test is not related; I verified it in a local env and it passed.

I think this patch is ready for merging.

> Add timeout check inside the HMS server
> ---
>
> Key: HIVE-9819
> URL: https://issues.apache.org/jira/browse/HIVE-9819
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Dong Chen
>Assignee: Dong Chen
> Attachments: HIVE-9819.patch, HIVE-9819.patch, HIVE-9819.patch
>
>
> In HIVE-9253, a timeout check mechanism was added for long-running methods in 
> the HMS server. We should add this check to each of the inner loops inside the 
> HMS server.
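>
> As an illustrative sketch (names are made up; the real mechanism is the one 
> HIVE-9253 introduced), the point is to re-check the deadline inside each 
> inner loop rather than only at method entry:
> {code}
> import java.util.List;
> import java.util.concurrent.TimeoutException;
>
> public class TimeoutLoopSketch {
>   // Re-checking inside the loop bounds how long a runaway iteration set
>   // can hold an HMS worker thread past its deadline.
>   static void processPartitions(List<String> parts, long deadlineMillis)
>       throws TimeoutException {
>     for (String part : parts) {
>       if (System.currentTimeMillis() > deadlineMillis) {
>         throw new TimeoutException("Metastore method exceeded its deadline");
>       }
>       process(part); // stand-in for the real per-partition work
>     }
>   }
>
>   private static void process(String part) { /* ... */ }
> }
> {code}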



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9894) Use new parquet Types API builder to construct DATE data type

2015-03-16 Thread Dong Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364469#comment-14364469
 ] 

Dong Chen commented on HIVE-9894:
-

Thanks for your review!! [~Ferd], [~spena], [~brocknoland]

> Use new parquet Types API builder to construct DATE data type
> -
>
> Key: HIVE-9894
> URL: https://issues.apache.org/jira/browse/HIVE-9894
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Dong Chen
>Assignee: Dong Chen
> Attachments: HIVE-9894.patch
>
>
> The DATE type was implemented in HIVE-8119, and the new parquet Types API 
> builder was used in HIVE-9657 for all data types, but DATE was missed.
> We should also use the new Types API for the DATE type.
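>
> For reference, a sketch of the builder-style construction (package names as 
> in parquet-mr of that era; treat the details as illustrative):
> {code}
> import parquet.schema.OriginalType;
> import parquet.schema.PrimitiveType.PrimitiveTypeName;
> import parquet.schema.Type;
> import parquet.schema.Types;
>
> class DateTypeSketch {
>   // DATE is an annotated INT32 in Parquet; the Types builder expresses
>   // that directly instead of hand-constructing a PrimitiveType.
>   static Type dateColumn(String name) {
>     return Types.optional(PrimitiveTypeName.INT32)
>         .as(OriginalType.DATE)
>         .named(name);
>   }
> }
> {code}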



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0

2015-03-16 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-9957:
-
Labels: TODOC1.2  (was: )

> Hive 1.1.0 not compatible with Hadoop 2.4.0
> ---
>
> Key: HIVE-9957
> URL: https://issues.apache.org/jira/browse/HIVE-9957
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Vivek Shrivastava
>Assignee: Sergio Peña
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-9957.1.patch
>
>
> Getting this exception while accessing data through Hive. 
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider;
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.(Hadoop23Shims.java:1152)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9957) Hive 1.1.0 not compatible with Hadoop 2.4.0

2015-03-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364472#comment-14364472
 ] 

Lefty Leverenz commented on HIVE-9957:
--

Added a TODOC1.2 label.

Does this only apply to Hive 1.1.0?

> Hive 1.1.0 not compatible with Hadoop 2.4.0
> ---
>
> Key: HIVE-9957
> URL: https://issues.apache.org/jira/browse/HIVE-9957
> Project: Hive
>  Issue Type: Bug
>  Components: Encryption
>Reporter: Vivek Shrivastava
>Assignee: Sergio Peña
>  Labels: TODOC1.2
> Fix For: 1.2.0
>
> Attachments: HIVE-9957.1.patch
>
>
> Getting this exception while accessing data through Hive. 
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hdfs.DFSClient.getKeyProvider()Lorg/apache/hadoop/crypto/key/KeyProvider;
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.(Hadoop23Shims.java:1152)
> at 
> org.apache.hadoop.hive.shims.Hadoop23Shims.createHdfsEncryptionShim(Hadoop23Shims.java:1279)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.getHdfsEncryptionShim(SessionState.java:392)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1756)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1875)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1689)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1427)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:10132)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9977) Compactor not running on partitions after dynamic partitioned insert

2015-03-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-9977:
-
Attachment: HIVE-9977.patch

This patch adds a new thrift call to add partitions after they are moved by the 
MoveTask.  It is called by Hive.loadDynamicPartitions.  It also fixes the 
Initiator to not attempt to compact temp tables, or partitioned tables when no 
partitions are present.
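
In outline (hypothetical names, not the patch itself), the idea is:
{code}
import java.util.List;

// After the MoveTask finishes a dynamic partition load, report the concrete
// partitions written in this transaction, so the compactor no longer depends
// on per-partition locks to discover them.
interface CompactionTxnStore {
  void addDynamicPartitions(long txnId, String db, String table,
      List<String> partNames);
}

class DynamicPartitionLoadSketch {
  // Called from the equivalent of Hive.loadDynamicPartitions once the
  // moved files are in place.
  static void afterMove(CompactionTxnStore store, long txnId,
      String db, String table, List<String> movedPartitions) {
    store.addDynamicPartitions(txnId, db, table, movedPartitions);
  }
}
{code}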

> Compactor not running on partitions after dynamic partitioned insert
> 
>
> Key: HIVE-9977
> URL: https://issues.apache.org/jira/browse/HIVE-9977
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 0.14.0, 1.0.0, 1.1.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-9977.patch
>
>
> When an insert, update, or delete is done using dynamic partitioning the lock 
> is obtained on the table instead of on the individual partitions, since the 
> partitions are not known at lock acquisition time.  The compactor is using 
> the locks to determine which partitions to check to see if they need 
> to be compacted.  Since the individual partitions aren't locked, they aren't 
> checked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9984) JoinReorder's getOutputSize is exponential

2015-03-16 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364497#comment-14364497
 ] 

Gunther Hagleitner commented on HIVE-9984:
--

+1 your patch is better.

> JoinReorder's getOutputSize is exponential
> --
>
> Key: HIVE-9984
> URL: https://issues.apache.org/jira/browse/HIVE-9984
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-9984.1.patch, HIVE-9984.2.patch
>
>
> Found by [~mmokhtar]. Causes major issues in large plans (50+ joins). Simple 
> fix would be to memoize the recursion. There should also be a flag to switch 
> this opt off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9947) ScriptOperator replaceAll uses unescaped dot and result is not assigned

2015-03-16 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364525#comment-14364525
 ] 

Gunther Hagleitner commented on HIVE-9947:
--

+1

> ScriptOperator replaceAll uses unescaped dot and result is not assigned
> ---
>
> Key: HIVE-9947
> URL: https://issues.apache.org/jira/browse/HIVE-9947
> Project: Hive
>  Issue Type: Bug
>Reporter: Alexander Pivovarov
>Assignee: Alexander Pivovarov
>Priority: Minor
> Attachments: HIVE-9947.1.patch, HIVE-9947.2.patch, HIVE-9947.3.patch
>
>
> ScriptOperator line 155
> {code}
> //now
> b.replaceAll(".", "_");
> // should be
> b = b.replace('.', '_');
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]

2015-03-16 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364539#comment-14364539
 ] 

Xuefu Zhang commented on HIVE-9697:
---

[~lirui], I don't think we reached a conclusion on this. totalSize is closer to 
the file size, while rawDataSize is closer to the memory required. Using 
totalSize is more aggressive in taking the map join, but some file formats, 
such as ORC/Parquet, are very good at compression (10x is common). Thus, if 
the map-join decision is based on file size, the executor can run OOM. On the 
other hand, rawDataSize is more conservative as a memory estimate, which also 
gives less opportunity for map join.

I'm not sure which one is better for Hive on Spark. File size is what 
hive.auto.convert.join.noconditionaltask.size implies and what the user can 
see, while rawDataSize is closer to the memory required. However, once an OOM 
happens, the user gets no result at all, which is worse than a result that 
comes slower, right?

Any thoughts?
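
To make the trade-off concrete, an illustrative (non-Hive) sketch using the 
numbers from the issue description quoted below:
{code}
public class MapJoinDecisionDemo {
  public static void main(String[] args) {
    long threshold   = 100_000_000L; // hive.auto.convert.join.noconditionaltask.size
    long totalSize   = 1_748_955L;   // on-disk (compressed) size, ~1.5M
    long rawDataSize = 123_050_375L; // estimated in-memory size, ~120M

    // MR-style check against file size: the map join fires.
    System.out.println("totalSize   <= threshold: " + (totalSize <= threshold));
    // Spark-style check against raw data size: it does not.
    System.out.println("rawDataSize <= threshold: " + (rawDataSize <= threshold));
  }
}
{code}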

> Hive on Spark is not as aggressive as MR on map join [Spark Branch]
> ---
>
> Key: HIVE-9697
> URL: https://issues.apache.org/jira/browse/HIVE-9697
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>
> We have a finding from running some Big-Bench cases:
> when the same small table size threshold is used, the Map Join operator will 
> not be generated in the stage plans for Hive on Spark, while it will be 
> generated for Hive on MR.
> For example, When we run BigBench Q25, the meta info of one input ORC table 
> is as below:
> totalSize=1748955 (about 1.5M)
> rawDataSize=123050375 (about 120M)
> If we use the following parameter settings,
> set hive.auto.convert.join=true;
> set hive.mapjoin.smalltable.filesize=2500;
> set hive.auto.convert.join.noconditionaltask=true;
> set hive.auto.convert.join.noconditionaltask.size=100000000; (100M)
> Map Join will be enabled for Hive on MR mode, while it will not be enabled 
> for Hive on Spark.
> We found that for Hive on MR, the HDFS file size for the table 
> (ContentSummary.getLength(), which should approximate the value of ‘totalSize’) 
> will be used to compare with the threshold 100M (smaller than 100M), while 
> for Hive on Spark 'rawDataSize' will be used to compare with the threshold 
> 100M (larger than 100M). That's why MapJoin is not enabled for Hive on Spark 
> in this case, and as a result Hive on Spark gets much lower performance than 
> Hive on MR.
> When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M), 
> MapJoin will be enabled for Hive on Spark mode as well, and Hive on Spark 
> then has performance similar to Hive on MR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

