[jira] [Commented] (HIVE-7341) Support for Table replication across HCatalog instances

2014-08-19 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101912#comment-14101912
 ] 

Lefty Leverenz commented on HIVE-7341:
--

Thanks for the doc note, [~sushanth].  When you say "should mostly be covered 
by javadocs and the bug report", that leaves a little wiggle room for user docs, 
although I don't see a good place for this in the HCatalog wikidocs.  Would 
this only be done by external systems such as Falcon, or could it also be done 
directly by a Hive/HCat administrator?

 Support for Table replication across HCatalog instances
 ---

 Key: HIVE-7341
 URL: https://issues.apache.org/jira/browse/HIVE-7341
 Project: Hive
  Issue Type: New Feature
  Components: HCatalog
Affects Versions: 0.13.1
Reporter: Mithun Radhakrishnan
Assignee: Mithun Radhakrishnan
 Fix For: 0.14.0

 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch, 
 HIVE-7341.4.patch, HIVE-7341.5.patch


 The HCatClient currently doesn't provide much support for replicating 
 HCatTable definitions between two HCatalog Server (i.e. Hive metastore) 
 instances. 
 Systems similar to Apache Falcon might need to replicate partition 
 data between two clusters and keep the HCatalog metadata in sync between the 
 two. This poses a couple of problems:
 # The definition of the source table might change (in column schema, I/O 
 formats, record formats, SerDe parameters, etc.). The system will need a way 
 to diff two tables and update the target metastore with the changes. E.g. 
 {code}
 targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
 hcatClient.updateTableSchema(dbName, tableName, targetTable);
 {code}
 # The current {{HCatClient.addPartitions()}} API requires that the 
 partition's schema be derived from the table's schema, thereby requiring that 
 the table schema be resolved *before* partitions with the new schema are 
 added to the table. This is problematic because it introduces race 
 conditions when two partitions with differing column schemas (e.g. right after 
 a schema change) are copied in parallel. This can be avoided if each 
 HCatAddPartitionDesc keeps track of the partition's schema in flight.
 # The source and target metastores might be running different/incompatible 
 versions of Hive. 
 The impending patch attempts to address these concerns (with some caveats).
 # {{HCatTable}} now has 
 ## a {{diff()}} method, to compare against another HCatTable instance
 ## a {{resolve(diff)}} method to copy over specified table-attributes from 
 another HCatTable
 ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
 {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed 
 in other class-loaders may be used for comparison
 # {{HCatPartition}} now provides finer-grained control over a Partition's 
 column-schema, StorageDescriptor settings, etc. This allows partitions to be 
 copied completely from source, with the ability to override specific 
 properties if required (e.g. location).
 # {{HCatClient.updateTableSchema()}} can now update the entire 
 table-definition, not just the column schema.
 # I've cleaned up and removed most of the redundancy between HCatTable, 
 HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
 separate the table attributes from the add-table operation's attributes. By 
 providing fluent interfaces in HCatTable and composing an HCatTable instance 
 in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
 deprecated in favour of those in HCatTable; likewise for HCatPartition and 
 HCatAddPartitionDesc.
 I'll post a patch for trunk shortly.
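 For illustration, a minimal sketch of the replication flow these APIs enable 
 (client setup, configs, and dbName/tableName are assumed; this is a sketch, not 
 the committed patch itself):
 {code}
 HCatClient source = HCatClient.create(sourceHiveConf);
 HCatClient target = HCatClient.create(targetHiveConf);

 // Serialize on the source and deserialize on the target, so HCatTable
 // instances built in different class-loaders/versions can be compared.
 String wire = source.serializeTable(source.getTable(dbName, tableName));
 HCatTable sourceTable = target.deserializeTable(wire);

 HCatTable targetTable = target.getTable(dbName, tableName);
 targetTable.resolve(sourceTable, targetTable.diff(sourceTable));
 target.updateTableSchema(dbName, tableName, targetTable);
 {code}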



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7777) add CSV support for Serde

2014-08-19 Thread Ferdinand Xu (JIRA)
Ferdinand Xu created HIVE-7777:
--

 Summary: add CSV support for Serde
 Key: HIVE-7777
 URL: https://issues.apache.org/jira/browse/HIVE-7777
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Ferdinand Xu
Assignee: Ferdinand Xu


There is no official CSV SerDe support in Hive, although there is an open 
source project on GitHub (https://github.com/ogrodnek/csv-serde). CSV is a 
very frequently used data format.
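For illustration, a sketch of how that GitHub SerDe is wired in (the jar path is 
hypothetical and the SerDe class name is that project's, taken on trust here):
{code:sql}
-- Assumes the csv-serde jar from the GitHub project above is available locally.
ADD JAR /tmp/csv-serde-1.1.2.jar;

CREATE TABLE csv_test (col_a STRING, col_b STRING, col_c STRING)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
STORED AS TEXTFILE;
{code}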



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7778) hive deal with sql which has whitespace character

2014-08-19 Thread peter zhao (JIRA)
peter zhao created HIVE-7778:


 Summary: hive deal with sql which has whitespace character
 Key: HIVE-7778
 URL: https://issues.apache.org/jira/browse/HIVE-7778
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.1
Reporter: peter zhao
Priority: Minor



I run the SQL "set hive.exec.dynamic.partition.mode=nonstrict" through iBATIS. 
Because iBATIS uses an XML file to hold the SQL string, the string carries some 
formatting, so the Hive server receives the SQL as 
"\t set hive.exec.dynamic.partition.mode=nonstrict". In the 
org.apache.hive.service.cli.operation.HiveCommandOperation.run() method, \t is 
not handled well: the generated variable key is 
"set hive.exec.dynamic.partition.mode", while the right key would be 
"hive.exec.dynamic.partition.mode", so my next SELECT on a partition throws 
a strict-mode exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7778) hive deal with sql which has whitespace character

2014-08-19 Thread peter zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

peter zhao updated HIVE-7778:
-

Description: 
I run the SQL "set hive.exec.dynamic.partition.mode=nonstrict" through iBATIS. 
Because iBATIS uses an XML file to hold the SQL string, the string carries some 
formatting, so the Hive server receives the SQL as 
"\t set hive.exec.dynamic.partition.mode=nonstrict". In the 
org.apache.hive.service.cli.operation.HiveCommandOperation.run() method, \t (or 
any other whitespace character) is not handled well: the generated variable key 
is "set hive.exec.dynamic.partition.mode", while the right key would be 
"hive.exec.dynamic.partition.mode", so my next SELECT on a partition throws 
a strict-mode exception.

  String command = getStatement().trim();
  String[] tokens = statement.split("\\s"); // this could be changed to command.split("\\s");
  String commandArgs = command.substring(tokens[0].length()).trim();

  was:

I run the SQL "set hive.exec.dynamic.partition.mode=nonstrict" through iBATIS. 
Because iBATIS uses an XML file to hold the SQL string, the string carries some 
formatting, so the Hive server receives the SQL as 
"\t set hive.exec.dynamic.partition.mode=nonstrict". In the 
org.apache.hive.service.cli.operation.HiveCommandOperation.run() method, \t is 
not handled well: the generated variable key is 
"set hive.exec.dynamic.partition.mode", while the right key would be 
"hive.exec.dynamic.partition.mode", so my next SELECT on a partition throws 
a strict-mode exception.


 hive deal with sql which has whitespace character
 -

 Key: HIVE-7778
 URL: https://issues.apache.org/jira/browse/HIVE-7778
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.1
Reporter: peter zhao
Priority: Minor

 I run the SQL "set hive.exec.dynamic.partition.mode=nonstrict" through iBATIS. 
 Because iBATIS uses an XML file to hold the SQL string, the string carries some 
 formatting, so the Hive server receives the SQL as 
 "\t set hive.exec.dynamic.partition.mode=nonstrict". In the 
 org.apache.hive.service.cli.operation.HiveCommandOperation.run() method, \t (or 
 any other whitespace character) is not handled well: the generated variable key 
 is "set hive.exec.dynamic.partition.mode", while the right key would be 
 "hive.exec.dynamic.partition.mode", so my next SELECT on a partition throws 
 a strict-mode exception.
   String command = getStatement().trim();
   String[] tokens = statement.split("\\s"); // this could be changed to command.split("\\s");
   String commandArgs = command.substring(tokens[0].length()).trim();
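 A minimal sketch of the suggested fix (splitting the trimmed command rather than 
 the raw statement; the surrounding method context is assumed):
 {code}
 // Split the trimmed command, not the raw statement, so a leading tab or
 // space does not leave an empty tokens[0] and a key that still says "set ...".
 String command = getStatement().trim();
 String[] tokens = command.split("\\s+");
 String commandArgs = command.substring(tokens[0].length()).trim();
 {code}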



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7778) hive deal with sql which has whitespace character

2014-08-19 Thread peter zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

peter zhao updated HIVE-7778:
-

Description: 
I run the SQL "set hive.exec.dynamic.partition.mode=nonstrict" through iBATIS. 
Because iBATIS uses an XML file to hold the SQL string, the string carries some 
formatting, so the Hive server receives the SQL as 
"\t set hive.exec.dynamic.partition.mode=nonstrict". In the 
org.apache.hive.service.cli.operation.HiveCommandOperation.run() method, \t (or 
any other whitespace character) is not handled well: the generated variable key 
is "set hive.exec.dynamic.partition.mode", while the right key would be 
"hive.exec.dynamic.partition.mode", so my next SELECT on a partition throws 
a strict-mode exception.

  String command = getStatement().trim();
  String[] tokens = statement.split("\\s"); // this could be changed to command.split("\\s");
  String commandArgs = command.substring(tokens[0].length()).trim();

  was:
I run the SQL "set hive.exec.dynamic.partition.mode=nonstrict" through iBATIS. 
Because iBATIS uses an XML file to hold the SQL string, the string carries some 
formatting, so the Hive server receives the SQL as 
"\t set hive.exec.dynamic.partition.mode=nonstrict". In the 
org.apache.hive.service.cli.operation.HiveCommandOperation.run() method, \t (or 
any other whitespace character) is not handled well: the generated variable key 
is "set hive.exec.dynamic.partition.mode", while the right key would be 
"hive.exec.dynamic.partition.mode", so my next SELECT on a partition throws 
a strict-mode exception.

  String command = getStatement().trim();
  String[] tokens = statement.split("\\s"); // this could be changed to command.split("\\s");
  String commandArgs = command.substring(tokens[0].length()).trim();


 hive deal with sql which has whitespace character
 -

 Key: HIVE-7778
 URL: https://issues.apache.org/jira/browse/HIVE-7778
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.1
Reporter: peter zhao
Priority: Minor

 I run the SQL "set hive.exec.dynamic.partition.mode=nonstrict" through iBATIS. 
 Because iBATIS uses an XML file to hold the SQL string, the string carries some 
 formatting, so the Hive server receives the SQL as 
 "\t set hive.exec.dynamic.partition.mode=nonstrict". In the 
 org.apache.hive.service.cli.operation.HiveCommandOperation.run() method, \t (or 
 any other whitespace character) is not handled well: the generated variable key 
 is "set hive.exec.dynamic.partition.mode", while the right key would be 
 "hive.exec.dynamic.partition.mode", so my next SELECT on a partition throws 
 a strict-mode exception.
   String command = getStatement().trim();
   String[] tokens = statement.split("\\s"); // this could be changed to command.split("\\s");
   String commandArgs = command.substring(tokens[0].length()).trim();



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7513) Add ROW__ID VirtualColumn

2014-08-19 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101926#comment-14101926
 ] 

Lefty Leverenz commented on HIVE-7513:
--

Is this just behind-the-scenes or does it need some user doc?

 Add ROW__ID VirtualColumn
 -

 Key: HIVE-7513
 URL: https://issues.apache.org/jira/browse/HIVE-7513
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.14.0

 Attachments: HIVE-7513.10.patch, HIVE-7513.11.patch, 
 HIVE-7513.12.patch, HIVE-7513.13.patch, HIVE-7513.14.patch, 
 HIVE-7513.3.patch, HIVE-7513.4.patch, HIVE-7513.5.patch, HIVE-7513.8.patch, 
 HIVE-7513.9.patch, HIVE-7513.codeOnly.txt


 In order to support Update/Delete we need to read rowId from AcidInputFormat 
 and pass that along through the operator pipeline (built from the WHERE 
 clause of the SQL Statement) so that it can be written to the delta file by 
 the update/delete (sink) operators.
 The parser will add this column to the projection list to make sure it's 
 passed along.
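 For illustration, a hedged sketch of what the new virtual column enables at the 
 SQL level (table name hypothetical; ROW__ID is only meaningful on ACID tables):
 {code:sql}
 -- ROW__ID is a virtual column, so it can be projected explicitly:
 SELECT ROW__ID, key, value FROM acid_tbl;
 {code}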



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6329) Support column level encryption/decryption

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101937#comment-14101937
 ] 

Hive QA commented on HIVE-6329:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662662/HIVE-6329.10.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5821 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/395/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/395/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-395/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662662

 Support column level encryption/decryption
 --

 Key: HIVE-6329
 URL: https://issues.apache.org/jira/browse/HIVE-6329
 Project: Hive
  Issue Type: New Feature
  Components: Security, Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6329.1.patch.txt, HIVE-6329.10.patch.txt, 
 HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, 
 HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, 
 HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt


 We have been receiving some requirements on encryption recently, but Hive does 
 not support it. Before the full implementation via HIVE-5207, this might be 
 useful for some cases.
 {noformat}
 hive> create table encode_test(id int, name STRING, phone STRING, address 
 STRING) 
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
  WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') 
 STORED AS TEXTFILE;
 OK
 Time taken: 0.584 seconds
 hive> insert into table encode_test select 
 100,'navis','010-0000-0000','Seoul, Seocho' from src tablesample (1 rows);
 ..
 OK
 Time taken: 5.121 seconds
 hive> select * from encode_test;
 OK
 100   navis MDEwLTAwMDAtMDAwMA==  U2VvdWwsIFNlb2Nobw==
 Time taken: 0.078 seconds, Fetched: 1 row(s)
 hive> 
 {noformat}
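 As a side note, the encoded output above is plain Base64; decoding it recovers 
 the inserted value (standard Java 8, runnable as-is):
 {code}
 import java.util.Base64;

 public class DecodeCheck {
   public static void main(String[] args) {
     // The stored phone column is just the Base64 form of the inserted text.
     String stored = "MDEwLTAwMDAtMDAwMA==";
     System.out.println(new String(Base64.getDecoder().decode(stored)));
     // prints: 010-0000-0000
   }
 }
 {code}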



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7728) Enable q-tests for TABLESAMPLE feature [Spark Branch]

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101940#comment-14101940
 ] 

Hive QA commented on HIVE-7728:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662698/HIVE-7728.1-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5927 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/60/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/60/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-60/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662698

 Enable q-tests for TABLESAMPLE feature  [Spark Branch]
 --

 Key: HIVE-7728
 URL: https://issues.apache.org/jira/browse/HIVE-7728
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7728.1-spark.patch


 Enable q-tests for the TABLESAMPLE feature since the automated test 
 environment is ready.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7779) Support windowing and analytic functions.[Spark Branch]

2014-08-19 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7779:


Issue Type: Sub-task  (was: Task)
Parent: HIVE-7292

 Support windowing and analytic functions.[Spark Branch]
 ---

 Key: HIVE-7779
 URL: https://issues.apache.org/jira/browse/HIVE-7779
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li

 Verify the functionality and fix any issues found. This should cover:
 # windowing functions
 # the OVER clause
 # analytic functions



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7779) Support windowing and analytic functions.[Spark Branch]

2014-08-19 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-7779:
---

 Summary: Support windowing and analytic functions.[Spark Branch]
 Key: HIVE-7779
 URL: https://issues.apache.org/jira/browse/HIVE-7779
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li


Verify the functionality and fix any issues found. This should cover:
# windowing functions
# the OVER clause
# analytic functions



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7780) Query with OVER clause returns duplicate results[Spark Branch]

2014-08-19 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-7780:
---

 Summary: Query with OVER clause returns duplicate results[Spark Branch]
 Key: HIVE-7780
 URL: https://issues.apache.org/jira/browse/HIVE-7780
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li


A simple query with the OVER clause returns duplicate results.
{code:sql}
hive> select address, count(id) over(partition by address) from test;
Query ID = root_2014081915_f5506fcc-4950-424b-a134-56fc5b06d6eb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
OK
QD  1
SH  2
SH  2
SZ  2
SZ  2
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2

2014-08-19 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101968#comment-14101968
 ] 

Lars Francke commented on HIVE-5799:


Thanks for getting to this. It's needed badly!

The patch looks mostly good; I have a couple of minor comments regarding 
style/checkstyle. If you're interested in them, could you please update RB?

 session/operation timeout for hiveserver2
 -

 Key: HIVE-5799
 URL: https://issues.apache.org/jira/browse/HIVE-5799
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, 
 HIVE-5799.11.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, 
 HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, 
 HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt


 We need some timeout facility to prevent resource leakage from unstable or 
 bad clients.
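 For context, a sketch of how such timeouts would be switched on once this lands 
 (property names as they later appear in HiveConf; treat the names and values as 
 assumptions, not the committed configuration):
 {noformat}
 -- values in milliseconds
 set hive.server2.idle.session.timeout=3600000;
 set hive.server2.idle.operation.timeout=3600000;
 {noformat}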



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6329) Support column level encryption/decryption

2014-08-19 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101969#comment-14101969
 ] 

Navis commented on HIVE-6329:
-

Cannot reproduce the failure of testCliDriver_hbase_queries, in either 
hadoop-1 or hadoop-2.

 Support column level encryption/decryption
 --

 Key: HIVE-6329
 URL: https://issues.apache.org/jira/browse/HIVE-6329
 Project: Hive
  Issue Type: New Feature
  Components: Security, Serializers/Deserializers
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-6329.1.patch.txt, HIVE-6329.10.patch.txt, 
 HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, 
 HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, 
 HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt


 We have been receiving some requirements on encryption recently, but Hive does 
 not support it. Before the full implementation via HIVE-5207, this might be 
 useful for some cases.
 {noformat}
 hive> create table encode_test(id int, name STRING, phone STRING, address 
 STRING) 
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
  WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 
 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') 
 STORED AS TEXTFILE;
 OK
 Time taken: 0.584 seconds
 hive> insert into table encode_test select 
 100,'navis','010-0000-0000','Seoul, Seocho' from src tablesample (1 rows);
 ..
 OK
 Time taken: 5.121 seconds
 hive> select * from encode_test;
 OK
 100   navis MDEwLTAwMDAtMDAwMA==  U2VvdWwsIFNlb2Nobw==
 Time taken: 0.078 seconds, Fetched: 1 row(s)
 hive> 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7733) Ambiguous column reference error on query

2014-08-19 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7733:


Status: Patch Available  (was: Open)

 Ambiguous column reference error on query
 -

 Key: HIVE-7733
 URL: https://issues.apache.org/jira/browse/HIVE-7733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
 Attachments: HIVE-7733.1.patch.txt


 {noformat}
 CREATE TABLE agg1 
   ( 
  col0 INT, 
  col1 STRING, 
  col2 DOUBLE 
   ); 
 explain SELECT single_use_subq11.a1 AS a1, 
single_use_subq11.a2 AS a2 
 FROM   (SELECT Sum(agg1.col2) AS a1 
 FROM   agg1 
 GROUP  BY agg1.col0) single_use_subq12 
JOIN (SELECT alias.a2 AS a0, 
 alias.a1 AS a1, 
 alias.a1 AS a2 
  FROM   (SELECT agg1.col1 AS a0, 
 '42'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1 
  UNION ALL 
  SELECT agg1.col1 AS a0, 
 '41'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1) alias 
  GROUP  BY alias.a2, 
alias.a1) single_use_subq11 
  ON ( single_use_subq11.a0 = single_use_subq11.a0 );
 {noformat}
 Gets the following error:
 FAILED: SemanticException [Error 10007]: Ambiguous column reference a2
 Looks like this query had been working in 0.12 but started failing with this 
 error in 0.13.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7733) Ambiguous column reference error on query

2014-08-19 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7733:


Attachment: HIVE-7733.1.patch.txt

 Ambiguous column reference error on query
 -

 Key: HIVE-7733
 URL: https://issues.apache.org/jira/browse/HIVE-7733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
 Attachments: HIVE-7733.1.patch.txt


 {noformat}
 CREATE TABLE agg1 
   ( 
  col0 INT, 
  col1 STRING, 
  col2 DOUBLE 
   ); 
 explain SELECT single_use_subq11.a1 AS a1, 
single_use_subq11.a2 AS a2 
 FROM   (SELECT Sum(agg1.col2) AS a1 
 FROM   agg1 
 GROUP  BY agg1.col0) single_use_subq12 
JOIN (SELECT alias.a2 AS a0, 
 alias.a1 AS a1, 
 alias.a1 AS a2 
  FROM   (SELECT agg1.col1 AS a0, 
 '42'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1 
  UNION ALL 
  SELECT agg1.col1 AS a0, 
 '41'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1) alias 
  GROUP  BY alias.a2, 
alias.a1) single_use_subq11 
  ON ( single_use_subq11.a0 = single_use_subq11.a0 );
 {noformat}
 Gets the following error:
 FAILED: SemanticException [Error 10007]: Ambiguous column reference a2
 Looks like this query had been working in 0.12 but started failing with this 
 error in 0.13.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7771) ORC PPD fails for some decimal predicates

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101990#comment-14101990
 ] 

Hive QA commented on HIVE-7771:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662665/HIVE-7771.1.patch

{color:green}SUCCESS:{color} +1 5819 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/396/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/396/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-396/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662665

 ORC PPD fails for some decimal predicates
 -

 Key: HIVE-7771
 URL: https://issues.apache.org/jira/browse/HIVE-7771
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7771.1.patch


 Some queries like 
 {code}
 select * from table where dcol=11.22BD;
 {code}
 fail when ORC predicate pushdown is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7780) Query with OVER clause returns duplicate results[Spark Branch]

2014-08-19 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li resolved HIVE-7780.
-

Resolution: Not a Problem

 Query with OVER clause returns duplicate results[Spark Branch]
 -

 Key: HIVE-7780
 URL: https://issues.apache.org/jira/browse/HIVE-7780
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li

 A simple query with the OVER clause returns duplicate results.
 {code:sql}
 hive> select address, count(id) over(partition by address) from test;
 Query ID = root_2014081915_f5506fcc-4950-424b-a134-56fc5b06d6eb
 Total jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks determined at compile time: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapreduce.job.reduces=<number>
 OK
 QD  1
 SH  2
 SH  2
 SZ  2
 SZ  2
 {code}
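 For reference, the Not a Problem resolution matches standard windowing 
 semantics: a window function returns one output row per input row. A 
 contrasting sketch (table name from the report above):
 {code:sql}
 -- Window function: one row per input row, hence the repeated SH/SZ rows above.
 select address, count(id) over (partition by address) from test;

 -- Grouped aggregate: one row per address.
 select address, count(id) from test group by address;
 {code}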



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]

2014-08-19 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-7781:
---

 Summary: Enable windowing and analytic function qtests.[Spark 
Branch]
 Key: HIVE-7781
 URL: https://issues.apache.org/jira/browse/HIVE-7781
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]

2014-08-19 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7781:


Issue Type: Sub-task  (was: Task)
Parent: HIVE-7292

 Enable windowing and analytic function qtests.[Spark Branch]
 

 Key: HIVE-7781
 URL: https://issues.apache.org/jira/browse/HIVE-7781
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102040#comment-14102040
 ] 

Hive QA commented on HIVE-5799:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662668/HIVE-5799.11.patch.txt

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5820 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
org.apache.hive.jdbc.miniHS2.TestHiveServer2SessionTimeout.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/397/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/397/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-397/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662668

 session/operation timeout for hiveserver2
 -

 Key: HIVE-5799
 URL: https://issues.apache.org/jira/browse/HIVE-5799
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, 
 HIVE-5799.11.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, 
 HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, 
 HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt


 We need some timeout facility to prevent resource leakage from unstable or 
 bad clients.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]

2014-08-19 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7781:


Status: Patch Available  (was: Open)

 Enable windowing and analytic function qtests.[Spark Branch]
 

 Key: HIVE-7781
 URL: https://issues.apache.org/jira/browse/HIVE-7781
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7781.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]

2014-08-19 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7781:


Attachment: HIVE-7781.1-spark.patch

Skipped ptf.q and ptf_streaming.q, as they depend on join operations.

 Enable windowing and analytic function qtests.[Spark Branch]
 

 Key: HIVE-7781
 URL: https://issues.apache.org/jira/browse/HIVE-7781
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7781.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7779) Support windowing and analytic functions.[Spark Branch]

2014-08-19 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li resolved HIVE-7779.
-

Resolution: Fixed

Verified through qtests and in a real test environment; no issues found.

 Support windowing and analytic functions.[Spark Branch]
 ---

 Key: HIVE-7779
 URL: https://issues.apache.org/jira/browse/HIVE-7779
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li

 Verify the functionality and fix any issues found. This should cover:
 # windowing functions
 # the OVER clause
 # analytic functions



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU

2014-08-19 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7664:


Status: Patch Available  (was: Open)

 VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized 
 execution and takes 25% CPU
 -

 Key: HIVE-7664
 URL: https://issues.apache.org/jira/browse/HIVE-7664
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-7664.1.patch.txt


 In a GROUP BY heavy vectorized Reducer vertex, 25% of CPU is spent in 
 VectorizedBatchUtil.addRowToBatchFrom().
 Looking at the code of VectorizedBatchUtil.addRowToBatchFrom, it appears it 
 wasn't optimized for vectorized processing.
 addRowToBatchFrom is called for every row, and for each row and every column 
 in the batch getPrimitiveCategory is called to figure out the type of each 
 column. Column types are stored in a HashMap; for VectorGroupByOperator, 
 column types won't change between batches, so they shouldn't be looked up 
 for every row.
 I recommend storing the column type in StructObjectInspector so that other 
 components can leverage this optimization.
 addRowToBatchFrom also has a case statement for every row and every column, 
 used for type casting; I recommend encapsulating the type logic in templatized 
 methods (a sketch follows the query below). 
 {code}
 Stack Trace   Sample CountPercentage(%)
 VectorizedBatchUtil.addRowToBatchFrom 86  26.543
AbstractPrimitiveObjectInspector.getPrimitiveCategory()34  10.494
LazyBinaryStructObjectInspector.getStructFieldData 25  7.716
StandardStructObjectInspector.getStructFieldData   4   1.235
 {code}
 The query used : 
 {code}
 select 
 ss_sold_date_sk
 from
 store_sales
 where
 ss_sold_date between '1998-01-01' and '1998-06-01'
 group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
 having sum(ss_list_price) > 50;
 {code}
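 A sketch of the per-batch caching idea described above (identifier names are 
 assumed, not the committed patch):
 {code}
 // Resolve each column's PrimitiveCategory once per batch rather than once per
 // row; types cannot change within a batch, so the per-row lookup disappears.
 PrimitiveCategory[] categories = new PrimitiveCategory[fieldRefs.size()];
 for (int c = 0; c < fieldRefs.size(); c++) {
   categories[c] = ((PrimitiveObjectInspector)
       fieldRefs.get(c).getFieldObjectInspector()).getPrimitiveCategory();
 }
 for (int r = 0; r < rows.size(); r++) {
   for (int c = 0; c < categories.length; c++) {
     switch (categories[c]) {
       // ... type-specific write into the column vector, as before ...
     }
   }
 }
 {code}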



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU

2014-08-19 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7664:


Attachment: HIVE-7664.1.patch.txt

Preliminary test

 VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized 
 execution and takes 25% CPU
 -

 Key: HIVE-7664
 URL: https://issues.apache.org/jira/browse/HIVE-7664
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-7664.1.patch.txt


 In a GROUP BY heavy vectorized Reducer vertex, 25% of CPU is spent in 
 VectorizedBatchUtil.addRowToBatchFrom().
 Looking at the code of VectorizedBatchUtil.addRowToBatchFrom, it appears it 
 wasn't optimized for vectorized processing.
 addRowToBatchFrom is called for every row, and for each row and every column 
 in the batch getPrimitiveCategory is called to figure out the type of each 
 column. Column types are stored in a HashMap; for VectorGroupByOperator, 
 column types won't change between batches, so they shouldn't be looked up 
 for every row.
 I recommend storing the column type in StructObjectInspector so that other 
 components can leverage this optimization.
 addRowToBatchFrom also has a case statement for every row and every column, 
 used for type casting; I recommend encapsulating the type logic in templatized 
 methods. 
 {code}
 Stack Trace   Sample CountPercentage(%)
 VectorizedBatchUtil.addRowToBatchFrom 86  26.543
AbstractPrimitiveObjectInspector.getPrimitiveCategory()34  10.494
LazyBinaryStructObjectInspector.getStructFieldData 25  7.716
StandardStructObjectInspector.getStructFieldData   4   1.235
 {code}
 The query used : 
 {code}
 select 
 ss_sold_date_sk
 from
 store_sales
 where
 ss_sold_date between '1998-01-01' and '1998-06-01'
 group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
 having sum(ss_list_price) > 50;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7774) Issues with location path for temporary external tables

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102089#comment-14102089
 ] 

Hive QA commented on HIVE-7774:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662678/HIVE-7774.1.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5819 tests executed
*Failed tests:*
{noformat}
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/398/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/398/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-398/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662678

 Issues with location path for temporary external tables
 ---

 Key: HIVE-7774
 URL: https://issues.apache.org/jira/browse/HIVE-7774
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7774.1.patch


 Depending on the location string passed to a temporary external table, a query 
 requiring a map/reduce job will fail.  Example:
 {noformat}
 create temporary external table tmp1 (c1 string) location '/tmp/tmp1';
 describe extended tmp1;
 select count(*) from tmp1;
 {noformat}
 Will result in the following error:
 {noformat}
 Diagnostic Messages for this Task:
 Error: java.lang.RuntimeException: Error in configuring object
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
   at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
   ... 9 more
 Caused by: java.lang.RuntimeException: Error in configuring object
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
   at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
   at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
   ... 14 more
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
   ... 17 more
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154)
   ... 22 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:123)
   ... 22 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration 
 and input path are inconsistent
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:398)
   ... 23 more
 FAILED: Execution 

[jira] [Commented] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102093#comment-14102093
 ] 

Hive QA commented on HIVE-7781:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662728/HIVE-7781.1-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5925 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/61/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/61/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-61/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662728

 Enable windowing and analytic function qtests.[Spark Branch]
 

 Key: HIVE-7781
 URL: https://issues.apache.org/jira/browse/HIVE-7781
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7781.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


Error with Meta Data from SELECT 0 as mx

2014-08-19 Thread Damien Carol

Hi,

It seems there is a bug in the trunk version.

When I run this query:

create table mag_new as
SELECT row_sequence() + tbl.mx, nommagasin, idsociete, idregion, 
codepostal from magasin, (select 0 as mx) as tbl;


The metastore complains about a missing table: _dummy_database._dummy_table


Complete log :

2014-08-19 12:05:02,673 ERROR [pool-5-thread-3]: stats.StatsUtils 
(StatsUtils.java:getTableColumnStats(474)) - Failed to retrieve table 
statistics:
org.apache.hadoop.hive.ql.metadata.HiveException: 
NoSuchObjectException(message:Specified database/table does not exist : 
_dummy_database._dummy_table)
at 
org.apache.hadoop.hive.ql.metadata.Hive.getTableColumnStatistics(Hive.java:2563)
at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:470)
at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:147)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:100)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
at 
org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
at 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:149)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9484)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208)

at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:413)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
at 
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1003)
at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:997)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:99)
at 
org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:170)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:306)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:293)

at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)

at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:508)
at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)

at com.sun.proxy.$Proxy21.executeStatementAsync(Unknown Source)
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:259)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:346)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at 
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)

at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:744)

--

Damien CAROL

 * tél : +33 (0)4 74 96 88 14
 * fax : +33 (0)4 74 96 31 88
 * email : dca...@blitzbs.com

BLITZ BUSINESS SERVICE



[jira] [Commented] (HIVE-7646) Modify parser to support new grammar for Insert,Update,Delete

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102116#comment-14102116
 ] 

Hive QA commented on HIVE-7646:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662702/HIVE-7646.1.patch

{color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 5831 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_17
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/399/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/399/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-399/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 25 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662702

 Modify parser to support new grammar for Insert,Update,Delete
 -

 Key: HIVE-7646
 URL: https://issues.apache.org/jira/browse/HIVE-7646
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-7646.1.patch, HIVE-7646.patch


 Need the parser to recognize constructs such as:
 {code:sql}
 INSERT INTO Cust (Customer_Number, Balance, Address)
 VALUES (101, 50.00, '123 Main Street'), (102, 75.00, '123 Pine Ave');
 {code}
 {code:sql}
 DELETE FROM Cust WHERE Balance > 5.0
 {code}
 {code:sql}
 UPDATE Cust
 SET column1=value1,column2=value2,...
 WHERE some_column=some_value
 {code}
 also useful
 {code:sql}
 select a,b from values((1,2),(3,4)) as FOO(a,b)
 {code}
 This makes writing tests easier.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7773) Union all query finished with errors [Spark Branch]

2014-08-19 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-7773:
-

Attachment: HIVE-7773.spark.patch

I found that the problem is that IOContext is used to store and retrieve the 
input path for the operators. IOContext is a singleton when I submit the query 
via the Hive CLI. Since Spark tasks are threads within a JVM, the input path in 
IOContext will get messed up if concurrent tasks have different input paths. In 
my test case, two map works run concurrently for two different tables.
This patch makes sure we always use a thread-local IOContext.
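A minimal sketch of the thread-local pattern described above (field and accessor 
names are assumed, not the attached patch itself):
{code}
// Each Spark task thread gets its own IOContext, so concurrent tasks with
// different input paths no longer clobber a shared singleton.
private static final ThreadLocal<IOContext> threadLocal =
    new ThreadLocal<IOContext>() {
      @Override
      protected IOContext initialValue() {
        return new IOContext();
      }
    };

public static IOContext get() {
  return threadLocal.get();
}
{code}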

 Union all query finished with errors [Spark Branch]
 ---

 Key: HIVE-7773
 URL: https://issues.apache.org/jira/browse/HIVE-7773
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Priority: Critical
 Attachments: HIVE-7773.spark.patch


 When I ran a union all query, I found the following error in the Spark log (the 
 query finished with correct results, though):
 {noformat}
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 {noformat}
 Judging from the log, I think we don't properly handle the input paths when 
 cloning the job conf, so it may also affect other queries with multiple maps 
 or reduces.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102153#comment-14102153
 ] 

Hive QA commented on HIVE-7733:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662715/HIVE-7733.1.patch.txt

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5819 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_ambiguous_col
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/400/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/400/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-400/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662715

 Ambiguous column reference error on query
 -

 Key: HIVE-7733
 URL: https://issues.apache.org/jira/browse/HIVE-7733
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Jason Dere
 Attachments: HIVE-7733.1.patch.txt


 {noformat}
 CREATE TABLE agg1 
   ( 
  col0 INT, 
  col1 STRING, 
  col2 DOUBLE 
   ); 
 explain SELECT single_use_subq11.a1 AS a1, 
single_use_subq11.a2 AS a2 
 FROM   (SELECT Sum(agg1.col2) AS a1 
 FROM   agg1 
 GROUP  BY agg1.col0) single_use_subq12 
JOIN (SELECT alias.a2 AS a0, 
 alias.a1 AS a1, 
 alias.a1 AS a2 
  FROM   (SELECT agg1.col1 AS a0, 
 '42'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1 
  UNION ALL 
  SELECT agg1.col1 AS a0, 
 '41'  AS a1, 
 agg1.col0 AS a2 
  FROM   agg1) alias 
  GROUP  BY alias.a2, 
alias.a1) single_use_subq11 
  ON ( single_use_subq11.a0 = single_use_subq11.a0 );
 {noformat}
 Gets the following error:
 FAILED: SemanticException [Error 10007]: Ambiguous column reference a2
 Looks like this query had been working in 0.12 but started failing with this 
 error in 0.13.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102188#comment-14102188
 ] 

Hive QA commented on HIVE-7664:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662732/HIVE-7664.1.patch.txt

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5819 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/401/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/401/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-401/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662732

 VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized 
 execution and takes 25% CPU
 -

 Key: HIVE-7664
 URL: https://issues.apache.org/jira/browse/HIVE-7664
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-7664.1.patch.txt


 In a Group by heavy vectorized Reducer vertex 25% of CPU is spent in 
 VectorizedBatchUtil.addRowToBatchFrom().
 I looked at the code of VectorizedBatchUtil.addRowToBatchFrom, and it doesn't 
 appear to be optimized for vectorized processing.
 addRowToBatchFrom is called for every row, and for each row and every column 
 in the batch, getPrimitiveCategory is called to figure out the type of each 
 column; the column types are stored in a HashMap. For VectorGroupByOperator, 
 column types won't change between batches, so they shouldn't be looked up for 
 every row.
 I recommend storing the column type in StructObjectInspector so that other 
 components can leverage this optimization.
 addRowToBatchFrom also has a case statement for every row and every column, 
 used for type casting; I recommend encapsulating the type logic in 
 templatized methods.
 {code}
 Stack Trace   Sample CountPercentage(%)
 VectorizedBatchUtil.addRowToBatchFrom 86  26.543
AbstractPrimitiveObjectInspector.getPrimitiveCategory()34  10.494
LazyBinaryStructObjectInspector.getStructFieldData 25  7.716
StandardStructObjectInspector.getStructFieldData   4   1.235
 {code}
 The query used : 
 {code}
 select 
 ss_sold_date_sk
 from
 store_sales
 where
 ss_sold_date between '1998-01-01' and '1998-06-01'
 group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
 having sum(ss_list_price) > 50;
 {code}
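 
 A minimal sketch of the caching idea (names are hypothetical, not the 
 attached patch): resolve each column's primitive category once from the 
 object inspector, then reuse the cached array for every row in every batch. 
 It assumes all columns are primitive, as in the profiled query.
 {code}
 import java.util.List;
 
 import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
 import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
 import org.apache.hadoop.hive.serde2.objectinspector.StructField;
 import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
 
 // Hypothetical sketch: cache each column's primitive category once, instead
 // of calling getPrimitiveCategory() for every row and every column.
 class ColumnTypeCache {
   static PrimitiveCategory[] cacheColumnTypes(StructObjectInspector oi) {
     List<? extends StructField> fields = oi.getAllStructFieldRefs();
     PrimitiveCategory[] types = new PrimitiveCategory[fields.size()];
     for (int i = 0; i < fields.size(); i++) {
       PrimitiveObjectInspector poi =
           (PrimitiveObjectInspector) fields.get(i).getFieldObjectInspector();
       types[i] = poi.getPrimitiveCategory(); // looked up once, not per row
     }
     return types;
   }
 }
 {code}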



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU

2014-08-19 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102204#comment-14102204
 ] 

Remus Rusanu commented on HIVE-7664:


Shouldn't there be a case for the DECIMAL primitive category? I see a 
DecimalAccessor, but no case covering it in the BatchAccessor ctor.

 VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized 
 execution and takes 25% CPU
 -

 Key: HIVE-7664
 URL: https://issues.apache.org/jira/browse/HIVE-7664
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-7664.1.patch.txt


 In a Group by heavy vectorized Reducer vertex 25% of CPU is spent in 
 VectorizedBatchUtil.addRowToBatchFrom().
 I looked at the code of VectorizedBatchUtil.addRowToBatchFrom, and it doesn't 
 appear to be optimized for vectorized processing.
 addRowToBatchFrom is called for every row, and for each row and every column 
 in the batch, getPrimitiveCategory is called to figure out the type of each 
 column; the column types are stored in a HashMap. For VectorGroupByOperator, 
 column types won't change between batches, so they shouldn't be looked up for 
 every row.
 I recommend storing the column type in StructObjectInspector so that other 
 components can leverage this optimization.
 addRowToBatchFrom also has a case statement for every row and every column, 
 used for type casting; I recommend encapsulating the type logic in 
 templatized methods.
 {code}
 Stack Trace   Sample CountPercentage(%)
 VectorizedBatchUtil.addRowToBatchFrom 86  26.543
AbstractPrimitiveObjectInspector.getPrimitiveCategory()34  10.494
LazyBinaryStructObjectInspector.getStructFieldData 25  7.716
StandardStructObjectInspector.getStructFieldData   4   1.235
 {code}
 The query used : 
 {code}
 select 
 ss_sold_date_sk
 from
 store_sales
 where
 ss_sold_date between '1998-01-01' and '1998-06-01'
 group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
 having sum(ss_list_price) > 50;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7762) Enhancement while getting partitions via webhcat client

2014-08-19 Thread Suhas Vasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suhas Vasu updated HIVE-7762:
-

Status: Patch Available  (was: Open)

 Enhancement while getting partitions via webhcat client
 ---

 Key: HIVE-7762
 URL: https://issues.apache.org/jira/browse/HIVE-7762
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Reporter: Suhas Vasu
Priority: Minor
 Attachments: HIVE-7762.2.patch, HIVE-7762.patch


 HCatalog creates partitions in lower case, whereas getting partitions from 
 HCatalog via the WebHCat client doesn't handle this, so the client starts 
 throwing exceptions.
 Ex:
 CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year 
 STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS 
 TEXTFILE LOCATION '/user/suhas/hcat-data/in/';
 Then I try to get partitions with:
 {noformat}
 String inputTableName = "in_table";
 String database = "default";
 Map<String, String> partitionSpec = new HashMap<String, String>();
 partitionSpec.put("Year", "2014");
 partitionSpec.put("Month", "08");
 partitionSpec.put("Date", "11");
 partitionSpec.put("Hour", "00");
 partitionSpec.put("Minute", "00");
 HCatClient client = get(catalogUrl);
 HCatPartition hCatPartition = client.getPartition(database, 
 inputTableName, partitionSpec);
 {noformat}
 This fails with:
 {noformat}
 Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : 
 Exception occurred while processing HCat request : Invalid partition-key 
 specified: year
   at 
 org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366)
   at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
 {noformat}
 The same code works if I do
 {noformat}
 partitionSpec.put("year", "2014");
 partitionSpec.put("month", "08");
 partitionSpec.put("date", "11");
 partitionSpec.put("hour", "00");
 partitionSpec.put("minute", "00");
 {noformat}
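 
 A minimal client-side workaround sketch, assuming the fix is to normalize the 
 caller's keys (the attached patch may instead handle this inside the client 
 or server):
 {code}
 import java.util.HashMap;
 import java.util.Map;
 
 // Hypothetical workaround: lower-case the partition-spec keys before calling
 // HCatClient.getPartition(), matching how HCatalog stores partition keys.
 class PartitionSpecs {
   static Map<String, String> lowerCaseKeys(Map<String, String> spec) {
     Map<String, String> normalized = new HashMap<String, String>();
     for (Map.Entry<String, String> e : spec.entrySet()) {
       normalized.put(e.getKey().toLowerCase(), e.getValue());
     }
     return normalized;
   }
 }
 {code}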



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7762) Enhancement while getting partitions via webhcat client

2014-08-19 Thread Suhas Vasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suhas Vasu updated HIVE-7762:
-

Attachment: HIVE-7762.2.patch

Rebasing the patch

 Enhancement while getting partitions via webhcat client
 ---

 Key: HIVE-7762
 URL: https://issues.apache.org/jira/browse/HIVE-7762
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Reporter: Suhas Vasu
Priority: Minor
 Attachments: HIVE-7762.2.patch, HIVE-7762.patch


 HCatalog creates partitions in lower case, whereas getting partitions from 
 HCatalog via the WebHCat client doesn't handle this, so the client starts 
 throwing exceptions.
 Ex:
 CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year 
 STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS 
 TEXTFILE LOCATION '/user/suhas/hcat-data/in/';
 Then I try to get partitions with:
 {noformat}
 String inputTableName = "in_table";
 String database = "default";
 Map<String, String> partitionSpec = new HashMap<String, String>();
 partitionSpec.put("Year", "2014");
 partitionSpec.put("Month", "08");
 partitionSpec.put("Date", "11");
 partitionSpec.put("Hour", "00");
 partitionSpec.put("Minute", "00");
 HCatClient client = get(catalogUrl);
 HCatPartition hCatPartition = client.getPartition(database, 
 inputTableName, partitionSpec);
 {noformat}
 This fails with:
 {noformat}
 Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : 
 Exception occurred while processing HCat request : Invalid partition-key 
 specified: year
   at 
 org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366)
   at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
 {noformat}
 The same code works if I do
 {noformat}
 partitionSpec.put("year", "2014");
 partitionSpec.put("month", "08");
 partitionSpec.put("date", "11");
 partitionSpec.put("hour", "00");
 partitionSpec.put("minute", "00");
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7762) Enhancement while getting partitions via webhcat client

2014-08-19 Thread Suhas Vasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suhas Vasu updated HIVE-7762:
-

Status: Open  (was: Patch Available)

 Enhancement while getting partitions via webhcat client
 ---

 Key: HIVE-7762
 URL: https://issues.apache.org/jira/browse/HIVE-7762
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Reporter: Suhas Vasu
Priority: Minor
 Attachments: HIVE-7762.2.patch, HIVE-7762.patch


 HCatalog creates partitions in lower case, whereas getting partitions from 
 HCatalog via the WebHCat client doesn't handle this, so the client starts 
 throwing exceptions.
 Ex:
 CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year 
 STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS 
 TEXTFILE LOCATION '/user/suhas/hcat-data/in/';
 Then I try to get partitions with:
 {noformat}
 String inputTableName = "in_table";
 String database = "default";
 Map<String, String> partitionSpec = new HashMap<String, String>();
 partitionSpec.put("Year", "2014");
 partitionSpec.put("Month", "08");
 partitionSpec.put("Date", "11");
 partitionSpec.put("Hour", "00");
 partitionSpec.put("Minute", "00");
 HCatClient client = get(catalogUrl);
 HCatPartition hCatPartition = client.getPartition(database, 
 inputTableName, partitionSpec);
 {noformat}
 This fails with:
 {noformat}
 Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : 
 Exception occurred while processing HCat request : Invalid partition-key 
 specified: year
   at 
 org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366)
   at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
 {noformat}
 The same code works if I do
 {noformat}
 partitionSpec.put("year", "2014");
 partitionSpec.put("month", "08");
 partitionSpec.put("date", "11");
 partitionSpec.put("hour", "00");
 partitionSpec.put("minute", "00");
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7770) Undo backward-incompatible behaviour change introduced by HIVE-7341

2014-08-19 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102304#comment-14102304
 ] 

Mithun Radhakrishnan commented on HIVE-7770:


Yikes, will post a patch shortly.

 Undo backward-incompatible behaviour change introduced by HIVE-7341
 ---

 Key: HIVE-7770
 URL: https://issues.apache.org/jira/browse/HIVE-7770
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 0.14.0
Reporter: Sushanth Sowmyan
Assignee: Mithun Radhakrishnan
  Labels: regression

 HIVE-7341 introduced a backward-incompatibility regression in Exception 
 signatures for HCatPartition.getColumns() that breaks compilation for 
 external tools like Falcon. This bug tracks a scrub of any other issues we 
 discover, so we can put them back to how it used to be. This bug needs 
 resolution in the same release as HIVE-7341, and thus, must be resolved in 
 0.14.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7762) Enhancement while getting partitions via webhcat client

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102306#comment-14102306
 ] 

Hive QA commented on HIVE-7762:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662754/HIVE-7762.2.patch

{color:green}SUCCESS:{color} +1 5819 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/402/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/402/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-402/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662754

 Enhancement while getting partitions via webhcat client
 ---

 Key: HIVE-7762
 URL: https://issues.apache.org/jira/browse/HIVE-7762
 Project: Hive
  Issue Type: Improvement
  Components: WebHCat
Reporter: Suhas Vasu
Priority: Minor
 Attachments: HIVE-7762.2.patch, HIVE-7762.patch


 HCatalog creates partitions in lower case, whereas getting partitions from 
 HCatalog via the WebHCat client doesn't handle this, so the client starts 
 throwing exceptions.
 Ex:
 CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year 
 STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS 
 TEXTFILE LOCATION '/user/suhas/hcat-data/in/';
 Then I try to get partitions with:
 {noformat}
 String inputTableName = "in_table";
 String database = "default";
 Map<String, String> partitionSpec = new HashMap<String, String>();
 partitionSpec.put("Year", "2014");
 partitionSpec.put("Month", "08");
 partitionSpec.put("Date", "11");
 partitionSpec.put("Hour", "00");
 partitionSpec.put("Minute", "00");
 HCatClient client = get(catalogUrl);
 HCatPartition hCatPartition = client.getPartition(database, 
 inputTableName, partitionSpec);
 {noformat}
 This fails with:
 {noformat}
 Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : 
 Exception occurred while processing HCat request : Invalid partition-key 
 specified: year
   at 
 org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366)
   at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
 {noformat}
 The same code works if I do
 {noformat}
 partitionSpec.put("year", "2014");
 partitionSpec.put("month", "08");
 partitionSpec.put("date", "11");
 partitionSpec.put("hour", "00");
 partitionSpec.put("minute", "00");
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU

2014-08-19 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102305#comment-14102305
 ] 

Mostafa Mokhtar commented on HIVE-7664:
---

[~navis]
Can you please add a code review?


 VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized 
 execution and takes 25% CPU
 -

 Key: HIVE-7664
 URL: https://issues.apache.org/jira/browse/HIVE-7664
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-7664.1.patch.txt


 In a Group by heavy vectorized Reducer vertex 25% of CPU is spent in 
 VectorizedBatchUtil.addRowToBatchFrom().
 I looked at the code of VectorizedBatchUtil.addRowToBatchFrom, and it doesn't 
 appear to be optimized for vectorized processing.
 addRowToBatchFrom is called for every row, and for each row and every column 
 in the batch, getPrimitiveCategory is called to figure out the type of each 
 column; the column types are stored in a HashMap. For VectorGroupByOperator, 
 column types won't change between batches, so they shouldn't be looked up for 
 every row.
 I recommend storing the column type in StructObjectInspector so that other 
 components can leverage this optimization.
 addRowToBatchFrom also has a case statement for every row and every column, 
 used for type casting; I recommend encapsulating the type logic in 
 templatized methods.
 {code}
 Stack Trace   Sample CountPercentage(%)
 VectorizedBatchUtil.addRowToBatchFrom 86  26.543
AbstractPrimitiveObjectInspector.getPrimitiveCategory()34  10.494
LazyBinaryStructObjectInspector.getStructFieldData 25  7.716
StandardStructObjectInspector.getStructFieldData   4   1.235
 {code}
 The query used : 
 {code}
 select 
 ss_sold_date_sk
 from
 store_sales
 where
 ss_sold_date between '1998-01-01' and '1998-06-01'
 group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
 having sum(ss_list_price) > 50;
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-6930) Beeline should nicely format timestamps when displaying results

2014-08-19 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu reassigned HIVE-6930:
--

Assignee: Ferdinand Xu

 Beeline should nicely format timestamps when displaying results
 ---

 Key: HIVE-6930
 URL: https://issues.apache.org/jira/browse/HIVE-6930
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Gwen Shapira
Assignee: Ferdinand Xu

 When I have a timestamp column in my query, I get the results back as a 
 bigint with the number of seconds since the epoch. Not very user friendly or 
 readable.
 This means that all my queries need to include stuff like:
 select from_unixtime(cast(round(transaction_ts/1000) as bigint))...
 which is not too readable either :)
 Other SQL query tools automatically convert timestamps to some standard 
 readable date format. They even let users specify the default formatting by 
 setting a parameter (for example NLS_DATE_FORMAT for Oracle).
 I'd love to see something like that in beeline.
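 
 A minimal sketch of the kind of display-time formatting being requested; the 
 class, method, and configuration mechanism are hypothetical:
 {code}
 import java.text.SimpleDateFormat;
 import java.util.Date;
 
 // Hypothetical sketch: format an epoch-millis timestamp for display using a
 // user-configurable pattern (analogous to Oracle's NLS_DATE_FORMAT).
 class TimestampDisplay {
   static String format(long epochMillis, String pattern) {
     return new SimpleDateFormat(pattern).format(new Date(epochMillis));
   }
   // e.g. format(transactionTs, "yyyy-MM-dd HH:mm:ss")
 }
 {code}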



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6930) Beeline should nicely format timestamps when displaying results

2014-08-19 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102326#comment-14102326
 ] 

Lars Francke commented on HIVE-6930:


If this is implemented, it should be optional and disabled by default. 
Otherwise you'd run into issues with time zones, etc.

 Beeline should nicely format timestamps when displaying results
 ---

 Key: HIVE-6930
 URL: https://issues.apache.org/jira/browse/HIVE-6930
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Gwen Shapira
Assignee: Ferdinand Xu

 When I have a timestamp column in my query, I get the results back as a 
 bigint with the number of seconds since the epoch. Not very user friendly or 
 readable.
 This means that all my queries need to include stuff like:
 select from_unixtime(cast(round(transaction_ts/1000) as bigint))...
 which is not too readable either :)
 Other SQL query tools automatically convert timestamps to some standard 
 readable date format. They even let users specify the default formatting by 
 setting a parameter (for example NLS_DATE_FORMAT for Oracle).
 I'd love to see something like that in beeline.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7782) tez default engine not overridden by hive.execution.engine=mr in hive cli session

2014-08-19 Thread Hari Sekhon (JIRA)
Hari Sekhon created HIVE-7782:
-

 Summary: tez default engine not overridden by 
hive.execution.engine=mr in hive cli session
 Key: HIVE-7782
 URL: https://issues.apache.org/jira/browse/HIVE-7782
 Project: Hive
  Issue Type: Bug
  Components: CLI, Tez
 Environment: HDP2.1
Reporter: Hari Sekhon
Priority: Minor


I've deployed hive.execution.engine=tez as the default on my secondary HDP 
cluster, and I find that Hive CLI interactive sessions where I do
{code}
set hive.execution.engine=mr
{code}
still execute with Tez, as shown in the Resource Manager applications view. Now 
this may make sense, since it's connected to a Tez session by that point, but 
it's also misleading, because the job progress output in the CLI changes to 
look like MapReduce rather than Tez, and the query time increases, although 
only to 15-16 secs rather than the 25-30+ secs I usually see with MR. The 
Resource Manager shows both of these jobs as TEZ application type. Is this a 
bug in the way Hive is submitting the job (Tez vs MR), or a bug in the way the 
RM is reporting it?
{code}
hive

Logging initialized using configuration in 
file:/etc/hive/conf.dist/hive-log4j.properties
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_c03824c7-0e76-4507-b619-6a22cb0fbc4c
Total jobs = 1
Launching Job 1 out of 1


Status: Running (application id: application_1408444369445_0031)

Map 1: -/-  Reducer 2: 0/1
Map 1: 0/1  Reducer 2: 0/1
Map 1: 0/1  Reducer 2: 0/1
Map 1: 1/1  Reducer 2: 0/1
Map 1: 1/1  Reducer 2: 1/1
Status: Finished successfully
OK
823
Time taken: 8.492 seconds, Fetched: 1 row(s)
hive> set hive.execution.engine=mr;
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_b620d990-b405-479c-be5b-d9616527cefe
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1408444369445_0032, Tracking URL = 
http://lonsl1101827-data.uk.net.intra:8088/proxy/application_1408444369445_0032/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1408444369445_0032
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2014-08-19 16:48:35,242 Stage-1 map = 0%,  reduce = 0%
2014-08-19 16:48:40,539 Stage-1 map = 100%,  reduce = 0%
2014-08-19 16:48:44,676 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_1408444369445_0032
MapReduce Jobs Launched:
Job 0:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
823
Time taken: 16.579 seconds, Fetched: 1 row(s)
{code}
If I exit the Hive shell and restart it using {code}--hiveconf 
hive.execution.engine=mr{code} to set the engine before the session is 
established, then it runs a proper MapReduce job according to the RM, and it 
also takes the expected 25 secs instead of the 8 secs with Tez or the 15 secs 
when trying to do MR inside a Tez session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7782) tez default engine not overridden by hive.execution.engine=mr in hive cli session

2014-08-19 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HIVE-7782:
--

Description: 
I've deployed hive.execution.engine=tez as the default on my secondary HDP 
cluster, and I find that Hive CLI interactive sessions where I do
{code}
set hive.execution.engine=mr
{code}
still execute with Tez, as shown in the Resource Manager applications view. Now 
this may make sense, since it's connected to a Tez session by that point, but 
it's also misleading, because the job progress output in the CLI changes to 
look like MapReduce rather than Tez, and the query time increases from 8 to 
15-16 secs, though still less than the 25-30+ secs I usually see with MR. The 
Resource Manager shows both of these jobs as TEZ application type, regardless 
of setting hive.execution.engine=mr. Is this a bug in the way Hive is 
submitting the job (Tez vs MR), or a bug in the way the RM is reporting it?
{code}
hive

Logging initialized using configuration in 
file:/etc/hive/conf.dist/hive-log4j.properties
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_c03824c7-0e76-4507-b619-6a22cb0fbc4c
Total jobs = 1
Launching Job 1 out of 1


Status: Running (application id: application_1408444369445_0031)

Map 1: -/-  Reducer 2: 0/1
Map 1: 0/1  Reducer 2: 0/1
Map 1: 0/1  Reducer 2: 0/1
Map 1: 1/1  Reducer 2: 0/1
Map 1: 1/1  Reducer 2: 1/1
Status: Finished successfully
OK
823
Time taken: 8.492 seconds, Fetched: 1 row(s)
hive> set hive.execution.engine=mr;
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_b620d990-b405-479c-be5b-d9616527cefe
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1408444369445_0032, Tracking URL = 
http://lonsl1101827-data.uk.net.intra:8088/proxy/application_1408444369445_0032/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1408444369445_0032
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2014-08-19 16:48:35,242 Stage-1 map = 0%,  reduce = 0%
2014-08-19 16:48:40,539 Stage-1 map = 100%,  reduce = 0%
2014-08-19 16:48:44,676 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_1408444369445_0032
MapReduce Jobs Launched:
Job 0:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
823
Time taken: 16.579 seconds, Fetched: 1 row(s)
{code}
If I exit the Hive shell and restart it using {code}--hiveconf 
hive.execution.engine=mr{code} to set the engine before the session is 
established, then it runs a proper MapReduce job according to the RM, and it 
also takes the expected 25 secs instead of the 8 secs with Tez or the 15 secs 
when trying to do MR inside a Tez session.

  was:
I've deployed hive.execution.engine=tez as the default on my secondary HDP 
cluster, and I find that Hive CLI interactive sessions where I do
{code}
set hive.execution.engine=mr
{code}
still execute with Tez, as shown in the Resource Manager applications view. Now 
this may make sense, since it's connected to a Tez session by that point, but 
it's also misleading, because the job progress output in the CLI changes to 
look like MapReduce rather than Tez, and the query time increases, although 
only to 15-16 secs rather than the 25-30+ secs I usually see with MR. The 
Resource Manager shows both of these jobs as TEZ application type. Is this a 
bug in the way Hive is submitting the job (Tez vs MR), or a bug in the way the 
RM is reporting it?
{code}
hive

Logging initialized using configuration in 
file:/etc/hive/conf.dist/hive-log4j.properties
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_c03824c7-0e76-4507-b619-6a22cb0fbc4c
Total jobs = 1
Launching Job 1 out of 1


Status: Running (application id: application_1408444369445_0031)

Map 1: -/-  Reducer 2: 0/1
Map 1: 0/1  Reducer 2: 0/1
Map 1: 0/1  Reducer 2: 0/1
Map 1: 1/1  Reducer 2: 0/1
Map 1: 1/1  Reducer 2: 1/1
Status: Finished successfully
OK
823
Time taken: 8.492 seconds, Fetched: 1 row(s)
hive> set hive.execution.engine=mr;
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_b620d990-b405-479c-be5b-d9616527cefe
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1408444369445_0032, Tracking URL = 
http://lonsl1101827-data.uk.net.intra:8088/proxy/application_1408444369445_0032/
Kill 

Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-19 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/#review50977
---


Hi,

This patch looks really good! I was not clear earlier about how we should 
define the new fetchResults method. I hope my response below is clearer; if 
not, please let me know!


service/src/java/org/apache/hive/service/cli/CLIServiceClient.java
https://reviews.apache.org/r/24293/#comment88863

Thank you very much for removing the thrift enum! That resolves the thrift 
enum compatibility issue!

I should have been clearer about the other issue I was describing. I have 
felt for some time that we should change the way we do RPC in Hive. Today we 
define specific methods for the use case at hand. This causes method 
explosion. For example, after this patch we would have three method signatures 
that fetch results.

Going forward I think we should define methods differently. For example, 
for this method I think we should define the classes:

FetchResultsRequest and FetchResultsResponse

and then have a new method:

FetchResultsResponse fetchResults(FetchResultsRequest request) throws 
HiveSQLException 

and then all of the arguments can be defined inside FetchResultsRequest. 
That way, every time we add an argument, we don't have to define a new public 
RPC method. I have described this approach in this mail:


http://mail-archives.apache.org/mod_mbox/hive-dev/201403.mbox/%3CCAFukC=6xss1kjgad7hv2v4wwoigjzctm1rujcczsocdj8x3...@mail.gmail.com%3E
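
A minimal sketch of the request/response style (class and field names are 
illustrative, not a committed API):

{code}
import org.apache.hive.service.cli.OperationHandle;
import org.apache.hive.service.cli.RowSet;

// New arguments become new fields on the request object,
// instead of new overloads of fetchResults().
class FetchResultsRequest {
  private final OperationHandle opHandle;
  private long maxRows = 100;

  FetchResultsRequest(OperationHandle opHandle) { this.opHandle = opHandle; }
  OperationHandle getOpHandle() { return opHandle; }
  long getMaxRows() { return maxRows; }
  void setMaxRows(long maxRows) { this.maxRows = maxRows; }
  // future arguments (e.g. a fetch type) are added here, not as new methods
}

class FetchResultsResponse {
  private final RowSet rows;

  FetchResultsResponse(RowSet rows) { this.rows = rows; }
  RowSet getRows() { return rows; }
}

// The single public RPC method then stays stable:
//   FetchResultsResponse fetchResults(FetchResultsRequest request)
//       throws HiveSQLException;
{code}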


- Brock Noland


On Aug. 14, 2014, 3:09 p.m., Dong Chen wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24293/
 ---
 
 (Updated Aug. 14, 2014, 3:09 p.m.)
 
 
 Review request for hive.
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-4629: HS2 should support an API to retrieve query logs
 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.
 
 
 Diffs
 -
 
   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
   service/if/TCLIService.thrift 80086b4 
   service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
   
 service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
  808b73f 
   service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
   service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
   service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
   service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
 f665146 
   service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
   service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
   
 service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
  c9fd5f9 
   
 service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
  caf413d 
   
 service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
  fd4e94d 
   
 service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
  ebca996 
   
 service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
  05991e0 
   
 service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
  315dbea 
   
 service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
  0ec2543 
   
 service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
  3d3fddc 
   
 service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
 PRE-CREATION 
   
 service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
 e0d17a1 
   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
 45fbd61 
   service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
 PRE-CREATION 
   
 service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
 21c33bc 
   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
 de54ca1 
   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
 9785e95 
   service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
 4c3164e 
   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
 b39d64d 
   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
 816bea4 
   

[jira] [Commented] (HIVE-6093) table creation should fail when user does not have permissions on db

2014-08-19 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102368#comment-14102368
 ] 

Thiruvel Thirumoolan commented on HIVE-6093:


Thanks [~thejas]

 table creation should fail when user does not have permissions on db
 

 Key: HIVE-6093
 URL: https://issues.apache.org/jira/browse/HIVE-6093
 Project: Hive
  Issue Type: Bug
  Components: Authorization, HCatalog, Metastore
Affects Versions: 0.12.0, 0.13.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
Priority: Minor
  Labels: authorization, metastore, security
 Fix For: 0.14.0

 Attachments: HIVE-6093-1.patch, HIVE-6093.1.patch, HIVE-6093.1.patch, 
 HIVE-6093.patch


 Its possible to create a table under a database where the user does not have 
 write permission. It can be done by specifying a LOCATION where the user has 
 write access (say /tmp/foo). This should be restricted.
 HdfsAuthorizationProvider (which typically runs on client) checks the 
 database directory during table creation. But 
 StorageBasedAuthorizationProvider does not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7773) Union all query finished with errors [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7773:
---

Attachment: HIVE-7773.2-spark.patch

Same patch; I just removed the commented-out section in IOContext.

 Union all query finished with errors [Spark Branch]
 ---

 Key: HIVE-7773
 URL: https://issues.apache.org/jira/browse/HIVE-7773
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Priority: Critical
 Attachments: HIVE-7773.2-spark.patch, HIVE-7773.spark.patch


 When I ran a union all query, I found the following error in the Spark log 
 (the query finished with correct results though):
 {noformat}
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 {noformat}
 Judging from the log, I think we don't properly handle the input paths when 
 cloning the job conf, so it may also affect other queries with multiple maps 
 or reduces.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7773) Union all query finished with errors [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102374#comment-14102374
 ] 

Brock Noland commented on HIVE-7773:


Hi [~lirui], yes, thank you very much for updating IOContext. I have removed 
the section of code that was commented out. I also hit that issue when looking 
at joins! FYI [~szehon]

+1 pending tests

 Union all query finished with errors [Spark Branch]
 ---

 Key: HIVE-7773
 URL: https://issues.apache.org/jira/browse/HIVE-7773
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Rui Li
Priority: Critical
 Attachments: HIVE-7773.2-spark.patch, HIVE-7773.spark.patch


 When I ran a union all query, I found the following error in the Spark log 
 (the query finished with correct results though):
 {noformat}
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 {noformat}
 Judging from the log, I think we don't properly handle the input paths when 
 cloning the job conf, so it may also affect other queries with multiple maps 
 or reduces.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7769) add --SORT_BEFORE_DIFF to union all .q tests

2014-08-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102383#comment-14102383
 ] 

Brock Noland commented on HIVE-7769:


+1

 add --SORT_BEFORE_DIFF to union all .q tests
 

 Key: HIVE-7769
 URL: https://issues.apache.org/jira/browse/HIVE-7769
 Project: Hive
  Issue Type: Bug
Reporter: Na Yang
Assignee: Na Yang
 Attachments: HIVE-7769.patch


 Some union all test cases do not generate deterministic ordered result. We 
 need to add  --SORT_BEFORE_DIFF to those .q tests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7773) Union all query finished with errors [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7773:
---

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7292

 Union all query finished with errors [Spark Branch]
 ---

 Key: HIVE-7773
 URL: https://issues.apache.org/jira/browse/HIVE-7773
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Priority: Critical
 Attachments: HIVE-7773.2-spark.patch, HIVE-7773.spark.patch


 When I ran a union all query, I found the following error in the Spark log 
 (the query finished with correct results though):
 {noformat}
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 {noformat}
 Judging from the log, I think we don't properly handle the input paths when 
 cloning the job conf, so it may also affect other queries with multiple maps 
 or reduces.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HIVE-4629) HS2 should support an API to retrieve query logs

2014-08-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102460#comment-14102460
 ] 

Brock Noland edited comment on HIVE-4629 at 8/19/14 4:58 PM:
-

Hi Dong,

I tried posting this on RB but it went down. Thank you very much for removing 
the thrift enum compatibility problem! I had another comment, with regard to 
the method signature, which I think I did not explain well. I think the new 
method should be:

{noformat}
FetchResultsResponse fetchResults(FetchResultsRequest) throws ...
{noformat}

The way we've defined RPC methods to date has led to an explosion 
of RPC methods, which is problematic. This is described in [more detail in this 
thread|http://mail-archives.apache.org/mod_mbox/hive-dev/201403.mbox/%3CCAFukC=6xss1kjgad7hv2v4wwoigjzctm1rujcczsocdj8x3...@mail.gmail.com%3E].

Let me know what you think!!

Cheers,
Brock


was (Author: brocknoland):
Hi Dong,

I tried posting this on RB but it went down. Thank you very much for removing 
the thrift enum compatibility problem! I had another comment, with regard to 
the method signature, which I think I did not explain well. I think the new 
method should be:

{noformat}
FetchResultsResponse fetchResults(FetchResultsRequest) throws ...
{noformat}

The way we've defined RPC methods to date has led to an explosion 
of RPC methods, which is problematic. This is described in [more detail in this 
thread|http://mail-archives.apache.org/mod_mbox/hive-dev/201403.mbox/%3CCAFukC=6xss1kjgad7hv2v4wwoigjzctm1rujcczsocdj8x3...@mail.gmail.com%3E
].

Let me know what you think!!

Cheers,
Brock

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Dong Chen
 Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, 
 HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, 
 HIVE-4629.5.patch, HIVE-4629.6.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs

2014-08-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102460#comment-14102460
 ] 

Brock Noland commented on HIVE-4629:


Hi Dong,

I tried posting this on RB but it went down. Thank you very much for removing 
the thrift enum compatibility problem! I had another comment, with regard to 
the method signature, which I think I did not explain well. I think the new 
method should be:

{noformat}
FetchResultsResponse fetchResults(FetchResultsRequest) throws ...
{noformat}

The way we've defined RPC methods to date has led to an explosion 
of RPC methods, which is problematic. This is described in [more detail in this 
thread|http://mail-archives.apache.org/mod_mbox/hive-dev/201403.mbox/%3CCAFukC=6xss1kjgad7hv2v4wwoigjzctm1rujcczsocdj8x3...@mail.gmail.com%3E
].

Let me know what you think!!

Cheers,
Brock

 HS2 should support an API to retrieve query logs
 

 Key: HIVE-4629
 URL: https://issues.apache.org/jira/browse/HIVE-4629
 Project: Hive
  Issue Type: Sub-task
  Components: HiveServer2
Reporter: Shreepadma Venugopalan
Assignee: Dong Chen
 Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, 
 HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, 
 HIVE-4629.5.patch, HIVE-4629.6.patch


 HiveServer2 should support an API to retrieve query logs. This is 
 particularly relevant because HiveServer2 supports async execution but 
 doesn't provide a way to report progress. Providing an API to retrieve query 
 logs will help report progress to the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7783) QTests throw exception during cleanup

2014-08-19 Thread Ashish Kumar Singh (JIRA)
Ashish Kumar Singh created HIVE-7783:


 Summary: QTests throw exception during cleanup
 Key: HIVE-7783
 URL: https://issues.apache.org/jira/browse/HIVE-7783
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh


qTests during cleanup try to drop read-only tables and throw exceptions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7783) QTests throw exception during cleanup

2014-08-19 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh reassigned HIVE-7783:


Assignee: Ashish Kumar Singh

 QTests throw exception during cleanup
 -

 Key: HIVE-7783
 URL: https://issues.apache.org/jira/browse/HIVE-7783
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh

 qTests during cleanup try to drop read-only tables and throw 
 exceptions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7734) Join stats annotation rule is not updating columns statistics correctly

2014-08-19 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7734:
-

   Resolution: Fixed
Fix Version/s: (was: 0.13.0)
   0.14.0
   Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks Gunther and Brock for the review.

 Join stats annotation rule is not updating columns statistics correctly
 ---

 Key: HIVE-7734
 URL: https://issues.apache.org/jira/browse/HIVE-7734
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor, Statistics
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-7734.1.patch, HIVE-7734.2.patch


 HIVE-7679 is not doing the correct thing. The scale-down/up factor for 
 updating column stats was wrong, as ratio = newRowCount/oldRowCount is always 
 infinite (oldRowCount = 0). The old row count should be retrieved from the 
 parent corresponding to the current column whose statistics are being updated.
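 
 A minimal sketch of the corrected scaling, with hypothetical names:
 {code}
 // Hypothetical sketch: scale a column's distinct-value count by
 // newRowCount / parentRowCount, where parentRowCount comes from the parent
 // operator that produced the column, instead of dividing by an
 // uninitialized (zero) old row count.
 class StatsScaling {
   static long scaleNdv(long ndv, long parentRowCount, long newRowCount) {
     if (parentRowCount <= 0) {
       return ndv; // avoid the infinite ratio described above
     }
     double ratio = (double) newRowCount / (double) parentRowCount;
     return Math.max(1L, Math.round(ndv * ratio));
   }
 }
 {code}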



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24853: HIVE-7783: QTests throw exception during cleanup

2014-08-19 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24853/
---

Review request for hive.


Repository: hive-git


Description
---

HIVE-7783: QTests throw exception during cleanup


Diffs
-

  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
af4a3e575ca5aea2746a5b51862ff178b59e403d 

Diff: https://reviews.apache.org/r/24853/diff/


Testing
---

Ran a couple of qTests locally.


Thanks,

Ashish Singh



Re: Review Request 24853: HIVE-7783: QTests throw exception during cleanup

2014-08-19 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24853/
---

(Updated Aug. 19, 2014, 5:29 p.m.)


Review request for hive.


Changes
---

Link Hive JIRA.


Bugs: HIVE-7783
https://issues.apache.org/jira/browse/HIVE-7783


Repository: hive-git


Description
---

HIVE-7783: QTests throw exception during cleanup


Diffs
-

  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
af4a3e575ca5aea2746a5b51862ff178b59e403d 

Diff: https://reviews.apache.org/r/24853/diff/


Testing
---

Ran a couple of qTests locally.


Thanks,

Ashish Singh



[jira] [Updated] (HIVE-7571) RecordUpdater should read virtual columns from row

2014-08-19 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7571:
-

Status: Patch Available  (was: Open)

 RecordUpdater should read virtual columns from row
 --

 Key: HIVE-7571
 URL: https://issues.apache.org/jira/browse/HIVE-7571
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7571.WIP.patch, HIVE-7571.patch


 Currently RecordUpdater.update and delete take the row id and original 
 transaction as parameters.  These values are already present in the row as 
 part of the new ROW__ID virtual column from HIVE-7513, and thus can be read 
 by the writer from there.  The writer will already have to handle skipping 
 ROW__ID when writing, so it needs to be aware of that column anyway.
 We could instead read the values from ROW__ID and then remove it from the 
 object inspector in FileSinkOperator, but this will be hard in the 
 vectorization case, where rows are being dealt with 10k at a time.
 For these reasons it makes more sense to do this work in the writer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7783) QTests throw exception during cleanup

2014-08-19 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-7783:
-

Attachment: HIVE-7783.patch

RB: https://reviews.apache.org/r/24853/

 QTests throw exception during cleanup
 -

 Key: HIVE-7783
 URL: https://issues.apache.org/jira/browse/HIVE-7783
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7783.patch


 qTests during cleanup try to drop read-only tables and throw exceptions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7571) RecordUpdater should read virtual columns from row

2014-08-19 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-7571:
-

Attachment: HIVE-7571.patch

A patch that changes the RecordUpdater interface to assume that the ROWID is 
passed as a virtual column.  This changes the update and delete calls to no 
longer explicitly ask for transaction id and row id in the interface.

 RecordUpdater should read virtual columns from row
 --

 Key: HIVE-7571
 URL: https://issues.apache.org/jira/browse/HIVE-7571
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7571.WIP.patch, HIVE-7571.patch


 Currently RecordUpdater.update and delete take rowid and original transaction 
 as parameters.  These values are already present in the row as part of the 
 new ROW__ID virtual column in HIVE-7513, and thus can be read by the writer 
 from there.  And the writer will already have to handle skipping ROW__ID when 
 writing, so it needs to be aware of that column anyway.
 We could instead read the values from ROW__ID and then remove it from the 
 object inspector in FileSinkOperator, but this will be hard in the 
 vectorization case where rows are being dealt with 10k at a time.
 For these reasons it makes more sense to do this work in the writer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7784) Created the needed indexes on Hive.PART_COL_STATS for CBO

2014-08-19 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-7784:
-

 Summary: Created the needed indexes on Hive.PART_COL_STATS for CBO 
 Key: HIVE-7784
 URL: https://issues.apache.org/jira/browse/HIVE-7784
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0


With CBO we need the correct set of indexes to provide efficient read/write 
access.





--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24853: HIVE-7783: QTests throw exception during cleanup

2014-08-19 Thread Venki Korukanti

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24853/#review50985
---



itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
https://reviews.apache.org/r/24853/#comment88870

Isn't this fixed by HIVE-7684? Are you still seeing errors on trunk?


- Venki Korukanti


On Aug. 19, 2014, 5:29 p.m., Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24853/
 ---
 
 (Updated Aug. 19, 2014, 5:29 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7783
 https://issues.apache.org/jira/browse/HIVE-7783
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-7783: QTests throw exception during cleanup
 
 
 Diffs
 -
 
   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
 af4a3e575ca5aea2746a5b51862ff178b59e403d 
 
 Diff: https://reviews.apache.org/r/24853/diff/
 
 
 Testing
 ---
 
 Ran a couple of qTests locally.
 
 
 Thanks,
 
 Ashish Singh
 




[jira] [Updated] (HIVE-7784) Created the needed indexes on Hive.PART_COL_STATS for CBO

2014-08-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-7784:
--

Description: 
With CBO we need the correct set of indexes to provide efficient read/write 
access.
These indexes improve the performance of Explain plan and Analyze table by 60% 
and 300%, respectively.

{code}
MySQL 
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
(DB_NAME,TABLE_NAME,COLUMN_NAME) USING BTREE;

MsSQL
CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
(DB_NAME,TABLE_NAME,COLUMN_NAME);

Oracle 
CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
(DB_NAME,TABLE_NAME,COLUMN_NAME);

Postgres
CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS USING btree 
(DB_NAME,TABLE_NAME,COLUMN_NAME);
{code}


  was:
With CBO we need the correct set of indexes to provide efficient read/write 
access.




 Created the needed indexes on Hive.PART_COL_STATS for CBO 
 --

 Key: HIVE-7784
 URL: https://issues.apache.org/jira/browse/HIVE-7784
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0


 With CBO we need the correct set of indexes to provide efficient read/write 
 access.
 These indexes improve the performance of Explain plan and Analyze table by 60% 
 and 300%, respectively.
 {code}
 MySQL 
  CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME) USING BTREE;
 MsSQL
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 Oracle 
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 Postgres
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS USING btree 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24853: HIVE-7783: QTests throw exception during cleanup

2014-08-19 Thread Ashish Singh


 On Aug. 19, 2014, 5:37 p.m., Venki Korukanti wrote:
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java, line 629
  https://reviews.apache.org/r/24853/diff/1/?file=664330#file664330line629
 
  Isn't this fixed by HIVE-7684? Are you still seeing errors on trunk?

Ahh.. I had an older version of trunk. Good to know this is already fixed. Thanks.


- Ashish


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24853/#review50985
---


On Aug. 19, 2014, 5:29 p.m., Ashish Singh wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/24853/
 ---
 
 (Updated Aug. 19, 2014, 5:29 p.m.)
 
 
 Review request for hive.
 
 
 Bugs: HIVE-7783
 https://issues.apache.org/jira/browse/HIVE-7783
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-7783: QTests throw exception during cleanup
 
 
 Diffs
 -
 
   itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 
 af4a3e575ca5aea2746a5b51862ff178b59e403d 
 
 Diff: https://reviews.apache.org/r/24853/diff/
 
 
 Testing
 ---
 
 Ran a couple of qTests locally.
 
 
 Thanks,
 
 Ashish Singh
 




[jira] [Updated] (HIVE-7771) ORC PPD fails for some decimal predicates

2014-08-19 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7771:
-

Attachment: HIVE-7771.2.patch

[~daijy] I added BigDecimal support in SARG creation in this patch. 

 ORC PPD fails for some decimal predicates
 -

 Key: HIVE-7771
 URL: https://issues.apache.org/jira/browse/HIVE-7771
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7771.1.patch, HIVE-7771.2.patch


 Some queries like 
 {code}
 select * from table where dcol=11.22BD;
 {code}
 fail when ORC predicate pushdown is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7783) QTests throw exception during cleanup

2014-08-19 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh resolved HIVE-7783.
--

Resolution: Duplicate

Fixed by HIVE-7684.

 QTests throw exception during cleanup
 -

 Key: HIVE-7783
 URL: https://issues.apache.org/jira/browse/HIVE-7783
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7783.patch


 qTests during cleanup try to drop tables read only tables and throw 
 exceptions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24834: HIVE-7771: ORC PPD fails for some decimal predicates

2014-08-19 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24834/
---

(Updated Aug. 19, 2014, 5:51 p.m.)


Review request for hive, Gopal V and Gunther Hagleitner.


Changes
---

Added support for BigDecimal in SARG construction.


Repository: hive-git


Description
---

Some queries like 
{code}
select * from table where dcol=11.22BD;
{code}
fail when ORC predicate pushdown is enabled.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java f5023bb 
  ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java 2c53f65 
  ql/src/test/queries/clientpositive/orc_ppd_decimal.q a93590e 
  ql/src/test/results/clientpositive/orc_ppd_decimal.q.out 0c11ea8 

Diff: https://reviews.apache.org/r/24834/diff/


Testing
---


Thanks,

Prasanth_J



[jira] [Commented] (HIVE-7771) ORC PPD fails for some decimal predicates

2014-08-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102532#comment-14102532
 ] 

Prasanth J commented on HIVE-7771:
--

[~daijy] The predicate object will support BigDecimal now. However, during SARG 
evaluation, BigDecimal will be converted to HiveDecimal to match the type of the 
decimal column statistics in ORC. 
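As a rough illustration of that conversion (a minimal sketch, not the patch 
itself):
{code}
import java.math.BigDecimal;
import org.apache.hadoop.hive.common.type.HiveDecimal;

// Sketch: a BigDecimal predicate literal (e.g. 11.22BD) is normalized to
// HiveDecimal so it can be compared against the HiveDecimal min/max kept
// in ORC column statistics.
BigDecimal literal = new BigDecimal("11.22");
HiveDecimal sargValue = HiveDecimal.create(literal);
{code}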

 ORC PPD fails for some decimal predicates
 -

 Key: HIVE-7771
 URL: https://issues.apache.org/jira/browse/HIVE-7771
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7771.1.patch, HIVE-7771.2.patch


 Some queries like 
 {code}
 select * from table where dcol=11.22BD;
 {code}
 fail when ORC predicate pushdown is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Dropping support for JDK6 in Apache Hadoop

2014-08-19 Thread Arun C Murthy
[Apologies for the wide distribution.]

Dear HBase/Hive/Pig/Oozie communities,

 We, over at Hadoop, are considering dropping support for JDK6 this year. 

 As you may be aware, we just released hadoop-2.5.0 and are now considering 
making the next release, i.e. hadoop-2.6.0, the *last* release of Apache Hadoop 
which supports JDK6. This means that from hadoop-2.7.0 onwards we will not 
support JDK6 anymore and we *may* start relying on JDK7-specific APIs.

 Now, the above is a proposal, and we do not want to pull the trigger 
without talking to projects downstream - hence the request for your feedback.

 Please feel free to forward this to other communities you might deem to be at 
risk from this too.

thanks,
Arun




Re: Review Request 24834: HIVE-7771: ORC PPD fails for some decimal predicates

2014-08-19 Thread j . prasanth . j

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24834/
---

(Updated Aug. 19, 2014, 5:59 p.m.)


Review request for hive, Gopal V and Gunther Hagleitner.


Changes
---

Added unit test for big decimal support in search argument.


Repository: hive-git


Description
---

Some queries like 
{code}
select * from table where dcol=11.22BD;
{code}
fail when ORC predicate pushdown is enabled.


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java f5023bb 
  ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java 2c53f65 
  ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestSearchArgumentImpl.java 
b1524f7 
  ql/src/test/queries/clientpositive/orc_ppd_decimal.q a93590e 
  ql/src/test/results/clientpositive/orc_ppd_decimal.q.out 0c11ea8 

Diff: https://reviews.apache.org/r/24834/diff/


Testing
---


Thanks,

Prasanth_J



[jira] [Updated] (HIVE-7771) ORC PPD fails for some decimal predicates

2014-08-19 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7771:
-

Attachment: HIVE-7771.3.patch

Added unit test for big decimal support in search argument.

 ORC PPD fails for some decimal predicates
 -

 Key: HIVE-7771
 URL: https://issues.apache.org/jira/browse/HIVE-7771
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7771.1.patch, HIVE-7771.2.patch, HIVE-7771.3.patch


 Some queries like 
 {code}
 select * from table where dcol=11.22BD;
 {code}
 fail when ORC predicate pushdown is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6361) Un-fork Sqlline

2014-08-19 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated HIVE-6361:
--

Attachment: HIVE-6361.patch

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Julian Hyde
Assignee: Julian Hyde
 Attachments: HIVE-6361.patch


 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was 
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time 
 was a useful but low-activity project languishing on SourceForge without an 
 active owner. Around the same time, Julian Hyde independently started a 
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6361) Un-fork Sqlline

2014-08-19 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated HIVE-6361:
--

Attachment: (was: HIVE-6361.patch)

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Julian Hyde
Assignee: Julian Hyde

 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was 
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time 
 was a useful but low-activity project languishing on SourceForge without an 
 active owner. Around the same time, Julian Hyde independently started a 
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6361) Un-fork Sqlline

2014-08-19 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated HIVE-6361:
--

Status: Open  (was: Patch Available)

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Julian Hyde
Assignee: Julian Hyde

 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was 
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time 
 was a useful but low-activity project languishing on SourceForge without an 
 active owner. Around the same time, Julian Hyde independently started a 
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6361) Un-fork Sqlline

2014-08-19 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated HIVE-6361:
--

Status: Patch Available  (was: Open)

Attaching re-based patch. Commit db5e7181d329331d76d6d53741beb87b44f5263f, 
parent commit 253a869dc62c7d36b1020a70932ddd35cb44cb81.

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Julian Hyde
Assignee: Julian Hyde

 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was 
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time 
 was a useful but low-activity project languishing on SourceForge without an 
 active owner. Around the same time, Julian Hyde independently started a 
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7771) ORC PPD fails for some decimal predicates

2014-08-19 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102564#comment-14102564
 ] 

Daniel Dai commented on HIVE-7771:
--

+1, works for me now.

 ORC PPD fails for some decimal predicates
 -

 Key: HIVE-7771
 URL: https://issues.apache.org/jira/browse/HIVE-7771
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7771.1.patch, HIVE-7771.2.patch, HIVE-7771.3.patch


 Some queries like 
 {code}
 select * from table where dcol=11.22BD;
 {code}
 fail when ORC predicate pushdown is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]

2014-08-19 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102589#comment-14102589
 ] 

Venki Korukanti commented on HIVE-7747:
---

When I run the Spark job locally (spark.master=local), it completes 
successfully. It repros only on the Spark cluster. A similar exception is seen 
in HIVE-7437, which suggested shading jetty/servlet classes, but I still see 
the same exception.

 Submitting a query to Spark from HiveServer2 fails [Spark Branch]
 -

 Key: HIVE-7747
 URL: https://issues.apache.org/jira/browse/HIVE-7747
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch


 {{spark.serializer}} is set to 
 {{org.apache.spark.serializer.KryoSerializer}}. The same configuration works 
 fine from the Hive CLI.
 Spark tasks fail with the following error:
 {code}
 Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most 
 recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): 
 java.lang.IllegalStateException: unread block data
 
 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7774) Issues with location path for temporary external tables

2014-08-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102593#comment-14102593
 ] 

Ashutosh Chauhan commented on HIVE-7774:


+1

 Issues with location path for temporary external tables
 ---

 Key: HIVE-7774
 URL: https://issues.apache.org/jira/browse/HIVE-7774
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-7774.1.patch


 Depending on the location string passed into temp external table, a query 
 requiring a map/reduce job will fail.  Example:
 {noformat}
 create temporary external table tmp1 (c1 string) location '/tmp/tmp1';
 describe extended tmp1;
 select count(*) from tmp1;
 {noformat}
 Will result in the following error:
 {noformat}
 Diagnostic Messages for this Task:
 Error: java.lang.RuntimeException: Error in configuring object
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
   at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
   ... 9 more
 Caused by: java.lang.RuntimeException: Error in configuring object
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
   at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
   at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
   ... 14 more
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
   ... 17 more
 Caused by: java.lang.RuntimeException: Map operator initialization failed
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154)
   ... 22 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
   at 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:123)
   ... 22 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration 
 and input path are inconsistent
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:398)
   ... 23 more
 FAILED: Execution Error, return code 2 from 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask
 {noformat}
 If the location is set to 'hdfs:/tmp/tmp1', it gets the following error:
 {noformat}
 java.io.IOException: cannot find dir = 
 hdfs://node-1.example.com:8020/tmp/tmp1/tmp1.txt in pathToPartitionInfo: 
 [hdfs:/tmp/tmp1]
   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:344)
   at 
 org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:306)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:108)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:455)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
   at 
 

[jira] [Updated] (HIVE-6361) Un-fork Sqlline

2014-08-19 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated HIVE-6361:
--

Status: Open  (was: Patch Available)

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Julian Hyde
Assignee: Julian Hyde
 Attachments: HIVE-6361.patch


 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was 
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time 
 was a useful but low-activity project languishing on SourceForge without an 
 active owner. Around the same time, Julian Hyde independently started a 
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6361) Un-fork Sqlline

2014-08-19 Thread Julian Hyde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde updated HIVE-6361:
--

Status: Patch Available  (was: Open)

 Un-fork Sqlline
 ---

 Key: HIVE-6361
 URL: https://issues.apache.org/jira/browse/HIVE-6361
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.12.0
Reporter: Julian Hyde
Assignee: Julian Hyde
 Attachments: HIVE-6361.patch


 I propose to merge the two development forks of sqlline: Hive's beeline 
 module, and the fork at https://github.com/julianhyde/sqlline.
 How did the forks come about? Hive’s SQL command-line interface Beeline was 
 created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time 
 was a useful but low-activity project languishing on SourceForge without an 
 active owner. Around the same time, Julian Hyde independently started a 
 github repo based on the same code base. Now several projects are using 
 Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading 
 Lingual and Optiq.
 Merging these two forks will allow us to pool our resources. (Case in point: 
 Drill issue DRILL-327 had already been fixed in a later version of sqlline; 
 it still exists in beeline.)
 I propose the following steps:
 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline.
 2. Port fixes to hive-beeline into hive-sqlline.
 3. Make hive-beeline depend on hive-sqlline, and remove code that is 
 identical. What remains in the hive-beeline module is Beeline.java (a derived 
 class of Sqlline.java) and Hive-specific extensions.
 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline.
 This achieves continuity for Hive’s users, gives the users of the non-Hive 
 sqlline a version with minimal dependencies, unifies the two code lines, and 
 brings everything under the Apache roof.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7593) Instantiate SparkClient per user session [Spark Branch]

2014-08-19 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7593:
---

Attachment: HIVE-7593.1-spark.patch

 Instantiate SparkClient per user session [Spark Branch]
 ---

 Key: HIVE-7593
 URL: https://issues.apache.org/jira/browse/HIVE-7593
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Chinna Rao Lalam
 Attachments: HIVE-7593-spark.patch, HIVE-7593.1-spark.patch


 SparkContext is the main class via which Hive talks to the Spark cluster. 
 SparkClient encapsulates a SparkContext instance. Currently all user sessions 
 share a single SparkClient instance in HiveServer2. While this is good enough 
 for a POC, even for our first two milestones, it is not desirable for a 
 multi-tenancy environment and gives the least flexibility to Hive users. Here is 
 what we propose:
 1. Have a SparkClient instance per user session. The SparkClient instance is 
 created when the user executes their first query in the session and destroyed 
 when the user session ends (see the sketch after this description).
 2. The SparkClient is instantiated based on the spark configurations that are 
 available to the user, including those defined at the global level and those 
 overwritten by the user (thru set command, for instance).
 3. Ideally, when user changes any spark configuration during the session, the 
 old SparkClient instance should be destroyed and a new one based on the new 
 configurations is created. This may turn out to be a little hard, and thus 
 it's a nice-to-have. If not implemented, we need to document that 
 subsequent configuration changes will not take effect in the current session.
 Please note that there is a thread-safety issue on Spark side where multiple 
 SparkContext instances cannot coexist in the same JVM (SPARK-2243). We need 
 to work with Spark community to get this addressed.
 Besides the above functional requirements, avoiding potential issues is also a 
 consideration. For instance, sharing a SparkContext among users is bad, as 
 resources (such as jars for UDFs) will also be shared, which is problematic. On 
 the other hand, one SparkContext per job seems too expensive, as the resources 
 need to be re-rendered even if there isn't any change.
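 As a minimal sketch of item 1 (a hypothetical, generic registry, not the actual 
 implementation):
 {code}
 import java.util.HashMap;
 import java.util.Map;

 // Hypothetical sketch: one client object per user session, created lazily
 // on the session's first query and closed when the session ends.
 public class PerSessionClients<C> {
   public interface Factory<C> { C create(String sessionId); }
   public interface Closer<C> { void close(C client); }

   private final Map<String, C> clients = new HashMap<String, C>();
   private final Factory<C> factory;
   private final Closer<C> closer;

   public PerSessionClients(Factory<C> factory, Closer<C> closer) {
     this.factory = factory;
     this.closer = closer;
   }

   public synchronized C getOrCreate(String sessionId) {
     C client = clients.get(sessionId);
     if (client == null) {
       client = factory.create(sessionId); // first query in this session
       clients.put(sessionId, client);
     }
     return client;
   }

   public synchronized void sessionEnded(String sessionId) {
     C client = clients.remove(sessionId);
     if (client != null) {
       closer.close(client); // release the session's SparkContext resources
     }
   }
 }
 {code}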



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7281) DbTxnManager acquiring wrong level of lock for dynamic partitioning

2014-08-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102621#comment-14102621
 ] 

Ashutosh Chauhan commented on HIVE-7281:


Patch looks fine. 
But I wonder if the DummyPartition entity type makes sense at all. It seems this 
entity is created only in the dynamic partitioning (DP) case, to be used for 
locking and authorization purposes. Since in the locking case (as argued in this 
ticket) as well as the auth case (probably) it makes sense to use the Table 
entity, I don't see what useful purpose DummyPartition serves. On the contrary, 
it results in confusion like the topic of this JIRA. Shall we just delete this 
DummyPartition entity? cc: [~thejas]

 DbTxnManager acquiring wrong level of lock for dynamic partitioning
 ---

 Key: HIVE-7281
 URL: https://issues.apache.org/jira/browse/HIVE-7281
 Project: Hive
  Issue Type: Bug
  Components: Locking, Transactions
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7281.patch


 Currently DbTxnManager.acquireLocks() locks the DUMMY_PARTITION for dynamic 
 partitioning.  But this is not adequate.  This will not prevent drop 
 operations on partitions being written to.  The lock should be at the table 
 level.
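 A sketch of the intended behavior (hypothetical names, not the actual patch): 
 with dynamic partitioning the target partitions are unknown up front, so the 
 lock has to cover the whole table.
 {code}
 // Hypothetical sketch of lock-component selection in acquireLocks():
 if (output.isDynamicPartitionWrite()) {
   // Partitions are unknown at compile time: take a table-level lock so
   // concurrent drop-partition operations are blocked.
   compBuilder.setDbName(table.getDbName());
   compBuilder.setTableName(table.getTableName());
 } else {
   // Static partition: a partition-level lock is sufficient.
   compBuilder.setPartitionName(partition.getName());
 }
 {code}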



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7784) Created the needed indexes on Hive.PART_COL_STATS for CBO

2014-08-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-7784:
--

Attachment: HIVE-7784.1.patch

 Created the needed indexes on Hive.PART_COL_STATS for CBO 
 --

 Key: HIVE-7784
 URL: https://issues.apache.org/jira/browse/HIVE-7784
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-7784.1.patch


 With CBO we need the correct set of indexes to provide efficient read/write 
 access.
 These indexes improve the performance of Explain plan and Analyze table by 60% 
 and 300%, respectively.
 {code}
 MySQL 
  CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME) USING BTREE;
 MsSQL
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 Oracle 
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 Postgres
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS USING btree 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7735) Implement Char, Varchar in ParquetSerDe

2014-08-19 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102633#comment-14102633
 ] 

Szehon Ho commented on HIVE-7735:
-

Hi Mohit, the patch became stale after the Virtual Column change. Can you 
please rebase?

 Implement Char, Varchar in ParquetSerDe
 ---

 Key: HIVE-7735
 URL: https://issues.apache.org/jira/browse/HIVE-7735
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Reporter: Mohit Sabharwal
Assignee: Mohit Sabharwal
  Labels: Parquet
 Attachments: HIVE-7735.1.patch, HIVE-7735.1.patch, HIVE-7735.2.patch, 
 HIVE-7735.2.patch, HIVE-7735.patch


 This JIRA is to implement CHAR and VARCHAR support in Parquet SerDe.
 Both are represented in Parquet as PrimitiveType binary and OriginalType UTF8.
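 In Parquet message-type syntax that mapping would look like the following 
 sketch (column names are hypothetical; enforcing the declared CHAR/VARCHAR 
 length is left to the SerDe, since Parquet only records binary + UTF8):
 {code}
 message hive_schema {
   optional binary c_char (UTF8);
   optional binary c_varchar (UTF8);
 }
 {code}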



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7769) add --SORT_BEFORE_DIFF to union all .q tests

2014-08-19 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7769:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you for your contribution! I have committed this to trunk!

 add --SORT_BEFORE_DIFF to union all .q tests
 

 Key: HIVE-7769
 URL: https://issues.apache.org/jira/browse/HIVE-7769
 Project: Hive
  Issue Type: Bug
Reporter: Na Yang
Assignee: Na Yang
 Fix For: 0.14.0

 Attachments: HIVE-7769.patch


 Some union all test cases do not generate deterministically ordered results. We 
 need to add --SORT_BEFORE_DIFF to those .q tests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7571) RecordUpdater should read virtual columns from row

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102668#comment-14102668
 ] 

Hive QA commented on HIVE-7571:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662789/HIVE-7571.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5819 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/403/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/403/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-403/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662789

 RecordUpdater should read virtual columns from row
 --

 Key: HIVE-7571
 URL: https://issues.apache.org/jira/browse/HIVE-7571
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 0.13.0
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-7571.WIP.patch, HIVE-7571.patch


 Currently RecordUpdater.update and delete take rowid and original transaction 
 as parameters.  These values are already present in the row as part of the 
 new ROW__ID virtual column in HIVE-7513, and thus can be read by the writer 
 from there.  And the writer will already have to handle skipping ROW__ID when 
 writing, so it needs to be aware of that column anyway.
 We could instead read the values from ROW__ID and then remove it from the 
 object inspector in FileSinkOperator, but this will be hard in the 
 vectorization case where rows are being dealt with 10k at a time.
 For these reasons it makes more sense to do this work in the writer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7717) Add .q tests coverage for union all [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102675#comment-14102675
 ] 

Brock Noland commented on HIVE-7717:


I merged HIVE-7769 into the branch!

 Add .q tests coverage for union all [Spark Branch]
 

 Key: HIVE-7717
 URL: https://issues.apache.org/jira/browse/HIVE-7717
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Na Yang
Assignee: Na Yang
 Attachments: HIVE-7717.1-spark.patch, HIVE-7717.2-spark.patch


 Add automated test coverage for union all by searching through the 
 q-tests in ql/src/test/queries/clientpositive/ for union tests (like 
 union*.q) and verifying/enabling them on Spark.
 Steps to do:
 1.  Enable a qtest q-test-name.q in 
 itests/src/test/resources/testconfiguration.properties by adding the .q test 
 files to spark.query.files.
 2.  Run mvn test -Dtest=TestSparkCliDriver -Dqfile=q-test-name.q 
 -Dtest.output.overwrite=true -Phadoop-2 to generate the output (located in 
 ql/src/test/results/clientpositive/spark).  File will be called 
 q-test-name.q.out.
 3.  Check the generated output is good by verifying the results.  For 
 comparison, check the MR version in 
 ql/src/test/results/clientpositive/q-test-name.q.out.  The reason it's 
 separate is that the explain plan outputs are different for Spark/MR.
 4.  Check in the modification to testconfiguration.properties, and the 
 generated q.out file as well.  You only have to generate the output once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7773) Union all query finished with errors [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7773:
---

Assignee: Rui Li
  Status: Patch Available  (was: Open)

Marking Patch Available


 Union all query finished with errors [Spark Branch]
 ---

 Key: HIVE-7773
 URL: https://issues.apache.org/jira/browse/HIVE-7773
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Critical
 Attachments: HIVE-7773.2-spark.patch, HIVE-7773.spark.patch


 When I ran a union all query, I found the following error in the Spark log (the 
 query finished with correct results, though):
 {noformat}
 java.lang.RuntimeException: Map operator initialization failed
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at 
 org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
 at 
 org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input 
 path are inconsistent
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
 at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
 ... 16 more
 {noformat}
 Judging from the log, I think we don't properly handle the input paths when 
 cloning the job conf, so it may also affect other queries with multiple maps 
 or reduces.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102694#comment-14102694
 ] 

Brock Noland commented on HIVE-7781:


+1

 Enable windowing and analytic function qtests.[Spark Branch]
 

 Key: HIVE-7781
 URL: https://issues.apache.org/jira/browse/HIVE-7781
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7781.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7779) Support windowing and analytic functions [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7779:
---

Summary: Support windowing and analytic functions [Spark Branch]  (was: 
Support windowing and analytic functions.[Spark Branch])

 Support windowing and analytic functions [Spark Branch]
 ---

 Key: HIVE-7779
 URL: https://issues.apache.org/jira/browse/HIVE-7779
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li

 Verify the functionality and fix any issues found; coverage should include:
 # windowing functions
 # the OVER clause
 # analytic functions



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7781:
---

Summary: Enable windowing and analytic function qtests [Spark Branch]  
(was: Enable windowing and analytic function qtests.[Spark Branch])

 Enable windowing and analytic function qtests [Spark Branch]
 

 Key: HIVE-7781
 URL: https://issues.apache.org/jira/browse/HIVE-7781
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Attachments: HIVE-7781.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7784) Created the needed indexes on Hive.PART_COL_STATS for CBO

2014-08-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-7784:
--

Attachment: HIVE-7784.2.patch

 Created the needed indexes on Hive.PART_COL_STATS for CBO 
 --

 Key: HIVE-7784
 URL: https://issues.apache.org/jira/browse/HIVE-7784
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-7784.1.patch, HIVE-7784.2.patch


 With CBO we need the correct set of indexes to provide efficient read/write 
 access.
 These indexes improve the performance of Explain plan and Analyze table by 60% 
 and 300%, respectively.
 {code}
 MySQL 
  CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME) USING BTREE;
 MsSQL
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 Oracle 
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 Postgres
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS USING btree 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7784) Created the needed indexes on Hive.PART_COL_STATS for CBO

2014-08-19 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102698#comment-14102698
 ] 

Mostafa Mokhtar commented on HIVE-7784:
---

[~ashutoshc]

Code review link https://reviews.apache.org/r/24861/diff/#

 Created the needed indexes on Hive.PART_COL_STATS for CBO 
 --

 Key: HIVE-7784
 URL: https://issues.apache.org/jira/browse/HIVE-7784
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-7784.1.patch, HIVE-7784.2.patch


 With CBO we need the correct set of indexes to provide efficient read/write 
 access.
 These indexes improve the performance of Explain plan and Analyze table by 60% 
 and 300%, respectively.
 {code}
 MySQL 
  CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME) USING BTREE;
 MsSQL
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 Oracle 
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 Postgres
 CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS USING btree 
 (DB_NAME,TABLE_NAME,COLUMN_NAME);
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7781:
---

   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Thank you so much for your contribution! I have committed this to spark!!

 Enable windowing and analytic function qtests [Spark Branch]
 

 Key: HIVE-7781
 URL: https://issues.apache.org/jira/browse/HIVE-7781
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Fix For: spark-branch

 Attachments: HIVE-7781.1-spark.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7717) Add .q tests coverage for union all [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102697#comment-14102697
 ] 

Brock Noland commented on HIVE-7717:


Also FYI I committed HIVE-7781 so you'll need to pull the latest HEAD.

 Add .q tests coverage for union all [Spark Branch]
 

 Key: HIVE-7717
 URL: https://issues.apache.org/jira/browse/HIVE-7717
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: spark-branch
Reporter: Na Yang
Assignee: Na Yang
 Attachments: HIVE-7717.1-spark.patch, HIVE-7717.2-spark.patch


 Add automated test coverage for union all by searching through the 
 q-tests in ql/src/test/queries/clientpositive/ for union tests (like 
 union*.q) and verifying/enabling them on Spark.
 Steps to do:
 1.  Enable a qtest q-test-name.q in 
 itests/src/test/resources/testconfiguration.properties by adding the .q test 
 files to spark.query.files.
 2.  Run mvn test -Dtest=TestSparkCliDriver -Dqfile=q-test-name.q 
 -Dtest.output.overwrite=true -Phadoop-2 to generate the output (located in 
 ql/src/test/results/clientpositive/spark).  File will be called 
 q-test-name.q.out.
 3.  Check the generated output is good by verifying the results.  For 
 comparison, check the MR version in 
 ql/src/test/results/clientpositive/q-test-name.q.out.  The reason it's 
 separate is that the explain plan outputs are different for Spark/MR.
 4.  Check in the modification to testconfiguration.properties, and the 
 generated q.out file as well.  You only have to generate the output once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7728) Enable q-tests for TABLESAMPLE feature [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7728:
---

   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Thank you very much for your contribution! I have committed this to spark!

 Enable q-tests for TABLESAMPLE feature  [Spark Branch]
 --

 Key: HIVE-7728
 URL: https://issues.apache.org/jira/browse/HIVE-7728
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
 Fix For: spark-branch

 Attachments: HIVE-7728.1-spark.patch


 Enable q-tests for the TABLESAMPLE feature since the automated test environment 
 is ready.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]

2014-08-19 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102719#comment-14102719
 ] 

Brock Noland commented on HIVE-7702:


Let's try and add the following tests in this JIRA:

{noformat}
  enforce_order.q,\
  filter_join_breaktask.q,\
  filter_join_breaktask2.q,\
  groupby1.q,\
  groupby2.q,\
  groupby3.q,\
  having.q,\
  insert1.q,\
  insert_into1.q,\
  insert_into2.q,\
{noformat}

 Start running .q file tests on spark [Spark Branch]
 ---

 Key: HIVE-7702
 URL: https://issues.apache.org/jira/browse/HIVE-7702
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Brock Noland
Assignee: Chinna Rao Lalam

 Spark can currently support only a few queries; however, there are some .q 
 file tests which will pass today. The basic idea is that we should get some 
 number of these (10-20) actually working, so we can start testing the 
 project.
 A good starting point might be the udf*, varchar*, or alter* tests:
 https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive
 To generate the output file for test XXX.q, you'd do:
 {noformat}
 mvn clean install -DskipTests -Phadoop-2
 cd itests
 mvn clean install -DskipTests -Phadoop-2
 cd qtest-spark
 mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2
 {noformat}
 which would generate XXX.q.out, which we can check in to source control as a 
 golden file.
 Multiple tests can be run at a given time like so:
 {noformat}
 mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to inefficient collection used to find a matching ReadEntity

2014-08-19 Thread Mostafa Mokhtar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Mokhtar updated HIVE-7723:
--

Attachment: HIVE-7723.5.patch

 Explain plan for complex query with lots of partitions is slow due to 
 inefficient collection used to find a matching ReadEntity
 

 Key: HIVE-7723
 URL: https://issues.apache.org/jira/browse/HIVE-7723
 Project: Hive
  Issue Type: Bug
  Components: CLI, Physical Optimizer
Affects Versions: 0.13.1
Reporter: Mostafa Mokhtar
Assignee: Mostafa Mokhtar
 Fix For: 0.14.0

 Attachments: HIVE-7723.1.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, 
 HIVE-7723.4.patch, HIVE-7723.5.patch


 Explain on TPC-DS query 64 took 11 seconds; when the CLI was profiled, it 
 showed that ReadEntity.equals was taking ~40% of the CPU.
 ReadEntity.equals is called from the snippet below.
 The set is iterated over again and again to get the actual match; a HashMap 
 is a better option for this case, as Set doesn't have a get method.
 Also, for ReadEntity, equals is case-insensitive while hash is not, which is 
 an undesired behavior.
 {code}
 public static ReadEntity addInput(Set<ReadEntity> inputs, ReadEntity newInput) {
   // If the input is already present, make sure the new parent is added to the input.
   if (inputs.contains(newInput)) {
     for (ReadEntity input : inputs) {
       if (input.equals(newInput)) {
         if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) {
           input.getParents().addAll(newInput.getParents());
           input.setDirect(input.isDirect() || newInput.isDirect());
         }
         return input;
       }
     }
     assert false;
   } else {
     inputs.add(newInput);
     return newInput;
   }
   // make compile happy
   return null;
 }
 {code}
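 For illustration, a minimal sketch of the map-based alternative suggested 
 above (assuming a java.util.Map keyed by the entity itself; the method shape 
 is hypothetical, not the actual patch):
 {code}
 // Hypothetical sketch: replace the linear scan with a map lookup.
 // A Map<ReadEntity, ReadEntity> can return the canonical instance directly,
 // which a Set cannot.
 public static ReadEntity addInput(Map<ReadEntity, ReadEntity> inputs,
     ReadEntity newInput) {
   ReadEntity existing = inputs.get(newInput); // O(1) lookup instead of an O(n) scan
   if (existing == null) {
     inputs.put(newInput, newInput);
     return newInput;
   }
   if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) {
     existing.getParents().addAll(newInput.getParents());
   }
   existing.setDirect(existing.isDirect() || newInput.isDirect());
   return existing;
 }
 {code}
 Note that this only behaves correctly once equals and hashCode agree on case 
 handling, which is the other issue called out above.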
 This is the query used:
 {code}
 select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number 
 ,cs1.b_streen_name ,cs1.b_city
  ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city 
 ,cs1.c_zip ,cs1.syear ,cs1.cnt
  ,cs1.s1 ,cs1.s2 ,cs1.s3
  ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt
 from
 (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as 
 store_name
  ,s_zip as store_zip ,ad1.ca_street_number as b_street_number 
 ,ad1.ca_street_name as b_streen_name
  ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as 
 c_street_number
  ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip 
 as c_zip
  ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) 
 as cnt
  ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 
 ,sum(ss_coupon_amt) as s3
   FROM   store_sales
 JOIN store_returns ON store_sales.ss_item_sk = 
 store_returns.sr_item_sk and store_sales.ss_ticket_number = 
 store_returns.sr_ticket_number
 JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
 JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
 JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
 JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
 JOIN store ON store_sales.ss_store_sk = store.s_store_sk
 JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= 
 cd1.cd_demo_sk
 JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = 
 cd2.cd_demo_sk
 JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
 JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = 
 hd1.hd_demo_sk
 JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = 
 hd2.hd_demo_sk
 JOIN customer_address ad1 ON store_sales.ss_addr_sk = 
 ad1.ca_address_sk
 JOIN customer_address ad2 ON customer.c_current_addr_sk = 
 ad2.ca_address_sk
 JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
 JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
 JOIN item ON store_sales.ss_item_sk = item.i_item_sk
 JOIN
  (select cs_item_sk
 ,sum(cs_ext_list_price) as 
 sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund
   from catalog_sales JOIN catalog_returns
   ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
 and catalog_sales.cs_order_number = catalog_returns.cr_order_number
   group by cs_item_sk
   having 
 sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
  cs_ui
 ON store_sales.ss_item_sk = cs_ui.cs_item_sk
   WHERE  
  cd1.cd_marital_status <> cd2.cd_marital_status and
  i_color in 

[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to inefficient collection used to find a matching ReadEntity

2014-08-19 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102728#comment-14102728
 ] 

Mostafa Mokhtar commented on HIVE-7723:
---

[~gopalv]
Link to code review https://reviews.apache.org/r/24864/diff/#


[jira] [Created] (HIVE-7785) CBO: Projection Pruning needs to handle cross Joins

2014-08-19 Thread Laljo John Pullokkaran (JIRA)
Laljo John Pullokkaran created HIVE-7785:


 Summary: CBO: Projection Pruning needs to handle cross Joins
 Key: HIVE-7785
 URL: https://issues.apache.org/jira/browse/HIVE-7785
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran


Projection pruning needs to handle cross joins.
Ex: select r1.x from r1 join r2 (a join with no join condition, i.e. a cross join).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7773) Union all query finished with errors [Spark Branch]

2014-08-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102755#comment-14102755
 ] 

Hive QA commented on HIVE-7773:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662771/HIVE-7773.2-spark.patch

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5925 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union7
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union8
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/62/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/62/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-62/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662771

 Union all query finished with errors [Spark Branch]
 ---

 Key: HIVE-7773
 URL: https://issues.apache.org/jira/browse/HIVE-7773
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
Priority: Critical
 Attachments: HIVE-7773.2-spark.patch, HIVE-7773.spark.patch


 When I ran a union all query, I found the following error in the Spark log 
 (the query finished with correct results, though):
 {noformat}
 java.lang.RuntimeException: Map operator initialization failed
   at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
   at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
   at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
   at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
   at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
   at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
   at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
   at org.apache.spark.scheduler.Task.run(Task.scala:54)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
   at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
   at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
   ... 16 more
 {noformat}
 Judging from the log, I think we don't properly handle the input paths when 
 cloning the job conf, so it may also affect other queries with multiple maps 
 or reduces.
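 For context, a minimal sketch of the kind of handling implied here, assuming 
 each map work gets its own cloned JobConf and the input path must be carried 
 over explicitly (the method name and call pattern are hypothetical, not the 
 actual fix):
 {code}
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.JobConf;

 // Hypothetical sketch: when cloning a JobConf for one map work, re-set the
 // input directory on the clone so the configuration stays consistent with
 // the path the task actually reads.
 public static JobConf cloneJobConfForPath(JobConf base, Path inputPath) {
   JobConf cloned = new JobConf(base);               // copies all settings from the base conf
   FileInputFormat.setInputPaths(cloned, inputPath); // overwrites mapred.input.dir on the clone
   return cloned;
 }
 {code}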



--
This message was sent by Atlassian JIRA
(v6.2#6252)

