[jira] [Commented] (HIVE-4934) Improve documentation of OVER clause

2014-07-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080569#comment-14080569
 ] 

Lefty Leverenz commented on HIVE-4934:
--

Thanks [~lars_francke].  Was there a reason for the line break that put 
"SELECT" and "a," on separate lines?  (I removed it to match all the other 
examples, but you can restore it if it has a purpose.)

{code}
SELECT
 a,
 COUNT(b) OVER (PARTITION BY c),
 SUM(b) OVER (PARTITION BY c)
FROM T;
{code}

Also thanks for changing the formatting of code samples.

* [PARTITION BY with partitioning, ORDER BY, and window specification | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-PARTITIONBYwithpartitioning,ORDERBY,andwindowspecification]

> Improve documentation of OVER clause
> 
>
> Key: HIVE-4934
> URL: https://issues.apache.org/jira/browse/HIVE-4934
> Project: Hive
>  Issue Type: Bug
>Reporter: Lars Francke
>Assignee: Lars Francke
>Priority: Minor
>
> {code}
> CREATE TABLE test (foo INT);
> SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test;
> FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
> Only COMPLETE mode supported for NTile function
> SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test;
> ...works...
> {code}
> I'm not sure whether that is a bug or intended. Either way, the error message 
> is not helpful, as it's not documented anywhere what {{COMPLETE}} mode is. A 
> cursory glance at the code didn't help me either.
> Edit: It is not a bug, it wasn't clear to me that the OVER clause only 
> applies to the directly preceding function.
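> To make this concrete (illustrative example, not from the original report): 
> when several window functions appear, each one needs its own OVER clause:
> {code}
> SELECT foo,
>  ntile(10) OVER (PARTITION BY foo),
>  COUNT(foo) OVER (PARTITION BY foo)
> FROM test;
> {code}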



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4933) Document how aliases work with the OVER clause

2014-07-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080570#comment-14080570
 ] 

Lefty Leverenz commented on HIVE-4933:
--

Thanks [~lars_francke], this is helpful.  (I added a missing "AS" for the b_sum 
alias.)

* [PARTITION BY with partitioning, ORDER BY, and window specification | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics#LanguageManualWindowingAndAnalytics-PARTITIONBYwithpartitioning,ORDERBY,andwindowspecification]

> Document how aliases work with the OVER clause
> --
>
> Key: HIVE-4933
> URL: https://issues.apache.org/jira/browse/HIVE-4933
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Lars Francke
>Assignee: Lars Francke
>Priority: Minor
>
> {code}
> CREATE TABLE test (foo INT);
> hive> SELECT SUM(foo) AS bar OVER (PARTITION BY foo) FROM test;
> MismatchedTokenException(175!=110)
>   at 
> org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
>   at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.fromClause(HiveParser_FromClauseParser.java:1424)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.fromClause(HiveParser.java:35998)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:33974)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.regular_body(HiveParser.java:33882)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatement(HiveParser.java:33389)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:33169)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1284)
>   at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:983)
>   at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> FAILED: ParseException line 1:20 mismatched input 'OVER' expecting FROM near 
> 'bar' in from clause{code}
> The same happens without the {{AS}} but it works when leaving out the alias 
> entirely.
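> For illustration (hedged; this is how the working form is usually written): 
> the alias has to follow the window specification rather than the function:
> {code}
> SELECT SUM(foo) OVER (PARTITION BY foo) AS bar FROM test;
> {code}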





[jira] [Updated] (HIVE-7438) Counters, statistics, and metrics

2014-07-30 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7438:


Attachment: (was: hive on spark job statistic design.docx)

> Counters, statistics, and metrics
> -
>
> Key: HIVE-7438
> URL: https://issues.apache.org/jira/browse/HIVE-7438
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Attachments: hive on spark job statistic design.docx
>
>
> Hive makes use of MapReduce counters for statistics and possibly for other 
> purposes. For Hive on Spark, we should achieve the same functionality using 
> Spark's accumulators.
> Hive also traditionally collects metrics from MapReduce jobs. Spark jobs very 
> likely publish a different set of metrics, which, if made available, would 
> help users gain insight into their Spark jobs. Thus, we should obtain the 
> metrics and make them available as we do for MapReduce.
> This task therefore includes: 1. identifying Hive's existing functionality 
> w.r.t. counters, statistics, and metrics; 2. designing and implementing the 
> same functionality in Spark.
> Please refer to the design document for more information. 
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-CountersandMetrics
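> As a rough sketch of the accumulator side (Spark 1.x Java API; illustrative 
> only, not the final design from the attached document):
> {code}
> // replace a MapReduce counter with a Spark accumulator
> Accumulator<Integer> recordsIn = sc.accumulator(0); // sc: JavaSparkContext
> rdd.foreach(record -> recordsIn.add(1));            // updated on executors
> long total = recordsIn.value();                     // read on the driver
> {code}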





[jira] [Updated] (HIVE-7438) Counters, statistics, and metrics

2014-07-30 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7438:


Attachment: hive on spark job statistic design.docx

> Counters, statistics, and metrics
> -
>
> Key: HIVE-7438
> URL: https://issues.apache.org/jira/browse/HIVE-7438
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Attachments: hive on spark job statistic design.docx
>
>
> Hive makes use of MapReduce counters for statistics and possibly for other 
> purposes. For Hive on Spark, we should achieve the same functionality using 
> Spark's accumulators.
> Hive also traditionally collects metrics from MapReduce jobs. Spark jobs very 
> likely publish a different set of metrics, which, if made available, would 
> help users gain insight into their Spark jobs. Thus, we should obtain the 
> metrics and make them available as we do for MapReduce.
> This task therefore includes: 1. identifying Hive's existing functionality 
> w.r.t. counters, statistics, and metrics; 2. designing and implementing the 
> same functionality in Spark.
> Please refer to the design document for more information. 
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-CountersandMetrics





[jira] [Commented] (HIVE-7554) Parquet Hive should resolve column names in case insensitive manner

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080558#comment-14080558
 ] 

Hive QA commented on HIVE-7554:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658658/HIVE-7554.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5857 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/116/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/116/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-116/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658658

> Parquet Hive should resolve column names in case insensitive manner
> ---
>
> Key: HIVE-7554
> URL: https://issues.apache.org/jira/browse/HIVE-7554
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-7554.patch
>
>






[jira] [Assigned] (HIVE-7567) support automatic calculating reduce task number

2014-07-30 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li reassigned HIVE-7567:
---

Assignee: Chengxiang Li

> support automatic calculating reduce task number
> 
>
> Key: HIVE-7567
> URL: https://issues.apache.org/jira/browse/HIVE-7567
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>
> Hive has its own mechanism to calculate the number of reduce tasks; we need 
> to implement the equivalent for Spark jobs.
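> For reference, Hive's MapReduce-side heuristic is roughly the following 
> (sketch; the relevant config names are hive.exec.reducers.bytes.per.reducer 
> and hive.exec.reducers.max):
> {code}
> // reducers = clamp(ceil(totalInputSize / bytesPerReducer), 1, maxReducers)
> long reducers = Math.min(maxReducers,
>     Math.max(1, (totalInputSize + bytesPerReducer - 1) / bytesPerReducer));
> {code}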





[jira] [Updated] (HIVE-7438) Counters, statistics, and metrics

2014-07-30 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-7438:


Attachment: hive on spark job statistic design.docx

Add a design doc for hive on spark job statistic collection.

> Counters, statistics, and metrics
> -
>
> Key: HIVE-7438
> URL: https://issues.apache.org/jira/browse/HIVE-7438
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chengxiang Li
> Attachments: hive on spark job statistic design.docx
>
>
> Hive makes use of MapReduce counters for statistics and possibly for other 
> purposes. For Hive on Spark, we should achieve the same functionality using 
> Spark's accumulators.
> Hive also traditionally collects metrics from MapReduce jobs. Spark jobs very 
> likely publish a different set of metrics, which, if made available, would 
> help users gain insight into their Spark jobs. Thus, we should obtain the 
> metrics and make them available as we do for MapReduce.
> This task therefore includes: 1. identifying Hive's existing functionality 
> w.r.t. counters, statistics, and metrics; 2. designing and implementing the 
> same functionality in Spark.
> Please refer to the design document for more information. 
> https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark#HiveonSpark-CountersandMetrics





[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-07-30 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080527#comment-14080527
 ] 

Mithun Radhakrishnan commented on HIVE-7223:


I should have the patch up for this change tomorrow. I'll only deal with the 
Thrift/Hive changes in this JIRA. The corresponding changes to HCatClient will 
go up on a separate JIRA, so as not to clash with HIVE-7341.

> Support generic PartitionSpecs in Metastore partition-functions
> ---
>
> Key: HIVE-7223
> URL: https://issues.apache.org/jira/browse/HIVE-7223
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> Currently, the functions in the HiveMetaStore API that handle multiple 
> partitions do so using {{List<Partition>}}. E.g. 
> {code}
> public List<Partition> listPartitions(String db_name, String tbl_name, short 
> max_parts);
> public List<Partition> listPartitionsByFilter(String db_name, String 
> tbl_name, String filter, short max_parts);
> public int add_partitions(List<Partition> new_parts);
> {code}
> Partition objects are fairly heavyweight, since each Partition carries its 
> own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
> thousands of partitions take so long to have their partitions listed that the 
> client times out with default hive.metastore.client.socket.timeout. There is 
> the additional expense of serializing and deserializing metadata for large 
> sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
> should help in this regard.
> In a date-partitioned table, all sub-partitions for a particular date are 
> *likely* (though not guaranteed) to have:
> # The same base directory (e.g. {{/feeds/search/20140601/}})
> # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
> # The same SerDe/StorageHandler/IOFormat classes
> # The same Sorting/Bucketing/SkewInfo settings
> In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
> represent the partition-list (for a date) in a more condensed form: a list of 
> LighterPartition instances, all sharing a common StorageDescriptor whose 
> location points to the root directory. 
> We can go one better for the {{add_partitions()}} case: When adding all 
> partitions for a given date, the “normal” case affords us the ability to 
> specify the top-level date-directory, where sub-partitions can be inferred 
> from the HDFS directory-path.
> These extensions are hard to introduce at the metastore level, since 
> partition-functions explicitly specify {{List<Partition>}} arguments. I 
> wonder if a {{PartitionSpec}} interface might help:
> {code}
> public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ...;
> public int add_partitions( PartitionSpec new_parts ) throws … ;
> {code}
> where the PartitionSpec looks like:
> {code}
> public interface PartitionSpec {
> public List<Partition> getPartitions();
> public List<String> getPartNames();
> public Iterator<Partition> getPartitionIter();
> public Iterator<String> getPartNameIter();
> }
> {code}
> For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
> {{PartitionSpec}}, store a top-level directory, and return Partition 
> instances from sub-directory names, while storing a single StorageDescriptor 
> for all of them.
> Similarly, list_partitions() could return a {{List<PartitionSpec>}}, where 
> each PartitionSpec corresponds to a set of partitions that can share a 
> StorageDescriptor.
> By exposing iterator semantics, neither the client nor the metastore need 
> instantiate all partitions at once. That should help with memory requirements.
> In case no smart grouping is possible, we could just fall back on a 
> {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
> than the status quo.
> PartitionSpec abstracts away how a set of partitions may be represented. A 
> tighter representation allows us to communicate metadata for a larger number 
> of Partitions, with less Thrift traffic.
> Given that Thrift doesn’t support polymorphism, we’d have to implement the 
> PartitionSpec as a Thrift Union of supported implementations. (We could 
> convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
> sub-class.)
> Thoughts?
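> A rough sketch of the directory-based idea (field names hypothetical):
> {code}
> public class HDFSDirBasedPartitionSpec implements PartitionSpec {
>   private StorageDescriptor sharedSd; // one SD shared by all partitions
>   private String rootLocation;        // e.g. /feeds/search/20140601
>   private List<String> subDirNames;   // e.g. [US, UK, IN]
>
>   // Partitions are materialized lazily from the sub-directory names,
>   // each sharing sharedSd with only its location overridden.
>   public Iterator<Partition> getPartitionIter() { ... }
> }
> {code}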





[jira] [Updated] (HIVE-7330) Create SparkTask

2014-07-30 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-7330:
---

Attachment: HIVE-7330-spark.patch

> Create SparkTask
> 
>
> Key: HIVE-7330
> URL: https://issues.apache.org/jira/browse/HIVE-7330
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-7330-spark.patch
>
>
> SparkTask handles the execution of SparkWork. It will execute a graph of map 
> and reduce work using a SparkClient instance.





[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances

2014-07-30 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7341:
---

Status: Patch Available  (was: Open)

> Support for Table replication across HCatalog instances
> ---
>
> Key: HIVE-7341
> URL: https://issues.apache.org/jira/browse/HIVE-7341
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 0.13.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 0.14.0
>
> Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch
>
>
> The HCatClient currently doesn't provide much support for replicating 
> HCatTable definitions between two HCatalog server (i.e. Hive metastore) 
> instances. 
> Systems similar to Apache Falcon might need to replicate partition 
> data between two clusters, and keep the HCatalog metadata in sync between the 
> two. This poses a couple of problems:
> # The definition of the source table might change (in column schema, I/O 
> formats, record-formats, serde-parameters, etc.) The system will need a way 
> to diff 2 tables and update the target-metastore with the changes. E.g. 
> {code}
> targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
> hcatClient.updateTableSchema(dbName, tableName, targetTable);
> {code}
> # The current {{HCatClient.addPartitions()}} API requires that the 
> partition's schema be derived from the table's schema, thereby requiring that 
> the table-schema be resolved *before* partitions with the new schema are 
> added to the table. This is problematic, because it introduces race 
> conditions when 2 partitions with differing column-schemas (e.g. right after 
> a schema change) are copied in parallel. This can be avoided if each 
> HCatAddPartitionDesc kept track of the partition's schema, in flight.
> # The source and target metastores might be running different/incompatible 
> versions of Hive. 
> The impending patch attempts to address these concerns (with some caveats).
> # {{HCatTable}} now has 
> ## a {{diff()}} method, to compare against another HCatTable instance
> ## a {{resolve(diff)}} method to copy over specified table-attributes from 
> another HCatTable
> ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
> {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed 
> in other class-loaders may be used for comparison
> # {{HCatPartition}} now provides finer-grained control over a Partition's 
> column-schema, StorageDescriptor settings, etc. This allows partitions to be 
> copied completely from source, with the ability to override specific 
> properties if required (e.g. location).
> # {{HCatClient.updateTableSchema()}} can now update the entire 
> table-definition, not just the column schema.
> # I've cleaned up and removed most of the redundancy between the HCatTable, 
> HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
> separate the table-attributes from the add-table-operation's attributes. By 
> providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
> in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
> deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
> HCatAddPartitionDesc.
> I'll post a patch for trunk shortly.





[jira] [Updated] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output

2014-07-30 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-7390:
---

Attachment: HIVE-7390.5.patch

> Make quote character optional and configurable in BeeLine CSV/TSV output
> 
>
> Key: HIVE-7390
> URL: https://issues.apache.org/jira/browse/HIVE-7390
> Project: Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.13.1
>Reporter: Jim Halfpenny
>Assignee: Ferdinand Xu
> Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, 
> HIVE-7390.4.patch, HIVE-7390.5.patch, HIVE-7390.patch
>
>
> Currently, when either the CSV or TSV output format is used in Beeline, each 
> column is wrapped in single quotes. Quote-wrapping of columns should be 
> optional, and the user should be able to choose the character used to wrap 
> the columns.
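> A possible invocation once this lands (option names as used in the 
> patch/review; shown here as a hedged example, not the final interface):
> {code}
> beeline -u jdbc:hive2://localhost:10000 \
>   --outputformat=dsv --delimiterForDSV='|' \
>   -e 'SELECT * FROM t'
> {code}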





[jira] [Updated] (HIVE-7341) Support for Table replication across HCatalog instances

2014-07-30 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7341:
---

Attachment: HIVE-7341.2.patch

Improved patch, to ensure deprecated APIs still function.

{{HCatAddPartitionDesc.create(db, table, location, partKeyValMap)}} no longer 
throws an UnsupportedException.

> Support for Table replication across HCatalog instances
> ---
>
> Key: HIVE-7341
> URL: https://issues.apache.org/jira/browse/HIVE-7341
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 0.13.1
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Fix For: 0.14.0
>
> Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch
>
>
> The HCatClient currently doesn't provide much support for replicating 
> HCatTable definitions between two HCatalog server (i.e. Hive metastore) 
> instances. 
> Systems similar to Apache Falcon might need to replicate partition 
> data between two clusters, and keep the HCatalog metadata in sync between the 
> two. This poses a couple of problems:
> # The definition of the source table might change (in column schema, I/O 
> formats, record-formats, serde-parameters, etc.) The system will need a way 
> to diff 2 tables and update the target-metastore with the changes. E.g. 
> {code}
> targetTable.resolve( sourceTable, targetTable.diff(sourceTable) );
> hcatClient.updateTableSchema(dbName, tableName, targetTable);
> {code}
> # The current {{HCatClient.addPartitions()}} API requires that the 
> partition's schema be derived from the table's schema, thereby requiring that 
> the table-schema be resolved *before* partitions with the new schema are 
> added to the table. This is problematic, because it introduces race 
> conditions when 2 partitions with differing column-schemas (e.g. right after 
> a schema change) are copied in parallel. This can be avoided if each 
> HCatAddPartitionDesc kept track of the partition's schema, in flight.
> # The source and target metastores might be running different/incompatible 
> versions of Hive. 
> The impending patch attempts to address these concerns (with some caveats).
> # {{HCatTable}} now has 
> ## a {{diff()}} method, to compare against another HCatTable instance
> ## a {{resolve(diff)}} method to copy over specified table-attributes from 
> another HCatTable
> ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and 
> {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed 
> in other class-loaders may be used for comparison
> # {{HCatPartition}} now provides finer-grained control over a Partition's 
> column-schema, StorageDescriptor settings, etc. This allows partitions to be 
> copied completely from source, with the ability to override specific 
> properties if required (e.g. location).
> # {{HCatClient.updateTableSchema()}} can now update the entire 
> table-definition, not just the column schema.
> # I've cleaned up and removed most of the redundancy between the HCatTable, 
> HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to 
> separate the table-attributes from the add-table-operation's attributes. By 
> providing fluent-interfaces in HCatTable, and composing an HCatTable instance 
> in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are 
> deprecated, in favour of those in HCatTable. Likewise, HCatPartition and 
> HCatAddPartitionDesc.
> I'll post a patch for trunk shortly.





Re: Review Request 23799: HIVE-7390: refactor csv output format with in RFC mode and add one more option to support formatting as the csv format in hive cli

2014-07-30 Thread cheng xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23799/
---

(Updated July 31, 2014, 5:34 a.m.)


Review request for hive.


Changes
---

(1) fix code style issues
(2) add option to specify the delimiter for DSV format
(3) add delimiter-separated values format support


Bugs: HIVE-7390
https://issues.apache.org/jira/browse/HIVE-7390


Repository: hive-git


Description
---

HIVE-7390: refactor csv output format with in RFC mode and add one more option 
to support formatting as the csv format in hive cli


Diffs (updated)
-

  beeline/pom.xml 6ec1d1aff3f35c097aa6054aae84faf2d63854f1 
  beeline/src/java/org/apache/hive/beeline/BeeLine.java 
528a98e29c23421f9352bdf7c5edd3a9fae0e3ea 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 
75f7d38cb97fb753a8f39c19488b9ce0a8d77590 
  beeline/src/java/org/apache/hive/beeline/SeparatedValuesOutputFormat.java 
7853c3f38f3c3fb9ae0b9939c714f1dc940ba053 
  beeline/src/main/resources/BeeLine.properties 
390d062b8dc52dfa790c7351f3db44c1e0dd7e37 
  
itests/hive-unit/src/test/java/org/apache/hive/beeline/TestBeeLineWithArgs.java 
bd97aff5959fd9040fc0f0a1f6b782f2aa6f 
  pom.xml b5a5697e6a3b689c2b244ba0338be541261eaa3d 

Diff: https://reviews.apache.org/r/23799/diff/


Testing
---


Thanks,

cheng xu



[jira] [Commented] (HIVE-7348) Beeline could not parse ; separated queries provided with -e option

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080508#comment-14080508
 ] 

Hive QA commented on HIVE-7348:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658787/HIVE-7348.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5857 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.beeline.TestBeelineArgParsing.testQueryScripts
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/115/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/115/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-115/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658787

> Beeline could not parse ; separated queries provided with -e option
> ---
>
> Key: HIVE-7348
> URL: https://issues.apache.org/jira/browse/HIVE-7348
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: HIVE-7348.1.patch, HIVE-7348.patch
>
>
> Beeline could not parse semicolon-separated queries provided with the -e 
> option. This works fine in the Hive CLI.





[jira] [Updated] (HIVE-7526) Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination

2014-07-30 Thread Chao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao updated HIVE-7526:
---

Attachment: HIVE-7526.4-spark.patch

Hi [~xuefuz], I took your suggestions and proposed another patch. Please take 
a look. Thanks.

> Research to use groupby transformation to replace Hive existing 
> partitionByKey and SparkCollector combination
> -
>
> Key: HIVE-7526
> URL: https://issues.apache.org/jira/browse/HIVE-7526
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chao
> Attachments: HIVE-7526.2.patch, HIVE-7526.3.patch, 
> HIVE-7526.4-spark.patch, HIVE-7526.patch
>
>
> Currently SparkClient shuffles data by calling partitionByKey(). This 
> transformation outputs <key, value> tuples. However, Hive's ExecMapper 
> expects <key, iterator<value>> tuples, and Spark's groupByKey() seems to 
> output this directly. Thus, by using groupByKey, we may be able to avoid 
> Hive's own key-clustering mechanism (in HiveReduceFunction). This research 
> task is to try that out.
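> The tuple shapes in question, sketched with the Spark 1.x Java API 
> (illustrative):
> {code}
> // shuffle via partitionByKey: <key, value> pairs, same keys co-located but
> // not grouped, so HiveReduceFunction must cluster them itself;
> // groupByKey yields <key, iterable<value>> directly:
> JavaPairRDD<BytesWritable, Iterable<BytesWritable>> grouped = rdd.groupByKey();
> {code}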





[jira] [Commented] (HIVE-7532) allow disabling direct sql per query with external metastore

2014-07-30 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080468#comment-14080468
 ] 

Navis commented on HIVE-7532:
-

It's applied per session. The HiveConf in ObjectStore is the one held in 
HMSHandler's thread-local.

> allow disabling direct sql per query with external metastore
> 
>
> Key: HIVE-7532
> URL: https://issues.apache.org/jira/browse/HIVE-7532
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Navis
> Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.nogen, 
> HIVE-7532.2.patch.txt
>
>
> Currently with external metastore, direct sql can only be disabled via 
> metastore config globally. Perhaps it makes sense to have the ability to 
> propagate the setting per query from client to override the metastore 
> setting, e.g. if one particular query causes it to fail.
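> With such a change, the override would presumably look like this (config 
> name taken from the existing global setting):
> {code}
> -- per-query override of the metastore's global direct-SQL setting
> set hive.metastore.try.direct.sql=false;
> {code}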





[jira] [Commented] (HIVE-7562) Cleanup ExecReducer

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080466#comment-14080466
 ] 

Hive QA commented on HIVE-7562:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658783/HIVE-7562.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5857 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/114/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/114/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-114/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658783

> Cleanup ExecReducer
> ---
>
> Key: HIVE-7562
> URL: https://issues.apache.org/jira/browse/HIVE-7562
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-7562.patch
>
>
> ExecReducer places member variables at random with random visibility.





[jira] [Created] (HIVE-7567) support automatic calculating reduce task number

2014-07-30 Thread Chengxiang Li (JIRA)
Chengxiang Li created HIVE-7567:
---

 Summary: support automatic calculating reduce task number
 Key: HIVE-7567
 URL: https://issues.apache.org/jira/browse/HIVE-7567
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Chengxiang Li


Hive has its own mechanism to calculate the reduce task number; we need to 
implement the equivalent for Spark jobs.
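As a rough illustration of the kind of mechanism involved, here is a hedged Python sketch of a MapReduce-style reducer estimate. The function name and default values are assumptions for illustration only, mirroring the spirit of hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max, not the actual Hive or Spark implementation.

```python
def estimate_reducers(total_input_bytes, bytes_per_reducer=256 * 1024 * 1024,
                      max_reducers=1009):
    """Illustrative reducer-count heuristic: one reducer per chunk of input,
    clamped between 1 and a configured maximum."""
    reducers = -(-total_input_bytes // bytes_per_reducer)  # ceiling division
    return max(1, min(reducers, max_reducers))

print(estimate_reducers(3 * 1024 ** 3))  # 3 GB at 256 MB per reducer -> 12
```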





[jira] [Updated] (HIVE-7432) Remove deprecated Avro's Schema.parse usages

2014-07-30 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-7432:
-

Attachment: HIVE-7432.1.patch

Parser.parse maintains state and cannot be reused. Added a util method to take 
care of creating an Avro schema from a string, file, or InputStream. This should 
take care of the test failures.

> Remove deprecated Avro's Schema.parse usages
> 
>
> Key: HIVE-7432
> URL: https://issues.apache.org/jira/browse/HIVE-7432
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: HIVE-7432.1.patch, HIVE-7432.patch
>
>
> Schema.parse has been deprecated by Avro, however it is being used at 
> multiple places in Hive.





Re: Review Request 24081: HIVE-7432: Remove deprecated Avro's Schema.parse usages

2014-07-30 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24081/
---

(Updated July 31, 2014, 3:49 a.m.)


Review request for hive.


Changes
---

Parser.parse maintains state and cannot be reused. Added a util method to take 
care of creating an Avro schema from a string, file, or InputStream.


Bugs: HIVE-7432
https://issues.apache.org/jira/browse/HIVE-7432


Repository: hive-git


Description
---

HIVE-7432: Remove deprecated Avro's Schema.parse usages


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java 
60b43888b957fe315720c4ee5562b9b67a07d0e2 
  
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
 b55474331736ecbdeb5958dad9342e132642d889 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 
8c5cf3e87078fd87d0dc9b41d9545486d76903f3 
  
serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaResolutionProblem.java 
3dceb6384000e255e87df832f6189c80c636531b 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 
915f01679183904d0d93b9b8a88dc1a64ac2af78 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java 
198bd24dcb1c2552fd45b919ecb39ef7a29ed321 
  
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java
 76c1940fb05a0c8c6b74d570d6d788829e17de01 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerde.java 
072225dcc80bfdb84f0a31f67693616393c264df 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 
67d557082eec88eefdde76cb1fead6d51f7784a4 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java 
f8161da44312c2ad9b4dd2bab2aa242692a42d5a 
  
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java
 cf3b16ce65d07c0a714530ef4a26adef7188ea2e 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 
8dd61097433ab0c2b1c3e326978bf06337f815e6 
  
serde/src/test/org/apache/hadoop/hive/serde2/avro/TestThatEvolvedSchemasActAsWeWant.java
 4b8cc98bfc75ea01f25944d7833f00da1b6911f0 

Diff: https://reviews.apache.org/r/24081/diff/


Testing
---

qTests


Thanks,

Ashish Singh



[jira] [Updated] (HIVE-7565) Fix exception in Greedy Join reordering Algo

2014-07-30 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7565:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch. Thanks [~jpullokkaran]!

> Fix exception in Greedy Join reordering Algo
> 
>
> Key: HIVE-7565
> URL: https://issues.apache.org/jira/browse/HIVE-7565
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7565.patch
>
>






[jira] [Commented] (HIVE-7436) Load Spark configuration into Hive driver

2014-07-30 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080441#comment-14080441
 ] 

Chengxiang Li commented on HIVE-7436:
-

In Hive on Tez mode, the Hive driver loads tez-site.xml through TezConfiguration, 
which relies on Configuration.addDefaultResource().

> Load Spark configuration into Hive driver
> -
>
> Key: HIVE-7436
> URL: https://issues.apache.org/jira/browse/HIVE-7436
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
> Fix For: spark-branch
>
> Attachments: HIVE-7436-Spark.1.patch, HIVE-7436-Spark.2.patch, 
> HIVE-7436-Spark.3.patch
>
>
> Load Spark configuration into the Hive driver. There are three ways to set up 
> Spark configuration:
> # Java system properties.
> # Properties in the Spark configuration file (spark-defaults.conf).
> # The Hive configuration file (hive-site.xml).
> Configurations listed later have higher priority and overwrite earlier 
> configurations with the same property name.
> Please refer to [http://spark.apache.org/docs/latest/configuration.html] for 
> all configurable Spark properties. You can configure Spark in Hive in the 
> following ways:
> # Configure through the Spark configuration file.
> #* Create spark-defaults.conf and place it in the /etc/spark/conf 
> configuration directory. Configure properties in spark-defaults.conf in Java 
> properties format.
> #* Create the $SPARK_CONF_DIR environment variable and set it to the location 
> of spark-defaults.conf.
> export SPARK_CONF_DIR=/etc/spark/conf
> #* Add $SPARK_CONF_DIR to the $HADOOP_CLASSPATH environment variable.
> export HADOOP_CLASSPATH=$SPARK_CONF_DIR:$HADOOP_CLASSPATH
> # Configure through the Hive configuration file.
> #* Edit hive-site.xml in the Hive conf directory and configure the same 
> properties in XML format.
> Default Spark properties set by the Hive driver:
> ||name||default value||description||
> |spark.master|local|Spark master URL.|
> |spark.app.name|Hive on Spark|Default Spark application name.|
> NO PRECOMMIT TESTS. This is for spark-branch only.
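The precedence described above can be sketched as a simple layered merge. The property values below are made-up examples, not actual defaults; only the layering order follows the description.

```python
# Layers in increasing priority, per the description above:
# Java properties < spark-defaults.conf < hive-site.xml.
java_properties = {"spark.master": "local"}
spark_defaults = {"spark.master": "local[4]", "spark.app.name": "Hive on Spark"}
hive_site = {"spark.master": "yarn-cluster"}  # assumed example override

resolved = {}
for layer in (java_properties, spark_defaults, hive_site):
    resolved.update(layer)  # a later layer overwrites same-named properties

print(resolved["spark.master"])  # yarn-cluster
```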





[jira] [Commented] (HIVE-7532) allow disabling direct sql per query with external metastore

2014-07-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080426#comment-14080426
 ] 

Sergey Shelukhin commented on HIVE-7532:


Will this change the configuration for the query/session, or for the entire 
metastore or thread, including other users? I thought it should be possible to 
send it for individual calls to the metastore, which is less clean... but it 
seems like this patch will reconfigure the metastore.

> allow disabling direct sql per query with external metastore
> 
>
> Key: HIVE-7532
> URL: https://issues.apache.org/jira/browse/HIVE-7532
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Navis
> Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.nogen, 
> HIVE-7532.2.patch.txt
>
>
> Currently with external metastore, direct sql can only be disabled via 
> metastore config globally. Perhaps it makes sense to have the ability to 
> propagate the setting per query from client to override the metastore 
> setting, e.g. if one particular query causes it to fail.





[jira] [Commented] (HIVE-7547) Add ipAddress and userName to ExecHook

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080424#comment-14080424
 ] 

Hive QA commented on HIVE-7547:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658784/HIVE-7547.4.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5859 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/113/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/113/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-113/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658784

> Add ipAddress and userName to ExecHook
> --
>
> Key: HIVE-7547
> URL: https://issues.apache.org/jira/browse/HIVE-7547
> Project: Hive
>  Issue Type: New Feature
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-7547.2.patch, HIVE-7547.3.patch, HIVE-7547.4.patch, 
> HIVE-7547.patch
>
>
> Auditing tools should be able to know about the ipAddress and userName of the 
> user executing operations.  
> These could be made available through the Hive execution-hooks.





[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output

2014-07-30 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080407#comment-14080407
 ] 

Ferdinand Xu commented on HIVE-7390:


Thanks to Lars Francke and Szehon Ho for your comments. For the current CSV and 
TSV formats, we should just make them work the right way (quoting at the correct 
time). For custom delimiter support, I think we can add a new output format 
called DSV (short for delimiter-separated values) and one BeeLine option to let 
the user specify the delimiter.
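As a sketch of what such a DSV writer with minimal quoting could look like — a hypothetical helper built on Python's csv module, not BeeLine's actual implementation:

```python
import csv
import io

def write_dsv(rows, delimiter="|", quotechar='"'):
    """Write rows as delimiter-separated values, quoting only when needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, quotechar=quotechar,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

# Only the field that contains the delimiter gets quoted.
out = write_dsv([["id", "note"], [1, "plain"], [2, "has|pipe"]])
print(out)
```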

> Make quote character optional and configurable in BeeLine CSV/TSV output
> 
>
> Key: HIVE-7390
> URL: https://issues.apache.org/jira/browse/HIVE-7390
> Project: Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.13.1
>Reporter: Jim Halfpenny
>Assignee: Ferdinand Xu
> Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, 
> HIVE-7390.4.patch, HIVE-7390.patch
>
>
> Currently when either the CSV or TSV output formats are used in beeline each 
> column is wrapped in single quotes. Quote wrapping of columns should be 
> optional and the user should be able to choose the character used to wrap the 
> columns.





[jira] [Updated] (HIVE-7565) Fix exception in Greedy Join reordering Algo

2014-07-30 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7565:
-

Status: Patch Available  (was: Open)

> Fix exception in Greedy Join reordering Algo
> 
>
> Key: HIVE-7565
> URL: https://issues.apache.org/jira/browse/HIVE-7565
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7565.patch
>
>






[jira] [Updated] (HIVE-7565) Fix exception in Greedy Join reordering Algo

2014-07-30 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-7565:
-

Attachment: HIVE-7565.patch

> Fix exception in Greedy Join reordering Algo
> 
>
> Key: HIVE-7565
> URL: https://issues.apache.org/jira/browse/HIVE-7565
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Laljo John Pullokkaran
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-7565.patch
>
>






[jira] [Created] (HIVE-7566) HIVE can't count hbase NULL column value properly

2014-07-30 Thread Kent Kong (JIRA)
Kent Kong created HIVE-7566:
---

 Summary: HIVE can't count hbase NULL column value properly
 Key: HIVE-7566
 URL: https://issues.apache.org/jira/browse/HIVE-7566
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.13.0
 Environment: HIVE version 0.13.0
HBase version 0.98.0
Reporter: Kent Kong


The HBase table structure is like this:
table name : 'testtable'
column family : 'data'
column 1 : 'name'
column 2 : 'color'

The Hive mapping table structure is like this:
table name : 'hb_testtable'
column 1 : 'name'
column 2 : 'color'

In HBase, put two rows:
James, blue
May

Then run a select in Hive:
select * from hb_testtable where color is null

The result is:
May, NULL

Then try a count:
select count(*) from hb_testtable where color is null

The result is 0, but it should be 1.
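The expected semantics can be sketched in plain Python, with made-up rows standing in for the HBase data: the count should agree with the rows the select returns.

```python
# Rows as Hive should see them; None stands in for the missing HBase column.
rows = [
    {"name": "James", "color": "blue"},
    {"name": "May", "color": None},
]

# select * from hb_testtable where color is null
matching = [r for r in rows if r["color"] is None]

# select count(*) ... should agree with the select above.
print(len(matching))  # 1
```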






[jira] [Created] (HIVE-7565) Fix exception in Greedy Join reordering Algo

2014-07-30 Thread Laljo John Pullokkaran (JIRA)
Laljo John Pullokkaran created HIVE-7565:


 Summary: Fix exception in Greedy Join reordering Algo
 Key: HIVE-7565
 URL: https://issues.apache.org/jira/browse/HIVE-7565
 Project: Hive
  Issue Type: Sub-task
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran








[jira] [Commented] (HIVE-6437) DefaultHiveAuthorizationProvider should not initialize a new HiveConf

2014-07-30 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080395#comment-14080395
 ] 

Navis commented on HIVE-6437:
-

[~thejas] Updated the patch, thanks.

> DefaultHiveAuthorizationProvider should not initialize a new HiveConf
> -
>
> Key: HIVE-6437
> URL: https://issues.apache.org/jira/browse/HIVE-6437
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.13.0
>Reporter: Harsh J
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-6437.1.patch.txt, HIVE-6437.2.patch.txt, 
> HIVE-6437.3.patch.txt, HIVE-6437.4.patch.txt, HIVE-6437.5.patch.txt, 
> HIVE-6437.6.patch.txt, HIVE-6437.7.patch.txt
>
>
> During a HS2 connection, every SessionState initializes a new 
> DefaultHiveAuthorizationProvider object (on stock configs).
> In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that 
> may prove too expensive and unnecessary, since SessionState itself 
> sends in a fully applied HiveConf to it in the first place.
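The cost pattern being described — constructing a fresh config per session versus reusing the caller's — can be sketched like this (the class and names are hypothetical, not Hive's actual API):

```python
# Hypothetical sketch: reuse the caller's fully applied configuration
# instead of building an expensive fresh one inside the provider.
class AuthorizationProvider:
    def __init__(self, conf):
        # Keep a reference; do NOT rebuild the configuration from scratch.
        self.conf = conf

shared_conf = {"hive.security.authorization.enabled": "true"}
provider = AuthorizationProvider(shared_conf)
print(provider.conf is shared_conf)  # True: no duplicate config construction
```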





[jira] [Updated] (HIVE-6437) DefaultHiveAuthorizationProvider should not initialize a new HiveConf

2014-07-30 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-6437:


Attachment: HIVE-6437.7.patch.txt

> DefaultHiveAuthorizationProvider should not initialize a new HiveConf
> -
>
> Key: HIVE-6437
> URL: https://issues.apache.org/jira/browse/HIVE-6437
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Affects Versions: 0.13.0
>Reporter: Harsh J
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-6437.1.patch.txt, HIVE-6437.2.patch.txt, 
> HIVE-6437.3.patch.txt, HIVE-6437.4.patch.txt, HIVE-6437.5.patch.txt, 
> HIVE-6437.6.patch.txt, HIVE-6437.7.patch.txt
>
>
> During a HS2 connection, every SessionState initializes a new 
> DefaultHiveAuthorizationProvider object (on stock configs).
> In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that 
> may prove too expensive and unnecessary, since SessionState itself 
> sends in a fully applied HiveConf to it in the first place.





Re: Review Request 24043: DefaultHiveAuthorizationProvider should not initialize a new HiveConf

2014-07-30 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24043/
---

(Updated July 31, 2014, 2:05 a.m.)


Review request for hive.


Changes
---

Addressed comments


Bugs: HIVE-6437
https://issues.apache.org/jira/browse/HIVE-6437


Repository: hive-git


Description
---

During a HS2 connection, every SessionState initializes a new 
DefaultHiveAuthorizationProvider object (on stock configs).

In turn, DefaultHiveAuthorizationProvider carries a {{new HiveConf(…)}} that 
may prove too expensive and unnecessary, since SessionState itself sends 
in a fully applied HiveConf to it in the first place.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
  
contrib/src/java/org/apache/hadoop/hive/contrib/metastore/hooks/TestURLHook.java
 39562ea 
  contrib/src/test/queries/clientnegative/url_hook.q c346432 
  contrib/src/test/queries/clientpositive/url_hook.q PRE-CREATION 
  contrib/src/test/results/clientnegative/url_hook.q.out 601fd93 
  contrib/src/test/results/clientpositive/url_hook.q.out PRE-CREATION 
  data/conf/hive-site.xml fe8080a 
  itests/hive-unit/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java 
e8d405d 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetastoreVersion.java
 0bb022e 
  itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java 2fefa06 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
5cc1cd8 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 
d26183b 
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
5add436 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStoreProxy.java 
1cf09d4 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 81323f6 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/DefaultHiveAuthorizationProvider.java
 2fa512c 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/StorageBasedAuthorizationProvider.java
 0dfd997 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/HiveRoleGrant.java
 ce07f32 
  
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/plugin/sqlstd/SQLStdHiveAccessController.java
 ce12edb 
  ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java d218271 
  ql/src/test/queries/clientnegative/authorization_cannot_create_all_role.q 
de91e91 
  ql/src/test/queries/clientnegative/authorization_cannot_create_default_role.q 
42a42f6 
  ql/src/test/queries/clientnegative/authorization_cannot_create_none_role.q 
0d14cde 
  ql/src/test/queries/clientnegative/authorization_caseinsensitivity.q d5ea284 
  ql/src/test/queries/clientnegative/authorization_drop_db_cascade.q edeae9b 
  ql/src/test/queries/clientnegative/authorization_drop_db_empty.q 46d4d0f 
  ql/src/test/queries/clientnegative/authorization_drop_role_no_admin.q a7aa17f 
  ql/src/test/queries/clientnegative/authorization_priv_current_role_neg.q 
463358a 
  ql/src/test/queries/clientnegative/authorization_role_cycles1.q a819d20 
  ql/src/test/queries/clientnegative/authorization_role_cycles2.q 423f030 
  ql/src/test/queries/clientnegative/authorization_role_grant.q c5c500a 
  ql/src/test/queries/clientnegative/authorization_role_grant2.q 7fdf157 
  ql/src/test/queries/clientnegative/authorization_role_grant_nosuchrole.q 
f456165 
  ql/src/test/queries/clientnegative/authorization_role_grant_otherrole.q 
f91abdb 
  ql/src/test/queries/clientnegative/authorization_role_grant_otheruser.q 
a530043 
  ql/src/test/queries/clientnegative/authorization_rolehierarchy_privs.q 
d9f4c7c 
  ql/src/test/queries/clientnegative/authorization_set_role_neg2.q 03f748f 
  ql/src/test/queries/clientnegative/authorization_show_grant_otherrole.q 
a709d16 
  ql/src/test/queries/clientnegative/authorization_show_grant_otheruser_all.q 
2073cda 
  
ql/src/test/queries/clientnegative/authorization_show_grant_otheruser_alltabs.q 
672b81b 
  ql/src/test/queries/clientnegative/authorization_show_grant_otheruser_wtab.q 
7d95a9d 
  ql/src/test/queries/clientpositive/authorization_1_sql_std.q 381937c 
  ql/src/test/queries/clientpositive/authorization_admin_almighty1.q 45c4a7d 
  ql/src/test/queries/clientpositive/authorization_admin_almighty2.q ce99670 
  ql/src/test/queries/clientpositive/authorization_create_func1.q 65a7b33 
  ql/src/test/queries/clientpositive/authorization_create_macro1.q fb60500 
  ql/src/test/queries/clientpositive/authorization_insert.q 6cce469 
  ql/src/test/queries/clientpositive/authorization_owner_actions_db.q 36ab260 
  ql/src/test/queries/clientpositive/authorization_role_grant1.q c062ef2 
  ql/src/test/queries/clientpositive/authorization_role_grant2.q 34e19a2 
  ql/src/test/queries/clientpositive/authorization_set_show_current_role.q 
6b5af6e 
  ql/src/test/queries/clientpositive/authorization_show_g

[jira] [Commented] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080394#comment-14080394
 ] 

Hive QA commented on HIVE-7096:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658805/HIVE-7096.5.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/112/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/112/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-112/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-112/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'itests/qtest/testconfiguration.properties'
Reverted 'ql/src/test/results/clientpositive/vectorization_9.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorization_14.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorization_16.q.out'
Reverted 'ql/src/test/results/clientpositive/tez/vectorization_15.q.out'
Reverted 'ql/src/test/results/clientpositive/vectorization_15.q.out'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/optimizer/physical/TestVectorizer.java'
Reverted 'ql/src/test/queries/clientpositive/vectorization_15.q'
Reverted 'ql/src/test/queries/clientpositive/vectorization_9.q'
Reverted 'ql/src/test/queries/clientpositive/vectorization_14.q'
Reverted 'ql/src/test/queries/clientpositive/vectorization_16.q'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceWork.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/plan/BaseWork.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordProcessor.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target 
common/target common/src/gen contrib/target service/target serde/target 
beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target 
ql/src/test/results/clientpositive/tez/vectorized_shufflejoin.q.out 
ql/src/test/results/clientpositive/tez/vectorization_9.q.out 
ql/src/test/results/clientpositive/tez/vectorized_timestamp_funcs.q.out 
ql/src/test/results/clientpositive/tez/vectorization_13.q.out 
ql/src/test/results/clientpositive/tez/vectorization_part_project.q.out 
ql/src/test/results/clientpositive/tez/vectorized_nested_mapjoin.q.out 
ql/src/test/results/clientpositive/tez/vectorization_short_regress.q.out 
ql/src/test/results/clientpositive/tez/vectorization_12.q.out 
ql

[jira] [Commented] (HIVE-7029) Vectorize ReduceWork

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080392#comment-14080392
 ] 

Hive QA commented on HIVE-7029:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658713/HIVE-7029.8.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5834 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/111/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/111/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-111/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658713

> Vectorize ReduceWork
> 
>
> Key: HIVE-7029
> URL: https://issues.apache.org/jira/browse/HIVE-7029
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
> Attachments: HIVE-7029.1.patch, HIVE-7029.2.patch, HIVE-7029.3.patch, 
> HIVE-7029.4.patch, HIVE-7029.5.patch, HIVE-7029.6.patch, HIVE-7029.7.patch, 
> HIVE-7029.8.patch
>
>
> This will enable the vectorization team to work independently on reduce-side 
> vectorization even before vectorized shuffle is ready.
> NOTE: Tez only (i.e. TezTask only)





[jira] [Updated] (HIVE-7420) Parameterize tests for HCatalog Pig interfaces for testing against all storage formats

2014-07-30 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-7420:
-

Attachment: (was: HIVE-7420.3.patch)

> Parameterize tests for HCatalog Pig interfaces for testing against all 
> storage formats
> --
>
> Key: HIVE-7420
> URL: https://issues.apache.org/jira/browse/HIVE-7420
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Reporter: David Chen
>Assignee: David Chen
> Attachments: HIVE-7420-without-HIVE-7457.2.patch, 
> HIVE-7420-without-HIVE-7457.3.patch, HIVE-7420.1.patch, HIVE-7420.2.patch, 
> HIVE-7420.3.patch
>
>
> Currently, HCatalog tests run only against RCFile, with a few also testing 
> ORC. The tests should cover other Hive storage formats as well.
> HIVE-7286 turns HCatMapReduceTest into a test fixture that can be run with 
> all Hive storage formats; with that patch, all test suites built on 
> HCatMapReduceTest run and pass against SequenceFile, Text, and 
> ORC in addition to RCFile.
> Similar changes should be made to make the tests for HCatLoader and 
> HCatStorer generic so that they can be run against all Hive storage formats.





[jira] [Commented] (HIVE-7532) allow disabling direct sql per query with external metastore

2014-07-30 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080384#comment-14080384
 ] 

Navis commented on HIVE-7532:
-

Attached the patch without generated sources and RB link.

> allow disabling direct sql per query with external metastore
> 
>
> Key: HIVE-7532
> URL: https://issues.apache.org/jira/browse/HIVE-7532
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Navis
> Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.nogen, 
> HIVE-7532.2.patch.txt
>
>
> Currently with external metastore, direct sql can only be disabled via 
> metastore config globally. Perhaps it makes sense to have the ability to 
> propagate the setting per query from client to override the metastore 
> setting, e.g. if one particular query causes it to fail.





[jira] [Updated] (HIVE-7420) Parameterize tests for HCatalog Pig interfaces for testing against all storage formats

2014-07-30 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-7420:
-

Attachment: HIVE-7420.3.patch
HIVE-7420-without-HIVE-7457.3.patch

> Parameterize tests for HCatalog Pig interfaces for testing against all 
> storage formats
> --
>
> Key: HIVE-7420
> URL: https://issues.apache.org/jira/browse/HIVE-7420
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Reporter: David Chen
>Assignee: David Chen
> Attachments: HIVE-7420-without-HIVE-7457.2.patch, 
> HIVE-7420-without-HIVE-7457.3.patch, HIVE-7420.1.patch, HIVE-7420.2.patch, 
> HIVE-7420.3.patch
>
>
> Currently, HCatalog tests run only against RCFile, with a few testing against 
> ORC. The tests should cover other Hive storage formats as well.
> HIVE-7286 turns HCatMapReduceTest into a test fixture that can be run with 
> all Hive storage formats, and with that patch, all test suites built on 
> HCatMapReduceTest run and pass against Sequence File, Text, and 
> ORC in addition to RCFile.
> Similar changes should be made to make the tests for HCatLoader and 
> HCatStorer generic so that they can be run against all Hive storage formats.





[jira] [Updated] (HIVE-7420) Parameterize tests for HCatalog Pig interfaces for testing against all storage formats

2014-07-30 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-7420:
-

Attachment: (was: HIVE-7420-without-HIVE-7457.3.patch)

> Parameterize tests for HCatalog Pig interfaces for testing against all 
> storage formats
> --
>
> Key: HIVE-7420
> URL: https://issues.apache.org/jira/browse/HIVE-7420
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Reporter: David Chen
>Assignee: David Chen
> Attachments: HIVE-7420-without-HIVE-7457.2.patch, 
> HIVE-7420-without-HIVE-7457.3.patch, HIVE-7420.1.patch, HIVE-7420.2.patch, 
> HIVE-7420.3.patch
>
>
> Currently, HCatalog tests run only against RCFile, with a few testing against 
> ORC. The tests should cover other Hive storage formats as well.
> HIVE-7286 turns HCatMapReduceTest into a test fixture that can be run with 
> all Hive storage formats, and with that patch, all test suites built on 
> HCatMapReduceTest run and pass against Sequence File, Text, and 
> ORC in addition to RCFile.
> Similar changes should be made to make the tests for HCatLoader and 
> HCatStorer generic so that they can be run against all Hive storage formats.





Re: Review Request 23797: HIVE-7420: Parameterize tests for HCatalog Pig interfaces for testing against all storage formats.

2014-07-30 Thread David Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23797/
---

(Updated July 31, 2014, 1:38 a.m.)


Review request for hive.


Bugs: HIVE-7420
https://issues.apache.org/jira/browse/HIVE-7420


Repository: hive-git


Description
---

HIVE-7420: Parameterize tests for HCatalog Pig interfaces for testing against 
all storage formats.


HIVE-7457: Minor HCatalog Pig Adapter test clean up.


Diffs (updated)
-

  hcatalog/hcatalog-pig-adapter/pom.xml 
4d2ca519d413b7de0a6a8b50f9a099c3539fc432 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/MockLoader.java
 c87b95a00af03d2531eb8bbdda4e307c3aac1fe2 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestE2EScenarios.java
 a4b55c8463b3563f1e602ae2d0809dd318bcfa7f 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java
 82fc8a9391667138780be8796931793661f61ebb 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoaderComplexSchema.java
 eadbf20afc525dd9f33e9e7fb2a5d5cb89907d7e 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorer.java
 fcfc6428e7db80b8bfe0ce10e37d7b0ee6e58e20 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorerMulti.java
 76080f7635548ed9af114c823180d8da9ea8f6c2 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatStorerWrapper.java
 7f0bca763eb07db3822c6d6028357e81278803c9 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatLoader.java
 82eb0d72b4f885184c094113f775415c06bdce98 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatLoaderComplexSchema.java
 05387711289279cab743f51aee791069609b904a 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatPigStorer.java
 a9b452101c15fb7a3f0d8d0339f7d0ad97383441 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestOrcHCatStorer.java
 1084092828a9ac5e37f5b50b9c6bbd03f70b48fd 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestPigHCatUtil.java
 a8ce61aaad42b03e4de346530d0724f3d69776b9 
  ql/src/test/org/apache/hadoop/hive/ql/io/StorageFormats.java 
19fdeb5ed3dba7a3bcba71fb285d92d3f6aabea9 

Diff: https://reviews.apache.org/r/23797/diff/


Testing
---


Thanks,

David Chen



[jira] [Updated] (HIVE-7532) allow disabling direct sql per query with external metastore

2014-07-30 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7532:


Attachment: HIVE-7532.2.nogen

> allow disabling direct sql per query with external metastore
> 
>
> Key: HIVE-7532
> URL: https://issues.apache.org/jira/browse/HIVE-7532
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Navis
> Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.nogen, 
> HIVE-7532.2.patch.txt
>
>
> Currently with external metastore, direct sql can only be disabled via 
> metastore config globally. Perhaps it makes sense to have the ability to 
> propagate the setting per query from client to override the metastore 
> setting, e.g. if one particular query causes it to fail.





[jira] [Updated] (HIVE-7457) Minor HCatalog Pig Adapter test clean up

2014-07-30 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-7457:
-

Attachment: HIVE-7457.4.patch

Attached a new patch rebased on trunk.

> Minor HCatalog Pig Adapter test clean up
> 
>
> Key: HIVE-7457
> URL: https://issues.apache.org/jira/browse/HIVE-7457
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Chen
>Assignee: David Chen
>Priority: Minor
> Attachments: HIVE-7457.1.patch, HIVE-7457.2.patch, HIVE-7457.3.patch, 
> HIVE-7457.4.patch
>
>
> Minor cleanup to the HCatalog Pig Adapter tests in preparation for HIVE-7420:
>  * Run through Hive Eclipse formatter.
>  * Convert JUnit 3-style tests to follow JUnit 4 conventions.





Review Request 24137: allow disabling direct sql per query with external metastore

2014-07-30 Thread Navis Ryu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24137/
---

Review request for hive.


Bugs: HIVE-7532
https://issues.apache.org/jira/browse/HIVE-7532


Repository: hive-git


Description
---

Currently, with an external metastore, direct SQL can only be disabled 
globally via the metastore config. It may make sense to be able to propagate 
the setting per query from the client to override the metastore setting, e.g. 
if one particular query causes direct SQL to fail.
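Assuming the patch takes the session-variable route (the {{SystemVariables}} change in the diff below suggests a metaconf: prefix; the exact syntax is an inference here, not confirmed), the per-query override might look like:

{code}
-- Hypothetical client-side override: disable metastore direct SQL for
-- this session only, instead of changing the global metastore config.
set metaconf:hive.metastore.try.direct.sql=false;
{code}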


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
  common/src/java/org/apache/hadoop/hive/conf/SystemVariables.java ee98d17 
  
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestMetaStoreEventListener.java
 9e416b5 
  metastore/if/hive_metastore.thrift 55f41db 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
5cc1cd8 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 
d26183b 
  metastore/src/java/org/apache/hadoop/hive/metastore/IHMSHandler.java 1675751 
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
5add436 
  
metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java 
c28c46a 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStoreProxy.java 
1cf09d4 
  metastore/src/java/org/apache/hadoop/hive/metastore/RetryingHMSHandler.java 
86172b9 
  
metastore/src/java/org/apache/hadoop/hive/metastore/events/ConfigChangeEvent.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/24137/diff/


Testing
---


Thanks,

Navis Ryu



[jira] [Updated] (HIVE-7420) Parameterize tests for HCatalog Pig interfaces for testing against all storage formats

2014-07-30 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-7420:
-

Attachment: HIVE-7420-without-HIVE-7457.3.patch
HIVE-7420.3.patch

Attached a new patch rebased on trunk.

> Parameterize tests for HCatalog Pig interfaces for testing against all 
> storage formats
> --
>
> Key: HIVE-7420
> URL: https://issues.apache.org/jira/browse/HIVE-7420
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Reporter: David Chen
>Assignee: David Chen
> Attachments: HIVE-7420-without-HIVE-7457.2.patch, 
> HIVE-7420-without-HIVE-7457.3.patch, HIVE-7420.1.patch, HIVE-7420.2.patch, 
> HIVE-7420.3.patch
>
>
> Currently, HCatalog tests run only against RCFile, with a few testing against 
> ORC. The tests should cover other Hive storage formats as well.
> HIVE-7286 turns HCatMapReduceTest into a test fixture that can be run with 
> all Hive storage formats, and with that patch, all test suites built on 
> HCatMapReduceTest run and pass against Sequence File, Text, and 
> ORC in addition to RCFile.
> Similar changes should be made to make the tests for HCatLoader and 
> HCatStorer generic so that they can be run against all Hive storage formats.





[jira] [Commented] (HIVE-7532) allow disabling direct sql per query with external metastore

2014-07-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080349#comment-14080349
 ] 

Sergey Shelukhin commented on HIVE-7532:


There are some unrelated generated-code changes. I've noticed that I get 
similar changes locally, but I usually reset the unrelated files. I wonder if 
everyone gets them and, if so, whether we should update them in a separate 
JIRA.

Is it possible to post an RB? The patch is quite big.

> allow disabling direct sql per query with external metastore
> 
>
> Key: HIVE-7532
> URL: https://issues.apache.org/jira/browse/HIVE-7532
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Navis
> Attachments: HIVE-7532.1.patch.txt, HIVE-7532.2.patch.txt
>
>
> Currently with external metastore, direct sql can only be disabled via 
> metastore config globally. Perhaps it makes sense to have the ability to 
> propagate the setting per query from client to override the metastore 
> setting, e.g. if one particular query causes it to fail.





[jira] [Commented] (HIVE-7559) StarterProject: Move configuration from SparkClient to HiveConf

2014-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080341#comment-14080341
 ] 

Xuefu Zhang commented on HIVE-7559:
---

[~brocknoland] Thanks for noticing this, but I think we're not in a hurry to 
do this, as there has been an ongoing discussion about the configuration 
business (HIVE-7436). We will need a follow-up discussion on the topic. It 
appears that this can wait until we make a final decision. 

> StarterProject: Move configuration from SparkClient to HiveConf
> ---
>
> Key: HIVE-7559
> URL: https://issues.apache.org/jira/browse/HIVE-7559
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Affects Versions: spark-branch
>Reporter: Brock Noland
>Priority: Minor
>  Labels: StarterProject
>
> The SparkClient class has some configuration keys and defaults. These should 
> be moved to HiveConf.





[jira] [Commented] (HIVE-7563) ClassLoader should be released from LogFactory

2014-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080343#comment-14080343
 ] 

Szehon Ho commented on HIVE-7563:
-

Looks reasonable, +1

> ClassLoader should be released from LogFactory
> --
>
> Key: HIVE-7563
> URL: https://issues.apache.org/jira/browse/HIVE-7563
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
> Attachments: HIVE-7563.1.patch.txt
>
>
> NO PRECOMMIT TESTS
> LogFactory uses the ClassLoader as a key in a map, which makes the 
> classloader impossible to unload. LogFactory.release() should be called 
> explicitly.
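The leak described above can be sketched with a stdlib-only model of a ClassLoader-keyed cache. The class below is illustrative, not commons-logging's actual implementation, though {{LogFactory.release(ClassLoader)}} is the real API the fix calls.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of commons-logging's LogFactory cache: a static map keyed
// by ClassLoader. While the entry exists, the ClassLoader (and every class
// it loaded) stays strongly reachable and cannot be garbage collected.
public class LogCacheDemo {
    private static final Map<ClassLoader, Object> FACTORIES = new HashMap<>();

    static Object getFactory(ClassLoader loader) {
        // Mirrors LogFactory.getFactory(): one cached factory per loader.
        return FACTORIES.computeIfAbsent(loader, l -> new Object());
    }

    static void release(ClassLoader loader) {
        // Mirrors LogFactory.release(ClassLoader): drop the strong reference.
        FACTORIES.remove(loader);
    }

    static int cacheSize() {
        return FACTORIES.size();
    }

    public static void main(String[] args) {
        ClassLoader loader = new ClassLoader() {};
        getFactory(loader);
        if (cacheSize() != 1) throw new AssertionError("expected cached entry");
        release(loader);  // without this call, 'loader' can never be unloaded
        if (cacheSize() != 0) throw new AssertionError("expected empty cache");
        System.out.println("released");
    }
}
```

This is why the patch calls release explicitly when a session's classloader is discarded: the cache will not forget the loader on its own.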





[jira] [Updated] (HIVE-7564) Remove some redundant code plus a bit of cleanup in SparkClient [Spark Branch]

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7564:
--

   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to spark branch.

> Remove some redundant code plus a bit of cleanup in SparkClient [Spark Branch]
> --
>
> Key: HIVE-7564
> URL: https://issues.apache.org/jira/browse/HIVE-7564
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-7564.patch
>
>
> NO PRECOMMIT TESTS. This is for spark branch only.





[jira] [Updated] (HIVE-7564) Remove some redundant code plus a bit of cleanup in SparkClient [Spark Branch]

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7564:
--

Status: Patch Available  (was: Open)

> Remove some redundant code plus a bit of cleanup in SparkClient [Spark Branch]
> --
>
> Key: HIVE-7564
> URL: https://issues.apache.org/jira/browse/HIVE-7564
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-7564.patch
>
>
> NO PRECOMMIT TESTS. This is for spark branch only.





[jira] [Updated] (HIVE-7564) Remove some redundant code plus a bit of cleanup in SparkClient [Spark Branch]

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7564:
--

Attachment: HIVE-7564.patch

> Remove some redundant code plus a bit of cleanup in SparkClient [Spark Branch]
> --
>
> Key: HIVE-7564
> URL: https://issues.apache.org/jira/browse/HIVE-7564
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: spark-branch
>
> Attachments: HIVE-7564.patch
>
>
> NO PRECOMMIT TESTS. This is for spark branch only.





[jira] [Commented] (HIVE-4329) HCatalog should use getHiveRecordWriter rather than getRecordWriter

2014-07-30 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080338#comment-14080338
 ] 

David Chen commented on HIVE-4329:
--

Some notes about this patch:

 * {{\*OutputFormatContainer}} classes now wrap a {{HiveOutputFormat}} rather 
than a mapred {{OutputFormat}}.
 * {{\*RecordWriterContainer}} classes now wrap a 
{{FileSinkOperator.RecordWriter}} rather than a mapred {{RecordWriter}}.
 * {{InternalUtil.initializeOutputSerDe}} and 
{{InternalUtil.initializeDeserializer}} now take the properties from the 
{{TableDesc}} created from the table contained in {{HCatTableInfo}} rather than 
creating the properties manually. As a result, 
{{InternalUtil.setSerDeProperties}} has been removed.
 * Fixed a {{NullPointerException}} in {{AvroSerDe.initialize}} that occurs if 
{{columnCommentProperty}} is null.

Test coverage:

 * Removed the disabled SerDe list from {{HCatMapReduceTest}} so that all 
{{HCatMapReduceTest}} suites are also run against {{AvroSerDe}} and 
{{ParquetHiveSerDe}}.

To do:

 * Fix case where static partitioning is used.
 * Clean up if necessary
 * Remove diagnostic print statements.

> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> ---
>
> Key: HIVE-4329
> URL: https://issues.apache.org/jira/browse/HIVE-4329
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Serializers/Deserializers
>Affects Versions: 0.14.0
> Environment: discovered in Pig, but it looks like the root cause 
> impacts all non-Hive users
>Reporter: Sean Busbey
>Assignee: David Chen
> Attachments: HIVE-4329.0.patch
>
>
> Attempting to write to a HCatalog defined table backed by the AvroSerde fails 
> with the following stacktrace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be 
> cast to org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
>   at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
>   at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
>   at 
> org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
>   at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximal cause of this failure is that the AvroContainerOutputFormat's 
> signature mandates a LongWritable key and HCat's FileRecordWriterContainer 
> forces a NullWritable. I'm not sure of a general fix, other than redefining 
> HiveOutputFormat to mandate a WritableComparable.
> It looks like accepting WritableComparable is what's done in the other Hive 
> OutputFormats, and there's no reason AvroContainerOutputFormat couldn't also 
> be changed, since it's ignoring the key. That way fixing things so 
> FileRecordWriterContainer can always use NullWritable could get spun into a 
> different issue?
> The underlying cause for failure to write to AvroSerde tables is that 
> AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so 
> fixing the above will just push the failure into the placeholder RecordWriter.
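The proximal failure described above can be reduced to a stdlib-only sketch: the writer internally assumes one key type while the container always passes another, so the cast inside {{write}} fails at runtime. The {{NullKey}}/{{LongKey}} classes are hypothetical stand-ins for the Hadoop Writable types, not real Hive code.

```java
// Minimal model of the mismatch: the AvroContainerOutputFormat writer
// casts its key to LongKey, but the container always hands it a NullKey.
public class KeyMismatchDemo {
    static class NullKey {}   // stand-in for NullWritable
    static class LongKey {}   // stand-in for LongWritable

    // Stand-in for the writer returned by getRecordWriter: declared to
    // accept any key, but internally assumes the key is a LongKey.
    static void write(Object key, Object value) {
        LongKey k = (LongKey) key;  // throws ClassCastException for NullKey
    }

    public static boolean failsWithNullKey() {
        try {
            // What FileRecordWriterContainer effectively does today.
            write(new NullKey(), "record");
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("mismatch reproduced: " + failsWithNullKey());
    }
}
```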





[jira] [Updated] (HIVE-7564) Remove some redundant code plus a bit of cleanup in SparkClient [Spark Branch]

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7564:
--

Description: NO PRECOMMIT TESTS. This is for spark branch only.

> Remove some redundant code plus a bit of cleanup in SparkClient [Spark Branch]
> --
>
> Key: HIVE-7564
> URL: https://issues.apache.org/jira/browse/HIVE-7564
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> NO PRECOMMIT TESTS. This is for spark branch only.





[jira] [Created] (HIVE-7564) Remove some redundant code plus a bit of cleanup in SparkClient [Spark Branch]

2014-07-30 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-7564:
-

 Summary: Remove some redundant code plus a bit of cleanup in 
SparkClient [Spark Branch]
 Key: HIVE-7564
 URL: https://issues.apache.org/jira/browse/HIVE-7564
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang








[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: HIVE-7096.5.patch

> Support grouped splits in Tez partitioned broadcast join
> 
>
> Key: HIVE-7096
> URL: https://issues.apache.org/jira/browse/HIVE-7096
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
> HIVE-7096.4.patch, HIVE-7096.5.patch, HIVE-7096.tez.branch.patch
>
>
> The same checks for schema, deserializer, and file format that are done in 
> HiveSplitGenerator need to be done in CustomPartitionVertex.





[jira] [Updated] (HIVE-4329) HCatalog should use getHiveRecordWriter rather than getRecordWriter

2014-07-30 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-4329:
-

Status: Patch Available  (was: In Progress)

> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> ---
>
> Key: HIVE-4329
> URL: https://issues.apache.org/jira/browse/HIVE-4329
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Serializers/Deserializers
>Affects Versions: 0.10.0
> Environment: discovered in Pig, but it looks like the root cause 
> impacts all non-Hive users
>Reporter: Sean Busbey
>Assignee: David Chen
> Attachments: HIVE-4329.0.patch
>
>
> Attempting to write to a HCatalog defined table backed by the AvroSerde fails 
> with the following stacktrace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be 
> cast to org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
>   at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
>   at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
>   at 
> org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
>   at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximal cause of this failure is that the AvroContainerOutputFormat's 
> signature mandates a LongWritable key and HCat's FileRecordWriterContainer 
> forces a NullWritable. I'm not sure of a general fix, other than redefining 
> HiveOutputFormat to mandate a WritableComparable.
> It looks like accepting WritableComparable is what's done in the other Hive 
> OutputFormats, and there's no reason AvroContainerOutputFormat couldn't also 
> be changed, since it's ignoring the key. That way fixing things so 
> FileRecordWriterContainer can always use NullWritable could get spun into a 
> different issue?
> The underlying cause for failure to write to AvroSerde tables is that 
> AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so 
> fixing the above will just push the failure into the placeholder RecordWriter.





[jira] [Updated] (HIVE-4329) HCatalog should use getHiveRecordWriter rather than getRecordWriter

2014-07-30 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-4329:
-

Attachment: HIVE-4329.0.patch

Writing via HCatalog now works with both the Avro and Parquet SerDes for 
everything except static partitioning. With static partitioning, there is a 
mismatch between the expected schema and the schema set in the table 
properties, due to the partition column not being present; I am looking into 
this problem now.

I am uploading a patch for initial review and to run through pre-commit tests.

> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> ---
>
> Key: HIVE-4329
> URL: https://issues.apache.org/jira/browse/HIVE-4329
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Serializers/Deserializers
>Affects Versions: 0.10.0
> Environment: discovered in Pig, but it looks like the root cause 
> impacts all non-Hive users
>Reporter: Sean Busbey
>Assignee: David Chen
> Attachments: HIVE-4329.0.patch
>
>
> Attempting to write to a HCatalog defined table backed by the AvroSerde fails 
> with the following stacktrace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be 
> cast to org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
>   at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
>   at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
>   at 
> org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
>   at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximal cause of this failure is that the AvroContainerOutputFormat's 
> signature mandates a LongWritable key and HCat's FileRecordWriterContainer 
> forces a NullWritable. I'm not sure of a general fix, other than redefining 
> HiveOutputFormat to mandate a WritableComparable.
> It looks like accepting WritableComparable is what's done in the other Hive 
> OutputFormats, and there's no reason AvroContainerOutputFormat couldn't also 
> be changed, since it's ignoring the key. That way fixing things so 
> FileRecordWriterContainer can always use NullWritable could get spun into a 
> different issue?
> The underlying cause for failure to write to AvroSerde tables is that 
> AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so 
> fixing the above will just push the failure into the placeholder RecordWriter.





[jira] [Updated] (HIVE-4329) HCatalog should use getHiveRecordWriter rather than getRecordWriter

2014-07-30 Thread David Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Chen updated HIVE-4329:
-

Affects Version/s: (was: 0.10.0)
   0.14.0

> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> ---
>
> Key: HIVE-4329
> URL: https://issues.apache.org/jira/browse/HIVE-4329
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Serializers/Deserializers
>Affects Versions: 0.14.0
> Environment: discovered in Pig, but it looks like the root cause 
> impacts all non-Hive users
>Reporter: Sean Busbey
>Assignee: David Chen
> Attachments: HIVE-4329.0.patch
>
>
> Attempting to write to a HCatalog defined table backed by the AvroSerde fails 
> with the following stacktrace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be 
> cast to org.apache.hadoop.io.LongWritable
>   at 
> org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
>   at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
>   at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
>   at 
> org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
>   at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximal cause of this failure is that the AvroContainerOutputFormat's 
> signature mandates a LongWritable key and HCat's FileRecordWriterContainer 
> forces a NullWritable. I'm not sure of a general fix, other than redefining 
> HiveOutputFormat to mandate a WritableComparable.
> It looks like accepting WritableComparable is what's done in the other Hive 
> OutputFormats, and there's no reason AvroContainerOutputFormat couldn't also 
> be changed, since it's ignoring the key. That way fixing things so 
> FileRecordWriterContainer can always use NullWritable could get spun into a 
> different issue?
> The underlying cause for failure to write to AvroSerde tables is that 
> AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so 
> fixing the above will just push the failure into the placeholder RecordWriter.





[jira] [Commented] (HIVE-4329) HCatalog should use getHiveRecordWriter rather than getRecordWriter

2014-07-30 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080326#comment-14080326
 ] 

David Chen commented on HIVE-4329:
--

RB: https://reviews.apache.org/r/24136

> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> ---
>
> Key: HIVE-4329
> URL: https://issues.apache.org/jira/browse/HIVE-4329
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Serializers/Deserializers
>Affects Versions: 0.10.0
> Environment: discovered in Pig, but it looks like the root cause 
> impacts all non-Hive users
>Reporter: Sean Busbey
>Assignee: David Chen
> Attachments: HIVE-4329.0.patch
>
>
> Attempting to write to a HCatalog defined table backed by the AvroSerde fails 
> with the following stacktrace:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be 
> cast to org.apache.hadoop.io.LongWritable
>   at org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat$1.write(AvroContainerOutputFormat.java:84)
>   at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:253)
>   at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:53)
>   at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:242)
>   at org.apache.hcatalog.pig.HCatStorer.putNext(HCatStorer.java:52)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>   at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:559)
>   at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
> {code}
> The proximal cause of this failure is that the AvroContainerOutputFormat's 
> signature mandates a LongWritable key and HCat's FileRecordWriterContainer 
> forces a NullWritable. I'm not sure of a general fix, other than redefining 
> HiveOutputFormat to mandate a WritableComparable.
> It looks like accepting WritableComparable is what's done in the other Hive 
> OutputFormats, and there's no reason AvroContainerOutputFormat couldn't also 
> be changed, since it's ignoring the key. That way fixing things so 
> FileRecordWriterContainer can always use NullWritable could get spun into a 
> different issue?
> The underlying cause for failure to write to AvroSerde tables is that 
> AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so 
> fixing the above will just push the failure into the placeholder RecordWriter.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24136: HIVE-4329: HCatalog should use getHiveRecordWriter.

2014-07-30 Thread David Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24136/
---

Review request for hive.


Bugs: HIVE-4329
https://issues.apache.org/jira/browse/HIVE-4329


Repository: hive-git


Description
---

HIVE-4329: HCatalog should use getHiveRecordWriter.


Diffs
-

  hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
93a03adeab7ba3c3c91344955d303e4252005239 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DefaultOutputFormatContainer.java
 3a07b0ca7c1956d45e611005cbc5ba2464596471 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DefaultRecordWriterContainer.java
 209d7bcef5624100c6cdbc2a0a137dcaf1c1fc42 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/DynamicPartitionFileRecordWriterContainer.java
 4df912a935221e527c106c754ff233d212df9246 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputFormatContainer.java
 1a7595fd6dd0a5ffbe529bc24015c482068233bf 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileRecordWriterContainer.java
 2a883d6517bfe732b6a6dffa647d9d44e4145b38 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
 bfa8657cd1b16aec664aab3e22b430b304a3698d 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseOutputFormat.java
 4f7a74a002cedf3b54d0133041184fbcd9d9c4ab 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatMapRedUtil.java
 b651cb323771843da43667016a7dd2c9d9a1ddac 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatOutputFormat.java
 694739821a202780818924d54d10edb707cfbcfa 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
 1980ef50af42499e0fed8863b6ff7a45f926d9fc 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InternalUtil.java
 9b979395e47e54aac87487cb990824e3c3a2ee19 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/OutputFormatContainer.java
 d83b003f9c16e78a39b3cc7ce810ff19f70848c2 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/RecordWriterContainer.java
 5905b46178b510b3a43311739fea2b95f47b4ed7 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/StaticPartitionFileRecordWriterContainer.java
 b3ea76e6a79f94e09972bc060c06105f60087b71 
  
hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/HCatMapReduceTest.java
 ee57f3fd126af2e36039f84686a4169ef6267593 
  
hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatDynamicPartitioned.java
 0d87c6ce2b9a2169c3b7c9d80ff33417279fb465 
  
hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/PigHCatUtil.java
 7c9003e86c61dc9e4f10e05b0c29e40ded73c793 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java 
69545b046db06fd56f35a0da09d3d6960832484d 

Diff: https://reviews.apache.org/r/24136/diff/


Testing
---


Thanks,

David Chen



[jira] [Updated] (HIVE-7563) ClassLoader should be released from LogFactory

2014-07-30 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7563:


Status: Patch Available  (was: Open)

> ClassLoader should be released from LogFactory
> --
>
> Key: HIVE-7563
> URL: https://issues.apache.org/jira/browse/HIVE-7563
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
> Attachments: HIVE-7563.1.patch.txt
>
>
> NO PRECOMMIT TESTS
> LogFactory uses the ClassLoader as a key in a map, which makes the classloader 
> impossible to unload. LogFactory.release() should be called explicitly.
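The leak mechanism described above can be sketched with a plain map keyed by ClassLoader. This is a simplified stand-in for commons-logging's internal factory cache, not its real implementation: the map's strong reference to the loader is what prevents unloading, and release() removes it.

```java
import java.util.HashMap;
import java.util.Map;

class ClassLoaderCacheSketch {
    // Simplified stand-in for LogFactory's internal cache: the map holds a
    // strong reference to each ClassLoader key, so the loader can never be
    // garbage-collected while its entry remains.
    static final Map<ClassLoader, Object> FACTORIES = new HashMap<>();

    static Object getFactory(ClassLoader loader) {
        return FACTORIES.computeIfAbsent(loader, l -> new Object());
    }

    // Analogous to LogFactory.release(ClassLoader): drop the strong reference
    // so the loader becomes collectable.
    static void release(ClassLoader loader) {
        FACTORIES.remove(loader);
    }

    public static void main(String[] args) {
        ClassLoader loader = new ClassLoader() {};
        getFactory(loader);
        System.out.println("cached entries: " + FACTORIES.size());
        release(loader);
        System.out.println("cached entries: " + FACTORIES.size());
    }
}
```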



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7563) ClassLoader should be released from LogFactory

2014-07-30 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7563:


Description: 
NO PRECOMMIT TESTS

LogFactory uses the ClassLoader as a key in a map, which makes the classloader 
impossible to unload. LogFactory.release() should be called explicitly.

  was:LogFactory uses the ClassLoader as a key in a map, which makes the classloader 
impossible to unload. LogFactory.release() should be called explicitly.


> ClassLoader should be released from LogFactory
> --
>
> Key: HIVE-7563
> URL: https://issues.apache.org/jira/browse/HIVE-7563
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
> Attachments: HIVE-7563.1.patch.txt
>
>
> NO PRECOMMIT TESTS
> LogFactory uses the ClassLoader as a key in a map, which makes the classloader 
> impossible to unload. LogFactory.release() should be called explicitly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7563) ClassLoader should be released from LogFactory

2014-07-30 Thread Navis (JIRA)
Navis created HIVE-7563:
---

 Summary: ClassLoader should be released from LogFactory
 Key: HIVE-7563
 URL: https://issues.apache.org/jira/browse/HIVE-7563
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Attachments: HIVE-7563.1.patch.txt

LogFactory uses the ClassLoader as a key in a map, which makes the classloader 
impossible to unload. LogFactory.release() should be called explicitly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7563) ClassLoader should be released from LogFactory

2014-07-30 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7563:


Attachment: HIVE-7563.1.patch.txt

> ClassLoader should be released from LogFactory
> --
>
> Key: HIVE-7563
> URL: https://issues.apache.org/jira/browse/HIVE-7563
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
> Attachments: HIVE-7563.1.patch.txt
>
>
> NO PRECOMMIT TESTS
> LogFactory uses the ClassLoader as a key in a map, which makes the classloader 
> impossible to unload. LogFactory.release() should be called explicitly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7547) Add ipAddress and userName to ExecHook

2014-07-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080299#comment-14080299
 ] 

Thejas M Nair commented on HIVE-7547:
-

+1
Thanks for fixing the Kerberos mode!


> Add ipAddress and userName to ExecHook
> --
>
> Key: HIVE-7547
> URL: https://issues.apache.org/jira/browse/HIVE-7547
> Project: Hive
>  Issue Type: New Feature
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-7547.2.patch, HIVE-7547.3.patch, HIVE-7547.4.patch, 
> HIVE-7547.patch
>
>
> Auditing tools should be able to know about the ipAddress and userName of the 
> user executing operations.  
> These could be made available through the Hive execution-hooks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7526) Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination

2014-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080283#comment-14080283
 ] 

Xuefu Zhang commented on HIVE-7526:
---

Chao, based on our last conversation, I don't think your patch is final or 
ready to be reviewed. Please continue working on it and update when you 
think it's ready. Here is what I emphasized:

1. Define a SparkShuffle interface similar to the existing ShuffleTran.
2. Have two implementations of this interface: sortBy and groupBy.
3. For sortBy, use a local key clustering mechanism.
4. Have ReduceTran contain references to SparkShuffle and HiveReduceFunction 
instances.

Let me know if you have additional questions.

> Research to use groupby transformation to replace Hive existing 
> partitionByKey and SparkCollector combination
> -
>
> Key: HIVE-7526
> URL: https://issues.apache.org/jira/browse/HIVE-7526
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Chao
> Attachments: HIVE-7526.2.patch, HIVE-7526.3.patch, HIVE-7526.patch
>
>
> Currently SparkClient shuffles data by calling partitionByKey(). This 
> transformation outputs <key, value> tuples. However, Hive's ExecMapper 
> expects <key, iterator<value>> tuples, which Spark's groupByKey() seems to 
> output directly. Thus, using groupByKey, we may be able to avoid the existing 
> key clustering mechanism (in HiveReduceFunction). This research is to try 
> that out.
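The shape difference discussed above — flat key/value pairs versus keys grouped with all their values — can be simulated with plain Java collections. This is a sketch of the grouping semantics only, not Spark's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByKeySketch {
    // Turns a flat list of <key, value> pairs (what partitionByKey-style
    // shuffling emits) into <key, list-of-values> groups, the shape a
    // groupByKey-style transformation produces for its consumer.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : pairs) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        System.out.println(groupByKey(pairs)); // {a=[1, 3], b=[2]}
    }
}
```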



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7509) Fast stripe level merging for ORC

2014-07-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080267#comment-14080267
 ] 

Lefty Leverenz commented on HIVE-7509:
--

Configuration parameter *hive.merge.orcfile.stripe.level* needs to be added to 
the wiki by the time 0.14.0 is released, but 
*hive.merge.input.format.stripe.level* is internal only so it doesn't belong in 
the wiki.

Besides adding *hive.merge.orcfile.stripe.level* to the Configuration 
Properties doc, a new section could be added to the ORC Files doc listing all 
the ORC configs or pointing to an ORC section in Configuration Properties 
(which hasn't been created yet).

* [Configuration Properties -- ORC parameters | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.orc.splits.include.file.footer]
* [ORC Files | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC]

> Fast stripe level merging for ORC
> -
>
> Key: HIVE-7509
> URL: https://issues.apache.org/jira/browse/HIVE-7509
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: TODOC14, orcfile
> Fix For: 0.14.0
>
> Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, 
> HIVE-7509.4.patch, HIVE-7509.5.patch
>
>
> Similar to HIVE-1950, add support for fast stripe level merging of ORC files 
> through CONCATENATE command and conditional merge task. This fast merging is 
> ideal for merging many small ORC files to a larger file without decompressing 
> and decoding the data of small orc files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7554) Parquet Hive should resolve column names in case insensitive manner

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7554:
---

Status: Patch Available  (was: Open)

> Parquet Hive should resolve column names in case insensitive manner
> ---
>
> Key: HIVE-7554
> URL: https://issues.apache.org/jira/browse/HIVE-7554
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-7554.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7348) Beeline could not parse ; separated queries provided with -e option

2014-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080254#comment-14080254
 ] 

Szehon Ho commented on HIVE-7348:
-

This looks good; can we add one test?

> Beeline could not parse ; separated queries provided with -e option
> ---
>
> Key: HIVE-7348
> URL: https://issues.apache.org/jira/browse/HIVE-7348
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: HIVE-7348.1.patch, HIVE-7348.patch
>
>
> Beeline could not parse ; separated queries provided with -e option. This 
> works fine on hive cli.
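A naive sketch of the splitting the -e path needs is shown below. This ignores semicolons inside quoted literals and comments, which a real fix must handle; it only illustrates the basic behavior.

```java
import java.util.ArrayList;
import java.util.List;

class SemicolonSplitSketch {
    // Naively splits a -e command string into individual queries on ';'.
    // Real parsing must also skip semicolons inside quoted strings and
    // comments; this sketch deliberately ignores those cases.
    static List<String> splitQueries(String commands) {
        List<String> queries = new ArrayList<>();
        for (String part : commands.split(";")) {
            String q = part.trim();
            if (!q.isEmpty()) {
                queries.add(q);
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        System.out.println(splitQueries("show tables; select 1;"));
        // [show tables, select 1]
    }
}
```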



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24127: Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination

2014-07-30 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24127/
---

Review request for hive.


Repository: hive-git


Description
---

An attempt to fix the last patch by moving the groupBy op to ShuffleTran.
Also, since SparkTran::transform may now have input/output value types other 
than BytesWritable, we need to make it generic as well.
Also added a CompTran class, which is basically a composition of 
transformations. It offers better type compatibility than ChainedTran.
This is NOT the perfect solution and may be subject to further change.
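The type-compatibility point can be illustrated with generics: composing two transformations is only legal when the first's output type matches the second's input type, and the compiler enforces this. This is a sketch of the idea only, not Hive's actual SparkTran/CompTran code.

```java
// Sketch of a typed transformation chain: andThen only compiles when the
// first transformation's output type matches the next one's input type.
interface Tran<I, O> {
    O transform(I input);

    default <R> Tran<I, R> andThen(Tran<O, R> next) {
        return input -> next.transform(this.transform(input));
    }
}

class CompTranSketch {
    public static void main(String[] args) {
        Tran<String, Integer> parse = Integer::parseInt;
        Tran<Integer, Integer> square = x -> x * x;
        // String -> Integer -> Integer composes; mismatched types would not compile.
        Tran<String, Integer> composed = parse.andThen(square);
        System.out.println(composed.transform("7")); // 49
    }
}
```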


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ChainedTran.java 4991568 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CompTran.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 01a70e9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 
841db87 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/IdentityTran.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 98d08e6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java d1af86d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java 33e7d45 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java cf85af1 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 
440dd93 
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkTran.java 6aa732f 

Diff: https://reviews.apache.org/r/24127/diff/


Testing
---


Thanks,

Chao Sun



[jira] [Commented] (HIVE-7562) Cleanup ExecReducer

2014-07-30 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080248#comment-14080248
 ] 

Szehon Ho commented on HIVE-7562:
-

+1, pending test

> Cleanup ExecReducer
> ---
>
> Key: HIVE-7562
> URL: https://issues.apache.org/jira/browse/HIVE-7562
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-7562.patch
>
>
> ExecReducer places member variables at random with random visibility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7509) Fast stripe level merging for ORC

2014-07-30 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7509:
-

Labels: TODOC14 orcfile  (was: orcfile)

> Fast stripe level merging for ORC
> -
>
> Key: HIVE-7509
> URL: https://issues.apache.org/jira/browse/HIVE-7509
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: TODOC14, orcfile
> Fix For: 0.14.0
>
> Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, 
> HIVE-7509.4.patch, HIVE-7509.5.patch
>
>
> Similar to HIVE-1950, add support for fast stripe level merging of ORC files 
> through CONCATENATE command and conditional merge task. This fast merging is 
> ideal for merging many small ORC files to a larger file without decompressing 
> and decoding the data of small orc files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7446) Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

2014-07-30 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080245#comment-14080245
 ] 

Ashish Kumar Singh commented on HIVE-7446:
--

Test errors are not related to this patch.

[~tomwhite], could you take a look at this trivial patch?

> Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
> --
>
> Key: HIVE-7446
> URL: https://issues.apache.org/jira/browse/HIVE-7446
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: HIVE-7446.patch
>
>
> HIVE-6806 adds native support for creating hive table stored as Avro. It 
> would be good to add support to ALTER TABLE .. ADD COLUMN to Avro backed 
> tables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7509) Fast stripe level merging for ORC

2014-07-30 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7509:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks [~hagleitn] and [~leftylev] for the reviews.

> Fast stripe level merging for ORC
> -
>
> Key: HIVE-7509
> URL: https://issues.apache.org/jira/browse/HIVE-7509
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Fix For: 0.14.0
>
> Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, 
> HIVE-7509.4.patch, HIVE-7509.5.patch
>
>
> Similar to HIVE-1950, add support for fast stripe level merging of ORC files 
> through CONCATENATE command and conditional merge task. This fast merging is 
> ideal for merging many small ORC files to a larger file without decompressing 
> and decoding the data of small orc files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7509) Fast stripe level merging for ORC

2014-07-30 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-7509:
-

Fix Version/s: 0.14.0

> Fast stripe level merging for ORC
> -
>
> Key: HIVE-7509
> URL: https://issues.apache.org/jira/browse/HIVE-7509
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Fix For: 0.14.0
>
> Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, 
> HIVE-7509.4.patch, HIVE-7509.5.patch
>
>
> Similar to HIVE-1950, add support for fast stripe level merging of ORC files 
> through CONCATENATE command and conditional merge task. This fast merging is 
> ideal for merging many small ORC files to a larger file without decompressing 
> and decoding the data of small orc files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7562) Cleanup ExecReducer

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7562:
---

Assignee: Brock Noland
  Status: Patch Available  (was: Open)

> Cleanup ExecReducer
> ---
>
> Key: HIVE-7562
> URL: https://issues.apache.org/jira/browse/HIVE-7562
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-7562.patch
>
>
> ExecReducer places member variables at random with random visibility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7348) Beeline could not parse ; separated queries provided with -e option

2014-07-30 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-7348:
-

Attachment: HIVE-7348.1.patch

> Beeline could not parse ; separated queries provided with -e option
> ---
>
> Key: HIVE-7348
> URL: https://issues.apache.org/jira/browse/HIVE-7348
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: HIVE-7348.1.patch, HIVE-7348.patch
>
>
> Beeline could not parse ; separated queries provided with -e option. This 
> works fine on hive cli.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24086: HIVE-7348: Beeline could not parse ; separated queries provided with -e option

2014-07-30 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24086/
---

(Updated July 30, 2014, 11:48 p.m.)


Review request for hive.


Changes
---

Move changes to only affect the -e path.


Bugs: HIVE-7348
https://issues.apache.org/jira/browse/HIVE-7348


Repository: hive-git


Description
---

HIVE-7348: Beeline could not parse ; separated queries provided with -e option


Diffs (updated)
-

  beeline/src/java/org/apache/hive/beeline/BeeLine.java 
10fd2e2daac78ca43d45c74fcbad6b720a8d28ad 

Diff: https://reviews.apache.org/r/24086/diff/


Testing
---

Tested manually.


Thanks,

Ashish Singh



[jira] [Updated] (HIVE-7562) Cleanup ExecReducer

2014-07-30 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7562:
---

Attachment: HIVE-7562.patch

> Cleanup ExecReducer
> ---
>
> Key: HIVE-7562
> URL: https://issues.apache.org/jira/browse/HIVE-7562
> Project: Hive
>  Issue Type: Improvement
>Reporter: Brock Noland
> Attachments: HIVE-7562.patch
>
>
> ExecReducer places member variables at random with random visibility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7562) Cleanup ExecReducer

2014-07-30 Thread Brock Noland (JIRA)
Brock Noland created HIVE-7562:
--

 Summary: Cleanup ExecReducer
 Key: HIVE-7562
 URL: https://issues.apache.org/jira/browse/HIVE-7562
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
 Attachments: HIVE-7562.patch

ExecReducer places member variables at random with random visibility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7547) Add ipAddress and userName to ExecHook

2014-07-30 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7547:


Attachment: HIVE-7547.4.patch

> Add ipAddress and userName to ExecHook
> --
>
> Key: HIVE-7547
> URL: https://issues.apache.org/jira/browse/HIVE-7547
> Project: Hive
>  Issue Type: New Feature
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-7547.2.patch, HIVE-7547.3.patch, HIVE-7547.4.patch, 
> HIVE-7547.patch
>
>
> Auditing tools should be able to know about the ipAddress and userName of the 
> user executing operations.  
> These could be made available through the Hive execution-hooks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24084: HIVE-7547 - Add ipAddress and userName to ExecHook

2014-07-30 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24084/
---

(Updated July 30, 2014, 11:46 p.m.)


Review request for hive.


Bugs: HIVE-7547
https://issues.apache.org/jira/browse/HIVE-7547


Repository: hive-git


Description
---

Passing the ipAddress and userName (already calculated in ThriftCLIService for 
other purposes) through several layers down to the hooks.


Diffs (updated)
-

  
itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestHs2HooksWithMiniKdc.java
 PRE-CREATION 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/hooks/TestHs2Hooks.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java e512199 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java b11cb86 
  service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
  service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
816bea4 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
5c87bcb 

Diff: https://reviews.apache.org/r/24084/diff/


Testing
---

Added tests in both kerberos and non-kerberos mode.


Thanks,

Szehon Ho



[jira] [Updated] (HIVE-7547) Add ipAddress and userName to ExecHook

2014-07-30 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7547:


Attachment: HIVE-7547.3.patch

Thanks Thejas for pointing that out.  I refactored the code to use SessionState.

The SessionState's ipAddress didn't seem to be set in Kerberos mode, so I'm 
also changing how it's set so that it works in all modes.  Let me know if it's 
not right.

> Add ipAddress and userName to ExecHook
> --
>
> Key: HIVE-7547
> URL: https://issues.apache.org/jira/browse/HIVE-7547
> Project: Hive
>  Issue Type: New Feature
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-7547.2.patch, HIVE-7547.3.patch, HIVE-7547.patch
>
>
> Auditing tools should be able to know about the ipAddress and userName of the 
> user executing operations.  
> These could be made available through the Hive execution-hooks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24084: HIVE-7547 - Add ipAddress and userName to ExecHook

2014-07-30 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24084/
---

(Updated July 30, 2014, 11:40 p.m.)


Review request for hive.


Changes
---

Incorporating Brock's and Thejas's review comments.  As Thejas pointed out, it 
turns out the ipAddress is already stored in SessionState, so using that makes 
the code a lot cleaner.

However, the ipAddress calculated in TSetIpAddressProcessor doesn't work in 
Kerberos mode, so I'm fixing it so that it's set in all modes.


Bugs: HIVE-7547
https://issues.apache.org/jira/browse/HIVE-7547


Repository: hive-git


Description
---

Passing the ipAddress and userName (already calculated in ThriftCLIService for 
other purposes) through several layers down to the hooks.


Diffs (updated)
-

  
itests/hive-minikdc/src/test/java/org/apache/hive/minikdc/TestHs2HooksWithMiniKdc.java
 PRE-CREATION 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/hooks/TestHs2Hooks.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java e512199 
  ql/src/java/org/apache/hadoop/hive/ql/hooks/HookContext.java b11cb86 
  service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
  service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
816bea4 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
5c87bcb 

Diff: https://reviews.apache.org/r/24084/diff/


Testing
---

Added tests in both kerberos and non-kerberos mode.


Thanks,

Szehon Ho



[jira] [Commented] (HIVE-7509) Fast stripe level merging for ORC

2014-07-30 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080204#comment-14080204
 ] 

Lefty Leverenz commented on HIVE-7509:
--

Good doc fixes, thanks [~prasanth_j].

+1 for docs only.

> Fast stripe level merging for ORC
> -
>
> Key: HIVE-7509
> URL: https://issues.apache.org/jira/browse/HIVE-7509
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, 
> HIVE-7509.4.patch, HIVE-7509.5.patch
>
>
> Similar to HIVE-1950, add support for fast stripe level merging of ORC files 
> through CONCATENATE command and conditional merge task. This fast merging is 
> ideal for merging many small ORC files to a larger file without decompressing 
> and decoding the data of small orc files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: HIVE-7096.4.patch

> Support grouped splits in Tez partitioned broadcast join
> 
>
> Key: HIVE-7096
> URL: https://issues.apache.org/jira/browse/HIVE-7096
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
> HIVE-7096.4.patch, HIVE-7096.tez.branch.patch
>
>
> Same checks for schema + deser + file format done in HiveSplitGenerator need 
> to be done in the CustomPartitionVertex.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: (was: HIVE-7096.4.patch)

> Support grouped splits in Tez partitioned broadcast join
> 
>
> Key: HIVE-7096
> URL: https://issues.apache.org/jira/browse/HIVE-7096
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
> HIVE-7096.4.patch, HIVE-7096.tez.branch.patch
>
>
> Same checks for schema + deser + file format done in HiveSplitGenerator need 
> to be done in the CustomPartitionVertex.





[jira] [Commented] (HIVE-7509) Fast stripe level merging for ORC

2014-07-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080178#comment-14080178
 ] 

Hive QA commented on HIVE-7509:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12658680/HIVE-7509.5.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5842 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/110/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/110/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-110/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12658680

> Fast stripe level merging for ORC
> -
>
> Key: HIVE-7509
> URL: https://issues.apache.org/jira/browse/HIVE-7509
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-7509.1.patch, HIVE-7509.2.patch, HIVE-7509.3.patch, 
> HIVE-7509.4.patch, HIVE-7509.5.patch
>
>
> Similar to HIVE-1950, add support for fast stripe-level merging of ORC files 
> through the CONCATENATE command and a conditional merge task. This fast 
> merging is ideal for combining many small ORC files into a larger file 
> without decompressing and decoding their data.





[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: HIVE-7096.4.patch

> Support grouped splits in Tez partitioned broadcast join
> 
>
> Key: HIVE-7096
> URL: https://issues.apache.org/jira/browse/HIVE-7096
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
> HIVE-7096.4.patch, HIVE-7096.tez.branch.patch
>
>
> Same checks for schema + deser + file format done in HiveSplitGenerator need 
> to be done in the CustomPartitionVertex.





[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: (was: HIVE-7096.4.patch)

> Support grouped splits in Tez partitioned broadcast join
> 
>
> Key: HIVE-7096
> URL: https://issues.apache.org/jira/browse/HIVE-7096
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
> HIVE-7096.tez.branch.patch
>
>
> Same checks for schema + deser + file format done in HiveSplitGenerator need 
> to be done in the CustomPartitionVertex.





[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: HIVE-7096.4.patch

> Support grouped splits in Tez partitioned broadcast join
> 
>
> Key: HIVE-7096
> URL: https://issues.apache.org/jira/browse/HIVE-7096
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
> HIVE-7096.tez.branch.patch
>
>
> Same checks for schema + deser + file format done in HiveSplitGenerator need 
> to be done in the CustomPartitionVertex.





[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: (was: HIVE-7096.4.patch)

> Support grouped splits in Tez partitioned broadcast join
> 
>
> Key: HIVE-7096
> URL: https://issues.apache.org/jira/browse/HIVE-7096
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
> HIVE-7096.tez.branch.patch
>
>
> Same checks for schema + deser + file format done in HiveSplitGenerator need 
> to be done in the CustomPartitionVertex.





[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Component/s: Tez

> Support grouped splits in Tez partitioned broadcast join
> 
>
> Key: HIVE-7096
> URL: https://issues.apache.org/jira/browse/HIVE-7096
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
> HIVE-7096.4.patch, HIVE-7096.tez.branch.patch
>
>
> Same checks for schema + deser + file format done in HiveSplitGenerator need 
> to be done in the CustomPartitionVertex.





[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Attachment: HIVE-7096.4.patch

This patch works with Tez 0.5 only. Since only the tez branch has been upgraded 
to that version, it is applicable only to that Hive branch.

> Support grouped splits in Tez partitioned broadcast join
> 
>
> Key: HIVE-7096
> URL: https://issues.apache.org/jira/browse/HIVE-7096
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
> HIVE-7096.4.patch, HIVE-7096.tez.branch.patch
>
>
> Same checks for schema + deser + file format done in HiveSplitGenerator need 
> to be done in the CustomPartitionVertex.





[jira] [Updated] (HIVE-7096) Support grouped splits in Tez partitioned broadcast join

2014-07-30 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-7096:
-

Affects Version/s: tez-branch

> Support grouped splits in Tez partitioned broadcast join
> 
>
> Key: HIVE-7096
> URL: https://issues.apache.org/jira/browse/HIVE-7096
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gunther Hagleitner
>Assignee: Vikram Dixit K
> Attachments: HIVE-7096.1.patch, HIVE-7096.2.patch, HIVE-7096.3.patch, 
> HIVE-7096.4.patch, HIVE-7096.tez.branch.patch
>
>
> Same checks for schema + deser + file format done in HiveSplitGenerator need 
> to be done in the CustomPartitionVertex.





[jira] [Commented] (HIVE-7390) Make quote character optional and configurable in BeeLine CSV/TSV output

2014-07-30 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080128#comment-14080128
 ] 

Lars Francke commented on HIVE-7390:


You summed it up nicely, thanks.

The original intention of this issue was to make the quote character optional 
and configurable, so Jim must have had a use case for that. I can't think of a 
good one at the moment.

I can, however, think of a good reason for a configurable delimiter. Commas, 
semicolons, and tabs occur relatively frequently in data, but some other 
character (\001 or "|") might not occur in the data at all, and being able to 
pick that as the delimiter makes parsing much simpler (just split on the 
delimiter instead of looking for quoted strings, etc.). This is especially 
useful when you then want to mount another table on that data in Hive, or 
post-process it in any other simple way where you don't have access to a 
full-fledged CSV parsing library.

So: picking the delimiter is often very helpful in avoiding a whole class of 
parsing issues, because consumers can simply split on the delimiter.

I think we can easily catch the most common issues with two changes:

1. Fix the current CSV and TSV formats. As you say: no debate on that.
2. Allow the delimiter to be specified, and keep the "normal quoting" mode.

That lets everyone who really understands their data avoid quoting, while 
everyone else gets properly formatted CSVs for a full CSV parser. In the same 
vein, I think {{surroundingSpacesNeedQuotes}} should stay disabled.

But as I said: this is kinda hijacking Jim's original issue...
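To illustrate the trade-off, here is a minimal Python sketch (not BeeLine code; 
the sample rows and the choice of \001 are made up for illustration). With a 
delimiter that never occurs in the data, a consumer can split naively, whereas 
a comma-delimited quoted file needs a real CSV parser:

```python
import csv
import io

# Sample rows; some field values contain commas, so a comma-delimited
# CSV must quote them, while \001 never occurs in the data.
rows = [["1", "Doe, John", "a,b"], ["2", "Smith, Jane", "c"]]

# Comma-delimited with quoting: reading it back needs a CSV parser.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
quoted = buf.getvalue()
parsed_with_parser = list(csv.reader(io.StringIO(quoted)))

# \001-delimited, no quoting: a naive split per line is sufficient.
delimited = "\n".join("\x01".join(r) for r in rows)
parsed_naively = [line.split("\x01") for line in delimited.splitlines()]

assert parsed_with_parser == parsed_naively == rows

# Naively splitting the quoted file on "," breaks the fields apart.
broken = quoted.splitlines()[0].split(",")
assert broken != rows[0]
```

This only works, of course, if the user really knows the chosen delimiter 
cannot appear in the data, which is why keeping a quoting mode as the default 
still matters.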

> Make quote character optional and configurable in BeeLine CSV/TSV output
> 
>
> Key: HIVE-7390
> URL: https://issues.apache.org/jira/browse/HIVE-7390
> Project: Hive
>  Issue Type: New Feature
>  Components: Clients
>Affects Versions: 0.13.1
>Reporter: Jim Halfpenny
>Assignee: Ferdinand Xu
> Attachments: HIVE-7390.1.patch, HIVE-7390.2.patch, HIVE-7390.3.patch, 
> HIVE-7390.4.patch, HIVE-7390.patch
>
>
> Currently when either the CSV or TSV output formats are used in beeline each 
> column is wrapped in single quotes. Quote wrapping of columns should be 
> optional and the user should be able to choose the character used to wrap the 
> columns.





[jira] [Commented] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

2014-07-30 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080125#comment-14080125
 ] 

Gunther Hagleitner commented on HIVE-7506:
--

[~damien.carol] I think the use case for this is different than ANALYZE. The 
ability to update certain stats without scanning any data, and without 
"hacking the backend db", is useful in a number of cases. It helps (especially 
for CBO work) to set up unit tests quickly and verify both the CBO and the 
stats subsystem. It also helps when experimenting with the system if you're 
just trying out Hive/Hadoop on a small cluster. Finally, it gives you a quick 
and clean way to fix things when something went wrong with the stats in your 
environment.

> MetadataUpdater: provide a mechanism to edit the statistics of a column in a 
> table (or a partition of a table)
> --
>
> Key: HIVE-7506
> URL: https://issues.apache.org/jira/browse/HIVE-7506
> Project: Hive
>  Issue Type: New Feature
>  Components: Database/Schema
>Reporter: pengcheng xiong
>Assignee: pengcheng xiong
>Priority: Minor
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>
> Two motivations:
> (1) CBO depends heavily on the statistics of a column in a table (or a 
> partition of a table). If we want to test whether CBO chooses the best 
> plan under different statistics, it would be time-consuming to load the 
> whole table and create the statistics from the ground up.
> (2) As the database runs, the statistics of a column in a table (or a 
> partition of a table) may change. We need a mechanism to synchronize them.
> We propose the following command to achieve that:
> ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE 
> STATISTICS col_statistics [COMMENT col_comment]





[jira] [Updated] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

2014-07-30 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7506:
-

Priority: Minor  (was: Critical)

> MetadataUpdater: provide a mechanism to edit the statistics of a column in a 
> table (or a partition of a table)
> --
>
> Key: HIVE-7506
> URL: https://issues.apache.org/jira/browse/HIVE-7506
> Project: Hive
>  Issue Type: New Feature
>  Components: Database/Schema
>Reporter: pengcheng xiong
>Assignee: pengcheng xiong
>Priority: Minor
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>
> Two motivations:
> (1) CBO depends heavily on the statistics of a column in a table (or a 
> partition of a table). If we want to test whether CBO chooses the best 
> plan under different statistics, it would be time-consuming to load the 
> whole table and create the statistics from the ground up.
> (2) As the database runs, the statistics of a column in a table (or a 
> partition of a table) may change. We need a mechanism to synchronize them.
> We propose the following command to achieve that:
> ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE 
> STATISTICS col_statistics [COMMENT col_comment]





[jira] [Commented] (HIVE-7488) pass column names being used for inputs to authorization api

2014-07-30 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080118#comment-14080118
 ] 

Jason Dere commented on HIVE-7488:
--

+1. Test failures not related?

> pass column names being used for inputs to authorization api
> 
>
> Key: HIVE-7488
> URL: https://issues.apache.org/jira/browse/HIVE-7488
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-7488.1.patch, HIVE-7488.2.patch, 
> HIVE-7488.3.patch.txt, HIVE-7488.4.patch, HIVE-7488.5.patch, HIVE-7488.6.patch
>
>
> HivePrivilegeObject in the authorization api has support for columns, but the 
> columns being used are not being populated for non grant-revoke queries.
> This is for enabling any implementation of the api to use this column 
> information for its authorization decisions.





[jira] [Reopened] (HIVE-7506) MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

2014-07-30 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reopened HIVE-7506:
--


> MetadataUpdater: provide a mechanism to edit the statistics of a column in a 
> table (or a partition of a table)
> --
>
> Key: HIVE-7506
> URL: https://issues.apache.org/jira/browse/HIVE-7506
> Project: Hive
>  Issue Type: New Feature
>  Components: Database/Schema
>Reporter: pengcheng xiong
>Assignee: pengcheng xiong
>Priority: Critical
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>
> Two motivations:
> (1) CBO depends heavily on the statistics of a column in a table (or a 
> partition of a table). If we want to test whether CBO chooses the best 
> plan under different statistics, it would be time-consuming to load the 
> whole table and create the statistics from the ground up.
> (2) As the database runs, the statistics of a column in a table (or a 
> partition of a table) may change. We need a mechanism to synchronize them.
> We propose the following command to achieve that:
> ALTER TABLE table_name PARTITION partition_spec [COLUMN col_name] UPDATE 
> STATISTICS col_statistics [COMMENT col_comment]





[jira] [Commented] (HIVE-7503) Support Hive's multi-table insert query with Spark

2014-07-30 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080076#comment-14080076
 ] 

Xuefu Zhang commented on HIVE-7503:
---

Assigned to myself for initial research.

> Support Hive's multi-table insert query with Spark
> --
>
> Key: HIVE-7503
> URL: https://issues.apache.org/jira/browse/HIVE-7503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> For Hive's multi-insert query 
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
> may be an MR job for each insert. When we implement this with Spark, it would 
> be nice if all the inserts could happen concurrently.
> It seems that this functionality isn't available in Spark. To make things 
> worse, the source of the insert may be re-computed unless it's staged. Even 
> then, the inserts will happen sequentially, hurting performance.
> This task is to find out what it takes in Spark to enable this without 
> staging the source or inserting sequentially. If this has to be solved in 
> Hive, find an optimal way to do it.
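The recomputation problem described above can be sketched in plain Python (a 
stand-in for Spark's lazy evaluation, not actual Spark code; the counter and 
sink names are made up). Evaluating a lazy source once per sink recomputes it, 
whereas staging (caching) the result lets every insert reuse it:

```python
# A "source" whose evaluations we count, standing in for a
# transformation pipeline that Spark would recompute per action.
evaluations = 0

def source():
    global evaluations
    evaluations += 1
    return [("k1", 1), ("k2", 2)]

sink_a, sink_b = [], []

# Unstaged: each "insert" (action) re-evaluates the source.
sink_a.extend(source())
sink_b.extend(source())
assert evaluations == 2

# Staged: evaluate once (analogous to rdd.cache() or writing to a
# staging location), then feed every sink from the cached result.
evaluations = 0
staged = source()
sink_c = list(staged)
sink_d = list(staged)
assert evaluations == 1
assert sink_c == sink_d == [("k1", 1), ("k2", 2)]
```

Even with staging, the writes above still happen one after the other, which 
mirrors the sequential-insert concern raised in the description.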





[jira] [Assigned] (HIVE-7503) Support Hive's multi-table insert query with Spark

2014-07-30 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-7503:
-

Assignee: Xuefu Zhang

> Support Hive's multi-table insert query with Spark
> --
>
> Key: HIVE-7503
> URL: https://issues.apache.org/jira/browse/HIVE-7503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> For Hive's multi-insert query 
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
> may be an MR job for each insert. When we implement this with Spark, it would 
> be nice if all the inserts could happen concurrently.
> It seems that this functionality isn't available in Spark. To make things 
> worse, the source of the insert may be re-computed unless it's staged. Even 
> then, the inserts will happen sequentially, hurting performance.
> This task is to find out what it takes in Spark to enable this without 
> staging the source or inserting sequentially. If this has to be solved in 
> Hive, find an optimal way to do it.




