[jira] [Commented] (HIVE-6959) Remove vectorization related constant expression folding code once Constant propagation optimizer for Hive is committed

2014-08-07 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089905#comment-14089905
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-6959:
-

[~ashutoshc]
1. vectorization_14.q has been fixed with the new patch.
2. The vector_coalesce failure is caused by HIVE-7650; I have temporarily 
disabled constant propagation for that specific test case.
3. vector_cast_constant.q exposed an existing Hive vectorization issue, and I 
have included the fix in the new patch.



> Remove vectorization related constant expression folding code once Constant 
> propagation optimizer for Hive is committed
> ---
>
> Key: HIVE-6959
> URL: https://issues.apache.org/jira/browse/HIVE-6959
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6959.1.patch, HIVE-6959.2.patch, HIVE-6959.3.patch, 
> HIVE-6959.4.patch
>
>
> HIVE-5771 covers Constant propagation optimizer for Hive. Now that HIVE-5771 
> is committed, we should remove any vectorization related code which 
> duplicates this feature. For example, one function to be cleaned up is 
> VectorizationContext::foldConstantsForUnaryExprs(). In addition to this 
> change, constant propagation should kick in when vectorization is enabled. 
> i.e. we need to lift the HIVE_VECTORIZATION_ENABLED restriction inside 
> ConstantPropagate::transform().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review requests & JIRA process

2014-08-07 Thread Sergey Shelukhin
I think the process that works best for getting patches reviewed is adding
specific people to RB/JIRA and asking, usually the people who are active in
a related area of code. And nagging them until they either review or refuse
to review :)


On Thu, Aug 7, 2014 at 4:31 AM, Lars Francke  wrote:

> Hey,
>
> just wanted to bump this in case anyone has any opinions as well as what
> the procedure is for patches that don't get a review for a longer period of
> time. I'd like to avoid a lot of unnecessary work on my end :)
>
> Cheers,
> Lars
>
>
> On Wed, Aug 6, 2014 at 1:15 AM, Lars Francke 
> wrote:
>
> > Hi everyone,
> >
> > I have a couple of review requests that I'd love for someone to look at.
> > I'll list them below. I have however two more questions.
> >
> > Two of my issues are clean ups of existing code (HIVE-7622 & HIVE-7543).
> I
> > realize that they don't bring immediate benefit and I had planned to fix
> > some more of the issues Checkstyle, my IDE and SonarQube[1] complain
> about.
> > Is this okay for you guys or would you rather I stop this? I ask because
> > they take a significant amount of time not only for myself but also for a
> > reviewer and they go stale fast. I think it helps to have a clean
> codebase
> > for what it's worth.
> >
> > The second question is about the JIRA process: What's the best way to get
> > someone to review patches? I currently always create a review, attach the
> > patch to the Issue and set it to PATCH AVAILABLE. The documentation is
> not
> > quite clear about the process[2].
> >
> > These are the issues in need of reviews:
> >
> > *  (huge but I'd
> > appreciate an answer fast to avoid having to rebase it often)
> > * 
> > * 
> > * 
> >
> > Thanks!
> >
> > Cheers,
> > Lars
> >
> > [1]  I have a publicly accessible server set
> > up with Hive analyzed, happy to send the link to anyone interested
> > http://i.imgur.com/e3KjR26.png
> >
> > [2] <
> >
> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-MakingChanges
> > >
> >
>



[jira] [Created] (HIVE-7653) Hive AvroSerDe does not support circular references in Schema

2014-08-07 Thread Sachin Goyal (JIRA)
Sachin Goyal created HIVE-7653:
--

 Summary: Hive AvroSerDe does not support circular references in 
Schema
 Key: HIVE-7653
 URL: https://issues.apache.org/jira/browse/HIVE-7653
 Project: Hive
  Issue Type: Bug
Reporter: Sachin Goyal


Avro allows nullable circular references but Hive AvroSerDe does not.

Example of circular references (passing in Avro but failing in AvroSerDe):
{code}
class AvroCycleParent {
  AvroCycleChild child;
  public AvroCycleChild getChild () {return child;}
  public void setChild (AvroCycleChild child) {this.child = child;}
}

class AvroCycleChild {
  AvroCycleParent parent;
  public AvroCycleParent getParent () {return parent;}
  public void setParent (AvroCycleParent parent) {this.parent = parent;}
}
{code}

Due to this discrepancy, Hive is unable to read Avro records that have 
circular references. For third-party code with such references, this makes it 
very hard to serialize the objects directly with Avro and use them in Hive.
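
As a point of reference, here is a minimal sketch of how Avro itself can build 
such a schema via reflection (assuming Avro's ReflectData API; AllowNull wraps 
reference fields in nullable unions, which is what lets the cycle terminate):
{code}
import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;

public class CycleSchemaDemo {
  public static void main(String[] args) {
    // AllowNull maps each object reference to a ["null", type] union, so
    // the parent<->child cycle can terminate with a null.
    Schema schema = ReflectData.AllowNull.get().getSchema(AvroCycleParent.class);
    System.out.println(schema.toString(true));
  }
}
{code}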

I have a patch for this with a unit-test and I will submit it shortly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7620) Hive metastore fails to start in secure mode due to "java.lang.NoSuchFieldError: SASL_PROPS" error

2014-08-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089898#comment-14089898
 ] 

Sergey Shelukhin commented on HIVE-7620:


How often is getHadoopSaslProperties called? It might make sense to cache the 
Method object to invoke if it's called often. Otherwise looks good.
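
A minimal sketch of that kind of caching (illustrative names only, not the 
actual patch):
{code}
import java.lang.reflect.Method;

public class SaslPropsHelper {
  // Cache the reflected Method; the getMethod() lookup is the costly part.
  private static volatile Method cachedMethod;

  public static Object getSaslProperties(Object resolver) throws Exception {
    Method m = cachedMethod;
    if (m == null) {
      // "getDefaultProperties" is an illustrative method name.
      m = resolver.getClass().getMethod("getDefaultProperties");
      cachedMethod = m; // benign race: Method is immutable; worst case, a re-lookup
    }
    return m.invoke(resolver);
  }
}
{code}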

> Hive metastore fails to start in secure mode due to 
> "java.lang.NoSuchFieldError: SASL_PROPS" error
> --
>
> Key: HIVE-7620
> URL: https://issues.apache.org/jira/browse/HIVE-7620
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Hadoop 2.5-snapshot with kerberos authentication on
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-7620.1.patch
>
>
> When the Hive metastore is started in a Hadoop 2.5 cluster, it fails to start 
> with the following error:
> {code}
> 14/07/31 17:45:58 [main]: ERROR metastore.HiveMetaStore: Metastore Thrift 
> Server threw an exception...
> java.lang.NoSuchFieldError: SASL_PROPS
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S.getHadoopSaslProperties(HadoopThriftAuthBridge20S.java:126)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getMetaStoreSaslProperties(MetaStoreUtils.java:1483)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5225)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5152)
> {code}
> Changes in HADOOP-10451 to remove SaslRpcServer.SASL_PROPS are causing this 
> error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7652) Check OutputCollector after closing ExecMapper/ExecReducer

2014-08-07 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-7652:
--

Status: Patch Available  (was: Open)

> Check OutputCollector after closing ExecMapper/ExecReducer
> --
>
> Key: HIVE-7652
> URL: https://issues.apache.org/jira/browse/HIVE-7652
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: spark-branch
>
> Attachments: HIVE-7652.1-spark.patch
>
>
> Some operators, such as Group By, add output records to 
> {{OutputCollector}} after the ExecMapper/ExecReducer is closed. We need to 
> check whether {{lastRecordOutput}} has any records after closing the 
> ExecMapper/ExecReducer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7649) Support column stats with temporary tables

2014-08-07 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7649:
-

Attachment: HIVE-7649.1.patch

> Support column stats with temporary tables
> --
>
> Key: HIVE-7649
> URL: https://issues.apache.org/jira/browse/HIVE-7649
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-7649.1.patch
>
>
> Column stats are currently not supported with temp tables; see if they can 
> be added.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7649) Support column stats with temporary tables

2014-08-07 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7649:
-

Status: Patch Available  (was: Open)

> Support column stats with temporary tables
> --
>
> Key: HIVE-7649
> URL: https://issues.apache.org/jira/browse/HIVE-7649
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-7649.1.patch
>
>
> Column stats are currently not supported with temp tables; see if they can 
> be added.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7446) Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

2014-08-07 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089885#comment-14089885
 ] 

Ashish Kumar Singh commented on HIVE-7446:
--

[~szehon] thanks for your valuable insight here. It helped me discover that, as 
of now, ALTER TABLE does not actually work for Avro-backed tables in Hive. 
ALTER TABLE updates HMS with the new schema, but the Avro data files keep the 
original schema. On reading from an Avro-backed table after altering it, Avro 
throws an exception because the expected and actual schemas differ.

Based on an offline discussion with [~tomwhite], Avro allows files written with 
the old schema to be read with the new schema as long as certain rules are 
followed, e.g. a newly added field must have a default value. The full set of 
rules is at http://avro.apache.org/docs/current/spec.html#Schema+Resolution.
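
A minimal sketch of that rule in action (the record and field names here are 
made up; GenericDatumReader resolves the writer schema against the reader 
schema):
{code}
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class SchemaResolutionDemo {
  public static void main(String[] args) {
    Schema oldSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"T\",\"fields\":["
      + "{\"name\":\"a\",\"type\":\"int\"}]}");
    // The added column declares a default, so files written with oldSchema
    // stay readable under newSchema.
    Schema newSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"T\",\"fields\":["
      + "{\"name\":\"a\",\"type\":\"int\"},"
      + "{\"name\":\"b\",\"type\":[\"null\",\"string\"],\"default\":null}]}");
    GenericDatumReader<GenericRecord> reader =
        new GenericDatumReader<GenericRecord>(oldSchema, newSchema);
    System.out.println(reader.getExpected().toString(true));
  }
}
{code}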

I will upload a patch that should fix this with appropriate tests.

> Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables
> --
>
> Key: HIVE-7446
> URL: https://issues.apache.org/jira/browse/HIVE-7446
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: HIVE-7446.patch
>
>
> HIVE-6806 adds native support for creating hive table stored as Avro. It 
> would be good to add support to ALTER TABLE .. ADD COLUMN to Avro backed 
> tables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7223:
---

Status: Patch Available  (was: Open)

> Support generic PartitionSpecs in Metastore partition-functions
> ---
>
> Key: HIVE-7223
> URL: https://issues.apache.org/jira/browse/HIVE-7223
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 0.13.0, 0.12.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch
>
>
> Currently, the functions in the HiveMetaStore API that handle multiple 
> partitions do so using List<Partition>. E.g. 
> {code}
> public List<Partition> listPartitions(String db_name, String tbl_name, short 
> max_parts);
> public List<Partition> listPartitionsByFilter(String db_name, String 
> tbl_name, String filter, short max_parts);
> public int add_partitions(List<Partition> new_parts);
> {code}
> Partition objects are fairly heavyweight, since each Partition carries its 
> own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
> thousands of partitions take so long to have their partitions listed that the 
> client times out with the default hive.metastore.client.socket.timeout. There 
> is the additional expense of serializing and deserializing metadata for large 
> sets of partitions, w.r.t. time and heap-space. Reducing the thrift traffic 
> should help in this regard.
> In a date-partitioned table, all sub-partitions for a particular date are 
> *likely* (but not guaranteed) to have:
> # The same base directory (e.g. {{/feeds/search/20140601/}})
> # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
> # The same SerDe/StorageHandler/IOFormat classes
> # Sorting/Bucketing/SkewInfo settings
> In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
> represent the partition-list (for a date) in a more condensed form: a list of 
> LighterPartition instances, all sharing a common StorageDescriptor whose 
> location points to the root directory. 
> We can go one better for the {{add_partitions()}} case: When adding all 
> partitions for a given date, the “normal” case affords us the ability to 
> specify the top-level date-directory, where sub-partitions can be inferred 
> from the HDFS directory-path.
> These extensions are hard to introduce at the metastore-level, since 
> partition-functions explicitly specify {{List<Partition>}} arguments. I 
> wonder if a {{PartitionSpec}} interface might help:
> {code}
> public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... 
> ; 
> public int add_partitions( PartitionSpec new_parts ) throws … ;
> {code}
> where the PartitionSpec looks like:
> {code}
> public interface PartitionSpec {
> public List<Partition> getPartitions();
> public List<String> getPartNames();
> public Iterator<Partition> getPartitionIter();
> public Iterator<String> getPartNameIter();
> }
> {code}
> For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
> {{PartitionSpec}}, store a top-level directory, and return Partition 
> instances from sub-directory names, while storing a single StorageDescriptor 
> for all of them.
> Similarly, list_partitions() could return a List<PartitionSpec>, where each 
> PartitionSpec corresponds to a set of partitions that can share a 
> StorageDescriptor.
> By exposing iterator semantics, neither the client nor the metastore needs to 
> instantiate all partitions at once. That should help with memory requirements.
> In case no smart grouping is possible, we could just fall back on a 
> {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
> than the status quo.
> PartitionSpec abstracts away how a set of partitions may be represented. A 
> tighter representation allows us to communicate metadata for a larger number 
> of Partitions, with less Thrift traffic.
> Given that Thrift doesn’t support polymorphism, we’d have to implement the 
> PartitionSpec as a Thrift Union of supported implementations. (We could 
> convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
> sub-class.)
> Thoughts?
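
A minimal sketch of the {{DefaultPartitionSpec}} fallback described above 
(hypothetical code, matching the proposed interface):
{code}
import java.util.Iterator;
import java.util.List;

public class DefaultPartitionSpec implements PartitionSpec {
  private final List<Partition> partitions;
  private final List<String> partNames;

  public DefaultPartitionSpec(List<Partition> partitions, List<String> partNames) {
    this.partitions = partitions;
    this.partNames = partNames;
  }

  // No grouping: simply delegate to the wrapped lists, which is no worse
  // than the status quo.
  public List<Partition> getPartitions()        { return partitions; }
  public List<String> getPartNames()            { return partNames; }
  public Iterator<Partition> getPartitionIter() { return partitions.iterator(); }
  public Iterator<String> getPartNameIter()     { return partNames.iterator(); }
}
{code}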



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7223:
---

Attachment: HIVE-7223.2.patch

Sorry, I didn't realize the generated code needed to be included. Here's the 
complete patch.
(This includes changes to the generated *.cpp, *.php, etc. files, though.)

> Support generic PartitionSpecs in Metastore partition-functions
> ---
>
> Key: HIVE-7223
> URL: https://issues.apache.org/jira/browse/HIVE-7223
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-7223.1.patch, HIVE-7223.2.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-08-07 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-7223:
---

Status: Open  (was: Patch Available)

> Support generic PartitionSpecs in Metastore partition-functions
> ---
>
> Key: HIVE-7223
> URL: https://issues.apache.org/jira/browse/HIVE-7223
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 0.13.0, 0.12.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-7223.1.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7652) Check OutputCollector after closing ExecMapper/ExecReducer

2014-08-07 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-7652:
--

Attachment: HIVE-7652.1-spark.patch

> Check OutputCollector after closing ExecMapper/ExecReducer
> --
>
> Key: HIVE-7652
> URL: https://issues.apache.org/jira/browse/HIVE-7652
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: spark-branch
>
> Attachments: HIVE-7652.1-spark.patch
>
>
> Some operators, such as Group By, add output records to 
> {{OutputCollector}} after the ExecMapper/ExecReducer is closed. We need to 
> check whether {{lastRecordOutput}} has any records after closing the 
> ExecMapper/ExecReducer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7652) Check OutputCollector after closing ExecMapper/ExecReducer

2014-08-07 Thread Venki Korukanti (JIRA)
Venki Korukanti created HIVE-7652:
-

 Summary: Check OutputCollector after closing ExecMapper/ExecReducer
 Key: HIVE-7652
 URL: https://issues.apache.org/jira/browse/HIVE-7652
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: spark-branch


Some operators, such as Group By, add output records to 
{{OutputCollector}} after the ExecMapper/ExecReducer is closed. Need to check 
whether {{lastRecordOutput}} has any records after closing the 
ExecMapper/ExecReducer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7492) Enhance SparkCollector

2014-08-07 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089871#comment-14089871
 ] 

Venki Korukanti commented on HIVE-7492:
---

The GroupBy operator adds output records to the OutputCollector after the 
ExecMapper/ExecReducer is closed. We need to check whether {{lastRecordOutput}} 
has any records after closing the ExecMapper/ExecReducer. I will log a JIRA and 
attach the patch.
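
Roughly the shape of the check (a sketch; every name except 
{{lastRecordOutput}} is illustrative):
{code}
// Group By flushes its buffered aggregates inside close(), so records can
// land in the collector after the last map()/reduce() call.
execMapper.close();

// Drain the collector one more time after closing.
if (!lastRecordOutput.isEmpty()) {
  for (Object record : lastRecordOutput.getRecords()) {
    forwardToSpark(record);
  }
}
{code}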

> Enhance SparkCollector
> --
>
> Key: HIVE-7492
> URL: https://issues.apache.org/jira/browse/HIVE-7492
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Venki Korukanti
> Fix For: spark-branch
>
> Attachments: HIVE-7492-1-spark.patch, HIVE-7492.2-spark.patch
>
>
> SparkCollector is used to collect the rows generated by HiveMapFunction or 
> HiveReduceFunction. It is currently backed by an ArrayList and thus has 
> unbounded memory usage. Ideally, the collector should have bounded memory 
> usage and be able to spill to disk when its quota is reached.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7639) Bring tez-branch upto api changes in TEZ-1379, TEZ-1057, TEZ-1382

2014-08-07 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7639:
--

Status: Patch Available  (was: Open)

> Bring tez-branch upto api changes in TEZ-1379, TEZ-1057, TEZ-1382
> -
>
> Key: HIVE-7639
> URL: https://issues.apache.org/jira/browse/HIVE-7639
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Gopal V
>Assignee: Gopal V
> Fix For: tez-branch
>
> Attachments: HIVE-7639.1-tez.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7651) Investigate why union two RDDs generated from two MapTrans does not get the right result

2014-08-07 Thread Na Yang (JIRA)
Na Yang created HIVE-7651:
-

 Summary: Investigate why union two RDDs generated from two 
MapTrans does not get the right result
 Key: HIVE-7651
 URL: https://issues.apache.org/jira/browse/HIVE-7651
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Na Yang


If the SparkWork has two map works as roots, and we use the current 
generate(baseWork) API to generate two MapTrans, unioning the RDDs processed by 
the two MapTrans does not produce the correct result. 

If the two input RDDs come from different data tables, the union result is 
empty.
If the two input RDDs come from the same data table, the union result is not 
correct: the same row of data appears 4 times in the union result.

Need to investigate why this happens and how to fix it.  
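
The failing pattern, roughly (Spark Java API; mapTran1/mapTran2 stand in for 
the functions generated from the two map works, so this is a sketch rather 
than the actual generated code):
{code}
JavaRDD<BytesWritable> rdd1 = table1Rdd.mapPartitions(mapTran1);
JavaRDD<BytesWritable> rdd2 = table2Rdd.mapPartitions(mapTran2);
JavaRDD<BytesWritable> unioned = rdd1.union(rdd2);
// Expected: rows from both branches.
// Observed: empty when the two tables differ; each row appearing 4 times
// when both branches read the same table.
{code}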



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4123) The RLE encoding for ORC can be improved

2014-08-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089866#comment-14089866
 ] 

Lefty Leverenz commented on HIVE-4123:
--

Doc note:  This added configuration parameter *hive.exec.orc.write.format* with 
a default value of 0.11, which was changed to null by HIVE-5091 before the 
release.

*hive.exec.orc.write.format* is documented in the wiki here:

* [Configuration Properties -- hive.exec.orc.write.format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.write.format]

> The RLE encoding for ORC can be improved
> 
>
> Key: HIVE-4123
> URL: https://issues.apache.org/jira/browse/HIVE-4123
> Project: Hive
>  Issue Type: New Feature
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Owen O'Malley
>Assignee: Prasanth J
>  Labels: orcfile
> Fix For: 0.12.0
>
> Attachments: HIVE-4123-8.patch, HIVE-4123.1.git.patch.txt, 
> HIVE-4123.2.git.patch.txt, HIVE-4123.3.patch.txt, HIVE-4123.4.patch.txt, 
> HIVE-4123.5.txt, HIVE-4123.6.txt, HIVE-4123.7.txt, HIVE-4123.8.txt, 
> HIVE-4123.8.txt, HIVE-4123.patch.txt, ORC-Compression-Ratio-Comparison.xlsx
>
>
> The run length encoding of integers can be improved:
> * tighter bit packing
> * allow delta encoding
> * allow longer runs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7650) Constant Folding performs partial evaluation on Coalesce operator

2014-08-07 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-7650:
---

 Summary: Constant Folding performs partial evaluation on Coalesce 
operator
 Key: HIVE-7650
 URL: https://issues.apache.org/jira/browse/HIVE-7650
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan


EXPLAIN SELECT ctinyint, cdouble, cint, coalesce(ctinyint+10, 
(cdouble+log2(cint)), 0) 
FROM alltypesorc
WHERE (ctinyint IS NULL) LIMIT 10;

SELECT ctinyint, cdouble, cint, coalesce(ctinyint+10, (cdouble+log2(cint)), 0) 
FROM alltypesorc
WHERE (ctinyint IS NULL) LIMIT 10;

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: alltypesorc
Statistics: Num rows: 165041 Data size: 2640659 Basic stats: 
COMPLETE Column stats: NONE
Filter Operator
  predicate: ctinyint is null (type: boolean)
  Statistics: Num rows: 82520 Data size: 1320321 Basic stats: 
COMPLETE Column stats: NONE
  Select Operator
expressions: null (type: void), cdouble (type: double), cint 
(type: int), COALESCE((null + 10),(cdouble + log2(cint)),0) (type: double)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 82520 Data size: 1320321 Basic stats: 
COMPLETE Column stats: NONE
Limit
  Number of rows: 10
  Statistics: Num rows: 10 Data size: 160 Basic stats: COMPLETE 
Column stats: NONE
  File Output Operator
compressed: false
Statistics: Num rows: 10 Data size: 160 Basic stats: 
COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
Fetch Operator
  limit: 10
  Processor Tree:
ListSink

As you can see in the above plan, the subexpression (null + 10) inside COALESCE 
can be further evaluated to null. Currently, constant folding fails to perform 
this recursive folding.
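
The folding one would expect: COALESCE((null + 10), (cdouble + log2(cint)), 0) 
should first fold (null + 10) to null, giving 
COALESCE(null, (cdouble + log2(cint)), 0), and the leading constant null can 
then be dropped to yield COALESCE((cdouble + log2(cint)), 0).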



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5091) ORC files should have an option to pad stripes to the HDFS block boundaries

2014-08-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089861#comment-14089861
 ] 

Lefty Leverenz commented on HIVE-5091:
--

Doc note:  This changed the default value of configuration parameter 
*hive.exec.orc.write.format* from 0.11 to null (before it was first released in 
0.12).

*hive.exec.orc.write.format* was created in HIVE-4123, and it's documented in 
the wiki here:

* [Configuration Properties -- hive.exec.orc.write.format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.write.format]

> ORC files should have an option to pad stripes to the HDFS block boundaries
> ---
>
> Key: HIVE-5091
> URL: https://issues.apache.org/jira/browse/HIVE-5091
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.12.0
>
> Attachments: HIVE-5091.D12249.1.patch, HIVE-5091.D12249.2.patch, 
> HIVE-5091.D12249.3.patch
>
>
> With ORC stripes being large, if a stripe straddles an HDFS block, the 
> locality of read is suboptimal. It would be good to add padding to ensure 
> that stripes don't straddle HDFS blocks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7647) Beeline does not honor --headerInterval and --color when executing with "-e"

2014-08-07 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-7647:


Attachment: HIVE-7647.1.patch

The attached fix results in the following change in behavior.
{code}
[root@localhost ~]# beeline --showheader=true --color=true --headerInterval=5 
-u 'jdbc:hive2://localhost:1/default' -n hive -d 
org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 8;"
Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ

+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 00-  | All Occupations  | 135185230  | 42270   |
+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 11-  | Management occupations   | 6152650| 100310  |
+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 11-1011  | Chief executives | 301930 | 160440  |
+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 11-1021  | General and operations managers  | 1697690| 107970  |
+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 11-1031  | Legislators  | 64650  | 37980   |
+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 11-2011  | Advertising and promotions managers  | 36100  | 94720   |
+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 11-2021  | Marketing managers   | 166790 | 118160  |
+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 11-2022  | Sales managers   | 333910 | 110390  |
+--+--++-+
8 rows selected (3.523 seconds)
Beeline version 0.12.0-cdh5.1.0 by Apache Hive
Closing: org.apache.hive.jdbc.HiveConnection

[root@localhost ~]# beeline --showheader=true --color=true --headerInterval=5 
-u 'jdbc:hive2://localhost:1/default' -n hive -d 
org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 8;"
Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ

+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 00-  | All Occupations  | 135185230  | 42270   |
| 11-  | Management occupations   | 6152650| 100310  |
| 11-1011  | Chief executives | 301930 | 160440  |
| 11-1021  | General and operations managers  | 1697690| 107970  |
| 11-1031  | Legislators  | 64650  | 37980   |
+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 11-2011  | Advertising and promotions managers  | 36100  | 94720   |
| 11-2021  | Marketing managers   | 166790 | 118160  |
| 11-2022  | Sales managers   | 333910 | 110390  |
+--+--++-+
8 rows selected (0.862 seconds)
Beeline version 0.12.0-cdh5.1.0 by Apache Hive
Closing: org.apache.hive.jdbc.HiveConnection
{code}

> Beeline does not honor --headerInterva

[jira] [Updated] (HIVE-7647) Beeline does not honor --headerInterval and --color when executing with "-e"

2014-08-07 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-7647:


Fix Version/s: 0.14.0
Affects Version/s: (was: 0.13.0)
   0.14.0
   Status: Patch Available  (was: Open)

> Beeline does not honor --headerInterval and --color when executing with "-e"
> 
>
> Key: HIVE-7647
> URL: https://issues.apache.org/jira/browse/HIVE-7647
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.14.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: HIVE-7647.1.patch
>
>
> --showHeader is being honored
> [root@localhost ~]# beeline --showHeader=false -u 
> 'jdbc:hive2://localhost:1/default' -n hive -d 
> org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 10;"
> Connecting to jdbc:hive2://localhost:1/default
> Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
> Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> -hiveconf (No such file or directory)
> +--+--++-+
> | 00-  | All Occupations  | 135185230  | 42270   |
> | 11-  | Management occupations   | 6152650| 100310  |
> | 11-1011  | Chief executives | 301930 | 160440  |
> | 11-1021  | General and operations managers  | 1697690| 107970  |
> | 11-1031  | Legislators  | 64650  | 37980   |
> | 11-2011  | Advertising and promotions managers  | 36100  | 94720   |
> | 11-2021  | Marketing managers   | 166790 | 118160  |
> | 11-2022  | Sales managers   | 333910 | 110390  |
> | 11-2031  | Public relations managers| 51730  | 101220  |
> | 11-3011  | Administrative services managers | 246930 | 79500   |
> +--+--++-+
> 10 rows selected (0.838 seconds)
> Beeline version 0.12.0-cdh5.1.0 by Apache Hive
> Closing: org.apache.hive.jdbc.HiveConnection
> --outputFormat is being honored.
> [root@localhost ~]# beeline --outputFormat=csv -u 
> 'jdbc:hive2://localhost:1/default' -n hive -d 
> org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 10;"
> Connecting to jdbc:hive2://localhost:1/default
> Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
> Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 'code','description','total_emp','salary'
> '00-','All Occupations','135185230','42270'
> '11-','Management occupations','6152650','100310'
> '11-1011','Chief executives','301930','160440'
> '11-1021','General and operations managers','1697690','107970'
> '11-1031','Legislators','64650','37980'
> '11-2011','Advertising and promotions managers','36100','94720'
> '11-2021','Marketing managers','166790','118160'
> '11-2022','Sales managers','333910','110390'
> '11-2031','Public relations managers','51730','101220'
> '11-3011','Administrative services managers','246930','79500'
> 10 rows selected (0.664 seconds)
> Beeline version 0.12.0-cdh5.1.0 by Apache Hive
> Closing: org.apache.hive.jdbc.HiveConnection
> both --color & --headerInterval are being honored when executing using "-f" 
> option (reads query from a file rather than the commandline) (cannot really 
> see the color here but use the terminal colors)
> [root@localhost ~]# beeline --showheader=true --color=true --headerInterval=5 
> -u 'jdbc:hive2://localhost:1/default' -n hive -d 
> org.apache.hive.jdbc.HiveDriver -f /tmp/tmp.sql  
> Connecting to jdbc:hive2://localhost:1/default
> Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
> Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 0.12.0-cdh5.1.0 by Apache Hive
> 0: jdbc:hive2://localhost> select * from sample_07 limit 8;
> +--+--++-+
> |   code   | description  | total_emp  | salary  |
> +--+--++-+
> | 00-  | All Occupations  | 135185230  | 42270   |
> | 11-  | Management occupations   | 6152650| 100310  |
> | 11-1011  | Chief executives | 301930 | 160440  |
> | 11-1021  | General and operations managers  | 1697690| 107970  |
> | 11-1031  | Legislators  | 64650  | 37980   |
> +--+--++-+
> |   code   | descrip

[jira] [Work started] (HIVE-7585) Implement the graph transformation execution

2014-08-07 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7585 started by Na Yang.

> Implement the graph transformation execution
> 
>
> Key: HIVE-7585
> URL: https://issues.apache.org/jira/browse/HIVE-7585
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
>
> Part of the task of supporting union all for Hive on Spark
> In HIVE-7586, SparkPlanGenerator generates a graph transformation from a 
> SparkWork. This task is the implementation of the graph tran's execution.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work started] (HIVE-7586) Generate plan for spark work which uses spark union transformation

2014-08-07 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7586 started by Na Yang.

> Generate plan for spark work which uses spark union transformation
> --
>
> Key: HIVE-7586
> URL: https://issues.apache.org/jira/browse/HIVE-7586
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
>
> This task is part of the supporting union all for Hive on Spark task
> SparkPlanGenerator: need to generate a plan from SparkWork, which needs to 
> use Spark's union transformation to achieve the functionality. The goal is to 
> generate a graph transformation, including a union tran, instead of a chain 
> transformation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7585) Implement the graph transformation execution

2014-08-07 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-7585:
--

Description: 
Part of the task of supporting union all for Hive on Spark

In HIVE-7586, SparkPlanGenerator generates a graph transformation from a 
SparkWork. This task is the implementation of the graph tran's execution.  

  was:
Part of the task of supporting union all for Hive on Spark

SparkPlan modeling: represent the Spark job in terms of a graph (rather than a 
list) of SparkTran instances. We may need to enhance the SparkTran interface. 

Summary: Implement the graph transformation execution  (was: SparkPlan 
modeling: represent the Spark job in terms of a graph (rather than a list) of 
SparkTran instances.)

> Implement the graph transformation execution
> 
>
> Key: HIVE-7585
> URL: https://issues.apache.org/jira/browse/HIVE-7585
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
>
> Part of the task of supporting union all for Hive on Spark
> In HIVE-7586, SparkPlanGenerator generates a graph transformation from a 
> SparkWork. This task is the implementation of the graph tran's execution.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7586) Generate plan for spark work which uses spark union transformation

2014-08-07 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-7586:
--

Description: 
This task is part of the supporting union all for Hive on Spark task

SparkPlanGenerator: need to generate a plan from SparkWork, which needs to use 
Spark's union transformation to achieve the functionality. The goal is to 
generate a graph transformation, including a union tran, instead of a chain 
transformation. 



  was:
This task is part of the supporting union all for Hive on Spark task

SparkPlanGenerator: need to generate a plan from SparkWork, which needs to use 
Spark's union transformation to achieve the functionality.


> Generate plan for spark work which uses spark union transformation
> --
>
> Key: HIVE-7586
> URL: https://issues.apache.org/jira/browse/HIVE-7586
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
>
> This task is part of the supporting union all for Hive on Spark task
> SparkPlanGenerator: need to generate a plan from SparkWork, which needs to 
> use Spark's union transformation to achieve the functionality. The goal is to 
> generate a graph transformation, including a union tran, instead of a chain 
> transformation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7541) Support union all on Spark

2014-08-07 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-7541:
--

Attachment: HIVE-7541.1-spark.patch

> Support union all on Spark
> --
>
> Key: HIVE-7541
> URL: https://issues.apache.org/jira/browse/HIVE-7541
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
> Attachments: HIVE-7541.1-spark.patch
>
>
> For union all operator, we will use Spark's union transformation. Refer to 
> the design doc on wiki for more information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7541) Support union all on Spark

2014-08-07 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-7541:
--

Attachment: (was: HIVE-7541.1-spark.patch)

> Support union all on Spark
> --
>
> Key: HIVE-7541
> URL: https://issues.apache.org/jira/browse/HIVE-7541
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
>
> For union all operator, we will use Spark's union transformation. Refer to 
> the design doc on wiki for more information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7441) Custom partition scheme gets rewritten with hive scheme upon concatenate

2014-08-07 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-7441:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Chaoyu for the contribution!

> Custom partition scheme gets rewritten with hive scheme upon concatenate
> 
>
> Key: HIVE-7441
> URL: https://issues.apache.org/jira/browse/HIVE-7441
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.10.0, 0.11.0, 0.12.0
> Environment: CDH4.5 and CDH5.0
>Reporter: Johndee Burks
>Assignee: Chaoyu Tang
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: HIVE-7441.1.patch, HIVE-7441.patch
>
>
> Take the given data directories below. Each contains a data file in RC format 
> whose only content is the character "1".
> {code}
> /j1/part1
> /j1/part2
> {code}
> Create the table over the directories using the following command:
> {code}
> create table j1 (a int) partitioned by (b string) stored as rcfile location 
> '/j1' ;
> {code}
> I add these directories to a table for example j1 using the following 
> commands:
> {code}
> alter table j1 add partition (b = 'part1') location '/j1/part1';
> alter table j1 add partition (b = 'part2') location '/j1/part2';
> {code}
> I then do the following command to the first partition: 
> {code}
> alter table j1 partition (b = 'part1') concatenate;
> {code}
> Hive changes the partition location on HDFS from
> {code}
> /j1/part1
> {code}
> to 
> {code}
> /j1/b=part1
> {code}
> However, it does not update the partition location in the metastore, and the 
> partition is then lost to the table. This is hard to discover until you start 
> querying your data and notice that data is missing. The table even still 
> shows the partition when you do "show partitions".



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7492) Enhance SparkCollector

2014-08-07 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089839#comment-14089839
 ] 

Chao commented on HIVE-7492:


Hi [~vkorukanti] and [~brocknoland], after applying the patch, running the 
following simple query:

{code}
select key, sum(value) from src group by key
{code}

produces no results. However, it worked before this patch.
Can you take a look? Thanks!

> Enhance SparkCollector
> --
>
> Key: HIVE-7492
> URL: https://issues.apache.org/jira/browse/HIVE-7492
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Venki Korukanti
> Fix For: spark-branch
>
> Attachments: HIVE-7492-1-spark.patch, HIVE-7492.2-spark.patch
>
>
> SparkCollector is used to collect the rows generated by HiveMapFunction or 
> HiveReduceFunction. It is currently backed by an ArrayList and thus has 
> unbounded memory usage. Ideally, the collector should have bounded memory 
> usage and be able to spill to disk when its quota is reached.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7541) Support union all on Spark

2014-08-07 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-7541:
--

Attachment: HIVE-7541.1-spark.patch

> Support union all on Spark
> --
>
> Key: HIVE-7541
> URL: https://issues.apache.org/jira/browse/HIVE-7541
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Na Yang
> Attachments: HIVE-7541.1-spark.patch
>
>
> For union all operator, we will use Spark's union transformation. Refer to 
> the design doc on wiki for more information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7620) Hive metastore fails to start in secure mode due to "java.lang.NoSuchFieldError: SASL_PROPS" error

2014-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089831#comment-14089831
 ] 

Hive QA commented on HIVE-7620:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660323/HIVE-7620.1.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5885 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/213/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/213/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-213/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660323

> Hive metastore fails to start in secure mode due to 
> "java.lang.NoSuchFieldError: SASL_PROPS" error
> --
>
> Key: HIVE-7620
> URL: https://issues.apache.org/jira/browse/HIVE-7620
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Hadoop 2.5-snapshot with kerberos authentication on
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-7620.1.patch
>
>
> When the Hive metastore is started in a Hadoop 2.5 cluster, it fails to start 
> with the following error:
> {code}
> 14/07/31 17:45:58 [main]: ERROR metastore.HiveMetaStore: Metastore Thrift 
> Server threw an exception...
> java.lang.NoSuchFieldError: SASL_PROPS
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S.getHadoopSaslProperties(HadoopThriftAuthBridge20S.java:126)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getMetaStoreSaslProperties(MetaStoreUtils.java:1483)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5225)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5152)
> {code}
> Changes in HADOOP-10451 to remove SaslRpcServer.SASL_PROPS are causing this 
> error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24467: HIVE-7373: Hive should not remove trailing zeros for decimal numbers

2014-08-07 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24467/
---

Review request for hive.


Bugs: HIVE-7373
https://issues.apache.org/jira/browse/HIVE-7373


Repository: hive-git


Description
---

Removes the trim() call from the HiveDecimal normalize/enforcePrecisionScale 
methods. This change affects the Decimal128 getHiveDecimalString() method, so a 
new 'actualScale' variable is used to store the actual scale of a value passed 
to Decimal128.

The rest of the changes fix decimal query test outputs to match the new 
HiveDecimal values.
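
To illustrate the intended change in behavior (a sketch assuming 
HiveDecimal.create(String); the before/after outputs follow from the bug 
title, not from running this exact snippet):
{code}
import org.apache.hadoop.hive.common.type.HiveDecimal;

public class TrailingZeroDemo {
  public static void main(String[] args) {
    HiveDecimal d = HiveDecimal.create("1.500");
    // Before this change: trailing zeros are trimmed -> "1.5"
    // After this change:  the scale is preserved     -> "1.500"
    System.out.println(d.toString());
  }
}
{code}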


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java d4cc32d 
  common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java ad09015 
  common/src/test/org/apache/hadoop/hive/common/type/TestDecimal128.java 
46236a5 
  common/src/test/org/apache/hadoop/hive/common/type/TestHiveDecimal.java 
1384a45 
  data/files/kv10.txt PRE-CREATION 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java
 82fc8a9 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java f5023bb 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorTypeCasts.java
 2a871c5 
  ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestSearchArgumentImpl.java 
b1524f7 
  ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPDivide.java 
4c5b3a5 
  ql/src/test/queries/clientpositive/decimal_trailing.q PRE-CREATION 
  ql/src/test/queries/clientpositive/literal_decimal.q 08b21dc 
  ql/src/test/results/clientpositive/avro_decimal.q.out 1868de3 
  ql/src/test/results/clientpositive/avro_decimal_native.q.out bc87a7d 
  ql/src/test/results/clientpositive/compute_stats_decimal.q.out 2a65efe 
  ql/src/test/results/clientpositive/decimal_2.q.out 794bad0 
  ql/src/test/results/clientpositive/decimal_3.q.out 524fa62 
  ql/src/test/results/clientpositive/decimal_4.q.out 7444e83 
  ql/src/test/results/clientpositive/decimal_5.q.out 52dae22 
  ql/src/test/results/clientpositive/decimal_6.q.out 4338b52 
  ql/src/test/results/clientpositive/decimal_precision.q.out ea08b73 
  ql/src/test/results/clientpositive/decimal_trailing.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/decimal_udf.q.out 02a0caa 
  ql/src/test/results/clientpositive/literal_decimal.q.out 2f2df6a 
  ql/src/test/results/clientpositive/orc_predicate_pushdown.q.out 890cb2c 
  ql/src/test/results/clientpositive/parquet_decimal.q.out b2d542f 
  ql/src/test/results/clientpositive/parquet_decimal1.q.out 9ff0950 
  ql/src/test/results/clientpositive/serde_regex.q.out e231a09 
  ql/src/test/results/clientpositive/udf_case.q.out 6c186bd 
  ql/src/test/results/clientpositive/udf_when.q.out cbb1210 
  ql/src/test/results/clientpositive/vector_between_in.q.out 78e340b 
  ql/src/test/results/clientpositive/vector_decimal_aggregate.q.out 2c4d552 
  ql/src/test/results/clientpositive/vector_decimal_cast.q.out a508732 
  ql/src/test/results/clientpositive/vector_decimal_expressions.q.out 094eb8e 
  ql/src/test/results/clientpositive/vector_decimal_mapjoin.q.out 71a3def 
  ql/src/test/results/clientpositive/vector_decimal_math_funcs.q.out 717e81a 
  ql/src/test/results/clientpositive/windowing_decimal.q.out 88d11af 
  ql/src/test/results/clientpositive/windowing_navfn.q.out 95d7942 
  ql/src/test/results/clientpositive/windowing_rank.q.out 9976fdb 
  
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
 523ad7d 

Diff: https://reviews.apache.org/r/24467/diff/


Testing
---


Thanks,

Sergio Pena



[jira] [Commented] (HIVE-7223) Support generic PartitionSpecs in Metastore partition-functions

2014-08-07 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089759#comment-14089759
 ] 

Alan Gates commented on HIVE-7223:
--

[~mithun], you'll need to attach a patch that includes the generated thrift, as 
the CI system doesn't generate the thrift classes.

> Support generic PartitionSpecs in Metastore partition-functions
> ---
>
> Key: HIVE-7223
> URL: https://issues.apache.org/jira/browse/HIVE-7223
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog, Metastore
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-7223.1.patch
>
>
> Currently, the functions in the HiveMetaStore API that handle multiple 
> partitions do so using List<Partition>. E.g. 
> {code}
> public List<Partition> listPartitions(String db_name, String tbl_name, short 
> max_parts);
> public List<Partition> listPartitionsByFilter(String db_name, String 
> tbl_name, String filter, short max_parts);
> public int add_partitions(List<Partition> new_parts);
> {code}
> Partition objects are fairly heavyweight, since each Partition carries its 
> own copy of a StorageDescriptor, partition-values, etc. Tables with tens of 
> thousands of partitions take so long to have their partitions listed that the 
> client times out with default hive.metastore.client.socket.timeout. There is 
> the additional expense of serializing and deserializing metadata for large 
> sets of partitions, w.r.t time and heap-space. Reducing the thrift traffic 
> should help in this regard.
> In a date-partitioned table, all sub-partitions for a particular date are 
> *likely* (but not expected) to have:
> # The same base directory (e.g. {{/feeds/search/20140601/}})
> # Similar directory structure (e.g. {{/feeds/search/20140601/[US,UK,IN]}})
> # The same SerDe/StorageHandler/IOFormat classes
> # Sorting/Bucketing/SkewInfo settings
> In this “most likely” scenario (henceforth termed “normal”), it’s possible to 
> represent the partition-list (for a date) in a more condensed form: a list of 
> LighterPartition instances, all sharing a common StorageDescriptor whose 
> location points to the root directory. 
> We can go one better for the {{add_partitions()}} case: When adding all 
> partitions for a given date, the “normal” case affords us the ability to 
> specify the top-level date-directory, where sub-partitions can be inferred 
> from the HDFS directory-path.
> These extensions are hard to introduce at the metastore-level, since 
> partition-functions explicitly specify {{List<Partition>}} arguments. I 
> wonder if a {{PartitionSpec}} interface might help:
> {code}
> public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ...; 
> public int add_partitions( PartitionSpec new_parts ) throws … ;
> {code}
> where the PartitionSpec looks like:
> {code}
> public interface PartitionSpec {
> public List<Partition> getPartitions();
> public List<String> getPartNames();
> public Iterator<Partition> getPartitionIter();
> public Iterator<String> getPartNameIter();
> }
> {code}
> For addPartitions(), an {{HDFSDirBasedPartitionSpec}} class could implement 
> {{PartitionSpec}}, store a top-level directory, and return Partition 
> instances from sub-directory names, while storing a single StorageDescriptor 
> for all of them.
> Similarly, list_partitions() could return a List<PartitionSpec>, where each 
> PartitionSpec corresponds to a set of partitions that can share a 
> StorageDescriptor.
> By exposing iterator semantics, neither the client nor the metastore need 
> instantiate all partitions at once. That should help with memory requirements.
> In case no smart grouping is possible, we could just fall back on a 
> {{DefaultPartitionSpec}} which composes {{List<Partition>}}, and is no worse 
> than the status quo.
> PartitionSpec abstracts away how a set of partitions may be represented. A 
> tighter representation allows us to communicate metadata for a larger number 
> of Partitions, with less Thrift traffic.
> Given that Thrift doesn’t support polymorphism, we’d have to implement the 
> PartitionSpec as a Thrift Union of supported implementations. (We could 
> convert from the Thrift PartitionSpec to the appropriate Java PartitionSpec 
> sub-class.)
> Thoughts?
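
For concreteness, the no-worse-than-status-quo fallback mentioned above might look 
like the following sketch (the interface is restated so the snippet compiles; this is 
not committed code, and real partition names would come from Warehouse.makePartName()):
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.Partition;

/** The proposed interface, restated for the sketch. */
interface PartitionSpec {
  List<Partition> getPartitions();
  List<String> getPartNames();
  Iterator<Partition> getPartitionIter();
  Iterator<String> getPartNameIter();
}

/** Fallback that simply wraps a List<Partition>, matching the status quo. */
class DefaultPartitionSpec implements PartitionSpec {
  private final List<Partition> partitions;

  DefaultPartitionSpec(List<Partition> partitions) {
    this.partitions = partitions;
  }

  public List<Partition> getPartitions() { return partitions; }

  public List<String> getPartNames() {
    // Joining the raw values is a stand-in for Warehouse.makePartName().
    List<String> names = new ArrayList<>();
    for (Partition p : partitions) {
      names.add(String.join("/", p.getValues()));
    }
    return names;
  }

  public Iterator<Partition> getPartitionIter() { return partitions.iterator(); }

  public Iterator<String> getPartNameIter() { return getPartNames().iterator(); }
}
{code}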



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5970) ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java

2014-08-07 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089730#comment-14089730
 ] 

Lefty Leverenz commented on HIVE-5970:
--

Just for the record, ORC parameters are now documented in the wiki:

* [Configuration Properties -- ORC File Format | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat]

> ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java
> ---
>
> Key: HIVE-5970
> URL: https://issues.apache.org/jira/browse/HIVE-5970
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Eric Chu
>Priority: Critical
>  Labels: orcfile
> Attachments: test_data
>
>
> A workload involving ORC tables starts getting the following 
> ArrayIndexOutOfBoundsException AFTER the upgrade to Hive 0.12. The file is 
> added as part of HIVE-4123. 
> 2013-12-04 14:42:08,537 ERROR 
> cause:java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> 2013-12-04 14:42:08,537 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
> ... 11 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:171)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:54)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:287)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:473)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1157)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2196)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:129)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:80)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
> ... 15 more



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7635) Query having same aggregate functions but different case throws IndexOutOfBoundsException

2014-08-07 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089714#comment-14089714
 ] 

Szehon Ho commented on HIVE-7635:
-

+1, pending test

> Query having same aggregate functions but different case throws 
> IndexOutOfBoundsException
> -
>
> Key: HIVE-7635
> URL: https://issues.apache.org/jira/browse/HIVE-7635
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 0.14.0
>
> Attachments: HIVE-7635.1.patch, HIVE-7635.patch
>
>
> A query having the same aggregate functions (e.g. count) but in different case 
> does not work and throws an IndexOutOfBoundsException.
> {code}
> Query:
> SELECT key, COUNT(value) FROM src GROUP BY key HAVING count(value) >= 4
> ---
> Error log:
> 14/08/06 11:00:45 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 
> 2, Size: 2
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
>   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>   at java.util.ArrayList.get(ArrayList.java:322)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4173)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5165)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8337)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9178)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9431)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:207)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:414)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:310)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1023)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:960)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:265)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:427)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:800)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
> {code}
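
The failure pattern is a classic one: if aggregation expressions are deduplicated by 
their raw text, COUNT(value) and count(value) yield two map-side entries while the 
reduce side expects one, and the index lookup overruns. A toy illustration of the fix 
direction, keying the dedup on a case-normalized string (not the actual 
SemanticAnalyzer code):
{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class AggDedupDemo {
  public static void main(String[] args) {
    String[] aggs = {"COUNT(value)", "count(value)"};

    // Dedup on raw text keeps both spellings, so plan indices drift.
    Map<String, Integer> naive = new LinkedHashMap<>();
    for (String a : aggs) naive.putIfAbsent(a, naive.size());

    // Dedup on a lower-cased key treats the two spellings as one aggregation.
    Map<String, Integer> normalized = new LinkedHashMap<>();
    for (String a : aggs) normalized.putIfAbsent(a.toLowerCase(), normalized.size());

    System.out.println(naive.size());       // 2 -> mismatched index downstream
    System.out.println(normalized.size());  // 1 -> consistent plan
  }
}
{code}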



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7649) Support column stats with temporary tables

2014-08-07 Thread Jason Dere (JIRA)
Jason Dere created HIVE-7649:


 Summary: Support column stats with temporary tables
 Key: HIVE-7649
 URL: https://issues.apache.org/jira/browse/HIVE-7649
 Project: Hive
  Issue Type: Bug
  Components: Statistics
Reporter: Jason Dere
Assignee: Jason Dere






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7649) Support column stats with temporary tables

2014-08-07 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-7649:
-

Description: Column stats are currently not supported with temp tables; see if 
they can be added.

> Support column stats with temporary tables
> --
>
> Key: HIVE-7649
> URL: https://issues.apache.org/jira/browse/HIVE-7649
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Column stats are currently not supported with temp tables; see if they can be 
> added.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24460: HIVE-7620 : Hive metastore fails to start in secure mode due to "java.lang.NoSuchFieldError: SASL_PROPS" error

2014-08-07 Thread Jason Dere

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24460/#review49937
---



shims/0.23/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge23.java


May want to initialize SASL_PROPS_FIELD. In the case that 
SASL_PROPERTIES_RESOLVER_CLASS != null, SASL_PROPS_FIELD is never initialized, 
but we check the value at line 69.
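
In other words, the field should receive a defined value on every path before anything 
null-checks it. A paraphrase of the suggested structure (field and class names from 
the review; the surrounding code is illustrative, not the actual 
HadoopThriftAuthBridge23 source):
{code}
import java.lang.reflect.Field;

public class SaslFieldInit {
  static final Class<?> SASL_PROPERTIES_RESOLVER_CLASS;
  static final Field SASL_PROPS_FIELD;

  static {
    Class<?> resolver;
    try {
      resolver = Class.forName("org.apache.hadoop.security.SaslPropertiesResolver");
    } catch (ClassNotFoundException e) {
      resolver = null; // older Hadoop without the resolver class
    }
    SASL_PROPERTIES_RESOLVER_CLASS = resolver;

    Field propsField = null;
    if (resolver == null) {
      // Only older Hadoop still has SaslRpcServer.SASL_PROPS.
      try {
        propsField = Class.forName("org.apache.hadoop.security.SaslRpcServer")
            .getField("SASL_PROPS");
      } catch (ReflectiveOperationException e) {
        propsField = null;
      }
    }
    // SASL_PROPS_FIELD now has a defined value (possibly null) on *both*
    // branches, so later "if (SASL_PROPS_FIELD != null)" checks are safe.
    SASL_PROPS_FIELD = propsField;
  }
}
{code}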


- Jason Dere


On Aug. 7, 2014, 5:49 p.m., Thejas Nair wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24460/
> ---
> 
> (Updated Aug. 7, 2014, 5:49 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-7620
> https://issues.apache.org/jira/browse/HIVE-7620
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See https://issues.apache.org/jira/browse/HIVE-7620
> 
> 
> Diffs
> -
> 
>   
> shims/0.23/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge23.java
>  PRE-CREATION 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java 
> bf9c84f 
> 
> Diff: https://reviews.apache.org/r/24460/diff/
> 
> 
> Testing
> ---
> 
> Manual testing with Hadoop 2.5-snapshot and 2.2
> 
> 
> Thanks,
> 
> Thejas Nair
> 
>



[jira] [Commented] (HIVE-5871) Use multiple-characters as field delimiter

2014-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089680#comment-14089680
 ] 

Hive QA commented on HIVE-5871:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660319/HIVE-5871.4.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5885 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/212/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/212/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-212/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660319

> Use multiple-characters as field delimiter
> --
>
> Key: HIVE-5871
> URL: https://issues.apache.org/jira/browse/HIVE-5871
> Project: Hive
>  Issue Type: Improvement
>  Components: Contrib
>Affects Versions: 0.12.0
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, 
> HIVE-5871.patch
>
>
> By default, Hive only allows users to use a single character as the field delimiter. 
> Although there's RegexSerDe to specify a multiple-character delimiter, it can 
> be daunting to use, especially for newcomers.
> In the patch, I add a new SerDe named MultiDelimitSerDe. With 
> MultiDelimitSerDe, users can specify a multiple-character field delimiter 
> when creating tables, in a way most similar to typical table creations.
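
For a sense of the intended usage, creating a table with the new SerDe might look like 
this (a sketch: the contrib class path and the field.delim property reflect my reading 
of the patch and may differ in the committed version; the JDBC URL is a placeholder):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MultiDelimitExample {
  public static void main(String[] args) throws Exception {
    // Assumes a local HiveServer2; URL and credentials are placeholders.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // field.delim can now be multi-character, e.g. the two-character "|+".
      stmt.execute("CREATE TABLE multi_delim_demo (id INT, name STRING) "
          + "ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' "
          + "WITH SERDEPROPERTIES ('field.delim'='|+')");
    }
  }
}
{code}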



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7648) authorization api should provide table/db object for create table/dbname

2014-08-07 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-7648:
---

 Summary: authorization api should provide table/db object for 
create table/dbname
 Key: HIVE-7648
 URL: https://issues.apache.org/jira/browse/HIVE-7648
 Project: Hive
  Issue Type: Bug
  Components: Authorization
Reporter: Thejas M Nair
Assignee: Thejas M Nair


For create table, the Authorizer.checkPrivileges call provides only the 
database name. If the table name is also passed, the authorization API 
implementation can appropriately set the permissions of the new table.
Similarly, in the case of create-database, the API call should provide a 
database object for the database being created.
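
A sketch of what the enriched checkPrivileges input could carry for CREATE TABLE 
(HivePrivilegeObject is the existing plugin API class; the exact objects passed here 
are illustrative, not the committed change):
{code}
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject;
import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject.HivePrivilegeObjectType;

public class CreateTablePrivObjects {
  /**
   * Output objects for CREATE TABLE: today only the DATABASE object is passed;
   * the proposal adds a TABLE_OR_VIEW object naming the table being created.
   */
  static List<HivePrivilegeObject> forCreateTable(String db, String table) {
    return Arrays.asList(
        new HivePrivilegeObject(HivePrivilegeObjectType.DATABASE, db, null),
        new HivePrivilegeObject(HivePrivilegeObjectType.TABLE_OR_VIEW, db, table));
  }
}
{code}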




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types

2014-08-07 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-5760:
---

Attachment: HIVE-5760.2.patch

> Add vectorized support for CHAR/VARCHAR data types
> --
>
> Key: HIVE-5760
> URL: https://issues.apache.org/jira/browse/HIVE-5760
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Matt McCline
> Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch
>
>
> Add support to allow queries referencing VARCHAR columns and expression 
> results to run efficiently in vectorized mode. This should re-use the code 
> for the STRING type to the extent possible and beneficial. Include unit tests 
> and end-to-end tests. Consider re-using or extending existing end-to-end 
> tests for vectorized string operations.
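
The main wrinkle beyond reusing the STRING code paths is enforcing the declared 
maximum length per entry. A sketch of the kind of truncation a vectorized 
CHAR/VARCHAR path needs (illustrative only, not the patch; note it truncates on 
character boundaries, not bytes):
{code}
import java.nio.charset.StandardCharsets;

public class VarcharTruncateDemo {
  /** Enforce a VARCHAR(n) limit on one UTF-8 entry of a bytes column vector. */
  static byte[] enforceMaxLength(byte[] bytes, int maxChars) {
    String s = new String(bytes, StandardCharsets.UTF_8);
    if (s.codePointCount(0, s.length()) <= maxChars) {
      return bytes; // within the declared length: reuse the buffer as-is
    }
    int end = s.offsetByCodePoints(0, maxChars);
    return s.substring(0, end).getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] v = "hello world".getBytes(StandardCharsets.UTF_8);
    System.out.println(new String(enforceMaxLength(v, 5), StandardCharsets.UTF_8)); // hello
  }
}
{code}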



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types

2014-08-07 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-5760:
---

Status: Patch Available  (was: In Progress)

> Add vectorized support for CHAR/VARCHAR data types
> --
>
> Key: HIVE-5760
> URL: https://issues.apache.org/jira/browse/HIVE-5760
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Matt McCline
> Attachments: HIVE-5760.1.patch, HIVE-5760.2.patch
>
>
> Add support to allow queries referencing VARCHAR columns and expression 
> results to run efficiently in vectorized mode. This should re-use the code 
> for the STRING type to the extent possible and beneficial. Include unit tests 
> and end-to-end tests. Consider re-using or extending existing end-to-end 
> tests for vectorized string operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types

2014-08-07 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-5760:
---

Status: In Progress  (was: Patch Available)

> Add vectorized support for CHAR/VARCHAR data types
> --
>
> Key: HIVE-5760
> URL: https://issues.apache.org/jira/browse/HIVE-5760
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Matt McCline
> Attachments: HIVE-5760.1.patch
>
>
> Add support to allow queries referencing VARCHAR columns and expression 
> results to run efficiently in vectorized mode. This should re-use the code 
> for the STRING type to the extent possible and beneficial. Include unit tests 
> and end-to-end tests. Consider re-using or extending existing end-to-end 
> tests for vectorized string operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7647) Beeline does not honor --headerInterval and --color when executing with "-e"

2014-08-07 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-7647:


Description: 
--showHeader is being honored
[root@localhost ~]# beeline --showHeader=false -u 
'jdbc:hive2://localhost:1/default' -n hive -d 
org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 10;"
Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
-hiveconf (No such file or directory)

+--+--++-+
| 00-  | All Occupations  | 135185230  | 42270   |
| 11-  | Management occupations   | 6152650| 100310  |
| 11-1011  | Chief executives | 301930 | 160440  |
| 11-1021  | General and operations managers  | 1697690| 107970  |
| 11-1031  | Legislators  | 64650  | 37980   |
| 11-2011  | Advertising and promotions managers  | 36100  | 94720   |
| 11-2021  | Marketing managers   | 166790 | 118160  |
| 11-2022  | Sales managers   | 333910 | 110390  |
| 11-2031  | Public relations managers| 51730  | 101220  |
| 11-3011  | Administrative services managers | 246930 | 79500   |
+--+--++-+
10 rows selected (0.838 seconds)
Beeline version 0.12.0-cdh5.1.0 by Apache Hive
Closing: org.apache.hive.jdbc.HiveConnection

--outputFormat is being honored.
[root@localhost ~]# beeline --outputFormat=csv -u 
'jdbc:hive2://localhost:1/default' -n hive -d 
org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 10;"
Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ

'code','description','total_emp','salary'
'00-','All Occupations','135185230','42270'
'11-','Management occupations','6152650','100310'
'11-1011','Chief executives','301930','160440'
'11-1021','General and operations managers','1697690','107970'
'11-1031','Legislators','64650','37980'
'11-2011','Advertising and promotions managers','36100','94720'
'11-2021','Marketing managers','166790','118160'
'11-2022','Sales managers','333910','110390'
'11-2031','Public relations managers','51730','101220'
'11-3011','Administrative services managers','246930','79500'
10 rows selected (0.664 seconds)
Beeline version 0.12.0-cdh5.1.0 by Apache Hive
Closing: org.apache.hive.jdbc.HiveConnection

Both --color & --headerInterval are being honored when executing with the "-f" 
option (which reads the query from a file rather than the command line). (You 
cannot really see the color here, but the terminal colors are used.)

[root@localhost ~]# beeline --showheader=true --color=true --headerInterval=5 
-u 'jdbc:hive2://localhost:1/default' -n hive -d 
org.apache.hive.jdbc.HiveDriver -f /tmp/tmp.sql  
Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ

Beeline version 0.12.0-cdh5.1.0 by Apache Hive
0: jdbc:hive2://localhost> select * from sample_07 limit 8;
+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 00-  | All Occupations  | 135185230  | 42270   |
| 11-  | Management occupations   | 6152650| 100310  |
| 11-1011  | Chief executives | 301930 | 160440  |
| 11-1021  | General and operations managers  | 1697690| 107970  |
| 11-1031  | Legislators  | 64650  | 37980   |
+--+--++-+
|   code   | description  | total_emp  | salary  |
+--+--++-+
| 11-2011  | Advertising and promotions managers  | 36100  | 94720   |
| 11-2021  | Marketing managers   | 166790 | 118160  |
| 11-2022  | Sales managers   | 333910 | 110390  |
+--+--++-+
8 rows selected (0.921 seconds)
0: jdbc:hive2://localhost> Closing: org.apache.hive.jdbc.HiveConnection


But when running 
beeline --showheader=true --color=true --headerInterval=5 -u 
'jdbc:hive2://localhost:1/default' -n hive -d 
org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 8;"

headerInterval & color are being overridden in the code.
From Beeline.java:
if (commands.size() 
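
The description is cut off above, but the gist is that the -e path unconditionally 
resets the two options. A toy model of the bug and one possible fix, only resetting 
options the user did not pass explicitly (the userSet* trackers are hypothetical, 
not actual Beeline fields):
{code}
/** Toy model of the bug: -e command handling clobbers explicit user flags. */
public class BeelineOptsDemo {
  static class Opts {
    boolean color = false;
    int headerInterval = 100;
    boolean userSetColor = false;           // hypothetical "set explicitly?" tracker
    boolean userSetHeaderInterval = false;
  }

  static void applyCommandModeDefaults(Opts opts) {
    // Buggy shape suggested by the truncated snippet: an unconditional reset.
    //   opts.color = false; opts.headerInterval = -1;
    // Fixed shape: only reset options the user did not pass explicitly.
    if (!opts.userSetColor) opts.color = false;
    if (!opts.userSetHeaderInterval) opts.headerInterval = -1;
  }

  public static void main(String[] args) {
    Opts opts = new Opts();
    opts.color = true; opts.userSetColor = true;                // --color=true
    opts.headerInterval = 5; opts.userSetHeaderInterval = true; // --headerInterval=5
    applyCommandModeDefaults(opts);
    System.out.println(opts.color + " " + opts.headerInterval); // true 5: flags survive
  }
}
{code}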

[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089604#comment-14089604
 ] 

Thejas M Nair commented on HIVE-7634:
-

+1

> Use Configuration.getPassword() if available to eliminate passwords from 
> hive-site.xml
> --
>
> Key: HIVE-7634
> URL: https://issues.apache.org/jira/browse/HIVE-7634
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-7634.1.patch
>
>
> HADOOP-10607 provides a Configuration.getPassword() API that allows passwords 
> to be retrieved from a configured credential provider, while also being able 
> to fall back to the HiveConf setting if no provider is set up.  Hive should 
> use this API for versions of Hadoop that support this API. This would give 
> users the ability to remove the passwords from their Hive configuration files.
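
The compatibility trick here is the usual reflective probe: call 
Configuration.getPassword() when the running Hadoop has it, otherwise fall back to the 
plain config value. A minimal sketch of such a shim (the method shape is assumed, not 
copied from the patch):
{code}
import java.lang.reflect.Method;
import org.apache.hadoop.conf.Configuration;

public class PasswordShim {
  /**
   * Returns conf.getPassword(name) on Hadoop versions that have it
   * (HADOOP-10607); otherwise falls back to conf.get(name).
   */
  public static String getPassword(Configuration conf, String name) throws Exception {
    try {
      Method getPassword = Configuration.class.getMethod("getPassword", String.class);
      char[] pw = (char[]) getPassword.invoke(conf, name);
      return pw == null ? null : new String(pw);
    } catch (NoSuchMethodException fallBackToPlainConfig) {
      // Older Hadoop: the password can only live in the configuration itself.
      return conf.get(name);
    }
  }
}
{code}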



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24439: HIVE-7634: Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-07 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24439/#review49938
---

Ship it!


Ship It!

- Thejas Nair


On Aug. 7, 2014, 2:16 a.m., Jason Dere wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24439/
> ---
> 
> (Updated Aug. 7, 2014, 2:16 a.m.)
> 
> 
> Review request for hive and Thejas Nair.
> 
> 
> Bugs: HIVE-7634
> https://issues.apache.org/jira/browse/HIVE-7634
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Create shims method getPassword(), which uses Configuration.getPassword() if 
> available, or falls back to Configuration.get().
> 
> 
> Diffs
> -
> 
>   beeline/pom.xml 6ec1d1a 
>   beeline/src/java/org/apache/hive/beeline/HiveSchemaTool.java de3ad4e 
>   
> itests/hive-unit-hadoop2/src/test/java/org/apache/hadoop/hive/ql/security/TestPasswordWithCredentialProvider.java
>  PRE-CREATION 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestPasswordWithConfig.java
>  PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java 
> 3ab5827 
>   metastore/src/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java 
> e78cd75 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java
>  b009a88 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java 
> 98d75b5 
>   shims/0.20/src/main/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java 
> 5d70e03 
>   shims/0.20S/src/main/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java 
> b85a69c 
>   shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 
> 40757f5 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
> 697d4b7 
> 
> Diff: https://reviews.apache.org/r/24439/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jason Dere
> 
>



[jira] [Created] (HIVE-7647) Beeline does not honor --headerInterval and --color when executing with "-e"

2014-08-07 Thread Naveen Gangam (JIRA)
Naveen Gangam created HIVE-7647:
---

 Summary: Beeline does not honor --headerInterval and --color when 
executing with "-e"
 Key: HIVE-7647
 URL: https://issues.apache.org/jira/browse/HIVE-7647
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
Priority: Minor


--showHeader is being honored
[root@localhost ~]# beeline --showHeader=false -u 
'jdbc:hive2://localhost:1/default' -n hive -d 
org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 10;"
Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
-hiveconf (No such file or directory)
hive.aux.jars.path=file:/usr/lib/hive/lib/hive-hbase-handler-0.12.0-cdh5.0.1.jar,file:/usr/lib/hbase/hbase-hadoop-compat.jar,file:/usr/lib/hbase/hbase-protocol.jar,file:/usr/lib/hbase/hbase-hadoop2-compat.jar,file:/usr/lib/hbase/hbase-client.jar,file:/usr/lib/hbase/lib/htrace-core-2.01.jar,file:/usr/lib/hbase/lib/htrace-core.jar,file:/usr/lib/hbase/hbase-server.jar,file:/usr/lib/hbase/hbase-common.jar,file:/usr/share/java/mysql-connector-java.jar,file:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar
 (No such file or directory)
+--+--++-+
| 00-  | All Occupations  | 135185230  | 42270   |
| 11-  | Management occupations   | 6152650| 100310  |
| 11-1011  | Chief executives | 301930 | 160440  |
| 11-1021  | General and operations managers  | 1697690| 107970  |
| 11-1031  | Legislators  | 64650  | 37980   |
| 11-2011  | Advertising and promotions managers  | 36100  | 94720   |
| 11-2021  | Marketing managers   | 166790 | 118160  |
| 11-2022  | Sales managers   | 333910 | 110390  |
| 11-2031  | Public relations managers| 51730  | 101220  |
| 11-3011  | Administrative services managers | 246930 | 79500   |
+--+--++-+
10 rows selected (0.838 seconds)
Beeline version 0.12.0-cdh5.1.0 by Apache Hive
Closing: org.apache.hive.jdbc.HiveConnection

--outputFormat is being honored.
[root@localhost ~]# beeline --outputFormat=csv -u 
'jdbc:hive2://localhost:1/default' -n hive -d 
org.apache.hive.jdbc.HiveDriver -e "select * from sample_07 limit 10;"
Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
-hiveconf (No such file or directory)
hive.aux.jars.path=file:/usr/lib/hive/lib/hive-hbase-handler-0.12.0-cdh5.0.1.jar,file:/usr/lib/hbase/hbase-hadoop-compat.jar,file:/usr/lib/hbase/hbase-protocol.jar,file:/usr/lib/hbase/hbase-hadoop2-compat.jar,file:/usr/lib/hbase/hbase-client.jar,file:/usr/lib/hbase/lib/htrace-core-2.01.jar,file:/usr/lib/hbase/lib/htrace-core.jar,file:/usr/lib/hbase/hbase-server.jar,file:/usr/lib/hbase/hbase-common.jar,file:/usr/share/java/mysql-connector-java.jar,file:/usr/share/cmf/lib/postgresql-9.0-801.jdbc4.jar
 (No such file or directory)
'code','description','total_emp','salary'
'00-','All Occupations','135185230','42270'
'11-','Management occupations','6152650','100310'
'11-1011','Chief executives','301930','160440'
'11-1021','General and operations managers','1697690','107970'
'11-1031','Legislators','64650','37980'
'11-2011','Advertising and promotions managers','36100','94720'
'11-2021','Marketing managers','166790','118160'
'11-2022','Sales managers','333910','110390'
'11-2031','Public relations managers','51730','101220'
'11-3011','Administrative services managers','246930','79500'
10 rows selected (0.664 seconds)
Beeline version 0.12.0-cdh5.1.0 by Apache Hive
Closing: org.apache.hive.jdbc.HiveConnection

Both --color & --headerInterval are being honored when executing with the "-f" 
option (which reads the query from a file rather than the command line). (You 
cannot really see the color here, but the terminal colors are used.)

[root@localhost ~]# beeline --showheader=true --color=true --headerInterval=5 
-u 'jdbc:hive2://localhost:1/default' -n hive -d 
org.apache.hive.jdbc.HiveDriver -f /tmp/tmp.sql  
Connecting to jdbc:hive2://localhost:1/default
Connected to: Apache Hive (version 0.12.0-cdh5.0.1)
Driver: Hive JDBC (version 0.12.0-cdh5.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
-hiveconf (No such file or directory)
hive.aux.jars.path=file:/usr/lib/hive/lib/hive-hbase-handler-0.12.0-cdh5.0.1.jar,file:/usr/lib/hbase/hbase-hadoop-compat.jar,file:/usr/lib/hbase/hbase-protocol.jar,file:/usr/lib/hbase/hbase-hadoop2-compat.jar,file:/usr/lib/hbas

[jira] [Commented] (HIVE-7519) Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown

2014-08-07 Thread Ashish Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089593#comment-14089593
 ] 

Ashish Kumar Singh commented on HIVE-7519:
--

The above errors do not look related.

> Refactor QTestUtil to remove its duplication with QFileClient for qtest setup 
> and teardown 
> ---
>
> Key: HIVE-7519
> URL: https://issues.apache.org/jira/browse/HIVE-7519
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: HIVE-7519.1.patch, HIVE-7519.2.patch, HIVE-7519.patch
>
>
> QTestUtil hard-codes the creation and dropping of source tables for qtests. 
> QFileClient does the same thing but in a better way: it uses q_test_init.sql and 
> q_test_cleanup.sql scripts. As QTestUtil is growing quite large, it makes 
> sense to refactor it to use QFileClient's approach. This will also remove 
> duplication of code serving the same purpose.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7357) Add vectorized support for BINARY data type

2014-08-07 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089591#comment-14089591
 ] 

Matt McCline commented on HIVE-7357:


Thank You!

> Add vectorized support for BINARY data type
> ---
>
> Key: HIVE-7357
> URL: https://issues.apache.org/jira/browse/HIVE-7357
> Project: Hive
>  Issue Type: Sub-task
>  Components: Vectorization
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 0.14.0
>
> Attachments: HIVE-7357.1.patch, HIVE-7357.2.patch, HIVE-7357.3.patch, 
> HIVE-7357.4.patch, HIVE-7357.5.patch, HIVE-7357.6.patch, HIVE-7357.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7421) Make VectorUDFDateString use the same date parsing and formatting as GenericUDFDate

2014-08-07 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089592#comment-14089592
 ] 

Matt McCline commented on HIVE-7421:


Thank You!

> Make VectorUDFDateString use the same date parsing and formatting as 
> GenericUDFDate
> ---
>
> Key: HIVE-7421
> URL: https://issues.apache.org/jira/browse/HIVE-7421
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 0.14.0
>
> Attachments: HIVE-7421.1.patch
>
>
> One of several found by Raj Bains.
> M/R or Tez.
> {code}
> set hive.vectorized.execution.enabled=true;
> {code}
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at java.lang.System.arraycopy(Native Method)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setConcat(BytesColumnVector.java:190)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.StringConcatColScalar.evaluate(StringConcatColScalar.java:78)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFDateDiffColCol.evaluate(VectorUDFDateDiffColCol.java:59)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.LongScalarAddLongColumn.evaluate(LongScalarAddLongColumn.java:65)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.LongColAddLongColumn.evaluate(LongColAddLongColumn.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.LongColDivideLongScalar.evaluate(LongColDivideLongScalar.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FuncFloorDoubleToLong.evaluate(FuncFloorDoubleToLong.java:47)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.evaluateBatch(VectorHashKeyWrapperBatch.java:147)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.processBatch(VectorGroupByOperator.java:289)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processOp(VectorGroupByOperator.java:711)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:139)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> {code}
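
The consistency issue underneath is the classic one of two code paths owning their own 
date parsers. The direction of the fix, sketched: both the vectorized and row-mode 
UDFs consult one strict, shared formatter so malformed input is handled identically 
(illustrative code, not the patch):
{code}
import java.sql.Date;
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class SharedDateParser {
  // One non-lenient, thread-local formatter that both implementations consult.
  private static final ThreadLocal<SimpleDateFormat> FORMATTER =
      ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
        f.setLenient(false);
        return f;
      });

  /** Returns the parsed date, or null for malformed input (never throws). */
  public static Date parseDate(String s) {
    try {
      return new Date(FORMATTER.get().parse(s).getTime());
    } catch (ParseException e) {
      return null; // both code paths then emit a NULL column value, not an NPE
    }
  }
}
{code}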



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7519) Refactor QTestUtil to remove its duplication with QFileClient for qtest setup and teardown

2014-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089574#comment-14089574
 ] 

Hive QA commented on HIVE-7519:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660312/HIVE-7519.2.patch

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 5868 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/211/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/211/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-211/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660312

> Refactor QTestUtil to remove its duplication with QFileClient for qtest setup 
> and teardown 
> ---
>
> Key: HIVE-7519
> URL: https://issues.apache.org/jira/browse/HIVE-7519
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ashish Kumar Singh
>Assignee: Ashish Kumar Singh
> Attachments: HIVE-7519.1.patch, HIVE-7519.2.patch, HIVE-7519.patch
>
>
> QTestUtil hard-codes the creation and dropping of source tables for qtests. 
> QFileClient does the same thing but in a better way: it uses q_test_init.sql and 
> q_test_cleanup.sql scripts. As QTestUtil is growing quite large, it makes 
> sense to refactor it to use QFileClient's approach. This will also remove 
> duplication of code serving the same purpose.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7646) Modify parser to support new grammar for Insert,Update,Delete

2014-08-07 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-7646:


 Summary: Modify parser to support new grammar for 
Insert,Update,Delete
 Key: HIVE-7646
 URL: https://issues.apache.org/jira/browse/HIVE-7646
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: 0.13.1
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Need the parser to recognize constructs such as:
INSERT INTO Cust (Customer_Number, Balance, Address)
VALUES (101, 50.00, '123 Main Street'), (102, 75.00, '123 Pine Ave');

DELETE FROM Cust WHERE Balance > 5.0;

UPDATE Cust
SET column1=value1,column2=value2,...
WHERE some_column=some_value;



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7637) Change throws clause for Hadoop23Shims.ProxyFileSystem23.access()

2014-08-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089567#comment-14089567
 ] 

Thejas M Nair commented on HIVE-7637:
-

+1

> Change throws clause for Hadoop23Shims.ProxyFileSystem23.access()
> -
>
> Key: HIVE-7637
> URL: https://issues.apache.org/jira/browse/HIVE-7637
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-7637.1.patch
>
>
> Looks like the changes from HIVE-7583 don't build correctly with Hadoop-2.6.0 
> because the ProxyFileSystem23 version of access() throws Exception, which is 
> not one of the exceptions listed in the throws clause of FileSystem.access(). 
> The method in ProxyFileSystem23 should have its throws clause modified to 
> match FileSystem's.
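
The fix is mechanical: the override's throws clause must be no broader than 
FileSystem.access()'s, which declares AccessControlException, FileNotFoundException, 
and IOException. A sketch of the corrected shape as a delegating method (illustrative; 
the real change lives in ProxyFileSystem23):
{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.security.AccessControlException;

/** Delegating access() whose throws clause matches FileSystem.access() exactly. */
public class AccessDelegate {
  private final FileSystem underlying;

  public AccessDelegate(FileSystem underlying) {
    this.underlying = underlying;
  }

  public void access(Path path, FsAction action)
      throws AccessControlException, FileNotFoundException, IOException {
    underlying.access(path, action);
  }
}
{code}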



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7357) Add vectorized support for BINARY data type

2014-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7357:
---

Component/s: Vectorization

> Add vectorized support for BINARY data type
> ---
>
> Key: HIVE-7357
> URL: https://issues.apache.org/jira/browse/HIVE-7357
> Project: Hive
>  Issue Type: Sub-task
>  Components: Vectorization
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 0.14.0
>
> Attachments: HIVE-7357.1.patch, HIVE-7357.2.patch, HIVE-7357.3.patch, 
> HIVE-7357.4.patch, HIVE-7357.5.patch, HIVE-7357.6.patch, HIVE-7357.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7357) Add vectorized support for BINARY data type

2014-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7357:
---

Affects Version/s: 0.13.0
   0.13.1

> Add vectorized support for BINARY data type
> ---
>
> Key: HIVE-7357
> URL: https://issues.apache.org/jira/browse/HIVE-7357
> Project: Hive
>  Issue Type: Sub-task
>  Components: Vectorization
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 0.14.0
>
> Attachments: HIVE-7357.1.patch, HIVE-7357.2.patch, HIVE-7357.3.patch, 
> HIVE-7357.4.patch, HIVE-7357.5.patch, HIVE-7357.6.patch, HIVE-7357.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7357) Add vectorized support for BINARY data type

2014-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7357:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Matt!

> Add vectorized support for BINARY data type
> ---
>
> Key: HIVE-7357
> URL: https://issues.apache.org/jira/browse/HIVE-7357
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 0.14.0
>
> Attachments: HIVE-7357.1.patch, HIVE-7357.2.patch, HIVE-7357.3.patch, 
> HIVE-7357.4.patch, HIVE-7357.5.patch, HIVE-7357.6.patch, HIVE-7357.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7421) Make VectorUDFDateString use the same date parsing and formatting as GenericUDFDate

2014-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7421:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Matt!

> Make VectorUDFDateString use the same date parsing and formatting as 
> GenericUDFDate
> ---
>
> Key: HIVE-7421
> URL: https://issues.apache.org/jira/browse/HIVE-7421
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 0.14.0
>
> Attachments: HIVE-7421.1.patch
>
>
> One of several found by Raj Bains.
> M/R or Tez.
> {code}
> set hive.vectorized.execution.enabled=true;
> {code}
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at java.lang.System.arraycopy(Native Method)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setConcat(BytesColumnVector.java:190)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.StringConcatColScalar.evaluate(StringConcatColScalar.java:78)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFDateDiffColCol.evaluate(VectorUDFDateDiffColCol.java:59)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.LongScalarAddLongColumn.evaluate(LongScalarAddLongColumn.java:65)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.LongColAddLongColumn.evaluate(LongColAddLongColumn.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.LongColDivideLongScalar.evaluate(LongColDivideLongScalar.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FuncFloorDoubleToLong.evaluate(FuncFloorDoubleToLong.java:47)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.evaluateBatch(VectorHashKeyWrapperBatch.java:147)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.processBatch(VectorGroupByOperator.java:289)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processOp(VectorGroupByOperator.java:711)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:139)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7421) Make VectorUDFDateString use the same date parsing and formatting as GenericUDFDate

2014-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7421:
---

Component/s: Vectorization

> Make VectorUDFDateString use the same date parsing and formatting as 
> GenericUDFDate
> ---
>
> Key: HIVE-7421
> URL: https://issues.apache.org/jira/browse/HIVE-7421
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 0.14.0
>
> Attachments: HIVE-7421.1.patch
>
>
> One of several found by Raj Bains.
> M/R or Tez.
> {code}
> set hive.vectorized.execution.enabled=true;
> {code}
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at java.lang.System.arraycopy(Native Method)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setConcat(BytesColumnVector.java:190)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.StringConcatColScalar.evaluate(StringConcatColScalar.java:78)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFDateDiffColCol.evaluate(VectorUDFDateDiffColCol.java:59)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.LongScalarAddLongColumn.evaluate(LongScalarAddLongColumn.java:65)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.LongColAddLongColumn.evaluate(LongColAddLongColumn.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.LongColDivideLongScalar.evaluate(LongColDivideLongScalar.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FuncFloorDoubleToLong.evaluate(FuncFloorDoubleToLong.java:47)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.evaluateBatch(VectorHashKeyWrapperBatch.java:147)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.processBatch(VectorGroupByOperator.java:289)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processOp(VectorGroupByOperator.java:711)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:139)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7421) Make VectorUDFDateString use the same date parsing and formatting as GenericUDFDate

2014-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7421:
---

Affects Version/s: 0.13.0
   0.13.1

> Make VectorUDFDateString use the same date parsing and formatting as 
> GenericUDFDate
> ---
>
> Key: HIVE-7421
> URL: https://issues.apache.org/jira/browse/HIVE-7421
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Matt McCline
>Assignee: Matt McCline
> Fix For: 0.14.0
>
> Attachments: HIVE-7421.1.patch
>
>
> One of several found by Raj Bains.
> M/R or Tez.
> {code}
> set hive.vectorized.execution.enabled=true;
> {code}
> Stack trace:
> {code}
> Caused by: java.lang.NullPointerException
>   at java.lang.System.arraycopy(Native Method)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setConcat(BytesColumnVector.java:190)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.StringConcatColScalar.evaluate(StringConcatColScalar.java:78)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorUDFDateDiffColCol.evaluate(VectorUDFDateDiffColCol.java:59)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.LongScalarAddLongColumn.evaluate(LongScalarAddLongColumn.java:65)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.LongColAddLongColumn.evaluate(LongColAddLongColumn.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.LongColDivideLongScalar.evaluate(LongColDivideLongScalar.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpression.evaluateChildren(VectorExpression.java:112)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.expressions.gen.FuncFloorDoubleToLong.evaluate(FuncFloorDoubleToLong.java:47)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.evaluateBatch(VectorHashKeyWrapperBatch.java:147)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.processBatch(VectorGroupByOperator.java:289)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.processOp(VectorGroupByOperator.java:711)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:139)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
>   at 
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089555#comment-14089555
 ] 

Chao commented on HIVE-7624:


Hi [~ruili], I think this patch overlaps a little bit with HIVE-7597, on 
{{GenMapRedUtils}}. I can't apply the patch due to the conflict.

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce 
> works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
> at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
> at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:54)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
> [0:_col0]
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
> …
> {quote}
> I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7492) Enhance SparkCollector

2014-08-07 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089544#comment-14089544
 ] 

Venki Korukanti commented on HIVE-7492:
---

Hi [~brocknoland], 

I was about to create a JIRA for the same, but have the following questions:
* how cleanup works in case the task exits abnormally.
* where to create these tmp files on DFS.

Currently RowContainer is used in the join operator (mainline Hive, not just the 
Spark branch), so it can create temp files as part of the Reduce task if the 
output exceeds the in-memory block size. In the case of MapReduce tasks, the MR 
framework overrides the default tmp dir location with a location under the JVM 
working directory (see 
[here|https://github.com/apache/hadoop-common/blob/0d1861b3eaf134e085055ae0b51c3ba7921b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java#L207])
 using the JVM arg java.io.tmpdir, and the working directory of the JVM is 
deleted by the framework whenever the JVM exits or the job is killed. As 
RowContainer temp files are also created under this temp dir using 
[java.io.File.createTempFile|http://docs.oracle.com/javase/7/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String,%20java.io.File)],
 they will also get cleaned up.

I was looking at the Spark code. Spark provides an API, 
org.apache.spark.util.Utils.createTempDir(), which also adds a shutdown hook to 
delete the tmp dir when the JVM exits. Should we use the same API and provide it 
to RowContainer? It will still be on the local FS.
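
To make the proposal concrete, here is a minimal sketch of the shutdown-hook 
pattern described above (illustrative code only, not Spark's actual 
implementation; assumes Java 7+):

{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public final class TempDirs {
  private TempDirs() {}

  /** Creates a local-FS temp dir and registers a hook to delete it on JVM exit. */
  public static File createTempDirWithCleanup(String prefix) throws IOException {
    final File dir = Files.createTempDirectory(prefix).toFile();
    Runtime.getRuntime().addShutdownHook(new Thread() {
      @Override
      public void run() {
        deleteRecursively(dir);
      }
    });
    return dir;
  }

  private static void deleteRecursively(File f) {
    File[] children = f.listFiles();
    if (children != null) {
      for (File c : children) {
        deleteRecursively(c);
      }
    }
    f.delete();
  }
}
{code}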

> Enhance SparkCollector
> --
>
> Key: HIVE-7492
> URL: https://issues.apache.org/jira/browse/HIVE-7492
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Venki Korukanti
> Fix For: spark-branch
>
> Attachments: HIVE-7492-1-spark.patch, HIVE-7492.2-spark.patch
>
>
> SparkCollector is used to collect the rows generated by HiveMapFunction or 
> HiveReduceFunction. It is currently backed by an ArrayList, and thus has 
> unbounded memory usage. Ideally, the collector should have bounded memory 
> usage and be able to spill to disk when its quota is reached.
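
As a rough illustration of the bounded behavior the description asks for, here 
is a sketch with hypothetical names; the real enhancement would presumably 
reuse Hive's RowContainer rather than plain Java serialization:

{code}
import java.io.*;
import java.util.ArrayList;
import java.util.List;

/** Sketch: keeps at most maxInMemoryRows rows in memory, spills the rest. */
public class SpillingCollector<T extends Serializable> implements Closeable {
  private final int maxInMemoryRows;
  private final List<T> buffer = new ArrayList<T>();
  private File spillFile;
  private ObjectOutputStream spillOut;

  public SpillingCollector(int maxInMemoryRows) {
    this.maxInMemoryRows = maxInMemoryRows;
  }

  public void collect(T row) throws IOException {
    if (buffer.size() < maxInMemoryRows) {
      buffer.add(row);               // under quota: stay in memory
      return;
    }
    if (spillOut == null) {          // quota reached: open the spill file lazily
      spillFile = File.createTempFile("collector", ".spill");
      spillFile.deleteOnExit();
      spillOut = new ObjectOutputStream(
          new BufferedOutputStream(new FileOutputStream(spillFile)));
    }
    spillOut.writeObject(row);
    spillOut.reset();                // drop back-references so memory stays flat
  }

  public void close() throws IOException {
    if (spillOut != null) {
      spillOut.close();
    }
  }
}
{code}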



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Issue Comment Deleted] (HIVE-7620) Hive metastore fails to start in secure mode due to "java.lang.NoSuchFieldError: SASL_PROPS" error

2014-08-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7620:


Comment: was deleted

(was: Verified that the patch works with hadoop 2.5-snapshot and 2.2.)

> Hive metastore fails to start in secure mode due to 
> "java.lang.NoSuchFieldError: SASL_PROPS" error
> --
>
> Key: HIVE-7620
> URL: https://issues.apache.org/jira/browse/HIVE-7620
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Hadoop 2.5-snapshot with kerberos authentication on
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-7620.1.patch
>
>
> When Hive metastore is started in a Hadoop 2.5 cluster, it fails to start 
> with the following error:
> {code}
> 14/07/31 17:45:58 [main]: ERROR metastore.HiveMetaStore: Metastore Thrift 
> Server threw an exception...
> java.lang.NoSuchFieldError: SASL_PROPS
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S.getHadoopSaslProperties(HadoopThriftAuthBridge20S.java:126)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getMetaStoreSaslProperties(MetaStoreUtils.java:1483)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5225)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5152)
> {code}
> Changes in HADOOP-10451 to remove SaslRpcServer.SASL_PROPS are causing this 
> error.
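
For context on the failure mode: NoSuchFieldError is raised at link time, 
because code compiled against the older Hadoop embeds a symbolic reference to 
the removed static field. A stripped-down illustration with hypothetical 
classes:

{code}
// Compiled against v1 of a library that ships this field:
class Lib {
  static final String[] SOME_STATIC_FIELD = new String[0];
}

class Caller {
  static String[] read() {
    return Lib.SOME_STATIC_FIELD;  // bytecode keeps a symbolic field reference
  }
}

// If Caller.class (built against v1) later runs against a v2 jar where the
// field was removed, the JVM throws java.lang.NoSuchFieldError the first time
// read() is linked: exactly the pattern seen in the stack trace above.
{code}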



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7620) Hive metastore fails to start in secure mode due to "java.lang.NoSuchFieldError: SASL_PROPS" error

2014-08-07 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-7620:


Environment: Hadoop 2.5-snapshot with kerberos authentication on

> Hive metastore fails to start in secure mode due to 
> "java.lang.NoSuchFieldError: SASL_PROPS" error
> --
>
> Key: HIVE-7620
> URL: https://issues.apache.org/jira/browse/HIVE-7620
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
> Environment: Hadoop 2.5-snapshot with kerberos authentication on
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-7620.1.patch
>
>
> When Hive metastore is started in a Hadoop 2.5 cluster, it fails to start 
> with the following error:
> {code}
> 14/07/31 17:45:58 [main]: ERROR metastore.HiveMetaStore: Metastore Thrift 
> Server threw an exception...
> java.lang.NoSuchFieldError: SASL_PROPS
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S.getHadoopSaslProperties(HadoopThriftAuthBridge20S.java:126)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getMetaStoreSaslProperties(MetaStoreUtils.java:1483)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5225)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5152)
> {code}
> Changes in HADOOP-10451 to remove SaslRpcServer.SASL_PROPS are causing this 
> error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7620) Hive metastore fails to start in secure mode due to "java.lang.NoSuchFieldError: SASL_PROPS" error

2014-08-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089533#comment-14089533
 ] 

Thejas M Nair commented on HIVE-7620:
-

I verified that the patch works with hadoop 2.5-snapshot and 2.2 in kerberos 
mode (i.e. the metastore comes up and works).


> Hive metastore fails to start in secure mode due to 
> "java.lang.NoSuchFieldError: SASL_PROPS" error
> --
>
> Key: HIVE-7620
> URL: https://issues.apache.org/jira/browse/HIVE-7620
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-7620.1.patch
>
>
> When Hive metastore is started in a Hadoop 2.5 cluster, it fails to start 
> with the following error:
> {code}
> 14/07/31 17:45:58 [main]: ERROR metastore.HiveMetaStore: Metastore Thrift 
> Server threw an exception...
> java.lang.NoSuchFieldError: SASL_PROPS
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S.getHadoopSaslProperties(HadoopThriftAuthBridge20S.java:126)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getMetaStoreSaslProperties(MetaStoreUtils.java:1483)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5225)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5152)
> {code}
> Changes in HADOOP-10451 to remove SaslRpcServer.SASL_PROPS are causing this 
> error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7620) Hive metastore fails to start in secure mode due to "java.lang.NoSuchFieldError: SASL_PROPS" error

2014-08-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089521#comment-14089521
 ] 

Thejas M Nair commented on HIVE-7620:
-

Verified that the patch works with hadoop 2.5-snapshot and 2.2.


> Hive metastore fails to start in secure mode due to 
> "java.lang.NoSuchFieldError: SASL_PROPS" error
> --
>
> Key: HIVE-7620
> URL: https://issues.apache.org/jira/browse/HIVE-7620
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Attachments: HIVE-7620.1.patch
>
>
> When Hive metastore is started in a Hadoop 2.5 cluster, it fails to start 
> with the following error:
> {code}
> 14/07/31 17:45:58 [main]: ERROR metastore.HiveMetaStore: Metastore Thrift 
> Server threw an exception...
> java.lang.NoSuchFieldError: SASL_PROPS
>   at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S.getHadoopSaslProperties(HadoopThriftAuthBridge20S.java:126)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getMetaStoreSaslProperties(MetaStoreUtils.java:1483)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:5225)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:5152)
> {code}
> Changes in HADOOP-10451 to remove SaslRpcServer.SASL_PROPS are causing this 
> error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 24460: HIVE-7620 : Hive metastore fails to start in secure mode due to "java.lang.NoSuchFieldError: SASL_PROPS" error

2014-08-07 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24460/
---

Review request for hive.


Bugs: HIVE-7620
https://issues.apache.org/jira/browse/HIVE-7620


Repository: hive-git


Description
---

See https://issues.apache.org/jira/browse/HIVE-7620


Diffs
-

  
shims/0.23/src/main/java/org/apache/hadoop/hive/thrift/HadoopThriftAuthBridge23.java
 PRE-CREATION 
  shims/common/src/main/java/org/apache/hadoop/hive/shims/ShimLoader.java 
bf9c84f 

Diff: https://reviews.apache.org/r/24460/diff/


Testing
---

Manual testing with Hadoop 2.5-snapshot and 2.2


Thanks,

Thejas Nair



[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs

2014-08-07 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-4629:


Attachment: HIVE-4629.6.patch

Updated the patch (v6) to trigger testing. It addresses the review comments and 
fixes one failed HIVE QA case related to this patch.

> HS2 should support an API to retrieve query logs
> 
>
> Key: HIVE-4629
> URL: https://issues.apache.org/jira/browse/HIVE-4629
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, 
> HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, 
> HIVE-4629.5.patch, HIVE-4629.6.patch
>
>
> HiveServer2 should support an API to retrieve query logs. This is 
> particularly relevant because HiveServer2 supports async execution but 
> doesn't provide a way to report progress. Providing an API to retrieve query 
> logs will help report progress to the client.
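>
> For a sense of how a client would consume this once committed, here is a 
> usage sketch pieced together from names in the patch's diff (FetchType, 
> CLIServiceClient); the exact signatures are assumptions, not the committed 
> API:
> {code}
> import org.apache.hive.service.cli.CLIServiceClient;
> import org.apache.hive.service.cli.FetchOrientation;
> import org.apache.hive.service.cli.FetchType;
> import org.apache.hive.service.cli.HiveSQLException;
> import org.apache.hive.service.cli.OperationHandle;
> import org.apache.hive.service.cli.RowSet;
>
> public class LogPoller {
>   // Fetch pending operation log lines instead of the query's result rows.
>   public static void printOperationLog(CLIServiceClient client, OperationHandle op)
>       throws HiveSQLException {
>     RowSet logRows = client.fetchResults(op, FetchOrientation.FETCH_NEXT,
>         1000, FetchType.LOG);
>     for (Object[] row : logRows) {
>       System.out.println(row[0]);  // each row carries one log line
>     }
>   }
> }
> {code}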



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7637) Change throws clause for Hadoop23Shims.ProxyFileSystem23.access()

2014-08-07 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089511#comment-14089511
 ] 

Jason Dere commented on HIVE-7637:
--

Test failures do not appear to be related.

> Change throws clause for Hadoop23Shims.ProxyFileSystem23.access()
> -
>
> Key: HIVE-7637
> URL: https://issues.apache.org/jira/browse/HIVE-7637
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-7637.1.patch
>
>
> Looks like the changes from HIVE-7583 don't build correctly with Hadoop-2.6.0 
> because the ProxyFileSystem23 version of access() throws Exception, which is 
> not one of the exceptions listed in the throws clause of FileSystem.access(). 
> The method in ProxyFileSystem23 should have its throws clause modified to 
> match FileSystem's.
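>
> For reference, the Java rule at play, in a stripped-down sketch with 
> hypothetical types: an overriding method may narrow the checked exceptions of 
> the method it overrides, but may not broaden them.
> {code}
> import java.io.IOException;
>
> class Base {
>   void access() throws IOException { }
> }
>
> class Shim extends Base {
>   // Would not compile: Exception is broader than the parent's throws clause.
>   // @Override
>   // void access() throws Exception { }
>
>   // Compiles: the same (or a narrower) checked exception list is allowed.
>   @Override
>   void access() throws IOException { }
> }
> {code}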



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-07 Thread Dong Chen


> On Aug. 5, 2014, 8:56 a.m., Lars Francke wrote:
> > service/src/java/org/apache/hive/service/cli/operation/OperationLog.java, 
> > line 58
> > 
> >
> > can be final and then renamed
> 
> Dong Chen wrote:
> Thank you! I made it final; good point. But I'm a little confused about 
> the rename: do you mean the variable name "threadLocalOperationLog"?
> 
> Lars Francke wrote:
> Static finals have the naming convention of being all upper case with 
> underscores in between, so it should be THREAD_LOCAL_OPERATION_LOG.

Oh, right! I should keep this in mind. :)


- Dong


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/#review49573
---


On Aug. 7, 2014, 5:37 p.m., Dong Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24293/
> ---
> 
> (Updated Aug. 7, 2014, 5:37 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-4629: HS2 should support an API to retrieve query logs
> HiveServer2 should support an API to retrieve query logs. This is 
> particularly relevant because HiveServer2 supports async execution but 
> doesn't provide a way to report progress. Providing an API to retrieve query 
> logs will help report progress to the client.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
>   service/if/TCLIService.thrift 80086b4 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
>  808b73f 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchType.java
>  PRE-CREATION 
>   service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
>   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
>   service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
>   service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
>   service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
> f665146 
>   service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  c9fd5f9 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  caf413d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  fd4e94d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  ebca996 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  05991e0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  315dbea 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  0ec2543 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  3d3fddc 
>   
> service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
> e0d17a1 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 45fbd61 
>   service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 21c33bc 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> de54ca1 
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
> 9785e95 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
> 4c3164e 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> b39d64d 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> 816bea4 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 5c87bcb 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java
>  e3384d3 
>   
> service/src/test/org/apache/hive/service/cli/operation/TestOperationLoggingAPI.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24293/diff/
> 
> 
> Testing
> ---
> 
> UT passed.
> 
> 
> Thanks,
> 
> Dong Chen
> 
>



Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-07 Thread Dong Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/
---

(Updated Aug. 7, 2014, 5:37 p.m.)


Review request for hive.


Changes
---

A small change: renamed the static final variable "threadLocalOperationLog" to 
THREAD_LOCAL_OPERATION_LOG.


Repository: hive-git


Description
---

HIVE-4629: HS2 should support an API to retrieve query logs
HiveServer2 should support an API to retrieve query logs. This is particularly 
relevant because HiveServer2 supports async execution but doesn't provide a way 
to report progress. Providing an API to retrieve query logs will help report 
progress to the client.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
  service/if/TCLIService.thrift 80086b4 
  service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
  service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
 808b73f 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchType.java
 PRE-CREATION 
  service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
  service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
  service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
  service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
  service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
f665146 
  service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
  service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
  
service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
 c9fd5f9 
  
service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java 
caf413d 
  
service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
 fd4e94d 
  
service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java 
ebca996 
  
service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
 05991e0 
  
service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java 
315dbea 
  
service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
 0ec2543 
  
service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
 3d3fddc 
  service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
PRE-CREATION 
  service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
e0d17a1 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java 45fbd61 
  service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
PRE-CREATION 
  service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
21c33bc 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
de54ca1 
  service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
4c3164e 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
b39d64d 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
816bea4 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
5c87bcb 
  
service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java 
e3384d3 
  
service/src/test/org/apache/hive/service/cli/operation/TestOperationLoggingAPI.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/24293/diff/


Testing
---

UT passed.


Thanks,

Dong Chen



[jira] [Updated] (HIVE-4629) HS2 should support an API to retrieve query logs

2014-08-07 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-4629:


Attachment: HIVE-4629.5.patch

> HS2 should support an API to retrieve query logs
> 
>
> Key: HIVE-4629
> URL: https://issues.apache.org/jira/browse/HIVE-4629
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Reporter: Shreepadma Venugopalan
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, 
> HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch
>
>
> HiveServer2 should support an API to retrieve query logs. This is 
> particularly relevant because HiveServer2 supports async execution but 
> doesn't provide a way to report progress. Providing an API to retrieve query 
> logs will help report progress to the client.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-07 Thread Dong Chen


> On Aug. 7, 2014, 4:57 p.m., Lars Francke wrote:
> > service/if/TCLIService.thrift, line 1043
> > 
> >
> > I have a partial patch that changes all of them and I planned on 
> > submitting it when I'm back from holiday.
> 
> Lars Francke wrote:
> Sorry I messed up RB. This was meant as a reply to your Thrift comment 
> answer and not a new issue.

That's OK. :)
Got it. Thanks. So I will just change the 3 Thrift comments related to this 
patch. 


- Dong


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/#review49911
---


On Aug. 7, 2014, 5:06 p.m., Dong Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24293/
> ---
> 
> (Updated Aug. 7, 2014, 5:06 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-4629: HS2 should support an API to retrieve query logs
> HiveServer2 should support an API to retrieve query logs. This is 
> particularly relevant because HiveServer2 supports async execution but 
> doesn't provide a way to report progress. Providing an API to retrieve query 
> logs will help report progress to the client.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
>   service/if/TCLIService.thrift 80086b4 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
>  808b73f 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchType.java
>  PRE-CREATION 
>   service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
>   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
>   service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
>   service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
>   service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
> f665146 
>   service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  c9fd5f9 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  caf413d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  fd4e94d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  ebca996 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  05991e0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  315dbea 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  0ec2543 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  3d3fddc 
>   
> service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
> e0d17a1 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 45fbd61 
>   service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 21c33bc 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> de54ca1 
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
> 9785e95 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
> 4c3164e 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> b39d64d 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> 816bea4 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 5c87bcb 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java
>  e3384d3 
>   
> service/src/test/org/apache/hive/service/cli/operation/TestOperationLoggingAPI.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24293/diff/
> 
> 
> Testing
> ---
> 
> UT passed.
> 
> 
> Thanks,
> 
> Dong Chen
> 
>



Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-07 Thread Dong Chen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/
---

(Updated Aug. 7, 2014, 5:06 p.m.)


Review request for hive.


Changes
---

Updated patch HIVE-4629.5.patch.
1. Address the review comments.
2. Fix the failed case in HIVE QA. 
(org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testExecuteStatementAsync)


Repository: hive-git


Description
---

HIVE-4629: HS2 should support an API to retrieve query logs
HiveServer2 should support an API to retrieve query logs. This is particularly 
relevant because HiveServer2 supports async execution but doesn't provide a way 
to report progress. Providing an API to retrieve query logs will help report 
progress to the client.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
  service/if/TCLIService.thrift 80086b4 
  service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
  service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
 808b73f 
  
service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchType.java
 PRE-CREATION 
  service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
  service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
  service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
  service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
  service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
f665146 
  service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
  service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
  
service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
 c9fd5f9 
  
service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java 
caf413d 
  
service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
 fd4e94d 
  
service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java 
ebca996 
  
service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
 05991e0 
  
service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java 
315dbea 
  
service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
 0ec2543 
  
service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
 3d3fddc 
  service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
PRE-CREATION 
  service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
e0d17a1 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java 45fbd61 
  service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
PRE-CREATION 
  service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
21c33bc 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
de54ca1 
  service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
4c3164e 
  service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
b39d64d 
  service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
816bea4 
  service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
5c87bcb 
  
service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java 
e3384d3 
  
service/src/test/org/apache/hive/service/cli/operation/TestOperationLoggingAPI.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/24293/diff/


Testing
---

UT passed.


Thanks,

Dong Chen



[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089459#comment-14089459
 ] 

Brock Noland commented on HIVE-7624:


During debugging I have used the code below

{noformat}
System.err.println("JoinOperator " + alias + " row = " + 
SerDeUtils.getJSONString(row, inputObjInspectors[tag]));
{noformat}

I wonder whether we shouldn't commit that to each operator for debugging, since 
it makes it much easier to see how the rows are filtered, modified, etc.
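
If something like that were committed, a guarded variant would avoid paying for 
the JSON conversion when debugging is off (a sketch; it assumes the operator's 
LOG and getName() are in scope):

{noformat}
if (LOG.isDebugEnabled()) {
  LOG.debug(getName() + " row = " + SerDeUtils.getJSONString(row, inputObjInspectors[tag]));
}
{noformat}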

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce 
> works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
> at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
> at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:54)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
> [0:_col0]
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
> …
> {quote}
> I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089455#comment-14089455
 ] 

Chao commented on HIVE-7624:


Great! Thanks [~ruili]. I'll try this patch.

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce 
> works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
> at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
> at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:54)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
> [0:_col0]
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
> …
> {quote}
> I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-07 Thread Lars Francke


> On Aug. 7, 2014, 4:57 p.m., Lars Francke wrote:
> > service/if/TCLIService.thrift, line 1043
> > 
> >
> > I have a partial patch that changes all of them and I planned on 
> > submitting it when I'm back from holiday.

Sorry I messed up RB. This was meant as a reply to your Thrift comment answer 
and not a new issue.


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/#review49911
---


On Aug. 5, 2014, 3:47 a.m., Dong Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24293/
> ---
> 
> (Updated Aug. 5, 2014, 3:47 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-4629: HS2 should support an API to retrieve query logs
> HiveServer2 should support an API to retrieve query logs. This is 
> particularly relevant because HiveServer2 supports async execution but 
> doesn't provide a way to report progress. Providing an API to retrieve query 
> logs will help report progress to the client.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
>   service/if/TCLIService.thrift 80086b4 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
>  808b73f 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchType.java
>  PRE-CREATION 
>   service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
>   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
>   service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
>   service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
>   service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
> f665146 
>   service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  c9fd5f9 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  caf413d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  fd4e94d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  ebca996 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  05991e0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  315dbea 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  0ec2543 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  3d3fddc 
>   
> service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
> e0d17a1 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 45fbd61 
>   service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 21c33bc 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> de54ca1 
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
> 9785e95 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
> 4c3164e 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> b39d64d 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> 816bea4 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 5c87bcb 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java
>  e3384d3 
>   
> service/src/test/org/apache/hive/service/cli/operation/TestOperationLoggingAPI.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24293/diff/
> 
> 
> Testing
> ---
> 
> UT passed.
> 
> 
> Thanks,
> 
> Dong Chen
> 
>



Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-07 Thread Lars Francke


> On Aug. 5, 2014, 8:56 a.m., Lars Francke wrote:
> > service/src/java/org/apache/hive/service/cli/operation/OperationLog.java, 
> > line 58
> > 
> >
> > can be final and then renamed
> 
> Dong Chen wrote:
> Thank you! I made it final; good point. But I'm a little confused about 
> the rename: do you mean the variable name "threadLocalOperationLog"?

Static finals have the naming convention of being all upper case with 
underscores in between, so it should be THREAD_LOCAL_OPERATION_LOG.
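
In other words, the declaration would end up looking like this (the initializer 
here is illustrative):

    // before: mutable, camelCase
    private static ThreadLocal<OperationLog> threadLocalOperationLog =
        new ThreadLocal<OperationLog>();

    // after: final, with the constant-style name
    private static final ThreadLocal<OperationLog> THREAD_LOCAL_OPERATION_LOG =
        new ThreadLocal<OperationLog>();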


> On Aug. 5, 2014, 8:56 a.m., Lars Francke wrote:
> > service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java,
> >  line 81
> > 
> >
> > I don't understand how log data ends up in the writer? I looked for 
> > accesses of it but it doesn't seem to be touched at all. What am I missing?
> > 
> > Also, for a small performance boost, if the code stays like this you can 
> > move it after the null check to avoid the string conversion when the 
> > OperationLog is null
> 
> Dong Chen wrote:
> This LogDivertAppender inherits from WriterAppender, and when its method 
> subAppend(event) is invoked, the first line, super.subAppend(event), writes 
> the log into the writer.
> No matter whether the OperationLog is null or not, the writer should be 
> reset, since the log in it will not be used any more in this Appender. 
> Otherwise, the remaining log in the writer might mix with the next log.
> So maybe we could keep the access and null check order. :)
>

Ahh thanks for the explanation. I missed the setWriter bit.
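
For the archive, a condensed sketch of the flow described above, assuming 
log4j 1.x WriterAppender semantics; OperationLog is a class from the patch's 
diff, but the accessor names used here are assumptions, not copied from it:

    import java.io.CharArrayWriter;
    import org.apache.log4j.WriterAppender;
    import org.apache.log4j.spi.LoggingEvent;

    public class LogDivertAppenderSketch extends WriterAppender {
      private final CharArrayWriter writer = new CharArrayWriter();

      public LogDivertAppenderSketch() {
        setWriter(writer);  // routes formatted log output into 'writer'
      }

      @Override
      protected void subAppend(LoggingEvent event) {
        super.subAppend(event);  // formats the event and appends it to 'writer'
        String logOutput = writer.toString();
        writer.reset();          // always reset, or leftover text mixes into the next event
        OperationLog log = OperationLog.getCurrentOperationLog();
        if (log != null) {
          log.writeOperationLog(logOutput);
        }
      }
    }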


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/#review49573
---


On Aug. 5, 2014, 3:47 a.m., Dong Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24293/
> ---
> 
> (Updated Aug. 5, 2014, 3:47 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-4629: HS2 should support an API to retrieve query logs
> HiveServer2 should support an API to retrieve query logs. This is 
> particularly relevant because HiveServer2 supports async execution but 
> doesn't provide a way to report progress. Providing an API to retrieve query 
> logs will help report progress to the client.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
>   service/if/TCLIService.thrift 80086b4 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
>  808b73f 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchType.java
>  PRE-CREATION 
>   service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
>   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
>   service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
>   service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
>   service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
> f665146 
>   service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  c9fd5f9 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  caf413d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  fd4e94d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  ebca996 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  05991e0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  315dbea 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  0ec2543 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  3d3fddc 
>   
> service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
> e0d17a1 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 45fbd61 
>   service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 21c33bc 
>   service/src/java/org/apache/hive/serv

[jira] [Commented] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089437#comment-14089437
 ] 

Brock Noland commented on HIVE-7629:


[~suma.shivaprasad] can you add a review board item?

FYI [~szehon]

> Problem in SMB Joins between two Parquet tables
> ---
>
> Key: HIVE-7629
> URL: https://issues.apache.org/jira/browse/HIVE-7629
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Suma Shivaprasad
>  Labels: Parquet
> Fix For: 0.14.0
>
> Attachments: HIVE-7629.patch
>
>
> The issue is clearly seen when two bucketed and sorted parquet tables with 
> different numbers of columns are involved in the join. The following 
> exception is seen:
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-07 Thread Lars Francke

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/#review49911
---



service/if/TCLIService.thrift


I have a partial patch that changes all of them and I planned on submitting 
it when I'm back from holiday.


- Lars Francke


On Aug. 5, 2014, 3:47 a.m., Dong Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24293/
> ---
> 
> (Updated Aug. 5, 2014, 3:47 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-4629: HS2 should support an API to retrieve query logs
> HiveServer2 should support an API to retrieve query logs. This is 
> particularly relevant because HiveServer2 supports async execution but 
> doesn't provide a way to report progress. Providing an API to retrieve query 
> logs will help report progress to the client.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
>   service/if/TCLIService.thrift 80086b4 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
>  808b73f 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchType.java
>  PRE-CREATION 
>   service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
>   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
>   service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
>   service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
>   service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
> f665146 
>   service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  c9fd5f9 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  caf413d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  fd4e94d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  ebca996 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  05991e0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  315dbea 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  0ec2543 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  3d3fddc 
>   
> service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
> e0d17a1 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 45fbd61 
>   service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 21c33bc 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> de54ca1 
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
> 9785e95 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
> 4c3164e 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> b39d64d 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> 816bea4 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 5c87bcb 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java
>  e3384d3 
>   
> service/src/test/org/apache/hive/service/cli/operation/TestOperationLoggingAPI.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24293/diff/
> 
> 
> Testing
> ---
> 
> UT passed.
> 
> 
> Thanks,
> 
> Dong Chen
> 
>



[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-07 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated HIVE-7629:
---

Attachment: (was: parquet_smb_join.patch)

> Problem in SMB Joins between two Parquet tables
> ---
>
> Key: HIVE-7629
> URL: https://issues.apache.org/jira/browse/HIVE-7629
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Suma Shivaprasad
> Attachments: HIVE-7629.patch
>
>
> The issue is clearly seen when two bucketed and sorted parquet tables with 
> different numbers of columns are involved in the join. The following 
> exception is seen:
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:79)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6959) Remove vectorization related constant expression folding code once Constant propagation optimizer for Hive is committed

2014-08-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6959:
---

Status: Open  (was: Patch Available)

Seems like vectorization_14.q & vector_coalesce.q failed to vectorize and 
vector_cast_constant.q failed altogether.

> Remove vectorization related constant expression folding code once Constant 
> propagation optimizer for Hive is committed
> ---
>
> Key: HIVE-6959
> URL: https://issues.apache.org/jira/browse/HIVE-6959
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-6959.1.patch, HIVE-6959.2.patch, HIVE-6959.3.patch, 
> HIVE-6959.4.patch
>
>
> HIVE-5771 covers Constant propagation optimizer for Hive. Now that HIVE-5771 
> is committed, we should remove any vectorization related code which 
> duplicates this feature. For example, a fn to be cleaned is 
> VectorizarionContext::foldConstantsForUnaryExprs(). In addition to this 
> change, constant propagation should kick in when vectorization is enabled. 
> i.e. we need to lift the HIVE_VECTORIZATION_ENABLED restriction inside 
> ConstantPropagate::transform().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-07 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated HIVE-7629:
---

Labels: Parquet  (was: )

> Problem in SMB Joins between two Parquet tables
> ---
>
> Key: HIVE-7629
> URL: https://issues.apache.org/jira/browse/HIVE-7629
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Suma Shivaprasad
>  Labels: Parquet
> Fix For: 0.14.0
>
> Attachments: HIVE-7629.patch
>
>
> The issue is clearly seen when two bucketed and sorted Parquet tables with
> different numbers of columns are involved in the join. The following
> exception is seen:
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:79)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:66)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:65)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-07 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated HIVE-7629:
---

Fix Version/s: 0.14.0
   Status: Patch Available  (was: Open)

> Problem in SMB Joins between two Parquet tables
> ---
>
> Key: HIVE-7629
> URL: https://issues.apache.org/jira/browse/HIVE-7629
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Suma Shivaprasad
> Fix For: 0.14.0
>
> Attachments: HIVE-7629.patch
>
>
> The issue is clearly seen when two bucketed and sorted Parquet tables with
> different numbers of columns are involved in the join. The following
> exception is seen:
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:79)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:66)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:65)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-07 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated HIVE-7629:
---

Attachment: HIVE-7629.patch

> Problem in SMB Joins between two Parquet tables
> ---
>
> Key: HIVE-7629
> URL: https://issues.apache.org/jira/browse/HIVE-7629
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Suma Shivaprasad
> Attachments: HIVE-7629.patch
>
>
> The issue is clearly seen when two bucketed and sorted Parquet tables with
> different numbers of columns are involved in the join. The following
> exception is seen:
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:79)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:66)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:65)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7629) Problem in SMB Joins between two Parquet tables

2014-08-07 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated HIVE-7629:
---

Attachment: parquet_smb_join.patch

> Problem in SMB Joins between two Parquet tables
> ---
>
> Key: HIVE-7629
> URL: https://issues.apache.org/jira/browse/HIVE-7629
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Suma Shivaprasad
> Attachments: HIVE-7629.patch
>
>
> The issue is clearly seen when two bucketed and sorted Parquet tables with
> different numbers of columns are involved in the join. The following
> exception is seen:
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:101)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:204)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:79)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:66)
> at 
> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:65)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7492) Enhance SparkCollector

2014-08-07 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7492:
---

   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Thank you very much [~vkorukanti] for your contribution! Would you mind opening 
another jira to allow RowContainer to write to the DFS as opposed to /tmp?

I don't think this work should be done on the Spark branch and I don't think 
it's urgent. However, since many users have extremely small /tmp I don't think 
we should be writing unbounded amounts of data there.

Committed to spark!

> Enhance SparkCollector
> --
>
> Key: HIVE-7492
> URL: https://issues.apache.org/jira/browse/HIVE-7492
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Venki Korukanti
> Fix For: spark-branch
>
> Attachments: HIVE-7492-1-spark.patch, HIVE-7492.2-spark.patch
>
>
> SparkCollector is used to collect the rows generated by HiveMapFunction or
> HiveReduceFunction. It is currently backed by an ArrayList and thus has
> unbounded memory usage. Ideally, the collector should have bounded memory
> usage and be able to spill to disk when its quota is reached.
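
A minimal sketch, assuming a simple row-count quota and a temp-file spill
(class and field names are illustrative, not the actual SparkCollector or
RowContainer API), of the bounded behavior described above:

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BoundedCollector<T extends Serializable> {
  private final int quota;                       // max rows held in memory
  private final List<T> buffer = new ArrayList<>();
  private final File spillFile;
  private ObjectOutputStream spill;

  public BoundedCollector(int quota) throws IOException {
    this.quota = quota;
    this.spillFile = File.createTempFile("collector", ".spill");
  }

  public void collect(T row) throws IOException {
    if (buffer.size() < quota) {
      buffer.add(row);                           // bounded in-memory path
    } else {
      if (spill == null) {
        spill = new ObjectOutputStream(new FileOutputStream(spillFile));
      }
      spill.writeObject(row);                    // overflow goes to disk
    }
  }
}
{code}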



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-07 Thread Dong Chen


> On Aug. 5, 2014, 8:56 a.m., Lars Francke wrote:
> > service/src/java/org/apache/hive/service/cli/operation/OperationLog.java, 
> > line 58
> > 
> >
> > can be final and then renamed

Thank you! I made it final, and it is a good point. But I'm a little confused 
about the rename: do you mean the variable name "threadLocalOperationLog"?


- Dong


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/#review49573
---


On Aug. 5, 2014, 3:47 a.m., Dong Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24293/
> ---
> 
> (Updated Aug. 5, 2014, 3:47 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-4629: HS2 should support an API to retrieve query logs
> HiveServer2 should support an API to retrieve query logs. This is 
> particularly relevant because HiveServer2 supports async execution but 
> doesn't provide a way to report progress. Providing an API to retrieve query 
> logs will help report progress to the client.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
>   service/if/TCLIService.thrift 80086b4 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
>  808b73f 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchType.java
>  PRE-CREATION 
>   service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
>   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
>   service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
>   service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
>   service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
> f665146 
>   service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  c9fd5f9 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  caf413d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  fd4e94d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  ebca996 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  05991e0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  315dbea 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  0ec2543 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  3d3fddc 
>   
> service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
> e0d17a1 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 45fbd61 
>   service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 21c33bc 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> de54ca1 
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
> 9785e95 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
> 4c3164e 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> b39d64d 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> 816bea4 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 5c87bcb 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java
>  e3384d3 
>   
> service/src/test/org/apache/hive/service/cli/operation/TestOperationLoggingAPI.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24293/diff/
> 
> 
> Testing
> ---
> 
> UT passed.
> 
> 
> Thanks,
> 
> Dong Chen
> 
>



[jira] [Commented] (HIVE-7492) Enhance SparkCollector

2014-08-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089400#comment-14089400
 ] 

Brock Noland commented on HIVE-7492:


+1

> Enhance SparkCollector
> --
>
> Key: HIVE-7492
> URL: https://issues.apache.org/jira/browse/HIVE-7492
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Venki Korukanti
> Attachments: HIVE-7492-1-spark.patch, HIVE-7492.2-spark.patch
>
>
> SparkCollector is used to collect the rows generated by HiveMapFunction or
> HiveReduceFunction. It is currently backed by an ArrayList and thus has
> unbounded memory usage. Ideally, the collector should have bounded memory
> usage and be able to spill to disk when its quota is reached.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7553) avoid the scheduling maintenance window for every jar change

2014-08-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089396#comment-14089396
 ] 

Brock Noland commented on HIVE-7553:


Hi [~Ferd],

Thank you so much for looking into this!! I think you have a good direction on 
some of the possible solutions. I do think this is a very big item with many 
different aspects. 

Would you be interested in creating a design document on this? There are many 
examples out there: https://cwiki.apache.org/confluence/display/Hive/DesignDocs 
e.g: https://cwiki.apache.org/confluence/display/Hive/Theta+Join

If you create a design doc, I think its main purpose would be to evaluate the 
pros and cons of each possible solution.

Thank you again for looking at this!!

Cheers,
Brock


> avoid the scheduling maintenance window for every jar change
> 
>
> Key: HIVE-7553
> URL: https://issues.apache.org/jira/browse/HIVE-7553
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>
> When a user needs to refresh an existing jar or add a new jar to HS2, HS2 
> needs to be restarted. As HS2 is a service exposed to clients, this requires 
> scheduling a maintenance window for every jar change. It would be great if 
> we could avoid that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-07 Thread Dong Chen


> On Aug. 5, 2014, 8:56 a.m., Lars Francke wrote:
> > service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java,
> >  line 81
> > 
> >
> > I don't understand how log data ends up in the writer? I looked for 
> > accesses of it but it doesn't seem to be touched at all. What am I missing?
> > 
> > Also for a little boost if the code stays like this you can move it 
> > after the null check to avoid string conversion if the OperationLog is null

This LogDivertAppender inherits from WriterAppender, and when its method 
subAppend(event) is invoked, the first line, super.subAppend(event), writes 
the log into the writer.
Whether the OperationLog is null or not, the writer should be reset, since the 
log in it will not be used any more in this Appender. Otherwise, the remaining 
log in the writer might mix with the next log.
So maybe we could keep the access and null-check order. :)
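
A minimal sketch, assuming log4j 1.x and a hypothetical OperationLog stand-in,
of the flow described above: super.subAppend(event) formats the event into the
writer, and the writer is reset on every event regardless of whether a target
log exists.

{code:java}
import java.io.CharArrayWriter;
import org.apache.log4j.SimpleLayout;
import org.apache.log4j.WriterAppender;
import org.apache.log4j.spi.LoggingEvent;

public class DivertAppenderSketch extends WriterAppender {
  private final CharArrayWriter writer = new CharArrayWriter();

  public DivertAppenderSketch() {
    setLayout(new SimpleLayout());
    setWriter(writer);                      // super.subAppend writes here
  }

  @Override
  protected void subAppend(LoggingEvent event) {
    super.subAppend(event);                 // formats the event into writer
    String logOutput = writer.toString();   // grab the diverted text
    writer.reset();                         // always reset, even with no target
    OperationLogStub target = OperationLogStub.current();
    if (target != null) {
      target.write(logOutput);              // forward to the per-operation log
    }
  }

  // Hypothetical stand-in for OperationLog's thread-local lookup.
  static class OperationLogStub {
    static OperationLogStub current() { return null; }
    void write(String s) { /* append to the operation's log buffer */ }
  }
}
{code}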


- Dong


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/#review49573
---


On Aug. 5, 2014, 3:47 a.m., Dong Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24293/
> ---
> 
> (Updated Aug. 5, 2014, 3:47 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-4629: HS2 should support an API to retrieve query logs
> HiveServer2 should support an API to retrieve query logs. This is 
> particularly relevant because HiveServer2 supports async execution but 
> doesn't provide a way to report progress. Providing an API to retrieve query 
> logs will help report progress to the client.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
>   service/if/TCLIService.thrift 80086b4 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
>  808b73f 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchType.java
>  PRE-CREATION 
>   service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
>   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
>   service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
>   service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
>   service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
> f665146 
>   service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  c9fd5f9 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  caf413d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  fd4e94d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  ebca996 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  05991e0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  315dbea 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  0ec2543 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  3d3fddc 
>   
> service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
> e0d17a1 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 45fbd61 
>   service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 21c33bc 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> de54ca1 
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
> 9785e95 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
> 4c3164e 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> b39d64d 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> 816bea4 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 5c87bcb 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java
>  e3384d3 
>   
> service/src/test/org/apache/hive/service/cli/operation/TestOperationLoggingAPI.java
>  PRE-CR

[jira] [Commented] (HIVE-5760) Add vectorized support for CHAR/VARCHAR data types

2014-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089360#comment-14089360
 ] 

Hive QA commented on HIVE-5760:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660304/HIVE-5760.1.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/210/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/210/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-210/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-210/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'itests/qtest/testconfiguration.properties'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizedRowBatchCtx.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedColumnarSerDe.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedBatchUtil.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-unit-hadoop2/target itests/hive-minikdc/target 
itests/hive-unit/target itests/custom-serde/target itests/util/target 
hcatalog/target hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target 
common/target common/src/gen service/target contrib/target serde/target 
beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target 
ql/src/test/results/clientpositive/vector_data_types.q.out 
ql/src/test/results/clientpositive/tez/vector_data_types.q.out 
ql/src/test/queries/clientpositive/vector_data_types.q
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1616512.

At revision 1616512.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660304

> Add vectorized support for CHAR/VARCHAR data types
> --
>
> Key: HIVE-5760
> URL: https://issues.apache.org/jira/browse/HIVE-5760
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Matt McCline
> Attachments: HIV

[jira] [Commented] (HIVE-7357) Add vectorized support for BINARY data type

2014-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089359#comment-14089359
 ] 

Hive QA commented on HIVE-7357:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660307/HIVE-7357.7.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5885 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/209/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/209/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-209/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660307

> Add vectorized support for BINARY data type
> ---
>
> Key: HIVE-7357
> URL: https://issues.apache.org/jira/browse/HIVE-7357
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
> Attachments: HIVE-7357.1.patch, HIVE-7357.2.patch, HIVE-7357.3.patch, 
> HIVE-7357.4.patch, HIVE-7357.5.patch, HIVE-7357.6.patch, HIVE-7357.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs

2014-08-07 Thread Dong Chen


> On Aug. 5, 2014, 8:56 a.m., Lars Francke wrote:
> > service/if/TCLIService.thrift, line 1043
> > 
> >
> > I know that no one else does it yet in this file and I haven't gotten 
> > around to finishing my patch.
> > 
> > But could you use this style of comments instead:
> > 
> > /** Get the output result of a query. */
> > 
> > Thank you! That will be automatically moved into a comment section 
> > (python, javadoc etc.) by the Thrift compiler.

Thanks for the reminder. This comment style makes the generated code look 
better.
I'm not sure whether you are working on changing all the comment styles in the 
TCLIService.thrift file, so I just changed the 3 comments related to this fix. 
If not, I'm glad to change all the comments in the Thrift file through this 
patch or another new JIRA.


- Dong


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24293/#review49573
---


On Aug. 5, 2014, 3:47 a.m., Dong Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24293/
> ---
> 
> (Updated Aug. 5, 2014, 3:47 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-4629: HS2 should support an API to retrieve query logs
> HiveServer2 should support an API to retrieve query logs. This is 
> particularly relevant because HiveServer2 supports async execution but 
> doesn't provide a way to report progress. Providing an API to retrieve query 
> logs will help report progress to the client.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 
>   service/if/TCLIService.thrift 80086b4 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 
>   service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java
>  808b73f 
>   
> service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchType.java
>  PRE-CREATION 
>   service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 
>   service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 
>   service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 
>   service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 
>   service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java 
> f665146 
>   service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION 
>   service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  c9fd5f9 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  caf413d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  fd4e94d 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  ebca996 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  05991e0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  315dbea 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  0ec2543 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  3d3fddc 
>   
> service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java 
> e0d17a1 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 45fbd61 
>   service/src/java/org/apache/hive/service/cli/operation/OperationLog.java 
> PRE-CREATION 
>   
> service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 
> 21c33bc 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> de54ca1 
>   service/src/java/org/apache/hive/service/cli/session/HiveSession.java 
> 9785e95 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 
> 4c3164e 
>   service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java 
> b39d64d 
>   service/src/java/org/apache/hive/service/cli/session/SessionManager.java 
> 816bea4 
>   service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java 
> 5c87bcb 
>   
> service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java
>  e3384d3 
>   
> service/src/test/org/apache/hive/service/cli/operation/TestOperationLoggingAPI.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24293/diff/
> 
> 
> Testing
> ---

Re: Review Request 24404: HIVE-7635: Query having same aggregate functions but different case throws IndexOutOfBoundsException

2014-08-07 Thread Chaoyu Tang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24404/
---

(Updated Aug. 7, 2014, 3:40 p.m.)


Review request for hive.


Changes
---

Fixed the failed test due to the missing update to having.q.out for Tez. 
Recreated the diff and uploaded here. Thanks for the review.


Bugs: HIVE-7635
https://issues.apache.org/jira/browse/HIVE-7635


Repository: hive-git


Description
---

A query using the same aggregate function but in different case (e.g. SELECT 
key, COUNT(value) FROM src GROUP BY key HAVING count(value) >= 4) does not 
work and throws IndexOutOfBoundsException. The cause is that Hive treats 
count(value) and COUNT(value) in this query as two different aggregate 
expressions when compiling the query and generating the plan; the comparison 
is case-sensitive.
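
A minimal standalone sketch, using a plain map rather than the real
SemanticAnalyzer structures, of the fix direction: normalize the aggregate
expression key's case before deduplicating, so both spellings resolve to the
same aggregation column.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class AggDedupSketch {
  public static void main(String[] args) {
    Map<String, Integer> aggregations = new LinkedHashMap<>();
    String[] exprs = {"COUNT(value)", "count(value)"};
    for (String e : exprs) {
      // Lower-casing the key collapses the two spellings into one entry,
      // so HAVING resolves to an existing column instead of indexing past
      // the end of the aggregation list.
      aggregations.putIfAbsent(e.toLowerCase(), aggregations.size());
    }
    System.out.println(aggregations); // {count(value)=0}
  }
}
{code}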


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 51838ae 
  ql/src/test/queries/clientpositive/having.q 5b1aa69 
  ql/src/test/results/clientpositive/having.q.out d912001 
  ql/src/test/results/clientpositive/tez/having.q.out e96342d 

Diff: https://reviews.apache.org/r/24404/diff/


Testing
---

1. The fix addresses the previously failing query with different case in the 
aggregate function name
2. New unit tests passed
3. The patch will be submitted for pre-commit tests


Thanks,

Chaoyu Tang



[jira] [Updated] (HIVE-7635) Query having same aggregate functions but different case throws IndexOutOfBoundsException

2014-08-07 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-7635:
--

Attachment: HIVE-7635.1.patch

Fixed the failed test (due to the missing update in having.q.out for Tez). 
Uploaded the new patch here and also to RB.

> Query having same aggregate functions but different case throws 
> IndexOutOfBoundsException
> -
>
> Key: HIVE-7635
> URL: https://issues.apache.org/jira/browse/HIVE-7635
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.1
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 0.14.0
>
> Attachments: HIVE-7635.1.patch, HIVE-7635.patch
>
>
> A query using the same aggregate function (e.g. count) but in different case 
> does not work and throws IndexOutOfBoundsException.
> {code}
> Query:
> SELECT key, COUNT(value) FROM src GROUP BY key HAVING count(value) >= 4
> ---
> Error log:
> 14/08/06 11:00:45 ERROR ql.Driver: FAILED: IndexOutOfBoundsException Index: 
> 2, Size: 2
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
>   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>   at java.util.ArrayList.get(ArrayList.java:322)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4173)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5165)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8337)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9178)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9431)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:207)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:414)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:310)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1023)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:960)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:265)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:217)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:427)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:800)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:694)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7630) DROP PARTITION does not recognize built-in function

2014-08-07 Thread Gwenael Le Barzic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwenael Le Barzic updated HIVE-7630:


Description: 
Hello!

We currently have the following problem with Hive 0.13 in the HDP 2.1.

{code:shell}CREATE TABLE MyTable
(
mystring STRING,
mydate DATE
)
PARTITIONED BY (DT_PARTITION DATE);{code}

When I try to do this:
ALTER TABLE MyTable DROP PARTITION (DT_PARTITION = DATE_SUB(‘2012-09-13’,1));

I get the following error message:
NoViableAltException(26@[221:1: constant : ( Number | dateLiteral | 
StringLiteral | stringLiteralSequence | BigintLiteral | SmallintLiteral | 
TinyintLiteral | DecimalLiteral | charSetStringLiteral | booleanValue );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:116)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.constant(HiveParser_IdentifiersParser.java:6128)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionVal(HiveParser_IdentifiersParser.java:10819)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionSpec(HiveParser_IdentifiersParser.java:10664)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.dropPartitionSpec(HiveParser.java:40160)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.alterStatementSuffixDropPartitions(HiveParser.java:9953)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.alterTableStatementSuffix(HiveParser.java:6731)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.alterStatement(HiveParser.java:6552)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2189)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:409)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:323)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:980)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1045)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:51 cannot recognize input near 'DATE_SUB' '(' '.' 
in constant

In fact, it is larger than that: you cannot get the result of a built-in 
function (for example DATE_SUB) into a variable in Hive and use it later in 
the HQL script.

Best regards.

Gwenael Le Barzic

  was:
Hello !

We currently have the following problem with Hive 0.13 in the HDP 2.1.

CREATE TABLE MyTable
(
mystring STRING,
mydate DATE
)
PARTITIONED BY (DT_PARTITION DATE);

When I try to do this :
ALTER TABLE MyTable DROP PARTITION (DT_PARTITION = DATE_SUB(‘2012-09-13’,1));

I get the following error message :
NoViableAltException(26@[221:1: constant : ( Number | dateLiteral | 
StringLiteral | stringLiteralSequence | BigintLiteral | SmallintLiteral | 
TinyintLiteral | DecimalLiteral | charSetStringLiteral | booleanValue );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:116)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.constant(HiveParser_IdentifiersParser.java:6128)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionVal(HiveParser_IdentifiersParser.java:10819)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionSpec(HiveParser_IdentifiersParser.java:10664)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.dropPartitionSpec(HiveParser.java:40160)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.alterStatementSuffixDropPartitions(HiveParser.java:9953)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.alterTableStatementSuffix

[jira] [Updated] (HIVE-7630) DROP PARTITION does not recognize built-in function

2014-08-07 Thread Gwenael Le Barzic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwenael Le Barzic updated HIVE-7630:


Description: 
Hello!

We currently have the following problem with Hive 0.13 in the HDP 2.1.

{code:none}CREATE TABLE MyTable
(
mystring STRING,
mydate DATE
)
PARTITIONED BY (DT_PARTITION DATE);{code}

When I try to do this:
{code:none}ALTER TABLE MyTable DROP PARTITION (DT_PARTITION = 
DATE_SUB(‘2012-09-13’,1));{code}

I get the following error message:
{code:none}NoViableAltException(26@[221:1: constant : ( Number | dateLiteral | 
StringLiteral | stringLiteralSequence | BigintLiteral | SmallintLiteral | 
TinyintLiteral | DecimalLiteral | charSetStringLiteral | booleanValue );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:116)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.constant(HiveParser_IdentifiersParser.java:6128)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionVal(HiveParser_IdentifiersParser.java:10819)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionSpec(HiveParser_IdentifiersParser.java:10664)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.dropPartitionSpec(HiveParser.java:40160)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.alterStatementSuffixDropPartitions(HiveParser.java:9953)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.alterTableStatementSuffix(HiveParser.java:6731)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.alterStatement(HiveParser.java:6552)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2189)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:409)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:323)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:980)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1045)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:51 cannot recognize input near 'DATE_SUB' '(' '.' 
in constant{code}

In fact, it is larger than that: you cannot get the result of a built-in 
function (for example DATE_SUB) into a variable in Hive and use it later in 
the HQL script.
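
As an illustration of a common workaround (the JDBC URL and table name below
are hypothetical), the date arithmetic can be done on the client, and the
resulting plain constant, which the DROP PARTITION grammar does accept, can be
inlined into the statement:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.time.LocalDate;

public class DropPartitionWorkaround {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    // Compute DATE_SUB('2012-09-13', 1) on the client side.
    String dt = LocalDate.parse("2012-09-13").minusDays(1).toString(); // 2012-09-12
    try (Connection c = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement s = c.createStatement()) {
      // A plain string constant parses fine where DATE_SUB(...) does not.
      s.execute("ALTER TABLE MyTable DROP PARTITION (DT_PARTITION = '" + dt + "')");
    }
  }
}
{code}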

Best regards.

Gwenael Le Barzic

  was:
Hello !

We currently have the following problem with Hive 0.13 in the HDP 2.1.

{code:shell}CREATE TABLE MyTable
(
mystring STRING,
mydate DATE
)
PARTITIONED BY (DT_PARTITION DATE);{code}

When I try to do this :
ALTER TABLE MyTable DROP PARTITION (DT_PARTITION = DATE_SUB(‘2012-09-13’,1));

I get the following error message :
NoViableAltException(26@[221:1: constant : ( Number | dateLiteral | 
StringLiteral | stringLiteralSequence | BigintLiteral | SmallintLiteral | 
TinyintLiteral | DecimalLiteral | charSetStringLiteral | booleanValue );])
at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
at org.antlr.runtime.DFA.predict(DFA.java:116)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.constant(HiveParser_IdentifiersParser.java:6128)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionVal(HiveParser_IdentifiersParser.java:10819)
at 
org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionSpec(HiveParser_IdentifiersParser.java:10664)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.dropPartitionSpec(HiveParser.java:40160)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.alterStatementSuffixDropPartitions(HiveParser.java:9953)
at 
org.apache.hadoo

[jira] [Commented] (HIVE-7634) Use Configuration.getPassword() if available to eliminate passwords from hive-site.xml

2014-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089271#comment-14089271
 ] 

Hive QA commented on HIVE-7634:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660301/HIVE-7634.1.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5885 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/208/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/208/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-208/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660301

> Use Configuration.getPassword() if available to eliminate passwords from 
> hive-site.xml
> --
>
> Key: HIVE-7634
> URL: https://issues.apache.org/jira/browse/HIVE-7634
> Project: Hive
>  Issue Type: Bug
>  Components: Security
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-7634.1.patch
>
>
> HADOOP-10607 provides a Configuration.getPassword() API that allows passwords 
> to be retrieved from a configured credential provider, while also being able 
> to fall back to the HiveConf setting if no provider is set up. Hive should 
> use this API on versions of Hadoop that support it. This would give users 
> the ability to remove passwords from their Hive configuration files.
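
A minimal sketch of the lookup described above; the alias shown (the metastore
JDBC password key) is just an example. Configuration.getPassword() consults the
configured credential provider first and falls back to the plain config value:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class PasswordLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // With hadoop.security.credential.provider.path pointing at e.g. a
    // jceks store, the password no longer needs to live in hive-site.xml.
    char[] pw = conf.getPassword("javax.jdo.option.ConnectionPassword");
    System.out.println(pw == null ? "not configured" : "password resolved");
  }
}
{code}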



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7405) Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)

2014-08-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089205#comment-14089205
 ] 

Hive QA commented on HIVE-7405:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12660303/HIVE-7405.7.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 5883 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/207/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/207/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-207/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12660303

> Vectorize GROUP BY on the Reduce-Side (Part 1 – Basic)
> --
>
> Key: HIVE-7405
> URL: https://issues.apache.org/jira/browse/HIVE-7405
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Matt McCline
>Assignee: Matt McCline
> Attachments: HIVE-7405.1.patch, HIVE-7405.2.patch, HIVE-7405.3.patch, 
> HIVE-7405.4.patch, HIVE-7405.5.patch, HIVE-7405.6.patch, HIVE-7405.7.patch
>
>
> Vectorize the basic case that does not have any count distinct aggregation.
> Add a 4th processing mode in VectorGroupByOperator for the reduce side, where 
> each input VectorizedRowBatch has only values for one key at a time. Thus, 
> the values in the batch can be aggregated quickly.
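
A minimal conceptual sketch, with made-up types rather than the real
VectorGroupByOperator interfaces, of why this 4th mode is fast: when a batch
holds values for exactly one key, aggregation is a tight loop with no per-row
hash lookups.

{code:java}
public class OneKeyBatchAgg {
  // Aggregate a batch whose rows all belong to the same group-by key.
  static long aggregateBatch(long[] values, int size) {
    long sum = 0;
    for (int i = 0; i < size; i++) {
      sum += values[i];   // no key hashing needed inside the batch
    }
    return sum;
  }

  public static void main(String[] args) {
    long[] batch = {3, 1, 4, 1, 5};
    System.out.println(aggregateBatch(batch, batch.length)); // 14
  }
}
{code}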



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7624) Reduce operator initialization failed when running multiple MR query on spark

2014-08-07 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089181#comment-14089181
 ] 

Rui Li commented on HIVE-7624:
--

This patch solves the reducesinkkey0 problem. Map work and reduce work finish 
successfully.
However, no result is returned. I checked the log and found the second reduce 
work got nothing to process. Not sure what is missing here...
I quickly looked at the Tez code and found that it sets an output collector 
for each reduce sink (OperatorUtils.setChildrenCollector). I don't know if 
this is related, though.

> Reduce operator initialization failed when running multiple MR query on spark
> -
>
> Key: HIVE-7624
> URL: https://issues.apache.org/jira/browse/HIVE-7624
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-7624.patch
>
>
> The following error occurs when I try to run a query with multiple reduce 
> works (M->R->R):
> {quote}
> 14/08/05 12:17:07 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 1)
> java.lang.RuntimeException: Reduce operator initialization failed
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:170)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:53)
> at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
> at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
> at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:54)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: cannot find field reducesinkkey0 from 
> [0:_col0]
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
> …
> {quote}
> I suspect we're applying the reduce function in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

