[jira] [Created] (HIVE-16465) NullPointer Exception when enable vectorization for Parquet file format

2017-04-17 Thread Colin Ma (JIRA)
Colin Ma created HIVE-16465:
---

 Summary: NullPointer Exception when enable vectorization for 
Parquet file format
 Key: HIVE-16465
 URL: https://issues.apache.org/jira/browse/HIVE-16465
 Project: Hive
  Issue Type: Bug
Reporter: Colin Ma
Assignee: Colin Ma
Priority: Critical


A NullPointerException is thrown when vectorization is enabled for the Parquet 
file format. It is caused by a null InputSplit.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16463) Change license for java transaction jta jar to CDDL 1.0

2017-04-17 Thread Alan Gates (JIRA)
Alan Gates created HIVE-16463:
-

 Summary: Change license for java transaction jta jar to CDDL 1.0
 Key: HIVE-16463
 URL: https://issues.apache.org/jira/browse/HIVE-16463
 Project: Hive
  Issue Type: Bug
Reporter: Alan Gates
Assignee: Alan Gates


Previously I erroneously said that this jar was under the SCSL 3.0 license.  
But further research has shown I was wrong and it is released under CDDL 1.0.  
So we need to change the license file for this jar in the binaries directory.





[jira] [Created] (HIVE-16462) Vectorization: Enabling hybrid grace disables specialization of all reduce side joins

2017-04-17 Thread Jason Dere (JIRA)
Jason Dere created HIVE-16462:
-

 Summary: Vectorization: Enabling hybrid grace disables 
specialization of all reduce side joins
 Key: HIVE-16462
 URL: https://issues.apache.org/jira/browse/HIVE-16462
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Jason Dere
Assignee: Jason Dere


Observed by [~gopalv].

Having grace hash join enabled prevents the Vectorizer from choosing the 
specialized vector hash joins during query planning. However, 
hive.llap.enable.grace.join.in.llap will later disable grace hash join 
(LlapDecider runs after Vectorizer). If we can disable the grace hash join 
before vectorization kicks in, then we can still benefit from the specialized 
vector hash joins.

This can be special cased for the llap.execution.mode=only case.
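
The ordering problem described above can be sketched with a toy model. This is not Hive code: the JoinPlan class, the two pass functions, and the run helper are hypothetical names invented for illustration, and the only behavior modeled is what the issue states (specialization is skipped while grace hash join is on, and a later pass turns grace hash join off).

```python
from dataclasses import dataclass


@dataclass
class JoinPlan:
    """Hypothetical stand-in for a join operator in a query plan."""
    hybrid_grace: bool                 # grace hash join currently requested
    specialized_vector: bool = False   # specialized vector hash join chosen


def vectorizer(plan):
    # Toy rule taken from the issue text: no specialization while grace is on.
    plan.specialized_vector = not plan.hybrid_grace


def llap_decider(plan):
    # Toy rule taken from the issue text: grace hash join is disabled for LLAP.
    plan.hybrid_grace = False


def run(passes):
    """Apply the planner passes in order to a plan that starts with grace on."""
    plan = JoinPlan(hybrid_grace=True)
    for planner_pass in passes:
        planner_pass(plan)
    return plan


# Order described in the issue: Vectorizer first, LlapDecider second.
current = run([vectorizer, llap_decider])
# Proposed order: grace is disabled before vectorization decides.
proposed = run([llap_decider, vectorizer])

print(current)   # grace off, but specialization was already skipped
print(proposed)  # grace off, specialization chosen
```

In this model both orders end with grace hash join disabled, but only the second one keeps the specialized vector hash join, which is the benefit the issue asks for.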





[jira] [Created] (HIVE-16461) DagUtils checks local resource size on the remote fs

2017-04-17 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-16461:
---

 Summary: DagUtils checks local resource size on the remote fs
 Key: HIVE-16461
 URL: https://issues.apache.org/jira/browse/HIVE-16461
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


The path for a local file may have no scheme.
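
A minimal sketch of the distinction, assuming the issue refers to the URI scheme prefix (such as file: or hdfs:). It is written in Python rather than against the Hadoop API, the paths are made up, and the helper names has_scheme and qualify_local are hypothetical. It only shows how a bare path differs from a qualified one and why a bare path says nothing about which filesystem it belongs to.

```python
from urllib.parse import urlparse


def has_scheme(path):
    """Return True if the path carries an explicit URI scheme."""
    return bool(urlparse(path).scheme)


def qualify_local(path):
    """Prefix a scheme-less absolute POSIX path with file:// (assumed local)."""
    return path if has_scheme(path) else "file://" + path


# Hypothetical resource paths, for illustration only.
bare = "/tmp/hive/udf.jar"
local = "file:///tmp/hive/udf.jar"
remote = "hdfs://namenode:8020/tmp/hive/udf.jar"

print(has_scheme(bare), has_scheme(local), has_scheme(remote))
print(qualify_local(bare))
```

A bare path like the first one can be resolved against whatever filesystem is the default, so a size check on it may not look at the local file at all.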





[jira] [Created] (HIVE-16460) In the console output, show vertex list in topological order instead of an alphabetical sort

2017-04-17 Thread Siddharth Seth (JIRA)
Siddharth Seth created HIVE-16460:
-

 Summary: In the console output, show vertex list in topological 
order instead of an alphabetical sort
 Key: HIVE-16460
 URL: https://issues.apache.org/jira/browse/HIVE-16460
 Project: Hive
  Issue Type: Improvement
Reporter: Siddharth Seth


cc [~prasanth_j]
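
To illustrate the difference the summary describes, here is a small sketch that compares an alphabetical sort with a topological order. The DAG and its vertex names are made up, and topological_order is a hypothetical helper using Kahn's algorithm with alphabetical tie-breaking. It is not how the console output is actually produced.

```python
import heapq


def topological_order(deps):
    """Order vertices so each one appears after every vertex it reads from.

    deps maps a vertex name to the list of vertices it depends on; every
    dependency must itself be a key. Ties are broken alphabetically.
    """
    indegree = {v: len(parents) for v, parents in deps.items()}
    children = {v: [] for v in deps}
    for v, parents in deps.items():
        for p in parents:
            children[p].append(v)

    ready = [v for v, n in indegree.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        v = heapq.heappop(ready)
        order.append(v)
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                heapq.heappush(ready, c)

    if len(order) != len(deps):
        raise ValueError("cycle detected")
    return order


# Hypothetical DAG: Reducer 2 consumes the output of Reducer 4.
deps = {
    "Map 1": [],
    "Map 3": [],
    "Reducer 4": ["Map 1"],
    "Reducer 2": ["Map 3", "Reducer 4"],
}

print(sorted(deps))             # ['Map 1', 'Map 3', 'Reducer 2', 'Reducer 4']
print(topological_order(deps))  # ['Map 1', 'Map 3', 'Reducer 4', 'Reducer 2']
```

The alphabetical list shows Reducer 2 before Reducer 4 even though it runs after it, while the topological list follows the direction of data flow.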





Re: Do you feel a need for schema when querying JSON files in hive?

2017-04-17 Thread S G
So no one knows about this?
I was hoping to use some knowledge already acquired on this subject :(


On Tue, Apr 11, 2017 at 2:09 AM, S G  wrote:

> Hi,
>
> There is a concept of JsonSerDe where you need to specify a structure for
> your tables in order to query them.
>
> However, since the schema for an object is prone to change (once every few
> months is not unexpected), how do you handle that change in your hive/pig
> queries?
>
> Moreover, since JSON files are not demarcated according to schema, it is
> possible that a single JSON file has json-data for multiple evolutions of a
> schema (Like 10 objects of ClassAnimal1, 20 of ClassAnimal2, 100 of
> ClassAnimal3 etc where ClassAnimal1, ClassAnimal2 and ClassAnimal3
> represent schema for ClassAnimal at different times).
>
> For such a JSON file, what is the recommended way of querying?
>
> I know that Avro solves this problem by maintaining a single file for a
> single kind of schema. So it will have 3 files for the above case, 1 each
> for ClassAnimal1, ClassAnimal2 and ClassAnimal3.
>
> But since Avro is binary, hard to debug and requires a schema-repository
> (for non-hive use-cases), we were hoping to solve this problem in JSON.
>
> Related questions:
> 1) Is it even a problem worth solving?
> 2) How many people use AvroSerDe as compared to JsonSerDe?
>
> Thanks
> SG
>
>
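
One possible way to read a file that mixes several evolutions of a schema, sketched in Python outside of Hive: treat the newest schema as the target and fill missing fields with defaults. The field names, the default values, and the normalize helper are all hypothetical, and this does not cover renamed fields or changed types.

```python
import json

# Assumed target schema: the newest version's fields with fallback values.
DEFAULTS = {"name": None, "legs": None, "habitat": "unknown"}


def normalize(line):
    """Parse one JSON record and project it onto the target schema."""
    record = json.loads(line)
    # Missing fields take the default; fields not in the schema are dropped.
    return {field: record.get(field, default)
            for field, default in DEFAULTS.items()}


# Hypothetical records written under three versions of the same schema.
lines = [
    '{"name": "cat"}',
    '{"name": "dog", "legs": 4}',
    '{"name": "owl", "legs": 2, "habitat": "forest", "extra": 1}',
]

rows = [normalize(line) for line in lines]
for row in rows:
    print(row)
```

Every row comes out with the same three keys, so downstream queries can assume one shape regardless of which version wrote the record.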