[jira] [Commented] (HIVE-11533) Loop optimization for SIMD in integer comparisons

2015-10-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948096#comment-14948096
 ] 

Hive QA commented on HIVE-11533:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12765289/HIVE-11533.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9563 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5565/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5565/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5565/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12765289 - PreCommit-HIVE-TRUNK-Build

> Loop optimization for SIMD in integer comparisons
> -
>
> Key: HIVE-11533
> URL: https://issues.apache.org/jira/browse/HIVE-11533
> Project: Hive
>  Issue Type: Sub-task
>  Components: Vectorization
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Minor
> Attachments: HIVE-11533.1.patch, HIVE-11533.2.patch, 
> HIVE-11533.3.patch, HIVE-11533.4.patch
>
>
> Long*CompareLong* classes can be optimized with subtraction and bitwise 
> operators for better SIMD optimization.
> {code}
> for(int i = 0; i != n; i++) {
>   outputVector[i] = vector1[0] > vector2[i] ? 1 : 0;
> }
> {code}
> This issue will cover the following classes:
> - LongColEqualLongColumn
> - LongColNotEqualLongColumn
> - LongColGreaterLongColumn
> - LongColGreaterEqualLongColumn
> - LongColLessLongColumn
> - LongColLessEqualLongColumn
> - LongScalarEqualLongColumn
> - LongScalarNotEqualLongColumn
> - LongScalarGreaterLongColumn
> - LongScalarGreaterEqualLongColumn
> - LongScalarLessLongColumn
> - LongScalarLessEqualLongColumn
> - LongColEqualLongScalar
> - LongColNotEqualLongScalar
> - LongColGreaterLongScalar
> - LongColGreaterEqualLongScalar
> - LongColLessLongScalar
> - LongColLessEqualLongScalar
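
For illustration, the subtraction-and-sign-bit form the description refers to could look like the sketch below (hypothetical code, not the attached patch; it mirrors the column-to-column case and ignores overflow at the extremes of the long range):

{code}
// Hypothetical sketch of a branch-free long "greater than" loop that a JIT can
// auto-vectorize with SIMD instructions. Not the actual HIVE-11533 patch.
public class LongColGreaterLongColumnSketch {
  // outputVector[i] = 1 when vector1[i] > vector2[i], else 0.
  static void evaluate(long[] vector1, long[] vector2, long[] outputVector, int n) {
    for (int i = 0; i != n; i++) {
      // vector2[i] - vector1[i] is negative exactly when vector1[i] > vector2[i]
      // (barring overflow); the unsigned shift extracts the sign bit as 0 or 1.
      outputVector[i] = (vector2[i] - vector1[i]) >>> 63;
    }
  }
}
{code}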



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm

2015-10-07 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948033#comment-14948033
 ] 

Laljo John Pullokkaran commented on HIVE-11954:
---

"getNumberOfCostlyOps" could be implemented recursively, with a graph walker, or 
by modifying nodeutils.

> Extend logic to choose side table in MapJoin Conversion algorithm
> -
>
> Key: HIVE-11954
> URL: https://issues.apache.org/jira/browse/HIVE-11954
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11954.01.patch, HIVE-11954.02.patch, 
> HIVE-11954.03.patch, HIVE-11954.patch, HIVE-11954.patch
>
>
> Selection of side table (in memory/hash table) in MapJoin Conversion 
> algorithm needs to be more sophisticated.
> In an N-way map join, Hive should pick as the side table (in-memory table) the 
> input stream that has the least cost in producing its relation (like 
> TS(FIL|Proj)*).
> A cost-based choice needs an extended cost model; without the return path it 
> is going to be hard to do this.
> For the time being we could employ a modified cost-based algorithm for side 
> table selection.
> The new algorithm is described below:
> 1. Identify the candidate set of inputs for side table (in memory/hash table) 
> from the inputs (based on conditional task size)
> 2. For each of the input identify its cost, memory requirement. Cost is 1 for 
> each heavy weight relation op (Join, GB, PTF/Windowing, TF, etc.). Cost for 
> an input is the total no of heavy weight ops in its branch.
> 3. Order set from #1 on cost & memory req (ascending order)
> 4. Pick the first element from #3 as the side table.
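
A minimal sketch of steps 2-4 above, with hypothetical operator and input classes standing in for Hive's real operator tree and conditional-task plumbing (illustrative only, not the proposed implementation):

{code}
// Hypothetical sketch: cost of an input = number of heavy-weight operators in
// the branch that produces it; candidates are then sorted by (cost, memory)
// ascending and the first one becomes the side table. Illustrative types only.
import java.util.Comparator;
import java.util.List;
import java.util.Set;

class OpNode {
  String type;                 // e.g. "TS", "FIL", "SEL", "GBY", "JOIN", "PTF"
  List<OpNode> parents;        // upstream operators; leaves pass List.of()
  OpNode(String type, List<OpNode> parents) { this.type = type; this.parents = parents; }
}

class JoinInput {
  OpNode branchRoot;
  long memoryRequirement;
  JoinInput(OpNode branchRoot, long memoryRequirement) {
    this.branchRoot = branchRoot;
    this.memoryRequirement = memoryRequirement;
  }
}

final class SideTableSelectionSketch {
  private static final Set<String> HEAVY = Set.of("JOIN", "GBY", "PTF", "TF");

  // Step 2: total number of heavy-weight operators in an input's branch.
  static int getNumberOfCostlyOps(OpNode op) {
    int cost = HEAVY.contains(op.type) ? 1 : 0;
    for (OpNode parent : op.parents) {
      cost += getNumberOfCostlyOps(parent);
    }
    return cost;
  }

  // Steps 3 and 4: order candidates by cost, then memory, and pick the first.
  static JoinInput pickSideTable(List<JoinInput> candidates) {
    return candidates.stream()
        .min(Comparator.comparingInt((JoinInput in) -> getNumberOfCostlyOps(in.branchRoot))
            .thenComparingLong(in -> in.memoryRequirement))
        .orElseThrow();
  }
}
{code}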



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12025) refactor bucketId generating code

2015-10-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948024#comment-14948024
 ] 

Hive QA commented on HIVE-12025:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12765297/HIVE-12025.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9638 tests executed
*Failed tests:*
{noformat}
TestMarkPartition - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.hcatalog.streaming.mutate.worker.TestBucketIdResolverImpl.testAttachBucketIdToRecord
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5564/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5564/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5564/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12765297 - PreCommit-HIVE-TRUNK-Build

> refactor bucketId generating code
> -
>
> Key: HIVE-12025
> URL: https://issues.apache.org/jira/browse/HIVE-12025
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.1
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-12025.2.patch, HIVE-12025.patch
>
>
> HIVE-11983 adds ObjectInspectorUtils.getBucketHashCode() and 
> getBucketNumber().
> There are several (at least) places in Hive that perform this computation:
> # ReduceSinkOperator.computeBucketNumber
> # ReduceSinkOperator.computeHashCode
> # BucketIdResolverImpl - only in 2.0.0 ASF line
> # FileSinkOperator.findWriterOffset
> # GenericUDFHash
> Should refactor it and make sure they all call methods from 
> ObjectInspectorUtils.
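
For reference, the shared computation those call sites duplicate boils down to roughly the following sketch (the non-negative-modulo convention is an assumption made explicit here; the authoritative code is ObjectInspectorUtils.getBucketHashCode()/getBucketNumber()):

{code}
// Sketch of the bucket-id computation the description wants centralized.
// Illustrative only; the real logic lives in ObjectInspectorUtils.
final class BucketIdSketch {
  // Combine per-column hash codes for a (possibly multi-column) bucket key.
  static int bucketHashCode(int[] columnHashCodes) {
    int hashCode = 0;
    for (int h : columnHashCodes) {
      hashCode = hashCode * 31 + h;
    }
    return hashCode;
  }

  // Map a hash code to [0, numBuckets), masking the sign bit so negative hash
  // codes never yield a negative bucket id.
  static int bucketNumber(int hashCode, int numBuckets) {
    return (hashCode & Integer.MAX_VALUE) % numBuckets;
  }
}
{code}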



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11880) filter bug of UNION ALL when hive.ppd.remove.duplicatefilters=true and filter condition is type incompatible column

2015-10-07 Thread WangMeng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948023#comment-14948023
 ] 

WangMeng commented on HIVE-11880:
-

[~jpullokkaran]   
 I have added a Review Board link in Issue Links.
 The execution engine is MR (I don't use Tez). You can use 
TPC-H (http://www.tpc.org/tpch/) to reproduce this JIRA according to the 
description above. Thanks.
 Unlike HIVE-11919, UNION ALL hits the HIVE-11880 failure only when a "union 
type mismatch" occurs, one of the mismatched columns is a constant, and that 
mismatched column is a filter column.

> filter bug  of UNION ALL when hive.ppd.remove.duplicatefilters=true and 
> filter condition is type incompatible column 
> -
>
> Key: HIVE-11880
> URL: https://issues.apache.org/jira/browse/HIVE-11880
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1
>Reporter: WangMeng
>Assignee: WangMeng
> Attachments: HIVE-11880.01.patch, HIVE-11880.02.patch, 
> HIVE-11880.03.patch, HIVE-11880.04.patch
>
>
> For UNION ALL, a branch column can be a constant (such as '0L', BIGINT type) 
> while its corresponding column has an incompatible type (such as INT).
> A query with a filter condition on such a type-incompatible column over this 
> UNION ALL will cause an IndexOutOfBoundsException.
> For example, take the TPC-H table "orders" in the following query:
> the type of 'orders'.'o_custkey' is normally INT, while the type of the 
> corresponding constant column "0" is BIGINT (`0L AS `o_custkey`).
> This query (with a filter on the type-incompatible column 'o_custkey') fails 
> with java.lang.IndexOutOfBoundsException:
> {code}
> SELECT Count(1)
> FROM   (
>   SELECT `o_orderkey` ,
>  `o_custkey`
>   FROM   `orders`
>   UNION ALL
>   SELECT `o_orderkey`,
>  0L  AS `o_custkey`
>   FROM   `orders`) `oo`
> WHERE  o_custkey<10 limit 4 ;
> {code}
> When 
> {code}
> set hive.ppd.remove.duplicatefilters=true
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements

2015-10-07 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948018#comment-14948018
 ] 

Chengbing Liu commented on HIVE-11901:
--

[~thejas], I think we can add test cases for the authorization part in another 
JIRA and check this in first, if you think the patch is ok.

> StorageBasedAuthorizationProvider requires write permission on table for 
> SELECT statements
> --
>
> Key: HIVE-11901
> URL: https://issues.apache.org/jira/browse/HIVE-11901
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 1.2.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Attachments: HIVE-11901.01.patch
>
>
> With HIVE-7895, it will require write permission on the table directory even 
> for a SELECT statement.
> Looking at the stacktrace, it seems the method 
> {{StorageBasedAuthorizationProvider#authorize(Table table, Partition part, 
> Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)}} always treats 
> a null partition as a CREATE statement, even though it can also be a SELECT.
> We may have to check {{readRequiredPriv}} and {{writeRequiredPriv}} first 
> in order to tell which kind of statement it is.
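
A minimal sketch of the suggested check, using stand-in types rather than Hive's actual Privilege/authorization classes:

{code}
// Hypothetical sketch: derive the filesystem action from the requested
// privileges instead of assuming a null partition implies a CREATE. The enum
// and array types are illustrative stand-ins, not Hive's real API.
enum RequiredFsAction { READ, WRITE }

final class AuthorizeSketch {
  static RequiredFsAction requiredAction(Object[] readRequiredPriv, Object[] writeRequiredPriv) {
    boolean needsWrite = writeRequiredPriv != null && writeRequiredPriv.length > 0;
    if (needsWrite) {
      return RequiredFsAction.WRITE;   // INSERT / CREATE style statements
    }
    // A SELECT only asks for read privileges, so read permission suffices.
    return RequiredFsAction.READ;
  }
}
{code}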



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11149) Fix issue with sometimes HashMap in PerfLogger.java hangs

2015-10-07 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948011#comment-14948011
 ] 

Chengbing Liu commented on HIVE-11149:
--

[~sershe], would you commit this?

> Fix issue with sometimes HashMap in PerfLogger.java hangs 
> --
>
> Key: HIVE-11149
> URL: https://issues.apache.org/jira/browse/HIVE-11149
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 1.2.1
>Reporter: WangMeng
>Assignee: WangMeng
> Attachments: HIVE-11149.01.patch, HIVE-11149.02.patch, 
> HIVE-11149.03.patch, HIVE-11149.04.patch
>
>
> In a multi-threaded environment, the HashMap in PerfLogger.java can sometimes 
> cause massive numbers of Java processes to hang and consume large amounts of 
> unnecessary CPU and memory.
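
The usual fix for this class of problem is to make the shared map thread-safe; a generic sketch (not necessarily what the attached patches do) follows:

{code}
// Hypothetical sketch: an unsynchronized HashMap mutated from multiple threads
// can corrupt its internal structure and loop forever; ConcurrentHashMap avoids
// that. Illustrates the general fix only, not the actual PerfLogger change.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PerfLoggerSketch {
  // Read and written by many query threads concurrently.
  private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

  void perfLogBegin(String method) {
    startTimes.put(method, System.currentTimeMillis());
  }

  long perfLogEnd(String method) {
    Long start = startTimes.remove(method);
    return start == null ? 0L : System.currentTimeMillis() - start;
  }
}
{code}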



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11768) java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances

2015-10-07 Thread Nemon Lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nemon Lou updated HIVE-11768:
-
Priority: Minor  (was: Major)

> java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances
> 
>
> Key: HIVE-11768
> URL: https://issues.apache.org/jira/browse/HIVE-11768
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Assignee: Navis
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HIVE-11768.1.patch.txt, HIVE-11768.2.patch.txt
>
>
>   More than 490,000 paths were added to java.io.DeleteOnExitHook on one of our 
> long-running HiveServer2 instances, taking up more than 100MB of heap.
>   Most of the paths have a ".pipeout" suffix.
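
One way to avoid growing the JVM-wide hook (a sketch under the assumption that the temp file's lifetime is tied to a session; not necessarily what the attached patches do) is to delete such files eagerly and fall back to deleteOnExit only on failure:

{code}
// Hypothetical sketch: File.deleteOnExit() keeps every registered path string
// until JVM shutdown, which leaks on a long-running server. Deleting the file
// when its owning session closes keeps DeleteOnExitHook from growing.
import java.io.File;
import java.io.IOException;

final class PipeoutCleanupSketch {
  static File createSessionPipeout(String sessionId) throws IOException {
    // Deliberately no deleteOnExit() here (sessionId assumed >= 3 chars).
    return File.createTempFile(sessionId, ".pipeout");
  }

  static void closeSession(File pipeout) {
    if (pipeout != null && pipeout.exists() && !pipeout.delete()) {
      pipeout.deleteOnExit();   // last resort if the eager delete fails
    }
  }
}
{code}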



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11533) Loop optimization for SIMD in integer comparisons

2015-10-07 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947998#comment-14947998
 ] 

Chengxiang Li commented on HIVE-11533:
--

Very nice job, the patch looks good. Just one thing to check: I guess the 
performance data was measured with "selectedInUse" set to false. When 
"selectedInUse" is true, the loop cannot benefit from SIMD instructions, and in 
my previous experience this kind of optimization can sometimes even degrade 
performance in that case. Have you verified that?
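
For context, vectorized expressions typically have two iteration paths, and only the dense one is a good SIMD candidate. A simplified sketch (hypothetical signature, not the patch itself):

{code}
// Simplified sketch of the two iteration patterns. When selectedInUse is false
// the arrays are walked sequentially and the loop can be auto-vectorized; when
// it is true, the gather through sel[] usually defeats SIMD.
final class SelectedInUseSketch {
  static void evaluate(boolean selectedInUse, int[] sel, int n,
                       long[] vector1, long[] vector2, long[] outputVector) {
    if (selectedInUse) {
      for (int j = 0; j != n; j++) {
        int i = sel[j];   // indirect (gathered) access
        outputVector[i] = (vector2[i] - vector1[i]) >>> 63;
      }
    } else {
      for (int i = 0; i != n; i++) {   // dense, SIMD-friendly access
        outputVector[i] = (vector2[i] - vector1[i]) >>> 63;
      }
    }
  }
}
{code}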

> Loop optimization for SIMD in integer comparisons
> -
>
> Key: HIVE-11533
> URL: https://issues.apache.org/jira/browse/HIVE-11533
> Project: Hive
>  Issue Type: Sub-task
>  Components: Vectorization
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>Priority: Minor
> Attachments: HIVE-11533.1.patch, HIVE-11533.2.patch, 
> HIVE-11533.3.patch, HIVE-11533.4.patch
>
>
> Long*CompareLong* classes can be optimized with subtraction and bitwise 
> operators for better SIMD optimization.
> {code}
> for(int i = 0; i != n; i++) {
>   outputVector[i] = vector1[0] > vector2[i] ? 1 : 0;
> }
> {code}
> This issue will cover the following classes:
> - LongColEqualLongColumn
> - LongColNotEqualLongColumn
> - LongColGreaterLongColumn
> - LongColGreaterEqualLongColumn
> - LongColLessLongColumn
> - LongColLessEqualLongColumn
> - LongScalarEqualLongColumn
> - LongScalarNotEqualLongColumn
> - LongScalarGreaterLongColumn
> - LongScalarGreaterEqualLongColumn
> - LongScalarLessLongColumn
> - LongScalarLessEqualLongColumn
> - LongColEqualLongScalar
> - LongColNotEqualLongScalar
> - LongColGreaterLongScalar
> - LongColGreaterEqualLongScalar
> - LongColLessLongScalar
> - LongColLessEqualLongScalar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12064) prevent transactional=false

2015-10-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12064:
--
Attachment: HIVE-12064.patch

> prevent transactional=false
> ---
>
> Key: HIVE-12064
> URL: https://issues.apache.org/jira/browse/HIVE-12064
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-12064.patch
>
>
> Currently the table property transactional=true must be set to make a table 
> behave in an ACID-compliant way.
> This is misleading in that it suggests that changing it to transactional=false 
> makes the table non-ACID, but the on-disk layout of an ACID table is different 
> from that of a plain table, so changing this property may cause wrong data to 
> be returned.
> We should prevent setting transactional=false.
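
A sketch of the kind of guard the summary asks for, using hypothetical property maps rather than Hive's metastore alter-table hook API:

{code}
// Hypothetical validation sketch: reject flipping transactional from true to
// false, since the on-disk layout of an ACID table differs from a plain table.
import java.util.Map;

final class TransactionalPropertyGuardSketch {
  static void validateAlter(Map<String, String> oldProps, Map<String, String> newProps) {
    boolean wasAcid = "true".equalsIgnoreCase(oldProps.getOrDefault("transactional", "false"));
    boolean staysAcid = "true".equalsIgnoreCase(newProps.getOrDefault("transactional", "false"));
    if (wasAcid && !staysAcid) {
      throw new IllegalArgumentException(
          "Setting TBLPROPERTIES ('transactional'='false') on an ACID table is not allowed");
    }
  }
}
{code}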



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12021) HivePreFilteringRule may introduce wrong common operands

2015-10-07 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947975#comment-14947975
 ] 

Laljo John Pullokkaran commented on HIVE-12021:
---

The multi-map should also get pruned.
Consider the following case:
(x=f(y) and x=10 and expr1) or (x=10 and expr2)

The multi-map "reductionCondition" will contain both 1) "x=10" and 2) "x=f(y)".

However, x=f(y) is not present in all DNF elements.

One may argue for transitive effects (i.e. x=f(y) should also be right since we 
have x=10); I think it's brittle to leave the multi-map entries unchanged.
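
A small sketch of the pruning being asked for, i.e. keeping only candidates that occur in every disjunct of the DNF (plain strings stand in for the rule's RexNode multimap entries; this is not HivePreFilteringRule code):

{code}
// Hypothetical sketch: given the operands of each disjunct of a DNF condition,
// keep only the candidate operands that appear in every disjunct.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CommonOperandSketch {
  // Assumes at least one disjunct.
  static Set<String> commonOperands(List<Set<String>> operandsPerDisjunct) {
    Set<String> common = new HashSet<>(operandsPerDisjunct.get(0));
    for (Set<String> disjunct : operandsPerDisjunct.subList(1, operandsPerDisjunct.size())) {
      common.retainAll(disjunct);   // drop anything not present in this disjunct
    }
    return common;
  }

  public static void main(String[] args) {
    // (x=f(y) and x=10 and expr1) or (x=10 and expr2)
    Set<String> d1 = Set.of("x=f(y)", "x=10", "expr1");
    Set<String> d2 = Set.of("x=10", "expr2");
    System.out.println(commonOperands(List.of(d1, d2)));  // prints [x=10]
  }
}
{code}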

> HivePreFilteringRule may introduce wrong common operands
> 
>
> Key: HIVE-12021
> URL: https://issues.apache.org/jira/browse/HIVE-12021
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 1.3.0, 1.2.1, 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0, 1.2.2
>
> Attachments: HIVE-12021.01.patch, HIVE-12021.02.patch, 
> HIVE-12021.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11887) spark tests break the build on a shared machine

2015-10-07 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947965#comment-14947965
 ] 

Rui Li commented on HIVE-11887:
---

Sorry for the late response. Seems the change was introduced in HIVE-9664.
[~nntnag17] do you have any idea for this issue? Thanks.

> spark tests break the build on a shared machine
> ---
>
> Key: HIVE-11887
> URL: https://issues.apache.org/jira/browse/HIVE-11887
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Spark download creates UDFExampleAdd jar in /tmp; when building on a shared 
> machine, someone else's jar from a build prevents this jar from being created 
> (I have no permissions to this file because it was created by a different 
> user) and the build fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-6643) Add a check for cross products in plans and output a warning

2015-10-07 Thread Anthony Hsu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anthony Hsu updated HIVE-6643:
--
Description: 
Now that we support old style join syntax, it is easy to write queries that 
generate a plan with a cross product.
For e.g. say you have A join B join C join D on A.x = B.x and A.y = D.y and C.z 
= D.z
So the JoinTree is:

A — B
\|__  D — C

Since we don't reorder join graphs, we will end up with a cross product between 
(A join B) and C

  was:
Now that we support old style join syntax, it is easy to write queries that 
generate a plan with a cross product.
For e.g. say you have A join B join C join D on A.x = B.x and A.y = D.y and C.z 
= D.z
So the JoinTree is:

A — B
|__  D — C

Since we don't reorder join graphs, we will end up with a cross product between 
(A join B) and C


> Add a check for cross products in plans and output a warning
> 
>
> Key: HIVE-6643
> URL: https://issues.apache.org/jira/browse/HIVE-6643
> Project: Hive
>  Issue Type: Bug
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.13.0
>
> Attachments: HIVE-6643.1.patch, HIVE-6643.2.patch, HIVE-6643.3.patch, 
> HIVE-6643.4.patch, HIVE-6643.5.patch, HIVE-6643.6.patch, HIVE-6643.7.patch
>
>
> Now that we support old style join syntax, it is easy to write queries that 
> generate a plan with a cross product.
> For e.g. say you have A join B join C join D on A.x = B.x and A.y = D.y and 
> C.z = D.z
> So the JoinTree is:
> A — B
> \|__  D — C
> Since we don't reorder join graphs, we will end up with a cross product 
> between (A join B) and C



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work

2015-10-07 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947948#comment-14947948
 ] 

Swarnim Kulkarni commented on HIVE-11609:
-

[~ashutoshc] Mind giving this another quick look and let me know if my comment 
[here|https://issues.apache.org/jira/browse/HIVE-11609?focusedCommentId=14935951&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14935951]
 makes sense?

> Capability to add a filter to hbase scan via composite key doesn't work
> ---
>
> Key: HIVE-11609
> URL: https://issues.apache.org/jira/browse/HIVE-11609
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt, 
> HIVE-11609.3.patch.txt
>
>
> It seems the capability to add a filter to an HBase scan, which was added as 
> part of HIVE-6411, doesn't work. This is primarily because in 
> HiveHBaseInputFormat the filter is added in getSplits instead of 
> getRecordReader. This works fine for start and stop keys, but not for the 
> filter, because a filter is respected only when an actual scan is performed. 
> This is also related to the initial refactoring that was done as part of 
> HIVE-3420.
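
To illustrate why the placement matters, here is a sketch using the plain HBase client API (not HiveHBaseInputFormat itself): start/stop rows can be decided at split time, but a Filter only takes effect on the Scan the record reader actually executes, so it must be set there.

{code}
// Illustrative sketch only: a filter attached to a Scan is honored when that
// Scan runs on the region server, so it has to end up on the record reader's
// scan rather than being dropped after getSplits().
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.PrefixFilter;

final class CompositeKeyScanSketch {
  static Scan buildReaderScan(byte[] startRow, byte[] stopRow, byte[] keyPrefix) {
    Scan scan = new Scan();
    scan.setStartRow(startRow);                 // narrows which regions/splits are read
    scan.setStopRow(stopRow);
    Filter filter = new PrefixFilter(keyPrefix);
    scan.setFilter(filter);                     // only honored by the scan that runs
    return scan;
  }
}
{code}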



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work

2015-10-07 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni updated HIVE-11609:

Attachment: HIVE-11609.3.patch.txt

Reattaching the patch, rebased on master, with very minor updates.

> Capability to add a filter to hbase scan via composite key doesn't work
> ---
>
> Key: HIVE-11609
> URL: https://issues.apache.org/jira/browse/HIVE-11609
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt, 
> HIVE-11609.3.patch.txt
>
>
> It seems the capability to add a filter to an HBase scan, which was added as 
> part of HIVE-6411, doesn't work. This is primarily because in 
> HiveHBaseInputFormat the filter is added in getSplits instead of 
> getRecordReader. This works fine for start and stop keys, but not for the 
> filter, because a filter is respected only when an actual scan is performed. 
> This is also related to the initial refactoring that was done as part of 
> HIVE-3420.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12061) add file type support to file metadata by expr call

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12061:

Description: 
Expr filtering, automatic caching, etc. should be aware of file types for 
advanced features. For now only ORC is supported, but I want to add a boundary 
between ORC-specific and general metastore code that could later be used for 
other formats if needed.

NO PRECOMMIT TESTS

  was:Expr filtering, automatic caching, etc. should be aware of file types for 
advanced features. For now only ORC is supported, but I want to add boundary 
for between ORC-specific and general metastore code, that could later be used 
for other formats if needed.


> add file type support to file metadata by expr call
> ---
>
> Key: HIVE-12061
> URL: https://issues.apache.org/jira/browse/HIVE-12061
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12061.nogen.patch, HIVE-12061.patch
>
>
> Expr filtering, automatic caching, etc. should be aware of file types for 
> advanced features. For now only ORC is supported, but I want to add a boundary 
> between ORC-specific and general metastore code that could later be used for 
> other formats if needed.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12061) add file type support to file metadata by expr call

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12061:

Attachment: HIVE-12061.patch
HIVE-12061.nogen.patch

Patch on top of HIVE-11676

> add file type support to file metadata by expr call
> ---
>
> Key: HIVE-12061
> URL: https://issues.apache.org/jira/browse/HIVE-12061
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12061.nogen.patch, HIVE-12061.patch
>
>
> Expr filtering, automatic caching, etc. should be aware of file types for 
> advanced features. For now only ORC is supported, but I want to add a boundary 
> between ORC-specific and general metastore code that could later be used for 
> other formats if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12065) FS stats collection may generate incorrect stats for multi-insert query

2015-10-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12065:

Attachment: HIVE-12065.patch

> FS stats collection may generate incorrect stats for multi-insert query
> ---
>
> Key: HIVE-12065
> URL: https://issues.apache.org/jira/browse/HIVE-12065
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12065.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12057) ORC sarg is logged too much

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12057:

Attachment: HIVE-12057.02.patch

Patch with caching 

> ORC sarg is logged too much
> ---
>
> Key: HIVE-12057
> URL: https://issues.apache.org/jira/browse/HIVE-12057
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-12057.01.patch, HIVE-12057.02.patch, 
> HIVE-12057.patch
>
>
> SARG itself has too many newlines and it's logged for every splitgenerator in 
> split generation
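
A sketch of the caching idea mentioned in the attachment comment above (hypothetical helper, not the attached patch): compute the SARG's string form once, flatten its newlines, and log it at DEBUG.

{code}
// Hypothetical sketch: cache a single-line rendering of the SARG so split
// generation does not re-log a multi-line dump for every split generator.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class SargLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(SargLoggingSketch.class);
  private String cachedSargString;   // computed once per query

  void logSarg(Object sarg) {
    if (sarg == null || !LOG.isDebugEnabled()) {
      return;
    }
    if (cachedSargString == null) {
      cachedSargString = sarg.toString().replace('\n', ' ');  // strip the newlines
    }
    LOG.debug("ORC SARG: {}", cachedSargString);
  }
}
{code}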



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-12062) enable HBase metastore file metadata cache for tez tests

2015-10-07 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947883#comment-14947883
 ] 

Vikram Dixit K edited comment on HIVE-12062 at 10/8/15 12:46 AM:
-

Yes. It will propagate to the AM when we create the tez session. 
TestMiniTezCliDriver picks up the config from this site.xml, and the config is 
then shipped from the client to the AM at session creation time.


was (Author: vikram.dixit):
Yes. It will propagate to the AM when we create the tez session. It is shipped 
from the client to the AM at that time.

> enable HBase metastore file metadata cache for tez tests
> 
>
> Key: HIVE-12062
> URL: https://issues.apache.org/jira/browse/HIVE-12062
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12062.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12062) enable HBase metastore file metadata cache for tez tests

2015-10-07 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947883#comment-14947883
 ] 

Vikram Dixit K commented on HIVE-12062:
---

Yes. It will propagate to the AM when we create the tez session. It is shipped 
from the client to the AM at that time.

> enable HBase metastore file metadata cache for tez tests
> 
>
> Key: HIVE-12062
> URL: https://issues.apache.org/jira/browse/HIVE-12062
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12062.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12062) enable HBase metastore file metadata cache for tez tests

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12062:

Attachment: HIVE-12062.patch

[~daijy] should this be sufficient?

[~vikram.dixit] does this config propagate to AM in MiniTez? This is a 
client-side setting (as in, AM-side, for metastore cache).

> enable HBase metastore file metadata cache for tez tests
> 
>
> Key: HIVE-12062
> URL: https://issues.apache.org/jira/browse/HIVE-12062
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12062.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled

2015-10-07 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947872#comment-14947872
 ] 

Ashutosh Chauhan commented on HIVE-9695:


+1 LGTM

> Redundant filter operator in reducer Vertex when CBO is disabled
> 
>
> Key: HIVE-9695
> URL: https://issues.apache.org/jira/browse/HIVE-9695
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 0.14.0, 1.0.0, 1.1.0
>Reporter: Mostafa Mokhtar
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-9695.01.patch, HIVE-9695.01.patch, HIVE-9695.patch
>
>
> There is a redundant filter operator in reducer Vertex when CBO is disabled.
> Query 
> {code}
> select 
> ss_item_sk, ss_ticket_number, ss_store_sk
> from
> store_sales a, store_returns b, store
> where
> a.ss_item_sk = b.sr_item_sk
> and a.ss_ticket_number = b.sr_ticket_number 
> and ss_sold_date_sk between 2450816 and 2451500
>   and sr_returned_date_sk between 2450816 and 2451500
>   and s_store_sk = ss_store_sk;
> {code}
> Plan snippet 
> {code}
>   Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Filter Operator
> predicate: (((((_col1 = _col27) and (_col8 = _col34)) and 
> _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) 
> and (_col49 = _col6)) (type: boolean)
> {code}
> Full plan with CBO disabled
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 
> (SIMPLE_EDGE)
>   DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: b
>   filterExpr: ((sr_item_sk is not null and sr_ticket_number 
> is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: 
> boolean)
>   Statistics: Num rows: 2370038095 Data size: 170506118656 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (sr_item_sk is not null and sr_ticket_number 
> is not null) (type: boolean)
> Statistics: Num rows: 706893063 Data size: 6498502768 
> Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
>   key expressions: sr_item_sk (type: int), 
> sr_ticket_number (type: int)
>   sort order: ++
>   Map-reduce partition columns: sr_item_sk (type: int), 
> sr_ticket_number (type: int)
>   Statistics: Num rows: 706893063 Data size: 6498502768 
> Basic stats: COMPLETE Column stats: COMPLETE
>   value expressions: sr_returned_date_sk (type: int)
> Execution mode: vectorized
> Map 3
> Map Operator Tree:
> TableScan
>   alias: store
>   filterExpr: s_store_sk is not null (type: boolean)
>   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: s_store_sk is not null (type: boolean)
> Statistics: Num rows: 1704 Data size: 6816 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
>   key expressions: s_store_sk (type: int)
>   sort order: +
>   Map-reduce partition columns: s_store_sk (type: int)
>   Statistics: Num rows: 1704 Data size: 6816 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
>   alias: a
>   filterExpr: (((ss_item_sk is not null and ss_ticket_number 
> is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 
> AND 2451500) (type: boolean)
>   Statistics: Num rows: 28878719387 Data size: 2405805439460 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((ss_item_sk is not null and ss_ticket_number 
> is not null) and ss_store_sk is not null) (type: boolean)
> Statistics: Num rows: 8405840828 Data size: 110101408700 
> Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
>   key

[jira] [Commented] (HIVE-11976) Extend CBO rules to being able to apply rules only once on a given operator

2015-10-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947861#comment-14947861
 ] 

Hive QA commented on HIVE-11976:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12765278/HIVE-11976.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9654 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5563/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5563/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5563/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12765278 - PreCommit-HIVE-TRUNK-Build

> Extend CBO rules to being able to apply rules only once on a given operator
> ---
>
> Key: HIVE-11976
> URL: https://issues.apache.org/jira/browse/HIVE-11976
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11976.01.patch, HIVE-11976.02.patch, 
> HIVE-11976.03.patch, HIVE-11976.04.patch, HIVE-11976.patch
>
>
> Create a way to bail out quickly from HepPlanner if the rule has been already 
> applied on a certain operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12062) enable HBase metastore file metadata cache for tez tests

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12062:
---

Assignee: Sergey Shelukhin

> enable HBase metastore file metadata cache for tez tests
> 
>
> Key: HIVE-12062
> URL: https://issues.apache.org/jira/browse/HIVE-12062
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables

2015-10-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947848#comment-14947848
 ] 

Chaoyu Tang commented on HIVE-11786:


If the query is rewritten to get PART_ID first and then use PART_ID and 
COLUMN_NAME to query PART_COL_STATS table, probably a new index on COLUMN_NAME, 
PART_ID  (CREATE INDEX COLNAME_PARTID_IDX ON PART_COL_STATS (COLUMN_NAME, 
PART_ID)) is still needed. I am going to work on the new queries.

> Deprecate the use of redundant column in colunm stats related tables
> 
>
> Key: HIVE-11786
> URL: https://issues.apache.org/jira/browse/HIVE-11786
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, 
> HIVE-11786.2.patch, HIVE-11786.patch
>
>
> The stats tables such as TAB_COL_STATS and PART_COL_STATS have redundant 
> columns such as DB_NAME, TABLE_NAME, and PARTITION_NAME, since these tables 
> already have foreign keys like TBL_ID or PART_ID referencing TBLS or 
> PARTITIONS.
> These redundant columns violate database normalization rules and cause a lot 
> of inconvenience (and sometimes difficulty) in implementing column-stats 
> related features. For example, when renaming a table, we have to update the 
> TABLE_NAME column in these tables as well, which is unnecessary.
> This JIRA first deprecates the use of these columns at the HMS code level. A 
> follow-up JIRA will be opened to focus on the DB schema change and upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11894) CBO: Calcite Operator To Hive Operator (Calcite Return Path): correct table column name in CTAS queries

2015-10-07 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-11894:
---
Attachment: HIVE-11894.05.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): correct table 
> column name in CTAS queries
> ---
>
> Key: HIVE-11894
> URL: https://issues.apache.org/jira/browse/HIVE-11894
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11894.01.patch, HIVE-11894.02.patch, 
> HIVE-11894.03.patch, HIVE-11894.04.patch, HIVE-11894.05.patch
>
>
> To repro, run lineage2.q with return path turned on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11212) Create vectorized types for complex types

2015-10-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947835#comment-14947835
 ] 

Sergey Shelukhin commented on HIVE-11212:
-

Actually nm I don't think this will affect much

> Create vectorized types for complex types
> -
>
> Key: HIVE-11212
> URL: https://issues.apache.org/jira/browse/HIVE-11212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch, 
> HIVE-11212.patch
>
>
> We need vectorized types for structs, maps, lists, and unions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11212) Create vectorized types for complex types

2015-10-07 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947834#comment-14947834
 ] 

Matt McCline commented on HIVE-11212:
-

([~owen.omalley] please talk to Sergey about this; it might affect his merge)

> Create vectorized types for complex types
> -
>
> Key: HIVE-11212
> URL: https://issues.apache.org/jira/browse/HIVE-11212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch, 
> HIVE-11212.patch
>
>
> We need vectorized types for structs, maps, lists, and unions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11212) Create vectorized types for complex types

2015-10-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947831#comment-14947831
 ] 

Sergey Shelukhin commented on HIVE-11212:
-

Is it possible to hold off until llap branch merge?

> Create vectorized types for complex types
> -
>
> Key: HIVE-11212
> URL: https://issues.apache.org/jira/browse/HIVE-11212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch, 
> HIVE-11212.patch
>
>
> We need vectorized types for structs, maps, lists, and unions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column

2015-10-07 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12063:
---
Description: 
HIVE-7373 was to address the problems of trimming trailing zeros by Hive, which 
caused many problems including treating 0.0, 0.00 and so on as 0, which has 
different precision/scale. Please refer to HIVE-7373 description. However, 
HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. 
HIVE-11835 was resolved recently to address one of the problems, where 0.0, 
0.00, and so on cannot be read into decimal(1,1).

However, HIVE-11835 didn't address the problem of showing as 0 in query result 
for any decimal values such as 0.0, 0.00, etc. This causes confusion as 0 and 
0.0 have different precision/scale than 0.

The proposal here is to pad zeros for query result to the type's scale. This 
not only removes the confusion described above, but also aligns with many other 
DBs. Internal decimal number representation doesn't change, however.

  was:
HIVE-7373 was to address the problem of trimming tailing zeros by Hive, which 
caused many problems including treating 0.0, 0.00 and so on as 0, which has 
different precision/scale. Please refer to HIVE-7373 description. However, 
HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. 
HIVE-11835 was resolved recently to address one of the problems, where 0.0, 
0.00, and so cannot be read into decimal(1,1).

However, HIVE-11835 didn't address the problem of showing as 0 in query result 
for any decimal values such as 0.0, 0.00, etc. This causes confusion as 0 and 
0.0 have different precision/scale than 0.

The proposal here is to pad zeros for query result to the type's scale. This 
not only removes the confusion described above, but also aligns with many other 
DBs. Internal decimal number representation doesn't change, however.


> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: HIVE-12063
> URL: https://issues.apache.org/jira/browse/HIVE-12063
> Project: Hive
>  Issue Type: Improvement
>  Components: Types
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>
> HIVE-7373 was to address the problems of trimming trailing zeros by Hive, 
> which caused many problems including treating 0.0, 0.00 and so on as 0, which 
> has different precision/scale. Please refer to HIVE-7373 description. 
> However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems 
> remained. HIVE-11835 was resolved recently to address one of the problems, 
> where 0.0, 0.00, and so on cannot be read into decimal(1,1).
> However, HIVE-11835 didn't address the problem of showing as 0 in query 
> result for any decimal values such as 0.0, 0.00, etc. This causes confusion 
> as 0 and 0.0 have different precision/scale than 0.
> The proposal here is to pad zeros for query result to the type's scale. This 
> not only removes the confusion described above, but also aligns with many 
> other DBs. Internal decimal number representation doesn't change, however.
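
To make the precision/scale point concrete, here is a small illustration with java.math.BigDecimal (used only as an example of scale semantics; it is not a claim about HiveDecimal internals):

{code}
// 0, 0.0 and 0.00 compare equal numerically but carry different scales; padding
// query output to the column's scale is a display-side setScale, not a change
// to the stored value.
import java.math.BigDecimal;

final class DecimalScaleSketch {
  public static void main(String[] args) {
    System.out.println(new BigDecimal("0").scale());     // 0
    System.out.println(new BigDecimal("0.0").scale());   // 1
    System.out.println(new BigDecimal("0.00").scale());  // 2

    // Pad a value to the scale of a decimal(10,2) column for display.
    System.out.println(new BigDecimal("0").setScale(2)); // prints 0.00
  }
}
{code}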



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables

2015-10-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947807#comment-14947807
 ] 

Siddharth Seth commented on HIVE-11786:
---

Yes.  There's a large number of entries - 2million+ in PART_COL_STATS. The 
number of new entries there is what I was worried about - how many new entries 
per table / partition, and whether that can have a significant impact. IAC, if 
the query is being re-written - maybe the indexes will not be required.

> Deprecate the use of redundant column in colunm stats related tables
> 
>
> Key: HIVE-11786
> URL: https://issues.apache.org/jira/browse/HIVE-11786
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, 
> HIVE-11786.2.patch, HIVE-11786.patch
>
>
> The stats tables such as TAB_COL_STATS and PART_COL_STATS have redundant 
> columns such as DB_NAME, TABLE_NAME, and PARTITION_NAME, since these tables 
> already have foreign keys like TBL_ID or PART_ID referencing TBLS or 
> PARTITIONS.
> These redundant columns violate database normalization rules and cause a lot 
> of inconvenience (and sometimes difficulty) in implementing column-stats 
> related features. For example, when renaming a table, we have to update the 
> TABLE_NAME column in these tables as well, which is unnecessary.
> This JIRA first deprecates the use of these columns at the HMS code level. A 
> follow-up JIRA will be opened to focus on the DB schema change and upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11642:

Attachment: (was: HIVE-11642.22.patch)

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.22.patch, HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11642:

Attachment: HIVE-11642.22.patch

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.22.patch, HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type

2015-10-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947784#comment-14947784
 ] 

Xuefu Zhang commented on HIVE-3976:
---

[~DoingDone9], There is no change on the rules, which are enforced in their 
respective UDFs.

> Support specifying scale and precision with Hive decimal type
> -
>
> Key: HIVE-3976
> URL: https://issues.apache.org/jira/browse/HIVE-3976
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor, Types
>Affects Versions: 0.11.0
>Reporter: Mark Grover
>Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-3976.1.patch, HIVE-3976.10.patch, 
> HIVE-3976.11.patch, HIVE-3976.2.patch, HIVE-3976.3.patch, HIVE-3976.4.patch, 
> HIVE-3976.5.patch, HIVE-3976.6.patch, HIVE-3976.7.patch, HIVE-3976.8.patch, 
> HIVE-3976.9.patch, HIVE-3976.patch, remove_prec_scale.diff
>
>
> HIVE-2693 introduced support for Decimal datatype in Hive. However, the 
> current implementation has unlimited precision and provides no way to specify 
> precision and scale when creating the table.
> For example, MySQL allows users to specify scale and precision of the decimal 
> datatype when creating the table:
> {code}
> CREATE TABLE numbers (a DECIMAL(20,2));
> {code}
> Hive should support something similar too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11253) Move SearchArgument and VectorizedRowBatch classes to storage-api.

2015-10-07 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947769#comment-14947769
 ] 

Xuefu Zhang commented on HIVE-11253:


Just curious, why did we single out and move HiveDecimal.java to storage-api? 
It seems natural that it stays with other data types such as CHAR or VARCHAR.

> Move SearchArgument and VectorizedRowBatch classes to storage-api.
> --
>
> Key: HIVE-11253
> URL: https://issues.apache.org/jira/browse/HIVE-11253
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.0.0
>
> Attachments: HIVE-11253.patch, HIVE-11253.patch, HIVE-11253.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12053) Stats performance regression caused by HIVE-11786

2015-10-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947764#comment-14947764
 ] 

Chaoyu Tang commented on HIVE-12053:


[~sseth] I have found that creating the following indexes would not help improve 
the stats performance either:
{code}
CREATE INDEX COLNAME_TBLID_IDX ON TAB_COL_STATS (COLUMN_NAME, TBL_ID);
CREATE INDEX COLNAME_IDX ON TAB_COL_STATS (COLUMN_NAME);
CREATE INDEX COLNAME_PARTID_IDX ON PART_COL_STATS (COLUMN_NAME, PART_ID);
CREATE INDEX COLNAME_IDX ON PART_COL_STATS (COLUMN_NAME);
CREATE INDEX PARTNAME_IDX ON PARTITIONS (PART_NAME);
CREATE INDEX TBLNAME_IDX ON TBLS (TBL_NAME);
{code}

> Stats performance regression caused by HIVE-11786
> -
>
> Key: HIVE-12053
> URL: https://issues.apache.org/jira/browse/HIVE-12053
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Chaoyu Tang
>
> HIVE-11786 tried to normalize table TAB_COL_STATS/PART_COL_STATS but caused 
> performance regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables

2015-10-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947760#comment-14947760
 ] 

Chaoyu Tang commented on HIVE-11786:


Thanks [~sseth]. Maybe there are a large number of rows in 
TAB_COL_STATS/PART_COL_STATS, so it took a long time to create the index on 
their column "COLUMN_NAME". I am going to change the query to see if it is 
helpful. BTW, I have created a JIRA HIVE-12053 for the performance regression 
you found.

> Deprecate the use of redundant column in colunm stats related tables
> 
>
> Key: HIVE-11786
> URL: https://issues.apache.org/jira/browse/HIVE-11786
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, 
> HIVE-11786.2.patch, HIVE-11786.patch
>
>
> The stats tables such as TAB_COL_STATS and PART_COL_STATS have redundant 
> columns such as DB_NAME, TABLE_NAME, and PARTITION_NAME, since these tables 
> already have foreign keys like TBL_ID or PART_ID referencing TBLS or 
> PARTITIONS.
> These redundant columns violate database normalization rules and cause a lot 
> of inconvenience (and sometimes difficulty) in implementing column-stats 
> related features. For example, when renaming a table, we have to update the 
> TABLE_NAME column in these tables as well, which is unnecessary.
> This JIRA first deprecates the use of these columns at the HMS code level. A 
> follow-up JIRA will be opened to focus on the DB schema change and upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)

2015-10-07 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11634:
-
Attachment: (was: HIVE-11634.990.patch)

> Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
> --
>
> Key: HIVE-11634
> URL: https://issues.apache.org/jira/browse/HIVE-11634
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, 
> HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, 
> HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, 
> HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, 
> HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, 
> HIVE-11634.96.patch, HIVE-11634.97.patch, HIVE-11634.98.patch, 
> HIVE-11634.99.patch, HIVE-11634.990.patch
>
>
> Currently, we do not support partition pruning for the following scenario
> {code}
> create table pcr_t1 (key int, value string) partitioned by (ds string);
> insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src 
> where key < 20 order by key;
> explain extended select ds from pcr_t1 where struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> If we run the above query, we see that all the partitions of table pcr_t1 are 
> present in the filter predicate whereas we can prune partition 
> (ds='2000-04-10'). 
> The optimization is to rewrite the above query into the following.
> {code}
> explain extended select ds from pcr_t1 where  (struct(ds)) IN 
> (struct('2000-04-08'), struct('2000-04-09')) and  struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09'))  
> is used by partition pruner to prune the columns which otherwise will not be 
> pruned.
> This is an extension of the idea presented in HIVE-11573.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)

2015-10-07 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11634:
-
Attachment: HIVE-11634.990.patch

> Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
> --
>
> Key: HIVE-11634
> URL: https://issues.apache.org/jira/browse/HIVE-11634
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, 
> HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, 
> HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, 
> HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, 
> HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, 
> HIVE-11634.96.patch, HIVE-11634.97.patch, HIVE-11634.98.patch, 
> HIVE-11634.99.patch, HIVE-11634.990.patch
>
>
> Currently, we do not support partition pruning for the following scenario
> {code}
> create table pcr_t1 (key int, value string) partitioned by (ds string);
> insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src 
> where key < 20 order by key;
> explain extended select ds from pcr_t1 where struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> If we run the above query, we see that all the partitions of table pcr_t1 are 
> present in the filter predicate, whereas we could prune partition 
> (ds='2000-04-10'). 
> The optimization is to rewrite the above query into the following.
> {code}
> explain extended select ds from pcr_t1 where  (struct(ds)) IN 
> (struct('2000-04-08'), struct('2000-04-09')) and  struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) 
> is used by the partition pruner to prune partitions which otherwise would not 
> be pruned.
> This is an extension of the idea presented in HIVE-11573.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)

2015-10-07 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11634:
-
Attachment: HIVE-11634.990.patch

Added new test cases.

> Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
> --
>
> Key: HIVE-11634
> URL: https://issues.apache.org/jira/browse/HIVE-11634
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, 
> HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, 
> HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, 
> HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, 
> HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, 
> HIVE-11634.96.patch, HIVE-11634.97.patch, HIVE-11634.98.patch, 
> HIVE-11634.99.patch, HIVE-11634.990.patch
>
>
> Currently, we do not support partition pruning for the following scenario
> {code}
> create table pcr_t1 (key int, value string) partitioned by (ds string);
> insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src 
> where key < 20 order by key;
> explain extended select ds from pcr_t1 where struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> If we run the above query, we see that all the partitions of table pcr_t1 are 
> present in the filter predicate, whereas we could prune partition 
> (ds='2000-04-10'). 
> The optimization is to rewrite the above query into the following.
> {code}
> explain extended select ds from pcr_t1 where  (struct(ds)) IN 
> (struct('2000-04-08'), struct('2000-04-09')) and  struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) 
> is used by the partition pruner to prune partitions which otherwise would not 
> be pruned.
> This is an extension of the idea presented in HIVE-11573.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11212) Create vectorized types for complex types

2015-10-07 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947747#comment-14947747
 ] 

Matt McCline commented on HIVE-11212:
-

+1 lgtm.  So much code drives through these basic objects that a successful 
Hive QA test run is ok in lieu of additional unit tests.

> Create vectorized types for complex types
> -
>
> Key: HIVE-11212
> URL: https://issues.apache.org/jira/browse/HIVE-11212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch, 
> HIVE-11212.patch
>
>
> We need vectorized types for structs, maps, lists, and unions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12061) add file type support to file metadata by expr call

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12061:
---

Assignee: Sergey Shelukhin

> add file type support to file metadata by expr call
> ---
>
> Key: HIVE-12061
> URL: https://issues.apache.org/jira/browse/HIVE-12061
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> Expr filtering, automatic caching, etc. should be aware of file types for 
> advanced features. For now only ORC is supported, but I want to add a boundary 
> between ORC-specific and general metastore code that could later be used 
> for other formats if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables

2015-10-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947732#comment-14947732
 ] 

Siddharth Seth commented on HIVE-11786:
---

No difference with the remaining indexes.
(The index creation takes a long time btw - and may impact stat generation ?)

{code}
2015-10-07T18:38:09,444 DEBUG [main([])]: metastore.MetaStoreDirectSql 
(MetaStoreDirectSql.java:timingTrace(819)) - Direct SQL query in 16195.018669ms 
+ 0.058186ms, the query is [select "COLUMN_NAME", "COLUMN_TYPE", 
min("LONG_LOW_VALUE"), max("LONG_HIGH_VALUE"), min("DOUBLE_LOW_VALUE"), 
max("DOUBLE_HIGH_VALUE"), min(cast("BIG_DECIMAL_LOW_VALUE" as decimal)), 
max(cast("BIG_DECIMAL_HIGH_VALUE" as decimal)), sum("NUM_NULLS"), 
max("NUM_DISTINCTS"), max("AVG_COL_LEN"), max("MAX_COL_LEN"), sum("NUM_TRUES"), 
sum("NUM_FALSES"), 
avg(("LONG_HIGH_VALUE"-"LONG_LOW_VALUE")/cast("NUM_DISTINCTS" as 
decimal)),avg(("DOUBLE_HIGH_VALUE"-"DOUBLE_LOW_VALUE")/"NUM_DISTINCTS"),avg((cast("BIG_DECIMAL_HIGH_VALUE"
 as decimal)-cast("BIG_DECIMAL_LOW_VALUE" as 
decimal))/"NUM_DISTINCTS"),sum("NUM_DISTINCTS") from (SELECT "DBS"."NAME" 
"DB_NAME", "TBLS"."TBL_NAME" "TABLE_NAME", "PARTITIONS"."PART_NAME" 
"PARTITION_NAME", "PCS"."COLUMN_NAME", "PCS"."COLUMN_TYPE", 
"PCS"."LONG_LOW_VALUE", "PCS"."LONG_HIGH_VALUE", "PCS"."DOUBLE_HIGH_VALUE", 
"PCS"."DOUBLE_LOW_VALUE", "PCS"."BIG_DECIMAL_LOW_VALUE", 
"PCS"."BIG_DECIMAL_HIGH_VALUE", "PCS"."NUM_NULLS", "PCS"."NUM_DISTINCTS", 
"PCS"."AVG_COL_LEN","PCS"."MAX_COL_LEN", "PCS"."NUM_TRUES", 
"PCS"."NUM_FALSES","PCS"."LAST_ANALYZED" FROM "PART_COL_STATS" "PCS" JOIN 
"PARTITIONS" ON ("PCS"."PART_ID" = "PARTITIONS"."PART_ID") JOIN "TBLS" ON 
("PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID") JOIN "DBS" ON ("TBLS"."DB_ID" = 
"DBS"."DB_ID")) VW  where "DB_NAME" = ? and "TABLE_NAME" = ?  and "COLUMN_NAME" 
in (?) and "PARTITION_NAME" in 
(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?
{code}

{code}
2015-10-07T18:38:29,309 DEBUG [main([])]: metastore.MetaStoreDirectSql 
(MetaStoreDirectSql.java:timingTrace(819)) - Direct SQL query in 18651.1996ms + 
0.050665ms, the query is [select "COLUMN_NAME", "COLUMN_TYPE", 
min("LONG_LOW_VALUE"), max("LONG_HIGH_VALUE"), min("DOUBLE_LOW_VALUE"), 
max("DOUBLE_HIGH_VALUE"), min(cast("BIG_DECIMAL_LOW_VALUE" as decimal)), 
max(cast("BIG_DECIMAL_HIGH_VALUE" as decimal)), sum("NUM_NULLS"), 
max("NUM_DISTINCTS"), max("AVG_COL_LEN"), max("MAX_COL_LEN"), sum("NUM_TRUES"), 
sum("NUM_FALSES"), 
avg(("LONG_HIGH_VALUE"-"LONG_LOW_VALUE")/cast("NUM_DISTINCTS" as 
decimal)),avg(("DOUBLE_HIGH_VALUE"-"DOUBLE_LOW_VALUE")/"NUM_DISTINCTS"),avg((cast("BIG_DECIMAL_HIGH_VALUE"
 as decimal)-cast("BIG_DECIMAL_LOW_VALUE" as 
decimal))/"NUM_DISTINCTS"),sum("NUM_DISTINCTS") from (SELECT "DBS"."NAME" 
"DB_NAME", "TBLS"."TBL_NAME" "TABLE_NAME", "PARTITIONS"."PART_NAME" 
"PARTITION_NAME", "PCS"."COLUMN_NAME", "PCS"."COLUMN_TYPE", 
"PCS"."LONG_LOW_VALUE", "PCS"."LONG_HIGH_VALUE", "PCS"."DOUBLE_HIGH_VALUE", 
"PCS"."DOUBLE_LOW_VALUE", "PCS"."BIG_DECIMAL_LOW_VALUE", 
"PCS"."BIG_DECIMAL_HIGH_VALUE", "PCS"."NUM_NULLS", "PCS"."NUM_DISTINCTS", 
"PCS"."AVG_COL_LEN","PCS"."MAX_COL_LEN", "PCS"."NUM_TRUES", 
"PCS"."NUM_FALSES","PCS"."LAST_ANALYZED" FROM "PART_COL_STATS" "PCS" JOIN 
"PARTITIONS" ON ("PCS"."PART_ID" = "PARTITIONS"."PART_ID") JOIN "TBLS" ON 
("PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID") JOIN "DBS" ON ("TBLS"."DB_ID" = 
"DBS"."DB_ID")) VW  where "DB_NAME" = ? and "TABLE_NAME" = ?  and "COLUMN_NAME" 
in (?) and "PARTITION_NAME" in 
(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?
{code}

> Deprecate the use of redundant column in colunm stats related tables
> 
>
> Key: HIVE-11786
> URL: https://issues.apache.org/jira/browse/HIVE-11786
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, 
> HIVE-11786.2.patch, HIVE-11786.patch
>
>
> The stats tables such as TAB_COL_STATS and PART_COL_STATS have redundant 
> columns such as DB_NAME, TABLE_NAME, and PARTITION_NAME, since these tables 
> already have foreign keys like TBL_ID or PART_ID referencing TBLS or 
> PARTITIONS. These redundant columns violate database normalization rules and 
> cause a lot of inconvenience (sometimes difficulty) in column stats related 
> feature implementation. For example, when renaming a table, we have to update 
> the TABLE_NAME column in these tables as well, which is unnecessary.
> This JIRA first deprecates the use of these columns at the HMS code level. A 
> follow-up JIRA will be opened to focus on the DB schema change and upgrade.

[jira] [Commented] (HIVE-12057) ORC sarg is logged too much

2015-10-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947734#comment-14947734
 ] 

Prasanth Jayachandran commented on HIVE-12057:
--

Actually, never mind. setSearchArgument is called during reader creation, i.e. 
for each split. That shouldn't be a big problem I guess, as it happens after 
split gen and goes to the task logs. A few log lines per task should be fine. 

> ORC sarg is logged too much
> ---
>
> Key: HIVE-12057
> URL: https://issues.apache.org/jira/browse/HIVE-12057
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-12057.01.patch, HIVE-12057.patch
>
>
> SARG itself has too many newlines and it's logged for every splitgenerator in 
> split generation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11212) Create vectorized types for complex types

2015-10-07 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-11212:
-
Attachment: HIVE-11212.patch

Added the extra null check that Matthew asked for.

> Create vectorized types for complex types
> -
>
> Key: HIVE-11212
> URL: https://issues.apache.org/jira/browse/HIVE-11212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch, 
> HIVE-11212.patch
>
>
> We need vectorized types for structs, maps, lists, and unions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11894) CBO: Calcite Operator To Hive Operator (Calcite Return Path): correct table column name in CTAS queries

2015-10-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947728#comment-14947728
 ] 

Hive QA commented on HIVE-11894:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12765271/HIVE-11894.04.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9655 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5562/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5562/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5562/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12765271 - PreCommit-HIVE-TRUNK-Build

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): correct table 
> column name in CTAS queries
> ---
>
> Key: HIVE-11894
> URL: https://issues.apache.org/jira/browse/HIVE-11894
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-11894.01.patch, HIVE-11894.02.patch, 
> HIVE-11894.03.patch, HIVE-11894.04.patch
>
>
> To repro, run lineage2.q with return path turned on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner

2015-10-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8343:
-
Description: 
In addEvent() and processVertex(), there is a call such as the following:
{code}
  queue.offer(event);
{code}
The return value should be checked. If false is returned, the event would not 
have been queued.
Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html

  was:
In addEvent() and processVertex(), there is call such as the following:
{code}
  queue.offer(event);
{code}

The return value should be checked. If false is returned, event would not have 
been queued.
Take a look at line 328 in:
http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html


> Return value from BlockingQueue.offer() is not checked in 
> DynamicPartitionPruner
> 
>
> Key: HIVE-8343
> URL: https://issues.apache.org/jira/browse/HIVE-8343
> Project: Hive
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: JongWon Park
>Priority: Minor
> Attachments: HIVE-8343.patch
>
>
> In addEvent() and processVertex(), there is a call such as the following:
> {code}
>   queue.offer(event);
> {code}
> The return value should be checked. If false is returned, the event would not 
> have been queued.
> Take a look at line 328 in:
> http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html
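A minimal hedged sketch of how the return value could be handled (hypothetical class and names, not the actual DynamicPartitionPruner code):

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OfferCheckSketch {
  // Bounded queue: offer() returns false when the queue is full.
  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>(1024);

  void addEvent(Object event) throws InterruptedException {
    // Option 1: check offer() and react explicitly when the event cannot be queued.
    if (!queue.offer(event)) {
      throw new IllegalStateException("Event queue full, event was not queued: " + event);
    }
    // Option 2 (alternative): queue.put(event) blocks until space is available.
  }
}
{code}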



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()

2015-10-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8285:
-
Description: 
{code}
  if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo
  && eC.getValue() == Boolean.TRUE) {
{code}
equals() should be used in the above comparison.

  was:
{code}
  if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo
  && eC.getValue() == Boolean.TRUE) {
{code}

equals() should be used in the above comparison.


> Reference equality is used on boolean values in 
> PartitionPruner#removeTruePredciates()
> --
>
> Key: HIVE-8285
> URL: https://issues.apache.org/jira/browse/HIVE-8285
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Ted Yu
>Priority: Minor
> Attachments: HIVE-8285.patch
>
>
> {code}
>   if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo
>   && eC.getValue() == Boolean.TRUE) {
> {code}
> equals() should be used in the above comparison.
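A small standalone illustration of why == is fragile here: it compares references, so a Boolean instance that is not the cached Boolean.TRUE fails the check even though it is logically true, while equals() compares the value.

{code}
public class BooleanEqualitySketch {
  public static void main(String[] args) {
    Boolean cached = Boolean.valueOf(true);   // canonical cached instance
    Boolean distinct = new Boolean(true);     // deliberately a separate instance
                                              // (deprecated ctor, used only for illustration)

    System.out.println(cached == Boolean.TRUE);          // true  (same cached object)
    System.out.println(distinct == Boolean.TRUE);        // false (reference comparison)
    System.out.println(Boolean.TRUE.equals(distinct));   // true  (value comparison)
  }
}
{code}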



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8282) Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()

2015-10-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8282:
-
Description: 
In convertJoinMapJoin():
{code}
for (Operator parentOp : 
joinOp.getParentOperators()) {
  if (parentOp instanceof MuxOperator) {
return null;
  }
}
{code}
NPE would result if convertJoinMapJoin() returns null:

{code}
MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, 
bigTablePosition);
MapJoinDesc joinDesc = mapJoinOp.getConf();
{code}


  was:
In convertJoinMapJoin():
{code}
for (Operator parentOp : 
joinOp.getParentOperators()) {
  if (parentOp instanceof MuxOperator) {
return null;
  }
}
{code}
NPE would result if convertJoinMapJoin() returns null:
{code}
MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, 
bigTablePosition);
MapJoinDesc joinDesc = mapJoinOp.getConf();
{code}



> Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
> -
>
> Key: HIVE-8282
> URL: https://issues.apache.org/jira/browse/HIVE-8282
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Ted Yu
>Priority: Minor
> Attachments: HIVE-8282.patch
>
>
> In convertJoinMapJoin():
> {code}
> for (Operator parentOp : 
> joinOp.getParentOperators()) {
>   if (parentOp instanceof MuxOperator) {
> return null;
>   }
> }
> {code}
> NPE would result if convertJoinMapJoin() returns null:
> {code}
> MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, 
> bigTablePosition);
> MapJoinDesc joinDesc = mapJoinOp.getConf();
> {code}
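A hedged sketch of the guard being suggested (placement and surrounding code are hypothetical, not the committed patch):

{code}
// Sketch only: handle the null that convertJoinMapJoin() can return instead of
// dereferencing it unconditionally.
MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, bigTablePosition);
if (mapJoinOp == null) {
  // Conversion declined (e.g. a MuxOperator parent); fall back to the original join.
  return null; // the exact fallback depends on the caller
}
MapJoinDesc joinDesc = mapJoinOp.getConf();
{code}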



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()

2015-10-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HIVE-8458:
-
Description: 
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}

If mapPath is null but reducePath is not null, getFileSystem() call would 
produce NPE

  was:
{code}
Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);

// if the plan path hasn't been initialized just return, nothing to clean.
if (mapPath == null && reducePath == null) {
  return;
}

try {
  FileSystem fs = mapPath.getFileSystem(conf);
{code}
If mapPath is null but reducePath is not null, getFileSystem() call would 
produce NPE


> Potential null dereference in Utilities#clearWork()
> ---
>
> Key: HIVE-8458
> URL: https://issues.apache.org/jira/browse/HIVE-8458
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1
>Reporter: Ted Yu
>Assignee: skrho
>Priority: Minor
> Attachments: HIVE-8458_001.patch
>
>
> {code}
> Path mapPath = getPlanPath(conf, MAP_PLAN_NAME);
> Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME);
> // if the plan path hasn't been initialized just return, nothing to clean.
> if (mapPath == null && reducePath == null) {
>   return;
> }
> try {
>   FileSystem fs = mapPath.getFileSystem(conf);
> {code}
> If mapPath is null but reducePath is not null, getFileSystem() call would 
> produce NPE
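One possible guard, sketched under the assumption that the "both null" early return above stays in place (illustrative only, not necessarily the committed fix):

{code}
// At least one of the two paths is non-null at this point, so derive the
// FileSystem from whichever one exists.
Path nonNullPath = (mapPath != null) ? mapPath : reducePath;
try {
  FileSystem fs = nonNullPath.getFileSystem(conf);
  // ... existing cleanup of mapPath / reducePath continues here ...
} catch (Exception e) {
  // best-effort cleanup; ignore
}
{code}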



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12057) ORC sarg is logged too much

2015-10-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947709#comment-14947709
 ] 

Sergey Shelukhin commented on HIVE-12057:
-

Are you sure setSearchArgument is called for splits? I don't think split 
generation ever creates the actual record reader, it only creates ReaderImpl 
for metadata.
As for caching, sure, will make a v2.

> ORC sarg is logged too much
> ---
>
> Key: HIVE-12057
> URL: https://issues.apache.org/jira/browse/HIVE-12057
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-12057.01.patch, HIVE-12057.patch
>
>
> SARG itself has too many newlines and it's logged for every splitgenerator in 
> split generation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12025) refactor bucketId generating code

2015-10-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947706#comment-14947706
 ] 

Prasanth Jayachandran commented on HIVE-12025:
--

The change introduced in this patch in BucketIdResolverImpl is the correct way 
to compute the bucket number. ReduceSinkOperator had a bug in bucket number 
computation regarding negative hashcodes (multiplying by -1 vs. masking with 
Integer.MAX_VALUE). There might be some test failures related to this change, 
but that is expected. Since these are util methods, it will be good to have 
unit tests for them (if they do not already exist).

Other than that, lgtm +1. Pending tests.
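A small self-contained sketch of the negative-hashcode difference described above (illustrative; the shared methods themselves live in ObjectInspectorUtils per HIVE-11983): negating overflows for Integer.MIN_VALUE and can still produce a negative bucket, while masking with Integer.MAX_VALUE cannot.

{code}
public class BucketIdSketch {
  // Buggy variant: flips the sign, which overflows for Integer.MIN_VALUE.
  static int bucketByNegation(int hashCode, int numBuckets) {
    int nonNegative = hashCode < 0 ? -hashCode : hashCode; // -Integer.MIN_VALUE == Integer.MIN_VALUE
    return nonNegative % numBuckets;
  }

  // Masking variant: clears the sign bit, so the result is always in [0, numBuckets).
  static int bucketByMasking(int hashCode, int numBuckets) {
    return (hashCode & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    System.out.println(bucketByNegation(Integer.MIN_VALUE, 31)); // -2 (invalid bucket)
    System.out.println(bucketByMasking(Integer.MIN_VALUE, 31));  //  0 (valid bucket)
  }
}
{code}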

> refactor bucketId generating code
> -
>
> Key: HIVE-12025
> URL: https://issues.apache.org/jira/browse/HIVE-12025
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.1
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-12025.2.patch, HIVE-12025.patch
>
>
> HIVE-11983 adds ObjectInspectorUtils.getBucketHashCode() and 
> getBucketNumber().
> There are several (at least) places in Hive that perform this computation:
> # ReduceSinkOperator.computeBucketNumber
> # ReduceSinkOperator.computeHashCode
> # BucketIdResolverImpl - only in 2.0.0 ASF line
> # FileSinkOperator.findWriterOffset
> # GenericUDFHash
> Should refactor it and make sure they all call methods from 
> ObjectInspectorUtils.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12052) automatically populate file metadata to HBase metastore based on config or table properties

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12052:
---

Assignee: Sergey Shelukhin

> automatically populate file metadata to HBase metastore based on config or 
> table properties
> ---
>
> Key: HIVE-12052
> URL: https://issues.apache.org/jira/browse/HIVE-12052
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> As discussed in HIVE-11500



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12057) ORC sarg is logged too much

2015-10-07 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947702#comment-14947702
 ] 

Prasanth Jayachandran commented on HIVE-12057:
--

This patch removes SARG logging from split generation. But there are going to 
be log lines for every split during reader creation 
(OrcInputFormat.setSearchArgument()). 

Also, regarding the log line immediately after HiveConf object creation: can we 
create the SARG once and cache it in the Context? We can avoid multiple SARG 
creations that way. 

Ideally, we should have the SARG object created once and logged once per query. 
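A hedged sketch of the caching idea (field and helper names are hypothetical, not an actual patch): build the SARG lazily, keep it on the per-query context, and log it only when it is first created.

{code}
// Illustrative lazy cache for the search argument; createSargFromConf() stands in
// for whatever currently deserializes/builds the SARG from the job configuration.
private SearchArgument cachedSarg;

private synchronized SearchArgument getOrCreateSarg(Configuration conf) {
  if (cachedSarg == null) {
    cachedSarg = createSargFromConf(conf);  // hypothetical helper
    LOG.debug("ORC SARG: " + cachedSarg);   // logged once per query, not per split
  }
  return cachedSarg;
}
{code}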

> ORC sarg is logged too much
> ---
>
> Key: HIVE-12057
> URL: https://issues.apache.org/jira/browse/HIVE-12057
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-12057.01.patch, HIVE-12057.patch
>
>
> SARG itself has too many newlines and it's logged for every splitgenerator in 
> split generation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12057) ORC sarg is logged too much

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12057:

Attachment: HIVE-12057.01.patch

> ORC sarg is logged too much
> ---
>
> Key: HIVE-12057
> URL: https://issues.apache.org/jira/browse/HIVE-12057
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-12057.01.patch, HIVE-12057.patch
>
>
> SARG itself has too many newlines and it's logged for every splitgenerator in 
> split generation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12060) LLAP: create separate variable for llap tests

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12060:

Attachment: HIVE-12060.01.patch

Not sure if HiveQA will pick up the new variable... will try that

> LLAP: create separate variable for llap tests
> -
>
> Key: HIVE-12060
> URL: https://issues.apache.org/jira/browse/HIVE-12060
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HIVE-12060.01.patch
>
>
> No real reason to just reuse tez one



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12060) LLAP: create separate variable for llap tests

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12060:
---

Assignee: Sergey Shelukhin

> LLAP: create separate variable for llap tests
> -
>
> Key: HIVE-12060
> URL: https://issues.apache.org/jira/browse/HIVE-12060
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12060.01.patch
>
>
> No real reason to just reuse tez one



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11969) start Tez session in background when starting CLI

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11969:

Fix Version/s: 1.3.0

> start Tez session in background when starting CLI
> -
>
> Key: HIVE-11969
> URL: https://issues.apache.org/jira/browse/HIVE-11969
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11969.01.patch, HIVE-11969.02.patch, 
> HIVE-11969.03.patch, HIVE-11969.04.patch, HIVE-11969.patch, Screen Shot 
> 2015-10-02 at 14.23.17 .png
>
>
> Tez session spins up AM, which can cause delays, esp. if the cluster is very 
> busy.
> This can be done in background, so the AM might get started while the user is 
> running local commands and doing other things.
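A minimal sketch of the background-start pattern (generic Java, not the actual TezSessionState code): submit the expensive session open to an executor at CLI startup and only block when the session is first needed.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundSessionSketch {
  interface Session { }                          // stand-in for the Tez session handle

  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private Future<Session> pendingSession;

  private Session openSessionBlocking() {        // stand-in for the slow AM startup
    return new Session() { };
  }

  void startInBackground() {                     // called while the user runs local commands
    pendingSession = executor.submit(this::openSessionBlocking);
  }

  Session getSession() throws Exception {        // blocks only when the session is required
    return pendingSession.get();
  }
}
{code}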



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils

2015-10-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947610#comment-14947610
 ] 

Thejas M Nair commented on HIVE-11408:
--

+1

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used due to constructor caching in Hadoop ReflectionUtils
> 
>
> Key: HIVE-11408
> URL: https://issues.apache.org/jira/browse/HIVE-11408
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch
>
>
> I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
> (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 
> Basically, add jar creates a new classloader for loading the classes from the 
> new jar and adds the new classloader to the SessionState object of user's 
> session, making the older one its parent. Creating a temporary function uses 
> the new classloader to load the class used for the function. On closing a 
> session, although there is code to close the classloader for the session, I'm 
> not seeing the new classloader getting GCed and from the heapdump I can see 
> it holds on to the temporary function's class that should have gone away 
> after the session close. 
> Steps to reproduce:
> 1.
> {code}
> jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
> {code}
> 2. 
> Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
> added.
> 3. 
> {code}
> jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
> 'org.gumashta.udf.AUDF'; 
> {code}
> 4. 
> Close the jdbc session.
> 5. 
> Take the memory snapshot and verify that the new URLClassLoader is indeed 
> there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the 
> session which we already closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils

2015-10-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947611#comment-14947611
 ] 

Thejas M Nair commented on HIVE-11408:
--

Thanks for adding the test case!


> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used due to constructor caching in Hadoop ReflectionUtils
> 
>
> Key: HIVE-11408
> URL: https://issues.apache.org/jira/browse/HIVE-11408
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch
>
>
> I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
> (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 
> Basically, add jar creates a new classloader for loading the classes from the 
> new jar and adds the new classloader to the SessionState object of user's 
> session, making the older one its parent. Creating a temporary function uses 
> the new classloader to load the class used for the function. On closing a 
> session, although there is code to close the classloader for the session, I'm 
> not seeing the new classloader getting GCed and from the heapdump I can see 
> it holds on to the temporary function's class that should have gone away 
> after the session close. 
> Steps to reproduce:
> 1.
> {code}
> jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
> {code}
> 2. 
> Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
> added.
> 3. 
> {code}
> jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
> 'org.gumashta.udf.AUDF'; 
> {code}
> 4. 
> Close the jdbc session.
> 5. 
> Take the memory snapshot and verify that the new URLClassLoader is indeed 
> there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the 
> session which we already closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11212) Create vectorized types for complex types

2015-10-07 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947600#comment-14947600
 ] 

Matt McCline commented on HIVE-11212:
-

Noticed the new ensureSize method -- growing a batch beyond 
VectorizedRowBatch.DEFAULT_SIZE -- to support the new ListColumnVector 
storing a range of elements.  The current vectorized operators, which only 
support primitive types, do have some hard-coded assumptions where they allocate 
various arrays (usually copies of the selected array) as being no more than 
DEFAULT_SIZE.  This doesn't affect this patch, but we'll need to be wary when 
we later try to make the vectorized operators support complex types.

I think DecimalColumnVector.setElement needs to check if the return result from 
...getHiveDecimal(precision, scale) is null and mark the vector column entry as 
null.  I think in some cases that method returns null when the value doesn't 
fit, etc.

I'm still trying to grok the flatten/unflatten stuff...
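A hedged sketch of the null handling suggested above (field names follow the usual ColumnVector conventions but are approximate; this is not the committed code):

{code}
// Sketch: if the precision/scale-adjusted value comes back null (i.e. the value
// does not fit), mark the entry as null instead of writing it into the vector.
HiveDecimal adjusted = writable.getHiveDecimal(precision, scale); // may return null
if (adjusted == null) {
  isNull[elementNum] = true;
  noNulls = false;
} else {
  vector[elementNum].set(adjusted);
}
{code}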

> Create vectorized types for complex types
> -
>
> Key: HIVE-11212
> URL: https://issues.apache.org/jira/browse/HIVE-11212
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch
>
>
> We need vectorized types for structs, maps, lists, and unions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables

2015-10-07 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947594#comment-14947594
 ] 

Chaoyu Tang commented on HIVE-11786:


Thanks [~sseth] very much for the help. Yes, the remaining two have been covered 
by the composite index. But anyway, please give it a try and let me know. If 
that does not help, I will rewrite the query as [~sershe] suggested, which is 
also being used in directsql for getPartition.

> Deprecate the use of redundant column in colunm stats related tables
> 
>
> Key: HIVE-11786
> URL: https://issues.apache.org/jira/browse/HIVE-11786
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, 
> HIVE-11786.2.patch, HIVE-11786.patch
>
>
> The stats tables such as TAB_COL_STATS and PART_COL_STATS have redundant 
> columns such as DB_NAME, TABLE_NAME, and PARTITION_NAME, since these tables 
> already have foreign keys like TBL_ID or PART_ID referencing TBLS or 
> PARTITIONS. These redundant columns violate database normalization rules and 
> cause a lot of inconvenience (sometimes difficulty) in column stats related 
> feature implementation. For example, when renaming a table, we have to update 
> the TABLE_NAME column in these tables as well, which is unnecessary.
> This JIRA first deprecates the use of these columns at the HMS code level. A 
> follow-up JIRA will be opened to focus on the DB schema change and upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12056) Branch 1.1.1: root pom and itest pom are not linked

2015-10-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-12056:

Attachment: HIVE-12056.1.patch

> Branch 1.1.1: root pom and itest pom are not linked
> ---
>
> Key: HIVE-12056
> URL: https://issues.apache.org/jira/browse/HIVE-12056
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-12056.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12056) Branch 1.1.1: root pom and itest pom are not linked

2015-10-07 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947592#comment-14947592
 ] 

Thejas M Nair commented on HIVE-12056:
--

+1

> Branch 1.1.1: root pom and itest pom are not linked
> ---
>
> Key: HIVE-12056
> URL: https://issues.apache.org/jira/browse/HIVE-12056
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-12056.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12056) Branch 1.1.1: root pom and itest pom are not linked

2015-10-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-12056:

Attachment: HIVE-12056.1.1patch

> Branch 1.1.1: root pom and itest pom are not linked
> ---
>
> Key: HIVE-12056
> URL: https://issues.apache.org/jira/browse/HIVE-12056
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12056) Branch 1.1.1: root pom and itest pom are not linked

2015-10-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-12056:

Attachment: (was: HIVE-12056.1.1patch)

> Branch 1.1.1: root pom and itest pom are not linked
> ---
>
> Key: HIVE-12056
> URL: https://issues.apache.org/jira/browse/HIVE-12056
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11969) start Tez session in background when starting CLI

2015-10-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947578#comment-14947578
 ] 

Sergey Shelukhin commented on HIVE-11969:
-

It's a simple backport to branch-1; should we do that?

> start Tez session in background when starting CLI
> -
>
> Key: HIVE-11969
> URL: https://issues.apache.org/jira/browse/HIVE-11969
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.0.0
>
> Attachments: HIVE-11969.01.patch, HIVE-11969.02.patch, 
> HIVE-11969.03.patch, HIVE-11969.04.patch, HIVE-11969.patch, Screen Shot 
> 2015-10-02 at 14.23.17 .png
>
>
> Tez session spins up AM, which can cause delays, esp. if the cluster is very 
> busy.
> This can be done in background, so the AM might get started while the user is 
> running local commands and doing other things.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12059) Clean up reference to deprecated constants in AvroSerdeUtils

2015-10-07 Thread Aaron Dossett (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947575#comment-14947575
 ] 

Aaron Dossett commented on HIVE-12059:
--

My patch gets all deprecated references out EXCEPT for the SerDeSpec 
annotation in AvroSerDe.  I don't have any experience developing annotations, 
so the fix for that isn't obvious to me.

One approach would be to add some redundant Strings to AvroSerdeUtils with a 
level of access below public that AvroSerDe could use.  Open to other 
suggestions if this is important enough.

> Clean up reference to deprecated constants in AvroSerdeUtils
> 
>
> Key: HIVE-12059
> URL: https://issues.apache.org/jira/browse/HIVE-12059
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Aaron Dossett
>Assignee: Aaron Dossett
>Priority: Minor
> Attachments: HIVE-12059.patch
>
>
> AvroSerdeUtils contains several deprecated String constants that are used by 
> other Hive modules.  Those should be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils

2015-10-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11408:

Attachment: HIVE-11408.2.patch

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used due to constructor caching in Hadoop ReflectionUtils
> 
>
> Key: HIVE-11408
> URL: https://issues.apache.org/jira/browse/HIVE-11408
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch
>
>
> I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
> (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 
> Basically, add jar creates a new classloader for loading the classes from the 
> new jar and adds the new classloader to the SessionState object of user's 
> session, making the older one its parent. Creating a temporary function uses 
> the new classloader to load the class used for the function. On closing a 
> session, although there is code to close the classloader for the session, I'm 
> not seeing the new classloader getting GCed and from the heapdump I can see 
> it holds on to the temporary function's class that should have gone away 
> after the session close. 
> Steps to reproduce:
> 1.
> {code}
> jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
> {code}
> 2. 
> Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
> added.
> 3. 
> {code}
> jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
> 'org.gumashta.udf.AUDF'; 
> {code}
> 4. 
> Close the jdbc session.
> 5. 
> Take the memory snapshot and verify that the new URLClassLoader is indeed 
> there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the 
> session which we already closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils

2015-10-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11408:

Attachment: (was: HIVE-11408.2.patch)

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used due to constructor caching in Hadoop ReflectionUtils
> 
>
> Key: HIVE-11408
> URL: https://issues.apache.org/jira/browse/HIVE-11408
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch
>
>
> I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
> (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 
> Basically, add jar creates a new classloader for loading the classes from the 
> new jar and adds the new classloader to the SessionState object of user's 
> session, making the older one its parent. Creating a temporary function uses 
> the new classloader to load the class used for the function. On closing a 
> session, although there is code to close the classloader for the session, I'm 
> not seeing the new classloader getting GCed and from the heapdump I can see 
> it holds on to the temporary function's class that should have gone away 
> after the session close. 
> Steps to reproduce:
> 1.
> {code}
> jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
> {code}
> 2. 
> Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
> added.
> 3. 
> {code}
> jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
> 'org.gumashta.udf.AUDF'; 
> {code}
> 4. 
> Close the jdbc session.
> 5. 
> Take the memory snapshot and verify that the new URLClassLoader is indeed 
> there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the 
> session which we already closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils

2015-10-07 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947569#comment-14947569
 ] 

Vaibhav Gumashta commented on HIVE-11408:
-

[~thejas] Attached patch is based on branch 1.1.1.

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used due to constructor caching in Hadoop ReflectionUtils
> 
>
> Key: HIVE-11408
> URL: https://issues.apache.org/jira/browse/HIVE-11408
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch
>
>
> I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
> (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 
> Basically, add jar creates a new classloader for loading the classes from the 
> new jar and adds the new classloader to the SessionState object of user's 
> session, making the older one its parent. Creating a temporary function uses 
> the new classloader to load the class used for the function. On closing a 
> session, although there is code to close the classloader for the session, I'm 
> not seeing the new classloader getting GCed and from the heapdump I can see 
> it holds on to the temporary function's class that should have gone away 
> after the session close. 
> Steps to reproduce:
> 1.
> {code}
> jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
> {code}
> 2. 
> Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
> added.
> 3. 
> {code}
> jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
> 'org.gumashta.udf.AUDF'; 
> {code}
> 4. 
> Close the jdbc session.
> 5. 
> Take the memory snapshot and verify that the new URLClassLoader is indeed 
> there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the 
> session which we already closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11642:

Attachment: HIVE-11642.22.patch

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.22.patch, HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12059) Clean up reference to deprecated constants in AvroSerdeUtils

2015-10-07 Thread Aaron Dossett (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Dossett updated HIVE-12059:
-
Attachment: HIVE-12059.patch

> Clean up reference to deprecated constants in AvroSerdeUtils
> 
>
> Key: HIVE-12059
> URL: https://issues.apache.org/jira/browse/HIVE-12059
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Aaron Dossett
>Priority: Minor
> Attachments: HIVE-12059.patch
>
>
> AvroSerdeUtils contains several deprecated String constants that are used by 
> other Hive modules.  Those should be cleaned up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11642:

Attachment: (was: HIVE-11642.22.patch)

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.22.patch, HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11642:

Attachment: (was: HIVE-11642.21.patch)

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.22.patch, HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils

2015-10-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11408:

Attachment: HIVE-11408.2.patch

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used due to constructor caching in Hadoop ReflectionUtils
> 
>
> Key: HIVE-11408
> URL: https://issues.apache.org/jira/browse/HIVE-11408
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch
>
>
> I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
> (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 
> Basically, add jar creates a new classloader for loading the classes from the 
> new jar and adds the new classloader to the SessionState object of user's 
> session, making the older one its parent. Creating a temporary function uses 
> the new classloader to load the class used for the function. On closing a 
> session, although there is code to close the classloader for the session, I'm 
> not seeing the new classloader getting GCed and from the heapdump I can see 
> it holds on to the temporary function's class that should have gone away 
> after the session close. 
> Steps to reproduce:
> 1.
> {code}
> jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
> {code}
> 2. 
> Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
> added.
> 3. 
> {code}
> jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
> 'org.gumashta.udf.AUDF'; 
> {code}
> 4. 
> Close the jdbc session.
> 5. 
> Take the memory snapshot and verify that the new URLClassLoader is indeed 
> there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the 
> session which we already closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11642:

Attachment: (was: HIVE-11642.20.patch)

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.22.patch, HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils

2015-10-07 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-11408:

Summary: HiveServer2 is leaking ClassLoaders when add jar / temporary 
functions are used due to constructor caching in Hadoop ReflectionUtils  (was: 
HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used)

> HiveServer2 is leaking ClassLoaders when add jar / temporary functions are 
> used due to constructor caching in Hadoop ReflectionUtils
> 
>
> Key: HIVE-11408
> URL: https://issues.apache.org/jira/browse/HIVE-11408
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-11408.1.patch
>
>
> I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue 
> (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). 
> Basically, add jar creates a new classloader for loading the classes from the 
> new jar and adds the new classloader to the SessionState object of user's 
> session, making the older one its parent. Creating a temporary function uses 
> the new classloader to load the class used for the function. On closing a 
> session, although there is code to close the classloader for the session, I'm 
> not seeing the new classloader getting GCed and from the heapdump I can see 
> it holds on to the temporary function's class that should have gone away 
> after the session close. 
> Steps to reproduce:
> 1.
> {code}
> jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar;
> {code}
> 2. 
> Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was 
> added.
> 3. 
> {code}
> jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS 
> 'org.gumashta.udf.AUDF'; 
> {code}
> 4. 
> Close the jdbc session.
> 5. 
> Take the memory snapshot and verify that the new URLClassLoader is indeed 
> there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the 
> session which we already closed.
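
To make the leak mechanism concrete: Hadoop's ReflectionUtils keeps a static Class-to-Constructor cache, so a class instantiated through it stays strongly reachable, which in turn pins the session's URLClassLoader. The sketch below is illustrative only; the jar path and the AUDF class are the placeholders from the repro above, and this is not Hive's actual session code. Whether the right fix is to avoid the shared cache for UDF classes or to evict entries on session close is exactly what this issue is about.

{code}
import java.net.URL;
import java.net.URLClassLoader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public class ClassLoaderLeakSketch {
  public static void main(String[] args) throws Exception {
    // Stand-in for the classloader that "add jar" creates for the session;
    // the jar path and UDF class are placeholders taken from the repro steps.
    URLClassLoader sessionLoader = new URLClassLoader(
        new URL[] {new URL("file:///tmp/audf.jar")},
        Thread.currentThread().getContextClassLoader());

    Class<?> udfClass = Class.forName("org.gumashta.udf.AUDF", true, sessionLoader);

    // ReflectionUtils.newInstance() caches the constructor in a static map keyed
    // by the Class object, so the Class (and therefore its URLClassLoader) stays
    // strongly reachable from a GC root even after the session goes away.
    Object udf = ReflectionUtils.newInstance(udfClass, new Configuration());

    sessionLoader.close();   // closing the loader does not evict the cached constructor
    sessionLoader = null;    // dropping our reference does not help either
    System.gc();             // the loader still shows up in a heap dump, as described above
  }
}
{code}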



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in column stats related tables

2015-10-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947562#comment-14947562
 ] 

Siddharth Seth commented on HIVE-11786:
---

Quick note: I created the following indexes.
{code}
CREATE INDEX COLNAME_TBLID_IDX ON TAB_COL_STATS (COLUMN_NAME, TBL_ID);
CREATE INDEX COLNAME_PARTID_IDX ON PART_COL_STATS (COLUMN_NAME, PART_ID);
CREATE INDEX PARTNAME_IDX ON PARTITIONS (PART_NAME);
CREATE INDEX TBLNAME_IDX ON TBLS (TBL_NAME);
{code}
This did not improve performance.

I'm creating the remaining two indexes right now (even though I don't think 
they're required, given they're part of a multi-column index). Will post an 
update once the index creation is done.

> Deprecate the use of redundant column in column stats related tables
> 
>
> Key: HIVE-11786
> URL: https://issues.apache.org/jira/browse/HIVE-11786
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, 
> HIVE-11786.2.patch, HIVE-11786.patch
>
>
> The stats tables such as TAB_COL_STATS and PART_COL_STATS have redundant columns 
> such as DB_NAME, TABLE_NAME, and PARTITION_NAME, since these tables already have 
> foreign keys like TBL_ID or PART_ID referencing TBLS or PARTITIONS. 
> These redundant columns violate database normalization rules and cause a lot 
> of inconvenience (and sometimes difficulty) when implementing column stats 
> related features. For example, when renaming a table, we have to update the 
> TABLE_NAME column in these tables as well, which should be unnecessary.
> This JIRA first deprecates the use of these columns at the HMS code level. A 
> follow-up JIRA will be opened to focus on the DB schema change and upgrade.
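
To make the inconvenience concrete, here is a hedged sketch of resolving column stats through the TBL_ID foreign key rather than the duplicated name columns, so that a table rename only needs to touch TBLS. The table and column names (TAB_COL_STATS, TBLS, DBS, TBL_ID, DB_ID, NAME, TBL_NAME) follow the standard metastore schema and should be treated as assumptions here, not code from the patch.

{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ColStatsLookupSketch {
  /** Looks up column stats via the TBL_ID foreign key; the caller closes the ResultSet. */
  static ResultSet statsForTable(Connection metastoreDb, String dbName, String tblName)
      throws SQLException {
    PreparedStatement ps = metastoreDb.prepareStatement(
        "SELECT cs.COLUMN_NAME, cs.COLUMN_TYPE"
      + "  FROM TAB_COL_STATS cs"
      + "  JOIN TBLS t ON cs.TBL_ID = t.TBL_ID"
      + "  JOIN DBS d ON t.DB_ID = d.DB_ID"
      + " WHERE d.NAME = ? AND t.TBL_NAME = ?");
    ps.setString(1, dbName);
    ps.setString(2, tblName);
    // Because the lookup goes through TBL_ID, renaming a table only requires
    // updating TBLS.TBL_NAME; the duplicated TABLE_NAME column is never read.
    return ps.executeQuery();
  }
}
{code}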



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11914) When a transaction gets a heartbeat, it doesn't update the lock heartbeat.

2015-10-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947561#comment-14947561
 ] 

Hive QA commented on HIVE-11914:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12765272/HIVE-11914.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9654 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testExceptions
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5561/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5561/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5561/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12765272 - PreCommit-HIVE-TRUNK-Build

> When a transaction gets a heartbeat, it doesn't update the lock heartbeat.
> -
>
> Key: HIVE-11914
> URL: https://issues.apache.org/jira/browse/HIVE-11914
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Transactions
>Affects Versions: 1.0.1
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-11914.2.patch, HIVE-11914.patch
>
>
> TxnHandler.heartbeatTxn() updates the timestamp on the txn but not on the 
> associated locks.  This makes SHOW LOCKS confusing/misleading.
> This is especially visible in Streaming API use cases which use
> TxnHandler.heartbeatTxnRange(HeartbeatTxnRangeRequest rqst) 
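
For reference, a minimal JDBC sketch of heartbeating the transaction and its locks together, which is what would keep SHOW LOCKS consistent. This is not the actual TxnHandler code; the table and column names (TXNS.TXN_LAST_HEARTBEAT, HIVE_LOCKS.HL_LAST_HEARTBEAT, HL_TXNID) follow the usual metastore transaction schema and should be treated as assumptions.

{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TxnHeartbeatSketch {
  /** Refreshes both the transaction row and its lock rows in one heartbeat. */
  static void heartbeat(Connection dbConn, long txnId) throws SQLException {
    long now = System.currentTimeMillis();
    try (PreparedStatement updateTxn = dbConn.prepareStatement(
             "UPDATE TXNS SET TXN_LAST_HEARTBEAT = ? WHERE TXN_ID = ?");
         PreparedStatement updateLocks = dbConn.prepareStatement(
             "UPDATE HIVE_LOCKS SET HL_LAST_HEARTBEAT = ? WHERE HL_TXNID = ?")) {
      updateTxn.setLong(1, now);
      updateTxn.setLong(2, txnId);
      updateTxn.executeUpdate();

      // Refreshing the lock rows as well is what keeps SHOW LOCKS in sync
      // with the transaction heartbeat.
      updateLocks.setLong(1, now);
      updateLocks.setLong(2, txnId);
      updateLocks.executeUpdate();

      dbConn.commit();   // assumes auto-commit is off on the metastore connection
    }
  }
}
{code}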



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11642) LLAP: make sure tests pass #3

2015-10-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947554#comment-14947554
 ] 

Sergey Shelukhin commented on HIVE-11642:
-

Cannot reproduce either of these test failures. Will try again... we might just disable the 
explain test for MiniLlap; I have no idea why the stats keep changing.

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.20.patch, HIVE-11642.21.patch, HIVE-11642.22.patch, 
> HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12057) ORC sarg is logged too much

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12057:
---

Assignee: Sergey Shelukhin

> ORC sarg is logged too much
> ---
>
> Key: HIVE-12057
> URL: https://issues.apache.org/jira/browse/HIVE-12057
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-12057.patch
>
>
> The SARG itself has too many newlines, and it is logged by every split 
> generator during split generation
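
A hedged sketch of the kind of change the description suggests: flatten the SARG's embedded newlines and guard the call so the SARG is logged cheaply and on a single line. The logger and the {{sarg}} parameter are placeholders, not the actual Hive fields.

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SargLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(SargLoggingSketch.class);

  /** Logs the SARG on one line, and only when DEBUG is enabled. */
  static void logSarg(Object sarg) {
    if (LOG.isDebugEnabled()) {
      // Collapse the embedded newlines so the SARG does not span dozens of log lines.
      LOG.debug("ORC SARG: {}", String.valueOf(sarg).replace('\n', ' '));
    }
  }
}
{code}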



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12057) ORC sarg is logged too much

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12057:

Attachment: HIVE-12057.patch

[~hagleitn] can you take a look?

> ORC sarg is logged too much
> ---
>
> Key: HIVE-12057
> URL: https://issues.apache.org/jira/browse/HIVE-12057
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-12057.patch
>
>
> The SARG itself has too many newlines, and it is logged by every split 
> generator during split generation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)

2015-10-07 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-11634:
-
Attachment: HIVE-11634.99.patch

> Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
> --
>
> Key: HIVE-11634
> URL: https://issues.apache.org/jira/browse/HIVE-11634
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, 
> HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, 
> HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, 
> HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, 
> HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, 
> HIVE-11634.96.patch, HIVE-11634.97.patch, HIVE-11634.98.patch, 
> HIVE-11634.99.patch
>
>
> Currently, we do not support partition pruning for the following scenario
> {code}
> create table pcr_t1 (key int, value string) partitioned by (ds string);
> insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src 
> where key < 20 order by key;
> explain extended select ds from pcr_t1 where struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> If we run the above query, we see that all the partitions of table pcr_t1 are 
> present in the filter predicate, whereas we could prune partition 
> (ds='2000-04-10'). 
> The optimization is to rewrite the above query into the following.
> {code}
> explain extended select ds from pcr_t1 where  (struct(ds)) IN 
> (struct('2000-04-08'), struct('2000-04-09')) and  struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09'))  
> is used by the partition pruner to prune the partitions that would otherwise 
> not be pruned.
> This is an extension of the idea presented in HIVE-11573.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-5205) Javadoc warnings in HCatalog prevent Hive from building under OpenJDK7

2015-10-07 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik resolved HIVE-5205.
--
  Resolution: Won't Fix
Release Note: Perhaps it is of no interest anymore, closing.

> Javadoc warnings in HCatalog prevent Hive from building under OpenJDK7
> --
>
> Key: HIVE-5205
> URL: https://issues.apache.org/jira/browse/HIVE-5205
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.11.0
>Reporter: Konstantin Boudnik
>Assignee: Konstantin Boudnik
>  Labels: target-version_0.11
> Fix For: 0.11.1
>
> Attachments: HIVE-5205.patch
>
>
> When building Hive with OpenJDK 7, the following javadoc warning makes the 
> build fail:
>   [javadoc] 
> /var/lib/jenkins/workspace/Shark-Hive-0.11-OJDK7/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java:81:
>  warning - @return tag has no arguments.
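
For context, the class of fix is small: give the bare {{@return}} tag a description so javadoc under OpenJDK 7 no longer warns. A hypothetical example follows; the method is a placeholder, not the actual RevisionManagerFactory code.

{code}
public class JavadocFixSketch {
  /**
   * Builds the revision manager name for the given table.
   *
   * @param table the table name
   * @return the fully qualified revision manager name
   */
  public String revisionManagerFor(String table) {
    return "rm-" + table;
  }
}
{code}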



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12008) Make last two tests added by HIVE-11384 pass when hive.in.test is false

2015-10-07 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-12008:

Attachment: HIVE-12008.2.patch

The second patch handles SelectOpt's prune list when there is a constant column. 

> Make last two tests added by HIVE-11384 pass when hive.in.test is false
> ---
>
> Key: HIVE-12008
> URL: https://issues.apache.org/jira/browse/HIVE-12008
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-12008.1.patch, HIVE-12008.2.patch
>
>
> The last two qfile unit tests fail when hive.in.test is false. It may be related 
> to how we handle the prune list for SELECT. When a SELECT includes every column 
> in a table, the prune list for that SELECT is empty, which may cause issues when 
> calculating its parent's prune list. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11417) Create shims for the row by row read path that is backed by VectorizedRowBatch

2015-10-07 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-11417:
-
Description: I'd like to make the default path for reading and writing ORC 
files to be vectorized. To ensure that Hive can still read row by row, we'll 
need shims to support the old API.  (was: I'd like to make the default path for 
reading and writing ORC files to be vectorized. To ensure that Hive can still 
read row by row, I'll make ObjectInspectors that are backed by the 
VectorizedRowBatch.)
Summary: Create shims for the row by row read path that is backed by 
VectorizedRowBatch  (was: Create ObjectInspectors for VectorizedRowBatch)

> Create shims for the row by row read path that is backed by VectorizedRowBatch
> --
>
> Key: HIVE-11417
> URL: https://issues.apache.org/jira/browse/HIVE-11417
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.0.0
>
>
> I'd like to make the default path for reading and writing ORC files to be 
> vectorized. To ensure that Hive can still read row by row, we'll need shims 
> to support the old API.
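
A hedged sketch of the idea behind such a shim: wrap a vectorized reader and hand rows out one at a time from the current VectorizedRowBatch. The {{VectorizedReader}} interface below is a placeholder for whatever the real vectorized ORC reader exposes; only VectorizedRowBatch and LongColumnVector are real classes, and null/selected-row handling is omitted for brevity.

{code}
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

public class RowByRowShimSketch {
  /** Placeholder for whatever the vectorized ORC reader exposes (assumption). */
  interface VectorizedReader {
    boolean nextBatch(VectorizedRowBatch batch);
  }

  private final VectorizedReader reader;
  private final VectorizedRowBatch batch;
  private int rowInBatch = 0;

  RowByRowShimSketch(VectorizedReader reader) {
    this.reader = reader;
    this.batch = new VectorizedRowBatch(1);     // single long column for the sketch
    this.batch.cols[0] = new LongColumnVector();
  }

  /** Returns the next value of column 0, or null when the reader is exhausted. */
  Long next() {
    if (rowInBatch >= batch.size) {
      if (!reader.nextBatch(batch) || batch.size == 0) {
        return null;                            // no more batches
      }
      rowInBatch = 0;
    }
    LongColumnVector col = (LongColumnVector) batch.cols[0];
    int idx = col.isRepeating ? 0 : rowInBatch; // repeating columns store one value
    rowInBatch++;
    return col.vector[idx];
  }
}
{code}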



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11985) handle long typenames from Avro schema in metastore

2015-10-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947406#comment-14947406
 ] 

Sergey Shelukhin commented on HIVE-11985:
-

[~ashutoshc] ping?

> handle long typenames from Avro schema in metastore
> ---
>
> Key: HIVE-11985
> URL: https://issues.apache.org/jira/browse/HIVE-11985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11985.01.patch, HIVE-11985.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12048) metastore file metadata cache should not be used when deltas are present

2015-10-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947382#comment-14947382
 ] 

Hive QA commented on HIVE-12048:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12765250/HIVE-12048.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9653 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5560/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5560/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5560/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12765250 - PreCommit-HIVE-TRUNK-Build

> metastore file metadata cache should not be used when deltas are present
> 
>
> Key: HIVE-12048
> URL: https://issues.apache.org/jira/browse/HIVE-12048
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12048.patch
>
>
> The previous code doesn't check for deltas before getting footers from the local 
> cache, even though stripe filtering with deltas is not possible; presumably that 
> is because checking the local cache is cheap. Make sure we check for deltas early 
> when the metastore-based cache is used.
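
A minimal sketch of the early check being asked for, assuming the caller already knows the ACID delta directories for the split; {{FooterCache}} is a placeholder, not a Hive class.

{code}
import java.util.List;

public class FooterCacheSketch {
  /** Placeholder for the metastore-backed footer cache (assumption). */
  interface FooterCache {
    Object lookup(String path);
  }

  /**
   * Returns a cached footer only when there are no delta directories;
   * with deltas present the caller should read the footer from the file itself.
   */
  static Object maybeCachedFooter(FooterCache metastoreCache, String basePath,
      List<String> deltaDirs) {
    // Check for deltas *before* touching the (potentially remote) metastore cache.
    if (deltaDirs != null && !deltaDirs.isEmpty()) {
      return null;
    }
    return metastoreCache.lookup(basePath);
  }
}
{code}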



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3

2015-10-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-11642:

Attachment: HIVE-11642.22.patch

Not sure why some processes failed.

> LLAP: make sure tests pass #3
> -
>
> Key: HIVE-11642
> URL: https://issues.apache.org/jira/browse/HIVE-11642
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, 
> HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, 
> HIVE-11642.20.patch, HIVE-11642.21.patch, HIVE-11642.22.patch, 
> HIVE-11642.patch
>
>
> Tests should pass against the most recent branch and Tez 0.8.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11892) UDTF run in local fetch task does not return rows forwarded during GenericUDTF.close()

2015-10-07 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-11892:
--
Attachment: HIVE-11892.2.patch

Updated golden files.
Also removed the special-case pattern mask for GenericUDTFCount2; I removed the 
query explain from udtf_nofetchtask.q.

> UDTF run in local fetch task does not return rows forwarded during 
> GenericUDTF.close()
> --
>
> Key: HIVE-11892
> URL: https://issues.apache.org/jira/browse/HIVE-11892
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11892.1.patch, HIVE-11892.2.patch
>
>
> Using the example UDTF GenericUDTFCount2, which is part of hive-contrib:
> {noformat}
> create temporary function udtfCount2 as 
> 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount2';
> set hive.fetch.task.conversion=minimal;
> -- Task created, correct output (2 rows)
> select udtfCount2() from src;
> set hive.fetch.task.conversion=more;
> -- Runs in local task, incorrect output (0 rows)
> select udtfCount2() from src;
> {noformat}
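
For readers unfamiliar with the pattern, a hedged sketch of a UDTF in the same style as GenericUDTFCount2: it forwards nothing from process() and emits its only row from close(), which is exactly the row the fetch-task ({{more}}) path loses. This is an illustrative class, not the hive-contrib code.

{code}
import java.util.Arrays;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class CountOnCloseUDTF extends GenericUDTF {
  private long count = 0;

  @Override
  public StructObjectInspector initialize(ObjectInspector[] argOIs)
      throws UDFArgumentException {
    // Single output column holding the final count.
    return ObjectInspectorFactory.getStandardStructObjectInspector(
        Arrays.asList("cnt"),
        Arrays.<ObjectInspector>asList(PrimitiveObjectInspectorFactory.javaLongObjectInspector));
  }

  @Override
  public void process(Object[] args) throws HiveException {
    count++;                              // nothing is forwarded per input row
  }

  @Override
  public void close() throws HiveException {
    forward(new Object[] { count });      // the row the local fetch-task path drops
  }
}
{code}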



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11976) Extend CBO rules to being able to apply rules only once on a given operator

2015-10-07 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947346#comment-14947346
 ] 

Laljo John Pullokkaran commented on HIVE-11976:
---

+1

> Extend CBO rules to being able to apply rules only once on a given operator
> ---
>
> Key: HIVE-11976
> URL: https://issues.apache.org/jira/browse/HIVE-11976
> Project: Hive
>  Issue Type: New Feature
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-11976.01.patch, HIVE-11976.02.patch, 
> HIVE-11976.03.patch, HIVE-11976.04.patch, HIVE-11976.patch
>
>
> Create a way to bail out quickly from HepPlanner if the rule has been already 
> applied on a certain operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11892) UDTF run in local fetch task does not return rows forwarded during GenericUDTF.close()

2015-10-07 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947291#comment-14947291
 ] 

Jason Dere commented on HIVE-11892:
---

Test failures are due to explain plan differences now that UDTFs will not use 
fetch task conversion. Will regenerate the golden files for these tests.

> UDTF run in local fetch task does not return rows forwarded during 
> GenericUDTF.close()
> --
>
> Key: HIVE-11892
> URL: https://issues.apache.org/jira/browse/HIVE-11892
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11892.1.patch
>
>
> Using the example UDTF GenericUDTFCount2, which is part of hive-contrib:
> {noformat}
> create temporary function udtfCount2 as 
> 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount2';
> set hive.fetch.task.conversion=minimal;
> -- Task created, correct output (2 rows)
> select udtfCount2() from src;
> set hive.fetch.task.conversion=more;
> -- Runs in local task, incorrect output (0 rows)
> select udtfCount2() from src;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11785) Support escaping carriage return and new line for LazySimpleSerDe

2015-10-07 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11785:

Attachment: HIVE-11785.3.patch

> Support escaping carriage return and new line for LazySimpleSerDe
> -
>
> Key: HIVE-11785
> URL: https://issues.apache.org/jira/browse/HIVE-11785
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 2.0.0
>
> Attachments: HIVE-11785.2.patch, HIVE-11785.3.patch, 
> HIVE-11785.patch, test.parquet
>
>
> Create the table and perform the queries as follows. You will see different 
> results when the setting changes. 
> The expected result should be:
> {noformat}
> 1 newline
> here
> 2 carriage return
> 3 both
> here
> {noformat}
> {noformat}
> hive> create table repo (lvalue int, charstring string) stored as parquet;
> OK
> Time taken: 0.34 seconds
> hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
> Loading data to table default.repo
> chgrp: changing ownership of 
> 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not 
> belong to hive
> Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, 
> rawDataSize=0]
> OK
> Time taken: 0.732 seconds
> hive> set hive.fetch.task.conversion=more;
> hive> select * from repo;
> OK
> 1 newline
> here
> here  carriage return
> 3 both
> here
> Time taken: 0.253 seconds, Fetched: 3 row(s)
> hive> set hive.fetch.task.conversion=none;
> hive> select * from repo;
> Query ID = root_20150909113535_e081db8b-ccd9-4c44-aad9-d990ffb8edf3
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1441752031022_0006, Tracking URL = 
> http://host-10-17-81-63.coe.cloudera.com:8088/proxy/application_1441752031022_0006/
> Kill Command = 
> /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop/bin/hadoop job  
> -kill job_1441752031022_0006
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2015-09-09 11:35:54,127 Stage-1 map = 0%,  reduce = 0%
> 2015-09-09 11:36:04,664 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.98 
> sec
> MapReduce Total cumulative CPU time: 2 seconds 980 msec
> Ended Job = job_1441752031022_0006
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1   Cumulative CPU: 2.98 sec   HDFS Read: 4251 HDFS 
> Write: 51 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 980 msec
> OK
> 1 newline
> NULL  NULL
> 2 carriage return
> NULL  NULL
> 3 both
> NULL  NULL
> Time taken: 25.131 seconds, Fetched: 6 row(s)
> hive>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled

2015-10-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-9695:
---
Affects Version/s: (was: 2.0.0)
   0.14.0
   1.0.0
   1.1.0

> Redundant filter operator in reducer Vertex when CBO is disabled
> 
>
> Key: HIVE-9695
> URL: https://issues.apache.org/jira/browse/HIVE-9695
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 0.14.0, 1.0.0, 1.1.0
>Reporter: Mostafa Mokhtar
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-9695.01.patch, HIVE-9695.01.patch, HIVE-9695.patch
>
>
> There is a redundant filter operator in reducer Vertex when CBO is disabled.
> Query 
> {code}
> select 
> ss_item_sk, ss_ticket_number, ss_store_sk
> from
> store_sales a, store_returns b, store
> where
> a.ss_item_sk = b.sr_item_sk
> and a.ss_ticket_number = b.sr_ticket_number 
> and ss_sold_date_sk between 2450816 and 2451500
>   and sr_returned_date_sk between 2450816 and 2451500
>   and s_store_sk = ss_store_sk;
> {code}
> Plan snippet 
> {code}
>   Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Filter Operator
> predicate: ((((_col1 = _col27) and (_col8 = _col34)) and 
> _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) 
> and (_col49 = _col6)) (type: boolean)
> {code}
> Full plan with CBO disabled
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 
> (SIMPLE_EDGE)
>   DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: b
>   filterExpr: ((sr_item_sk is not null and sr_ticket_number 
> is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: 
> boolean)
>   Statistics: Num rows: 2370038095 Data size: 170506118656 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (sr_item_sk is not null and sr_ticket_number 
> is not null) (type: boolean)
> Statistics: Num rows: 706893063 Data size: 6498502768 
> Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
>   key expressions: sr_item_sk (type: int), 
> sr_ticket_number (type: int)
>   sort order: ++
>   Map-reduce partition columns: sr_item_sk (type: int), 
> sr_ticket_number (type: int)
>   Statistics: Num rows: 706893063 Data size: 6498502768 
> Basic stats: COMPLETE Column stats: COMPLETE
>   value expressions: sr_returned_date_sk (type: int)
> Execution mode: vectorized
> Map 3
> Map Operator Tree:
> TableScan
>   alias: store
>   filterExpr: s_store_sk is not null (type: boolean)
>   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: s_store_sk is not null (type: boolean)
> Statistics: Num rows: 1704 Data size: 6816 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
>   key expressions: s_store_sk (type: int)
>   sort order: +
>   Map-reduce partition columns: s_store_sk (type: int)
>   Statistics: Num rows: 1704 Data size: 6816 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
>   alias: a
>   filterExpr: (((ss_item_sk is not null and ss_ticket_number 
> is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 
> AND 2451500) (type: boolean)
>   Statistics: Num rows: 28878719387 Data size: 2405805439460 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((ss_item_sk is not null and ss_ticket_number 
> is not null) and ss_store_sk is not null) (type: boolean)
> Statistics: Num rows: 8405840828 Data size: 110101408700 
> Basic stats: COMPLETE Column stats: COMPLETE
> 

[jira] [Updated] (HIVE-11785) Support escaping carriage return and new line for LazySimpleSerDe

2015-10-07 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-11785:

Attachment: (was: HIVE-11785.3.patch)

> Support escaping carriage return and new line for LazySimpleSerDe
> -
>
> Key: HIVE-11785
> URL: https://issues.apache.org/jira/browse/HIVE-11785
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Fix For: 2.0.0
>
> Attachments: HIVE-11785.2.patch, HIVE-11785.3.patch, 
> HIVE-11785.patch, test.parquet
>
>
> Create the table and perform the queries as follows. You will see different 
> results when the setting changes. 
> The expected result should be:
> {noformat}
> 1 newline
> here
> 2 carriage return
> 3 both
> here
> {noformat}
> {noformat}
> hive> create table repo (lvalue int, charstring string) stored as parquet;
> OK
> Time taken: 0.34 seconds
> hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
> Loading data to table default.repo
> chgrp: changing ownership of 
> 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not 
> belong to hive
> Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, 
> rawDataSize=0]
> OK
> Time taken: 0.732 seconds
> hive> set hive.fetch.task.conversion=more;
> hive> select * from repo;
> OK
> 1 newline
> here
> here  carriage return
> 3 both
> here
> Time taken: 0.253 seconds, Fetched: 3 row(s)
> hive> set hive.fetch.task.conversion=none;
> hive> select * from repo;
> Query ID = root_20150909113535_e081db8b-ccd9-4c44-aad9-d990ffb8edf3
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1441752031022_0006, Tracking URL = 
> http://host-10-17-81-63.coe.cloudera.com:8088/proxy/application_1441752031022_0006/
> Kill Command = 
> /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop/bin/hadoop job  
> -kill job_1441752031022_0006
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2015-09-09 11:35:54,127 Stage-1 map = 0%,  reduce = 0%
> 2015-09-09 11:36:04,664 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.98 
> sec
> MapReduce Total cumulative CPU time: 2 seconds 980 msec
> Ended Job = job_1441752031022_0006
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1   Cumulative CPU: 2.98 sec   HDFS Read: 4251 HDFS 
> Write: 51 SUCCESS
> Total MapReduce CPU Time Spent: 2 seconds 980 msec
> OK
> 1 newline
> NULL  NULL
> 2 carriage return
> NULL  NULL
> 3 both
> NULL  NULL
> Time taken: 25.131 seconds, Fetched: 6 row(s)
> hive>
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled

2015-10-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-9695:
---
Component/s: (was: Physical Optimizer)
 Logical Optimizer

> Redundant filter operator in reducer Vertex when CBO is disabled
> 
>
> Key: HIVE-9695
> URL: https://issues.apache.org/jira/browse/HIVE-9695
> Project: Hive
>  Issue Type: Improvement
>  Components: Logical Optimizer
>Affects Versions: 2.0.0
>Reporter: Mostafa Mokhtar
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-9695.01.patch, HIVE-9695.01.patch, HIVE-9695.patch
>
>
> There is a redundant filter operator in reducer Vertex when CBO is disabled.
> Query 
> {code}
> select 
> ss_item_sk, ss_ticket_number, ss_store_sk
> from
> store_sales a, store_returns b, store
> where
> a.ss_item_sk = b.sr_item_sk
> and a.ss_ticket_number = b.sr_ticket_number 
> and ss_sold_date_sk between 2450816 and 2451500
>   and sr_returned_date_sk between 2450816 and 2451500
>   and s_store_sk = ss_store_sk;
> {code}
> Plan snippet 
> {code}
>   Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE 
> Column stats: COMPLETE
>   Filter Operator
> predicate: ((((_col1 = _col27) and (_col8 = _col34)) and 
> _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) 
> and (_col49 = _col6)) (type: boolean)
> {code}
> Full plan with CBO disabled
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
> Tez
>   Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 
> (SIMPLE_EDGE)
>   DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13
>   Vertices:
> Map 1
> Map Operator Tree:
> TableScan
>   alias: b
>   filterExpr: ((sr_item_sk is not null and sr_ticket_number 
> is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: 
> boolean)
>   Statistics: Num rows: 2370038095 Data size: 170506118656 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: (sr_item_sk is not null and sr_ticket_number 
> is not null) (type: boolean)
> Statistics: Num rows: 706893063 Data size: 6498502768 
> Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
>   key expressions: sr_item_sk (type: int), 
> sr_ticket_number (type: int)
>   sort order: ++
>   Map-reduce partition columns: sr_item_sk (type: int), 
> sr_ticket_number (type: int)
>   Statistics: Num rows: 706893063 Data size: 6498502768 
> Basic stats: COMPLETE Column stats: COMPLETE
>   value expressions: sr_returned_date_sk (type: int)
> Execution mode: vectorized
> Map 3
> Map Operator Tree:
> TableScan
>   alias: store
>   filterExpr: s_store_sk is not null (type: boolean)
>   Statistics: Num rows: 1704 Data size: 3256276 Basic stats: 
> COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: s_store_sk is not null (type: boolean)
> Statistics: Num rows: 1704 Data size: 6816 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
>   key expressions: s_store_sk (type: int)
>   sort order: +
>   Map-reduce partition columns: s_store_sk (type: int)
>   Statistics: Num rows: 1704 Data size: 6816 Basic stats: 
> COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
>   alias: a
>   filterExpr: (((ss_item_sk is not null and ss_ticket_number 
> is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 
> AND 2451500) (type: boolean)
>   Statistics: Num rows: 28878719387 Data size: 2405805439460 
> Basic stats: COMPLETE Column stats: COMPLETE
>   Filter Operator
> predicate: ((ss_item_sk is not null and ss_ticket_number 
> is not null) and ss_store_sk is not null) (type: boolean)
> Statistics: Num rows: 8405840828 Data size: 110101408700 
> Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
>   ke

[jira] [Commented] (HIVE-11892) UDTF run in local fetch task does not return rows forwarded during GenericUDTF.close()

2015-10-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947235#comment-14947235
 ] 

Hive QA commented on HIVE-11892:




{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12765242/HIVE-11892.1.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9653 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_noalias
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_inline
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode
org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udtf_output_on_close
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_select_dummy_source
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5559/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5559/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5559/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12765242 - PreCommit-HIVE-TRUNK-Build

> UDTF run in local fetch task does not return rows forwarded during 
> GenericUDTF.close()
> --
>
> Key: HIVE-11892
> URL: https://issues.apache.org/jira/browse/HIVE-11892
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-11892.1.patch
>
>
> Using the example UDTF GenericUDTFCount2, which is part of hive-contrib:
> {noformat}
> create temporary function udtfCount2 as 
> 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount2';
> set hive.fetch.task.conversion=minimal;
> -- Task created, correct output (2 rows)
> select udtfCount2() from src;
> set hive.fetch.task.conversion=more;
> -- Runs in local task, incorrect output (0 rows)
> select udtfCount2() from src;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12046) Re-create spark client if connection is dropped

2015-10-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947172#comment-14947172
 ] 

Jimmy Xiang commented on HIVE-12046:


For getDefaultParallelism() and getExecutorCount(), a connection drop doesn't 
break the query; we just log a warning in this case. That's why I didn't touch them.

Right, the remote client can become bad right after the isActive() check, and the 
user will get an exception in that case. We can enhance the error message and ask 
the user to retry the query, which should be more convenient than logging out 
and in again.

> Re-create spark client if connection is dropped
> ---
>
> Key: HIVE-12046
> URL: https://issues.apache.org/jira/browse/HIVE-12046
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12046.1.patch
>
>
> Currently, if the connection to the spark cluster is dropped, the spark 
> client will stay in a bad state. A new Hive session is needed to re-establish 
> the connection. It is better to auto reconnect in this case.
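
A hedged sketch of the auto-reconnect idea: check liveness before use and lazily re-create the client when the connection has dropped. {{RemoteClient}} and {{ClientFactory}} are placeholders; the real SparkClient/SparkSession APIs in Hive differ. Note that the race mentioned in the comment above still applies: the client can die right after the check, so callers still need a retry path.

{code}
public class ReconnectingClientSketch {
  /** Placeholder for the remote Spark client (assumption). */
  interface RemoteClient {
    boolean isActive();
    void close();
  }

  /** Placeholder factory that knows how to establish a new connection (assumption). */
  interface ClientFactory {
    RemoteClient create() throws Exception;
  }

  private final ClientFactory factory;
  private RemoteClient client;

  ReconnectingClientSketch(ClientFactory factory) {
    this.factory = factory;
  }

  /** Returns a live client, re-creating it if the previous connection was dropped. */
  synchronized RemoteClient get() throws Exception {
    if (client == null || !client.isActive()) {
      if (client != null) {
        client.close();                 // discard the client stuck in a bad state
      }
      client = factory.create();        // re-establish without a new Hive session
    }
    return client;
  }
}
{code}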



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11977) Hive should handle an external avro table with zero length files present

2015-10-07 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-11977:

Affects Version/s: 0.14.0
   1.0.0
   1.2.0
   1.1.0

> Hive should handle an external avro table with zero length files present
> 
>
> Key: HIVE-11977
> URL: https://issues.apache.org/jira/browse/HIVE-11977
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1
>Reporter: Aaron Dossett
>Assignee: Aaron Dossett
> Fix For: 2.0.0
>
> Attachments: HIVE-11977.2.patch, HIVE-11977.patch
>
>
> If a zero-length file is in the top-level directory housing an external Avro 
> table, all Hive queries on the table fail.
> The issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader 
> creates a new org.apache.avro.file.DataFileReader, and DataFileReader throws 
> an exception when trying to read an empty file (because the empty file lacks 
> the magic number marking it as Avro).  
> AvroGenericRecordReader should detect an empty file and behave 
> reasonably.
> Caused by: java.io.IOException: Not a data file.
> at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
> at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:81)
> at 
> org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246)
> ... 25 more
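
A minimal sketch of the guard described above: detect a zero-length input before handing it to Avro's DataFileReader and treat it as an empty reader instead of failing with "Not a data file". The wrapper class is a placeholder; only the Hadoop FileSystem/FileSplit calls are standard API.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;

public class EmptyAvroFileCheckSketch {
  /** Returns true when the split's file is zero bytes long and so cannot be Avro. */
  static boolean isEmptyInput(JobConf job, FileSplit split) throws IOException {
    Path path = split.getPath();
    FileSystem fs = path.getFileSystem(job);
    // A zero-length file cannot contain the Avro magic bytes, so the record reader
    // should treat it as having no records instead of failing.
    return fs.getFileStatus(path).getLen() == 0;
  }
}
{code}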



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

