[jira] [Updated] (HIVE-3917) Support fast operation for analyze command

2013-01-18 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3917:
---

Description: 
Hive supports the ANALYZE command to gather statistics from existing 
tables/partitions: 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If the table/partition is big, the operation takes time, since it opens all 
files and scans all the data.

It would be nice to support a fast operation that gathers the statistics which 
do not require opening all files:
1. Number of files
2. Size in Bytes

Potential syntax:
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
COMPUTE STATISTICS [noscan];

In the future, all statistics that can be computed without a scan could be 
retrieved via this optional parameter.
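
For illustration (the table and partition names here are hypothetical, and the 
noscan form is only the proposal above, not an existing feature at the time of 
writing), the two modes would look like:

-- today: opens and scans the data, collecting rows, files and size
ANALYZE TABLE page_views PARTITION (ds='2013-01-18') COMPUTE STATISTICS;

-- proposed: metadata-only, collecting only number of files and size in bytes
ANALYZE TABLE page_views PARTITION (ds='2013-01-18') COMPUTE STATISTICS noscan;

Once gathered, the values typically surface as table/partition parameters such 
as numFiles, totalSize and (for the scanning form) numRows, and can be 
inspected for example via DESCRIBE FORMATTED.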


  was:
hive supports analyze command to gather statistics from existing 
tables/partition 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If table/partition is big, the operation would take time since it will open all 
files and scan all data.

It would be nice to support fast operation to gather statistics which doesn't 
require to open all files:
1. Number of files
2. Size in Bytes

Potential syntax is 
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
COMPUTE STATISTICS [noscan];





> Support fast operation for analyze command
> --
>
> Key: HIVE-3917
> URL: https://issues.apache.org/jira/browse/HIVE-3917
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 0.11.0
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
>
> Hive supports the ANALYZE command to gather statistics from existing 
> tables/partitions: 
> https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
> It collects:
> 1. Number of Rows
> 2. Number of files
> 3. Size in Bytes
> If the table/partition is big, the operation takes time, since it opens 
> all files and scans all the data.
> It would be nice to support a fast operation that gathers the statistics which 
> do not require opening all files:
> 1. Number of files
> 2. Size in Bytes
> Potential syntax is 
> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
> COMPUTE STATISTICS [noscan];
> In the future, all statistics without scan can be retrieved via this optional 
> parameter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3917) Support fast operation for analyze command

2013-01-18 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3917:
---

Description: 
Hive supports the ANALYZE command to gather statistics from existing 
tables/partitions: 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If the table/partition is big, the operation takes time, since it opens all 
files and scans all the data.

It would be nice to support a fast operation that gathers the statistics which 
do not require opening all files:
1. Number of files
2. Size in Bytes

Potential syntax:
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
COMPUTE STATISTICS [noscan];




  was:
hive supports analyze command to gather statistics from existing 
tables/partition 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If table/partition is big, the operation would take time since it will open all 
files and scan all data.

It would be nice to support fast operation to gather statistics which doesn't 
require to open all files like
1. Number of files
2. Size in Bytes

Potential syntax is 
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
COMPUTE STATISTICS [noscan];





> Support fast operation for analyze command
> --
>
> Key: HIVE-3917
> URL: https://issues.apache.org/jira/browse/HIVE-3917
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 0.11.0
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
>
> Hive supports the ANALYZE command to gather statistics from existing 
> tables/partitions: 
> https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
> It collects:
> 1. Number of Rows
> 2. Number of files
> 3. Size in Bytes
> If the table/partition is big, the operation takes time, since it opens 
> all files and scans all the data.
> It would be nice to support a fast operation that gathers the statistics which 
> do not require opening all files:
> 1. Number of files
> 2. Size in Bytes
> Potential syntax is 
> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
> COMPUTE STATISTICS [noscan];

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3917) Support fast operation for analyze command

2013-01-18 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3917:
---

Description: 
Hive supports the ANALYZE command to gather statistics from existing 
tables/partitions: 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If the table/partition is big, the operation takes time, since it opens all 
files and scans all the data.

It would be nice to support a fast operation that gathers the statistics which 
do not require opening all files, such as:
1. Number of files
2. Size in Bytes

Potential syntax:
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
COMPUTE STATISTICS [noscan];




  was:
hive supports analyze command to gather statistics from existing 
tables/partition 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If table/partition is big, the operation would take time since it will open all 
files and scan all data.

It would be nice to support fast operation to gather statistics which doesn't 
require to open all files like
1. Number of files
2. Size in Bytes







> Support fast operation for analyze command
> --
>
> Key: HIVE-3917
> URL: https://issues.apache.org/jira/browse/HIVE-3917
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 0.11.0
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
>
> Hive supports the ANALYZE command to gather statistics from existing 
> tables/partitions: 
> https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
> It collects:
> 1. Number of Rows
> 2. Number of files
> 3. Size in Bytes
> If the table/partition is big, the operation takes time, since it opens 
> all files and scans all the data.
> It would be nice to support a fast operation that gathers the statistics which 
> do not require opening all files, such as:
> 1. Number of files
> 2. Size in Bytes
> Potential syntax is 
> ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] 
> COMPUTE STATISTICS [noscan];

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


hive-trunk-hadoop1 - Build # 27 - Failure

2013-01-18 Thread Apache Jenkins Server
Changes for Build #1

Changes for Build #2

Changes for Build #3

Changes for Build #4
[kevinwilfong] HIVE-3552. performant manner for performing 
cubes/rollups/grouping sets for a high number of grouping set keys.


Changes for Build #5

Changes for Build #6
[cws] HIVE-3875. Negative value for hive.stats.ndv.error should be disallowed 
(Shreepadma Venugopalan via cws)


Changes for Build #7
[namit] HIVE-3888 wrong mapside groupby if no partition is being selected
(Namit Jain via Ashutosh and namit)


Changes for Build #8

Changes for Build #9

Changes for Build #10
[kevinwilfong] HIVE-3803. explain dependency should show the dependencies 
hierarchically in presence of views. (njain via kevinwilfong)


Changes for Build #11

Changes for Build #12
[namit] HIVE-3824 bug if different serdes are used for different partitions
(Namit Jain via Ashutosh and namit)


Changes for Build #13

Changes for Build #14
[hashutosh] HIVE-3004 : RegexSerDe should support other column types in 
addition to STRING (Shreepadma Venugoplan via Ashutosh Chauhan)


Changes for Build #15
[hashutosh] HIVE-2439 : Upgrade antlr version to 3.4 (Thiruvel Thirumoolan via 
Ashutosh Chauhan)


Changes for Build #16
[namit] HIVE-3897 Add a way to get the uncompressed/compressed sizes of columns
from an RC File (Kevin Wilfong via namit)


Changes for Build #17
[namit] HIVE-3899 Partition pruning fails on  =  expression
(Kevin Wilfong via namit)


Changes for Build #18
[hashutosh] HIVE-2820 : Invalid tag is used for MapJoinProcessor (Navis via 
Ashutosh Chauhan)

[namit] HIVE-3872 MAP JOIN for VIEW thorws NULL pointer exception error
(Navis via namit)


Changes for Build #19
[cws] Add DECIMAL data type (Josh Wills, Vikram Dixit, Prasad Mujumdar, Mark 
Grover and Gunther Hagleitner via cws)


Changes for Build #20
[namit] HIVE-3852 Multi-groupby optimization fails when same distinct column is
used twice or more (Navis via namit)


Changes for Build #21
[namit] HIVE-3898 getReducersBucketing in SemanticAnalyzer may return more than 
the
max number of reducers (Kevin Wilfong via namit)


Changes for Build #22

Changes for Build #23
[namit] HIVE-3893 something wrong with the hive-default.xml
(jet cheng via namit)


Changes for Build #24
[namit] HIVE-3915 Union with map-only query on one side and two MR job query on 
the other
produces wrong results (Kevin Wilfong via namit)


Changes for Build #25
[namit] HIVE-3909 Wrong data due to HIVE-2820
(Navis via namit)


Changes for Build #26
[namit] HIVE-3699 Multiple insert overwrite into multiple tables query stores 
same results
in all tables (Navis via namit)


Changes for Build #27
[hashutosh] HIVE-3537 : release locks at the end of move tasks (Namit via 
Ashutosh Chauhan)




No tests ran.

The Apache Jenkins build system has built hive-trunk-hadoop1 (build #27)

Status: Failure

Check console output at https://builds.apache.org/job/hive-trunk-hadoop1/27/ to 
view the results.

[jira] [Commented] (HIVE-3537) release locks at the end of move tasks

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557927#comment-13557927
 ] 

Hudson commented on HIVE-3537:
--

Integrated in hive-trunk-hadoop1 #27 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/27/])
HIVE-3537 : release locks at the end of move tasks (Namit via Ashutosh 
Chauhan) (Revision 1435492)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435492
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/EmbeddedLockManager.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveLockObject.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLock.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java


> release locks at the end of move tasks
> --
>
> Key: HIVE-3537
> URL: https://issues.apache.org/jira/browse/HIVE-3537
> Project: Hive
>  Issue Type: Bug
>  Components: Locking, Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.11.0
>
> Attachments: hive.3537.10.patch, hive.3537.1.patch, 
> hive.3537.2.patch, hive.3537.3.patch, hive.3537.4.patch, hive.3537.5.patch, 
> hive.3537.6.patch, hive.3537.7.patch, hive.3537.8.patch, hive.3537.9.patch
>
>
> Look at HIVE-3106 for details.
> In order to make sure that concurrency is not an issue for multi-table 
> inserts, the current option is to introduce a dependency task, which thereby
> delays the creation of all partitions. It would be desirable to release the
> locks for the outputs as soon as the move task is completed. That way, for
> multi-table inserts, the concurrency can be enabled without delaying any 
> table.
> Currently, the move task contains an input/output, but they do not seem to be
> populated correctly.
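
For context, the multi-table insert scenario in question looks roughly like the 
following sketch (the table names are hypothetical; hive.support.concurrency is 
the switch that enables the lock manager):

set hive.support.concurrency=true;

FROM src
INSERT OVERWRITE TABLE dest1 PARTITION (ds='2013-01-18')
  SELECT key, value WHERE key < 100
INSERT OVERWRITE TABLE dest2 PARTITION (ds='2013-01-18')
  SELECT key, value WHERE key >= 100;

With the patch, the lock on each destination can be released as soon as its own 
move task finishes, rather than being held until a final dependency task has 
created every partition.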

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3537) release locks at the end of move tasks

2013-01-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3537:
---

Fix Version/s: 0.11.0
   Status: Patch Available  (was: Open)

Committed to trunk. Thanks, Namit!

> release locks at the end of move tasks
> --
>
> Key: HIVE-3537
> URL: https://issues.apache.org/jira/browse/HIVE-3537
> Project: Hive
>  Issue Type: Bug
>  Components: Locking, Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.11.0
>
> Attachments: hive.3537.10.patch, hive.3537.1.patch, 
> hive.3537.2.patch, hive.3537.3.patch, hive.3537.4.patch, hive.3537.5.patch, 
> hive.3537.6.patch, hive.3537.7.patch, hive.3537.8.patch, hive.3537.9.patch
>
>
> Look at HIVE-3106 for details.
> In order to make sure that concurrency is not an issue for multi-table 
> inserts, the current option is to introduce a dependency task, which thereby
> delays the creation of all partitions. It would be desirable to release the
> locks for the outputs as soon as the move task is completed. That way, for
> multi-table inserts, the concurrency can be enabled without delaying any 
> table.
> Currently, the move task contains an input/output, but they do not seem to be
> populated correctly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557920#comment-13557920
 ] 

Hudson commented on HIVE-3699:
--

Integrated in hive-trunk-hadoop1 #26 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/26/])
HIVE-3699 Multiple insert overwrite into multiple tables query stores same 
results
in all tables (Navis via namit) (Revision 1435484)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435484
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/multi_insert_gby.q
* 
/hive/trunk/ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/multi_insert.q.out
* /hive/trunk/ql/src/test/results/clientpositive/multi_insert_gby.q.out
* 
/hive/trunk/ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out


> Multiple insert overwrite into multiple tables query stores same results in 
> all tables
> --
>
> Key: HIVE-3699
> URL: https://issues.apache.org/jira/browse/HIVE-3699
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
> Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): 
> hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
>Reporter: Alexandre Fouché
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: HIVE-3699.D7743.1.patch, HIVE-3699.D7743.2.patch, 
> HIVE-3699.D7743.3.patch, HIVE-3699_hive-0.9.1.patch.txt
>
>
> (Note: This might be related to HIVE-2750)
> I am doing a query with multiple INSERT OVERWRITE to multiple tables in order 
> to scan the dataset only once, and I end up with all these tables having 
> the same content! It seems the GROUP BY query that returns results is 
> overwriting all the temp tables.
> Weirdly enough, if I add further GROUP BY queries into additional temp tables, 
> grouped by a different field, then all temp tables, even the ones that would 
> otherwise have had wrong content, are correctly populated.
> This is the misbehaving query:
> FROM nikon
> INSERT OVERWRITE TABLE e1
> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
> WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
> INSERT OVERWRITE TABLE e2
> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
> WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
> ;
> It launches only one MR job and here are the results. Why does table 'e1' 
> contain results from table 'e2'?! Table 'e1' should have been empty (see 
> individual SELECTs further below).
> hive> SELECT * from e1;
> OK
> NULL    2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 0.229 seconds
> hive> SELECT * from e2;
> OK
> NULL    2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 0.11 seconds
> Here are the results of the individual queries (only the second query 
> returns a result set):
> hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM 
> nikon
> WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
> (...)
> OK
>   <- There are no results, this is normal
> Time taken: 41.471 seconds
> hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon
> WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
> (...)
> OK
> NULL  2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 39.607 seconds
> 
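
As a cross-check (a sketch reusing the table and column names from the report 
above), the two aggregations can be issued as separate statements, which should 
populate e1 and e2 with the same results as the individual SELECTs:

INSERT OVERWRITE TABLE e1
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
FROM nikon WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;

INSERT OVERWRITE TABLE e2
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
FROM nikon WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;

This scans nikon twice, which is exactly the cost the multi-insert form is 
meant to avoid, but it sidesteps the bug until the fix is in.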

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3916) For outer joins, when looping over the rows looking for filtered tags, it doesn't report progress

2013-01-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557910#comment-13557910
 ] 

Namit Jain commented on HIVE-3916:
--

+1

> For outer joins, when looping over the rows looking for filtered tags, it 
> doesn't report progress
> -
>
> Key: HIVE-3916
> URL: https://issues.apache.org/jira/browse/HIVE-3916
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3916.1.patch.txt
>
>
> In the CommonJoinOperator, for outer joins, there is a loop over every row in 
> the AbstractRowContainer checking for a filtered tag.  This can take a long 
> time if there are a large number of rows, and during this time, it does not 
> report progress.  If this runs for long enough, Hadoop will kill the task for 
> failing to report progress.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3825) Add Operator level Hooks

2013-01-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557907#comment-13557907
 ] 

Namit Jain commented on HIVE-3825:
--

+1

Running tests

> Add Operator level Hooks
> 
>
> Key: HIVE-3825
> URL: https://issues.apache.org/jira/browse/HIVE-3825
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3825.2.patch.txt, HIVE-3825.3.patch.txt, 
> HIVE-3825.patch.4.txt, HIVE-3825.patch.5.txt, HIVE-3825.patch.6.txt, 
> HIVE-3825.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3699:
-

   Resolution: Fixed
Fix Version/s: 0.11.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Navis

> Multiple insert overwrite into multiple tables query stores same results in 
> all tables
> --
>
> Key: HIVE-3699
> URL: https://issues.apache.org/jira/browse/HIVE-3699
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
> Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): 
> hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
>Reporter: Alexandre Fouché
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: HIVE-3699.D7743.1.patch, HIVE-3699.D7743.2.patch, 
> HIVE-3699.D7743.3.patch, HIVE-3699_hive-0.9.1.patch.txt
>
>
> (Note: This might be related to HIVE-2750)
> I am doing a query with multiple INSERT OVERWRITE to multiple tables in order 
> to scan the dataset only once, and I end up with all these tables having 
> the same content! It seems the GROUP BY query that returns results is 
> overwriting all the temp tables.
> Weirdly enough, if I add further GROUP BY queries into additional temp tables, 
> grouped by a different field, then all temp tables, even the ones that would 
> otherwise have had wrong content, are correctly populated.
> This is the misbehaving query:
> FROM nikon
> INSERT OVERWRITE TABLE e1
> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
> WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
> INSERT OVERWRITE TABLE e2
> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
> WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
> ;
> It launches only one MR job and here are the results. Why does table 'e1' 
> contain results from table 'e2'?! Table 'e1' should have been empty (see 
> individual SELECTs further below).
> hive> SELECT * from e1;
> OK
> NULL    2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 0.229 seconds
> hive> SELECT * from e2;
> OK
> NULL    2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 0.11 seconds
> Here are the results of the individual queries (only the second query 
> returns a result set):
> hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM 
> nikon
> WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
> (...)
> OK
>   <- There are no results, this is normal
> Time taken: 41.471 seconds
> hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon
> WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
> (...)
> OK
> NULL  2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 39.607 seconds
> 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Hive-trunk-h0.21 - Build # 1922 - Still Failing

2013-01-18 Thread Apache Jenkins Server
Changes for Build #1921
[namit] HIVE-3915 Union with map-only query on one side and two MR job query on 
the other
produces wrong results (Kevin Wilfong via namit)


Changes for Build #1922
[namit] HIVE-3909 Wrong data due to HIVE-2820
(Navis via namit)




1 tests failed.
REGRESSION:  
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1

Error Message:
Unexpected exception See build/ql/tmp/hive.log, or try "ant test ... 
-Dtest.silent=false" to get more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try "ant test ... -Dtest.silent=false" to get 
more logs.
at junit.framework.Assert.fail(Assert.java:47)
at 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.runTest(TestNegativeCliDriver.java:2321)
at 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1(TestNegativeCliDriver.java:1819)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:232)
at junit.framework.TestSuite.run(TestSuite.java:227)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)




The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1922)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1922/ to 
view the results.

[jira] [Commented] (HIVE-2820) Invalid tag is used for MapJoinProcessor

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557885#comment-13557885
 ] 

Hudson commented on HIVE-2820:
--

Integrated in Hive-trunk-h0.21 #1922 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1922/])
HIVE-3909 Wrong data due to HIVE-2820
(Navis via namit) (Revision 1435281)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435281
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/join_reorder4.q
* /hive/trunk/ql/src/test/results/clientpositive/join_reorder4.q.out


> Invalid tag is used for MapJoinProcessor
> 
>
> Key: HIVE-2820
> URL: https://issues.apache.org/jira/browse/HIVE-2820
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0, 0.10.0
> Environment: ubuntu
>Reporter: Navis
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2820.D1935.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2820.D1935.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2820.D1935.3.patch, HIVE-2820.D1935.4.patch
>
>
> Testing HIVE-2810, I've found that tag and alias are used in a very confusing 
> manner. For example, the query below fails:
> {code}
> hive> set hive.auto.convert.join=true;
>  
> hive> select /*+ STREAMTABLE(a) */ * from myinput1 a join myinput1 b on 
> a.key=b.key join myinput1 c on a.key=c.key;
> Total MapReduce jobs = 4
> Ended Job = 1667415037, job is filtered out (removed at runtime).
> Ended Job = 1739566906, job is filtered out (removed at runtime).
> Ended Job = 1113337780, job is filtered out (removed at runtime).
> 12/02/24 10:27:14 WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml 
> found on the CLASSPATH at /home/navis/hive/conf/hive-default.xml
> Execution log at: 
> /tmp/navis/navis_20120224102727_cafe0d8d-9b21-441d-bd4e-b83303b31cdc.log
> 2012-02-24 10:27:14   Starting to launch local task to process map join;  
> maximum memory = 932118528
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:312)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>   at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.startForward(MapredLocalTask.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:272)
>   at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:685)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Execution failed with exit status: 2
> Obtaining error information
> {code}
> Failed task has a plan which doesn't make sense.
> {noformat}
>   Stage: Stage-8
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> b 
>   Fetch Operator
> limit: -1
> c 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> b 
>   TableScan
> alias: b
> HashTable Sink Operator
>   condition expressions:
> 0 {key} {value}
> 1 {key} {value}
> 2 {key} {value}
>   handleSkewJoin: false
>   keys:
> 0 [Column[key]]
> 1 [Column[key]]
> 2 [Column[key]]
>   Position of Big Table: 0
> c 
>   TableScan
> alias: c
> Map Join Operator
>   condition map:
>In

[jira] [Commented] (HIVE-3909) Wrong data due to HIVE-2820

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557884#comment-13557884
 ] 

Hudson commented on HIVE-3909:
--

Integrated in Hive-trunk-h0.21 #1922 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1922/])
HIVE-3909 Wrong data due to HIVE-2820
(Navis via namit) (Revision 1435281)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435281
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/join_reorder4.q
* /hive/trunk/ql/src/test/results/clientpositive/join_reorder4.q.out


> Wrong data due to HIVE-2820
> ---
>
> Key: HIVE-3909
> URL: https://issues.apache.org/jira/browse/HIVE-3909
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: HIVE-3909.D8013.1.patch
>
>
> Consider the query:
> ~/hive/hive1$ more ql/src/test/queries/clientpositive/join_reorder4.q
> CREATE TABLE T1(key1 STRING, val1 STRING) STORED AS TEXTFILE;
> CREATE TABLE T2(key2 STRING, val2 STRING) STORED AS TEXTFILE;
> CREATE TABLE T3(key3 STRING, val3 STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../data/files/T1.txt' INTO TABLE T1;
> LOAD DATA LOCAL INPATH '../data/files/T2.txt' INTO TABLE T2;
> LOAD DATA LOCAL INPATH '../data/files/T3.txt' INTO TABLE T3;
> set hive.auto.convert.join=true;
> explain select /*+ STREAMTABLE(a) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(a) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> explain select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> explain select /*+ STREAMTABLE(c) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(c) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> returns:
> 2 12  2   12  2   22

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3407) Update Hive CLI xdoc (sync with CLI wikidoc)

2013-01-18 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-3407:
-

Fix Version/s: 0.10.0
Affects Version/s: 0.9.0
 Release Note: Update CLI doc to match wiki.
   Status: Patch Available  (was: Open)

Patch 1 revises two files: 

* cli.xml is synced with the wiki doc.
* site.vsl adds a division break that prevents menu indentation.

> Update Hive CLI xdoc (sync with CLI wikidoc)
> 
>
> Key: HIVE-3407
> URL: https://issues.apache.org/jira/browse/HIVE-3407
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 0.9.0
>Reporter: Lefty Leverenz
>Assignee: Lefty Leverenz
>  Labels: documentation
> Fix For: 0.10.0
>
> Attachments: HIVE-3407.1.patch
>
>
> CLI documentation for Hive exists in two places (wikidocs and xdocs) and both 
> versions are out of date, but the xdocs version is worse: 
> * [http://hive.apache.org/docs/r0.9.0/language_manual/cli.html]
> * [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
> A revised CLI wikidoc is available and will soon be exchanged for the old 
> wikidoc. Although there's some resistance to moving more of the wikidocs into 
> xdocs, for now let's have current information in both places instead of 
> removing the xdocs version.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3917) Support fast operation for analyze command

2013-01-18 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3917:
---

Description: 
Hive supports the ANALYZE command to gather statistics from existing 
tables/partitions: 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If the table/partition is big, the operation takes time, since it opens all 
files and scans all the data.

It would be nice to support a fast operation that gathers the statistics which 
do not require opening all files, such as:
1. Number of files
2. Size in Bytes






  was:
hive supports analyze command to gather statistics from existing 
tables/partition 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If table/partition is big, the operation would take time since it will open all 
files and scan all data.

It would be nice to support fast operation to gather statistics which doesn't 
require to open all files like
1.. Number of files
2. Size in Bytes







> Support fast operation for analyze command
> --
>
> Key: HIVE-3917
> URL: https://issues.apache.org/jira/browse/HIVE-3917
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 0.11.0
>Reporter: Gang Tim Liu
>Assignee: Gang Tim Liu
>
> Hive supports the ANALYZE command to gather statistics from existing 
> tables/partitions: 
> https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables
> It collects:
> 1. Number of Rows
> 2. Number of files
> 3. Size in Bytes
> If the table/partition is big, the operation takes time, since it opens 
> all files and scans all the data.
> It would be nice to support a fast operation that gathers the statistics which 
> do not require opening all files, such as:
> 1. Number of files
> 2. Size in Bytes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3917) Support fast operation for analyze command

2013-01-18 Thread Gang Tim Liu (JIRA)
Gang Tim Liu created HIVE-3917:
--

 Summary: Support fast operation for analyze command
 Key: HIVE-3917
 URL: https://issues.apache.org/jira/browse/HIVE-3917
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.11.0
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu


Hive supports the ANALYZE command to gather statistics from existing 
tables/partitions: 
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables

It collects:
1. Number of Rows
2. Number of files
3. Size in Bytes

If the table/partition is big, the operation takes time, since it opens all 
files and scans all the data.

It would be nice to support a fast operation that gathers the statistics which 
do not require opening all files, such as:
1. Number of files
2. Size in Bytes






--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2820) Invalid tag is used for MapJoinProcessor

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557833#comment-13557833
 ] 

Hudson commented on HIVE-2820:
--

Integrated in Hive-trunk-hadoop2 #73 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/73/])
HIVE-3909 Wrong data due to HIVE-2820
(Navis via namit) (Revision 1435281)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435281
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/join_reorder4.q
* /hive/trunk/ql/src/test/results/clientpositive/join_reorder4.q.out


> Invalid tag is used for MapJoinProcessor
> 
>
> Key: HIVE-2820
> URL: https://issues.apache.org/jira/browse/HIVE-2820
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0, 0.10.0
> Environment: ubuntu
>Reporter: Navis
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2820.D1935.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2820.D1935.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2820.D1935.3.patch, HIVE-2820.D1935.4.patch
>
>
> Testing HIVE-2810, I've found that tag and alias are used in a very confusing 
> manner. For example, the query below fails:
> {code}
> hive> set hive.auto.convert.join=true;
>  
> hive> select /*+ STREAMTABLE(a) */ * from myinput1 a join myinput1 b on 
> a.key=b.key join myinput1 c on a.key=c.key;
> Total MapReduce jobs = 4
> Ended Job = 1667415037, job is filtered out (removed at runtime).
> Ended Job = 1739566906, job is filtered out (removed at runtime).
> Ended Job = 1113337780, job is filtered out (removed at runtime).
> 12/02/24 10:27:14 WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml 
> found on the CLASSPATH at /home/navis/hive/conf/hive-default.xml
> Execution log at: 
> /tmp/navis/navis_20120224102727_cafe0d8d-9b21-441d-bd4e-b83303b31cdc.log
> 2012-02-24 10:27:14   Starting to launch local task to process map join;  
> maximum memory = 932118528
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:312)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>   at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.startForward(MapredLocalTask.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:272)
>   at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:685)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Execution failed with exit status: 2
> Obtaining error information
> {code}
> Failed task has a plan which doesn't make sense.
> {noformat}
>   Stage: Stage-8
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> b 
>   Fetch Operator
> limit: -1
> c 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> b 
>   TableScan
> alias: b
> HashTable Sink Operator
>   condition expressions:
> 0 {key} {value}
> 1 {key} {value}
> 2 {key} {value}
>   handleSkewJoin: false
>   keys:
> 0 [Column[key]]
> 1 [Column[key]]
> 2 [Column[key]]
>   Position of Big Table: 0
> c 
>   TableScan
> alias: c
> Map Join Operator
>   condition map:
>In

[jira] [Commented] (HIVE-3909) Wrong data due to HIVE-2820

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557832#comment-13557832
 ] 

Hudson commented on HIVE-3909:
--

Integrated in Hive-trunk-hadoop2 #73 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/73/])
HIVE-3909 Wrong data due to HIVE-2820
(Navis via namit) (Revision 1435281)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435281
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/join_reorder4.q
* /hive/trunk/ql/src/test/results/clientpositive/join_reorder4.q.out


> Wrong data due to HIVE-2820
> ---
>
> Key: HIVE-3909
> URL: https://issues.apache.org/jira/browse/HIVE-3909
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: HIVE-3909.D8013.1.patch
>
>
> Consider the query:
> ~/hive/hive1$ more ql/src/test/queries/clientpositive/join_reorder4.q
> CREATE TABLE T1(key1 STRING, val1 STRING) STORED AS TEXTFILE;
> CREATE TABLE T2(key2 STRING, val2 STRING) STORED AS TEXTFILE;
> CREATE TABLE T3(key3 STRING, val3 STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../data/files/T1.txt' INTO TABLE T1;
> LOAD DATA LOCAL INPATH '../data/files/T2.txt' INTO TABLE T2;
> LOAD DATA LOCAL INPATH '../data/files/T3.txt' INTO TABLE T3;
> set hive.auto.convert.join=true;
> explain select /*+ STREAMTABLE(a) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(a) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> explain select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> explain select /*+ STREAMTABLE(c) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(c) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> returns:
> 2 12  2   12  2   22

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3916) For outer joins, when looping over the rows looking for filtered tags, it doesn't report progress

2013-01-18 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557822#comment-13557822
 ] 

Kevin Wilfong commented on HIVE-3916:
-

Thanks Ashutosh

> For outer joins, when looping over the rows looking for filtered tags, it 
> doesn't report progress
> -
>
> Key: HIVE-3916
> URL: https://issues.apache.org/jira/browse/HIVE-3916
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3916.1.patch.txt
>
>
> In the CommonJoinOperator, for outer joins, there is a loop over every row in 
> the AbstractRowContainer checking for a filtered tag.  This can take a long 
> time if there are a large number of rows, and during this time, it does not 
> report progress.  If this runs for long enough, Hadoop will kill the task for 
> failing to report progress.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [DISCUSS] HCatalog becoming a subproject of Hive

2013-01-18 Thread Alan Gates
I've created a wiki page for my proposed changes at 
https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers

Text to be removed is struck through.  Text to be added is in italics.

Any recommended changes before we vote?

Alan.


On Jan 17, 2013, at 2:08 PM, Carl Steinbach wrote:

> Sounds like a good plan to me. Since Ashutosh is a member of both the Hive
> and HCatalog PMCs it probably makes more sense for him to call the vote,
> but I'm willing to do it too.
> 
> On Wed, Jan 16, 2013 at 8:24 AM, Alan Gates  wrote:
> 
>> If you think that's the best path forward that's fine.  I can't call a
>> vote I don't think, since I'm not part of the Hive PMC.  But I'm happy to
>> draft a resolution for you and then let you call the vote.  Should I do
>> that?
>> 
>> Alan.
>> 
>> On Jan 11, 2013, at 4:34 PM, Carl Steinbach wrote:
>> 
>>> Hi Alan,
>>> 
>>> I agree that submitting this for a vote is the best option.
>>> 
>>> If anyone has additional proposed modifications please make them.
>> Otherwise I propose that the Hive PMC vote on this proposal.
>>> 
>>> In order for the Hive PMC to be able to vote on these changes they need
>> to be expressed in terms of one or more of the "actions" listed at the end
>> of the Hive project bylaws:
>>> 
>>> https://cwiki.apache.org/confluence/display/Hive/Bylaws
>>> 
>>> So I think we first need to amend the bylaws in order to define the
>> rights and privileges of a submodule committer, and then separately vote
>> the HCatalog committers in as Hive submodule committers. Does this make
>> sense?
>>> 
>>> Thanks.
>>> 
>>> Carl
>>> 
>> 
>> 



[jira] [Updated] (HIVE-3825) Add Operator level Hooks

2013-01-18 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3825:


Attachment: HIVE-3825.patch.6.txt

> Add Operator level Hooks
> 
>
> Key: HIVE-3825
> URL: https://issues.apache.org/jira/browse/HIVE-3825
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3825.2.patch.txt, HIVE-3825.3.patch.txt, 
> HIVE-3825.patch.4.txt, HIVE-3825.patch.5.txt, HIVE-3825.patch.6.txt, 
> HIVE-3825.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3825) Add Operator level Hooks

2013-01-18 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3825:


Attachment: (was: HIVE-3825.patch.6.txt)

> Add Operator level Hooks
> 
>
> Key: HIVE-3825
> URL: https://issues.apache.org/jira/browse/HIVE-3825
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3825.2.patch.txt, HIVE-3825.3.patch.txt, 
> HIVE-3825.patch.4.txt, HIVE-3825.patch.5.txt, HIVE-3825.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3915) Union with map-only query on one side and two MR job query on the other produces wrong results

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557789#comment-13557789
 ] 

Hudson commented on HIVE-3915:
--

Integrated in Hive-trunk-hadoop2 #72 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/72/])
HIVE-3915 Union with map-only query on one side and two MR job query on the 
other
produces wrong results (Kevin Wilfong via namit) (Revision 1435203)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435203
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
* /hive/trunk/ql/src/test/queries/clientpositive/union33.q
* /hive/trunk/ql/src/test/results/clientpositive/union33.q.out


> Union with map-only query on one side and two MR job query on the other 
> produces wrong results
> --
>
> Key: HIVE-3915
> URL: https://issues.apache.org/jira/browse/HIVE-3915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.11.0
>
> Attachments: HIVE-3915.1.patch.txt
>
>
> When a query contains a union with a map only subquery on one side and a 
> subquery involving two sequential map reduce jobs on the other, it can 
> produce wrong results.  It appears that if the map-only query's table scan 
> operator is processed first, the task involving a union is made a root task.  
> Then when the other subquery is processed, the second map reduce job gains 
> the task involving the union as a child and it is made a root task.  This 
> means that both the first and second map reduce jobs are root tasks, so the 
> dependency between the two is ignored.  If they are run in parallel (i.e. the 
> cluster has more than one node) no results will be produced for the side of 
> the union with the two map reduce jobs and only the results of the other side 
> of the union will be returned.
> The order TableScan operators are processed is crucial to reproducing this 
> bug, and it is determined by the order values are retrieved from a map, and 
> hence hard to predict, so it doesn't always reproduce.
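
For illustration, the affected query shape is any union of a map-only subquery 
with a subquery that needs two chained MR jobs; a sketch with hypothetical 
tables src1 and src2 (both assumed to have string key/value columns) would be:

SELECT u.key, u.value FROM (
  SELECT key, value FROM src1                              -- map-only side
  UNION ALL
  SELECT key, MAX(value) AS value FROM (
    SELECT key, MIN(value) AS value FROM src2 GROUP BY key -- first MR job
  ) t GROUP BY key                                         -- second MR job
) u;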

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3916) For outer joins, when looping over the rows looking for filtered tags, it doesn't report progress

2013-01-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557786#comment-13557786
 ] 

Ashutosh Chauhan commented on HIVE-3916:


Kevin,
If you are doing outer joins, you may also want to be wary of HIVE-3381 and 
HIVE-2839

> For outer joins, when looping over the rows looking for filtered tags, it 
> doesn't report progress
> -
>
> Key: HIVE-3916
> URL: https://issues.apache.org/jira/browse/HIVE-3916
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3916.1.patch.txt
>
>
> In the CommonJoinOperator, for outer joins, there is a loop over every row in 
> the AbstractRowContainer checking for a filtered tag.  This can take a long 
> time if there are a large number of rows, and during this time, it does not 
> report progress.  If this runs for long enough, Hadoop will kill the task for 
> failing to report progress.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3916) For outer joins, when looping over the rows looking for filtered tags, it doesn't report progress

2013-01-18 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3916:


Status: Patch Available  (was: Open)

> For outer joins, when looping over the rows looking for filtered tags, it 
> doesn't report progress
> -
>
> Key: HIVE-3916
> URL: https://issues.apache.org/jira/browse/HIVE-3916
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3916.1.patch.txt
>
>
> In the CommonJoinOperator, for outer joins, there is a loop over every row in 
> the AbstractRowContainer checking for a filtered tag.  This can take a long 
> time if there are a large number of rows, and during this time, it does not 
> report progress.  If this runs for long enough, Hadoop will kill the task for 
> failing to report progress.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3916) For outer joins, when looping over the rows looking for filtered tags, it doesn't report progress

2013-01-18 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3916:


Attachment: HIVE-3916.1.patch.txt

> For outer joins, when looping over the rows looking for filtered tags, it 
> doesn't report progress
> -
>
> Key: HIVE-3916
> URL: https://issues.apache.org/jira/browse/HIVE-3916
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3916.1.patch.txt
>
>
> In the CommonJoinOperator, for outer joins, there is a loop over every row in 
> the AbstractRowContainer checking for a filtered tag.  This can take a long 
> time if there are a large number of rows, and during this time, it does not 
> report progress.  If this runs for long enough, Hadoop will kill the task for 
> failing to report progress.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3916) For outer joins, when looping over the rows looking for filtered tags, it doesn't report progress

2013-01-18 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557757#comment-13557757
 ] 

Kevin Wilfong commented on HIVE-3916:
-

https://reviews.facebook.net/D8031

> For outer joins, when looping over the rows looking for filtered tags, it 
> doesn't report progress
> -
>
> Key: HIVE-3916
> URL: https://issues.apache.org/jira/browse/HIVE-3916
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
>
> In the CommonJoinOperator, for outer joins, there is a loop over every row in 
> the AbstractRowContainer checking for a filtered tag.  This can take a long 
> time if there are a large number of rows, and during this time, it does not 
> report progress.  If this runs for long enough, Hadoop will kill the task for 
> failing to report progress.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-3916) For outer joins, when looping over the rows looking for filtered tags, it doesn't report progress

2013-01-18 Thread Kevin Wilfong (JIRA)
Kevin Wilfong created HIVE-3916:
---

 Summary: For outer joins, when looping over the rows looking for 
filtered tags, it doesn't report progress
 Key: HIVE-3916
 URL: https://issues.apache.org/jira/browse/HIVE-3916
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong


In the CommonJoinOperator, for outer joins, there is a loop over every row in 
the AbstractRowContainer checking for a filtered tag.  This can take a long 
time if there are a large number of rows, and during this time, it does not 
report progress.  If this runs for long enough, Hadoop will kill the task for 
failing to report progress.
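
If the loop periodically pings Hadoop's progress mechanism, the task stays alive 
even when a single key group buffers a huge number of rows. Below is a minimal 
sketch of that idea; it assumes a Reporter handle is available and uses a made-up 
row/tag representation, not Hive's actual CommonJoinOperator internals.

{code}
import org.apache.hadoop.mapred.Reporter;

import java.util.List;

public class FilterTagScan {
  // Hypothetical interval; reporting on every row adds overhead, while
  // reporting too rarely risks hitting the task timeout.
  private static final long REPORT_INTERVAL = 1000L;

  /**
   * Returns true if any buffered row carries a filtered tag, calling
   * reporter.progress() periodically so Hadoop does not kill the task
   * for inactivity. The row/tag layout here is illustrative only.
   */
  public static boolean anyFiltered(List<short[]> rows, Reporter reporter) {
    long seen = 0;
    for (short[] tags : rows) {
      if (tags != null && tags.length > 0 && tags[0] != 0) {
        return true;               // found a filtered row, stop scanning early
      }
      if (++seen % REPORT_INTERVAL == 0) {
        reporter.progress();       // heartbeat to the task tracker
      }
    }
    return false;
  }
}
{code}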

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3884) Better align columns in DESCRIBE table_name output to make more human-readable

2013-01-18 Thread Dilip Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dilip Joseph updated HIVE-3884:
---

Status: Patch Available  (was: Open)

> Better align columns in DESCRIBE table_name output to make more human-readable
> --
>
> Key: HIVE-3884
> URL: https://issues.apache.org/jira/browse/HIVE-3884
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.9.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: describe_test_table.png, HIVE-3884.1.patch.txt, 
> HIVE-3884.2.patch.txt
>
>
> If a table contains very long comments or very long column names, the output 
> of DESCRIBE table_name is not aligned nicely.  The attached screenshot shows 
> the following two problems:
> 1. Rows with long column names do not align well with other columns.
> 2. Rows with long comments wrap to the next line, and make it hard to read 
> the output.  The wrapping behavior depends on the width of the user's 
> terminal.
> It will be nice to have a DESCRIBE PRETTY table_name command that will 
> produce nicely formatted output that avoids the two problems mentioned above. 
>  It is better to introduce a new DESCRIBE PRETTY command rather than change 
> the behavior of the existing DESCRIBE or DESCRIBE FORMATTED commands, so that 
> we avoid breaking any scripts that automatically parse the output.
> Since the pretty formatting depends on the current terminal width, we need a 
> new hive conf parameter to tell the CLI to auto-detect the current terminal 
> width or to use a fixed width (needed for unit tests).
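
For the wrapping half of the problem, a small helper that breaks comment text on 
whitespace at a given width is enough to illustrate the idea. The class name and 
the width used below are hypothetical; they are not the patch's actual code or 
conf parameter.

{code}
import java.util.ArrayList;
import java.util.List;

public class CommentWrapper {
  /**
   * Wraps text on whitespace so that no line exceeds maxWidth characters
   * (a single word longer than maxWidth stays on its own line). maxWidth
   * would come from the detected or configured terminal width minus the
   * space already taken by the name and type columns.
   */
  public static List<String> wrap(String text, int maxWidth) {
    List<String> lines = new ArrayList<String>();
    StringBuilder current = new StringBuilder();
    for (String word : text.split("\\s+")) {
      if (current.length() > 0 && current.length() + 1 + word.length() > maxWidth) {
        lines.add(current.toString());
        current.setLength(0);
      }
      if (current.length() > 0) {
        current.append(' ');
      }
      current.append(word);
    }
    if (current.length() > 0) {
      lines.add(current.toString());
    }
    return lines;
  }

  public static void main(String[] args) {
    for (String line : wrap("a very long column comment that would otherwise "
        + "wrap awkwardly at the right edge of the terminal", 40)) {
      System.out.println(line);
    }
  }
}
{code}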

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3884) Better align columns in DESCRIBE table_name output to make more human-readable

2013-01-18 Thread Dilip Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dilip Joseph updated HIVE-3884:
---

Attachment: HIVE-3884.2.patch.txt

Added comment about tabs at end of comment output.  Minor refactoring.

> Better align columns in DESCRIBE table_name output to make more human-readable
> --
>
> Key: HIVE-3884
> URL: https://issues.apache.org/jira/browse/HIVE-3884
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.9.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: describe_test_table.png, HIVE-3884.1.patch.txt, 
> HIVE-3884.2.patch.txt
>
>
> If a table contains very long comments or very long column names, the output 
> of DESCRIBE table_name is not aligned nicely.  The attached screenshot shows 
> the following two problems:
> 1. Rows with long column names do not align well with other columns.
> 2. Rows with long comments wrap to the next line, and make it hard to read 
> the output.  The wrapping behavior depends on the width of the user's 
> terminal.
> It will be nice to have a DESCRIBE PRETTY table_name command that will 
> produce nicely formatted output that avoids the two problems mentioned above. 
>  It is better to introduce a new DESCRIBE PRETTY command rather than change 
> the behavior of the existing DESCRIBE or DESCRIBE FORMATTED commands, so that 
> we avoid breaking any scripts that automatically parse the output.
> Since the pretty formatting depends on the current terminal width, we need a 
> new hive conf parameter to tell the CLI to auto-detect the current terminal 
> width or to use a fixed width (needed for unit tests).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

2013-01-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557700#comment-13557700
 ] 

Alan Gates commented on HIVE-896:
-

bq. If I read this right you are using CLUSTER BY and SORT BY instead of 
PARTITION BY and ORDER BY for syntax in OVER. Why?  To highlight the 
similarity. The Partition/Order specs in a Window clause have the same meaning 
as Cluster/Distribute in HQL. 
This is only true as long as you have only one OVER clause, right?  As soon as 
you add the ability to have separate OVER clauses partitioning by different 
keys (which users will want very soon) you lose this identity.

Even if you decide to retain this I would argue that the standard PARTITION 
BY/ORDER BY syntax should be accepted as well.  HQL already has enough one-off 
syntax that makes life hard for people coming from more standard SQL.  It 
should not be exacerbated.

bq. Could you explain how the partition is handled in memory...
Partitions are backed by a Persistent List (see 
ptf.ds.PartitionedByteBasedList). We need to do some work to refactor this 
package. Yes, you are right: work can be done to delay bringing rows into a 
partition and to discard rows once they fall outside the window. This is true for 
the Windowing Table Function, especially for range-based windows.
But for a general PTF the contract is partition in, partition out. For example, a 
CandidateFrequency function will read the rows in a partition multiple times.

This is part of where I was going with my earlier question on why a windowing 
function would ever return a partition.  I am becoming less convinced that it 
makes sense to combine windowing and partition functions.  While they both take 
partitions as inputs they return different things.  Partition functions return 
partitions and windowing functions return a single value.  As you point out 
here the partition functions will also not be interested in the range limiting 
features of windowing functions.  But taking advantage of this in windowing 
functions will be very important for performance optimizations, I suspect.  At 
the very least it seems like partitioning functions and windowing functions 
should be presented as separate entities to users and UDF writers, even if for 
now Hive shares some of the framework for handling them underneath.  This way 
in the future optimizations and new features can be added in a way that is 
advantageous for each.

> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---
>
> Key: HIVE-896
> URL: https://issues.apache.org/jira/browse/HIVE-896
> Project: Hive
>  Issue Type: New Feature
>  Components: OLAP, UDF
>Reporter: Amr Awadallah
>Priority: Minor
> Attachments: HIVE-896.1.patch.txt
>
>
> Windowing functions are very useful for click stream processing and similar 
> time-series/sliding-window analytics.
> More details at:
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
> -- amr

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #265

2013-01-18 Thread Apache Jenkins Server
See 

--
[...truncated 36495 lines...]
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2013-01-18_14-29-51_442_7615735881161873409/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] Copying file: 

[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 

[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/jenkins/hive_2013-01-18_14-29-55_743_3073584927146122645/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2013-01-18_14-29-55_743_3073584927146122645/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=
[junit] Hive history 
file=
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (key

[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-18 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-3874:


Attachment: orc.tgz

I've updated the patch with the index suppression option that Namit asked for.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx, orc.tgz, orc.tgz
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2820) Invalid tag is used for MapJoinProcessor

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557578#comment-13557578
 ] 

Hudson commented on HIVE-2820:
--

Integrated in hive-trunk-hadoop1 #25 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/25/])
HIVE-3909 Wrong data due to HIVE-2820
(Navis via namit) (Revision 1435281)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435281
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/join_reorder4.q
* /hive/trunk/ql/src/test/results/clientpositive/join_reorder4.q.out


> Invalid tag is used for MapJoinProcessor
> 
>
> Key: HIVE-2820
> URL: https://issues.apache.org/jira/browse/HIVE-2820
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0, 0.10.0
> Environment: ubuntu
>Reporter: Navis
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2820.D1935.1.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2820.D1935.2.patch, 
> ASF.LICENSE.NOT.GRANTED--HIVE-2820.D1935.3.patch, HIVE-2820.D1935.4.patch
>
>
> Testing HIVE-2810, I've found that tag and alias are used in a very confusing 
> manner. For example, the query below fails:
> {code}
> hive> set hive.auto.convert.join=true;
>  
> hive> select /*+ STREAMTABLE(a) */ * from myinput1 a join myinput1 b on 
> a.key=b.key join myinput1 c on a.key=c.key;
> Total MapReduce jobs = 4
> Ended Job = 1667415037, job is filtered out (removed at runtime).
> Ended Job = 1739566906, job is filtered out (removed at runtime).
> Ended Job = 1113337780, job is filtered out (removed at runtime).
> 12/02/24 10:27:14 WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml 
> found on the CLASSPATH at /home/navis/hive/conf/hive-default.xml
> Execution log at: 
> /tmp/navis/navis_20120224102727_cafe0d8d-9b21-441d-bd4e-b83303b31cdc.log
> 2012-02-24 10:27:14   Starting to launch local task to process map join;  
> maximum memory = 932118528
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:312)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>   at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>   at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.startForward(MapredLocalTask.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:272)
>   at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:685)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Execution failed with exit status: 2
> Obtaining error information
> {code}
> The failed task has a plan which doesn't make sense.
> {noformat}
>   Stage: Stage-8
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> b 
>   Fetch Operator
> limit: -1
> c 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> b 
>   TableScan
> alias: b
> HashTable Sink Operator
>   condition expressions:
> 0 {key} {value}
> 1 {key} {value}
> 2 {key} {value}
>   handleSkewJoin: false
>   keys:
> 0 [Column[key]]
> 1 [Column[key]]
> 2 [Column[key]]
>   Position of Big Table: 0
> c 
>   TableScan
> alias: c
> Map Join Operator
>   condition map:
>In

[jira] [Commented] (HIVE-3909) Wrong data due to HIVE-2820

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557577#comment-13557577
 ] 

Hudson commented on HIVE-3909:
--

Integrated in hive-trunk-hadoop1 #25 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/25/])
HIVE-3909 Wrong data due to HIVE-2820
(Navis via namit) (Revision 1435281)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435281
Files : 
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ConditionalTask.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
* 
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
* /hive/trunk/ql/src/test/queries/clientpositive/join_reorder4.q
* /hive/trunk/ql/src/test/results/clientpositive/join_reorder4.q.out


> Wrong data due to HIVE-2820
> ---
>
> Key: HIVE-3909
> URL: https://issues.apache.org/jira/browse/HIVE-3909
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: HIVE-3909.D8013.1.patch
>
>
> Consider the query:
> ~/hive/hive1$ more ql/src/test/queries/clientpositive/join_reorder4.q
> CREATE TABLE T1(key1 STRING, val1 STRING) STORED AS TEXTFILE;
> CREATE TABLE T2(key2 STRING, val2 STRING) STORED AS TEXTFILE;
> CREATE TABLE T3(key3 STRING, val3 STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../data/files/T1.txt' INTO TABLE T1;
> LOAD DATA LOCAL INPATH '../data/files/T2.txt' INTO TABLE T2;
> LOAD DATA LOCAL INPATH '../data/files/T3.txt' INTO TABLE T3;
> set hive.auto.convert.join=true;
> explain select /*+ STREAMTABLE(a) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(a) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> explain select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> explain select /*+ STREAMTABLE(c) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(c) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> returns:
> 2 12  2   12  2   22

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3638) metadataonly1.q test fails with Hadoop23

2013-01-18 Thread Chris Drome (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557500#comment-13557500
 ] 

Chris Drome commented on HIVE-3638:
---

[~ashutoshc]: Can we keep this open until I can get approval from my manager 
that it is a non-issue when we move to 2.0? Currently we are still building on 
0.23, so it is an issue for us.

> metadataonly1.q test fails with Hadoop23
> 
>
> Key: HIVE-3638
> URL: https://issues.apache.org/jira/browse/HIVE-3638
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.10.0, 0.9.1
>Reporter: Chris Drome
>
> Hive creates an empty file as a hack to get Hadoop to run a mapper.
> This no longer works with Hadoop23, causing this test to fail. Note that this 
> tests empty partitions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3825) Add Operator level Hooks

2013-01-18 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3825:


Status: Patch Available  (was: Open)

> Add Operator level Hooks
> 
>
> Key: HIVE-3825
> URL: https://issues.apache.org/jira/browse/HIVE-3825
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3825.2.patch.txt, HIVE-3825.3.patch.txt, 
> HIVE-3825.patch.4.txt, HIVE-3825.patch.5.txt, HIVE-3825.patch.6.txt, 
> HIVE-3825.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3825) Add Operator level Hooks

2013-01-18 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3825:


Attachment: HIVE-3825.patch.6.txt

updated patch 

> Add Operator level Hooks
> 
>
> Key: HIVE-3825
> URL: https://issues.apache.org/jira/browse/HIVE-3825
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pamela Vagata
>Assignee: Pamela Vagata
>Priority: Minor
> Attachments: HIVE-3825.2.patch.txt, HIVE-3825.3.patch.txt, 
> HIVE-3825.patch.4.txt, HIVE-3825.patch.5.txt, HIVE-3825.patch.6.txt, 
> HIVE-3825.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3036) hive should support BigDecimal datatype

2013-01-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3036:
---

Fix Version/s: 0.11.0

> hive should support BigDecimal datatype
> ---
>
> Key: HIVE-3036
> URL: https://issues.apache.org/jira/browse/HIVE-3036
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor, Types
>Affects Versions: 0.7.1, 0.8.0, 0.8.1
>Reporter: Anurag Tangri
> Fix For: 0.11.0
>
>
> Hive has support for BIGINT, but people have use cases where they need 
> decimal precision out to a large number of digits.
> The values in question are like decimal(x,y),
> e.g. a decimal of the form (17,6), which cannot be represented by float/double.
>  
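
To make the precision limitation concrete, here is a small, self-contained 
illustration in plain Java (not Hive code) of a 17-digit value with scale 6 that 
double silently rounds but java.math.BigDecimal preserves. A DECIMAL column type 
backed by an arbitrary-precision representation avoids this rounding.

{code}
import java.math.BigDecimal;

public class DecimalPrecisionDemo {
  public static void main(String[] args) {
    String literal = "99999999999.999999";   // 17 significant digits, scale 6

    double asDouble = Double.parseDouble(literal);
    BigDecimal asDecimal = new BigDecimal(literal);

    // The double rounds to 1.0E11 because a 53-bit mantissa cannot
    // distinguish this value from 100000000000.0 at this magnitude.
    System.out.println("double     : " + asDouble);    // prints 1.0E11
    System.out.println("BigDecimal : " + asDecimal);   // prints 99999999999.999999
  }
}
{code}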

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3036) hive should support BigDecimal datatype

2013-01-18 Thread Mikhail Bautin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557476#comment-13557476
 ] 

Mikhail Bautin commented on HIVE-3036:
--

Actually this seems to be a duplicate of HIVE-2693.

> hive should support BigDecimal datatype
> ---
>
> Key: HIVE-3036
> URL: https://issues.apache.org/jira/browse/HIVE-3036
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor, Types
>Affects Versions: 0.7.1, 0.8.0, 0.8.1
>Reporter: Anurag Tangri
>
> Hive has support for BIGINT, but people have use cases where they need 
> decimal precision out to a large number of digits.
> The values in question are like decimal(x,y),
> e.g. a decimal of the form (17,6), which cannot be represented by float/double.
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-18 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-3874:


Attachment: orc.tgz

Here's the current version of the code. The seek to row isn't implemented and 
it is still a standalone project, but it will let people start looking at it.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx, orc.tgz
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557416#comment-13557416
 ] 

Ashutosh Chauhan commented on HIVE-2206:


Oh.. I see that you have already updated RB and jira. I will take a look at it 
soon.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
> HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
> HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
> HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
> HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
> testQueries.2.q, YSmartPatchForHive.patch
>
>
> This issue proposes a new logical optimizer called Correlation Optimizer, 
> which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
> job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
> paper and slides of YSmart are linked at the bottom.
> Since Hive translates queries in a sentence-by-sentence fashion, for every 
> operation which may need to shuffle the data (e.g. join and aggregation 
> operations), Hive will generate a MapReduce job for that operation. However, 
> those shuffle operations may involve the correlations explained below and thus 
> can be executed in a single MR job.
> # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
> input relation sets are not disjoint;
> # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
> have not only input correlation, but also the same partition key;
> # Job Flow Correlation: An MR has job flow correlation (JFC) with one of its 
> child nodes if it has the same partition key as that child node.
> The current implementation of the correlation optimizer only detects 
> correlations among MR jobs for reduce-side join operators and reduce-side 
> aggregation operators (not map-only aggregation). A query will be optimized if 
> it satisfies the following conditions.
> # There exists an MR job for a reduce-side join operator or reduce-side 
> aggregation operator which has JFC with all of its parent MR jobs (TCs will 
> also be exploited if JFC exists);
> # All input tables of those correlated MR jobs are original input tables (not 
> intermediate tables generated by sub-queries); and 
> # No self join is involved in those correlated MR jobs.
> The correlation optimizer is implemented as a logical optimizer. The main 
> reasons are that it only needs to manipulate the query plan tree and it can 
> leverage the existing components for generating MR jobs.
> The current implementation can serve as a framework for correlation-related 
> optimizations. I think that it is better than adding individual optimizers. 
> There is further work that can be done in the future to improve this 
> optimizer. Here are three examples.
> # Support queries that only involve TC;
> # Support queries in which the input tables of correlated MR jobs involve 
> intermediate tables; and 
> # Optimize queries involving self join. 
> References:
> Paper and presentation of YSmart.
> Paper: 
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> Slides: http://sdrv.ms/UpwJJc
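
As a rough illustration of the three correlation types defined above, here is a 
self-contained sketch using hypothetical job descriptors; these are not the 
optimizer's actual classes, just the definitions restated as predicates.

{code}
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class MrJobDesc {
  final Set<String> inputTables;     // input tables read by the job
  final List<String> partitionKeys;  // columns the job shuffles (partitions) on

  MrJobDesc(Set<String> inputTables, List<String> partitionKeys) {
    this.inputTables = inputTables;
    this.partitionKeys = partitionKeys;
  }
}

final class CorrelationCheck {
  // Input correlation (IC): the jobs' input relation sets are not disjoint.
  static boolean inputCorrelated(MrJobDesc a, MrJobDesc b) {
    return !Collections.disjoint(a.inputTables, b.inputTables);
  }

  // Transit correlation (TC): input correlation plus the same partition key.
  static boolean transitCorrelated(MrJobDesc a, MrJobDesc b) {
    return inputCorrelated(a, b) && a.partitionKeys.equals(b.partitionKeys);
  }

  // Job flow correlation (JFC): a job shares its partition key with a child job.
  static boolean jobFlowCorrelated(MrJobDesc parent, MrJobDesc child) {
    return parent.partitionKeys.equals(child.partitionKeys);
  }
}
{code}

A reduce-side join or aggregation job whose JFC holds against all of its parent 
jobs is the kind of candidate the description above says can be merged into a 
single MR job.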

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2013-01-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557411#comment-13557411
 ] 

Ashutosh Chauhan commented on HIVE-2206:


I am having second thoughts on cloning. Cloning graphs (like the query plan) or 
dense structures (like ParseContext) is fraught with perils. It's likely that 
cloning will require new code and arguably introduce hard-to-detect bugs, since 
we need to track down every single pointer and clone all the way through. I 
think that to avoid such issues, and for simplicity, we can drop the cloning 
idea. The feature is anyway behind a config option which is off by default, so 
the query plan will be modified only for the users who turn the flag on. 
Yin, if you have addressed my other comments, can you update the patch on RB 
and upload it here on jira? I will take another look at it.

> add a new optimizer for query correlation discovery and optimization
> 
>
> Key: HIVE-2206
> URL: https://issues.apache.org/jira/browse/HIVE-2206
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.10.0
>Reporter: He Yongqiang
>Assignee: Yin Huai
> Attachments: HIVE-2206.10-r1384442.patch.txt, 
> HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
> HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
> HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
> HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
> HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
> HIVE-2206.20-r1434012.patch.txt, HIVE-2206.2.patch.txt, 
> HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, 
> HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, 
> HIVE-2206.8.r1224646.patch.txt, HIVE-2206.8-r1237253.patch.txt, 
> testQueries.2.q, YSmartPatchForHive.patch
>
>
> This issue proposes a new logical optimizer called Correlation Optimizer, 
> which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
> job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
> paper and slides of YSmart are linked at the bottom.
> Since Hive translates queries in a sentence-by-sentence fashion, for every 
> operation which may need to shuffle the data (e.g. join and aggregation 
> operations), Hive will generate a MapReduce job for that operation. However, 
> those shuffle operations may involve the correlations explained below and thus 
> can be executed in a single MR job.
> # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
> input relation sets are not disjoint;
> # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
> have not only input correlation, but also the same partition key;
> # Job Flow Correlation: An MR has job flow correlation (JFC) with one of its 
> child nodes if it has the same partition key as that child node.
> The current implementation of the correlation optimizer only detects 
> correlations among MR jobs for reduce-side join operators and reduce-side 
> aggregation operators (not map-only aggregation). A query will be optimized if 
> it satisfies the following conditions.
> # There exists an MR job for a reduce-side join operator or reduce-side 
> aggregation operator which has JFC with all of its parent MR jobs (TCs will 
> also be exploited if JFC exists);
> # All input tables of those correlated MR jobs are original input tables (not 
> intermediate tables generated by sub-queries); and 
> # No self join is involved in those correlated MR jobs.
> The correlation optimizer is implemented as a logical optimizer. The main 
> reasons are that it only needs to manipulate the query plan tree and it can 
> leverage the existing components for generating MR jobs.
> The current implementation can serve as a framework for correlation-related 
> optimizations. I think that it is better than adding individual optimizers. 
> There is further work that can be done in the future to improve this 
> optimizer. Here are three examples.
> # Support queries that only involve TC;
> # Support queries in which the input tables of correlated MR jobs involve 
> intermediate tables; and 
> # Optimize queries involving self join. 
> References:
> Paper and presentation of YSmart.
> Paper: 
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> Slides: http://sdrv.ms/UpwJJc

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-3909) Wrong data due to HIVE-2820

2013-01-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-3909.
--

   Resolution: Fixed
Fix Version/s: 0.11.0
 Hadoop Flags: Reviewed

Committed. Thanks Navis

> Wrong data due to HIVE-2820
> ---
>
> Key: HIVE-3909
> URL: https://issues.apache.org/jira/browse/HIVE-3909
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: HIVE-3909.D8013.1.patch
>
>
> Consider the query:
> ~/hive/hive1$ more ql/src/test/queries/clientpositive/join_reorder4.q
> CREATE TABLE T1(key1 STRING, val1 STRING) STORED AS TEXTFILE;
> CREATE TABLE T2(key2 STRING, val2 STRING) STORED AS TEXTFILE;
> CREATE TABLE T3(key3 STRING, val3 STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../data/files/T1.txt' INTO TABLE T1;
> LOAD DATA LOCAL INPATH '../data/files/T2.txt' INTO TABLE T2;
> LOAD DATA LOCAL INPATH '../data/files/T3.txt' INTO TABLE T3;
> set hive.auto.convert.join=true;
> explain select /*+ STREAMTABLE(a) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(a) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> explain select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> explain select /*+ STREAMTABLE(c) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(c) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> returns:
> 2 12  2   12  2   22

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-18 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557353#comment-13557353
 ] 

Owen O'Malley commented on HIVE-3874:
-

Yin, large stripes (and I'm defaulting to 250MB) enable efficient reads from 
HDFS. The row indexes help address the issue of the large stripes by providing 
the offsets within the large stripes.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-18 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557341#comment-13557341
 ] 

Owen O'Malley commented on HIVE-3874:
-

Joydeep, I've used a two-level strategy:
  * large stripes (default 250MB) to enable large, efficient reads
  * relatively frequent row index entries (default 10k rows) to enable skipping 
within a stripe

The row index entries have the locations within each column to enable seeking 
to the right compression block and byte within the decompressed block.

I obviously did consider HFile, although from a practical point of view it is 
fairly embedded within HBase. Additionally, since it treats each of the columns 
as bytes it can't do any type-specific encodings/compression and can't 
interpret the column values, which is critical for performance.

Once you have the ability to skip large sets of rows based on the filter 
predicates, you can sort the table on the secondary keys and achieve a large 
speed up. For example, if your primary partition is transaction date, you might 
want to sort the table on state, zip, and last name. Then if you are looking 
for just the records in CA it won't need to read the records for the other 
states.
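
A simplified sketch of how such row index entries could be used to seek within a 
stripe. The field names and the 10,000-row spacing follow the description above, 
but the code is illustrative rather than the actual ORC reader. Kept per column, 
the same positions are what let a reader skip a whole 10k-row group when a 
predicate on a sorted column rules it out.

{code}
import java.util.List;

final class RowIndexEntry {
  final long compressedBlockOffset;  // start of the compression block in the stream
  final int uncompressedByteOffset;  // byte position inside the decompressed block

  RowIndexEntry(long compressedBlockOffset, int uncompressedByteOffset) {
    this.compressedBlockOffset = compressedBlockOffset;
    this.uncompressedByteOffset = uncompressedByteOffset;
  }
}

final class StripeSeeker {
  static final long ROWS_PER_INDEX_ENTRY = 10000L;  // default spacing described above

  /**
   * Picks the index entry to start from for a target row within a stripe:
   * jump to entry floor(row / 10k), decompress from its recorded offsets,
   * then skip forward at most 9,999 rows instead of scanning the stripe.
   */
  static RowIndexEntry entryFor(List<RowIndexEntry> index, long rowInStripe) {
    int slot = (int) (rowInStripe / ROWS_PER_INDEX_ENTRY);
    return index.get(Math.min(slot, index.size() - 1));
  }
}
{code}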




> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RC File format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes.
> * there is no mechanism for storing light weight indexes within the file to 
> enable push-down filters to skip entire row groups.
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


hive-trunk-hadoop1 - Build # 24 - Failure

2013-01-18 Thread Apache Jenkins Server
Changes for Build #1

Changes for Build #2

Changes for Build #3

Changes for Build #4
[kevinwilfong] HIVE-3552. performant manner for performing 
cubes/rollups/grouping sets for a high number of grouping set keys.


Changes for Build #5

Changes for Build #6
[cws] HIVE-3875. Negative value for hive.stats.ndv.error should be disallowed 
(Shreepadma Venugopalan via cws)


Changes for Build #7
[namit] HIVE-3888 wrong mapside groupby if no partition is being selected
(Namit Jain via Ashutosh and namit)


Changes for Build #8

Changes for Build #9

Changes for Build #10
[kevinwilfong] HIVE-3803. explain dependency should show the dependencies 
hierarchically in presence of views. (njain via kevinwilfong)


Changes for Build #11

Changes for Build #12
[namit] HIVE-3824 bug if different serdes are used for different partitions
(Namit Jain via Ashutosh and namit)


Changes for Build #13

Changes for Build #14
[hashutosh] HIVE-3004 : RegexSerDe should support other column types in 
addition to STRING (Shreepadma Venugoplan via Ashutosh Chauhan)


Changes for Build #15
[hashutosh] HIVE-2439 : Upgrade antlr version to 3.4 (Thiruvel Thirumoolan via 
Ashutosh Chauhan)


Changes for Build #16
[namit] HIVE-3897 Add a way to get the uncompressed/compressed sizes of columns
from an RC File (Kevin Wilfong via namit)


Changes for Build #17
[namit] HIVE-3899 Partition pruning fails on  =  expression
(Kevin Wilfong via namit)


Changes for Build #18
[hashutosh] HIVE-2820 : Invalid tag is used for MapJoinProcessor (Navis via 
Ashutosh Chauhan)

[namit] HIVE-3872 MAP JOIN for VIEW thorws NULL pointer exception error
(Navis via namit)


Changes for Build #19
[cws] Add DECIMAL data type (Josh Wills, Vikram Dixit, Prasad Mujumdar, Mark 
Grover and Gunther Hagleitner via cws)


Changes for Build #20
[namit] HIVE-3852 Multi-groupby optimization fails when same distinct column is
used twice or more (Navis via namit)


Changes for Build #21
[namit] HIVE-3898 getReducersBucketing in SemanticAnalyzer may return more than 
the
max number of reducers (Kevin Wilfong via namit)


Changes for Build #22

Changes for Build #23
[namit] HIVE-3893 something wrong with the hive-default.xml
(jet cheng via namit)


Changes for Build #24
[namit] HIVE-3915 Union with map-only query on one side and two MR job query on 
the other
produces wrong results (Kevin Wilfong via namit)




No tests ran.

The Apache Jenkins build system has built hive-trunk-hadoop1 (build #24)

Status: Failure

Check console output at https://builds.apache.org/job/hive-trunk-hadoop1/24/ to 
view the results.

Hive-trunk-h0.21 - Build # 1921 - Failure

2013-01-18 Thread Apache Jenkins Server
Changes for Build #1921
[namit] HIVE-3915 Union with map-only query on one side and two MR job query on 
the other
produces wrong results (Kevin Wilfong via namit)




No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1921)

Status: Failure

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1921/ to 
view the results.

[jira] [Commented] (HIVE-3915) Union with map-only query on one side and two MR job query on the other produces wrong results

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557311#comment-13557311
 ] 

Hudson commented on HIVE-3915:
--

Integrated in Hive-trunk-h0.21 #1921 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1921/])
HIVE-3915 Union with map-only query on one side and two MR job query on the 
other
produces wrong results (Kevin Wilfong via namit) (Revision 1435203)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435203
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
* /hive/trunk/ql/src/test/queries/clientpositive/union33.q
* /hive/trunk/ql/src/test/results/clientpositive/union33.q.out


> Union with map-only query on one side and two MR job query on the other 
> produces wrong results
> --
>
> Key: HIVE-3915
> URL: https://issues.apache.org/jira/browse/HIVE-3915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.11.0
>
> Attachments: HIVE-3915.1.patch.txt
>
>
> When a query contains a union with a map only subquery on one side and a 
> subquery involving two sequential map reduce jobs on the other, it can 
> produce wrong results.  It appears that if the map-only query's table scan 
> operator is processed first, the task involving a union is made a root task.  
> Then when the other subquery is processed, the second map reduce job gains 
> the task involving the union as a child and it is made a root task.  This 
> means that both the first and second map reduce jobs are root tasks, so the 
> dependency between the two is ignored.  If they are run in parallel (i.e. the 
> cluster has more than one node) no results will be produced for the side of 
> the union with the two map reduce jobs and only the results of the other side 
> of the union will be returned.
> The order in which TableScan operators are processed is crucial to reproducing 
> this bug; it is determined by the order in which values are retrieved from a 
> map, and hence hard to predict, so the bug doesn't always reproduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3915) Union with map-only query on one side and two MR job query on the other produces wrong results

2013-01-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557310#comment-13557310
 ] 

Hudson commented on HIVE-3915:
--

Integrated in hive-trunk-hadoop1 #24 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/24/])
HIVE-3915 Union with map-only query on one side and two MR job query on the 
other
produces wrong results (Kevin Wilfong via namit) (Revision 1435203)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1435203
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java
* /hive/trunk/ql/src/test/queries/clientpositive/union33.q
* /hive/trunk/ql/src/test/results/clientpositive/union33.q.out


> Union with map-only query on one side and two MR job query on the other 
> produces wrong results
> --
>
> Key: HIVE-3915
> URL: https://issues.apache.org/jira/browse/HIVE-3915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.11.0
>
> Attachments: HIVE-3915.1.patch.txt
>
>
> When a query contains a union with a map only subquery on one side and a 
> subquery involving two sequential map reduce jobs on the other, it can 
> produce wrong results.  It appears that if the map-only query's table scan 
> operator is processed first, the task involving a union is made a root task.  
> Then when the other subquery is processed, the second map reduce job gains 
> the task involving the union as a child and it is made a root task.  This 
> means that both the first and second map reduce jobs are root tasks, so the 
> dependency between the two is ignored.  If they are run in parallel (i.e. the 
> cluster has more than one node) no results will be produced for the side of 
> the union with the two map reduce jobs and only the results of the other side 
> of the union will be returned.
> The order in which TableScan operators are processed is crucial to reproducing 
> this bug; it is determined by the order in which values are retrieved from a 
> map, and hence hard to predict, so the bug doesn't always reproduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3915) Union with map-only query on one side and two MR job query on the other produces wrong results

2013-01-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3915:
-

   Resolution: Fixed
Fix Version/s: 0.11.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks Kevin

> Union with map-only query on one side and two MR job query on the other 
> produces wrong results
> --
>
> Key: HIVE-3915
> URL: https://issues.apache.org/jira/browse/HIVE-3915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.11.0
>
> Attachments: HIVE-3915.1.patch.txt
>
>
> When a query contains a union with a map only subquery on one side and a 
> subquery involving two sequential map reduce jobs on the other, it can 
> produce wrong results.  It appears that if the map-only query's table scan 
> operator is processed first, the task involving a union is made a root task.  
> Then when the other subquery is processed, the second map reduce job gains 
> the task involving the union as a child and it is made a root task.  This 
> means that both the first and second map reduce jobs are root tasks, so the 
> dependency between the two is ignored.  If they are run in parallel (i.e. the 
> cluster has more than one node) no results will be produced for the side of 
> the union with the two map reduce jobs and only the results of the other side 
> of the union will be returned.
> The order in which TableScan operators are processed is crucial to reproducing 
> this bug; it is determined by the order in which values are retrieved from a 
> map, and hence hard to predict, so the bug doesn't always reproduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3638) metadataonly1.q test fails with Hadoop23

2013-01-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557299#comment-13557299
 ] 

Ashutosh Chauhan commented on HIVE-3638:


Shall this be closed as Cannot Reproduce? We now run with 2.0.0-alpha, and the 
latest builds don't report this failure either: 
https://builds.apache.org/job/Hive-trunk-hadoop2/lastCompletedBuild/testReport/

> metadataonly1.q test fails with Hadoop23
> 
>
> Key: HIVE-3638
> URL: https://issues.apache.org/jira/browse/HIVE-3638
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.10.0, 0.9.1
>Reporter: Chris Drome
>
> Hive creates an empty file as a hack to get Hadoop to run a mapper.
> This no longer works with Hadoop23, causing the test to fail. Note that this 
> test exercises empty partitions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557284#comment-13557284
 ] 

Namit Jain commented on HIVE-3833:
--

Only if the two schemas are different; otherwise it is the identityConverter.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, 
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, 
> hive.3833.9.patch
>
>
> Currently, different partitions can be picked up for the same input split 
> based on their serdes etc., and we don't allow changing the schema for 
> LazyColumnarBinarySerDe.
> Instead, different partitions should be part of the same split only if the 
> partition schemas exactly match. The operator tree's object inspectors should 
> be based on the partition schema. That would give greater flexibility and 
> also help with using the binary serde with RCFile.
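To make the scenario concrete, here is a minimal sketch under assumed names (evt, src, and the column names are illustrative, not from the patch) of a partitioned table whose older partition carries a narrower schema than the table, which is where per-partition object inspectors plus a converter come into play:

CREATE TABLE evt (id INT, msg STRING)
  PARTITIONED BY (ds STRING)
  STORED AS RCFILE;

-- Partition written while the table had two columns.
INSERT OVERWRITE TABLE evt PARTITION (ds='2013-01-01')
  SELECT 1, 'a' FROM src LIMIT 1;

-- Table schema evolves; the existing partition keeps its old schema.
ALTER TABLE evt ADD COLUMNS (extra STRING);

-- Partition written with the three-column schema.
INSERT OVERWRITE TABLE evt PARTITION (ds='2013-01-02')
  SELECT 2, 'b', 'c' FROM src LIMIT 1;

-- A query over both partitions must deserialize ds='2013-01-01' with its
-- partition schema and then convert the rows to the table schema.
SELECT ds, id, msg, extra FROM evt;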

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3833) object inspectors should be initialized based on partition metadata

2013-01-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557258#comment-13557258
 ] 

Ashutosh Chauhan commented on HIVE-3833:


Could this possibly result in a performance hit (CPU)? Earlier, data was 
deserialized per the table schema; now it will first be deserialized per the 
partition schema and then converted to comply with the table schema.

> object inspectors should be initialized based on partition metadata
> ---
>
> Key: HIVE-3833
> URL: https://issues.apache.org/jira/browse/HIVE-3833
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3833.10.patch, hive.3833.11.patch, 
> hive.3833.12.patch, hive.3833.13.patch, hive.3833.14.patch, 
> hive.3833.1.patch, hive.3833.2.patch, hive.3833.3.patch, hive.3833.4.patch, 
> hive.3833.5.patch, hive.3833.6.patch, hive.3833.7.patch, hive.3833.8.patch, 
> hive.3833.9.patch
>
>
> Currently, different partitions can be picked up for the same input split 
> based on their serdes etc., and we don't allow changing the schema for 
> LazyColumnarBinarySerDe.
> Instead, different partitions should be part of the same split only if the 
> partition schemas exactly match. The operator tree's object inspectors should 
> be based on the partition schema. That would give greater flexibility and 
> also help with using the binary serde with RCFile.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-01-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557245#comment-13557245
 ] 

Yin Huai commented on HIVE-3874:


One question: why does a small row group size have to be used if we want to 
speed up secondary index lookups? When using a large row group size, if we 
store a column across multiple blocks, then with an index on that column within 
the large row group we do not need to read the entire column from disk. Also, 
we can use row numbers to locate which blocks should be read from the other 
columns in the row group.

If a small row group size is used, the data for a single column can be very 
small, and a single buffered read may retrieve a lot of unnecessary data from 
unneeded columns on disk.

> Create a new Optimized Row Columnar file format for Hive
> 
>
> Key: HIVE-3874
> URL: https://issues.apache.org/jira/browse/HIVE-3874
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: OrcFileIntro.pptx
>
>
> There are several limitations of the current RCFile format that I'd like to 
> address by creating a new format:
> * each column value is stored as a binary blob, which means:
> ** the entire column value must be read, decompressed, and deserialized
> ** the file format can't use smarter type-specific compression
> ** push-down filters can't be evaluated
> * the start of each row group needs to be found by scanning
> * user metadata can only be added to the file when the file is created
> * the file doesn't store the number of rows per file or row group
> * there is no mechanism for seeking to a particular row number, which is 
> required for external indexes
> * there is no mechanism for storing lightweight indexes within the file to 
> enable push-down filters to skip entire row groups
> * the type of the rows isn't stored in the file

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3537) release locks at the end of move tasks

2013-01-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557206#comment-13557206
 ] 

Ashutosh Chauhan commented on HIVE-3537:


Looks good. Running tests. Will commit if tests pass.

> release locks at the end of move tasks
> --
>
> Key: HIVE-3537
> URL: https://issues.apache.org/jira/browse/HIVE-3537
> Project: Hive
>  Issue Type: Bug
>  Components: Locking, Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3537.10.patch, hive.3537.1.patch, 
> hive.3537.2.patch, hive.3537.3.patch, hive.3537.4.patch, hive.3537.5.patch, 
> hive.3537.6.patch, hive.3537.7.patch, hive.3537.8.patch, hive.3537.9.patch
>
>
> Look at HIVE-3106 for details.
> In order to make sure that concurrency is not an issue for multi-table 
> inserts, the current option is to introduce a dependency task, which thereby
> delays the creation of all partitions. It would be desirable to release the
> locks for the outputs as soon as the corresponding move task is completed.
> That way, for multi-table inserts, concurrency can be enabled without 
> delaying any table.
> Currently, the move task contains an input/output, but they do not seem to be
> populated correctly.
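As an illustration only (t1, t2, and src are placeholder names, not taken from the patch), this is the kind of multi-table insert the description refers to; with concurrency enabled, each output acquires a lock, and the proposal is that the lock on t1 be released as soon as t1's move task completes rather than at the end of the whole query:

SET hive.support.concurrency=true;

FROM src
INSERT OVERWRITE TABLE t1 SELECT key, value WHERE key < 100
INSERT OVERWRITE TABLE t2 SELECT key, value WHERE key >= 100;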

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-01-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3403:
-

Attachment: hive.3403.15.patch

> user should not specify mapjoin to perform sort-merge bucketed join
> ---
>
> Key: HIVE-3403
> URL: https://issues.apache.org/jira/browse/HIVE-3403
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.3403.10.patch, hive.3403.11.patch, 
> hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
> hive.3403.15.patch, hive.3403.1.patch, hive.3403.2.patch, hive.3403.3.patch, 
> hive.3403.4.patch, hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, 
> hive.3403.8.patch, hive.3403.9.patch
>
>
> Currently, in order to perform a sort-merge bucketed join, the user needs
> to set hive.optimize.bucketmapjoin.sortedmerge to true and also specify the 
> mapjoin hint.
> The user should not have to specify any hints.
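For context, a sketch of the hint-based path this issue wants to remove, assuming two tables bucketed and sorted on the join key (the table names and the auxiliary SET lines reflect the usual SMB setup of that era and are illustrative, not taken from the patch); the goal is that the MAPJOIN hint in the final query should no longer be needed:

CREATE TABLE big_t   (key INT, value STRING)
  CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS;
CREATE TABLE small_t (key INT, value STRING)
  CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS;
-- ... populate both tables with hive.enforce.bucketing/sorting enabled ...

SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

SELECT /*+ MAPJOIN(s) */ b.key, s.value
FROM big_t b JOIN small_t s ON b.key = s.key;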

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557115#comment-13557115
 ] 

Namit Jain commented on HIVE-3699:
--

+1

We had so many checked-in tests with wrong results.

> Multiple insert overwrite into multiple tables query stores same results in 
> all tables
> --
>
> Key: HIVE-3699
> URL: https://issues.apache.org/jira/browse/HIVE-3699
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
> Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): 
> hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
>Reporter: Alexandre Fouché
>Assignee: Navis
> Attachments: HIVE-3699.D7743.1.patch, HIVE-3699.D7743.2.patch, 
> HIVE-3699.D7743.3.patch, HIVE-3699_hive-0.9.1.patch.txt
>
>
> (Note: This might be related to HIVE-2750)
> I am doing a query with multiple INSERT OVERWRITEs to multiple tables in order 
> to scan the dataset only once, and I end up with all these tables having the 
> same content! It seems the GROUP BY query that returns results is overwriting 
> all the temp tables.
> Weirdly enough, if I add further GROUP BY queries into additional temp tables, 
> grouped by a different field, then all the temp tables, even the ones that 
> would otherwise have had wrong content, are correctly populated.
> This is the misbehaving query:
> FROM nikon
> INSERT OVERWRITE TABLE e1
> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
> WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
> INSERT OVERWRITE TABLE e2
> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
> WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
> ;
> It launches only one MR job, and here are the results. Why does table 'e1' 
> contain results from table 'e2'?! Table 'e1' should have been empty (see the 
> individual SELECTs further below).
> hive> SELECT * from e1;
> OK
> NULL2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 0.229 seconds
> hive> SELECT * from e2;
> OK
> NULL2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 0.11 seconds
> Here are the results of the individual queries (only the second query 
> returns a result set):
> hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM 
> nikon
> WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
> (...)
> OK
>   <- There are no results, this is normal
> Time taken: 41.471 seconds
> hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon
> WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
> (...)
> OK
> NULL  2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 39.607 seconds
> 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3915) Union with map-only query on one side and two MR job query on the other produces wrong results

2013-01-18 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557094#comment-13557094
 ] 

Namit Jain commented on HIVE-3915:
--

+1

> Union with map-only query on one side and two MR job query on the other 
> produces wrong results
> --
>
> Key: HIVE-3915
> URL: https://issues.apache.org/jira/browse/HIVE-3915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3915.1.patch.txt
>
>
> When a query contains a union with a map-only subquery on one side and a 
> subquery involving two sequential map-reduce jobs on the other, it can 
> produce wrong results.  It appears that if the map-only subquery's TableScan 
> operator is processed first, the task containing the union is made a root task.  
> Then, when the other subquery is processed, the second map-reduce job gains 
> the task containing the union as a child and is itself made a root task.  This 
> means that both the first and second map-reduce jobs are root tasks, so the 
> dependency between the two is ignored.  If they are run in parallel (i.e. the 
> cluster has more than one node), no results will be produced for the side of 
> the union with the two map-reduce jobs, and only the results of the other side 
> of the union will be returned.
> The order in which TableScan operators are processed is crucial to reproducing 
> this bug; it is determined by the order in which values are retrieved from a 
> map and is hence hard to predict, so the bug doesn't always reproduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3909) Wrong data due to HIVE-2820

2013-01-18 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557092#comment-13557092
 ] 

Phabricator commented on HIVE-3909:
---

njain has accepted the revision "HIVE-3909 [jira] Wrong data due to HIVE-2820".

REVISION DETAIL
  https://reviews.facebook.net/D8013

BRANCH
  DPAL-1965

To: JIRA, njain, navis


> Wrong data due to HIVE-2820
> ---
>
> Key: HIVE-3909
> URL: https://issues.apache.org/jira/browse/HIVE-3909
> Project: Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Navis
> Attachments: HIVE-3909.D8013.1.patch
>
>
> Consider the query:
> ~/hive/hive1$ more ql/src/test/queries/clientpositive/join_reorder4.q
> CREATE TABLE T1(key1 STRING, val1 STRING) STORED AS TEXTFILE;
> CREATE TABLE T2(key2 STRING, val2 STRING) STORED AS TEXTFILE;
> CREATE TABLE T3(key3 STRING, val3 STRING) STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../data/files/T1.txt' INTO TABLE T1;
> LOAD DATA LOCAL INPATH '../data/files/T2.txt' INTO TABLE T2;
> LOAD DATA LOCAL INPATH '../data/files/T3.txt' INTO TABLE T3;
> set hive.auto.convert.join=true;
> explain select /*+ STREAMTABLE(a) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(a) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> explain select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> explain select /*+ STREAMTABLE(c) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(c) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> select /*+ STREAMTABLE(b) */ a.*, b.*, c.* from T1 a join T2 b on 
> a.key1=b.key2 join T3 c on a.key1=c.key3;
> returns:
> 2 12  2   12  2   22

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3884) Better align columns in DESCRIBE table_name output to make more human-readable

2013-01-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3884:
-

Status: Open  (was: Patch Available)

comments

> Better align columns in DESCRIBE table_name output to make more human-readable
> --
>
> Key: HIVE-3884
> URL: https://issues.apache.org/jira/browse/HIVE-3884
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.9.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: describe_test_table.png, HIVE-3884.1.patch.txt
>
>
> If a table contains very long comments or very long column names, the output 
> of DESCRIBE table_name is not aligned nicely.  The attached screenshot shows 
> the following two problems:
> 1. Rows with long column names do not align well with other columns.
> 2. Rows with long comments wrap to the next line, and make it hard to read 
> the output.  The wrapping behavior depends on the width of the user's 
> terminal.
> It would be nice to have a DESCRIBE PRETTY table_name command that will 
> produce nicely formatted output that avoids the two problems mentioned above. 
>  It is better to introduce a new DESCRIBE PRETTY command rather than change 
> the behavior of the existing DESCRIBE or DESCRIBE FORMATTED commands, so that 
> we avoid breaking any scripts that automatically parse the output.
> Since the pretty formatting depends on the current terminal width, we need a 
> new hive conf parameter to tell the CLI to auto-detect the current terminal 
> width or to use a fixed width (needed for unit tests).
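A sketch of how the proposed command might be used from the CLI; DESCRIBE PRETTY is the command described above, while the property name below is an assumption for illustration, not something this issue specifies:

-- Auto-detect the terminal width (assumed property name; -1 = auto-detect):
SET hive.cli.pretty.output.num.cols=-1;
DESCRIBE PRETTY test_table;

-- Pin the output to a fixed width, e.g. for unit tests:
SET hive.cli.pretty.output.num.cols=80;
DESCRIBE PRETTY test_table;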

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3915) Union with map-only query on one side and two MR job query on the other produces wrong results

2013-01-18 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3915:


Status: Patch Available  (was: Open)

> Union with map-only query on one side and two MR job query on the other 
> produces wrong results
> --
>
> Key: HIVE-3915
> URL: https://issues.apache.org/jira/browse/HIVE-3915
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-3915.1.patch.txt
>
>
> When a query contains a union with a map-only subquery on one side and a 
> subquery involving two sequential map-reduce jobs on the other, it can 
> produce wrong results.  It appears that if the map-only subquery's TableScan 
> operator is processed first, the task containing the union is made a root task.  
> Then, when the other subquery is processed, the second map-reduce job gains 
> the task containing the union as a child and is itself made a root task.  This 
> means that both the first and second map-reduce jobs are root tasks, so the 
> dependency between the two is ignored.  If they are run in parallel (i.e. the 
> cluster has more than one node), no results will be produced for the side of 
> the union with the two map-reduce jobs, and only the results of the other side 
> of the union will be returned.
> The order in which TableScan operators are processed is crucial to reproducing 
> this bug; it is determined by the order in which values are retrieved from a 
> map and is hence hard to predict, so the bug doesn't always reproduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3884) Better align columns in DESCRIBE table_name output to make more human-readable

2013-01-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3884:
-

Status: Patch Available  (was: Open)

> Better align columns in DESCRIBE table_name output to make more human-readable
> --
>
> Key: HIVE-3884
> URL: https://issues.apache.org/jira/browse/HIVE-3884
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.9.0
>Reporter: Dilip Joseph
>Assignee: Dilip Joseph
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: describe_test_table.png, HIVE-3884.1.patch.txt
>
>
> If a table contains very long comments or very long column names, the output 
> of DESCRIBE table_name is not aligned nicely.  The attached screenshot shows 
> the following two problems:
> 1. Rows with long column names do not align well with other columns.
> 2. Rows with long comments wrap to the next line, and make it hard to read 
> the output.  The wrapping behavior depends on the width of the user's 
> terminal.
> It would be nice to have a DESCRIBE PRETTY table_name command that will 
> produce nicely formatted output that avoids the two problems mentioned above. 
>  It is better to introduce a new DESCRIBE PRETTY command rather than change 
> the behavior of the existing DESCRIBE or DESCRIBE FORMATTED commands, so that 
> we avoid breaking any scripts that automatically parse the output.
> Since the pretty formatting depends on the current terminal width, we need a 
> new hive conf parameter to tell the CLI to auto-detect the current terminal 
> width or to use a fixed width (needed for unit tests).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira