[jira] [Created] (HIVE-2169) Hive should have support for clover and findbugs

2011-05-18 Thread Iyappan (JIRA)
Hive should have support for clover and findbugs


 Key: HIVE-2169
 URL: https://issues.apache.org/jira/browse/HIVE-2169
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Reporter: Iyappan
Priority: Minor
 Fix For: 0.7.1


Hive should have support for Clover and FindBugs.

Clover delivers actionable Java code coverage metrics to assess the impact of 
unit tests.
FindBugs is a bug pattern detector for Java. 
Both can provide useful information about code coverage and potential bugs.
Clover and FindBugs support should be added as Ant targets.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2080) Few code improvements in the ql and serde packages.

2011-05-18 Thread Chinna Rao Lalam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-2080:
---

Status: Patch Available  (was: Open)

> Few code improvements in the ql and serde packages.
> ---
>
> Key: HIVE-2080
> URL: https://issues.apache.org/jira/browse/HIVE-2080
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Serializers/Deserializers
>Affects Versions: 0.7.0
> Environment: Hadoop 0.20.1, Hive 0.7.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-2080.Patch
>
>
> Few code improvements in the ql and serde packages.
> 1) Minor performance improvements
> 2) Null checks to avoid NPEs
> 3) Effective variable management.



[jira] [Commented] (HIVE-2096) throw a error if the input is larger than a threshold for index input format

2011-05-18 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035965#comment-13035965
 ] 

He Yongqiang commented on HIVE-2096:


Will commit after tests pass.

> throw a error if the input is larger than a threshold for index input format
> 
>
> Key: HIVE-2096
> URL: https://issues.apache.org/jira/browse/HIVE-2096
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Namit Jain
> Attachments: HIVE-2096.1.patch.txt, HIVE-2096.2.patch.txt, 
> HIVE-2096.3.patch.txt, HIVE-2096.4.patch.txt
>
>
> This can hang forever.



[jira] [Commented] (HIVE-2144) reduce workload generated by JDBCStatsPublisher

2011-05-18 Thread Tomasz Nykiel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035634#comment-13035634
 ] 

Tomasz Nykiel commented on HIVE-2144:
-

Yes, I agree. There are some subtle differences between UNIQUE and PK in Derby 
and MySQL (e.g., in MySQL the unique index allows null values, and in Derby it 
does not). So in general, a PK constraint will be more suitable.

CREATE TABLE PARTITION_STAT_TBL ( ID VARCHAR(255) PRIMARY KEY, ROW_COUNT BIGINT ) 
works for both Derby and MySQL.
After a quick check, it seems that it is supported by Oracle/MSSQL as well.



> reduce workload generated by JDBCStatsPublisher
> ---
>
> Key: HIVE-2144
> URL: https://issues.apache.org/jira/browse/HIVE-2144
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Tomasz Nykiel
>
> In JDBCStatsPublisher, we first try a SELECT query to see if the specific ID 
> was inserted by another task (most likely a speculative or previously 
> failed task). Depending on whether the ID is there, an INSERT or UPDATE query 
> is issued. So there are basically two queries per row inserted into the 
> intermediate stats table. This workload could be halved if we insert the row 
> anyway (it is very rare that IDs are duplicated) and use a different SQL 
> query in the aggregation phase to dedup the ID (e.g., using group-by and 
> max()). The benefit is that even though the aggregation query is more 
> expensive, it is only run once per query. 
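
The "insert anyway, dedup at aggregation time" idea in the description above can be sketched as follows. This is only an illustration, not the JDBCStatsPublisher code: it uses Python's built-in sqlite3 module as a stand-in for the real stats database, the function names and the ID values are hypothetical, and summing the per-ID maxima is an assumption about what the aggregation phase computes.

```python
import sqlite3

def create_stats_table(conn):
    # Stand-in for the intermediate stats table, with no key constraint,
    # so duplicate IDs from speculative or retried tasks are simply inserted.
    conn.execute(
        "CREATE TABLE PARTITION_STAT_TBL (ID VARCHAR(255), ROW_COUNT BIGINT)"
    )

def publish(conn, row_id, row_count):
    # One INSERT per row, with no preceding SELECT.
    conn.execute(
        "INSERT INTO PARTITION_STAT_TBL (ID, ROW_COUNT) VALUES (?, ?)",
        (row_id, row_count),
    )

def aggregate(conn):
    # Dedup in the aggregation phase: keep MAX(ROW_COUNT) per ID, then sum
    # (summing is an assumption made for this sketch).
    row = conn.execute(
        "SELECT COALESCE(SUM(MAX_COUNT), 0) FROM ("
        "  SELECT MAX(ROW_COUNT) AS MAX_COUNT"
        "  FROM PARTITION_STAT_TBL GROUP BY ID"
        ") AS DEDUPED"
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
create_stats_table(conn)
# Hypothetical IDs: the first ID is published twice, as a retried task might do.
for row_id, count in [("part=1/task_0", 10), ("part=1/task_0", 12), ("part=1/task_1", 7)]:
    publish(conn, row_id, count)
print(aggregate(conn))  # 12 + 7 = 19
```

The publisher issues exactly one statement per row, and the extra cost of the GROUP BY is paid once, when the statistics are aggregated.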



Jenkins build is still unstable: Hive-trunk-h0.21 #737

2011-05-18 Thread Apache Jenkins Server
See 




[jira] [Commented] (HIVE-2144) reduce workload generated by JDBCStatsPublisher

2011-05-18 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035612#comment-13035612
 ] 

Ning Zhang commented on HIVE-2144:
--

Great! I like the idea. 

One comment about the primary key constraint: I'm not sure if UNIQUE is the 
standard way to specify a primary key constraint. There are people using 
Oracle/MS SQL Server/Postgres as the metastore, so we should use a standard way. 
I think 'id varchar(255) PRIMARY KEY' is more widely supported. Can you double 
check with MySQL and Derby?

> reduce workload generated by JDBCStatsPublisher
> ---
>
> Key: HIVE-2144
> URL: https://issues.apache.org/jira/browse/HIVE-2144
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Tomasz Nykiel
>
> In JDBCStatsPublisher, we first try a SELECT query to see if the specific ID 
> was inserted by another task (most likely a speculative or previously 
> failed task). Depending on whether the ID is there, an INSERT or UPDATE query 
> is issued. So there are basically two queries per row inserted into the 
> intermediate stats table. This workload could be halved if we insert the row 
> anyway (it is very rare that IDs are duplicated) and use a different SQL 
> query in the aggregation phase to dedup the ID (e.g., using group-by and 
> max()). The benefit is that even though the aggregation query is more 
> expensive, it is only run once per query. 



[jira] [Commented] (HIVE-2144) reduce workload generated by JDBCStatsPublisher

2011-05-18 Thread Tomasz Nykiel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035493#comment-13035493
 ] 

Tomasz Nykiel commented on HIVE-2144:
-

Currently the schema of the stat table is the following:

PARTITION_STAT_TBL ( ID VARCHAR(255), ROW_COUNT BIGINT ) and does not have 
any integrity constraints declared.

We can amend it to:

PARTITION_STAT_TBL ( ID VARCHAR(255) UNIQUE, ROW_COUNT BIGINT ).

Then, instead of executing two queries per row inserted (the SELECT followed by 
an INSERT or UPDATE), we can execute just the one INSERT query.
If the INSERT violates the integrity constraint enforced by the unique index, 
the violation surfaces as an exception that we can catch, and we then perform a 
single UPDATE query.
The UPDATE query needs to check whether the newly inserted stats are "newer" 
than the ones already in the table:

UPDATE PARTITION_STAT_TBL SET ROW_COUNT = new_value
WHERE ID = "rowID" AND
(0)  new_value >
(1)    (SELECT TEMP.ROW_COUNT FROM
(2)      (SELECT ROW_COUNT FROM PARTITION_STAT_TBL WHERE ID = "rowID") TEMP )

--(0) is a condition that checks whether the newly inserted value is greater 
than the one we already have.
--(1) and (2) are a work-around for MySQL, which does not allow a subquery to 
refer to the table that is being updated. Here, we basically materialize the 
value that we need for comparison.
--(1) should theoretically have (LIMIT 1) to choose exactly one tuple; however, 
Derby does not support it, and given the unique constraint and the fact that 
the insert failed, there exists exactly one tuple matching the ID predicate.

To summarize, for non-existing rows, only one INSERT query will be executed 
instead of two.
For existing rows, which seem to occur very infrequently, two queries will be 
executed instead of three.
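
A minimal sketch of the insert-first flow summarized above, assuming Python's built-in sqlite3 module as a stand-in for Derby/MySQL. The function name publish_stat, its return values, and the subquery alias PREV (used here in place of TEMP) are hypothetical, and the table is created with PRIMARY KEY rather than UNIQUE purely for the demonstration; either constraint rejects the duplicate insert.

```python
import sqlite3

def publish_stat(conn, row_id, new_value):
    """Try a plain INSERT; on a key violation, UPDATE only if the value is larger."""
    try:
        conn.execute(
            "INSERT INTO PARTITION_STAT_TBL (ID, ROW_COUNT) VALUES (?, ?)",
            (row_id, new_value),
        )
        return "inserted"
    except sqlite3.IntegrityError:
        # The row already exists. The nested SELECT mirrors steps (1) and (2):
        # it materializes the stored value before the comparison in step (0).
        cur = conn.execute(
            "UPDATE PARTITION_STAT_TBL SET ROW_COUNT = ? "
            "WHERE ID = ? AND ? > "
            "(SELECT PREV.ROW_COUNT FROM "
            "(SELECT ROW_COUNT FROM PARTITION_STAT_TBL WHERE ID = ?) AS PREV)",
            (new_value, row_id, new_value, row_id),
        )
        # rowcount is 1 when the stored value was replaced, 0 when it was kept.
        return "updated" if cur.rowcount == 1 else "kept"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PARTITION_STAT_TBL (ID VARCHAR(255) PRIMARY KEY, ROW_COUNT BIGINT)"
)
print(publish_stat(conn, "rowID", 10))  # first publish: one INSERT
print(publish_stat(conn, "rowID", 5))   # duplicate with a smaller value
print(publish_stat(conn, "rowID", 20))  # duplicate with a larger value
```

In the common case only the INSERT runs; the UPDATE is issued only after the failed INSERT, which matches the "two queries instead of three" count for existing rows.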


> reduce workload generated by JDBCStatsPublisher
> ---
>
> Key: HIVE-2144
> URL: https://issues.apache.org/jira/browse/HIVE-2144
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ning Zhang
>Assignee: Tomasz Nykiel
>
> In JDBCStatsPublisher, we first try a SELECT query to see if the specific ID 
> was inserted by another task (most likely a speculative or previously 
> failed task). Depending on whether the ID is there, an INSERT or UPDATE query 
> is issued. So there are basically two queries per row inserted into the 
> intermediate stats table. This workload could be halved if we insert the row 
> anyway (it is very rare that IDs are duplicated) and use a different SQL 
> query in the aggregation phase to dedup the ID (e.g., using group-by and 
> max()). The benefit is that even though the aggregation query is more 
> expensive, it is only run once per query. 



[jira] [Commented] (HIVE-2161) Remaining patch for HIVE-2148

2011-05-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035462#comment-13035462
 ] 

Ashutosh Chauhan commented on HIVE-2161:


Can someone commit this? It has already been discussed in HIVE-2148.

> Remaining patch for HIVE-2148
> -
>
> Key: HIVE-2161
> URL: https://issues.apache.org/jira/browse/HIVE-2161
> Project: Hive
>  Issue Type: Task
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.8.0
>
> Attachments: hive_2161.patch
>
>
> Follow-up jira for HIVE-2148.



Jenkins build is still unstable: Hive-trunk-h0.21 #736

2011-05-18 Thread Apache Jenkins Server
See 




[jira] [Commented] (HIVE-1095) Hive in Maven

2011-05-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035417#comment-13035417
 ] 

Hudson commented on HIVE-1095:
--

Integrated in Hive-trunk-h0.21 #736 (See 
[https://builds.apache.org/hudson/job/Hive-trunk-h0.21/736/])
HIVE-1095. Hive in Maven. Contributed by Gerrit Jansen van Vuuren, 
Amareshwari Sriramadasu and Carl Steinbach.

amareshwari : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1124164
Files : 
* /hive/trunk/ant/ivy.xml
* /hive/trunk/ivy.xml
* /hive/trunk/jdbc/ivy.xml
* /hive/trunk/ql/ivy.xml
* /hive/trunk/build.xml
* /hive/trunk/service/ivy.xml
* /hive/trunk/hbase-handler/ivy.xml
* /hive/trunk/contrib/ivy.xml
* /hive/trunk/shims/ivy.xml
* /hive/trunk/hwi/ivy.xml
* /hive/trunk/ivy/libraries.properties
* /hive/trunk/metastore/ivy.xml
* /hive/trunk/cli/ivy.xml
* /hive/trunk/serde/ivy.xml
* /hive/trunk/common/ivy.xml
* /hive/trunk/build-common.xml


> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Fix For: 0.7.1, 0.8.0
>
> Attachments: HIVE-1095-trunk.patch, HIVE-1095.7.patch.txt, 
> HIVE-1095.v2.PATCH, HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, 
> HIVE-1095.v5.PATCH, HIVE-1095.v6.patch, hiveReleasedToMaven.tar.gz, 
> make-maven.log
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html



[jira] [Commented] (HIVE-1095) Hive in Maven

2011-05-18 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035278#comment-13035278
 ] 

Gerrit Jansen van Vuuren commented on HIVE-1095:


Great

Thanks Carl, Amareshwari for seeing this through.

> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Fix For: 0.7.1, 0.8.0
>
> Attachments: HIVE-1095-trunk.patch, HIVE-1095.7.patch.txt, 
> HIVE-1095.v2.PATCH, HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, 
> HIVE-1095.v5.PATCH, HIVE-1095.v6.patch, hiveReleasedToMaven.tar.gz, 
> make-maven.log
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html



[jira] [Updated] (HIVE-1095) Hive in Maven

2011-05-18 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated HIVE-1095:
--

   Resolution: Fixed
Fix Version/s: 0.8.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch 0.7.

Thanks Gerrit and Carl!


> Hive in Maven
> -
>
> Key: HIVE-1095
> URL: https://issues.apache.org/jira/browse/HIVE-1095
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 0.6.0
>Reporter: Gerrit Jansen van Vuuren
>Priority: Minor
> Fix For: 0.7.1, 0.8.0
>
> Attachments: HIVE-1095-trunk.patch, HIVE-1095.7.patch.txt, 
> HIVE-1095.v2.PATCH, HIVE-1095.v3.PATCH, HIVE-1095.v4.PATCH, 
> HIVE-1095.v5.PATCH, HIVE-1095.v6.patch, hiveReleasedToMaven.tar.gz, 
> make-maven.log
>
>
> Getting hive into maven main repositories
> Documentation on how to do this is on:
> http://maven.apache.org/guides/mini/guide-central-repository-upload.html
