Build failed in Hudson: Hive-trunk-h0.17 #364

2010-02-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/364/

--
Started by timer
Building remotely on minerva.apache.org (Ubuntu)
Updating http://svn.apache.org/repos/asf/hadoop/hive/trunk
U eclipse-templates/.settings/org.eclipse.jdt.core.prefs
U eclipse-templates/.settings/org.eclipse.jdt.ui.prefs
U conf/hive-default.xml
U CHANGES.txt
U common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
U build.xml
U checkstyle/checkstyle.xml
U contrib/src/java/org/apache/hadoop/hive/contrib/mr/GenericMR.java
U .checkstyle
U ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinObjectValue.java
U ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
At revision 910507
no revision recorded for http://svn.apache.org/repos/asf/hadoop/hive/trunk in 
the previous build
[hive] $ /home/hudson/tools/ant/latest/bin/ant -Dhadoop.version=0.17.2.1 clean 
package javadoc test
Buildfile: build.xml

clean:

clean:
 [echo] Cleaning: anttasks
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/anttasks

clean:
 [echo] Cleaning: shims
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/shims

clean:
 [echo] Cleaning: common

clean:
 [echo] Cleaning: serde

clean:
 [echo] Cleaning: metastore

clean:
 [echo] Cleaning: ql

clean:
 [echo] Cleaning: cli

clean:
 [echo] Cleaning: contrib

clean:

clean:

clean:
 [echo] Cleaning: hwi

clean:
 [exec] rm -rf 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/odbc 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/service/objs
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/service/fb303/objs
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/metastore/objs

clean-online:
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build

clean-offline:

jar:

create-dirs:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/shims
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/shims/classes
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/jexl/classes
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/hadoopcore
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/shims/test
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/shims/test/src
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/shims/test/classes

compile-ant-tasks:

create-dirs:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/anttasks
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/anttasks/classes
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/anttasks/test
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/anttasks/test/src
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/anttasks/test/classes

init:

compile:
 [echo] Compiling: anttasks
[javac] Compiling 2 source files to 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/anttasks/classes
[javac] Note: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ant/src/org/apache/hadoop/hive/ant/QTestGenTask.java
 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

deploy-ant-tasks:

create-dirs:

init:

compile:
 [echo] Compiling: anttasks

jar:
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/anttasks/classes/org/apache/hadoop/hive/ant
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/anttasks/hive-anttasks-0.6.0.jar

init:

compile:

ivy-init-dirs:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ivy
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ivy/lib
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ivy/report
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ivy/maven

ivy-download:
  [get] Getting: 

Build failed in Hudson: Hive-trunk-h0.18 #367

2010-02-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/367/changes

Changes:

[zshao] HIVE-1158. Introducing a new parameter for Map-side join bucket size. 
(Ning Zhang via zshao)

[zshao] HIVE-1147. Update Eclipse project configuration to match Checkstyle 
(Carl Steinbach via zshao)

--
[...truncated 2972 lines...]
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFPower.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPEqualOrGreaterThan.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFBaseNumericOp.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPBitNot.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPNot.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFPosMod.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPNegative.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMinute.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPBitXor.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPEqual.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFConcat.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSecond.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFBaseNumericUnaryOp.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCeil.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMod.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFType.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPLongDivide.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRegExpExtract.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFFromUnixTime.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPNotEqual.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFAsin.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFExp.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTF.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIndex.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNull.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSize.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/UDTFCollector.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFConcatWS.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStruct.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMin.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMax.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNotNull.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVarianceSample.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSplit.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStd.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBridge.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUtils.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLocate.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCase.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMap.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArray.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCoalesce.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFField.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFExplode.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFElt.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/Collector.java
A 

Build failed in Hudson: Hive-trunk-h0.19 #367

2010-02-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/367/changes

Changes:

[zshao] HIVE-1158. Introducing a new parameter for Map-side join bucket size. 
(Ning Zhang via zshao)

[zshao] HIVE-1147. Update Eclipse project configuration to match Checkstyle 
(Carl Steinbach via zshao)

--
[...truncated 2972 lines...]
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFPower.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPEqualOrGreaterThan.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFBaseNumericOp.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPBitNot.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPNot.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFPosMod.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPNegative.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFMinute.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPDivide.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPBitXor.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPEqual.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFConcat.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSecond.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFBaseNumericUnaryOp.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCeil.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMod.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFType.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFToBoolean.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPLongDivide.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRegExpExtract.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFFromUnixTime.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPNotEqual.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFAsin.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/UDFExp.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTF.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java
AU ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIndex.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNull.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSize.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/UDTFCollector.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFConcatWS.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFStruct.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMin.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMax.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNotNull.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVarianceSample.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSplit.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStd.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBridge.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBridge.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIf.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFUtils.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLocate.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCase.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMap.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArray.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCoalesce.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFField.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFExplode.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFElt.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFVariance.java
A  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/Collector.java
A 

[jira] Commented: (HIVE-1117) Make QueryPlan serializable

2010-02-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834360#action_12834360
 ] 

Namit Jain commented on HIVE-1117:
--

TestParse failed - can you update the outputs for the TestParse results?

 Make QueryPlan serializable
 ---

 Key: HIVE-1117
 URL: https://issues.apache.org/jira/browse/HIVE-1117
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0

 Attachments: HIVE-1117.1.code.patch, HIVE-1117.1.test.patch


 We need to make QueryPlan serializable so that we can resume the query some 
 time later.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1117) Make QueryPlan serializable

2010-02-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834372#action_12834372
 ] 

Namit Jain commented on HIVE-1117:
--

+1

will commit if the tests pass

 Make QueryPlan serializable
 ---

 Key: HIVE-1117
 URL: https://issues.apache.org/jira/browse/HIVE-1117
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0

 Attachments: HIVE-1117.1.code.patch, HIVE-1117.1.test.patch


 We need to make QueryPlan serializable so that we can resume the query some 
 time later.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-917) Bucketed Map Join

2010-02-16 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-917:
--

Attachment: hive-917-2010-2-16.patch

A new patch:
1) added explain extended
2) broke the test into 4 tests.



 Bucketed Map Join
 -

 Key: HIVE-917
 URL: https://issues.apache.org/jira/browse/HIVE-917
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: He Yongqiang
 Attachments: hive-917-2010-2-15.patch, hive-917-2010-2-16.patch, 
 hive-917-2010-2-3.patch, hive-917-2010-2-8.patch


 Hive already has support for map-join. Map-join treats the big table as the job 
 input, and in each mapper it loads all the data from a small table.
 In case the big table is already bucketed on the join key, we don't have to 
 load the whole small table in each of the mappers. This will greatly 
 alleviate the memory pressure, and make map-join work with medium-sized 
 tables.
 There are 4 steps we can improve:
 S0. This is what the user can already do now: create a new bucketed table and 
 insert all data from the small table to it; Submit BUCKETNUM jobs, each doing 
 a map-side join of bigtable TABLEPARTITION(BUCKET i OUT OF NBUCKETS) with 
 smallbucketedtable TABLEPARTITION(BUCKET i OUT OF NBUCKETS).
 S1. Change the code so that when map-join is loading the small table, we 
 automatically drop the rows with the keys that are NOT in the same bucket as 
 the big table. This should alleviate the problem on memory, but we might 
 still have thousands of mappers reading the whole of the small table.
 S2. Let's say the user already bucketed the small table on the join key into 
 exactly the same number of buckets (or a factor of the buckets of the big 
 table), then map-join can choose to load only the buckets that are useful.
 S3. Add a new hint (e.g. /*+ MAPBUCKETJOIN(a) */), so that Hive automatically 
 does S2, without the need of asking the user to create temporary bucketed 
 table for the small table.
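The filtering idea in S1 can be sketched in plain Java: while loading the small table, a mapper keeps only the rows whose join key falls in the bucket its big-table split covers. This is an illustrative sketch, not Hive's implementation; the class and method names are hypothetical, and the hash-modulo bucket function is a common convention that may differ from Hive's exact one.

```java
// Illustrative sketch of S1: drop small-table rows whose join key is
// not in the same bucket as the big-table split being processed.
// Names are hypothetical; not Hive's actual code.
class BucketFilter {
    private final int numBuckets;
    private final int currentBucket; // bucket covered by this mapper's split

    BucketFilter(int numBuckets, int currentBucket) {
        this.numBuckets = numBuckets;
        this.currentBucket = currentBucket;
    }

    static int bucketFor(Object joinKey, int numBuckets) {
        // mask the sign bit so the modulo result is non-negative
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    boolean keep(Object joinKey) {
        return bucketFor(joinKey, numBuckets) == currentBucket;
    }
}
```

With 4 buckets, the mapper working on bucket 2 of the big table would build new BucketFilter(4, 2) and discard every small-table row for which keep(key) is false, which is what lets each mapper hold only 1/NBUCKETS of the small table in memory.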

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1174) Job counter error if hive.merge.mapfiles equals true

2010-02-16 Thread He Yongqiang (JIRA)
Job counter error if hive.merge.mapfiles equals true
--

 Key: HIVE-1174
 URL: https://issues.apache.org/jira/browse/HIVE-1174
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang


if hive.merge.mapfiles is set to true, the job counter will go to 3.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1168) Fix Hive build on Hudson

2010-02-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834423#action_12834423
 ] 

John Sichi commented on HIVE-1168:
--

I am following up on some leads on who might have access to the Hudson build 
environment and will update status here once I get an answer.


 Fix Hive build on Hudson
 

 Key: HIVE-1168
 URL: https://issues.apache.org/jira/browse/HIVE-1168
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: John Sichi
Priority: Critical

 {quote}
 We need to delete the .ant directory containing the old ivy version in order 
 to fix it 
 (and if we're using the same environment for both trunk and branches, either 
 segregate them or script an rm to clean in between).
 {quote}
 It's worth noting that Ant may have picked up the old version of Ivy from
 somewhere else. In order, Ant's classpath contains:
 # Ant's startup JAR file, ant-launcher.jar
 # Everything in the directory containing the version of ant-launcher.jar
   that's running, i.e. everything in ANT_HOME/lib
 # All JAR files in ${user.home}/.ant/lib
 # Directories and JAR files supplied via the -lib command line option.
 # Everything in the CLASSPATH variable unless the -noclasspath option is used.
 (2) implies that users on shared machines may have to install their own
 version of Ant in order to get around these problems, assuming that the
 administrator has installed ivy.jar in $ANT_HOME/lib.
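Given that lookup order, a stale ivy.jar sitting in ${user.home}/.ant/lib (step 3) can silently shadow whatever the build supplies later. A quick way to see what is lurking there, as a hypothetical helper rather than anything in Hive or Ant:

```java
import java.io.File;

// List the JAR files Ant silently adds from ${user.home}/.ant/lib,
// step 3 in the classpath order quoted above. Illustrative only.
class AntLibCheck {
    static String[] jarsInUserAntLib() {
        File dir = new File(System.getProperty("user.home"), ".ant/lib");
        String[] jars = dir.list((d, name) -> name.endsWith(".jar"));
        return jars == null ? new String[0] : jars; // empty if the dir is missing
    }

    public static void main(String[] args) {
        for (String jar : jarsInUserAntLib()) {
            System.out.println(jar);
        }
    }
}
```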

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1168) Fix Hive build on Hudson

2010-02-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834433#action_12834433
 ] 

John Sichi commented on HIVE-1168:
--

Mailing list thread quoted by Carl is here:

http://mail-archives.apache.org/mod_mbox/hadoop-hive-dev/201002.mbox/%3c7b4f4fe5-d478-4d67-a30e-d3e88e744...@facebook.com%3e


 Fix Hive build on Hudson
 

 Key: HIVE-1168
 URL: https://issues.apache.org/jira/browse/HIVE-1168
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: John Sichi
Priority: Critical

 {quote}
 We need to delete the .ant directory containing the old ivy version in order 
 to fix it 
 (and if we're using the same environment for both trunk and branches, either 
 segregate them or script an rm to clean in between).
 {quote}
 It's worth noting that Ant may have picked up the old version of Ivy from
 somewhere else. In order, Ant's classpath contains:
 # Ant's startup JAR file, ant-launcher.jar
 # Everything in the directory containing the version of ant-launcher.jar
   that's running, i.e. everything in ANT_HOME/lib
 # All JAR files in ${user.home}/.ant/lib
 # Directories and JAR files supplied via the -lib command line option.
 # Everything in the CLASSPATH variable unless the -noclasspath option is used.
 (2) implies that users on shared machines may have to install their own
 version of Ant in order to get around these problems, assuming that the
 administrator has installed ivy.jar in $ANT_HOME/lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1173) Partition pruner cancels pruning if non-deterministic function present in filtering expression only in joins is present in query

2010-02-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834438#action_12834438
 ] 

Zheng Shao commented on HIVE-1173:
--

Can you try the condition: part = 'part1' AND part  UDF2('part0')

The optimizer might do something different because of the short-circuit 
evaluation of AND.
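The short-circuit point can be seen in plain Java, which evaluates AND the same way: if the deterministic test on the left fails, the call on the right never runs, so reordering the operands changes which expressions actually get evaluated. A minimal sketch with hypothetical names:

```java
// Demonstrates short-circuit AND: with the deterministic comparison first,
// the stand-in for the non-deterministic UDF is only called when the left
// operand is true. Names are hypothetical.
class ShortCircuitDemo {
    static int udf2Calls = 0;

    static String udf2(String s) { // stand-in for the non-deterministic UDF2
        udf2Calls++;
        return s;
    }

    static boolean filter(String part) {
        // deterministic test first; the UDF2 comparison runs only if it passes
        return part.equals("part1") && !part.equals(udf2("part0"));
    }
}
```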


 Partition pruner cancels pruning if non-deterministic function present in 
 filtering expression only in joins is present in query
 

 Key: HIVE-1173
 URL: https://issues.apache.org/jira/browse/HIVE-1173
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.4.0, 0.4.1
Reporter: Vladimir Klimontovich

 Brief description:
 case 1) non-deterministic function present in partition condition, joins are 
 present in query: the partition pruner doesn't filter partitions based on the 
 condition
 case 2) non-deterministic function present in partition condition, joins aren't 
 present in query: the partition pruner does filter partitions based on the 
 condition
 It's quite illogical that pruning depends on the presence of joins in the query.
 Example:
 Let's consider following sequence of hive queries:
 1) Create a non-deterministic function:
 create temporary function UDF2 as 'UDF2';
 {{
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.hive.ql.udf.UDFType;

 @UDFType(deterministic=false)
 public class UDF2 extends UDF {
   public String evaluate(String val) {
     return val;
   }
 }
 }}
 2) Create tables
 CREATE TABLE Main (
   a STRING,
   b INT
 )
 PARTITIONED BY(part STRING)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
 LINES TERMINATED BY '10'
 STORED AS TEXTFILE;
 ALTER TABLE Main ADD PARTITION (part='part1') LOCATION 
 '/hive-join-test/part1/';
 ALTER TABLE Main ADD PARTITION (part='part2') LOCATION 
 '/hive-join-test/part2/';
 CREATE TABLE Joined (
   a STRING,
   f STRING
 )
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
 LINES TERMINATED BY '10'
 STORED AS TEXTFILE
 LOCATION '/hive-join-test/join/';
 3) Run first query:
 select 
   m.a,
   m.b
 from Main m
 where
   part  UDF2('part0') AND part = 'part1';
 The pruner will work for this query: 
 mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1
 4) Run second query (with join):
 select 
   m.a,
   j.a,
   m.b
 from Main m
 join Joined j on
   j.a=m.a
 where
   part  UDF2('part0') AND part = 'part1';
 Pruner doesn't work: 
 mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2,hdfs://localhost:9000/hive-join-test/join
 5) Also, let's try to run the query with the MAPJOIN hint:
 select /*+MAPJOIN(j)*/ 
   m.a,
   j.a,
   m.b
 from Main m
 join Joined j on
   j.a=m.a
 where
   part  UDF2('part0') AND part = 'part1';
 The result is the same, pruner doesn't work: 
 mapred.input.dir=hdfs://localhost:9000/hive-join-test/part1,hdfs://localhost:9000/hive-join-test/part2

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big table

2010-02-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834446#action_12834446
 ] 

Namit Jain commented on HIVE-1134:
--

Some pending work from https://issues.apache.org/jira/browse/HIVE-917 - you can 
do that in a separate jira if you want to.

1. Add the mapping in the explain plan so that it can be compared - look at
https://issues.apache.org/jira/browse/HIVE-976

2. Add a negative test where the numbers of buckets in the 2 tables are not 
exact multiples of each other, i.e. bucketed map join will not be used.

3. Instead of checking at runtime, set the DefaultBucketMatcher in the plan and 
initialize it using reflection.

 bucketing mapjoin where the big table contains more than 1 big table
 

 Key: HIVE-1134
 URL: https://issues.apache.org/jira/browse/HIVE-1134
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-917) Bucketed Map Join

2010-02-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834448#action_12834448
 ] 

Namit Jain commented on HIVE-917:
-

Added some more tasks in the follow-up jira after talking to Yongqiang.
Will commit this if the tests pass

 Bucketed Map Join
 -

 Key: HIVE-917
 URL: https://issues.apache.org/jira/browse/HIVE-917
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: He Yongqiang
 Attachments: hive-917-2010-2-15.patch, hive-917-2010-2-16.patch, 
 hive-917-2010-2-3.patch, hive-917-2010-2-8.patch


 Hive already has support for map-join. Map-join treats the big table as the job 
 input, and in each mapper it loads all the data from a small table.
 In case the big table is already bucketed on the join key, we don't have to 
 load the whole small table in each of the mappers. This will greatly 
 alleviate the memory pressure, and make map-join work with medium-sized 
 tables.
 There are 4 steps we can improve:
 S0. This is what the user can already do now: create a new bucketed table and 
 insert all data from the small table to it; Submit BUCKETNUM jobs, each doing 
 a map-side join of bigtable TABLEPARTITION(BUCKET i OUT OF NBUCKETS) with 
 smallbucketedtable TABLEPARTITION(BUCKET i OUT OF NBUCKETS).
 S1. Change the code so that when map-join is loading the small table, we 
 automatically drop the rows with the keys that are NOT in the same bucket as 
 the big table. This should alleviate the problem on memory, but we might 
 still have thousands of mappers reading the whole of the small table.
 S2. Let's say the user already bucketed the small table on the join key into 
 exactly the same number of buckets (or a factor of the buckets of the big 
 table), then map-join can choose to load only the buckets that are useful.
 S3. Add a new hint (e.g. /*+ MAPBUCKETJOIN(a) */), so that Hive automatically 
 does S2, without the need of asking the user to create temporary bucketed 
 table for the small table.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834474#action_12834474
 ] 

Zheng Shao commented on HIVE-259:
-

 Is there any limitation on what can be used on the state object or can we use 
 any java Object? 
We support primitive classes, HashMap (translated into the map type in Hive), 
ArrayList (the array type in Hive), and any simple struct-like classes (the 
struct type in Hive).
We support arbitrary levels of nesting, but no recursive types.

 Also how is the state serialized between Map and Reduce?
We use SerDe (see SerDe.serialize(...) ) to serialize/deserialize the objects, 
as well as translations between objects that have the same type (see 
ObjectInspector and ObjectInspectorConverters).
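A "simple struct-like" state class of the kind described might look like the sketch below: public fields of primitive and List types that Hive could map to a struct between the Map and Reduce stages. The class and field names are illustrative assumptions, not part of Hive's API.

```java
import java.util.ArrayList;

// Sketch of a struct-like aggregation state: primitive fields plus an
// ArrayList, i.e. roughly the shape of a Hive
// struct<count:bigint, sum:double, samples:array<double>>.
// Names are illustrative, not Hive API.
class PercentileState {
    public long count = 0;
    public double sum = 0.0;
    public ArrayList<Double> samples = new ArrayList<Double>();

    public void add(double value) {
        count++;
        sum += value;
        samples.add(value);
    }
}
```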


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259.1.patch, HIVE-259.patch


 Compute at least the 25th, 50th, and 75th percentiles.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1136) add type-checking setters for HiveConf class to match existing getters

2010-02-16 Thread John Sichi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Sichi updated HIVE-1136:
-

Status: Patch Available  (was: Open)

This passed tests so I'm submitting it as ready.


 add type-checking setters for HiveConf class to match existing getters
 --

 Key: HIVE-1136
 URL: https://issues.apache.org/jira/browse/HIVE-1136
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.6.0

 Attachments: HIVE-1136.1.patch


 This is a followup from HIVE-1129.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1136) add type-checking setters for HiveConf class to match existing getters

2010-02-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834482#action_12834482
 ] 

Zheng Shao commented on HIVE-1136:
--

+1
Will test and commit.


 add type-checking setters for HiveConf class to match existing getters
 --

 Key: HIVE-1136
 URL: https://issues.apache.org/jira/browse/HIVE-1136
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.6.0

 Attachments: HIVE-1136.1.patch


 This is a followup from HIVE-1129.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1174) Job counter error if hive.merge.mapfiles equals true

2010-02-16 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1174:
---

Status: Patch Available  (was: Open)

 Job counter error if hive.merge.mapfiles equals true
 --

 Key: HIVE-1174
 URL: https://issues.apache.org/jira/browse/HIVE-1174
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: hive-1174.1.patch


 if hive.merge.mapfiles is set to true, the job counter will go to 3.




[jira] Updated: (HIVE-1174) Job counter error if hive.merge.mapfiles equals true

2010-02-16 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1174:
---

Attachment: hive-1174.1.patch




[jira] Commented: (HIVE-984) Building Hive occasionally fails with Ivy error: hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:

2010-02-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834519#action_12834519
 ] 

John Sichi commented on HIVE-984:
-

I spoke with Zheng about this and here's what we came up with.  Carl, let me 
know if this works for you.

* If at all possible, we want to keep building all supported shims as part of 
ant package to make sure that when a change breaks one, the developer finds out 
early (before even submitting a bad patch)
* The long term plan does involve deprecating and eventually dropping support 
for older Hadoop versions.  The fact that Facebook still has some dependencies 
on 0.17 probably explains why that is currently the oldest version, but the 
standard voting procedure can be used at the project level for initiating a 
deprecation process going forward.
* Regardless of how many Hadoop versions we support, the current Hadoop+ivy 
situation is definitely broken, and we need to fix it ASAP since it can be a 
major impediment to new or existing contributors.
* Before doing anything else, I'm going to see if a more reliable source than 
archive.apache.org would address the problem.  I'll test this with my home 
network tomorrow, which usually fails with archive.apache.org.
* If a more reliable source would help, then we'll see if we can get 
mirror.facebook.net to provide all supported Hadoop versions (currently only 
archive.apache.org has the old ones), and if that's the case, then we'll check 
in a change to build.properties to make it the default source.
* If either of the above is not the case, then we can do what you proposed in 
HIVE-1171 (check the Hadoop dependencies into svn instead).


 Building Hive occasionally fails with Ivy error: 
 hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:
 ---

 Key: HIVE-984
 URL: https://issues.apache.org/jira/browse/HIVE-984
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-984.2.patch, HIVE-984.patch


 Folks keep running into this problem when building Hive from source:
 {noformat}
 [ivy:retrieve]
 [ivy:retrieve] :: problems summary ::
 [ivy:retrieve]  WARNINGS
 [ivy:retrieve]  [FAILED ]
 hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:
 expected=hadoop-0.20.1.tar.gz: computed=719e169b7760c168441b49f405855b72
 (138662ms)
 [ivy:retrieve]  [FAILED ]
 hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:
 expected=hadoop-0.20.1.tar.gz: computed=719e169b7760c168441b49f405855b72
 (138662ms)
 [ivy:retrieve]   hadoop-resolver: tried
 [ivy:retrieve]
 http://archive.apache.org/dist/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
 [ivy:retrieve]  ::
 [ivy:retrieve]  ::  FAILED DOWNLOADS::
 [ivy:retrieve]  :: ^ see resolution messages for details  ^ ::
 [ivy:retrieve]  ::
 [ivy:retrieve]  :: hadoop#core;0.20.1!hadoop.tar.gz(source)
 [ivy:retrieve]  ::
 [ivy:retrieve]
 [ivy:retrieve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
 {noformat}
 The problem appears to be either with a) the Hive build scripts, b) ivy, or 
 c) archive.apache.org
 Besides fixing the actual bug, one other option worth considering is to add 
 the Hadoop jars to the
 Hive source repository.




[jira] Updated: (HIVE-1136) add type-checking setters for HiveConf class to match existing getters

2010-02-16 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1136:
-

Status: Open  (was: Patch Available)




[jira] Commented: (HIVE-1136) add type-checking setters for HiveConf class to match existing getters

2010-02-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834526#action_12834526
 ] 

Zheng Shao commented on HIVE-1136:
--

It seems that the following function is not in Hadoop 0.17.
What about converting the value to a string first (and putting in a comment 
saying we did this for compatibility with Hadoop 0.17)?

{code}
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:303: cannot find 
symbol
[javac] symbol  : method setFloat(java.lang.String,float)
[javac] location: class org.apache.hadoop.conf.Configuration
[javac] conf.setFloat(var.varname, val);
[javac] ^
{code}
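Zheng's suggested workaround can be sketched as follows. This is an illustrative, self-contained example, not the actual HiveConf patch: a HashMap stands in for org.apache.hadoop.conf.Configuration, and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the workaround suggested above: Hadoop 0.17's Configuration has no
// setFloat(String, float), but set(String, String) exists, so store the float
// as a string for compatibility with Hadoop 0.17. A HashMap stands in for
// org.apache.hadoop.conf.Configuration to keep the sketch self-contained.
public class FloatConfCompat {
  private final Map<String, String> conf = new HashMap<String, String>();

  // Compatibility setter: serialize the float instead of calling setFloat.
  public void setFloatVar(String name, float val) {
    conf.put(name, Float.toString(val));
  }

  // Matching getter parses the stored string back into a float.
  public float getFloatVar(String name, float defaultVal) {
    String s = conf.get(name);
    return (s == null) ? defaultVal : Float.parseFloat(s);
  }

  public static void main(String[] args) {
    FloatConfCompat c = new FloatConfCompat();
    c.setFloatVar("hive.some.float.var", 0.25f);
    System.out.println(c.getFloatVar("hive.some.float.var", 1.0f));
  }
}
```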





[jira] Updated: (HIVE-1174) Job counter error if hive.merge.mapfiles equals true

2010-02-16 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1174:
---

Attachment: hive-1174.2.patch




[jira] Commented: (HIVE-1174) Job counter error if hive.merge.mapfiles equals true

2010-02-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834529#action_12834529
 ] 

Zheng Shao commented on HIVE-1174:
--

+1. Will test and commit.





[jira] Commented: (HIVE-1174) Job counter error if hive.merge.mapfiles equals true

2010-02-16 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834536#action_12834536
 ] 

He Yongqiang commented on HIVE-1174:


The problem in this jira is:
1) countJobs in Driver only counts map-reduce tasks.
2) For the merge job, there are two tasks (one dummy move task, and one merge 
task which is an MRTask). It should count 1 job, because the dummy move is not 
a map-reduce task.
3) But a bug in ConditionalTask always increments jobCounter by 1 when removing 
a task from the candidate task list, even when that task is not an MRTask.
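The counting behavior described above can be sketched like this (the class and method names are illustrative stand-ins for Driver/ConditionalTask, not the actual Hive code):

```java
import java.util.Arrays;
import java.util.List;

public class JobCounterSketch {
  // Stand-in for a Hive task; only the "is this a map-reduce task" bit matters.
  static class Task {
    final boolean isMapRedTask;
    Task(boolean isMapRedTask) { this.isMapRedTask = isMapRedTask; }
  }

  // Buggy behavior described above: the counter is bumped for every task
  // removed from the candidate list, map-reduce or not.
  static int countBuggy(List<Task> removed) {
    return removed.size();
  }

  // Expected behavior: only map-reduce tasks should bump the job counter.
  static int countFixed(List<Task> removed) {
    int jobs = 0;
    for (Task t : removed) {
      if (t.isMapRedTask) jobs++;
    }
    return jobs;
  }

  public static void main(String[] args) {
    // Merge stage: one dummy move task (not MR) plus one merge MRTask.
    List<Task> mergeStage = Arrays.asList(new Task(false), new Task(true));
    System.out.println(countBuggy(mergeStage)); // over-counts
    System.out.println(countFixed(mergeStage)); // counts only the MRTask
  }
}
```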




[jira] Resolved: (HIVE-917) Bucketed Map Join

2010-02-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain resolved HIVE-917.
-

   Resolution: Fixed
Fix Version/s: 0.6.0
 Hadoop Flags: [Reviewed]

Committed. Thanks Yongqiang

 Bucketed Map Join
 -

 Key: HIVE-917
 URL: https://issues.apache.org/jira/browse/HIVE-917
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive-917-2010-2-15.patch, hive-917-2010-2-16.patch, 
 hive-917-2010-2-3.patch, hive-917-2010-2-8.patch


 Hive already has support for map-join. Map-join treats the big table as the 
 job input, and in each mapper, it loads all the data from a small table.
 In case the big table is already bucketed on the join key, we don't have to 
 load the whole small table in each of the mappers. This will greatly 
 alleviate the memory pressure, and make map-join work with medium-sized 
 tables.
 There are 4 steps we can take to improve this:
 S0. This is what the user can already do now: create a new bucketed table and 
 insert all data from the small table to it; Submit BUCKETNUM jobs, each doing 
 a map-side join of bigtable TABLEPARTITION(BUCKET i OUT OF NBUCKETS) with 
 smallbucketedtable TABLEPARTITION(BUCKET i OUT OF NBUCKETS).
 S1. Change the code so that when map-join is loading the small table, we 
 automatically drop the rows with the keys that are NOT in the same bucket as 
 the big table. This should alleviate the problem on memory, but we might 
 still have thousands of mappers reading the whole of the small table.
 S2. Let's say the user already bucketed the small table on the join key into 
 exactly the same number of buckets (or a factor of the buckets of the big 
 table), then map-join can choose to load only the buckets that are useful.
 S3. Add a new hint (e.g. /*+ MAPBUCKETJOIN(a) */), so that Hive automatically 
 does S2, without the need of asking the user to create temporary bucketed 
 table for the small table.
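The bucket pruning in S2 relies on a simple mapping, sketched below with illustrative names (this is not the actual Hive implementation). It assumes both tables assign a row to bucket hash(key) % numBuckets and that the small table's bucket count divides the big table's.

```java
// Illustrative sketch of the S2 bucket mapping: every join key in big-table
// bucket b satisfies hash(key) % bigBuckets == b, so when smallBuckets divides
// bigBuckets, hash(key) % smallBuckets == b % smallBuckets. Each mapper
// therefore only needs to load one small-table bucket file.
public class BucketMapping {
  public static int smallBucketFor(int bigBucket, int smallBuckets) {
    return bigBucket % smallBuckets;
  }

  public static void main(String[] args) {
    // Big table with 8 buckets, small table with 4: the mapper reading big
    // bucket 5 only needs small-table bucket 1.
    System.out.println(smallBucketFor(5, 4));
  }
}
```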




[jira] Commented: (HIVE-984) Building Hive occasionally fails with Ivy error: hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:

2010-02-16 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834544#action_12834544
 ] 

Carl Steinbach commented on HIVE-984:
-


bq.  If at all possible, we want to keep building all supported shims as part 
of ant package to make sure that when a change breaks one, the developer finds 
out early (before even submitting a bad patch)

Unless your change specifically mucks with the shim code I think it's unlikely 
that you're going to introduce a compile time error. It seems more likely that 
you would cause a test error, and that's something you will only catch if you 
run the full test suite against all supported versions -- something that we 
only expect Hudson to do.

Which brings up another point. How do we configure JIRA/Hudson to automatically 
test submitted patches? The Hadoop and Pig projects are both set up to do this, 
but I can't find any references to how it was done. Do either of you know how 
to set this up, or have objections to doing so?

bq. Before doing anything else, I'm going to see if a more reliable source than 
archive.apache.org would address the problem. I'll test this with my home 
network tomorrow, which usually fails with archive.apache.org.

Over the weekend I figured out that there are actually two different reasons 
why people are encountering errors during the download process, and wanted to 
make sure that everyone else is aware of this as well:

# Unable to connect to archive.apache.org: We can fix this by adding additional 
apache mirrors (see http://www.apache.org/mirrors/) to the hadoop-source 
resolver in ivysettings, and also by letting people know that they can 
explicitly set the mirror location using the hadoop.mirror property.

# -Dhadoop.version=0.20.1: When people set hadoop.version to 0.20.1 it causes 
ant to download both 0.20.0 *and* 0.20.1, which is unnecessary since the API 
does not change between patch releases. But the bigger problem is that 0.20.1's 
md5 checksum file on archive.apache.org contains an md5 hash along with a bunch 
of other garbage that breaks ivy. We can fix this either by disabling checksums 
for archive.apache.org (set ivy.checksums= on that resolver), or by enhancing 
the build script so that it ignores patch release numbers and maps 0.20.1 to 
0.20.0.
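The second fix can be sketched as a tiny version-mapping step in the build (a hypothetical helper, not the actual build script): ignore the patch component of hadoop.version so 0.20.1 resolves to the 0.20.0 artifact.

```java
// Hypothetical sketch of the proposed build-script enhancement: map a patch
// release like 0.20.1 onto the corresponding .0 release, since the API does
// not change between patch releases.
public class HadoopVersionMap {
  public static String normalize(String version) {
    int lastDot = version.lastIndexOf('.');
    if (lastDot < 0) {
      return version; // no patch component to strip
    }
    // Replace the final component with 0, e.g. 0.20.1 -> 0.20.0.
    return version.substring(0, lastDot) + ".0";
  }

  public static void main(String[] args) {
    System.out.println(normalize("0.20.1"));
  }
}
```

A real implementation would need exceptions for versions such as 0.17.2.1, where the final component is the one Hive actually wants.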






[jira] Updated: (HIVE-1134) bucketing mapjoin where the big table contains more than 1 big partition

2010-02-16 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1134:
---

Summary: bucketing mapjoin where the big table contains more than 1 big 
partition  (was: bucketing mapjoin where the big table contains more than 1 big 
table)

 bucketing mapjoin where the big table contains more than 1 big partition
 

 Key: HIVE-1134
 URL: https://issues.apache.org/jira/browse/HIVE-1134
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0







[jira] Updated: (HIVE-1163) Eclipse launchtemplate changes to enable debugging

2010-02-16 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1163:
-

Attachment: HIVE-1163_3.patch

Hi Carl, I've made the JAVA_HOME and README.txt changes you suggested. I 
tested the JAVA_HOME change on Linux and it doesn't seem to work (unset 
JAVA_HOME and launch Eclipse for debugging). Can you try it on a Mac and see if 
it works?

 Eclipse launchtemplate changes to enable debugging
 --

 Key: HIVE-1163
 URL: https://issues.apache.org/jira/browse/HIVE-1163
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1163.patch, HIVE-1163_2.patch, HIVE-1163_3.patch


 Some recent changes in build.xml and build-common.xml break the 
 debugging functionality in Eclipse. Some system-defined properties were 
 missing when running the Eclipse debugger. 




[jira] Commented: (HIVE-984) Building Hive occasionally fails with Ivy error: hadoop#core;0.20.1!hadoop.tar.gz(source): invalid md5:

2010-02-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834558#action_12834558
 ] 

John Sichi commented on HIVE-984:
-

Shims:  you may be right, but I guess the principle is that all source code 
checked in ought to be covered by the build if possible.  It's arguable that we 
should actually do even more in this respect (rather than less), since for 
example in HIVE-1136 we just hit a case where one of my changes was 
incompatible with an old Hadoop version (nothing to do with shims).  If we 
built against all supported Hadoop versions as part of ant test, this would 
have been caught when I ran tests myself (so Zheng would never have had to 
spend time testing my bad patch and rejecting it).  ant test might be a 
reasonable place for that, since test time will always be orders of magnitude 
longer than build time.  (But note:  I'm not proposing to run tests on all 
versions except in Hudson!)

Hudson automatically testing patches:  I don't know the answer to that one, but 
it sounds like a very high-value automation to me if the resources are 
available, and my opinion on the version download issue might change if this 
were working reliably with permanently committed resources.

archive.apache.org:  the default mirroring for Hadoop seems to be 0.18.3, 
0.19.2, and 0.20.1 (that's what I see when I browse most of the mirrors), which 
doesn't match what Hive currently wants (0.17.2.1, 0.18.3, 0.19.0, and 0.20.0). 
 That's why I was thinking we might need a custom setup on mirror.facebook.net.





[jira] Commented: (HIVE-1136) add type-checking setters for HiveConf class to match existing getters

2010-02-16 Thread John Sichi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834562#action_12834562
 ] 

John Sichi commented on HIVE-1136:
--

Per discussion with Zheng, added a new shim method to deal with the 
incompatibility.


 add type-checking setters for HiveConf class to match existing getters
 --

 Key: HIVE-1136
 URL: https://issues.apache.org/jira/browse/HIVE-1136
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.6.0

 Attachments: HIVE-1136.1.patch, HIVE-1136.2.patch


 This is a followup from HIVE-1129.




[jira] Updated: (HIVE-1163) Eclipse launchtemplate changes to enable debugging

2010-02-16 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1163:
-

Attachment: HIVE-1163.4.patch

* Fixed the JAVA_HOME problem.
* Disabled automatic appending of Eclipse system environment to test 
environment.
* Adjusted the formatting in README.txt and tweaked the Eclipse instructions.
* Added a HiveCLI launch configuration for running the CLI from within Eclipse.

I'm running the tests right now. Things look good so far.

 Eclipse launchtemplate changes to enable debugging
 --

 Key: HIVE-1163
 URL: https://issues.apache.org/jira/browse/HIVE-1163
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1163.4.patch, HIVE-1163.patch, HIVE-1163_2.patch, 
 HIVE-1163_3.patch


 Some recent changes in build.xml and build-common.xml break the 
 debugging functionality in Eclipse. Some system-defined properties were 
 missing when running the Eclipse debugger. 




[jira] Updated: (HIVE-1174) Job counter error if hive.merge.mapfiles equals true

2010-02-16 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1174:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
 Release Note: HIVE-1174. Fix Job counter error if hive.merge.mapfiles 
equals true. (Yongqiang He via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed. Thanks Yongqiang!

 Job counter error if hive.merge.mapfiles equals true
 --

 Key: HIVE-1174
 URL: https://issues.apache.org/jira/browse/HIVE-1174
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive-1174.1.patch, hive-1174.2.patch


 if hive.merge.mapfiles is set to true, the job counter will go to 3.




[jira] Commented: (HIVE-1136) add type-checking setters for HiveConf class to match existing getters

2010-02-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834630#action_12834630
 ] 

Zheng Shao commented on HIVE-1136:
--

+1. Will test and commit.





Re: [VOTE] release hive 0.5.0

2010-02-16 Thread Carl Steinbach
+1

On Mon, Feb 15, 2010 at 1:41 PM, Zheng Shao zsh...@gmail.com wrote:

 Hive branch 0.5 was created 5 weeks ago:
 https://svn.apache.org/viewvc/hadoop/hive/branches/branch-0.5/

 It has also been running as the production version of Hive at Facebook
 for 2 weeks.


 We'd like to start making release candidates (for 0.5.0) from branch 0.5.
 Please vote.

 --
 Yours,
 Zheng



[jira] Created: (HIVE-1175) Enable automatic patch testing on Hudson

2010-02-16 Thread Carl Steinbach (JIRA)
Enable automatic patch testing on Hudson


 Key: HIVE-1175
 URL: https://issues.apache.org/jira/browse/HIVE-1175
 Project: Hadoop Hive
  Issue Type: Task
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach


See 
http://developer.yahoo.net/blogs/hadoop/2007/12/if_it_hurts_automate_it_1.html






[jira] Commented: (HIVE-1175) Enable automatic patch testing on Hudson

2010-02-16 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834651#action_12834651
 ] 

Carl Steinbach commented on HIVE-1175:
--

The test-patch.sh script lives here: 
http://svn.apache.org/repos/asf/hadoop/core/nightly/test-patch 

We will need to configure svn:externals to get this pulled in as part of 
Hive trunk.

Something like this:

Check out hive trunk
cd hive-trunk
export EDITOR=emacs
svn propedit svn:externals testutils
[the above step will open Emacs; type in the following line and save it]
test-patch http://svn.apache.org/repos/asf/hadoop/nightly/test-patch
svn commit

 Enable automatic patch testing on Hudson
 

 Key: HIVE-1175
 URL: https://issues.apache.org/jira/browse/HIVE-1175
 Project: Hadoop Hive
  Issue Type: Task
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach

 See 
 http://developer.yahoo.net/blogs/hadoop/2007/12/if_it_hurts_automate_it_1.html




[jira] Updated: (HIVE-1117) Make QueryPlan serializable

2010-02-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1117:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks Zheng

 Make QueryPlan serializable
 ---

 Key: HIVE-1117
 URL: https://issues.apache.org/jira/browse/HIVE-1117
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Fix For: 0.6.0

 Attachments: HIVE-1117.1.code.patch, HIVE-1117.1.test.patch


 We need to make QueryPlan serializable so that we can resume the query some 
 time later.




[jira] Commented: (HIVE-1163) Eclipse launchtemplate changes to enable debugging

2010-02-16 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834661#action_12834661
 ] 

Ning Zhang commented on HIVE-1163:
--

+1. Changes look good. JAVA_HOME works on Linux as well. 




[jira] Reopened: (HIVE-1158) Introducing a new parameter for Map-side join bucket size

2010-02-16 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao reopened HIVE-1158:
--


Need a patch for branch 0.5

 Introducing a new parameter for Map-side join bucket size
 -

 Key: HIVE-1158
 URL: https://issues.apache.org/jira/browse/HIVE-1158
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.5.0, 0.6.0
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1158.patch


 Map-side join caches the small table in memory and joins it with each split 
 of the large table on the mapper side. If the small table is too large, it 
 uses RowContainer to cache the number of rows indicated by the parameter 
 hive.join.cache.size, whose default value is 25000. This parameter is also 
 used by regular reduce-side joins to cache all input tables except the 
 streaming table. This default value is too large for the map-side join bucket 
 size, sometimes resulting in OOM exceptions. We should define a separate 
 parameter for these two cache sizes. 
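The proposed split can be sketched like this (the new property name "hive.mapjoin.cache.numrows" and its default of 10000 below are assumptions for illustration, not the committed values):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the parameter split proposed above: map-side joins
// read a dedicated, smaller cache size instead of reusing the reduce-side
// hive.join.cache.size. The property name and the 10000 default are
// assumptions, not the committed values.
public class JoinCacheSize {
  public static int rowContainerSize(boolean isMapJoin, Map<String, String> conf) {
    String key = isMapJoin ? "hive.mapjoin.cache.numrows" : "hive.join.cache.size";
    String defaultVal = isMapJoin ? "10000" : "25000";
    return Integer.parseInt(conf.getOrDefault(key, defaultVal));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<String, String>();
    // With nothing configured, the two join paths now get different defaults.
    System.out.println(rowContainerSize(true, conf));
    System.out.println(rowContainerSize(false, conf));
  }
}
```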
