date:20090129

[jira] Updated: (HIVE-30) Hive web interface

2009-01-29 Thread Edward Capriolo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-30:


Attachment: hive-30-7.patch

This patch:
* allows the WAR location to be specified in hive-site.conf (changed to HiveConf)
* makes the class path refer to hadoop.root rather than a hardcoded version, i.e. 0.19.0

 Hive web interface
 --

 Key: HIVE-30
 URL: https://issues.apache.org/jira/browse/HIVE-30
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Reporter: Jeff Hammerbacher
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.2.0

 Attachments: HIVE-30-5.patch, HIVE-30-6.patch, hive-30-7.patch, 
 HIVE-30-A.patch, HIVE-30.patch, HIVE-30.patch


 Hive needs a web interface. The initial checkin should have:
 * simple schema browsing
 * query submission
 * query history (similar to MySQL's SHOW PROCESSLIST)
 A suggested feature: the ability to have a query notify the user when it's 
 completed.
 Edward Capriolo has expressed some interest in driving this process.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-30) Hive web interface

2009-01-29 Thread Ashish Thusoo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668624#action_12668624
 ] 

Ashish Thusoo commented on HIVE-30:
---

Hi Edward,

It seems like the latest patch contains the output of svn stat instead of svn 
diff... 

Thanks,
Ashish


 Hive web interface
 --

 Key: HIVE-30
 URL: https://issues.apache.org/jira/browse/HIVE-30
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Reporter: Jeff Hammerbacher
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.2.0

 Attachments: HIVE-30-5.patch, HIVE-30-6.patch, hive-30-7.patch, 
 HIVE-30-A.patch, HIVE-30.patch, HIVE-30.patch


 Hive needs a web interface. The initial checkin should have:
 * simple schema browsing
 * query submission
 * query history (similar to MySQL's SHOW PROCESSLIST)
 A suggested feature: the ability to have a query notify the user when it's 
 completed.
 Edward Capriolo has expressed some interest in driving this process.


[jira] Updated: (HIVE-253) rand() gets precomputated in compilation phase

2009-01-29 Thread Ashish Thusoo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo updated HIVE-253:
---

Affects Version/s: 0.2.0
Fix Version/s: 0.2.0

Marking this for 0.2.0 version.

 rand() gets precomputated in compilation phase
 --

 Key: HIVE-253
 URL: https://issues.apache.org/jira/browse/HIVE-253
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.2.0
Reporter: Zheng Shao
Assignee: Ashish Thusoo
Priority: Blocker
 Fix For: 0.2.0


 SELECT * FROM t WHERE rand() < 0.01;
 Hive will say: No need to submit job, because the condition evaluates to 
 false.
 The rand() function is special in the sense that every time it evaluates to a 
 different value. We should disallow computing the value in the compiling 
 phase.
 One way to do that is to add an annotation in the UDFRand and check that in 
 the compiling phase.
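
 A minimal sketch of the annotation idea described above, in Java. All names
 here (Deterministic, RandSketch, AbsSketch, FoldingCheck) are hypothetical and
 are not Hive's actual classes; the sketch only shows how a compile-phase check
 could consult a runtime annotation before precomputing a function call.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Random;

final class FoldingCheck {

    /** Hypothetical marker: does the result depend only on the arguments? */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Deterministic {
        boolean value() default true;
    }

    /** Stand-in for UDFRand, marked so that it is never precomputed. */
    @Deterministic(false)
    static class RandSketch {
        private final Random random = new Random();

        public double evaluate() {
            return random.nextDouble();
        }
    }

    /** Stand-in for an ordinary function that carries no annotation. */
    static class AbsSketch {
        public double evaluate(double x) {
            return Math.abs(x);
        }
    }

    /**
     * Returns true if the compile phase may evaluate the function once and
     * replace the call with a constant. Unannotated classes are treated as
     * deterministic (an assumption of this sketch).
     */
    static boolean canFoldAtCompileTime(Class<?> udfClass) {
        Deterministic marker = udfClass.getAnnotation(Deterministic.class);
        return marker == null || marker.value();
    }

    public static void main(String[] args) {
        System.out.println(canFoldAtCompileTime(AbsSketch.class));  // prints "true"
        System.out.println(canFoldAtCompileTime(RandSketch.class)); // prints "false"
    }
}
```

 With a check like this, a predicate such as rand() < 0.01 would be left for
 run-time evaluation on every row instead of being reduced to a constant.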


[jira] Assigned: (HIVE-253) rand() gets precomputated in compilation phase

2009-01-29 Thread Ashish Thusoo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Thusoo reassigned HIVE-253:
--

Assignee: Ashish Thusoo

 rand() gets precomputated in compilation phase
 --

 Key: HIVE-253
 URL: https://issues.apache.org/jira/browse/HIVE-253
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.2.0
Reporter: Zheng Shao
Assignee: Ashish Thusoo
Priority: Blocker
 Fix For: 0.2.0


 SELECT * FROM t WHERE rand() < 0.01;
 Hive will say: No need to submit job, because the condition evaluates to 
 false.
 The rand() function is special in the sense that every time it evaluates to a 
 different value. We should disallow computing the value in the compiling 
 phase.
 One way to do that is to add an annotation in the UDFRand and check that in 
 the compiling phase.


[jira] Assigned: (HIVE-79) Print number of raws inserted to table(s) when the query is finished.

2009-01-29 Thread Suresh Antony (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Antony reassigned HIVE-79:
-

Assignee: Suresh Antony

 Print number of raws inserted to table(s) when  the query is finished.
 --

 Key: HIVE-79
 URL: https://issues.apache.org/jira/browse/HIVE-79
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Logging
Reporter: Suresh Antony
Assignee: Suresh Antony
Priority: Minor
 Fix For: 0.2.0


 It would be good to print the number of rows inserted into each table at the 
 end of the query. 
 insert overwrite table tab1 select a.* from tab2 a where a.col1 = 10;
 This query can print something like:
 tab1 rows=100


[jira] Created: (HIVE-261) union all query hangs

2009-01-29 Thread Hao Liu (JIRA)

union all query hangs
-

 Key: HIVE-261
 URL: https://issues.apache.org/jira/browse/HIVE-261
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Hao Liu


We have this query:

SELECT a.u, b.id FROM (
 SELECT a1.u, a1.id as id FROM t_1 a1 WHERE a1.date = '2009-01-01' UNION ALL
 SELECT a2.u, a2.id as id FROM t_2 a2 WHERE a2.date = '2009-01-01' UNION ALL
 ...
 SELECT aN.u, aN.id as id FROM t_N aN WHERE aN.date = '2009-01-01'
) a 
JOIN t b ON a.id = b.id WHERE b.date='2009-01-01' 
GROUP BY a.u, b.id

When we union more than 20 tables, the query hangs. It looks like something 
is wrong in the compiler.


[jira] Commented: (HIVE-260) hive cli should not output the line by default

2009-01-29 Thread Ashish Thusoo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668681#action_12668681
 ] 

Ashish Thusoo commented on HIVE-260:


Does this show up along with the message that outputs the number of reducers, 
etc.?

If so, would this not just go away when running the CLI in silent mode?


 hive cli should not output the line by default
 --

 Key: HIVE-260
 URL: https://issues.apache.org/jira/browse/HIVE-260
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Zheng Shao
Priority: Blocker

 This is at the beginning of hive cli output:
 Hive history file=/tmp/zshao/hive_job_log_zshao_200901291532_-1964746650.txt
 We should remove it.


[jira] Updated: (HIVE-30) Hive web interface

2009-01-29 Thread Edward Capriolo (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-30:


Attachment: hive-30-9.patch

Newest patch. (not a svn stat DOH!)

 Hive web interface
 --

 Key: HIVE-30
 URL: https://issues.apache.org/jira/browse/HIVE-30
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Reporter: Jeff Hammerbacher
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.2.0

 Attachments: HIVE-30-5.patch, HIVE-30-6.patch, hive-30-7.patch, 
 hive-30-9.patch, HIVE-30-A.patch, HIVE-30.patch, HIVE-30.patch


 Hive needs a web interface. The initial checkin should have:
 * simple schema browsing
 * query submission
 * query history (similar to MySQL's SHOW PROCESSLIST)
 A suggested feature: the ability to have a query notify the user when it's 
 completed.
 Edward Capriolo has expressed some interest in driving this process.


[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2009-01-29 Thread Edward Capriolo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668699#action_12668699
 ] 

Edward Capriolo commented on HIVE-259:
--

The 95th percentile is very often used in Internet Service Provider billing, so 
this might be useful. 

The percentile calculation is a sort followed by picking an element. The syntax 
could be like:

* PERCENTILE(column, .99) 
* PERCENTILE(column, .50)

In this manner you could do any percentile.
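
The sort-then-pick calculation can be sketched in Java as follows. This is only
an illustration using the nearest-rank definition (one of several common
percentile definitions); the class and method names are hypothetical, and it
runs over an in-memory array rather than as a Hive aggregate.

```java
import java.util.Arrays;

final class PercentileSketch {

    /**
     * Nearest-rank percentile: sort the values, then pick the element at
     * rank ceil(p * n), counting from 1. p is a fraction in [0, 1].
     */
    static double percentile(double[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);
        int index = Math.max(rank, 1) - 1; // p == 0 maps to the smallest value
        return sorted[index];
    }

    public static void main(String[] args) {
        double[] column = {40, 15, 50, 20, 35};
        System.out.println(percentile(column, 0.50)); // prints "35.0"
        System.out.println(percentile(column, 0.99)); // prints "50.0"
    }
}
```

Because the whole column has to be sorted before an element is picked, a
distributed version would need all values for a group to reach one place, which
is a design cost to keep in mind for an aggregate function.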

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer

 Compute at least the 25th, 50th, and 75th percentiles


[jira] Commented: (HIVE-82) Augment build.xml with a target to build the forrest docs and javadocs

2009-01-29 Thread Edward Capriolo (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668700#action_12668700
 ] 

Edward Capriolo commented on HIVE-82:
-

We can also use the Hive-Web-Interface to display the javadoc. If we create a 
folder $HIVE_HOME/doc the hive web server can load it as a static context.

 Augment build.xml with a target to build the forrest docs and javadocs
 --

 Key: HIVE-82
 URL: https://issues.apache.org/jira/browse/HIVE-82
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Build Infrastructure
Reporter: Jeff Hammerbacher

 See hadoop's build.xml, especially the targets docs and javadoc-dev


[jira] Commented: (HIVE-260) hive cli should not output the line by default

2009-01-29 Thread Zheng Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668706#action_12668706
 ] 

Zheng Shao commented on HIVE-260:
-

No. The number of reducers is not there.




 hive cli should not output the line by default
 --

 Key: HIVE-260
 URL: https://issues.apache.org/jira/browse/HIVE-260
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Zheng Shao
Priority: Blocker

 This is at the beginning of hive cli output:
 Hive history file=/tmp/zshao/hive_job_log_zshao_200901291532_-1964746650.txt
 We should remove it.


[jira] Created: (HIVE-262) outer join gets some duplicate rows in some scenarios

2009-01-29 Thread Namit Jain (JIRA)

outer join gets some duplicate rows in some scenarios
-

 Key: HIVE-262
 URL: https://issues.apache.org/jira/browse/HIVE-262
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain


SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 10) 
RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);


returns duplicate rows for outer join


[jira] Commented: (HIVE-106) Join operation fails for some queries

2009-01-29 Thread Namit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668711#action_12668711
 ] 

Namit Jain commented on HIVE-106:
-

Josh, can you provide the data files for the tables activities and users that 
were failing?

 Join operation fails for some queries
 -

 Key: HIVE-106
 URL: https://issues.apache.org/jira/browse/HIVE-106
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Josh Ferguson
Assignee: Namit Jain
Priority: Critical

 The Tables Are
 CREATE TABLE activities 
 (actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>) 
 PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) 
 CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS 
 ROW FORMAT DELIMITED 
 COLLECTION ITEMS TERMINATED BY '44'
 MAP KEYS TERMINATED BY '58'
 STORED AS TEXTFILE;
 Detailed Table Information:
 Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null),
  FieldSchema(name:actee_id,type:string,comment:null), 
 FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id,
  
 actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
  FieldSchema(name:application,type:string,comment:null), 
 FieldSchema(name:dataset,type:string,comment:null), 
 FieldSchema(name:hour,type:int,comment:null)],parameters:{})
 CREATE TABLE users 
 (id STRING, properties MAP<STRING, STRING>) 
 PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) 
 CLUSTERED BY (id) INTO 32 BUCKETS 
 ROW FORMAT DELIMITED 
 COLLECTION ITEMS TERMINATED BY '44'
 MAP KEYS TERMINATED BY '58'
 STORED AS TEXTFILE;
 Detailed Table Information:
 Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null),
  
 FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
  FieldSchema(name:application,type:string,comment:null), 
 FieldSchema(name:dataset,type:string,comment:null), 
 FieldSchema(name:hour,type:int,comment:null)],parameters:{})
 A working query is
 SELECT activities.* FROM activities WHERE activities.dataset='poke' AND 
 activities.properties['verb'] = 'Dance';
 A non-working query is
 SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON 
 activities.actor_id = users.id WHERE activities.dataset='poke' AND 
 activities.properties['verb'] = 'Dance';
 The Exception Is
 java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index 
 expression on string
   at 
 org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)
   at 
 org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
   at 
 org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
   at 
 org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
   at 
 org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257)
   at 
 org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477)
   at 
 org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
   at 
 org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
   at 
 org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
   at 
 org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489)
   at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140)
   at

[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

2009-01-29 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-262:


Attachment: patch.262.1.txt

 outer join gets some duplicate rows in some scenarios
 -

 Key: HIVE-262
 URL: https://issues.apache.org/jira/browse/HIVE-262
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.2.0
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.2.0

 Attachments: patch.262.1.txt


 SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 
 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
 returns duplicate rows for outer join


[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

2009-01-29 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-262:


Fix Version/s: 0.2.0
Affects Version/s: 0.2.0
   Status: Patch Available  (was: Open)

 outer join gets some duplicate rows in some scenarios
 -

 Key: HIVE-262
 URL: https://issues.apache.org/jira/browse/HIVE-262
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.2.0
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.2.0

 Attachments: patch.262.1.txt


 SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 
 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
 returns duplicate rows for outer join


[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

2009-01-29 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-262:


Status: Open  (was: Patch Available)

 outer join gets some duplicate rows in some scenarios
 -

 Key: HIVE-262
 URL: https://issues.apache.org/jira/browse/HIVE-262
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.2.0
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.2.0

 Attachments: patch.262.1.txt, patch262.2.txt


 SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 
 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
 returns duplicate rows for outer join


[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

2009-01-29 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-262:


Status: Patch Available  (was: Open)

 outer join gets some duplicate rows in some scenarios
 -

 Key: HIVE-262
 URL: https://issues.apache.org/jira/browse/HIVE-262
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.2.0
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.2.0

 Attachments: patch.262.1.txt, patch262.2.txt


 SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 
 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
 returns duplicate rows for outer join


[jira] Updated: (HIVE-262) outer join gets some duplicate rows in some scenarios

2009-01-29 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-262:


Attachment: patch262.2.txt

forgot to update parse result files

 outer join gets some duplicate rows in some scenarios
 -

 Key: HIVE-262
 URL: https://issues.apache.org/jira/browse/HIVE-262
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.2.0
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.2.0

 Attachments: patch.262.1.txt, patch262.2.txt


 SELECT * FROM src src1 JOIN src src2 ON (src1.key = src2.key AND src1.key < 
 10) RIGHT OUTER JOIN src src3 ON (src1.key = src3.key AND src3.key < 20);
 returns duplicate rows for outer join


[jira] Assigned: (HIVE-223) when using map-side aggregates - perform single map-reduce group-by

2009-01-29 Thread Namit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-223:
---

Assignee: Namit Jain

 when using map-side aggregates - perform single map-reduce group-by
 ---

 Key: HIVE-223
 URL: https://issues.apache.org/jira/browse/HIVE-223
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Joydeep Sen Sarma
Assignee: Namit Jain

 Today, even when we do map-side aggregates, we run multiple map-reduce jobs. 
 However, the reason for doing multiple map-reduce group-bys (for single 
 group-bys) was the fear of skews. When we are doing map-side aggregates, 
 skews should not exist for the most part. There can be two reasons for skews:
 - a large number of entries for a single grouping set: map-side aggregates 
 should take care of this
 - badness in the hash function that sends too much to one reducer: we 
 should be able to take care of this by having good hash functions (and prime 
 number reducer counts)
 So I think we should be able to do a single-stage map-reduce when doing 
 map-side aggregates.
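
 The remark about hash functions and prime reducer counts can be illustrated
 with a small Java sketch. It assumes a deliberately weak hash (the key value
 itself) and a modulo partitioner of the same form as Hadoop's HashPartitioner;
 the class and method names are hypothetical and this is not Hive code.

```java
final class ReducerSkewSketch {

    /** Modulo partitioning on a non-negative hash value. */
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    /** Number of keys received by the busiest reducer. */
    static int maxLoad(int[] hashes, int numReducers) {
        int[] load = new int[numReducers];
        int max = 0;
        for (int hash : hashes) {
            int reducer = partition(hash, numReducers);
            load[reducer]++;
            max = Math.max(max, load[reducer]);
        }
        return max;
    }

    /** Keys 0, step, 2*step, ... standing in for poorly distributed hashes. */
    static int[] multiplesOf(int step, int count) {
        int[] keys = new int[count];
        for (int i = 0; i < count; i++) {
            keys[i] = i * step;
        }
        return keys;
    }

    public static void main(String[] args) {
        int[] keys = multiplesOf(10, 100);
        // With 10 reducers every key lands on reducer 0.
        System.out.println(maxLoad(keys, 10)); // prints "100"
        // With 11 reducers (prime) the same keys spread over all reducers.
        System.out.println(maxLoad(keys, 11)); // prints "10"
    }
}
```

 A prime reducer count only masks this particular pattern; a hash function
 that mixes the key bits well addresses the cause.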


[jira] Resolved: (HIVE-260) hive cli should not output the line by default

2009-01-29 Thread Zheng Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao resolved HIVE-260.
-

Resolution: Invalid

The -S option does remove that line.


 hive cli should not output the line by default
 --

 Key: HIVE-260
 URL: https://issues.apache.org/jira/browse/HIVE-260
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Zheng Shao
Priority: Blocker

 This is at the beginning of hive cli output:
 Hive history file=/tmp/zshao/hive_job_log_zshao_200901291532_-1964746650.txt
 We should remove it.


