[Hadoop Wiki] Trivial Update of "首页" by sunlightcs

2010-07-13 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "首页" page has been changed by sunlightcs.
http://wiki.apache.org/hadoop/%E9%A6%96%E9%A1%B5?action=diff&rev1=7&rev2=8

--

  
  Hadoop was started by Doug Cutting in 2004. It began to gain popularity in China in 2008 and was already thriving there by 2009: many companies, including China Mobile, Baidu, NetEase, Taobao, Tencent, Kingsoft, and Huawei, are researching and using it, and numerous universities and research institutes, such as the Chinese Academy of Sciences, Jinan University, and Zhejiang University, are studying it as well.
  
+ 
+  * [[http://www.juziku.com/|聚资库]]
+ 


[Hadoop Wiki] Update of "Hive/DesignDocs" by JohnSichi

2010-07-13 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hive/DesignDocs" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/DesignDocs?action=diff&rev1=2&rev2=3

--

   * [[Hive/HBaseIntegration|HBase Integration]]
   * [[Hive/HBaseBulkLoad| HBase Bulk Load]]
   * [[Hive/Locking|Locking]]
+  * [[Hive/FilterPushdownDev|Filter Pushdown]]
  


Page nainai deleted from Hadoop Wiki

2010-07-13 Thread Apache Wiki
Dear wiki user,

You have subscribed to a wiki page "Hadoop Wiki" for change notification.

The page "nainai" has been deleted by DougCutting.
The comment on this change is: spam.
http://wiki.apache.org/hadoop/nainai


[Hadoop Wiki] Update of "Hive/LanguageManual/UDF" by Ar vindPrabhakar

2010-07-13 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hive/LanguageManual/UDF" page has been changed by ArvindPrabhakar.
http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF?action=diff&rev1=44&rev2=45

--

  == Built-in Aggregate Functions (UDAF) ==
  The following built-in aggregate functions are supported in Hive:
  ||<10%>'''Return Type''' ||<10%>'''Name(Signature)''' ||'''Description''' ||
- ||bigint ||count(1), count(DISTINCT col [, col]...) ||count(1) returns the 
number of members in the group, whereas the count(DISTINCT col) gets the count 
of distinct values of the columns in the group ||
+ ||bigint ||count(*), count(expr), count(DISTINCT expr[, expr...]) || count(*) 
- Returns the total number of retrieved rows, including rows containing NULL 
values; count(expr) - Returns the number of rows for which the supplied 
expression is non-NULL; count(DISTINCT expr[, expr]) - Returns the number of 
rows for which the supplied expression(s) are unique and non-NULL. ||
  ||double ||sum(col), sum(DISTINCT col) ||Returns the sum of the elements in 
the group or the sum of the distinct values of the column in the group ||
  ||double ||avg(col), avg(DISTINCT col) ||Returns the average of the elements 
in the group or the average of the distinct values of the column in the group ||
  ||double ||min(col) ||Returns the minimum of the column in the group ||


[Hadoop Wiki] Update of "Hive/GenericUDAFCaseStudy" by ArvindPrabhakar

2010-07-13 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hive/GenericUDAFCaseStudy" page has been changed by ArvindPrabhakar.
http://wiki.apache.org/hadoop/Hive/GenericUDAFCaseStudy?action=diff&rev1=1&rev2=2

--

  
  == Writing the source ==
  
- As stated above, create a new file called 
`ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogram.java`, 
relative to the Hive root directory. Please see the 
`ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java`
 for a detailed example of a UDAF.
+ This section gives a high-level outline of how to implement your own generic 
UDAF. For a concrete example, look at any of the existing UDAF sources in the 
`ql/src/java/org/apache/hadoop/hive/ql/udf/generic/` directory.
+ 
+ At a high level, there are two parts to implementing a Generic UDAF. The 
first is to write an ''evaluator'', and the second is to create a ''resolver''. 
An evaluator is the actual implementation of the generic UDAF, with the 
processing logic in place. The resolver, on the other hand, provides a mechanism 
for the evaluator to be accessed by the query processing framework.
+ 
+ All evaluators must extend the abstract base class 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator. This class provides 
a few abstract methods that must be implemented by the extending class; these 
methods establish the processing semantics followed by the UDAF. Please refer 
to the javadocs of the abstract methods for their exact specifications.
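
A rough sketch of such an evaluator follows. It is only meant to illustrate the abstract method contracts: the class name, the row-count semantics, and the aggregation buffer are hypothetical and are not taken from the Hive source tree.

{{{
package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.LongWritable;

// Hypothetical evaluator that simply counts rows; only the method
// contracts of GenericUDAFEvaluator are of interest here.
public class GenericUDAFSimpleCountEvaluator extends GenericUDAFEvaluator {

  /** Per-group intermediate state managed by the framework. */
  static class CountBuffer implements AggregationBuffer {
    long count;
  }

  @Override
  public ObjectInspector init(Mode m, ObjectInspector[] parameters)
      throws HiveException {
    super.init(m, parameters);
    // Both the partial and the final result of a count are single longs.
    return PrimitiveObjectInspectorFactory.writableLongObjectInspector;
  }

  @Override
  public AggregationBuffer getNewAggregationBuffer() throws HiveException {
    return new CountBuffer();
  }

  @Override
  public void reset(AggregationBuffer agg) throws HiveException {
    ((CountBuffer) agg).count = 0;
  }

  @Override
  public void iterate(AggregationBuffer agg, Object[] parameters)
      throws HiveException {
    // Called once per original input row.
    ((CountBuffer) agg).count++;
  }

  @Override
  public Object terminatePartial(AggregationBuffer agg) throws HiveException {
    return terminate(agg);
  }

  @Override
  public void merge(AggregationBuffer agg, Object partial) throws HiveException {
    // A real evaluator would interpret 'partial' through the ObjectInspector
    // received in init(); a LongWritable is assumed here for brevity.
    if (partial != null) {
      ((CountBuffer) agg).count +=
          PrimitiveObjectInspectorFactory.writableLongObjectInspector.get(partial);
    }
  }

  @Override
  public Object terminate(AggregationBuffer agg) throws HiveException {
    return new LongWritable(((CountBuffer) agg).count);
  }
}
}}}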
+ 
+ A resolver is implemented by either implementing the interface 
org.apache.hadoop.hive.ql.udf.GenericUDAFResolver2 or extending the abstract 
class org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver. There 
is also an interface org.apache.hadoop.hive.ql.udf.GenericUDAFResolver that can 
be implemented, but it is deprecated as of the 0.6.0 release. The key difference 
between the GenericUDAFResolver and GenericUDAFResolver2 interfaces is that the 
latter allows the implementation to access extra information about the function 
invocation, such as the presence of the DISTINCT qualifier or invocation with 
the wildcard syntax FUNCTION(*). Implementations of the deprecated 
GenericUDAFResolver interface cannot tell the difference between an invocation 
such as FUNCTION() and FUNCTION(*), since the information about the wildcard is 
not available. Similarly, they cannot tell the difference between FUNCTION(EXPR) 
and FUNCTION(DISTINCT EXPR), since the information about the presence of the 
DISTINCT qualifier is likewise not available.
+ 
+ Note that while resolvers which implement the GenericUDAFResolver2 interface 
are given the extra information about the presence of the DISTINCT qualifier or 
invocation with the wildcard syntax, they can choose to ignore it completely if 
it is of no significance to them. The underlying data manipulation that ensures 
the DISTINCT nature of the expression values is done by the framework, not by 
the evaluator or resolver. UDAF implementations that do not care about this 
extra information can simply extend the AbstractGenericUDAFResolver class, 
which insulates them from it. This also offers an easy way to migrate previously 
written UDAF implementations to the new resolver interface without rewriting 
them, since the change from implementing the GenericUDAFResolver interface to 
extending the AbstractGenericUDAFResolver class is fairly minimal. There may be 
issues with implementations that are part of an inheritance hierarchy, since it 
may not be easy to change the base class.
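
For UDAFs that follow the simple route above, the resolver can be as small as the following sketch, which extends AbstractGenericUDAFResolver and hands back the hypothetical evaluator from the earlier sketch; argument validation is elided.

{{{
package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.hadoop.hive.ql.parse.SemanticException;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;

// Hypothetical resolver paired with the evaluator sketched above.
public class GenericUDAFSimpleCount extends AbstractGenericUDAFResolver {

  @Override
  public GenericUDAFEvaluator getEvaluator(TypeInfo[] parameters)
      throws SemanticException {
    // Argument count/type checks would normally go here, typically
    // throwing UDFArgumentTypeException for unsupported signatures.
    return new GenericUDAFSimpleCountEvaluator();
  }
}
}}}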
  
  == Modifying the function registry ==
  


svn commit: r963907 - in /hadoop/common/branches/branch-0.20: ./ src/hdfs/org/apache/hadoop/hdfs/server/namenode/ src/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/ src/test/org/apache/hadoop/hd

2010-07-13 Thread shv
Author: shv
Date: Tue Jul 13 23:49:58 2010
New Revision: 963907

URL: http://svn.apache.org/viewvc?rev=963907&view=rev
Log:
HDFS-132. Port to branch 0.20. Contributed by Konstantin Shvachko.

Modified:
hadoop/common/branches/branch-0.20/CHANGES.txt

hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java

hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java

hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java

hadoop/common/branches/branch-0.20/src/test/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java

Modified: hadoop/common/branches/branch-0.20/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/CHANGES.txt?rev=963907&r1=963906&r2=963907&view=diff
==
--- hadoop/common/branches/branch-0.20/CHANGES.txt (original)
+++ hadoop/common/branches/branch-0.20/CHANGES.txt Tue Jul 13 23:49:58 2010
@@ -39,6 +39,9 @@ Release 0.20.3 - Unreleased
 HDFS-1258. Clearing namespace quota on "/" corrupts fs image.  
 (Aaron T. Myers via szetszwo)
 
+HDFS-132. Fix namenode to not report files deleted metrics for deletions
+done while replaying edits during startup. (suresh & shv)
+
   IMPROVEMENTS
 
 MAPREDUCE-1407. Update javadoc in mapreduce.{Mapper,Reducer} to match

Modified: 
hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
URL: 
http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java?rev=963907&r1=963906&r2=963907&view=diff
==
--- 
hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
 (original)
+++ 
hadoop/common/branches/branch-0.20/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
 Tue Jul 13 23:49:58 2010
@@ -25,9 +25,6 @@ import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.ContentSummary;
 import org.apache.hadoop.fs.permission.*;
-import org.apache.hadoop.metrics.MetricsRecord;
-import org.apache.hadoop.metrics.MetricsUtil;
-import org.apache.hadoop.metrics.MetricsContext;
 import org.apache.hadoop.hdfs.protocol.FSConstants;
 import org.apache.hadoop.hdfs.protocol.Block;
 import org.apache.hadoop.hdfs.protocol.QuotaExceededException;
@@ -49,8 +46,6 @@ class FSDirectory implements FSConstants
   final INodeDirectoryWithQuota rootDir;
   FSImage fsImage;  
   private boolean ready = false;
-  // Metrics record
-  private MetricsRecord directoryMetrics = null;
 
   /** Access an existing dfs name directory. */
   FSDirectory(FSNamesystem ns, Configuration conf) {
@@ -65,13 +60,6 @@ class FSDirectory implements FSConstants
 Integer.MAX_VALUE, -1);
 this.fsImage = fsImage;
 namesystem = ns;
-initialize(conf);
-  }
-
-  private void initialize(Configuration conf) {
-MetricsContext metricsContext = MetricsUtil.getContext("dfs");
-directoryMetrics = MetricsUtil.createRecord(metricsContext, "FSDirectory");
-directoryMetrics.setTag("sessionId", conf.get("session.id"));
   }
 
   void loadFSImage(Collection dataDirs,
@@ -103,8 +91,8 @@ class FSDirectory implements FSConstants
   }
 
   private void incrDeletedFileCount(int count) {
-directoryMetrics.incrMetric("files_deleted", count);
-directoryMetrics.update();
+if (namesystem != null)
+  NameNode.getNameNodeMetrics().numFilesDeleted.inc(count);
   }
 
   /**
@@ -569,17 +557,19 @@ class FSDirectory implements FSConstants
   /**
* Remove the file from management, return blocks
*/
-  INode delete(String src) {
+  boolean delete(String src) {
 if (NameNode.stateChangeLog.isDebugEnabled()) {
   NameNode.stateChangeLog.debug("DIR* FSDirectory.delete: "+src);
 }
 waitForReady();
 long now = FSNamesystem.now();
-INode deletedNode = unprotectedDelete(src, now);
-if (deletedNode != null) {
-  fsImage.getEditLog().logDelete(src, now);
+int filesRemoved = unprotectedDelete(src, now);
+if (filesRemoved <= 0) {
+  return false;
 }
-return deletedNode;
+incrDeletedFileCount(filesRemoved);
+fsImage.getEditLog().logDelete(src, now);
+return true;
   }
   
   /** Return if a directory is empty or not **/
@@ -604,9 +594,9 @@ class FSDirectory implements FSConstants
* @param src a string representation of a path to an inode
* @param modificationTime the time the inode is removed
* @param deletedBlocks the place holder for the blocks to be removed
-   * @return if the deletion succeeds
+   * @return the number of inodes deleted; 0 if no inodes are deleted.
*/ 
-  INode unprotectedDelete(String src, long modificationTime) {
+  int 

[Hadoop Wiki] Update of "Hive/LanguageManual/Joins" by EdwardCapriolo

2010-07-13 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hive/LanguageManual/Joins" page has been changed by EdwardCapriolo.
http://wiki.apache.org/hadoop/Hive/LanguageManual/Joins?action=diff&rev1=19&rev2=20

--

  <>
  
  ## page was renamed from Hive/LanguageManual/LanguageManual/Joins
- == Join Syntax ==
+ == THIS PAGE WAS MOVED TO HIVE XDOCS! DO NOT EDIT! Join Syntax ==
  Hive supports the following syntax for joining tables:
  
  {{{
@@ -24, +24 @@

  join_condition:
  ON equality_expression ( AND equality_expression )*
  
- equality_expression: 
+ equality_expression:
  expression = expression
  }}}
+ Only equality joins, outer joins, and left semi joins are supported in Hive. 
Hive does not support join conditions that are not equality conditions as it is 
very difficult to express such conditions as a map/reduce job. Also, more than 
two tables can be joined in Hive.
- 
- Only equality joins, outer joins, and left semi joins are supported in Hive. 
Hive does not support join conditions that are not equality
- conditions as it is very difficult to express such conditions as a map/reduce 
job. Also, more than two tables can be
- joined in Hive.
  
  Some salient points to consider when writing join queries are as follows:
  
   * Only equality joins are allowed e.g.
+ 
- {{{ 
+ {{{
-   SELECT a.* FROM a JOIN b ON (a.id = b.id) 
+   SELECT a.* FROM a JOIN b ON (a.id = b.id)
  }}}
- {{{ 
+ {{{
-   SELECT a.* FROM a JOIN b ON (a.id = b.id AND a.department = b.department) 
+   SELECT a.* FROM a JOIN b ON (a.id = b.id AND a.department = b.department)
  }}}
-   are both valid joins, however
+  . are both valid joins, however
+ 
  {{{
SELECT a.* FROM a JOIN b ON (a.id <> b.id)
  }}}
-   is NOT allowed
+  . is NOT allowed
+ 
   * More than 2 tables can be joined in the same query e.g.
+ 
  {{{
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON 
(c.key = b.key2)
  }}}
-   is a valid join.
+  . is a valid join.
+ 
   * Hive converts joins over multiple tables into a single map/reduce job if 
for every table the same column is used in the join clauses e.g.
+ 
  {{{
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON 
(c.key = b.key1)
  }}}
-   is converted into a single map/reduce job as only key1 column for b is 
involved in the join. On the other hand
+  . is converted into a single map/reduce job as only key1 column for b is 
involved in the join. On the other hand
+ 
  {{{
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON 
(c.key = b.key2)
  }}}
-   is converted into two map/reduce jobs because key1 column from b is used in 
the first join condition and key2 column from b is used in the second one. The 
first map/reduce job joins a with b and the results are then joined with c in 
the second map/reduce job.
+  . is converted into two map/reduce jobs because key1 column from b is used 
in the first join condition and key2 column from b is used in the second one. 
The first map/reduce job joins a with b and the results are then joined with c 
in the second map/reduce job.
+ 
   * In every map/reduce stage of the join, the last table in the sequence is 
streamed through the reducers whereas the others are buffered. Therefore, it 
helps to reduce the memory needed in the reducer for buffering the rows for a 
particular value of the join key by organizing the tables such that the largest 
tables appear last in the sequence, e.g. in
+ 
  {{{
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON 
(c.key = b.key1)
  }}}
-   all the three tables are joined in a single map/reduce job and the values 
for a particular value of the key for tables a and b are buffered in the memory 
in the reducers. Then for each row retrieved from c, the join is computed with 
the buffered rows. Similarly for
+  . all three tables are joined in a single map/reduce job and the values 
for a particular value of the key for tables a and b are buffered in memory 
in the reducers. Then, for each row retrieved from c, the join is computed with 
the buffered rows. Similarly for
+ 
  {{{
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON 
(c.key = b.key2)
  }}}
-   there are two map/reduce jobs involved in computing the join. The first of 
these joins a with b and buffers the values of a while streaming the values of 
b in the reducers. The second of one of these jobs buffers the results of the 
first join while streaming the values of c through the reducers.
+  . there are two map/reduce jobs involved in computing the join. The first of 
these joins a with b and buffers the values of a while streaming the values of 
b in the reducers. The second job buffers the results of the first join while 
streaming the values of c through the reducers.
+ 
   * In every map/reduce stage of the joi