spark git commit: [SPARK-7886] Added unit test for HAVING aggregate pushdown.
Repository: spark Updated Branches: refs/heads/master 57c60c5be -> e90035e67

[SPARK-7886] Added unit test for HAVING aggregate pushdown. This is a followup to #6712.

Author: Reynold Xin r...@databricks.com

Closes #6739 from rxin/6712-followup and squashes the following commits:

fd9acfb [Reynold Xin] [SPARK-7886] Added unit test for HAVING aggregate pushdown.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e90035e6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e90035e6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e90035e6

Branch: refs/heads/master
Commit: e90035e676e492de840f44b61b330db526313019
Parents: 57c60c5
Author: Reynold Xin r...@databricks.com
Authored: Wed Jun 10 18:58:01 2015 +0800
Committer: Cheng Lian l...@databricks.com
Committed: Wed Jun 10 18:58:01 2015 +0800

--
 .../src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 7 +++
 .../src/main/scala/org/apache/spark/sql/hive/HiveQl.scala   | 1 -
 2 files changed, 7 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/e90035e6/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
--
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 5babc43..3ca5ff3 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -38,6 +38,13 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll with SQLTestUtils {
   import sqlContext.implicits._
   import sqlContext.sql

+  test("having clause") {
+    Seq(("one", 1), ("two", 2), ("three", 3), ("one", 5)).toDF("k", "v").registerTempTable("hav")
+    checkAnswer(
+      sql("SELECT k, sum(v) FROM hav GROUP BY k HAVING sum(v) > 2"),
+      Row("one", 6) :: Row("three", 3) :: Nil)
+  }
+
   test("SPARK-6743: no columns from cache") {
     Seq(
       (83, 0, 38),

http://git-wip-us.apache.org/repos/asf/spark/blob/e90035e6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
index 041483e..ca4b80b 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
@@ -1307,7 +1307,6 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
     HiveParser.DecimalLiteral)

   /* Case insensitive matches */
-  val COALESCE = "(?i)COALESCE".r
   val COUNT = "(?i)COUNT".r
   val SUM = "(?i)SUM".r
   val AND = "(?i)AND".r
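[Editor's note] For readers comparing against the DataFrame API, the same aggregate-then-filter shape exercised by the new test can be written directly. A minimal, illustrative sketch (not part of the commit; it assumes a spark-shell style session with sqlContext.implicits._ imported, as in the suite above):

    import org.apache.spark.sql.functions.{col, sum}

    val df = Seq(("one", 1), ("two", 2), ("three", 3), ("one", 5)).toDF("k", "v")
    // Group, aggregate, then filter on the aggregate -- the plan shape that
    // the HAVING clause in the new test is lowered to.
    df.groupBy("k").agg(sum("v").as("sum_v")).where(col("sum_v") > 2).show()
    // expected rows: ("one", 6) and ("three", 3)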
spark git commit: [SPARK-7886] Use FunctionRegistry for built-in expressions in HiveContext.
Repository: spark Updated Branches: refs/heads/master 778f3ca81 -> 57c60c5be

[SPARK-7886] Use FunctionRegistry for built-in expressions in HiveContext. This builds on #6710 and also uses FunctionRegistry for function lookup in HiveContext.

Author: Reynold Xin r...@databricks.com

Closes #6712 from rxin/udf-registry-hive and squashes the following commits:

f4c2df0 [Reynold Xin] Fixed style violation.
0bd4127 [Reynold Xin] Fixed Python UDFs.
f9a0378 [Reynold Xin] Disable one more test.
5609494 [Reynold Xin] Disable some failing tests.
4efea20 [Reynold Xin] Don't check children resolved for UDF resolution.
2ebe549 [Reynold Xin] Removed more hardcoded functions.
aadce78 [Reynold Xin] [SPARK-7886] Use FunctionRegistry for built-in expressions in HiveContext.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/57c60c5b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/57c60c5b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/57c60c5b

Branch: refs/heads/master
Commit: 57c60c5be7aa731ca1a6966f4285eb02f481eb71
Parents: 778f3ca
Author: Reynold Xin r...@databricks.com
Authored: Wed Jun 10 00:36:16 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 00:36:16 2015 -0700

--
 .../apache/spark/sql/catalyst/SqlParser.scala   |  2 +-
 .../spark/sql/catalyst/analysis/Analyzer.scala  |  9 +--
 .../catalyst/analysis/FunctionRegistry.scala    | 12 +++-
 .../sql/catalyst/expressions/Expression.scala   |  8 +--
 .../scala/org/apache/spark/sql/SQLContext.scala |  7 +--
 .../apache/spark/sql/execution/pythonUdfs.scala | 66 +++-
 .../hive/execution/HiveCompatibilitySuite.scala | 10 +--
 .../org/apache/spark/sql/hive/HiveContext.scala |  2 +-
 .../org/apache/spark/sql/hive/HiveQl.scala      | 30 -
 .../org/apache/spark/sql/hive/hiveUdfs.scala    | 51 ---
 10 files changed, 92 insertions(+), 105 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/57c60c5b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
index f74c17d..da3a717 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
@@ -68,7 +68,6 @@ class SqlParser extends AbstractSparkSQLParser with DataTypeParser {
   protected val FULL = Keyword("FULL")
   protected val GROUP = Keyword("GROUP")
   protected val HAVING = Keyword("HAVING")
-  protected val IF = Keyword("IF")
   protected val IN = Keyword("IN")
   protected val INNER = Keyword("INNER")
   protected val INSERT = Keyword("INSERT")
@@ -277,6 +276,7 @@ class SqlParser extends AbstractSparkSQLParser with DataTypeParser {
       lexical.normalizeKeyword(udfName) match {
         case "sum" => SumDistinct(exprs.head)
         case "count" => CountDistinct(exprs)
+        case _ => throw new AnalysisException(s"function $udfName does not support DISTINCT")
       }
     }
   | APPROXIMATE ~> ident ~ ("(" ~ DISTINCT ~> expression <~ ")") ^^ { case udfName ~ exp =>

http://git-wip-us.apache.org/repos/asf/spark/blob/57c60c5b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 02b10c4..c4f12cf 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -460,7 +460,7 @@ class Analyzer(
     def apply(plan: LogicalPlan): LogicalPlan = plan transform {
       case q: LogicalPlan =>
         q transformExpressions {
-          case u @ UnresolvedFunction(name, children) if u.childrenResolved =>
+          case u @ UnresolvedFunction(name, children) =>
             withPosition(u) {
               registry.lookupFunction(name, children)
             }
@@ -494,20 +494,21 @@ class Analyzer(
   object UnresolvedHavingClauseAttributes extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
       case filter @ Filter(havingCondition, aggregate @ Aggregate(_, originalAggExprs, _))
-          if aggregate.resolved && containsAggregate(havingCondition) => {
+          if aggregate.resolved && containsAggregate(havingCondition) =>
+
         val evaluatedCondition = Alias(havingCondition, "havingCondition")()
         val aggExprsWithHaving = evaluatedCondition +:
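[Editor's note] To make the registry-based lookup this commit moves to more concrete, here is a minimal, self-contained sketch of the pattern (the types and names below are simplified stand-ins, not Spark's actual FunctionRegistry API):

    // Hypothetical, simplified sketch of expression lookup through a function
    // registry, as opposed to hard-coded matches in the parser or HiveQl.
    trait Expression
    case class Sum(child: Expression) extends Expression

    object SimpleRegistry {
      private val builders =
        scala.collection.mutable.Map.empty[String, Seq[Expression] => Expression]

      def register(name: String)(builder: Seq[Expression] => Expression): Unit =
        builders(name.toLowerCase) = builder

      def lookupFunction(name: String, children: Seq[Expression]): Expression =
        builders.getOrElse(name.toLowerCase,
          sys.error(s"undefined function $name"))(children)
    }

    // Registration resembles FunctionRegistry's expression[Sum]("sum") entries:
    SimpleRegistry.register("sum")(children => Sum(children.head))

The design choice is the point of the commit: new built-in functions are added in one registry table rather than in parser, analyzer, and HiveQl case statements separately.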
spark git commit: [SPARK-8215] [SPARK-8212] [SQL] add leaf math expression for e and pi
Repository: spark Updated Branches: refs/heads/master e90035e67 -> c6ba7cca3

[SPARK-8215] [SPARK-8212] [SQL] add leaf math expression for e and pi

Author: Daoyuan Wang daoyuan.w...@intel.com

Closes #6716 from adrian-wang/epi and squashes the following commits:

e2e8dbd [Daoyuan Wang] move tests
11b351c [Daoyuan Wang] add tests and remove pu
db331c9 [Daoyuan Wang] py style
599ddd8 [Daoyuan Wang] add py
e6783ef [Daoyuan Wang] register function
82d426e [Daoyuan Wang] add function entry
dbf3ab5 [Daoyuan Wang] add PI and E

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c6ba7cca
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c6ba7cca
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c6ba7cca

Branch: refs/heads/master
Commit: c6ba7cca3338e3f4f719d86dbcff4406d949edc7
Parents: e90035e
Author: Daoyuan Wang daoyuan.w...@intel.com
Authored: Wed Jun 10 09:45:45 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 09:45:45 2015 -0700

--
 .../catalyst/analysis/FunctionRegistry.scala    |  2 ++
 .../spark/sql/catalyst/expressions/math.scala   | 35 
 .../expressions/MathFunctionsSuite.scala        | 22 
 .../scala/org/apache/spark/sql/functions.scala  | 18 ++
 .../spark/sql/DataFrameFunctionsSuite.scala     | 19 +++
 5 files changed, 96 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/c6ba7cca/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 936ffc7..ba89a5c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -106,6 +106,7 @@ object FunctionRegistry {
     expression[Cbrt]("cbrt"),
     expression[Ceil]("ceil"),
     expression[Cos]("cos"),
+    expression[EulerNumber]("e"),
     expression[Exp]("exp"),
     expression[Expm1]("expm1"),
     expression[Floor]("floor"),
@@ -113,6 +114,7 @@
     expression[Log]("log"),
     expression[Log10]("log10"),
     expression[Log1p]("log1p"),
+    expression[Pi]("pi"),
     expression[Pow]("pow"),
     expression[Rint]("rint"),
     expression[Signum]("signum"),

http://git-wip-us.apache.org/repos/asf/spark/blob/c6ba7cca/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
index 7dacb6a..e1d8c9a 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
@@ -21,8 +21,33 @@ import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.types.{DataType, DoubleType}

 /**
+ * A leaf expression specifically for math constants. Math constants expect no input.
+ * @param c The math constant.
+ * @param name The short name of the function
+ */
+abstract class LeafMathExpression(c: Double, name: String)
+  extends LeafExpression with Serializable {
+  self: Product =>
+
+  override def dataType: DataType = DoubleType
+  override def foldable: Boolean = true
+  override def nullable: Boolean = false
+  override def toString: String = s"$name()"
+
+  override def eval(input: Row): Any = c
+
+  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
+    s"""
+      boolean ${ev.isNull} = false;
+      ${ctx.javaType(dataType)} ${ev.primitive} = java.lang.Math.$name;
+    """
+  }
+}
+
+/**
  * A unary expression specifically for math functions. Math Functions expect a specific type of
  * input format, therefore these functions extend `ExpectsInputTypes`.
+ * @param f The math function.
  * @param name The short name of the function
  */
 abstract class UnaryMathExpression(f: Double => Double, name: String)
@@ -100,6 +125,16 @@ abstract class BinaryMathExpression(f: (Double, Double) => Double, name: String)

+// Leaf math functions
+
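[Editor's note] Once registered, the new constants are callable like any other built-in. An illustrative spark-shell usage (not from the commit; it assumes a sqlContext value as in the SQL test suites):

    sqlContext.sql("SELECT pi(), e()").show()
    // Both columns are DoubleType. Because foldable is true, the optimizer can
    // constant-fold these leaf expressions at planning time rather than
    // evaluating java.lang.Math.PI / java.lang.Math.E per row.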
spark git commit: [SPARK-8282] [SPARKR] Make number of threads used in RBackend configurable
Repository: spark Updated Branches: refs/heads/branch-1.4 7b88e6a1e -> 28e8a6ea6

[SPARK-8282] [SPARKR] Make number of threads used in RBackend configurable

Read number of threads for RBackend from configuration.

[SPARK-8282] #comment Linking with JIRA

Author: Hossein hoss...@databricks.com

Closes #6730 from falaki/SPARK-8282 and squashes the following commits:

33b3d98 [Hossein] Documented new config parameter
70f2a9c [Hossein] Fixing import
ec44225 [Hossein] Read number of threads for RBackend from configuration

(cherry picked from commit 30ebf1a233295539c2455bd838bae7315711e1e2)
Signed-off-by: Andrew Or and...@databricks.com

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/28e8a6ea
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/28e8a6ea
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/28e8a6ea

Branch: refs/heads/branch-1.4
Commit: 28e8a6ea65fd08ab9cefc4d179d5c66ffefd3eb4
Parents: 7b88e6a
Author: Hossein hoss...@databricks.com
Authored: Wed Jun 10 13:18:48 2015 -0700
Committer: Andrew Or and...@databricks.com
Committed: Wed Jun 10 13:19:53 2015 -0700

--
 .../main/scala/org/apache/spark/api/r/RBackend.scala |  5 +++--
 docs/configuration.md                                | 12 
 2 files changed, 15 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/28e8a6ea/core/src/main/scala/org/apache/spark/api/r/RBackend.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/r/RBackend.scala b/core/src/main/scala/org/apache/spark/api/r/RBackend.scala
index d24c650..1a5f2bc 100644
--- a/core/src/main/scala/org/apache/spark/api/r/RBackend.scala
+++ b/core/src/main/scala/org/apache/spark/api/r/RBackend.scala
@@ -29,7 +29,7 @@ import io.netty.channel.socket.nio.NioServerSocketChannel
 import io.netty.handler.codec.LengthFieldBasedFrameDecoder
 import io.netty.handler.codec.bytes.{ByteArrayDecoder, ByteArrayEncoder}

-import org.apache.spark.Logging
+import org.apache.spark.{Logging, SparkConf}

 /**
  * Netty-based backend server that is used to communicate between R and Java.
@@ -41,7 +41,8 @@ private[spark] class RBackend {
   private[this] var bossGroup: EventLoopGroup = null

   def init(): Int = {
-    bossGroup = new NioEventLoopGroup(2)
+    val conf = new SparkConf()
+    bossGroup = new NioEventLoopGroup(conf.getInt("spark.r.numRBackendThreads", 2))
     val workerGroup = bossGroup
     val handler = new RBackendHandler(this)

http://git-wip-us.apache.org/repos/asf/spark/blob/28e8a6ea/docs/configuration.md
--
diff --git a/docs/configuration.md b/docs/configuration.md
index 3960e7e..95a322f 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1495,6 +1495,18 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 </table>

+#### SparkR
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td><code>spark.r.numRBackendThreads</code></td>
+  <td>2</td>
+  <td>
+    Number of threads used by RBackend to handle RPC calls from SparkR package.
+  </td>
+</tr>
+</table>
+
 #### Cluster Managers

 Each cluster manager in Spark has additional configuration options. Configurations can be found on the pages for each mode:
spark git commit: [SPARK-8282] [SPARKR] Make number of threads used in RBackend configurable
Repository: spark Updated Branches: refs/heads/master 38112905b -> 30ebf1a23

[SPARK-8282] [SPARKR] Make number of threads used in RBackend configurable

Read number of threads for RBackend from configuration.

[SPARK-8282] #comment Linking with JIRA

Author: Hossein hoss...@databricks.com

Closes #6730 from falaki/SPARK-8282 and squashes the following commits:

33b3d98 [Hossein] Documented new config parameter
70f2a9c [Hossein] Fixing import
ec44225 [Hossein] Read number of threads for RBackend from configuration

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/30ebf1a2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/30ebf1a2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/30ebf1a2

Branch: refs/heads/master
Commit: 30ebf1a233295539c2455bd838bae7315711e1e2
Parents: 3811290
Author: Hossein hoss...@databricks.com
Authored: Wed Jun 10 13:18:48 2015 -0700
Committer: Andrew Or and...@databricks.com
Committed: Wed Jun 10 13:19:44 2015 -0700

--
 .../main/scala/org/apache/spark/api/r/RBackend.scala |  5 +++--
 docs/configuration.md                                | 12 
 2 files changed, 15 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/30ebf1a2/core/src/main/scala/org/apache/spark/api/r/RBackend.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/r/RBackend.scala b/core/src/main/scala/org/apache/spark/api/r/RBackend.scala
index d24c650..1a5f2bc 100644
--- a/core/src/main/scala/org/apache/spark/api/r/RBackend.scala
+++ b/core/src/main/scala/org/apache/spark/api/r/RBackend.scala
@@ -29,7 +29,7 @@ import io.netty.channel.socket.nio.NioServerSocketChannel
 import io.netty.handler.codec.LengthFieldBasedFrameDecoder
 import io.netty.handler.codec.bytes.{ByteArrayDecoder, ByteArrayEncoder}

-import org.apache.spark.Logging
+import org.apache.spark.{Logging, SparkConf}

 /**
  * Netty-based backend server that is used to communicate between R and Java.
@@ -41,7 +41,8 @@ private[spark] class RBackend {
   private[this] var bossGroup: EventLoopGroup = null

   def init(): Int = {
-    bossGroup = new NioEventLoopGroup(2)
+    val conf = new SparkConf()
+    bossGroup = new NioEventLoopGroup(conf.getInt("spark.r.numRBackendThreads", 2))
     val workerGroup = bossGroup
     val handler = new RBackendHandler(this)

http://git-wip-us.apache.org/repos/asf/spark/blob/30ebf1a2/docs/configuration.md
--
diff --git a/docs/configuration.md b/docs/configuration.md
index 3960e7e..95a322f 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1495,6 +1495,18 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 </table>

+#### SparkR
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td><code>spark.r.numRBackendThreads</code></td>
+  <td>2</td>
+  <td>
+    Number of threads used by RBackend to handle RPC calls from SparkR package.
+  </td>
+</tr>
+</table>
+
 #### Cluster Managers

 Each cluster manager in Spark has additional configuration options. Configurations can be found on the pages for each mode:
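[Editor's note] Since the thread count is now read through SparkConf, it can be set like any other Spark property. An illustrative sketch (not from the commit):

    import org.apache.spark.SparkConf

    // Raise the RBackend pool from the default of 2 to 8 threads.
    val conf = new SparkConf().set("spark.r.numRBackendThreads", "8")
    // Equivalently at submit time: --conf spark.r.numRBackendThreads=8,
    // or a line in conf/spark-defaults.conf.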
spark git commit: [SPARK-7756] CORE RDDOperationScope fix for IBM Java
Repository: spark Updated Branches: refs/heads/master 30ebf1a23 -> 19e30b48f

[SPARK-7756] CORE RDDOperationScope fix for IBM Java

IBM Java has an extra method when we do getStackTrace(): this is getStackTraceImpl, a native method. This causes two tests to fail within DStreamScopeSuite when running with IBM Java. Instead of map or filter being the method names found, getStackTrace is returned.

This commit addresses such an issue by using dropWhile. Given that our current method is withScope, we look for the next method that isn't ours: we don't care about methods that come before us in the stack trace: e.g. getStackTrace (regardless of how many levels this might go).

IBM:
java.lang.Thread.getStackTraceImpl(Native Method)
java.lang.Thread.getStackTrace(Thread.java:1117)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:104)

Oracle:
PRINTING STACKTRACE!!!
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:106)

I've tested this with Oracle and IBM Java, no side effects for other tests introduced.

Author: Adam Roberts arobe...@uk.ibm.com
Author: a-roberts arobe...@uk.ibm.com

Closes #6740 from a-roberts/RDDScopeStackCrawlFix and squashes the following commits:

13ce390 [Adam Roberts] Ensure consistency with String equality checking
a4fc0e0 [a-roberts] Update RDDOperationScope.scala

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/19e30b48
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/19e30b48
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/19e30b48

Branch: refs/heads/master
Commit: 19e30b48f3c6d0b72871d3e15b9564c1b2822700
Parents: 30ebf1a
Author: Adam Roberts arobe...@uk.ibm.com
Authored: Wed Jun 10 13:21:01 2015 -0700
Committer: Andrew Or and...@databricks.com
Committed: Wed Jun 10 13:21:51 2015 -0700

--
 .../main/scala/org/apache/spark/rdd/RDDOperationScope.scala | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/19e30b48/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala b/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala
index 6b09dfa..4466728 100644
--- a/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala
@@ -95,10 +95,9 @@ private[spark] object RDDOperationScope extends Logging {
   private[spark] def withScope[T](
       sc: SparkContext,
       allowNesting: Boolean = false)(body: => T): T = {
-    val stackTrace = Thread.currentThread.getStackTrace().tail // ignore Thread#getStackTrace
-    val ourMethodName = stackTrace(1).getMethodName // i.e. withScope
-    // Climb upwards to find the first method that's called something different
-    val callerMethodName = stackTrace
+    val ourMethodName = "withScope"
+    val callerMethodName = Thread.currentThread.getStackTrace()
+      .dropWhile(_.getMethodName != ourMethodName)
       .find(_.getMethodName != ourMethodName)
       .map(_.getMethodName)
       .getOrElse {
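[Editor's note] The fix is easiest to see in isolation. A self-contained sketch of the new lookup, suitable for pasting into a REPL (illustrative, with a placeholder fallback instead of the real error handling):

    // Skip every frame above our own method, then take the first frame that is
    // NOT ours. This is robust to extra frames such as IBM Java's native
    // getStackTraceImpl appearing before withScope.
    val ourMethodName = "withScope"
    val callerMethodName = Thread.currentThread.getStackTrace()
      .dropWhile(_.getMethodName != ourMethodName) // drop getStackTrace[Impl], etc.
      .find(_.getMethodName != ourMethodName)      // first frame past withScope
      .map(_.getMethodName)
      .getOrElse("<unknown>")                      // placeholder fallback

The old code assumed withScope sat at a fixed index in the stack trace; dropWhile/find makes the position irrelevant.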
spark git commit: [SPARK-7756] CORE RDDOperationScope fix for IBM Java
Repository: spark Updated Branches: refs/heads/branch-1.4 28e8a6ea6 -> 568d1d51d

[SPARK-7756] CORE RDDOperationScope fix for IBM Java

IBM Java has an extra method when we do getStackTrace(): this is getStackTraceImpl, a native method. This causes two tests to fail within DStreamScopeSuite when running with IBM Java. Instead of map or filter being the method names found, getStackTrace is returned.

This commit addresses such an issue by using dropWhile. Given that our current method is withScope, we look for the next method that isn't ours: we don't care about methods that come before us in the stack trace: e.g. getStackTrace (regardless of how many levels this might go).

IBM:
java.lang.Thread.getStackTraceImpl(Native Method)
java.lang.Thread.getStackTrace(Thread.java:1117)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:104)

Oracle:
PRINTING STACKTRACE!!!
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:106)

I've tested this with Oracle and IBM Java, no side effects for other tests introduced.

Author: Adam Roberts arobe...@uk.ibm.com
Author: a-roberts arobe...@uk.ibm.com

Closes #6740 from a-roberts/RDDScopeStackCrawlFix and squashes the following commits:

13ce390 [Adam Roberts] Ensure consistency with String equality checking
a4fc0e0 [a-roberts] Update RDDOperationScope.scala

(cherry picked from commit 19e30b48f3c6d0b72871d3e15b9564c1b2822700)
Signed-off-by: Andrew Or and...@databricks.com

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/568d1d51
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/568d1d51
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/568d1d51

Branch: refs/heads/branch-1.4
Commit: 568d1d51d695bea4389f4470cd98707f3049885a
Parents: 28e8a6e
Author: Adam Roberts arobe...@uk.ibm.com
Authored: Wed Jun 10 13:21:01 2015 -0700
Committer: Andrew Or and...@databricks.com
Committed: Wed Jun 10 13:21:59 2015 -0700

--
 .../main/scala/org/apache/spark/rdd/RDDOperationScope.scala | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/568d1d51/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala
--
diff --git a/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala b/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala
index 6b09dfa..4466728 100644
--- a/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/RDDOperationScope.scala
@@ -95,10 +95,9 @@ private[spark] object RDDOperationScope extends Logging {
   private[spark] def withScope[T](
       sc: SparkContext,
       allowNesting: Boolean = false)(body: => T): T = {
-    val stackTrace = Thread.currentThread.getStackTrace().tail // ignore Thread#getStackTrace
-    val ourMethodName = stackTrace(1).getMethodName // i.e. withScope
-    // Climb upwards to find the first method that's called something different
-    val callerMethodName = stackTrace
+    val ourMethodName = "withScope"
+    val callerMethodName = Thread.currentThread.getStackTrace()
+      .dropWhile(_.getMethodName != ourMethodName)
       .find(_.getMethodName != ourMethodName)
       .map(_.getMethodName)
       .getOrElse {
spark git commit: [SPARK-7261] [CORE] Change default log level to WARN in the REPL
Repository: spark Updated Branches: refs/heads/master e90c9d92d -> 80043e9e7

[SPARK-7261] [CORE] Change default log level to WARN in the REPL

1. Add `log4j-defaults-repl.properties` that has log level WARN.
2. When logging is initialized, check whether inside the REPL. If so, use `log4j-defaults-repl.properties`.
3. Print the following information if using `log4j-defaults-repl.properties`:
```
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
```

Author: zsxwing zsxw...@gmail.com

Closes #6734 from zsxwing/log4j-repl and squashes the following commits:

3835eff [zsxwing] Change default log level to WARN in the REPL

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/80043e9e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/80043e9e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/80043e9e

Branch: refs/heads/master
Commit: 80043e9e761c44ce2c3a432dcd1989be573f8bb4
Parents: e90c9d9
Author: zsxwing zsxw...@gmail.com
Authored: Wed Jun 10 13:25:59 2015 -0700
Committer: Andrew Or and...@databricks.com
Committed: Wed Jun 10 13:26:33 2015 -0700

--
 .rat-excludes                                    |  1 +
 .../apache/spark/log4j-defaults-repl.properties  | 12 +
 .../main/scala/org/apache/spark/Logging.scala    | 26 ++--
 3 files changed, 32 insertions(+), 7 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/80043e9e/.rat-excludes
--
diff --git a/.rat-excludes b/.rat-excludes
index 994c7e8..aa008e6 100644
--- a/.rat-excludes
+++ b/.rat-excludes
@@ -28,6 +28,7 @@ spark-env.sh
 spark-env.cmd
 spark-env.sh.template
 log4j-defaults.properties
+log4j-defaults-repl.properties
 bootstrap-tooltip.js
 jquery-1.11.1.min.js
 d3.min.js

http://git-wip-us.apache.org/repos/asf/spark/blob/80043e9e/core/src/main/resources/org/apache/spark/log4j-defaults-repl.properties
--
diff --git a/core/src/main/resources/org/apache/spark/log4j-defaults-repl.properties b/core/src/main/resources/org/apache/spark/log4j-defaults-repl.properties
new file mode 100644
index 000..b146f8a
--- /dev/null
+++ b/core/src/main/resources/org/apache/spark/log4j-defaults-repl.properties
@@ -0,0 +1,12 @@
+# Set everything to be logged to the console
+log4j.rootCategory=WARN, console
+log4j.appender.console=org.apache.log4j.ConsoleAppender
+log4j.appender.console.target=System.err
+log4j.appender.console.layout=org.apache.log4j.PatternLayout
+log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
+
+# Settings to quiet third party logs that are too verbose
+log4j.logger.org.spark-project.jetty=WARN
+log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
+log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
+log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

http://git-wip-us.apache.org/repos/asf/spark/blob/80043e9e/core/src/main/scala/org/apache/spark/Logging.scala
--
diff --git a/core/src/main/scala/org/apache/spark/Logging.scala b/core/src/main/scala/org/apache/spark/Logging.scala
index 419d093..7fcb783 100644
--- a/core/src/main/scala/org/apache/spark/Logging.scala
+++ b/core/src/main/scala/org/apache/spark/Logging.scala
@@ -121,13 +121,25 @@ trait Logging {
     if (usingLog4j12) {
       val log4j12Initialized = LogManager.getRootLogger.getAllAppenders.hasMoreElements
       if (!log4j12Initialized) {
-        val defaultLogProps = "org/apache/spark/log4j-defaults.properties"
-        Option(Utils.getSparkClassLoader.getResource(defaultLogProps)) match {
-          case Some(url) =>
-            PropertyConfigurator.configure(url)
-            System.err.println(s"Using Spark's default log4j profile: $defaultLogProps")
-          case None =>
-            System.err.println(s"Spark was unable to load $defaultLogProps")
+        if (Utils.isInInterpreter) {
+          val replDefaultLogProps = "org/apache/spark/log4j-defaults-repl.properties"
+          Option(Utils.getSparkClassLoader.getResource(replDefaultLogProps)) match {
+            case Some(url) =>
+              PropertyConfigurator.configure(url)
+              System.err.println(s"Using Spark's repl log4j profile: $replDefaultLogProps")
+              System.err.println("To adjust logging level use sc.setLogLevel(\"INFO\")")
+            case None =>
+              System.err.println(s"Spark was unable to load $replDefaultLogProps")
+          }
+        } else {
+          val defaultLogProps = "org/apache/spark/log4j-defaults.properties"
+
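[Editor's note] The practical effect for users: INFO logs disappear by default in the shell, and the printed hint shows how to get them back. An illustrative spark-shell session (sc is the shell's SparkContext):

    sc.setLogLevel("INFO")            // restore verbose logging for this session
    sc.parallelize(1 to 10).count()   // stage/task INFO messages now appear again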
spark git commit: [SPARK-8273] Driver hangs up when yarn shutdown in client mode
Repository: spark Updated Branches: refs/heads/master cb871c44c -> 5014d0ed7

[SPARK-8273] Driver hangs up when yarn shutdown in client mode

In client mode, if YARN was shut down with a Spark application running, the application will hang after several retries (default: 30) because the exception thrown by YarnClientImpl could not be caught at the upper level; we should exit instead, so that the user is made aware of the failure. The exception we want to catch is [here](https://github.com/apache/hadoop/blob/branch-2.7.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java#L122), and the fix follows the approach in [MR](https://github.com/apache/hadoop/blob/branch-2.7.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java#L320).

Author: WangTaoTheTonic wangtao...@huawei.com

Closes #6717 from WangTaoTheTonic/SPARK-8273 and squashes the following commits:

28752d6 [WangTaoTheTonic] catch the thrown exception

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5014d0ed
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5014d0ed
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5014d0ed

Branch: refs/heads/master
Commit: 5014d0ed7e2f69810654003f8dd38078b945cf05
Parents: cb871c4
Author: WangTaoTheTonic wangtao...@huawei.com
Authored: Wed Jun 10 13:34:19 2015 -0700
Committer: Andrew Or and...@databricks.com
Committed: Wed Jun 10 13:35:51 2015 -0700

--
 yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 4 
 1 file changed, 4 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/5014d0ed/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
--
diff --git a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index ec9402a..da1ec2a 100644
--- a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -29,6 +29,7 @@ import scala.collection.JavaConversions._
 import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, ListBuffer, Map}
 import scala.reflect.runtime.universe
 import scala.util.{Try, Success, Failure}
+import scala.util.control.NonFatal

 import com.google.common.base.Charsets.UTF_8
 import com.google.common.base.Objects
@@ -826,6 +827,9 @@ private[spark] class Client(
       case e: ApplicationNotFoundException =>
         logError(s"Application $appId not found.")
         return (YarnApplicationState.KILLED, FinalApplicationStatus.KILLED)
+      case NonFatal(e) =>
+        logError(s"Failed to contact YARN for application $appId.", e)
+        return (YarnApplicationState.FAILED, FinalApplicationStatus.FAILED)
     }
     val state = report.getYarnApplicationState
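[Editor's note] For readers unfamiliar with the construct, a minimal standalone sketch of the NonFatal pattern the fix relies on (names are illustrative): it matches any exception except fatal ones such as OutOfMemoryError or InterruptedException, so YARN connection failures are caught while truly fatal errors still propagate.

    import scala.util.control.NonFatal

    def reportStatus(poll: () => String): String =
      try poll() catch {
        case NonFatal(e) =>
          Console.err.println(s"Failed to contact YARN: $e")
          "FAILED"   // degrade to a terminal state instead of retrying forever
      }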
spark git commit: [SPARK-8290] spark class command builder need read SPARK_JAVA_OPTS and SPARK_DRIVER_MEMORY properly
Repository: spark Updated Branches: refs/heads/master 80043e9e7 -> cb871c44c

[SPARK-8290] spark class command builder need read SPARK_JAVA_OPTS and SPARK_DRIVER_MEMORY properly

SPARK_JAVA_OPTS was missed in reconstructing the launcher part; we should add it back so processes launched by spark-class can read it properly. And so does SPARK_DRIVER_MEMORY. The missing part is [here](https://github.com/apache/spark/blob/1c30afdf94b27e1ad65df0735575306e65d148a1/bin/spark-class#L97).

Author: WangTaoTheTonic wangtao...@huawei.com
Author: Tao Wang wangtao...@huawei.com

Closes #6741 from WangTaoTheTonic/SPARK-8290 and squashes the following commits:

bd89f0f [Tao Wang] make sure the memory setting is right too
e313520 [WangTaoTheTonic] spark class command builder need read SPARK_JAVA_OPTS

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb871c44
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb871c44
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb871c44

Branch: refs/heads/master
Commit: cb871c44c38a4c1575ed076389f14641afafad7d
Parents: 80043e9
Author: WangTaoTheTonic wangtao...@huawei.com
Authored: Wed Jun 10 13:30:16 2015 -0700
Committer: Andrew Or and...@databricks.com
Committed: Wed Jun 10 13:30:16 2015 -0700

--
 .../java/org/apache/spark/launcher/SparkClassCommandBuilder.java | 3 +++
 1 file changed, 3 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/cb871c44/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java
--
diff --git a/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java b/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java
index d80abf2..de85720 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/SparkClassCommandBuilder.java
@@ -93,6 +93,9 @@ class SparkClassCommandBuilder extends AbstractCommandBuilder {
         toolsDir.getAbsolutePath(), className);

       javaOptsKeys.add("SPARK_JAVA_OPTS");
+    } else {
+      javaOptsKeys.add("SPARK_JAVA_OPTS");
+      memKey = "SPARK_DRIVER_MEMORY";
     }

     List<String> cmd = buildJavaCommand(extraClassPath);
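[Editor's note] A hypothetical sketch of the behavior being restored, in Scala for consistency with the rest of this digest (the environment variable names are the real ones; the surrounding code is simplified and is not the launcher's actual Java):

    // Read extra JVM options and driver memory from the environment, as the
    // old bin/spark-class shell script did before the launcher rewrite.
    val javaOpts: Seq[String] =
      sys.env.get("SPARK_JAVA_OPTS").toSeq.flatMap(_.split("\\s+"))
    val driverMemory: String =
      sys.env.getOrElse("SPARK_DRIVER_MEMORY", "1g") // fallback if unset
    println(s"extra opts: $javaOpts, heap: -Xmx$driverMemory")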
spark git commit: [SPARK-8273] Driver hangs up when yarn shutdown in client mode
Repository: spark Updated Branches: refs/heads/branch-1.4 568d1d51d -> 2846a357f

[SPARK-8273] Driver hangs up when yarn shutdown in client mode

In client mode, if YARN was shut down with a Spark application running, the application will hang after several retries (default: 30) because the exception thrown by YarnClientImpl could not be caught at the upper level; we should exit instead, so that the user is made aware of the failure. The exception we want to catch is [here](https://github.com/apache/hadoop/blob/branch-2.7.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java#L122), and the fix follows the approach in [MR](https://github.com/apache/hadoop/blob/branch-2.7.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java#L320).

Author: WangTaoTheTonic wangtao...@huawei.com

Closes #6717 from WangTaoTheTonic/SPARK-8273 and squashes the following commits:

28752d6 [WangTaoTheTonic] catch the thrown exception

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2846a357
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2846a357
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2846a357

Branch: refs/heads/branch-1.4
Commit: 2846a357f32bfa129bc37f4d1cbe9e19caaf69c9
Parents: 568d1d5
Author: WangTaoTheTonic wangtao...@huawei.com
Authored: Wed Jun 10 13:34:19 2015 -0700
Committer: Andrew Or and...@databricks.com
Committed: Wed Jun 10 13:36:16 2015 -0700

--
 yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 4 
 1 file changed, 4 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/2846a357/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
--
diff --git a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index f4d4321..9296e79 100644
--- a/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -28,6 +28,7 @@ import scala.collection.JavaConversions._
 import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, ListBuffer, Map}
 import scala.reflect.runtime.universe
 import scala.util.{Try, Success, Failure}
+import scala.util.control.NonFatal

 import com.google.common.base.Objects
 import com.google.common.io.Files
@@ -771,6 +772,9 @@ private[spark] class Client(
       case e: ApplicationNotFoundException =>
         logError(s"Application $appId not found.")
         return (YarnApplicationState.KILLED, FinalApplicationStatus.KILLED)
+      case NonFatal(e) =>
+        logError(s"Failed to contact YARN for application $appId.", e)
+        return (YarnApplicationState.FAILED, FinalApplicationStatus.FAILED)
     }
     val state = report.getYarnApplicationState
spark git commit: [SPARK-5479] [YARN] Handle --py-files correctly in YARN.
Repository: spark Updated Branches: refs/heads/master 8f7308f9c -> 38112905b

[SPARK-5479] [YARN] Handle --py-files correctly in YARN.

The bug description is a little misleading: the actual issue is that .py files are not handled correctly when distributed by YARN. They're added to spark.submit.pyFiles, which, when processed by context.py, explicitly whitelists certain extensions (see PACKAGE_EXTENSIONS), and that does not include .py files. On top of that, archives were not handled at all! They made it to the driver's python path, but never made it to executors, since the mechanism used to propagate their location (spark.submit.pyFiles) only works on the driver side.

So, instead, ignore spark.submit.pyFiles and just build PYTHONPATH correctly for both driver and executors. Individual .py files are placed in a subdirectory of the container's local dir in the cluster, which is then added to the python path. Archives are added directly. The change, as a side effect, ends up solving the symptom described in the bug. The issue was not that the files were not being distributed, but that they were never made visible to the python application running under Spark.

Also included is a proper unit test for running python on YARN, which broke in several different ways with the previous code.

A short walkthrough of the changes (a hedged sketch of the resulting PYTHONPATH appears after this commit's diff summary below):

- SparkSubmit does not try to be smart about how YARN handles python files anymore. It just passes down the configs to the YARN client code.
- The YARN client distributes python files and archives differently, placing the files in a subdirectory.
- The YARN client now sets PYTHONPATH for the processes it launches; to properly handle different locations, it uses YARN's support for embedding env variables, so to avoid YARN expanding those at the wrong time, SparkConf is now propagated to the AM using a conf file instead of command line options.
- Because the Client initialization code is a maze of implicit dependencies, some code needed to be moved around to make sure all needed state was available when the code ran.
- The pyspark tests in YarnClusterSuite now actually distribute and try to use both a python file and an archive containing a different python module. Also added a yarn-client test for completeness.
- I cleaned up some of the code around distributing files to YARN, to avoid adding more copy-pasted code to handle the new files being distributed.

Author: Marcelo Vanzin van...@cloudera.com

Closes #6360 from vanzin/SPARK-5479 and squashes the following commits:

bcaf7e6 [Marcelo Vanzin] Feedback.
c47501f [Marcelo Vanzin] Fix yarn-client mode.
46b1d0c [Marcelo Vanzin] Merge branch 'master' into SPARK-5479
c743778 [Marcelo Vanzin] Only pyspark cares about python archives.
c8e5a82 [Marcelo Vanzin] Actually run pyspark in client mode.
705571d [Marcelo Vanzin] Move some code to the YARN module.
1dd4d0c [Marcelo Vanzin] Review feedback.
71ee736 [Marcelo Vanzin] Merge branch 'master' into SPARK-5479
220358b [Marcelo Vanzin] Scalastyle.
cdbb990 [Marcelo Vanzin] Merge branch 'master' into SPARK-5479
7fe3cd4 [Marcelo Vanzin] No need to distribute primary file to executors.
09045f1 [Marcelo Vanzin] Style.
943cbf4 [Marcelo Vanzin] [SPARK-5479] [yarn] Handle --py-files correctly in YARN.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/38112905
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/38112905
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/38112905

Branch: refs/heads/master
Commit: 38112905bc3b33f2ae75274afba1c30e116f6e46
Parents: 8f7308f
Author: Marcelo Vanzin van...@cloudera.com
Authored: Wed Jun 10 13:17:29 2015 -0700
Committer: Andrew Or and...@databricks.com
Committed: Wed Jun 10 13:17:29 2015 -0700

--
 .../org/apache/spark/deploy/SparkSubmit.scala   |  77 ++---
 .../spark/deploy/yarn/ApplicationMaster.scala   |  20 +-
 .../yarn/ApplicationMasterArguments.scala       |  12 +-
 .../org/apache/spark/deploy/yarn/Client.scala   | 295 ---
 .../spark/deploy/yarn/ClientArguments.scala     |   4 +-
 .../cluster/YarnClientSchedulerBackend.scala    |   5 +-
 .../apache/spark/deploy/yarn/ClientSuite.scala  |   4 +-
 .../spark/deploy/yarn/YarnClusterSuite.scala    |  61 ++--
 8 files changed, 270 insertions(+), 208 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/38112905/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
index a0eae77..b8978e2 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
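[Editor's note] To illustrate the PYTHONPATH construction the description walks through, a hedged sketch (the directory and archive names are examples only, not the commit's actual constants; {{PWD}} is YARN's own variable-embedding syntax, expanded inside the container rather than on the driver):

    // Individual .py files go under a subdirectory of the container's local
    // dir; archives are added to the path directly.
    val pyFilesSubdir = "__pyfiles__"                 // illustrative name
    val pythonPath = Seq(
      s"{{PWD}}/$pyFilesSubdir",                      // distributed .py files
      "{{PWD}}/pyspark.zip",                          // archives, added as-is
      "{{PWD}}/py4j-0.8.2.1-src.zip"
    ).mkString(java.io.File.pathSeparator)

Deferring expansion of {{PWD}} to the container is the reason SparkConf moved from command-line options to a conf file: it keeps YARN from expanding the variables at the wrong time.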
spark git commit: [SPARK-7527] [CORE] Fix createNullValue to return the correct null values and REPL mode detection
Repository: spark Updated Branches: refs/heads/master 19e30b48f -> e90c9d92d

[SPARK-7527] [CORE] Fix createNullValue to return the correct null values and REPL mode detection

The root cause of SPARK-7527 is that `createNullValue` returns an incompatible value `Byte(0)` for `char` and `boolean`. This PR fixes it and corrects the class name of the main class, and also adds a unit test to demonstrate it.

Author: zsxwing zsxw...@gmail.com

Closes #6735 from zsxwing/SPARK-7527 and squashes the following commits:

bbdb271 [zsxwing] Use pattern match in createNullValue
b0a0e7e [zsxwing] Remove the noisy in the test output
903e269 [zsxwing] Remove the code for Utils.isInInterpreter == false
5f92dc1 [zsxwing] Fix createNullValue to return the correct null values and REPL mode detection

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e90c9d92
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e90c9d92
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e90c9d92

Branch: refs/heads/master
Commit: e90c9d92d9a86e9960c10a5c043f3c02f6c636f9
Parents: 19e30b4
Author: zsxwing zsxw...@gmail.com
Authored: Wed Jun 10 13:22:52 2015 -0700
Committer: Andrew Or and...@databricks.com
Committed: Wed Jun 10 13:24:02 2015 -0700

--
 .../org/apache/spark/util/ClosureCleaner.scala   | 40 --
 .../scala/org/apache/spark/util/Utils.scala      |  9 +---
 .../apache/spark/util/ClosureCleanerSuite.scala  | 44 
 3 files changed, 64 insertions(+), 29 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/e90c9d92/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala b/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
index 6f2966b..305de4c 100644
--- a/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
+++ b/core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
@@ -109,7 +109,14 @@ private[spark] object ClosureCleaner extends Logging {

   private def createNullValue(cls: Class[_]): AnyRef = {
     if (cls.isPrimitive) {
-      new java.lang.Byte(0: Byte) // Should be convertible to any primitive type
+      cls match {
+        case java.lang.Boolean.TYPE => new java.lang.Boolean(false)
+        case java.lang.Character.TYPE => new java.lang.Character('\0')
+        case java.lang.Void.TYPE =>
+          // This should not happen because `Foo(void x) {}` does not compile.
+          throw new IllegalStateException("Unexpected void parameter in constructor")
+        case _ => new java.lang.Byte(0: Byte)
+      }
     } else {
       null
     }
@@ -319,28 +326,17 @@ private[spark] object ClosureCleaner extends Logging {
   private def instantiateClass(
       cls: Class[_],
       enclosingObject: AnyRef): AnyRef = {
-    if (!Utils.isInInterpreter) {
-      // This is a bona fide closure class, whose constructor has no effects
-      // other than to set its fields, so use its constructor
-      val cons = cls.getConstructors()(0)
-      val params = cons.getParameterTypes.map(createNullValue).toArray
-      if (enclosingObject != null) {
-        params(0) = enclosingObject // First param is always enclosing object
-      }
-      return cons.newInstance(params: _*).asInstanceOf[AnyRef]
-    } else {
-      // Use reflection to instantiate object without calling constructor
-      val rf = sun.reflect.ReflectionFactory.getReflectionFactory()
-      val parentCtor = classOf[java.lang.Object].getDeclaredConstructor()
-      val newCtor = rf.newConstructorForSerialization(cls, parentCtor)
-      val obj = newCtor.newInstance().asInstanceOf[AnyRef]
-      if (enclosingObject != null) {
-        val field = cls.getDeclaredField("$outer")
-        field.setAccessible(true)
-        field.set(obj, enclosingObject)
-      }
-      obj
+    // Use reflection to instantiate object without calling constructor
+    val rf = sun.reflect.ReflectionFactory.getReflectionFactory()
+    val parentCtor = classOf[java.lang.Object].getDeclaredConstructor()
+    val newCtor = rf.newConstructorForSerialization(cls, parentCtor)
+    val obj = newCtor.newInstance().asInstanceOf[AnyRef]
+    if (enclosingObject != null) {
+      val field = cls.getDeclaredField("$outer")
+      field.setAccessible(true)
+      field.set(obj, enclosingObject)
     }
+    obj
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/e90c9d92/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 153ece6..19157af 100644
---
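[Editor's note] A small demonstration of the bug being fixed (illustrative code, not from the commit): Java reflection widens primitive arguments (byte to int, etc.), but there is no conversion from byte to boolean or char, so passing a boxed Byte where a primitive boolean is expected fails.

    class Holder(val flag: Boolean)          // constructor takes a primitive boolean
    val ctor = classOf[Holder].getConstructors()(0)

    // ctor.newInstance(new java.lang.Byte(0: Byte)) // IllegalArgumentException:
    //                                               // argument type mismatch
    ctor.newInstance(java.lang.Boolean.FALSE)        // ok: correctly typed null value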
spark git commit: [SQL] [MINOR] Fixes a minor Java example error in SQL programming guide
Repository: spark Updated Branches: refs/heads/master 2b550a521 -> 8f7308f9c

[SQL] [MINOR] Fixes a minor Java example error in SQL programming guide

Author: Cheng Lian l...@databricks.com

Closes #6749 from liancheng/java-sample-fix and squashes the following commits:

5b44585 [Cheng Lian] Fixes a minor Java example error in SQL programming guide

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8f7308f9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8f7308f9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8f7308f9

Branch: refs/heads/master
Commit: 8f7308f9c49805b9486aaae5f60e4481e8ba24e8
Parents: 2b550a5
Author: Cheng Lian l...@databricks.com
Authored: Wed Jun 10 11:48:14 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 11:48:14 2015 -0700

--
 docs/sql-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/8f7308f9/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 40e33f7..c5ab074 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1479,7 +1479,7 @@ expressed in HiveQL.
 {% highlight java %}
 // sc is an existing JavaSparkContext.
-HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
+HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc);

 sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
 sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src");
spark git commit: [SQL] [MINOR] Fixes a minor Java example error in SQL programming guide
Repository: spark Updated Branches: refs/heads/branch-1.4 a0a7f2f92 -> 7b88e6a1e

[SQL] [MINOR] Fixes a minor Java example error in SQL programming guide

Author: Cheng Lian l...@databricks.com

Closes #6749 from liancheng/java-sample-fix and squashes the following commits:

5b44585 [Cheng Lian] Fixes a minor Java example error in SQL programming guide

(cherry picked from commit 8f7308f9c49805b9486aaae5f60e4481e8ba24e8)
Signed-off-by: Reynold Xin r...@databricks.com

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7b88e6a1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7b88e6a1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7b88e6a1

Branch: refs/heads/branch-1.4
Commit: 7b88e6a1e3b8f79cb41842bc21893760dc4b74e6
Parents: a0a7f2f
Author: Cheng Lian l...@databricks.com
Authored: Wed Jun 10 11:48:14 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 11:48:19 2015 -0700

--
 docs/sql-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/7b88e6a1/docs/sql-programming-guide.md
--
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index cde5830..17f2954 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1475,7 +1475,7 @@ expressed in HiveQL.
 {% highlight java %}
 // sc is an existing JavaSparkContext.
-HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
+HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc);

 sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
 sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src");
spark git commit: [SPARK-7996] Deprecate the developer api SparkEnv.actorSystem
Repository: spark Updated Branches: refs/heads/master c6ba7cca3 -> 2b550a521

[SPARK-7996] Deprecate the developer api SparkEnv.actorSystem

Changed ```SparkEnv.actorSystem``` to be a function such that we can use the deprecated flag with it and added a deprecated message.

Author: Ilya Ganelin ilya.gane...@capitalone.com

Closes #6731 from ilganeli/SPARK-7996 and squashes the following commits:

be43817 [Ilya Ganelin] Restored to val
9ed89e7 [Ilya Ganelin] Added a version info for deprecation
9610b08 [Ilya Ganelin] Converted actorSystem to function and added deprecated flag

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2b550a52
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2b550a52
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2b550a52

Branch: refs/heads/master
Commit: 2b550a521e45e1dbca2cca40ddd94e20c013831c
Parents: c6ba7cc
Author: Ilya Ganelin ilya.gane...@capitalone.com
Authored: Wed Jun 10 11:21:12 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 11:21:12 2015 -0700

--
 core/src/main/scala/org/apache/spark/SparkEnv.scala | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/2b550a52/core/src/main/scala/org/apache/spark/SparkEnv.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkEnv.scala b/core/src/main/scala/org/apache/spark/SparkEnv.scala
index a185954..b066557 100644
--- a/core/src/main/scala/org/apache/spark/SparkEnv.scala
+++ b/core/src/main/scala/org/apache/spark/SparkEnv.scala
@@ -20,6 +20,8 @@ package org.apache.spark
 import java.io.File
 import java.net.Socket

+import akka.actor.ActorSystem
+
 import scala.collection.JavaConversions._
 import scala.collection.mutable
 import scala.util.Properties
@@ -75,7 +77,8 @@ class SparkEnv (
     val conf: SparkConf) extends Logging {

   // TODO Remove actorSystem
-  val actorSystem = rpcEnv.asInstanceOf[AkkaRpcEnv].actorSystem
+  @deprecated("Actor system is no longer supported as of 1.4")
+  val actorSystem: ActorSystem = rpcEnv.asInstanceOf[AkkaRpcEnv].actorSystem

   private[spark] var isStopped = false
   private val pythonWorkers = mutable.HashMap[(String, Map[String, String]), PythonWorkerFactory]()
spark git commit: [SPARK-8200] [MLLIB] Check for empty RDDs in StreamingLinearAlgorithm
Repository: spark Updated Branches: refs/heads/branch-1.4 2846a357f -> 59fc3f197

[SPARK-8200] [MLLIB] Check for empty RDDs in StreamingLinearAlgorithm

Test cases for both StreamingLinearRegression and StreamingLogisticRegression, and code fix.

Edit: This contribution is my original work and I license the work to the project under the project's open source license.

Author: Paavo ppark...@gmail.com

Closes #6713 from pparkkin/streamingmodel-empty-rdd and squashes the following commits:

ff5cd78 [Paavo] Update strings to use interpolation.
db234cf [Paavo] Use !rdd.isEmpty.
54ad89e [Paavo] Test case for empty stream.
393e36f [Paavo] Ignore empty RDDs.
0bfc365 [Paavo] Test case for empty stream.

(cherry picked from commit b928f543845ddd39e914a0e8f0b0205fd86100c5)
Signed-off-by: Sean Owen so...@cloudera.com

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/59fc3f19
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/59fc3f19
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/59fc3f19

Branch: refs/heads/branch-1.4
Commit: 59fc3f197247c6c8c40ea7479573af023c89d718
Parents: 2846a35
Author: Paavo ppark...@gmail.com
Authored: Wed Jun 10 23:17:42 2015 +0100
Committer: Sean Owen so...@cloudera.com
Committed: Wed Jun 10 23:26:54 2015 +0100

--
 .../regression/StreamingLinearAlgorithm.scala   | 28 +++-
 .../StreamingLogisticRegressionSuite.scala      | 17 
 .../StreamingLinearRegressionSuite.scala        | 18 +
 3 files changed, 50 insertions(+), 13 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/59fc3f19/mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala b/mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala
index cea8f3f..2dd8aca 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala
@@ -83,21 +83,23 @@ abstract class StreamingLinearAlgorithm[
       throw new IllegalArgumentException("Model must be initialized before starting training.")
     }
     data.foreachRDD { (rdd, time) =>
-      val initialWeights =
-        model match {
-          case Some(m) =>
-            m.weights
-          case None =>
-            val numFeatures = rdd.first().features.size
-            Vectors.dense(numFeatures)
+      if (!rdd.isEmpty) {
+        val initialWeights =
+          model match {
+            case Some(m) =>
+              m.weights
+            case None =>
+              val numFeatures = rdd.first().features.size
+              Vectors.dense(numFeatures)
+          }
+        model = Some(algorithm.run(rdd, initialWeights))
+        logInfo(s"Model updated at time ${time.toString}")
+        val display = model.get.weights.size match {
+          case x if x > 100 => model.get.weights.toArray.take(100).mkString("[", ",", "...")
+          case _ => model.get.weights.toArray.mkString("[", ",", "]")
         }
-      model = Some(algorithm.run(rdd, initialWeights))
-      logInfo("Model updated at time %s".format(time.toString))
-      val display = model.get.weights.size match {
-        case x if x > 100 => model.get.weights.toArray.take(100).mkString("[", ",", "...")
-        case _ => model.get.weights.toArray.mkString("[", ",", "]")
+        logInfo(s"Current model: weights, ${display}")
       }
-      logInfo("Current model: weights, %s".format(display))
     }
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/59fc3f19/mllib/src/test/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionSuite.scala
--
diff --git a/mllib/src/test/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionSuite.scala b/mllib/src/test/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionSuite.scala
index e98b61e..fd65329 100644
--- a/mllib/src/test/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionSuite.scala
@@ -158,4 +158,21 @@ class StreamingLogisticRegressionSuite extends SparkFunSuite with TestSuiteBase
     val error = output.map(batch => batch.map(p => math.abs(p._1 - p._2)).sum / nPoints).toList
     assert(error.head > 0.8 && error.last < 0.2)
   }
+
+  // Test empty RDDs in a stream
+  test("handling empty RDDs in a stream") {
+    val model = new StreamingLogisticRegressionWithSGD()
+      .setInitialWeights(Vectors.dense(-0.1))
+      .setStepSize(0.01)
+
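[Editor's note] The core of the fix is a guard in the foreachRDD callback. A minimal, self-contained sketch of the same pattern (the trainOn/update names are stand-ins, not the real class structure): without the guard, rdd.first() throws on an empty micro-batch.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    def trainOn[T](data: DStream[T])(update: RDD[T] => Unit): Unit =
      data.foreachRDD { (rdd, time) =>
        // Only touch first()/run() when the micro-batch actually has data.
        if (!rdd.isEmpty) update(rdd)
      }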
spark git commit: [SPARK-2774] Set preferred locations for reduce tasks
Repository: spark Updated Branches: refs/heads/master 5014d0ed7 -> 96a7c888d

[SPARK-2774] Set preferred locations for reduce tasks

Set preferred locations for reduce tasks. The basic design is that we maintain a map from reducerId to a list of (sizes, locations) for each shuffle. We then set the preferred locations to be any machines that have 20% or more of the output that needs to be read by the reduce task. This will result in at most 5 preferred locations for each reduce task.

Selecting the preferred locations involves O(# map tasks * # reduce tasks) computation, so we restrict this feature to cases where we have fewer than 1000 map tasks and 1000 reduce tasks.

Author: Shivaram Venkataraman shiva...@cs.berkeley.edu

Closes #6652 from shivaram/reduce-locations and squashes the following commits:

492e25e [Shivaram Venkataraman] Remove unused import
2ef2d39 [Shivaram Venkataraman] Address code review comments
897a914 [Shivaram Venkataraman] Remove unused hash map
f5be578 [Shivaram Venkataraman] Use fraction of map outputs to determine locations Also removes caching of preferred locations to make the API cleaner
68bc29e [Shivaram Venkataraman] Fix line length
1090b58 [Shivaram Venkataraman] Change flag name
77ce7d8 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
e5d56bd [Shivaram Venkataraman] Add flag to turn off locality for shuffle deps
6cfae98 [Shivaram Venkataraman] Filter out zero blocks, rename variables
9d5831a [Shivaram Venkataraman] Address some more comments
8e31266 [Shivaram Venkataraman] Fix style
0df3180 [Shivaram Venkataraman] Address code review comments
e7d5449 [Shivaram Venkataraman] Fix merge issues
ad7cb53 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
df14cee [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
5093aea [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
0171d3c [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
bc4dfd6 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into reduce-locations
774751b [Shivaram Venkataraman] Fix bug introduced by line length adjustment
34d0283 [Shivaram Venkataraman] Fix style issues
3b464b7 [Shivaram Venkataraman] Set preferred locations for reduce tasks This is another attempt at #1697 addressing some of the earlier concerns. This adds a couple of thresholds based on the number of map and reduce tasks beyond which we don't use preferred locations for reduce tasks.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/96a7c888 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/96a7c888 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/96a7c888 Branch: refs/heads/master Commit: 96a7c888d806adfdb2c722025a1079ed7eaa2052 Parents: 5014d0e Author: Shivaram Venkataraman shiva...@cs.berkeley.edu Authored: Wed Jun 10 15:03:40 2015 -0700 Committer: Kay Ousterhout kayousterh...@gmail.com Committed: Wed Jun 10 15:04:38 2015 -0700 -- .../org/apache/spark/MapOutputTracker.scala | 49 - .../apache/spark/scheduler/DAGScheduler.scala | 37 +- .../apache/spark/MapOutputTrackerSuite.scala| 35 + .../spark/scheduler/DAGSchedulerSuite.scala | 76 +++- 4 files changed, 177 insertions(+), 20 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/96a7c888/core/src/main/scala/org/apache/spark/MapOutputTracker.scala -- diff --git a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala index 0184228..862ffe8 100644 --- a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala +++ b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala @@ -21,7 +21,7 @@ import java.io._ import java.util.concurrent.ConcurrentHashMap import java.util.zip.{GZIPInputStream, GZIPOutputStream} -import scala.collection.mutable.{HashSet, Map} +import scala.collection.mutable.{HashMap, HashSet, Map} import scala.collection.JavaConversions._ import scala.reflect.ClassTag @@ -284,6 +284,53 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf) cachedSerializedStatuses.contains(shuffleId) || mapStatuses.contains(shuffleId) } + /** + * Return a list of locations that each have fraction of map output greater than the specified + * threshold. + * + * @param shuffleId id of the shuffle + * @param reducerId id of the reduce task + * @param numReducers total number of reducers in the shuffle + * @param fractionThreshold
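The selection logic the new method implements is easy to state on its own: for a single reduce task, keep every location that holds at least the threshold fraction of that task's total input. A plain-Scala sketch of that filter (illustrative only, not the MapOutputTracker code, which works over MapStatus arrays):

    // Bytes of map output per host, for a single reduce task.
    def preferredLocations(
        sizesByHost: Map[String, Long],
        fractionThreshold: Double): Seq[String] = {
      val total = sizesByHost.values.sum.toDouble
      if (total <= 0) Seq.empty
      else sizesByHost.collect {
        case (host, bytes) if bytes / total >= fractionThreshold => host
      }.toSeq
    }

    // With the 0.2 threshold from the commit message at most five hosts can qualify:
    // preferredLocations(Map("a" -> 80L, "b" -> 15L, "c" -> 5L), 0.2) == Seq("a")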
spark git commit: [SPARK-8200] [MLLIB] Check for empty RDDs in StreamingLinearAlgorithm
Repository: spark Updated Branches: refs/heads/master 96a7c888d -> b928f5438

[SPARK-8200] [MLLIB] Check for empty RDDs in StreamingLinearAlgorithm

Test cases for both StreamingLinearRegression and StreamingLogisticRegression, and code fix.

Edit: This contribution is my original work and I license the work to the project under the project's open source license.

Author: Paavo ppark...@gmail.com

Closes #6713 from pparkkin/streamingmodel-empty-rdd and squashes the following commits:

ff5cd78 [Paavo] Update strings to use interpolation.
db234cf [Paavo] Use !rdd.isEmpty.
54ad89e [Paavo] Test case for empty stream.
393e36f [Paavo] Ignore empty RDDs.
0bfc365 [Paavo] Test case for empty stream.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b928f543
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b928f543
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b928f543

Branch: refs/heads/master
Commit: b928f543845ddd39e914a0e8f0b0205fd86100c5
Parents: 96a7c88
Author: Paavo ppark...@gmail.com
Authored: Wed Jun 10 23:17:42 2015 +0100
Committer: Sean Owen so...@cloudera.com
Committed: Wed Jun 10 23:17:42 2015 +0100

--
 .../regression/StreamingLinearAlgorithm.scala   | 14 --
 .../StreamingLogisticRegressionSuite.scala      | 17 +
 .../StreamingLinearRegressionSuite.scala        | 18 ++
 3 files changed, 43 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/b928f543/mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala b/mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala
index aee51bf..141052b 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/regression/StreamingLinearAlgorithm.scala
@@ -83,13 +83,15 @@ abstract class StreamingLinearAlgorithm[
       throw new IllegalArgumentException("Model must be initialized before starting training.")
     }
     data.foreachRDD { (rdd, time) =>
-      model = Some(algorithm.run(rdd, model.get.weights))
-      logInfo("Model updated at time %s".format(time.toString))
-      val display = model.get.weights.size match {
-        case x if x > 100 => model.get.weights.toArray.take(100).mkString("[", ",", "...")
-        case _ => model.get.weights.toArray.mkString("[", ",", "]")
+      if (!rdd.isEmpty) {
+        model = Some(algorithm.run(rdd, model.get.weights))
+        logInfo(s"Model updated at time ${time.toString}")
+        val display = model.get.weights.size match {
+          case x if x > 100 => model.get.weights.toArray.take(100).mkString("[", ",", "...")
+          case _ => model.get.weights.toArray.mkString("[", ",", "]")
+        }
+        logInfo(s"Current model: weights, ${display}")
       }
-      logInfo("Current model: weights, %s".format(display))
     }
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/b928f543/mllib/src/test/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionSuite.scala
--
diff --git a/mllib/src/test/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionSuite.scala b/mllib/src/test/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionSuite.scala
index e98b61e..fd65329 100644
--- a/mllib/src/test/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/mllib/classification/StreamingLogisticRegressionSuite.scala
@@ -158,4 +158,21 @@ class StreamingLogisticRegressionSuite extends SparkFunSuite with TestSuiteBase
     val error = output.map(batch => batch.map(p => math.abs(p._1 - p._2)).sum / nPoints).toList
     assert(error.head > 0.8 && error.last < 0.2)
   }
+
+  // Test empty RDDs in a stream
+  test("handling empty RDDs in a stream") {
+    val model = new StreamingLogisticRegressionWithSGD()
+      .setInitialWeights(Vectors.dense(-0.1))
+      .setStepSize(0.01)
+      .setNumIterations(10)
+    val numBatches = 10
+    val emptyInput = Seq.empty[Seq[LabeledPoint]]
+    val ssc = setupStreams(emptyInput,
+      (inputDStream: DStream[LabeledPoint]) => {
+        model.trainOn(inputDStream)
+        model.predictOnValues(inputDStream.map(x => (x.label, x.features)))
+      }
+    )
+    val output: Seq[Seq[(Double, Double)]] = runStreams(ssc, numBatches, numBatches)
+  }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/b928f543/mllib/src/test/scala/org/apache/spark/mllib/regression/StreamingLinearRegressionSuite.scala
spark git commit: [SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first
Repository: spark Updated Branches: refs/heads/branch-1.4 59fc3f197 -> 5c05b5c0d

[SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first

    case cs @ CombineSum(expr) =>
      val calcType = expr.dataType
        expr.dataType match {
          case DecimalType.Fixed(_, _) =>
            DecimalType.Unlimited
          case _ =>
            expr.dataType
        }

calcType is always expr.dataType. Credits all belong to IntelliJ.

Author: navis.ryu na...@apache.org

Closes #6736 from navis/SPARK-8285 and squashes the following commits:

20382c1 [navis.ryu] [SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first

(cherry picked from commit 6a47114bc297f0bce874e425feb1c24a5c26cef0)
Signed-off-by: Reynold Xin r...@databricks.com

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5c05b5c0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5c05b5c0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5c05b5c0

Branch: refs/heads/branch-1.4
Commit: 5c05b5c0d25fc902bf95ed7b93ad7b5775631150
Parents: 59fc3f1
Author: navis.ryu na...@apache.org
Authored: Wed Jun 10 18:19:12 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 18:19:24 2015 -0700

--
 .../org/apache/spark/sql/execution/GeneratedAggregate.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/5c05b5c0/sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala
index 3e27c1b..af37917 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala
@@ -118,7 +118,7 @@ case class GeneratedAggregate(
         AggregateEvaluation(currentSum :: Nil, initialValue :: Nil, updateFunction :: Nil, result)

       case cs @ CombineSum(expr) =>
-        val calcType = expr.dataType
+        val calcType =
           expr.dataType match {
             case DecimalType.Fixed(_, _) =>
               DecimalType.Unlimited
@@ -129,7 +129,7 @@ case class GeneratedAggregate(
         val currentSum = AttributeReference("currentSum", calcType, nullable = true)()
         val initialValue = Literal.create(null, calcType)

-        // Coalasce avoids double calculation...
+        // Coalesce avoids double calculation...
         // but really, common sub expression elimination would be better
         val zero = Cast(Literal(0), calcType)
         // If we're evaluating UnscaledValue(x), we can do Count on x directly, since its

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first
Repository: spark Updated Branches: refs/heads/master 37719e0cd -> 6a47114bc

[SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first

    case cs @ CombineSum(expr) =>
      val calcType = expr.dataType
        expr.dataType match {
          case DecimalType.Fixed(_, _) =>
            DecimalType.Unlimited
          case _ =>
            expr.dataType
        }

calcType is always expr.dataType. Credits all belong to IntelliJ.

Author: navis.ryu na...@apache.org

Closes #6736 from navis/SPARK-8285 and squashes the following commits:

20382c1 [navis.ryu] [SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a47114b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a47114b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a47114b

Branch: refs/heads/master
Commit: 6a47114bc297f0bce874e425feb1c24a5c26cef0
Parents: 37719e0
Author: navis.ryu na...@apache.org
Authored: Wed Jun 10 18:19:12 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 18:19:12 2015 -0700

--
 .../org/apache/spark/sql/execution/GeneratedAggregate.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6a47114b/sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala
index 3e27c1b..af37917 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/GeneratedAggregate.scala
@@ -118,7 +118,7 @@ case class GeneratedAggregate(
         AggregateEvaluation(currentSum :: Nil, initialValue :: Nil, updateFunction :: Nil, result)

       case cs @ CombineSum(expr) =>
-        val calcType = expr.dataType
+        val calcType =
           expr.dataType match {
             case DecimalType.Fixed(_, _) =>
               DecimalType.Unlimited
@@ -129,7 +129,7 @@ case class GeneratedAggregate(
         val currentSum = AttributeReference("currentSum", calcType, nullable = true)()
         val initialValue = Literal.create(null, calcType)

-        // Coalasce avoids double calculation...
+        // Coalesce avoids double calculation...
         // but really, common sub expression elimination would be better
         val zero = Cast(Literal(0), calcType)
         // If we're evaluating UnscaledValue(x), we can do Count on x directly, since its

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
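Why the dangling `expr.dataType` was a real bug rather than harmless dead code: with `calcType` pinned to the input's fixed precision, each partial sum is rounded back into that type, so sums that outgrow it silently lose digits. A plain `BigDecimal` sketch of the difference (illustrative, not the GeneratedAggregate code path):

    import java.math.MathContext

    // DecimalType.Fixed(5, 2) can hold at most 999.99.
    val inputs = Seq(BigDecimal("999.99"), BigDecimal("999.99"), BigDecimal("999.99"))

    // Accumulating with unlimited precision keeps the exact total, 2999.97...
    val exact = inputs.foldLeft(BigDecimal(0))(_ + _)

    // ...while forcing each partial sum back to 5 significant digits does not:
    // 1999.98 becomes 2000.0 mid-sum, and the final result drifts to 3000.0.
    val lossy = inputs.foldLeft(BigDecimal(0)) { (acc, x) =>
      (acc + x).round(new MathContext(5))
    }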
svn commit: r9331 - /dev/spark/spark-1.4.0-rc4/ /release/spark/spark-1.4.0/
Author: pwendell Date: Wed Jun 10 23:18:01 2015 New Revision: 9331 Log: Adding Spark release 1.4.0 Added: release/spark/spark-1.4.0/ - copied from r9330, dev/spark/spark-1.4.0-rc4/ Removed: dev/spark/spark-1.4.0-rc4/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-8164] transformExpressions should support nested expression sequence
Repository: spark Updated Branches: refs/heads/master 6a47114bc -> 4e42842e8

[SPARK-8164] transformExpressions should support nested expression sequence

Currently we only support `Seq[Expression]`; we should handle cases like `Seq[Seq[Expression]]` so that we can remove the unnecessary `GroupExpression`.

Author: Wenchen Fan cloud0...@outlook.com

Closes #6706 from cloud-fan/clean and squashes the following commits:

60a1193 [Wenchen Fan] support nested expression sequence and remove GroupExpression

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4e42842e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4e42842e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4e42842e

Branch: refs/heads/master
Commit: 4e42842e82e058d54329bd66185d8a7e77ab335a
Parents: 6a47114
Author: Wenchen Fan cloud0...@outlook.com
Authored: Wed Jun 10 18:22:47 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 18:22:47 2015 -0700

--
 .../spark/sql/catalyst/analysis/Analyzer.scala  | 6 +++---
 .../sql/catalyst/expressions/Expression.scala   | 12 ---
 .../spark/sql/catalyst/plans/QueryPlan.scala    | 22 +---
 .../catalyst/plans/logical/basicOperators.scala | 2 +-
 .../sql/catalyst/trees/TreeNodeSuite.scala      | 14 +
 .../org/apache/spark/sql/execution/Expand.scala | 4 ++--
 6 files changed, 30 insertions(+), 30 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/4e42842e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index c4f12cf..cbd8def 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -172,8 +172,8 @@ class Analyzer(
      * expressions which equal GroupBy expressions with Literal(null), if those expressions
      * are not set for this grouping set (according to the bit mask).
      */
-    private[this] def expand(g: GroupingSets): Seq[GroupExpression] = {
-      val result = new scala.collection.mutable.ArrayBuffer[GroupExpression]
+    private[this] def expand(g: GroupingSets): Seq[Seq[Expression]] = {
+      val result = new scala.collection.mutable.ArrayBuffer[Seq[Expression]]

       g.bitmasks.foreach { bitmask =>
         // get the non selected grouping attributes according to the bit mask
@@ -194,7 +194,7 @@ class Analyzer(
             Literal.create(bitmask, IntegerType)
         })

-        result += GroupExpression(substitution)
+        result += substitution
       }

       result.toSeq

http://git-wip-us.apache.org/repos/asf/spark/blob/4e42842e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
index a05794f..63dd5f9 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
@@ -239,18 +239,6 @@ abstract class UnaryExpression extends Expression with trees.UnaryNode[Expressio
   }
 }

-// TODO Semantically we probably not need GroupExpression
-// All we need is holding the Seq[Expression], and ONLY used in doing the
-// expressions transformation correctly. Probably will be removed since it's
-// not like a real expressions.
-case class GroupExpression(children: Seq[Expression]) extends Expression {
-  self: Product =>
-  override def eval(input: Row): Any = throw new UnsupportedOperationException
-  override def nullable: Boolean = false
-  override def foldable: Boolean = false
-  override def dataType: DataType = throw new UnsupportedOperationException
-}
-
 /**
  * Expressions that require a specific `DataType` as input should implement this trait
  * so that the proper type conversions can be performed in the analyzer.

http://git-wip-us.apache.org/repos/asf/spark/blob/4e42842e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala
index eff5c61..2f545bb 100644
---
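The enabling change is in how `QueryPlan.transformExpressions` walks a node's constructor arguments: the traversal now recurses through collections, so a `Seq[Seq[Expression]]` (which `Expand` can carry directly) is rewritten as well. A toy sketch of that recursion, with stand-in types rather than the Catalyst classes:

    // Stand-in for Catalyst's Expression tree.
    sealed trait Expr
    case class Lit(v: Int) extends Expr

    // Apply a rewrite rule to one expression (child recursion omitted for brevity).
    def applyRule(e: Expr, rule: PartialFunction[Expr, Expr]): Expr =
      rule.applyOrElse(e, identity[Expr])

    // The recursive walk over constructor arguments: bare expressions, Options,
    // and arbitrarily nested collections are all transformed.
    def recursiveTransform(arg: Any, rule: PartialFunction[Expr, Expr]): Any = arg match {
      case e: Expr             => applyRule(e, rule)
      case Some(e: Expr)       => Some(applyRule(e, rule))
      case seq: Traversable[_] => seq.map(recursiveTransform(_, rule)) // covers Seq[Seq[Expr]]
      case other               => other
    }

    // recursiveTransform(Seq(Seq(Lit(1)), Seq(Lit(2))), { case Lit(v) => Lit(v + 1) })
    // == Seq(Seq(Lit(2)), Seq(Lit(3)))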
svn commit: r9330 - /dev/spark/spark-1.4.0-rc4/
Author: pwendell Date: Wed Jun 10 23:16:15 2015 New Revision: 9330 Log: Adding Spark 1.4.0 RC4 Added: dev/spark/spark-1.4.0-rc4/ dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.asc (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.md5 dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.sha dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1-scala2.11.tgz (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1-scala2.11.tgz.asc (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1-scala2.11.tgz.md5 dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1-scala2.11.tgz.sha dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1.tgz (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1.tgz.asc (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1.tgz.md5 dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1.tgz.sha dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.3.tgz (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.3.tgz.asc (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.3.tgz.md5 dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.3.tgz.sha dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.4.tgz (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.4.tgz.asc (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.4.tgz.md5 dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.4.tgz.sha dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.6.tgz (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.6.tgz.asc (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.6.tgz.md5 dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop2.6.tgz.sha dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-without-hadoop.tgz (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-without-hadoop.tgz.asc (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-without-hadoop.tgz.md5 dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-without-hadoop.tgz.sha dev/spark/spark-1.4.0-rc4/spark-1.4.0.tgz (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0.tgz.asc (with props) dev/spark/spark-1.4.0-rc4/spark-1.4.0.tgz.md5 dev/spark/spark-1.4.0-rc4/spark-1.4.0.tgz.sha Added: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz == Binary file - no diff available. Propchange: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz -- svn:mime-type = application/x-gzip Added: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.asc == Binary file - no diff available. Propchange: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.asc -- svn:mime-type = application/pgp-signature Added: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.md5 == --- dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.md5 (added) +++ dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.md5 Wed Jun 10 23:16:15 2015 @@ -0,0 +1 @@ +spark-1.4.0-bin-cdh4.tgz: 25 98 11 6C 28 0B 27 87 FE B1 47 CB C9 77 91 90 Added: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.sha == --- dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.sha (added) +++ dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-cdh4.tgz.sha Wed Jun 10 23:16:15 2015 @@ -0,0 +1,3 @@ +spark-1.4.0-bin-cdh4.tgz: 94C53911 6FE567AE 79408A91 8637FD9C 0909A737 E7C8C359 + 89541574 993CD003 691F94EF 104B6530 B2BCC23C 536F088A + 6B3729A8 C5C97DA4 1FED6041 DF9D9918 Added: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1-scala2.11.tgz == Binary file - no diff available. 
Propchange: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1-scala2.11.tgz -- svn:mime-type = application/x-gzip Added: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1-scala2.11.tgz.asc == Binary file - no diff available. Propchange: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1-scala2.11.tgz.asc -- svn:mime-type = application/pgp-signature Added: dev/spark/spark-1.4.0-rc4/spark-1.4.0-bin-hadoop1-scala2.11.tgz.md5 == ---
spark git commit: [SPARK-8189] [SQL] use Long for TimestampType in SQL
Repository: spark Updated Branches: refs/heads/master b928f5438 -> 37719e0cd

[SPARK-8189] [SQL] use Long for TimestampType in SQL

This PR changes the internal type for TimestampType to Long for efficiency, which means it will lose precision below 100ns.

Author: Davies Liu dav...@databricks.com

Closes #6733 from davies/timestamp and squashes the following commits:

d9565fa [Davies Liu] remove print
65cf2f1 [Davies Liu] fix Timestamp in SparkR
86fecfb [Davies Liu] disable two timestamp tests
8f77ee0 [Davies Liu] fix scala style
246ee74 [Davies Liu] address comments
309d2e1 [Davies Liu] use Long for TimestampType in SQL

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/37719e0c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/37719e0c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/37719e0c

Branch: refs/heads/master
Commit: 37719e0cd0b00cc5ffee0ebe1652d465a574db0f
Parents: b928f54
Author: Davies Liu dav...@databricks.com
Authored: Wed Jun 10 16:55:39 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 16:55:39 2015 -0700

--
 .../scala/org/apache/spark/api/r/SerDe.scala    | 17 --
 python/pyspark/sql/types.py                     | 11 
 .../scala/org/apache/spark/sql/BaseRow.java     | 6 ++
 .../main/scala/org/apache/spark/sql/Row.scala   | 8 ++-
 .../sql/catalyst/CatalystTypeConverters.scala   | 13 +++-
 .../spark/sql/catalyst/expressions/Cast.scala   | 62 +---
 .../expressions/SpecificMutableRow.scala        | 1 +
 .../expressions/codegen/CodeGenerator.scala     | 4 +-
 .../codegen/GenerateProjection.scala            | 10 +++-
 .../sql/catalyst/expressions/literals.scala     | 15 +++--
 .../sql/catalyst/expressions/predicates.scala   | 6 +-
 .../spark/sql/catalyst/util/DateUtils.scala     | 44 +++---
 .../apache/spark/sql/types/TimestampType.scala  | 10 +---
 .../sql/catalyst/expressions/CastSuite.scala    | 11 ++--
 .../sql/catalyst/util/DateUtilsSuite.scala      | 40 +
 .../apache/spark/sql/types/DataTypeSuite.scala  | 2 +-
 .../apache/spark/sql/columnar/ColumnStats.scala | 21 +--
 .../apache/spark/sql/columnar/ColumnType.scala  | 19 +++---
 .../sql/execution/SparkSqlSerializer2.scala     | 17 ++
 .../spark/sql/execution/debug/package.scala     | 2 +
 .../apache/spark/sql/execution/pythonUdfs.scala | 7 ++-
 .../org/apache/spark/sql/jdbc/JDBCRDD.scala     | 10 +++-
 .../apache/spark/sql/json/JacksonParser.scala   | 5 +-
 .../org/apache/spark/sql/json/JsonRDD.scala     | 10 ++--
 .../spark/sql/parquet/ParquetConverter.scala    | 9 +--
 .../spark/sql/parquet/ParquetTableSupport.scala | 10 ++--
 .../org/apache/spark/sql/CachedTableSuite.scala | 2 +-
 .../spark/sql/columnar/ColumnStatsSuite.scala   | 2 +-
 .../spark/sql/columnar/ColumnTypeSuite.scala    | 11 ++--
 .../spark/sql/columnar/ColumnarTestUtils.scala  | 9 +--
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala   | 2 +-
 .../org/apache/spark/sql/json/JsonSuite.scala   | 14 +++--
 .../hive/execution/HiveCompatibilitySuite.scala | 8 ++-
 .../apache/spark/sql/hive/HiveInspectors.scala  | 20 ---
 .../org/apache/spark/sql/hive/TableReader.scala | 4 +-
 ...p cast #5-0-dbd7bcd167d322d6617b884c02c7f247 | 2 +-
 36 files changed, 272 insertions(+), 172 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/37719e0c/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
--
diff --git a/core/src/main/scala/org/apache/spark/api/r/SerDe.scala b/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
index f8e3f1a..56adc85 100644
--- a/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
+++ b/core/src/main/scala/org/apache/spark/api/r/SerDe.scala
@@ -18,7 +18,7 @@ package org.apache.spark.api.r

 import java.io.{DataInputStream, DataOutputStream}
-import java.sql.{Date, Time}
+import java.sql.{Timestamp, Date, Time}

 import scala.collection.JavaConversions._

@@ -107,9 +107,12 @@ private[spark] object SerDe {
     Date.valueOf(readString(in))
   }

-  def readTime(in: DataInputStream): Time = {
-    val t = in.readDouble()
-    new Time((t * 1000L).toLong)
+  def readTime(in: DataInputStream): Timestamp = {
+    val seconds = in.readDouble()
+    val sec = Math.floor(seconds).toLong
+    val t = new Timestamp(sec * 1000L)
+    t.setNanos(((seconds - sec) * 1e9).toInt)
+    t
   }

   def readBytesArr(in: DataInputStream): Array[Array[Byte]] = {
@@ -227,6 +230,9 @@ private[spark] object SerDe {
         case "java.sql.Time" =>
           writeType(dos, "time")
           writeTime(dos, value.asInstanceOf[Time])
+        case "java.sql.Timestamp" =>
+          writeType(dos, "time")
+          writeTime(dos, value.asInstanceOf[Timestamp])
         case "[B" =>
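The 100ns figure follows directly from the representation: a `java.sql.Timestamp` carries nanoseconds, but a single Long spanning the whole timestamp range only affords ticks of 100ns. A sketch of the round trip at the boundary (illustrative names, ignoring pre-epoch corner cases):

    import java.sql.Timestamp

    // Encode as 100ns ticks since the epoch.
    def toTicks(t: Timestamp): Long =
      t.getTime / 1000 * 10000000L + t.getNanos / 100

    // Decode: whole seconds back into millis, the remainder back into nanos.
    def fromTicks(ticks: Long): Timestamp = {
      val ts = new Timestamp(ticks / 10000000L * 1000)
      ts.setNanos((ticks % 10000000L).toInt * 100)
      ts
    }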
spark git commit: [HOTFIX] Fixing errors in name mappings
Repository: spark Updated Branches: refs/heads/master a777eb04b - e84545fa7 [HOTFIX] Fixing errors in name mappings Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e84545fa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e84545fa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e84545fa Branch: refs/heads/master Commit: e84545fa771dde90de5675a9c551fe287af6f7fb Parents: a777eb0 Author: Patrick Wendell patr...@databricks.com Authored: Wed Jun 10 22:56:36 2015 -0700 Committer: Patrick Wendell patr...@databricks.com Committed: Wed Jun 10 22:56:36 2015 -0700 -- dev/create-release/known_translations | 4 1 file changed, 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e84545fa/dev/create-release/known_translations -- diff --git a/dev/create-release/known_translations b/dev/create-release/known_translations index bbd4330..5f2671a 100644 --- a/dev/create-release/known_translations +++ b/dev/create-release/known_translations @@ -97,8 +97,6 @@ FavioVazquez - Favio Vazquez JaysonSunshine - Jayson Sunshine Liuchang0812 - Liu Chang Sephiroth-Lin - Sephiroth Lin -baishuo - Cheng Lian -daisukebe - Shixiong Zhu dobashim - Masaru Dobashi ehnalis - Zoltan Zvara emres - Emre Sevinc @@ -122,11 +120,9 @@ nyaapa - Arsenii Krasikov phatak-dev - Madhukara Phatak prabeesh - Prabeesh K rakeshchalasani - Rakesh Chalasani -raschild - Marcelo Vanzin rekhajoshm - Rekha Joshi sisihj - June He szheng79 - Shuai Zheng -ted-yu - Andrew Or texasmichelle - Michelle Casbon vinodkc - Vinod KC yongtang - Yong Tang - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-8248][SQL] string function: length
Repository: spark Updated Branches: refs/heads/master 4e42842e8 -> 9fe3adcce

[SPARK-8248][SQL] string function: length

Author: Cheng Hao hao.ch...@intel.com

Closes #6724 from chenghao-intel/length and squashes the following commits:

aaa3c31 [Cheng Hao] revert the additional change
97148a9 [Cheng Hao] remove the codegen testing temporally
ae08003 [Cheng Hao] update the comments
1eb1fd1 [Cheng Hao] simplify the code as commented
3e92d32 [Cheng Hao] use the selectExpr in unit test intead of SQLQuery
3c729aa [Cheng Hao] fix bug for constant null value in codegen
3641f06 [Cheng Hao] keep the length() method for registered function
8e30171 [Cheng Hao] update the code as comment
db604ae [Cheng Hao] Add code gen support
548d2ef [Cheng Hao] register the length()
09a0738 [Cheng Hao] add length support

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9fe3adcc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9fe3adcc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9fe3adcc

Branch: refs/heads/master
Commit: 9fe3adccef687c92ff1ac17d946af089c8e28d66
Parents: 4e42842
Author: Cheng Hao hao.ch...@intel.com
Authored: Wed Jun 10 19:55:10 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 19:55:10 2015 -0700

--
 .../catalyst/analysis/FunctionRegistry.scala    | 13 +++-
 .../sql/catalyst/expressions/Expression.scala   | 3 +++
 .../catalyst/expressions/stringOperations.scala | 21 
 .../expressions/StringFunctionsSuite.scala      | 12 +++
 .../scala/org/apache/spark/sql/functions.scala  | 18 +
 .../spark/sql/DataFrameFunctionsSuite.scala     | 20 +++
 6 files changed, 82 insertions(+), 5 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/9fe3adcc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index ba89a5c..39875d7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -89,14 +89,10 @@ object FunctionRegistry {
     expression[CreateArray]("array"),
     expression[Coalesce]("coalesce"),
     expression[Explode]("explode"),
-    expression[Lower]("lower"),
-    expression[Substring]("substr"),
-    expression[Substring]("substring"),
     expression[Rand]("rand"),
     expression[Randn]("randn"),
     expression[CreateStruct]("struct"),
     expression[Sqrt]("sqrt"),
-    expression[Upper]("upper"),

     // Math functions
     expression[Acos]("acos"),
@@ -132,7 +128,14 @@ object FunctionRegistry {
     expression[Last]("last"),
     expression[Max]("max"),
     expression[Min]("min"),
-    expression[Sum]("sum")
+    expression[Sum]("sum"),
+
+    // string functions
+    expression[Lower]("lower"),
+    expression[StringLength]("length"),
+    expression[Substring]("substr"),
+    expression[Substring]("substring"),
+    expression[Upper]("upper")
   )

   val builtin: FunctionRegistry = {

http://git-wip-us.apache.org/repos/asf/spark/blob/9fe3adcc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
index 63dd5f9..8c1e4d7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
@@ -212,6 +212,9 @@ abstract class LeafExpression extends Expression with trees.LeafNode[Expression]
 abstract class UnaryExpression extends Expression with trees.UnaryNode[Expression] {
   self: Product =>

+  override def foldable: Boolean = child.foldable
+  override def nullable: Boolean = child.nullable
+
   /**
    * Called by unary expressions to generate a code block that returns null if its parent returns
    * null, and if not not null, use `f` to generate the expression.

http://git-wip-us.apache.org/repos/asf/spark/blob/9fe3adcc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
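Because lookup now goes through FunctionRegistry, `length` is usable anywhere expressions are parsed, with no Hive-side special case. For instance (assuming a `sqlContext` and its implicits, in the style of the test suites):

    import sqlContext.implicits._

    val df = Seq(("abc", 1), ("defgh", 2)).toDF("name", "n")
    df.registerTempTable("t")

    df.selectExpr("length(name)").show()                // 3 and 5
    sqlContext.sql("SELECT length(name) FROM t").show() // same StringLength expression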
spark git commit: [SPARK-8217] [SQL] math function log2
Repository: spark Updated Branches: refs/heads/master 9fe3adcce -> 2758ff0a9

[SPARK-8217] [SQL] math function log2

Author: Daoyuan Wang daoyuan.w...@intel.com

This patch had conflicts when merged, resolved by
Committer: Reynold Xin r...@databricks.com

Closes #6718 from adrian-wang/udflog2 and squashes the following commits:

3909f48 [Daoyuan Wang] math function: log2

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2758ff0a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2758ff0a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2758ff0a

Branch: refs/heads/master
Commit: 2758ff0a96f03a61e10999b2462acf7a13236b7c
Parents: 9fe3adc
Author: Daoyuan Wang daoyuan.w...@intel.com
Authored: Wed Jun 10 20:22:32 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jun 10 20:22:32 2015 -0700

--
 .../catalyst/analysis/FunctionRegistry.scala    | 1 +
 .../spark/sql/catalyst/expressions/math.scala   | 17 +
 .../expressions/MathFunctionsSuite.scala        | 6 ++
 .../scala/org/apache/spark/sql/functions.scala  | 20 ++--
 .../spark/sql/DataFrameFunctionsSuite.scala     | 12 
 5 files changed, 54 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/2758ff0a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 39875d7..a7816e3 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -111,6 +111,7 @@ object FunctionRegistry {
     expression[Log10]("log10"),
     expression[Log1p]("log1p"),
     expression[Pi]("pi"),
+    expression[Log2]("log2"),
     expression[Pow]("pow"),
     expression[Rint]("rint"),
     expression[Signum]("signum"),

http://git-wip-us.apache.org/repos/asf/spark/blob/2758ff0a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
index e1d8c9a..97e960b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
@@ -161,6 +161,23 @@ case class Floor(child: Expression) extends UnaryMathExpression(math.floor, "FLO

 case class Log(child: Expression) extends UnaryMathExpression(math.log, "LOG")

+case class Log2(child: Expression)
+  extends UnaryMathExpression((x: Double) => math.log(x) / math.log(2), "LOG2") {
+  override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
+    val eval = child.gen(ctx)
+    eval.code + s"""
+      boolean ${ev.isNull} = ${eval.isNull};
+      ${ctx.javaType(dataType)} ${ev.primitive} = ${ctx.defaultValue(dataType)};
+      if (!${ev.isNull}) {
+        ${ev.primitive} = java.lang.Math.log(${eval.primitive}) / java.lang.Math.log(2);
+        if (Double.valueOf(${ev.primitive}).isNaN()) {
+          ${ev.isNull} = true;
+        }
+      }
+    """
+  }
+}
+
 case class Log10(child: Expression) extends UnaryMathExpression(math.log10, "LOG10")

 case class Log1p(child: Expression) extends UnaryMathExpression(math.log1p, "LOG1P")
http://git-wip-us.apache.org/repos/asf/spark/blob/2758ff0a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala
--
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala
index 1fe6905..864c954 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala
@@ -185,6 +185,12 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     testUnary(Log1p, math.log1p, (-10 to -2).map(_ * 1.0), expectNull = true)
   }

+  test("log2") {
+    def f: (Double) => Double = (x: Double) => math.log(x) / math.log(2)
+    testUnary(Log2, f, (0 to 20).map(_ * 0.1))
+    testUnary(Log2, f, (-5 to -1).map(_ * 1.0), expectNull = true)
+  }
+
   test("pow") {
     testBinary(Pow,
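Since the JVM has no `Math.log2`, both the interpreted and generated paths rely on the change-of-base identity log2(x) = ln(x) / ln(2), turning NaN results into null. A quick standalone check of those semantics (plain Scala, mirroring the behavior rather than the Catalyst plumbing):

    def log2(x: Double): Option[Double] = {
      val v = math.log(x) / math.log(2)
      if (v.isNaN) None else Some(v) // math.log of a negative input is NaN
    }

    assert(math.abs(log2(8.0).get - 3.0) < 1e-12)
    assert(log2(-1.0).isEmpty)
    // Note log2(0.0) is Some(-Infinity), not None: only NaN maps to null,
    // matching the generated code above.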
spark git commit: [HOTFIX] Adding more contributor name bindings
Repository: spark Updated Branches: refs/heads/master 2758ff0a9 - a777eb04b [HOTFIX] Adding more contributor name bindings Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a777eb04 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a777eb04 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a777eb04 Branch: refs/heads/master Commit: a777eb04bf981312b640326607158f78dd4163cd Parents: 2758ff0 Author: Patrick Wendell patr...@databricks.com Authored: Wed Jun 10 21:13:47 2015 -0700 Committer: Patrick Wendell patr...@databricks.com Committed: Wed Jun 10 21:14:03 2015 -0700 -- dev/create-release/known_translations | 42 ++ 1 file changed, 42 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a777eb04/dev/create-release/known_translations -- diff --git a/dev/create-release/known_translations b/dev/create-release/known_translations index 0a599b5..bbd4330 100644 --- a/dev/create-release/known_translations +++ b/dev/create-release/known_translations @@ -91,3 +91,45 @@ zapletal-martin - Martin Zapletal zuxqoj - Shekhar Bansal mingyukim - Mingyu Kim sigmoidanalytics - Mayur Rustagi +AiHe - Ai He +BenFradet - Ben Fradet +FavioVazquez - Favio Vazquez +JaysonSunshine - Jayson Sunshine +Liuchang0812 - Liu Chang +Sephiroth-Lin - Sephiroth Lin +baishuo - Cheng Lian +daisukebe - Shixiong Zhu +dobashim - Masaru Dobashi +ehnalis - Zoltan Zvara +emres - Emre Sevinc +gchen - Guancheng Chen +haiyangsea - Haiyang Sea +hlin09 - Hao Lin +hqzizania - Qian Huang +jeanlyn - Jean Lyn +jerluc - Jeremy A. Lucas +jrabary - Jaonary Rabarisoa +judynash - Judy Nash +kaka1992 - Chen Song +ksonj - Kalle Jepsen +kuromatsu-nobuyuki - Nobuyuki Kuromatsu +lazyman500 - Dong Xu +leahmcguire - Leah McGuire +mbittmann - Mark Bittmann +mbonaci - Marko Bonaci +meawoppl - Matthew Goodman +nyaapa - Arsenii Krasikov +phatak-dev - Madhukara Phatak +prabeesh - Prabeesh K +rakeshchalasani - Rakesh Chalasani +raschild - Marcelo Vanzin +rekhajoshm - Rekha Joshi +sisihj - June He +szheng79 - Shuai Zheng +ted-yu - Andrew Or +texasmichelle - Michelle Casbon +vinodkc - Vinod KC +yongtang - Yong Tang +ypcat - Pei-Lun Lee +zhichao-li - Zhichao Li +zzcclp - Zhichao Zhang - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
Git Push Summary
Repository: spark Updated Tags: refs/tags/v1.4.0-rc4 [deleted] 22596c534 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
Git Push Summary
Repository: spark Updated Tags: refs/tags/v1.4.0 [created] 22596c534 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
Git Push Summary
Repository: spark Updated Tags: refs/tags/v1.4.0-rc2 [deleted] 03fb26a3e - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
Git Push Summary
Repository: spark Updated Tags: refs/tags/v1.4.0-rc1 [deleted] 777a08166 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
Git Push Summary
Repository: spark Updated Tags: refs/tags/v1.4.0-rc3 [deleted] dd109a874 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org