git commit: [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https

2014-09-10 Thread tgraves
Repository: spark
Updated Branches:
  refs/heads/master b734ed0c2 -> 6f7a76838


[SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https

Author: Benoy Antony 

Closes #2276 from benoyantony/SPARK-3286 and squashes the following commits:

c3d51ee [Benoy Antony] Use address with scheme, but Alpha version removes the scheme
e82f94e [Benoy Antony] Use address with scheme, but Alpha version removes the scheme
92127c9 [Benoy Antony] rebasing from master
450c536 [Benoy Antony] [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https
f060c02 [Benoy Antony] [SPARK-3286] - Cannot view ApplicationMaster UI when Yarn’s url scheme is https


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f7a7683
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f7a7683
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f7a7683

Branch: refs/heads/master
Commit: 6f7a76838f15687583e3b0ab43309a3c079368c4
Parents: b734ed0
Author: Benoy Antony 
Authored: Wed Sep 10 11:59:39 2014 -0500
Committer: Thomas Graves 
Committed: Wed Sep 10 11:59:39 2014 -0500

--
 .../scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala| 4 +++-
 .../scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala   | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6f7a7683/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala
--
diff --git 
a/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala 
b/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala
index ad27a9a..fc30953 100644
--- 
a/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala
+++ 
b/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.deploy.yarn
 
 import scala.collection.{Map, Set}
+import java.net.URI;
 
 import org.apache.hadoop.net.NetUtils
 import org.apache.hadoop.yarn.api._
@@ -97,7 +98,8 @@ private class YarnRMClientImpl(args: ApplicationMasterArguments) extends YarnRMC
 // Users can then monitor stderr/stdout on that node if required.
 appMasterRequest.setHost(Utils.localHostName())
 appMasterRequest.setRpcPort(0)
-appMasterRequest.setTrackingUrl(uiAddress)
+//remove the scheme from the url if it exists since Hadoop does not expect scheme
+appMasterRequest.setTrackingUrl(new URI(uiAddress).getAuthority())
 resourceManager.registerApplicationMaster(appMasterRequest)
   }
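
For readers unfamiliar with java.net.URI, here is a small standalone sketch of what the getAuthority() call above returns (illustrative only, not part of the patch; the host name is made up):

import java.net.URI

// The tracking URL registered with the alpha YARN API must not carry a scheme,
// so only the authority (host:port) part of the UI address is passed along.
def trackingAddress(uiAddress: String): String =
  new URI(uiAddress).getAuthority()

// trackingAddress("https://node1.example.com:4040") returns "node1.example.com:4040"
// trackingAddress("http://node1.example.com:4040")  returns "node1.example.com:4040"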
 

http://git-wip-us.apache.org/repos/asf/spark/blob/6f7a7683/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
--
diff --git 
a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
 
b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
index a879c83..5756263 100644
--- 
a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
+++ 
b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
@@ -189,7 +189,7 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
 if (sc == null) {
   finish(FinalApplicationStatus.FAILED, "Timed out waiting for SparkContext.")
 } else {
-  registerAM(sc.ui.appUIHostPort, securityMgr)
+  registerAM(sc.ui.appUIAddress, securityMgr)
   try {
 userThread.join()
   } finally {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: [SPARK-3362][SQL] Fix resolution for casewhen with nulls.

2014-09-10 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 6f7a76838 -> a0283300c


[SPARK-3362][SQL] Fix resolution for casewhen with nulls.

The current implementation ignores the type of the else value.

Author: Daoyuan Wang 

Closes #2245 from adrian-wang/casewhenbug and squashes the following commits:

3332f6e [Daoyuan Wang] remove wrong comment
83b536c [Daoyuan Wang] a comment to trigger retest
d7315b3 [Daoyuan Wang] code improve
eed35fc [Daoyuan Wang] bug in casewhen resolve


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a0283300
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a0283300
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a0283300

Branch: refs/heads/master
Commit: a0283300c4af5e64a1dc06193245daa1e746b5f4
Parents: 6f7a768
Author: Daoyuan Wang 
Authored: Wed Sep 10 10:45:15 2014 -0700
Committer: Michael Armbrust 
Committed: Wed Sep 10 10:45:24 2014 -0700

--
 .../org/apache/spark/sql/catalyst/expressions/predicates.scala | 5 +++--
 ...en then 1 else null end -0-f7c7fdd35c084bc797890aa08d33693c | 1 +
 ... then 1.0 else null end -0-aeb1f906bfe92f2d406f84109301afe0 | 1 +
 ...n then 1L else null end -0-763ae85e7a52b4cf4162d6a8931716bb | 1 +
 ...n then 1S else null end -0-6f5f3b3dbe9f1d1eb98443aef315b982 | 1 +
 ...n then 1Y else null end -0-589982a400d86157791c7216b10b6b5d | 1 +
 ...en then null else 1 end -0-48bd83660cf3ba93cdbdc24559092171 | 1 +
 ... then null else 1.0 end -0-7f5ce763801781cf568c6a31dd80b623 | 1 +
 ...n then null else 1L end -0-a7f1305ea4f86e596c368e35e45cc4e5 | 1 +
 ...n then null else 1S end -0-dfb61969e6cb6e6dbe89225b538c8d98 | 1 +
 ...n then null else 1Y end -0-7f4c32299c3738739b678ece62752a7b | 1 +
 .../spark/sql/hive/execution/HiveTypeCoercionSuite.scala   | 6 ++
 12 files changed, 19 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a0283300/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
index 1313ccd..329af33 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
@@ -265,12 +265,13 @@ case class CaseWhen(branches: Seq[Expression]) extends Expression {
   false
 } else {
   val allCondBooleans = predicates.forall(_.dataType == BooleanType)
-  val dataTypesEqual = values.map(_.dataType).distinct.size <= 1
+  // both then and else val should be considered.
+  val dataTypesEqual = (values ++ elseValue).map(_.dataType).distinct.size <= 1
   allCondBooleans && dataTypesEqual
 }
   }
 
-  /** Written in imperative fashion for performance considerations.  Same for CaseKeyWhen. */
+  /** Written in imperative fashion for performance considerations. */
   override def eval(input: Row): Any = {
 val len = branchesArr.length
 var i = 0
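
A self-contained sketch of the corrected dataTypesEqual check above, using plain Scala with strings standing in for Catalyst DataTypes (illustrative only):

// Before the fix only the THEN values were inspected; including the else value
// means a mismatching ELSE branch now leaves the expression unresolved.
def caseWhenTypesConsistent(thenTypes: Seq[String], elseType: Option[String]): Boolean =
  (thenTypes ++ elseType).distinct.size <= 1

caseWhenTypesConsistent(Seq("IntegerType"), Some("IntegerType")) // true
caseWhenTypesConsistent(Seq("IntegerType"), Some("StringType"))  // false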

http://git-wip-us.apache.org/repos/asf/spark/blob/a0283300/sql/hive/src/test/resources/golden/case
 when then 1 else null end -0-f7c7fdd35c084bc797890aa08d33693c
--
diff --git a/sql/hive/src/test/resources/golden/case when then 1 else null end 
-0-f7c7fdd35c084bc797890aa08d33693c b/sql/hive/src/test/resources/golden/case 
when then 1 else null end -0-f7c7fdd35c084bc797890aa08d33693c
new file mode 100644
index 000..d00491f
--- /dev/null
+++ b/sql/hive/src/test/resources/golden/case when then 1 else null end 
-0-f7c7fdd35c084bc797890aa08d33693c 
@@ -0,0 +1 @@
+1

http://git-wip-us.apache.org/repos/asf/spark/blob/a0283300/sql/hive/src/test/resources/golden/case
 when then 1.0 else null end -0-aeb1f906bfe92f2d406f84109301afe0
--
diff --git a/sql/hive/src/test/resources/golden/case when then 1.0 else null 
end -0-aeb1f906bfe92f2d406f84109301afe0 
b/sql/hive/src/test/resources/golden/case when then 1.0 else null end 
-0-aeb1f906bfe92f2d406f84109301afe0
new file mode 100644
index 000..d3827e7
--- /dev/null
+++ b/sql/hive/src/test/resources/golden/case when then 1.0 else null end 
-0-aeb1f906bfe92f2d406f84109301afe0   
@@ -0,0 +1 @@
+1.0

http://git-wip-us.apache.org/repos/asf/spark/blob/a0283300/sql/hive/src/test/resources/golden/case
 when then 1L else null end -0-763ae85e7a52b4cf4162d6a8931716bb
--
diff --git a/sql/hive/src/test/resources/golden/case when then 1L else null end 
-0-763ae85e7a52b4cf4162d6a8931716bb b/

git commit: [SPARK-3363][SQL] Type Coercion should promote null to all other types.

2014-09-10 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master a0283300c -> f0c87dc86


[SPARK-3363][SQL] Type Coercion should promote null to all other types.

Type coercion should allow null to be promoted to every other type.

Author: Daoyuan Wang 
Author: Michael Armbrust 

Closes #2246 from adrian-wang/spark3363-0 and squashes the following commits:

c6241de [Daoyuan Wang] minor code clean
595b417 [Daoyuan Wang] Merge pull request #2 from marmbrus/pr/2246
832e640 [Michael Armbrust] reduce code duplication
ef6f986 [Daoyuan Wang] make double boolean miss in jsonRDD compatibleType
c619f0a [Daoyuan Wang] Type Coercion should support every type to have null value


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f0c87dc8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f0c87dc8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f0c87dc8

Branch: refs/heads/master
Commit: f0c87dc86ae65a39cd19370d8d960b4a60854517
Parents: a028330
Author: Daoyuan Wang 
Authored: Wed Sep 10 10:48:33 2014 -0700
Committer: Michael Armbrust 
Committed: Wed Sep 10 10:48:36 2014 -0700

--
 .../catalyst/analysis/HiveTypeCoercion.scala| 38 ---
 .../analysis/HiveTypeCoercionSuite.scala| 32 +---
 .../org/apache/spark/sql/json/JsonRDD.scala | 51 +---
 3 files changed, 67 insertions(+), 54 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f0c87dc8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
index d6758eb..bd8131c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
@@ -26,10 +26,22 @@ object HiveTypeCoercion {
   // See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types.
 // The conversion for integral and floating point types have a linear widening hierarchy:
 val numericPrecedence =
-Seq(NullType, ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, DecimalType)
 // Boolean is only wider than Void
-  val booleanPrecedence = Seq(NullType, BooleanType)
-  val allPromotions: Seq[Seq[DataType]] = numericPrecedence :: booleanPrecedence :: Nil
+Seq(ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, DecimalType)
+  val allPromotions: Seq[Seq[DataType]] = numericPrecedence :: Nil
+
+  def findTightestCommonType(t1: DataType, t2: DataType): Option[DataType] = {
+val valueTypes = Seq(t1, t2).filter(t => t != NullType)
+if (valueTypes.distinct.size > 1) {
+  // Try and find a promotion rule that contains both types in question.
+  val applicableConversion =
+HiveTypeCoercion.allPromotions.find(p => p.contains(t1) && p.contains(t2))
+
+  // If found return the widest common type, otherwise None
+  applicableConversion.map(_.filter(t => t == t1 || t == t2).last)
+} else {
+  Some(if (valueTypes.size == 0) NullType else valueTypes.head)
+}
+  }
 }
 
 /**
@@ -53,17 +65,6 @@ trait HiveTypeCoercion {
 Division ::
 Nil
 
-  trait TypeWidening {
-def findTightestCommonType(t1: DataType, t2: DataType): Option[DataType] = {
-  // Try and find a promotion rule that contains both types in question.
-  val applicableConversion =
-HiveTypeCoercion.allPromotions.find(p => p.contains(t1) && p.contains(t2))
-
-  // If found return the widest common type, otherwise None
-  applicableConversion.map(_.filter(t => t == t1 || t == t2).last)
-}
-  }
-
   /**
 * Applies any changes to [[AttributeReference]] data types that are made by other rules to
* instances higher in the query tree.
@@ -144,7 +145,8 @@ trait HiveTypeCoercion {
* - LongType to FloatType
* - LongType to DoubleType
*/
-  object WidenTypes extends Rule[LogicalPlan] with TypeWidening {
+  object WidenTypes extends Rule[LogicalPlan] {
+import HiveTypeCoercion._
 
 def apply(plan: LogicalPlan): LogicalPlan = plan transform {
   case u @ Union(left, right) if u.childrenResolved && !u.resolved =>
@@ -352,7 +354,9 @@ trait HiveTypeCoercion {
   /**
 * Coerces the type of different branches of a CASE WHEN statement to a common type.
*/
-  object CaseWhenCoercion extends Rule[LogicalPlan] with TypeWidening {
+  object CaseWhenCoercion extends Rule[LogicalPlan] {
+import HiveTypeCoercion._
+
 def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
   case cw @ CaseWhen
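
A self-contained sketch of the promotion logic moved into the HiveTypeCoercion object above, using a toy type enum in place of Catalyst DataTypes (illustrative only, not the Catalyst API):

sealed trait DType
case object NullT extends DType
case object IntT extends DType
case object LongT extends DType
case object DoubleT extends DType

// A single widening chain standing in for numericPrecedence.
val numericPrecedence: Seq[DType] = Seq(IntT, LongT, DoubleT)

def findTightestCommonType(t1: DType, t2: DType): Option[DType] = {
  val valueTypes = Seq(t1, t2).filter(_ != NullT)
  if (valueTypes.distinct.size > 1) {
    // Widest of the two types, if both appear in the promotion chain.
    Some(numericPrecedence)
      .filter(p => p.contains(t1) && p.contains(t2))
      .map(_.filter(t => t == t1 || t == t2).last)
  } else {
    // At most one value type is present: null promotes to the other type.
    Some(if (valueTypes.isEmpty) NullT else valueTypes.head)
  }
}

// findTightestCommonType(NullT, IntT)   == Some(IntT)
// findTightestCommonType(IntT, DoubleT) == Some(DoubleT)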

git commit: [HOTFIX] Fix scala style issue introduced by #2276.

2014-09-10 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/master f0c87dc86 -> 26503fdf2


[HOTFIX] Fix scala style issue introduced by #2276.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/26503fdf
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/26503fdf
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/26503fdf

Branch: refs/heads/master
Commit: 26503fdf20f4181a2b390c88b83f364e6a4ccc21
Parents: f0c87dc
Author: Josh Rosen 
Authored: Wed Sep 10 12:02:23 2014 -0700
Committer: Josh Rosen 
Committed: Wed Sep 10 12:02:23 2014 -0700

--
 .../main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/26503fdf/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala
--
diff --git 
a/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala 
b/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala
index fc30953..acf2650 100644
--- 
a/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala
+++ 
b/yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnRMClientImpl.scala
@@ -98,7 +98,7 @@ private class YarnRMClientImpl(args: ApplicationMasterArguments) extends YarnRMC
 // Users can then monitor stderr/stdout on that node if required.
 appMasterRequest.setHost(Utils.localHostName())
 appMasterRequest.setRpcPort(0)
-//remove the scheme from the url if it exists since Hadoop does not expect scheme
+// remove the scheme from the url if it exists since Hadoop does not expect scheme
 appMasterRequest.setTrackingUrl(new URI(uiAddress).getAuthority())
 resourceManager.registerApplicationMaster(appMasterRequest)
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: SPARK-1713. Use a thread pool for launching executors.

2014-09-10 Thread tgraves
Repository: spark
Updated Branches:
  refs/heads/master 26503fdf2 -> 1f4a648d4


SPARK-1713. Use a thread pool for launching executors.

This patch copies the approach used in the MapReduce application master for 
launching containers.

Author: Sandy Ryza 

Closes #663 from sryza/sandy-spark-1713 and squashes the following commits:

036550d [Sandy Ryza] SPARK-1713. [YARN] Use a threadpool for launching executor 
containers


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1f4a648d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1f4a648d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1f4a648d

Branch: refs/heads/master
Commit: 1f4a648d4e30e837d6cf3ea8de1808e2254ad70b
Parents: 26503fd
Author: Sandy Ryza 
Authored: Wed Sep 10 14:34:24 2014 -0500
Committer: Thomas Graves 
Committed: Wed Sep 10 14:34:24 2014 -0500

--
 docs/running-on-yarn.md   |  7 +++
 .../org/apache/spark/deploy/yarn/YarnAllocator.scala  | 14 --
 2 files changed, 19 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1f4a648d/docs/running-on-yarn.md
--
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 943f06b..d8b22f3 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -125,6 +125,13 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
  the environment of the executor launcher. 
   
 
+<tr>
+  <td><code>spark.yarn.containerLauncherMaxThreads</code></td>
+  <td>25</td>
+  <td>
+    The maximum number of threads to use in the application master for launching executor containers.
+  </td>
+</tr>
 
 
 # Launching Spark on YARN

http://git-wip-us.apache.org/repos/asf/spark/blob/1f4a648d/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
--
diff --git 
a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala 
b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
index 02b9a81..0b8744f 100644
--- 
a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
+++ 
b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala
@@ -18,7 +18,7 @@
 package org.apache.spark.deploy.yarn
 
 import java.util.{List => JList}
-import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent._
 import java.util.concurrent.atomic.AtomicInteger
 
 import scala.collection.JavaConversions._
@@ -32,6 +32,8 @@ import org.apache.spark.{Logging, SecurityManager, SparkConf, SparkEnv}
 import org.apache.spark.scheduler.{SplitInfo, TaskSchedulerImpl}
 import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
 
+import com.google.common.util.concurrent.ThreadFactoryBuilder
+
 object AllocationType extends Enumeration {
   type AllocationType = Value
   val HOST, RACK, ANY = Value
@@ -95,6 +97,14 @@ private[yarn] abstract class YarnAllocator(
   protected val (preferredHostToCount, preferredRackToCount) =
 generateNodeToWeight(conf, preferredNodes)
 
+  private val launcherPool = new ThreadPoolExecutor(
+// max pool size of Integer.MAX_VALUE is ignored because we use an unbounded queue
+sparkConf.getInt("spark.yarn.containerLauncherMaxThreads", 25), Integer.MAX_VALUE,
+1, TimeUnit.MINUTES,
+new LinkedBlockingQueue[Runnable](),
+new ThreadFactoryBuilder().setNameFormat("ContainerLauncher #%d").setDaemon(true).build())
+  launcherPool.allowCoreThreadTimeOut(true)
+
   def getNumExecutorsRunning: Int = numExecutorsRunning.intValue
 
   def getNumExecutorsFailed: Int = numExecutorsFailed.intValue
@@ -283,7 +293,7 @@ private[yarn] abstract class YarnAllocator(
 executorMemory,
 executorCores,
 securityMgr)
-  new Thread(executorRunnable).start()
+  launcherPool.execute(executorRunnable)
 }
   }
   logDebug("""


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: [SPARK-2096][SQL] Correctly parse dot notations

2014-09-10 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 1f4a648d4 -> e4f4886d7


[SPARK-2096][SQL] Correctly parse dot notations

First let me write down the current `projections` grammar of spark sql:

expression: orExpression
orExpression  : andExpression {"or" andExpression}
andExpression : comparisonExpression {"and" comparisonExpression}
comparisonExpression  : termExpression | termExpression "=" termExpression | termExpression ">" termExpression | ...
termExpression: productExpression {"+"|"-" productExpression}
productExpression : baseExpression {"*"|"/"|"%" baseExpression}
baseExpression: expression "[" expression "]" | ... | ident | ...
ident : identChar {identChar | digit} | delimiters | ...
identChar : letter | "_" | "."
delimiters: "," | ";" | "(" | ")" | "[" | "]" | ...
projection: expression [["AS"] ident]
projections   : projection { "," projection}

For something like `a.b.c[1]`, it will be parsed as:
(parse tree image: http://img51.imgspice.com/i/03008/4iltjsnqgmtt_t.jpg)
But for something like `a[1].b`, the current grammar can't parse it correctly.
A simple solution is written in `ParquetQuerySuite#NestedSqlParser`; the changed grammar rules are:

delimiters: "." | "," | ";" | "(" | ")" | "[" | "]" | ...
identChar : letter | "_"
baseExpression: expression "[" expression "]" | expression "." ident | ... | ident | ...
This works well, but it can't cover corner cases like `select t.a.b from table as t`:
(parse tree image: http://img51.imgspice.com/i/03008/v2iau3hoxoxg_t.jpg)
Using this new grammar, `t.a.b` is parsed as `GetField(GetField(UnResolved("t"), "a"), "b")` instead of `GetField(UnResolved("t.a"), "b")`.
However, we can't resolve `t` as it's not a field, but the whole table. (If we could do this, then `select t from table as t` would be legal, which is unexpected.)
My solution is:

dotExpressionHeader   : ident "." ident
baseExpression: expression "[" expression "]" | expression "." ident | ... | dotExpressionHeader | ident | ...
I passed all test cases under sql locally and added a more complex case.
"arrayOfStruct.field1 to access all values of field1" is not supported yet. 
Since this PR has changed a lot of code, I will open another PR for it.
I'm not familiar with the latter optimize phase, please correct me if I missed 
something.

Author: Wenchen Fan 
Author: Michael Armbrust 

Closes #2230 from cloud-fan/dot and squashes the following commits:

e1a8898 [Wenchen Fan] remove support for arbitrary nested arrays
ee8a724 [Wenchen Fan] rollback LogicalPlan, support dot operation on nested 
array type
a58df40 [Michael Armbrust] add regression test for doubly nested data
16bc4c6 [Wenchen Fan] some enhance
95d733f [Wenchen Fan] split long line
dc31698 [Wenchen Fan] SPARK-2096 Correctly parse dot notations


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e4f4886d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e4f4886d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e4f4886d

Branch: refs/heads/master
Commit: e4f4886d7148bf48f9e3462b83bfb1ecc7edbe31
Parents: 1f4a648
Author: Wenchen Fan 
Authored: Wed Sep 10 12:56:59 2014 -0700
Committer: Michael Armbrust 
Committed: Wed Sep 10 12:56:59 2014 -0700

--
 .../apache/spark/sql/catalyst/SqlParser.scala   |  13 ++-
 .../catalyst/plans/logical/LogicalPlan.scala|   6 +-
 .../org/apache/spark/sql/json/JsonSuite.scala   |  14 +++
 .../apache/spark/sql/json/TestJsonData.scala|  26 +
 .../spark/sql/parquet/ParquetQuerySuite.scala   | 102 +--
 .../sql/hive/execution/SQLQuerySuite.scala  |  17 +++-
 6 files changed, 88 insertions(+), 90 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e4f4886d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
index a04b4a9..ca69531 100755
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
@@ -357,16 +357,25 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
 expression ~ "[" ~ expression <~ "]" ^^ {
   case base ~ _ ~ ordinal => GetItem(base, ordinal)
 } |
+(expression <~ ".") ~ ident ^^ {
+  case base ~ fieldName => GetField(base, fieldName)
+} |
 TRUE ^^^ Literal(true, BooleanType) |
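
A toy parser sketch of the dotExpressionHeader idea (scala-parser-combinators, not the Spark SQL parser; the case class names are made up for illustration), showing why `t.a.b` now keeps `t.a` together while `a[1].b` still builds a field access on top of an item access:

import scala.util.parsing.combinator.RegexParsers

object DotParser extends RegexParsers {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class GetField(base: Expr, field: String) extends Expr
  case class GetItem(base: Expr, ordinal: Int) extends Expr

  val ident: Parser[String] = """[a-zA-Z_][a-zA-Z0-9_]*""".r

  // dotExpressionHeader: ident "." ident -- glues the first two identifiers together.
  val dotExpressionHeader: Parser[Expr] =
    ident ~ ("." ~> ident) ^^ { case a ~ b => Attr(s"$a.$b") }

  val base: Parser[Expr] = dotExpressionHeader | ident ^^ (Attr(_))

  // Postfix operators: "[ordinal]" indexing and ".field" access.
  val suffix: Parser[Expr => Expr] =
    "[" ~> """\d+""".r <~ "]" ^^ { n => (e: Expr) => GetItem(e, n.toInt) } |
    "." ~> ident ^^ { f => (e: Expr) => GetField(e, f) }

  val expression: Parser[Expr] = base ~ rep(suffix) ^^ {
    case b ~ ops => ops.foldLeft(b)((acc, op) => op(acc))
  }
}

// DotParser.parseAll(DotParser.expression, "t.a.b").get
//   => GetField(Attr("t.a"), "b")
// DotParser.parseAll(DotParser.expression, "a[1].b").get
//   => GetField(GetItem(Attr("a"), 1), "b")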
 

git commit: [SPARK-3411] Improve load-balancing of concurrently-submitted drivers across workers

2014-09-10 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master e4f4886d7 -> 558962a83


[SPARK-3411] Improve load-balancing of concurrently-submitted drivers across 
workers

If the waiting driver array is too big, the drivers in it will be dispatched to the first worker we get (if it has enough resources), with or without randomization.

We should do randomization every time we dispatch a driver, in order to better 
balance drivers.

Author: WangTaoTheTonic 
Author: WangTao 

Closes #1106 from WangTaoTheTonic/fixBalanceDrivers and squashes the following 
commits:

d1a928b [WangTaoTheTonic] Minor adjustment
b6560cf [WangTaoTheTonic] solve the shuffle problem for HashSet
f674e59 [WangTaoTheTonic] add comment and minor fix
2835929 [WangTao] solve the failed test and avoid filtering
2ca3091 [WangTao] fix checkstyle
bc91bb1 [WangTao] Avoid shuffle every time we schedule the driver using round 
robin
bbc7087 [WangTaoTheTonic] Optimize the schedule in Master


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/558962a8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/558962a8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/558962a8

Branch: refs/heads/master
Commit: 558962a83fb0758ab5c13ff4ea58cc96c29cbbcc
Parents: e4f4886
Author: WangTaoTheTonic 
Authored: Wed Sep 10 13:06:47 2014 -0700
Committer: Andrew Or 
Committed: Wed Sep 10 13:06:47 2014 -0700

--
 .../org/apache/spark/deploy/master/Master.scala   | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/558962a8/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
index a3909d6..2a3bd6b 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
@@ -487,13 +487,25 @@ private[spark] class Master(
 if (state != RecoveryState.ALIVE) { return }
 
 // First schedule drivers, they take strict precedence over applications
-val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
-for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
-  for (driver <- List(waitingDrivers: _*)) { // iterate over a copy of waitingDrivers
+// Randomization helps balance drivers
+val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
+val aliveWorkerNum = shuffledAliveWorkers.size
+var curPos = 0
+for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
+  // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
+  // start from the last worker that was assigned a driver, and continue onwards until we have
+  // explored all alive workers.
+  curPos = (curPos + 1) % aliveWorkerNum
+  val startPos = curPos
+  var launched = false
+  while (curPos != startPos && !launched) {
+val worker = shuffledAliveWorkers(curPos)
 if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
   launchDriver(worker, driver)
   waitingDrivers -= driver
+  launched = true
 }
+curPos = (curPos + 1) % aliveWorkerNum
   }
 }
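
A self-contained sketch of the scheduling idea (toy Worker and Driver types, not the Master's internals): shuffle the alive workers once, then walk them round-robin per waiting driver so consecutive drivers don't all land on the first worker:

import scala.util.Random

case class Worker(id: String, var memoryFree: Int, var coresFree: Int)
case class Driver(id: String, mem: Int, cores: Int)

def assignDrivers(workers: Seq[Worker], drivers: Seq[Driver]): Seq[(Driver, Worker)] = {
  val shuffled = Random.shuffle(workers)
  val n = shuffled.size
  var pos = 0
  drivers.flatMap { d =>
    // Probe each worker at most once, starting just after the last assignment.
    val candidates = (0 until n).map(i => shuffled((pos + i) % n))
    val chosen = candidates.find(w => w.memoryFree >= d.mem && w.coresFree >= d.cores)
    chosen.foreach { w =>
      w.memoryFree -= d.mem
      w.coresFree -= d.cores
      pos = (shuffled.indexOf(w) + 1) % n
    }
    chosen.map(d -> _)
  }
}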
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Git Push Summary

2014-09-10 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.1.0 [created] 2f9b2bd78

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: [SPARK-2207][SPARK-3272][MLLib]Add minimum information gain and minimum instances per node as training parameters for decision tree.

2014-09-10 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 558962a83 -> 79cdb9b64


[SPARK-2207][SPARK-3272][MLLib]Add minimum information gain and minimum 
instances per node as training parameters for decision tree.

These two parameters can act as early stop rules to do pre-pruning. When a split causes the left or right child to have fewer than `minInstancesPerNode` instances, or yields less information gain than `minInfoGain`, the current node will not be split by that split.

When there are no possible splits that satisfy the requirements, there are no useful information gain stats, but we still need to calculate the predict value for the current node. So I separated the calculation of predict from the calculation of information gain, which can also save computation when the number of possible splits is large. Please see [SPARK-3272](https://issues.apache.org/jira/browse/SPARK-3272) for more details.

CC: mengxr manishamde jkbradley, please help me review this, thanks.

Author: qiping.lqp 
Author: chouqin 

Closes #2332 from chouqin/dt-preprune and squashes the following commits:

f1d11d1 [chouqin] fix typo
c7ebaf1 [chouqin] fix typo
39f9b60 [chouqin] change edge `minInstancesPerNode` to 2 and add one more test
0278a11 [chouqin] remove `noSplit` and set `Predict` private to tree
d593ec7 [chouqin] fix docs and change minInstancesPerNode to 1
efcc736 [qiping.lqp] fix bug
10b8012 [qiping.lqp] fix style
6728fad [qiping.lqp] minor fix: remove empty lines
bb465ca [qiping.lqp] Merge branch 'master' of https://github.com/apache/spark 
into dt-preprune
cadd569 [qiping.lqp] add api docs
46b891f [qiping.lqp] fix bug
e72c7e4 [qiping.lqp] add comments
845c6fa [qiping.lqp] fix style
f195e83 [qiping.lqp] fix style
987cbf4 [qiping.lqp] fix bug
ff34845 [qiping.lqp] separate calculation of predict of node from calculation 
of info gain
ac42378 [qiping.lqp] add min info gain and min instances per node parameters in 
decision tree


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/79cdb9b6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/79cdb9b6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/79cdb9b6

Branch: refs/heads/master
Commit: 79cdb9b64ad2fa3ab7f2c221766d36658b917c40
Parents: 558962a
Author: qiping.lqp 
Authored: Wed Sep 10 15:37:10 2014 -0700
Committer: Xiangrui Meng 
Committed: Wed Sep 10 15:37:10 2014 -0700

--
 .../apache/spark/mllib/tree/DecisionTree.scala  |  72 +
 .../mllib/tree/configuration/Strategy.scala |   9 ++
 .../mllib/tree/impl/DecisionTreeMetadata.scala  |   7 +-
 .../mllib/tree/model/InformationGainStats.scala |  20 ++--
 .../apache/spark/mllib/tree/model/Predict.scala |  36 +++
 .../apache/spark/mllib/tree/model/Split.scala   |   2 +
 .../spark/mllib/tree/DecisionTreeSuite.scala| 103 +--
 7 files changed, 213 insertions(+), 36 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/79cdb9b6/mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala
index d1309b2..9859656 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala
@@ -130,7 +130,7 @@ class DecisionTree (private val strategy: Strategy) extends Serializable with Lo
 
   // Find best split for all nodes at a level.
   timer.start("findBestSplits")
-  val splitsStatsForLevel: Array[(Split, InformationGainStats)] =
+  val splitsStatsForLevel: Array[(Split, InformationGainStats, Predict)] =
 DecisionTree.findBestSplits(treeInput, parentImpurities,
   metadata, level, nodes, splits, bins, maxLevelForSingleGroup, timer)
   timer.stop("findBestSplits")
@@ -143,8 +143,9 @@ class DecisionTree (private val strategy: Strategy) extends Serializable with Lo
 timer.start("extractNodeInfo")
 val split = nodeSplitStats._1
 val stats = nodeSplitStats._2
+val predict = nodeSplitStats._3.predict
 val isLeaf = (stats.gain <= 0) || (level == strategy.maxDepth)
-val node = new Node(nodeIndex, stats.predict, isLeaf, Some(split), None, None, Some(stats))
+val node = new Node(nodeIndex, predict, isLeaf, Some(split), None, None, Some(stats))
 logDebug("Node = " + node)
 nodes(nodeIndex) = node
 timer.stop("extractNodeInfo")
@@ -425,7 +426,7 @@ object DecisionTree extends Serializable with Logging {
   splits: Array[Array[Split]],
   bins: Array[Array[Bin]],
   maxLevelForSingleGroup: Int,
-  timer: TimeTracker = new TimeTracker): Array[(Split, 
InformationGainSt
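
A plain-Scala sketch of the pre-pruning rule described in the commit message (illustrative only, not the MLlib API): a candidate split is rejected when either child would receive too few instances or the gain is too small:

def splitIsValid(
    leftCount: Long,
    rightCount: Long,
    infoGain: Double,
    minInstancesPerNode: Int,
    minInfoGain: Double): Boolean = {
  // Early-stop checks: both children keep enough instances and the split
  // improves impurity by at least the configured threshold.
  leftCount >= minInstancesPerNode &&
    rightCount >= minInstancesPerNode &&
    infoGain >= minInfoGain
}

// With the defaults mentioned in the commit log (minInstancesPerNode = 1,
// minInfoGain = 0.0) the checks prune essentially nothing; raising either
// value enables pre-pruning.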

git commit: [SQL] Add test case with workaround for reading partitioned Avro files

2014-09-10 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 79cdb9b64 -> 84e2c8bfe


[SQL] Add test case with workaround for reading partitioned Avro files

In order to read from partitioned Avro files we need to also set the 
`SERDEPROPERTIES` since `TBLPROPERTIES` are not passed to the initialization.  
This PR simply adds a test to make sure we don't break this workaround.

Author: Michael Armbrust 

Closes #2340 from marmbrus/avroPartitioned and squashes the following commits:

6b969d6 [Michael Armbrust] fix style
fea2124 [Michael Armbrust] Add test case with workaround for reading 
partitioned avro files.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/84e2c8bf
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/84e2c8bf
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/84e2c8bf

Branch: refs/heads/master
Commit: 84e2c8bfe41837baf2aeffa9741e4dbd14351981
Parents: 79cdb9b
Author: Michael Armbrust 
Authored: Wed Sep 10 20:57:38 2014 -0700
Committer: Michael Armbrust 
Committed: Wed Sep 10 20:57:38 2014 -0700

--
 .../org/apache/spark/sql/hive/TestHive.scala| 69 +++-
 ...AvroSerDe-0-e4501461c855cc9071a872a64186c3de |  8 +++
 .../sql/hive/execution/HiveSerDeSuite.scala |  2 +
 3 files changed, 78 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/84e2c8bf/sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala
index a013f3f..6974f3e 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala
@@ -269,7 +269,74 @@ class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
  |)
""".stripMargin.cmd,
   s"LOAD DATA LOCAL INPATH '${getHiveFile("data/files/episodes.avro")}' 
INTO TABLE episodes".cmd
-)
+),
+// THIS TABLE IS NOT THE SAME AS THE HIVE TEST TABLE episodes_partitioned AS DYNAMIC PARITIONING
+// IS NOT YET SUPPORTED
+TestTable("episodes_part",
+  s"""CREATE TABLE episodes_part (title STRING, air_date STRING, doctor 
INT)
+ |PARTITIONED BY (doctor_pt INT)
+ |ROW FORMAT SERDE '${classOf[AvroSerDe].getCanonicalName}'
+ |STORED AS
+ |INPUTFORMAT '${classOf[AvroContainerInputFormat].getCanonicalName}'
+ |OUTPUTFORMAT '${classOf[AvroContainerOutputFormat].getCanonicalName}'
+ |TBLPROPERTIES (
+ |  'avro.schema.literal'='{
+ |"type": "record",
+ |"name": "episodes",
+ |"namespace": "testing.hive.avro.serde",
+ |"fields": [
+ |  {
+ |  "name": "title",
+ |  "type": "string",
+ |  "doc": "episode title"
+ |  },
+ |  {
+ |  "name": "air_date",
+ |  "type": "string",
+ |  "doc": "initial date"
+ |  },
+ |  {
+ |  "name": "doctor",
+ |  "type": "int",
+ |  "doc": "main actor playing the Doctor in episode"
+ |  }
+ |]
+ |  }'
+ |)
+   """.stripMargin.cmd,
+  // WORKAROUND: Required to pass schema to SerDe for partitioned tables.
+  // TODO: Pass this automatically from the table to partitions.
+  s"""
+ |ALTER TABLE episodes_part SET SERDEPROPERTIES (
+ |  'avro.schema.literal'='{
+ |"type": "record",
+ |"name": "episodes",
+ |"namespace": "testing.hive.avro.serde",
+ |"fields": [
+ |  {
+ |  "name": "title",
+ |  "type": "string",
+ |  "doc": "episode title"
+ |  },
+ |  {
+ |  "name": "air_date",
+ |  "type": "string",
+ |  "doc": "initial date"
+ |  },
+ |  {
+ |  "name": "doctor",
+ |  "type": "int",
+ |  "doc": "main actor playing the Doctor in episode"
+ |  }
+ |]
+ |  }'
+ |)
+""".stripMargin.cmd,
+  s"""
+INSERT OVERWRITE TABLE episodes_part PARTITION (doctor_pt=1)
+SELECT title, air_date, doctor FROM episodes
+  """.cmd
+  )
   )
 
   hiveQTestUtilTables.foreach(registerTestTable)

http://git-wip-us.apache.org/repos/asf/spark/blob/84e2c8bf/sql/hive/src/test/resources/golden/Read
 Partitioned with AvroSerDe-0-e4501461c855cc9071a872a64186c3de

git commit: [SPARK-3447][SQL] Remove explicit conversion with JListWrapper to avoid NPE

2014-09-10 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 84e2c8bfe -> f92cde24e


[SPARK-3447][SQL] Remove explicit conversion with JListWrapper to avoid NPE

Author: Michael Armbrust 

Closes #2323 from marmbrus/kryoJListNPE and squashes the following commits:

9634f11 [Michael Armbrust] Rollback JSON RDD changes
4d4d93c [Michael Armbrust] Merge remote-tracking branch 'origin/master' into 
kryoJListNPE
646976b [Michael Armbrust] Fix JSON RDD Conversion too
59065bc [Michael Armbrust] Remove explicit conversion to avoid NPE


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f92cde24
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f92cde24
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f92cde24

Branch: refs/heads/master
Commit: f92cde24e8f305bcec71bb3687498c1406da
Parents: 84e2c8b
Author: Michael Armbrust 
Authored: Wed Sep 10 20:59:40 2014 -0700
Committer: Michael Armbrust 
Committed: Wed Sep 10 20:59:40 2014 -0700

--
 sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f92cde24/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
index a2f334a..c551c7c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@@ -460,7 +460,6 @@ class SQLContext(@transient val sparkContext: SparkContext)
   rdd: RDD[Array[Any]],
   schema: StructType): SchemaRDD = {
 import scala.collection.JavaConversions._
-import scala.collection.convert.Wrappers.{JListWrapper, JMapWrapper}
 
 def needsConversion(dataType: DataType): Boolean = dataType match {
   case ByteType => true
@@ -482,8 +481,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
   case (null, _) => null
 
   case (c: java.util.List[_], ArrayType(elementType, _)) =>
-val converted = c.map { e => convert(e, elementType)}
-JListWrapper(converted)
+c.map { e => convert(e, elementType)}: Seq[Any]
 
   case (c, ArrayType(elementType, _)) if c.getClass.isArray =>
 c.asInstanceOf[Array[_]].map(e => convert(e, elementType)): Seq[Any]
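
A minimal sketch of the conversion pattern in the new line above (the helper name is hypothetical, not from the patch): with scala.collection.JavaConversions in scope, ascribing the mapped java.util.List as Seq[Any] lets the implicit wrapper do the work instead of constructing JListWrapper by hand:

import scala.collection.JavaConversions._

// Hypothetical helper: convert each element of a Java list and return it as a
// Scala Seq via the implicit java-to-Scala collection conversion.
def convertList(c: java.util.List[_], convert: Any => Any): Seq[Any] =
  c.map(e => convert(e)): Seq[Any]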


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



git commit: [SPARK-2781][SQL] Check resolution of LogicalPlans in Analyzer.

2014-09-10 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master f92cde24e -> c27718f37


[SPARK-2781][SQL] Check resolution of LogicalPlans in Analyzer.

LogicalPlan contains a ‘resolved’ attribute indicating that all of its 
execution requirements have been resolved. This attribute is not checked before 
query execution. The analyzer contains a step to check that all Expressions are 
resolved, but this is not equivalent to checking all LogicalPlans. In 
particular, the Union plan’s implementation of ‘resolved’ verifies that 
the types of its children’s columns are compatible. Because the analyzer does 
not check that a Union plan is resolved, it is possible to execute a Union plan 
that outputs different types in the same column.  See SPARK-2781 for an example.

This patch adds two checks to the analyzer’s CheckResolution rule. First, 
each logical plan is checked to see if it is not resolved despite its children 
being resolved. This allows the ‘problem’ unresolved plan to be included in 
the TreeNodeException for reporting. Then as a backstop the root plan is 
checked to see if it is resolved, which recursively checks that the entire plan 
tree is resolved. Note that the resolved attribute is implemented recursively, 
and this patch also explicitly checks the resolved attribute on each logical 
plan in the tree. I assume the query plan trees will not be large enough for 
this redundant checking to meaningfully impact performance.

Because this patch starts validating that LogicalPlans are resolved before 
execution, I had to fix some cases where unresolved plans were passing through 
the analyzer as part of the implementation of the hive query system. In 
particular, HiveContext applies the CreateTables and PreInsertionCasts, and 
ExtractPythonUdfs rules manually after the analyzer runs. I moved these rules 
to the analyzer stage (for hive queries only), in the process completing a code 
TODO indicating the rules should be moved to the analyzer.

It’s worth noting that moving the CreateTables rule means introducing an 
analyzer rule with a significant side effect - in this case the side effect is 
creating a hive table. The rule will only attempt to create a table once even 
if its batch is executed multiple times, because it converts the 
InsertIntoCreatedTable plan it matches against into an InsertIntoTable. 
Additionally, these hive rules must be added to the Resolution batch rather 
than as a separate batch because hive rules may be needed to resolve 
non-root nodes, leaving the root to be resolved on a subsequent batch 
iteration. For example, the hive compatibility test auto_smb_mapjoin_14, and 
others, make use of a query plan where the root is a Union and its children are 
each a hive InsertIntoTable.

Mixing the custom hive rules with standard analyzer rules initially resulted in 
an additional failure because of policy differences between spark sql and hive 
when casting a boolean to a string. Hive casts booleans to strings as 
“true” / “false” while spark sql casts booleans to strings as “1” / 
“0” (causing the cast1.q test to fail). This behavior is a result of the 
BooleanCasts rule in HiveTypeCoercion.scala, and from looking at the 
implementation of BooleanCasts I think converting to “1”/“0” is 
potentially a programming mistake. (If the BooleanCasts rule is disabled, 
casting produces “true”/“false” instead.) I believe “true” / 
“false” should be the behavior for spark sql - I changed the behavior so 
bools are converted to “true”/“false” to be consistent with hive, and 
none of the existing spark tests failed.

Finally, in some initial testing with hive it appears that an implicit type 
coercion of boolean to string results in a lowercase string, e.g. CONCAT( TRUE, 
“” ) -> “true” while an explicit cast produces an all caps string, e.g. 
CAST( TRUE AS STRING ) -> “TRUE”.  The change I’ve made just converts to 
lowercase strings in all cases.  I believe it is at least more correct than the 
existing spark sql implementation where all Cast expressions become “1” / 
“0”.
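
A toy sketch of the two checks described above (a plain Scala tree, not Catalyst): flag the lowest node that is unresolved even though its children are resolved, and as a backstop require the root (and hence the whole tree) to be resolved:

case class Plan(name: String, selfResolved: Boolean, children: Seq[Plan] = Nil) {
  // A plan is resolved only if it and its whole subtree are resolved.
  def resolved: Boolean = selfResolved && children.forall(_.resolved)
  // Visit children before parents, mirroring a bottom-up tree traversal.
  def foreachUp(f: Plan => Unit): Unit = { children.foreach(_.foreachUp(f)); f(this) }
}

def checkResolution(root: Plan): Unit = {
  root.foreachUp {
    case p if !p.resolved && p.children.forall(_.resolved) =>
      throw new IllegalStateException(s"Unresolved plan found: ${p.name}")
    case _ => // either fully resolved, or an ancestor of the reported problem
  }
  require(root.resolved, "Unresolved plan in query tree")
}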

Author: Aaron Staple 

Closes #1706 from staple/SPARK-2781 and squashes the following commits:

32683c4 [Aaron Staple] Fix compilation failure due to merge.
7c77fda [Aaron Staple] Move ExtractPythonUdfs to Analyzer's extendedRules in 
HiveContext.
d49bfb3 [Aaron Staple] Address review comments.
915b690 [Aaron Staple] Fix merge issue causing compilation failure.
701dcd2 [Aaron Staple] [SPARK-2781][SQL] Check resolution of LogicalPlans in 
Analyzer.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c27718f3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c27718f3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c27718f3

Branch: refs/heads/master
Commit: c27718f376483dbe6290de612094c8d4ce9b16b4
Parents: f92cde2
Author: Aaron Staple 
Authored: Wed Sep

svn commit: r1624193 - in /spark/site/docs/1.1.0: ./ api/ api/java/ api/java/org/ api/java/org/apache/ api/java/org/apache/spark/ api/java/org/apache/spark/annotation/ api/java/org/apache/spark/api/ a

2014-09-10 Thread pwendell
Author: pwendell
Date: Thu Sep 11 05:00:26 2014
New Revision: 1624193

URL: http://svn.apache.org/r1624193
Log:
Adding Spark 1.1.0 docs.



[This commit notification would consist of 484 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[2/2] git commit: HOTFIX: Changing color on doc menu

2014-09-10 Thread pwendell
HOTFIX: Changing color on doc menu


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e51ce9a5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e51ce9a5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e51ce9a5

Branch: refs/heads/branch-1.1
Commit: e51ce9a5539a395b7ceff6dcdc77bf7f033e51d8
Parents: 359cd59
Author: Patrick Wendell 
Authored: Wed Sep 10 22:14:55 2014 -0700
Committer: Patrick Wendell 
Committed: Wed Sep 10 22:14:55 2014 -0700

--
 docs/css/bootstrap.min.css | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[1/2] HOTFIX: Changing color on doc menu

2014-09-10 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 359cd59d1 -> e51ce9a55


http://git-wip-us.apache.org/repos/asf/spark/blob/e51ce9a5/docs/css/bootstrap.min.css
--
diff --git a/docs/css/bootstrap.min.css b/docs/css/bootstrap.min.css
index 3fa12ac..b2e6b89 100644
--- a/docs/css/bootstrap.min.css
+++ b/docs/css/bootstrap.min.css
@@ -6,4 +6,4 @@
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Designed and built with all the love in the world @twitter by @mdo and @fat.
[The remainder of this diff is the single-line minified bootstrap.min.css content, truncated in the original message.]

svn commit: r1624195 - /spark/site/docs/1.1.0/css/bootstrap.min.css

2014-09-10 Thread pwendell
Author: pwendell
Date: Thu Sep 11 05:17:16 2014
New Revision: 1624195

URL: http://svn.apache.org/r1624195
Log:
Changing nav color on 1.1.0 docs

Modified:
spark/site/docs/1.1.0/css/bootstrap.min.css

Modified: spark/site/docs/1.1.0/css/bootstrap.min.css
URL: 
http://svn.apache.org/viewvc/spark/site/docs/1.1.0/css/bootstrap.min.css?rev=1624195&r1=1624194&r2=1624195&view=diff
==
--- spark/site/docs/1.1.0/css/bootstrap.min.css (original)
+++ spark/site/docs/1.1.0/css/bootstrap.min.css Thu Sep 11 05:17:16 2014
@@ -6,4 +6,4 @@
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Designed and built with all the love in the world @twitter by @mdo and @fat.

[... 3 lines stripped ...]


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Git Push Summary

2014-09-10 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.1.0-rc3 [deleted] c8886db83

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Git Push Summary

2014-09-10 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.1.0-snapshot1 [deleted] db4a0a5e8

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Git Push Summary

2014-09-10 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.1.0-rc2 [deleted] 446173ca1

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Git Push Summary

2014-09-10 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.1.0-snapshot2 [deleted] 631b798a5

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



Git Push Summary

2014-09-10 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.1.0-rc1 [deleted] c0cb9d6b5

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org