[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7434#issuecomment-121797911
  
  [Test build #37435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37435/console) for PR 7434 at commit [`2225331`](https://github.com/apache/spark/commit/2225331ea36e4a39f097e53afead6930e3cb0ed5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedAttribute(nameParts: Seq[String]) extends Attribute`
  * `abstract class Star extends LeafExpression with NamedExpression`
  * `case class UnresolvedAlias(child: Expression) extends UnaryExpression with NamedExpression`
  * `abstract class LeafExpression extends Expression`
  * `abstract class UnaryExpression extends Expression`
  * `abstract class BinaryExpression extends Expression`
  * `case class SortOrder(child: Expression, direction: SortDirection) extends UnaryExpression`
  * `trait AggregateExpression extends Expression`
  * `trait PartialAggregate extends AggregateExpression`
  * `case class Min(child: Expression) extends UnaryExpression with PartialAggregate`
  * `case class Max(child: Expression) extends UnaryExpression with PartialAggregate`
  * `case class Count(child: Expression) extends UnaryExpression with PartialAggregate`
  * `case class Average(child: Expression) extends UnaryExpression with PartialAggregate`
  * `case class Sum(child: Expression) extends UnaryExpression with PartialAggregate`
  * `case class SumDistinct(child: Expression) extends UnaryExpression with PartialAggregate`
  * `case class First(child: Expression) extends UnaryExpression with PartialAggregate`
  * `case class Last(child: Expression) extends UnaryExpression with PartialAggregate`
  * `trait Generator extends Expression`
  * `case class Explode(child: Expression) extends UnaryExpression with Generator`
  * `trait NamedExpression extends Expression`
  * `abstract class Attribute extends LeafExpression with NamedExpression`
  * `case class PrettyAttribute(name: String) extends Attribute`
  * `abstract class LeafNode extends LogicalPlan`
  * `abstract class UnaryNode extends LogicalPlan`
  * `abstract class BinaryNode extends LogicalPlan`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7434#issuecomment-121797926
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-8882][SPARK-5681][Streaming]Add a new R...

2015-07-15 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/7276#discussion_r34752101
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceiverSupervisorImpl.scala ---
@@ -182,4 +182,5 @@ private[streaming] class ReceiverSupervisorImpl(
     logDebug(s"Cleaning up blocks older then $cleanupThreshTime")
     receivedBlockHandler.cleanupOldBlocks(cleanupThreshTime.milliseconds)
   }
+
--- End diff --

nit: extra line





[GitHub] spark pull request: [SPARK-8600] [ML] Naive Bayes API for spark.ml...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7284#issuecomment-121816137
  
  [Test build #37450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37450/consoleFull) for PR 7284 at commit [`c3de687`](https://github.com/apache/spark/commit/c3de6874b6b7a73e652cb129d0bb18327594f32f).





[GitHub] spark pull request: [SPARK-9018][MLLIB] add stopwatches

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7415#issuecomment-121816166
  
  [Test build #37449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37449/consoleFull) for PR 7415 at commit [`40b4347`](https://github.com/apache/spark/commit/40b43476dafcd42a562027740f4efe7089d0efd4).





[GitHub] spark pull request: [SPARK-8882][SPARK-5681][Streaming]Add a new R...

2015-07-15 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/7276#discussion_r34752882
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala ---
@@ -71,13 +82,57 @@ class ReceiverTracker(ssc: StreamingContext, skipReceiverLaunch: Boolean = false
   )
   private val listenerBus = ssc.scheduler.listenerBus
 
+  /** Enumeration to identify current state of the ReceiverTracker */
+  object TrackerState extends Enumeration {
+    type CheckpointState = Value
+    val Initialized, Started, Stopping, Stopped = Value
+  }
+  import TrackerState._
+
+  /** State of the tracker. Protected by trackerStateLock */
+  private var trackerState = Initialized
+
+  /** trackerStateLock is used to protect reading/writing trackerState */
+  private val trackerStateLock = new AnyRef
+
   // endpoint is created when generator starts.
   // This not being null means the tracker has been started and not stopped
   private var endpoint: RpcEndpointRef = null
 
+  private val schedulingPolicy: ReceiverSchedulingPolicy =
+    new LoadBalanceReceiverSchedulingPolicyImpl()
+
+  /**
+   * Track receivers' status for scheduling
+   */
+  private val receiverTrackingInfos = new HashMap[Int, ReceiverTrackingInfo]
+
+  /**
+   * Store all preferred locations for all receivers. We need this information to schedule receivers
+   */
+  private val receiverPreferredLocations = new HashMap[Int, Option[String]]
+
+  /** Use a separate lock to avoid dead-lock */
+  private val receiverTrackingInfosLock = new AnyRef
+
+  /** Check if tracker has been marked for starting */
+  private def isTrackerStarted(): Boolean = trackerStateLock.synchronized {

nit: Please move these helper methods lower in the class, after `hasUnallocatedBlocks`.





[GitHub] spark pull request: [SPARK-9030][STREAMING][WIP] Add Kinesis.creat...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7413#issuecomment-121821459
  
  [Test build #37457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37457/consoleFull) for PR 7413 at commit [`18c2208`](https://github.com/apache/spark/commit/18c2208f57b9f99c42b26e9fae849da52c2a05df).





[GitHub] spark pull request: [SPARK-9030][STREAMING][WIP] Add Kinesis.creat...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7413#issuecomment-121821781
  
  [Test build #37457 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37457/console) for PR 7413 at commit [`18c2208`](https://github.com/apache/spark/commit/18c2208f57b9f99c42b26e9fae849da52c2a05df).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9018][MLLIB] add stopwatches

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7415#issuecomment-121821799
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9030][STREAMING][WIP] Add Kinesis.creat...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7413#issuecomment-121821788
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-8245][SQL] FormatNumber/Length Support ...

2015-07-15 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/7034#issuecomment-121821688
  
LGTM





[GitHub] spark pull request: [SPARK-8925] [MLlib] Add @since tags to mllib....

2015-07-15 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7436#issuecomment-121821634
  
@sthota2014 You don't need to tag private or package-private methods. We only need `@since` on public methods.
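
To make the rule above concrete, here is a minimal sketch of where the tag would and would not appear. `WordCounter`, its members, and the version number `1.5.0` are all hypothetical and chosen only for illustration; they are not taken from this pull request.

```scala
/** Hypothetical class, used only to show where `@since` tags go. */
class WordCounter {
  private var count: Long = 0L

  /**
   * Adds the words of `line` to the running total.
   * @since 1.5.0
   */
  def add(line: String): Unit = {
    count += tokenize(line).length
  }

  /**
   * Returns the number of words seen so far.
   * @since 1.5.0
   */
  def total: Long = count

  // Private helper: not part of the public API, so it carries no `@since` tag.
  private def tokenize(line: String): Array[String] =
    line.trim.split("\\s+").filter(_.nonEmpty)
}
```

Only `add` and `total` are visible to users of the class, so only they record the release in which they first appeared.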





[GitHub] spark pull request: [SPARK-9018][MLLIB] add stopwatches

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7415#issuecomment-121821637
  
  [Test build #37449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37449/console) for PR 7415 at commit [`40b4347`](https://github.com/apache/spark/commit/40b43476dafcd42a562027740f4efe7089d0efd4).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9022] [SQL] Generated projections for U...

2015-07-15 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/7437#issuecomment-121825079
  
@davies This doesn't need to be part of this PR, but can you think about how we can do codegen testing with this new Unsafe project?





[GitHub] spark pull request: [SPARK-7119][SQL]Give script a default serde w...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6638#issuecomment-121827751
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9058][SQL] Split projectionCode if it i...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7418#issuecomment-121827685
  
  [Test build #37461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37461/consoleFull) for PR 7418 at commit [`12d3794`](https://github.com/apache/spark/commit/12d3794b009a90d21de9a1d52d4f3ea9503f2b58).





[GitHub] spark pull request: [SPARK-8245][SQL] FormatNumber/Length Support ...

2015-07-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7034





[GitHub] spark pull request: [SPARK-9058][SQL] Split projectionCode if it i...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7418#issuecomment-121827606
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9058][SQL] Split projectionCode if it i...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7418#issuecomment-121827618
  
Merged build started.





[GitHub] spark pull request: [SPARK-7119][SQL]Give script a default serde w...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6638#issuecomment-121827741
  
  [Test build #25 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/25/console) for PR 6638 at commit [`2ee0488`](https://github.com/apache/spark/commit/2ee048825ad79a6a533ead969752b435af92166a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-15 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/6394#discussion_r34756031
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -872,6 +872,25 @@ class DAGScheduler(
     // will be posted, which should always come after a corresponding SparkListenerStageSubmitted
     // event.
     stage.latestInfo = StageInfo.fromStage(stage, Some(partitionsToCompute.size))
+    val taskIdToLocations = try {
+      stage match {
+        case s: ShuffleMapStage =>
+          partitionsToCompute.map { id => (id, getPreferredLocs(stage.rdd, id))}.toMap
+        case s: ResultStage =>
+          val job = s.resultOfJob.get
+          partitionsToCompute.map { id =>
+            val p = job.partitions(id)
+            (id, getPreferredLocs(stage.rdd, p))
+          }.toMap
+      }
+    } catch {
+      case NonFatal(e) =>
+        abortStage(stage, s"Task creation failed: $e\n${e.getStackTraceString}")
+        runningStages -= stage
+        return
+    }
+    stage.latestInfo.taskLocalityPreferences = Some(taskIdToLocations.values.toSeq)
--- End diff --

Thanks @kayousterhout, that's a good idea. I will change the code accordingly.





[GitHub] spark pull request: [SPARK-8995][SQL] cast date strings like '2015...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7353#issuecomment-121830731
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8995][SQL] cast date strings like '2015...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7353#issuecomment-121830665
  
  [Test build #37454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37454/console) for PR 7353 at commit [`ca1ae69`](https://github.com/apache/spark/commit/ca1ae69c1baa7d4d14946bdd2638aec47e05be86).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-8280][SPARK-8281][SQL]Handle NaN, null ...

2015-07-15 Thread yijieshen
Github user yijieshen commented on the pull request:

https://github.com/apache/spark/pull/6835#issuecomment-121798697
  
ok, will do soon





[GitHub] spark pull request: [SPARK-8998][MLlib] Collect enough frequent pr...

2015-07-15 Thread zhangjiajin
Github user zhangjiajin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7412#discussion_r34750033
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala ---
@@ -82,20 +84,70 @@ class PrefixSpan private (
       logWarning("Input data is not cached.")
     }
     val minCount = getMinCount(sequences)
-    val lengthOnePatternsAndCounts =
-      getFreqItemAndCounts(minCount, sequences).collect()
-    val prefixAndProjectedDatabase = getPrefixAndProjectedDatabase(
-      lengthOnePatternsAndCounts.map(_._1), sequences)
-    val groupedProjectedDatabase = prefixAndProjectedDatabase
-      .map(x => (x._1.toSeq, x._2))
-      .groupByKey()
-      .map(x => (x._1.toArray, x._2.toArray))
-    val nextPatterns = getPatternsInLocal(minCount, groupedProjectedDatabase)
-    val lengthOnePatternsAndCountsRdd =
-      sequences.sparkContext.parallelize(
-        lengthOnePatternsAndCounts.map(x => (Array(x._1), x._2)))
-    val allPatterns = lengthOnePatternsAndCountsRdd ++ nextPatterns
-    allPatterns
+    val lengthOnePatternsAndCounts = getFreqItemAndCounts(minCount, sequences)
+    val prefixSuffixPairs = getPrefixSuffixPairs(
+      lengthOnePatternsAndCounts.map(_._1).collect(), sequences)
+    var patternsCount: Long = lengthOnePatternsAndCounts.count()
+    var allPatternAndCounts = lengthOnePatternsAndCounts.map(x => (Array(x._1), x._2))
+    var currentPrefixSuffixPairs = prefixSuffixPairs
+    while (patternsCount <= minPatternsBeforeShuffle && currentPrefixSuffixPairs.count() != 0) {
+      val (nextPatternAndCounts, nextPrefixSuffixPairs) =
+        getPatternCountsAndPrefixSuffixPairs(minCount, currentPrefixSuffixPairs)
+      patternsCount = nextPatternAndCounts.count().toInt
+      currentPrefixSuffixPairs = nextPrefixSuffixPairs
+      allPatternAndCounts = allPatternAndCounts ++ nextPatternAndCounts
+    }
+    if (patternsCount > 0) {
+      val projectedDatabase = currentPrefixSuffixPairs
+        .map(x => (x._1.toSeq, x._2))
+        .groupByKey()
+        .map(x => (x._1.toArray, x._2.toArray))
+      val nextPatternAndCounts = getPatternsInLocal(minCount, projectedDatabase)
+      allPatternAndCounts = allPatternAndCounts ++ nextPatternAndCounts
+    }
+    allPatternAndCounts
+  }
+
+  /**
+   * Get the pattern and counts, and prefix suffix pairs
+   * @param minCount minimum count
+   * @param prefixSuffixPairs prefix and suffix pairs,
+   * @return pattern and counts, and prefix suffix pairs
+   * (Array[pattern, count], RDD[prefix, suffix ])
+   */
+  private def getPatternCountsAndPrefixSuffixPairs(
+      minCount: Long,
+      prefixSuffixPairs: RDD[(Array[Int], Array[Int])]):
+  (RDD[(Array[Int], Long)], RDD[(Array[Int], Array[Int])]) = {
+    val prefixAndFreqentItemAndCounts = prefixSuffixPairs
+      .flatMap { case (prefix, suffix) =>
+      suffix.distinct.map(y => ((prefix.toSeq, y), 1L))
+    }.reduceByKey(_ + _)
+      .filter(_._2 >= minCount)
+    val patternAndCounts = prefixAndFreqentItemAndCounts
+      .map{ case ((prefix, item), count) => (prefix.toArray :+ item, count) }
+    val prefixlength = prefixSuffixPairs.first()._1.length
+    if (prefixlength + 1 >= maxPatternLength) {
--- End diff --

OK





[GitHub] spark pull request: [SPARK-8103][core] DAGScheduler should not sub...

2015-07-15 Thread squito
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/6750#issuecomment-121805382
  
@kayousterhout I don't think that will work, but maybe I'm not seeing it. I think the problem is that you still need some way to get a handle on the zombie TaskSetManager to be able to call allTasksInTaskSetFinished(). Right now, taskSetFinished is ultimately getting a handle on that TaskSetManager by looking it up in activeTaskSets [in `statusUpdate()`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L328). So if those zombie task sets aren't in activeTaskSets, it seems like you'd still need to keep track of them *somewhere* in TaskSchedulerImpl.

I feel like part of the problem is that "active task sets" is somewhat vague. You might not expect it to contain task sets that have already failed (from a fetch failed), but still happen to have tasks running. I guess "zombie" is vague too, but in a way that is better, since you aren't tricked into thinking you know what it means.
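
The lookup problem described above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyTaskSetManager`, `ToyScheduler`, `submit`, and `taskToTaskSet` are hypothetical names, and the real `TaskSchedulerImpl` is considerably more involved. The only thing borrowed from the comment is the idea that a status update finds its manager through `activeTaskSets`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a TaskSetManager: an id, a zombie flag, and the
// ids of tasks that are still running.
final class ToyTaskSetManager(val id: String) {
  var isZombie: Boolean = false
  val runningTasks: mutable.Set[Long] = mutable.Set.empty[Long]
}

// Hypothetical stand-in for the scheduler's bookkeeping.
final class ToyScheduler {
  val activeTaskSets = mutable.Map.empty[String, ToyTaskSetManager]
  private val taskToTaskSet = mutable.Map.empty[Long, String]

  def submit(id: String, taskIds: Seq[Long]): ToyTaskSetManager = {
    val manager = new ToyTaskSetManager(id)
    manager.runningTasks ++= taskIds
    activeTaskSets(id) = manager
    taskIds.foreach(t => taskToTaskSet(t) = id)
    manager
  }

  // The manager is reached only through activeTaskSets. Returns true when the
  // update could be delivered, false when no manager was found for the task.
  def statusUpdate(taskId: Long): Boolean = {
    val found = taskToTaskSet.get(taskId).flatMap(activeTaskSets.get)
    found.foreach { manager =>
      manager.runningTasks -= taskId
      taskToTaskSet -= taskId
      // Drop the task set only once its last running task has reported back.
      if (manager.runningTasks.isEmpty) activeTaskSets -= manager.id
    }
    found.isDefined
  }
}
```

In this model, a zombie task set that stays in `activeTaskSets` still receives updates for its running tasks. Removing it from the map early means later updates find no manager, which is why some other structure would have to hold on to it.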





[GitHub] spark pull request: [SPARK-6284][MESOS] Add mesos role, principal ...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4960#issuecomment-121811703
  
  [Test build #37431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37431/console) for PR 4960 at commit [`0f9f03e`](https://github.com/apache/spark/commit/0f9f03e2ccd822aaa8939b8f7d5828e72ba88f11).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9030][STREAMING][WIP] Add Kinesis.creat...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7413#issuecomment-121811695
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7379#issuecomment-121811701
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9030][STREAMING][WIP] Add Kinesis.creat...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7413#issuecomment-121811827
  
  [Test build #37447 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37447/consoleFull)
 for   PR 7413 at commit 
[`dbb33a5`](https://github.com/apache/spark/commit/dbb33a5abd87828573b569e7002c18f4313e4c5d).





[GitHub] spark pull request: [SPARK-6284][MESOS] Add mesos role, principal ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4960#issuecomment-121811758
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8882][SPARK-5681][Streaming]Add a new R...

2015-07-15 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/7276#discussion_r34752114
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverSchedulingPolicy.scala
 ---
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.scheduler
+
+import scala.collection.mutable
+import scala.util.Random
+
+import org.apache.spark.streaming.scheduler.ReceiverState._
+
+private[streaming] case class ReceiverTrackingInfo(
--- End diff --

Please provide Scaladoc for this class.





[GitHub] spark pull request: [SPARK-9030][STREAMING][WIP] Add Kinesis.creat...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7413#issuecomment-121811707
  
Merged build started.





[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7379#issuecomment-121811713
  
Merged build started.





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-15 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/7417#discussion_r34749244
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/CartesianProduct.scala
 ---
@@ -34,7 +34,15 @@ case class CartesianProduct(left: SparkPlan, right: 
SparkPlan) extends BinaryNod
 val leftResults = left.execute().map(_.copy())
 val rightResults = right.execute().map(_.copy())
 
-leftResults.cartesian(rightResults).mapPartitions { iter =>
+val cartesianRdd = if (leftResults.partitions.size  
rightResults.partitions.size) {
+  rightResults.cartesian(leftResults).mapPartitions { iter =>
+iter.map(tuple => (tuple._2, tuple._1))
+  }
+} else {
+  leftResults.cartesian(rightResults)
+}
+
+cartesianRdd.mapPartitions { iter =>
   val joinedRow = new JoinedRow
--- End diff --

Yes, using the partition count here is not accurate. Consider one RDD with 100 
partitions that each hold one record, and another RDD with 10 partitions that 
each hold 100 million records: the method above will cause more scans from 
HDFS.
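
A rough way to see the point about partition counts is to count how often the inner side's partitions are re-read. This is only a sketch under an assumed cost model: `a.cartesian(b)` runs one task per partition pair and re-iterates the `b` partition once for every record of the `a` partition. `RddShape` and `innerPartitionScans` are hypothetical names for illustration, not Spark APIs.

```scala
object CartesianCostSketch {
  // Hypothetical description of an RDD: partition count and records per partition.
  final case class RddShape(partitions: Int, recordsPerPartition: Long)

  // Assumed model: one task per (outer partition, inner partition) pair, and
  // within a task the inner partition is re-read once per outer record.
  def innerPartitionScans(outer: RddShape, inner: RddShape): Long =
    outer.partitions.toLong * inner.partitions * outer.recordsPerPartition

  // The two RDDs described in the comment above.
  val manySmall = RddShape(partitions = 100, recordsPerPartition = 1L)
  val fewLarge = RddShape(partitions = 10, recordsPerPartition = 100000000L)

  def main(args: Array[String]): Unit = {
    println(innerPartitionScans(manySmall, fewLarge)) // manySmall as the outer side
    println(innerPartitionScans(fewLarge, manySmall)) // fewLarge as the outer side
  }
}
```

Under this model the first ordering re-reads inner partitions 1,000 times and the second 100,000,000,000 times, although the partition counts differ only by a factor of ten. That suggests record counts (or size estimates) are a better basis for choosing the outer side than `partitions.size`.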





[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7434#issuecomment-121799959
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9026] Refactor SimpleFutureAction.onCom...

2015-07-15 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/7385#discussion_r34749845
  
--- Diff: core/src/test/scala/org/apache/spark/FutureActionSuite.scala ---
@@ -49,4 +50,20 @@ class FutureActionSuite
 job.jobIds.size should be (2)
   }
 
+  test("simple async action callbacks should not tie up execution context 
threads (SPARK-9026)") {
+val rdd = sc.parallelize(1 to 10, 2).map(_ => Thread.sleep(1000 * 
1000))
+val pool = 
ThreadUtils.newDaemonCachedThreadPool("SimpleFutureActionTest")
+val executionContext = ExecutionContext.fromExecutorService(pool)
+val job = rdd.countAsync()
+try {
+  for (_ <- 1 to 10) {
+job.onComplete(_ => ())(executionContext)
+assert(pool.getLargestPoolSize < 10)
--- End diff --

This looks flaky. Even if the callbacks are non-blocking, there is NO guarantee 
that any of the 10 scheduled functions `_ => ()` will finish by the end of this 
loop. So it may happen that in the 10th iteration the previous 9 scheduled 
functions are still not finished when the 10th one gets scheduled, and 
therefore the pool size = 10. 
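
One way to avoid depending on how quickly callbacks finish is to assert on a state that cannot change while the future is still pending. The sketch below uses a plain `scala.concurrent.Promise` that is never completed, not Spark's `SimpleFutureAction`, so it only illustrates the testing idea; `largestPoolSizeWhilePending` is a hypothetical helper. It assumes that callbacks registered on an uncompleted future are stored and not handed to the executor until completion.

```scala
import java.util.concurrent.{Executors, ThreadPoolExecutor}
import scala.concurrent.{ExecutionContext, Promise}

object NonBlockingCallbackSketch {
  // Registers `callbacks` no-op callbacks on a future that never completes and
  // reports how many pool threads were ever started for them.
  def largestPoolSizeWhilePending(callbacks: Int): Int = {
    // Executors.newCachedThreadPool() is backed by a ThreadPoolExecutor in the JDK.
    val pool = Executors.newCachedThreadPool().asInstanceOf[ThreadPoolExecutor]
    val executionContext = ExecutionContext.fromExecutorService(pool)
    val pending = Promise[Int]() // never completed in this sketch
    try {
      for (_ <- 1 to callbacks) {
        pending.future.onComplete(_ => ())(executionContext)
      }
      // No callback has been submitted yet, so no thread should have started.
      pool.getLargestPoolSize
    } finally {
      pool.shutdownNow()
    }
  }
}
```

Because nothing is submitted to the pool before completion, the check does not race with callback execution. Whether the same property holds for the action under test is exactly what such a test would have to establish.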





[GitHub] spark pull request: [SPARK-8245][SQL] FormatNumber/Length Support ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7034#issuecomment-121809587
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8245][SQL] FormatNumber/Length Support ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7034#issuecomment-121809618
  
Merged build started.





[GitHub] spark pull request: [SPARK-8882][SPARK-5681][Streaming]Add a new R...

2015-07-15 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/7276#discussion_r34752381
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverSchedulingPolicy.scala
 ---
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.scheduler
+
+import scala.collection.mutable
+import scala.util.Random
+
+import org.apache.spark.streaming.scheduler.ReceiverState._
+
+private[streaming] case class ReceiverTrackingInfo(
+receiverId: Int,
+state: ReceiverState,
+scheduledLocations: Option[Seq[String]],
+runningLocation: Option[String])
+
+private[streaming] trait ReceiverSchedulingPolicy {
+
+  /**
+   * Return a list of candidate executors to run the receiver. If the list 
is empty, the caller can
+   * run this receiver in arbitrary executor.
+   */
+  def scheduleReceiver(
+  receiverId: Int,
+  preferredLocation: Option[String],
+  receiverTrackingInfoMap: Map[Int, ReceiverTrackingInfo],
+  executors: Seq[String]): Seq[String]
+}
+
+/**
+ * A ReceiverScheduler trying to balance executors' load. Here is the 
approach to schedule executors
+ * for a receiver.
+ * <ol>
+ *   <li>
+ * If preferredLocation is set, preferredLocation should be one of the 
candidate executors.
+ *   </li>
+ *   <li>
+ * Every executor will be assigned to a weight according to the 
receivers running or scheduling
+ * on it.
+ * <ul>
+ *   <li>
+ * If a receiver is running on an executor, it contributes 1.0 to 
the executor's weight.
+ *   </li>
+ *   <li>
+ * If a receiver is scheduled to an executor but has not yet run, 
it contributes
+ * `1.0 / #candidate_executors_of_this_receiver` to the executor's 
weight.</li>
+ * </ul>
+ * At last, if there are more than 3 idle executors (weight = 0), 
returns all idle executors.
+ * Otherwise, we only return 3 best options according to the weights.
+ *   </li>
+ * </ol>
+ *
+ */
+private[streaming] class LoadBalanceReceiverSchedulingPolicyImpl extends 
ReceiverSchedulingPolicy {
+
+  def scheduleReceiver(
+  receiverId: Int,
+  preferredLocation: Option[String],
+  receiverTrackingInfoMap: Map[Int, ReceiverTrackingInfo],
+  executors: Seq[String]): Seq[String] = {
+if (executors.isEmpty) {
+  return Seq.empty
+}
+
+// Always try to schedule to the preferred locations
+val locations = mutable.Set[String]()
+locations ++= preferredLocation
+
+val executorWeights = receiverTrackingInfoMap.filter { case (id, _) =>
+  // Ignore the receiver to be scheduled. It may be still running.
+  id != receiverId
+}.values.flatMap { receiverTrackingInfo =>
+  receiverTrackingInfo.state match {
+case ReceiverState.INACTIVE => Nil
+case ReceiverState.SCHEDULED =>
+  val scheduledLocations = 
receiverTrackingInfo.scheduledLocations.get
+  // The probability that a scheduled receiver will run in an 
executor is
+  // 1.0 / scheduledLocations.size
+  scheduledLocations.map(location => location -> 1.0 / 
scheduledLocations.size)
--- End diff --

Put `1.0 / scheduledLocations.size` in parentheses; it becomes easier to read.
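
As an illustration of the weighting rule described in the Scaladoc of this diff, here is a minimal sketch with the parenthesized form. `State`, `Info` and `weights` are hypothetical stand-ins for the patch's `ReceiverState`, `ReceiverTrackingInfo` and the `executorWeights` computation, and summing the contributions per executor is an assumption about the code that follows the quoted hunk.

```scala
object ExecutorWeightSketch {
  // Hypothetical stand-ins for the receiver states used in the patch.
  sealed trait State
  case object Inactive extends State
  case object Scheduled extends State
  case object Active extends State

  // Hypothetical, simplified tracking record for one receiver.
  final case class Info(state: State, scheduled: Seq[String], running: Option[String])

  def weights(infos: Seq[Info]): Map[String, Double] = {
    val contributions: Seq[(String, Double)] = infos.flatMap { info =>
      info.state match {
        case Inactive => Nil
        // A scheduled receiver spreads a total weight of 1.0 over its candidates.
        case Scheduled => info.scheduled.map(location => location -> (1.0 / info.scheduled.size))
        // A running receiver contributes 1.0 to the executor it runs on.
        case Active => info.running.toSeq.map(location => location -> 1.0)
      }
    }
    // Sum the contributions per executor.
    contributions.groupBy(_._1).map { case (location, ws) => location -> ws.map(_._2).sum }
  }
}
```

In Scala `/` already binds tighter than `->`, so the parentheses do not change the result; they only make the intended grouping visible to the reader.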





[GitHub] spark pull request: [SPARK-8882][SPARK-5681][Streaming]Add a new R...

2015-07-15 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/7276#discussion_r34752341
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverSchedulingPolicy.scala
 ---
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.scheduler
+
+import scala.collection.mutable
+import scala.util.Random
+
+import org.apache.spark.streaming.scheduler.ReceiverState._
+
+private[streaming] case class ReceiverTrackingInfo(
+receiverId: Int,
+state: ReceiverState,
+scheduledLocations: Option[Seq[String]],
+runningLocation: Option[String])
+
+private[streaming] trait ReceiverSchedulingPolicy {
+
+  /**
+   * Return a list of candidate executors to run the receiver. If the list 
is empty, the caller can
+   * run this receiver in arbitrary executor.
+   */
+  def scheduleReceiver(
+  receiverId: Int,
+  preferredLocation: Option[String],
+  receiverTrackingInfoMap: Map[Int, ReceiverTrackingInfo],
+  executors: Seq[String]): Seq[String]
+}
+
+/**
+ * A ReceiverScheduler trying to balance executors' load. Here is the 
approach to schedule executors
+ * for a receiver.
+ * <ol>
+ *   <li>
+ * If preferredLocation is set, preferredLocation should be one of the 
candidate executors.
+ *   </li>
+ *   <li>
+ * Every executor will be assigned to a weight according to the 
receivers running or scheduling
+ * on it.
+ * <ul>
+ *   <li>
+ * If a receiver is running on an executor, it contributes 1.0 to 
the executor's weight.
+ *   </li>
+ *   <li>
+ * If a receiver is scheduled to an executor but has not yet run, 
it contributes
+ * `1.0 / #candidate_executors_of_this_receiver` to the executor's 
weight.</li>
+ * </ul>
+ * At last, if there are more than 3 idle executors (weight = 0), 
returns all idle executors.
+ * Otherwise, we only return 3 best options according to the weights.
+ *   </li>
+ * </ol>
+ *
+ */
+private[streaming] class LoadBalanceReceiverSchedulingPolicyImpl extends 
ReceiverSchedulingPolicy {
+
+  def scheduleReceiver(
+  receiverId: Int,
+  preferredLocation: Option[String],
+  receiverTrackingInfoMap: Map[Int, ReceiverTrackingInfo],
+  executors: Seq[String]): Seq[String] = {
+if (executors.isEmpty) {
+  return Seq.empty
+}
+
+// Always try to schedule to the preferred locations
+val locations = mutable.Set[String]()
+locations ++= preferredLocation
+
+val executorWeights = receiverTrackingInfoMap.filter { case (id, _) =>
+  // Ignore the receiver to be scheduled. It may be still running.
--- End diff --

What does "It may be still running" mean? Can you elaborate on that case: 
when and how can it happen?





[GitHub] spark pull request: [SPARK-9058][SQL] Split projectionCode if it i...

2015-07-15 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/7418#discussion_r34752387
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala
 ---
@@ -45,10 +45,32 @@ object GenerateMutableProjection extends 
CodeGenerator[Seq[Expression], () => Mu
   else
 ${ctx.setColumn(mutableRow, e.dataType, i, 
evaluationCode.primitive)};
 
-}.mkString("\n")
+}
+
+val projectionCodeSegments = 
projectionCodes.grouped(50).toSeq.map(_.mkString("\n"))
+
+val (projectionCode, projectionFuncs) = if 
(projectionCodeSegments.length == 1) {
+  (projectionCodeSegments(0), "")
+} else {
+  val pCode = (0 until projectionCodeSegments.length).map { i =>
+s"projectSeg$i(_i);"
+  }.mkString("\n")
+
+  val pFuncs = (0 until projectionCodeSegments.length).map { i =>
+s"""
+  public void projectSeg$i(Object _i) {
--- End diff --

Since codegen aims to inline the execution, we had probably better not add the 
overhead of type casting. 
Also, should we put every 50 (say) expressions into a single codegen 
function?
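
The grouping step in the diff, combined with the suggestion to avoid the cast, could look roughly like the sketch below. It only builds strings: `SplitCodeSketch.split` is a hypothetical helper, and the generated parameter type `InternalRow i` is an assumption about what a typed signature might be, not something taken from the patch.

```scala
object SplitCodeSketch {
  // Splits per-expression code snippets into chunks and returns
  // (code that calls each chunk, definitions of the chunk functions).
  def split(snippets: Seq[String], chunkSize: Int): (String, String) = {
    val segments = snippets.grouped(chunkSize).toSeq.map(_.mkString("\n"))
    if (segments.length <= 1) {
      // Small enough to stay inline; no helper functions are needed.
      (segments.mkString, "")
    } else {
      val calls = segments.indices.map(i => s"projectSeg$i(i);").mkString("\n")
      // A typed parameter (assumed here) spares the generated body a cast from Object.
      val funcs = segments.zipWithIndex.map { case (body, i) =>
        s"private void projectSeg$i(InternalRow i) {\n$body\n}"
      }.mkString("\n")
      (calls, funcs)
    }
  }
}
```

For example, `SplitCodeSketch.split(snippets, 50)` keeps up to 50 snippets inline and otherwise emits one generated method per group of 50.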





[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7434#issuecomment-121818364
  
  [Test build #37440 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37440/console)
 for   PR 7434 at commit 
[`3135a8b`](https://github.com/apache/spark/commit/3135a8b9edaf52aba17c9028f1672334e793456d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedAttribute(nameParts: Seq[String]) extends 
Attribute `
  * `abstract class Star extends LeafExpression with NamedExpression `
  * `case class UnresolvedAlias(child: Expression) extends UnaryExpression 
with NamedExpression `
  * `case class SortOrder(child: Expression, direction: SortDirection) 
extends UnaryExpression `
  * `trait AggregateExpression extends Expression `
  * `trait PartialAggregate extends AggregateExpression `
  * `case class Min(child: Expression) extends UnaryExpression with 
PartialAggregate `
  * `case class Max(child: Expression) extends UnaryExpression with 
PartialAggregate `
  * `case class Count(child: Expression) extends UnaryExpression with 
PartialAggregate `
  * `case class Average(child: Expression) extends UnaryExpression with 
PartialAggregate `
  * `case class Sum(child: Expression) extends UnaryExpression with 
PartialAggregate `
  * `case class SumDistinct(child: Expression) extends UnaryExpression with 
PartialAggregate `
  * `case class First(child: Expression) extends UnaryExpression with 
PartialAggregate `
  * `case class Last(child: Expression) extends UnaryExpression with 
PartialAggregate `
  * `trait Generator extends Expression `
  * `case class Explode(child: Expression) extends UnaryExpression with 
Generator `
  * `trait NamedExpression extends Expression `
  * `abstract class Attribute extends LeafExpression with NamedExpression `
  * `case class PrettyAttribute(name: String) extends Attribute `
  * `abstract class LeafNode extends LogicalPlan `
  * `abstract class UnaryNode extends LogicalPlan `






[GitHub] spark pull request: [SPARK-8600] [ML] Naive Bayes API for spark.ml...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7284#issuecomment-121822432
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9018][MLLIB] add stopwatches

2015-07-15 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7415#issuecomment-121822827
  
Merged into master.





[GitHub] spark pull request: [SPARK-8600] [ML] Naive Bayes API for spark.ml...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7284#issuecomment-121822255
  
  [Test build #37450 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37450/console)
 for   PR 7284 at commit 
[`c3de687`](https://github.com/apache/spark/commit/c3de6874b6b7a73e652cb129d0bb18327594f32f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class NaiveBayes(override val uid: String)`






[GitHub] spark pull request: [SPARK-9060] [SQL] Revert SPARK-8359, SPARK-88...

2015-07-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7426





[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7434#issuecomment-121823870
  
  [Test build #37458 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37458/consoleFull)
 for   PR 7434 at commit 
[`9e8a4de`](https://github.com/apache/spark/commit/9e8a4def6f02e03899fa2fafdd2841c513d280af).





[GitHub] spark pull request: [SPARK-8245][SQL] FormatNumber/Length Support ...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7034#issuecomment-121825914
  
  [Test build #37443 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37443/console)
 for   PR 7034 at commit 
[`e534b87`](https://github.com/apache/spark/commit/e534b87a125d264123216025d16d61da327f837d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Length(child: Expression) extends UnaryExpression with 
ExpectsInputTypes `
  * `case class FormatNumber(x: Expression, d: Expression)`






[GitHub] spark pull request: [SPARK-8245][SQL] FormatNumber/Length Support ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7034#issuecomment-121825948
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9026] Refactor SimpleFutureAction.onCom...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7385#issuecomment-121838283
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9026] Refactor SimpleFutureAction.onCom...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7385#issuecomment-121838243
  
**[Test build #37442 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37442/console)**
 for PR 7385 at commit 
[`c6fdc21`](https://github.com/apache/spark/commit/c6fdc2169f5bb8802b7b2d0019433de8bb0cae66)
 after a configured wait of `175m`.





[GitHub] spark pull request: [SPARK-8964] [SQL] [WIP] Use Exchange to perfo...

2015-07-15 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/7334#issuecomment-121838166
  
@JoshRosen, it seems `execution.CollectLimit` will eventually invoke code 
like the following (in `SparkPlan.executeTake`):
```scala
sc.runJob(childRDD, (it: Iterator[InternalRow]) => it.take(left).toArray, 
  p,  allowLocal = false)
```
I am wondering whether 
`execution.CollectLimit(limit, planLater(child))` 
vs. 
`execution.Limit(global = true, limit, execution.Limit(global = false, limit, 
child))`
are actually equivalent in terms of data shuffling / copying. If so, we can 
probably simplify the code by removing `CollectLimit` and `ReturnAnswer`. 

Sorry if I missed something.
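
To make the nested-limit shape concrete, here is a small sketch on plain Scala collections, where each inner `Seq` stands in for one partition. It says nothing about how `executeTake` schedules its jobs; `localLimit`, `globalLimit` and `rowsMoved` are hypothetical names used only for illustration.

```scala
object TwoPhaseLimitSketch {
  // Local limit: every partition keeps at most n rows.
  def localLimit[T](partitions: Seq[Seq[T]], n: Int): Seq[Seq[T]] =
    partitions.map(_.take(n))

  // Global limit: concatenate the already-truncated partitions and keep n rows.
  def globalLimit[T](partitions: Seq[Seq[T]], n: Int): Seq[T] =
    partitions.flatten.take(n)

  // The nested form: a global limit applied on top of a local limit.
  def limit[T](partitions: Seq[Seq[T]], n: Int): Seq[T] =
    globalLimit(localLimit(partitions, n), n)

  // Rows that have to leave their partition before the global step runs.
  def rowsMoved[T](partitions: Seq[Seq[T]], n: Int): Int =
    localLimit(partitions, n).map(_.size).sum
}
```

In this model the nested form moves at most `n` rows per partition before the final truncation, so the comparison with `CollectLimit` comes down to how many partitions each approach actually touches.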





[GitHub] spark pull request: [SPARK-1855] Local checkpointing

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7279#issuecomment-121840805
  
  [Test build #1081 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1081/consoleFull)
 for   PR 7279 at commit 
[`a92657d`](https://github.com/apache/spark/commit/a92657d815e7837a64d69546acc954a792ae1d1a).





[GitHub] spark pull request: [SPARK-1855] Local checkpointing

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7279#issuecomment-121840800
  
  [Test build #1080 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1080/consoleFull)
 for   PR 7279 at commit 
[`a92657d`](https://github.com/apache/spark/commit/a92657d815e7837a64d69546acc954a792ae1d1a).





[GitHub] spark pull request: [SPARK-1855] Local checkpointing

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7279#issuecomment-121840775
  
  [Test build #1079 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1079/consoleFull)
 for   PR 7279 at commit 
[`a92657d`](https://github.com/apache/spark/commit/a92657d815e7837a64d69546acc954a792ae1d1a).





[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-15 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/6394#discussion_r34757572
  
--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala ---
@@ -24,11 +24,15 @@ package org.apache.spark
 private[spark] trait ExecutorAllocationClient {
 
   /**
-   * Express a preference to the cluster manager for a given total number of executors.
+   * Express a preference to the cluster manager for a given total number of executors,
+   * number of locality aware pending tasks and related locality preferences.
    * This can result in canceling pending requests or filing additional requests.
    * @return whether the request is acknowledged by the cluster manager.
    */
-  private[spark] def requestTotalExecutors(numExecutors: Int): Boolean
+  private[spark] def requestTotalExecutors(
+      numExecutors: Int,
+      localityAwarePendingTasks: Int,
+      preferredLocalityToCount: Map[String, Int]): Boolean
--- End diff --

Actually I think the key string is hostname, not executor :).
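
For illustration only, a sketch of what a hostname-keyed map of this shape could look like. The input shape (one list of preferred hostnames per pending task) and the names `LocalitySketch`, `hostToTaskCount` and `localityAwareTasks` are assumptions made for this example, not code from the PR.

```scala
// Hypothetical helper: derive the two locality arguments from pending tasks.
object LocalitySketch {
  // Each inner Seq lists the preferred hostnames of one pending task.
  // The result is keyed by hostname (not executor id), as noted above.
  def hostToTaskCount(preferredHosts: Seq[Seq[String]]): Map[String, Int] =
    preferredHosts.flatten
      .groupBy(identity)
      .map { case (host, hits) => host -> hits.size }

  // Number of pending tasks that have at least one locality preference.
  def localityAwareTasks(preferredHosts: Seq[Seq[String]]): Int =
    preferredHosts.count(_.nonEmpty)
}
```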





[GitHub] spark pull request: [SPARK-8160][SQL]Support using external sortin...

2015-07-15 Thread lianhuiwang
Github user lianhuiwang commented on the pull request:

https://github.com/apache/spark/pull/6875#issuecomment-121798597
  
OK, @JoshRosen, thanks. I will close this PR.





[GitHub] spark pull request: [SPARK-8280][SPARK-8281][SQL]Handle NaN, null ...

2015-07-15 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/6835#issuecomment-121798590
  
Not at all. Feel free to close this one and submit a new one.





[GitHub] spark pull request: [SPARK-8366] When tasks failed and append new ...

2015-07-15 Thread XuTingjun
Github user XuTingjun commented on the pull request:

https://github.com/apache/spark/pull/6817#issuecomment-121798632
  
@andrewor14, sorry to bother you again. I think this really is a bug; could 
you take another look? Thanks!





[GitHub] spark pull request: [SPARK-9030][STREAMING][WIP] Add Kinesis.creat...

2015-07-15 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/7413#issuecomment-121810908
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-9058][SQL] Split projectionCode if it i...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7418#issuecomment-121810983
  
  [Test build #37444 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37444/consoleFull)
 for   PR 7418 at commit 
[`7435454`](https://github.com/apache/spark/commit/7435454ae5aef0819a3c7498e0b4f191a43cb752).





[GitHub] spark pull request: [SPARK-9030][STREAMING][WIP] Add Kinesis.creat...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7413#issuecomment-121811072
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9030][STREAMING][WIP] Add Kinesis.creat...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7413#issuecomment-121811087
  
Merged build started.





[GitHub] spark pull request: [SPARK-8998][MLlib] Collect enough frequent pr...

2015-07-15 Thread zhangjiajin
Github user zhangjiajin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7412#discussion_r34752412
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala ---
@@ -82,20 +84,70 @@ class PrefixSpan private (
       logWarning("Input data is not cached.")
     }
     val minCount = getMinCount(sequences)
-    val lengthOnePatternsAndCounts =
-      getFreqItemAndCounts(minCount, sequences).collect()
-    val prefixAndProjectedDatabase = getPrefixAndProjectedDatabase(
-      lengthOnePatternsAndCounts.map(_._1), sequences)
-    val groupedProjectedDatabase = prefixAndProjectedDatabase
-      .map(x => (x._1.toSeq, x._2))
-      .groupByKey()
-      .map(x => (x._1.toArray, x._2.toArray))
-    val nextPatterns = getPatternsInLocal(minCount, groupedProjectedDatabase)
-    val lengthOnePatternsAndCountsRdd =
-      sequences.sparkContext.parallelize(
-        lengthOnePatternsAndCounts.map(x => (Array(x._1), x._2)))
-    val allPatterns = lengthOnePatternsAndCountsRdd ++ nextPatterns
-    allPatterns
+    val lengthOnePatternsAndCounts = getFreqItemAndCounts(minCount, sequences)
+    val prefixSuffixPairs = getPrefixSuffixPairs(
+      lengthOnePatternsAndCounts.map(_._1).collect(), sequences)
+    var patternsCount: Long = lengthOnePatternsAndCounts.count()
+    var allPatternAndCounts = lengthOnePatternsAndCounts.map(x => (Array(x._1), x._2))
+    var currentPrefixSuffixPairs = prefixSuffixPairs
+    while (patternsCount <= minPatternsBeforeShuffle && currentPrefixSuffixPairs.count() != 0) {
+      val (nextPatternAndCounts, nextPrefixSuffixPairs) =
+        getPatternCountsAndPrefixSuffixPairs(minCount, currentPrefixSuffixPairs)
+      patternsCount = nextPatternAndCounts.count().toInt
+      currentPrefixSuffixPairs = nextPrefixSuffixPairs
+      allPatternAndCounts = allPatternAndCounts ++ nextPatternAndCounts
+    }
+    if (patternsCount > 0) {
+      val projectedDatabase = currentPrefixSuffixPairs
+        .map(x => (x._1.toSeq, x._2))
+        .groupByKey()
+        .map(x => (x._1.toArray, x._2.toArray))
+      val nextPatternAndCounts = getPatternsInLocal(minCount, projectedDatabase)
+      allPatternAndCounts = allPatternAndCounts ++ nextPatternAndCounts
+    }
+    allPatternAndCounts
+  }
+
+  /**
+   * Get the pattern and counts, and prefix suffix pairs
+   * @param minCount minimum count
+   * @param prefixSuffixPairs prefix and suffix pairs,
+   * @return pattern and counts, and prefix suffix pairs
+   * (Array[pattern, count], RDD[prefix, suffix ])
+   */
+  private def getPatternCountsAndPrefixSuffixPairs(
+      minCount: Long,
+      prefixSuffixPairs: RDD[(Array[Int], Array[Int])]):
+  (RDD[(Array[Int], Long)], RDD[(Array[Int], Array[Int])]) = {
+    val prefixAndFreqentItemAndCounts = prefixSuffixPairs
+      .flatMap { case (prefix, suffix) =>
+      suffix.distinct.map(y => ((prefix.toSeq, y), 1L))
+    }.reduceByKey(_ + _)
+      .filter(_._2 >= minCount)
+    val patternAndCounts = prefixAndFreqentItemAndCounts
+      .map{ case ((prefix, item), count) => (prefix.toArray :+ item, count) }
--- End diff --

OK
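
For readers following the diff, the counting step in `getPatternCountsAndPrefixSuffixPairs` can be sketched on local collections. This is only an illustration under assumptions: `PrefixExtensionSketch` and `extend` are hypothetical names, plain `Seq`s replace the RDDs, and the second half of the method (building the next prefix/suffix pairs) is not covered because it is not shown in the quoted diff.

```scala
// Local-collection sketch of the (prefix, item) counting step.
object PrefixExtensionSketch {
  // For every (prefix, suffix) pair, count each distinct suffix item once,
  // then keep the extended patterns whose count reaches minCount.
  def extend(
      minCount: Long,
      pairs: Seq[(Seq[Int], Seq[Int])]): Seq[(Seq[Int], Long)] =
    pairs
      .flatMap { case (prefix, suffix) => suffix.distinct.map(item => (prefix, item)) }
      .groupBy(identity)
      .map { case ((prefix, item), hits) => (prefix :+ item, hits.size.toLong) }
      .filter(_._2 >= minCount)
      .toSeq
}
```

Using `distinct` per suffix means an item repeated inside one sequence still counts once for that sequence, which matches the `suffix.distinct` call in the diff.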





[GitHub] spark pull request: [SPARK-7119][SQL]Give script a default serde w...

2015-07-15 Thread zhichao-li
Github user zhichao-li commented on the pull request:

https://github.com/apache/spark/pull/6638#issuecomment-121814588
  
retest this please.





[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7379#issuecomment-121817147
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-8995][SQL] cast date strings like '2015...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7353#issuecomment-121817328
  
  [Test build #37454 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37454/consoleFull)
 for   PR 7353 at commit 
[`ca1ae69`](https://github.com/apache/spark/commit/ca1ae69c1baa7d4d14946bdd2638aec47e05be86).





[GitHub] spark pull request: [SPARK-8998][MLlib] Collect enough frequent pr...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7412#issuecomment-121816926
  
Build started.





[GitHub] spark pull request: [SPARK-8995][SQL] cast date strings like '2015...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7353#issuecomment-121816935
  
Merged build started.





[GitHub] spark pull request: [SPARK-9022] [SQL] Generated projections for U...

2015-07-15 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/7437

[SPARK-9022] [SQL] Generated projections for UnsafeRow

Added two projections, GenerateUnsafeProjection and FromUnsafeProjection, 
which can be used to convert between UnsafeRow and GenericInternalRow.

They re-use the buffer during projection, similar to MutableProjection 
(but without the full interface that MutableProjection has).
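
To illustrate what buffer re-use means for callers, here is a minimal sketch. `ReusingProjection` is a hypothetical stand-in and not one of the classes in this PR; it only shows that a projection which re-uses its buffer hands back the same object on every call, so a caller that wants to keep a result has to copy it first.

```scala
// Hypothetical projection that doubles each field and re-uses one buffer.
final class ReusingProjection(width: Int) {
  private val buffer = new Array[Long](width)

  // Writes the projected values into the shared buffer and returns it.
  // The returned array is overwritten by the next call.
  def apply(input: Array[Long]): Array[Long] = {
    var i = 0
    while (i < width) {
      buffer(i) = input(i) * 2
      i += 1
    }
    buffer
  }
}
```

A caller would therefore do something like `val kept = proj(row).clone()` before projecting the next row.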

cc @rxin @JoshRosen

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark unsafe_proj2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7437.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7437


commit 5a2637347a2f96d55a17b4c866bccfc40b654ffc
Author: Davies Liu dav...@databricks.com
Date:   2015-07-16T03:30:19Z

unsafe projections







[GitHub] spark pull request: [SPARK-8774] [ML] Add R model formula with bas...

2015-07-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7381





[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7379#issuecomment-121819633
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8600] [ML] Naive Bayes API for spark.ml...

2015-07-15 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/7284#discussion_r34753646
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala ---
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.classification
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.mllib.linalg._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.mllib.util.TestingUtils._
+import org.apache.spark.mllib.classification.NaiveBayesSuite._
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.Row
+
+class NaiveBayesSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  def validatePrediction(predictionAndLabels: DataFrame): Unit = {
+    val numOfErrorPredictions = predictionAndLabels.collect().count {
+      case Row(prediction: Double, label: Double) =>
+        prediction != label
+    }
+    // At least 80% of the predictions should be on.
+    assert(numOfErrorPredictions < predictionAndLabels.count() / 5)
+  }
+
+  def validateModelFit(
+      piData: Vector,
+      thetaData: Matrix,
+      model: NaiveBayesModel): Unit = {
+    assert(Vectors.dense(model.pi.toArray.map(math.exp)) ~==
+      Vectors.dense(piData.toArray.map(math.exp)) absTol 0.05, "pi mismatch")
--- End diff --

Waiting for #7357 to be merged; then we can directly compare the two mapped 
vectors like this:
```
assert(model.pi.map(math.exp) ~== piData.map(math.exp) absTol 0.05, "pi mismatch")
```
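
For anyone unfamiliar with the `~== ... absTol` check, the idea can be sketched without the MLlib test utilities. `AbsTolSketch` and `approxEqual` are hypothetical names used only here, and this is an assumption about the intent (element-wise comparison within an absolute tolerance), not the actual `TestingUtils` implementation.

```scala
// Element-wise comparison of two arrays within an absolute tolerance.
object AbsTolSketch {
  def approxEqual(a: Array[Double], b: Array[Double], absTol: Double): Boolean =
    a.length == b.length &&
      a.zip(b).forall { case (x, y) => math.abs(x - y) < absTol }
}
```

As in the test above, log-probabilities would be mapped through `math.exp` first so that the tolerance applies to probabilities rather than to their logarithms.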





[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7379#issuecomment-121819664
  
Merged build started.





[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7434#issuecomment-121823525
  
Merged build started.





[GitHub] spark pull request: [SPARK-8968] [SQL] shuffled by the partition c...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7336#issuecomment-121823524
  
Merged build started.





[GitHub] spark pull request: [SPARK-8968] [SQL] shuffled by the partition c...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7336#issuecomment-121823620
  
  [Test build #37459 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37459/consoleFull)
 for   PR 7336 at commit 
[`b5ada0a`](https://github.com/apache/spark/commit/b5ada0ab4944661c8ab6bf030006d111657d13e6).





[GitHub] spark pull request: [SPARK-9022] [SQL] Generated projections for U...

2015-07-15 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7437#discussion_r34754330
  
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java ---
@@ -19,11 +19,11 @@
 
 import java.io.IOException;
 
+import com.google.common.annotations.VisibleForTesting;
--- End diff --

import order





[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7434#issuecomment-121823508
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8968] [SQL] shuffled by the partition c...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7336#issuecomment-121823512
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8792] [ML] Add Python API for PCA trans...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7190#issuecomment-121825596
  
  [Test build #37460 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37460/console)
 for   PR 7190 at commit 
[`8f4ac31`](https://github.com/apache/spark/commit/8f4ac31f8a772ea3016a9614e986cbc3c0bb4468).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class PCA(JavaEstimator, HasInputCol, HasOutputCol):`
  * `class PCAModel(JavaModel):`






[GitHub] spark pull request: [SPARK-9018][MLLIB] add stopwatches

2015-07-15 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/7415#discussion_r34755425
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/StopwatchSuite.scala ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.util
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+
+class StopwatchSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  private def testStopwatchOnDriver(sw: Stopwatch): Unit = {
+    assert(sw.name === "sw")
+    assert(sw.elapsed() === 0L)
+    assert(!sw.isRunning)
+    intercept[AssertionError] {
+      sw.stop()
+    }
+    sw.start()
+    Thread.sleep(50)
+    val duration = sw.stop()
+    assert(duration >= 50 && duration < 100) // using a loose upper bound
+    val elapsed = sw.elapsed()
+    assert(elapsed === duration)
+    sw.start()
+    Thread.sleep(50)
+    val duration2 = sw.stop()
+    assert(duration2 >= 50 && duration2 < 100)
+    val elapsed2 = sw.elapsed()
+    assert(elapsed2 == duration + duration2)
--- End diff --

Should we no longer bother with this?  Or is it just for Longs (in which 
case enforcing consistency may be easiest)?





[GitHub] spark pull request: [SPARK-8125] [SQL] Accelerates Parquet schema ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7396#issuecomment-121827555
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8245][SQL] FormatNumber/Length Support ...

2015-07-15 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/7034#issuecomment-121827562
  
Thanks - merging this.






[GitHub] spark pull request: [SPARK-8125] [SQL] Accelerates Parquet schema ...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7396#issuecomment-121827505
  
  [Test build #37441 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37441/console)
 for   PR 7396 at commit 
[`f122f10`](https://github.com/apache/spark/commit/f122f1070fb08cd737a42f683f7f8d1bb7f4a4ad).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class FakeFileStatus(`






[GitHub] spark pull request: [SPARK-7119][SQL]Give script a default serde w...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6638#issuecomment-121839582
  
  [Test build #37462 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37462/consoleFull)
 for   PR 6638 at commit 
[`4ab11b7`](https://github.com/apache/spark/commit/4ab11b7e5df106993682aef7d4bc7759827734b6).





[GitHub] spark pull request: [SPARK-8791][SQL] Improve the InternalRow.hash...

2015-07-15 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/7189#issuecomment-121839736
  
Thank you all for reviewing the code, but I think there is a more general 
way to solve this, so I am closing it for now.





[GitHub] spark pull request: [SPARK-8791][SQL] Improve the InternalRow.hash...

2015-07-15 Thread chenghao-intel
Github user chenghao-intel closed the pull request at:

https://github.com/apache/spark/pull/7189





[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/7434#discussion_r34757450
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -277,15 +276,21 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
 /**
  * A logical plan node with no children.
  */
-abstract class LeafNode extends LogicalPlan with 
trees.LeafNode[LogicalPlan] {
+abstract class LeafNode extends LogicalPlan {
   self: Product =>
+
+  override def children: Seq[LogicalPlan] = Nil
 }
 
 /**
  * A logical plan node with single child.
  */
-abstract class UnaryNode extends LogicalPlan with 
trees.UnaryNode[LogicalPlan] {
+abstract class UnaryNode extends LogicalPlan {
   self: Product =>
--- End diff --

remove `self: Product =>`?





[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/7434#discussion_r34757444
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -277,15 +276,21 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
 /**
  * A logical plan node with no children.
  */
-abstract class LeafNode extends LogicalPlan with 
trees.LeafNode[LogicalPlan] {
+abstract class LeafNode extends LogicalPlan {
   self: Product =>
--- End diff --

remove `self: Product =>`?





[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7434#issuecomment-121800272
  
  [Test build #37439 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37439/consoleFull)
 for   PR 7434 at commit 
[`9c589cf`](https://github.com/apache/spark/commit/9c589cf216ff5eb46031ed332a35e6e23d91d2fe).





[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7434#issuecomment-121802635
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9085][SQL] Remove LeafNode, UnaryNode, ...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7434#issuecomment-121802697
  
Merged build started.





[GitHub] spark pull request: [SPARK-9018][MLLIB] add stopwatches

2015-07-15 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7415#discussion_r34752606
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/util/StopwatchSuite.scala ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.util
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+
+class StopwatchSuite extends SparkFunSuite with MLlibTestSparkContext {
+
+  private def testStopwatchOnDriver(sw: Stopwatch): Unit = {
+    assert(sw.name === "sw")
+    assert(sw.elapsed() === 0L)
+    assert(!sw.isRunning)
+    intercept[AssertionError] {
+      sw.stop()
+    }
+    sw.start()
+    Thread.sleep(50)
+    val duration = sw.stop()
+    assert(duration >= 50 && duration < 100) // using a loose upper bound
+    val elapsed = sw.elapsed()
+    assert(elapsed === duration)
+    sw.start()
+    Thread.sleep(50)
+    val duration2 = sw.stop()
+    assert(duration2 >= 50 && duration2 < 100)
+    val elapsed2 = sw.elapsed()
+    assert(elapsed2 == duration + duration2)
--- End diff --

Actually, @ericl pointed out that `==` and `===` are equivalent in this case 
(Long): both provide the same error message. I will update it to be consistent.
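
For illustration, a minimal sketch of the two forms side by side. It assumes 
ScalaTest 2.x or later on the classpath, where `assert` is a macro; 
`LongEqualitySketch` and `check` are hypothetical names, not part of the PR.

```scala
import org.scalatest.Assertions._

// Hypothetical helper, not part of the PR: runs both assertion styles
// on the same Long values.
object LongEqualitySketch {
  def check(duration: Long, duration2: Long, elapsed2: Long): Unit = {
    // Plain `==`: value equality on Long, checked through ScalaTest's assert.
    assert(elapsed2 == duration + duration2)
    // ScalaTest's `===`: the same comparison for Long operands.
    assert(elapsed2 === duration + duration2)
  }
}
```

Calling `LongEqualitySketch.check(50L, 60L, 110L)` passes both assertions; 
with a mismatched total, the first assertion fails before the second is reached.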





[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7379#issuecomment-121815160
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7379#issuecomment-121815152
  
  [Test build #37448 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37448/console)
 for   PR 7379 at commit 
[`b405e45`](https://github.com/apache/spark/commit/b405e45d931fb04b914858e75e3fa3cb07bc0394).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class BroadcastRangeJoin(`






[GitHub] spark pull request: [SPARK-8682][SQL][WIP] Range Join

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7379#issuecomment-121820543
  
  [Test build #37456 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/37456/consoleFull)
 for   PR 7379 at commit 
[`8204eae`](https://github.com/apache/spark/commit/8204eaed1b9399f17415afc6ce178c845f29746f).





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-15 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7417#discussion_r34754893
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/CartesianProduct.scala
 ---
@@ -34,7 +34,15 @@ case class CartesianProduct(left: SparkPlan, right: 
SparkPlan) extends BinaryNod
     val leftResults = left.execute().map(_.copy())
     val rightResults = right.execute().map(_.copy())
 
-    leftResults.cartesian(rightResults).mapPartitions { iter =>
+    val cartesianRdd = if (leftResults.partitions.size  
rightResults.partitions.size) {
+      rightResults.cartesian(leftResults).mapPartitions { iter =>
+        iter.map(tuple => (tuple._2, tuple._1))
+      }
+    } else {
+      leftResults.cartesian(rightResults)
+    }
+
+    cartesianRdd.mapPartitions { iter =>
       val joinedRow = new JoinedRow
--- End diff --

@hvanhovell Yes, using sizeInBytes is better, but it also has a problem: if 
leftResults has only one record and that record is large, while rightResults 
has many records whose total size is small, then the swap would make 
performance worse. The best approach would be to check the total number of 
records per partition, but we cannot get that right now.
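
To make the swap in the diff concrete, here is a rough sketch with plain Scala 
collections instead of RDDs. All names are hypothetical, a `Seq` of `Seq`s 
stands in for partitions, and the comparison direction (swap when the left 
side has more partitions) is an assumption, since the operator is not legible 
in the quoted diff.

```scala
// Plain-collection sketch, not Spark code. A Seq of Seqs stands in for
// an RDD's partitions; every name here is hypothetical.
object CartesianSwapSketch {
  type Partitioned[A] = Seq[Seq[A]]

  // Outer loop over `outer`, inner loop over `inner`.
  def cartesian[A, B](outer: Seq[A], inner: Seq[B]): Seq[(A, B)] =
    for (o <- outer; i <- inner) yield (o, i)

  // Assumption: swap sides when the left input has more partitions.
  def swappedCartesian[A, B](left: Partitioned[A], right: Partitioned[B]): Seq[(A, B)] =
    if (left.size > right.size) {
      // Drive the product from the right side, then restore (left, right) order.
      cartesian(right.flatten, left.flatten).map(tuple => (tuple._2, tuple._1))
    } else {
      cartesian(left.flatten, right.flatten)
    }
}
```

Either branch yields the same set of `(left, right)` pairs; only the iteration 
order changes. Neither the partition count used here nor a byte-size estimate 
reflects the number of records, which is the limitation described above.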





[GitHub] spark pull request: [SPARK-8792] [ML] Add Python API for PCA trans...

2015-07-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7190#issuecomment-121825685
  
Merged build finished. Test PASSed.




