[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...

2015-09-18 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/7853#discussion_r39876553
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -96,7 +96,7 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
 
   val startTime = System.currentTimeMillis()
 
-  private val stopped: AtomicBoolean = new AtomicBoolean(false)
+  private[spark] val stopped: AtomicBoolean = new AtomicBoolean(false)
--- End diff --

See this thread: http://search-hadoop.com/m/q3RTtqvncy17sSTx1

Can this be marked with @DeveloperApi ?





[GitHub] spark pull request: [SPARK-10458] [Spark Core] Added isStopped() m...

2015-09-18 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/8749#discussion_r39883770
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -264,6 +264,7 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
   val tachyonFolderName = externalBlockStoreFolderName
 
   def isLocal: Boolean = (master == "local" || master.startsWith("local["))
+  def isStopped: Boolean = stopped.get()
--- End diff --

Should this be annotated @DeveloperApi ?
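
For reference, a minimal sketch of what the annotated accessor could look like, assuming
the @DeveloperApi annotation from org.apache.spark.annotation is the one intended (an
illustration only, not the merged change):

```scala
import java.util.concurrent.atomic.AtomicBoolean
import org.apache.spark.annotation.DeveloperApi

// Hypothetical stand-in class; in Spark the accessor would live on SparkContext.
class ContextLike {
  private val stopped = new AtomicBoolean(false)

  /** :: DeveloperApi ::
   *  True once stop() has been called on this context. */
  @DeveloperApi
  def isStopped: Boolean = stopped.get()
}
```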





[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...

2015-09-18 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/7853#discussion_r39880833
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -96,7 +96,7 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
 
   val startTime = System.currentTimeMillis()
 
-  private val stopped: AtomicBoolean = new AtomicBoolean(false)
+  private[spark] val stopped: AtomicBoolean = new AtomicBoolean(false)
--- End diff --

That's right.

I think more user requests for exposing this flag will keep coming in.





[GitHub] spark pull request: Expose SparkContext#stopped flag with @Develop...

2015-09-18 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/8822

Expose SparkContext#stopped flag with @DeveloperApi

See this thread: http://search-hadoop.com/m/q3RTtqvncy17sSTx1

We should expose this flag to developers

@andrewor14 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8822.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8822


commit 007c6d60c2e05fa96b3d0e7dc48c2bd1303022de
Author: tedyu <yuzhih...@gmail.com>
Date:   2015-09-18T17:44:00Z

Expose stopped flag with @DeveloperApi







[GitHub] spark pull request: Check partitionId's range in ExternalSorter#sp...

2015-09-10 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/8703

Check partitionId's range in ExternalSorter#spill()

See this thread for background:
http://search-hadoop.com/m/q3RTt0rWvIkHAE81

We should check the range of the partition Id and provide a meaningful message 
through an exception.

Alternatively, we could use abs() and modulo to force the partition Id into a 
legitimate range. However, the expectation is that the user should correct the 
logic error in his / her code.
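
A minimal sketch of the kind of check being proposed (the exact wording and its
placement inside spill() are assumptions, not the actual patch):

```scala
// Hypothetical helper: fail fast with a descriptive message instead of an
// opaque ArrayIndexOutOfBoundsException deep inside the spill path.
def checkPartitionId(partitionId: Int, numPartitions: Int): Unit = {
  require(partitionId >= 0 && partitionId < numPartitions,
    s"partition Id: $partitionId should be in the range [0, $numPartitions); " +
      "this usually indicates a logic error in the user's partitioner")
}
```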

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8703.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8703


commit 70eb93eb6fa269c26f82e1125aeea69a589d0428
Author: tedyu <yuzhih...@gmail.com>
Date:   2015-09-10T18:27:01Z

Check partitionId's range in ExternalSorter#spill()

commit 63dfe1191ed9f04560cfd39fe01108fcef462673
Author: tedyu <yuzhih...@gmail.com>
Date:   2015-09-10T18:29:11Z

Correct indentation







[GitHub] spark pull request: SPARK-10074 Include Float in @specialized anno...

2015-09-02 Thread tedyu
Github user tedyu closed the pull request at:

https://github.com/apache/spark/pull/8259





[GitHub] spark pull request: [SPARK-10004] [shuffle] Perform auth checks wh...

2015-09-02 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/8218#discussion_r38595554
  
--- Diff: 
network/common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java
 ---
@@ -109,15 +111,34 @@ public void connectionTerminated(Channel channel) {
 }
   }
 
+  @Override
+  public void checkAuthorization(TransportClient client, long streamId) {
+if (client.getClientId() != null) {
+  StreamState state = streams.get(streamId);
+  Preconditions.checkArgument(state != null, "Unknown stream ID.");
+  if (!client.getClientId().equals(state.appId)) {
+throw new SecurityException(String.format(
+  "Client %s not authorized to read stream %d (app %s).",
--- End diff --

Should we not disclose the actual appId in the exception message ?





[GitHub] spark pull request: Include Float in @specialized annotation

2015-08-17 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/8259#issuecomment-132006028
  
@rxin 
I don't use GraphX.
But I think including Float in these classes would boost performance where 
Float is used.
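
To illustrate the point (a generic sketch, not the actual GraphX classes touched by
this PR): adding Float to a @specialized list makes the compiler emit a
Float-specialized variant of the class, so Float values are stored and passed as
primitives instead of boxed java.lang.Float objects.

```scala
// Hypothetical example class showing the annotation being discussed.
class Box[@specialized(Int, Long, Float, Double) T](private var value: T) {
  def get: T = value
  def set(v: T): Unit = { value = v }
}

// With Float in the list, this instantiation uses the Float-specialized
// subclass and keeps the value as a primitive float.
val b = new Box[Float](0.0f)
b.set(1.5f)
```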





[GitHub] spark pull request: Include Float in @specialized annotation

2015-08-17 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/8259

Include Float in @specialized annotation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8259.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8259


commit 1bd166fd8c1f943077278fa8b26c547df5f10b89
Author: tedyu <yuzhih...@gmail.com>
Date:   2015-08-17T23:46:43Z

Include Float in @specialized annotation







[GitHub] spark pull request: SPARK-10074 Include Float in @specialized anno...

2015-08-17 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/8259#issuecomment-132063810
  
Test failures don't seem to be related to the PR.

Not sure whether the following is a test or not:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41080/testReport/junit/org.apache.spark.deploy.yarn/YarnClusterSuite/_It_is_not_a_test_/





[GitHub] spark pull request: [SPARK-9486][SQL] Add data source aliasing for...

2015-08-09 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/7802#discussion_r36590599
  
--- Diff: 
sql/core/src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
 ---
@@ -0,0 +1,3 @@
+org.apache.spark.sql.jdbc.DefaultSource
+org.apache.spark.sql.json.DefaultSource
+org.apache.spark.sql.parquet.DefaultSource
--- End diff --

Should orc be added as well ?
I see a change to OrcRelation.scala below.





[GitHub] spark pull request: [BUILD] Remove dependency reduced POM hack

2015-08-03 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/7919

[BUILD] Remove dependency reduced POM hack



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7919.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7919


commit 1bfbd7b9c66d1bde1606fec995886e42af627ffc
Author: tedyu <yuzhih...@gmail.com>
Date:   2015-08-04T01:53:23Z

[BUILD] Remove dependency reduced POM hack







[GitHub] spark pull request: [BUILD] Remove dependency reduced POM hack

2015-08-03 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/7919#issuecomment-127478076
  
The PySpark test failure should not be related to the change.





[GitHub] spark pull request: [SPARK-9549][SQL] fix bugs in expressions

2015-08-03 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/7882#discussion_r36114103
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -614,8 +614,9 @@ object DateTimeUtils {
*/
   def dateAddMonths(days: Int, months: Int): Int = {
 val absoluteMonth = (getYear(days) - YearZero) * 12 + getMonth(days) - 
1 + months
-val currentMonthInYear = absoluteMonth % 12
-val currentYear = absoluteMonth / 12
+val nonNegativeMonth = if (absoluteMonth >= 0) absoluteMonth else 0
+val currentMonthInYear = nonNegativeMonth % 12
+val currentYear = nonNegativeMonth / 12
--- End diff --

/% is not defined for Int.

I read about this notation in a Scala book which I have since returned. I will 
read more once I have that book back.





[GitHub] spark pull request: [SPARK-9549][SQL] fix bugs in expressions

2015-08-03 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/7882#discussion_r36103906
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -614,8 +614,9 @@ object DateTimeUtils {
*/
   def dateAddMonths(days: Int, months: Int): Int = {
 val absoluteMonth = (getYear(days) - YearZero) * 12 + getMonth(days) - 
1 + months
-val currentMonthInYear = absoluteMonth % 12
-val currentYear = absoluteMonth / 12
+val nonNegativeMonth = if (absoluteMonth >= 0) absoluteMonth else 0
+val currentMonthInYear = nonNegativeMonth % 12
+val currentYear = nonNegativeMonth / 12
--- End diff --

The above two statements can be replaced with:
val (currentYear, currentMonthInYear) = nonNegativeMonth /% 12
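
Two quick checks behind this exchange (the snippet is only illustrative): /% is
defined on BigInt, where it returns a (quotient, remainder) pair, but not on Int;
and % in Scala keeps the sign of the dividend, which is why a negative
absoluteMonth needs the clamping in the diff above.

```scala
// BigInt provides /% (combined div/mod); Int does not.
val (q, r) = BigInt(25) /% BigInt(12)   // q == 2, r == 1

// % keeps the sign of the dividend, so naive month arithmetic breaks
// for negative values:
val m = -5 % 12                          // -5, not 7
```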





[GitHub] spark pull request: [SPARK-4751] Dynamic allocation in standalone ...

2015-08-01 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/7532#discussion_r36038428
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
 ---
@@ -152,6 +152,34 @@ private[spark] class SparkDeploySchedulerBackend(
   super.applicationId
 }
 
+  /**
+   * Request executors from the Master by specifying the total number 
desired,
+   * including existing pending and running executors.
+   *
+   * @return whether the request is acknowledged.
+   */
+  protected override def doRequestTotalExecutors(requestedTotal: Int): 
Boolean = {
+    Option(client) match {
+      case Some(c) => c.requestTotalExecutors(requestedTotal)
+      case None =>
+        logWarning("Attempted to request executors before driver fully initialized.")
+        false
+    }
+  }
+
+  /**
+   * Kill the given list of executors through the Master.
+   * @return whether the kill request is acknowledged.
+   */
+  protected override def doKillExecutors(executorIds: Seq[String]): 
Boolean = {
+    Option(client) match {
+      case Some(c) => c.killExecutors(executorIds)
+      case None =>
+        logWarning("Attempted to kill executors before driver fully initialized.")
--- End diff --

Maybe include executorIds in the log.
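
A minimal sketch of the suggested message format (the wording is illustrative,
not the merged text):

```scala
// Hypothetical formatting helper for the warning discussed above.
def killBeforeInitWarning(executorIds: Seq[String]): String =
  s"Attempted to kill executors ${executorIds.mkString(", ")} before driver fully initialized."

// killBeforeInitWarning(Seq("3", "7")) ==
//   "Attempted to kill executors 3, 7 before driver fully initialized."
```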





[GitHub] spark pull request: [SPARK-4751] Dynamic allocation in standalone ...

2015-08-01 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/7532#discussion_r36038415
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala 
---
@@ -804,6 +827,87 @@ private[master] class Master(
   }
 
   /**
+   * Handle a request to set the target number of executors for this 
application.
+   *
+   * If the executor limit is adjusted upwards, new executors will be 
launched provided
+   * that there are workers with sufficient resources. If it is adjusted 
downwards, however,
+   * we do not kill existing executors until we explicitly receive a kill 
request.
+   *
+   * @return whether the application has previously registered with this 
Master.
+   */
+  private def handleRequestExecutors(appId: String, requestedTotal: Int): 
Boolean = {
+    idToApp.get(appId) match {
+      case Some(appInfo) =>
+        logInfo(s"Application $appId requested to set total executors to $requestedTotal.")
+        appInfo.executorLimit = requestedTotal
+        schedule()
+        true
+      case None =>
+        logWarning(s"Unknown application $appId requested $requestedTotal total executors.")
+        false
+    }
+  }
+
+  /**
+   * Handle a kill request from the given application.
+   *
+   * This method assumes the executor limit has already been adjusted 
downwards through
+   * a separate [[RequestExecutors]] message, such that we do not launch 
new executors
+   * immediately after the old ones are removed.
+   *
+   * @return whether the application has previously registered with this 
Master.
+   */
+  private def handleKillExecutors(appId: String, executorIds: Seq[Int]): 
Boolean = {
+    idToApp.get(appId) match {
+      case Some(appInfo) =>
+        logInfo(s"Application $appId requests to kill executors: " + executorIds.mkString(", "))
+        val (known, unknown) = executorIds.partition(appInfo.executors.contains)
+        known.foreach { executorId =>
+          val desc = appInfo.executors(executorId)
+          appInfo.removeExecutor(desc)
+          killExecutor(desc)
+        }
+        if (unknown.nonEmpty) {
+          logWarning(s"Application $appId attempted to kill non-existent executors: 
--- End diff --

Should this be logged at DEBUG level ?





[GitHub] spark pull request: SPARK-9446 Clear Active SparkContext in stop()...

2015-07-30 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/7756#issuecomment-126369191
  
Anything I can do to move this forward ?





[GitHub] spark pull request: SPARK-9446 Clear Active SparkContext in stop()...

2015-07-30 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/7756#issuecomment-126534789
  
Anything I need to do for this issue ?





[GitHub] spark pull request: SPARK-9446 Clear Active SparkContext in stop()...

2015-07-29 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/7756#issuecomment-126117214
  
In CoarseGrainedSchedulerBackend.scala, around line 281:
```
  override def stop() {
stopExecutors()
```





[GitHub] spark pull request: Clear Active SparkContext in stop() method usi...

2015-07-29 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/7756#issuecomment-126087585
  
Thanks for the quick reviews.

See if rev2 is better.





[GitHub] spark pull request: Clear Active SparkContext in stop() method usi...

2015-07-29 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/7756

Clear Active SparkContext in stop() method using finally

See 'stopped SparkContext remaining active' thread on mailing list for 
relevant details

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7756.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7756


commit f5fb519e6f20eed448806a4794277503c341e1a8
Author: tedyu <yuzhih...@gmail.com>
Date:   2015-07-29T20:03:17Z

Clear Active SparkContext in stop() method using finally







[GitHub] spark pull request: SPARK-9446 Clear Active SparkContext in stop()...

2015-07-29 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/7756#issuecomment-126122944
  
Doesn't seem to matter, considering we have the following at the beginning 
of stop():
```
if (!stopped.compareAndSet(false, true)) {
  logInfo("SparkContext already stopped.")
  return
```
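
For context, a minimal sketch of the pattern SPARK-9446 is after, with illustrative
names rather than the exact SparkContext code: guard stop() with the atomic flag,
then clear the active-context bookkeeping in a finally block so it runs even if a
shutdown step throws.

```scala
import java.util.concurrent.atomic.AtomicBoolean

class StoppableContext {
  private val stopped = new AtomicBoolean(false)

  def stop(): Unit = {
    // Only the first caller gets past this guard.
    if (!stopped.compareAndSet(false, true)) {
      println("Context already stopped.")
      return
    }
    try {
      shutDownServices()     // any of these steps may throw
    } finally {
      clearActiveContext()   // must run regardless, or the old context stays registered as active
    }
  }

  private def shutDownServices(): Unit = ()
  private def clearActiveContext(): Unit = ()
}
```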





[GitHub] spark pull request: Make MetastoreRelation#hiveQlPartitions lazy v...

2015-07-17 Thread tedyu
Github user tedyu closed the pull request at:

https://github.com/apache/spark/pull/7466





[GitHub] spark pull request: Make MetastoreRelation#hiveQlPartitions lazy v...

2015-07-17 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/7466#issuecomment-122286140
  
Covered by #7421





[GitHub] spark pull request: Make MetastoreRelation#hiveQlPartitions lazy v...

2015-07-17 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/7466

Make MetastoreRelation#hiveQlPartitions lazy val



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7466.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7466


commit d0bb9d64233e77d7930495058d41c26a290eed49
Author: tedyu <yuzhih...@gmail.com>
Date:   2015-07-17T10:58:41Z

Make MetastoreRelation#hiveQlPartitions lazy val







[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...

2015-07-16 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6444#discussion_r34847593
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillWriter.java
 ---
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection.unsafe.sort;
+
+import java.io.File;
+import java.io.IOException;
+
+import scala.Tuple2;
+
+import org.apache.spark.executor.ShuffleWriteMetrics;
+import org.apache.spark.serializer.DummySerializerInstance;
+import org.apache.spark.storage.BlockId;
+import org.apache.spark.storage.BlockManager;
+import org.apache.spark.storage.BlockObjectWriter;
+import org.apache.spark.storage.TempLocalBlockId;
+import org.apache.spark.unsafe.PlatformDependent;
+
+/**
+ * Spills a list of sorted records to disk. Spill files have the following 
format:
+ *
+ *   [# of records (int)] [[len (int)][prefix (long)][data (bytes)]...]
+ */
+final class UnsafeSorterSpillWriter {
--- End diff --

Should this class declare 'implements Closeable' ?
That way, its callers would be able to use the try-with-resources construct.
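
The suggestion above is about Java's try-with-resources; the closest Scala-side
equivalent of what a Closeable writer would enable is the usual loan pattern
(a sketch under the assumption that the writer exposes close()):

```scala
import java.io.Closeable

// Generic loan helper: callers get automatic cleanup, much like
// try-with-resources would give a Java caller of a Closeable writer.
def withResource[R <: Closeable, A](resource: R)(body: R => A): A =
  try body(resource) finally resource.close()
```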





[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...

2015-07-16 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6444#discussion_r34849488
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillMerger.java
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection.unsafe.sort;
+
+import java.io.IOException;
+import java.util.Comparator;
+import java.util.PriorityQueue;
+
+final class UnsafeSorterSpillMerger {
+
+  private final PriorityQueue<UnsafeSorterIterator> priorityQueue;
+
+  public UnsafeSorterSpillMerger(
+  final RecordComparator recordComparator,
+  final PrefixComparator prefixComparator,
+  final int numSpills) {
+    final Comparator<UnsafeSorterIterator> comparator = new Comparator<UnsafeSorterIterator>() {
+
+  @Override
+  public int compare(UnsafeSorterIterator left, UnsafeSorterIterator 
right) {
+final int prefixComparisonResult =
+  prefixComparator.compare(left.getKeyPrefix(), 
right.getKeyPrefix());
+if (prefixComparisonResult == 0) {
+  return recordComparator.compare(
+left.getBaseObject(), left.getBaseOffset(),
+right.getBaseObject(), right.getBaseOffset());
+} else {
+  return prefixComparisonResult;
+}
+  }
+};
+    priorityQueue = new PriorityQueue<UnsafeSorterIterator>(numSpills, comparator);
+  }
+
+  public void addSpill(UnsafeSorterIterator spillReader) throws 
IOException {
+if (spillReader.hasNext()) {
+  spillReader.loadNext();
+}
+priorityQueue.add(spillReader);
--- End diff --

If !spillReader.hasNext(), can the addition be skipped ?
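
Sketching the question in Scala (the real class is Java; this stub only illustrates
the guarded control flow being asked about):

```scala
import scala.collection.mutable

// Minimal stand-in for the spill reader interface.
trait SpillReader { def hasNext: Boolean; def loadNext(): Unit }

// Hypothetical guarded version: an exhausted reader never reaches the merge queue.
def addSpill(queue: mutable.Queue[SpillReader])(reader: SpillReader): Unit = {
  if (reader.hasNext) {
    reader.loadNext()       // position the reader on its first record
    queue.enqueue(reader)   // only non-empty spills are merged
  }
}
```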





[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...

2015-07-16 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6444#discussion_r34850864
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillWriter.java
 ---
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection.unsafe.sort;
+
+import java.io.File;
+import java.io.IOException;
+
+import scala.Tuple2;
+
+import org.apache.spark.executor.ShuffleWriteMetrics;
+import org.apache.spark.serializer.DummySerializerInstance;
+import org.apache.spark.storage.BlockId;
+import org.apache.spark.storage.BlockManager;
+import org.apache.spark.storage.BlockObjectWriter;
+import org.apache.spark.storage.TempLocalBlockId;
+import org.apache.spark.unsafe.PlatformDependent;
+
+/**
+ * Spills a list of sorted records to disk. Spill files have the following 
format:
+ *
+ *   [# of records (int)] [[len (int)][prefix (long)][data (bytes)]...]
+ */
+final class UnsafeSorterSpillWriter {
+
+  static final int DISK_WRITE_BUFFER_SIZE = 1024 * 1024;
+
+  // Small writes to DiskBlockObjectWriter will be fairly inefficient. 
Since there doesn't seem to
+  // be an API to directly transfer bytes from managed memory to the disk 
writer, we buffer
+  // data through a byte array.
+  private byte[] writeBuffer = new byte[DISK_WRITE_BUFFER_SIZE];
+
+  private final File file;
+  private final BlockId blockId;
+  private final int numRecordsToWrite;
+  private BlockObjectWriter writer;
+  private int numRecordsSpilled = 0;
+
+  public UnsafeSorterSpillWriter(
+  BlockManager blockManager,
+  int fileBufferSize,
+  ShuffleWriteMetrics writeMetrics,
+  int numRecordsToWrite) throws IOException {
+    final Tuple2<TempLocalBlockId, File> spilledFileInfo =
+  blockManager.diskBlockManager().createTempLocalBlock();
+this.file = spilledFileInfo._2();
+this.blockId = spilledFileInfo._1();
+this.numRecordsToWrite = numRecordsToWrite;
+// Unfortunately, we need a serializer instance in order to construct 
a DiskBlockObjectWriter.
+// Our write path doesn't actually use this serializer (since we end 
up calling the `write()`
+// OutputStream methods), but DiskBlockObjectWriter still calls some 
methods on it. To work
+// around this, we pass a dummy no-op serializer.
+writer = blockManager.getDiskWriter(
+  blockId, file, DummySerializerInstance.INSTANCE, fileBufferSize, 
writeMetrics);
+// Write the number of records
+writeIntToBuffer(numRecordsToWrite, 0);
+writer.write(writeBuffer, 0, 4);
+  }
+
+  // Based on DataOutputStream.writeLong.
+  private void writeLongToBuffer(long v, int offset) throws IOException {
+    writeBuffer[offset + 0] = (byte)(v >>> 56);
+    writeBuffer[offset + 1] = (byte)(v >>> 48);
+    writeBuffer[offset + 2] = (byte)(v >>> 40);
+    writeBuffer[offset + 3] = (byte)(v >>> 32);
+    writeBuffer[offset + 4] = (byte)(v >>> 24);
+    writeBuffer[offset + 5] = (byte)(v >>> 16);
+    writeBuffer[offset + 6] = (byte)(v >>>  8);
+    writeBuffer[offset + 7] = (byte)(v >>>  0);
+  }
+
+  // Based on DataOutputStream.writeInt.
+  private void writeIntToBuffer(int v, int offset) throws IOException {
+    writeBuffer[offset + 0] = (byte)(v >>> 24);
+    writeBuffer[offset + 1] = (byte)(v >>> 16);
+    writeBuffer[offset + 2] = (byte)(v >>>  8);
+    writeBuffer[offset + 3] = (byte)(v >>>  0);
+  }
+
+  /**
+   * Write a record to a spill file.
+   *
+   * @param baseObject the base object / memory page containing the record
+   * @param baseOffset the base offset which points directly to the record 
data.
+   * @param recordLength the length of the record.
+   * @param keyPrefix a sort key prefix
+   */
+  public void write(
+  Object baseObject,
+  long baseOffset,
+  int recordLength

[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...

2015-07-16 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6444#discussion_r34847659
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 ---
@@ -0,0 +1,282 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection.unsafe.sort;
+
+import java.io.IOException;
+import java.util.LinkedList;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.TaskContext;
+import org.apache.spark.executor.ShuffleWriteMetrics;
+import org.apache.spark.shuffle.ShuffleMemoryManager;
+import org.apache.spark.storage.BlockManager;
+import org.apache.spark.unsafe.PlatformDependent;
+import org.apache.spark.unsafe.memory.MemoryBlock;
+import org.apache.spark.unsafe.memory.TaskMemoryManager;
+import org.apache.spark.util.Utils;
+
+/**
+ * External sorter based on {@link UnsafeInMemorySorter}.
+ */
+public final class UnsafeExternalSorter {
+
+  private final Logger logger = 
LoggerFactory.getLogger(UnsafeExternalSorter.class);
+
+  private static final int PAGE_SIZE = 1 << 27;  // 128 megabytes
+  @VisibleForTesting
+  static final int MAX_RECORD_SIZE = PAGE_SIZE - 4;
+
+  private final PrefixComparator prefixComparator;
+  private final RecordComparator recordComparator;
+  private final int initialSize;
+  private final TaskMemoryManager memoryManager;
+  private final ShuffleMemoryManager shuffleMemoryManager;
+  private final BlockManager blockManager;
+  private final TaskContext taskContext;
+  private ShuffleWriteMetrics writeMetrics;
+
+  /** The buffer size to use when writing spills using 
DiskBlockObjectWriter */
+  private final int fileBufferSizeBytes;
+
+  /**
+   * Memory pages that hold the records being sorted. The pages in this 
list are freed when
+   * spilling, although in principle we could recycle these pages across 
spills (on the other hand,
+   * this might not be necessary if we maintained a pool of re-usable 
pages in the TaskMemoryManager
+   * itself).
+   */
+  private final LinkedList<MemoryBlock> allocatedPages = new LinkedList<MemoryBlock>();
+
+  // These variables are reset after spilling:
+  private UnsafeInMemorySorter sorter;
+  private MemoryBlock currentPage = null;
+  private long currentPagePosition = -1;
+  private long freeSpaceInCurrentPage = 0;
+
+  private final LinkedList<UnsafeSorterSpillWriter> spillWriters = new LinkedList<>();
+
+  public UnsafeExternalSorter(
+  TaskMemoryManager memoryManager,
+  ShuffleMemoryManager shuffleMemoryManager,
+  BlockManager blockManager,
+  TaskContext taskContext,
+  RecordComparator recordComparator,
+  PrefixComparator prefixComparator,
+  int initialSize,
+  SparkConf conf) throws IOException {
+this.memoryManager = memoryManager;
+this.shuffleMemoryManager = shuffleMemoryManager;
+this.blockManager = blockManager;
+this.taskContext = taskContext;
+this.recordComparator = recordComparator;
+this.prefixComparator = prefixComparator;
+this.initialSize = initialSize;
+// Use getSizeAsKb (not bytes) to maintain backwards compatibility for 
units
+    this.fileBufferSizeBytes = (int) conf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024;
+initializeForWriting();
+  }
+
+  // TODO: metrics tracking + integration with shuffle write metrics
+  // need to connect the write metrics to task metrics so we count the 
spill IO somewhere.
+
+  /**
+   * Allocates new sort data structures. Called when creating the sorter 
and after each spill.
+   */
+  private void initializeForWriting() throws IOException {
+this.writeMetrics = new ShuffleWriteMetrics();
+// TODO

[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...

2015-07-16 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6444#discussion_r34849233
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSortDataFormat.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection.unsafe.sort;
+
+import org.apache.spark.util.collection.SortDataFormat;
+
+/**
+ * Supports sorting an array of (record pointer, key prefix) pairs.
+ * Used in {@link UnsafeInMemorySorter}.
+ * <p>
+ * Within each long[] buffer, position {@code 2 * i} holds a pointer to the record at
+ * index {@code i}, while position {@code 2 * i + 1} in the array holds an 
8-byte key prefix.
+ */
+final class UnsafeSortDataFormat extends SortDataFormat<RecordPointerAndKeyPrefix, long[]> {
+
+  public static final UnsafeSortDataFormat INSTANCE = new 
UnsafeSortDataFormat();
+
+  private UnsafeSortDataFormat() { }
+
+  @Override
+  public RecordPointerAndKeyPrefix getKey(long[] data, int pos) {
+// Since we re-use keys, this method shouldn't be called.
+throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public RecordPointerAndKeyPrefix newKey() {
+return new RecordPointerAndKeyPrefix();
+  }
+
+  @Override
+  public RecordPointerAndKeyPrefix getKey(long[] data, int pos, 
RecordPointerAndKeyPrefix reuse) {
+reuse.recordPointer = data[pos * 2];
+reuse.keyPrefix = data[pos * 2 + 1];
+return reuse;
+  }
+
+  @Override
+  public void swap(long[] data, int pos0, int pos1) {
+long tempPointer = data[pos0 * 2];
+long tempKeyPrefix = data[pos0 * 2 + 1];
+data[pos0 * 2] = data[pos1 * 2];
+data[pos0 * 2 + 1] = data[pos1 * 2 + 1];
+data[pos1 * 2] = tempPointer;
+data[pos1 * 2 + 1] = tempKeyPrefix;
+  }
+
+  @Override
+  public void copyElement(long[] src, int srcPos, long[] dst, int dstPos) {
+dst[dstPos * 2] = src[srcPos * 2];
+dst[dstPos * 2 + 1] = src[srcPos * 2 + 1];
+  }
+
+  @Override
+  public void copyRange(long[] src, int srcPos, long[] dst, int dstPos, 
int length) {
+System.arraycopy(src, srcPos * 2, dst, dstPos * 2, length * 2);
+  }
+
+  @Override
+  public long[] allocate(int length) {
+    assert (length < Integer.MAX_VALUE / 2) : "Length " + length + " is too large";
--- End diff --

Should length of (Integer.MAX_VALUE / 2) be allowed ?
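
The arithmetic behind the question (just a check of the boundary, not a claim about
what the assert should be): the buffer stores two longs per element, and two times
Integer.MAX_VALUE / 2 is still below Int.MaxValue, so a length of exactly
Integer.MAX_VALUE / 2 is representable in a single array; whether a JVM will
actually allocate an array that large is a separate concern.

```scala
val maxLen = Int.MaxValue / 2   // 1073741823
val slots  = 2L * maxLen        // 2147483646, one less than Int.MaxValue
assert(slots < Int.MaxValue)
```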





[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] Binary processing so...

2015-07-16 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6444#discussion_r34848070
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 ---
@@ -0,0 +1,282 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection.unsafe.sort;
+
+import java.io.IOException;
+import java.util.LinkedList;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.TaskContext;
+import org.apache.spark.executor.ShuffleWriteMetrics;
+import org.apache.spark.shuffle.ShuffleMemoryManager;
+import org.apache.spark.storage.BlockManager;
+import org.apache.spark.unsafe.PlatformDependent;
+import org.apache.spark.unsafe.memory.MemoryBlock;
+import org.apache.spark.unsafe.memory.TaskMemoryManager;
+import org.apache.spark.util.Utils;
+
+/**
+ * External sorter based on {@link UnsafeInMemorySorter}.
+ */
+public final class UnsafeExternalSorter {
+
+  private final Logger logger = 
LoggerFactory.getLogger(UnsafeExternalSorter.class);
+
+  private static final int PAGE_SIZE = 1 << 27;  // 128 megabytes
+  @VisibleForTesting
+  static final int MAX_RECORD_SIZE = PAGE_SIZE - 4;
+
+  private final PrefixComparator prefixComparator;
+  private final RecordComparator recordComparator;
+  private final int initialSize;
+  private final TaskMemoryManager memoryManager;
+  private final ShuffleMemoryManager shuffleMemoryManager;
+  private final BlockManager blockManager;
+  private final TaskContext taskContext;
+  private ShuffleWriteMetrics writeMetrics;
+
+  /** The buffer size to use when writing spills using 
DiskBlockObjectWriter */
+  private final int fileBufferSizeBytes;
+
+  /**
+   * Memory pages that hold the records being sorted. The pages in this 
list are freed when
+   * spilling, although in principle we could recycle these pages across 
spills (on the other hand,
+   * this might not be necessary if we maintained a pool of re-usable 
pages in the TaskMemoryManager
+   * itself).
+   */
+  private final LinkedList<MemoryBlock> allocatedPages = new LinkedList<MemoryBlock>();
+
+  // These variables are reset after spilling:
+  private UnsafeInMemorySorter sorter;
+  private MemoryBlock currentPage = null;
+  private long currentPagePosition = -1;
+  private long freeSpaceInCurrentPage = 0;
+
+  private final LinkedList<UnsafeSorterSpillWriter> spillWriters = new LinkedList<>();
+
+  public UnsafeExternalSorter(
+  TaskMemoryManager memoryManager,
+  ShuffleMemoryManager shuffleMemoryManager,
+  BlockManager blockManager,
+  TaskContext taskContext,
+  RecordComparator recordComparator,
+  PrefixComparator prefixComparator,
+  int initialSize,
+  SparkConf conf) throws IOException {
+this.memoryManager = memoryManager;
+this.shuffleMemoryManager = shuffleMemoryManager;
+this.blockManager = blockManager;
+this.taskContext = taskContext;
+this.recordComparator = recordComparator;
+this.prefixComparator = prefixComparator;
+this.initialSize = initialSize;
+// Use getSizeAsKb (not bytes) to maintain backwards compatibility for 
units
+    this.fileBufferSizeBytes = (int) conf.getSizeAsKb("spark.shuffle.file.buffer", "32k") * 1024;
+initializeForWriting();
+  }
+
+  // TODO: metrics tracking + integration with shuffle write metrics
+  // need to connect the write metrics to task metrics so we count the 
spill IO somewhere.
+
+  /**
+   * Allocates new sort data structures. Called when creating the sorter 
and after each spill.
+   */
+  private void initializeForWriting() throws IOException {
+this.writeMetrics = new ShuffleWriteMetrics();
+// TODO

[GitHub] spark pull request: [SPARK-9045] Fix Scala 2.11 build break in Uns...

2015-07-14 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/7405#discussion_r34623959
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/AbstractScalaRowIterator.scala 
---
@@ -17,11 +17,14 @@
 
 package org.apache.spark.sql
 
-import org.apache.spark.sql.catalyst.InternalRow
-
 /**
  * Shim to allow us to implement [[scala.Iterator]] in Java. Scala 2.11+ 
has an AbstractIterator
  * class for this, but that class is `private[scala]` in 2.10. We need to 
explicitly fix this to
- * `Row` in order to work around a spurious IntelliJ compiler error.
+ * `Row` in order to work around a spurious IntelliJ compiler error. This 
cannot be an abstract
+ * class because that leads to compilation errors under Scala 2.11.
  */
-private[spark] abstract class AbstractScalaRowIterator extends 
Iterator[InternalRow]
+private[spark] class AbstractScalaRowIterator[T] extends Iterator[T] {
--- End diff --

The 'Abstract' should be dropped from the class name, right ?





[GitHub] spark pull request: [SPARK-6980] [CORE] Akka timeout exceptions in...

2015-07-03 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6205#discussion_r33873268
  
--- Diff: core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala ---
@@ -182,3 +184,109 @@ private[spark] object RpcAddress {
 RpcAddress(host, port)
   }
 }
+
+
+/**
+ * An exception thrown if RpcTimeout modifies a [[TimeoutException]].
+ */
+private[rpc] class RpcTimeoutException(message: String, cause: 
TimeoutException)
+  extends TimeoutException(message) { initCause(cause) }
+
+
+/**
+ * Associates a timeout with a description so that a when a 
TimeoutException occurs, additional
+ * context about the timeout can be amended to the exception message.
+ * @param timeout timeout duration in seconds
+ * @param conf the configuration parameter that controls this timeout
+ */
+private[spark] class RpcTimeout(timeout: FiniteDuration, val conf: String) 
{
+
+  /** Get the timeout duration */
+  def duration: FiniteDuration = timeout
+
+  /** Amends the standard message of TimeoutException to include the 
description */
+  private def createRpcTimeoutException(te: TimeoutException): 
RpcTimeoutException = {
+    new RpcTimeoutException(te.getMessage() + ". This timeout is controlled by " + conf, te)
+  }
+
+  /**
+   * PartialFunction to match a TimeoutException and add the timeout 
description to the message
+   *
+   * @note This can be used in the recover callback of a Future to add to 
a TimeoutException
+   * Example:
+   *    val timeout = new RpcTimeout(5 millis, "short timeout")
+   *Future(throw new 
TimeoutException).recover(timeout.addMessageIfTimeout)
+   */
+  def addMessageIfTimeout[T]: PartialFunction[Throwable, T] = {
+// The exception has already been converted to a RpcTimeoutException 
so just raise it
+case rte: RpcTimeoutException => throw rte
+// Any other TimeoutException get converted to a RpcTimeoutException 
with modified message
+case te: TimeoutException => throw createRpcTimeoutException(te)
+  }
+
+  /**
+   * Wait for the completed result and return it. If the result is not 
available within this
+   * timeout, throw a [[RpcTimeoutException]] to indicate which 
configuration controls the timeout.
+   * @param  awaitable  the `Awaitable` to be awaited
+   * @throws RpcTimeoutException if after waiting for the specified time 
`awaitable`
+   * is still not ready
+   */
+  def awaitResult[T](awaitable: Awaitable[T]): T = {
+try {
+  Await.result(awaitable, duration)
+} catch addMessageIfTimeout
+  }
+}
+
+
+private[spark] object RpcTimeout {
+
+  /**
+   * Lookup the timeout property in the configuration and create
+   * a RpcTimeout with the property key in the description.
+   * @param conf configuration properties containing the timeout
+   * @param timeoutProp property key for the timeout in seconds
+   * @throws NoSuchElementException if property is not set
+   */
+  def apply(conf: SparkConf, timeoutProp: String): RpcTimeout = {
+val timeout = { conf.getTimeAsSeconds(timeoutProp) seconds }
+new RpcTimeout(timeout, timeoutProp)
+  }
+
+  /**
+   * Lookup the timeout property in the configuration and create
+   * a RpcTimeout with the property key in the description.
+   * Uses the given default value if property is not set
+   * @param conf configuration properties containing the timeout
+   * @param timeoutProp property key for the timeout in seconds
+   * @param defaultValue default timeout value in seconds if property not 
found
+   */
+  def apply(conf: SparkConf, timeoutProp: String, defaultValue: String): 
RpcTimeout = {
+val timeout = { conf.getTimeAsSeconds(timeoutProp, defaultValue) 
seconds }
+new RpcTimeout(timeout, timeoutProp)
+  }
+
+  /**
+   * Lookup prioritized list of timeout properties in the configuration
+   * and create a RpcTimeout with the first set property key in the
+   * description.
+   * Uses the given default value if property is not set
+   * @param conf configuration properties containing the timeout
+   * @param timeoutPropList prioritized list of property keys for the 
timeout in seconds
+   * @param defaultValue default timeout value in seconds if no properties 
found
+   */
+  def apply(conf: SparkConf, timeoutPropList: Seq[String], defaultValue: 
String): RpcTimeout = {
+require(timeoutPropList.nonEmpty)
+
+// Find the first set property or use the default value with the first 
property
+val itr = timeoutPropList.iterator
+var foundProp: Option[(String, String)] = None
+while (itr.hasNext && foundProp.isEmpty) {
+  val propKey
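
The quoted helper pairs each timeout with the configuration key that controls it. As a rough usage sketch (not part of the PR, assuming the RpcTimeout class above is visible from the calling code, and with "spark.rpc.askTimeout" as a purely illustrative key):

{code}
// Hedged sketch: exercises awaitResult and addMessageIfTimeout from the diff above.
import java.util.concurrent.TimeoutException
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object RpcTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val timeout = new RpcTimeout(2.seconds, "spark.rpc.askTimeout")

    // Synchronous wait: a plain TimeoutException comes back as an RpcTimeoutException
    // whose message names the controlling configuration key.
    val slow: Future[String] = Future { Thread.sleep(10000); "done" }
    try {
      timeout.awaitResult(slow)
    } catch {
      case e: TimeoutException => println(e.getMessage)
    }

    // Asynchronous variant from the scaladoc: decorate a failed Future via recover.
    Future[String](throw new TimeoutException("ask timed out"))
      .recover(timeout.addMessageIfTimeout)
  }
}
{code}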

[GitHub] spark pull request: [SPARK-7988][STREAMING] Round-robin scheduling...

2015-06-30 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6607#discussion_r33632077
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala
 ---
@@ -271,6 +273,41 @@ class ReceiverTracker(ssc: StreamingContext, 
skipReceiverLaunch: Boolean = false
 }
 
 /**
+ * Get the list of executors excluding driver
+ */
+private def getExecutors(ssc: StreamingContext): List[String] = {
+  val executors = ssc.sparkContext.getExecutorMemoryStatus.map(_._1.split(":")(0)).toList
+  val driver = ssc.sparkContext.getConf.get("spark.driver.host")
+  executors.diff(List(driver))
+}
+
+/** Set host location(s) for each receiver so as to distribute them 
over
+ * executors in a round-robin fashion taking into account 
preferredLocation if set
+ */
+private[streaming] def scheduleReceivers(receivers: Seq[Receiver[_]],
+  executors: List[String]): Array[ArrayBuffer[String]] = {
+  val locations = new Array[ArrayBuffer[String]](receivers.length)
+  var i = 0
+  for (i <- 0 until receivers.length) {
+locations(i) = new ArrayBuffer[String]()
+if (receivers(i).preferredLocation.isDefined) {
+  locations(i) += receivers(i).preferredLocation.get
+}
+  }
+  var count = 0
+  for (i <- 0 until max(receivers.length, executors.length)) {
--- End diff --

Why is max used here ?
receivers.length is not enough ?
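
For comparison, a self-contained sketch of plain round-robin placement (my own example, not the PR's scheduleReceivers): it loops over the receivers only, assigning receiver i the executor at index i % executors.length, which is the behaviour the question above is weighing against the max(...) loop.

{code}
// Standalone sketch of round-robin receiver placement with optional preferred locations.
import scala.collection.mutable.ArrayBuffer

object RoundRobinSketch {
  def schedule(receiverCount: Int,
               preferred: Seq[Option[String]],
               executors: List[String]): Array[ArrayBuffer[String]] = {
    val locations = Array.fill(receiverCount)(new ArrayBuffer[String]())
    for (i <- 0 until receiverCount) {
      preferred(i).foreach(locations(i) += _)              // honor preferredLocation if set
      if (executors.nonEmpty) {
        locations(i) += executors(i % executors.length)    // round-robin over executors
      }
    }
    locations
  }

  def main(args: Array[String]): Unit = {
    val placement = schedule(3, Seq(None, Some("host-2"), None), List("host-1", "host-2", "host-3"))
    placement.zipWithIndex.foreach { case (locs, i) => println(s"receiver $i -> ${locs.mkString(", ")}") }
  }
}
{code}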


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...

2015-06-14 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6793#discussion_r32379130
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala
 ---
@@ -69,6 +69,7 @@ class ArithmeticExpressionSuite extends SparkFunSuite 
with ExpressionEvalHelper
 
 checkEvaluation(-c1, -1.1, row)
 checkEvaluation(c1 + c2, 3.1, row)
+checkDoubleEvaluation(Rand(30), (0.7363714192755834 +- 0.001), row)
--- End diff --

I am in Beijing now.
Except for difficulty of accessing gmail, github is quite slow as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...

2015-06-14 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6793#discussion_r32379114
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala
 ---
@@ -69,6 +69,7 @@ class ArithmeticExpressionSuite extends SparkFunSuite 
with ExpressionEvalHelper
 
 checkEvaluation(-c1, -1.1, row)
 checkEvaluation(c1 + c2, 3.1, row)
+checkDoubleEvaluation(Rand(30), (0.7363714192755834 +- 0.001), row)
--- End diff --

Looking at the tests under sql, I don't see how TaskContext is explicitly 
set.

Creating a new test is fine. The new test would contain a method containing 
one line.
Just want to make sure this is fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...

2015-06-13 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6793#issuecomment-111724468
  
I am trying to figure out how checkEvaluation should be used for the new 
test.

 protected def checkEvaluation(
  expression: Expression, expected: Any, inputRow: Row = EmptyRow): 
Unit = {

w.r.t. Rand(), the expected value is not deterministic.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...

2015-06-13 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6793#issuecomment-111751576
  
Looking at ArithmeticExpressionSuite.scala, it has some checks in the 
following form:
checkDoubleEvaluation(c1 - c2, (-0.9 +- 0.001), row)

This seems to be better fit for checking the return value from Rand()
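
A minimal standalone sketch of that +- style (my own example, not the suite's checkDoubleEvaluation helper): ScalaTest's === accepts a spread, so a non-deterministic double only needs to land within the tolerance. The value below is a hypothetical stand-in for whatever Rand(30) evaluates to.

{code}
import org.scalatest.FunSuite

class ToleranceSketch extends FunSuite {
  test("double value is accepted within a +- spread") {
    val value = 0.7360  // hypothetical evaluation result
    assert(value === (0.7363714192755834 +- 0.001)) // passes when |value - expected| <= 0.001
  }
}
{code}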


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite

2015-06-13 Thread tedyu
Github user tedyu closed the pull request at:

https://github.com/apache/spark/pull/6059


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Fix NullPointerException with functions.rand()

2015-06-12 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/6793

Fix NullPointerException with functions.rand()

This PR fixes the problem reported by Justin Yip in the thread 
'NullPointerException with functions.rand()'

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6793.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6793


commit a1d66c59abc59a1454a53255ce9e9df66dd98f5a
Author: tedyu yuzhih...@gmail.com
Date:   2015-06-12T22:57:42Z

Fix NullPointerException with functions.rand()




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...

2015-06-12 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6793#issuecomment-111668224
  
I looked at UnsafeFixedWidthAggregationMapSuite.scala in expressions 
package.

Is RandomSuite.scala going to test Rand and Randn only ?

A bit more hint is appreciated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...

2015-06-12 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6793#issuecomment-111655373
  
Mind telling me which suite the new test should be added to ?

Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-8336 Fix NullPointerException with funct...

2015-06-12 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6793#issuecomment-111657774
  
At first glance, none of the test suites under 
sql/catalyst/src/test//scala/org/apache/spark/sql seems proper for the new test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7955][Core] Ensure executors with cache...

2015-05-29 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6508#discussion_r31374404
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -101,6 +104,11 @@ private[spark] class ExecutorAllocationManager(
   private val executorIdleTimeoutS = conf.getTimeAsSeconds(
 "spark.dynamicAllocation.executorIdleTimeout", "60s")
 
+  private val cachedExecutorTimeoutS = conf.getTimeAsSeconds(
+"spark.dynamicAllocation.executorIdleTimeout", s"${Long.MaxValue / 60}s")
--- End diff --

Did you mean 'Long.MaxValue / 1000' ?
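
For what it's worth, the arithmetic behind that question (assuming the seconds value is later multiplied by 1000 to get milliseconds, which is an assumption about the surrounding code, not something shown in the diff):

{code}
// Sketch only: a "practically infinite" default expressed in seconds must survive a
// seconds-to-milliseconds conversion without overflowing a Long.
object TimeoutOverflowSketch extends App {
  val bySixty    = Long.MaxValue / 60
  val byThousand = Long.MaxValue / 1000

  println(BigInt(bySixty) * 1000 > BigInt(Long.MaxValue))    // true  -> would overflow a Long
  println(BigInt(byThousand) * 1000 > BigInt(Long.MaxValue)) // false -> still fits in a Long
}
{code}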


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Kryo buffer size configured in mb should be pr...

2015-05-24 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6390#discussion_r30953036
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -50,7 +50,13 @@ class KryoSerializer(conf: SparkConf)
   with Logging
   with Serializable {
 
-  private val bufferSizeKb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k")
+  private val bufferSize = conf.get("spark.kryoserializer.buffer", "64k")
+  private val bufferSizeKb = _
+  if (bufferSize.endsWith("k") || bufferSize.endsWith("kb")) {
+bufferSizeKb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k")
+  } else {
+bufferSizeKb = conf.getSizeAsMb("spark.kryoserializer.buffer", "64k")
--- End diff --

Yes - I was trying to differentiate mb from kb.

Let me regroup the checks.
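
As an illustration of regrouped suffix checks, a standalone sketch that is independent of SparkConf (the sizeAsKb helper below is hypothetical; Spark's own conf.getSizeAsKb already handles these suffixes):

{code}
// Normalize a Kryo buffer size string to kilobytes whether it uses k/kb or m/mb.
object BufferSizeSketch extends App {
  def sizeAsKb(value: String): Long = {
    val v = value.trim.toLowerCase
    if (v.endsWith("kb"))      v.dropRight(2).trim.toLong
    else if (v.endsWith("k"))  v.dropRight(1).trim.toLong
    else if (v.endsWith("mb")) v.dropRight(2).trim.toLong * 1024
    else if (v.endsWith("m"))  v.dropRight(1).trim.toLong * 1024
    else                       v.toLong // assume a bare number is already in KB
  }

  println(sizeAsKb("64k")) // 64
  println(sizeAsKb("8m"))  // 8192
}
{code}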


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Kryo buffer size configured in mb should be pr...

2015-05-24 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/6390

Kryo buffer size configured in mb should be properly supported

@JoshRosen
This PR tried to fix the issue reported by Debasish Das under 'Kryo option 
changed' thread

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6390.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6390


commit 830d0d098f01564339a9568b2e736d233c7378bb
Author: tedyu yuzhih...@gmail.com
Date:   2015-05-24T14:30:34Z

Kryo buffer size configured in mb should be properly supported




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Kryo buffer size configured in mb should be pr...

2015-05-24 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6390#discussion_r30953067
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -50,7 +50,13 @@ class KryoSerializer(conf: SparkConf)
   with Logging
   with Serializable {
 
-  private val bufferSizeKb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k")
+  private val bufferSize = conf.get("spark.kryoserializer.buffer", "64k")
+  private val bufferSizeKb = _
+  if (bufferSize.endsWith("k") || bufferSize.endsWith("kb")) {
+bufferSizeKb = conf.getSizeAsKb("spark.kryoserializer.buffer", "64k")
+  } else {
+bufferSizeKb = conf.getSizeAsMb("spark.kryoserializer.buffer", "64k")
--- End diff --

Let me wait for Debasish to confirm whether he can reproduce the error 
before updating this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Add test which shows Kryo buffer size configur...

2015-05-24 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6390#discussion_r30955789
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -62,6 +62,10 @@ class KryoSerializerSuite extends FunSuite with 
SharedSparkContext {
 val thrown3 = intercept[IllegalArgumentException](new 
KryoSerializer(conf4).newInstance())
 assert(thrown3.getMessage.contains(kryoBufferProperty))
 assert(!thrown3.getMessage.contains(kryoBufferMaxProperty))
+val conf5 = conf.clone()
+conf5.set(kryoBufferProperty, "8m")
--- End diff --

Currently there is no test with legitimate kryoBufferProperty value 
expressed in 'm'

This addition fills the gap.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Fix compilation error due to buildScan() being...

2015-05-18 Thread tedyu
Github user tedyu closed the pull request at:

https://github.com/apache/spark/pull/6246


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Fix compilation error due to buildScan() being...

2015-05-18 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/6246

Fix compilation error due to buildScan() being final

This PR fixes the following compilation error:

[error] 
/home/jenkins/workspace/Spark-Master-Maven-with-YARN/HADOOP_PROFILE/hadoop-2.4/label/centos/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala:174:
 overriding method buildScan in class HadoopFsRelation of type 
(requiredColumns: Array[String], filters: 
Array[org.apache.spark.sql.sources.Filter], inputPaths: 
Array[String])org.apache.spark.rdd.RDD[org.apache.spark.sql.catalyst.expressions.Row];
[error]  method buildScan cannot override final member
[error]   override def buildScan(requiredColumns: Array[String],

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6246


commit e5e94e11a8eabf2913329d365b6424f2b458bd2c
Author: tedyu yuzhih...@gmail.com
Date:   2015-05-18T21:35:06Z

Fix compilation error due to buildScan() being final




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2883] [SQL] ORC data source for Spark S...

2015-05-16 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/6194#discussion_r30459211
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/package.scala ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.sql.{DataFrame, SaveMode}
+
+package object orc {
+  /**
+   * ::Experimental::
+   *
+   * Extra ORC file loading functionality on [[HiveContext]] through 
implicit conversion.
+   *
+   * @since 1.4.0
+   */
+  @Experimental
+  implicit class OrcContext(sqlContext: HiveContext) {
+/**
+ * ::Experimental::
+ *
+ * Loads specified Parquet files, returning the result as a 
[[DataFrame]].
--- End diff --

Parquet -> ORC


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite

2015-05-12 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6059#issuecomment-101334832
  
Ran suite two more times with hadoop-2.4 profile - DriverSuite passed.

Running suite with hadoop-2.3 profile now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite

2015-05-12 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6059#issuecomment-101345262
  
DriverSuite passed with hadoop-2.3 profile as well.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite

2015-05-11 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/6059

SPARK-7355 FlakyTest - o.a.s.DriverSuite

The test passed locally:
{code}
DriverSuite:
- driver should exit after finishing without cleanup (SPARK-530)
Run completed in 32 seconds, 350 milliseconds.
Total number of tests run: 1
Suites: completed 2, aborted 0
Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
{code}

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6059.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6059


commit 58668606cf81a5d72ea9bc080047d21a513b885f
Author: tedyu yuzhih...@gmail.com
Date:   2015-05-11T17:09:05Z

SPARK-7355 FlakyTest - o.a.s.DriverSuite




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite

2015-05-11 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6059#issuecomment-101007216
  
From 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32407/consoleFull
 :
{code}
[info] DriverSuite:
[info] - driver should exit after finishing without cleanup (SPARK-530) (11 
seconds, 740 milliseconds)
{code}
The test passed in the full test suite run.
FYI


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite

2015-05-11 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6059#issuecomment-101024546
  
I am running test suite locally to try to reproduce the test failure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7355 FlakyTest - o.a.s.DriverSuite

2015-05-11 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6059#issuecomment-101073092
  
Ran test suite 3 times where DriverSuite passed every time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Reference fasterxml.jackson.version in sql/cor...

2015-05-09 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6031#issuecomment-100551775
  
We do have green builds now:

https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/2115/HADOOP_PROFILE=hadoop-2.4,label=centos/console


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Upgrade version of jackson-databind in sql/cor...

2015-05-09 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/6028

Upgrade version of jackson-databind in sql/core/pom.xml

Currently version of jackson-databind in sql/core/pom.xml is 2.3.0

This is older than the version specified in root pom.xml

This PR upgrades the version in sql/core/pom.xml so that they're consistent.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6028.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6028


commit 28c83946a02751f368d8dff5f4f89e5e7164deb0
Author: tedyu yuzhih...@gmail.com
Date:   2015-05-09T14:57:48Z

Upgrade version of jackson-databind in sql/core/pom.xml




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Upgrade version of jackson-databind in sql/cor...

2015-05-09 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6028#issuecomment-100501599
  
This PR should get rid of the following test failure:


https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/lastCompletedBuild/testReport/test.org.apache.spark.sql/JavaUDFSuite/udf2Test/

java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.introspect.POJOPropertyBuilder.addField(Lcom/fasterxml/jackson/databind/introspect/AnnotatedField;Lcom/fasterxml/jackson/databind/PropertyName;ZZZ)V
  at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector.com$fasterxml$jackson$module$scala$introspect$ScalaPropertiesCollector$$_addField(ScalaPropertiesCollector.scala:109)
  at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2$$anonfun$apply$11.apply(ScalaPropertiesCollector.scala:100)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Upgrade version of jackson-databind in sql/cor...

2015-05-09 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/6028#issuecomment-100501682
  
@rxin 
Please take a look


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Reference fasterxml.jackson.version in sql/cor...

2015-05-09 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/6031

Reference fasterxml.jackson.version in sql/core/pom.xml



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6031.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6031


commit 28c83946a02751f368d8dff5f4f89e5e7164deb0
Author: tedyu yuzhih...@gmail.com
Date:   2015-05-09T14:57:48Z

Upgrade version of jackson-databind in sql/core/pom.xml

commit ff2a44ffc8f922905ef629507970b6dec6a29790
Author: tedyu yuzhih...@gmail.com
Date:   2015-05-09T18:00:08Z

Merge branch 'master' of github.com:apache/spark

commit 5c2580cae3280efd12ca16d969637b068f82e3b9
Author: tedyu yuzhih...@gmail.com
Date:   2015-05-09T18:01:21Z

Reference fasterxml.jackson.version in sql/core/pom.xml




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7450 Use UNSAFE.getLong() to speed up Bi...

2015-05-07 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/5897#issuecomment-100050095
  
bq. I'm really pedantic

I like that :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7450 Use UNSAFE.getLong() to speed up Bi...

2015-05-07 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/5897#issuecomment-100015784
  
SPARK-7450 has been filed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...

2015-05-06 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/5897#issuecomment-99621943
  
[error] 
/home/jenkins/workspace/SparkPullRequestBuilder@2/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala:769:
 not found: type HiveCompatibilitySuite
[error]   extends HiveCompatibilitySuite with BeforeAndAfter {
[error]   ^
[error] 
/home/jenkins/workspace/SparkPullRequestBuilder@2/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala:828:
 value testCases is not a member of org.scalatest.BeforeAndAfter
[error]   override def testCases: Seq[(String, File)] = 
super.testCases.filter {
[error]   ^

I don't think the above error was related to my change.
BitSetSuite passed:

[info] Test org.apache.spark.unsafe.bitset.BitSetSuite.basicOps started
[info] Test org.apache.spark.unsafe.bitset.BitSetSuite.traversal started
[info] Test run finished: 0 failed, 0 ignored, 2 total, 0.012s



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...

2015-05-04 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/5897#discussion_r29638709
  
--- Diff: 
unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java ---
@@ -71,7 +72,13 @@ public static boolean isSet(Object baseObject, long 
baseOffset, int index) {
* Returns {@code true} if any bit is set.
*/
   public static boolean anySet(Object baseObject, long baseOffset, long 
bitSetWidthInBytes) {
-for (int i = 0; i <= bitSetWidthInBytes; i++) {
+int widthInLong = (int)(bitSetWidthInBytes / SIZE_OF_LONG);
+for (int i = 0; i <= widthInLong; i++) {
+  if (PlatformDependent.UNSAFE.getLong(baseObject, baseOffset + i) != 0) {
+return true;
+  }
+}
+for (int i = (int)(SIZE_OF_LONG * widthInLong); i < bitSetWidthInBytes; i++) {
--- End diff --

I don't think so.
The second loop is for the remaining bytes :-)
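
For readers following along, a self-contained sketch of the word-at-a-time scan plus byte-wise tail being discussed, written against a plain ByteBuffer instead of Unsafe; the 8-byte stride of the long reads is part of this sketch, not a claim about the PR's offsets:

{code}
import java.nio.ByteBuffer

object AnySetSketch extends App {
  def anySet(bytes: Array[Byte]): Boolean = {
    val buf = ByteBuffer.wrap(bytes)
    val widthInLong = bytes.length / 8
    var i = 0
    while (i < widthInLong) {
      if (buf.getLong(i * 8) != 0L) return true   // check a whole 8-byte word at once
      i += 1
    }
    var j = widthInLong * 8
    while (j < bytes.length) {                    // remaining tail bytes
      if (bytes(j) != 0) return true
      j += 1
    }
    false
  }

  println(anySet(new Array[Byte](20)))                           // false: all zero
  println(anySet(Array.fill[Byte](20)(0).updated(19, 1.toByte))) // true: byte set in the tail
}
{code}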


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...

2015-05-04 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/5897#issuecomment-98897971
  
From failed test:
bq. [error] oro#oro;2.0.8!oro.jar origin location must be absolute: 
file:/home/jenkins/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar

Pretty sure the above is not caused by my change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...

2015-05-04 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/5897#discussion_r29638486
  
--- Diff: 
unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java ---
@@ -71,7 +72,13 @@ public static boolean isSet(Object baseObject, long 
baseOffset, int index) {
* Returns {@code true} if any bit is set.
*/
   public static boolean anySet(Object baseObject, long baseOffset, long 
bitSetWidthInBytes) {
-for (int i = 0; i <= bitSetWidthInBytes; i++) {
+long widthInLong = bitSetWidthInBytes / SIZE_OF_LONG;
+for (long i = 0; i <= widthInLong; i++) {
--- End diff --

Addressed the above comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...

2015-05-04 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/5897#discussion_r29639524
  
--- Diff: 
unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java ---
@@ -71,7 +72,13 @@ public static boolean isSet(Object baseObject, long 
baseOffset, int index) {
* Returns {@code true} if any bit is set.
*/
   public static boolean anySet(Object baseObject, long baseOffset, long 
bitSetWidthInBytes) {
-for (int i = 0; i <= bitSetWidthInBytes; i++) {
+int widthInLong = (int)(bitSetWidthInBytes / SIZE_OF_LONG);
+for (int i = 0; i <= widthInLong; i++) {
+  if (PlatformDependent.UNSAFE.getLong(baseObject, baseOffset + i) != 0) {
+return true;
+  }
+}
+for (int i = (int)(SIZE_OF_LONG * widthInLong); i < bitSetWidthInBytes; i++) {
--- End diff --

That case is covered by the first loop.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...

2015-05-04 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/5897

Use UNSAFE.getLong() to speed up BitSetMethods#anySet()

@JoshRosen 
Please take a look

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5897.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5897


commit 3e9b6919ccb8ccf53b5bcb9fe183a1b7fad0e9ef
Author: tedyu yuzhih...@gmail.com
Date:   2015-05-04T23:59:02Z

Use UNSAFE.getLong() to speed up BitSetMethods#anySet()




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Use UNSAFE.getLong() to speed up BitSetMethods...

2015-05-04 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/5897#issuecomment-98913797
  
From 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31809/testReport/junit/org.apache.spark.deploy/SparkSubmitSuite/includes_jars_passed_in_through___jars/
 :

sbt.ForkMain$ForkError: The code passed to failAfter did not complete 
within 60 seconds.
at 
org.scalatest.concurrent.Timeouts$$anonfun$failAfter$1.apply(Timeouts.scala:249)
at 
org.scalatest.concurrent.Timeouts$$anonfun$failAfter$1.apply(Timeouts.scala:249)

I think the above was not related to my change.

bq. This patch adds the following public classes (experimental)

This change doesn't add any public class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Set null as default value for TaskContextImpl#...

2015-05-03 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/5874

Set null as default value for TaskContextImpl#taskMemoryManager

@JoshRosen 
Please take a look.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5874.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5874


commit 4627733ebb5c3b273b607f90e0042f8d47b97cbd
Author: tedyu yuzhih...@gmail.com
Date:   2015-05-03T21:41:27Z

Set null as default value for TaskContextImpl#taskMemoryManager




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Set null as default value for TaskContextImpl#...

2015-05-03 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/5874#issuecomment-98562614
  
From what I can tell (e.g. JavaAPISuite), null taskMemoryManager is passed 
in existing tests.
This doesn't seem to increase the chance of NPE.

I agree with Josh that tests should make use of this default value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Set null as default value for TaskContextImpl#...

2015-05-03 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/5874#issuecomment-98568325
  
I found two places where taskMemoryManager is used:
DAGScheduler#runLocallyWithinThread()
Executor#run()
taskMemoryManager is constructed in both places.

See if I missed any reference.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Set null as default value for TaskContextImpl#...

2015-05-03 Thread tedyu
Github user tedyu closed the pull request at:

https://github.com/apache/spark/pull/5874


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Set null as default value for TaskContextImpl#...

2015-05-03 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/5874#issuecomment-98571954
  
Right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...

2015-04-29 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/5725#issuecomment-97511381
  









**[sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeFixedWidthAggregationMap.java, line 133 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo56lmUPtN4oJEAxApu)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeFixedWidthAggregationMap.java#L133)):
Please include writtenLength, unsafeRow.length and javaRow in the assertion so that we have more information when debugging.

---

**[sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java, line 148 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo57pOAjJQa1nPxxd2y)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L148)):
Condition doesn't match assertion message w.r.t. <=

---

**[unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java, line 23 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo5AmeniFC0_bDQdiyb)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java#L23)):
Declare this class static ?

---

**[unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java, line 26 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo5B0VcBmjKX4TnrZD7)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java#L26)):
nit: in-heap -> on-heap

---

**[unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java, line 24 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo5BT5TTxBoPNV_juFI)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java#L24)):
https://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html doesn't serve your needs ?

---

**[unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java, line 75 \[r14\]](https://reviewable.io:443/reviews/apache/spark/5725#-Jo5C4Ty7HYh-hvVPSEb)** ([raw file](https://github.com/apache/spark/blob/512bd94d463f741170e904ae186e238f997c/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java#L75)):
Maybe use something similar to http://www.docjar.com/docs/api/sun/misc/Unsafe.html#getInt(Object, long) for speedup ?

---

Comments from the [review on Reviewable.io](https://reviewable.io:443/reviews/apache/spark/5725)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-29 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/5704#discussion_r29382639
  
--- Diff: 
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -281,21 +285,30 @@ private[spark] class ExecutorAllocationManager(
*/
   private def addExecutors(maxNumExecutorsNeeded: Int): Int = {
 // Do not request more executors if it would put our target over the 
upper bound
-val currentTarget = targetNumExecutors
-if (currentTarget >= maxNumExecutors) {
-  logDebug(s"Not adding executors because there are already ${executorIds.size} " +
-s"registered and $numExecutorsPending pending executor(s) (limit $maxNumExecutors)")
+if (numExecutorsTarget >= maxNumExecutors) {
+  val numExecutorsPending = numExecutorsTarget - executorIds.size
+  logDebug(s"Not adding executors because there are already ${executorIds.size} registered" +
+s"and ${numExecutorsPending} pending executor(s) (limit $maxNumExecutors)")
--- End diff --

nit: insert a space before 'and'


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...

2015-04-23 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/5673

SPARK-7107 Add parameter for zookeeper.znode.parent to hbase_inputformat...

py

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5673


commit 18d172a95710c85bb5e2afea83f2c41889d71597
Author: tedyu yuzhih...@gmail.com
Date:   2015-04-24T01:12:15Z

SPARK-7107 Add parameter for zookeeper.znode.parent to hbase_inputformat.py




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-6085 Increase default value for memory o...

2015-02-28 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/4836

SPARK-6085 Increase default value for memory overhead



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4836.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4836


commit 1fdd4df059fd013da56619c884e1e3a403605beb
Author: tedyu yuzhih...@gmail.com
Date:   2015-02-28T21:15:46Z

SPARK-6085 Increase default value for memory overhead




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-6085 Increase default value for memory o...

2015-02-28 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/4836#issuecomment-76551769
  
Thanks for the reminder, updated accordingly.





[GitHub] spark pull request: SPARK-6045 RecordWriter should be checked agai...

2015-02-26 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/4794

SPARK-6045 RecordWriter should be checked against null in PairRDDFunctio...

...ns#saveAsNewAPIHadoopDataset
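A minimal sketch of the guard this title describes (simplified stand-in types, not the actual PairRDDFunctions code): only close the writer if it was successfully created.

    object GuardedWriterSketch {
      // Simplified stand-in for a Hadoop RecordWriter.
      final class Writer {
        def write(s: String): Unit = println(s"wrote $s")
        def close(): Unit = println("writer closed")
      }

      // May throw before a writer exists, leaving nothing to close.
      def openWriter(fail: Boolean): Writer =
        if (fail) throw new RuntimeException("open failed") else new Writer

      def save(records: Seq[String], fail: Boolean = false): Unit = {
        var writer: Writer = null
        try {
          writer = openWriter(fail)
          records.foreach(writer.write)
        } finally {
          if (writer != null) writer.close() // the null check the title asks for
        }
      }

      def main(args: Array[String]): Unit = save(Seq("a", "b"))
    }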

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4794.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4794


commit 2d8d4b1fb61f5aa73bbd824e429e4f388bc84511
Author: tedyu yuzhih...@gmail.com
Date:   2015-02-26T20:57:15Z

SPARK-6045 RecordWriter should be checked against null in 
PairRDDFunctions#saveAsNewAPIHadoopDataset







[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-23 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/4021#discussion_r25209770
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulators.scala ---
@@ -320,7 +334,13 @@ private[spark] object Accumulators {
   def add(values: Map[Long, Any]): Unit = synchronized {
     for ((id, value) <- values) {
       if (originals.contains(id)) {
-        originals(id).asInstanceOf[Accumulable[Any, Any]] ++= value
+        // Since we are now storing weak references, we must check whether the underlying data
+        // is valid.
+        originals(id).get match {
+          case Some(accum) => accum.asInstanceOf[Accumulable[Any, Any]] ++= value
+          case None =>
+            throw new IllegalAccessError("Attempted to access garbage collected Accumulator.")
--- End diff --

If the Accumulator is garbage collected, should we log and continue?
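A sketch of that alternative (simplified types, not the actual Accumulators code): warn and skip the update when the weak reference has already been cleared.

    import java.lang.ref.WeakReference
    import scala.collection.mutable

    object AccumulatorGcSketch {
      private val originals = mutable.Map[Long, WeakReference[StringBuilder]]()

      def add(values: Map[Long, String]): Unit = synchronized {
        for ((id, value) <- values) {
          originals.get(id).map(_.get) match {
            case Some(accum) if accum != null => accum.append(value) // still reachable: apply the update
            case _ => println(s"WARN: accumulator $id was garbage collected; dropping update")
          }
        }
      }

      def main(args: Array[String]): Unit = {
        originals(1L) = new WeakReference(new StringBuilder)
        add(Map(1L -> "x", 2L -> "y")) // id 2 has no live accumulator and only logs a warning
      }
    }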





[GitHub] spark pull request: [SPARK-3885] Provide mechanism to remove accum...

2015-02-23 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/4021#issuecomment-75666737
  
Not yet.





[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-11-17 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/3115#issuecomment-63339550
  
Thanks Sean for chiming in.

bq. Most people wouldn't depend on the server
MapReduce-related classes, e.g. TableMapReduceUtil, are in hbase-server.
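For illustration, a sketch of why the examples pull in the server artifact (modeled on Spark's own HBase example; the table name is a placeholder): reading a table goes through TableInputFormat, which is packaged in hbase-server in 0.98, so depending on hbase-client alone is not enough.

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat // lives in hbase-server in 0.98
    import org.apache.spark.{SparkConf, SparkContext}

    object HBaseReadSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HBaseReadSketch").setMaster("local[*]"))
        val conf = HBaseConfiguration.create()
        conf.set(TableInputFormat.INPUT_TABLE, "test_table") // placeholder table name
        val rdd = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
          classOf[ImmutableBytesWritable], classOf[Result])
        println(s"rows: ${rdd.count()}")
        sc.stop()
      }
    }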





[GitHub] spark pull request: Exclude dependency on hbase-annotations module

2014-11-17 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/3286#issuecomment-63357810
  
@pwendell 
I logged SPARK-4455
Let me know if I need to create another pull request due to the merge





[GitHub] spark pull request: SPARK-4455 Exclude dependency on hbase-annotat...

2014-11-17 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/3286#issuecomment-63381658
  
These annotations are used to indicate the audience / stability of HBase 
APIs.
They're not needed by HBase clients.
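The pull request expresses this as a Maven exclusion; purely for illustration, an equivalent sbt fragment (version taken from the 0.98 artifacts discussed in this thread) that keeps the client while dropping the annotations module:

    // build.sbt fragment (illustrative)
    val hbaseVersion = "0.98.4-hadoop2"
    libraryDependencies += "org.apache.hbase" % "hbase-client" % hbaseVersion exclude("org.apache.hbase", "hbase-annotations")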





[GitHub] spark pull request: Exclude dependency on hbase-annotations module

2014-11-15 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/3286

Exclude dependency on hbase-annotations module

@pwendell 
Please take a look

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3286


commit 2f28b088cdfa99fd68989e600c3b6bc856845c80
Author: tedyu yuzhih...@gmail.com
Date:   2014-11-15T15:22:46Z

Exclude dependency on hbase-annotations module







[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-11-05 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/1893#issuecomment-61847146
  
I will create another pull request since my local workspace has become 
stale.





[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-11-05 Thread tedyu
Github user tedyu closed the pull request at:

https://github.com/apache/spark/pull/1893





[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-11-05 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/3115

SPARK-1297 Upgrade HBase dependency to 0.98

@pwendell @rxin 
Please take a look

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3115


commit 2b079c8c5b15643ec85daeb0b9e360d26fdec0e9
Author: tedyu yuzhih...@gmail.com
Date:   2014-11-05T17:27:59Z

SPARK-1297 Upgrade HBase dependency to 0.98







[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-11-05 Thread tedyu
Github user tedyu commented on the pull request:

https://github.com/apache/spark/pull/3115#issuecomment-61866705
  
hbase 0.98 has been declared a stable release.

Since hbase 0.94 is not modularized, compilation against 0.94 or earlier 
releases wouldn't be supported.





[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-08-11 Thread tedyu
GitHub user tedyu opened a pull request:

https://github.com/apache/spark/pull/1893

SPARK-1297 Upgrade HBase dependency to 0.98

Two profiles are added to examples/pom.xml:
hbase-hadoop1 (default)
hbase-hadoop2

I verified that compilation passes with either profile active.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tedyu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1893.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1893


commit 70fb7b4ea8fd7647e4a4ddca4df71521b749521c
Author: tedyu yuzhih...@gmail.com
Date:   2014-08-11T17:21:13Z

SPARK-1297 Upgrade HBase dependency to 0.98







[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-08-11 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/1893#discussion_r16066911
  
--- Diff: examples/pom.xml ---
@@ -45,6 +45,39 @@
       </dependency>
     </dependencies>
   </profile>
+  <profile>
+    <id>hbase-hadoop2</id>
+    <activation>
+      <property>
+        <name>hbase.profile</name>
+        <value>hadoop2</value>
+      </property>
+    </activation>
+    <properties>
+      <protobuf.version>2.5.0</protobuf.version>
--- End diff --

Will drop in next PR





[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-08-11 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/1893#discussion_r16066969
  
--- Diff: examples/pom.xml ---
@@ -45,6 +45,39 @@
 /dependency
   /dependencies
 /profile
+profile
+  idhbase-hadoop2/id
+  activation
+property
+  namehbase.profile/name
+  valuehadoop2/value
+/property
+  /activation
+  properties
+protobuf.version2.5.0/protobuf.version
+hbase.version0.98.4-hadoop2/hbase.version
+  /properties
+  dependencyManagement
+dependencies
+/dependencies
+  /dependencyManagement
+/profile
+profile
+  idhbase-hadoop1/id
+  activation
+property
+  name!hbase.profile/name
+/property
+  /activation
+  properties
+hbase.version0.98.4-hadoop1/hbase.version
--- End diff --

Can this be done when the need comes?





[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-08-11 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/1893#discussion_r16067144
  
--- Diff: examples/pom.xml ---
@@ -110,36 +143,52 @@
       <version>${project.version}</version>
     </dependency>
     <dependency>
-      <groupId>org.apache.hbase</groupId>
-      <artifactId>hbase</artifactId>
-      <version>${hbase.version}</version>
-      <exclusions>
-        <exclusion>
-          <groupId>asm</groupId>
-          <artifactId>asm</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>org.jboss.netty</groupId>
-          <artifactId>netty</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>io.netty</groupId>
-          <artifactId>netty</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>commons-logging</groupId>
-          <artifactId>commons-logging</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>org.jruby</groupId>
-          <artifactId>jruby-complete</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-    <dependency>
       <groupId>org.eclipse.jetty</groupId>
       <artifactId>jetty-server</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-testing-util</artifactId>
+      <version>${hbase.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.jruby</groupId>
+          <artifactId>jruby-complete</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-protocol</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-common</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-client</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
--- End diff --

hbase-server is needed.
Hive's hbase-handler does the same.





[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-08-11 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/1893#discussion_r16068521
  
--- Diff: examples/pom.xml ---
@@ -110,36 +143,52 @@
       <version>${project.version}</version>
     </dependency>
     <dependency>
-      <groupId>org.apache.hbase</groupId>
-      <artifactId>hbase</artifactId>
-      <version>${hbase.version}</version>
-      <exclusions>
-        <exclusion>
-          <groupId>asm</groupId>
-          <artifactId>asm</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>org.jboss.netty</groupId>
-          <artifactId>netty</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>io.netty</groupId>
-          <artifactId>netty</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>commons-logging</groupId>
-          <artifactId>commons-logging</artifactId>
-        </exclusion>
-        <exclusion>
-          <groupId>org.jruby</groupId>
-          <artifactId>jruby-complete</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-    <dependency>
       <groupId>org.eclipse.jetty</groupId>
       <artifactId>jetty-server</artifactId>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-testing-util</artifactId>
+      <version>${hbase.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.jruby</groupId>
+          <artifactId>jruby-complete</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-protocol</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
+      <artifactId>hbase-common</artifactId>
+      <version>${hbase.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hbase</groupId>
--- End diff --

Added exclusions.





[GitHub] spark pull request: SPARK-1297 Upgrade HBase dependency to 0.98

2014-08-11 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/1893#discussion_r16068638
  
--- Diff: examples/pom.xml ---
@@ -45,6 +45,39 @@
       </dependency>
     </dependencies>
   </profile>
+  <profile>
+    <id>hbase-hadoop2</id>
+    <activation>
+      <property>
+        <name>hbase.profile</name>
+        <value>hadoop2</value>
+      </property>
+    </activation>
+    <properties>
+      <protobuf.version>2.5.0</protobuf.version>
+      <hbase.version>0.98.4-hadoop2</hbase.version>
+    </properties>
+    <dependencyManagement>
+      <dependencies>
+      </dependencies>
+    </dependencyManagement>
+  </profile>
+  <profile>
+    <id>hbase-hadoop1</id>
+    <activation>
+      <property>
+        <name>!hbase.profile</name>
+      </property>
+    </activation>
+    <properties>
+      <hbase.version>0.98.4-hadoop1</hbase.version>
--- End diff --

When hbase.version is specified on the command line, that should take effect, right?






[GitHub] spark pull request: SPARK-1127 Add spark-hbase.

2014-03-22 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/194#discussion_r10862801
  
--- Diff: 
external/hbase/src/main/scala/org/apache/spark/nosql/hbase/HBaseUtils.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.nosql.hbase
+
+import org.apache.hadoop.io.Text
+import org.apache.spark.rdd.RDD
+
+/**
+ * A public object that provide HBase supports.
+ * You could save RDD into HBase through 
[[org.apache.spark.nosql.hbase.HBaseUtils.saveAsHBaseTable]] method.
+ */
+object HBaseUtils {
+
+  /**
+   * Save [[org.apache.spark.rdd.RDD[Text]]] as a HBase table
+   * @param rdd [[org.apache.spark.rdd.RDD[Text]]]
+   * @param zkHost the zookeeper hosts. e.g. 
10.232.98.10,10.232.98.11,10.232.98.12
+   * @param zkPort the zookeeper client listening port. e.g. 2181
+   * @param zkNode the zookeeper znode of HBase. e.g. hbase-apache
+   * @param table the name of table which we save records
+   * @param rowkeyType the type of rowkey. 
[[org.apache.spark.nosql.hbase.HBaseType]]
+   * @param columns the column list. 
[[org.apache.spark.nosql.hbase.HBaseColumn]]
+   * @param delimiter the delimiter which used to split record into fields
+   */
+  def saveAsHBaseTable(rdd: RDD[Text], zkHost: String, zkPort: String, 
zkNode: String,
+   table: String, rowkeyType: String, columns: 
List[HBaseColumn], delimiter: Char) {
+val conf = new HBaseConf(zkHost, zkPort, zkNode, table, rowkeyType, 
columns, delimiter)
+
+def writeToHBase(iter: Iterator[Text]) {
+  val writer = new SparkHBaseWriter(conf)
+
+  try {
+writer.init()
+while (iter.hasNext) {
+  val record = iter.next()
+  writer.write(record)
+}
+  } finally {
+writer.close()
+  }
--- End diff --

See my comment in the close() method below.
If init() throws an exception, htable would be null. With an additional null check,
writer.close() should be a no-op.
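A minimal sketch of that pattern (simplified stand-ins, not the PR's SparkHBaseWriter): with the null check, close() becomes a no-op when init() failed before creating the table handle.

    object NullSafeCloseSketch {
      // Simplified stand-in for an HTable handle.
      final class Table { def close(): Unit = println("table closed") }

      class Writer(failInit: Boolean) {
        private var htable: Table = null
        def init(): Unit = {
          if (failInit) throw new RuntimeException("init failed")
          htable = new Table
        }
        // With the null check, close() does nothing if init() threw before htable was set.
        def close(): Unit = if (htable != null) htable.close()
      }

      def main(args: Array[String]): Unit = {
        val writer = new Writer(failInit = true)
        try writer.init() catch { case _: RuntimeException => () }
        writer.close() // safe: nothing to close
      }
    }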



