[GitHub] spark issue #13651: [SPARK-15776][SQL] Divide Expression inside Aggregation ...

2016-06-15 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the issue:

https://github.com/apache/spark/pull/13651
  
LGTM, thank you





[GitHub] spark issue #13678: [SPARK-15824][SQL] Execute WITH .... INSERT ... statemen...

2016-06-15 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the issue:

https://github.com/apache/spark/pull/13678
  
LGTM





[GitHub] spark issue #13524: [SPARK-15776][SQL] Type coercion incorrect

2016-06-12 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the issue:

https://github.com/apache/spark/pull/13524
  
@rxin Done. Please help review, thank you.





[GitHub] spark pull request #13561: [SPARK-15824][SQL] Run 'with ... insert ... selec...

2016-06-08 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/13561

[SPARK-15824][SQL] Run 'with ... insert ... select' failed when use spark 
thriftserver

## What changes were proposed in this pull request?

Dataset.collect calls withNewExecutionId, and InsertIntoHadoopFsRelationCommand also calls withNewExecutionId, so the SQL below throws IllegalArgumentException("spark.sql.execution.id is already set"):
```sql
create table src(k int, v int);
create table src_parquet(k int, v int);
with v as (select 1, 2) insert into table src_parquet select * from src;
```
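
For context, a minimal sketch of the guard involved (an assumed simplified shape, not Spark's actual SQLExecution source): withNewExecutionId refuses to nest, which is why the two nested calls above throw.

```scala
// Simplified sketch of the withNewExecutionId guard (assumed shape, not Spark's source).
object SQLExecutionSketch {
  val EXECUTION_ID_KEY = "spark.sql.execution.id"
  private val localProps = new java.util.concurrent.ConcurrentHashMap[String, String]()

  def withNewExecutionId[T](body: => T): T = {
    if (localProps.containsKey(EXECUTION_ID_KEY)) {
      // The nested case hit by Dataset.collect + InsertIntoHadoopFsRelationCommand.
      throw new IllegalArgumentException(s"$EXECUTION_ID_KEY is already set")
    }
    localProps.put(EXECUTION_ID_KEY, java.util.UUID.randomUUID().toString)
    try body finally localProps.remove(EXECUTION_ID_KEY)
  }

  def main(args: Array[String]): Unit = {
    withNewExecutionId { withNewExecutionId { 42 } } // throws IllegalArgumentException
  }
}
```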
## How was this patch tested?

Will add UT later


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-15824

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13561.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13561


commit 0f66322b748a95bfa4a122c832680c260f8da843
Author: Sephiroth-Lin 
Date:   2016-06-08T12:19:14Z

Run 'with ... insert ... select' failed when use spark thriftserver







[GitHub] spark pull request #13524: [SPARK-15776] Type coercion incorrect

2016-06-06 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/13524

[SPARK-15776] Type coercion incorrect

## What changes were proposed in this pull request?

Update the type coercion order; for details see https://issues.apache.org/jira/browse/SPARK-15776.
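
This is not the JIRA's repro (the details live in SPARK-15776), but as a hedged illustration, one way to see what type coercion produced for an expression is to print the analyzed plan:

```scala
// Assumes a SparkSession named `spark`; prints the analyzed plan so the
// casts inserted by type coercion become visible.
val df = spark.range(4).selectExpr("sum(id / 2)")
println(df.queryExecution.analyzed.numberedTreeString)
```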

## How was this patch tested?

Will add later






You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-15776

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13524.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13524


commit 10b906a5d9d04df515675578df6d624c55b9ea41
Author: Sephiroth-Lin 
Date:   2016-06-06T14:42:42Z

Type coercion incorrect







[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-10-21 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7417#issuecomment-150096989
  
@cloud-fan OK.





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-10-21 Thread Sephiroth-Lin
Github user Sephiroth-Lin closed the pull request at:

https://github.com/apache/spark/pull/7417





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-10-13 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7417#discussion_r41956822
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -274,12 +275,30 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
   }
 
   object CartesianProduct extends Strategy {
+    def getSmallSide(left: LogicalPlan, right: LogicalPlan): BuildSide = {
+      if (right.statistics.sizeInBytes < left.statistics.sizeInBytes) {
+        joins.BuildRight
+      } else {
+        joins.BuildLeft
+      }
+    }
--- End diff --

OK, no problem.





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-10-13 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7417#discussion_r41855351
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/CartesianProduct.scala ---
@@ -28,9 +28,17 @@ import org.apache.spark.sql.execution.metric.SQLMetrics
  * :: DeveloperApi ::
  */
 @DeveloperApi
-case class CartesianProduct(left: SparkPlan, right: SparkPlan) extends BinaryNode {
+case class CartesianProduct(
+    left: SparkPlan,
+    right: SparkPlan,
+    buildSide: BuildSide) extends BinaryNode {
--- End diff --

@yhuai buildSide is just used to know which side is smaller, so we can decide whether we need to swap the order.
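
In other words (a plain-Scala sketch of the idea, not the PR's code): iterate the smaller side in the outer loop, then flip each pair back so the operator still emits columns in (left, right) order.

```scala
// Toy version of the swap: build on the smaller side, flip tuples back afterwards.
def cartesianPreservingOrder[A, B](left: Seq[A], right: Seq[B], buildRight: Boolean): Seq[(A, B)] =
  if (buildRight) for (b <- right; a <- left) yield (a, b) // right is smaller: outer loop on it
  else for (a <- left; b <- right) yield (a, b)
```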





[GitHub] spark pull request: [SPARK-9596][SQL]treat hadoop classes as share...

2015-10-13 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7931#discussion_r41853694
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ---
@@ -124,6 +124,7 @@ private[hive] class IsolatedClientLoader(
     name.contains("slf4j") ||
     name.contains("log4j") ||
     name.startsWith("org.apache.spark.") ||
+    (name.startsWith("org.apache.hadoop.") && !name.startsWith("org.apache.hadoop.hive.")) ||
--- End diff --

The scope is too broad, and without reloading Hadoop there will be only one FileSystem.Cache, which causes [SPARK-11083](https://issues.apache.org/jira/browse/SPARK-11083).
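
For reference, a small sketch of the Hadoop behavior being referenced (standard FileSystem API; the caching behavior is as I understand it): FileSystem.get hands back a cached instance, so sharing Hadoop classes across the isolated classloaders means one process-wide cache.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// FileSystem.get caches instances per (scheme, authority, ugi); two calls with
// the same Configuration return the same object. With Hadoop classes shared,
// that cache is global, so a close() in one Hive client leaks into the others.
val conf = new Configuration()
val fs1 = FileSystem.get(conf)
val fs2 = FileSystem.get(conf)
println(fs1 eq fs2) // true: same cached instance
```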





[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...

2015-09-17 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7853#issuecomment-141082236
  
@andrewor14 I have set `stopped` to private[spark]. @liancheng @yhuai, any thoughts?





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-09-08 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7417#issuecomment-138554397
  
@scwf Done. @zsxwing Updated the code.





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-09-02 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7417#discussion_r38504238
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/CartesianProduct.scala ---
@@ -27,16 +27,27 @@ import org.apache.spark.sql.execution.{BinaryNode, SparkPlan}
  * :: DeveloperApi ::
  */
 @DeveloperApi
-case class CartesianProduct(left: SparkPlan, right: SparkPlan) extends BinaryNode {
+case class CartesianProduct(
+    left: SparkPlan,
+    right: SparkPlan,
+    buildSide: BuildSide) extends BinaryNode {
   override def output: Seq[Attribute] = left.output ++ right.output
 
+  private val (small, big) = buildSide match {
+    case BuildRight => (left, right)
+    case BuildLeft => (right, left)
+  }
+
   protected override def doExecute(): RDD[InternalRow] = {
-    val leftResults = left.execute().map(_.copy())
-    val rightResults = right.execute().map(_.copy())
+    val leftResults = small.execute().map(_.copy())
+    val rightResults = big.execute().map(_.copy())
--- End diff --

@davies Sorry, I'm not quite clear on this. Do you mean we can use zipPartitions() to get the two iterators, then do the cartesian product ourselves with those two iterators instead of calling cartesian()?
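
My reading of the suggestion, as a hedged sketch (broadcastCartesian is a made-up helper, not the PR's code): collect and broadcast the small side, then pair it with each partition of the big side, instead of calling RDD.cartesian().

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper: broadcast the small side's rows and do the pairing
// ourselves inside mapPartitions, avoiding cartesian()'s extra shuffle/copies.
def broadcastCartesian[A: ClassTag, B: ClassTag](small: RDD[A], big: RDD[B]): RDD[(A, B)] = {
  val smallRows = small.sparkContext.broadcast(small.collect())
  big.mapPartitions { bigIter =>
    bigIter.flatMap(b => smallRows.value.iterator.map(a => (a, b)))
  }
}
```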





[GitHub] spark pull request: [SPARK-9519][Yarn] Confirm stop sc successfull...

2015-08-04 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7846#issuecomment-127815406
  
@vanzin @srowen Updated, thank you!





[GitHub] spark pull request: [SPARK-9519][Yarn] Confirm stop sc successfull...

2015-08-01 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7846#issuecomment-12629
  
Yes, this change doesn't stop this sequence from happening. Since the monitor thread is a daemon thread, we don't need to call interrupt when sc.stop() is called.
Two points I am not very clear on:
1. there's still a race condition
2. the thread can have a "stop" method that interrupts it only if it's blocked in monitorApplication
Thank you!
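
For what it's worth, a simplified sketch of the pattern point 2 describes (plain Scala, not the Spark source): a monitor thread whose stop method sets a flag and interrupts only to break the thread out of its blocking wait.

```scala
// Toy monitor thread: stopMonitor() sets a flag and interrupts only to unblock
// the thread if it is parked inside its (stand-in for) monitorApplication loop.
class MonitorThread extends Thread {
  @volatile private var stopped = false
  override def run(): Unit =
    try {
      while (!stopped) Thread.sleep(1000) // stand-in for monitorApplication()
    } catch { case _: InterruptedException => () } // asked to stop while blocked
  def stopMonitor(): Unit = { stopped = true; interrupt() }
}
```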






[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...

2015-08-01 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/7853

[SPARK-9522][SQL] SparkSubmit process can not exit if kill application when 
HiveThriftServer was starting

When we start HiveThriftServer, we start the SparkContext first and then HiveServer2. If we kill the application while HiveServer2 is starting, the SparkContext stops successfully, but the SparkSubmit process cannot exit.
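
The underlying JVM behavior, as a generic illustration (not the thriftserver code): any live non-daemon thread keeps the process alive after main returns.

```scala
// A leftover non-daemon thread prevents JVM exit, which is how SparkSubmit
// can hang even after the SparkContext has stopped cleanly.
object HangDemo {
  def main(args: Array[String]): Unit = {
    val t = new Thread(new Runnable {
      override def run(): Unit = Thread.sleep(Long.MaxValue)
    })
    t.setDaemon(false) // the default; a daemon thread would not block exit
    t.start()
    println("main returns, but the process stays alive because of t")
  }
}
```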

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-9522

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7853.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7853


commit a48482c803a74b2e51a0257b8f2185ff9136559c
Author: linweizhong 
Date:   2015-08-01T08:26:12Z

SparkSubmit process can not exit if kill application when HiveThriftServer
was starting







[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-08-01 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7417#issuecomment-126880682
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-9519][Yarn] Confirm stop sc successfull...

2015-08-01 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7846#issuecomment-126879883
  
@srowen We need to call interrupt in YarnClientSchedulerBackend.stop(); for details see PR #5305 and PR #3143. So even if we call sc.stop() in the finally block of the monitor thread, it still cannot stop successfully.





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-31 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7417#issuecomment-126856094
  
@hvanhovell Good suggestion, thank you, updated.





[GitHub] spark pull request: [SPARK-9519][Yarn] Confirm stop sc successfull...

2015-07-31 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/7846

[SPARK-9519][Yarn] Confirm stop sc successfully when application was killed

Currently, when we kill an application on Yarn, sc.stop() is called from the Yarn application state monitor thread; YarnClientSchedulerBackend.stop() then calls interrupt, which causes the SparkContext not to stop fully, since we wait for the executors to exit.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-9519

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7846.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7846


commit 243d2c79b7587e33bf32d8df3c5adcbe6fa9b251
Author: linweizhong 
Date:   2015-08-01T03:05:21Z

Confirm stop sc successfully when application was killed







[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-22 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7417#issuecomment-123925858
  
@hvanhovell I used TPC-DS to test; for the SQL below:
```
with single_value as (
  select 1 tpcds_val from date_dim
)
select sum(ss_quantity * ss_sales_price) ssales, tpcds_val
from store_sales, single_value
group by tpcds_val
```
with this patch the query ran in 1h 55min; without it, just half the tasks took 16.7h





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-21 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7417#discussion_r35180395
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastCartesianProduct.scala
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.joins
+
+import scala.concurrent._
+import scala.concurrent.duration._
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, JoinedRow}
+import org.apache.spark.sql.execution.{BinaryNode, SparkPlan}
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class BroadcastCartesianProduct(
--- End diff --

BroadcastNestedLoopJoin is just used for outer joins, right? But this is used for the cartesian product.





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-15 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7417#discussion_r34754893
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/CartesianProduct.scala ---
@@ -34,7 +34,15 @@ case class CartesianProduct(left: SparkPlan, right: SparkPlan) extends BinaryNode
     val leftResults = left.execute().map(_.copy())
     val rightResults = right.execute().map(_.copy())
 
-    leftResults.cartesian(rightResults).mapPartitions { iter =>
+    val cartesianRdd = if (leftResults.partitions.size > rightResults.partitions.size) {
+      rightResults.cartesian(leftResults).mapPartitions { iter =>
+        iter.map(tuple => (tuple._2, tuple._1))
+      }
+    } else {
+      leftResults.cartesian(rightResults)
+    }
+
+    cartesianRdd.mapPartitions { iter =>
       val joinedRow = new JoinedRow
--- End diff --

@hvanhovell Yes, using sizeInBytes is better, but it also has a problem: if leftResults has only 1 record and that record is large, while rightResults has many records whose total size is small, this scenario will cause worse performance. The best way would be to check the total record count per partition, but we cannot get that right now.
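
A toy model of that trade-off (all numbers invented):

```scala
// One huge record on the left vs. many tiny records on the right:
// choosing the build side by bytes and by row count disagree.
case class Stats(rows: Long, sizeInBytes: Long)
val leftStats  = Stats(rows = 1L,       sizeInBytes = 100L << 20) // 1 row, ~100 MB
val rightStats = Stats(rows = 1000000L, sizeInBytes = 10L << 20)  // 1M rows, ~10 MB
val smallByBytes = if (rightStats.sizeInBytes < leftStats.sizeInBytes) "right" else "left"
val smallByRows  = if (rightStats.rows < leftStats.rows) "right" else "left"
println(s"by bytes: $smallByBytes, by rows: $smallByRows") // by bytes: right, by rows: left
```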





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-15 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7417#issuecomment-121588200
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-9066][SQL] Improve cartesian performanc...

2015-07-15 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/7417

[SPARK-9066][SQL] Improve cartesian performance

see jira https://issues.apache.org/jira/browse/SPARK-9066

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-9066

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7417


commit 0a620989e1e857ba9c84389493dc5f45a29450f6
Author: linweizhong 
Date:   2015-07-15T09:17:01Z

Optimize cartesian order







[GitHub] spark pull request: [SPARK-8811][SQL] Read array struct data from ...

2015-07-08 Thread Sephiroth-Lin
Github user Sephiroth-Lin closed the pull request at:

https://github.com/apache/spark/pull/7209





[GitHub] spark pull request: [SPARK-8811][SQL] Read array struct data from ...

2015-07-08 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7209#issuecomment-119504817
  
@liancheng OK, no problem. Thank you!





[GitHub] spark pull request: [SPARK-8811][SQL] Read array struct data from ...

2015-07-06 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7209#issuecomment-119064179
  
@liancheng I have updated, please help to review, thank you!





[GitHub] spark pull request: [SPARK-8811][SQL] Read array struct data from ...

2015-07-05 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/7209#issuecomment-118699916
  
@liancheng OK, good, thank you.





[GitHub] spark pull request: [SPARK-8811][SQL] Read array struct data from ...

2015-07-03 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/7209

[SPARK-8811][SQL] Read array struct data from parquet error

JIRA:https://issues.apache.org/jira/browse/SPARK-8811

For example:
we have a table:

```sql
t1(c1 string, c2 string,
   arr_c1 array<struct<in_c1:string,in_c2:string>>,
   arr_c2 array<struct<in_c1:string,in_c2:string>>)
```

and we save the data in parquet. For `select * from t1`, the fileSchema in parquet may be:

```
message hive_schema {
  optional binary c1;
  optional binary c2;
  optional group arr_c1 (LIST) {
    repeated group bag {
      optional group array_element {
        optional binary IN_C1;
        optional binary IN_C2;
      }
    }
  }
  optional group arr_c2 (LIST) {
    repeated group bag {
      optional group array_element {
        optional binary IN_C1;
        optional binary IN_C2;
      }
    }
  }
}
```

but the requestSchema is:

```
message root {
  optional binary c1;
  optional binary c2;
  optional group arr_c1 (LIST) {
    repeated group bag {
      optional group element {
        optional binary IN_C1;
        optional binary IN_C2;
      }
    }
  }
  optional group arr_c2 (LIST) {
    repeated group bag {
      optional group element {
        optional binary IN_C1;
        optional binary IN_C2;
      }
    }
  }
}
```
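
To make the mismatch concrete, a hedged sketch using Parquet's schema parser (the exact package name depends on the Parquet version; org.apache.parquet is assumed here). The only difference is the inner group name, array_element in the file versus element in the request, which is enough to break column resolution:

```scala
import org.apache.parquet.schema.MessageTypeParser

// File schema names the array's inner group "array_element"...
val fileSchema = MessageTypeParser.parseMessageType(
  """message hive_schema {
    |  optional group arr_c1 (LIST) {
    |    repeated group bag { optional group array_element { optional binary in_c1; } }
    |  }
    |}""".stripMargin)

// ...while the requested schema names it "element".
val requestSchema = MessageTypeParser.parseMessageType(
  """message root {
    |  optional group arr_c1 (LIST) {
    |    repeated group bag { optional group element { optional binary in_c1; } }
    |  }
    |}""".stripMargin)

println(fileSchema.equals(requestSchema)) // false: the group names differ
```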


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-8811

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7209


commit ecd25477abd6735514ab48549a4a937bf6d00f42
Author: linweizhong 
Date:   2015-07-03T07:55:00Z

Change schema for array type from element to array_element







[GitHub] spark pull request: [SPARK-8162][BUILD] Run spark-shell cause Null...

2015-06-08 Thread Sephiroth-Lin
Github user Sephiroth-Lin closed the pull request at:

https://github.com/apache/spark/pull/6704





[GitHub] spark pull request: [SPARK-8162][BUILD] Run spark-shell cause Null...

2015-06-08 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/6704#issuecomment-110231290
  
Closing this for now, as PR #6711 can fix the NPE; if we find the root cause of why the `@VisibleForTesting` annotation causes an NPE in the shell, we can reopen it.





[GitHub] spark pull request: [SPARK-8162][BUILD] Run spark-shell cause Null...

2015-06-08 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/6704#issuecomment-109965178
  
@srowen I built Spark with the command **`mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.7.0 -Phive -Phive-thriftserver -Psparkr -DskipTests package`** and ran spark-shell with the command **`./bin/spark-shell --master yarn-client`**

* Maven: 3.x
* JDK: 1.8.0_40
* OS: SUSE 11 SP3






[GitHub] spark pull request: Run spark-shell cause NullPointerException

2015-06-08 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/6704

Run spark-shell cause NullPointerException

see jira https://issues.apache.org/jira/browse/SPARK-8162
JDK: 1.8.0_40
Hadoop: 2.7.0

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-8162

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6704


commit 9e8a3fec918d6d11dada6f1bf2db94df8b668537
Author: linweizhong 
Date:   2015-06-08T11:07:17Z

Add com.google.common.annotations.VisibleForTesting to assembly jar as we
need it







[GitHub] spark pull request: [SPARK-7705][Yarn] Cleanup of .sparkStaging di...

2015-06-07 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/6409#issuecomment-109820635
  
@srowen @vanzin This PR can clean up correctly. I just mean that without this PR, even if we add a KILLED status check in ApplicationMaster, it cannot clean up when the application is killed.





[GitHub] spark pull request: [SPARK-7705][Yarn] Cleanup of .sparkStaging di...

2015-06-02 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/6409#issuecomment-108165847
  
@vanzin I have tested again; below is the final status when we use yarn to kill the application:

 \            | YARN UI | Driver Log | AppMaster Log
--------------|---------|------------|--------------
 yarn-client  | KILLED  | KILLED     | FAILED
 yarn-cluster | KILLED  |            | UNDEFINED





[GitHub] spark pull request: [SPARK-7705][Yarn] Cleanup of .sparkStaging di...

2015-06-02 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/6409#discussion_r31500453
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -91,51 +91,54 @@ private[spark] class Client(
    * available in the alpha API.
    */
   def submitApplication(): ApplicationId = {
-    var appId: ApplicationId = null
+    // Before we submit the current application, clean up the staging directory, as some old
+    // appStagingDirs can not be deleted when those old jobs failed or were killed and so on;
+    // please see SPARK-7705 and SPARK-7503 for details.
+    cleanupStagingDir()
+
+    // Setup the credentials before doing anything else, so we don't have issues at any point.
+    setupCredentials()
+    yarnClient.init(yarnConf)
+    yarnClient.start()
+
+    logInfo("Requesting a new application from cluster with %d NodeManagers"
+      .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
+
+    // Get a new application from our RM
+    val newApp = yarnClient.createApplication()
+    val newAppResponse = newApp.getNewApplicationResponse()
+    val appId = newAppResponse.getApplicationId()
+
+    // Verify whether the cluster has enough resources for our AM
+    verifyClusterResources(newAppResponse)
+
+    // Set up the appropriate contexts to launch our AM
+    val containerContext = createContainerLaunchContext(newAppResponse)
+    val appContext = createApplicationSubmissionContext(newApp, containerContext)
+
+    // Finally, submit and monitor the application
+    logInfo(s"Submitting application ${appId.getId} to ResourceManager")
+    yarnClient.submitApplication(appContext)
+    appId
+  }
+
+  /**
+   * Cleanup all subdirectories of the SPARK_STAGING directory.
+   */
+  private def cleanupStagingDir(): Unit = {
+    val stagingDirPath = new Path(SPARK_STAGING)
--- End diff --

I'm so sorry; thank you for pointing out my mistake.





[GitHub] spark pull request: [SPARK-7705][Yarn] Cleanup of .sparkStaging di...

2015-06-01 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/6409#discussion_r31490416
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -825,6 +813,9 @@ private[spark] class Client(
* throw an appropriate SparkException.
*/
   def run(): Unit = {
+// Cleanup staging director as some appStagingDir can not be deleted 
when job is failed or
+// killed, please see SPARK-7705 for details.
+cleanupStagingDir()
--- End diff --

1. Clean up the old application staging directory before submitting the current application.
2. Yes, if it were called in run() it would not work for yarn-client (yarn-cluster is OK), so it is called in submitApplication().





[GitHub] spark pull request: [SPARK-7705][Yarn] Cleanup of .sparkStaging di...

2015-06-01 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/6409#issuecomment-107399469
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-7705][Yarn] Cleanup of .sparkStaging di...

2015-05-31 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/6409#discussion_r31397611
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -849,6 +852,27 @@ private[spark] class Client(
       }
     }
   }
+
+  private def cleanupStagingDir(): Unit = {
--- End diff --

Yes, we need to refactor, thank you!





[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-05-29 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5643#discussion_r31304130
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastLeftSemiJoinHash.scala ---
@@ -32,36 +32,59 @@ case class BroadcastLeftSemiJoinHash(
     leftKeys: Seq[Expression],
     rightKeys: Seq[Expression],
     left: SparkPlan,
-    right: SparkPlan) extends BinaryNode with HashJoin {
+    right: SparkPlan,
+    condition: Option[Expression]) extends BinaryNode with HashJoin {
 
   override val buildSide: BuildSide = BuildRight
 
   override def output: Seq[Attribute] = left.output
 
+  @transient private lazy val boundCondition =
+    newPredicate(condition.getOrElse(Literal(true)), left.output ++ right.output)
+
   protected override def doExecute(): RDD[Row] = {
     val buildIter = buildPlan.execute().map(_.copy()).collect().toIterator
-    val hashSet = new java.util.HashSet[Row]()
-    var currentRow: Row = null
 
-    // Create a Hash set of buildKeys
-    while (buildIter.hasNext) {
-      currentRow = buildIter.next()
-      val rowKey = buildSideKeyGenerator(currentRow)
-      if (!rowKey.anyNull) {
-        val keyExists = hashSet.contains(rowKey)
-        if (!keyExists) {
-          hashSet.add(rowKey)
+    condition match {
+      case None =>
+        val hashSet = new java.util.HashSet[Row]()
+        var currentRow: Row = null
+
+        // Create a Hash set of buildKeys
+        while (buildIter.hasNext) {
+          currentRow = buildIter.next()
+          val rowKey = buildSideKeyGenerator(currentRow)
+          if (!rowKey.anyNull) {
+            val keyExists = hashSet.contains(rowKey)
+            if (!keyExists) {
+              hashSet.add(rowKey)
+            }
+          }
+        }
-      }
-    }
 
-    val broadcastedRelation = sparkContext.broadcast(hashSet)
+        val broadcastedRelation = sparkContext.broadcast(hashSet)
 
-    streamedPlan.execute().mapPartitions { streamIter =>
-      val joinKeys = streamSideKeyGenerator()
-      streamIter.filter(current => {
-        !joinKeys(current).anyNull && broadcastedRelation.value.contains(joinKeys.currentValue)
-      })
+        streamedPlan.execute().mapPartitions { streamIter =>
+          val joinKeys = streamSideKeyGenerator()
+          streamIter.filter(current => {
+            !joinKeys(current).anyNull && broadcastedRelation.value.contains(joinKeys.currentValue)
+          })
+        }
+      case _ =>
+        val hashRelation = HashedRelation(buildIter, buildSideKeyGenerator)
+        val broadcastedRelation = sparkContext.broadcast(hashRelation)
+
+        streamedPlan.execute().mapPartitions { streamIter =>
+          val joinKeys = streamSideKeyGenerator()
+          val joinedRow = new JoinedRow
+
+          streamIter.filter(current => {
+            val rowBuffer = broadcastedRelation.value.get(joinKeys.currentValue)
--- End diff --

We need to apply the key generator first, before we read currentValue, or we will get null for the first row.
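
A toy version of the pitfall (MutableKeyGen is hypothetical; Spark's streamSideKeyGenerator behaves analogously): currentValue is only populated once the projection has been applied to a row.

```scala
// currentValue starts null; reading it before apply() returns stale/empty state.
class MutableKeyGen {
  private var value: String = _                // null until first apply
  def currentValue: String = value
  def apply(row: String): String = { value = row.take(1); value }
}

val keyGen = new MutableKeyGen
println(keyGen.currentValue)  // null: not applied yet
keyGen("abc")                 // apply first...
println(keyGen.currentValue)  // ...then currentValue is "a"
```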





[GitHub] spark pull request: [SPARK-7705][Yarn] Cleanup of .sparkStaging di...

2015-05-28 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/6409#issuecomment-106286738
  
@tgravescs Yes, it would be better if YARN did it, but for now it doesn't, so as @vanzin said maybe we can do it at launch. Thank you!





[GitHub] spark pull request: [SPARK-7705][Yarn] Cleanup of .sparkStaging di...

2015-05-26 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/6409#issuecomment-105715873
  
@tgravescs I have tested the following: max retries is the default; use yarn -kill to kill the application once it starts running; run SparkPi with parameter 2.

yarn-cluster (with code added to print the final status at ApplicationMaster.scala line 127):

YARN UI | AppMaster Log
--------|--------------
KILLED  | FAILED

yarn-client:

YARN UI | AppMaster Log
--------|--------------
KILLED  | UNDEFINED

@vanzin Yes, this may break application retries; we need to consider it more, and I will try.
@srowen @tgravescs @vanzin Thank you.





[GitHub] spark pull request: [SPARK-7339][PySpark] PySpark shuffle spill me...

2015-05-26 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5887#issuecomment-105516789
  
ping





[GitHub] spark pull request: [SPARK-7705][Yarn] Cleanup of .sparkStaging di...

2015-05-26 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/6409

[SPARK-7705][Yarn] Cleanup of .sparkStaging directory fails if application 
is killed

As I have tested, if we cancel or kill the app the final status may be UNDEFINED, KILLED, or SUCCEEDED, so clean up the staging directory when the AppMaster exits with any final application status.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-7705

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6409.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6409


commit 95595c30cce6708bd0470f66b79b3ed9d66a5d03
Author: linweizhong 
Date:   2015-05-26T12:27:43Z

Cleanup of .sparkStaging directory when AppMaster exit at any final
application status







[GitHub] spark pull request: [SPARK-7339][PySpark] PySpark shuffle spill me...

2015-05-17 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5887#issuecomment-102756602
  
@andrewor14 what's your opinion?





[GitHub] spark pull request: [SPARK-7339][PySpark] PySpark shuffle spill me...

2015-05-14 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5887#issuecomment-101951195
  
@davies what's your opinion now?





[GitHub] spark pull request: [SPARK-7595][SQL] Window will cause resolve fa...

2015-05-13 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/6114#issuecomment-10184
  
@scwf @yhuai Done, thank you!





[GitHub] spark pull request: [SPARK-7595][SQL] Window will cause resolve fa...

2015-05-13 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/6114

[SPARK-7595][SQL] Window will cause resolve failed with self join

For example, with table src(key string, value string) and this SQL:

```sql
with v1 as (select key, count(value) over (partition by key) cnt_val from src),
     v2 as (select v1.key, v1_lag.cnt_val from v1, v1 v1_lag where v1.key = v1_lag.key)
select * from v2 limit 5;
```

analysis fails when resolving conflicting references in the Join:

```
'Limit 5
 'Project [*]
  'Subquery v2
   'Project ['v1.key,'v1_lag.cnt_val]
    'Filter ('v1.key = 'v1_lag.key)
     'Join Inner, None
      Subquery v1
       Project [key#95,cnt_val#94L]
        Window [key#95,value#96], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(value#96) WindowSpecDefinition [key#95], [], ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS cnt_val#94L], WindowSpecDefinition [key#95], [], ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
         Project [key#95,value#96]
          MetastoreRelation default, src, None
      Subquery v1_lag
       Subquery v1
        Project [key#97,cnt_val#94L]
         Window [key#97,value#98], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCount(value#98) WindowSpecDefinition [key#97], [], ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING AS cnt_val#94L], WindowSpecDefinition [key#97], [], ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
          Project [key#97,value#98]
           MetastoreRelation default, src, None

Conflicting attributes: cnt_val#94L
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark spark-7595

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6114.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6114


commit dfe9169c10360417e705516f87bfed29d7eef01d
Author: linweizhong 
Date:   2015-05-13T06:56:16Z

Handle windowExpression with self join







[GitHub] spark pull request: [SPARK-7526][SparkR] Specify ip of RBackend, M...

2015-05-11 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/6053#issuecomment-101105615
  
@shivaram Yes, I also think there should be no problems, as it is not 
system dependent. I will test this on Windows, thank you!





[GitHub] spark pull request: [SPARK-7526][SparkR] Specify ip of RBackend, M...

2015-05-11 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/6053

[SPARK-7526][SparkR] Specify ip of RBackend, MonitorServer and RRDD Socket 
server

These R processes are only used to communicate with the JVM process locally,
so binding to localhost is more reasonable than a wildcard IP.
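
An illustrative sketch of the difference (not the actual RBackend code):
binding a plain server socket to the loopback address instead of the
wildcard address; port 0 lets the OS pick a free port.

    import java.net.{InetAddress, InetSocketAddress, ServerSocket}

    val sock = new ServerSocket()
    sock.bind(new InetSocketAddress(InetAddress.getLoopbackAddress, 0))
    println(s"bound to ${sock.getLocalSocketAddress}") // 127.0.0.1:<port>, not 0.0.0.0
    sock.close()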

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark spark-7526

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6053.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6053


commit 5303af767b21ddeb4e57faeb5774f3ebc498733c
Author: linweizhong 
Date:   2015-05-11T12:54:51Z

bind to localhost rather than wildcard ip







[GitHub] spark pull request: [Minor][PySpark] Set PYTHONPATH to python/lib/...

2015-05-10 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/6047

[Minor][PySpark] Set PYTHONPATH to python/lib/pyspark.zip rather than 
python/pyspark

As of PR #5580 we create pyspark.zip during the build and set PYTHONPATH to
python/lib/pyspark.zip, so update this to keep it consistent.
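
A sketch of the resulting PYTHONPATH (the install location and py4j version
here are illustrative assumptions):

    import java.io.File

    val sparkHome = sys.env.getOrElse("SPARK_HOME", "/opt/spark")
    val pythonPath = Seq(
      s"$sparkHome/python/lib/pyspark.zip",          // was s"$sparkHome/python/pyspark"
      s"$sparkHome/python/lib/py4j-0.8.2.1-src.zip"
    ).mkString(File.pathSeparator)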

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark pyspark_pythonpath

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6047


commit 8cc3d96da953292ae9a34917008ff0536cfb4381
Author: linweizhong 
Date:   2015-05-11T02:35:34Z

Set PYTHONPATH to python/lib/pyspark.zip rather than python/pyspark, as
since PR #5580 we create pyspark.zip during the build







[GitHub] spark pull request: [SPARK-7339][PySpark] PySpark shuffle spill me...

2015-05-05 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5887#discussion_r29729314
  
--- Diff: python/pyspark/shuffle.py ---
@@ -362,7 +362,9 @@ def _spill(self):
 
 self.spills += 1
 gc.collect()  # release the memory as much as possible
-MemoryBytesSpilled += (used_memory - get_used_memory()) << 20
+memorySpilled = used_memory - get_used_memory()
--- End diff --

Updated, thank you!





[GitHub] spark pull request: [SPARK-7339][PySpark] PySpark shuffle spill me...

2015-05-04 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5887#issuecomment-98901992
  
Jenkins retest this please.





[GitHub] spark pull request: [SPARK-7339][PySpark] PySpark shuffle spill me...

2015-05-04 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5887

[SPARK-7339][PySpark] PySpark shuffle spill memory sometimes are not correct



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark spark-7339

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5887.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5887


commit d41672b70c44003ff8c1ad8f3703f6da52c824a4
Author: linweizhong 
Date:   2015-05-04T12:28:28Z

Update MemoryBytesSpilled when memorySpilled > 0







[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-29 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5580#issuecomment-97346388
  
If users don't use make-distribution.sh and just compile Spark with Maven or
sbt, then they don't have pyspark.zip. So do we really not need to do the
zip in the code?





[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-27 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5478#issuecomment-96867560
  
@tgravescs yes





[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-25 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5478#issuecomment-96145643
  
@andrewor14 @sryza what are your opinions? Thanks. @lianhuiwang please
help me review this, thanks.





[GitHub] spark pull request: [PySpark][Minor] Update sql example, so that c...

2015-04-24 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5684

[PySpark][Minor] Update sql example, so that it can read the file correctly

By default Spark will read the file from HDFS if we don't set the scheme.
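
For instance (the paths are illustrative), an explicit scheme removes the
ambiguity:

    val localPath = "file:///opt/spark/examples/src/main/resources/people.json"
    val hdfsPath = "hdfs:///user/spark/people.json" // a bare path resolves against the default FS, typically HDFS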

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark pyspark_example_minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5684


commit 19fe145e7a00574080b91d311376b6d2cdb4254e
Author: linweizhong 
Date:   2015-04-24T09:16:23Z

Update example sql.py, so that it can read the file correctly







[GitHub] spark pull request: [SPARK-5689][Doc] Document what can be run in ...

2015-04-23 Thread Sephiroth-Lin
Github user Sephiroth-Lin closed the pull request at:

https://github.com/apache/spark/pull/5490





[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-22 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5478#issuecomment-95102969
  
@andrewor14 Sorry, I have been busy these days; I have now updated the code. ^-^





[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-19 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5478#issuecomment-94331295
  
@lianhuiwang OK.





[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...

2015-04-17 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5256#issuecomment-93915251
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-16 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5478#issuecomment-93724717
  
@andrewor14 @sryza @WangTaoTheTonic As I have tested again, if we install
Spark on each node, then we can set
spark.executorEnv.PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip
to pass PYTHONPATH to the executors. So this PR is another solution to run
PySpark on YARN if we don't install Spark on each node.
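
The same workaround in SparkConf form, assuming Spark is installed under
/opt/spark on every node (the path and py4j version are illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf().set(
      "spark.executorEnv.PYTHONPATH",
      "/opt/spark/python:/opt/spark/python/lib/py4j-0.8.2.1-src.zip")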





[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-16 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5478#issuecomment-93705830
  
@andrewor14 @sryza Done, thanks.





[GitHub] spark pull request: [SPARK-6604][PySpark]Specify ip of python serv...

2015-04-15 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5256#issuecomment-93650104
  
@srowen OK, thanks.
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-6869][PySpark] Pass PYTHONPATH to execu...

2015-04-15 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5478#issuecomment-93270239
  
@andrewor14 @sryza Yes, assuming that the Python files will already be
present on the slave machines is not very reasonable. But if users want to
use PySpark they must compile Spark with JDK 1.6, and I think most users now
use JDK 1.7+. Maybe a good solution is to package PySpark in another jar
that is automatically shipped by YARN to all containers, and to add this jar
to PYTHONPATH along with the assembly jar.





[GitHub] spark pull request: [SPARK-5689][Doc] Document what can be run in ...

2015-04-13 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5490

[SPARK-5689][Doc] Document what can be run in different YARN modes



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-5689

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5490.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5490


commit 97ba6a8f7a91433f74fe81e1107d649203621192
Author: linweizhong 
Date:   2015-04-13T12:37:07Z

Document what can be run in different YARN modes







[GitHub] spark pull request: [SPARK-6870][Yarn] Catch InterruptedException ...

2015-04-13 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5479#discussion_r28231958
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
 ---
@@ -128,10 +128,14 @@ private[spark] class YarnClientSchedulerBackend(
 assert(client != null && appId != null, "Application has not been 
submitted yet!")
 val t = new Thread {
   override def run() {
-val (state, _) = client.monitorApplication(appId, 
logApplicationReport = false)
-logError(s"Yarn application has already exited with state $state!")
-sc.stop()
-Thread.currentThread().interrupt()
+try {
+  val (state, _) = client.monitorApplication(appId, 
logApplicationReport = false)
--- End diff --

We interrupt the monitor thread when we call stop(), so we don't need to
call sc.stop() again. We add sc.stop() after client.monitorApplication
returns just to make sure the SparkContext is stopped when the app has
finished, failed, or been killed before we stop it ourselves.





[GitHub] spark pull request: [SPARK-6870][Yarn] Catch InterruptedException ...

2015-04-12 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5479#discussion_r28210698
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
 ---
@@ -128,10 +128,14 @@ private[spark] class YarnClientSchedulerBackend(
 assert(client != null && appId != null, "Application has not been 
submitted yet!")
 val t = new Thread {
   override def run() {
-val (state, _) = client.monitorApplication(appId, 
logApplicationReport = false)
-logError(s"Yarn application has already exited with state $state!")
-sc.stop()
-Thread.currentThread().interrupt()
+try {
+  val (state, _) = client.monitorApplication(appId, 
logApplicationReport = false)
--- End diff --

Yes, we don't need to call Thread.currentThread().interrupt() here, but I
think we need to stop the SparkContext. If the user kills the app on YARN,
then we need to stop the SparkContext, right?





[GitHub] spark pull request: [SPARK-6870][Yarn] Catch InterruptedException ...

2015-04-12 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5479

[SPARK-6870][Yarn] Catch InterruptedException when yarn application state 
monitor thread been interrupted

In PR #5305 we interrupt the monitor thread but forgot to catch the
InterruptedException, so the stack trace gets printed in the log; we need
to catch it.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-6870

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5479.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5479


commit 3513fdb943c41e5242ad187dccddad65a9870288
Author: linweizhong 
Date:   2015-04-12T08:13:20Z

Catch InterruptedException

commit 0d8958a28addb68c9263679e898c286cbfdc9eff
Author: linweizhong 
Date:   2015-04-12T08:16:16Z

Update







[GitHub] spark pull request: [SPARK-6869][PySpark] Pass PYTHONPATH to execu...

2015-04-12 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5478

[SPARK-6869][PySpark] Pass PYTHONPATH to executor, so that the executor can
read the pyspark files from the local file system on the executor node

From SPARK-1920 and SPARK-1520 we know PySpark on YARN cannot work when the
assembly jar is packaged by JDK 1.7+, so pass the PYTHONPATH (set in
spark-env.sh) to the executor so that the executor Python process can read
the pyspark files from the local file system rather than from the assembly jar.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-6869

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5478.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5478


commit 413fa25dde845146153a58793ca6b3ec3a820ea8
Author: linweizhong 
Date:   2015-04-12T08:02:43Z

Pass PYTHONPATH to executor







[GitHub] spark pull request: [SPARK-4346][SPARK-3596][YARN] Commonize the m...

2015-04-07 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5305#discussion_r27939765
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
 ---
@@ -127,23 +127,11 @@ private[spark] class YarnClientSchedulerBackend(
 assert(client != null && appId != null, "Application has not been 
submitted yet!")
 val t = new Thread {
   override def run() {
-while (!stopping) {
-  var state: YarnApplicationState = null
-  try {
-val report = client.getApplicationReport(appId)
-state = report.getYarnApplicationState()
-  } catch {
-case e: ApplicationNotFoundException =>
-  state = YarnApplicationState.KILLED
-  }
-  if (state == YarnApplicationState.FINISHED ||
-state == YarnApplicationState.KILLED ||
-state == YarnApplicationState.FAILED) {
-logError(s"Yarn application has already exited with state 
$state!")
-sc.stop()
-stopping = true
-  }
-  Thread.sleep(1000L)
+val (state, _) = client.monitorApplication(appId, 
logApplicationReport = false)
+if (!stopping) {
--- End diff --

Right. We need to interrupt the thread in stop().





[GitHub] spark pull request: [SPARK-4346][SPARK-3596][YARN] Commonize the m...

2015-04-05 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5305#discussion_r2372
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -559,50 +560,56 @@ private[spark] class Client(
 var lastState: YarnApplicationState = null
 while (true) {
   Thread.sleep(interval)
-  val report = getApplicationReport(appId)
-  val state = report.getYarnApplicationState
-
-  if (logApplicationReport) {
-logInfo(s"Application report for $appId (state: $state)")
-val details = Seq[(String, String)](
-  ("client token", getClientToken(report)),
-  ("diagnostics", report.getDiagnostics),
-  ("ApplicationMaster host", report.getHost),
-  ("ApplicationMaster RPC port", report.getRpcPort.toString),
-  ("queue", report.getQueue),
-  ("start time", report.getStartTime.toString),
-  ("final status", report.getFinalApplicationStatus.toString),
-  ("tracking URL", report.getTrackingUrl),
-  ("user", report.getUser)
-)
-
-// Use more loggable format if value is null or empty
-val formattedDetails = details
-  .map { case (k, v) =>
-  val newValue = Option(v).filter(_.nonEmpty).getOrElse("N/A")
-  s"\n\t $k: $newValue" }
-  .mkString("")
-
-// If DEBUG is enabled, log report details every iteration
-// Otherwise, log them every time the application changes state
-if (log.isDebugEnabled) {
-  logDebug(formattedDetails)
-} else if (lastState != state) {
-  logInfo(formattedDetails)
+  try {
+val report = getApplicationReport(appId)
--- End diff --

Done. Thank you!!!





[GitHub] spark pull request: [SPARK-3596][YARN]Support changing the yarn cl...

2015-04-02 Thread Sephiroth-Lin
Github user Sephiroth-Lin closed the pull request at:

https://github.com/apache/spark/pull/5292





[GitHub] spark pull request: [SPARK-3596][YARN]Support changing the yarn cl...

2015-04-02 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5292#discussion_r27647902
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
 ---
@@ -125,6 +125,7 @@ private[spark] class YarnClientSchedulerBackend(
*/
   private def asyncMonitorApplication(): Unit = {
 assert(client != null && appId != null, "Application has not been 
submitted yet!")
+val interval = conf.getLong("spark.yarn.client.progress.pollinterval", 
1000)
--- End diff --

OK





[GitHub] spark pull request: [SPARK-4346][SPARK-3596][YARN] Commonize the m...

2015-04-02 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5305#issuecomment-88838919
  
@srowen unit tests failed when running a Python app in yarn-cluster mode; I
don't think this was caused by this PR. Please ask Jenkins to retest, thank you.





[GitHub] spark pull request: [SPARK-3596][YARN]Support changing the yarn cl...

2015-04-02 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5292#discussion_r27642711
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
 ---
@@ -125,6 +125,7 @@ private[spark] class YarnClientSchedulerBackend(
*/
   private def asyncMonitorApplication(): Unit = {
 assert(client != null && appId != null, "Application has not been 
submitted yet!")
+val interval = conf.getLong("spark.yarn.client.progress.pollinterval", 
1000)
--- End diff --

@srowen Yes, #5305 can solve this issue; maybe we can close this PR first.





[GitHub] spark pull request: [SPARK-3596][YARN]Support changing the yarn cl...

2015-04-02 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5292#discussion_r27636093
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
 ---
@@ -125,6 +125,7 @@ private[spark] class YarnClientSchedulerBackend(
*/
   private def asyncMonitorApplication(): Unit = {
 assert(client != null && appId != null, "Application has not been 
submitted yet!")
+val interval = conf.getLong("spark.yarn.client.progress.pollinterval", 
1000)
--- End diff --

Yeah, you are right. In PR #5305 I use client.monitorApplication, so we can
use "spark.yarn.report.interval" to change the YARN client monitor interval.
Thank you.
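
A sketch of how the interval is then picked up (the key name comes from
this thread; the 1000 ms default mirrors the old hard-coded sleep):

    import org.apache.spark.SparkConf

    val interval = new SparkConf().getLong("spark.yarn.report.interval", 1000L)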





[GitHub] spark pull request: [SPARK-4346][SPARK-3596][YARN] Commonize the m...

2015-04-01 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5305#issuecomment-88752700
  
Jenkins, retest please





[GitHub] spark pull request: [SPARK-1502][YARN]Add config option to not inc...

2015-04-01 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/5294#issuecomment-88708773
  
@tgravescs @srowen @sryza As I have retested, if we don't populate the
Hadoop classpath then it doesn't work in any case. This PR can't solve this
issue; I will close it later, thank you.





[GitHub] spark pull request: [SPARK-1502][YARN]Add config option to not inc...

2015-04-01 Thread Sephiroth-Lin
Github user Sephiroth-Lin closed the pull request at:

https://github.com/apache/spark/pull/5294





[GitHub] spark pull request: [SPARK-4346][SPARK-3596][YARN] Commonize the m...

2015-03-31 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5305

[SPARK-4346][SPARK-3596][YARN] Commonize the monitor logic

1. YarnClientSchedulerBackend.asyncMonitorApplication uses
Client.monitorApplication, so the monitor logic is shared
2. Support changing the yarn client monitor interval, see #5292

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-4346_3596

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5305.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5305


commit 568f46f6cd4ed38ddf8a018d8d532f9be2228045
Author: unknown 
Date:   2015-04-01T05:50:25Z

YarnClientSchedulerBackend.asyncMonitorApplication should be common with
Client.monitorApplication

commit 6b47ff7c21daf0db42e9a7f3233daf90bb70ee63
Author: unknown 
Date:   2015-04-01T06:17:14Z

Update code







[GitHub] spark pull request: [SPARK-3596][YARN]Support changing the yarn cl...

2015-03-31 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5292#discussion_r27540657
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
 ---
@@ -125,6 +125,7 @@ private[spark] class YarnClientSchedulerBackend(
*/
   private def asyncMonitorApplication(): Unit = {
 assert(client != null && appId != null, "Application has not been 
submitted yet!")
+val interval = conf.getLong("spark.yarn.client.progress.pollinterval", 
1000)
--- End diff --

Thank you, but since it is the client that gets the application report from
the RM, maybe "spark.yarn.client.progress.pollinterval" is better.





[GitHub] spark pull request: [SPARK-1502][YARN]Add config option to not inc...

2015-03-31 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5294

[SPARK-1502][YARN]Add config option to not include yarn/mapred cluster 
classpath



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-1502

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5294.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5294


commit 96aa689b8b65ce73e13e4f48a49b85a5f8ed751a
Author: unknown 
Date:   2015-03-31T11:31:13Z

Add config option to not include yarn/mapred cluster classpath







[GitHub] spark pull request: [SPARK-3596][YARN]Support changing the yarn cl...

2015-03-31 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5292

[SPARK-3596][YARN]Support changing the yarn client monitor interval



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-3596

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5292.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5292


commit 7d6c4746986f78a37f31a12b92e0cf14332a01a4
Author: unknown 
Date:   2015-03-31T11:08:26Z

Support changing the yarn client monitor interval







[GitHub] spark pull request: Specify ip of python server socket

2015-03-29 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/5256

Specify ip of python server socket

The driver now starts a server socket bound to a wildcard IP; using
127.0.0.1 is more reasonable, as it is only used by the local Python process.
/cc @davies
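
A sketch of the proposal: bind the helper socket to loopback (port 0 picks
any free port, backlog 1) so that only the local Python process can connect.

    import java.net.{InetAddress, ServerSocket}

    val serverSocket = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"))
    println(serverSocket.getLocalPort)
    serverSocket.close()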

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-6604

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5256


commit c88bee9819eef5a8091357d6a239e9ab61da0050
Author: unknown 
Date:   2015-03-30T06:21:07Z

Specify ip of python server socket







[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-03-05 Thread Sephiroth-Lin
Github user Sephiroth-Lin closed the pull request at:

https://github.com/apache/spark/pull/4620





[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-03-02 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-76895189
  
@srowen ok, please help to close this.





[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-24 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-75919413
  
@srowen as PR #4747 will cache the local root directories, we can close
this PR first. For PR #4747 I think we also need to remove the local root
directories after the application exits or the SparkContext is stopped, or
else we will also create too many empty directories.





[GitHub] spark pull request: [SPARK-5801] [core] Avoid creating nested dire...

2015-02-24 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4747#discussion_r25322767
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -728,6 +746,11 @@ private[spark] object Utils extends Logging {
 localDirs
   }
 
+  /** Used by unit tests. Do not call from other places. */
+  private[spark] def clearLocalRootDirs(): Unit = {
--- End diff --

Maybe we can call this function to delete the local root directory in
non-YARN mode when the application exits or the SparkContext is stopped.





[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-18 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74864163
  
@srowen ok, thank you. If this subdirectory is really needed, maybe we can
add code to delete it after the JVM exits or sc.stop() is called.





[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-18 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74860104
  
@srowen the function getOrCreateLocalRootDirs creates a subdirectory for the
local root dir, so anything that calls getLocalDir (which calls
getOrCreateLocalRootDirs directly) gets a subdirectory of the local root
dir. In the current master branch, creating a tmp dir calls getLocalDir
first, so it creates nested directories. And in standalone mode a tmp dir is
also created first when launching the executor, so in total it creates 4
levels of directories; in other modes it creates 2 levels of directories for
all tmp dirs.





[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-16 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/4620#issuecomment-74604408
  
@srowen yes, this is the same as SPARK-5801. In standalone mode the worker
creates temp directories for the executor, so if we create an unnecessary
directory for the local root directory, then creating the temp directories
produces too many nested directories.

@srowen @andrewor14 from the CI report, the test failure is not caused by
this PR; can you retest it?





[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...

2015-02-15 Thread Sephiroth-Lin
GitHub user Sephiroth-Lin opened a pull request:

https://github.com/apache/spark/pull/4620

[SPARK-5830][Core]Don't create unnecessary directory for local root dir

Currently an unnecessary directory is created for the local root directory,
and this directory is not deleted after the application exits.
For example:
before, the tmp dir was created like "/tmp/spark-UUID";
now the tmp dir is created like "/tmp/spark-UUID/spark-UUID",
so the dir "/tmp/spark-UUID" is never deleted as a local root directory.
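
A toy illustration of the nesting described above:

    import java.io.File
    import java.util.UUID

    def sparkTmpDir(parent: String): File = {
      val d = new File(parent, s"spark-${UUID.randomUUID()}")
      d.mkdirs()
      d
    }

    val root = sparkTmpDir(System.getProperty("java.io.tmpdir")) // /tmp/spark-UUID
    val nested = sparkTmpDir(root.getAbsolutePath)               // /tmp/spark-UUID/spark-UUID
    // only `nested` is cleaned up; `root` is left behind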

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Sephiroth-Lin/spark SPARK-5830

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4620.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4620


commit 916fa04408e4d14b10734537402603b92763ca6d
Author: Sephiroth-Lin 
Date:   2015-02-16T06:36:53Z

Don't create unnecessary directory for local root dir

commit 26670d83fee7c3bc0681ca775ab9f0dbc3da9d2d
Author: Sephiroth-Lin 
Date:   2015-02-16T06:37:37Z

Don't create unnecessary directory for local root dir







[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

2015-02-10 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/4412#issuecomment-73682144
  
@srowen thank you, please help to check again.





[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

2015-02-10 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4412#discussion_r24404265
  
--- Diff: core/src/main/scala/org/apache/spark/HttpFileServer.scala ---
@@ -50,6 +50,15 @@ private[spark] class HttpFileServer(
 
   def stop() {
 httpServer.stop()
+
+// If we only stop sc, but the driver process still runs as a service, then we need to delete
+// the tmp dir; if not, it will create too many tmp dirs
+try {
+  Utils.deleteRecursively(baseDir)
+} catch {
+  case e: Exception =>
+logWarning("Exception while deleting Spark temp dir: " + 
baseDir.getAbsolutePath, e)
--- End diff --

OK.





[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

2015-02-09 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on the pull request:

https://github.com/apache/spark/pull/4412#issuecomment-73655978
  
@srowen thank you, I have now added a member to store a reference to the tmp
dir if it was created; please help to check again.





[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

2015-02-08 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4412#discussion_r24306573
  
--- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
@@ -93,6 +93,19 @@ class SparkEnv (
 // actorSystem.awaitTermination()
 
 // Note that blockTransferService is stopped by BlockManager since it 
is started by it.
+
+// If we only stop sc, but the driver process still runs as a service, then we need to delete
+// the tmp dir; if not, it will create too many tmp dirs.
+// We only need to delete the tmp dir created by the driver, because sparkFilesDir points to the
+// current working dir in the executor, which we do not need to delete.
+if (SparkContext.DRIVER_IDENTIFIER == executorId) {
--- End diff --

@srowen Thank you. If we want to make this much more intimately bound, maybe
we can check the sparkFilesDir directly; otherwise, we need to add a
parameter to the SparkEnv class.





[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...

2015-02-08 Thread Sephiroth-Lin
Github user Sephiroth-Lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/4412#discussion_r24305832
  
--- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala ---
@@ -93,6 +93,19 @@ class SparkEnv (
 // actorSystem.awaitTermination()
 
 // Note that blockTransferService is stopped by BlockManager since it 
is started by it.
+
+// If we only stop sc, but the driver process still runs as a service, then we need to delete
+// the tmp dir; if not, it will create too many tmp dirs.
+// We only need to delete the tmp dir created by the driver, because sparkFilesDir points to the
+// current working dir in the executor, which we do not need to delete.
+if (SparkContext.DRIVER_IDENTIFIER == executorId) {
--- End diff --

@srowen sorry, I'm not quite clear. You mean we cannot use the executorId to
distinguish the driver from the executor?




