[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19086
  
Yes, correct.





[GitHub] spark pull request #19086: [SPARK-21874][SQL] Support changing database when...

2017-09-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19086#discussion_r136747432
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -569,46 +569,51 @@ class SessionCatalog(
   /**
    * Rename a table.
    *
-   * If a database is specified in `oldName`, this will rename the table in that database.
+   * If the database specified in `newName` is different from the one specified in `oldName`,
+   * It will result in moving table across databases.
+   *
    * If no database is specified, this will first attempt to rename a temporary table with
-   * the same name, then, if that does not exist, rename the table in the current database.
+   * the same name, then, if that does not exist, current database will be used for locating
+   * `oldName` or `newName`.
--- End diff --

So should I leave the comment here unchanged?





[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-09-03 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19086
  
@gatorsmile 
I updated; let me know if there are still unresolved comments.
Thanks again for the review.





[GitHub] spark pull request #19086: [SPARK-21874][SQL] Support changing database when...

2017-09-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19086#discussion_r136719337
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -502,17 +502,16 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     val storageWithNewPath = if (rawTable.tableType == MANAGED && hasPathOption) {
       // If it's a managed table with path option and we are renaming it, then the path option
       // becomes inaccurate and we need to update it according to the new table name.
-      val newTablePath = defaultTablePath(TableIdentifier(newName, Some(db)))
+      val newTablePath = defaultTablePath(TableIdentifier(newName, Some(newDb)))
--- End diff --


https://github.com/apache/spark/pull/19086/files#diff-8c4108666a6639034f0ddbfa075bcb37R827
Is this ok?





[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-09-02 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18270
  
Thanks for the notification. Actually, we implement the same logic as Hive, though there's a bug ...





[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-08-31 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19086
  
Thanks, I will refine soon.





[GitHub] spark issue #19086: [SPARK-21874][SQL] Support changing database when rename...

2017-08-30 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19086
  
@gatorsmile 
Thanks for taking the time to look at this. I updated the description.





[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-08-30 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18270
  
Thank you so much !





[GitHub] spark pull request #19086: [SPARK-21874][SQL] Support changing database when...

2017-08-30 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/19086

[SPARK-21874][SQL] Support changing database when rename table.

## What changes were proposed in this pull request?

Support changing the database of a table via `alter table dbA.XXX rename to dbB.YYY`.
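
A minimal sketch of the intended usage, assuming an active `SparkSession` named `spark`; the database and table names are made up:

```
spark.sql("CREATE DATABASE IF NOT EXISTS dbA")
spark.sql("CREATE DATABASE IF NOT EXISTS dbB")
spark.sql("CREATE TABLE dbA.src (id INT)")
// With this patch the rename below moves the table from dbA to dbB;
// previously, renaming into a different database was rejected.
spark.sql("ALTER TABLE dbA.src RENAME TO dbB.dst")
```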

## How was this patch tested?

Updated existing unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-21874

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19086.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19086


commit 7865b3dbe5bc4cda263bc75f8c3524c7bb0fe81c
Author: jinxing <jinxing6...@126.com>
Date:   2017-08-30T03:09:57Z

Support changing database when rename table.







[GitHub] spark issue #18713: [SPARK-21509][SQL] Add a config to enable adaptive query...

2017-08-30 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18713
  
cc @cenyuhai 
As we talked offline, maybe you have an interest in this.





[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-08-30 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18270
  
@gatorsmile 
Could you please give some idea of why the value of `grouping_id()` generated in Spark is different from Hive's `grouping__id`? Is it designed that way on purpose? A lot of our users use `grouping__id` in an `if(...)` clause, and the incompatibility between Spark and Hive is making our migration very difficult.
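
For illustration, the kind of query our users migrate (the table `t` is made up); in Hive they read `grouping__id`, while the analogous Spark function is `grouping_id()`, and the two can produce different values for the same row:

```
spark.sql("""
  SELECT key, value, grouping_id() AS gid, count(*) AS cnt
  FROM t
  GROUP BY key, value WITH ROLLUP
""").show()
// Hive users typically branch on the id, e.g. if(grouping__id = 1, ...),
// which breaks when the value differs between the two engines.
```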





[GitHub] spark pull request #18866: [SPARK-21649][SQL] Support writing data into hive...

2017-08-22 Thread jinxing64
Github user jinxing64 closed the pull request at:

https://github.com/apache/spark/pull/18866





[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-22 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18866
  
@cloud-fan 
Thanks for the reply. It looks like #19001 continues the work on this and is more comprehensive. I will close this PR for now.





[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-08-20 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18270
  
@cenyuhai 
Are you still working on this? Could you please fix the test?





[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-20 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18866
  
@cloud-fan 
Would you give some advice on this, so I can know whether I'm heading in the right direction? I can keep working on it :)





[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-08-20 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18270
  
Jenkins, retest this please.





[GitHub] spark issue #18866: [WIP][SPARK-21649][SQL] Support writing data into hive b...

2017-08-19 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18866
  
In the current change:
1. `ClusteredDistribution` becomes `ClusteredDistribution(clustering: Seq[Expression], clustersOpt: Option[Int] = None, useHiveHash: Boolean = false)` -- a) the number of clusters can be specified, and b) `useHiveHash` indicates whether Hive hash should be used for partitioning (see the sketch after this list).
2. `InsertIntoHiveTable` requires the distribution `ClusteredDistribution(partitionAttributes ++ bucketColumns, Option(bucketSpec.numBuckets), useHiveHash = true)`, which makes sure each bucket has only one file.
3. The partition id is not included in the bucket file name, so only the bucket id decides the order of bucket file names.
4. An empty file is written for an empty bucket.
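
A self-contained sketch of the shape described in item 1; the `Expression` stub stands in for Catalyst's `Expression`, and this mirrors the proposal rather than any merged API:

```
// Stub so the snippet compiles on its own; in Spark this is
// org.apache.spark.sql.catalyst.expressions.Expression.
trait Expression

// Proposed shape: an optional exact cluster count plus a flag selecting Hive's hash.
case class ClusteredDistribution(
    clustering: Seq[Expression],
    clustersOpt: Option[Int] = None,   // exact number of clusters required, if specified
    useHiveHash: Boolean = false)      // partition with Hive hash instead of Murmur3
```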





[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-08 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18866
  
cc @cloud-fan
Would you mind giving some comments? I can keep working on this :)





[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-07 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18866
  
@viirya 
Please take another look when you have time. I've already updated :)





[GitHub] spark pull request #18866: [SPARK-21649][SQL] Support writing data into hive...

2017-08-07 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18866#discussion_r131682897
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
 ---
@@ -534,4 +534,29 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
       }
     }
   }
+
+  test("Insert data into hive bucketized table.") {
+    sql("""
+         |CREATE TABLE bucketizedTable (key int, value string)
+         |CLUSTERED BY (key) SORTED BY (key ASC) into 4 buckets
+         |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
+         |""".stripMargin)
+    val identifier = spark.sessionState.sqlParser.parseTableIdentifier("bucketizedTable")
+    val data = spark.sparkContext.parallelize((0 until 100)
+      .map(i => TestData(i, i.toString))).toDF()
+    data.write.mode(SaveMode.Overwrite).insertInto("bucketizedTable")
+    val dir = spark.sessionState.catalog.defaultTablePath(identifier)
+    val bucketFiles = new File(dir).listFiles().sortWith((a: File, b: File) => {
+      a.getName < b.getName
+    }).filter(file => file.getName.startsWith("part-"))
+    assert(bucketFiles.length === 4)
+    (0 to 3).foreach { bucket =>
+      spark.read.format("text")
+        .load(bucketFiles(bucket).getAbsolutePath)
+        .collect().map(_.getString(0).split("\t")(0).toInt)
+        .zip((bucket to (100, 4))).foreach { case (a, b) =>
+          assert(a === b)
--- End diff --

Yes, that would be better. I've updated, please take another look : )





[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-07 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18866
  
Jenkins, retest this please.





[GitHub] spark pull request #18866: [SPARK-21649][SQL] Support writing data into hive...

2017-08-07 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18866#discussion_r131607680
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ---
@@ -262,7 +262,12 @@ case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int)
    * Returns an expression that will produce a valid partition ID (i.e. non-negative and is less
    * than numPartitions) based on hashing expressions.
    */
-  def partitionIdExpression: Expression = Pmod(new Murmur3Hash(expressions), Literal(numPartitions))
+  def partitionIdExpression(useHiveHash: Boolean = false): Expression =
+    if (useHiveHash) {
+      Pmod(new HiveHash(expressions), Literal(numPartitions))
--- End diff --

@viirya Thanks a lot for the comment!
I compared the code between v0.13 and v1.2.1. In my understanding, there's no compatibility issue: v1.2.1 adds `hashCode` support for more types (`INTERVAL_YEAR_MONTH`, `INTERVAL_DAY_TIME`), and for the existing types they are compatible with each other.





[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-07 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18866
  
I added the unit test referring to https://github.com/apache/hive/blob/branch-1/ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java#L393.
Hive will sort bucket files by file name when doing an SMB join.





[GitHub] spark pull request #18866: [SPARK-21649][SQL] Support writing data into hive...

2017-08-07 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/18866

[SPARK-21649][SQL] Support writing data into hive bucket table.

## What changes were proposed in this pull request?

Support writing to Hive bucketed tables. Spark internally uses Murmur3Hash for partitioning; we can use Hive hash for compatibility when writing to a bucketed table.
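
An illustrative sketch modeled on the PR's test (the table name and data are made up), assuming a Hive-enabled `SparkSession` named `spark` and `import spark.implicits._`:

```
// A Hive table bucketed by `key` into 4 buckets.
spark.sql("""
  CREATE TABLE bucketed_tbl (key INT, value STRING)
  CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
""")

// With this patch, rows are partitioned with Hive's hash function, so each of
// the 4 bucket files holds exactly the keys Hive expects to find there.
Seq((1, "a"), (2, "b"), (3, "c"))
  .toDF("key", "value")
  .write
  .mode("overwrite")
  .insertInto("bucketed_tbl")
```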

## How was this patch tested?

Unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-21649

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18866.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18866


commit 51d2c110d01b8a4ef1d53d144c443e0e9b43817b
Author: jinxing <jinxing6...@126.com>
Date:   2017-08-07T04:12:56Z

Support writing data into hive bucket table.







[GitHub] spark pull request #18713: [SPARK-21509][SQL] Add a config to enable adaptiv...

2017-07-27 Thread jinxing64
Github user jinxing64 closed the pull request at:

https://github.com/apache/spark/pull/18713





[GitHub] spark issue #18713: [SPARK-21509][SQL] Add a config to enable adaptive query...

2017-07-27 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18713
  
Ok, I will close this for now.





[GitHub] spark issue #18735: [SPARK-21530] Update description of spark.shuffle.maxChu...

2017-07-26 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18735
  
@tgravescs 
Thanks, I should be more careful :)





[GitHub] spark pull request #18735: [SPARK-21530] Update description of spark.shuffle...

2017-07-26 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18735#discussion_r129608114
  
--- Diff: docs/configuration.md ---
@@ -636,6 +636,8 @@ Apart from these, the following properties are also available, and may be useful
   Long.MAX_VALUE
   
   The max number of chunks allowed to being transferred at the same time on shuffle service.
+  Note that new coming connections will be closed when the max number is hit. Client should
--- End diff --

Yes, this is much better.





[GitHub] spark issue #18735: [SPARK-21530] Update description of spark.shuffle.maxChu...

2017-07-25 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18735
  
cc @tgravescs @cloud-fan 





[GitHub] spark pull request #18735: [SPARK-21530] Update description of spark.shuffle...

2017-07-25 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/18735

[SPARK-21530] Update description of spark.shuffle.maxChunksBeingTransferred.

## What changes were proposed in this pull request?

Update the description of `spark.shuffle.maxChunksBeingTransferred` to note that new incoming connections will be closed when the max is hit and that the client should have a retry mechanism.
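
A minimal sketch of setting the limit (the value here is arbitrary; by default it is `Long.MAX_VALUE`, i.e. effectively disabled):

```
import org.apache.spark.SparkConf

// Cap the number of shuffle chunks the service transfers concurrently.
// Once the cap is hit, new incoming connections are closed, so the client's
// retry mechanism (e.g. spark.shuffle.io.maxRetries) has to cover the failure.
val conf = new SparkConf()
  .set("spark.shuffle.maxChunksBeingTransferred", "3000")
```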

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-21530

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18735.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18735


commit ad710dd8efc7bf577c196e95e39a31407d8ff3e7
Author: jinxing <jinxing6...@126.com>
Date:   2017-07-26T00:43:16Z

Update description of spark.shuffle.maxChunksBeingTransferred







[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-25 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
@tgravescs 
Thanks for the help.
> I think we should expand the description of the config to say what happens when the limit is hit. Since its not using real flow control a user might set this thinking nothing bad will happen, but its dropping connections so could cause failures if the retries don't work.

Could you give the link to the JIRA? I'm happy to work on a follow-up PR if possible.

For the flow control part, I'm just worried that the queue will be too large and cause memory issues.





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-25 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
Thanks for merging !





[GitHub] spark issue #18713: [SPARK-21509][SQL] Add a config to enable adaptive query...

2017-07-23 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18713
  
cc @cloud-fan @jiangxb1987 





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-23 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
Jenkins, retest this please.





[GitHub] spark pull request #18713: [SPARK-21509][SQL] Add a config to enable adaptiv...

2017-07-22 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/18713

[SPARK-21509][SQL] Add a config to enable adaptive query execution only for 
the last que…

## What changes were proposed in this pull request?
The adaptive query execution feature is a good way to avoid generating too many small files on HDFS, as mentioned in SPARK-16188.
When adaptive query execution is enabled, all shuffles will be coordinated. The drawbacks:
1. It's hard to balance the number of reducers (which decides the processing speed) against the file size on HDFS.
2. It generates some more shuffles (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L101).
3. It generates lots of jobs, which add extra scheduling cost.

We can add a config and enable adaptive query execution only for the last shuffle.
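
A sketch of how such a switch might look; `spark.sql.adaptive.enabled` is the existing flag, while the second key is made up purely for illustration (this PR does not define a config name in the text above):

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.sql.adaptive.enabled", "true")          // existing adaptive execution switch
  .config("spark.sql.adaptive.lastShuffleOnly", "true")  // hypothetical name for the proposed config
  .getOrCreate()
```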

## How was this patch tested?

Unit test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-21509

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18713


commit efb8cd36de0e0b96a8efe855b6243c1f553ce195
Author: jinxing <jinxing6...@126.com>
Date:   2017-07-22T04:22:36Z

Add a config to enable adaptive query execution only for the last query 
execution.







[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-21 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18388#discussion_r128794455
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/server/StreamManager.java
 ---
@@ -83,4 +83,16 @@ public void connectionTerminated(Channel channel) { }
    */
   public void checkAuthorization(TransportClient client, long streamId) { }
 
+  /**
+   * Return the number of chunks being transferred and not finished yet in this StreamManager.
+   */
+  public long chunksBeingTransferred() {
+    return 0;
+  }
+
+  /**
+   * Called when a chunk is successfully sent.
+   */
+  public void chunkSent(long streamId) { }
--- End diff --

Yes, it would be much better





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-20 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
@cloud-fan 
I understand your concern. A `TransportRequestHandler` is for one channel/connection, but we want to track the chunks being sent across all connections, so I guess we must have a manager for all the connections. Currently, all chunks are served from `OneForOneStreamManager`, so I put the logic there.





[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18388#discussion_r128519475
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java
 ---
@@ -145,7 +172,12 @@ private void processStreamRequest(final StreamRequest req) {
     }
 
     if (buf != null) {
-      respond(new StreamResponse(req.streamId, buf.size(), buf));
+      respond(new StreamResponse(req.streamId, buf.size(), buf)).addListener(future -> {
+        if (streamManager instanceof OneForOneStreamManager) {
--- End diff --

The `streamId` sent to `NettyStreamManager` is not of the form "{streamId}_{chunkIndex}", so it would fail to parse?





[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18388#discussion_r128498015
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
 ---
@@ -257,4 +257,7 @@ public Properties cryptoConf() {
     return CryptoUtils.toCryptoConf("spark.network.crypto.config.", conf.getAll());
   }
 
+  public long maxChunksBeingTransferred() {
+    return conf.getLong("spark.network.shuffle.maxChunksBeingTransferred", Long.MAX_VALUE);
--- End diff --

It depends on the memory config of the NodeManager. In our cluster, we would set it to 3G (memory overhead for Netty) / 1k (size of a ChannelOutboundBuffer$Entry) ≈ 3M.
I think it's ok to leave it at MAX_VALUE (by default it's disabled).





[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18388#discussion_r128497296
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java
 ---
@@ -145,7 +172,12 @@ private void processStreamRequest(final StreamRequest req) {
     }
 
     if (buf != null) {
-      respond(new StreamResponse(req.streamId, buf.size(), buf));
+      respond(new StreamResponse(req.streamId, buf.size(), buf)).addListener(future -> {
+        if (streamManager instanceof OneForOneStreamManager) {
--- End diff --

When an executor fetches a file from the driver, it sends a `StreamRequest`, and the `NettyStreamManager` on the driver serves the stream.





[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...

2017-07-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18634#discussion_r128227194
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala
 ---
@@ -356,6 +356,46 @@ class SQLWindowFunctionSuite extends QueryTest with SharedSQLContext {
     spark.catalog.dropTempView("nums")
   }
 
+  test("window function: mutiple window expressions specified by range in a single expression") {
+    val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 2)).toDF("x", "y")
+    nums.createOrReplaceTempView("nums")
+    withTempView("nums") {
+      val expected =
+        Row(1, 1, 1, 4, null, 8, 25) ::
+          Row(1, 3, 4, 9, 1, 12, 24) ::
+          Row(1, 5, 9, 15, 4, 16, 21) ::
+          Row(1, 7, 16, 21, 8, 9, 16) ::
+          Row(1, 9, 25, 16, 12, null, 9) ::
+          Row(0, 2, 2, 6, null, 10, 30) ::
+          Row(0, 4, 6, 12, 2, 14, 28) ::
+          Row(0, 6, 12, 18, 6, 18, 24) ::
+          Row(0, 8, 20, 24, 10, 10, 18) ::
+          Row(0, 10, 30, 18, 14, null, 10) ::
--- End diff --

`expected` is calculated manually. This test is to verify there is no 
behavior change.





[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18388#discussion_r128217700
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java
 ---
@@ -25,6 +25,9 @@
 
 import com.google.common.base.Preconditions;
 import io.netty.channel.Channel;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+
--- End diff --

Ah, sorry..





[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18388#discussion_r128217450
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java
 ---
@@ -139,6 +153,32 @@ public void checkAuthorization(TransportClient client, long streamId) {
     }
   }
 
+  @Override
+  public void chunkSent(Object streamId) {
--- End diff --

Yes, a `long streamId` will be cast to `Long`, right?





[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18388#discussion_r128217046
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java
 ---
@@ -53,9 +56,13 @@
     // that the caller only requests each chunk one at a time, in order.
     int curChunk = 0;
 
+    // Used to keep track of the number of chunks being transferred and not finished yet.
+    AtomicLong chunksBeingTransferred;
--- End diff --

Really? Would we do `chunksBeingTransferred++` and `chunksBeingTransferred--`? I think that's not safe.
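
A minimal sketch of the concern, assuming multiple Netty threads update the counter: a plain `long` with `++`/`--` is a non-atomic read-modify-write and can lose updates, whereas `AtomicLong` (as in the diff above) stays consistent:

```
import java.util.concurrent.atomic.AtomicLong

object ChunkCounterSketch {
  // Thread-safe counter of chunks currently being transferred.
  val chunksBeingTransferred = new AtomicLong(0L)

  def onChunkSendStarted(): Unit = chunksBeingTransferred.incrementAndGet()
  def onChunkSendFinished(): Unit = chunksBeingTransferred.decrementAndGet()
}
```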





[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...

2017-07-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18634#discussion_r128215724
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala
 ---
@@ -359,37 +359,41 @@ class SQLWindowFunctionSuite extends QueryTest with SharedSQLContext {
   test("window function: mutiple window expressions specified by range in a single expression") {
     val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 2)).toDF("x", "y")
     nums.createOrReplaceTempView("nums")
-
-    val expected =
-      Row(1, 1, 1, 4, 25) ::
-        Row(1, 3, 4, 9, 24) ::
-        Row(1, 5, 9, 15, 21) ::
-        Row(1, 7, 16, 21, 16) ::
-        Row(1, 9, 25, 16, 9) ::
-        Row(0, 2, 2, 6, 30) ::
-        Row(0, 4, 6, 12, 28) ::
-        Row(0, 6, 12, 18, 24) ::
-        Row(0, 8, 20, 24, 18) ::
-        Row(0, 10, 30, 18, 10) ::
-        Nil
-
-    val actual = sql(
-      """
-        |SELECT
-        |  y,
-        |  x,
-        |  sum(x) over w1 as history_sum,
-        |  sum(x) over w2 as period_sum,
-        |  sum(x) over w3 as future_sum
-        |FROM nums
-        |WINDOW w1 AS (PARTITION BY y ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
-        |       w2 AS (PARTITION BY y ORDER BY x RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING),
-        |       w3 AS (PARTITION BY y ORDER BY x RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
-      """.stripMargin
-    )
-
-    checkAnswer(actual, expected)
-    spark.catalog.dropTempView("nums")
+    withTempView("nums") {
+      val expected =
+        Row(1, 1, 1, 4, null, 8, 25) ::
--- End diff --

@cloud-fan 
This is null; do you think 0 would be better?





[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...

2017-07-18 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18634#discussion_r128142152
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala
 ---
@@ -356,6 +356,42 @@ class SQLWindowFunctionSuite extends QueryTest with SharedSQLContext {
     spark.catalog.dropTempView("nums")
   }
 
+  test("window function: mutiple window expressions specified by range in a single expression") {
+    val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 2)).toDF("x", "y")
+    nums.createOrReplaceTempView("nums")
--- End diff --

Sure, I will add it later today.





[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...

2017-07-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18634
  
@cloud-fan @jiangxb1987 
Thanks for the help! I will refine and post the result of a manual test later today :)





[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...

2017-07-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18634
  
@jiangxb1987 Thanks a lot for the quick reply!

> One concern is the test case don't reflect the improvement of this change.

Yes, there is no unit test for `WindowFunctionFrame` now. The test case I added just checks correctness.





[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...

2017-07-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18634
  
cc @cloud-fan @jiangxb1987 





[GitHub] spark issue #18388: [SPARK-21175][WIP] Reject OpenBlocks when memory shortag...

2017-07-16 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
@tgravescs 

Thanks a lot for helping with this PR!
I changed this PR; in the current change, the shuffle server tracks the number of chunks being transferred, and the connection is closed when the number is above the watermark. I think it makes sense -- the shuffle server should close the connection rather than OOM when memory pressure is high.

The flow control part is not included in the current change, because I don't have a proper way to balance buffering requests against caching `ChannelOutboundBuffer$Entry`.

> You mentioned a problem with just doing a close, what was the problem?

I just think the user should know there's another condition that can trigger closing the connection. Also, reconnection has some cost.





[GitHub] spark issue #18487: [SPARK-21243][Core] Limit no. of map outputs in a shuffl...

2017-07-14 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18487
  
With `maxReqsInFlight` and `maxBytesInFlight` it is hard to control the number of blocks in a single request. When the number of map outputs is very high, this change can alleviate the pressure on the shuffle server.
@dhruve what is a proper value for `maxBlocksInFlightPerAddress`? I like this PR if there is no performance issue. It would be great if you could post some benchmarks (a config sketch follows below).
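
For reference, a hedged sketch of the reducer-side limits discussed here (the values are arbitrary; `spark.reducer.maxBlocksInFlightPerAddress` is the one proposed in #18487, and `maxBytesInFlight` is exposed as `spark.reducer.maxSizeInFlight`):

```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.reducer.maxReqsInFlight", "10")               // cap on concurrent fetch requests
  .set("spark.reducer.maxSizeInFlight", "48m")              // cap on bytes in flight
  .set("spark.reducer.maxBlocksInFlightPerAddress", "100")  // proposed: cap on blocks per remote address
```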





[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...

2017-07-14 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18634
  
Jenkins, retest this please.





[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...

2017-07-14 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18634
  
retest please





[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...

2017-07-14 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/18634

[SPARK-21414] Refine SlidingWindowFunctionFrame to avoid OOM.

## What changes were proposed in this pull request?

In `SlidingWindowFunctionFrame`, we currently add to the buffer all rows for which the input row value is less than or equal to the output row's upper bound, and then drop from the buffer all rows for which the input row value is smaller than the output row's lower bound.
This can make the buffer very big even though the window is small.
For example:
```
select a, b, sum(a)
over (partition by b order by a range between 100 following and 101 following)
from table
```
We can refine the logic and add only the qualified rows to the buffer.

## How was this patch tested?
Added test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-21414

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18634


commit 5103ae80d456ca26c7b853f99e73c4a2c152ad41
Author: jinxing <jinxing6...@126.com>
Date:   2017-07-12T12:02:28Z

Refine SlidingWindowFunctionFrame to avoid OOM.







[GitHub] spark issue #18388: [SPARK-21175][WIP] Reject OpenBlocks when memory shortag...

2017-07-13 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
@tgravescs 
Thanks a lot for the advice.
> the flow control part should allow everyone to start fetching without
> rejecting a bunch, especially if the network can't push it out that fast anyway.
> For instance only create a handful of those outgoing buffers and wait to
> get successfully sent messages back for the those before creating more.

Regarding flow control, can I ask a question?
With flow control we won't reject `OpenBlocks`, but do we reject 
`ChunkFetchRequest`? If we don't reject `ChunkFetchRequest`, it means we accept 
all `ChunkFetchRequest`s, so we must store them in some buffer when memory 
pressure from Netty is above the water mark. But buffering the 
`ChunkFetchRequest`s also takes a lot of memory, which becomes another issue. The 
problem is that the client never knows when to slow down sending requests.

Am I wrong?





[GitHub] spark issue #18388: [SPARK-21175][WIP] Reject OpenBlocks when memory shortag...

2017-07-12 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
I think it could be more efficient to do the control on the shuffle service side.





[GitHub] spark issue #18388: [SPARK-21175][WIP] Reject OpenBlocks when memory shortag...

2017-07-12 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
Previously I said that I had 200k+ connections to one shuffle service. I'm sorry, 
that information was wrong. It turns out that each of our `NodeManager`s has two 
auxiliary shuffle services, one for Spark and one for Hadoop MapReduce, and most 
of the connections belong to the Hadoop MapReduce shuffle service. Analyzing the 
heap dump, there are only 761 connections (`NioSocketChannel`) to the Spark 
shuffle service. (How did I find this? The Spark shuffle service uses Netty 4 for 
transferring blocks. I found tons of 
`org.jboss.netty.channel.socket.nio.NioAcceptedSocketChannel`, and checking the 
Netty code, those are only used in Netty 3, which is what our Hadoop uses.) 
So @zsxwing, there is no connection leak in my understanding.

The situation is that we had 10K map tasks run on the same shuffle service and 
around 1K reduce tasks fetching the blocks. On the Java heap I found that one 
`io.netty.channel.ChannelOutboundBuffer.Entry` (referencing one block) costs 
almost 1K bytes, and we had 3.5M entries. At the time of the OOM, the 
`io.netty.channel.ChannelOutboundBuffer.Entry`s were retaining 3GB.

So the problem here is that one connection fetches too many blocks. I believe 
tuning `spark.reducer.maxReqsInFlight` or 
`spark.reducer.maxBlocksInFlightPerAddress` (#18487) can alleviate this issue. 
The question is how to set it appropriately, which seems hard because we need to 
balance warehouse performance against stability. After all, only 2~3 NodeManagers 
run OOM, so we cannot set `spark.reducer.maxReqsInFlight` too small without 
causing performance degradation.

I checked the connections of one shuffle service yesterday. 5K connections are 
very common during the night, and it's easy for this to happen, say when 5K 
reduces run at the same time. What if there are 5 applications and each has 5K 
reduces? That would be 25K connections. If each connection is fetching 100 blocks 
and each `Entry` is 1KB, the memory cost is 2.5GB. I think that's too much.

So I'm still proposing concurrency control. Different from the current change, 
can we control the number of blocks being transferred? If the number is above a 
water mark, we can fail the newly arriving `OpenBlocks` requests.
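
To make the proposal a bit more concrete, a minimal sketch of such a water-mark 
check (hypothetical names, not the shuffle service's actual code):

```
// Sketch of concurrency control on the shuffle-service side: track how many blocks
// are currently being transferred and reject new OpenBlocks above a water mark.
import java.util.concurrent.atomic.AtomicLong

class OpenBlocksGuard(maxBlocksInTransfer: Long) {
  private val blocksInTransfer = new AtomicLong(0L)

  /** Returns true if the OpenBlocks request may proceed, false if it should be failed. */
  def tryAcquire(numBlocks: Int): Boolean = {
    val after = blocksInTransfer.addAndGet(numBlocks)
    if (after > maxBlocksInTransfer) {
      blocksInTransfer.addAndGet(-numBlocks)  // roll back and reject this request
      false
    } else {
      true
    }
  }

  /** Called once all chunks of an accepted request have been sent out. */
  def release(numBlocks: Int): Unit = blocksInTransfer.addAndGet(-numBlocks)
}
```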





[GitHub] spark issue #18388: [SPARK-21175][WIP] Reject OpenBlocks when memory shortag...

2017-07-11 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
>Is this all a single application? 

No, it's a data warehouse; there are thousands of ETLs.

 >You say 6000 nodes with 64 executors on each host, how many cores per 
executor?

1 core per executor.

> have you tried using spark.reducer.maxReqsInFlight

I think we didn't.





[GitHub] spark pull request #18405: [SPARK-21194][SQL] Fail the putNullmethod when co...

2017-07-11 Thread jinxing64
Github user jinxing64 closed the pull request at:

https://github.com/apache/spark/pull/18405





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-07-11 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18405
  
Sure, I will close this pr for now :)





[GitHub] spark pull request #18405: [SPARK-21194][SQL] Fail the putNullmethod when co...

2017-07-10 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18405#discussion_r126584992
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
 ---
@@ -758,6 +758,35 @@ class ColumnarBatchSuite extends SparkFunSuite {
 }}
   }
 
+  test("Putting null should fail when null is forbidden in array.") {
+(MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode =>
+  val column = ColumnVector.allocate(10, new ArrayType(IntegerType, 
false), memMode)
+  val data = column.arrayData();
+  data.putInt(0, 0)
+  data.putInt(1, 1)
+  assert(data.getInt(0) === 0)
+  assert(data.getInt(1) === 1)
+  val ex = intercept[RuntimeException] {
+data.putNull(2)
--- End diff --

Sure, did it.





[GitHub] spark issue #18593: [SPARK-21369][Core]Don't use Scala Tuple2 in common/netw...

2017-07-10 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18593
  
LGTM





[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-07-10 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18405
  
@cloud-fan 
Any more comments on this :) ?
If this is too trivial, should I close it?





[GitHub] spark issue #18388: [SPARK-21175][WIP] Reject OpenBlocks when memory shortag...

2017-07-10 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
@tgravescs 
Thanks a lot for your advice :) very helpful.
I will try more on this.





[GitHub] spark issue #18541: [SPARK-21315][SQL]Skip some spill files when generateIte...

2017-07-10 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18541
  
@jiangxb1987 
Thanks for approving!
I have updated the PR description.





[GitHub] spark issue #18565: [SPARK-21342] Fix DownloadCallback to work well with Ret...

2017-07-10 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18565
  
Jenkins, retest this please





[GitHub] spark pull request #18541: [SPARK-21315][SQL]Skip some spill files when gene...

2017-07-10 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18541#discussion_r126342950
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 ---
@@ -589,29 +589,51 @@ public long getKeyPrefix() {
   }
 
   /**
-   * Returns a iterator, which will return the rows in the order as 
inserted.
+   * Returns an iterator starts from startIndex, which will return the 
rows in the order as
+   * inserted.
*
* It is the caller's responsibility to call `cleanupResources()`
* after consuming this iterator.
*
* TODO: support forced spilling
*/
-  public UnsafeSorterIterator getIterator() throws IOException {
+  public UnsafeSorterIterator getIterator(int startIndex) throws 
IOException {
 if (spillWriters.isEmpty()) {
   assert(inMemSorter != null);
-  return inMemSorter.getSortedIterator();
+  UnsafeSorterIterator iter = inMemSorter.getSortedIterator();
+  moveOver(iter, startIndex);
+  return iter;
 } else {
  LinkedList<UnsafeSorterIterator> queue = new LinkedList<>();
+  int i = 0;
   for (UnsafeSorterSpillWriter spillWriter : spillWriters) {
-queue.add(spillWriter.getReader(serializerManager));
+if (i + spillWriter.recordsSpilled() <= startIndex) {
+  i += spillWriter.recordsSpilled();
--- End diff --

Sure





[GitHub] spark pull request #18565: [SPARK-21342] Fix DownloadCallback to work well w...

2017-07-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18565#discussion_r126306193
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
 ---
@@ -119,9 +132,9 @@ public void onSuccess(ByteBuffer response) {
   // Immediately request all chunks -- we expect that the total 
size of the request is
   // reasonable due to higher level chunking in 
[[ShuffleBlockFetcherIterator]].
   for (int i = 0; i < streamHandle.numChunks; i++) {
-if (shuffleFiles != null) {
+if (toDisk) {
--- End diff --

Good idea.





[GitHub] spark pull request #18565: [SPARK-21342] Fix DownloadCallback to work well w...

2017-07-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18565#discussion_r126306207
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
 ---
@@ -58,29 +59,41 @@
   private final BlockFetchingListener listener;
   private final ChunkReceivedCallback chunkCallback;
   private TransportConf transportConf = null;
-  private File[] shuffleFiles = null;
+  private Boolean toDisk;
+  private Supplier tmpFileCreater;
+  private Supplier shuffleBlockFetcherIteratorIsZombie;
 
   private StreamHandle streamHandle = null;
 
   public OneForOneBlockFetcher(
+TransportClient client,
+String appId,
+String execId,
+String[] blockIds,
+BlockFetchingListener listener,
+TransportConf transportConf) {
+this(client, appId, execId, blockIds, listener, transportConf, false, 
null, null);
+  }
+
+  public OneForOneBlockFetcher(
   TransportClient client,
   String appId,
   String execId,
   String[] blockIds,
   BlockFetchingListener listener,
   TransportConf transportConf,
-  File[] shuffleFiles) {
+  Boolean toDisk,
+  Supplier tmpFileCreater,
+  Supplier shuffleBlockFetcherIteratorIsZombie) {
--- End diff --

I will refine.





[GitHub] spark issue #18541: [SPARK-21315][SQL]Skip some spill files when generateIte...

2017-07-09 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18541
  
@cloud-fan 
Would you mind take a look when  you have time :)





[GitHub] spark pull request #18541: [SPARK-21315][SQL]Skip some spill files when gene...

2017-07-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18541#discussion_r126299265
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
 ---
@@ -589,29 +589,50 @@ public long getKeyPrefix() {
   }
 
   /**
-   * Returns a iterator, which will return the rows in the order as 
inserted.
+   * Returns a iterator starts from startIndex, which will return the rows 
in the order as inserted.
--- End diff --

Sure thing.





[GitHub] spark issue #18565: [SPARK-21342] Fix DownloadCallback to work well with Ret...

2017-07-09 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18565
  
In the current change (see the sketch below):
1. I'm using `java.util.function.Supplier` instead of `TmpFileCreater`;
2. `shuffleBlockFetcherIteratorIsZombie` is passed from 
`ShuffleBlockFetcherIterator` to `OneForOneBlockFetcher`;
3. Successfully fetched files are cleaned up in `ShuffleBlockFetcherIterator`, 
while the others are deleted in `OneForOneBlockFetcher`.
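
For item 1, a minimal sketch of what building such suppliers on the caller side 
could look like (purely illustrative; the actual wiring lives in 
`ShuffleBlockFetcherIterator`, and the names here are made up):

```
// Illustrative only: java.util.function.Supplier instances built from Scala.
import java.io.File
import java.util.function.Supplier

object SupplierSketch {
  @volatile var iteratorIsZombie: Boolean = false   // stand-in for the iterator's zombie flag

  // Hands out a fresh temp file for each download.
  val tmpFileCreater: Supplier[File] = new Supplier[File] {
    override def get(): File = File.createTempFile("shuffle-fetch-", ".tmp")
  }

  // Lets the fetcher check whether the iterator has already been cleaned up.
  val isZombie: Supplier[java.lang.Boolean] = new Supplier[java.lang.Boolean] {
    override def get(): java.lang.Boolean = iteratorIsZombie
  }
}
```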





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-08 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
Thanks for reply. I will figure out what I can do for this :)





[GitHub] spark issue #18565: [SPARK-21342] Fix DownloadCallback to work well with Ret...

2017-07-08 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18565
  
There are some corner cases because files are created and deleted in both 
`ShuffleBlockFetcherIterator` and `OneForOneBlockFetcher`. It seems there is no 
need to pass the `shuffleFiles` from `ShuffleBlockFetcherIterator` to 
`ShuffleClient` and `OneForOneBlockFetcher`. I have changed it so that files are 
created and deleted only in `OneForOneBlockFetcher`, and we no longer do renaming. 
Every `DownloadCallback` has its own target file, and all target files are deleted 
when the blockManager stops.





[GitHub] spark pull request #18565: [SPARK-21342] Fix DownloadCallback to work well w...

2017-07-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18565#discussion_r126277863
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
 ---
@@ -151,15 +152,27 @@ private void failRemainingBlocks(String[] 
failedBlockIds, Throwable e) {
 }
   }
 
+  private static synchronized boolean renameFile(File src, File dest) {
+if (dest.exists()) {
+  if (!dest.delete()) {
--- End diff --

I was using `synchronized static` because there could be multiple 
`OneForOneBlockFetcher`s doing the rename work.





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-07 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
@tgravescs 
Thanks a lot for reviewing this PR so thoroughly. I think I'm making a stupid 
mistake. Can I ask a question: how should the number of connections be determined? 
I'm just counting the `org.jboss.netty.channel.socket.nio.NioAcceptedSocketChannel` 
instances in the heap dump. To be honest, 200k is also beyond my imagination.





[GitHub] spark issue #18566: Refine the document for spark.reducer.maxReqSizeShuffleT...

2017-07-07 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18566
  
I didn't include this config in configuration.md. Do I need to?
cc @zsxwing @cloud-fan 





[GitHub] spark pull request #18566: Refine the document for spark.reducer.maxReqSizeS...

2017-07-07 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/18566

Refine the document for spark.reducer.maxReqSizeShuffleToMem.

## What changes were proposed in this pull request?

In the current code, the reducer can break the old shuffle service when 
`spark.reducer.maxReqSizeShuffleToMem` is enabled. Let's refine the document.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-21343

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18566.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18566


commit 1fd9403d13f4efa31cedc81eafaf0271bc5843db
Author: jinxing <jinxing6...@126.com>
Date:   2017-07-07T15:56:15Z

Refine the document for spark.reducer.maxReqSizeShuffleToMem.







[GitHub] spark pull request #18565: [SPARK-21342] Fix DownloadCallback to work well w...

2017-07-07 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18565#discussion_r126178968
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
 ---
@@ -151,15 +152,27 @@ private void failRemainingBlocks(String[] 
failedBlockIds, Throwable e) {
 }
   }
 
+  private static synchronized boolean renameFile(File src, File dest) {
+if (dest.exists()) {
+  if (!dest.delete()) {
+return false;
+  }
+}
+return src.renameTo(dest);
+  }
+
   private class DownloadCallback implements StreamCallback {
 
 private WritableByteChannel channel = null;
 private File targetFile = null;
+private File tmpFile = null;
 private int chunkIndex;
 
-DownloadCallback(File targetFile, int chunkIndex) throws IOException {
-  this.targetFile = targetFile;
-  this.channel = Channels.newChannel(new FileOutputStream(targetFile));
+DownloadCallback(int chunkIndex) throws IOException {
--- End diff --

This is unrelated; it just removes a redundant param.





[GitHub] spark issue #18565: [SPARK-21342] Fix DownloadCallback to work well with Ret...

2017-07-07 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18565
  
cc @zsxwing @cloud-fan @jiangxb1987 





[GitHub] spark pull request #18565: [SPARK-21342] Fix DownloadCallback to work well w...

2017-07-07 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/18565

[SPARK-21342] Fix DownloadCallback to work well with RetryingBlockFetcher.

## What changes were proposed in this pull request?

When `RetryingBlockFetcher` retries fetching blocks, there could be two 
`DownloadCallback`s downloading the same content to the same target file, which 
could cause `ShuffleBlockFetcherIterator` to read a partial result. This PR 
proposes writing to a tmp file and renaming it to the target file once the 
download finishes.
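
A minimal sketch of the tmp-file-then-rename idea (a hypothetical helper, not the 
PR's actual `DownloadCallback` code, which uses a synchronized delete-and-rename 
instead of `Files.move`):

```
// Sketch: write each download into its own tmp file and only move a fully
// written file onto the target, so a retried download never exposes partial data.
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

def downloadToTarget(writeChunk: File => Unit, targetFile: File): Unit = {
  val tmpFile = File.createTempFile("shuffle-", ".tmp", targetFile.getParentFile)
  try {
    writeChunk(tmpFile)                         // stream the chunk into the tmp file
    Files.move(
      tmpFile.toPath,
      targetFile.toPath,
      StandardCopyOption.REPLACE_EXISTING)      // only a complete file replaces the target
  } finally {
    tmpFile.delete()                            // no-op if the move already succeeded
  }
}
```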

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-21342

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18565.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18565


commit da65427a94b24b2ad0a6ecf79ac1980dd458c684
Author: jinxing <jinxing6...@126.com>
Date:   2017-07-07T15:29:58Z

Fix DownloadCallback to work well with RetryingBlockFetcher.







[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-06 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
We didn't change `spark.shuffle.io.numConnectionsPerPeer`. Our biggest cluster 
has 6000 `NodeManager`s, and there can be 50 executors running on the same host 
at the same time.





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-06 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
@cloud-fan 
To be honest, it's a little bit tricky to reject "open blocks" by closing the 
connection; the subsequent reconnection will surely have extra cost. In the 
current change we rely on the retry mechanism of `RetryingBlockFetcher`. 
`spark.shuffle.io.maxRetries` and `spark.shuffle.io.retryWait` should also be 
tuned, and with this change their meanings may become different, which users 
should know about. This is the sacrifice made for compatibility.
It occurs to me: could we add back `OpenBlocksFailed` together with a 
flag (default false)? If users want to turn it on, we can tell them they should 
upgrade the client.





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-06 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
Analyzing the heap dump, there are 200K+ connections and 3.5M 
blocks (`FileSegmentManagedBuffer`) being fetched. Yes, flow control is a good 
idea, but I still think it makes a lot of sense to control the concurrency. By 
rejecting some "open blocks" requests we can keep sufficient bandwidth for the 
existing connections and finish those reduce tasks as soon as possible. Simple 
flow control (slowing down connections under pressure) can help avoid OOM, but it 
seems more reduce tasks would run longer.





[GitHub] spark pull request #18482: [SPARK-21262] Stop sending 'stream request' when ...

2017-07-06 Thread jinxing64
Github user jinxing64 closed the pull request at:

https://github.com/apache/spark/pull/18482





[GitHub] spark issue #18482: [SPARK-21262] Stop sending 'stream request' when shuffle...

2017-07-06 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18482
  
Sure, I will update the document soon.





[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...

2017-07-06 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18487#discussion_r125917832
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -321,6 +321,16 @@ package object config {
   .intConf
   .createWithDefault(3)
 
+  private[spark] val REDUCER_MAX_BLOCKS_IN_FLIGHT_PER_ADDRESS =
+ConfigBuilder("spark.reducer.maxBlocksInFlightPerAddress")
--- End diff --

When the shuffle service hits OOM, there are always lots of (thousands of) 
reducers (maybe from different apps) fetching blocks. I'm not sure it will help 
much to limit the in-flight blocks on the reducer side.

Also, we already have maxReqsInFlight and maxBytesInFlight. Is it a little bit 
redundant to also have maxBlocksInFlightPerAddress?

Sorry for this comment.





[GitHub] spark issue #18541: [SPARK-21315][SQL]Skip some spill files when generateIte...

2017-07-06 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18541
  
@hvanhovell 
I refined according to your comments. Please take another look when you 
have time :)





[GitHub] spark issue #18541: [SPARK-21315][SQL]Skip some spill files when generateIte...

2017-07-06 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18541
  
Did a small test:
2000200 rows in the `UnsafeExternalSorter`: 2 spill files (each contains 100 
rows) and `inMemSorter` contains 200 rows.
I want to position the iterator at index=201.

With this change, `getIterator(201)` costs almost 0ms~1ms;
Without this change,
`for (int i = 0; i < 201; i++) getIterator().loadNext()` costs about 300ms.
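
For reference, roughly how the two sides of this comparison are shaped (a sketch 
only; the sorter is assumed to be an already-populated `UnsafeExternalSorter` 
with spilled data):

```
// Sketch of the micro-benchmark; exact timings will of course vary with spill sizes.
import org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter

def timeMs[T](body: => T): Long = {
  val start = System.nanoTime()
  body
  (System.nanoTime() - start) / 1000000
}

def compare(sorter: UnsafeExternalSorter, startIndex: Int): (Long, Long) = {
  // With the change: the sorter can skip whole spill files internally.
  val withChange = timeMs { sorter.getIterator(startIndex) }
  // Without the change: advance from the first record one loadNext() at a time.
  val withoutChange = timeMs {
    val iter = sorter.getIterator(0)
    var i = 0
    while (i < startIndex && iter.hasNext) { iter.loadNext(); i += 1 }
  }
  (withChange, withoutChange)
}
```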





[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-07-06 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16989#discussion_r125877439
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
 ---
@@ -126,4 +150,38 @@ private void failRemainingBlocks(String[] 
failedBlockIds, Throwable e) {
   }
 }
   }
+
+  private class DownloadCallback implements StreamCallback {
+
+private WritableByteChannel channel = null;
+private File targetFile = null;
+private int chunkIndex;
+
+public DownloadCallback(File targetFile, int chunkIndex) throws 
IOException {
+  this.targetFile = targetFile;
+  this.channel = Channels.newChannel(new FileOutputStream(targetFile));
--- End diff --

@zsxwing 
Sorry, I just realized this issue. There can be a conflict between two 
`DownloadCallback`s. I will figure out a way to resolve this.





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-06 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
I removed `OpenBlocksFailed` for compatibility. In the current change, the 
server rejects the "open blocks" request by closing the connection, and then 
`RetryingBlockFetcher` will retry.





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-06 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
@tgravescs 
As in the screenshot, we have tons of `ChannelOutboundBuffer$Entry` objects. Yes, 
we are using `transferTo`. Netty puts each `Entry` (containing a reference to the 
`MessageWithHeader`) into the `ChannelOutboundBuffer`, and then the consumer 
ships the data onto the network. We are running OOM because of too many 
`ChannelOutboundBuffer$Entry` objects, almost 3GB as you can see.





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-05 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
@tgravescs
I think it's not that harmful.
In the current change, a new client is compatible with both the old and the new 
shuffle service. In our clusters we always upgrade the client first and then the 
server side, which does not cause a compatibility issue. The only risk here is a 
user upgrading the server while still using the old client, but I see no reason 
why they would do this; users usually upgrade the client first and then deploy 
new servers gradually.

In our cluster, NodeManagers fail every day because of OOM in the shuffle 
service. The root cause is that the shuffle service is a hot spot and there is no 
concurrency control.





[GitHub] spark pull request #18541: [SPARK-21315][SQL]Skip some spill files when gene...

2017-07-05 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/18541

[SPARK-21315][SQL]Skip some spill files when generateIterator(startIndex) 
in ExternalAppendOnlyUnsafeRowArray.



## What changes were proposed in this pull request?

In the current code, `UnboundedFollowingWindowFunctionFrame` is expensive to use, 
because it iterates from the start to the lower bound every time the `write` 
method is called. When traversing the iterator, it's possible to skip some spill 
files and thus save some time.
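
For context, roughly how the caller side looks (a simplified sketch of the 
`UnboundedFollowingWindowFunctionFrame` usage, not the exact code): every output 
row re-creates an iterator starting at the frame's lower bound, so being able to 
jump to `startIndex` cheaply matters.

```
// Simplified sketch of the caller; `rows` holds the current partition's rows.
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
import org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray

def processFrame(rows: ExternalAppendOnlyUnsafeRowArray, lowerBoundIndex: Int): Unit = {
  // Re-created for every output row; skipping spill files makes this cheap.
  val iter = rows.generateIterator(startIndex = lowerBoundIndex)
  while (iter.hasNext) {
    val row: UnsafeRow = iter.next()
    // ... update the window aggregate with `row` ...
  }
}
```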

## How was this patch tested?

Added unit test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-21315

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18541


commit 442abb32cc08a3311ce6540e7e40c4b62e51a0c1
Author: jinxing <jinxing6...@126.com>
Date:   2017-07-05T06:22:19Z

Skip some spill files when generateIterator(startIndex) in 
ExternalAppendOnlyUnsafeRowArray.







[GitHub] spark issue #18482: [SPARK-21262] Stop sending 'stream request' when shuffle...

2017-07-04 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18482
  
Very gentle ping @zsxwing, what do you think about this idea?





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-04 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
Very gentle ping @zsxwing, would you mind commenting on this?





[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-06-30 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18388
  
Yes, there is a change. The server side may return `OpenBlocksFailed` for the 
"open blocks" request, which means an old client is not compatible with a new 
server. Is that acceptable?





[GitHub] spark issue #18482: [SPARK-21262] Stop sending 'stream request' when shuffle...

2017-06-30 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18482
  
In the current change, it fetches the big chunk into memory, writes it to disk, 
and then releases the memory. I made this change for the reasons below (see the 
sketch after the list):
1. The client shouldn't break the old shuffle service, so it cannot send a 
"stream request" to the server. We have to send a `ChunkFetchRequest` and handle 
the `ChunkFetchSuccess` response.
2. It's hard to make `ChunkFetchSuccess` a stream and read it to disk; we would 
need to implement another `TransportFrameDecoder`, which I think costs too much.
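
A hand-wavy sketch of the resulting flow (helper names are made up, not the PR's 
actual code): the chunk still arrives as an in-memory `ChunkFetchSuccess` buffer, 
is spilled to disk, and the memory is then released.

```
// Sketch: drain an in-memory chunk onto disk so the buffer can be released afterwards.
import java.io.File
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.StandardOpenOption

def spillChunkToDisk(chunk: ByteBuffer, targetFile: File): Unit = {
  val channel = FileChannel.open(
    targetFile.toPath,
    StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)
  try {
    while (chunk.hasRemaining) {
      channel.write(chunk)   // write until the whole chunk has been drained
    }
  } finally {
    channel.close()          // after this the caller can release the in-memory chunk
  }
}
```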





[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-06-30 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18388#discussion_r125068210
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java
 ---
@@ -257,4 +257,31 @@ public Properties cryptoConf() {
 return CryptoUtils.toCryptoConf("spark.network.crypto.config.", 
conf.getAll());
   }
 
+  /**
+   * When memory usage of Netty is above this water mark, it's regarded as 
memory shortage.
--- End diff --

@cloud-fan
I made a JIRA(https://issues.apache.org/jira/browse/SPARK-21270) about 
merging the memory configs. Please take a look when you have time and give some 
comments.





[GitHub] spark issue #18482: [SPARK-21262][WIP] Stop sending 'stream request' when sh...

2017-06-30 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18482
  
cc @zsxwing @cloud-fan @jiangxb1987 




