[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17739
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76096/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17739
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-04-24 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17503
  
@srowen Hi, could you review the PR? The PR is simple, though a lot of unit 
test code is added. Thanks.





[GitHub] spark pull request #17730: [SPARK-20439] [SQL] Fix Catalog API listTables an...

2017-04-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17730#discussion_r112879694
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala ---
@@ -197,7 +211,11 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
* `AnalysisException` when no `Table` can be found.
*/
   override def getTable(dbName: String, tableName: String): Table = {
-makeTable(TableIdentifier(tableName, Option(dbName)))
+if (tableExists(dbName, tableName)) {
+  makeTable(TableIdentifier(tableName, Option(dbName)))
+} else {
+  throw new AnalysisException(s"Table or view '$tableName' not found in database '$dbName'")
--- End diff --

Sure, let me revert it back





[GitHub] spark pull request #17700: [SPARK-20391][Core] Rename memory related fields ...

2017-04-24 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/17700#discussion_r112879836
  
--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala ---
@@ -114,10 +114,16 @@ private[spark] object ExecutorsPage {
 val rddBlocks = status.numBlocks
 val memUsed = status.memUsed
 val maxMem = status.maxMem
-val onHeapMemUsed = status.onHeapMemUsed
-val offHeapMemUsed = status.offHeapMemUsed
-val maxOnHeapMem = status.maxOnHeapMem
-val maxOffHeapMem = status.maxOffHeapMem
+val memoryMetrics = for {
+  onHeapUsed <- status.onHeapMemUsed
+  offHeapUsed <- status.offHeapMemUsed
+  maxOnHeap <- status.maxOnHeapMem
+  maxOffHeap <- status.maxOffHeapMem
+} yield {
+  new MemoryMetrics(onHeapUsed, offHeapUsed, maxOnHeap, maxOffHeap)
--- End diff --

Yes, this is to make sure the `memoryMetrics` field exists only when 
replaying new event logs; for event logs from Spark 2.1 and earlier, this 
field will not be present.
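
The behaviour described above can be sketched in a few lines. This is a minimal illustration, assuming the four status fields are `Option` values as the `<-` generators in the diff suggest; `StorageSnapshot`, `MemoryMetricsSketch` and this `MemoryMetrics` case class are hypothetical stand-ins, not Spark's actual classes.

```scala
// Hypothetical mirror of the fields in the diff; not Spark's actual classes.
final case class MemoryMetrics(
    onHeapUsed: Long, offHeapUsed: Long, maxOnHeap: Long, maxOffHeap: Long)

final case class StorageSnapshot(
    onHeapMemUsed: Option[Long],
    offHeapMemUsed: Option[Long],
    maxOnHeapMem: Option[Long],
    maxOffHeapMem: Option[Long])

object MemoryMetricsSketch {
  // Yields Some(...) only if all four inputs are defined; a single None
  // short-circuits the whole comprehension to None.
  def fromSnapshot(s: StorageSnapshot): Option[MemoryMetrics] =
    for {
      onHeapUsed <- s.onHeapMemUsed
      offHeapUsed <- s.offHeapMemUsed
      maxOnHeap <- s.maxOnHeapMem
      maxOffHeap <- s.maxOffHeapMem
    } yield MemoryMetrics(onHeapUsed, offHeapUsed, maxOnHeap, maxOffHeap)
}
```

A snapshot built from an older event log would carry `None` in these fields, so `fromSnapshot` returns `None` and the metrics field is simply absent.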





[GitHub] spark pull request #17730: [SPARK-20439] [SQL] Fix Catalog API listTables an...

2017-04-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17730#discussion_r112880305
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala ---
@@ -197,7 +211,11 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
* `AnalysisException` when no `Table` can be found.
*/
   override def getTable(dbName: String, tableName: String): Table = {
-makeTable(TableIdentifier(tableName, Option(dbName)))
+if (tableExists(dbName, tableName)) {
+  makeTable(TableIdentifier(tableName, Option(dbName)))
+} else {
+  throw new AnalysisException(s"Table or view '$tableName' not found in database '$dbName'")
--- End diff --

Yes. That is why an `AnalysisException` is issued here. `makeTable` eats 
the expected exception. 
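
As a rough illustration of the pattern in this diff, the sketch below checks for existence before building the result, so that a lenient builder cannot hide a missing table. `ToyCatalog`, `Table` and `TableNotFoundException` are hypothetical stand-ins introduced for this example; they are not Spark's `CatalogImpl`, `Table` or `AnalysisException`.

```scala
// Hypothetical stand-ins: none of these types are Spark's own classes.
final case class Table(name: String, database: String)
final class TableNotFoundException(message: String) extends Exception(message)

// A toy catalog backed by a set of (database, table) pairs.
final class ToyCatalog(known: Set[(String, String)]) {
  def tableExists(dbName: String, tableName: String): Boolean =
    known.contains((dbName, tableName))

  // Lenient builder: it never fails, so a missing table goes unnoticed here.
  private def makeTable(dbName: String, tableName: String): Table =
    Table(tableName, dbName)

  // Check existence first so the caller gets an explicit error.
  def getTable(dbName: String, tableName: String): Table =
    if (tableExists(dbName, tableName)) makeTable(dbName, tableName)
    else throw new TableNotFoundException(
      s"Table or view '$tableName' not found in database '$dbName'")
}
```

The design choice is to keep the error at the public entry point, so that the documented contract (an exception when no table can be found) holds even if the builder is forgiving.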





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17222
  
**[Test build #76097 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76097/testReport)**
 for PR 17222 at commit 
[`e74883e`](https://github.com/apache/spark/commit/e74883ea53d9c389c16b2d984204ded800ac568d).





[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-04-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17503
  
It looks reasonable though I don't feel qualified to review it. I thought 
the nodes had more than just the majority class - like the empirical 
distribution at the node? That would make them not possible to combine in 
general, but, I don't see that. However they do carry impurity info. Is that 
going to be equal in enough cases to make the merge effective?
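
To make the merge idea concrete, here is a minimal sketch that collapses sibling leaves with the same prediction, working bottom-up. It is not the PR's implementation: `Node`, `Leaf`, `Internal` and `TreeReduce` are hypothetical types for illustration, and the sketch deliberately ignores the impurity information raised in the question above.

```scala
// Hypothetical tree types; Spark's own node classes carry more information.
sealed trait Node { def prediction: Double }
final case class Leaf(prediction: Double) extends Node
final case class Internal(prediction: Double, left: Node, right: Node) extends Node

object TreeReduce {
  // Bottom-up: reduce the children first, then collapse this node if both
  // reduced children are leaves that predict the same value.
  def reduce(node: Node): Node = node match {
    case leaf: Leaf => leaf
    case Internal(pred, l, r) =>
      (reduce(l), reduce(r)) match {
        case (Leaf(a), Leaf(b)) if a == b => Leaf(a)
        case (nl, nr) => Internal(pred, nl, nr)
      }
  }
}
```

Because the reduction runs bottom-up, a subtree whose leaves all agree collapses to a single leaf, while a split that separates different predictions is left untouched.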





[GitHub] spark issue #16486: [SPARK-13610][ML] Create a Transformer to disassemble ve...

2017-04-24 Thread leonfl
Github user leonfl commented on the issue:

https://github.com/apache/spark/pull/16486
  
@mrjrdnthms, this is implemented with a UDF, which runs a little slower 
but is easy to use.
If you want it to run faster, you can implement it using `mapPartitions` and 
a row iterator instead of a UDF.
That implementation will reduce the running time a lot.





[GitHub] spark pull request #17695: [SPARK-20400][DOCS] Remove References to 3rd Part...

2017-04-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17695#discussion_r112887606
  
--- Diff: docs/configuration.md ---
@@ -2248,8 +2248,8 @@ should be included on Spark's classpath:
 * `hdfs-site.xml`, which provides default behaviors for the HDFS client.
 * `core-site.xml`, which sets the default filesystem name.
 
-The location of these configuration files varies across CDH and HDP versions, but
-a common location is inside of `/etc/hadoop/conf`. Some tools, such as Cloudera Manager, create
+The location of these configuration files varies across Hadoop versions, but
--- End diff --

Hm, I guess one issue I've realized now is that it doesn't really vary 
across versions of Hadoop but could vary according to packaging and 
distribution. If you change it again, also consider fixing the existing 
"mechanisms" typo in line 2253





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/17222
  
@holdenk But it has nothing to return, because the Scala side returns `Unit`. See 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L528





[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-24 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17739
  
Just to confirm, the #users is 48 million, #items is 1.7 million?





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17222
  
**[Test build #76098 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76098/testReport)**
 for PR 17222 at commit 
[`6aa5d85`](https://github.com/apache/spark/commit/6aa5d85c91c33fd771a01e3b1370597b106d650e).





[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-24 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17739
  
Or is it 48,000 and 1,700?





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3673 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3673/testReport)**
 for PR 17556 at commit 
[`19eab3a`](https://github.com/apache/spark/commit/19eab3aea2cc15448eb7cac2f08f190fae1e0033).





[GitHub] spark pull request #17740: [SPARK-SPARK-20404][CORE] Using Option(name) inst...

2017-04-24 Thread szhem
GitHub user szhem opened a pull request:

https://github.com/apache/spark/pull/17740

[SPARK-SPARK-20404][CORE] Using Option(name) instead of Some(name)

Using Option(name) instead of Some(name) to prevent runtime failures when 
using accumulators created like the following
```
sparkContext.accumulator(0, null)
```
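
The difference between the two constructors can be shown without Spark. In the sketch below, `NullSafeName` and its methods are hypothetical names used only for illustration; the behaviour of `Option(...)` and `Some(...)` is that of the Scala standard library.

```scala
object NullSafeName {
  // Option.apply maps null to None, so a null name is treated as "no name".
  def wrapSafely(name: String): Option[String] = Option(name)

  // Some.apply wraps whatever it is given, including null.
  def wrapUnsafely(name: String): Option[String] = Some(name)

  // Typical downstream use: dereferencing the wrapped value.
  def nameLength(name: Option[String]): Int = name.map(_.length).getOrElse(0)
}
```

With `wrapSafely(null)` the downstream call sees `None` and returns 0, whereas `wrapUnsafely(null)` produces `Some(null)`, which fails with a `NullPointerException` only later, when the value is dereferenced.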


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/szhem/spark SPARK-20404-null-acc-names

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17740.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17740


commit 9c6e4d685cebe034d12c085ae9f97f5187cec36b
Author: Sergey Zhemzhitsky 
Date:   2017-04-24T08:58:12Z

[SPARK-SPARK-20404][CORE] Using Option(name) instead of Some(name) when 
creating accumulators to prevent failures at runtime when using null names







[GitHub] spark issue #17740: [SPARK-SPARK-20404][CORE] Using Option(name) instead of ...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17740
  
Can one of the admins verify this patch?





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17556
  
**[Test build #3673 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3673/testReport)**
 for PR 17556 at commit 
[`19eab3a`](https://github.com/apache/spark/commit/19eab3aea2cc15448eb7cac2f08f190fae1e0033).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17741: SNAP-1420

2017-04-24 Thread hbhanawat
GitHub user hbhanawat opened a pull request:

https://github.com/apache/spark/pull/17741

SNAP-1420

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SnappyDataInc/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17741.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17741


commit 35d832d42643f0bcfa8a775587841ce5c537ea5b
Author: Vivek Bhaskar 
Date:   2016-11-25T09:43:36Z

Helper classes for DataSerializable implementation.

commit d4e1c7044ced8c66c257c14976569dc6661fcf5f
Author: Vivek Bhaskar 
Date:   2016-11-29T09:06:15Z

Revert "Helper classes for DataSerializable implementation."

This reverts commit 35d832d42643f0bcfa8a775587841ce5c537ea5b.







[GitHub] spark pull request #17741: SNAP-1420

2017-04-24 Thread hbhanawat
Github user hbhanawat closed the pull request at:

https://github.com/apache/spark/pull/17741





[GitHub] spark issue #17730: [SPARK-20439] [SQL] Fix Catalog API listTables and getTa...

2017-04-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17730
  
LGTM, merging to master/2.2!





[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17736
  
It seems all string literals in the Spark 2.0 parser behave differently from 
Spark 1.6?





[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-24 Thread mpjlu
Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/17739
  
The number of users is 480,000 and the number of items is 170,000. Thanks.
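
For a sense of scale, the sketch below groups rows into fixed-size blocks and counts how many blocks a dataset of this size produces. It is only an illustration under stated assumptions: `BlockifySketch` is a hypothetical helper, and the block size of 4096 used in the note below is an assumed value for the arithmetic, not a figure taken from this thread.

```scala
object BlockifySketch {
  // Group consecutive elements into blocks of at most blockSize elements.
  def blockify[A](rows: Seq[A], blockSize: Int): Vector[Seq[A]] =
    rows.grouped(blockSize).toVector

  // Number of blocks needed for n rows (ceiling division).
  def numBlocks(n: Long, blockSize: Long): Long =
    (n + blockSize - 1) / blockSize
}
```

With 480,000 users and 170,000 items there are 81,600,000,000 user-item pairs to score. At an assumed block size of 4096, `numBlocks` gives 118 user blocks and 42 item blocks, so 4,956 block pairs.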





[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-24 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17739
  
OK. And is this the timing for `recommendProductsForUsers`, or for 
`recommendUsersForProducts`?





[GitHub] spark pull request #17622: [SPARK-20300][ML][PYSPARK] Python API for ALSMode...

2017-04-24 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/17622#discussion_r112905614
  
--- Diff: python/pyspark/ml/recommendation.py ---
@@ -384,6 +392,28 @@ def itemFactors(self):
 """
 return self._call_java("itemFactors")
 
+@since("2.2.0")
+def recommendForAllUsers(self, numItems):
+"""
+Returns top `numItems` items recommended for each user, for all users.
+
+:param numItems: max number of recommendations for each user
+:return: a DataFrame of (userCol, recommendations), where recommendations are
+ stored as an array of (itemCol, rating) Rows.
+"""
+return self._call_java("recommendForAllUsers", numItems)
+
+@since("2.2.0")
+def recommendForAllItems(self, numUsers):
+"""
+Returns top `numUsers` users recommended for each item, for all items.
+
+:param numItems: max number of recommendations for each item
--- End diff --

Thanks!
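
What the docstring describes can be sketched as a brute-force top-k over factor dot products. This is an assumption-laden illustration, not Spark's implementation: `TopKSketch` is a hypothetical object, factors are held in plain in-memory maps, and the result is a map rather than a DataFrame.

```scala
object TopKSketch {
  // Dot product of two equally sized factor vectors.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // For every user, score all items and keep the numItems highest-scoring ones.
  def recommendForAllUsers(
      userFactors: Map[Int, Array[Double]],
      itemFactors: Map[Int, Array[Double]],
      numItems: Int): Map[Int, Seq[(Int, Double)]] =
    userFactors.map { case (user, uf) =>
      val scored = itemFactors.toSeq.map { case (item, itf) => (item, dot(uf, itf)) }
      user -> scored.sortBy { case (_, score) => -score }.take(numItems)
    }
}
```

Swapping the two factor maps gives the item-side variant, where each item receives its top users.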





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17222
  
**[Test build #76097 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76097/testReport)**
 for PR 17222 at commit 
[`e74883e`](https://github.com/apache/spark/commit/e74883ea53d9c389c16b2d984204ded800ac568d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76097/
Test PASSed.





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17222
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17222
  
**[Test build #76098 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76098/testReport)**
 for PR 17222 at commit 
[`6aa5d85`](https://github.com/apache/spark/commit/6aa5d85c91c33fd771a01e3b1370597b106d650e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17222
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76098/
Test PASSed.





[GitHub] spark issue #17622: [SPARK-20300][ML][PYSPARK] Python API for ALSModel.recom...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17622
  
**[Test build #76099 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76099/testReport)**
 for PR 17622 at commit 
[`7644e51`](https://github.com/apache/spark/commit/7644e518810e22445daf0f2ec84ef8d93bb5d89b).





[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17503
  
**[Test build #3675 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3675/testReport)**
 for PR 17503 at commit 
[`a8351f8`](https://github.com/apache/spark/commit/a8351f85f6f30b5c766600cf221c043afc9e2094).





[GitHub] spark issue #17740: [SPARK-20404][CORE] Using Option(name) instead of Some(n...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17740
  
**[Test build #3674 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3674/testReport)**
 for PR 17740 at commit 
[`e12058c`](https://github.com/apache/spark/commit/e12058cafa8a3c6c54aedefdcc5301ac75b81869).





[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17503
  
**[Test build #3675 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3675/testReport)**
 for PR 17503 at commit 
[`a8351f8`](https://github.com/apache/spark/commit/a8351f85f6f30b5c766600cf221c043afc9e2094).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17622: [SPARK-20300][ML][PYSPARK] Python API for ALSModel.recom...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17622
  
**[Test build #76099 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76099/testReport)**
 for PR 17622 at commit 
[`7644e51`](https://github.com/apache/spark/commit/7644e518810e22445daf0f2ec84ef8d93bb5d89b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17622: [SPARK-20300][ML][PYSPARK] Python API for ALSModel.recom...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17622
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17622: [SPARK-20300][ML][PYSPARK] Python API for ALSModel.recom...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17622
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76099/
Test PASSed.





[GitHub] spark pull request #17742: [Spark-20446][ML][MLLIB]Optimize MLLIB ALS recomm...

2017-04-24 Thread mpjlu
GitHub user mpjlu opened a pull request:

https://github.com/apache/spark/pull/17742

[Spark-20446][ML][MLLIB]Optimize MLLIB ALS recommendForAll

## What changes were proposed in this pull request?

recommendForAll in MLlib ALS is very slow.
GC is a key problem with the current method.
Each task uses the following code to hold its temporary results:
val output = new Array[(Int, (Int, Double))](m*n)
m = n = 4096 (the default value; there is no way to set it)
so output takes about 4k * 4k * (4 + 4 + 8) = 256M. This is a large allocation that 
causes serious GC pressure and frequently leads to OOM.

Actually, we don't need to save all of the temporary results. Suppose we recommend 
the topK (topK is about 10 or 20) products for each user; we then only need 
4k * topK * (4 + 4 + 8) memory to hold the temporary results.
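
The idea can be illustrated with a minimal Scala sketch over plain collections. This is not the code from the pull request: `topK`, `dot`, and the sample factors are hypothetical names and values chosen for illustration, and the byte counts only reproduce the arithmetic above (they ignore JVM object overhead). A bounded min-heap keeps at most k candidates per user, so the full m*n buffer is never allocated.

```scala
import scala.collection.mutable

// Hypothetical helper (not from the PR): keep only the k best (item, score) pairs.
def topK(scores: Iterator[(Int, Double)], k: Int): Array[(Int, Double)] = {
  require(k > 0, "k must be positive")
  // Reversed ordering turns the queue into a min-heap on score,
  // so `head` is always the weakest of the current top k.
  val heap = mutable.PriorityQueue.empty[(Int, Double)](
    Ordering.by[(Int, Double), Double](_._2).reverse)
  scores.foreach { pair =>
    if (heap.size < k) {
      heap.enqueue(pair)
    } else if (pair._2 > heap.head._2) {
      heap.dequeue()
      heap.enqueue(pair)
    }
  }
  // Return the survivors with the highest score first.
  heap.toArray.sortBy(p => -p._2)
}

// Dot product of two factor vectors of equal length.
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => x * y }.sum

// Made-up factors for one user and four items.
val userFactor = Array(1.0, 0.5)
val itemFactors = Seq(
  10 -> Array(0.2, 0.1),
  11 -> Array(0.9, 0.4),
  12 -> Array(0.5, 0.5),
  13 -> Array(0.1, 0.0))

val best = topK(itemFactors.iterator.map { case (id, f) => (id, dot(userFactor, f)) }, 2)

// The arithmetic from the description, with 16 bytes per entry and topK = 10.
val fullBytes = 4096L * 4096L * (4 + 4 + 8)
val boundedBytes = 4096L * 10L * (4 + 4 + 8)
```

With these sample values, `best` holds items 11 and 12, and the bounded buffer is 4096 / 10 times smaller than the full one.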

The test environment:
3 workers, each with 10 cores, 30G of memory, and 1 executor.
The data: 480,000 users and 17,000 items.

BlockSize:      1024   2048   4096   8192
Old method:     245s   332s   488s   OOM
This solution:  121s   118s   117s   120s



## How was this patch tested?
The existing unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mpjlu/spark OptimizeAls

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17742.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17742


commit 14cdbf63e79ebcf2d1207c79b0b4ba73e15729b2
Author: Peng 
Date:   2017-04-24T08:32:16Z

Optimize ALS recommendForAll







[GitHub] spark issue #17742: [Spark-20446][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17742
  
**[Test build #76100 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76100/testReport)**
 for PR 17742 at commit 
[`14cdbf6`](https://github.com/apache/spark/commit/14cdbf63e79ebcf2d1207c79b0b4ba73e15729b2).





[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-24 Thread mpjlu
Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/17739
  
recommendProductsForUsers. Thanks.





[GitHub] spark issue #16705: [SPARK-19354] [Core] Killed tasks are getting marked as ...

2017-04-24 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/16705
  
Is there a chance to see this in 2.1.1?





[GitHub] spark issue #17622: [SPARK-20300][ML][PYSPARK] Python API for ALSModel.recom...

2017-04-24 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17622
  
If there are no other comments (@jkbradley), I will merge to branch-2.2 in a few days.





[GitHub] spark pull request #17720: [SPARK-20407][TESTS][BACKPORT-2.1] ParquetQuerySu...

2017-04-24 Thread bogdanrdc
Github user bogdanrdc closed the pull request at:

https://github.com/apache/spark/pull/17720





[GitHub] spark issue #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setName for D...

2017-04-24 Thread phatak-dev
Github user phatak-dev commented on the issue:

https://github.com/apache/spark/pull/16609
  
The cacheTable API doesn't allow the user to change the storage level. So having 
setName is useful: we can cache the DataFrame at a different storage 
level and still use the UI to identify it.





[GitHub] spark issue #17673: [SPARK-20372] [ML] Word2Vec Continuous Bag of Words mode...

2017-04-24 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17673
  
It would be ideal to have both methods, but I'm worried about reviewer 
bandwidth vs priority on this.

@Krimit you were working on Word2Vec recently - thoughts? Perhaps you have 
time to help on review also?





[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17621
  
**[Test build #76101 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76101/testReport)**
 for PR 17621 at commit 
[`07808fc`](https://github.com/apache/spark/commit/07808fc98609420f1c12ee131ad5e48204704a93).





[GitHub] spark issue #17742: [Spark-20446][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17742
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17742: [Spark-20446][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17742
  
**[Test build #76100 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76100/testReport)**
 for PR 17742 at commit 
[`14cdbf6`](https://github.com/apache/spark/commit/14cdbf63e79ebcf2d1207c79b0b4ba73e15729b2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17742: [Spark-20446][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17742
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76100/
Test PASSed.





[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17621
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76101/
Test FAILed.





[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17621
  
**[Test build #76101 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76101/testReport)**
 for PR 17621 at commit 
[`07808fc`](https://github.com/apache/spark/commit/07808fc98609420f1c12ee131ad5e48204704a93).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-24 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17739
  
It's interesting to see the performance difference. I've also been looking 
at performance of recommend all but haven't gotten to varying the block sizes 
just yet.

I'm potentially in favor of exposing it as a param - but what you've got 
here doesn't do anything to the public API so how does that help?





[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-04-24 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/14731
  
OK. What is the way? Do I write a formal proposal?

Because right now there is no reliable way to keep the full dependency graph 
of Spark + hadoop cloud JARs + direct cloud provider JARs (azure, aws) and their 
dependencies (jackson) in sync. 

Which means that getting Spark to talk to object stores is more miss than 
hit.

I'm happy to follow the proposal mechanism, including progress reports &c, 
but I do at least need some kind of hope that my work will actually get in.





[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17621
  
**[Test build #76102 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76102/testReport)**
 for PR 17621 at commit 
[`94006a4`](https://github.com/apache/spark/commit/94006a404aeb2c9b05643080eece64f0506ebafd).





[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2017-04-24 Thread steveloughran
Github user steveloughran closed the pull request at:

https://github.com/apache/spark/pull/14731





[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-24 Thread mpjlu
Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/17739
  
Thanks @MLnick. Could you please review my other PR, which addresses the 
recommend-all performance problem?
https://github.com/apache/spark/pull/17742
Sorry, I forgot that users cannot call recommendForAll directly in this PR. 





[GitHub] spark issue #17742: [Spark-20446][ML][MLLIB]Optimize MLLIB ALS recommendForA...

2017-04-24 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17742
  
Interesting - I was working on something very similar - a rough draft of it 
is in a 
[branch](https://github.com/mlnick/spark/tree/SPARK-13857-als-parity-v3).





[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17621
  
**[Test build #76102 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76102/testReport)**
 for PR 17621 at commit 
[`94006a4`](https://github.com/apache/spark/commit/94006a404aeb2c9b05643080eece64f0506ebafd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17621
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76102/
Test PASSed.





[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17621
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-24 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17736
  
Is it? Is there any significant difference? I don't remember any 
necessary migration from 1.6 to 2.0 for string literals.





[GitHub] spark pull request #17743: [SPARK-20448][DOCS] Document how FileInputDStream...

2017-04-24 Thread steveloughran
GitHub user steveloughran opened a pull request:

https://github.com/apache/spark/pull/17743

[SPARK-20448][DOCS] Document how FileInputDStream works with object storage

Change-Id: I88c272444ca734dc2cbc2592607c11287b90a383

## What changes were proposed in this pull request?

The documentation on File DStreams is enhanced to

1. Detail the exact timestamp logic for examining directories and files.
1. Detail how object stores differ from filesystems, and so why using 
them as a source of data should be treated with caution, possibly publishing 
data to the store differently (direct PUTs as opposed to stage + rename)
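
For readers unfamiliar with the "stage + rename" pattern mentioned in the second item, here is a minimal sketch on a local filesystem using `java.nio.file`. It is an illustration under assumptions, not text from the documentation patch: `publishByRename`, the directory names, and the file name are hypothetical. The file is written in full to a staging directory and then moved into the monitored directory, so a reader of that directory never observes a half-written file. On many object stores a rename is not a cheap atomic operation, which is presumably why the description contrasts it with a direct PUT of the final object.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helper: write the complete file in a staging directory,
// then rename it into the directory that the stream is monitoring.
def publishByRename(stagingDir: Path, monitoredDir: Path, name: String, data: String): Path = {
  val staged = stagingDir.resolve(name)
  Files.write(staged, data.getBytes(StandardCharsets.UTF_8))
  // ATOMIC_MOVE requires both directories to be on the same filesystem.
  Files.move(staged, monitoredDir.resolve(name), StandardCopyOption.ATOMIC_MOVE)
}

// Demo layout under a temporary directory (names are illustrative only).
val root = Files.createTempDirectory("dstream-demo")
val staging = Files.createDirectory(root.resolve("staging"))
val monitored = Files.createDirectory(root.resolve("monitored"))

val published = publishByRename(staging, monitored, "part-0000.txt", "hello\n")
```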

## How was this patch tested?

n/a

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/spark 
cloud/SPARK-20448-document-dstream-blobstore

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17743.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17743


commit c83af37fa0258f6b32676c7f5f909143cc5c6caa
Author: Steve Loughran 
Date:   2017-04-24T12:26:26Z

SPARK-20448 Document how FileInputDStream works with object storage

Change-Id: I88c272444ca734dc2cbc2592607c11287b90a383







[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2017-04-24 Thread ptkool
Github user ptkool commented on the issue:

https://github.com/apache/spark/pull/17648
  
@rxin Ok. So you're proposing rewrites for these aggregates that look 
something like this?

```
some(cond)  => sum(cond) > 0
every(cond) => sum(not(cond)) = 0
```
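
To make the sum-based rewrite concrete, here is a minimal Scala sketch over plain collections rather than Spark expressions. It assumes that `cond` is never null and that `sum(cond)` counts `true` as 1 and `false` as 0; `someViaSum` and `everyViaSum` are hypothetical names used only for this illustration.

```scala
// Hypothetical helpers: model the rewrite on an in-memory group of booleans.
// Assumption: true counts as 1 and false as 0 when summed.
def someViaSum(conds: Seq[Boolean]): Boolean =
  conds.map(c => if (c) 1 else 0).sum > 0

def everyViaSum(conds: Seq[Boolean]): Boolean =
  conds.map(c => if (!c) 1 else 0).sum == 0

// Compare against the direct definitions on a few sample groups.
val groups = Seq(
  Seq(true, true),
  Seq(true, false),
  Seq(false, false),
  Seq.empty[Boolean])

val sumRewriteAgrees = groups.forall { g =>
  someViaSum(g) == g.exists(identity) && everyViaSum(g) == g.forall(identity)
}
```

In this model an empty group yields `false` for some and `true` for every; how an actual SQL aggregate should treat empty groups and nulls is a separate question that the sketch does not answer.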





[GitHub] spark issue #17743: [SPARK-20448][DOCS] Document how FileInputDStream works ...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17743
  
**[Test build #76103 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76103/testReport)**
 for PR 17743 at commit 
[`c83af37`](https://github.com/apache/spark/commit/c83af37fa0258f6b32676c7f5f909143cc5c6caa).





[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2017-04-24 Thread ptkool
Github user ptkool commented on the issue:

https://github.com/apache/spark/pull/17648
  
@rxin Actually, @hvanhovell proposed the following rewrites which I think 
are better:

```
some(cond)  => max(cond) = true
every(cond) => min(cond) = true
```
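
The max/min form can be sketched the same way in plain Scala, relying on the standard ordering in which `false` sorts before `true`. This is an illustration under assumptions, not Spark code: `someViaMax` and `everyViaMin` are hypothetical names, nulls are not modelled, and an empty group is represented as `None`.

```scala
// Hypothetical helpers: with false < true, the maximum of a group is true
// exactly when some element is true, and the minimum is true exactly when
// every element is true.
def someViaMax(conds: Seq[Boolean]): Option[Boolean] =
  if (conds.isEmpty) None else Some(conds.max == true)

def everyViaMin(conds: Seq[Boolean]): Option[Boolean] =
  if (conds.isEmpty) None else Some(conds.min == true)

// Compare against the direct definitions on non-empty sample groups.
val samples = Seq(
  Seq(true, true),
  Seq(true, false),
  Seq(false, false))

val maxMinRewriteAgrees = samples.forall { g =>
  someViaMax(g).contains(g.exists(identity)) &&
    everyViaMin(g).contains(g.forall(identity))
}
```

One practical difference from a counting rewrite is that max and min need no cast from boolean to a numeric type.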





[GitHub] spark issue #17743: [SPARK-20448][DOCS] Document how FileInputDStream works ...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17743
  
**[Test build #76104 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76104/testReport)**
 for PR 17743 at commit 
[`1e620ce`](https://github.com/apache/spark/commit/1e620ceb7b5eb0df6df83525366ebc1074f8e8ce).





[GitHub] spark issue #17743: [SPARK-20448][DOCS] Document how FileInputDStream works ...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17743
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76103/
Test PASSed.





[GitHub] spark issue #17743: [SPARK-20448][DOCS] Document how FileInputDStream works ...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17743
  
**[Test build #76103 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76103/testReport)**
 for PR 17743 at commit 
[`c83af37`](https://github.com/apache/spark/commit/c83af37fa0258f6b32676c7f5f909143cc5c6caa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17743: [SPARK-20448][DOCS] Document how FileInputDStream works ...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17743
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #17744: [SPARK-20426] Lazy initialization of FileSegmentM...

2017-04-24 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/17744

[SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shuffle 
service.

## What changes were proposed in this pull request?
When an application has a large number of shuffle blocks, the NodeManager 
needs a lot of memory to keep their metadata (`FileSegmentManagedBuffer`) in 
`StreamManager`. With enough shuffle blocks, the NodeManager can run out of 
memory. This PR proposes lazy initialization of `FileSegmentManagedBuffer` in 
the shuffle service.
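
The difference between eager and lazy creation can be sketched as follows. This is an illustration in Python, not the patch itself: `SegmentBuffer`, `eager_buffers`, `lazy_buffers`, and the `resolve` callback are hypothetical names standing in for the buffer metadata and its lookup.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class SegmentBuffer:
    """Hypothetical stand-in for per-block metadata (file, offset, length)."""
    path: str
    offset: int
    length: int


def eager_buffers(block_ids: List[str],
                  resolve: Callable[[str], SegmentBuffer]) -> Iterator[SegmentBuffer]:
    # Eager: every buffer object is created when the stream is registered
    # and stays referenced until the stream has been consumed.
    return iter([resolve(block_id) for block_id in block_ids])


def lazy_buffers(block_ids: List[str],
                 resolve: Callable[[str], SegmentBuffer]) -> Iterator[SegmentBuffer]:
    # Lazy: only the block ids are retained; each buffer object is created
    # when the consumer asks for the next chunk.
    return (resolve(block_id) for block_id in block_ids)
```

With the lazy variant, the number of live buffer objects tracks what is being fetched at the moment rather than the total number of registered shuffle blocks.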

## How was this patch tested?

Tested manually.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-20426

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17744.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17744


commit 5fb91bfc5cdd588ae728a39173521279f517f20e
Author: jinxing 
Date:   2017-04-24T12:52:00Z

[SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shuffle 
service.







[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-04-24 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/17744
  
Spark jobs run on a YARN cluster in my warehouse, with the external shuffle 
service enabled (`--conf spark.shuffle.service.enabled=true`). Recently the 
NodeManager has been running out of memory now and then. A heap dump shows 
that the footprint of `OneForOneStreamManager` is huge: the NodeManager is 
configured with 5G of heap memory, of which `OneForOneStreamManager` takes 2.5G, 
holding 5503233 `FileSegmentManagedBuffer` objects.





[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17744
  
**[Test build #76105 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76105/testReport)**
 for PR 17744 at commit 
[`5fb91bf`](https://github.com/apache/spark/commit/5fb91bfc5cdd588ae728a39173521279f517f20e).





[GitHub] spark issue #17743: [SPARK-20448][DOCS] Document how FileInputDStream works ...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17743
  
**[Test build #76104 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76104/testReport)**
 for PR 17743 at commit 
[`1e620ce`](https://github.com/apache/spark/commit/1e620ceb7b5eb0df6df83525366ebc1074f8e8ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17621: [SPARK-6227][MLLIB][PYSPARK] Implement PySpark wrappers ...

2017-04-24 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17621
  
If there are no further comments, I'll merge this into branch-2.2 within a few days.





[GitHub] spark issue #17743: [SPARK-20448][DOCS] Document how FileInputDStream works ...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17743
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76104/
Test PASSed.





[GitHub] spark issue #17743: [SPARK-20448][DOCS] Document how FileInputDStream works ...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17743
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #17745: [SPARK-17159][Streaming] optimise check for new f...

2017-04-24 Thread steveloughran
GitHub user steveloughran opened a pull request:

https://github.com/apache/spark/pull/17745

[SPARK-17159][Streaming] optimise check for new files in FileInputDStream

## What changes were proposed in this pull request?

Changes to `FileInputDStream` to eliminate multiple `getFileStatus()` calls 
when scanning directories for new files.

This is a minor optimisation when working with filesystems, but significant 
when working with object stores, as it eliminates HTTP requests for every source 
file scanned. The current cost is 1-3 probes to see whether a path is a 
directory, plus one more to get the timestamp of a file. The new patch gets the 
file status once and retains it through all the operations, so it does not need 
to be re-evaluated.

The optimisation saves 3 HTTP requests per source directory and 1 per file, 
for every directory in the scan list and every file in the scanned 
directories, irrespective of the age of the directories. At 100+ ms per HEAD 
request against S3, the speedup is significant, even when there are few files 
in the scanned directories.

 Before

1. Two separate list operations: `globStatus()` to find directories, then 
`listStatus()` to scan for new files under those directories.
1. The path filter in the `globStatus()` operation calls 
`getFileStatus(filename)` to probe whether a path is a directory.
1. `getFileStatus()` is also used in the `listStatus()` call to check the 
timestamp.

Against an object store, `getFileStatus()` can cost 1-4 HTTPS requests per 
call (HEAD path, HEAD path + "/", LIST path).

As both list operations return an array or iterator of `FileStatus` 
objects, these `getFileStatus()` calls are superfluous. Instead, the filtering 
can take place after the listing has returned.

 After

1. The output of `globStatus()` is filtered to select only directories.
1. The output of `listStatus()` is filtered by timestamp.
1. The special failure case of `globStatus()`, no path, is handled 
in the warning text by saying "No Directory to scan" and omitting 
the full stack trace.
1. The `fileToModTime` map is superfluous, and so is deleted.
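
The before/after pattern can be sketched against a toy store. This is a Python illustration, not the Hadoop `FileSystem` API: `CountingStore`, `list_status`, `get_file_status`, and the two `new_files_*` helpers are hypothetical names, and the probe counter stands in for the HTTP requests an object store would issue.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileStatus:
    path: str
    is_directory: bool
    modification_time: int


class CountingStore:
    """Hypothetical in-memory store that counts per-path status probes."""

    def __init__(self, entries: List[FileStatus]):
        self._entries: Dict[str, FileStatus] = {e.path: e for e in entries}
        self.status_probes = 0

    def list_status(self, directory: str) -> List[FileStatus]:
        # One listing call returns full status objects for everything below.
        prefix = directory.rstrip("/") + "/"
        return [e for e in self._entries.values() if e.path.startswith(prefix)]

    def get_file_status(self, path: str) -> FileStatus:
        # Each call models an extra round trip to the store.
        self.status_probes += 1
        return self._entries[path]


def new_files_before(store: CountingStore, directory: str, threshold: int) -> List[str]:
    # Old pattern: list, then probe every listed path again.
    selected = []
    for listed in store.list_status(directory):
        status = store.get_file_status(listed.path)  # redundant probe
        if not status.is_directory and status.modification_time >= threshold:
            selected.append(status.path)
    return selected


def new_files_after(store: CountingStore, directory: str, threshold: int) -> List[str]:
    # New pattern: filter the status objects the listing already returned.
    return [s.path for s in store.list_status(directory)
            if not s.is_directory and s.modification_time >= threshold]
```

Both helpers select the same files; only the first one pays one extra probe per listed entry.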

## How was this patch tested?

1. There is a new test in `org.apache.spark.streaming.InputStreamsSuite`
1. I have object store integration tests in an external repository, which 
have been used to verify functionality and that the number of HTTP requests is 
reduced when invoked against S3A endpoints.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/steveloughran/spark 
cloud/SPARK-17159-listfiles-minimal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17745


commit f3ffe1db2e5edc9b6a60fb48b34b3099853e4324
Author: Steve Loughran 
Date:   2017-04-24T13:04:04Z

SPARK-17159 minimal patch of changes to FileInputDStream to reduce File 
status requests when querying files. This is a minor optimisation when working 
with filesystems, but significant when working with object stores.

Change-Id: I269d98902f615818941c88de93a124c65453756e







[GitHub] spark issue #17740: [SPARK-20404][CORE] Using Option(name) instead of Some(n...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17740
  
**[Test build #3674 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3674/testReport)**
 for PR 17740 at commit 
[`e12058c`](https://github.com/apache/spark/commit/e12058cafa8a3c6c54aedefdcc5301ac75b81869).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17745: [SPARK-17159][Streaming] optimise check for new files in...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17745
  
**[Test build #76106 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76106/testReport)**
 for PR 17745 at commit 
[`f3ffe1d`](https://github.com/apache/spark/commit/f3ffe1db2e5edc9b6a60fb48b34b3099853e4324).





[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-24 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/17582
  
Changes LGTM. Did you file a JIRA to track the change to not use `withSparkUI`? 
If a user is downloading because the file is huge and takes a long time to 
render, or causes the history server to have issues, this would hurt that use 
case. We could also wait and see whether someone has that use case.





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17459
  
**[Test build #76107 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76107/testReport)**
 for PR 17459 at commit 
[`d692d30`](https://github.com/apache/spark/commit/d692d3031f9c57ff92b63ccce0962bc899402826).





[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-04-24 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/17658
  
+1. @vanzin any further comments?





[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-24 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17582
  
Thanks @tgravescs for your comments. Do you think it would be a good idea to 
read the ACLs in `mergeApplicationListing` 
[here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L457)
 and keep them in `applications`, so that we don't need to load the SparkUI to 
check ACLs when downloading event logs?





[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17739
  
**[Test build #76108 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76108/testReport)**
 for PR 17739 at commit 
[`b4e392e`](https://github.com/apache/spark/commit/b4e392ea249d37e91995e1d604a0d463567a7624).





[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...

2017-04-24 Thread yucai
Github user yucai commented on the issue:

https://github.com/apache/spark/pull/17678
  
@rxin it seems the test was not started; could you help trigger it again?





[GitHub] spark issue #17745: [SPARK-17159][Streaming] optimise check for new files in...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17745
  
**[Test build #76106 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76106/testReport)**
 for PR 17745 at commit 
[`f3ffe1d`](https://github.com/apache/spark/commit/f3ffe1db2e5edc9b6a60fb48b34b3099853e4324).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17729
  
**[Test build #76109 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76109/testReport)**
 for PR 17729 at commit 
[`ce0c4b6`](https://github.com/apache/spark/commit/ce0c4b62fb95575e3df27c8e30511ecfa769af98).





[GitHub] spark issue #17745: [SPARK-17159][Streaming] optimise check for new files in...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17745
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17745: [SPARK-17159][Streaming] optimise check for new files in...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17745
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76106/
Test PASSed.





[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-24 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/17582
  
As @vanzin said, I think this is fine for now to get this fixed quickly, but 
filing a follow-up JIRA makes sense. Actually, this might be good to get into 
the 2.1.1 release if they are going to spin another RC.





[GitHub] spark pull request #17746: [SPARK-20449][ML] Upgrade breeze version to 0.13....

2017-04-24 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/17746

[SPARK-20449][ML] Upgrade breeze version to 0.13.1

## What changes were proposed in this pull request?
Upgrade breeze version to 0.13.1, which fixes some critical bugs in 
L-BFGS-B.

## How was this patch tested?
Existing unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-20449

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17746.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17746


commit aeb7eb588fa7779cf64e51fcfa083056e3d8ccbf
Author: Yanbo Liang 
Date:   2017-04-24T14:36:20Z

Upgrade breeze version to 0.13.1







[GitHub] spark issue #17746: [SPARK-20449][ML] Upgrade breeze version to 0.13.1

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17746
  
**[Test build #76110 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76110/testReport)**
 for PR 17746 at commit 
[`aeb7eb5`](https://github.com/apache/spark/commit/aeb7eb588fa7779cf64e51fcfa083056e3d8ccbf).





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17459
  
**[Test build #76107 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76107/testReport)**
 for PR 17459 at commit 
[`d692d30`](https://github.com/apache/spark/commit/d692d3031f9c57ff92b63ccce0962bc899402826).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...

2017-04-24 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/15009
  
@kishorvpatil please fix the documentation.





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17459
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76107/
Test PASSed.





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17459
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17744
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76105/
Test FAILed.





[GitHub] spark issue #17729: [SPARK-20438][R] SparkR wrappers for split and repeat

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17729
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76109/
Test PASSed.




