[GitHub] spark issue #17715: [SPARK-20047][ML] Constrained Logistic Regression

2017-04-24 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17715
  
Many use cases set the bounds to a constant instead of setting each dimension individually. Maybe we can add the following APIs.

```scala
def setLowerBoundsOnIntercepts(bound: Double)
...
```
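
For illustration, a hedged sketch of the caller-side difference such an overload would make (names like `numIntercepts` are illustrative; the vector-based setter is the one proposed in this PR):

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors

val numIntercepts = 1  // binary logistic regression; illustrative
val lr = new LogisticRegression()
// Today: the caller materializes the constant bound per dimension.
lr.setLowerBoundsOnIntercepts(Vectors.dense(Array.fill(numIntercepts)(0.0)))
// Proposed: a scalar overload that broadcasts the constant internally.
// lr.setLowerBoundsOnIntercepts(0.0)
```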





[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17342
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17342
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76128/
Test PASSed.





[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17342
  
**[Test build #76128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76128/testReport)** for PR 17342 at commit [`fb1ee81`](https://github.com/apache/spark/commit/fb1ee811e12f05c5d31880e6d88f306148612c18).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17748: [SPARK-19812] YARN shuffle service fails to reloc...

2017-04-24 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/17748#discussion_r113115850
  
--- Diff: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ---
@@ -363,25 +362,29 @@ protected File initRecoveryDb(String dbFileName) {
       // make sure to move all DBs to the recovery path from the old NM local dirs.
       // If another DB was initialized first just make sure all the DBs are in the same
       // location.
-      File newLoc = new File(_recoveryPath.toUri().getPath(), dbFileName);
-      if (!newLoc.equals(f)) {
+      Path newLoc = new Path(_recoveryPath, dbName);
+      Path copyFrom = new Path(f.toURI());
+      if (!newLoc.equals(copyFrom)) {
+        logger.info("Moving " + copyFrom + " to: " + newLoc);
         try {
-          Files.move(f.toPath(), newLoc.toPath());
+          // The move here needs to handle moving non-empty directories across NFS mounts
+          FileSystem fs = FileSystem.getLocal(_conf);
+          fs.rename(copyFrom, newLoc);
--- End diff --

How much more expensive is this for non-NFS cases?
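
For context, a rough sketch (in Scala, with illustrative paths) of the two code paths being compared; the cost question is whether the Hadoop local-FS rename falls back to a copy where a plain rename would have sufficed:

```scala
import java.nio.file.{Files, Paths}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Old path: a single rename; cheap, but fails for non-empty directories across mount points.
Files.move(Paths.get("/tmp/olddb"), Paths.get("/tmp/newdb"))

// New path: Hadoop's local FileSystem rename; handles cross-mount moves,
// but may fall back to copy-then-delete, a full copy in the worst case.
val fs = FileSystem.getLocal(new Configuration())
fs.rename(new Path("/tmp/olddb"), new Path("/tmp/newdb"))
```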





[GitHub] spark issue #17756: [SPARK-20455][DOCS] Fix Broken Docker IT Docs

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17756
  
Can one of the admins verify this patch?





[GitHub] spark pull request #17756: [SPARK-20455][DOCS] Fix Broken Docker IT Docs

2017-04-24 Thread original-brownbear
GitHub user original-brownbear opened a pull request:

https://github.com/apache/spark/pull/17756

[SPARK-20455][DOCS] Fix Broken Docker IT Docs

## What changes were proposed in this pull request?

Just added the Maven `test` goal.

## How was this patch tested?

No test needed, just a trivial documentation fix.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/original-brownbear/spark SPARK-20455

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17756.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17756


commit 01fdcea9c8a62e6800fcc73a7137672fbf77e2cd
Author: Armin Braun 
Date:   2017-04-25T06:10:15Z

[SPARK-20455][DOCS] Fix Broken Docker IT Docs







[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17737
  
The point here is to fix the Python documentation and to resolve the mismatches in `bitwiseOR`, `bitwiseAND`, `bitwiseXOR`, `contains`, `asc` and `desc` among `functions.py`, `column.py`, `functions.scala` and `Column.scala`.

I hope unrelated extra changes do not hold up this PR.





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r113112450
  
--- Diff: python/pyspark/sql/column.py ---
@@ -527,7 +583,7 @@ def _test():
 .appName("sql.column tests")\
 .getOrCreate()
 sc = spark.sparkContext
-globs['sc'] = sc
+globs['spark'] = spark
 globs['df'] = sc.parallelize([(2, 'Alice'), (5, 'Bob')]) \
--- End diff --

Maybe we could. I think this is not related to the Python documentation fix, BTW.





[GitHub] spark issue #17755: [SPARK-20239][CORE][2.1-backport] Improve HistoryServer'...

2017-04-24 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17755
  
CC @vanzin, this backport can be merged into branch 2.0 cleanly.





[GitHub] spark issue #17680: [SPARK-20364][SQL] Support Parquet predicate pushdown on...

2017-04-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17680
  
@liancheng and @davies, if you are not sure about this approach, I could simply avoid pushing down the filters in this case for now. Please let me know.





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-24 Thread map222
Github user map222 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r113110805
  
--- Diff: python/pyspark/sql/column.py ---
@@ -527,7 +583,7 @@ def _test():
 .appName("sql.column tests")\
 .getOrCreate()
 sc = spark.sparkContext
-globs['sc'] = sc
+globs['spark'] = spark
 globs['df'] = sc.parallelize([(2, 'Alice'), (5, 'Bob')]) \
--- End diff --

Do you want to update the `globs['df']` definition to 
`spark.createDataFrame`?





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-24 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17737
  
Thank you for your review and approval @felixcheung, @zero323 and @map222.





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17737
  
**[Test build #76129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76129/testReport)** for PR 17737 at commit [`eaeb456`](https://github.com/apache/spark/commit/eaeb4564562272ae021fa1a7a8a083ccc56e5c33).





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r113109064
  
--- Diff: python/pyspark/sql/column.py ---
@@ -288,8 +324,16 @@ def __iter__(self):
 >>> df.filter(df.name.endswith('ice$')).collect()
 []
 """
+_contains_doc = """
+Contains the other element. Returns a boolean :class:`Column` based on a string match.
+
+:param other: string in line
+
+>>> df.filter(df.name.contains('o')).collect()
+[Row(age=5, name=u'Bob')]
+"""
--- End diff --

Sure.





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r113109049
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -1008,7 +1009,7 @@ class Column(val expr: Expression) extends Logging {
   def cast(to: String): Column = cast(CatalystSqlParser.parseDataType(to))
 
   /**
-   * Returns an ordering used in sorting.
+   * Returns a sort expression based on the descending order of the column.
--- End diff --

Yea, that sounds good in a way, but the downside of adding examples is having to maintain them and keep them up to date. Let's leave them out here, as this PR targets the Python documentation fix.





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r113108974
  
--- Diff: python/pyspark/sql/column.py ---
@@ -251,15 +285,16 @@ def __iter__(self):
 
 # string methods
 _rlike_doc = """
-Return a Boolean :class:`Column` based on a regex match.
+SQL RLIKE expression (LIKE with Regex). Returns a boolean :class:`Column` based on a regex
--- End diff --

Let's leave it so that it indicates the regular expression is in SQL syntax. I would like to keep them identical in most cases to reduce the overhead when someone needs to sweep the documentation.

It looks like there are a few places that need this clarification. If this is something that has to be done, let's do it in another PR.





[GitHub] spark issue #17755: [SPARK-20239][CORE][2.1-backport] Improve HistoryServer'...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17755
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17755: [SPARK-20239][CORE][2.1-backport] Improve HistoryServer'...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17755
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76127/
Test PASSed.





[GitHub] spark issue #17755: [SPARK-20239][CORE][2.1-backport] Improve HistoryServer'...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17755
  
**[Test build #76127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76127/testReport)** for PR 17755 at commit [`2fc1525`](https://github.com/apache/spark/commit/2fc1525c4e0f55a684bc894403694fcfac8f878e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17751: [SPARK-20451] Filter out nested mapType datatypes...

2017-04-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17751





[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...

2017-04-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17751
  
thanks, merging to master/2.2/2.1/2.0!





[GitHub] spark pull request #17753: [SPARK-20453] Bump master branch version to 2.3.0...

2017-04-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17753





[GitHub] spark issue #17753: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSH...

2017-04-24 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17753
  
Merging in master.






[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-24 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17736
  
It is. But there is no problem for a normal string literal; it causes a problem only when the string literal is used as a regex pattern string.





[GitHub] spark issue #17467: [SPARK-20140][DStream] Remove hardcoded kinesis retry wa...

2017-04-24 Thread yssharma
Github user yssharma commented on the issue:

https://github.com/apache/spark/pull/17467
  
@brkyvz - Added new changes:
- A case class `KinesisReadConfigurations` that gathers all the Kinesis read configs in a single place
- A test class that passes the Kinesis configs in `SparkConf`, which are then used to create the Kinesis configs object in `KinesisInputDStream` and passed down to `KinesisBackedBlockRDD`
- Docs improvements

I also played with `PrivateMethodTester` but wasn't able to access the private function `KinesisSequenceRangeIterator#retryOrTimeout`, probably because of the generics used in the function. I used an alternative: fetch the RDDs directly and check the configs passed in there.
I would still like to learn how to get the `retryOrTimeout` call working, just out of interest. Adding the error below:

```
// KinesisSequenceRangeIterator # retryOrTimeout
val retryOrTimeoutMethod = PrivateMethod[Object]('retryOrTimeout) // <<<- Issue

val partitions = kinesisRDD.partitions.map {
  _.asInstanceOf[KinesisBackedBlockRDDPartition] }.toSeq

seqNumRanges1.ranges.map { range =>
  val seqRangeIter =
    new KinesisSequenceRangeIterator(DefaultCredentials.provider.getCredentials,
      dummyEndpointUrl, dummyRegionName, range, kinesisRDD.kinesisReadConfigs)

  seqRangeIter.invokePrivate(retryOrTimeoutMethod("Passing custom message"))
}


- Kinesis read with custom configurations *** FAILED ***
  java.lang.IllegalArgumentException: Can't find a private method named: retryOrTimeout
  at org.scalatest.PrivateMethodTester$Invoker.invokePrivate(PrivateMethodTester.scala:247)
  at org.apache.spark.streaming.kinesis.KinesisStreamTests$$anonfun$7$$anonfun$apply$mcV$sp$13.apply(KinesisStreamSuite.scala:286)
  at org.apache.spark.streaming.kinesis.KinesisStreamTests$$anonfun$7$$anonfun$apply$mcV$sp$13.apply(KinesisStreamSuite.scala:281)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.immutable.List.map(List.scala:285)
  at org.apache.spark.streaming.kinesis.KinesisStreamTests$$anonfun$7.apply$mcV$sp(KinesisStreamSuite.scala:281)
  at org.apache.spark.streaming.kinesis.KinesisStreamTests$$anonfun$7.apply(KinesisStreamSuite.scala:237)
```
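
For what it's worth, one plausible cause, stated as a guess (the signature below is an assumption, not taken from the PR): a generic method with a by-name second parameter list erases to an extra `Function0` argument, and `invokePrivate` matches the method name together with the erased argument list, so the call would need to supply both arguments:

```scala
// Assumed shape: private def retryOrTimeout[T](message: String)(body: => T): T
// After erasure this is retryOrTimeout(String, Function0), so the by-name
// block would be passed as an explicit Function0 value:
val retryOrTimeout = PrivateMethod[Object]('retryOrTimeout)
seqRangeIter.invokePrivate(retryOrTimeout("Passing custom message", () => null))
```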





[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17342
  
**[Test build #76128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76128/testReport)** for PR 17342 at commit [`fb1ee81`](https://github.com/apache/spark/commit/fb1ee811e12f05c5d31880e6d88f306148612c18).





[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-24 Thread weiqingy
Github user weiqingy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17342#discussion_r113103389
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2606,4 +2607,19 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage)
     }
   }
+
+  test("SPARK-12868: Allow adding jars from hdfs ") {
+    val jarFromHdfs = "hdfs://doesnotmatter/test.jar"
+    val jarFromInvalidFs = "fffs://doesnotmatter/test.jar"
+
+    // if 'hdfs' is not supported, MalformedURLException will be thrown
+    new URL(jarFromHdfs)
+    var exceptionThrown: Boolean = false
--- End diff --

Thanks. PR has been updated.





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/17222
  
@holdenk The link you pasted is for the case of creating a UDF from a Scala closure, while `registerJava` uses Java reflection to create the UDF. This is what I use in `registerJava`:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L528
It returns Unit.
Maybe it is possible to create a `registerScala` that returns a Scala UDF, but it seems that is not possible for a Java UDF.
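
A minimal sketch of the asymmetry, assuming an active `SparkSession` named `spark` (the Scala-closure overload of `register` returns the UDF, while the reflection-based Java path has no typed function value to hand back):

```scala
// Scala-closure path: the returned UserDefinedFunction can be reused directly.
val strLen = spark.udf.register("strLen", (s: String) => s.length)
spark.sql("SELECT strLen('hello')").show()
// Java path: registerJavaFunction only resolves a class name via reflection
// at runtime, so there is nothing comparable to return; hence Unit.
```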





[GitHub] spark issue #17459: [SPARK-20109][MLlib] Rewrote toBlockMatrix method on Ind...

2017-04-24 Thread johnc1231
Github user johnc1231 commented on the issue:

https://github.com/apache/spark/pull/17459
  
@viirya I fixed the test as you asked, so please take a look when you get a 
chance. I'm having a little bit of trouble with my local spark build for some 
reason, but I'll do that other benchmark when it's resolved. 





[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17736
  
isn't the regex parsed as a string literal?





[GitHub] spark issue #17714: [SPARK-20428][Core]REST interface about 'v1/submissions/...

2017-04-24 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/17714
  
Please help with the code review. Thank you.





[GitHub] spark issue #17698: [SPARK-20403][SQL][Documentation]Modify the instructions...

2017-04-24 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/17698
  
@srowen the test has not started, could you help trigger it?






[GitHub] spark issue #17755: [SPARK-20239][CORE][2.1-backport] Improve HistoryServer'...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17755
  
**[Test build #76127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76127/testReport)** for PR 17755 at commit [`2fc1525`](https://github.com/apache/spark/commit/2fc1525c4e0f55a684bc894403694fcfac8f878e).





[GitHub] spark pull request #17755: [SPARK-20239][CORE][2.1-backport] Improve History...

2017-04-24 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/17755

[SPARK-20239][CORE][2.1-backport] Improve HistoryServer's ACL mechanism

Current SHS (Spark History Server) has two different ACLs:

* ACL of the base URL. It is controlled by "spark.acls.enable" or "spark.ui.acls.enable"; with this enabled, only users configured in "spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user who started the SHS, can list all the applications; otherwise none of them can be listed. This also affects the REST APIs that list the summary of all apps and of one app.
* Per-application ACL. This is controlled by "spark.history.ui.acls.enable". With this enabled, only the history admin user and the user/group who ran an app can access the details of that app.

With these two ACLs, we may encounter several unexpected behaviors:

1. If the base URL's ACL (`spark.acls.enable`) is enabled but user "A" has no view permission, user "A" cannot see the app list but can still access the details of their own app.
2. If the base URL's ACL (`spark.acls.enable`) is disabled, then user "A" can download any application's event log, even if it was not run by user "A".
3. Changes to the Live UI's ACL affect the History UI's ACL, since they share the same conf file.

These unexpected behaviors arise mainly because we have two different ACLs; ideally we should have only one to manage all of this.

So to improve the SHS's ACL mechanism, this PR proposes to:

1. Disable "spark.acls.enable" and only use "spark.history.ui.acls.enable" for the history server.
2. Check permissions for the event-log download REST API.

With this PR:

1. An admin user can see/download the list of all applications, as well as application details.
2. A normal user can see the list of all applications, but can only download and check the details of applications accessible to them.

New UTs are added, and this was also verified in a real cluster.

CC tgravescs vanzin, please help to review; this PR changes the semantics you defined previously. Thanks a lot.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark 
SPARK-20239-2.1-backport

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17755.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17755


commit 2fc1525c4e0f55a684bc894403694fcfac8f878e
Author: jerryshao 
Date:   2017-04-25T01:18:59Z

[SPARK-20239][CORE] Improve HistoryServer's ACL mechanism

Current SHS (Spark History Server) has two different ACLs:

* ACL of the base URL. It is controlled by "spark.acls.enable" or "spark.ui.acls.enable"; with this enabled, only users configured in "spark.admin.acls" (or group) or "spark.ui.view.acls" (or group), or the user who started the SHS, can list all the applications; otherwise none of them can be listed. This also affects the REST APIs that list the summary of all apps and of one app.
* Per-application ACL. This is controlled by "spark.history.ui.acls.enable". With this enabled, only the history admin user and the user/group who ran an app can access the details of that app.

With these two ACLs, we may encounter several unexpected behaviors:

1. If the base URL's ACL (`spark.acls.enable`) is enabled but user "A" has no view permission, user "A" cannot see the app list but can still access the details of their own app.
2. If the base URL's ACL (`spark.acls.enable`) is disabled, then user "A" can download any application's event log, even if it was not run by user "A".
3. Changes to the Live UI's ACL affect the History UI's ACL, since they share the same conf file.

These unexpected behaviors arise mainly because we have two different ACLs; ideally we should have only one to manage all of this.

So to improve the SHS's ACL mechanism, this PR proposes to:

1. Disable "spark.acls.enable" and only use "spark.history.ui.acls.enable" for the history server.
2. Check permissions for the event-log download REST API.

With this PR:

1. An admin user can see/download the list of all applications, as well as application details.
2. A normal user can see the list of all applications, but can only download and check the details of applications accessible to them.

New UTs are added, and this was also verified in a real cluster.

CC tgravescs vanzin, please help to review; this PR changes the semantics you defined previously. Thanks a lot.

Author: jerryshao 

Closes #17582 from jerryshao/SPARK-20239.

Change-Id: I65d5d0c5e5a76f08abbe2b7dd43a2e08d295f6b6





[GitHub] spark issue #17753: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSH...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76122/
Test PASSed.





[GitHub] spark issue #17753: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSH...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17753
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17753: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSH...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17753
  
**[Test build #76122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76122/testReport)** for PR 17753 at commit [`983f746`](https://github.com/apache/spark/commit/983f74659a310a970280ae3696ee40e244cf67a0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17750: [SPARK-4899][MESOS] Support for checkpointing on Coarse ...

2017-04-24 Thread lhoss
Github user lhoss commented on the issue:

https://github.com/apache/spark/pull/17750
  
It would be great to have this soon in 2.2.x (maybe even backported to 2.1.x); there are many accepted reviews already in https://github.com/metamx/spark/pull/26.





[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17751
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76123/
Test PASSed.





[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17751
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17751
  
**[Test build #76123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76123/testReport)** for PR 17751 at commit [`b9dbb9c`](https://github.com/apache/spark/commit/b9dbb9c9515b2b53cc03e59935cca740a2a56f44).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-24 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17582
  
OK, let me try it, thanks.





[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76121/
Test PASSed.





[GitHub] spark issue #17698: [SPARK-20403][SQL][Documentation]Modify the instructions...

2017-04-24 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/17698
  
Can Jenkins test this?





[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17640
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17640: [SPARK-17608][SPARKR]:Long type has incorrect serializat...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17640
  
**[Test build #76121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76121/testReport)** for PR 17640 at commit [`331b781`](https://github.com/apache/spark/commit/331b781f4d396e4dcf981d3b50ba63e770ec9880).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-24 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/17582
  
It would be good, but maybe the 2.1 backport will merge cleanly to 2.0.





[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-24 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17582
  
What about branch 2.0, do we also need to backport to it, @vanzin?





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-04-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r113092246
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkAppHandle.java ---
@@ -95,7 +95,8 @@ public boolean isFinal() {
   void kill();
 
   /**
-   * Disconnects the handle from the application, without stopping it. After this method is called,
+   * Disconnects the handle from the application. If using {@link SparkLauncher#autoShutdown()}
--- End diff --

Sorry, I thought it was the Scala one. I just checked that it passes the doc build locally with the suggestion above.





[GitHub] spark pull request #17725: [SPARK-20435][CORE] More thorough redaction of se...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17725#discussion_r113091533
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -404,6 +407,37 @@ class SparkSubmitSuite
     runSparkSubmit(args)
   }
 
+  test("launch simple application with spark-submit with redaction") {
+    val testDir = Utils.createTempDir()
+    testDir.deleteOnExit()
+    val testDirPath = new Path(testDir.getAbsolutePath())
+    val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+    val fileSystem = Utils.getHadoopFileSystem("/",
+      SparkHadoopUtil.get.newConfiguration(new SparkConf()))
+    try {
+      val args = Seq(
+        "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
+        "--name", "testApp",
+        "--master", "local",
+        "--conf", "spark.ui.enabled=false",
+        "--conf", "spark.master.rest.enabled=false",
+        "--conf", "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password",
+        "--conf", "spark.eventLog.enabled=true",
+        "--conf", "spark.eventLog.testing=true",
+        "--conf", s"spark.eventLog.dir=${testDirPath.toUri.toString}",
+        "--conf", "spark.hadoop.fs.defaultFS=unsupported://example.com",
+        unusedJar.toString)
+      runSparkSubmit(args)
+      val listStatuses = fileSystem.listStatus(testDirPath)
--- End diff --

s/listStatuses/something else.

Use list, statuses, statusList, but "listStatuses" doesn't parse for me.





[GitHub] spark pull request #17725: [SPARK-20435][CORE] More thorough redaction of se...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17725#discussion_r113091328
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2606,8 +2606,22 @@ private[spark] object Utils extends Logging {
   }
 
   private def redact(redactionPattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] = {
+    // If the sensitive information regex matches with either the key or the value, redact the value
+    // While the original intent was to only redact the value if the key matched with the regex,
+    // we've found that especially in verbose mode, the value of the property may contain sensitive
+    // information like so:
+    // "sun.java.command":"org.apache.spark.deploy.SparkSubmit ... \
+    // --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password ...
+    //
+    // And, in such cases, simply searching for the sensitive information regex in the key name is
+    // not sufficient. The values themselves have to be searched as well and redacted if matched.
+    // This does mean we may be accounting more false positives - for example, if the value of an
+    // arbitrary property contained the term 'password', we may redact the value from the UI and
+    // logs. In order to work around it, user would have to make the spark.redaction.regex property
+    // more specific.
     kvs.map { kv =>
--- End diff --

Since you're looking at values now...

`.map { case (key, value) =>`
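
A minimal sketch of the suggested destructuring applied to the loop above (the replacement constant is an assumption about the surrounding code):

```scala
kvs.map { case (key, value) =>
  if (redactionPattern.findFirstIn(key).isDefined ||
      redactionPattern.findFirstIn(value).isDefined) {
    (key, REDACTION_REPLACEMENT_TEXT)  // assumed placeholder constant
  } else {
    (key, value)
  }
}
```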





[GitHub] spark pull request #17725: [SPARK-20435][CORE] More thorough redaction of se...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17725#discussion_r113091235
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -252,11 +252,17 @@ private[spark] class EventLoggingListener(
 
   private[spark] def redactEvent(
       event: SparkListenerEnvironmentUpdate): SparkListenerEnvironmentUpdate = {
-    // "Spark Properties" entry will always exist because the map is always populated with it.
-    val redactedProps = Utils.redact(sparkConf, event.environmentDetails("Spark Properties"))
-    val redactedEnvironmentDetails = event.environmentDetails +
-      ("Spark Properties" -> redactedProps)
-    SparkListenerEnvironmentUpdate(redactedEnvironmentDetails)
+    // environmentDetails maps a string descriptor to a set of properties
+    // Similar to:
+    // "JVM Information" -> jvmInformation,
+    // "Spark Properties" -> sparkProperties,
+    // ...
+    // where jvmInformation, sparkProperties, etc. are sequences of tuples.
+    // We go through the various sequences of properties and redact sensitive information from them.
+    val redactedProps = event.environmentDetails.map{
--- End diff --

`.map { case (name, props) =>`
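
Applied to the diff above, the suggestion would look roughly like this (a sketch, reusing the names from the quoted code):

```scala
val redactedProps = event.environmentDetails.map { case (name, props) =>
  name -> Utils.redact(sparkConf, props)
}
SparkListenerEnvironmentUpdate(redactedProps)
```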





[GitHub] spark pull request #17725: [SPARK-20435][CORE] More thorough redaction of se...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17725#discussion_r113091382
  
--- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -404,6 +407,37 @@ class SparkSubmitSuite
     runSparkSubmit(args)
   }
 
+  test("launch simple application with spark-submit with redaction") {
+    val testDir = Utils.createTempDir()
+    testDir.deleteOnExit()
+    val testDirPath = new Path(testDir.getAbsolutePath())
+    val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+    val fileSystem = Utils.getHadoopFileSystem("/",
+      SparkHadoopUtil.get.newConfiguration(new SparkConf()))
+    try {
+      val args = Seq(
+        "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
+        "--name", "testApp",
+        "--master", "local",
+        "--conf", "spark.ui.enabled=false",
+        "--conf", "spark.master.rest.enabled=false",
+        "--conf", "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password",
+        "--conf", "spark.eventLog.enabled=true",
+        "--conf", "spark.eventLog.testing=true",
+        "--conf", s"spark.eventLog.dir=${testDirPath.toUri.toString}",
+        "--conf", "spark.hadoop.fs.defaultFS=unsupported://example.com",
+        unusedJar.toString)
+      runSparkSubmit(args)
+      val listStatuses = fileSystem.listStatus(testDirPath)
+      val logData = EventLoggingListener.openEventLog(listStatuses.last.getPath, fileSystem)
+      Source.fromInputStream(logData).getLines().foreach {
--- End diff --

`.foreach { line =>`
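
For reference, the closure body presumably wanted here is an assertion over every line (a sketch; the literal matches the secret set via --conf above):

```scala
import scala.io.Source

// Sketch: fail the test if the raw event log still contains the secret anywhere.
def assertRedacted(logData: java.io.InputStream): Unit =
  Source.fromInputStream(logData).getLines().foreach { line =>
    assert(!line.contains("secret_password"), s"Event log line was not redacted: $line")
  }
```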





[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76126/
Test PASSed.





[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17754
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17754
  
**[Test build #76126 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76126/testReport)**
 for PR 17754 at commit 
[`771e490`](https://github.com/apache/spark/commit/771e490fd46c277479b4a06cfa6bb166d1f62856).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17582: [SPARK-20239][Core] Improve HistoryServer's ACL m...

2017-04-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17582





[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-24 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/17582
  
No luck with 2.1, please file a separate PR.





[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-24 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/17582
  
LGTM. Merging to master / 2.2, will try 2.1 and 2.0 too.





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17222
  
@zjffdu - if you look at 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala#L129
though, it returns the `UserDefinedFunction` (currently the Python one returns Unit,
but it would be more useful if it returned a `UserDefinedFunction`). I think that, to
make it easier for people to take advantage of Java UDFs, we would want them to be
usable programmatically in the DataFrame DSL, not just in SQL string expressions.

What do you think @gatorsmile & @zjffdu ?
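
For context, a minimal sketch of what the Scala side already allows and what the Python/Java path could mirror; the function and column names are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local").appName("udf-demo").getOrCreate()
import spark.implicits._

// register() returns a UserDefinedFunction, so the same UDF is usable both in
// SQL strings and directly in the DataFrame DSL.
val strLen = spark.udf.register("strLen", (s: String) => s.length)

val df = Seq("spark", "udf").toDF("word")
df.select(strLen(col("word"))).show()      // DSL usage
spark.sql("SELECT strLen('spark')").show() // SQL string usage
```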





[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17342#discussion_r113088010
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2606,4 +2607,19 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
   case ae: AnalysisException => assert(ae.plan == null && 
ae.getMessage == ae.getSimpleMessage)
 }
   }
+
+  test("SPARK-12868: Allow adding jars from hdfs ") {
+val jarFromHdfs = "hdfs://doesnotmatter/test.jar"
+val jarFromInvalidFs = "fffs://doesnotmatter/test.jar"
+
+// if 'hdfs' is not supported, MalformedURLException will be thrown
+new URL(jarFromHdfs)
+var exceptionThrown: Boolean = false
--- End diff --

Replace this whole block with:

```
intercept[MalformedURLException] {
  new URL(jarFromInvalidFs)
}
```
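
As an aside, `intercept` also returns the caught exception, so the test can assert on it too if desired; a small sketch:

```scala
import java.net.{MalformedURLException, URL}
import org.scalatest.Assertions._

// intercept both asserts that the exception is thrown and hands it back.
val e = intercept[MalformedURLException] {
  new URL("fffs://doesnotmatter/test.jar")
}
assert(e.getMessage.contains("fffs"))
```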





[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-24 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/17540
  
The issue with the current `withNewExecutionId` is that it doesn't support nested
QueryExecutions. I'm wondering if you can really fix this without introducing
regressions, e.g., by tracking the nested QueryExecutions and displaying them
properly in the UI.






[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...

2017-04-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/17540#discussion_r113087928
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
 ---
@@ -161,50 +161,51 @@ object FileFormatWriter extends Logging {
   }
 }
 
-SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
-  // This call shouldn't be put into the `try` block below because it 
only initializes and
-  // prepares the job, any exception thrown from here shouldn't cause 
abortJob() to be called.
-  committer.setupJob(job)
-
-  try {
-val rdd = if (orderingMatched) {
-  queryExecution.toRdd
-} else {
-  SortExec(
-requiredOrdering.map(SortOrder(_, Ascending)),
-global = false,
-child = queryExecution.executedPlan).execute()
-}
-val ret = new Array[WriteTaskResult](rdd.partitions.length)
-sparkSession.sparkContext.runJob(
-  rdd,
-  (taskContext: TaskContext, iter: Iterator[InternalRow]) => {
-executeTask(
-  description = description,
-  sparkStageId = taskContext.stageId(),
-  sparkPartitionId = taskContext.partitionId(),
-  sparkAttemptNumber = taskContext.attemptNumber(),
-  committer,
-  iterator = iter)
-  },
-  0 until rdd.partitions.length,
-  (index, res: WriteTaskResult) => {
-committer.onTaskCommit(res.commitMsg)
-ret(index) = res
-  })
-
-val commitMsgs = ret.map(_.commitMsg)
-val updatedPartitions = ret.flatMap(_.updatedPartitions)
-  .distinct.map(PartitioningUtils.parsePathFragment)
-
-committer.commitJob(job, commitMsgs)
-logInfo(s"Job ${job.getJobID} committed.")
-refreshFunction(updatedPartitions)
-  } catch { case cause: Throwable =>
-logError(s"Aborting job ${job.getJobID}.", cause)
-committer.abortJob(job)
-throw new SparkException("Job aborted.", cause)
+// During tests, make sure there is an execution ID.
+SQLExecution.checkSQLExecutionId(sparkSession)
--- End diff --

To make SQL metrics work, we should always wrap the correct QueryExecution 
with `SparkListenerSQLExecutionStart`. 
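
In other words, the invariant is roughly this (a sketch using the existing withNewExecutionId helper; the method wrapper is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.{QueryExecution, SQLExecution}

// sparkSession and queryExecution correspond to the values in the diff above.
def runTracked(sparkSession: SparkSession, queryExecution: QueryExecution): Unit = {
  SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
    // Execute the same QueryExecution the listener event describes; a freshly
    // built QueryExecution inside this block would have untracked metrics.
    queryExecution.toRdd.count()
  }
}
```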





[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17658#discussion_r113087506
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ApplicationEventListener.scala 
---
@@ -57,4 +58,10 @@ private[spark] class ApplicationEventListener extends 
SparkListener {
   adminAclsGroups = allProperties.get("spark.admin.acls.groups")
 }
   }
+
+  override def onOtherEvent(event:SparkListenerEvent):Unit = event match {
--- End diff --

nit: space after `:`





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r113086978
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/launcher/YarnCommandBuilderUtils.scala
 ---
@@ -17,10 +17,11 @@
 
 package org.apache.spark.launcher
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable.ListBuffer
 import scala.util.Properties
 
+import org.apache.spark.SparkConf
+
--- End diff --

nit: one too many empty lines.





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r113086177
  
--- Diff: 
core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java ---
@@ -183,6 +183,28 @@ public void testChildProcLauncher() throws Exception {
 assertEquals(0, app.waitFor());
   }
 
+  @Test
+  public void testThreadLauncher() throws Exception {
+// This test is failed on Windows due to the failure of initiating 
executors
+// by the path length limitation. See SPARK-18718.
+assumeTrue(!Utils.isWindows());
+
+launcher
+  .setMaster("local")
+  .setAppResource(SparkLauncher.NO_RESOURCE)
+  .setConf(SparkLauncher.DRIVER_EXTRA_JAVA_OPTIONS,
+"-Dfoo=bar -Dtest.appender=childproc")
+  .setConf(SparkLauncher.DRIVER_EXTRA_CLASSPATH, 
System.getProperty("java.class.path"))
+  .setMainClass(SparkLauncherTestApp.class.getName())
+  .launchAsThread(true)
+  .addAppArgs("proc");
+final Process app = launcher.launch();
--- End diff --

What is this testing? `launch()` will always launch a child process. Which 
indicates two problems:

- this test is not testing anything that hasn't been tested before.
- `SparkLauncher` should probably be throwing an error if you use 
`.launchAsThread(true)` and then call `launch()`.
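
A tiny sketch of the kind of guard being suggested, written in Scala for consistency with this thread even though the real SparkLauncher is Java (all names hypothetical):

```scala
// Hypothetical builder demonstrating the fail-fast guard.
class LauncherSketch {
  private var asThread = false

  def launchAsThread(flag: Boolean): LauncherSketch = {
    asThread = flag
    this
  }

  def launch(): Unit = {
    // launch() always forks a child process, so it is incompatible with
    // launchAsThread(true); a thread-based path would go through startApplication().
    require(!asThread,
      "Cannot call launch() after launchAsThread(true); use startApplication() instead")
    println("forking child process ...")
  }
}
```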





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r113086519
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -488,11 +549,24 @@ public Process launch() throws IOException {
* In all cases, the logger name will start with 
"org.apache.spark.launcher.app", to fit more
* easily into the configuration of commonly-used logging systems.
*
+   * If the application is launched as a thread, the log redirection 
methods are not supported,
+   * and the parent process's output and log configuration will be used.
+   *
* @since 1.6.0
* @param listeners Listeners to add to the handle before the app is 
launched.
* @return A handle for the launched application.
*/
   public SparkAppHandle startApplication(SparkAppHandle.Listener... 
listeners) throws IOException {
+if (launchAsThread) {
+  checkArgument(builder.childEnv.isEmpty(),
+"Environment variables are not supported while launching as 
Thread");
--- End diff --

s/Environment variables/Custom environment variables
s/as Thread/in a thread.





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r113085844
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -725,9 +722,15 @@ object SparkSubmit extends CommandLineUtils {
   printWarning("Subclasses of scala.App may not work correctly. Use a 
main() method instead.")
 }
 
-val mainMethod = mainClass.getMethod("main", new 
Array[String](0).getClass)
-if (!Modifier.isStatic(mainMethod.getModifiers)) {
-  throw new IllegalStateException("The main method in the given main 
class must be static")
+val sparkAppMainMethod = mainClass.getMethods().find(_.getName == 
"sparkMain")
+val childSparkConf = sysProps.filter{ p => p._1.startsWith("spark.") 
}.toMap
--- End diff --

nit: again, please address the feedback that is given. I've lost count of 
how many times I've pointed out that there's a missing space between `filter` 
and `{`.





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r113086823
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/package-info.java ---
@@ -21,12 +21,14 @@
  * 
  * This library allows applications to launch Spark programmatically. 
There's only one entry
  * point to the library - the {@link 
org.apache.spark.launcher.SparkLauncher} class.
+ * Under YARN manager cluster mode, it supports launching in Application 
in thread or
--- End diff --

Delete this sentence. It's actually not correct on top of being a little 
confusing.

You can launch any application in a child thread. What YARN cluster mode 
currently gives you is that it's safe to launch multiple applications as child 
threads in the same process.





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r113087052
  
--- Diff: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
 ---
@@ -201,6 +192,71 @@ class YarnClusterSuite extends BaseYarnClusterSuite {
 finalState should be (SparkAppHandle.State.FAILED)
   }
 
+  test("monitor app running in thread using launcher library") {
+var handle : SparkAppHandle = null
+try {
+  handle = launchSparkAppWithConf(true, false, "cluster")
+  handle.stop()
+
+  eventually(timeout(30 seconds), interval(100 millis)) {
+handle.getState() should be (SparkAppHandle.State.KILLED)
+  }
+} finally {
+  handle.kill()
+}
+  }
+
+  test("monitor app using launcher library for proc with auto shutdown") {
+var handle : SparkAppHandle = null
+try {
+  handle = launchSparkAppWithConf(false, true, "cluster")
+  handle.disconnect()
+  val applicationId = ConverterUtils.toApplicationId(handle.getAppId)
+  val yarnClient: YarnClient = getYarnClient
+  eventually(timeout(30 seconds), interval(100 millis)) {
+handle.getState() should be (SparkAppHandle.State.LOST)
+var status = 
yarnClient.getApplicationReport(applicationId).getFinalApplicationStatus()
+status should be (FinalApplicationStatus.KILLED)
+  }
+} finally {
+  handle.kill()
+}
+  }
+
+  test("monitor app using launcher library for thread with auto shutdown") 
{
+var handle : SparkAppHandle = null
--- End diff --

`val handle = launchSparkAppWithConf(true, true, "cluster")`

Otherwise your `finally` block can throw an NPE. Also happens elsewhere.
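
The safe pattern looks like this (a sketch reusing the suite's launchSparkAppWithConf helper and ScalaTest's eventually, as in the diff above):

```scala
// Binding with `val` before the try block guarantees `handle` is non-null in
// the finally clause; `var handle: SparkAppHandle = null` does not.
val handle = launchSparkAppWithConf(true, true, "cluster")
try {
  handle.stop()
  eventually(timeout(30 seconds), interval(100 millis)) {
    handle.getState() should be (SparkAppHandle.State.KILLED)
  }
} finally {
  handle.kill()
}
```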





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r113086469
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -107,6 +119,34 @@ public static void setConfig(String name, String 
value) {
 launcherConfig.put(name, value);
   }
 
+
+
+  /**
+   * Specifies that Spark Application be stopped if current process goes 
away.
+   * It tries stop/kill Spark Application if launching process goes away.
+   *
+   * @since 2.2.0
+   * @param autoShutdown Flag for shutdown Spark Application if launcher 
process goes away.
+   * @return This launcher.
+   */
+  public SparkLauncher autoShutdown(boolean autoShutdown) {
+this.autoShutdown = autoShutdown;
+return this;
+  }
+
+  /**
+   * Specifies that Spark Submit be launched as a daemon thread. Please 
note
+   * this feature is currently supported only for YARN cluster deployment 
mode.
+   *
+   * @since 2.2.0
+   * @param launchAsThread Flag for launching app as a thread.
--- End diff --

"Whether to launch the Spark application in a new thread in the same 
process."





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r113086407
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -107,6 +119,34 @@ public static void setConfig(String name, String 
value) {
 launcherConfig.put(name, value);
   }
 
+
+
+  /**
+   * Specifies that Spark Application be stopped if current process goes 
away.
+   * It tries stop/kill Spark Application if launching process goes away.
+   *
+   * @since 2.2.0
+   * @param autoShutdown Flag for shutdown Spark Application if launcher 
process goes away.
--- End diff --

"Whether to shut down the Spark application if the launcher process goes 
away."





[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17754
  
**[Test build #76126 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76126/testReport)**
 for PR 17754 at commit 
[`771e490`](https://github.com/apache/spark/commit/771e490fd46c277479b4a06cfa6bb166d1f62856).





[GitHub] spark issue #17752: [SPARK-20452][SS][Kafka]Fix a potential ConcurrentModifi...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17752
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17752: [SPARK-20452][SS][Kafka]Fix a potential ConcurrentModifi...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17752
  
**[Test build #76124 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76124/testReport)**
 for PR 17752 at commit 
[`b59573f`](https://github.com/apache/spark/commit/b59573f5ae827e7cb14757297d6bf092bd7f21aa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17752: [SPARK-20452][SS][Kafka]Fix a potential ConcurrentModifi...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17752
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76124/
Test PASSed.





[GitHub] spark pull request #15009: [SPARK-17443][SPARK-11035] Stop Spark Application...

2017-04-24 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15009#discussion_r113085569
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkAppHandle.java ---
@@ -95,7 +95,8 @@ public boolean isFinal() {
   void kill();
 
   /**
-   * Disconnects the handle from the application, without stopping it. 
After this method is called,
+   * Disconnects the handle from the application. If using {@link 
SparkLauncher#autoShutdown()}
--- End diff --

That's because the method is `autoShutdown(boolean)` and not 
`autoShutdown()`.





[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-04-24 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17222#discussion_r113085517
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry: 
FunctionRegistry) extends
 case 21 => register(name, udf.asInstanceOf[UDF20[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
 case 22 => register(name, udf.asInstanceOf[UDF21[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
 case 23 => register(name, udf.asInstanceOf[UDF22[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
-case n => logError(s"UDF class with ${n} type arguments is not 
supported ")
+case n =>
+  throw new IOException(s"UDF class with ${n} type arguments 
is not supported.")
   }
 } catch {
   case e @ (_: InstantiationException | _: 
IllegalArgumentException) =>
-logError(s"Can not instantiate class ${className}, please make 
sure it has public non argument constructor")
+throw new IOException(s"Can not instantiate class 
${className}, please make sure it has public non argument constructor")
 }
   }
 } catch {
-  case e: ClassNotFoundException => logError(s"Can not load class 
${className}, please make sure it is on the classpath")
+  case e: ClassNotFoundException => throw new IOException(s"Can not 
load class ${className}, please make sure it is on the classpath")
 }
 
   }
 
   /**
+   * Register a Java UDAF class using reflection, for use from pyspark
+   *
+   * @param name UDAF name
+   * @param className fully qualified class name of UDAF
--- End diff --

Is @since needed for a private function?





[GitHub] spark issue #17582: [SPARK-20239][Core] Improve HistoryServer's ACL mechanis...

2017-04-24 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17582
  
OK, thanks @tgravescs .





[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17754
  
**[Test build #76125 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76125/testReport)**
 for PR 17754 at commit 
[`dbff961`](https://github.com/apache/spark/commit/dbff96111fd00c2127afe2a46515efc163aa36b8).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17754
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76125/
Test FAILed.





[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...

2017-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17754
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17754: [FollowUp][SPARK-18901][ML]: Require in LR LogisticAggre...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17754
  
**[Test build #76125 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76125/testReport)**
 for PR 17754 at commit 
[`dbff961`](https://github.com/apache/spark/commit/dbff96111fd00c2127afe2a46515efc163aa36b8).





[GitHub] spark pull request #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operat...

2017-04-24 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/17540#discussion_r113084795
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
 ---
@@ -161,50 +161,51 @@ object FileFormatWriter extends Logging {
   }
 }
 
-SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
-  // This call shouldn't be put into the `try` block below because it 
only initializes and
-  // prepares the job, any exception thrown from here shouldn't cause 
abortJob() to be called.
-  committer.setupJob(job)
-
-  try {
-val rdd = if (orderingMatched) {
-  queryExecution.toRdd
-} else {
-  SortExec(
-requiredOrdering.map(SortOrder(_, Ascending)),
-global = false,
-child = queryExecution.executedPlan).execute()
-}
-val ret = new Array[WriteTaskResult](rdd.partitions.length)
-sparkSession.sparkContext.runJob(
-  rdd,
-  (taskContext: TaskContext, iter: Iterator[InternalRow]) => {
-executeTask(
-  description = description,
-  sparkStageId = taskContext.stageId(),
-  sparkPartitionId = taskContext.partitionId(),
-  sparkAttemptNumber = taskContext.attemptNumber(),
-  committer,
-  iterator = iter)
-  },
-  0 until rdd.partitions.length,
-  (index, res: WriteTaskResult) => {
-committer.onTaskCommit(res.commitMsg)
-ret(index) = res
-  })
-
-val commitMsgs = ret.map(_.commitMsg)
-val updatedPartitions = ret.flatMap(_.updatedPartitions)
-  .distinct.map(PartitioningUtils.parsePathFragment)
-
-committer.commitJob(job, commitMsgs)
-logInfo(s"Job ${job.getJobID} committed.")
-refreshFunction(updatedPartitions)
-  } catch { case cause: Throwable =>
-logError(s"Aborting job ${job.getJobID}.", cause)
-committer.abortJob(job)
-throw new SparkException("Job aborted.", cause)
+// During tests, make sure there is an execution ID.
+SQLExecution.checkSQLExecutionId(sparkSession)
--- End diff --

The major issue is this change. For all queries using FileFormatWriter, we 
won't get any metrics because of 
https://github.com/apache/spark/blob/7536e2849df6d63587fbf16b4ecb5db06fed7125/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L139
 . It creates a new QueryExecution and we don't track it.





[GitHub] spark pull request #17754: [FollowUp][SPARK-18901][ML]: Require in LR Logist...

2017-04-24 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request:

https://github.com/apache/spark/pull/17754

[FollowUp][SPARK-18901][ML]: Require in LR LogisticAggregator is redundant

## What changes were proposed in this pull request?

This is a follow-up PR of #17478. 

## How was this patch tested?

Existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangmiao1981/spark followup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17754.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17754


commit dbff96111fd00c2127afe2a46515efc163aa36b8
Author: wangmiao1981 
Date:   2017-04-25T00:11:08Z

remove extra require check







[GitHub] spark issue #17625: [SPARK-9103][WIP] Add Memory Tracking UI and track Netty...

2017-04-24 Thread jsoltren
Github user jsoltren commented on the issue:

https://github.com/apache/spark/pull/17625
  
This PR was closed, so I'll create a new one focusing on just the back-end 
pieces. I'll create a fresh JIRA for more general memory-tracking improvements 
to the UI where we can hash out more of the details. The UI has changed quite a 
lot since the original PR!





[GitHub] spark issue #17752: [SPARK-20452][SS][Kafka]Fix a potential ConcurrentModifi...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17752
  
**[Test build #76124 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76124/testReport)**
 for PR 17752 at commit 
[`b59573f`](https://github.com/apache/spark/commit/b59573f5ae827e7cb14757297d6bf092bd7f21aa).





[GitHub] spark issue #17478: [SPARK-18901][ML]:Require in LR LogisticAggregator is re...

2017-04-24 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/17478
  
@yanboliang I will do it. Thanks!





[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...

2017-04-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17751
  
LGTM pending Jenkins 





[GitHub] spark issue #17666: [SPARK-20311][SQL] Support aliases for table value funct...

2017-04-24 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17666
  
@hvanhovell ping





[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-04-24 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17711
  
@hvanhovell ping





[GitHub] spark issue #17751: [SPARK-20451] Filter out nested mapType datatypes from s...

2017-04-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17751
  
**[Test build #76123 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76123/testReport)**
 for PR 17751 at commit 
[`b9dbb9c`](https://github.com/apache/spark/commit/b9dbb9c9515b2b53cc03e59935cca740a2a56f44).





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17222
  
Will review this PR more carefully in the next few days.





[GitHub] spark pull request #17751: [SPARK-20451] Filter out nested mapType datatypes...

2017-04-24 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/17751#discussion_r113082123
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1726,15 +1726,23 @@ class Dataset[T] private[sql](
 // It is possible that the underlying dataframe doesn't guarantee the 
ordering of rows in its
 // constituent partitions each time a split is materialized which 
could result in
 // overlapping splits. To prevent this, we explicitly sort each input 
partition to make the
-// ordering deterministic.
-// MapType cannot be sorted.
-val sorted = 
Sort(logicalPlan.output.filterNot(_.dataType.isInstanceOf[MapType])
-  .map(SortOrder(_, Ascending)), global = false, logicalPlan)
+// ordering deterministic. Note that MapTypes cannot be sorted and are 
explicitly pruned out
+// from the sort order.
+val sortOrder = logicalPlan.output
+  .filterNot(_.dataType.existsRecursively(dt => 
dt.isInstanceOf[MapType]))
+  .map(SortOrder(_, Ascending))
+val plan = if (sortOrder.nonEmpty) {
+  Sort(sortOrder, global = false, logicalPlan)
+} else {
+  // SPARK-12662: If sort order is empty, we materialize the dataset 
to guarantee determinism
--- End diff --

We actually discussed materialization in 
https://issues.apache.org/jira/browse/SPARK-12662 so that ticket should provide 
direct context.
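
To make the nested-map case concrete, here is a standalone analogue of the check (existsRecursively itself is package-private, so this sketch reimplements it):

```scala
import org.apache.spark.sql.types._

// Recursive check: does this type contain a MapType anywhere inside it?
def containsMap(dt: DataType): Boolean = dt match {
  case _: MapType    => true
  case s: StructType => s.fields.exists(f => containsMap(f.dataType))
  case a: ArrayType  => containsMap(a.elementType)
  case _             => false
}

val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("payload", StructType(Seq(
    StructField("tags", MapType(StringType, StringType)))))))

schema.fields.map(f => containsMap(f.dataType))
// Array(false, true): "payload" is pruned from the sort order even though it is
// a struct, because a map is nested inside it.
```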





[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-04-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17222#discussion_r113082057
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry: 
FunctionRegistry) extends
 case 21 => register(name, udf.asInstanceOf[UDF20[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
 case 22 => register(name, udf.asInstanceOf[UDF21[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
 case 23 => register(name, udf.asInstanceOf[UDF22[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
-case n => logError(s"UDF class with ${n} type arguments is not 
supported ")
+case n =>
+  throw new IOException(s"UDF class with ${n} type arguments 
is not supported.")
   }
 } catch {
   case e @ (_: InstantiationException | _: 
IllegalArgumentException) =>
-logError(s"Can not instantiate class ${className}, please make 
sure it has public non argument constructor")
+throw new IOException(s"Can not instantiate class 
${className}, please make sure it has public non argument constructor")
--- End diff --

Please throw an `AnalysisException`





[GitHub] spark pull request #17751: [SPARK-20451] Filter out nested mapType datatypes...

2017-04-24 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/17751#discussion_r113081974
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1726,15 +1726,23 @@ class Dataset[T] private[sql](
 // It is possible that the underlying dataframe doesn't guarantee the 
ordering of rows in its
 // constituent partitions each time a split is materialized which 
could result in
 // overlapping splits. To prevent this, we explicitly sort each input 
partition to make the
-// ordering deterministic.
-// MapType cannot be sorted.
-val sorted = 
Sort(logicalPlan.output.filterNot(_.dataType.isInstanceOf[MapType])
-  .map(SortOrder(_, Ascending)), global = false, logicalPlan)
+// ordering deterministic. Note that MapTypes cannot be sorted and are 
explicitly pruned out
+// from the sort order.
+val sortOrder = logicalPlan.output
+  .filterNot(_.dataType.existsRecursively(dt => 
dt.isInstanceOf[MapType]))
--- End diff --

nice, thanks!





[GitHub] spark pull request #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJav...

2017-04-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17222#discussion_r113082001
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -475,20 +475,42 @@ class UDFRegistration private[sql] (functionRegistry: 
FunctionRegistry) extends
 case 21 => register(name, udf.asInstanceOf[UDF20[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
 case 22 => register(name, udf.asInstanceOf[UDF21[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
 case 23 => register(name, udf.asInstanceOf[UDF22[_, _, _, _, 
_, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _]], returnType)
-case n => logError(s"UDF class with ${n} type arguments is not 
supported ")
+case n =>
+  throw new IOException(s"UDF class with ${n} type arguments 
is not supported.")
   }
 } catch {
   case e @ (_: InstantiationException | _: 
IllegalArgumentException) =>
-logError(s"Can not instantiate class ${className}, please make 
sure it has public non argument constructor")
+throw new IOException(s"Can not instantiate class 
${className}, please make sure it has public non argument constructor")
 }
   }
 } catch {
-  case e: ClassNotFoundException => logError(s"Can not load class 
${className}, please make sure it is on the classpath")
+  case e: ClassNotFoundException => throw new IOException(s"Can not 
load class ${className}, please make sure it is on the classpath")
 }
 
   }
 
   /**
+   * Register a Java UDAF class using reflection, for use from pyspark
+   *
+   * @param name UDAF name
+   * @param className fully qualified class name of UDAF
--- End diff --

Missing @since. 





[GitHub] spark pull request #17752: [SPARK-20452][SS][Kafka]Fix a potential Concurren...

2017-04-24 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/17752#discussion_r113081225
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala
 ---
@@ -125,16 +125,15 @@ private[kafka010] class KafkaSourceRDD(
   context: TaskContext): Iterator[ConsumerRecord[Array[Byte], 
Array[Byte]]] = {
 val sourcePartition = thePart.asInstanceOf[KafkaSourceRDDPartition]
 val topic = sourcePartition.offsetRange.topic
-if (!reuseKafkaConsumer) {
-  // if we can't reuse CachedKafkaConsumers, let's reset the groupId 
to something unique
-  // to each task (i.e., append the task's unique partition id), 
because we will have
-  // multiple tasks (e.g., in the case of union) reading from the same 
topic partitions
-  val old = 
executorKafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-  val id = TaskContext.getPartitionId()
-  executorKafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, old + "-" + 
id)
-}
 val kafkaPartition = sourcePartition.offsetRange.partition
-val consumer = CachedKafkaConsumer.getOrCreate(topic, kafkaPartition, 
executorKafkaParams)
+val consumer =
+  if (!reuseKafkaConsumer) {
+// If we can't reuse CachedKafkaConsumers, create a new CachedKafkaConsumer. Since
+// we use `assign` here, we don't need to worry about "group.id" conflicts.
+new CachedKafkaConsumer(new TopicPartition(topic, kafkaPartition), 
executorKafkaParams)
--- End diff --

Would be more consistent with `getOrCreate` if you just add `create` method 
to CachedKafkaConsumer
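
Something like the following on the companion object would keep the two call sites symmetric (a sketch; the constructor shape is taken from the diff above):

```scala
import java.{util => ju}
import org.apache.kafka.common.TopicPartition

// Hypothetical companion method mirroring getOrCreate, but always returning a
// fresh, uncached consumer for the non-reuse path.
def create(
    topic: String,
    partition: Int,
    kafkaParams: ju.Map[String, Object]): CachedKafkaConsumer = {
  new CachedKafkaConsumer(new TopicPartition(topic, partition), kafkaParams)
}
```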





[GitHub] spark pull request #17752: [SPARK-20452][SS][Kafka]Fix a potential Concurren...

2017-04-24 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/17752#discussion_r113080828
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
 ---
@@ -95,8 +95,10 @@ private[kafka010] class KafkaOffsetReader(
* Closes the connection to Kafka, and cleans up state.
*/
   def close(): Unit = {
-consumer.close()
-kafkaReaderThread.shutdownNow()
+runUninterruptibly {
--- End diff --

nvm. I understand that `runUninterruptibly` ensures that.
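
For readers following along, the runUninterruptibly pattern referenced here looks roughly like this (a sketch based on Spark's UninterruptibleThread):

```scala
import org.apache.spark.util.UninterruptibleThread

// Defer interrupts while `body` runs when on an UninterruptibleThread, so a task
// cancellation cannot land mid-close and leave the consumer in a bad state.
def runUninterruptibly[T](body: => T): T = Thread.currentThread match {
  case ut: UninterruptibleThread => ut.runUninterruptibly(body)
  case _                         => body
}
```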




