date:20160609

[GitHub] spark issue #13483: [SPARK-15688][SQL] RelationalGroupedDataset.toDF should ...

2016-06-09 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13483
  
Thank you, @marmbrus, @dilipbiswal, and @viirya!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13558
  
**[Test build #60226 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60226/consoleFull)**
 for PR 13558 at commit 
[`8d87c0f`](https://github.com/apache/spark/commit/8d87c0f2bd9140928915f835fd7d21b178422c69).



[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12836
  
Merged build finished. Test FAILed.



[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12836
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60223/
Test FAILed.



[GitHub] spark pull request #13552: [SPARK-15813] Use past tense for the cancel conta...

2016-06-09 Thread peterableda

Github user peterableda commented on a diff in the pull request:

https://github.com/apache/spark/pull/13552#discussion_r66394240
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -353,7 +353,7 @@ private[yarn] class YarnAllocator(
 
 } else if (missing < 0) {
   val numToCancel = math.min(numPendingAllocate, -missing)
-  logInfo(s"Canceling requests for $numToCancel executor containers")
+  logInfo(s"Canceled requests for $numToCancel executor container(s)")
--- End diff --

Thanks for the explanation; I see what's happening there now. It would be 
clearer if we added the target number to the cancel message, like this:
`Canceling requests for $numToCancel executor container(s) to have a new desired total $targetNumExecutors executor(s).`



[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13558
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60226/
Test FAILed.



[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13558
  
**[Test build #60226 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60226/consoleFull)**
 for PR 13558 at commit 
[`8d87c0f`](https://github.com/apache/spark/commit/8d87c0f2bd9140928915f835fd7d21b178422c69).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13558
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #13573: [SPARK-15839] Fix Maven doc-jar generation when J...

2016-06-09 Thread JoshRosen

GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/13573

[SPARK-15839] Fix Maven doc-jar generation when JAVA_7_HOME is set

## What changes were proposed in this pull request?

It looks like the nightly Maven snapshots broke after we set `JAVA_7_HOME` 
in the build: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/1573/.
 It seems that passing `-javabootclasspath` to scalac using scala-maven-plugin 
ends up preventing the Scala library classes from being added to scalac's 
internal class path, causing compilation errors while building doc-jars.

There might be a principled fix to this inside of the scala-maven-plugin 
itself, but for now this patch configures the build to omit the 
`-javabootclasspath` option during Maven doc-jar generation.

## How was this patch tested?

Tested manually with `build/mvn clean install -DskipTests=true` when 
`JAVA_7_HOME` was set. Also manually inspected the effective POM diff to verify 
that the final POM changes were scoped correctly: 
https://gist.github.com/JoshRosen/f889d1c236fad14fa25ac4be01654653

/cc @vanzin and @yhuai for review.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark SPARK-15839

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13573.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13573


commit 523487bb0be24c698ee66d8c09430d1909eff81c
Author: Josh Rosen 
Date:   2016-06-09T07:45:50Z

Reduce scope of -javabootclasspath in scala-maven-plugin.





[GitHub] spark issue #13568: [SPARK-12712] Fix failure in ./dev/test-dependencies whe...

2016-06-09 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/13568
  
Merging to master, 2.0, and branch-1.6.



[GitHub] spark issue #13573: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_H...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13573
  
**[Test build #60227 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60227/consoleFull)**
 for PR 13573 at commit 
[`523487b`](https://github.com/apache/spark/commit/523487bb0be24c698ee66d8c09430d1909eff81c).



[GitHub] spark pull request #13568: [SPARK-12712] Fix failure in ./dev/test-dependenc...

2016-06-09 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13568



[GitHub] spark issue #13403: [SPARK-15660][CORE] RDD and Dataset should show the cons...

2016-06-09 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13403
  
What about just adding an explicit note on old `StatCounter.stdev`?


http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter

MLlib's `stat.Statistics` is also consistent with `Dataset`.
```
scala> import org.apache.spark.mllib.linalg.Vectors
scala> import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
scala> Statistics.colStats(sc.parallelize(Seq(Vectors.dense(1.0), Vectors.dense(2.0), Vectors.dense(3.0)))).variance
res10: org.apache.spark.mllib.linalg.Vector = [1.0]
```
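
For reference, the inconsistency discussed here comes down to population variance (divide by n) versus sample variance (divide by n - 1). The following is a minimal sketch in plain Python using only the standard library, not Spark code; it assumes, as the thread implies, that the older `StatCounter.variance` reports the population form while `Dataset` and `colStats` report the sample form. The variable names are illustrative.

```python
import statistics

# Same three values as in the colStats example above.
values = [1.0, 2.0, 3.0]

# Population variance: sum of squared deviations divided by n.
pop_var = statistics.pvariance(values)

# Sample variance: sum of squared deviations divided by n - 1.
sample_var = statistics.variance(values)

print("population variance:", pop_var)
print("sample variance:", sample_var)  # 1.0, matching the colStats result above
```

For these values the two definitions differ (2/3 versus 1.0), which is why an explicit note or separately named methods would remove the ambiguity.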



[GitHub] spark issue #13403: [SPARK-15660][CORE] RDD and Dataset should show the cons...

2016-06-09 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13403
  
Although we cannot change the old API, I think it's a good idea to add 
`popVariance` and `popStdev` explicitly.

If everything in this PR is not allowed, what about just adding an explicit 
note on the old `StatCounter.variance` and `StatCounter.stdev`?


http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter





[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-09 Thread NarineK

Github user NarineK commented on the issue:

https://github.com/apache/spark/pull/12836
  
Hi @sun-rui, hi @shivaram,

I've overridden `stringArgs` and pushed my changes in the following 
branch. I haven't created a JIRA yet.

https://github.com/apache/spark/commit/939dbd5a17e63171ef2c18d5b23874daa75dbfcc


This is what the output looks like after my modification. Do you think this 
is good enough?

== Parsed Logical Plan ==
'SerializeFromObject [if (assertnotnull(input[0, org.apache.spark.sql.Row, 
true], top level row object).isNullAt) null else 
validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true], top level row object), 0, a), IntegerType) AS 
a#13504, if (assertnotnull(input[0, org.apache.spark.sql.Row, true], top level 
row object).isNullAt) null else 
validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true], top level row object), 1, b), DoubleType) AS 
b#13505, if (assertnotnull(input[0, org.apache.spark.sql.Row, true], top level 
row object).isNullAt) null else staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, 
validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true], top level row object), 2, c), StringType), 
true) AS c#13506]
+- 'MapPartitionsInR [StructField(a,IntegerType,true), 
StructField(b,DoubleType,true), StructField(c,StringType,true)], 
[StructField(a,IntegerType,true), StructField(b,DoubleType,true), 
StructField(c,StringType,true)], obj#13500: org.apache.spark.sql.Row
   +- 'DeserializeToObject 
unresolveddeserializer(createexternalrow(getcolumnbyordinal(0, IntegerType), 
getcolumnbyordinal(1, DoubleType), getcolumnbyordinal(2, StringType).toString, 
StructField(a,IntegerType,true), StructField(b,DoubleType,true), 
StructField(c,StringType,true))), obj#13496: org.apache.spark.sql.Row
  +- LogicalRDD [a#13485, b#13486, c#13487]
...



[GitHub] spark issue #13572: [SPARK-15838] [SQL] CACHE TABLE AS SELECT should not rep...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13572
  
**[Test build #60224 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60224/consoleFull)**
 for PR 13572 at commit 
[`082c8f2`](https://github.com/apache/spark/commit/082c8f27adc49ba8cbb9b32a2fac74a16bc999f2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13572: [SPARK-15838] [SQL] CACHE TABLE AS SELECT should not rep...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13572
  
**[Test build #60225 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60225/consoleFull)**
 for PR 13572 at commit 
[`bd66c7e`](https://github.com/apache/spark/commit/bd66c7eb2425431f2201dfba30e8e931ffe25f63).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #13552: [SPARK-15813] Use past tense for the cancel conta...

2016-06-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13552#discussion_r66404015
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -353,7 +353,7 @@ private[yarn] class YarnAllocator(
 
 } else if (missing < 0) {
   val numToCancel = math.min(numPendingAllocate, -missing)
-  logInfo(s"Canceling requests for $numToCancel executor containers")
+  logInfo(s"Canceled requests for $numToCancel executor container(s)")
--- End diff --

That seems fine to me.



[GitHub] spark issue #13572: [SPARK-15838] [SQL] CACHE TABLE AS SELECT should not rep...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13572
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13572: [SPARK-15838] [SQL] CACHE TABLE AS SELECT should not rep...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13572
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13572: [SPARK-15838] [SQL] CACHE TABLE AS SELECT should not rep...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60224/
Test PASSed.



[GitHub] spark issue #13572: [SPARK-15838] [SQL] CACHE TABLE AS SELECT should not rep...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60225/
Test PASSed.



[GitHub] spark pull request #13574: [SPARK-15841] REPLSuite has incorrect env set for...

2016-06-09 Thread ScrapCodes

GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/13574

[SPARK-15841]  REPLSuite has incorrect env set for a couple of tests.

## What changes were proposed in this pull request?

Description from JIRA.
In ReplSuite, a test that can be exercised well in local mode should not 
have to start a local-cluster. Similarly, a test is insufficiently exercised 
if it actually covers a problem related to a distributed run but only runs locally.


## How was this patch tested?
Existing tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark SPARK-15841/repl-suite-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13574.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13574


commit 72b2ff61eed25c94ba6f78cd1d9906328778d0e2
Author: Prashant Sharma 
Date:   2016-06-09T09:03:49Z

[SPARK-15841]  REPLSuite has incorrect env set for a couple of tests.





[GitHub] spark issue #13574: [SPARK-15841] REPLSuite has incorrect env set for a coup...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13574
  
**[Test build #60228 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60228/consoleFull)**
 for PR 13574 at commit 
[`72b2ff6`](https://github.com/apache/spark/commit/72b2ff61eed25c94ba6f78cd1d9906328778d0e2).



[GitHub] spark issue #13437: [SPARK-15697] [REPL] Unblock some of the useful repl com...

2016-06-09 Thread ScrapCodes

Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/13437
  
I have decided to leave reset blocked, since we can discuss the 
semantics of supporting it in another issue.

@zsxwing Please take a look!



[GitHub] spark issue #13437: [SPARK-15697] [REPL] Unblock some of the useful repl com...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13437
  
**[Test build #60229 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60229/consoleFull)**
 for PR 13437 at commit 
[`13e1e0c`](https://github.com/apache/spark/commit/13e1e0cec39056aa91c9eec600f8df2594f12c1f).



[GitHub] spark issue #13574: [SPARK-15841] REPLSuite has incorrect env set for a coup...

2016-06-09 Thread ScrapCodes

Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/13574
  
@zsxwing Please take a look!



[GitHub] spark issue #13562: [SPARK-15821] [DOCS] Include parallel build info

2016-06-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/13562
  
OK, fair enough. It's worth keeping in mind, as I know every little bit helps 
for build times. For machines that aren't loaded, any parallelism could help, 
even `-T 2`. For machines that are, in theory things don't get any slower since 
the machine is 100% busy, though in practice we may find some grinding.



[GitHub] spark issue #13556: [SPARK-15818] [BUILD] Upgrade to Hadoop 2.7.2

2016-06-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/13556
  
Merged to master/2.0



[GitHub] spark pull request #13556: [SPARK-15818] [BUILD] Upgrade to Hadoop 2.7.2

2016-06-09 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13556



[GitHub] spark issue #13573: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_H...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13573
  
**[Test build #60227 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60227/consoleFull)**
 for PR 13573 at commit 
[`523487b`](https://github.com/apache/spark/commit/523487bb0be24c698ee66d8c09430d1909eff81c).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13573: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_H...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13573
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13573: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_H...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13573
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60227/
Test FAILed.



[GitHub] spark pull request #13560: [SPARK-15823][PySpark][ML] Add @property for 'acc...

2016-06-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13560#discussion_r66415310
  
--- Diff: python/pyspark/mllib/evaluation.py ---
@@ -183,7 +183,7 @@ class MulticlassMetrics(JavaModelWrapper):
 0.66...
 >>> metrics.recall()
 0.66...
->>> metrics.accuracy()
+>>> metrics.accuracy
--- End diff --

While we're here, maybe we should remove the deprecated calls to recall() 
and precision() above.



[GitHub] spark pull request #13518: [WIP][SPARK-15472][SQL] Add support for writing i...

2016-06-09 Thread lw-lin

Github user lw-lin closed the pull request at:

https://github.com/apache/spark/pull/13518



[GitHub] spark pull request #13575: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-09 Thread lw-lin

GitHub user lw-lin opened a pull request:

https://github.com/apache/spark/pull/13575

[SPARK-15472][SQL] Add support for writing in `csv`, `json`, `text` formats 
in Structured Streaming

## What changes were proposed in this pull request?

This patch adds support for writing in `csv`, `json`, `text` formats in 
Structured Streaming:

**1. At a high level, this patch forms the following hierarchy** (`text` as 
an example):
```
                OutputWriter
                     ↑
            TextOutputWriterBase
               ↗          ↖
BatchTextOutputWriter   StreamingTextOutputWriter
```
```
                 OutputWriterFactory
                  ↗              ↖
BatchTextOutputWriterFactory   StreamingOutputWriterFactory
                                            ↑
                               StreamingTextOutputWriterFactory
```
The `StreamingTextOutputWriter` and other 'streaming' output writers would 
write data **without** using an `OutputCommitter`. This was the same approach 
taken by [SPARK-14716](https://github.com/apache/spark/pull/12409).

**2. to support compression, this patch attaches an extension to the path 
assigned by `FileStreamSink`**, which is slightly different from 
[SPARK-14716](https://github.com/apache/spark/pull/12409). For example, if we 
are writing out using the `gzip` compression and `FileStreamSink` assigns path 
`${uuid}` to a text writer, then in the end the file written out will be 
`${uuid}.txt.gz` -- so that when we read the file back, we'll correctly 
interpret it as `gzip` compressed.
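
The extension scheme described in point 2 can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the object `OutputPathSketch`, the function `withExtensions`, and the two lookup tables are hypothetical names, and only the `text` plus `gzip` case (`.txt.gz`) is taken from the description above.

```scala
object OutputPathSketch {
  // Hypothetical lookup tables; only "text" -> ".txt" and "gzip" -> ".gz"
  // are confirmed by the PR description.
  private val formatExt = Map("csv" -> ".csv", "json" -> ".json", "text" -> ".txt")
  private val codecExt = Map("gzip" -> ".gz")

  /** Appends the format extension and, when a known codec is set, the codec extension. */
  def withExtensions(basePath: String, format: String, codec: Option[String]): String = {
    val fmt = formatExt.getOrElse(format, "")
    val cmp = codec.flatMap(codecExt.get).getOrElse("")
    basePath + fmt + cmp
  }
}
```

Under these assumptions, a path assigned by the sink plus `text` and `gzip` ends in `.txt.gz`, so a reader that picks the codec from the file name would treat the file as gzip-compressed.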

## How was this patch tested?

`FileStreamSinkSuite` is expanded substantially to cover the added `csv`, 
`json`, `text` formats:

```scala
test(" csv - unpartitioned data - codecs: none/gzip")
test("json - unpartitioned data - codecs: none/gzip")
test("text - unpartitioned data - codecs: none/gzip")

test(" csv - partitioned data - codecs: none/gzip")
test("json - partitioned data - codecs: none/gzip")
test("text - partitioned data - codecs: none/gzip")

test(" csv - unpartitioned writing and batch reading - codecs: none/gzip")
test("json - unpartitioned writing and batch reading - codecs: none/gzip")
test("text - unpartitioned writing and batch reading - codecs: none/gzip")

test(" csv - partitioned writing and batch reading - codecs: none/gzip")
test("json - partitioned writing and batch reading - codecs: none/gzip")
test("text - partitioned writing and batch reading - codecs: none/gzip")
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lw-lin/spark add-csv-json-text-in-ss

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13575.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13575


commit c70083e9f76c20f6bf48e7ec821452f9bf63783a
Author: Liwei Lin 
Date:   2016-06-05T09:03:04Z

Add csv, json, text

commit bc28f4112ca9eca6a9f1602a891dd0388fa3185c
Author: Liwei Lin 
Date:   2016-06-09T03:31:59Z

Fix parquet extension





[GitHub] spark pull request #11887: [SPARK-13041][Mesos]add driver sandbox uri to the...

2016-06-09 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/11887#discussion_r66418116
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala ---
@@ -68,6 +75,25 @@ private[mesos] class MesosClusterPage(parent: 
MesosClusterUI) extends WebUIPage(
 
   private def driverRow(state: MesosClusterSubmissionState): Seq[Node] = {
 val id = state.driverDescription.submissionId
+val masterInfo = parent.scheduler.getSchedulerMasterInfo()
+val schedulerFwId = parent.scheduler.getSchedulerFrameworkId()
+val sandboxCol = if (masterInfo.isDefined && schedulerFwId.isDefined) {
+
--- End diff --

sure



[GitHub] spark pull request #11887: [SPARK-13041][Mesos]add driver sandbox uri to the...

2016-06-09 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/11887#discussion_r66418176
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala ---
@@ -115,4 +142,58 @@ private[mesos] class MesosClusterPage(parent: 
MesosClusterUI) extends WebUIPage(
 }
 sb.toString()
   }
+
+  private def getIp4(ip: Int): String = {
+val buffer = ByteBuffer.allocate(4)
+buffer.putInt(ip)
+// we need to check about that because protocolbuf changes the order
+// which by mesos api is considered to be network order (big endian).
+val result = if (ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN) {
+  buffer.array.toList.reverse
+} else {
+  buffer.array.toList
+}
+result.map{byte => byte & 0xFF}.mkString(".")
+  }
+
+  private def getListFromJson(value: JValue): List[Map[String, Any]] = {
+value.values.asInstanceOf[List[Map[String, Any]]]
+  }
+
+  private def getTaskDirectory(masterUri: String, driverFwId: String, 
slaveId: String):
+  Option[String] = {
+
--- End diff --

ok
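
For reference, the byte-order handling in `getIp4` above can be sketched in isolation. This is a minimal sketch, not the code from the PR: the object `Ip4Sketch`, the function `ipv4FromInt`, and the explicit `order` parameter are assumptions made for illustration. It writes the integer into a 4-byte buffer using the given byte order and renders each byte as an unsigned octet.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object Ip4Sketch {
  /** Renders a 32-bit value as a dotted IPv4 string, laying out bytes in `order`. */
  def ipv4FromInt(ip: Int, order: ByteOrder): String = {
    // The buffer's byte order decides which byte of the Int comes first.
    val buffer = ByteBuffer.allocate(4).order(order)
    buffer.putInt(ip)
    // Mask with 0xFF so that bytes above 127 are shown as unsigned octets.
    buffer.array.map(b => b & 0xFF).mkString(".")
  }
}
```

With `0x7F000001`, big-endian order yields `127.0.0.1`, while little-endian order yields the same octets reversed.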



[GitHub] spark issue #13575: [SPARK-15472][SQL] Add support for writing in `csv`, `js...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13575
  
**[Test build #60230 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60230/consoleFull)**
 for PR 13575 at commit 
[`bc28f41`](https://github.com/apache/spark/commit/bc28f4112ca9eca6a9f1602a891dd0388fa3185c).



[GitHub] spark pull request #11887: [SPARK-13041][Mesos]add driver sandbox uri to the...

2016-06-09 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/11887#discussion_r66418140
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala ---
@@ -68,6 +75,25 @@ private[mesos] class MesosClusterPage(parent: 
MesosClusterUI) extends WebUIPage(
 
   private def driverRow(state: MesosClusterSubmissionState): Seq[Node] = {
 val id = state.driverDescription.submissionId
+val masterInfo = parent.scheduler.getSchedulerMasterInfo()
+val schedulerFwId = parent.scheduler.getSchedulerFrameworkId()
+val sandboxCol = if (masterInfo.isDefined && schedulerFwId.isDefined) {
+
+  val masterUri = masterInfo.map{info => 
s"http://${getIp4(info.getIp)}:${info.getPort}"}.get
+  val directory = getTaskDirectory(masterUri, id, 
state.slaveId.getValue)
+
+  if(directory.isDefined) {
--- End diff --

yes



[GitHub] spark pull request #11887: [SPARK-13041][Mesos]add driver sandbox uri to the...

2016-06-09 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/11887#discussion_r66418166
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala ---
@@ -68,6 +75,25 @@ private[mesos] class MesosClusterPage(parent: 
MesosClusterUI) extends WebUIPage(
 
   private def driverRow(state: MesosClusterSubmissionState): Seq[Node] = {
 val id = state.driverDescription.submissionId
+val masterInfo = parent.scheduler.getSchedulerMasterInfo()
+val schedulerFwId = parent.scheduler.getSchedulerFrameworkId()
+val sandboxCol = if (masterInfo.isDefined && schedulerFwId.isDefined) {
+
+  val masterUri = masterInfo.map{info => 
s"http://${getIp4(info.getIp)}:${info.getPort}"}.get
+  val directory = getTaskDirectory(masterUri, id, 
state.slaveId.getValue)
+
--- End diff --

ok



[GitHub] spark issue #11887: [SPARK-13041][Mesos]add driver sandbox uri to the dispat...

2016-06-09 Thread skonto

Github user skonto commented on the issue:

https://github.com/apache/spark/pull/11887
  
@tnachen I will fix those. I am also waiting for the fix.



[GitHub] spark pull request #11887: [SPARK-13041][Mesos]add driver sandbox uri to the...

2016-06-09 Thread skonto

Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/11887#discussion_r66418456
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala ---
@@ -68,6 +75,25 @@ private[mesos] class MesosClusterPage(parent: 
MesosClusterUI) extends WebUIPage(
 
   private def driverRow(state: MesosClusterSubmissionState): Seq[Node] = {
 val id = state.driverDescription.submissionId
+val masterInfo = parent.scheduler.getSchedulerMasterInfo()
+val schedulerFwId = parent.scheduler.getSchedulerFrameworkId()
+val sandboxCol = if (masterInfo.isDefined && schedulerFwId.isDefined) {
+
+  val masterUri = masterInfo.map{info => 
s"http://${getIp4(info.getIp)}:${info.getPort}"}.get
+  val directory = getTaskDirectory(masterUri, id, 
state.slaveId.getValue)
+
+  if(directory.isDefined) {
+val sandBoxUri = s"$masterUri" +
+  s"/#/slaves/${state.slaveId.getValue}" +
+  s"/browse?path=${directory.get}"
+  Sandbox
--- End diff --

OK, I agree; I will look into this one.
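
The URI assembly in the diff above can be sketched as a standalone helper. This is a minimal sketch under stated assumptions: `SandboxUriSketch` and `sandboxUri` are hypothetical names, and URL-encoding the directory is an addition that does not appear in the diff; whether the Mesos UI expects an encoded path is not confirmed here.

```scala
import java.net.URLEncoder

object SandboxUriSketch {
  /** Builds the sandbox browse link from the master URI, slave id, and task directory. */
  def sandboxUri(masterUri: String, slaveId: String, directory: String): String = {
    // Assumption: encode the path so that characters such as '/' or spaces
    // cannot break the query string.
    val encodedPath = URLEncoder.encode(directory, "UTF-8")
    s"$masterUri/#/slaves/$slaveId/browse?path=$encodedPath"
  }
}
```

Keeping the assembly in one function would also make the link format testable without rendering the page.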



[GitHub] spark issue #13574: [SPARK-15841] REPLSuite has incorrect env set for a coup...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13574
  
**[Test build #60228 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60228/consoleFull)**
 for PR 13574 at commit 
[`72b2ff6`](https://github.com/apache/spark/commit/72b2ff61eed25c94ba6f78cd1d9906328778d0e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13574: [SPARK-15841] REPLSuite has incorrect env set for a coup...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13574
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13574: [SPARK-15841] REPLSuite has incorrect env set for a coup...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13574
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60228/
Test PASSed.



[GitHub] spark issue #13323: [SPARK-15555] [Mesos] Driver with --supervise option can...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13323
  
**[Test build #60231 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60231/consoleFull)**
 for PR 13323 at commit 
[`a6473e3`](https://github.com/apache/spark/commit/a6473e39080ae649d1ec01d4f3bef5785c113e15).



[GitHub] spark issue #13437: [SPARK-15697] [REPL] Unblock some of the useful repl com...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13437
  
**[Test build #60229 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60229/consoleFull)**
 for PR 13437 at commit 
[`13e1e0c`](https://github.com/apache/spark/commit/13e1e0cec39056aa91c9eec600f8df2594f12c1f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13437: [SPARK-15697] [REPL] Unblock some of the useful repl com...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13437
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13437: [SPARK-15697] [REPL] Unblock some of the useful repl com...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13437
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60229/
Test PASSed.



[GitHub] spark issue #13575: [SPARK-15472][SQL] Add support for writing in `csv`, `js...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13575
  
**[Test build #60230 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60230/consoleFull)**
 for PR 13575 at commit 
[`bc28f41`](https://github.com/apache/spark/commit/bc28f4112ca9eca6a9f1602a891dd0388fa3185c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13575: [SPARK-15472][SQL] Add support for writing in `csv`, `js...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13575
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60230/
Test PASSed.



[GitHub] spark issue #13575: [SPARK-15472][SQL] Add support for writing in `csv`, `js...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13575
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13575: [SPARK-15472][SQL] Add support for writing in `csv`, `js...

2016-06-09 Thread lw-lin

Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/13575
  
@marmbrus @tdas @zsxwing , would you mind taking a look? Thanks!



[GitHub] spark pull request #13575: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-09 Thread ScrapCodes

Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/13575#discussion_r66433407
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -246,7 +247,12 @@ case class DataSource(
   case s: StreamSinkProvider =>
 s.createSink(sparkSession.sqlContext, options, partitionColumns, 
outputMode)
 
-  case parquet: parquet.ParquetFileFormat =>
+  // TODO: Remove the `isInstanceOf` check when other formats have 
been ported
+  case fileFormat: FileFormat
+if (fileFormat.isInstanceOf[CSVFileFormat]
+  || fileFormat.isInstanceOf[JsonFileFormat]
--- End diff --

I think there is a better syntax to achieve this.
```scala 
case fileFormat: CSVFileFormat | JsonFileFormat | .. =>
``` 
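
Spelled out in compilable form, the suggestion above might look like the sketch below. Scala's pattern alternatives need a typed pattern for each alternative, with `@` binding the matched value. The classes here are hypothetical stand-ins, not Spark's real file-format classes, and `portedFormat` is an illustrative name.

```scala
object FormatMatchSketch {
  // Hypothetical stand-ins for illustration only.
  trait FileFormat
  class CSVFileFormat extends FileFormat
  class JsonFileFormat extends FileFormat
  class OtherFileFormat extends FileFormat

  /** Returns the format when it is one of the listed alternatives, else None. */
  def portedFormat(format: FileFormat): Option[FileFormat] = format match {
    // One typed pattern per alternative; `ported` is bound to the matched value.
    case ported @ (_: CSVFileFormat | _: JsonFileFormat) => Some(ported)
    case _ => None
  }
}
```

The bound name has the static type of the scrutinee (`FileFormat` here), so members specific to one alternative would still need a further match or cast.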



[GitHub] spark pull request #13576: [SPARK-15840][SQL] Add missing options in documen...

2016-06-09 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/13576

[SPARK-15840][SQL] Add missing options in documentation, inferSchema for 
CSV and mergeSchema for Parquet

## What changes were proposed in this pull request?

This PR

1. Adds documentation for the missing options `inferSchema` and 
`mergeSchema`, for Python and Scala.


2. Fixes `[[DataFrame]]` to ```:class:`DataFrame` ``` so that this can be 
shown 

  - from
![2016-06-09 9 31 
16](https://cloud.githubusercontent.com/assets/6477701/15929721/8b864734-2e89-11e6-83f6-207527de4ac9.png)

  - to (with class link)
![2016-06-09 9 31 
00](https://cloud.githubusercontent.com/assets/6477701/15929717/8a03d728-2e89-11e6-8a3f-08294964db22.png)

  (Please refer to the documentation, 
https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/api/python/pyspark.sql.html)

3. Moves the `mergeSchema` option to `ParquetOptions` and removes the unused 
options `metastoreSchema` and `metastoreTableName`.

  They are not used anymore. They were removed in 
https://github.com/apache/spark/commit/e720dda42e806229ccfd970055c7b8a93eb447bf 
and no uses remain, as the search below shows:

  ```bash
  grep -r -e METASTORE_SCHEMA -e \"metastoreSchema\" -e 
\"metastoreTableName\" -e METASTORE_TABLE_NAME .
  ```

  ```
  
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:
  private[sql] val METASTORE_SCHEMA = "metastoreSchema"
  
./sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:
  private[sql] val METASTORE_TABLE_NAME = "metastoreTableName"
  
./sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala: 
   ParquetFileFormat.METASTORE_TABLE_NAME -> TableIdentifier(
```

  Only the last match sets `metastoreTableName`, and the table name itself is 
never used.

## How was this patch tested?

Existing tests should cover this.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-15840

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13576.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13576


commit bc42993ef6734b168782d656c853979b46e453fe
Author: hyukjinkwon 
Date:   2016-06-09T12:18:41Z

Missing options

commit c764a25330a8e6231007eb5328bef79f9063e380
Author: hyukjinkwon 
Date:   2016-06-09T12:21:29Z

Fix typoes

commit bce6c040df1c2f2d5eee18112cb0fddf8826df13
Author: hyukjinkwon 
Date:   2016-06-09T12:26:46Z

More detailed explanation





[GitHub] spark issue #13576: [SPARK-15840][SQL] Add missing options in documentation,...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13576
  
**[Test build #60232 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60232/consoleFull)**
 for PR 13576 at commit 
[`bce6c04`](https://github.com/apache/spark/commit/bce6c040df1c2f2d5eee18112cb0fddf8826df13).



[GitHub] spark issue #7786: [SPARK-9468][Yarn][Core] Avoid scheduling tasks on preemp...

2016-06-09 Thread steveloughran

Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/7786
  
@vanzin I suspect that if you get told you are being pre-empted, you aren't 
likely to get containers elsewhere: pre-emption is a sign that demand is too 
high and your queue has lower priority. But pre-requesting a new container while 
continuing the current work might be a nice trick, keeping the live executors 
busy while queueing up early requests for the replacements.



[GitHub] spark pull request #13577: [Minor][Doc] Improve SQLContext Documentation and...

2016-06-09 Thread techaddict

GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/13577

[Minor][Doc] Improve SQLContext Documentation and Fix SparkSession and 
sql.functions Documentation

## What changes were proposed in this pull request?
1. In SparkSession, add emptyDataset to dataset group and fix groupname 
mapping
2. Add documentation for createDataset for SQLContext
3. Fix the documentation of `months_between` in functions

## How was this patch tested?
Verified manually by generating api docs using `build/sbt 
spark/scalaunidoc:doc`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark minor-5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13577.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13577


commit 9844808f70cc926c86b29df141e2660086571b7d
Author: Sandeep Singh 
Date:   2016-06-09T12:22:49Z

[Minor][Doc] In SparkSession, add emptyDataset to dataset group

commit dbccfbf6492388b9f26cc4ec8365f1a3a6e111b3
Author: Sandeep Singh 
Date:   2016-06-09T12:31:44Z

Fix months_between's documentation in sql functions

commit 0f22de020f3b6935810d935481fcbe0dfc420ff5
Author: Sandeep Singh 
Date:   2016-06-09T12:41:45Z

Add Documentation to SqlContext's createDataset Methods

commit 41f696d87203098e4de34d88625c846024674a02
Author: Sandeep Singh 
Date:   2016-06-09T12:46:13Z

Add groupNames to SparkSession

commit 1e05d6d470ce35b6f38c1a5df2b3c6315439169f
Author: Sandeep Singh 
Date:   2016-06-09T12:48:34Z

nit





[GitHub] spark issue #13577: [Minor][Doc] Improve SQLContext Documentation and Fix Sp...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13577
  
**[Test build #60233 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60233/consoleFull)**
 for PR 13577 at commit 
[`1e05d6d`](https://github.com/apache/spark/commit/1e05d6d470ce35b6f38c1a5df2b3c6315439169f).



[GitHub] spark issue #13576: [SPARK-15840][SQL] Add missing options in documentation,...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13576
  
Merged build finished. Test FAILed.



[GitHub] spark issue #13576: [SPARK-15840][SQL] Add missing options in documentation,...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13576
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60232/
Test FAILed.



[GitHub] spark issue #13576: [SPARK-15840][SQL] Add missing options in documentation,...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13576
  
**[Test build #60232 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60232/consoleFull)**
 for PR 13576 at commit 
[`bce6c04`](https://github.com/apache/spark/commit/bce6c040df1c2f2d5eee18112cb0fddf8826df13).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13577: [Minor][Doc] Improve SQLContext Documentation and Fix Sp...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13577
  
**[Test build #60234 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60234/consoleFull)**
 for PR 13577 at commit 
[`f0459bc`](https://github.com/apache/spark/commit/f0459bce2b10086cd4f418b98ae4bdc5435eeeba).



[GitHub] spark pull request #13576: [SPARK-15840][SQL] Add missing options in documen...

2016-06-09 Thread ernstp

Github user ernstp commented on a diff in the pull request:

https://github.com/apache/spark/pull/13576#discussion_r66436751
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -304,13 +308,13 @@ def text(self, paths):
 
 @since(2.0)
 def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=None,
-comment=None, header=None, ignoreLeadingWhiteSpace=None, ignoreTrailingWhiteSpace=None,
-nullValue=None, nanValue=None, positiveInf=None, negativeInf=None, dateFormat=None,
-maxColumns=None, maxCharsPerColumn=None, mode=None):
-"""Loads a CSV file and returns the result as a [[DataFrame]].
+comment=None, header=None, inferSchema=None, ignoreLeadingWhiteSpace=None,
+ignoreTrailingWhiteSpace=None, nullValue=None, nanValue=None, positiveInf=None,
+negativeInf=None, dateFormat=None, maxColumns=None, maxCharsPerColumn=None, mode=None):
+"""Loads a CSV file and returns the result as a  :class:`DataFrame`.
 
 This function goes through the input once to determine the input schema. To avoid going
--- End diff --

This comment is not correct.



[GitHub] spark issue #13576: [SPARK-15840][SQL] Add missing options in documentation,...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13576
  
**[Test build #60235 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60235/consoleFull)**
 for PR 13576 at commit 
[`eb4a77b`](https://github.com/apache/spark/commit/eb4a77b0b36bb628a5bf5b8f71225d5d89f99d68).



[GitHub] spark issue #13323: [SPARK-15555] [Mesos] Driver with --supervise option can...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13323
  
**[Test build #60231 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60231/consoleFull)**
 for PR 13323 at commit 
[`a6473e3`](https://github.com/apache/spark/commit/a6473e39080ae649d1ec01d4f3bef5785c113e15).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13323: [SPARK-15555] [Mesos] Driver with --supervise option can...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13323
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60231/
Test PASSed.



[GitHub] spark issue #13323: [SPARK-15555] [Mesos] Driver with --supervise option can...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13323
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #13576: [SPARK-15840][SQL] Add missing options in documen...

2016-06-09 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/13576#discussion_r66438370
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -304,13 +308,13 @@ def text(self, paths):
 
 @since(2.0)
 def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=None,
-comment=None, header=None, ignoreLeadingWhiteSpace=None, ignoreTrailingWhiteSpace=None,
-nullValue=None, nanValue=None, positiveInf=None, negativeInf=None, dateFormat=None,
-maxColumns=None, maxCharsPerColumn=None, mode=None):
-"""Loads a CSV file and returns the result as a [[DataFrame]].
+comment=None, header=None, inferSchema=None, ignoreLeadingWhiteSpace=None,
+ignoreTrailingWhiteSpace=None, nullValue=None, nanValue=None, positiveInf=None,
+negativeInf=None, dateFormat=None, maxColumns=None, maxCharsPerColumn=None, mode=None):
+"""Loads a CSV file and returns the result as a  :class:`DataFrame`.
 
 This function goes through the input once to determine the input schema. To avoid going
--- End diff --

I see. Thanks! I will correct this tomorrow while I am in here.



[GitHub] spark pull request #13543: [SPARK-15806] [Documentation] update doc for SPAR...

2016-06-09 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13543#discussion_r66438856
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -20,18 +20,24 @@ package org.apache.spark.deploy.master
 import scala.annotation.tailrec
 
 import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.util.{IntParam, Utils}
 
 /**
  * Command-line parser for the master.
  */
-private[master] class MasterArguments(args: Array[String], conf: SparkConf) {
+private[master] class MasterArguments(args: Array[String], conf: SparkConf) extends Logging {
   var host = Utils.localHostName()
   var port = 7077
   var webUiPort = 8080
   var propertiesFile: String = null
 
   // Check for settings in environment variables
+  if (System.getenv("SPARK_MASTER_IP") != null) {
+logWarning("SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST")
+host = System.getenv("SPARK_MASTER_IP")
+  }
+
   if (System.getenv("SPARK_MASTER_HOST") != null) {
--- End diff --

This looks good to me. Now, one final thought. We are sort of moving away 
from env variables anyway. I think we could now remove the handling of 
`SPARK_MASTER_HOST` here since it's redundant with the scripts in `sbin/`.

And then beyond that, I wonder if we should just handle the deprecation 
warning in the same place, in the scripts, rather than code. What do you think? 
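
For reference, the fallback that the diff above adds can be sketched in isolation. This is only an illustration, not the patch itself: the object name `MasterHostResolution` and the `resolve` helper are hypothetical, the environment is passed in as a plain map, and it assumes (the diff is cut off before showing it) that `SPARK_MASTER_HOST` overrides the deprecated variable when both are set.

```scala
object MasterHostResolution {
  // Warning text taken from the diff under review.
  val DeprecationWarning = "SPARK_MASTER_IP is deprecated, please use SPARK_MASTER_HOST"

  // Resolves the master host from an environment-like map and returns it together
  // with any deprecation warnings, so the sketch needs no logging framework.
  def resolve(env: Map[String, String], default: String): (String, List[String]) = {
    val deprecated = env.get("SPARK_MASTER_IP")
    val warnings = deprecated.map(_ => DeprecationWarning).toList
    // Assumption: SPARK_MASTER_HOST takes precedence when both variables are set.
    val host = env.get("SPARK_MASTER_HOST").orElse(deprecated).getOrElse(default)
    (host, warnings)
  }

  def main(args: Array[String]): Unit = {
    val (host, warnings) = resolve(sys.env, "localhost")
    warnings.foreach(w => Console.err.println("WARN " + w))
    println(host)
  }
}
```

Keeping the lookup a pure function of the environment map would also make it easy to move the same precedence rule into the `sbin/` scripts, as suggested above, without changing behavior.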



[GitHub] spark pull request #13576: [SPARK-15840][SQL] Add missing options in documen...

2016-06-09 Thread ernstp

Github user ernstp commented on a diff in the pull request:

https://github.com/apache/spark/pull/13576#discussion_r66439085
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -304,13 +308,13 @@ def text(self, paths):
 
 @since(2.0)
 def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=None,
-comment=None, header=None, ignoreLeadingWhiteSpace=None, ignoreTrailingWhiteSpace=None,
-nullValue=None, nanValue=None, positiveInf=None, negativeInf=None, dateFormat=None,
-maxColumns=None, maxCharsPerColumn=None, mode=None):
-"""Loads a CSV file and returns the result as a [[DataFrame]].
+comment=None, header=None, inferSchema=None, ignoreLeadingWhiteSpace=None,
+ignoreTrailingWhiteSpace=None, nullValue=None, nanValue=None, positiveInf=None,
+negativeInf=None, dateFormat=None, maxColumns=None, maxCharsPerColumn=None, mode=None):
+"""Loads a CSV file and returns the result as a  :class:`DataFrame`.
 
 This function goes through the input once to determine the input schema. To avoid going
--- End diff --

Yeah it implies that inferSchema is always enabled. I would just remove it 
and let the inferSchema parameter documentation speak for itself.



[GitHub] spark issue #13323: [SPARK-15555] [Mesos] Driver with --supervise option can...

2016-06-09 Thread devaraj-kavali

Github user devaraj-kavali commented on the issue:

https://github.com/apache/spark/pull/13323
  
@tnachen Thanks for your review. I have added a test for this; could you take
a look?



[GitHub] spark pull request #13531: [SPARK-15654] [SQL] fix non-splitable files for t...

2016-06-09 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/13531#discussion_r66451629
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala ---
@@ -298,6 +309,28 @@ trait FileFormat {
 }
 
 /**
+ * The base class file format that is based on text file.
+ */
+abstract class TextBasedFileFormat extends FileFormat {
+  private var codecFactory: CompressionCodecFactory = null
+  override def isSplitable(
+  sparkSession: SparkSession,
+  options: Map[String, String],
+  path: Path): Boolean = {
+if (codecFactory == null) {
+  synchronized {
+if (codecFactory == null) {
+  codecFactory = new CompressionCodecFactory(
--- End diff --

Sorry. It seems we can use `sparkSession.sessionState.newHadoopConf()`
instead of `sparkSession.sessionState.newHadoopConfWithOptions(options)` (I
checked that the `FileSourceStrategySuite` tests pass without passing `options`
to `CompressionCodecFactory`).
So, do we still need `options` in the arguments of `isSplitable`?

```
if (codecFactory == null) {
  codecFactory = new CompressionCodecFactory(sparkSession.sessionState.newHadoopConf())
}
```
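
The diff under review builds the codec factory lazily, once, behind a null check inside `synchronized`. A self-contained sketch of that pattern follows. `ExpensiveFactory` and `LazyInitSketch` are hypothetical stand-ins, not Spark or Hadoop classes, and the `@volatile` annotation is an addition of this sketch that the diff does not show.

```scala
// Hypothetical stand-in for an object that is expensive to build, such as a codec factory.
final class ExpensiveFactory(val conf: Map[String, String])

object LazyInitSketch {
  // Counts constructions so the single-initialization behaviour can be observed.
  var builds = 0

  // @volatile makes the published reference visible to the unsynchronized first check.
  @volatile private var factory: ExpensiveFactory = null

  def getFactory(conf: Map[String, String]): ExpensiveFactory = {
    if (factory == null) {            // fast path: no lock once initialized
      this.synchronized {
        if (factory == null) {        // re-check under the lock
          builds += 1
          factory = new ExpensiveFactory(conf)
        }
      }
    }
    factory
  }

  def main(args: Array[String]): Unit = {
    val a = getFactory(Map("k" -> "v"))
    val b = getFactory(Map("other" -> "x")) // already built, so this argument is ignored
    println(a eq b)
    println(builds)
  }
}
```

Because the first caller's configuration is the one that sticks, any per-call input such as `options` only matters on the first invocation, which is one reason to ask whether it needs to be passed at all.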



[GitHub] spark issue #13577: [Minor][Doc] Improve SQLContext Documentation and Fix Sp...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60233/
Test PASSed.



[GitHub] spark issue #13577: [Minor][Doc] Improve SQLContext Documentation and Fix Sp...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13577
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13577: [Minor][Doc] Improve SQLContext Documentation and Fix Sp...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13577
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60234/
Test PASSed.



[GitHub] spark issue #13577: [Minor][Doc] Improve SQLContext Documentation and Fix Sp...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13577
  
**[Test build #60234 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60234/consoleFull)**
 for PR 13577 at commit 
[`f0459bc`](https://github.com/apache/spark/commit/f0459bce2b10086cd4f418b98ae4bdc5435eeeba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13577: [Minor][Doc] Improve SQLContext Documentation and Fix Sp...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13577
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13577: [Minor][Doc] Improve SQLContext Documentation and Fix Sp...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13577
  
**[Test build #60233 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60233/consoleFull)**
 for PR 13577 at commit 
[`1e05d6d`](https://github.com/apache/spark/commit/1e05d6d470ce35b6f38c1a5df2b3c6315439169f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13557: [SPARK-15819][PYSPARK] Add KMeanSummary in KMeans of PyS...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13557
  
**[Test build #60236 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60236/consoleFull)**
 for PR 13557 at commit 
[`d2fd75a`](https://github.com/apache/spark/commit/d2fd75a631a01e2c0625d733066d6f2e824290c5).



[GitHub] spark issue #13576: [SPARK-15840][SQL] Add missing options in documentation,...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13576
  
**[Test build #60235 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60235/consoleFull)**
 for PR 13576 at commit 
[`eb4a77b`](https://github.com/apache/spark/commit/eb4a77b0b36bb628a5bf5b8f71225d5d89f99d68).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13576: [SPARK-15840][SQL] Add missing options in documentation,...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13576
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13576: [SPARK-15840][SQL] Add missing options in documentation,...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13576
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60235/
Test PASSed.



[GitHub] spark issue #13557: [SPARK-15819][PYSPARK] Add KMeanSummary in KMeans of PyS...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13557
  
**[Test build #60236 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60236/consoleFull)**
 for PR 13557 at commit 
[`d2fd75a`](https://github.com/apache/spark/commit/d2fd75a631a01e2c0625d733066d6f2e824290c5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13557: [SPARK-15819][PYSPARK] Add KMeanSummary in KMeans of PyS...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13557
  
Merged build finished. Test PASSed.



[GitHub] spark issue #13557: [SPARK-15819][PYSPARK] Add KMeanSummary in KMeans of PyS...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13557
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60236/
Test PASSed.



[GitHub] spark pull request #13575: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-09 Thread lw-lin

Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13575#discussion_r66465201
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -246,7 +247,12 @@ case class DataSource(
   case s: StreamSinkProvider =>
 s.createSink(sparkSession.sqlContext, options, partitionColumns, outputMode)
 
-  case parquet: parquet.ParquetFileFormat =>
+  // TODO: Remove the `isInstanceOf` check when other formats have been ported
+  case fileFormat: FileFormat
+if (fileFormat.isInstanceOf[CSVFileFormat]
+  || fileFormat.isInstanceOf[JsonFileFormat]
--- End diff --

@ScrapCodes, thanks! But I'm afraid that syntax would raise a compilation
error:
```
[ERROR] .../datasources/DataSource.scala:250: illegal variable in pattern alternative
[ERROR]   case fileFormat: CSVFileFormat | JsonFileFormat | ParquetFileFormat | TextFileFormat =>
[ERROR]^
```
A work-around can be the following, but I found it somewhat less intuitive:
```scala
case fileFormat@(_: CSVFileFormat |
 _: JsonFileFormat |
 _: ParquetFileFormat |
 _: TextFileFormat) =>
  // other code
  ... fileFormat.asInstanceOf[FileFormat] ...
```
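
The binder-with-alternatives form can be tried in isolation. The sketch below uses stand-in classes (`Format`, `Csv`, `Json` and `Orc` are hypothetical, not Spark's `FileFormat` implementations) to show that binding the whole alternative with `@` compiles, whereas putting a variable inside one branch of the alternative does not.

```scala
// Stand-in types for illustration only; these are not Spark's FileFormat classes.
sealed trait Format { def name: String }
final class Csv extends Format { val name = "csv" }
final class Json extends Format { val name = "json" }
final class Orc extends Format { val name = "orc" }

object PatternAlternatives {
  // `case f: Csv | Json =>` is rejected with "illegal variable in pattern alternative".
  // Binding the whole alternative with `@` is accepted.
  def describe(x: Any): String = x match {
    case f @ (_: Csv | _: Json) =>
      // In Scala 2 the binder keeps the scrutinee's static type (Any here),
      // so a cast is needed, as in the work-around quoted above.
      "ported: " + f.asInstanceOf[Format].name
    case other: Format => "not ported: " + other.name
    case _ => "not a format"
  }

  def main(args: Array[String]): Unit = {
    println(describe(new Csv))
    println(describe(new Orc))
    println(describe(42))
  }
}
```

The cast is the part that makes this form feel less intuitive; the guard with `isInstanceOf` in the diff avoids it because `fileFormat` is already typed as `FileFormat`.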



[GitHub] spark pull request #13578: [SPARK-15837][ML][PySpark]Word2vec python add max...

2016-06-09 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/13578

[SPARK-15837][ML][PySpark]Word2vec python add maxsentence parameter

## What changes were proposed in this pull request?

Add the maxsentence parameter to the Python Word2vec API.

## How was this patch tested?

Existing test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark word2vec_python_add_maxsentence

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13578.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13578


commit 57384a0a9ffbfe9befe44cb7a9ae226eff603c94
Author: WeichenXu 
Date:   2016-06-08T21:28:41Z

word2vec_python_add_maxsentence_param





[GitHub] spark issue #13578: [SPARK-15837][ML][PySpark]Word2vec python add maxsentenc...

2016-06-09 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/13578
  
Seems OK



[GitHub] spark pull request #13575: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-09 Thread lw-lin

Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13575#discussion_r66466310
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
 ---
@@ -143,39 +146,99 @@ object CSVRelation extends Logging {
   if (nonEmptyLines.hasNext) nonEmptyLines.drop(1)
 }
   }
+
+  /**
+   * Setup writing configurations into the given [[Configuration]], and 
then return the
+   * wrapped [[CSVOptions]].
+   * Both continuous-queries writing process and non-continuous-queries 
writing process will
+   * call this function.
+   */
+  private[csv] def prepareConfForWriting(
+  conf: Configuration,
+  options: Map[String, String]): CSVOptions = {
+val csvOptions = new CSVOptions(options)
+csvOptions.compressionCodec.foreach { codec =>
+  CompressionCodecs.setCodecConfiguration(conf, codec)
+}
+csvOptions
+  }
--- End diff --

These lines are mostly moved here from `CSVFileFormat.prepareWrite()`.



[GitHub] spark issue #13578: [SPARK-15837][ML][PySpark]Word2vec python add maxsentenc...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13578
  
**[Test build #60237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60237/consoleFull)** for PR 13578 at commit [`57384a0`](https://github.com/apache/spark/commit/57384a0a9ffbfe9befe44cb7a9ae226eff603c94).



[GitHub] spark pull request #13575: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-09 Thread lw-lin

Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13575#discussion_r66466502
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
 ---
@@ -143,39 +146,99 @@ object CSVRelation extends Logging {
   if (nonEmptyLines.hasNext) nonEmptyLines.drop(1)
 }
   }
+
+  /**
+   * Setup writing configurations into the given [[Configuration]], and 
then return the
+   * wrapped [[CSVOptions]].
+   * Both continuous-queries writing process and non-continuous-queries 
writing process will
+   * call this function.
+   */
+  private[csv] def prepareConfForWriting(
+  conf: Configuration,
+  options: Map[String, String]): CSVOptions = {
+val csvOptions = new CSVOptions(options)
+csvOptions.compressionCodec.foreach { codec =>
+  CompressionCodecs.setCodecConfiguration(conf, codec)
+}
+csvOptions
+  }
 }
 
-private[sql] class CSVOutputWriterFactory(params: CSVOptions) extends 
OutputWriterFactory {
+/**
+ * A factory for generating OutputWriters for writing csv files. This is 
implemented different
+ * from the 'batch' CSVOutputWriter as this does not use any 
[[OutputCommitter]]. It simply
+ * writes the data to the path used to generate the output writer. Callers 
of this factory
+ * has to ensure which files are to be considered as committed.
+ */
+private[csv] class StreamingCSVOutputWriterFactory(
+  sqlConf: SQLConf,
+  dataSchema: StructType,
+  hadoopConf: Configuration,
+  options: Map[String, String]) extends StreamingOutputWriterFactory {
+
+  private val (csvOptions: CSVOptions, serializableConf: 
SerializableConfiguration) = {
+val conf = Job.getInstance(hadoopConf).getConfiguration
+val csvOptions = CSVRelation.prepareConfForWriting(conf, options)
+(csvOptions, new SerializableConfiguration(conf))
+  }
+
+  /**
+   * Returns a [[OutputWriter]] that writes data to the give path without 
using an
+   * [[OutputCommitter]].
+   */
+  override private[sql] def newWriter(path: String): OutputWriter = {
+val hadoopTaskAttempId = new TaskAttemptID(new TaskID(new JobID, 
TaskType.MAP, 0), 0)
+val hadoopAttemptContext =
+  new TaskAttemptContextImpl(serializableConf.value, 
hadoopTaskAttempId)
+// Returns a 'streaming' CSVOutputWriter
+new CSVOutputWriterBase(dataSchema, hadoopAttemptContext, csvOptions) {
+  override private[csv] val recordWriter: RecordWriter[NullWritable, 
Text] =
+createNoCommitterTextRecordWriter(
+  path,
+  hadoopAttemptContext,
+  (c: TaskAttemptContext, ext: String) => { new 
Path(s"$path.csv$ext") })
+}
+  }
+}
+
+private[csv] class BatchCSVOutputWriterFactory(params: CSVOptions) extends 
OutputWriterFactory {
   override def newInstance(
   path: String,
   bucketId: Option[Int],
   dataSchema: StructType,
   context: TaskAttemptContext): OutputWriter = {
 if (bucketId.isDefined) sys.error("csv doesn't support bucketing")
-new CsvOutputWriter(path, dataSchema, context, params)
+// Returns a 'batch' CSVOutputWriter
+new CSVOutputWriterBase(dataSchema, context, params) {
+  private[csv] override val recordWriter: RecordWriter[NullWritable, 
Text] = {
+new TextOutputFormat[NullWritable, Text]() {
+  override def getDefaultWorkFile(context: TaskAttemptContext, 
extension: String): Path = {
+val conf = context.getConfiguration
+val uniqueWriteJobId = 
conf.get(CreateDataSourceTableUtils.DATASOURCE_WRITEJOBUUID)
+val taskAttemptId = context.getTaskAttemptID
+val split = taskAttemptId.getTaskID.getId
+new Path(path, 
f"part-r-$split%05d-$uniqueWriteJobId.csv$extension")
+  }
+}.getRecordWriter(context)
+  }
+}
   }
 }
 
-private[sql] class CsvOutputWriter(
-path: String,
+/**
+ * Base CSVOutputWriter class for 'batch' CSVOutputWriter and 'streaming' 
CSVOutputWriter. The
+ * writing logic to a single file resides in this base class.
+ */
+private[csv] abstract class CSVOutputWriterBase(
--- End diff --

This `CSVOutputWriterBase` is basically the original `CsvOutputWriter`.
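
The shape of this refactoring can be sketched without any Hadoop or Spark dependencies. Everything below is a simplified assumption for illustration: `LineSink` is a hypothetical stand-in for the record writer, and `OutputWriterBaseSketch` and `inMemory` are invented names. The point is only that the per-file writing logic stays in an abstract base class while each anonymous subclass supplies its own writer, as the 'batch' and 'streaming' factories do in the diff.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for a record writer.
trait LineSink { def write(line: String): Unit }

// Shared per-file logic lives in the base class; subclasses only supply the sink.
abstract class OutputWriterBaseSketch {
  protected def sink: LineSink
  def writeRow(fields: Seq[String]): Unit = sink.write(fields.mkString(","))
}

object OutputWriterBaseSketch {
  // One possible subclass: an anonymous writer that appends lines to a buffer.
  def inMemory(buffer: ArrayBuffer[String]): OutputWriterBaseSketch =
    new OutputWriterBaseSketch {
      protected val sink: LineSink = new LineSink {
        def write(line: String): Unit = { buffer += line; () }
      }
    }
}
```

Declaring the sink as an abstract member and reading it only inside methods avoids touching it while the base class constructor is still running.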



[GitHub] spark pull request #13575: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-09 Thread lw-lin

Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13575#discussion_r66466672
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala
 ---
@@ -146,16 +173,53 @@ class JsonFileFormat extends FileFormat with 
DataSourceRegister {
 }
   }
 
-  override def toString: String = "JSON"
-
   override def hashCode(): Int = getClass.hashCode()
 
   override def equals(other: Any): Boolean = 
other.isInstanceOf[JsonFileFormat]
 }
 
-private[json] class JsonOutputWriter(
-path: String,
-bucketId: Option[Int],
+/**
+ * A factory for generating [[OutputWriter]]s for writing json files. This 
is implemented different
+ * from the 'batch' JsonOutputWriter as this does not use any 
[[OutputCommitter]]. It simply
+ * writes the data to the path used to generate the output writer. Callers 
of this factory
+ * has to ensure which files are to be considered as committed.
+ */
+private[json] class StreamingJsonOutputWriterFactory(
+sqlConf: SQLConf,
+dataSchema: StructType,
+hadoopConf: Configuration,
+options: Map[String, String]) extends StreamingOutputWriterFactory {
+
+  private val serializableConf = {
+val conf = Job.getInstance(hadoopConf).getConfiguration
+JsonFileFormat.prepareConfForWriting(conf, options)
+new SerializableConfiguration(conf)
+  }
+
+  /**
+   * Returns a [[OutputWriter]] that writes data to the give path without 
using an
+   * [[OutputCommitter]].
+   */
+  override private[sql] def newWriter(path: String): OutputWriter = {
+val hadoopTaskAttempId = new TaskAttemptID(new TaskID(new JobID, 
TaskType.MAP, 0), 0)
+val hadoopAttemptContext =
+  new TaskAttemptContextImpl(serializableConf.value, 
hadoopTaskAttempId)
+// Returns a 'streaming' JsonOutputWriter
+new JsonOutputWriterBase(dataSchema, hadoopAttemptContext) {
+  override private[json] val recordWriter: RecordWriter[NullWritable, 
Text] =
+createNoCommitterTextRecordWriter(
+  path,
+  hadoopAttemptContext,
+  (c: TaskAttemptContext, ext: String) => { new 
Path(s"$path.json$ext") })
+}
+  }
+}
+
+/**
+ * Base JsonOutputWriter class for 'batch' JsonOutputWriter and 
'streaming' JsonOutputWriter. The
+ * writing logic to a single file resides in this base class.
+ */
+private[json] abstract class JsonOutputWriterBase(
--- End diff --

This `JsonOutputWriterBase` is basically the original `JsonOutputWriter`



[GitHub] spark issue #13578: [SPARK-15837][ML][PySpark]Word2vec python add maxsentenc...

2016-06-09 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/13578
  
@srowen Hi srowen, I have another similar PR, #13558, which passed the tests on
my machine but failed the official test run. It seems to be a problem with the
test server; could you help check it?



[GitHub] spark pull request #13575: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-09 Thread lw-lin

Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13575#discussion_r66467095
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
 ---
@@ -488,7 +488,12 @@ private[sql] class ParquetOutputWriterFactory(
 // Custom ParquetOutputFormat that disable use of committer and writes 
to the given path
 val outputFormat = new ParquetOutputFormat[InternalRow]() {
   override def getOutputCommitter(c: TaskAttemptContext): 
OutputCommitter = { null }
-  override def getDefaultWorkFile(c: TaskAttemptContext, ext: String): 
Path = { new Path(path) }
+  override def getDefaultWorkFile(c: TaskAttemptContext, ext: String): 
Path = {
+// It has the `.parquet` extension at the end because 
(de)compression tools
+// such as gunzip would not be able to decompress this as the 
compression
+// is not applied on this whole file but on each "page" in Parquet 
format.
+new Path(s"$path$ext")
+  }
--- End diff --

This patch appends an extension to the assigned `path`; the new `path` would
look like `some_path.gz.parquet`.
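
As a rough illustration of the resulting names, the sketch below mirrors the string interpolation used in the diffs (`s"$path$ext"` for Parquet, `s"$path.csv$ext"` for CSV) with plain strings instead of Hadoop's `Path`. The object and method names are hypothetical, and the extension values passed in are assumptions chosen to match the `some_path.gz.parquet` example.

```scala
object WorkFilePathSketch {
  // Mirrors `new Path(s"$path$ext")` from the Parquet diff, using plain strings.
  def parquetWorkFile(path: String, ext: String): String = s"$path$ext"

  // Mirrors `new Path(s"$path.csv$ext")` from the CSV diff: `$path` is
  // interpolated, then the literal ".csv", then the extension.
  def csvWorkFile(path: String, ext: String): String = s"$path.csv$ext"
}
```

Under these assumptions, `WorkFilePathSketch.parquetWorkFile("some_path", ".gz.parquet")` gives `some_path.gz.parquet`.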



[GitHub] spark pull request #13575: [SPARK-15472][SQL] Add support for writing in `cs...

2016-06-09 Thread lw-lin

Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13575#discussion_r66467191
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala
 ---
@@ -120,24 +109,31 @@ class TextFileFormat extends FileFormat with 
DataSourceRegister {
   }
 }
   }
+
+  override def buildWriter(
+  sqlContext: SQLContext,
+  dataSchema: StructType,
+  options: Map[String, String]): OutputWriterFactory = {
+verifySchema(dataSchema)
+new StreamingTextOutputWriterFactory(
+  sqlContext.conf,
+  dataSchema,
+  sqlContext.sparkContext.hadoopConfiguration,
+  options)
+  }
 }
 
-class TextOutputWriter(path: String, dataSchema: StructType, context: 
TaskAttemptContext)
+/**
+ * Base TextOutputWriter class for 'batch' TextOutputWriter and 
'streaming' TextOutputWriter. The
+ * writing logic to a single file resides in this base class.
+ */
+private[text] abstract class TextOutputWriterBase(context: 
TaskAttemptContext)
--- End diff --

This `TextOutputWriterBase` is basically the original `TextOutputWriter`.



[GitHub] spark issue #13578: [SPARK-15837][ML][PySpark]Word2vec python add maxsentenc...

2016-06-09 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13578
  
**[Test build #60237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60237/consoleFull)** for PR 13578 at commit [`57384a0`](https://github.com/apache/spark/commit/57384a0a9ffbfe9befe44cb7a9ae226eff603c94).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #13578: [SPARK-15837][ML][PySpark]Word2vec python add maxsentenc...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13578
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60237/



[GitHub] spark issue #13578: [SPARK-15837][ML][PySpark]Word2vec python add maxsentenc...

2016-06-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13578
  
Merged build finished. Test PASSed.


