[GitHub] spark pull request #14645: [MINOR] [DOC] Correct code snippet results in qui...

2016-08-14 Thread linbojin
GitHub user linbojin opened a pull request:

https://github.com/apache/spark/pull/14645

[MINOR] [DOC] Correct code snippet results in quick start documentation

## What changes were proposed in this pull request?

Because the README.md file has been updated over time, some code snippet 
outputs no longer match the current README.md file. For example:
```
scala> textFile.count()
res0: Long = 126
```
should be
```
scala> textFile.count()
res0: Long = 99
```
This PR corrects these outputs so that new Spark learners have a 
correct reference.
It also fixes a small bug: in the current documentation, the outputs of 
linesWithSpark.count() without and with cache are different (one is 15 and the 
other is 19):
```
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at :27

scala> textFile.filter(line => line.contains("Spark")).count() // How many lines contain "Spark"?
res3: Long = 15

...

scala> linesWithSpark.cache()
res7: linesWithSpark.type = MapPartitionsRDD[2] at filter at :27

scala> linesWithSpark.count()
res8: Long = 19
```
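
The two counts have to agree because caching only changes where the data is read from, not what the filter returns. Below is a minimal sketch of that point using plain Scala collections rather than an RDD. The sample lines and the `countMatching` helper are made up for illustration and are not part of the quick start or of the Spark API.

```scala
object QuickStartCounts {
  // Hypothetical stand-in for README.md; the real file has more lines.
  val lines: Seq[String] = Seq(
    "# Apache Spark",
    "Spark is a fast and general cluster computing system.",
    "It provides high-level APIs in Scala, Java, Python, and R.",
    "## Building Spark"
  )

  // Same predicate as the quick start: keep lines that contain the word.
  def countMatching(input: Seq[String], word: String): Long =
    input.count(line => line.contains(word)).toLong

  def main(args: Array[String]): Unit = {
    val uncached = countMatching(lines, "Spark")
    // Materialising the filtered lines first (roughly what caching does)
    // must not change the count.
    val materialised = lines.filter(_.contains("Spark")).toVector
    val cached = materialised.size.toLong
    println(s"uncached=$uncached cached=$cached")
  }
}
```

Any documented pair of outputs for the same filter should therefore show the same number, whichever value the current README.md produces.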

## How was this patch tested?

manual test:  run `$ SKIP_API=1 jekyll serve --watch`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/linbojin/spark quick-start-documentation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14645.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14645


commit f093e3a44a6447f619edd987bf30ee838899c578
Author: linbojin 
Date:   2016-08-15T06:26:39Z

correct result numbers inside quick start docs based on new README.md file




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun ...

2016-08-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14520#discussion_r74725756
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -932,11 +935,15 @@ class BinaryLogisticRegressionSummary private[classification] (
  * Two LogisticAggregator can be merged together to have a summary of loss and gradient of
  * the corresponding joint dataset.
  *
+ * @param bcCoeffs The broadcast coefficients corresponding to the features.
--- End diff --

Call it `bcCoefficients` for consistency with other changes. 





[GitHub] spark issue #14613: [SPARK-16883][SparkR]:SQL decimal type is not properly c...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14613
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63775/
Test PASSed.





[GitHub] spark issue #14613: [SPARK-16883][SparkR]:SQL decimal type is not properly c...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14613
  
**[Test build #63775 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63775/consoleFull)**
 for PR 14613 at commit 
[`61b7a48`](https://github.com/apache/spark/commit/61b7a48178741ed69afb110972ebf35ca32fb31f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14613: [SPARK-16883][SparkR]:SQL decimal type is not properly c...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14613
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14644: [MESOS] Enable GPU support with Mesos

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14644
  
**[Test build #63777 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63777/consoleFull)**
 for PR 14644 at commit 
[`ef59b34`](https://github.com/apache/spark/commit/ef59b349988c740c205450459ec6f300eae68dbb).





[GitHub] spark issue #14644: [MESOS] Enable GPU support with Mesos

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14644
  
**[Test build #63776 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63776/consoleFull)**
 for PR 14644 at commit 
[`4edc6db`](https://github.com/apache/spark/commit/4edc6db5329a19f49af9303897ee0a2f1fc91a14).





[GitHub] spark pull request #14644: Enable GPU support with Mesos

2016-08-14 Thread tnachen
GitHub user tnachen opened a pull request:

https://github.com/apache/spark/pull/14644

Enable GPU support with Mesos

## What changes were proposed in this pull request?

Enable GPU resources to be used when running coarse grain mode with Mesos.


## How was this patch tested?

Manual test with GPU.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tnachen/spark gpu_mesos

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14644.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14644


commit 163cfa49b2116612f981aa8158054e006d40b52d
Author: Timothy Chen 
Date:   2016-05-23T23:23:51Z

Enable GPU with Mesos on Spark

commit 4edc6db5329a19f49af9303897ee0a2f1fc91a14
Author: Timothy Chen 
Date:   2016-08-15T06:39:05Z

Enable GPU support with Mesos







[GitHub] spark issue #14643: [SPARK-17057][ML] ProbabilisticClassifierModels' predict...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14643
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63774/
Test PASSed.





[GitHub] spark issue #14634: [SPARK-17051][SQL] we should use hadoopConf in InsertInt...

2016-08-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14634
  
Sorry, let me rephrase the potential issue. 

- The `insertInto` API forces users to set `hive.exec.dynamic.partition` to 
`true` and `hive.exec.dynamic.partition.mode` to `nonstrict`. This might not be 
convenient. The default value of `hive.exec.dynamic.partition.mode` is 
`strict`. 

- If we always read the setting from `hadoopConf`, does that mean users are 
unable to control these settings for different queries? Is it possible for 
users to change the setting values in `hadoopConf` at runtime?
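
To make the first point concrete, here is a rough sketch of the kind of check that produces this restriction. It assumes the check follows Hive's rule that strict mode requires at least one static partition. `rejectReason` is a hypothetical helper, not Spark's code, and treating a missing `hive.exec.dynamic.partition` as disabled is an assumption of the sketch.

```scala
object DynamicPartitionCheck {
  // Returns None if the insert is allowed, or Some(reason) if it is rejected.
  // Assumption of this sketch: a missing `hive.exec.dynamic.partition` is
  // treated as disabled. The default mode `strict` is from the comment above.
  def rejectReason(conf: Map[String, String], numStaticPartitions: Int): Option[String] = {
    val enabled = conf.getOrElse("hive.exec.dynamic.partition", "false").toBoolean
    val mode = conf.getOrElse("hive.exec.dynamic.partition.mode", "strict")
    if (!enabled) Some("dynamic partitioning is disabled")
    else if (numStaticPartitions == 0 && mode.equalsIgnoreCase("strict"))
      Some("strict mode needs at least one static partition")
    else None
  }

  def main(args: Array[String]): Unit = {
    // insertInto cannot name partition values, so every partition is dynamic.
    println(rejectReason(Map("hive.exec.dynamic.partition" -> "true"), 0))
    println(rejectReason(Map(
      "hive.exec.dynamic.partition" -> "true",
      "hive.exec.dynamic.partition.mode" -> "nonstrict"), 0))
  }
}
```

Because `insertInto` always has zero static partitions, only the second configuration passes in this sketch.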





[GitHub] spark issue #14643: [SPARK-17057][ML] ProbabilisticClassifierModels' predict...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14643
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14643: [SPARK-17057][ML] ProbabilisticClassifierModels' predict...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14643
  
**[Test build #63774 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63774/consoleFull)**
 for PR 14643 at commit 
[`4ec606d`](https://github.com/apache/spark/commit/4ec606df4e4bb74c9edf79a7629d600cbdbaed91).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14634: [SPARK-17051][SQL] we should use hadoopConf in InsertInt...

2016-08-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14634
  
If users want to use the `DataFrameWriter`'s `insertInto` API for a 
partitioned Hive table, they have to set `hive.exec.dynamic.partition` to 
`true` and `hive.exec.dynamic.partition.mode` to `nonstrict`. Otherwise, it 
does not work. Users are unable to specify the partition values in the 
`insertInto` APIs. See [the 
code](https://github.com/apache/spark/blob/8c8acdec9365136cba13060ce36c22b28e29b59b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L258).
 

Let me use a test case to show the issue. 
```Scala
withTempDir { tmpDir =>
  val basePath = tmpDir.getCanonicalPath
  val externalTab = "extTable_with_partitions"
  withTable(externalTab) {
    assert(tmpDir.listFiles.isEmpty)
    sql(
      s"""
         |CREATE EXTERNAL TABLE $externalTab (key INT, value STRING)
         |PARTITIONED BY (ds STRING, hr STRING)
         |stored as Parquet
         |LOCATION '$basePath'
       """.stripMargin)

    for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) {
      sql(
        s"""
           |INSERT OVERWRITE TABLE $externalTab
           |partition (ds='$ds',hr='$hr')
           |SELECT 1, 'a'
         """.stripMargin)
    }
    withSQLConf("hive.exec.dynamic.partition" -> "true",
        "hive.exec.dynamic.partition.mode" -> "nonstrict") {
      Seq((1, "2", "2008-04-09", "12")).toDF("key", "value", "ds", "hr").write
        .insertInto(externalTab)
    }
  }
}
```
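
The test relies on a scoped override: the two settings apply only while the block runs. The sketch below shows that set/run/restore pattern with a plain in-memory map. `ScopedConf`, `conf`, and `withConf` are hypothetical names for illustration, and this is not Spark's `withSQLConf` implementation.

```scala
import scala.collection.mutable

object ScopedConf {
  // Hypothetical in-memory configuration; stands in for the session conf.
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Set the given pairs, run the body, then restore the previous values.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None)      => conf.remove(k)
    }
  }

  def main(args: Array[String]): Unit = {
    val inside = withConf("hive.exec.dynamic.partition.mode" -> "nonstrict") {
      conf("hive.exec.dynamic.partition.mode")
    }
    println(s"inside=$inside after=${conf.get("hive.exec.dynamic.partition.mode")}")
  }
}
```

With a pattern like this, each query can run under its own values. That per-query control is what could be lost if the settings were read only from a shared `hadoopConf`.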






[GitHub] spark issue #14613: [SPARK-16883][SparkR]:SQL decimal type is not properly c...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14613
  
**[Test build #63775 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63775/consoleFull)**
 for PR 14613 at commit 
[`61b7a48`](https://github.com/apache/spark/commit/61b7a48178741ed69afb110972ebf35ca32fb31f).





[GitHub] spark issue #14641: [Minor] [SparkR] spark.glm weightCol should in the signa...

2016-08-14 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/14641
  
cc @felixcheung @shivaram 





[GitHub] spark issue #14643: [SPARK-17057][ML] ProbabilisticClassifierModels' predict...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14643
  
**[Test build #63774 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63774/consoleFull)**
 for PR 14643 at commit 
[`4ec606d`](https://github.com/apache/spark/commit/4ec606df4e4bb74c9edf79a7629d600cbdbaed91).





[GitHub] spark pull request #14643: [SPARK-17057][ML] ProbabilisticClassifierModels' ...

2016-08-14 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/14643

[SPARK-17057][ML] ProbabilisticClassifierModels' prediction more reasonable 
with multi zero thresholds

## What changes were proposed in this pull request?

Change the behavior of `transform` in `ProbabilisticClassifierModel` when 
more than one threshold is set to zero. 
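
A minimal sketch of why several zero thresholds are awkward, assuming the prediction is the class with the largest probability-to-threshold ratio, as the `thresholds` parameter is documented. The `predict` helper and the sample values are illustrative only. The sketch shows the problem, not the fix proposed in this PR.

```scala
object ThresholdPrediction {
  // Predict the class with the largest probability / threshold ratio.
  def predict(probabilities: Array[Double], thresholds: Array[Double]): Int = {
    require(probabilities.length == thresholds.length, "length mismatch")
    val scaled = probabilities.zip(thresholds).map { case (p, t) => p / t }
    scaled.indexOf(scaled.max)
  }

  def main(args: Array[String]): Unit = {
    val probs = Array(0.1, 0.3, 0.6)
    // Two zero thresholds: both ratios become +Infinity and tie, so the
    // first index wins even though class 1 is more probable than class 0.
    println(predict(probs, Array(0.0, 0.0, 0.5)))
    // One zero threshold: that class wins as long as its probability is > 0.
    println(predict(probs, Array(0.5, 0.0, 0.5)))
  }
}
```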


## How was this patch tested?

unit tests and manual tests




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark fix_proba_with_threshoulds

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14643


commit 4ec606df4e4bb74c9edf79a7629d600cbdbaed91
Author: Zheng RuiFeng 
Date:   2016-08-03T06:54:56Z

deal with zero thresholds







[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-08-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14426
  
Hi, @cloud-fan .
Could you review this PR about HINT when you have some time?





[GitHub] spark issue #14642: [SPARK-17056][Core] Fix a wrong assert regarding unroll ...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14642
  
**[Test build #63773 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63773/consoleFull)**
 for PR 14642 at commit 
[`cf4387a`](https://github.com/apache/spark/commit/cf4387ada2117d8216d886b579af2d2a1d67e835).





[GitHub] spark issue #14642: [SPARK-17056][Core] Fix a wrong assert regarding unroll ...

2016-08-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14642
  
cc @cloud-fan @JoshRosen 





[GitHub] spark pull request #14642: [SPARK-17056][Core] Fix a wrong assert regarding ...

2016-08-14 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/14642

[SPARK-17056][Core] Fix a wrong assert regarding unroll memory in 
MemoryStore

## What changes were proposed in this pull request?

There is an assert in MemoryStore's putIteratorAsValues method which is 
used to check that not too much unroll memory is released. This assert looks 
wrong.

## How was this patch tested?

Jenkins tests.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 fix-unroll-memory

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14642.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14642


commit cf4387ada2117d8216d886b579af2d2a1d67e835
Author: Liang-Chi Hsieh 
Date:   2016-08-15T05:14:05Z

Fix a wrong assert.







[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-08-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14638
  
I see. Thank you for the guidance. Initially, I thought this table property 
orthogonally skips the starting rows of each file for all formats. I also 
assumed that users would not use this property improperly.

But what you mean is that it has no meaning for columnar and vectorized 
formats. So, if a user sets this table property for Parquet or ORC, Spark 
needs to ignore it.

If so, we should definitely find a place that applies to the TEXT format 
only. BTW, do you have a proper location in mind instead of the current 
`hadoopRDD.mapPartitionsWithIndex`?
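
For reference, the index-based approach can be sketched with plain collections, where each inner `Seq` models one partition of a single text file. `skipHeader` is a hypothetical helper, not the code in this PR. It assumes the header fits entirely in the first partition and that there is only one file.

```scala
object SkipHeaderSketch {
  // Each inner Seq models one partition (split) of a single text file.
  def skipHeader(partitions: Seq[Seq[String]], count: Int): Seq[Seq[String]] =
    partitions.zipWithIndex.map {
      case (rows, 0) => rows.drop(count) // only the first split holds the header
      case (rows, _) => rows
    }

  def main(args: Array[String]): Unit = {
    val file = Seq(Seq("id,name", "1,a"), Seq("2,b", "3,c"))
    println(skipHeader(file, 1))
  }
}
```

Nothing in this logic looks at the file format, which is why it would also drop rows from Parquet or ORC data unless the caller restricts it to text tables.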





[GitHub] spark pull request #14597: [SPARK-17017][MLLIB] add a chiSquare Selector bas...

2016-08-14 Thread mpjlu
Github user mpjlu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14597#discussion_r74721128
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala ---
@@ -197,3 +197,28 @@ class ChiSqSelector @Since("1.3.0") (
 new ChiSqSelectorModel(indices)
   }
 }
+
+/**
+ * Creates a ChiSquared feature selector by False Positive Rate (FPR) test.
+ * @param alpha the highest p-value for features to be kept
+ */
+@Since("2.1.0")
+class ChiSqSelectorByFpr @Since("2.1.0") (
+  @Since("2.1.0") val alpha: Double) extends Serializable {
+
+  /**
+   * Returns a ChiSquared feature selector by FPR.
+   *
+   * @param data an `RDD[LabeledPoint]` containing the labeled dataset with categorical features.
+   *             Real-valued features will be treated as categorical for each distinct value.
+   *             Apply feature discretizer before using this function.
+   */
+  @Since("2.1.0")
+  def fit(data: RDD[LabeledPoint]): ChiSqSelectorModel = {
+    val indices = Statistics.chiSqTest(data)
+      .zipWithIndex.filter { case (res, _) => res.pValue < alpha }
--- End diff --

Hi @srowen, if we configure the model instance with different parameters 
to perform different types of selection, will that be inconsistent with the 
MLlib Estimator/Model style and confuse users? 
If it is not a problem, I will submit a new PR to configure the model to 
perform different selections. 





[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-14 Thread mpjlu
Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/14597
  
Hi @avulanov. In general, FPR feature selection should not modify the 
code of the existing ChiSqSelector, which is how we have implemented it in 
this PR. But if we need to reuse the ChiSqTestResult 
(Statistics.chiSqTest(data)), it is better to modify the code of 
ChiSqSelector. 

Scikit-learn creates a separate object for each of SelectKBest, SelectFpr, 
SelectPercentile, and so on, as we do in this PR. The advantage of this 
approach is that it is consistent across the library: everything uses the 
same Estimator/Model style. The disadvantage is that it cannot reuse the 
results of the score function. @srowen 
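
The reuse argument can be sketched as follows: run the tests once, then apply either selection rule to the same results. `TestResult`, `selectByFpr`, and `selectTopK` are hypothetical names, the numbers are made up, and ranking top-k by the test statistic is an assumption of this sketch.

```scala
object ChiSqSelectionSketch {
  // Hypothetical stand-in for one feature's chi-squared test result.
  final case class TestResult(statistic: Double, pValue: Double)

  // FPR-style selection: keep every feature whose p-value is below alpha.
  def selectByFpr(results: Seq[TestResult], alpha: Double): Seq[Int] =
    results.zipWithIndex.collect { case (res, idx) if res.pValue < alpha => idx }

  // Top-k selection: keep the k features with the largest statistic.
  def selectTopK(results: Seq[TestResult], k: Int): Seq[Int] =
    results.zipWithIndex.sortBy { case (res, _) => -res.statistic }
      .take(k).map(_._2).sorted

  def main(args: Array[String]): Unit = {
    // The tests run once; both selection rules reuse the same results.
    val results = Seq(TestResult(9.2, 0.002), TestResult(0.4, 0.53), TestResult(5.1, 0.024))
    println(selectByFpr(results, alpha = 0.05))
    println(selectTopK(results, k = 1))
  }
}
```

With one object per selection rule, each `fit` call would recompute the test results. A single configurable selector could share them, at the cost of departing from the one-class-per-rule style.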





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14452
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63769/
Test PASSed.





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14452
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14452
  
**[Test build #63769 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63769/consoleFull)**
 for PR 14452 at commit 
[`0d5eea7`](https://github.com/apache/spark/commit/0d5eea748b333f4a5ba2680b8c9ca47e6f3da53e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class RandomForestClassificationModel(TreeEnsembleModel, JavaMLWritable, JavaMLReadable):`
  * `class GBTClassificationModel(TreeEnsembleModel, JavaMLWritable, JavaMLReadable):`
  * `class TreeEnsembleModel(JavaModel):`
  * `class RandomForestRegressionModel(TreeEnsembleModel, JavaMLWritable, JavaMLReadable):`
  * `class GBTRegressionModel(TreeEnsembleModel, JavaMLWritable, JavaMLReadable):`
  * `case class With(child: LogicalPlan, cteRelations: Seq[(String, SubqueryAlias)]) extends UnaryNode `
  * `class JDBCOptions(`





[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14392
  
**[Test build #63771 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63771/consoleFull)**
 for PR 14392 at commit 
[`90fcb79`](https://github.com/apache/spark/commit/90fcb794e9a08777f6373d5782474003fe9026da).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14392
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63771/
Test PASSed.





[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14392
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14641: [Minor] [SparkR] spark.glm weightCol should in the signa...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14641
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14641: [Minor] [SparkR] spark.glm weightCol should in the signa...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14641
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63772/
Test PASSed.





[GitHub] spark issue #14641: [Minor] [SparkR] spark.glm weightCol should in the signa...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14641
  
**[Test build #63772 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63772/consoleFull)**
 for PR 14641 at commit 
[`926739a`](https://github.com/apache/spark/commit/926739a9197cb0fca3df2b85350ef890299c91c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-08-14 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14638
  
Does this option make sense in Parquet or ORC?






[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14557
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14557
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63768/
Test PASSed.





[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14557
  
**[Test build #63768 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63768/consoleFull)**
 for PR 14557 at commit 
[`fbe31eb`](https://github.com/apache/spark/commit/fbe31eb63fa34a19c1b7c44e4ec40fb31835a3fc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14641: [Minor] [SparkR] spark.glm weightCol should in the signa...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14641
  
**[Test build #63772 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63772/consoleFull)**
 for PR 14641 at commit 
[`926739a`](https://github.com/apache/spark/commit/926739a9197cb0fca3df2b85350ef890299c91c1).





[GitHub] spark issue #14634: [SPARK-17051][SQL] we should use hadoopConf in InsertInt...

2016-08-14 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14634
  
> hive.exec.dynamic.partition also impacts our regular writing paths

I think it's a Hive-only conf? A normal data source relation should not read 
this conf.





[GitHub] spark pull request #14641: [Minor] [SparkR] spark.glm weightCol should in th...

2016-08-14 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/14641

[Minor] [SparkR] spark.glm weightCol should in the signature.

## What changes were proposed in this pull request?
Fix the issue that ```spark.glm``` ```weightCol``` should be in the signature.

## How was this patch tested?
Existing tests.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark weightCol

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14641.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14641


commit 926739a9197cb0fca3df2b85350ef890299c91c1
Author: Yanbo Liang 
Date:   2016-08-15T03:57:20Z

SparkR glm weightCol should in the signature.







[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-14 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/14392
  
@felixcheung @shivaram @junyangq I changed the name to 
```spark.gaussianMixture```, following other SparkR ML wrappers such as 
```spark.naiveBayes```. The Spark implementation is very similar to R's 
```mvnormalmixEM```, but I agree it's less descriptive, as @shivaram said. If you 
have any other comments, please feel free to let me know.





[GitHub] spark issue #14392: [SPARK-16446] [SparkR] [ML] Gaussian Mixture Model wrapp...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14392
  
**[Test build #63771 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63771/consoleFull)**
 for PR 14392 at commit 
[`90fcb79`](https://github.com/apache/spark/commit/90fcb794e9a08777f6373d5782474003fe9026da).





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14568
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14568
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63766/
Test PASSed.





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14568
  
**[Test build #63766 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63766/consoleFull)**
 for PR 14568 at commit 
[`57ab6fa`](https://github.com/apache/spark/commit/57ab6fa395fce93d2df6b0e043568cab92a68366).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14633: [Trivial] [ML] Fix LogisticRegression typo in error mess...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14633
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63770/
Test PASSed.





[GitHub] spark issue #14633: [Trivial] [ML] Fix LogisticRegression typo in error mess...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14633
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14633: [Trivial] [ML] Fix LogisticRegression typo in error mess...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14633
  
**[Test build #63770 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63770/consoleFull)**
 for PR 14633 at commit 
[`1700b51`](https://github.com/apache/spark/commit/1700b513349e25898a946c6a2542304805c62cbf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14481: [WIP][SPARK-16844][SQL] Generate code for sort based agg...

2016-08-14 Thread yucai
Github user yucai commented on the issue:

https://github.com/apache/spark/pull/14481
  
retest this please






[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-14 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/14557
  
LGTM





[GitHub] spark issue #14640: [SPARK-17055] add labelKFold to CrossValidator

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14640
  
Can one of the admins verify this patch?





[GitHub] spark pull request #14640: [SPARK-17055] add labelKFold to CrossValidator

2016-08-14 Thread VinceShieh
GitHub user VinceShieh opened a pull request:

https://github.com/apache/spark/pull/14640

[SPARK-17055] add labelKFold to CrossValidator

## What changes were proposed in this pull request?

This patch improves the CrossValidator by adding a new training/validation 
split method, labelKFold, which splits data based on data labels and makes sure 
that the same label is not in both testing and training sets. 

This is necessary, for example, when data is gathered from different 
subjects. Testing and training on different subjects keeps the model from 
learning subject-specific features (for instance, features of one particular 
cat), which helps avoid over-fitting.
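
The label-based split described above can be sketched in plain Scala as follows. This is a minimal illustration only, not the code from this pull request: the object name `LabelKFoldSketch`, the method signature, and the round-robin assignment of labels to folds are all assumptions made for the example, and the patch itself may balance folds differently.

```scala
// Hypothetical helper for illustration; not the API proposed in the pull request.
object LabelKFoldSketch {

  /** Splits `data` into `numFolds` (training, validation) pairs such that
    * no label appears on both sides of any pair.
    *
    * Each element of `data` is a (label, value) pair.
    */
  def labelKFold[T](
      data: Seq[(String, T)],
      numFolds: Int): Seq[(Seq[(String, T)], Seq[(String, T)])] = {
    require(numFolds >= 2, "numFolds must be at least 2")

    // Distinct labels in a deterministic order.
    val labels = data.map(_._1).distinct.sorted
    require(labels.size >= numFolds, "need at least as many distinct labels as folds")

    // Assumption: labels are assigned to folds round-robin.
    val foldOf: Map[String, Int] =
      labels.zipWithIndex.map { case (label, i) => label -> (i % numFolds) }.toMap

    (0 until numFolds).map { fold =>
      // All rows whose label belongs to this fold form the validation set;
      // every other row goes to the training set.
      val (validation, training) =
        data.partition { case (label, _) => foldOf(label) == fold }
      (training, validation)
    }
  }
}
```

Because whole labels are moved between sets, every row for a given subject ends up on the same side of each split, which is the property the description asks for. The trade-off in this sketch is that folds can be uneven in size when subjects contribute different numbers of rows.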

## How was this patch tested?

A unit test was added to MLUtilsSuite.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/VinceShieh/spark labelKFold2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14640.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14640


commit cbb78bce4022bfc46f570264de4087a01a84b281
Author: Vincent Xie 
Date:   2016-08-08T13:28:08Z

Add labelKFold to cross validation

Currently, only KFold is supported in cross validation. But there are
cases where data is gathered from different subjects and we want to avoid
over-fitting. labelKFold is a variation of k-fold which ensures that
the same label is not in both testing and training sets.

A unit test, 'test labelKFold', is also added in MLUtilsSuite.

Signed-off-by: Vincent Xie 
Signed-off-by: VinceShieh 

commit 461d696aa6aa41818be31dc1628e3282e560854a
Author: VinceShieh 
Date:   2016-08-15T01:53:51Z

Merge remote-tracking branch 'origin/master' into labelKFold2







[GitHub] spark issue #14633: [Trivial] [ML] Fix LogisticRegression typo in error mess...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14633
  
**[Test build #63770 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63770/consoleFull)**
 for PR 14633 at commit 
[`1700b51`](https://github.com/apache/spark/commit/1700b513349e25898a946c6a2542304805c62cbf).





[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14452
  
**[Test build #63769 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63769/consoleFull)**
 for PR 14452 at commit 
[`0d5eea7`](https://github.com/apache/spark/commit/0d5eea748b333f4a5ba2680b8c9ca47e6f3da53e).





[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14557
  
**[Test build #63768 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63768/consoleFull)**
 for PR 14557 at commit 
[`fbe31eb`](https://github.com/apache/spark/commit/fbe31eb63fa34a19c1b7c44e4ec40fb31835a3fc).





[GitHub] spark issue #14639: [SPARK-18054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14639
  
**[Test build #63767 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63767/consoleFull)**
 for PR 14639 at commit 
[`8007298`](https://github.com/apache/spark/commit/80072988e1ec7a0fb5aeabaf8761f08b421d0abd).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14639: [SPARK-18054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14639
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63767/
Test FAILed.





[GitHub] spark issue #14639: [SPARK-18054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14639
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14639: [SPARK-18054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14639
  
**[Test build #63767 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63767/consoleFull)**
 for PR 14639 at commit 
[`8007298`](https://github.com/apache/spark/commit/80072988e1ec7a0fb5aeabaf8761f08b421d0abd).





[GitHub] spark issue #14639: [SPARK-18054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-14 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/14639
  
Although I fixed it by using the correct cache dir for macOS, I am confused 
about why we need to download SparkR. I don't remember it being needed in 
Spark 1.x. Is this expected behavior? @shivaram @junyangq 





[GitHub] spark pull request #14639: [SPARK-18054][SPARKR] SparkR can not run in yarn-...

2016-08-14 Thread zjffdu
GitHub user zjffdu opened a pull request:

https://github.com/apache/spark/pull/14639

[SPARK-18054][SPARKR] SparkR can not run in yarn-cluster mode on mac os

## What changes were proposed in this pull request?

Change the cache dir on macOS. 


## How was this patch tested?

Tested manually by running a simple R script in yarn-cluster mode. 





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zjffdu/spark SPARK-17054

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14639


commit 80072988e1ec7a0fb5aeabaf8761f08b421d0abd
Author: Jeff Zhang 
Date:   2016-08-15T02:01:41Z

[SPARK-18054][SPARKR] SparkR can not run in yarn-cluster mode on mac os







[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...

2016-08-14 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14557#discussion_r74714601
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -798,6 +798,19 @@ private[spark] class TaskSetManager(
   }
 }
 maybeFinishTaskSet()
+
+// kill running task if stage failed
+if(reason.isInstanceOf[FetchFailed]) {
+  killTasks(runningTasksSet, taskInfos)
+}
+  }
+
+  def killTasks(tasks: HashSet[Long], taskInfo: HashMap[Long, TaskInfo]): Boolean = {
+tasks.foreach { task =>
+  val executorId = taskInfo(task).executorId
+  sched.sc.schedulerBackend.killTask(task, executorId, true)
--- End diff --

Do you mean to add a parameter to the function?
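
For readers following the diff above, the idea of killing the still-running tasks once a fetch failure is reported can be sketched outside of Spark as follows. This is only an illustration under stated assumptions: `TaskInfo`, `TaskEndReason`, and `FetchFailed` here are simplified stand-ins defined inside the sketch, not Spark's internal classes, and the `kill` callback is a hypothetical replacement for the scheduler backend call in the patch.

```scala
import scala.collection.mutable.{HashMap, HashSet}

// Self-contained model of the "kill running tasks on fetch failure" idea.
// All types here are simplified stand-ins, not Spark's internal classes.
object KillOnFetchFailureSketch {

  final case class TaskInfo(taskId: Long, executorId: String)

  sealed trait TaskEndReason
  case object FetchFailed extends TaskEndReason
  case object OtherFailure extends TaskEndReason

  /** If `reason` is a fetch failure, invokes `kill(taskId, executorId)` for
    * every running task that has a known TaskInfo and returns the ids of the
    * tasks it asked to kill. For any other reason it does nothing.
    */
  def killRunningOnFetchFailure(
      reason: TaskEndReason,
      running: HashSet[Long],
      infos: HashMap[Long, TaskInfo])(kill: (Long, String) => Unit): Seq[Long] =
    reason match {
      case FetchFailed =>
        // Take an immutable, ordered snapshot first so that the callback is
        // free to mutate `running` without disturbing the iteration.
        val targets = running.toSeq.sorted.filter(infos.contains)
        targets.foreach(id => kill(id, infos(id).executorId))
        targets
      case _ =>
        Seq.empty
    }
}
```

Passing the kill action in as a function keeps the sketch independent of any scheduler backend. It also shows one way extra options, such as whether to interrupt the task thread, could be supplied by the caller instead of being hard-coded.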





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14568
  
**[Test build #63766 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63766/consoleFull)**
 for PR 14568 at commit 
[`57ab6fa`](https://github.com/apache/spark/commit/57ab6fa395fce93d2df6b0e043568cab92a68366).





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14568
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63763/
Test FAILed.





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14568
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14568
  
**[Test build #63763 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63763/consoleFull)**
 for PR 14568 at commit 
[`8a987c5`](https://github.com/apache/spark/commit/8a987c5b75437512e98c739787d7ff44ece73bd1).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14182
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14182
  
**[Test build #63764 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63764/consoleFull)**
 for PR 14182 at commit 
[`50eee2b`](https://github.com/apache/spark/commit/50eee2b7b36397ab75b6d6f00059b04526db6594).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14182
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63764/
Test PASSed.





[GitHub] spark issue #14637: [WIP] [SPARK-16967] move mesos to module

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14637
  
**[Test build #63765 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63765/consoleFull)**
 for PR 14637 at commit 
[`4e276ab`](https://github.com/apache/spark/commit/4e276ab81848669ec5c7da6d51cf63a46b1bac87).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14637: [WIP] [SPARK-16967] move mesos to module

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14637
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63765/
Test FAILed.





[GitHub] spark issue #14637: [WIP] [SPARK-16967] move mesos to module

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14637
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14637: [WIP] [SPARK-16967] move mesos to module

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14637
  
**[Test build #63765 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63765/consoleFull)**
 for PR 14637 at commit 
[`4e276ab`](https://github.com/apache/spark/commit/4e276ab81848669ec5c7da6d51cf63a46b1bac87).





[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14447
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14447
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63761/
Test PASSed.





[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14447
  
**[Test build #63761 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63761/consoleFull)**
 for PR 14447 at commit 
[`de9d0a7`](https://github.com/apache/spark/commit/de9d0a7908cf451c4368bcc085ef6e2bb306f7a8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-08-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14638
  
Hi, @rxin. I updated the PR description first.





[GitHub] spark issue #14182: [SPARK-16444][SparkR]: Isotonic Regression wrapper in Sp...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14182
  
**[Test build #63764 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63764/consoleFull)**
 for PR 14182 at commit 
[`50eee2b`](https://github.com/apache/spark/commit/50eee2b7b36397ab75b6d6f00059b04526db6594).





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14568
  
**[Test build #63763 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63763/consoleFull)**
 for PR 14568 at commit 
[`8a987c5`](https://github.com/apache/spark/commit/8a987c5b75437512e98c739787d7ff44ece73bd1).





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14638
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63760/
Test PASSed.





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14638
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread tedyu
Github user tedyu commented on the issue:

https://github.com/apache/spark/pull/14568
  
```
/home/jenkins/workspace/SparkPullRequestBuilder/dev/mima: line 37: 40498 Aborted (core dumped) java -XX:MaxPermSize=1g -Xmx2g -cp "$TOOLS_CLASSPATH:$OLD_DEPS_CLASSPATH" org.apache.spark.tools.GenerateMIMAIgnore
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/dev/mima -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive-thriftserver -Phive ; received return code 134
```
Not sure what caused the core dump.
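
As a side note on the log above: a return code of 134 is what bash-style shells report when a process is terminated by signal 6 (128 + 6), that is, SIGABRT, which agrees with the `Aborted (core dumped)` line. This tells us how the JVM died, not why. A minimal Python sketch of that decoding follows; `describe_exit_code` is a hypothetical helper for illustration and is not part of Spark's build scripts.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style return code (hypothetical helper, for illustration only)."""
    # bash-style shells report "killed by signal N" as 128 + N.
    if code > 128:
        signum = code - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"terminated by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(134))
```

The signal number alone does not identify the root cause; the JVM crash log (`hs_err_pid*.log`), if one was written, would be the next place to look.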





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread tedyu
Github user tedyu commented on the issue:

https://github.com/apache/spark/pull/14568
  
Jenkins, test this please.





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14638
  
**[Test build #63760 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63760/consoleFull)**
 for PR 14638 at commit 
[`519b14a`](https://github.com/apache/spark/commit/519b14a24edb7a06a035ea233f17394a42ccb310).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14182: [SPARK-16444][SparkR]: Isotonic Regression wrappe...

2016-08-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14182#discussion_r74710705
  
--- Diff: R/pkg/R/mllib.R ---
@@ -299,6 +308,91 @@ setMethod("summary", signature(object = "NaiveBayesModel"),
             return(list(apriori = apriori, tables = tables))
           })
 
+#' Isotonic Regression Model
+#'
+#' Fits an Isotonic Regression model against a Spark DataFrame, similarly to R's isoreg().
+#' Users can print, make predictions on the produced model and save the model to the input path.
+#'
+#' @param data SparkDataFrame for training
+#' @param formula A symbolic description of the model to be fitted. Currently only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#' @param isotonic Whether the output sequence should be isotonic/increasing (TRUE) or
+#'                 antitonic/decreasing (FALSE)
+#' @param featureIndex The index of the feature if \code{featuresCol} is a vector column (default: `0`),
+#'                     no effect otherwise
+#' @param weightCol The weight column name.
+#' @return \code{spark.isoreg} returns a fitted Isotonic Regression model
+#' @rdname spark.isoreg
+#' @aliases spark.isoreg,SparkDataFrame,formula-method
+#' @name spark.isoreg
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' data <- list(list(7.0, 0.0), list(5.0, 1.0), list(3.0, 2.0),
+#'              list(5.0, 3.0), list(1.0, 4.0))
+#' df <- createDataFrame(data, c("label", "feature"))
+#' model <- spark.isoreg(df, label ~ feature, isotonic = FALSE)
+#' # return model boundaries and prediction as lists
+#' result <- summary(model, df)
+#' # prediction based on fitted model
+#' predict_data <- list(list(-2.0), list(-1.0), list(0.5),
+#'                      list(0.75), list(1.0), list(2.0), list(9.0))
+#' predict_df <- createDataFrame(predict_data, c("feature"))
+#' # get prediction column
+#' predict_result <- collect(select(predict(model, predict_df), "prediction"))
+#'
+#' # save fitted model to input path
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#'
+#' # can also read back the saved model and print
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.isoreg since 2.1.0
+setMethod("spark.isoreg", signature(data = "SparkDataFrame", formula = "formula"),
+          function(data, formula, isotonic = TRUE, featureIndex = 0, weightCol = NULL) {
+            formula <- paste0(deparse(formula), collapse = "")
+
+            if (is.null(weightCol)) {
+              weightCol <- ""
+            }
+
+            jobj <- callJStatic("org.apache.spark.ml.r.IsotonicRegressionWrapper", "fit",
+                                data@sdf, formula, as.logical(isotonic), as.integer(featureIndex),
+                                as.character(weightCol))
+            return(new("IsotonicRegressionModel", jobj = jobj))
+          })
+
+#  Predicted values based on an isotonicRegression model
+
+#' @param object a fitted IsotonicRegressionModel
+#' @param newData SparkDataFrame for testing
+#' @return \code{predict} returns a SparkDataFrame containing predicted values
+#' @rdname spark.isoreg
+#' @export
+#' @note predict(IsotonicRegressionModel) since 2.1.0
+setMethod("predict", signature(object = "IsotonicRegressionModel"),
+          function(object, newData) {
+            return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
+          })
+
+#  Get the summary of an IsotonicRegressionModel model
+
+#' @param object a fitted IsotonicRegressionModel
--- End diff --

Got it! Thanks!





[GitHub] spark pull request #14613: [SPARK-16883][SparkR]:SQL decimal type is not pro...

2016-08-14 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/14613#discussion_r74710682
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -354,6 +354,24 @@ setMethod("colnames<-",
             dataFrame(sdf)
           })
 
+specialtypeshandle <- function(type) {
--- End diff --

I will add comments. Thanks! 





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14568
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63762/
Test FAILed.





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14568
  
**[Test build #63762 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63762/consoleFull)**
 for PR 14568 at commit 
[`8a987c5`](https://github.com/apache/spark/commit/8a987c5b75437512e98c739787d7ff44ece73bd1).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14568
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...

2016-08-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14623
  
Hi, @rxin.
For `window_function.sql`, could you review again when you have some time?





[GitHub] spark pull request #14568: [SPARK-10868] monotonicallyIncreasingId() support...

2016-08-14 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14568#discussion_r74710318
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -426,6 +426,29 @@ def monotonically_increasing_id():
     return Column(sc._jvm.functions.monotonically_increasing_id())
 
 
+@since(2.1)
+def monotonically_increasing_id(offset):
--- End diff --

Or we can default `offset` to 0, which covers the current usage.
But I am not sure which version to put in `@since()`.
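
To illustrate the suggestion: a single definition whose `offset` parameter defaults to 0 serves both the existing zero-argument call and the new call with an offset. The sketch below is a plain-Python stand-in and not the PySpark code; `fake_monotonic_ids` and its `num_rows` parameter are hypothetical and only mimic the idea of shifting generated IDs by an offset.

```python
from typing import List


def fake_monotonic_ids(num_rows: int, offset: int = 0) -> List[int]:
    """Return num_rows increasing IDs, shifted by offset (hypothetical stand-in).

    With the default offset of 0 the result matches what a version without
    the parameter would give, so existing callers need no changes.
    """
    return [offset + i for i in range(num_rows)]


# Existing usage: no offset passed, IDs start at 0.
print(fake_monotonic_ids(3))
# New usage: IDs start at the requested offset.
print(fake_monotonic_ids(3, offset=100))
```

Whether the real function can take this shape depends on how the JVM-side `monotonically_increasing_id` is exposed, which this sketch does not model.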





[GitHub] spark issue #14568: [SPARK-10868] monotonicallyIncreasingId() supports offse...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14568
  
**[Test build #63762 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63762/consoleFull)**
 for PR 14568 at commit 
[`8a987c5`](https://github.com/apache/spark/commit/8a987c5b75437512e98c739787d7ff44ece73bd1).





[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...

2016-08-14 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14447#discussion_r74710214
  
--- Diff: R/pkg/R/mllib.R ---
@@ -414,6 +421,94 @@ setMethod("predict", signature(object = "KMeansModel"),
             return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
           })
 
+#' Multilayer Perceptron Classification Model
+#'
+#' \code{spark.mlp} fits a multi-layer perceptron neural network model against a SparkDataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#' Only categorical data is supported.
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html
+#' #multilayer-perceptron-classifier}{Multilayerperceptron classifier}.
+#'
+#' @param data A \code{SparkDataFrame} of observations and labels for model fitting
+#' @param blockSize BlockSize parameter
+#' @param layers Integer vector containing the number of nodes for each layer
+#' @param solver Solver parameter, supported options: "gd" (minibatch gradient descent) or "l-bfgs"
+#' @param maxIter Maximum iteration number
+#' @param tol Convergence tolerance of iterations
+#' @param stepSize StepSize parameter
+#' @param seed Seed parameter for weights initialization
+#' @return \code{spark.mlp} returns a fitted Multilayer Perceptron Classification Model
+#' @rdname spark.mlp
+#' @aliases spark.mlp,SparkDataFrame-method
+#' @name spark.mlp
+#' @seealso \link{read.ml}
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm")
+#'
+#' # fit a Multilayer Perceptron Classification Model
+#' model <- spark.mlp(df, blockSize = 128, layers = c(4, 5, 4, 3), solver = "l-bfgs",
+#'                    maxIter = 100, tol = 0.5, stepSize = 1, seed = 1)
+#'
+#' # get the summary of the model
+#' summary(model)
+#'
+#' # make predictions
+#' predictions <- predict(model, df)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.mlp since 2.1.0
+setMethod("spark.mlp", signature(data = "SparkDataFrame"),
+          function(data, blockSize = 128, layers = c(3, 5, 2), solver = "l-bfgs", maxIter = 100,
+                   tol = 0.5, stepSize = 1, seed = 1, ...) {
--- End diff --

Oh, this `...` just follows what the other wrappers do. Or should I remove
`...` from this one as well as from `spark.naiveBayes` and `spark.survreg`?

https://github.com/apache/spark/blob/master/R/pkg/R/mllib.R#L455

https://github.com/apache/spark/blob/master/R/pkg/R/mllib.R#L601







[GitHub] spark pull request #14568: [SPARK-10868] monotonicallyIncreasingId() support...

2016-08-14 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14568#discussion_r74710145
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -426,6 +426,29 @@ def monotonically_increasing_id():
     return Column(sc._jvm.functions.monotonically_increasing_id())
 
 
+@since(2.1)
+def monotonically_increasing_id(offset):
--- End diff --

We can introduce a new method which accepts an offset.
How about `monotonically_increasing_id_w_offset`?
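
The reason a separate name is needed can be seen in plain Python, independent of Spark: Python does not overload functions by signature, so defining a function twice under one name keeps only the last definition. A minimal sketch, using hypothetical stand-in functions `ids` and `ids_w_offset` that return strings instead of Columns:

```python
def ids():
    """Zero-argument version (stand-in for the existing function)."""
    return "ids()"


def ids(offset):  # noqa: F811 - deliberate redefinition to show the problem
    """Same name, new signature: this rebinds `ids` and discards the version above."""
    return f"ids(offset={offset})"


# The zero-argument call no longer works, because only the last `def` survives.
try:
    ids()
    old_call_ok = True
except TypeError:
    old_call_ok = False
print(old_call_ok)


def ids_w_offset(offset):
    """Distinct name, as proposed in the comment: nothing is shadowed."""
    return f"ids_w_offset(offset={offset})"


print(ids_w_offset(5))
```

With a distinct name, existing callers of the zero-argument function are unaffected, at the cost of a second public function to document and maintain.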





[GitHub] spark issue #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptron Class...

2016-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14447
  
**[Test build #63761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63761/consoleFull)** for PR 14447 at commit [`de9d0a7`](https://github.com/apache/spark/commit/de9d0a7908cf451c4368bcc085ef6e2bb306f7a8).





[GitHub] spark pull request #14636: [SPARK-17053][SQL] Support `hive.exec.drop.ignore...

2016-08-14 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/14636





[GitHub] spark issue #14636: [SPARK-17053][SQL] Support `hive.exec.drop.ignorenonexis...

2016-08-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14636
  
I'm closing this issue since Spark does not want to support this option. The 
issue was resolved as `WON'T FIX`. See the Jira issue.





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-08-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14638
  
Actually, I described it that way for simplicity. CSV is one typical use 
case for this option, but this is not a CSV-specific issue. I think this is 
a general table property, as the following example shows.
```
CREATE TABLE t1 (id INT, b VARCHAR(10))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION '/data'
TBLPROPERTIES('skip.header.line.count'='1')
```
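
To make the intended behavior concrete, here is a minimal sketch of what the property means for a delimited text table. It is plain Python, not Spark or Hive code; the function names are hypothetical, and it assumes the header lines are dropped from each file under the table location, which should be checked against the Hive documentation.

```python
from typing import Iterable, Iterator, List


def skip_header_lines(lines: Iterable[str], header_line_count: int) -> Iterator[str]:
    """Yield the lines of one file after dropping its first header_line_count lines."""
    for index, line in enumerate(lines):
        if index >= header_line_count:
            yield line


def read_table_rows(files: List[List[str]], header_line_count: int,
                    delimiter: str = ",") -> List[List[str]]:
    """Split every non-header line of every file into fields.

    Assumption: the header is skipped per file, not once per table.
    """
    rows = []
    for file_lines in files:
        for line in skip_header_lines(file_lines, header_line_count):
            rows.append(line.split(delimiter))
    return rows
```

For two files that each start with an `id,b` header line, calling `read_table_rows(files, 1)` returns only the data rows, which is independent of whether the files happen to be CSV.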





[GitHub] spark pull request #14447: [SPARK-16445][MLlib][SparkR] Multilayer Perceptro...

2016-08-14 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/14447#discussion_r74709917
  
--- Diff: R/pkg/R/mllib.R ---
@@ -414,6 +421,94 @@ setMethod("predict", signature(object = "KMeansModel"),
 return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
   })
 
+#' Multilayer Perceptron Classification Model
+#'
+#' \code{spark.mlp} fits a multi-layer perceptron neural network model against a SparkDataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#' Only categorical data is supported.
+#' For more details, see
+#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html
+#' #multilayer-perceptron-classifier}{Multilayerperceptron classifier}.
+#'
+#' @param data A \code{SparkDataFrame} of observations and labels for model fitting
+#' @param blockSize BlockSize parameter
+#' @param layers Integer vector containing the number of nodes for each layer
+#' @param solver Solver parameter, supported options: "gd" (minibatch gradient descent) or "l-bfgs"
+#' @param maxIter Maximum iteration number
+#' @param tol Convergence tolerance of iterations
+#' @param stepSize StepSize parameter
+#' @param seed Seed parameter for weights initialization
+#' @return \code{spark.mlp} returns a fitted Multilayer Perceptron Classification Model
+#' @rdname spark.mlp
+#' @aliases spark.mlp,SparkDataFrame-method
+#' @name spark.mlp
+#' @seealso \link{read.ml}
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm")
+#'
+#' # fit a Multilayer Perceptron Classification Model
+#' model <- spark.mlp(df, blockSize = 128, layers = c(4, 5, 4, 3), solver = "l-bfgs",
+#'maxIter = 100, tol = 0.5, stepSize = 1, seed = 1)
+#'
+#' # get the summary of the model
+#' summary(model)
+#'
+#' # make predictions
+#' predictions <- predict(model, df)
+#'
+#' # save and load the model
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.mlp since 2.1.0
+setMethod("spark.mlp", signature(data = "SparkDataFrame"),
+  function(data, blockSize = 128, layers = c(3, 5, 2), solver = "l-bfgs", maxIter = 100,
+   tol = 0.5, stepSize = 1, seed = 1, ...) {
+jobj <- callJStatic("org.apache.spark.ml.r.MultilayerPerceptronClassifierWrapper",
+"fit", data@sdf, as.integer(blockSize), as.array(layers),
+as.character(solver), as.integer(maxIter), as.numeric(tol),
+as.numeric(stepSize), as.integer(seed))
+return(new("MultilayerPerceptronClassificationModel", jobj = jobj))
+  })
+
+# Makes predictions from a model produced by spark.mlp().
+
+#' @param newData A SparkDataFrame for testing
+#' @return \code{predict} returns a SparkDataFrame containing predicted labeled in a column named
+#' "prediction"
+#' @rdname spark.mlp
+#' @aliases spark.mlp,SparkDataFrame-method
--- End diff --

I see, fixing it. Sorry for missing it.





[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-08-14 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14638
  
Is this a CSV-specific issue? If so, it seems wrong to implement this logic 
for all Hive SerDes.





