[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135308659 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135308292 [Test build #41671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41671/console) for PR 7520 at commit [`055cd09`](https://github.com/apache/spark/commit/055cd09a09fff47cf43578a19ac78b77610231ce).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135308665 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41671/ Test PASSed.
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/7520#discussion_r38065116

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala ---
    @@ -253,7 +260,7 @@ private[orc] case class OrcTableScan(
             maybeStructOI.map { soi =>
               val (fieldRefs, fieldOrdinals) = nonPartitionKeyAttrs.map {
                 case (attr, ordinal) =>
    -              soi.getStructFieldRef(attr.name.toLowerCase) -> ordinal
    --- End diff --

If we don't do the normalization, is this the only place we need to change? Both `StructObjectInspector` and `OrcStructObjectInspector` serve the same purpose.
[GitHub] spark pull request: [SPARK-10251][CORE] some common types are not ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8465
[GitHub] spark pull request: [SPARK-10251][CORE] some common types are not ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8465#issuecomment-135304194 Thanks - I've merged this.
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135304152 The normalization is not done by `StructObjectInspector` or `OrcStructObjectInspector`, but in Hive's `SemanticAnalyzer`. I've checked with Hive: even when the ORC column names are in capitals, Hive works well. The only thing I am not sure about is column pruning and predicate pushdown; Hive's "explain extended select xx" doesn't seem to give that information. Maybe @zhzhan can comment on this.
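To make the concern about capitalized ORC column names concrete, here is a minimal Python sketch of two lookup policies. It is not Spark or Hive code and does not claim to reproduce how either object inspector resolves fields; it only assumes a lookup that compares names exactly. The function names `lookup_lowercased` and `lookup_case_insensitive` and the sample field names are hypothetical.

```python
from typing import Optional, Sequence


def lookup_lowercased(fields: Sequence[str], name: str) -> Optional[int]:
    """Lowercase only the requested name, then compare exactly (assumed policy)."""
    target = name.lower()
    for ordinal, field in enumerate(fields):
        if field == target:
            return ordinal
    return None


def lookup_case_insensitive(fields: Sequence[str], name: str) -> Optional[int]:
    """Lowercase both sides, so the case stored in the file does not matter."""
    target = name.lower()
    for ordinal, field in enumerate(fields):
        if field.lower() == target:
            return ordinal
    return None


# Hypothetical schema of a file whose column names were written in capitals.
orc_fields = ["ID", "Name"]

missed = lookup_lowercased(orc_fields, "Name")       # exact compare misses "Name"
found = lookup_case_insensitive(orc_fields, "Name")  # matches regardless of case
```

Under these assumptions, lowercasing only the requested name fails to find a field that was stored with capitals, while comparing both sides case-insensitively finds it.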
[GitHub] spark pull request: [SPARK-10287] [SQL] Fixes JSONRelation refresh...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8469#issuecomment-135302032 I will test it with my partitioned JSON table.
[GitHub] spark pull request: [SPARK-10310][SQL]Using \t as the field delime...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8476#discussion_r38063970

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformation.scala ---
    @@ -222,7 +221,7 @@ private class ScriptTransformationWriterThread(
       override def run(): Unit = Utils.logUncaughtExceptions {
         TaskContext.setTaskContext(taskContext)
    -
    +    val newLineCode = 10
    --- End diff --

Seems never used.
[GitHub] spark pull request: [SPARK-9003] [MLlib] Add mapActive{Pairs,Value...
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/7357#issuecomment-135300832 @yanboliang I think there are some [discussions about whether we should enrich the Vectors API](https://www.mail-archive.com/user@spark.apache.org/msg35434.html) CC @srowen
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8118#issuecomment-135300665 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41674/ Test FAILed.
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8118#issuecomment-135300664 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9003] [MLlib] Add mapActive{Pairs,Value...
Github user Mageswaran1989 commented on a diff in the pull request: https://github.com/apache/spark/pull/7357#discussion_r38063538

    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala ---
    @@ -692,6 +744,29 @@ class SparseVector @Since("1.0.0") (
       private[spark] override def toBreeze: BV[Double] = new BSV[Double](indices, values, size)
    +  private[spark] override def mapActiveValues(f: Double => Double): Vector =
    +    new SparseVector(size, indices.clone(), values.map(f))
    +
    +  private[spark] def mapActivePairs(f: (Int, Double) => Double): Vector = {
    --- End diff --

I am learning Scala. Could you please explain how this maps only the active pairs? What is happening on line 756?
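As a rough illustration of what "active" means here, the following Python sketch models a sparse vector that stores only the entries listed in `indices`. The body of `mapActivePairs` is not shown in the quoted diff, so `map_active_pairs` below is an assumption about its intent, not the PR's implementation; the class `SparseVec` and its method names are hypothetical.

```python
from typing import Callable, List


class SparseVec:
    """Toy sparse vector: only the entries listed in `indices` are stored."""

    def __init__(self, size: int, indices: List[int], values: List[float]):
        assert len(indices) == len(values)
        self.size = size
        self.indices = indices
        self.values = values

    def map_active_values(self, f: Callable[[float], float]) -> "SparseVec":
        # Apply f to the stored values only; implicit zeros are never visited.
        return SparseVec(self.size, list(self.indices), [f(v) for v in self.values])

    def map_active_pairs(self, f: Callable[[int, float], float]) -> "SparseVec":
        # Same idea, but f also receives the index of each stored value.
        new_values = [f(i, v) for i, v in zip(self.indices, self.values)]
        return SparseVec(self.size, list(self.indices), new_values)

    def to_dense(self) -> List[float]:
        # Expand to a full list, filling unstored positions with 0.0.
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


v = SparseVec(5, [1, 3], [2.0, 4.0])
doubled = v.map_active_values(lambda x: x * 2)
shifted = v.map_active_pairs(lambda i, x: x + i)
```

Because the function is applied only to the stored `values` array, positions that are not in `indices` stay implicit zeros even when `f(0.0)` would be non-zero; that is the sense in which only the active entries are mapped.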
[GitHub] spark pull request: [SPARK-10310][SQL]Using \t as the field delime...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8476#issuecomment-135297803 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10310][SQL]Using \t as the field delime...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8476#issuecomment-135297810 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41667/ Test FAILed.
[GitHub] spark pull request: [SPARK-10310][SQL]Using \t as the field delime...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8476#issuecomment-135297607 [Test build #41667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41667/console) for PR 8476 at commit [`cafe301`](https://github.com/apache/spark/commit/cafe3013fc8ce8f94cac82eb6378bd8a5c609409).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARKR] [SPARK-10219] Fix varargsToEnv and ad...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8475
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135296970 [Test build #41673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41673/consoleFull) for PR 7883 at commit [`54365fc`](https://github.com/apache/spark/commit/54365fca94fc9857f035a0773ffd6ace650105ec).
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8118#issuecomment-135296539 Merged build started.
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8118#issuecomment-135296481 Merged build triggered.
[GitHub] spark pull request: [SPARKR] [SPARK-10219] Fix varargsToEnv and ad...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8475#issuecomment-135295820 Thanks for taking a look. Merging this
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135295552 Merged build started.
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135295542 Merged build triggered.
[GitHub] spark pull request: [SPARK-9964] [PySpark] [SQL] PySpark DataFrame...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8444
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135295340 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-9148][SPARK-10252][SQL] Update SQL Prog...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8441#issuecomment-135295332 LGTM
[GitHub] spark pull request: [SPARK-9964] [PySpark] [SQL] PySpark DataFrame...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8444#issuecomment-135295239 Thanks - I've merged this in master.
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8118#issuecomment-135294447 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41672/ Test FAILed.
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8118#issuecomment-135294443 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8118#issuecomment-135294397 [Test build #41672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41672/console) for PR 8118 at commit [`acfc9fe`](https://github.com/apache/spark/commit/acfc9fe2bd2dc9903bddfe932b12123861e0aef6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):`
[GitHub] spark pull request: [SPARK-9879][SQL][WIP] Fix OOM in Limit clause...
Github user Mageswaran1989 commented on a diff in the pull request: https://github.com/apache/spark/pull/8128#discussion_r38062415

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
    @@ -192,6 +192,12 @@ private[spark] object SQLConf {
           "column based on statistics of the data.",
         isPublic = false)
    +  val LIMIT_ROWS = longConf("spark.sql.limit.rows",
    +    defaultValue = Some(10L),
    +    doc = "For the LIMIT clause, put all of the output rows in a single partition " +
    +      "iif the required row number less than the threshold, otherwise fetch the rows in a " +
    --- End diff --

I think `iif` is a typo.
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135293645 [Test build #41664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41664/console) for PR 7883 at commit [`54365fc`](https://github.com/apache/spark/commit/54365fca94fc9857f035a0773ffd6ace650105ec).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135293673 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41664/ Test FAILed.
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135293672 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8118#issuecomment-135291919 [Test build #41672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41672/consoleFull) for PR 8118 at commit [`acfc9fe`](https://github.com/apache/spark/commit/acfc9fe2bd2dc9903bddfe932b12123861e0aef6).
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8118#issuecomment-135289365 Merged build triggered.
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8118#issuecomment-135289408 Merged build started.
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/8118#discussion_r38061775

    --- Diff: python/pyspark/ml/feature.py ---
    @@ -818,6 +818,76 @@ class StringIndexerModel(JavaModel):
         Model fitted by StringIndexer.
         """
    +class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):
    +    """
    +    .. note:: Experimental
    +
    +    A feature transformer that filters out stop words from input.
    +    Note: null values from input array are preserved unless adding null to stopWords explicitly.
    +    """
    +    # a placeholder to make the stopwords show up in generated doc
    +    stopWords = Param(Params._dummy(), "stopWords", "The words to be filtered out")
    +    caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
    +                          "comparison over the stop words")
    +
    +    @keyword_only
    +    def __init__(self, inputCol=None, outputCol=None, stopWords=None,
    +                 caseSensitive=False):
    +        """
    +        __init__(self, inputCol=None, outputCol=None, stopWords=None,
    +                 caseSensitive=false)
    +        """
    +        super(StopWordsRemover, self).__init__()
    +        self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.StopWordsRemover",
    +                                            self.uid)
    +        self.stopWords = Param(self, "stopWords", "The words to be filtered out")
    +        self.caseSensitive = Param(self._dummy(), "caseSensitive", "whether to do a case " +
    +                                   "sensitive comparison over the stop words")
    +        stopWordsObj = _jvm().org.apache.spark.ml.feature.StopWords
    +        defaultStopWords = stopWordsObj.ENGLISH_STOP_WORDS()
    +        print "Constructing java param pair for value "+str(defaultStopWords)
    --- End diff --

oh no, I was checking the type when debugging something
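The behaviour described in the docstring of the quoted diff (filter out stop words, optionally case-sensitively, and keep null values unless null is itself listed as a stop word) can be sketched in plain Python as follows. This is a standalone illustration under those stated assumptions, not the PySpark wrapper or the Scala transformer; the function name `remove_stop_words` and its parameters are hypothetical.

```python
from typing import Iterable, List, Optional


def remove_stop_words(
    tokens: Iterable[Optional[str]],
    stop_words: Iterable[Optional[str]],
    case_sensitive: bool = False,
) -> List[Optional[str]]:
    """Drop tokens that appear in stop_words; keep None unless it is listed."""
    stop_list = list(stop_words)
    drop_none = None in stop_list
    words = {w for w in stop_list if w is not None}
    if not case_sensitive:
        # Compare in lowercase when case does not matter.
        words = {w.lower() for w in words}

    kept: List[Optional[str]] = []
    for tok in tokens:
        if tok is None:
            # None is preserved unless it was explicitly added to stop_words.
            if not drop_none:
                kept.append(tok)
            continue
        key = tok if case_sensitive else tok.lower()
        if key not in words:
            kept.append(tok)
    return kept


tokens = ["The", "quick", None, "fox"]
default_result = remove_stop_words(tokens, ["the"])
strict_result = remove_stop_words(tokens, ["the"], case_sensitive=True)
no_nulls = remove_stop_words(tokens, ["the", None])
```

With the default case-insensitive comparison "The" is removed; with `case_sensitive=True` it survives because only the lowercase form is listed; and the `None` entry disappears only when `None` is part of the stop-word list.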
[GitHub] spark pull request: [SPARK-10311]Reload appId and attemptId when a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8477#issuecomment-135289064 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10311]Reload appId and attemptId when a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8477#issuecomment-135289065 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41670/ Test PASSed.
[GitHub] spark pull request: [SPARK-10311]Reload appId and attemptId when a...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8477#issuecomment-135289012 [Test build #41670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41670/console) for PR 8477 at commit [`3211a68`](https://github.com/apache/spark/commit/3211a68039b9886e31e6aabf00d6de335f81b4f6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135286827 [Test build #41671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41671/consoleFull) for PR 7520 at commit [`055cd09`](https://github.com/apache/spark/commit/055cd09a09fff47cf43578a19ac78b77610231ce).
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135286166 Merged build triggered.
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135286177 Merged build started.
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135285964
* This patch doesn't claim a performance improvement. I know that `OrcStructInspector` actually does a very similar thing to `StructObjectInspector`. This patch is only meant to deal with the lowercase problem.
* Although Hive is a case-insensitive system and we just use lowercase in queries, I wonder whether we need to normalise the column names to lowercase when persisting (such as to ORC). Judging from `OrcStructInspector`, I don't see it performing the normalisation automatically. And you can see that we still keep the case-insensitive behavior when querying (I will add it to the unit test). In other words, this patch only prevents the schema from being modified automatically when serialising to ORC files.
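The distinction drawn in the comment above, keeping the stored column names in their original case while still resolving them case-insensitively at query time, can be sketched in a few lines of plain Python. This is an illustration only and not Spark, Hive, or ORC code; `resolve_column` and the example schema are hypothetical.

```python
def resolve_column(schema_names, requested):
    """Return the stored column name that matches `requested`, ignoring case.

    Hypothetical helper: the stored names are never rewritten, only compared
    in lower case at lookup time.
    """
    wanted = requested.lower()
    matches = [name for name in schema_names if name.lower() == wanted]
    if not matches:
        raise KeyError(requested)
    if len(matches) > 1:
        raise ValueError("ambiguous column name: %s" % requested)
    return matches[0]


stored_schema = ["userId", "Name"]  # persisted as written, not lower-cased
resolved = resolve_column(stored_schema, "userid")
```

The lookup succeeds for `"userid"`, yet the name that comes back is the stored `"userId"`, so nothing in the persisted schema had to be lower-cased.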
[GitHub] spark pull request: [SPARK-10104][SQL] Consolidate different forms...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8453#issuecomment-135285913 @cloud-fan Thank you for working on it. Since we are pretty close to the 1.5 release, let's wait until the release and get this merged at the beginning of the 1.6 release cycle.
[GitHub] spark pull request: [SPARK-7544] [SQL] [PySpark] pyspark.sql.types...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8333#issuecomment-135285441 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7544] [SQL] [PySpark] pyspark.sql.types...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8333#issuecomment-135285443 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41669/ Test PASSed.
[GitHub] spark pull request: test
Github user hustnn closed the pull request at: https://github.com/apache/spark/pull/8478
[GitHub] spark pull request: [SPARK-7544] [SQL] [PySpark] pyspark.sql.types...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8333#issuecomment-135285397 [Test build #41669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41669/console) for PR 8333 at commit [`209e94b`](https://github.com/apache/spark/commit/209e94b8bfaed7063d79266b442db893b2e5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: test
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8478#issuecomment-135285391 Can one of the admins verify this patch?
[GitHub] spark pull request: test
GitHub user hustnn opened a pull request:

    https://github.com/apache/spark/pull/8478

    test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hustnn/spark-adaptive-scheduling master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8478.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8478

commit 6e294733bf871a14b515ff502271f2aa04db9647
Author: hustnn
Date:   2015-08-27T03:56:04Z

    record size of each reduce task

commit b86c42b80d23b195ad3bc9a186290ffa7f817421
Author: hustnn
Date:   2015-08-27T03:59:37Z

    Revert "record size of each reduce task"

    This reverts commit 6e294733bf871a14b515ff502271f2aa04db9647.

commit be9db41f77228762ba01492b0f986435dd87d790
Author: hustnn
Date:   2015-08-27T04:00:55Z

    record size of each reduce partition
[GitHub] spark pull request: [SPARK-10311]Reload appId and attemptId when a...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8477#issuecomment-135284113 [Test build #41670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41670/consoleFull) for PR 8477 at commit [`3211a68`](https://github.com/apache/spark/commit/3211a68039b9886e31e6aabf00d6de335f81b4f6).
[GitHub] spark pull request: [SPARK-9964] [PySpark] [SQL] PySpark DataFrame...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8444#issuecomment-135283573 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9964] [PySpark] [SQL] PySpark DataFrame...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8444#issuecomment-135283336 [Test build #41668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41668/console) for PR 8444 at commit [`b2d072d`](https://github.com/apache/spark/commit/b2d072d4b089646591deb27eca97f989c4c5be7b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9964] [PySpark] [SQL] PySpark DataFrame...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8444#issuecomment-135283576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41668/ Test PASSed.
[GitHub] spark pull request: [SPARKR] [SPARK-10219] Fix varargsToEnv and ad...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/8475#issuecomment-135283021 I forgot that; the current approach looks much better.
[GitHub] spark pull request: [SPARK-9793] [MLlib] [PySpark] PySpark DenseVe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8166#issuecomment-135282553 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9793] [MLlib] [PySpark] PySpark DenseVe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8166#issuecomment-135282555 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41666/ Test PASSed.
[GitHub] spark pull request: [SPARK-9793] [MLlib] [PySpark] PySpark DenseVe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8166#issuecomment-135282290 [Test build #41666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41666/console) for PR 8166 at commit [`d63d54e`](https://github.com/apache/spark/commit/d63d54e835d82cac279b9a6b896b99f03c073ef8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10311]Reload appId and attemptId when a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8477#issuecomment-135282056 Merged build started.
[GitHub] spark pull request: [SPARK-10311]Reload appId and attemptId when a...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8477#issuecomment-135282013 Merged build triggered.
[GitHub] spark pull request: [SPARK-7544] [SQL] [PySpark] pyspark.sql.types...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8333#issuecomment-135280626 [Test build #41669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41669/consoleFull) for PR 8333 at commit [`209e94b`](https://github.com/apache/spark/commit/209e94b8bfaed7063d79266b442db893b2e5).
[GitHub] spark pull request: [SPARK-10311]Reload appId and attemptId when a...
GitHub user XuTingjun opened a pull request:

    https://github.com/apache/spark/pull/8477

    [SPARK-10311]Reload appId and attemptId when a new ApplicationMaster registes

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/XuTingjun/spark streaming-attempt

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8477

commit 3211a68039b9886e31e6aabf00d6de335f81b4f6
Author: xutingjun
Date:   2015-08-27T03:31:03Z

    reload appId and attemptId when AM is new
[GitHub] spark pull request: [SPARK-10300] [build] [tests] Add support for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8437#issuecomment-135279946 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41658/ Test PASSed.
[GitHub] spark pull request: [SPARK-7544] [SQL] [PySpark] pyspark.sql.types...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8333#issuecomment-135279912 Merged build triggered.
[GitHub] spark pull request: [SPARK-10300] [build] [tests] Add support for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8437#issuecomment-135279945 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7544] [SQL] [PySpark] pyspark.sql.types...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8333#issuecomment-135279919 Merged build started.
[GitHub] spark pull request: [SPARK-10300] [build] [tests] Add support for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8437#issuecomment-135279884 [Test build #41658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41658/console) for PR 8437 at commit [`090e1d4`](https://github.com/apache/spark/commit/090e1d4f67c89b9f272ef80e0c1a76a836813bb1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7544] [SQL] [PySpark] pyspark.sql.types...
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8333#discussion_r38059443

    --- Diff: python/pyspark/sql/types.py ---
    @@ -1233,6 +1235,19 @@ def __call__(self, *args):
             """create new Row object"""
             return _create_row(self, args)

    +    def __getitem__(self, item):
    +        if isinstance(item, (int, slice)):
    +            return super(Row, self).__getitem__(item)
    +        try:
    +            # it will be slow when it has many fields,
    +            # but this will not be used in normal cases
    +            idx = self.__fields__.index(item)
    +            return super(Row, self).__getitem__(idx)
    +        except IndexError:
    +            raise AttributeError(item)
    --- End diff --

    I agree.
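One detail of the quoted hunk is worth spelling out: `list.index` raises `ValueError`, not `IndexError`, when the item is absent, so the `except IndexError` branch as quoted would not catch a lookup by an unknown field name. The sketch below shows the same name-or-position lookup on a small tuple subclass. `FieldTuple` is a hypothetical stand-in for `Row`, and raising `KeyError` for an unknown name is an assumption made here, not necessarily what the reviewers settled on.

```python
class FieldTuple(tuple):
    """Hypothetical stand-in for pyspark.sql.types.Row: a tuple with field names."""

    def __new__(cls, fields, values):
        obj = super(FieldTuple, cls).__new__(cls, values)
        obj.__fields__ = list(fields)
        return obj

    def __getitem__(self, item):
        # Positions and slices behave exactly as on a plain tuple.
        if isinstance(item, (int, slice)):
            return super(FieldTuple, self).__getitem__(item)
        try:
            # Linear scan: slow with many fields, fine for occasional use.
            idx = self.__fields__.index(item)
        except ValueError:
            # list.index signals a missing item with ValueError.
            raise KeyError(item)
        return super(FieldTuple, self).__getitem__(idx)


row = FieldTuple(["name", "age"], ["Alice", 30])
age_by_name = row["age"]
name_by_position = row[0]
```

Lookup by position keeps its usual tuple semantics, and lookup by name costs one scan over the field list.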
[GitHub] spark pull request: [SPARK-10300] [build] [tests] Add support for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8437#issuecomment-135278321 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41657/ Test PASSed.
[GitHub] spark pull request: [SPARK-10300] [build] [tests] Add support for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8437#issuecomment-135278319 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10300] [build] [tests] Add support for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8437#issuecomment-135277681 [Test build #41657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41657/console) for PR 8437 at commit [`131e658`](https://github.com/apache/spark/commit/131e6586cefa80827221fdd579838ed32f6c412d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9964] [PySpark] [SQL] PySpark DataFrame...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8444#issuecomment-135277607 [Test build #41668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41668/consoleFull) for PR 8444 at commit [`b2d072d`](https://github.com/apache/spark/commit/b2d072d4b089646591deb27eca97f989c4c5be7b).
[GitHub] spark pull request: [SPARK-9793] [MLlib] [PySpark] PySpark DenseVe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8166#issuecomment-135277188 [Test build #41666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41666/consoleFull) for PR 8166 at commit [`d63d54e`](https://github.com/apache/spark/commit/d63d54e835d82cac279b9a6b896b99f03c073ef8).
[GitHub] spark pull request: [SPARK-9964] [PySpark] [SQL] PySpark DataFrame...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8444#issuecomment-135276021 Merged build started.
[GitHub] spark pull request: [SPARK-9964] [PySpark] [SQL] PySpark DataFrame...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8444#issuecomment-135275967 Merged build triggered.
[GitHub] spark pull request: [SPARK-10310][SQL]Using \t as the field delime...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8476#issuecomment-135274867 [Test build #41667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41667/consoleFull) for PR 8476 at commit [`cafe301`](https://github.com/apache/spark/commit/cafe3013fc8ce8f94cac82eb6378bd8a5c609409).
[GitHub] spark pull request: [SPARK-9793] [MLlib] [PySpark] PySpark DenseVe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8166#issuecomment-135273565 Merged build started.
[GitHub] spark pull request: [SPARK-9793] [MLlib] [PySpark] PySpark DenseVe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8166#issuecomment-135273520 Merged build triggered.
[GitHub] spark pull request: [SPARK-10310][SQL]Using \t as the field delime...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8476#issuecomment-135273566 Merged build started.
[GitHub] spark pull request: [SPARK-10310][SQL]Using \t as the field delime...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8476#issuecomment-135273518 Merged build triggered.
[GitHub] spark pull request: [SPARK-10310][SQL]Using \t as the field delime...
GitHub user zhichao-li opened a pull request: https://github.com/apache/spark/pull/8476 [SPARK-10310][SQL]Using \t as the field delimeter and \n as the line delimeter Currently we use `LazySimpleSerDe` to serialize the script input by default, but it uses '\001' as the field delimiter, which is not the same as Hive. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhichao-li/spark delim Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8476.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8476 commit cafe3013fc8ce8f94cac82eb6378bd8a5c609409 Author: zhichao.li Date: 2015-08-27T03:12:29Z tab as the field delimeter and 10 as the line delimeter
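A minimal sketch of why the delimiter choice matters to a script that consumes the rows. It is plain Python, not Spark or Hive code; `serialize_rows` and the sample rows are hypothetical names used only for illustration. It shows that a consumer splitting on tabs sees separate fields only when the producer also used a tab, and sees a single fused field when the producer used '\001'.

```python
def serialize_rows(rows, field_delim="\t", line_delim="\n"):
    """Join each row's fields with field_delim and end each row with line_delim.

    Hypothetical helper for illustration; it is not part of Spark or Hive.
    """
    return "".join(
        field_delim.join(str(value) for value in row) + line_delim
        for row in rows
    )


rows = [(1, "a"), (2, "b")]

# Tab-separated fields, newline-terminated rows.
tab_delimited = serialize_rows(rows)

# The same rows with '\001' (Ctrl-A) between fields.
ctrl_a_delimited = serialize_rows(rows, field_delim="\x01")

# A consumer that splits each line on tabs.
fields_from_tab = [line.split("\t") for line in tab_delimited.splitlines()]
fields_from_ctrl_a = [line.split("\t") for line in ctrl_a_delimited.splitlines()]
```

With tab-delimited input the consumer recovers two fields per row; with '\001'-delimited input each row arrives as one unsplit field.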
[GitHub] spark pull request: [SPARK-8673] [launcher] API and infrastructure...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7052#issuecomment-135272582 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41656/ Test PASSed.
[GitHub] spark pull request: [SPARK-8673] [launcher] API and infrastructure...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7052#issuecomment-135272348 [Test build #41656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41656/console) for PR 7052 at commit [`8608f56`](https://github.com/apache/spark/commit/8608f561be5a2ea3bc8981e9756ae127f4bcdd98). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ChildProcAppHandle implements SparkAppHandle ` * `abstract class LauncherConnection implements Closeable, Runnable ` * `final class LauncherProtocol ` * ` static class Message implements Serializable ` * ` static class Hello extends Message ` * ` static class SetAppId extends Message ` * ` static class SetState extends Message ` * ` static class Stop extends Message ` * `class LauncherServer implements Closeable ` * `class NamedThreadFactory implements ThreadFactory ` * `class OutputRedirector `
[GitHub] spark pull request: [SPARK-8673] [launcher] API and infrastructure...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7052#issuecomment-135272579 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9910][ML]User guide for train validatio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8377#issuecomment-135270185 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/7884#discussion_r38058812 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -218,31 +217,59 @@ class LogisticRegression(override val uid: String) override def getThreshold: Double = super.getThreshold + /** + * Whether to over-/undersamples each of training sample according to the given + * weight in `weightCol`. If empty, all samples are supposed to have weights as 1.0. + * Default is empty, so all samples have weight one. + * @group setParam + */ + def setWeightCol(value: String): this.type = set(weightCol, value) + setDefault(weightCol -> "") + override def setThresholds(value: Array[Double]): this.type = super.setThresholds(value) override def getThresholds: Array[Double] = super.getThresholds override protected def train(dataset: DataFrame): LogisticRegressionModel = { // Extract columns from data. If dataset is persisted, do not persist oldDataset. 
-val instances = extractLabeledPoints(dataset).map { - case LabeledPoint(label: Double, features: Vector) => (label, features) -} +val instances: Either[RDD[(Double, Vector)], RDD[(Double, Double, Vector)]] = + if ($(weightCol).isEmpty) { +Left(dataset.select($(labelCol), $(featuresCol)).map { + case Row(label: Double, features: Vector) => (label, features) +}) + } else { +Right(dataset.select($(labelCol), $(weightCol), $(featuresCol)).map { + case Row(label: Double, weight: Double, features: Vector) => +(label, weight, features) +}) + } + val handlePersistence = dataset.rdd.getStorageLevel == StorageLevel.NONE -if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK) - -val (summarizer, labelSummarizer) = instances.treeAggregate( - (new MultivariateOnlineSummarizer, new MultiClassSummarizer))( -seqOp = (c, v) => (c, v) match { - case ((summarizer: MultivariateOnlineSummarizer, labelSummarizer: MultiClassSummarizer), - (label: Double, features: Vector)) => -(summarizer.add(features), labelSummarizer.add(label)) -}, -combOp = (c1, c2) => (c1, c2) match { - case ((summarizer1: MultivariateOnlineSummarizer, - classSummarizer1: MultiClassSummarizer), (summarizer2: MultivariateOnlineSummarizer, - classSummarizer2: MultiClassSummarizer)) => -(summarizer1.merge(summarizer2), classSummarizer1.merge(classSummarizer2)) - }) +if (handlePersistence) instances.fold(identity, identity).persist(StorageLevel.MEMORY_AND_DISK) + +val (summarizer, labelSummarizer) = { + val combOp = (c1: (MultivariateOnlineSummarizer, MultiClassSummarizer), +c2: (MultivariateOnlineSummarizer, MultiClassSummarizer)) => + (c1._1.merge(c2._1), c1._2.merge(c2._2)) + + instances match { --- End diff -- This is not working due to some type issue. https://cloud.githubusercontent.com/assets/1134574/9511773/f3b99dc6-4c2e-11e5-8d6e-e421907ebf41.png
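The diff above carries two instance shapes, `(label, features)` and `(label, weight, features)`, through an `Either`. As a hedged illustration of the semantics described in the doc comment (an empty `weightCol` means every sample has weight 1.0), here is a plain-Python sketch that normalizes both shapes to weighted triples before aggregating. This is an alternative arrangement chosen for illustration, not the approach in the PR; `to_weighted` and `weighted_label_counts` are hypothetical names, and no Spark API is used.

```python
from collections import defaultdict


def to_weighted(instances):
    """Yield (label, weight, features), defaulting the weight to 1.0.

    Hypothetical helper: accepts either (label, features) pairs or
    (label, weight, features) triples.
    """
    for instance in instances:
        if len(instance) == 2:
            label, features = instance
            yield (label, 1.0, features)
        else:
            yield instance  # already (label, weight, features)


def weighted_label_counts(instances):
    """Sum the sample weights per label."""
    counts = defaultdict(float)
    for label, weight, _features in to_weighted(instances):
        counts[label] += weight
    return dict(counts)


unweighted = [(1.0, [0.5]), (0.0, [1.5]), (1.0, [2.0])]
weighted = [(1.0, 2.5, [0.5]), (0.0, 0.5, [1.5])]

unweighted_counts = weighted_label_counts(unweighted)
weighted_counts = weighted_label_counts(weighted)
```

Because both shapes are reduced to one triple type up front, the aggregation needs only a single code path; whether that trade-off suits the actual Spark code is a separate question for the PR.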
[GitHub] spark pull request: [SPARK-9910][ML]User guide for train validatio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8377#issuecomment-135269856 [Test build #41665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41665/console) for PR 8377 at commit [`1dd1cd1`](https://github.com/apache/spark/commit/1dd1cd11d7abe72973d6339e028c55ac695e8fb9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaTrainValidationSplitExample `
[GitHub] spark pull request: [SPARK-9910][ML]User guide for train validatio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8377#issuecomment-135270190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41665/ Test PASSed.
[GitHub] spark pull request: [SPARK-8796][SQL] make sure SparkPlan is only ...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/7192
[GitHub] spark pull request: [SPARK-8951][SparkR] support Unicode character...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7494#issuecomment-135265160 @CHOIJAEHONG1, basically LGTM. Some minor comments.
[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/7884#discussion_r38058578 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -218,31 +217,59 @@ class LogisticRegression(override val uid: String) override def getThreshold: Double = super.getThreshold + /** + * Whether to over-/undersamples each of training sample according to the given + * weight in `weightCol`. If empty, all samples are supposed to have weights as 1.0. + * Default is empty, so all samples have weight one. + * @group setParam + */ + def setWeightCol(value: String): this.type = set(weightCol, value) + setDefault(weightCol -> "") + override def setThresholds(value: Array[Double]): this.type = super.setThresholds(value) override def getThresholds: Array[Double] = super.getThresholds override protected def train(dataset: DataFrame): LogisticRegressionModel = { // Extract columns from data. If dataset is persisted, do not persist oldDataset. 
-val instances = extractLabeledPoints(dataset).map { - case LabeledPoint(label: Double, features: Vector) => (label, features) -} +val instances: Either[RDD[(Double, Vector)], RDD[(Double, Double, Vector)]] = + if ($(weightCol).isEmpty) { +Left(dataset.select($(labelCol), $(featuresCol)).map { + case Row(label: Double, features: Vector) => (label, features) +}) + } else { +Right(dataset.select($(labelCol), $(weightCol), $(featuresCol)).map { + case Row(label: Double, weight: Double, features: Vector) => +(label, weight, features) +}) + } + val handlePersistence = dataset.rdd.getStorageLevel == StorageLevel.NONE -if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK) - -val (summarizer, labelSummarizer) = instances.treeAggregate( - (new MultivariateOnlineSummarizer, new MultiClassSummarizer))( -seqOp = (c, v) => (c, v) match { - case ((summarizer: MultivariateOnlineSummarizer, labelSummarizer: MultiClassSummarizer), - (label: Double, features: Vector)) => -(summarizer.add(features), labelSummarizer.add(label)) -}, -combOp = (c1, c2) => (c1, c2) match { - case ((summarizer1: MultivariateOnlineSummarizer, - classSummarizer1: MultiClassSummarizer), (summarizer2: MultivariateOnlineSummarizer, - classSummarizer2: MultiClassSummarizer)) => -(summarizer1.merge(summarizer2), classSummarizer1.merge(classSummarizer2)) - }) +if (handlePersistence) instances.fold(identity, identity).persist(StorageLevel.MEMORY_AND_DISK) + +val (summarizer, labelSummarizer) = { + val combOp = (c1: (MultivariateOnlineSummarizer, MultiClassSummarizer), +c2: (MultivariateOnlineSummarizer, MultiClassSummarizer)) => + (c1._1.merge(c2._1), c1._2.merge(c2._2)) + + instances match { --- End diff -- Good point! I'm going to change to this style. Thanks.
[GitHub] spark pull request: [SPARK-8951][SparkR] support Unicode character...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/7494#discussion_r38058562 --- Diff: R/pkg/inst/tests/test_sparkSQL.R --- @@ -417,6 +417,32 @@ test_that("collect() and take() on a DataFrame return the same number of rows an expect_equal(ncol(collect(df)), ncol(take(df, 10))) }) +test_that("collect() support Unicode characters", { + markUtf8 <- function(s) { +Encoding(s) <- "UTF-8" +s + } + + lines <- c("{\"name\":\"안녕하세요\"}", + "{\"name\":\"您好\", \"age\":30}", + "{\"name\":\"こんにちは\", \"age\":19}", + "{\"name\":\"Xin chào\"}") + --- End diff -- I'm still a little confused about how Unicode strings are treated in a non-UTF-8 locale. Why is there no need to call markUtf8 on these Unicode strings?
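The question above turns on the difference between the bytes of a string and the encoding they are declared to be in. The following is a rough Python analogy, not R and not SparkR code: Python has no per-string encoding mark like R's `Encoding(s) <- "UTF-8"`, so the sketch instead decodes the same UTF-8 bytes with two codecs to show what happens when the declared encoding does not match. The variable names are illustrative only.

```python
text = "Xin chào"

# The UTF-8 byte sequence for the string; "à" occupies two bytes (0xC3 0xA0).
utf8_bytes = text.encode("utf-8")

# Interpreting those bytes as Latin-1 maps every byte to its own character,
# so the two-byte "à" becomes "Ã" followed by a no-break space.
misread = utf8_bytes.decode("latin-1")

# Interpreting the same bytes as UTF-8 recovers the original string.
restored = utf8_bytes.decode("utf-8")
```

The bytes never change; only the interpretation does, which is why strings that are not declared as UTF-8 can display incorrectly in a non-UTF-8 locale.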
[GitHub] spark pull request: [SPARK-7685][ML] Apply weights to different sa...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/7884#discussion_r38058517 --- Diff: project/MimaExcludes.scala --- @@ -60,6 +60,10 @@ object MimaExcludes { "org.apache.spark.ml.regression.LeastSquaresCostFun.this"), ProblemFilters.exclude[MissingMethodProblem]( "org.apache.spark.ml.classification.LogisticCostFun.this"), +ProblemFilters.exclude[MissingMethodProblem]( --- End diff -- That will work. But it's a private class, and it's only used here, so I don't think it's necessary to preserve the original signature.
[GitHub] spark pull request: [SPARK-9910][ML]User guide for train validatio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8377#issuecomment-135263765 [Test build #41665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41665/consoleFull) for PR 8377 at commit [`1dd1cd1`](https://github.com/apache/spark/commit/1dd1cd11d7abe72973d6339e028c55ac695e8fb9).
[GitHub] spark pull request: [SPARK-9910][ML]User guide for train validatio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8377#issuecomment-135263522 Merged build started.
[GitHub] spark pull request: [SPARK-9910][ML]User guide for train validatio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8377#issuecomment-135263511 Merged build triggered.
[GitHub] spark pull request: [SPARK-6763][SQL] Add CountMinSketch to DataFr...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/6416#issuecomment-135263455 @rxin Is it good to have this as an aggregation function too? If so, I will update it too.
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135263294 [Test build #41664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41664/consoleFull) for PR 7883 at commit [`54365fc`](https://github.com/apache/spark/commit/54365fca94fc9857f035a0773ffd6ace650105ec).