[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-72318533 [Test build #26463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26463/consoleFull) for PR 4193 at commit [`5b33f66`](https://github.com/apache/spark/commit/5b33f66424155bc4d3d58315a4ef2168177b802f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class IsotonicRegressionModel (` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-72318539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26463/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72319231 [Test build #26464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26464/consoleFull) for PR 3976 at commit [`d60bc60`](https://github.com/apache/spark/commit/d60bc6069cf65637622472ef1cd2715df53c). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Minor][SQL] Little refactor DataFrame related...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4298#issuecomment-72319354 I think we can relax the constraint of `toDataFrame`. So we can directly create a DataFrame with partial columns of a RDD of tuples. No need to call an extra `select`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Minor][SQL] Little refactor DataFrame related...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4298#issuecomment-72319380 [Test build #26465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26465/consoleFull) for PR 4298 at commit [`f36efb5`](https://github.com/apache/spark/commit/f36efb56b40fd9e191587ed79d842efdb2358b39). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user dibbhatt commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23889636 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka + +import scala.reflect.{classTag, ClassTag} + +import org.apache.spark.{Logging, Partition, SparkContext, SparkException, TaskContext} +import org.apache.spark.rdd.RDD +import org.apache.spark.util.NextIterator + +import java.util.Properties +import kafka.api.{FetchRequestBuilder, FetchResponse} +import kafka.common.{ErrorMapping, TopicAndPartition} +import kafka.consumer.{ConsumerConfig, SimpleConsumer} +import kafka.message.{MessageAndMetadata, MessageAndOffset} +import kafka.serializer.Decoder +import kafka.utils.VerifiableProperties + +/** + * A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param kafkaParams Kafka a href=http://kafka.apache.org/documentation.html#configuration; + * configuration parameters/a. + * Requires metadata.broker.list or bootstrap.servers to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param batch Each KafkaRDDPartition in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + * @param messageHandler function for translating each message into the desired type + */ +private[spark] +class KafkaRDD[ + K: ClassTag, + V: ClassTag, + U : Decoder[_]: ClassTag, + T : Decoder[_]: ClassTag, + R: ClassTag] private[spark] ( +sc: SparkContext, +kafkaParams: Map[String, String], +private[spark] val batch: Array[KafkaRDDPartition], +messageHandler: MessageAndMetadata[K, V] = R + ) extends RDD[R](sc, Nil) with Logging with HasOffsetRanges { + + def offsetRanges: Array[OffsetRange] = batch.asInstanceOf[Array[OffsetRange]] + + override def getPartitions: Array[Partition] = batch.asInstanceOf[Array[Partition]] + + override def getPreferredLocations(thePart: Partition): Seq[String] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +// TODO is additional hostname resolution necessary here +Seq(part.host) + } + + private def errBeginAfterEnd(part: KafkaRDDPartition): String = +sBeginning offset ${part.fromOffset} is after the ending offset ${part.untilOffset} + + sfor topic ${part.topic} partition ${part.partition}. + + You either provided an invalid fromOffset, or the Kafka topic has been damaged + + private def errRanOutBeforeEnd(part: KafkaRDDPartition): String = +sRan out of messages before reaching ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates that messages may have been lost + + private def errOvershotEnd(itemOffset: Long, part: KafkaRDDPartition): String = +sGot ${itemOffset} ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates a message may have been skipped + + override def compute(thePart: Partition, context: TaskContext): Iterator[R] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +assert(part.fromOffset = part.untilOffset, errBeginAfterEnd(part)) +if (part.fromOffset == part.untilOffset) { + log.warn(Beginning offset ${part.fromOffset} is the same as ending offset + +sskipping ${part.topic} ${part.partition}) + Iterator.empty +} else { + new KafkaRDDIterator(part, context) +} + } + + private class KafkaRDDIterator( + part:
[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4292#discussion_r2348 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -820,7 +822,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli kClass: Class[K], vClass: Class[V]): RDD[(K, V)] = { assertNotStopped() -new NewHadoopRDD(this, fClass, kClass, vClass, conf) +// Add necessary security credentials to the JobConf. Required to access secure HDFS. +val jconf = new JobConf(conf) +SparkHadoopUtil.get.addCredentials(jconf) --- End diff -- if mode is not yarn, SparkHadoopUtil.addCredentials didnot do anything. so here it donot resolve when non-Yarn mode. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user dibbhatt commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23889938 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka + +import scala.reflect.{classTag, ClassTag} + +import org.apache.spark.{Logging, Partition, SparkContext, SparkException, TaskContext} +import org.apache.spark.rdd.RDD +import org.apache.spark.util.NextIterator + +import java.util.Properties +import kafka.api.{FetchRequestBuilder, FetchResponse} +import kafka.common.{ErrorMapping, TopicAndPartition} +import kafka.consumer.{ConsumerConfig, SimpleConsumer} +import kafka.message.{MessageAndMetadata, MessageAndOffset} +import kafka.serializer.Decoder +import kafka.utils.VerifiableProperties + +/** + * A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param kafkaParams Kafka a href=http://kafka.apache.org/documentation.html#configuration; + * configuration parameters/a. + * Requires metadata.broker.list or bootstrap.servers to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param batch Each KafkaRDDPartition in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + * @param messageHandler function for translating each message into the desired type + */ +private[spark] +class KafkaRDD[ + K: ClassTag, + V: ClassTag, + U : Decoder[_]: ClassTag, + T : Decoder[_]: ClassTag, + R: ClassTag] private[spark] ( +sc: SparkContext, +kafkaParams: Map[String, String], +private[spark] val batch: Array[KafkaRDDPartition], +messageHandler: MessageAndMetadata[K, V] = R + ) extends RDD[R](sc, Nil) with Logging with HasOffsetRanges { + + def offsetRanges: Array[OffsetRange] = batch.asInstanceOf[Array[OffsetRange]] + + override def getPartitions: Array[Partition] = batch.asInstanceOf[Array[Partition]] + + override def getPreferredLocations(thePart: Partition): Seq[String] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +// TODO is additional hostname resolution necessary here +Seq(part.host) + } + + private def errBeginAfterEnd(part: KafkaRDDPartition): String = +sBeginning offset ${part.fromOffset} is after the ending offset ${part.untilOffset} + + sfor topic ${part.topic} partition ${part.partition}. + + You either provided an invalid fromOffset, or the Kafka topic has been damaged + + private def errRanOutBeforeEnd(part: KafkaRDDPartition): String = +sRan out of messages before reaching ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates that messages may have been lost + + private def errOvershotEnd(itemOffset: Long, part: KafkaRDDPartition): String = +sGot ${itemOffset} ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates a message may have been skipped + + override def compute(thePart: Partition, context: TaskContext): Iterator[R] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +assert(part.fromOffset = part.untilOffset, errBeginAfterEnd(part)) +if (part.fromOffset == part.untilOffset) { + log.warn(Beginning offset ${part.fromOffset} is the same as ending offset + +sskipping ${part.topic} ${part.partition}) + Iterator.empty +} else { + new KafkaRDDIterator(part, context) +} + } + + private class KafkaRDDIterator( + part:
[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23888903 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -808,6 +810,7 @@ class DAGScheduler( // will be posted, which should always come after a corresponding SparkListenerStageSubmitted // event. stage.latestInfo = StageInfo.fromStage(stage, Some(partitionsToCompute.size)) +outputCommitCoordinator.stageStart(stage.id) --- End diff -- yes, i agree with @vanzin 's Opinion. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72321243 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26464/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72321234 [Test build #26464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26464/consoleFull) for PR 3976 at commit [`d60bc60`](https://github.com/apache/spark/commit/d60bc6069cf65637622472ef1cd2715df53c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Minor][SQL] Little refactor DataFrame related...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4298#issuecomment-72321964 [Test build #26465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26465/consoleFull) for PR 4298 at commit [`f36efb5`](https://github.com/apache/spark/commit/f36efb56b40fd9e191587ed79d842efdb2358b39). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val elem = sarray (class $` * `val elem = sexternalizable object (class $` * `val elem = sobject (class $` * ` implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) extends AnyVal ` * `class IsotonicRegressionModel (` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Minor][SQL] Little refactor DataFrame related...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4298#issuecomment-72321965 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26465/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23889820 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka + +import scala.reflect.{classTag, ClassTag} + +import org.apache.spark.{Logging, Partition, SparkContext, SparkException, TaskContext} +import org.apache.spark.rdd.RDD +import org.apache.spark.util.NextIterator + +import java.util.Properties +import kafka.api.{FetchRequestBuilder, FetchResponse} +import kafka.common.{ErrorMapping, TopicAndPartition} +import kafka.consumer.{ConsumerConfig, SimpleConsumer} +import kafka.message.{MessageAndMetadata, MessageAndOffset} +import kafka.serializer.Decoder +import kafka.utils.VerifiableProperties + +/** + * A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param kafkaParams Kafka a href=http://kafka.apache.org/documentation.html#configuration; + * configuration parameters/a. + * Requires metadata.broker.list or bootstrap.servers to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param batch Each KafkaRDDPartition in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + * @param messageHandler function for translating each message into the desired type + */ +private[spark] +class KafkaRDD[ + K: ClassTag, + V: ClassTag, + U : Decoder[_]: ClassTag, + T : Decoder[_]: ClassTag, + R: ClassTag] private[spark] ( +sc: SparkContext, +kafkaParams: Map[String, String], +private[spark] val batch: Array[KafkaRDDPartition], +messageHandler: MessageAndMetadata[K, V] = R + ) extends RDD[R](sc, Nil) with Logging with HasOffsetRanges { + + def offsetRanges: Array[OffsetRange] = batch.asInstanceOf[Array[OffsetRange]] + + override def getPartitions: Array[Partition] = batch.asInstanceOf[Array[Partition]] + + override def getPreferredLocations(thePart: Partition): Seq[String] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +// TODO is additional hostname resolution necessary here +Seq(part.host) + } + + private def errBeginAfterEnd(part: KafkaRDDPartition): String = +sBeginning offset ${part.fromOffset} is after the ending offset ${part.untilOffset} + + sfor topic ${part.topic} partition ${part.partition}. + + You either provided an invalid fromOffset, or the Kafka topic has been damaged + + private def errRanOutBeforeEnd(part: KafkaRDDPartition): String = +sRan out of messages before reaching ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates that messages may have been lost + + private def errOvershotEnd(itemOffset: Long, part: KafkaRDDPartition): String = +sGot ${itemOffset} ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates a message may have been skipped + + override def compute(thePart: Partition, context: TaskContext): Iterator[R] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +assert(part.fromOffset = part.untilOffset, errBeginAfterEnd(part)) +if (part.fromOffset == part.untilOffset) { + log.warn(Beginning offset ${part.fromOffset} is the same as ending offset + +sskipping ${part.topic} ${part.partition}) + Iterator.empty +} else { + new KafkaRDDIterator(part, context) +} + } + + private class KafkaRDDIterator( + part:
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72323941 @JoshRosen can you help me? now i add a unit test for python applicaiton on yarn cluster mode. but now there is a failed. i think its reason is environment of pyspark are not set. can you tell me how to set pyspark's environment in my java unit test.thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Minor][SQL] Little refactor DataFrame related...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4298#issuecomment-72317547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26461/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Minor][SQL] Little refactor DataFrame related...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4298#issuecomment-72317545 [Test build #26461 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26461/consoleFull) for PR 4298 at commit [`2c9f370`](https://github.com/apache/spark/commit/2c9f370b312d241e6c72abe812614ac1590a032f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72317452 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26462/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72317446 [Test build #26462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26462/consoleFull) for PR 3976 at commit [`5b30064`](https://github.com/apache/spark/commit/5b300648fe53d9de604e8afce7580fddfe6bbaef). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3519 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3975] Added support for BlockMatrix add...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4274#issuecomment-72309957 LGTM. Merged into master. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3975] Added support for BlockMatrix add...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4274 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4193#discussion_r23887726 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala --- @@ -152,7 +152,7 @@ class RowMatrix( * storing the right singular vectors, is computed via matrix multiplication as * U = A * (V * S^-1^), if requested by user. The actual method to use is determined * automatically based on the cost: - * - If n is small (n lt; 100) or k is large compared with n (k n / 2), we compute the Gramian + * - If n is small (n lt; 100) or k is large compared with n (k gt; n / 2), we compute the Gramian --- End diff -- Is this line too wide? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...
Github user jacek-lewandowski commented on the pull request: https://github.com/apache/spark/pull/4220#issuecomment-72309821 Agreed, I'll make the changes and add the clarifying comments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4294#discussion_r23887546 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala --- @@ -28,10 +32,10 @@ private[sql] class DefaultSource extends RelationProvider with SchemaRelationPro override def createRelation( sqlContext: SQLContext, parameters: Map[String, String]): BaseRelation = { -val fileName = parameters.getOrElse(path, sys.error(Option 'path' not specified)) +val path = parameters.getOrElse(path, sys.error(Option 'path' not specified)) --- End diff -- I think we should define common data source options like `path` in a single place. But we can leave it to another PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [MLLIB] SPARK-5491 (ex SPARK-1473): Chi-square...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1484#discussion_r23887545 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala --- @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.feature + +import org.apache.spark.annotation.Experimental +import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vectors, Vector} +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.mllib.stat.Statistics +import org.apache.spark.rdd.RDD + +/** + * :: Experimental :: + * Chi Squared selector model. + * + * @param indices list of indices to select (filter) + */ +@Experimental +class ChiSqSelectorModel(indices: Array[Int]) extends VectorTransformer { + /** + * Applies transformation on a vector. + * + * @param vector vector to be transformed. + * @return transformed vector. + */ + override def transform(vector: Vector): Vector = { +Compress(vector, indices) + } +} + +/** + * :: Experimental :: + * Creates a ChiSquared feature selector. + * @param numTopFeatures number of features that selector will select + * (ordered by statistic value descending) + */ +@Experimental +class ChiSqSelector (val numTopFeatures: Int) { + + /** + * Returns a ChiSquared feature selector. + * + * @param data data used to compute the Chi Squared statistic. + */ + def fit(data: RDD[LabeledPoint]): ChiSqSelectorModel = { +val indices = Statistics.chiSqTest(data) + .zipWithIndex.sortBy { case(res, _) = -res.statistic } + .take(numTopFeatures) + .map{ case(_, indices) = indices } +new ChiSqSelectorModel(indices) + } +} + +/** + * :: Experimental :: + * Filters features in a given vector + */ +@Experimental +object Compress { + /** + * Returns a vector with features filtered. + * Preserves the order of filtered features the same as their indices are stored. + * @param features vector + * @param filterIndices indices of features to filter + */ + def apply(features: Vector, filterIndices: Array[Int]): Vector = { +features match { + case SparseVector(size, indices, values) = +val filterMap = filterIndices.zipWithIndex.toMap --- End diff -- Please see my new inline comments about the constructor. Basically, if we expose the constructor, we should check the ordering of `indices`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5307] Add a config option for Serializa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4297 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5307] Add a config option for Serializa...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4297#issuecomment-72309051 cc @pwendell we can add this to the release notes. If the debugger malfunctions, it can be turned off with this flag. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-72309076 [Test build #26458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26458/consoleFull) for PR 3519 at commit [`5a54ea4`](https://github.com/apache/spark/commit/5a54ea45334b9ef1273a7303be0ef20b97896c92). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class IsotonicRegressionModel (` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-72309078 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26458/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [MLLIB][SPARK-3278] Monotone (Isotonic) regres...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3519#issuecomment-72309894 LGTM. Merged into master. Thanks!! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4233#discussion_r23887814 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -68,6 +79,65 @@ class LogisticRegressionModel ( case None = score } } + + override def save(sc: SparkContext, path: String): Unit = { +val sqlContext = new SQLContext(sc) +import sqlContext._ + +// Create JSON metadata. +val metadata = LogisticRegressionModel.Metadata( + clazz = this.getClass.getName, version = Exportable.latestVersion) --- End diff -- Should each model have its own version? For example, we might add some statistics to LRModel and save them during model export. If the version is global, we might have trouble loading it back. With the new DataFrame API, this could be done by ~~~ sc.parallelize(Seq((clazz, version))).toDataFrame(clazz, version) ~~~ and hence we don't need the case class and it is easier to use class instead of clazz. For the JSON format, we can use json4s directly and save as `RDD[String]`. The code will be cleaner and have less dependency. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4233#discussion_r23887815 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -68,6 +79,65 @@ class LogisticRegressionModel ( case None = score } } + + override def save(sc: SparkContext, path: String): Unit = { +val sqlContext = new SQLContext(sc) +import sqlContext._ + +// Create JSON metadata. +val metadata = LogisticRegressionModel.Metadata( + clazz = this.getClass.getName, version = Exportable.latestVersion) +val metadataRDD: DataFrame = sc.parallelize(Seq(metadata)) +metadataRDD.toJSON.saveAsTextFile(path + /metadata) +// Create Parquet data. +val data = LogisticRegressionModel.Data(weights, intercept, threshold) +val dataRDD: DataFrame = sc.parallelize(Seq(data)) +dataRDD.saveAsParquetFile(path + /data) + } +} + +object LogisticRegressionModel extends Importable[LogisticRegressionModel] { + + private case class Metadata(clazz: String, version: String) + + private case class Data(weights: Vector, intercept: Double, threshold: Option[Double]) + + override def load(sc: SparkContext, path: String): LogisticRegressionModel = { +val sqlContext = new SQLContext(sc) +import sqlContext._ + +// Load JSON metadata. +val metadataRDD = sqlContext.jsonFile(path + /metadata) --- End diff -- Same here. We can load it as RDD[String] and use json4s to parse it. The question is whether we want to treat metadata as a single-record table. It might have nested fields, which doesn't look like a table to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4233#discussion_r23887813 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -68,6 +79,65 @@ class LogisticRegressionModel ( case None = score } } + + override def save(sc: SparkContext, path: String): Unit = { +val sqlContext = new SQLContext(sc) +import sqlContext._ --- End diff -- `import sqlContext._` is no longer needed due to recent API change. `implicit val sqlContext = new SQLContext(sc)` should work. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4233#discussion_r23887816 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -68,6 +79,65 @@ class LogisticRegressionModel ( case None = score } } + + override def save(sc: SparkContext, path: String): Unit = { +val sqlContext = new SQLContext(sc) +import sqlContext._ + +// Create JSON metadata. +val metadata = LogisticRegressionModel.Metadata( + clazz = this.getClass.getName, version = Exportable.latestVersion) +val metadataRDD: DataFrame = sc.parallelize(Seq(metadata)) +metadataRDD.toJSON.saveAsTextFile(path + /metadata) +// Create Parquet data. +val data = LogisticRegressionModel.Data(weights, intercept, threshold) +val dataRDD: DataFrame = sc.parallelize(Seq(data)) +dataRDD.saveAsParquetFile(path + /data) + } +} + +object LogisticRegressionModel extends Importable[LogisticRegressionModel] { + + private case class Metadata(clazz: String, version: String) + + private case class Data(weights: Vector, intercept: Double, threshold: Option[Double]) + + override def load(sc: SparkContext, path: String): LogisticRegressionModel = { +val sqlContext = new SQLContext(sc) +import sqlContext._ + +// Load JSON metadata. +val metadataRDD = sqlContext.jsonFile(path + /metadata) +val metadataArray = metadataRDD.select(clazz, version).take(1) +assert(metadataArray.size == 1, + sUnable to load LogisticRegressionModel metadata from: ${path + /metadata}) +metadataArray(0) match { --- End diff -- The loading part could be more generic. For each model and each version, we need to maintain a loader that reads saved models to the current version or claim incompatible. For example, ~~~ object LogisticRegressionModel extends Importable[LogisticRegressionModel] { def load(sc path): LogisticRegressionModel = { val metadata = ... val importer = Importers.get(version) importer.load(sc, path) } object Importers { def get(version: String): Importer[LogisticRegressionModel] = { version match { case 1.0 = new V1() case 2.0 = ... case _ = throw new Exception(...) } } class V1 extends Importer[LogisticRegressionModel] { ... } } ~~~ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23890348 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka + +import scala.reflect.{classTag, ClassTag} + +import org.apache.spark.{Logging, Partition, SparkContext, SparkException, TaskContext} +import org.apache.spark.rdd.RDD +import org.apache.spark.util.NextIterator + +import java.util.Properties +import kafka.api.{FetchRequestBuilder, FetchResponse} +import kafka.common.{ErrorMapping, TopicAndPartition} +import kafka.consumer.{ConsumerConfig, SimpleConsumer} +import kafka.message.{MessageAndMetadata, MessageAndOffset} +import kafka.serializer.Decoder +import kafka.utils.VerifiableProperties + +/** + * A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param kafkaParams Kafka a href=http://kafka.apache.org/documentation.html#configuration; + * configuration parameters/a. + * Requires metadata.broker.list or bootstrap.servers to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param batch Each KafkaRDDPartition in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + * @param messageHandler function for translating each message into the desired type + */ +private[spark] +class KafkaRDD[ + K: ClassTag, + V: ClassTag, + U : Decoder[_]: ClassTag, + T : Decoder[_]: ClassTag, + R: ClassTag] private[spark] ( +sc: SparkContext, +kafkaParams: Map[String, String], +private[spark] val batch: Array[KafkaRDDPartition], +messageHandler: MessageAndMetadata[K, V] = R + ) extends RDD[R](sc, Nil) with Logging with HasOffsetRanges { + + def offsetRanges: Array[OffsetRange] = batch.asInstanceOf[Array[OffsetRange]] + + override def getPartitions: Array[Partition] = batch.asInstanceOf[Array[Partition]] + + override def getPreferredLocations(thePart: Partition): Seq[String] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +// TODO is additional hostname resolution necessary here +Seq(part.host) + } + + private def errBeginAfterEnd(part: KafkaRDDPartition): String = +sBeginning offset ${part.fromOffset} is after the ending offset ${part.untilOffset} + + sfor topic ${part.topic} partition ${part.partition}. + + You either provided an invalid fromOffset, or the Kafka topic has been damaged + + private def errRanOutBeforeEnd(part: KafkaRDDPartition): String = +sRan out of messages before reaching ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates that messages may have been lost + + private def errOvershotEnd(itemOffset: Long, part: KafkaRDDPartition): String = +sGot ${itemOffset} ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates a message may have been skipped + + override def compute(thePart: Partition, context: TaskContext): Iterator[R] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +assert(part.fromOffset = part.untilOffset, errBeginAfterEnd(part)) +if (part.fromOffset == part.untilOffset) { + log.warn(Beginning offset ${part.fromOffset} is the same as ending offset + +sskipping ${part.topic} ${part.partition}) + Iterator.empty +} else { + new KafkaRDDIterator(part, context) +} + } + + private class KafkaRDDIterator( + part:
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user dibbhatt commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23890423 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka + +import scala.reflect.{classTag, ClassTag} + +import org.apache.spark.{Logging, Partition, SparkContext, SparkException, TaskContext} +import org.apache.spark.rdd.RDD +import org.apache.spark.util.NextIterator + +import java.util.Properties +import kafka.api.{FetchRequestBuilder, FetchResponse} +import kafka.common.{ErrorMapping, TopicAndPartition} +import kafka.consumer.{ConsumerConfig, SimpleConsumer} +import kafka.message.{MessageAndMetadata, MessageAndOffset} +import kafka.serializer.Decoder +import kafka.utils.VerifiableProperties + +/** + * A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param kafkaParams Kafka a href=http://kafka.apache.org/documentation.html#configuration; + * configuration parameters/a. + * Requires metadata.broker.list or bootstrap.servers to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param batch Each KafkaRDDPartition in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + * @param messageHandler function for translating each message into the desired type + */ +private[spark] +class KafkaRDD[ + K: ClassTag, + V: ClassTag, + U : Decoder[_]: ClassTag, + T : Decoder[_]: ClassTag, + R: ClassTag] private[spark] ( +sc: SparkContext, +kafkaParams: Map[String, String], +private[spark] val batch: Array[KafkaRDDPartition], +messageHandler: MessageAndMetadata[K, V] = R + ) extends RDD[R](sc, Nil) with Logging with HasOffsetRanges { + + def offsetRanges: Array[OffsetRange] = batch.asInstanceOf[Array[OffsetRange]] + + override def getPartitions: Array[Partition] = batch.asInstanceOf[Array[Partition]] + + override def getPreferredLocations(thePart: Partition): Seq[String] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +// TODO is additional hostname resolution necessary here +Seq(part.host) + } + + private def errBeginAfterEnd(part: KafkaRDDPartition): String = +sBeginning offset ${part.fromOffset} is after the ending offset ${part.untilOffset} + + sfor topic ${part.topic} partition ${part.partition}. + + You either provided an invalid fromOffset, or the Kafka topic has been damaged + + private def errRanOutBeforeEnd(part: KafkaRDDPartition): String = +sRan out of messages before reaching ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates that messages may have been lost + + private def errOvershotEnd(itemOffset: Long, part: KafkaRDDPartition): String = +sGot ${itemOffset} ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates a message may have been skipped + + override def compute(thePart: Partition, context: TaskContext): Iterator[R] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +assert(part.fromOffset = part.untilOffset, errBeginAfterEnd(part)) +if (part.fromOffset == part.untilOffset) { + log.warn(Beginning offset ${part.fromOffset} is the same as ending offset + +sskipping ${part.topic} ${part.partition}) + Iterator.empty +} else { + new KafkaRDDIterator(part, context) +} + } + + private class KafkaRDDIterator( + part:
[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4193 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23890594 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.kafka + +import scala.reflect.{classTag, ClassTag} + +import org.apache.spark.{Logging, Partition, SparkContext, SparkException, TaskContext} +import org.apache.spark.rdd.RDD +import org.apache.spark.util.NextIterator + +import java.util.Properties +import kafka.api.{FetchRequestBuilder, FetchResponse} +import kafka.common.{ErrorMapping, TopicAndPartition} +import kafka.consumer.{ConsumerConfig, SimpleConsumer} +import kafka.message.{MessageAndMetadata, MessageAndOffset} +import kafka.serializer.Decoder +import kafka.utils.VerifiableProperties + +/** + * A batch-oriented interface for consuming from Kafka. + * Starting and ending offsets are specified in advance, + * so that you can control exactly-once semantics. + * @param kafkaParams Kafka a href=http://kafka.apache.org/documentation.html#configuration; + * configuration parameters/a. + * Requires metadata.broker.list or bootstrap.servers to be set with Kafka broker(s), + * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + * @param batch Each KafkaRDDPartition in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + * @param messageHandler function for translating each message into the desired type + */ +private[spark] +class KafkaRDD[ + K: ClassTag, + V: ClassTag, + U : Decoder[_]: ClassTag, + T : Decoder[_]: ClassTag, + R: ClassTag] private[spark] ( +sc: SparkContext, +kafkaParams: Map[String, String], +private[spark] val batch: Array[KafkaRDDPartition], +messageHandler: MessageAndMetadata[K, V] = R + ) extends RDD[R](sc, Nil) with Logging with HasOffsetRanges { + + def offsetRanges: Array[OffsetRange] = batch.asInstanceOf[Array[OffsetRange]] + + override def getPartitions: Array[Partition] = batch.asInstanceOf[Array[Partition]] + + override def getPreferredLocations(thePart: Partition): Seq[String] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +// TODO is additional hostname resolution necessary here +Seq(part.host) + } + + private def errBeginAfterEnd(part: KafkaRDDPartition): String = +sBeginning offset ${part.fromOffset} is after the ending offset ${part.untilOffset} + + sfor topic ${part.topic} partition ${part.partition}. + + You either provided an invalid fromOffset, or the Kafka topic has been damaged + + private def errRanOutBeforeEnd(part: KafkaRDDPartition): String = +sRan out of messages before reaching ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates that messages may have been lost + + private def errOvershotEnd(itemOffset: Long, part: KafkaRDDPartition): String = +sGot ${itemOffset} ending offset ${part.untilOffset} + +sfor topic ${part.topic} partition ${part.partition} start ${part.fromOffset}. + + This should not happen, and indicates a message may have been skipped + + override def compute(thePart: Partition, context: TaskContext): Iterator[R] = { +val part = thePart.asInstanceOf[KafkaRDDPartition] +assert(part.fromOffset = part.untilOffset, errBeginAfterEnd(part)) +if (part.fromOffset == part.untilOffset) { + log.warn(Beginning offset ${part.fromOffset} is the same as ending offset + +sskipping ${part.topic} ${part.partition}) + Iterator.empty +} else { + new KafkaRDDIterator(part, context) +} + } + + private class KafkaRDDIterator( + part:
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72324906 [Test build #26466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26466/consoleFull) for PR 3976 at commit [`2adc8f5`](https://github.com/apache/spark/commit/2adc8f591ddd0f253496c18d32b1910d29e04c8d). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5341] Use maven coordinates as dependen...
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/4215#issuecomment-72326441 Interesting... The tests are successful on my local computer but fails in Jenkins... The end to end test that downloads spark-avro and spark-csv succeeds which is nice. Searching for artifacts at other repositories looks like it failed, but actually it says: Test succeeded, but ended abruptly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72327242 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26466/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-72328501 @srowen The changes look good to me. Are you going to push more changes? I'm thinking of merging this in to avoid merge conflicts with other PRs. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4983]exception handling about adding ta...
Github user GenTang commented on a diff in the pull request: https://github.com/apache/spark/pull/3986#discussion_r23890734 --- Diff: ec2/spark_ec2.py --- @@ -569,6 +569,8 @@ def launch_cluster(conn, opts, cluster_name): master_nodes = master_res.instances print Launched master in %s, regid = %s % (zone, master_res.id) +# Wait for the information of the just-launched instances to be propagated within AWS +time.sleep(5) --- End diff -- @nchammas Maybe we should insert a print here to tell the client that we are waiting for the information to be propagated, since there will be 5 seconds idle between `launch master in` and `wait for the cluster to enter`. How do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-72328806 @mengxr That's all I've got for now. It fixes javadoc 8 errors due to the code, but not due to the output of unidoc. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-72330127 Merged into master. Thanks! I left the JIRA as open. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4983]exception handling about adding ta...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3986#discussion_r23891135 --- Diff: ec2/spark_ec2.py --- @@ -569,6 +569,8 @@ def launch_cluster(conn, opts, cluster_name): master_nodes = master_res.instances print Launched master in %s, regid = %s % (zone, master_res.id) +# Wait for the information of the just-launched instances to be propagated within AWS +time.sleep(5) --- End diff -- Sure. Also, in the comments, reference SPARK-4983. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-72331831 Thanks :) Looking forward to your reviews. I will work on other stuff till then. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3637#discussion_r23891545 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/DeveloperApiExample.scala --- @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.examples.ml + +import org.apache.spark.{SparkConf, SparkContext} +import org.apache.spark.ml.classification.{Classifier, ClassifierParams, ClassificationModel} +import org.apache.spark.ml.param.{Params, IntParam, ParamMap} +import org.apache.spark.mllib.linalg.{BLAS, Vector, Vectors} +import org.apache.spark.mllib.regression.LabeledPoint +import org.apache.spark.sql.{DataFrame, Row, SQLContext} + + +/** + * A simple example demonstrating how to write your own learning algorithm using Estimator, + * Transformer, and other abstractions. + * This mimics [[org.apache.spark.ml.classification.LogisticRegression]]. + * Run with + * {{{ + * bin/run-example ml.DeveloperApiExample + * }}} + */ +object DeveloperApiExample { + + def main(args: Array[String]) { +val conf = new SparkConf().setAppName(DeveloperApiExample) +val sc = new SparkContext(conf) +val sqlContext = new SQLContext(sc) +import sqlContext._ + +// Prepare training data. +val training = sparkContext.parallelize(Seq( + LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)), + LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)), + LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)), + LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5 + +// Create a LogisticRegression instance. This instance is an Estimator. +val lr = new MyLogisticRegression() +// Print out the parameters, documentation, and any default values. +println(MyLogisticRegression parameters:\n + lr.explainParams() + \n) + +// We may set parameters using setter methods. +lr.setMaxIter(10) + +// Learn a LogisticRegression model. This uses the parameters stored in lr. +val model = lr.fit(training) + +// Prepare test data. +val test = sparkContext.parallelize(Seq( + LabeledPoint(1.0, Vectors.dense(-1.0, 1.5, 1.3)), + LabeledPoint(0.0, Vectors.dense(3.0, 2.0, -0.1)), + LabeledPoint(1.0, Vectors.dense(0.0, 2.2, -1.5 + +// Make predictions on test data. +val sumPredictions: Double = model.transform(test) + .select(features, label, prediction) + .collect() + .map { case Row(features: Vector, label: Double, prediction: Double) = +prediction + }.sum +assert(sumPredictions == 0.0, + MyLogisticRegression predicted something other than 0, even though all weights are 0!) + +sc.stop() + } +} + +/** + * Example of defining a parameter trait for a user-defined type of [[Classifier]]. + * + * NOTE: This is private since it is an example. In practice, you may not want it to be private. + */ +private trait MyLogisticRegressionParams extends ClassifierParams { + + /** + * Param for max number of iterations + * + * NOTE: The usual way to add a parameter to a model or algorithm is to include: + * - val myParamName: ParamType + * - def getMyParamName + * - def setMyParamName --- End diff -- Will do --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3637#discussion_r23891546 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/Regressor.scala --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.regression + +import org.apache.spark.annotation.{DeveloperApi, AlphaComponent} +import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor, PredictorParams} + +/** + * :: DeveloperApi :: + * Params for regression. + * Currently empty, but may add functionality later. + */ +@DeveloperApi +trait RegressorParams extends PredictorParams + +/** + * :: AlphaComponent :: + * + * Single-label regression + * + * @tparam FeaturesType Type of input features. E.g., [[org.apache.spark.mllib.linalg.Vector]] + * @tparam Learner Concrete Estimator type + * @tparam M Concrete Model type + */ +@AlphaComponent +abstract class Regressor[ +FeaturesType, +Learner : Regressor[FeaturesType, Learner, M], --- End diff -- What about E for Estimator since Learner is not really a concept? (The only weird thing is that the type parameters are E and M, which reminds me of EM.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Disabling Utils.chmod700 for Windows
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4299#issuecomment-72333480 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3637#discussion_r23891575 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -80,69 +50,157 @@ class LogisticRegression extends Estimator[LogisticRegressionModel] with Logisti def setRegParam(value: Double): this.type = set(regParam, value) def setMaxIter(value: Int): this.type = set(maxIter, value) - def setLabelCol(value: String): this.type = set(labelCol, value) def setThreshold(value: Double): this.type = set(threshold, value) - def setFeaturesCol(value: String): this.type = set(featuresCol, value) - def setScoreCol(value: String): this.type = set(scoreCol, value) - def setPredictionCol(value: String): this.type = set(predictionCol, value) override def fit(dataset: SchemaRDD, paramMap: ParamMap): LogisticRegressionModel = { +// Check schema transformSchema(dataset.schema, paramMap, logging = true) -import dataset.sqlContext._ + +// Extract columns from data. If dataset is persisted, do not persist oldDataset. +val oldDataset = extractLabeledPoints(dataset, paramMap) val map = this.paramMap ++ paramMap -val instances = dataset.select(map(labelCol).attr, map(featuresCol).attr) - .map { case Row(label: Double, features: Vector) = -LabeledPoint(label, features) - }.persist(StorageLevel.MEMORY_AND_DISK) +val handlePersistence = dataset.getStorageLevel == StorageLevel.NONE +if (handlePersistence) { + oldDataset.persist(StorageLevel.MEMORY_AND_DISK) +} + +// Train model val lr = new LogisticRegressionWithLBFGS lr.optimizer .setRegParam(map(regParam)) .setNumIterations(map(maxIter)) -val lrm = new LogisticRegressionModel(this, map, lr.run(instances).weights) -instances.unpersist() +val oldModel = lr.run(oldDataset) +val lrm = new LogisticRegressionModel(this, map, oldModel.weights, oldModel.intercept) + +if (handlePersistence) { + oldDataset.unpersist() +} + // copy model params Params.inheritValues(map, this, lrm) lrm } - private[ml] override def transformSchema(schema: StructType, paramMap: ParamMap): StructType = { -validateAndTransformSchema(schema, paramMap, fitting = true) - } + override protected def featuresDataType: DataType = new VectorUDT } + /** * :: AlphaComponent :: + * * Model produced by [[LogisticRegression]]. */ @AlphaComponent class LogisticRegressionModel private[ml] ( override val parent: LogisticRegression, override val fittingParamMap: ParamMap, -weights: Vector) - extends Model[LogisticRegressionModel] with LogisticRegressionParams { +val weights: Vector, +val intercept: Double) + extends ProbabilisticClassificationModel[Vector, LogisticRegressionModel] + with LogisticRegressionParams { + + setThreshold(0.5) def setThreshold(value: Double): this.type = set(threshold, value) - def setFeaturesCol(value: String): this.type = set(featuresCol, value) - def setScoreCol(value: String): this.type = set(scoreCol, value) - def setPredictionCol(value: String): this.type = set(predictionCol, value) - private[ml] override def transformSchema(schema: StructType, paramMap: ParamMap): StructType = { -validateAndTransformSchema(schema, paramMap, fitting = false) + private val margin: Vector = Double = (features) = { +BLAS.dot(features, weights) + intercept + } + + private val score: Vector = Double = (features) = { +val m = margin(features) +1.0 / (1.0 + math.exp(-m)) } override def transform(dataset: SchemaRDD, paramMap: ParamMap): SchemaRDD = { +// Check schema transformSchema(dataset.schema, paramMap, logging = true) + import dataset.sqlContext._ val map = this.paramMap ++ paramMap -val score: Vector = Double = (v) = { - val margin = BLAS.dot(v, weights) - 1.0 / (1.0 + math.exp(-margin)) + +// Output selected columns only. +// This is a bit complicated since it tries to avoid repeated computation. --- End diff -- I'll add a TODO for now...but hope I have time to make it part of this PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Disabling Utils.chmod700 for Windows
Github user MartinWeindel commented on the pull request: https://github.com/apache/spark/pull/4299#issuecomment-72333732 Yes, it's a regression. It worked with 1.2.0. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Disabling Utils.chmod700 for Windows
GitHub user MartinWeindel opened a pull request: https://github.com/apache/spark/pull/4299 Disabling Utils.chmod700 for Windows This patch makes Spark 1.2.1rc2 work again on Windows. Without it you get following log output on creating a Spark context: INFO org.apache.spark.SparkEnv:59 - Registering BlockManagerMaster ERROR org.apache.spark.util.Utils:75 - Failed to create local root dir in Ignoring this directory. ERROR org.apache.spark.storage.DiskBlockManager:75 - Failed to create any local dir. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MartinWeindel/spark branch-1.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4299.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4299 commit ac4749c1a2ef180b3ed2b01ff6397c60173a7b33 Author: mweindel m.wein...@usu-software.de Date: 2015-01-30T16:09:25Z fixed chmod700 for Windows --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user tomerk commented on a diff in the pull request: https://github.com/apache/spark/pull/3637#discussion_r23891560 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/Regressor.scala --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.regression + +import org.apache.spark.annotation.{DeveloperApi, AlphaComponent} +import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor, PredictorParams} + +/** + * :: DeveloperApi :: + * Params for regression. + * Currently empty, but may add functionality later. + */ +@DeveloperApi +trait RegressorParams extends PredictorParams + +/** + * :: AlphaComponent :: + * + * Single-label regression + * + * @tparam FeaturesType Type of input features. E.g., [[org.apache.spark.mllib.linalg.Vector]] + * @tparam Learner Concrete Estimator type + * @tparam M Concrete Model type + */ +@AlphaComponent +abstract class Regressor[ +FeaturesType, +Learner : Regressor[FeaturesType, Learner, M], --- End diff -- Maybe it would also make sense to have FeaturesType be a single letter for consistency? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-72333635 @shivaram Thanks for the comments! As far as adding a strongly typed protected API later on, we might be able to, but I don't see it as a high priority, especially since the DataFrame changes are making it easier to work with schema. What are your thoughts on it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Disabling Utils.chmod700 for Windows
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4299#issuecomment-72333666 (This needs a JIRA) Did this work in 1.2.0? the question will be whether it's a regression or not. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/3637#discussion_r23891605 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/Regressor.scala --- @@ -0,0 +1,72 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.regression + +import org.apache.spark.annotation.{DeveloperApi, AlphaComponent} +import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor, PredictorParams} + +/** + * :: DeveloperApi :: + * Params for regression. + * Currently empty, but may add functionality later. + */ +@DeveloperApi +trait RegressorParams extends PredictorParams + +/** + * :: AlphaComponent :: + * + * Single-label regression + * + * @tparam FeaturesType Type of input features. E.g., [[org.apache.spark.mllib.linalg.Vector]] + * @tparam Learner Concrete Estimator type + * @tparam M Concrete Model type + */ +@AlphaComponent +abstract class Regressor[ +FeaturesType, +Learner : Regressor[FeaturesType, Learner, M], --- End diff -- E is fine by me. And we could change FeaturesType, but that seems pretty descriptive, so I'm okay with leaving that in. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-72335560 I guess the more usable DataFrame does make it better. Agree that we can revisit this after we write a few more estimators and see how they look. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Disabling Utils.chmod700 for Windows
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/4299#issuecomment-72335760 I didn't try 1.2.1, but in 1.2.0, I didn't met such problems in windows. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4294#issuecomment-72335894 [Test build #26467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26467/consoleFull) for PR 4294 at commit [`d37b19c`](https://github.com/apache/spark/commit/d37b19c3d1891d077205772c97534fdff04830a1). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4294#issuecomment-72337568 [Test build #26467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26467/consoleFull) for PR 4294 at commit [`d37b19c`](https://github.com/apache/spark/commit/d37b19c3d1891d077205772c97534fdff04830a1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `protected[sql] class DDLException(message: String) extends Exception(message)` * `trait TableScan extends BaseRelation ` * `trait PrunedScan extends BaseRelation ` * `trait PrunedFilteredScan extends BaseRelation ` * `trait CatalystScan extends BaseRelation ` * `trait InsertableRelation extends BaseRelation ` * `case class CreateMetastoreDataSourceAsSelect(` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4294#issuecomment-72337572 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26467/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4294#issuecomment-72337823 [Test build #26468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26468/consoleFull) for PR 4294 at commit [`95a7c71`](https://github.com/apache/spark/commit/95a7c716b8c6864fb4e0c73ec01df854b8475e9f). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Disabling Utils.chmod700 for Windows
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4299#discussion_r23892404 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -255,12 +255,17 @@ private[spark] object Utils extends Logging { * @return true if the permissions were successfully changed, false otherwise. */ def chmod700(file: File): Boolean = { -file.setReadable(false, false) -file.setReadable(true, true) -file.setWritable(false, false) -file.setWritable(true, true) -file.setExecutable(false, false) -file.setExecutable(true, true) +if (!isWindows) { + file.setReadable(false, false) + file.setReadable(true, true) + file.setWritable(false, false) + file.setWritable(true, true) + file.setExecutable(false, false) + file.setExecutable(true, true) +} else { + // this logic does not work for Windows --- End diff -- Doesn't this comment belong after `if (!isWindows)`, and not in the `else` clause that presumably does work for Windows? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5422] Add support for sending Graphite ...
Github user hammer commented on the pull request: https://github.com/apache/spark/pull/4218#issuecomment-72352426 @pwendell @rxin we'd love to get this patch into 1.3 to improve the scalability of metrics collection. it's a pretty straightforward patch. Could you guys give it some Databricks love this week? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Minor][SQL] Little refactor DataFrame related...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4298#discussion_r23895621 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala --- @@ -139,12 +139,13 @@ class DataFrame protected[sql]( * val rdd: RDD[(Int, String)] = ... * rdd.toDataFrame // this implicit conversion creates a DataFrame with column name _1 and _2 * rdd.toDataFrame(id, name) // this creates a DataFrame with column name id and name + * rdd.toDataFrame(id) // this creates a DataFrame with only column name id * }}} */ @scala.annotation.varargs def toDataFrame(colName: String, colNames: String*): DataFrame = { val newNames = colName +: colNames -require(schema.size == newNames.size, +require(schema.size = newNames.size, --- End diff -- I'm worried that silently dropping fields here can cause trouble without the user knowing. Can you revert this line? If we see a lot of requests to make it less strict, we can always relax it in the future. But we cannot go the other way. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/4294#discussion_r23892984 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -145,6 +145,11 @@ abstract class PrunedFilteredScan extends BaseRelation { * for experimentation. */ @Experimental -abstract class CatalystScan extends BaseRelation { +trait CatalystScan extends BaseRelation { def buildScan(requiredColumns: Seq[Attribute], filters: Seq[Expression]): RDD[Row] } + +@DeveloperApi +trait InsertableRelation extends BaseRelation { + def insertInto(data: DataFrame, overwrite: Boolean): Unit --- End diff -- The method name looks a bit weird when you put them together: `relation.insertInto(dataFrame)`, seems that we are inserting a relation into a data frame... Maybe just `insert`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5422] Add support for sending Graphite ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4218 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5413] bump metrics dependency to v3.1.0
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4209#issuecomment-72355239 Actually I merged https://github.com/apache/spark/pull/4218 which contains this patch. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5422] Add support for sending Graphite ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4218#issuecomment-72355227 Merging in master. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4294#issuecomment-72340638 [Test build #26468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26468/consoleFull) for PR 4294 at commit [`95a7c71`](https://github.com/apache/spark/commit/95a7c716b8c6864fb4e0c73ec01df854b8475e9f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val elem = sarray (class $` * `val elem = sexternalizable object (class $` * `val elem = sobject (class $` * ` implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) extends AnyVal ` * `class IsotonicRegressionModel (` * `protected[sql] class DDLException(message: String) extends Exception(message)` * `trait TableScan extends BaseRelation ` * `trait PrunedScan extends BaseRelation ` * `trait PrunedFilteredScan extends BaseRelation ` * `trait CatalystScan extends BaseRelation ` * `trait InsertableRelation extends BaseRelation ` * `case class CreateMetastoreDataSourceAsSelect(` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4294#issuecomment-72346268 [Test build #26469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26469/consoleFull) for PR 4294 at commit [`1a719a5`](https://github.com/apache/spark/commit/1a719a54edbf891f80ea2283ac17a8dd5482e81e). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72349756 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26470/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72349753 [Test build #26470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26470/consoleFull) for PR 3976 at commit [`47d2fc3`](https://github.com/apache/spark/commit/47d2fc35e53a8851790607085bc67e94736358d6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4294#issuecomment-72340640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26468/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3778] newAPIHadoopRDD doesn't properly ...
Github user harishreedharan commented on a diff in the pull request: https://github.com/apache/spark/pull/4292#discussion_r23893987 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -820,7 +822,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli kClass: Class[K], vClass: Class[V]): RDD[(K, V)] = { assertNotStopped() -new NewHadoopRDD(this, fClass, kClass, vClass, conf) +// Add necessary security credentials to the JobConf. Required to access secure HDFS. +val jconf = new JobConf(conf) +SparkHadoopUtil.get.addCredentials(jconf) --- End diff -- Since security is supported only in Yarn mode, this should be fine. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4294#issuecomment-72348214 [Test build #26469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26469/consoleFull) for PR 4294 at commit [`1a719a5`](https://github.com/apache/spark/commit/1a719a54edbf891f80ea2283ac17a8dd5482e81e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `val elem = sarray (class $` * `val elem = sexternalizable object (class $` * `val elem = sobject (class $` * ` implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) extends AnyVal ` * `class IsotonicRegressionModel (` * `protected[sql] class DDLException(message: String) extends Exception(message)` * `trait TableScan extends BaseRelation ` * `trait PrunedScan extends BaseRelation ` * `trait PrunedFilteredScan extends BaseRelation ` * `trait CatalystScan extends BaseRelation ` * `trait InsertableRelation extends BaseRelation ` * `case class CreateMetastoreDataSourceAsSelect(` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP][SPARK-5501][SQL] Write support for the d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4294#issuecomment-72348215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26469/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4382] Add locations parameter to Twitte...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/3246#issuecomment-72351744 @srowen Does @tdas have time to look at this? If no, maybe others can? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5413] bump metrics dependency to v3.1.0
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4209#issuecomment-72355173 @andrewor14 are you going to merge this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72348373 [Test build #26470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26470/consoleFull) for PR 3976 at commit [`47d2fc3`](https://github.com/apache/spark/commit/47d2fc35e53a8851790607085bc67e94736358d6). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4047#issuecomment-72348800 *Update on tests* Summary: * On a small dataset (20 newsgroups), it seems to work fine (on my laptop). * On a big dataset (Wikipedia dump with close to 1 billion tokens), it's been hard to get it to run for more than 10 or 20 iterations (on a 16-node EC2 cluster). Details: Small dataset: You can see the output here: [https://github.com/jkbradley/spark/blob/lda-tmp/20news.lda.out]. The log likelihood improves with each iteration, and iteration running times stay about the same throughout training. The topics are really nicely divided among the newsgroups. (But I did run this using 20 topics.) I used 100 iterations and the stopwords mentioned above. Large dataset: Even with checkpointing, it has been hard to run for many iterations, mainly because of shuffle files and checkpoint files building up. I need to spend some more time running tests. Currently, the results on the Wikipedia dump do not look good; topics are pretty much all the same. It is unclear if this is because of poor convergence, a need for parameter tuning, a need for supporting sparsity as mentioned above (which might help to force topics to differentiate), or a need for better initialization (since EM can have lots of trouble with LDA's many local minima). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user lianhuiwang closed the pull request at: https://github.com/apache/spark/pull/3976 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
GitHub user lianhuiwang reopened a pull request: https://github.com/apache/spark/pull/3976 [SPARK-5173]support python application running on yarn cluster mode now when we run python application on yarn cluster mode through spark-submit, spark-submit does not support python application on yarn cluster mode. so i modify code of submit and yarn's AM in order to support it. through specifying .py file or primaryResource file via spark-submit, we can make pyspark run in yarn-cluster mode. example:spark-submit --master yarn-master --num-executors 1 --driver-memory 1g --executor-memory 1g xx.py --primaryResource yy.conf this config is same as pyspark on yarn-client mode. firstly,we put local path of .py or primaryResource to yarn's dist.files.that can be distributed on slave nodes.and then in spark-submit we transfer --py-files and --primaryResource to yarn.Client and use org.apache.spark.deploy.PythonRunner to user class that can run .py files on ApplicationMaster. in yarn.Client we transfer --py-files and --primaryResource to ApplicationMaster. in ApplicationMaster, user's class is org.apache.spark.deploy.PythonRunner, and user's args is primaryResource and -py-files. so that can make pyspark run on ApplicationMaster. @JoshRosen @tgravescs @sryza You can merge this pull request into a Git repository by running: $ git pull https://github.com/lianhuiwang/spark SPARK-5173 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3976.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3976 commit 9c941bc59527e594ee1d155c00cb8e55d7c40fe8 Author: lianhuiwang lianhuiwan...@gmail.com Date: 2015-01-09T12:58:24Z support python application running on yarn cluster mode commit 172eec10b9daaf9ed838e821474d28871ab63462 Author: Wang Lianhui lianhuiwan...@gmail.com Date: 2015-01-09T15:01:52Z fix a min submit's bug commit f1f55b6eb4b65499be8e182e857d89a158873234 Author: lianhuiwang lianhuiwan...@gmail.com Date: 2015-01-29T11:13:35Z when yarn-cluster, all python files can be non-local commit 905a10610532578c774e58d12b927597330fb9ff Author: lianhuiwang lianhuiwan...@gmail.com Date: 2015-01-31T03:29:09Z update with sryza and andrewor 's comments commit 097a5ec37456bf9d13a952f4108a750b9f9f84d0 Author: lianhuiwang lianhuiwan...@gmail.com Date: 2015-01-31T03:59:06Z fix line length exceeds 100 commit 5b300648fe53d9de604e8afce7580fddfe6bbaef Author: lianhuiwang lianhuiwan...@gmail.com Date: 2015-01-31T12:18:22Z add test commit d60bc6069cf65637622472ef1cd2715df53c Author: lianhuiwang lianhuiwan...@gmail.com Date: 2015-01-31T14:07:03Z fix test commit 2adc8f591ddd0f253496c18d32b1910d29e04c8d Author: lianhuiwang lianhuiwan...@gmail.com Date: 2015-01-31T16:35:01Z add spark.test.home commit 47d2fc35e53a8851790607085bc67e94736358d6 Author: lianhuiwang lianhuiwan...@gmail.com Date: 2015-02-01T02:40:25Z fix test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4233#discussion_r23888056 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -68,6 +79,65 @@ class LogisticRegressionModel ( case None = score } } + + override def save(sc: SparkContext, path: String): Unit = { +val sqlContext = new SQLContext(sc) +import sqlContext._ --- End diff -- I tried that, and it fails to compile. (I tried removed sqlContext from save(), as well as having it be an implicit val without the import. Neither worked.) Is there another import I need? ``` [error] /Users/josephkb/spark/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala:89: type mismatch; [error] found : org.apache.spark.rdd.RDD[org.apache.spark.mllib.classification.LogisticRegressionModel.Metadata] [error] required: org.apache.spark.sql.DataFrame [error] Error occurred in an application involving default arguments. [error] val metadataRDD: DataFrame = sc.parallelize(Seq(metadata)) [error]^ [error] /Users/josephkb/spark/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala:94: type mismatch; [error] found : org.apache.spark.rdd.RDD[org.apache.spark.mllib.classification.LogisticRegressionModel.Data] [error] required: org.apache.spark.sql.DataFrame [error] Error occurred in an application involving default arguments. [error] val dataRDD: DataFrame = sc.parallelize(Seq(data)) [error]^ [error] two errors found [error] (mllib/compile:compile) Compilation failed ``` It may not be an issue though, with the other change you suggested below. I'll see. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4233#issuecomment-72313229 [Test build #26459 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26459/consoleFull) for PR 4233 at commit [`638fa81`](https://github.com/apache/spark/commit/638fa81e40ef64558a6de70b4ad669fb63c46ada). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait Exportable ` * `trait Importable[Model : Exportable] ` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4233#issuecomment-72313232 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26459/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-4587] [mllib] ML model import/exp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4233#issuecomment-72311863 [Test build #26459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26459/consoleFull) for PR 4233 at commit [`638fa81`](https://github.com/apache/spark/commit/638fa81e40ef64558a6de70b4ad669fb63c46ada). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-72313729 [Test build #26460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26460/consoleFull) for PR 4193 at commit [`6ffb27f`](https://github.com/apache/spark/commit/6ffb27fde0224f08c7679d9e645ae1cbca1b1e98). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Minor][SQL] Refactor DataFrame related codes
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/4298 [Minor][SQL] Refactor DataFrame related codes Simplify some codes related to DataFrame. * Calling `toAttributes` instead of a `map`. * Original `createDataFrame` creates the `StructType` and its attributes in a redundant way. Refactored it to create `StructType` and call `toAttributes` on it directly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 refactor_df Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4298.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4298 commit 2c9f370b312d241e6c72abe812614ac1590a032f Author: Liang-Chi Hsieh vii...@gmail.com Date: 2015-01-31T11:58:36Z Just refactor DataFrame codes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-72315407 [Test build #26460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26460/consoleFull) for PR 4193 at commit [`6ffb27f`](https://github.com/apache/spark/commit/6ffb27fde0224f08c7679d9e645ae1cbca1b1e98). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-72315410 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26460/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [Minor][SQL] Little refactor DataFrame related...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4298#issuecomment-72315508 [Test build #26461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26461/consoleFull) for PR 4298 at commit [`2c9f370`](https://github.com/apache/spark/commit/2c9f370b312d241e6c72abe812614ac1590a032f). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5212][SQL] Add support of schema-less, ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/4014#issuecomment-72315535 @rxin I have added the explanation for this feature. Would you have time to review this pr and see if it is ok to merge? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72315747 [Test build #26462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26462/consoleFull) for PR 3976 at commit [`5b30064`](https://github.com/apache/spark/commit/5b300648fe53d9de604e8afce7580fddfe6bbaef). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5173]support python application running...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23888565 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -430,6 +430,10 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments, private def startUserClass(): Thread = { logInfo(Starting the user JAR in a separate Thread) --- End diff -- https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L451 @andrewor14 here is not outdated. this is same to master's code. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` does...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4193#issuecomment-72316297 [Test build #26463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26463/consoleFull) for PR 4193 at commit [`5b33f66`](https://github.com/apache/spark/commit/5b33f66424155bc4d3d58315a4ef2168177b802f). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org