[GitHub] spark issue #14342: [SPARK-16685] Remove audit-release scripts.
Github user pwendell commented on the issue: https://github.com/apache/spark/pull/14342 LGTM - I added these and I think they are dead code right now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10359] Enumerate dependencies in a file...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/10461#discussion_r48445168 --- Diff: dev/deps/spark-deps-hadoop-2.4 --- @@ -0,0 +1,185 @@ +JavaEWAH-0.3.2.jar --- End diff -- Yes, these are automatically generated, so it's not a huge maintenance cost. I think having them in the repo is good so people can have a definitive reference for what dependencies exist in which package of Spark.
[GitHub] spark pull request: [SPARK-11808] Remove Bagel.
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/10395#issuecomment-166069156 LGTM (I downloaded your PR and did some grepping to make sure there are no references). One other thing that occurred to me is someone could easily create a package with this if they want to continue using it in Spark 2.0+, or just copy-paste the source code.
[GitHub] spark pull request: Update branch-1.6 for 1.6.0 release
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/10317#issuecomment-164923664 LGTM
[GitHub] spark pull request: [SPARK-12101][Core]Fix thread pools that canno...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/10108#issuecomment-161419476 BTW @srowen, some protocol for announcing these is probably a good idea to avoid races. I think we haven't suffered from races in the past, but mostly out of luck.
[GitHub] spark pull request: Fix thread pools that cannot cache tasks in Wo...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/10108#issuecomment-161418651 Hey @marmbrus is actually managing the RC - it just has my name on it because some automated tooling uses my account. Ping @marmbrus.
[GitHub] spark pull request: [SPARK-3580][CORE] Add Consistent Method To Ge...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9767#issuecomment-161203603 Yeah I think it's fine to pull in - but do it quickly because an RC will go out very soon!
[GitHub] spark pull request: [SPARK-11903] Remove --skip-java-test
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9924#issuecomment-159118994 Jenkins, retest this please. This LGTM - I think it's good to simply remove it.
[GitHub] spark pull request: [SPARK-2365] Add IndexedRDD, an efficient upda...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1297#issuecomment-158851260 @josephlijia this feature has moved into a Spark package. If you want to file an issue report, it's best to do it here: https://github.com/amplab/spark-indexedrdd
[GitHub] spark pull request: [SPARK-11732] Removes some MiMa false positive...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9697#issuecomment-157469133 Yep, this LGTM
[GitHub] spark pull request: [SPARK-11081] Shade Jersey and javax.rs.ws
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9615#issuecomment-155962272 Regarding testing, the best way is to inspect the contents of the jar to make sure that the shaded version is inlined. If Spark code uses the shaded dependency directly, you can also use `javap` to inspect the byte code and make sure that the references are to the shaded versions of jersey rather than the real one. @mccheah given the comments by @vanzin, can you say more about the specific incompatibility you are facing? It would be good to make sure that if we shade something it's due to a known incompatibility between some set of versions.
[GitHub] spark pull request: [SPARK-11081] Shade Jersey and javax.rs.ws
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9615#discussion_r44503873 --- Diff: pom.xml --- @@ -2165,6 +2166,9 @@ org.eclipse.jetty:jetty-security org.eclipse.jetty:jetty-util org.eclipse.jetty:jetty-server + com.sun.jersey:jersey-core + com.sun.jersey:jersey-json + com.sun.jersey:jersey-server --- End diff -- Hey @mccheah - if you look at jetty-server as an example, there are other build changes related to shading that you haven't done here:
1. In the root pom, they should be marked as provided.
2. In core/pom.xml they need to be added in includeArtifactIds.
3. They may also need to be listed in other poms as well due to some compiler bugs.
I'd just go through and look everywhere in the source code you see `jetty-server` and do the same for these 3 artifacts. If it's still not working then, let me know and I can take a look.
[GitHub] spark pull request: [SPARK-9818] Re-enable Docker tests for JDBC d...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9503#issuecomment-155341829 I only reviewed the build changes, but they look good to me.
[GitHub] spark pull request: [SPARK-7841][BUILD] Stop using retrieveManaged...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9575#issuecomment-155341204 It's hard for me to rule out that there is _no_ other reason lib_managed is used at present. I audited all the uses of it I could find in the codebase and it appears they all relate to the DataNucleus jars. So LGTM.
[GitHub] spark pull request: [SPARK-6152] Use shaded ASM5 to support closur...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9512#issuecomment-155334662 This looks fine (i.e. LGTM). However, we could also look into actually shading asm ourselves in our published artifacts, similar to how we now shade jetty and other things. I'm fine to just use this already shaded one too though. Seems to do no harm and will allow us to work with Java 8.
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r44055168 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/EmittedRecordsDStream.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.dstream + +import scala.reflect.ClassTag + +import org.apache.spark._ +import org.apache.spark.rdd.{EmptyRDD, RDD} +import org.apache.spark.storage.StorageLevel +import org.apache.spark.streaming._ +import org.apache.spark.streaming.rdd.{TrackStateRDD, TrackStateRDDRecord} + + +abstract class EmittedRecordsDStream[K: ClassTag, V: ClassTag, S: ClassTag, T: ClassTag]( +ssc: StreamingContext) extends DStream[T](ssc) { + + def stateSnapshots(): DStream[(K, S)] --- End diff -- snapshotStream?
[GitHub] spark pull request: [SPARK-11440][CORE][STREAMING][BUILD] Declare ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9396#issuecomment-153978879 Sean - thanks for doing this. I think these are all reasonable to promote... IMO if it's been around for this many versions, we can't drop them anyways.
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9256#issuecomment-153975680 I took a broad pass on the public APIs and left comments throughout, mostly around naming rather than core structure. What would really be helpful for me is filling in the documentation on the key public classes `State`, `StateSpec`, `trackStateByKey`, and `stateSnapshots()` to make sure I fully understand their semantics. I can take a pass again once that is done, but the high level approach seems good to me.
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43982459 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala --- @@ -350,6 +349,18 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)]) ) } + /** TODO: Add scala docs */ + def trackStateByKey[S: ClassTag, T: ClassTag]( +spec: TrackStateSpec[K, V, S, T]): EmittedRecordsDStream[K, V, S, T] = { +new EmittedRecordsDStreamImpl[K, V, S, T]( + new TrackStateDStream[K, V, S, T]( +self, +spec.asInstanceOf[TrackStateSpecImpl[K, V, S, T]] --- End diff -- Seems like this is what we did with DataFrameReader which follows a similar pattern: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L47
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43982371 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala --- @@ -350,6 +349,18 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)]) ) } + /** TODO: Add scala docs */ + def trackStateByKey[S: ClassTag, T: ClassTag]( +spec: TrackStateSpec[K, V, S, T]): EmittedRecordsDStream[K, V, S, T] = { +new EmittedRecordsDStreamImpl[K, V, S, T]( + new TrackStateDStream[K, V, S, T]( +self, +spec.asInstanceOf[TrackStateSpecImpl[K, V, S, T]] --- End diff -- It doesn't seem like a big deal if the getters are public.
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43982348 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala --- @@ -350,6 +349,18 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)]) ) } + /** TODO: Add scala docs */ + def trackStateByKey[S: ClassTag, T: ClassTag]( +spec: TrackStateSpec[K, V, S, T]): EmittedRecordsDStream[K, V, S, T] = { +new EmittedRecordsDStreamImpl[K, V, S, T]( + new TrackStateDStream[K, V, S, T]( +self, +spec.asInstanceOf[TrackStateSpecImpl[K, V, S, T]] --- End diff -- This cast is a little weird. Can you just have a single class `TrackStateSpec` that has both getters and setters? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
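The single-class design being suggested here (one spec type exposing both the fluent setters users call and the getters the implementation reads, so no internal downcast is needed) can be sketched language-neutrally. The class and method names below are purely illustrative, not taken from the PR:

```python
# Hypothetical sketch of a builder-style spec with both setters and getters
# on one class, so internal code never needs to cast to a private impl type.
class TrackStateSpecSketch:
    def __init__(self, tracking_function):
        self._tracking_function = tracking_function
        self._num_partitions = None

    def num_partitions(self, n):
        """Fluent setter: record the partition count and return self for chaining."""
        self._num_partitions = n
        return self

    def get_num_partitions(self):
        """Getter used by the implementation side; no downcast required."""
        return self._num_partitions
```

Usage follows the builder style of the API under review: `TrackStateSpecSketch(fn).num_partitions(4)` returns the same spec instance, and the implementation reads the value back with `get_num_partitions()`.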
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43982032 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/TrackStateSpec.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming + +import scala.reflect.ClassTag + +import org.apache.spark.{HashPartitioner, Partitioner} +import org.apache.spark.api.java.JavaPairRDD +import org.apache.spark.rdd.RDD + + +/** + * Abstract class having all the specifications of DStream.trackStateByKey(). + * Use the `TrackStateSpec.create()` or `TrackStateSpec.create()` to create instances of this class. + * + * {{{ + *TrackStateSpec(trackingFunction)// in Scala + *TrackStateSpec.create(trackingFunction) // in Java + * }}} + */ +sealed abstract class TrackStateSpec[K: ClassTag, V: ClassTag, S: ClassTag, T: ClassTag] + extends Serializable { + + def initialState(rdd: RDD[(K, S)]): this.type + def initialState(javaPairRDD: JavaPairRDD[K, S]): this.type + + def numPartitions(numPartitions: Int): this.type + def partitioner(partitioner: Partitioner): this.type + + def timeout(interval: Duration): this.type --- End diff -- A doc here to precisely define timeouts would be really helpful.
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43981921 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming + +/** + * Abstract class for getting and updating the tracked state in the `trackStateByKey` operation of + * [[org.apache.spark.streaming.dstream.PairDStreamFunctions pair DStream]] and + * [[org.apache.spark.streaming.api.java.JavaPairDStream]]. + * {{{ + * + * }}} + */ +sealed abstract class State[S] { + + /** Whether the state already exists */ + def exists(): Boolean + + /** + * Get the state if it exists, otherwise wise it will throw an exception. + * Check with `exists()` whether the state exists or not before calling `get()`. + */ + def get(): S + + /** + * Update the state with a new value. Note that you cannot update the state if the state is + * timing out (that is, `isTimingOut() return true`, or if the state has already been removed by + * `remove()`. + */ + def update(newState: S): Unit + + /** Remove the state if it exists. */ + def remove(): Unit --- End diff -- BTW I have no strong feeling here other than that it match existing things, if we have them.
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43981909 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/EmittedRecordsDStream.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming.dstream + +import scala.reflect.ClassTag + +import org.apache.spark._ +import org.apache.spark.rdd.{EmptyRDD, RDD} +import org.apache.spark.storage.StorageLevel +import org.apache.spark.streaming._ +import org.apache.spark.streaming.rdd.{TrackStateRDD, TrackStateRDDRecord} + + +abstract class EmittedRecordsDStream[K: ClassTag, V: ClassTag, S: ClassTag, T: ClassTag]( --- End diff -- I see - i guess it depends what you define as state. I think of `S` as "stored state" and `T` as "emitted state". Maybe that's off? I think `EmittedDStream` could be okay, it's a bit awkward but I think still better than adding a new term.
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43981603 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.streaming + +/** + * Abstract class for getting and updating the tracked state in the `trackStateByKey` operation of + * [[org.apache.spark.streaming.dstream.PairDStreamFunctions pair DStream]] and + * [[org.apache.spark.streaming.api.java.JavaPairDStream]]. + * {{{ + * + * }}} + */ +sealed abstract class State[S] { + + /** Whether the state already exists */ + def exists(): Boolean + + /** + * Get the state if it exists, otherwise wise it will throw an exception. + * Check with `exists()` whether the state exists or not before calling `get()`. + */ + def get(): S + + /** + * Update the state with a new value. Note that you cannot update the state if the state is + * timing out (that is, `isTimingOut() return true`, or if the state has already been removed by + * `remove()`. + */ + def update(newState: S): Unit --- End diff -- minor nit: but prefer `def update(state: S): Unit` since it's implied by the semantics of update that it is new.
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43981556

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/State.scala ---

@@ -0,0 +1,141 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming
+
+/**
+ * Abstract class for getting and updating the tracked state in the `trackStateByKey` operation of
+ * [[org.apache.spark.streaming.dstream.PairDStreamFunctions pair DStream]] and
+ * [[org.apache.spark.streaming.api.java.JavaPairDStream]].
+ */
+sealed abstract class State[S] {
+
+  /** Whether the state already exists */
+  def exists(): Boolean
+
+  /**
+   * Get the state if it exists; otherwise it will throw an exception.
+   * Check with `exists()` whether the state exists before calling `get()`.
+   */
+  def get(): S
+
+  /**
+   * Update the state with a new value. Note that you cannot update the state if it is
+   * timing out (that is, `isTimingOut()` returns `true`) or if it has already been removed
+   * by `remove()`.
+   */
+  def update(newState: S): Unit
+
+  /** Remove the state if it exists. */
+  def remove(): Unit

--- End diff --

Should this be `delete` or `destroy`? Not sure if we have used similar terminology elsewhere. Also it would be good to state the semantics of calling this.

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
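To make the `remove()` semantics under discussion concrete, here is a hedged sketch in plain Scala (independent of Spark): a minimal stand-in for the `State[S]` contract plus an illustrative tracking function. `SimpleState` and `trackCount` are hypothetical names for illustration, not part of the PR under review.

```scala
// Minimal stand-in for the State[S] API under review, to illustrate the
// exists/get/update/remove semantics being discussed (not Spark's code).
class SimpleState[S](private var value: Option[S] = None) {
  private var removed = false

  def exists(): Boolean = value.isDefined && !removed

  /** Throws if the state does not exist, matching the documented contract. */
  def get(): S = value.filter(_ => !removed)
    .getOrElse(throw new NoSuchElementException("State does not exist"))

  def update(newState: S): Unit = {
    require(!removed, "cannot update a removed state")
    value = Some(newState)
  }

  /** After remove(), exists() is false and update() is illegal. */
  def remove(): Unit = { removed = true; value = None }
}

object StateDemo {
  // An illustrative tracking function: accumulate a running count per key.
  def trackCount(newValue: Int, state: SimpleState[Int]): Int = {
    val updated = (if (state.exists()) state.get() else 0) + newValue
    state.update(updated)
    updated
  }

  def main(args: Array[String]): Unit = {
    val s = new SimpleState[Int]()
    assert(!s.exists())
    assert(trackCount(3, s) == 3)
    assert(trackCount(4, s) == 7)
    s.remove()
    assert(!s.exists())
    println("ok")
  }
}
```

The sketch makes the reviewer's question visible: after `remove()`, a later `update()` fails, which is exactly the kind of semantics the scaladoc should state.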
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43980731

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/EmittedRecordsDStream.scala ---

@@ -0,0 +1,101 @@
+[standard Apache license header]
+
+package org.apache.spark.streaming.dstream
+
+import scala.reflect.ClassTag
+
+import org.apache.spark._
+import org.apache.spark.rdd.{EmptyRDD, RDD}
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.rdd.{TrackStateRDD, TrackStateRDDRecord}
+
+abstract class EmittedRecordsDStream[K: ClassTag, V: ClassTag, S: ClassTag, T: ClassTag](

--- End diff --

Could this be called `EmittedStateDStream`? I don't think the term "Record" has clear semantics here. It might be good to tie it back to the terms already defined (i.e. State).
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43980502

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala ---

@@ -350,6 +349,18 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
     )
   }
+
+  /** TODO: Add scala docs */
+  def trackStateByKey[S: ClassTag, T: ClassTag](

--- End diff --

Especially if there can be a doc that describes (K, V, S, T) and what their semantics are.
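For reference, a hedged sketch of the four type parameters' semantics being asked about, in plain Scala rather than Spark's API: K is the key of the input (key, value) stream, V its value type, S the state tracked per key, and T the record emitted per update. The `track` helper and the word-count example below are illustrative assumptions, not the PR's actual scaladoc.

```scala
// Illustrative semantics of trackStateByKey's type parameters:
//   K - key type of the input (key, value) stream
//   V - value type of the input stream
//   S - type of the state tracked per key
//   T - type of the records emitted for downstream processing
object TrackStateTypes {
  // A plain-Scala stand-in for the tracking-function shape: given a key,
  // a new value, and the previous state, produce the new state and an
  // emitted record. (Hypothetical signature, for illustration only.)
  def track[K, V, S, T](key: K, value: V, state: Option[S])
                       (f: (K, V, Option[S]) => (S, T)): (S, T) =
    f(key, value, state)

  def main(args: Array[String]): Unit = {
    // Word count: K = String, V = Int, S = Int (running count), T = (String, Int)
    val (newState, emitted) =
      track("spark", 2, Some(3)) { (k, v, s) =>
        val count = s.getOrElse(0) + v
        (count, (k, count))
      }
    assert(newState == 5)
    assert(emitted == ("spark", 5))
    println("ok")
  }
}
```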
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43980438

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/TrackStateSpec.scala ---

@@ -0,0 +1,111 @@
+[standard Apache license header]
+
+package org.apache.spark.streaming
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.{HashPartitioner, Partitioner}
+import org.apache.spark.api.java.JavaPairRDD
+import org.apache.spark.rdd.RDD
+
+/**
+ * Abstract class holding all the specifications of DStream.trackStateByKey().
+ * Use `TrackStateSpec(...)` (in Scala) or `TrackStateSpec.create(...)` (in Java) to create
+ * instances of this class.
+ *
+ * {{{
+ *    TrackStateSpec(trackingFunction)        // in Scala
+ *    TrackStateSpec.create(trackingFunction) // in Java
+ * }}}
+ */
+sealed abstract class TrackStateSpec[K: ClassTag, V: ClassTag, S: ClassTag, T: ClassTag]

--- End diff --

I would prefer to just call this `StateSpec`.
[GitHub] spark pull request: [SPARK-2629][STREAMING] Basic implementation o...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/9256#discussion_r43980418

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala ---

@@ -350,6 +349,18 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
     )
   }
+
+  /** TODO: Add scala docs */
+  def trackStateByKey[S: ClassTag, T: ClassTag](

--- End diff --

Can you add the docs? It would make it easier to review the public APIs here.
[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8531#issuecomment-153469556

Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8531#issuecomment-153259490

Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8531#issuecomment-153252357

Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8531#issuecomment-153252094

Jenkins, test this please.
[GitHub] spark pull request: [SPARK-9926] [SPARK-10340] [SQL] Use S3 bulk l...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/8512#discussion_r43717724

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkS3Util.scala ---

@@ -0,0 +1,336 @@
+[standard Apache license header]
+
+package org.apache.spark.deploy
+
+import java.net.URI
+import java.util
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import com.amazonaws.{AmazonClientException, AmazonServiceException, ClientConfiguration, Protocol}
+import com.amazonaws.auth.{AWSCredentialsProvider, BasicAWSCredentials, InstanceProfileCredentialsProvider, STSAssumeRoleSessionCredentialsProvider}
+import com.amazonaws.internal.StaticCredentialsProvider
+import com.amazonaws.services.s3.AmazonS3Client
+import com.amazonaws.services.s3.model.{ListObjectsRequest, ObjectListing, S3ObjectSummary}
+
+import com.google.common.annotations.VisibleForTesting
+import com.google.common.base.{Preconditions, Strings}
+import com.google.common.cache.{Cache, CacheBuilder}
+import com.google.common.collect.AbstractSequentialIterator
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, GlobPattern, Path, PathFilter}
+import org.apache.hadoop.fs.s3.S3Credentials
+import org.apache.hadoop.io.compress.{CompressionCodecFactory, SplittableCompressionCodec}
+import org.apache.hadoop.mapred.{FileInputFormat, FileSplit, InputSplit, JobConf}
+
+import org.apache.spark.{Logging, SparkEnv}
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.util.Utils
+
+/**
+ * :: DeveloperApi ::
+ * Contains util methods to interact with S3 from Spark.
+ */
+@DeveloperApi
+object SparkS3Util extends Logging {

--- End diff --

Shouldn't this just be private[spark]?
[GitHub] spark pull request: [SPARK-9926] [SPARK-10340] [SQL] Use S3 bulk l...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/8512#discussion_r43717704

--- Diff: core/pom.xml ---

@@ -40,6 +40,11 @@
       ${avro.mapred.classifier}
+    <dependency>
+      <groupId>com.amazonaws</groupId>
+      <artifactId>aws-java-sdk</artifactId>
+      <version>${aws.java.sdk.version}</version>
+    </dependency>

--- End diff --

I took a quick look and unfortunately this has more than 50 transitive dependencies (jackson, joda-time, apache http client) that are likely to cause conflicts. I don't think we can merge this until we look into this more deeply. Can we use a narrower dependency, for instance only the S3 SDK? Even then we'll still have many potential conflicts, but it would at least reduce the amount of auditing we need to do.
[GitHub] spark pull request: [test-maven][test-hadoop1.0][SPARK-11236][CORE...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9395#issuecomment-153208299

Got it - thanks, I can merge it then.
[GitHub] spark pull request: [test-maven][test-hadoop1.0][SPARK-11236][CORE...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9395#issuecomment-153203911

Hey @calvinjia, seems okay to merge since this is just triggering other failures. Can you explain more, though, how this works around the MiMa issue - the patch seems the same as #9204. Did something change between Tachyon 0.8.0 and 0.8.1?
[GitHub] spark pull request: [test-maven][test-hadoop1.0][SPARK-11236][CORE...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9395#issuecomment-152929052

Jenkins, test this please.
[GitHub] spark pull request: [test-hadoop1.0][SPARK-11236][CORE] Update Tac...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9395#issuecomment-152923143

Hm - can you try also adding [test-maven] - might be better to test with maven.

On Sun, Nov 1, 2015 at 9:40 PM, Calvin Jia wrote:
> @yhuai <https://github.com/yhuai> Thanks for the retest. I'm not sure if
> this will go away by re-running or if there is something up with
> Jenkins/Spark master branch. It seems like the current Spark-Master-SBT
> build is not happy on hadoop1.0 for the same reason. (See:
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3911/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=spark-test/consoleFull
> )
[GitHub] spark pull request: [test-hadoop1.0][SPARK-11236][CORE] Update Tac...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9395#issuecomment-152892982

Jenkins, test this please.
[GitHub] spark pull request: [SPARK-11236][CORE] Update Tachyon dependency ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9395#issuecomment-152891750

@calvinjia can you add "[test-hadoop1.0]" to the title of this PR and then retest it? That will run the tests with Hadoop 1. See more info here: https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
[GitHub] spark pull request: [SPARK-11236][CORE] Update Tachyon dependency ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9204#issuecomment-152415593

@haoyuan hey HY - can you not merge build-related patches without asking for feedback from one of the build maintainers (me or @srowen)? This patch makes changes to Spark's dependency graph that need to be audited carefully because they affect all users. There is discussion of the maintainer/review process here: https://cwiki.apache.org/confluence/display/SPARK/Committers#Committers-ReviewProcessandMaintainers

I did a post hoc review and it appears this does not change the contents of the assembly jar, so I think it is okay. Separately, it would be good to spin Tachyon support out into a package so these changes do not need to go through the upstream review process.
[GitHub] spark pull request: SPARK-2533 - Add locality levels (with tasks c...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9117#issuecomment-149127284

Yeah, this was my thought - could we provide a summary on the stage page rather than in the stage index? I do see how an aggregated summary is significantly more useful than mentally aggregating based on the task table. Doing this on the stage page would avoid adding more columns to the index view, and this column could get complicated if stages have many locality levels in play. It might be nice to see an alternative patch that does that.
[GitHub] spark pull request: SPARK-2533 - Add locality levels (with tasks c...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9117#issuecomment-149125876

ping @kayousterhout - to me it's the normal issue: sure, it's useful for some cases, but is it worth putting in the index page? It could be better to just put a locality summary on the stage page itself, if one doesn't already exist.
[GitHub] spark pull request: [SPARK-11110] [Build] Remove transient annotat...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9126#issuecomment-148231844

ping @marmbrus for any thoughts. But I think removing them makes sense. If someone makes it a val later, they will have to reason about whether it should be transient or not, as we would for any new field.
[GitHub] spark pull request: SPARK-1537 publisher-side code and tests
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/8744#discussion_r41951480

--- Diff: yarn/pom.xml ---

@@ -164,6 +164,92 @@

--- End diff --

Can this say "The YARN application server..."? There is already a different component in Spark called the history server.
[GitHub] spark pull request: [SPARK-10930] Adds max task duration to all st...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/9051#issuecomment-147493632

I see the underlying problem posed in the JIRA - it's difficult to assess duration since it currently includes the time spent waiting on dependent stages. However, this patch doesn't seem like the obvious way to fix that. I think there are some alternatives that would make more sense:

1. Re-define duration so that it's only defined starting when the first task in a stage launches (some concerns here about changing semantics, though).
2. Add a new field that represents the time spent servicing the stage: "service time" (?)
3. Add a new field that represents the time spent queuing before any tasks launched: "queue time" (?)

Those all seem better ways to address the issue in the JIRA. Showing the max task time seems indirect, and not always helpful, since max task time doesn't have a simple relationship with "duration" as desired here... for instance, the max task could be pretty short while the duration is still really long for the stage.

/cc @rxin @kayousterhout for any thoughts.
[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8931#issuecomment-146293403

SGTM.

On Wed, Oct 7, 2015 at 11:55 AM, Reynold Xin wrote:
> (and data size should include all the data, including spilled)
[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8931#issuecomment-146289643

BTW - one alternative would be to create an accumulator that tracks max, min, median, and total and then have it display nicely in two lines. For instance:
```
memory total (min,med,max): 10GB (1MB,100MB,1GB)
```
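A hedged sketch of the aggregation being suggested, in plain Scala rather than Spark's accumulator API: the class below (its name and methods are illustrative assumptions, not Spark code) collects per-task values and renders them in the proposed "total (min,med,max)" format. In Spark this would be an accumulator merged across tasks; here it is a simple in-memory collector.

```scala
// Illustrative aggregation of per-task memory values into the proposed
// "total (min,med,max)" display. Names are hypothetical; Spark would
// implement this as an accumulator merged across tasks.
class MinMedMaxTotal {
  private val values = scala.collection.mutable.ArrayBuffer.empty[Long]

  def add(v: Long): Unit = values += v

  def total: Long = values.sum
  def min: Long = values.min
  def max: Long = values.max
  def median: Long = {
    val sorted = values.sorted
    sorted(sorted.length / 2) // simple upper median, fine for display
  }

  /** Render like: "memory total (min,med,max): 600 (100,200,300)" */
  def display(name: String): String =
    s"$name total (min,med,max): $total ($min,$median,$max)"
}

object MinMedMaxTotalDemo {
  def main(args: Array[String]): Unit = {
    val acc = new MinMedMaxTotal
    Seq(100L, 200L, 300L).foreach(acc.add)
    println(acc.display("memory")) // memory total (min,med,max): 600 (100,200,300)
  }
}
```

Byte-size formatting (10GB, 1MB, ...) is omitted here; only the shape of the aggregation and display is sketched.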
[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8931#issuecomment-146288685

The reason I like accumulated memory is that it's something that should be roughly constant over multiple runs of a workload, so people can get a sense of how much data they are buffering during execution. The max and median will depend a lot on how tasks are scheduled, etc., so they don't give someone a great idea of how they can change their query or data to get memory under control. It's just like how in Hadoop you can see the total input size for a job. These totals are often really helpful.
[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8931#issuecomment-146285580

I think we want to have concise and consistent names here across all the metrics. Here is my proposal for naming:
```
input rows
output rows
spilled data
memory
task memory (max)
task spilled data (max)
```
I think the word "peak" is not necessary because I assume you report the peak memory over the lifetime of a task. I think the word "total" is not necessary because these are accumulated values and can be assumed to be totals unless otherwise stated.
[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/8931#discussion_r41426589

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---

@@ -69,6 +72,9 @@ case class TungstenAggregate(
   protected override def doExecute(): RDD[InternalRow] = attachTree(this, "execute") {
     val numInputRows = longMetric("numInputRows")
     val numOutputRows = longMetric("numOutputRows")
+    val totalPeakMemory = longMetric("totalPeakMemory")

--- End diff --

Also it's weird to say "total" in some places but not in others. For instance, records in and out are also totals, but it doesn't say "total" in those.
[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/8931#discussion_r41426412

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---

@@ -69,6 +72,9 @@ case class TungstenAggregate(
   protected override def doExecute(): RDD[InternalRow] = attachTree(this, "execute") {
     val numInputRows = longMetric("numInputRows")
     val numOutputRows = longMetric("numOutputRows")
+    val totalPeakMemory = longMetric("totalPeakMemory")

--- End diff --

Hey, can you give a more precise definition here of what this means? I think the word "Peak" is throwing me off and maybe we could delete it; if you say "memory", I will assume you mean the maximum amount of memory a task is using over its lifetime. I think on this one it might be best to just discuss it briefly in person.
[GitHub] spark pull request: [SPARK-10833] [BUILD] Inline, organize BSD/MIT...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8919#issuecomment-143837144 Hey Sean - looks good to me, but I can't claim to be nearly as deep as you are on this stuff!
[GitHub] spark pull request: Update branch-1.5 for 1.5.1 release.
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8890#issuecomment-142783383 @rxin I think you should upgrade R/pkg/DESCRIPTION - otherwise LGTM.
[GitHub] spark pull request: Update version to 1.6.0-SNAPSHOT.
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8350#issuecomment-140562180 No problem - I should have caught it earlier. Hopefully people didn't spin too many cycles on this. On Tue, Sep 15, 2015 at 12:56 AM, Sean Owen wrote: > I think that's reasonable since the only failure mode this patch should > cause is MiMa failure and that passed now. Thanks @pwendell > <https://github.com/pwendell> for reminding me about the previousVersion > thing, had totally overlooked how that worked.
[GitHub] spark pull request: [SPARK-10511][BUILD] Reset git repository befo...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8774#issuecomment-140508440 This is perfect, thanks. LGTM. Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10300] [build] [tests] Add support for ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8437#issuecomment-140288092 LGTM - I've modified that script recently enough to be familiar with how it works. This seems like a good approach and could be useful for us in the future.
[GitHub] spark pull request: Update version to 1.6.0-SNAPSHOT.
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8350#issuecomment-140286995 Hm - this is a test file that is in the source folder, so MiMa is complaining about changes. The error message gives you a filter you can add to the MiMa file to ignore it. On Mon, Sep 14, 2015 at 10:29 PM, Apache Spark QA wrote: > Test build #1754 has finished > <https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1754/console> > for PR 8350 at commit ce5b5bb > <https://github.com/apache/spark/commit/ce5b5bbe2efbc92df5105d3072ee103ccc8e3e36> > . > >- This patch *fails MiMa tests*. >- This patch merges cleanly. >- This patch adds the following public classes *(experimental)*: > - class MinMaxScaler(JavaEstimator, HasInputCol, HasOutputCol): > - class MinMaxScalerModel(JavaModel): > - case class Stddev(child: Expression) extends StddevAgg(child) > - case class StddevPop(child: Expression) extends StddevAgg(child) > - case class StddevSamp(child: Expression) extends StddevAgg(child) > - abstract class StddevAgg(child: Expression) extends AlgebraicAggregate > - abstract class StddevAgg1(child: Expression) extends UnaryExpression with PartialAggregate1 > - case class Stddev(child: Expression) extends StddevAgg1(child) > - case class StddevPop(child: Expression) extends StddevAgg1(child) > - case class StddevSamp(child: Expression) extends StddevAgg1(child) > - case class ComputePartialStd(child: Expression) extends UnaryExpression with AggregateExpression1 > - case class ComputePartialStdFunction ( > - case class MergePartialStd( > - case class MergePartialStdFunction( > - case class StddevFunction( > - case class IntersectNode(conf: SQLConf, left: LocalNode, right: LocalNode) > - case class SampleNode( > - case class TakeOrderedAndProjectNode(
[GitHub] spark pull request: Update version to 1.6.0-SNAPSHOT.
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8350#issuecomment-140231316 I tested it locally, you'll need something like this:
```
version match {
  case v if v.startsWith("1.6") =>
    Seq(
      MimaBuild.excludeSparkPackage("deploy"),
      // These are needed if checking against the sbt build, since they are part of
      // the maven-generated artifacts in 1.3.
      excludePackage("org.spark-project.jetty"),
      MimaBuild.excludeSparkPackage("unused"),
      ProblemFilters.exclude[MissingClassProblem](
        "org.apache.spark.sql.execution.datasources.DefaultSource")
    )
```
[GitHub] spark pull request: Update version to 1.6.0-SNAPSHOT.
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8350#issuecomment-140213492 I think if you update previousVersion in MimaBuild.scala many of these should go away. I'm happy to look at the error output after doing that.
[GitHub] spark pull request: [SPARK-10411][SQL]Move visualization above exp...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8570#issuecomment-137221455 Yes - I mean the triangle image used there and in a few other places (such as showing more metrics).
[GitHub] spark pull request: [SPARK-10411][SQL]Move visualization above exp...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8570#issuecomment-137170215 For the details link - can you use the standard drop down icon we use in other places?
[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8531#issuecomment-136271740 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8531#issuecomment-136268566 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8531#issuecomment-136255396 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10359] Enumerate Spark's dependencies i...
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/8531 [SPARK-10359] Enumerate Spark's dependencies in a file and diff against it for new pull requests DON'T MERGE ME - TESTING ON JENKINS You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark dependency-audits Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8531.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8531 commit 9cf442df2c04291ae1a7df567658b490eaa46708 Author: Patrick Wendell Date: 2015-08-28T22:43:25Z Adding build test module
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-136250200 Thanks for looking at this @vanzin. I do agree it would be a lot nicer to base things on comments, but because the comment stream isn't available as meta-data on jenkins, that's a huge amount of additional work that IMO is best left to an extension if someone is feeling interested. Given that, it sounds like you are cool with merging this for now as is, then looking at evolving it more later.
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/7878#discussion_r38283921 --- Diff: dev/run-tests.py --- @@ -227,11 +228,32 @@ def build_spark_documentation(): os.chdir(SPARK_HOME) +def get_zinc_port(): +""" +Get a randomized port on which to start Zinc +""" +return random.randrange(3030, 4030) --- End diff -- This logic is identical to what is hard coded in the bash scripts that run the maven builds; in the past they've never failed for this reason.
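For context, a minimal sketch of the randomized-port idea discussed in the diff above. The `random.randrange(3030, 4030)` call mirrors the one in dev/run-tests.py; the bind-check retry loop is an added illustration of how a port collision could be detected, not something the actual script does:

```python
import random
import socket


def get_zinc_port(low=3030, high=4030, attempts=10):
    """Pick a random port in [low, high) for a Zinc compile server.

    The bind check is illustrative only: it detects a port that is
    already in use and retries, whereas dev/run-tests.py simply
    relies on collisions being unlikely in a 1000-port range.
    """
    for _ in range(attempts):
        port = random.randrange(low, high)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port  # port was free when we checked
            except OSError:
                continue  # port already taken; try another
    raise RuntimeError("no free port found in range %d-%d" % (low, high))
```

With concurrent Jenkins executors each picking a port this way, the chance of two builds colliding stays small, which matches the observation above that the hard-coded bash equivalent has never failed for this reason.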
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135871143 Okay this seems to be passing now - any thoughts @JoshRosen or @vanzin? IMO this would be really nice since we can test build changes with either build, before they make it into spark.
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135870939 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10004] [shuffle] Perform auth checks wh...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8218#issuecomment-135637236 ping @aarondav
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135635256 Jenkins, test this please.
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135582563 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-135200381 Great - looks good! On Aug 26, 2015 3:53 PM, "Marcelo Vanzin" wrote: > Yay!
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-135144862 K - I just sent a hotfix to up the timeout. On Wed, Aug 26, 2015 at 9:51 AM, Marcelo Vanzin wrote: > I'm working on fixing the root cause of the timeouts (running unnecessary > tests). If you think it would be beneficial to just bump the timeout right > now, please just send a PR for that; I'm pretty confident that this PR does > not make the timeout issue any worse.
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-135144791 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-135105365 I still don't understand - why not fix the issue in a separate PR, get this passing and then merge this? It will then benefit the other PRs also that are facing this issue. On Wed, Aug 26, 2015 at 9:46 AM, Marcelo Vanzin wrote: > but this one seems to be timing out with certainty every time > > Often, but not deterministically every time. It seems to time out as often > as any other PR that needs to run all tests (I've already cc'ed you on at > least another one that times out just as often).
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-135103392 Okay then - I would do that separately, get this PR passing, then merge it. It is not good to merge a PR that deterministically fails jenkins. Have we done that in the recent past? I saw a few other PRs that hit an occasional timeout, but this one seems to be timing out with certainty every time. On Wed, Aug 26, 2015 at 9:15 AM, Marcelo Vanzin wrote: > you will need to change the timeout in the code > > Yes but I don't want to do that as part of this change, since they're > unrelated things. All tests have been passing, the timeouts are unrelated > to the PR.
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-134867614 Marcelo, you will need to change the timeout in the code itself for it to increase from 175. On Aug 25, 2015 11:24 PM, "UCB AMPLab" wrote: > Test FAILed. > Refer to this link for build results (access rights to CI server needed): > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41584/ > Test FAILed.
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-134732350 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-134683004 Does this PR increase test time in some way? Just wondering why this would consistently timeout when others don't. On Tue, Aug 25, 2015 at 10:43 AM, Marcelo Vanzin wrote: > 175m is starting to look really low. the scala/java unit tests took 143m > to run. anyway, retest this please.
[GitHub] spark pull request: [SPARK-6196] [BUILD] Remove MapR profiles in f...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8338#issuecomment-134449642 Yes this LGTM - these are outdated and I don't even think MapR is advising their customers to use these. They are asking people to use hadoop-provided, which was created to simplify using Spark with different versions.
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-134422382 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-134422378 Yeah sounds good - might be good to let it run one more time just to be sure it's not affecting jenkins somehow. On Mon, Aug 24, 2015 at 4:40 PM, Marcelo Vanzin wrote: > Seems like all tests passed, no idea why jenkins thinks they timed out. > > AFAICT, this is good to do. @pwendell <https://github.com/pwendell> ?
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-130156023 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7411#issuecomment-130155218 Okay will merge this - I've been keeping things in a separate repo and it's much better to have it in the upstream in case others want to modify it.
[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7411#issuecomment-129976266 test
[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7411#issuecomment-129954901 Jenkins, test this please.
[GitHub] spark pull request: SPARK-7726: Add import so Scaladoc doesn't fai...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8095#issuecomment-129950381 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7411#issuecomment-129950164 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-7726: Add import so Scaladoc doesn't fai...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/8095#issuecomment-129713711 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-7726 Add import so Scaladoc doesn't fail...
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/8095 SPARK-7726 Add import so Scaladoc doesn't fail. This is another import needed so Scala 2.11 doc generation doesn't fail. See SPARK-7726 for more detail. I tested this locally and the 2.11 install goes from failing to succeeding with this patch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark scaladoc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8095.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8095 commit 5eba40b908824faa85ac324365f9c3374bbb1f0f Author: Patrick Wendell Date: 2015-08-11T05:34:53Z SPARK-7726 Add import so Scaladoc doesn't fail. This is another import needed so Scala 2.11 doc generation doesn't fail. See SPARK-7726 for more detail.
[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7411#issuecomment-129708771 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/7411#discussion_r36684821
--- Diff: dev/create-release/release-build.sh ---
@@ -0,0 +1,320 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+function exit_with_usage {
+  cat << EOF
+usage: release-build.sh
+Creates build deliverables from a Spark commit.
+
+Top level targets are
+  package: Create binary packages and copy them to people.apache
+  docs: Build docs and copy them to people.apache
+  publish-snapshot: Publish snapshot release to Apache snapshots
+  publish-release: Publish a release to Apache release repo
+
+All other inputs are environment variables
+
+GIT_REF - Release tag or commit to build from
+SPARK_VERSION - Release identifier used when publishing
+SPARK_PACKAGE_VERSION - Release identifier in top level package directory
+REMOTE_PARENT_DIR - Parent in which to create doc or release builds.
+REMOTE_PARENT_MAX_LENGTH - If set, parent directory will be cleaned to only
+  have this number of subdirectories (by deleting old ones). WARNING: This deletes data.
+
+ASF_USERNAME - Username of ASF committer account
+ASF_PASSWORD - Password of ASF committer account
+ASF_RSA_KEY - RSA private key file for ASF committer account
+
+GPG_KEY - GPG key used to sign release artifacts
+GPG_PASSPHRASE - Passphrase for GPG key
+EOF
+  exit 1
+}
+
+set -e
+
+if [ $# -eq 0 ]; then
+  exit_with_usage
+fi
+
+if [[ $@ == *"help"* ]]; then
+  exit_with_usage
+fi
+
+for env in ASF_USERNAME ASF_RSA_KEY GPG_PASSPHRASE GPG_KEY; do
+  if [ -z "${!env}" ]; then
+    echo "ERROR: $env must be set to run this script"
+    exit_with_usage
+  fi
+done
+
+# Commit ref to checkout when building
+GIT_REF=${GIT_REF:-master}
+
+# Destination directory parent on remote server
+REMOTE_PARENT_DIR=${REMOTE_PARENT_DIR:-/home/$ASF_USERNAME/public_html}
+
+SSH="ssh -o StrictHostKeyChecking=no -i $ASF_RSA_KEY"
+GPG="gpg --no-tty --batch"
+NEXUS_ROOT=https://repository.apache.org/service/local/staging
+NEXUS_PROFILE=d63f592e7eac0 # Profile for Spark staging uploads
+BASE_DIR=$(pwd)
+
+PUBLISH_PROFILES="-Pyarn -Phive -Phadoop-2.2"
+PUBLISH_PROFILES="$PUBLISH_PROFILES -Pspark-ganglia-lgpl -Pkinesis-asl"
+
+rm -rf spark
+git clone https://git-wip-us.apache.org/repos/asf/spark.git
+cd spark
+git checkout $GIT_REF
+git_hash=`git rev-parse --short HEAD`
+echo "Checked out Spark git hash $git_hash"
+
+if [ -z "$SPARK_VERSION" ]; then
+  SPARK_VERSION=$(mvn help:evaluate -Dexpression=project.version \
+    | grep -v INFO | grep -v WARNING | grep -v Download)
+fi
+
+if [ -z "$SPARK_PACKAGE_VERSION" ]; then
+  SPARK_PACKAGE_VERSION="${SPARK_VERSION}-$(date +%Y_%m_%d_%H_%M)-${git_hash}"
+fi
+
+DEST_DIR_NAME="spark-$SPARK_PACKAGE_VERSION"
+USER_HOST="$asf_usern...@people.apache.org"
+
+rm .gitignore
+rm -rf .git
+cd ..
+
+if [ -n "$REMOTE_PARENT_MAX_LENGTH" ]; then
+  old_dirs=$($SSH $USER_HOST ls -t $REMOTE_PARENT_DIR | tail -n +$REMOTE_PARENT_MAX_LENGTH)
+  for old_dir in $old_dirs; do
+    echo "Removing directory: $old_dir"
+    $SSH $USER_HOST rm -r $REMOTE_PARENT_DIR/$old_dir
+  done
+fi
+
+if [[ "$1" == "package" ]]; then
+  # Source and binary tarballs
+  echo "Packaging release tarballs"
+  cp -r spark spark-$SPARK_VERSION
+  tar cvzf spark-$SPARK_VERSION.tgz spark-$SPARK_VERSION
+  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --armour --output spark-$SPARK_VERSION.tgz.asc \
+    --detach-sig spark-$SPARK_VERSION.tgz
+  echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --print-md MD5
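The validation loop in the quoted script relies on Bash indirect expansion (`${!env}`), which expands the variable whose *name* is stored in `env`. As a minimal standalone sketch of that pattern (the `require_env` helper and the `DEMO_*` variable names are illustrative, not part of release-build.sh):

```shell
#!/usr/bin/env bash
# require_env NAME...: return non-zero if any named environment variable
# is unset or empty. ${!name} is Bash indirect expansion: it expands the
# variable whose name is held in $name, the same trick the script uses
# to validate ASF_USERNAME, GPG_KEY, etc.
require_env() {
  local name
  for name in "$@"; do
    if [ -z "${!name}" ]; then
      echo "ERROR: $name must be set" >&2
      return 1
    fi
  done
  return 0
}
```

With a helper like this, the script's four-variable check would reduce to a single call such as `require_env ASF_USERNAME ASF_RSA_KEY GPG_PASSPHRASE GPG_KEY`.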
[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/7411#discussion_r36684757 --- Diff: dev/create-release/release-build.sh --- (same diff as quoted in the previous comment)
[GitHub] spark pull request: [SPARK-1517] Refactor release scripts to facil...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7411#issuecomment-129605050 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-2017] [UI] Stage page hangs with many t...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7296#issuecomment-128600574 This is still good to have - it doesn't hurt to gzip the output
[GitHub] spark pull request: [SPARK-9284] [tests] Allow all tests to run wi...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7629#issuecomment-128099706 If you break out the individual builds, many of them have been fine until the Hive 1.2.1 patch. The top level dashboard isn't that useful because if any one of the ~5 sub builds has a single flaky test, it shows red. For instance: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-Master-Maven/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=centos/